RAIL: Risk-Averse Imitation Learning

Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. We evaluate in terms of the expert’s cost function and observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in safety-critical applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail-risk within the GAIL framework. We quantify tail-risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in safety-critical applications.
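
As a rough illustration of the tail-risk measure named in this abstract, here is a minimal sketch of an empirical CVaR estimate over a batch of trajectory costs. It assumes one common sample-based definition (the mean of the worst (1 - alpha) fraction of costs); the function name `empirical_cvar` and the sample data are hypothetical and not taken from the paper.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Return (VaR, CVaR) for a list of costs at confidence level alpha.

    Assumed estimator: CVaR is the mean of the worst (1 - alpha) fraction
    of the samples, and VaR is the smallest cost inside that tail.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(costs)
    # Round before taking the ceiling so float noise (e.g. 3.0000000000000004)
    # does not inflate the tail size; always keep at least one sample.
    tail_size = max(1, math.ceil(round(len(ordered) * (1 - alpha), 9)))
    tail = ordered[-tail_size:]
    return tail[0], sum(tail) / len(tail)


# Hypothetical trajectory costs with one catastrophic outlier.
costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
var_80, cvar_80 = empirical_cvar(costs, alpha=0.8)
```

Two agents with the same mean cost can differ sharply on this quantity, which is why a heavy-tailed cost distribution matters even when average performance looks comparable.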

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.

Inferactive data analysis

We describe inferactive data analysis, so-named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey’s exploratory (roughly speaking ‘model free’) and confirmatory data analysis (roughly speaking classical and ‘model based’), also allowing for Bayesian data analysis. We view this approach as close in spirit to the current practice of applied statisticians and data scientists while allowing frequentist guarantees for results to be reported in the scientific literature, or Bayesian results where the data scientist may choose the statistical model (and hence the prior) after some initial exploratory analysis. While this approach to data analysis does not cover every scenario, or every possible algorithm data scientists may use, we see this as a useful step in providing concrete tools (with frequentist statistical guarantees) for current data scientists. The basis of inference we use is selective inference [Lee et al., 2016, Fithian et al., 2014], in particular its randomized form [Tian and Taylor, 2015a]. The randomized framework, besides providing additional power and shorter confidence intervals, also provides explicit forms for relevant reference distributions (up to normalization) through the selective sampler of Tian et al. [2016]. The reference distributions are constructed from a particular conditional distribution formed from what we call a DAG-DAG, a Data Analysis Generative DAG. As sampling conditional distributions in DAGs is generally complex, the selective sampler is crucial to any practical implementation of inferactive data analysis. Our principal goal is to review the recent developments in selective inference and to describe the general philosophy of selective inference.

The Covering Principle: A New Approach to Address Multiplicity in Hypotheses Testing

The closure and the partitioning principles have been used to build various multiple testing procedures in the past three decades. The essence of these two principles is based on parameter space partitioning. In this article, we propose a novel approach, coined the covering principle, from the perspective of rejection region coverage in the sample space. The covering principle divides the whole family of null hypotheses into a few overlapping sub-families when there is a priority of making decisions for hypothesis testing. We prove that the multiple testing procedure constructed by the covering principle strongly controls the familywise error rate as long as the multiple tests for each sub-family strongly control the type I error. We illustrate that the covering principle can be applied to solve general gate-keeping problems.

Machine Teaching: A New Paradigm for Building Machine Learning Systems

The current processes for building machine learning systems require practitioners with deep knowledge of machine learning. This significantly limits the number of machine learning systems that can be created and has led to a mismatch between the demand for machine learning systems and the ability for organizations to build them. We believe that in order to meet this growing demand for machine learning systems we must significantly increase the number of individuals that can teach machines. We postulate that we can achieve this goal by making the process of teaching machines easy, fast and above all, universally accessible. While machine learning focuses on creating new algorithms and improving the accuracy of learners, the machine teaching discipline focuses on the efficacy of the teachers. Machine teaching as a discipline is a paradigm shift that follows and extends principles of software engineering and programming languages. We put a strong emphasis on the teacher and the teacher’s interaction with data, as well as crucial components such as techniques and design principles of interaction and visualization. In this paper, we present our position regarding the discipline of machine teaching and articulate fundamental machine teaching principles. We also describe how, by decoupling knowledge about machine learning algorithms from the process of teaching, we can accelerate innovation and empower millions of new uses for machine learning models.

Neural Person Search Machines

We investigate the problem of person search in the wild in this work. Instead of comparing the query against all candidate regions generated in a query-blind manner, we propose to recursively shrink the search area from the whole image until the target person is precisely localized, by fully exploiting information from the query and contextual cues in every recursive search step. We develop the Neural Person Search Machines (NPSM) to implement such recursive localization for person search. Benefiting from its neural search mechanism, NPSM is able to selectively shrink its focus from a loose region to a tighter one containing the target automatically. In this process, NPSM employs an internal primitive memory component to memorize the query representation, which modulates the attention and augments its robustness to other distracting regions. Evaluations on two benchmark datasets, the CUHK-SYSU Person Search dataset and the PRW dataset, have demonstrated that our method can outperform the current state of the art in both mAP and top-1 evaluation protocols.

Semantic Image Synthesis via Adversarial Learning

In this paper, we propose a way of synthesizing realistic images directly from natural language descriptions, which has many useful applications, e.g. intelligent image manipulation. We attempt to accomplish such synthesis as follows: given a source image and a target text description, our model synthesizes images to meet two requirements: 1) being realistic while matching the target text description; 2) maintaining other image features that are irrelevant to the text description. The model should be able to disentangle the semantic information from the two modalities (image and text), and generate new images from the combined semantics. To achieve this, we propose an end-to-end neural architecture that leverages adversarial learning to automatically learn implicit loss functions, which are optimized to fulfill the aforementioned two requirements. We have evaluated our model by conducting experiments on the Caltech-200 bird dataset and the Oxford-102 flower dataset, and have demonstrated that our model is capable of synthesizing realistic images that match the given descriptions, while still maintaining other features of the original images.

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.

SGNMT — A Flexible NMT Decoding Platform for Quick Prototyping of New Models and Search Strategies

This paper introduces SGNMT, our experimental platform for machine translation research. SGNMT provides a generic interface to neural and symbolic scoring modules (predictors) with left-to-right semantics, such as translation models like NMT, language models, translation lattices, n-best lists or other kinds of scores and constraints. Predictors can be combined with other predictors to form complex decoding tasks. SGNMT implements a number of search strategies for traversing the space spanned by the predictors, each appropriate for different predictor constellations. Adding new predictors or decoding strategies is particularly easy, making it a very efficient tool for prototyping new research ideas. SGNMT is actively being used by students in the MPhil program in Machine Learning, Speech and Language Technology at the University of Cambridge for course work and theses, as well as for most of the research work in our group.

A Distributional Perspective on Reinforcement Learning

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman’s equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.
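
To make the contrast between the value and the value distribution concrete, the following is a minimal sketch of a one-step Bellman backup applied to a discrete return distribution rather than to its mean. It assumes a deterministic reward and a single successor state, and it represents a distribution as a list of (return, probability) pairs; the names `backup_distribution` and `mean` are hypothetical. It is not the algorithm proposed in the paper, which learns approximate value distributions.

```python
def backup_distribution(reward, gamma, next_dist):
    """Shift and scale every atom of the successor's return distribution.

    next_dist is a list of (return_value, probability) pairs; the
    probabilities are left unchanged by the backup.
    """
    return [(reward + gamma * z, p) for z, p in next_dist]


def mean(dist):
    """Expected return of a discrete distribution."""
    return sum(z * p for z, p in dist)


# Hypothetical successor state: return 0 or 10 with equal probability.
next_dist = [(0.0, 0.5), (10.0, 0.5)]
current_dist = backup_distribution(reward=1.0, gamma=0.9, next_dist=next_dist)

# The mean of the backed-up distribution agrees with the usual
# expectation-based Bellman backup, but the spread is also retained.
expected_backup = 1.0 + 0.9 * mean(next_dist)
```

The expectation-based backup collapses `current_dist` to a single number, whereas the distributional view keeps both atoms and therefore the information about how risky the return is.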

A New Family of Near-metrics for Universal Similarity

We propose a family of near-metrics based on local graph diffusion to capture similarity for a wide class of data sets. These quasi-metametrics, as their names suggest, dispense with one or two standard axioms of metric spaces, specifically distinguishability and symmetry, so that similarity between data points of arbitrary type and form could be measured broadly and effectively. The proposed near-metric family includes the forward k-step diffusion and its reverse, typically on the graph consisting of data objects and their features. By construction, this family of near-metrics is particularly appropriate for categorical data, continuous data, and vector representations of images and text extracted via deep learning approaches. We conduct extensive experiments to evaluate the performance of this family of similarity measures and compare and contrast with traditional measures of similarity used for each specific application and with the ground truth when available. We show that for structured data including categorical and continuous data, the near-metrics corresponding to normalized forward k-step diffusion (k small) work as one of the best performing similarity measures; for vector representations of text and images including those extracted from deep learning, the near-metrics derived from normalized and reverse k-step graph diffusion (k very small) exhibit outstanding ability to distinguish data points from different classes.
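
The idea of a forward k-step diffusion that need not be symmetric can be sketched as follows. This is only an illustration under stated assumptions: the graph is given as a small adjacency matrix, one diffusion step is a row-normalized random-walk transition, and the k-step score from node i to node j is the (i, j) entry of the k-th matrix power. The helper names are hypothetical, and the abstract does not specify the paper's exact normalization.

```python
def row_normalize(adj):
    """Turn an adjacency matrix into a row-stochastic transition matrix."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def matmul(a, b):
    """Plain-Python product of two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def k_step_diffusion(adj, k):
    """Return P^k, where P is the row-normalized adjacency matrix."""
    n = len(adj)
    p = row_normalize(adj)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, p)
    return result


# Hypothetical 3-node graph: node 0 is linked to nodes 1 and 2.
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
one_step = k_step_diffusion(adj, 1)
two_step = k_step_diffusion(adj, 2)
```

In this toy graph the one-step score from node 0 to node 1 is 0.5 while the score from node 1 to node 0 is 1.0, which shows how a diffusion-based similarity can drop the symmetry axiom.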

Memory-Efficient Implementation of DenseNets

The DenseNet architecture is highly computationally efficient as a result of feature reuse. However, a naive DenseNet implementation can require a significant amount of GPU memory: If not properly managed, pre-activation batch normalization and contiguous convolution operations can produce feature maps that grow quadratically with network depth. In this technical report, we introduce strategies to reduce the memory consumption of DenseNets during training. By strategically using shared memory allocations, we reduce the memory cost for storing feature maps from quadratic to linear. Without the GPU memory bottleneck, it is now possible to train extremely deep DenseNets. Networks with 14M parameters can be trained on a single GPU, up from 4M. A 264-layer DenseNet (73M parameters), which previously would have been infeasible to train, can now be trained on a single workstation with 8 NVIDIA Tesla M40 GPUs. On the ImageNet ILSVRC classification dataset, this large DenseNet obtains a state-of-the-art single-crop top-1 error of 20.26%.
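
The quadratic-versus-linear claim can be checked with a back-of-the-envelope count of stored feature-map channels in a single dense block. This is a sketch under simplifying assumptions, not the report's implementation: the naive variant is assumed to materialize a fresh copy of each layer's concatenated input, while the shared variant stores each layer's output once plus one reusable buffer sized for the largest concatenated input. The function names and the example sizes are hypothetical.

```python
def naive_feature_channels(num_layers, growth_rate, init_channels):
    """Channels stored when every layer keeps its own concatenated input.

    Layer l (0-based) sees init_channels + l * growth_rate input channels,
    so the total grows quadratically with num_layers.
    """
    return sum(init_channels + l * growth_rate for l in range(num_layers))


def shared_feature_channels(num_layers, growth_rate, init_channels):
    """Channels stored when concatenated inputs reuse one shared buffer.

    Each layer's output is stored once, plus a single scratch buffer large
    enough for the last layer's input, so the total grows linearly.
    """
    outputs = init_channels + num_layers * growth_rate
    scratch = init_channels + (num_layers - 1) * growth_rate
    return outputs + scratch


# Hypothetical block: 100 layers, growth rate 12, 24 initial channels.
naive = naive_feature_channels(100, 12, 24)
shared = shared_feature_channels(100, 12, 24)
```

Under these assumptions, doubling the number of layers roughly quadruples the naive count but only doubles the shared one.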

Quantum dynamics in transverse-field Ising models from classical networks
The Graphical Horseshoe Estimator for Inverse Covariance Matrices
Estimation of Sparsity via Simple Measurements
Social Hash Partitioner: A Scalable Distributed Hypergraph Partitioner
Voltage Analytics for Power Distribution Network Topology Verification
Resting state fMRI functional connectivity-based classification using a convolutional neural network architecture
AWGN-Goodness is Enough: Capacity-Achieving Lattice Codes based on Dithered Probabilistic Shaping
Power maps in finite groups
Bayesian covariance modeling of multivariate spatial random fields
Local Geometry Inclusive Global Shape Representation
Cyclic Stochastic Optimization: Generalizations, Convergence, and Applications in Multi-Agent Systems
An Infinite Family of Circulant Graphs with Perfect State Transfer in Discrete Quantum Walks
Facets of a mixed-integer bilinear covering set with bounds on variables
Convolutional Sparse Coding: Boundary Handling Revisited
Generalized Convolutional Neural Networks for Point Cloud Data
On the Design of Secure Full-Duplex Multiuser Systems under User Grouping Method
An Algorithmic Proof of the Piff–Welsh Theorem on Transversal Matroid Representations
Efficient Defenses Against Adversarial Attacks
Predictive networking and optimization for flow-based networks
Spectrum and Energy Efficient Beamspace MIMO-NOMA for Millimeter-Wave Communications Using Lens Antenna Array
Nowhere-zero $3$-flow of graphs with small independence number
Temporal Convolution Based Action Proposal: Submission to ActivityNet 2017
Sub-Jamming Transition in Binary Sphere Mixtures
An Infinite Hidden Markov Model With Similarity-Biased Transitions
A Nonlinear Dimensionality Reduction Framework Using Smooth Geodesics
Many-body localization of spinless fermions with attractive interactions in one dimension
On the Orbits of Crossed Cubes
Outcome-Oriented Predictive Process Monitoring: Review and Benchmark
Integrability conditions for Compound Random Measures
Improved Bilinear Pooling with CNNs
Subcarrier-Chunk Assignment With Power Allocation and Multiple-Rate Constraints for Downlink OFDMA
Rendezvous on a Line by Location-Aware Robots Despite the Presence of Byzantine Faults
Constant Time Updates in Hierarchical Heavy Hitters
Improved Kernels and Algorithms for Claw and Diamond Free Edge Deletion Based on Refined Observations
3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds
Head Detection with Depth Images in the Wild
Graphical posterior predictive classifier: Bayesian model averaging with particle Gibbs
Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks
The set of alternating sign matrices which are determined by their X-ray is a member of the Catalan family
Shallow reading with Deep Learning: Predicting popularity of online content using only its title
Recurrent Neural Networks for Online Video Popularity Prediction
The Complexity Landscape of Fixed-Parameter Directed Steiner Network Problems
Text Recognition in Scene Image and Video Frame using Color Channel Selection
On the Computation of Paracoherent Answer Sets
Scalable and robust set similarity join
Fluid and Diffusion Limits for Bike Sharing Systems
A central limit like theorem for Fourier sums
Evaluation of Hashing Methods Performance on Binary Feature Descriptors
HMM-based Writer Identification in Music Score Documents without Staff-Line Removal
Date-Field Retrieval in Scene Image and Video Frames using Text Enhancement and Shape Coding
An Alternative Estimation of a Time-Varying Parameter Model
Neuron Pruning for Compressing Deep Networks using Maxout Architectures
An Error-Oriented Approach to Word Embedding Pre-Training
A unified theory for exact stochastic modelling of univariate and multivariate processes with continuous, mixed type, or discrete marginal distributions and any correlation structure
Hierarchical Partial Planarity
On Quantile Risk Measures and Their Domain
Predicting disease-related genes by path-based similarity and community structure in protein-protein interaction network
Markov cubature rules for polynomial processes
A Statistical Perspective on Inverse and Inverse Regression Problems
Parameter identification via optimal control for a Cahn–Hilliard-chemotaxis system with a variable mobility
Should Evolution Necessarily be Egolution?
Load Thresholds for Cuckoo Hashing with Overlapping Blocks
$K_{1,3}$-covering red and blue points in the plane
Computation of Optimal Transport on Discrete Metric Measure Spaces
Retinal Microaneurysms Detection using Local Convergence Index Features
Fast Nearest Neighbor Preserving Embeddings
Central limit theorem for functionals of Gibbs particle processes
Why We Need New Evaluation Metrics for NLG
Learning Aerial Image Segmentation from Online Maps
Second-Order Analysis and Numerical Approximation for Bang-Bang Bilinear Control Problems
Towards learning domain-independent planning heuristics
Choosing Between Methods for Combining p-values
Bijective enumerations of $Γ$-free 0-1 matrices
Detecting random walks on graphs with heterogeneous sensors
A Verified Compiler for Probability Density Functions
Testing for breaks in variance structures with smooth changes
A strong invariance principle for the elephant random walk
What Looks Good with my Sofa: Multimodal Search Engine for Interior Design
Evolution Reinforces Cooperation with the Emergence of Self-Recognition Mechanisms: an empirical study of the Moran process for the iterated Prisoner’s dilemma
Pillar Networks for action recognition
Combinatorics for general kinetically constrained spin models
A study on text-score disagreement in online reviews
The Complexity of Concurrent Rational Synthesis
Autocompletion interfaces make crowd workers slower, but their use promotes response diversity
Securing Visible Light Communication Systems by Beamforming in the Presence of Randomly Distributed Eavesdroppers
Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation
Eigenvalues of random matrices with isotropic Gaussian noise and the design of Diffusion Tensor Imaging experiments
On some three color Ramsey numbers for paths, cycles, stripes and stars
On Lin’s condition for products of random variables
Reconstruction of Word Embeddings from Sub-Word Parameters
A Framework for Easing the Development of Applications Embedding Answer Set Programming
Mimicking Word Embeddings using Subword RNNs
Dictionary Learning and Sparse Coding-based Denoising for High-Resolution Task Functional Connectivity MRI Analysis
Global Optimization based on Growth Transform Dynamical Model
Steinhaus Triangles Generated by Vectors of the Canonical Bases
Hybrid marked point processes: characterisation, existence and uniqueness
Split and Rephrase
New 5-designs—revisited
A Multi-Scale CNN and Curriculum Learning Strategy for Mammogram Classification
Time Evolution of Many-Body Localized Systems with the Flow Equation Approach
Persistent-homology-based gait recognition
The plasticity of non-overlapping convex sets in R^{2}
Ideological Sublations: Resolution of Dialectic in Population-based Optimization