Low-Rank Kernel Subspace Clustering

Most state-of-the-art subspace clustering methods only work with linear (or affine) subspaces. In this paper, we present a kernel subspace clustering method that can handle non-linear models. While an arbitrary kernel can non-linearly map data into high-dimensional Hilbert feature space, the data in the resulting feature space are very unlikely to have the desired subspace structures. By contrast, we propose to learn a low-rank kernel mapping, with which the mapped data in feature space are not only low-rank but also self-expressive, such that the low-dimensional subspace structures are present and manifested in the high-dimensional feature space. We have evaluated the proposed method extensively on both motion segmentation and image clustering benchmarks, and obtained superior results, outperforming the kernel subspace clustering method that uses standard kernels~\cite{patel2014kernel} and other state-of-the-art linear subspace clustering methods.

Online Multi-Armed Bandit

We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In this online context, we study Bernoulli bandits (bandits with payout Ber(p_i) for some underlying mean p_i) with underlying means drawn i.i.d. from various distributions, including the uniform distribution, and in general, all distributions that have a CDF satisfying certain differentiability conditions near zero. In all cases, we suggest several strategies and investigate their expected performance. Furthermore, we bound the performance of any optimal strategy and show that the strategies we have suggested are indeed optimal up to a constant factor. We also investigate the case where the distribution from which the underlying means are drawn is not known ahead of time. We again, are able to suggest algorithms that are optimal up to a constant factor for this case, given certain mild conditions on the universe of distributions.

graph2vec: Learning Distributed Representations of Graphs

Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and clustering require representing entire graphs as fixed length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain as the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrary sized graphs. graph2vec’s embeddings are learnt in an unsupervised manner and are task agnostic. Hence, they could be used for any downstream task such as graph classification, clustering and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracies over substructure representation learning approaches and are competitive with state-of-the-art graph kernels.

Deep Learning to Attend to Risk in ICU

Modeling physiological time-series in ICU is of high clinical importance. However, data collected within ICU are irregular in time and often contain missing measurements. Since absence of a measure would signify its lack of importance, the missingness is indeed informative and might reflect the decision making by the clinician. Here we propose a deep learning architecture that can effectively handle these challenges for predicting ICU mortality outcomes. The model is based on Long Short-Term Memory, and has layered attention mechanisms. At the sensing layer, the model decides whether to observe and incorporate parts of the current measurements. At the reasoning layer, evidences across time steps are weighted and combined. The model is evaluated on the PhysioNet 2012 dataset showing competitive and interpretable results.

Iris: A Conversational Agent for Complex Tasks

Today’s conversational agents are restricted to simple standalone commands. In this paper, we present Iris, an agent that draws on human conversational strategies to combine commands, allowing it to perform more complex tasks that it has not been explicitly designed to support: for example, composing one command to ‘plot a histogram’ with another to first ‘log-transform the data’. To enable this complexity, we introduce a domain specific language that transforms commands into automata that Iris can compose, sequence, and execute dynamically by interacting with a user through natural language, as well as a conversational type system that manages what kinds of commands can be combined. We have designed Iris to help users with data science tasks, a domain that requires support for command combination. In evaluation, we find that data scientists complete a predictive modeling task significantly faster (2.6 times speedup) with Iris than a modern non-conversational programming environment. Iris supports the same kinds of commands as today’s agents, but empowers users to weave together these commands to accomplish complex goals.

Optimization by gradient boosting

Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of simple predictors—typically decision trees—by solving an infinite-dimensional convex optimization problem. We provide in the present paper a thorough analysis of two widespread versions of gradient boosting, and introduce a general framework for studying these algorithms from the point of view of functional optimization. We prove their convergence as the number of iterations tends to infinity and highlight the importance of having a strongly convex risk functional to minimize. We also present a reasonable statistical context ensuring consistency properties of the boosting predictors as the sample size grows. In our approach, the optimization procedures are run forever (that is, without resorting to an early stopping strategy), and statistical regularization is basically achieved via an appropriate L^2 penalization of the loss and strong convexity arguments.

Neural Reranking for Named Entity Recognition

We propose a neural reranking system for named entity recognition (NER). The basic idea is to leverage recurrent neural network models to learn sentence-level patterns that involve named entity mentions. In particular, given an output sentence produced by a baseline NER model, we replace all entity mentions, such as \textit{Barack Obama}, into their entity types, such as \textit{PER}. The resulting sentence patterns contain direct output information, yet is less sparse without specific named entities. For example, ‘PER was born in LOC’ can be such a pattern. LSTM and CNN structures are utilised for learning deep representations of such sentences for reranking. Results show that our system can significantly improve the NER accuracies over two different baselines, giving the best reported results on a standard benchmark.

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven’t yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human ‘in the loop’ and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human’s intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent’s learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.

Translational Recommender Networks

Representing relationships as translations in vector space lives at the heart of many neural embedding models such as word embeddings and knowledge graph embeddings. In this work, we study the connections of this translational principle with collaborative filtering algorithms. We propose Translational Recommender Networks (\textsc{TransRec}), a new attentive neural architecture that utilizes the translational principle to model the relationships between user and item pairs. Our model employs a neural attention mechanism over a \emph{Latent Relational Attentive Memory} (LRAM) module to learn the latent relations between user-item pairs that best explains the interaction. By exploiting adaptive user-item specific translations in vector space, our model also alleviates the geometric inflexibility problem of other metric learning algorithms while enabling greater modeling capability and fine-grained fitting of users and items in vector space. The proposed architecture not only demonstrates the state-of-the-art performance across multiple recommendation benchmarks but also boasts of improved interpretability. Qualitative studies over the LRAM module shows evidence that our proposed model is able to infer and encode explicit sentiment, temporal and attribute information despite being only trained on implicit feedback. As such, this ascertains the ability of \textsc{TransRec} to uncover hidden relational structure within implicit datasets.

On Lasso refitting strategies

A well-know drawback of l1-penalized estimators is the systematic shrinkage of the large coefficients towards zero. A simple remedy is to treat Lasso as a model-selection procedure and to perform a second refitting step on the selected support. In this work we formalize the notion of refitting and provide oracle bounds for arbitrary refitting procedures of the Lasso solution. One of the most widely used refitting techniques which is based on least-squares may bring a problem of interpretability, since the signs of the refitted estimator might be flipped with respect to the original estimator. This problem arises from the fact that the least-square refitting considers only the support of the Lasso solution, avoiding any information about signs or amplitudes. To this end we define a sign-consistent refitting as an arbitrary refitting procedure, preserving the signs of the first step Lasso solution and provide Oracle inequalities for such estimators. Finally, we consider special refitting strategies: Bregman Lasso and Boosted Lasso. Bregman Lasso has a fruitful property to converge to the sign-consistent least-squares refitting (least-squares with sign constraints), which provides with greater interpretability. We additionally study the Bregman Lasso refitting in the case of orthogonal design, providing with simple intuition behind the proposed method. Boosted Lasso, in contrast, considers information about magnitudes of the first Lasso step and allows to develop better oracle rates for prediction. Finally, we conduct an extensive numerical study to show advantages of one approach over others in different synthetic and semi-real scenarios.

Learning to select data for transfer learning with Bayesian Optimization

Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for respective tasks. Inspired by work on curriculum learning, we propose to \emph{learn} data selection measures using Bayesian Optimization and evaluate them across models, domains and tasks. Our learned measures outperform existing domain similarity measures significantly on three tasks: sentiment analysis, part-of-speech tagging, and parsing. We show the importance of complementing similarity with diversity, and that learned measures are — to some degree — transferable across models, domains, and even tasks.

MAG: A Multilingual, Knowledge-based Agnostic and Deterministic Entity Linking Approach

Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-based agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages.

Piecewise Deterministic Markov Chain Monte Carlo

A novel class of non-reversible Markov chain Monte Carlo schemes relying on continuous-time piecewise deterministic Markov Processes has recently emerged. In these algorithms, the state of the Markov process evolves according to a deterministic dynamics which is modified using a Markov transition kernel at random event times. These methods enjoy remarkable features including the ability to update only a subset of the state components while other components implicitly keep evolving. However, several important problems remain open. The deterministic dynamics used so far do not exploit the structure of the target. Moreover, exact simulation of the event times is feasible for an important yet restricted class of problems and, even when it is, it is application specific. This limits the applicability of these methods and prevents the development of a generic software implementation. In this paper, we introduce novel MCMC methods addressing these limitations by bringing together piecewise deterministic Markov processes, Hamiltonian dynamics and slice sampling. We propose novel continuous-time algorithms relying on exact Hamiltonian flows and novel discrete-time algorithms which can exploit complex dynamics such as approximate Hamiltonian dynamics arising from symplectic integrators. We demonstrate the performance of these schemes on a variety of applications.

Expected exponential loss for gaze-based video and volume ground truth annotation
Modeling the SBC Tanzania Production-Distribution Logistics Network
Some useful theorems for asymptotic formulas and their applications to skew plane partitions and cylindric partitions
Polylogarithmic Approximation Algorithms for Weighted-$\mathcal{F}$-Deletion Problems
Packing chromatic number versus chromatic and clique number
Pancreas Segmentation in MRI using Graph-Based Decision Fusion on Convolutional Neural Networks
End-to-End Information Extraction without Token-Level Supervision
Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features
Feedback Vertex Set Inspired Kernel for Chordal Vertex Deletion
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
The Multivariate Hawkes Process in High Dimensions: Beyond Mutual Excitation
Projected Power Iteration for Network Alignment
Pathological OCT Retinal Layer Segmentation using Branch Residual U-shape Networks
Automatized Generation of Alphabets of Symbols
Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks
Comparative Performance Analysis of Neural Networks Architectures on H2O Platform for Various Activation Functions
Automatic Backward Differentiation for American Monte-Carlo Algorithms (Conditional Expectation)
Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data
A characterization of Fibonacci numbers
Probing many-body localization in a disordered quantum magnet
Almost sure growth of supercritical multi-type continuous state branching process
Random initial conditions for semi-linear PDEs
Improving Adherence to Heart Failure Management Guidelines via Abductive Reasoning
An Ensemble Boosting Model for Predicting Transfer to the Pediatric Intensive Care Unit
Attitude Control of a 2U Cubesat by Magnetic and Air Drag Torques
Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach
Bad News for Chordal Partitions
Visual Question Answering with Memory-Augmented Networks
On basic graphs of symmetric graphs of valency five
A weak version of path-dependent functional Itô calculus
Stochastic Near-Optimal Controls for Path-Dependent Systems
Optimal Equilibrium for Time-Inconsistent Stopping Problems — the Discrete-Time Case
Practical Locally Private Heavy Hitters
Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning
MoCoGAN: Decomposing Motion and Content for Video Generation
Jackknife Empirical Likelihood-based inference for S-Gini indices
In-Order Transition-based Constituent Parsing
Coalition formation for Multi-agent Pursuit based on Neural Network and AGRMF Model
Covariant Information Theory and Emergent Gravity
‘Maximizing rigidity’ revisited: a convex programming approach for generic 3D shape reconstruction from multiple perspective views
A Real-time Image Reconstruction System for Particle Treatment Planning Using Proton Computed Tomography (pCT)
Fully polynomial FPT algorithms for some classes of bounded clique-width graphs
Energy Conservation and Decoupling in Optical Fibers with Brick-Walls Attenuation Profile
Convergence to consensus of the general finite-dimensional Cucker-Smale model with time-varying delays
Strong Local Nondeterminism of Spherical Fractional Brownian Motion
Residual Features and Unified Prediction Network for Single Stage Detection
Discrete Extremes
Estimation and testing of survival functions via generalized fiducial inference with censored data
Line-Recovery by Programmable Particles
A simple method for the existence of a density for stochastic evolutions with rough coefficients
Dynamics of quantum information in many-body localized systems
Designing Effective Inter-Pixel Information Flow for Natural Image Matting
Chordal decomposition in operator-splitting methods for sparse semidefinite programs
Speeding up the K{ö}hler’s method of contrast thresholding
Optimal Storage under Unsynchrononized Mobile Byzantine Faults
Asymptotic degree distribution in preferential attachment graph models with multiple type edges
Geometric Rescaling Algorithms for Submodular Function Minimization
Every finite non-solvable group admits an Oriented Regular Representation
Well-posedness for SDEs driven by different type of noises
Optimal conflict-free colouring with respect to a subset of intervals
From Quenched Disorder to Continuous Time Random Walk
Lower Bounds for Searching Robots, some Faulty
Eigenvalues and Wiener index of the Zero Divisor graph $Γ[\mathbb {Z}_n]$
On explicit order 1.5 approximations with varying coefficients: the case of super-linear diffusion coefficients
Truly Sub-cubic Algorithms for Language Edit Distance and RNA Folding via Fast Bounded-Difference Min-Plus Product
On consistency of optimal pricing algorithms in repeated posted-price auctions with strategic buyer
Classification of finite groups that admit an oriented regular representation
Markov loops topology
Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation
The Power of Constraint Grammars Revisited
To Normalize, or Not to Normalize: The Impact of Normalization on Part-of-Speech Tagging
LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task
Approximate Directed Minimum Degree Spanning Tree in Polynomial Time
Tight Analysis of Randomized Greedy MIS
Differentially Private Testing of Identity and Closeness of Discrete Distributions
On the Parallel Undecided-State Dynamics with Two Colors
Fully Automatic and Real-Time Catheter Segmentation in X-Ray Fluoroscopy
Queues Driven by Hawkes Processes
When You Must Forget: beyond strong persistence when forgetting in answer set programming
Preliminary Exploration of Formula Embedding for Mathematical Information Retrieval: can mathematical formulae be embedded like a natural language?
Oscillations in networks of networks stem from adaptive nodes with memory
On the analysis of signals in a permutation Lempel-Ziv complexity – permutation Shannon entropy plane
Cosmological model discrimination with Deep Learning
A moment-generating formula for Erdős-Rényi component sizes
A Unified Framework for Capacitated Covering Problems in Metric and Geometric Spaces
Weak Modular Product of Bipartite Graphs, Bicliques and Isomorphism
Construction of exact constants of motion and effective models for many-body localized systems
Online codes for analog signals
A Discrete Bouncy Particle Sampler
Moment bounds for some fractional stochastic heat equations on the ball
Auxiliary Objectives for Neural Error Detection Models
Detecting Off-topic Responses to Visual Prompts
Discrete-type approximations for non-Markovian optimal stopping problems: Part I
Artificial Error Generation with Machine Translation and Syntactic Patterns
Coloring Down: $3/2$-approximation for special cases of the weighted tree augmentation problem
Aesthetic-Driven Image Enhancement by Adversarial Learning
Spanning Euler families in hypergraphs with certain vertex cuts
Random eigenfunctions on flat tori: universality for the number of intersections
Exploring text datasets by visualizing relevant words
A Simple Language Model based on PMI Matrix Approximations
Computation Rate Maximization for Wireless Powered Mobile Edge Computing
Quasi-device-independent witnessing of genuine multilevel quantum coherence
Cyclic pseudo-{L}oupekine snarks
Reverse Curriculum Generation for Reinforcement Learning