General Latent Feature Models for Heterogeneous Datasets

Latent feature modeling allows us to capture the latent structure responsible for generating the observed properties of a set of objects. It is often used to make predictions, either of new values of interest or of missing information in the original data, as well as to perform exploratory data analysis. However, although there is an extensive literature on latent feature models for homogeneous datasets, where all the attributes that describe each object are of the same (continuous or discrete) nature, there is a lack of work on latent feature modeling for heterogeneous datasets. In this paper, we introduce a general Bayesian nonparametric latent feature model (GLFM) suitable for heterogeneous datasets, where the attributes describing each object can be discrete, continuous or mixed variables. The proposed model has several important properties. First, it accounts for heterogeneous data while keeping the properties of conjugate models, which allows us to infer the model in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure in the data. Third, the latent features in the model are binary-valued, which eases the interpretability of the obtained latent features in exploratory data analysis. We show the flexibility of the proposed model by solving both prediction and data analysis tasks on several real-world datasets. Moreover, a software package for the GLFM is publicly available for other researchers to use and improve.
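
The abstract does not name the prior over the binary latent features. A standard Bayesian nonparametric choice for binary feature matrices is the Indian buffet process (IBP); the sketch below assumes that prior and uses a hypothetical `sample_ibp` helper to draw a feature matrix whose number of columns is not fixed in advance, which is what lets the number of features be inferred rather than set. It is an illustration of the prior only, not the GLFM inference code.

```python
import math
import random
from typing import List


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def sample_ibp(num_objects: int, alpha: float, seed: int = 0) -> List[List[int]]:
    """Draw a binary feature matrix Z from an Indian buffet process prior (assumed).

    Object i takes each existing feature k with probability m_k / i, where m_k
    counts earlier objects using k, then adds Poisson(alpha / i) new features.
    The number of columns is therefore random and grows with the data.
    """
    rng = random.Random(seed)
    feature_counts: List[int] = []  # m_k for every feature created so far
    rows: List[List[int]] = []
    for i in range(1, num_objects + 1):
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        for k, z in enumerate(row):
            feature_counts[k] += z
        new_features = sample_poisson(alpha / i, rng)
        row.extend([1] * new_features)
        feature_counts.extend([1] * new_features)
        rows.append(row)
    width = len(feature_counts)
    # Pad earlier rows with zeros for features introduced by later objects.
    return [row + [0] * (width - len(row)) for row in rows]
```

Under this prior the expected number of features for N objects is alpha times the N-th harmonic number, so the model size adapts to the dataset instead of being fixed by hand.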

Recurrent Neural Networks with Top-k Gains for Session-based Recommendations

RNNs have been shown to be excellent models for sequential data, and in particular for session-based user behavior. The use of RNNs provides impressive performance benefits over classical methods in session-based recommendations. In this work we introduce a novel ranking loss function tailored to RNNs in recommendation settings. The better performance of this loss over alternatives, together with further tricks and improvements described in this work, allows us to achieve an overall improvement of up to 35% in terms of MRR and Recall@20 over previous session-based RNN solutions and up to 51% over classical collaborative filtering approaches. Unlike data augmentation-based improvements, our method does not increase training times significantly.
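
The abstract does not spell out the loss. One plausible form of a top-k-aware ranking loss, assumed here for illustration, weights each pairwise BPR term by a softmax over the scores of the sampled negatives, so that the negatives competing for the top slots dominate. The function name `softmax_weighted_bpr` and the `reg` penalty are hypothetical.

```python
import math
from typing import List, Sequence


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(scores: Sequence[float]) -> List[float]:
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_weighted_bpr(pos_score: float, neg_scores: Sequence[float],
                         reg: float = 0.0) -> float:
    """Ranking loss for one target item against sampled negative items.

    Each BPR term sigmoid(pos - neg) is weighted by a softmax over the negative
    scores, so high-scoring negatives (the ones threatening the top-k) count
    most. `reg` adds a softmax-weighted penalty on the negative scores.
    """
    weights = softmax(neg_scores)
    prob = sum(w * stable_sigmoid(pos_score - s) for w, s in zip(weights, neg_scores))
    penalty = reg * sum(w * s * s for w, s in zip(weights, neg_scores))
    return -math.log(prob) + penalty
```

With a single negative the softmax weight is 1 and the loss reduces to ordinary BPR, so the weighting only changes behavior when many negatives are scored per target.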

Dionysius: A Framework for Modeling Hierarchical User Interactions in Recommender Systems

We address the following problem: how do we incorporate user-item interaction signals as part of the relevance model in a large-scale personalized recommendation system such that (1) the ability to interpret the model and explain recommendations is retained, and (2) the existing infrastructure designed for the (user profile) content-based model can be leveraged? We propose Dionysius, a framework and system based on a hierarchical graphical model for incorporating user interactions into recommender systems with minimal change to the underlying infrastructure. We learn a hidden fields vector for each user by considering the hierarchy of interaction signals, and replace the user profile-based vector with this learned vector, thereby not expanding the feature space at all. Thus, our framework allows the use of existing recommendation infrastructure that supports content-based features. We implemented this system and have deployed it as part of the recommendation platform at LinkedIn for more than one year. We validated the efficacy of our approach through extensive offline experiments with different model choices, as well as online A/B testing experiments. Our deployment of this system as part of the job recommendation engine resulted in significant improvement in the quality of retrieved results, thereby generating improved user experience and positive impact for millions of users.

Adversarial Feature Matching for Text Generation

Generative Adversarial Networks (GANs) have achieved great success in generating realistic (real-valued) synthetic data. However, convergence issues and difficulties in dealing with discrete data hinder the applicability of GANs to text. We propose a framework for generating realistic text via adversarial training. We employ a long short-term memory (LSTM) network as the generator and a convolutional network as the discriminator. Instead of using the standard GAN objective, we propose matching the high-dimensional latent feature distributions of real and synthetic sentences via a kernelized discrepancy metric. This eases adversarial training by alleviating the mode-collapse problem. Our experiments show superior performance in quantitative evaluation and demonstrate that our model can generate realistic-looking sentences.
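
A common kernelized discrepancy between two sets of feature vectors is the maximum mean discrepancy (MMD). The sketch below assumes a Gaussian-kernel MMD and computes its biased squared estimate; in the framework described above the vectors would be latent features of real and generated sentences extracted by the discriminator, whereas here they are plain lists, and the `bandwidth` value is an arbitrary assumption.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: List[Vector], ys: List[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(real: List[Vector], fake: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two samples of feature vectors."""
    return (mean_kernel(real, real, bandwidth)
            + mean_kernel(fake, fake, bandwidth)
            - 2.0 * mean_kernel(real, fake, bandwidth))
```

A generator trained to minimize this quantity is pushed to match the whole feature distribution rather than a single mode that fools the discriminator, which is the intuition behind the claimed reduction in mode collapse.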

A Direction Search and Spectral Clustering Based Approach to Subspace Clustering

This paper presents a new spectral-clustering-based approach to the subspace clustering problem, in which the data lies in the union of an unknown number of unknown linear subspaces. Underpinning the proposed method is a convex program for optimal direction search which, for each data point d, finds an optimal direction in the span of the data that has minimum projection on the other data points and non-vanishing projection on d. The obtained directions are subsequently leveraged to identify a neighborhood set for each data point. An Alternating Direction Method of Multipliers (ADMM) framework is provided to efficiently solve for the optimal directions. The proposed method is shown to often outperform existing subspace clustering methods, particularly in challenging scenarios involving high levels of noise and close subspaces, and yields state-of-the-art results for the problem of face clustering using subspace segmentation.
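
Written out, one convex program that matches this description, assuming an \ell_1 penalty on the projections (the abstract does not name the norm) and a data matrix D = [d_1, ..., d_N] whose columns are the data points, is:

```latex
\mathbf{c}_i^{\star} \;=\; \arg\min_{\mathbf{c} \,\in\, \operatorname{span}(\mathbf{D})}
\;\bigl\|\mathbf{c}^{\top}\mathbf{D}\bigr\|_{1}
\quad \text{subject to} \quad \mathbf{c}^{\top}\mathbf{d}_i = 1
```

The \ell_1 objective drives most projections c^\top d_j to zero, as one would expect for points lying in other subspaces, while the equality constraint keeps the projection on d_i from vanishing; a natural neighborhood for d_i is then the set of points with the largest |c_i^{\star\top} d_j|.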

Multilevel Clustering via Wasserstein Means

We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large, hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with Wasserstein distance metrics. By exploiting the connection to the problem of finding Wasserstein barycenters, we propose several variants of this problem that admit fast optimization algorithms. Consistency properties are established for the estimates of both local and global clusters. Finally, experimental results on both synthetic and real data demonstrate the flexibility and scalability of the proposed approach.
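
The link between clustering measures and Wasserstein barycenters is easiest to see in one dimension, where the 2-Wasserstein distance between equal-size empirical measures matches sorted atoms, and the barycenter averages them position by position. The sketch below is a deliberately simplified assumption (1-D data, equal group sizes, global level only, hypothetical function names): it runs Lloyd-style k-means in which each "point" is a whole group's empirical measure and each center update is a barycenter. It is not the paper's multilevel algorithm.

```python
import math
from typing import List, Sequence, Tuple

Measure = Sequence[float]  # equally weighted atoms of a 1-D empirical measure


def w2_1d(a: Measure, b: Measure) -> float:
    """2-Wasserstein distance between equal-size 1-D empirical measures.

    In one dimension the optimal coupling matches the sorted atoms in order.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes measures with the same number of atoms")
    sa, sb = sorted(a), sorted(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa))


def w2_barycenter_1d(measures: Sequence[Measure]) -> List[float]:
    """Barycenter of equally weighted, equal-size 1-D measures (quantile averaging)."""
    sorted_ms = [sorted(m) for m in measures]
    n = len(sorted_ms[0])
    return [sum(m[i] for m in sorted_ms) / len(sorted_ms) for i in range(n)]


def wasserstein_kmeans_1d(groups: Sequence[Measure], k: int,
                          iters: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style k-means where each item is a group's empirical measure."""
    centers = [sorted(g) for g in groups[:k]]  # naive initialization: first k groups
    labels = [0] * len(groups)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: w2_1d(g, centers[c])) for g in groups]
        for c in range(k):
            members = [g for g, lab in zip(groups, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = w2_barycenter_1d(members)
    return labels, centers
```

For example, groups `[0, 1]`, `[0.1, 1.1]`, `[10, 11]` and `[10.2, 11.2]` with `k=2` end up with the first two groups in one cluster and the last two in the other.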

Clustering High Dimensional Dynamic Data Streams

We present data streaming algorithms for the k-median problem in high-dimensional dynamic geometric data streams, i.e., streams allowing both insertions and deletions of points from a discrete Euclidean space \{1, 2, \ldots, \Delta\}^d. Our algorithms use k \epsilon^{-2} \mathrm{poly}(d \log \Delta) space and time and maintain, with high probability, a small weighted set of points (a coreset) such that, for every set of k centers, the cost of the coreset (1+\epsilon)-approximates the cost of the streamed point set. We also provide algorithms that guarantee only positive weights in the coreset, at the cost of additional logarithmic factors in the space and time complexities. We can use this positively weighted coreset to compute a (1+\epsilon)-approximation for the k-median problem with any efficient offline k-median algorithm. All previous algorithms for computing a (1+\epsilon)-approximation for the k-median problem over dynamic data streams required space and time exponential in d. Our algorithms can be generalized to metric spaces of bounded doubling dimension.
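
To make the coreset guarantee concrete, the sketch below handles a dynamic stream of insertions and deletions naively, by collapsing identical points into weights. This is an exact (\epsilon = 0) weighted point set, but its size grows with the number of distinct points, which is precisely the cost the algorithms above avoid; it only illustrates what "the coreset cost approximates the full cost for a given set of centers" means. All function names are hypothetical, and `satisfies_coreset_bound` checks a single center set, not every set of k centers.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, ...]


def apply_dynamic_stream(updates: Iterable[Tuple[Point, int]]) -> Dict[Point, int]:
    """Process (point, +1) insertions and (point, -1) deletions exactly."""
    counts: Counter = Counter()
    for point, delta in updates:
        counts[point] += delta
    return {p: w for p, w in counts.items() if w > 0}


def weighted_kmedian_cost(weighted: Dict[Point, int], centers: List[Point]) -> float:
    """Sum over points of weight times Euclidean distance to the nearest center."""
    return sum(w * min(math.dist(p, c) for c in centers) for p, w in weighted.items())


def satisfies_coreset_bound(coreset: Dict[Point, int], points: List[Point],
                            centers: List[Point], eps: float) -> bool:
    """Check the (1 + eps) cost guarantee for ONE candidate set of centers."""
    full_cost = sum(min(math.dist(p, c) for c in centers) for p in points)
    return abs(weighted_kmedian_cost(coreset, centers) - full_cost) <= eps * full_cost
```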

A New Probabilistic Algorithm for Approximate Model Counting

Constrained counting is important in domains ranging from artificial intelligence to software analysis. A few approaches already exist for counting models over various types of constraints. Recently, hashing-based approaches have achieved both theoretical guarantees and scalability, but they still rely on solution enumeration. In this paper, we propose a new probabilistic polynomial-time approximate model counter, which is also a hashing-based universal framework but requires only satisfiability queries. A variant with a dynamic stopping criterion is also presented. Empirical evaluation over benchmarks of propositional logic formulas and SMT(BV) formulas shows that the approach is promising.
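
The abstract does not give the algorithm itself. The sketch below illustrates only the general hashing idea it builds on, assuming random XOR (parity) constraints as the hash family: keep adding random XORs while the formula stays satisfiable, estimate the count as 2^m for the number m of XORs that were kept, and take the median over repeated trials. The brute-force `is_satisfiable` stands in for a SAT or SMT solver and is the only query made. This is a textbook-style illustration, not the paper's counter, and it carries none of its guarantees.

```python
import random
import statistics
from itertools import product
from typing import Callable, List, Sequence, Tuple

Assignment = Tuple[bool, ...]
Formula = Callable[[Assignment], bool]
Xor = Tuple[List[int], bool]  # (indices of variables in the XOR, required parity)


def is_satisfiable(formula: Formula, n: int, xors: Sequence[Xor]) -> bool:
    """Brute-force satisfiability oracle standing in for a SAT/SMT solver."""
    for bits in product((False, True), repeat=n):
        if formula(bits) and all(
            (sum(bits[i] for i in idx) % 2 == 1) == parity for idx, parity in xors
        ):
            return True
    return False


def random_xor(n: int, rng: random.Random) -> Xor:
    """Each variable joins the XOR with probability 1/2; the parity bit is uniform."""
    return [i for i in range(n) if rng.random() < 0.5], rng.random() < 0.5


def estimate_model_count(formula: Formula, n: int, trials: int = 9, seed: int = 0) -> int:
    rng = random.Random(seed)
    if not is_satisfiable(formula, n, []):
        return 0
    estimates = []
    for _ in range(trials):
        xors: List[Xor] = []
        # Each random XOR cuts the solution set roughly in half; stop at the
        # first one that would make the formula unsatisfiable.
        while len(xors) < n:
            candidate = xors + [random_xor(n, rng)]
            if not is_satisfiable(formula, n, candidate):
                break
            xors = candidate
        estimates.append(2 ** len(xors))
    return statistics.median_low(estimates)
```

For `lambda bits: bits[0]` over 4 variables (8 models), the result is a power of two that is usually within a small factor of 8.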

Deep Control – a simple automatic gain control for memory efficient and high performance training of deep convolutional neural networks

Training a deep convolutional neural net typically starts with a random initialisation of all filters in all layers, which severely reduces the forward signal and the back-propagated error and leads to slow and sub-optimal training. Techniques that counter this focus on adaptively increasing either the signal or the gradients, but the model behaves very differently at the beginning of training than later, once stable pathways through the net have been established. To compound this problem, the effective minibatch size varies greatly between layers at different depths and between individual filters, because activation sparsity typically increases with depth. This reduces the effective learning rate, since gradients may superpose rather than add, and further compounds the covariate shift problem, as deeper neurons are less able to adapt to upstream shift. Proposed here is a method of automatic gain control of the signal, built into each convolutional neuron, that achieves performance equivalent or superior to batch normalisation and is compatible with both single-sample and minibatch gradient descent. The same model is used for both training and inference. The technique comprises a scaled per-sample map-mean subtraction from the raw convolutional filter output, followed by scaling of the difference.
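
The last sentence of the abstract describes the operation directly. A minimal sketch for one filter's output map of a single sample, assuming that the mean-subtraction scale and the output gain are per-neuron scalars (named `mean_scale` and `gain` here, both hypothetical) that would be learned in practice:

```python
from typing import List

FeatureMap = List[List[float]]  # one filter's 2-D output for a single sample


def gain_control(fmap: FeatureMap, mean_scale: float, gain: float) -> FeatureMap:
    """Subtract a scaled per-sample map mean, then scale the difference.

    Only this sample's own map is used, so the result does not depend on the
    minibatch and the same computation applies at training and inference time.
    """
    values = [v for row in fmap for v in row]
    map_mean = sum(values) / len(values)
    return [[gain * (v - mean_scale * map_mean) for v in row] for row in fmap]
```

Because no batch statistics are involved, the operation behaves identically for a batch of one, which is consistent with the claimed compatibility with single-sample gradient descent.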

A Supervised Approach to Extractive Summarisation of Scientific Papers

Automatic summarisation is a popular approach to reducing a document to its main arguments. Recent research in the area has focused on neural approaches to summarisation, which can be very data-hungry. However, few large datasets exist, and none for the traditionally popular domain of scientific publications, which opens up challenging research avenues centred on encoding large, complex documents. In this paper, we introduce a new dataset for summarisation of computer science publications by exploiting a large resource of author-provided summaries and show straightforward ways of extending it further. We develop models on the dataset making use of both neural sentence encoding and traditionally used summarisation features, and show that models which encode sentences as well as their local and global context perform best, significantly outperforming well-established baseline methods.

Temporally Efficient Deep Learning with Spikes

The vast majority of natural sensory data is temporally redundant. Video frames or audio samples that are sampled at nearby points in time tend to have similar values. Typically, deep learning algorithms take no advantage of this redundancy to reduce computation. This can be an obscene waste of energy. We present a variant of backpropagation for neural networks in which computation scales with the rate of change of the data, not the rate at which we process the data. We do this by having neurons communicate a combination of their state and their temporal change in state. Intriguingly, this simple communication rule gives rise to units that resemble biologically inspired leaky integrate-and-fire neurons, and to a weight-update rule that is equivalent to a form of Spike-Timing Dependent Plasticity (STDP), a synaptic learning rule observed in the brain. We demonstrate that on MNIST and a temporal variant of MNIST, our algorithm performs about as well as a Multilayer Perceptron trained with backpropagation, despite only communicating discrete values between layers.
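
One way to make communication scale with the rate of change is sigma-delta style quantization: a unit sends integer spike counts that encode how much its value has changed, carrying the rounding residual forward, and the receiver integrates them. The sketch below illustrates only that coding idea, under that assumption; it does not implement the combined state-and-change rule or the learning rule described above, and `step` is a hypothetical quantization step.

```python
from typing import List, Sequence


def sigma_delta_encode(signal: Sequence[float], step: float = 1.0) -> List[int]:
    """Quantize the change in a signal into integer spike counts.

    The rounding residual is carried forward, so the decoded value never
    drifts more than step / 2 away from the true value.
    """
    spikes: List[int] = []
    prev, residual = 0.0, 0.0
    for x in signal:
        residual += x - prev  # accumulate the change since the last step
        prev = x
        q = round(residual / step)  # spikes to emit now (often 0 for slow signals)
        residual -= q * step
        spikes.append(q)
    return spikes


def sigma_delta_decode(spikes: Sequence[int], step: float = 1.0) -> List[float]:
    """Receiver side: integrate the incoming spike counts."""
    total, out = 0.0, []
    for q in spikes:
        total += q * step
        out.append(total)
    return out
```

For a constant input only the first step carries spikes (`[3.0] * 5` encodes to `[3, 0, 0, 0, 0]`), so downstream work is proportional to how often the value changes rather than to the number of time steps.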

Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
Verb Physics: Relative Physical Knowledge of Actions and Objects
Accelerated Consensus via Min-Sum Splitting
Shorter signed circuit covers of graphs
Encoding of phonology in a recurrent neural model of grounded speech
Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings
Attention-based Vocabulary Selection for NMT Decoding
SmoothGrad: removing noise by adding noise
Fast Maximum-Likelihood Decoder for 4×4 Quasi-Orthogonal Space-Time Block Code
The interplay between long- and short-range temporal correlations shapes cortex dynamics across vigilance states
Approximate Structure Construction Using Large Statistical Swarms
Closed-form mathematical expressions for the exponentiated Cauchy-Rayleigh distribution
Criteria Sliders: Learning Continuous Database Criteria via Interactive Ranking
Wiretap Channels: Nonasymptotic Fundamental Limits
Can We See Photosynthesis? Magnifying the Tiny Color Changes of Plant Green Leaves Using Eulerian Video Magnification
Six Challenges for Neural Machine Translation
Precise large deviation estimates for branching process in random environment
Contrast Enhancement Estimation for Digital Image Forensics
Iterated random functions and regularly varying tails
A Survey on Monochromatic Connections of Graphs
MNL-Bandit: A Dynamic Learning Approach to Assortment Selection
Symmetric stochastic integrals with respect to a class of self-similar Gaussian processes
A Well-Tempered Landscape for Non-convex Robust Subspace Recovery
The Hopf algebra of skew shapes, torsion sheaves on A^n/F_1, and ideals in Hall algebras of monoid representations
Signed Sequential Rank CUSUMs
A new design principle of robust onion-like networks self-organized in growth
SEP-Nets: Small and Effective Pattern Networks
Gaussian martingale inequality applies to random functions and maxima of empirical processes
Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
Item Difficulty-Based Label Aggregation Models for Crowdsourcing
Ergodic control of multiclass multi-pool networks in the Halfin-Whitt regime: asymptotic optimality results
Exact Learning from an Honest Teacher That Answers Membership Queries
Efficient Bayesian inference for multivariate factor stochastic volatility models with leverage
Fuzzy Recommendations in Marketing Campaigns
Recommendations for Marketing Campaigns in Telecommunication Business based on the footprint analysis
Long-Term Video Interpolation with Bidirectional Predictive Network
Optimization over Degree Sequences
Modelling prosodic structure using Artificial Neural Networks
Efficient Bayesian estimation for flexible panel models for multivariate outcomes: Impact of life events on mental health and excessive alcohol consumption
A Note on the Relationship Between Conditional and Unconditional Independence, and its Extensions for Markov Kernels
Reverse juggling processes
Accelerated Dual Learning by Homotopic Initialization
RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval
Asynchronous Graph Pattern Matching on Multiprocessor Systems
A note on critical Hawkes processes
Efficient Rare-Event Simulation for Multiple Jump Events in Regularly Varying Random Walks and Compound Poisson Processes
Optimal input design for system identification using spectral decomposition
Minimum supports of eigenfunctions of Johnson graphs
Distributed Detection of Cycles
Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks
Distributed Subgraph Detection
Characterization and Minimal Embeddings of Connected Neural Codes
Recurrent Inference Machines for Solving Inverse Problems
Isoperimetric Inequalities for Non-Local Dirichlet Forms
Recurrent Latent Variable Networks for Session-Based Recommendation
Mix & Match Hamiltonian Monte Carlo
On Natural Language Generation of Formal Argumentation
Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty
Indirect Image Registration with Large Diffeomorphic Deformations
Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation
Relaxation of monotone coupling conditions: Poisson approximation and beyond
Deleting vertices to graphs of bounded genus
Interaction-Based Distributed Learning in Cyber-Physical and Social Networks
MIMO First and Second Order Discrete Sliding Mode Controls of Uncertain Linear Systems under Implementation Imprecisions
Algebraic localization in disordered one-dimensional systems with long-range hopping
Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
Multifractality in the generalized Aubry-Andre quasiperiodic localization model with power-law hoppings or power-law Fourier coefficients
On the size of quotient of two subsets of positive integers
Technical Report: Implementation and Validation of a Smart Health Application
Zero-Shot Relation Extraction via Reading Comprehension
Live Service Migration in Mobile Edge Clouds
From MEGATON to RASCAL: Surfing the Parameter Space of Evolutionary Algorithms
Joint Max Margin and Semantic Features for Continuous Event Detection in Complex Scenes
Video Imagination from a Single Image with Transformation Generation
Online Learning for Structured Loss Spaces
On Gallai’s conjecture for series-parallel graphs and planar 3-trees
On Martingale Problems and Feller Processes
An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
Prediction of Muscle Activations for Reaching Movements using Deep Neural Networks
Layer Communities in Multiplex Networks
Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks
Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier
Gradient descent GAN optimization is locally stable
Lost Relatives of the Gumbel Trick
Chip-firing on trees of loops
Triangles capturing many lattice points
Application of Market Models to Network Equilibrium Problems
The Power of Choice in Priority Scheduling
Sequential rerandomization