Unsupervised Learning Layers for Video Analysis

This paper presents two unsupervised learning layers (UL layers) for label-free video analysis: one for fully connected layers, and the other for convolutional ones. The proposed UL layers can play two roles. They can serve as the cost-function layer, providing a global training signal. They can also be added to any regular neural network layer to provide local training signals, which are combined with the training signals backpropagated from upper layers to extract both slow- and fast-changing features at layers of different depths. Therefore, the UL layers can be used in either purely unsupervised or semi-supervised settings. Both a closed-form solution and an online learning algorithm are provided for the two UL layers. Experiments with unlabeled synthetic and real-world videos demonstrate that neural networks equipped with UL layers and trained with the proposed online learning algorithm can extract shape and motion information from video sequences of moving objects. The experiments also demonstrate potential applications of the UL layers and the online learning algorithm to head orientation estimation and moving object localization.

Proximity Variational Inference

Variational inference is a powerful approach for approximate posterior inference. However, it is sensitive to initialization and can be subject to poor local optima. In this paper, we develop proximity variational inference (PVI). PVI is a new method for optimizing the variational objective that constrains subsequent iterates of the variational parameters to robustify the optimization path. Consequently, PVI is less sensitive to initialization and optimization quirks and finds better local optima. We demonstrate our method on three proximity statistics. We study PVI on a Bernoulli factor model and sigmoid belief network with both real and synthetic data and compare to deterministic annealing (Katahira et al., 2008). We highlight the flexibility of PVI by designing a proximity statistic for Bayesian deep learning models such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al., 2014). Empirically, we show that PVI consistently finds better local optima and gives better predictive performance.

Approximation and Convergence Properties of Generative Adversarial Learning

Generative adversarial networks (GANs) approximate a target data distribution by jointly optimizing an objective function through a ‘two-player game’ between a generator and a discriminator. Despite their empirical success, however, two very basic questions on how well they can approximate the target distribution remain unanswered. First, it is not known how restricting the discriminator family affects the approximation quality. Second, while a number of different objective functions have been proposed, we do not understand when convergence to the global minima of the objective function leads to convergence to the target distribution under various notions of distributional convergence. In this paper, we address these questions in a broad and unified setting by defining a notion of adversarial divergences that includes a number of recently proposed objective functions. We show that if the objective function is an adversarial divergence with some additional conditions, then using a restricted discriminator family has a moment-matching effect. Additionally, we show that for objective functions that are strict adversarial divergences, convergence in the objective function implies weak convergence, thus generalizing previous results.

Neural Decomposition of Time-Series Data for Effective Generalization

We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components. We show how careful weight initialization can be combined with regularization to form a simple model that generalizes well. Our method generalizes effectively on the Mackey-Glass series, a dataset of unemployment rates as reported by the U.S. Department of Labor Statistics, a time-series of monthly international airline passengers, the monthly ozone concentration in downtown Los Angeles, and an unevenly sampled time-series of oxygen isotope measurements from a cave in north India. We find that ND outperforms popular time-series forecasting techniques including LSTM, echo state networks, ARIMA, SARIMA, SVR with a radial basis function, and Gashler and Ashmore’s model.
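
The functional form described in this abstract, a sum of sinusoids plus a nonperiodic component, can be illustrated with a minimal sketch. It is not the authors' network or training procedure: the function name `nd_forward`, the restriction of the nonperiodic part to a single linear trend, and all parameter values below are assumptions chosen for illustration.

```python
import numpy as np

def nd_forward(t, amplitudes, frequencies, phases, slope, intercept):
    """Evaluate a sum of sinusoids plus a linear trend at times t.

    Hypothetical helper: mirrors the decomposition described in the
    abstract, with the nonperiodic part reduced to one linear unit.
    """
    t = np.asarray(t, dtype=float)
    # One row per sinusoidal unit, one column per time point.
    periodic = np.sum(
        amplitudes[:, None]
        * np.sin(frequencies[:, None] * t[None, :] + phases[:, None]),
        axis=0,
    )
    trend = slope * t + intercept
    return periodic + trend

# Assumed parameters: two sinusoids with periods 1 and 1/3, plus a slow trend.
amplitudes = np.array([1.0, 0.5])
frequencies = 2.0 * np.pi * np.array([1.0, 3.0])
phases = np.array([0.0, np.pi / 2.0])
slope, intercept = 0.2, 1.0

t_train = np.linspace(0.0, 1.0, 50)   # interval seen during fitting
t_future = t_train + 1.0              # extrapolation one period ahead

y_train = nd_forward(t_train, amplitudes, frequencies, phases, slope, intercept)
y_future = nd_forward(t_future, amplitudes, frequencies, phases, slope, intercept)
```

Because both sinusoids repeat after one time unit, the extrapolated values differ from the training values only by the trend, which is the kind of behavior that lets such a model generalize beyond the training interval.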

Towards Consistency of Adversarial Training for Generative Models

This work presents a rigorous statistical analysis of adversarial training for generative models, advancing recent work by Arjovsky and Bottou [2]. A key element is the distinction between the objective function with respect to the (unknown) data distribution, and its empirical counterpart. This yields a straightforward explanation for common pathologies in practical adversarial training such as vanishing gradients. To overcome such issues, we pursue the idea of smoothing the Jensen-Shannon Divergence (JSD) by incorporating noise in the formulation of the discriminator. As we show, this effectively leads to an empirical version of the JSD in which the true and the generator densities are replaced by kernel density estimates. We analyze statistical consistency of this objective, and demonstrate its practical effectiveness.
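
The idea of a JSD between kernel density estimates can be sketched numerically. The example below is an illustration under stated assumptions, not the paper's formulation: it works in one dimension, uses a Gaussian kernel whose bandwidth `sigma` plays the role of the noise level, and integrates on a fixed grid. The names `gaussian_kde_density` and `smoothed_jsd` and the sample data are hypothetical.

```python
import numpy as np

def gaussian_kde_density(samples, grid, sigma):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    diffs = (grid[:, None] - samples[None, :]) / sigma
    kernels = np.exp(-0.5 * diffs**2) / (sigma * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

def smoothed_jsd(samples_p, samples_q, grid, sigma):
    """JSD between the two KDEs, approximated by a Riemann sum on `grid`."""
    p = gaussian_kde_density(samples_p, grid, sigma)
    q = gaussian_kde_density(samples_q, grid, sigma)
    m = 0.5 * (p + q)
    dx = grid[1] - grid[0]
    eps = 1e-300  # guards log(0); terms with zero density contribute 0
    kl_pm = np.sum(p * (np.log(p + eps) - np.log(m + eps))) * dx
    kl_qm = np.sum(q * (np.log(q + eps) - np.log(m + eps))) * dx
    return 0.5 * (kl_pm + kl_qm)

# Assumed toy data: "real" and "generated" samples with little overlap.
rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=200)
fake = rng.normal(loc=6.0, scale=1.0, size=200)
grid = np.linspace(-10.0, 16.0, 4001)

jsd_sharp = smoothed_jsd(real, fake, grid, sigma=0.05)   # almost no smoothing
jsd_smooth = smoothed_jsd(real, fake, grid, sigma=2.0)   # heavy smoothing
```

With a very small bandwidth the two estimates barely overlap and the divergence sits near its upper bound of log 2, where it carries little gradient information. A larger bandwidth makes the densities overlap and lowers the divergence, which is the intuition behind smoothing.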

Neural Attribute Machines for Program Generation

Recurrent neural networks have achieved remarkable success at generating sequences with complex structures, thanks to advances that include richer embeddings of input and cures for vanishing gradients. Trained only on sequences from a known grammar, though, they can still struggle to learn rules and constraints of the grammar. Neural Attribute Machines (NAMs) are equipped with a logical machine that represents the underlying grammar, which is used to teach the constraints to the neural machine by (i) augmenting the input sequence, and (ii) optimizing a custom loss function. Unlike traditional RNNs, NAMs are exposed to the grammar, as well as samples from the language of the grammar. During generation, NAMs make significantly fewer violations of the constraints of the underlying grammar than RNNs trained only on samples from the language of the grammar.

Geometric Methods for Robust Data Analysis in High Dimension

Machine learning and data analysis now find both scientific and industrial application in biology, chemistry, geology, medicine, and physics. These applications rely on large quantities of data gathered from automated sensors and user input. Furthermore, the dimensionality of many datasets is extreme: more details are being gathered about single user interactions or sensor readings. All of these applications encounter problems with a common theme: use observed data to make inferences about the world. Our work obtains the first provably efficient algorithms for Independent Component Analysis (ICA) in the presence of heavy-tailed data. The main tool in this result is the centroid body (a well-known topic in convex geometry), along with optimization and random walks for sampling from a convex body. This is the first algorithmic use of the centroid body and it is of independent theoretical interest, since it effectively replaces the estimation of covariance from samples, and is more generally accessible. This reduction relies on a non-linear transformation of samples from such an intersection of halfspaces (i.e., a simplex) to samples which are approximately from a linearly transformed product distribution. Through this transformation of samples, which can be done efficiently, one can then use an ICA algorithm to recover the vertices of the intersection of halfspaces. Finally, we again use ICA as an algorithmic primitive to construct an efficient solution to the widely studied problem of learning the parameters of a Gaussian mixture model. Our algorithm again transforms samples from a Gaussian mixture model into samples which fit into the ICA model and, when processed by an ICA algorithm, result in recovery of the mixture parameters. Our algorithm is effective even when the number of Gaussians in the mixture grows polynomially with the ambient dimension.

Who Will Share My Image? Predicting the Content Diffusion Path in Online Social Networks

Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest. However, existing work mainly focuses on modeling popularity using a single metric such as the total number of likes or shares. In this work, we propose Diffusion-LSTM, a memory-based deep recurrent network that learns to recursively predict the entire diffusion path of an image through a social network. By combining user social features and image features, and encoding the diffusion path taken thus far with an explicit memory cell, our model predicts the diffusion path of an image more accurately than alternative baselines that either encode only image or social features, or lack memory. By mapping individual users to user prototypes, our model can generalize to new users not seen during training. Finally, we demonstrate our model’s capability of generating diffusion trees, and show that the generated trees closely resemble ground-truth trees.

Implicit Regularization in Matrix Factorization

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix X with gradient descent on a factorization of X. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
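
The setting in this abstract can be explored with a small experiment. The sketch below rests on assumptions that the abstract does not state: a symmetric factorization X = U Uᵀ, random symmetric Gaussian measurement matrices, a rank-1 ground truth, and the particular sizes, step size, and iteration count. It sets up the comparison of nuclear norms for inspection and does not by itself confirm the conjecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 30  # 30 measurements < 36 free entries of a symmetric 8x8 matrix

# Assumed rank-1 PSD ground truth with unit Frobenius norm.
v = rng.normal(size=(n, 1))
X_true = v @ v.T
X_true /= np.linalg.norm(X_true)

# Symmetric Gaussian measurement matrices A_i and data y_i = <A_i, X_true>.
G = rng.normal(size=(m, n, n))
A = 0.5 * (G + G.transpose(0, 2, 1))
y = np.einsum("kij,ij->k", A, X_true)

def loss_and_grad(U):
    """Mean squared residual of <A_i, U U^T> - y_i and its gradient in U."""
    residual = np.einsum("kij,ij->k", A, U @ U.T) - y
    loss = 0.5 * np.mean(residual**2)
    # For symmetric A_i, the gradient of <A_i, U U^T> with respect to U is 2 A_i U.
    grad = 2.0 * np.einsum("k,kij->ij", residual, A) @ U / m
    return loss, grad

# Full-dimensional factor, initialized close to the origin.
U = 1e-3 * rng.normal(size=(n, n))
initial_loss, _ = loss_and_grad(U)

step = 0.01  # assumed small step size
for _ in range(5000):
    _, grad = loss_and_grad(U)
    U -= step * grad
final_loss, _ = loss_and_grad(U)

X_gd = U @ U.T
nuclear_norm_gd = np.linalg.norm(X_gd, ord="nuc")
nuclear_norm_true = np.linalg.norm(X_true, ord="nuc")
print(f"loss {initial_loss:.3e} -> {final_loss:.3e}")
print(f"nuclear norm: gradient descent {nuclear_norm_gd:.4f}, ground truth {nuclear_norm_true:.4f}")
```

Rerunning with a larger initialization scale (for example 1.0 instead of 1e-3) is one way to probe how the nuclear norm of the recovered matrix depends on the starting point.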

Consistent Kernel Density Estimation with Non-Vanishing Bandwidth

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

Attention-based Natural Language Person Retrieval

Counterfactual Multi-Agent Policy Gradients

Compiling Quantum Circuits to Realistic Hardware Architectures using Temporal Planners

Adaptive Estimation of High Dimensional Partially Linear Model

Doubly Stochastic Variational Inference for Deep Gaussian Processes

Visual Servoing from Deep Neural Networks

Dual Dynamic Programming with cut selection: convergence proof and numerical experiments

Joint PoS Tagging and Stemming for Agglutinative Languages

Novel Deep Convolution Neural Network Applied to MRI Cardiac Segmentation

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

New Results for Provable Dynamic Robust PCA

Efficient, Safe, and Probably Approximately Complete Learning of Action Models

Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo

Communication vs Distributed Computation: an alternative trade-off curve

Logic Tensor Networks for Semantic Image Interpretation

Optimal Cooperative Inference

Cultural Diffusion and Trends in Facebook Photographs

The Onsager-Machlup functional associated with additive fractional noise

Multicut decomposition methods with cut selection for multistage stochastic programs

Automatic sequences and generalised polynomials

Modeling The Intensity Function Of Point Process Via Recurrent Neural Networks

Plug-and-Play Unplugged: Optimization Free Reconstruction using Consensus Equilibrium

The Dual Graph Shift Operator: Identifying the Support of the Frequency Domain

Matroids Hitting Sets and Unsupervised Dependency Grammar Induction

State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

Large induced subgraphs with $k$ vertices of almost maximum degree

Extraction and Classification of Diving Clips from Continuous Video Footage

Principled Hybrids of Generative and Discriminative Domain Adaptation

The tessellation problem of quantum walks

Learning to Pour

Spectrum Sharing and Cyclical Multiple Access in UAV-Aided Cellular Offloading

Online Edge Grafting for Efficient MRF Structure Learning

Fast Causal Inference with Non-Random Missingness by Test-Wise Deletion

Lat-Net: Compressing Lattice Boltzmann Flow Simulations using Deep Neural Networks

Deriving Neural Architectures from Sequence and Graph Kernels

A Conic Integer Programming Approach to Constrained Assortment Optimization under the Mixed Multinomial Logit Model

Energy-Efficient Multi-Pair Two-Way AF Full-Duplex Massive MIMO Relaying

Cross-Domain Perceptual Reward Functions

Expectation Propagation for t-Exponential Family Using Q-Algebra

Convergence of Langevin MCMC in KL-divergence

A Clustering-based Consistency Adaptation Strategy for Distributed SDN Controllers

Weakly Supervised Semantic Segmentation Based on Co-segmentation

Circular law for the sum of random permutation matrices

Max-Cosine Matching Based Neural Models for Recognizing Textual Entailment

The cost of fairness in classification

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

A Spatial Branch-and-Cut Method for Nonconvex QCQP with Bounded Complex Variables

An Empirical Analysis of Approximation Algorithms for the Euclidean Traveling Salesman Problem

Vector Transport-Free SVRG with General Retraction for Riemannian Optimization: Complexity Analysis and Practical Implementation

Triangle Finding and Listing in CONGEST Networks

MagNet: a Two-Pronged Defense against Adversarial Examples

Gaps between avalanches in 1D Random Field Ising Models

Load Balancing for Skewed Streams on Heterogeneous Cluster

Wireless Powered Communications with Finite Battery and Finite Blocklength

Port-Hamiltonian descriptor systems

Dynamic degree-corrected blockmodels for social networks: a nonparametric approach

Performance Optimization of Co-Existing Underlay Secondary Networks

Recent progress in many-body localization

SLAM based Quasi Dense Reconstruction For Minimally Invasive Surgery Scenes

A matrix-based method of moments for fitting multivariate network meta-analysis models with multiple outcomes and random inconsistency effects

The structure of delta-matroids with width one twists

Topology Induced Oscillations in Majorana Fermions in a Quasiperiodic Superconducting Chain

First-spike based visual categorization using reward-modulated STDP

Deep image representations using caption generators

Distributionally Robust Optimisation in Congestion Control

Cut-norm and entropy minimization over weak* limits

Boolean dimension and local dimension

Shorter stabilizer circuits via Bruhat decomposition and quantum circuit transformations

On the (parameterized) complexity of recognizing well-covered (r,l)-graphs

Investigation of Using VAE for i-Vector Speaker Verification

Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs

Classification of Quantitative Light-Induced Fluorescence Images Using Convolutional Neural Network

Firing rate equations require a spike synchrony mechanism to correctly describe fast oscillations in inhibitory networks

Learning Structured Text Representations

A simplicial decomposition framework for large scale convex quadratic programming

Hypergeometric and basic hypergeometric series and integrals associated with root systems

Geometry of time-reversible group-based models

Asynchronous Parallel Bayesian Optimisation via Thompson Sampling

GSplit LBI: Taming the Procedural Bias in Neuroimaging for Disease Prediction

Arrangements of homothets of a convex body II

On the Cauchy problem for integro-differential equations in the scale of spaces of generalized smoothness

Quantum-secured blockchain

Entanglement properties of quantum grid states

Flux-dependent localisation in a disordered flat-band lattice

Is Our Model for Contention Resolution Wrong?

Filtering Variational Objectives

Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework