Neural Variational Inference and Learning in Undirected Graphical Models

Many problems in machine learning are naturally expressed in the language of undirected graphical models. Here, we propose black-box learning and inference algorithms for undirected models that optimize a variational approximation to the log-likelihood of the model. Central to our approach is an upper bound on the log-partition function parametrized by a function q that we express as a flexible neural network. Our bound makes it possible to track the partition function during learning, to speed-up sampling, and to train a broad class of hybrid directed/undirected models via a unified variational inference framework. We empirically demonstrate the effectiveness of our method on several popular generative modeling datasets.

Food Recommender Systems: Important Contributions, Challenges and Future Research Directions

The recommendation of food items is important for many reasons. Attaining cooking inspiration via digital sources is becoming evermore popular; as are systems, which recommend other types of food, such as meals in restaurants or products in supermarkets. Researchers have been studying these kinds of systems for many years, suggesting not only that can they be a means to help people find food they might want to eat, but also help them nourish themselves more healthily. This paper provides a summary of the state-of-the-art of so-called food recommender systems, highlighting both seminal and most recent approaches to the problem, as well as important specializations, such as food recommendation systems for groups of users or systems which promote healthy eating. We moreover discuss the diverse challenges involved in designing recsys for food, summarise the lessons learned from past research and outline what we believe to be important future directions and open questions for the field. In providing these contributions we hope to provide a useful resource for researchers and practitioners alike.

Metric Learning-based Generative Adversarial Network

Generative Adversarial Networks (GANs), as a framework for estimating generative models via an adversarial process, have attracted huge attention and have proven to be powerful in a variety of tasks. However, training GANs is well known for being delicate and unstable, partially caused by its sig- moid cross entropy loss function for the discriminator. To overcome such a problem, many researchers directed their attention on various ways to measure how close the model distribution and real distribution are and have applied dif- ferent metrics as their objective functions. In this paper, we propose a novel framework to train GANs based on distance metric learning and we call it Metric Learning-based Gener- ative Adversarial Network (MLGAN). The discriminator of MLGANs can dynamically learn an appropriate metric, rather than a static one, to measure the distance between generated samples and real samples. Afterwards, MLGANs update the generator under the newly learned metric. We evaluate our ap- proach on several representative datasets and the experimen- tal results demonstrate that MLGANs can achieve superior performance compared with several existing state-of-the-art approaches. We also empirically show that MLGANs could increase the stability of training GANs.

Fidelity-Weighted Learning

Training deep neural networks requires many training samples, but in practice training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label-quality into account when learning the data representation, we could get the best of both worlds. To this end, we propose ‘fidelity-weighted learning’ (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label-quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations.

Faster Fuzzing: Reinitialization with Deep Neural Models

We improve the performance of the American Fuzzy Lop (AFL) fuzz testing framework by using Generative Adversarial Network (GAN) models to reinitialize the system with novel seed files. We assess performance based on the temporal rate at which we produce novel and unseen code paths. We compare this approach to seed file generation from a random draw of bytes observed in the training seed files. The code path lengths and variations were not sufficiently diverse to fully replace AFL input generation. However, augmenting native AFL with these additional code paths demonstrated improvements over AFL alone. Specifically, experiments showed the GAN was faster and more effective than the LSTM and out-performed a random augmentation strategy, as measured by the number of unique code paths discovered. GAN helps AFL discover 14.23% more code paths than the random strategy in the same amount of CPU time, finds 6.16% more unique code paths, and finds paths that are on average 13.84% longer. Using GAN shows promise as a reinitialization strategy for AFL to help the fuzzer exercise deep paths in software.

Inverse Reward Design

Autonomous agents optimize the reward function we give them. What they don’t know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios. Inevitably, agents encounter new scenarios (e.g., new types of terrain) where optimizing that same reward may lead to undesired behavior. Our insight is that reward functions are merely observations about what the designer actually wants, and that they should be interpreted in the context in which they were designed. We introduce inverse reward design (IRD) as the problem of inferring the true objective based on the designed reward and the training MDP. We introduce approximate methods for solving IRD problems, and use their solution to plan risk-averse behavior in test MDPs. Empirical results suggest that this approach can help alleviate negative side effects of misspecified reward functions and mitigate reward hacking.

SIMILARnet: Simultaneous Intelligent Localization and Recognition Network

Global Average Pooling (GAP) [4] has been used previously to generate class activation for image classification tasks. The motivation behind SIMILARnet comes from the fact that the convolutional filters possess position information of the essential features and hence, combination of the feature maps could help us locate the class instances in an image. We propose a biologically inspired model that is free of differential connections and doesn’t require separate training thereby reducing computation overhead. Our novel architecture generates promising results and unlike existing methods, the model is not sensitive to the input image size, thus promising wider application. Codes for the experiment and illustrations can be found at: https://…/Advanced-GAP .

Bootstrapping Generalization Error Bounds for Time Series

We consider the problem of finding confidence intervals for the risk of forecasting the future of a stationary, ergodic stochastic process, using a model estimated from the past of the process. We show that a bootstrap procedure provides valid confidence intervals for the risk, when the data source is sufficiently mixing, and the loss function and the estimator are suitably smooth. Autoregressive (AR(d)) models estimated by least squares obey the necessary regularity conditions, even when mis-specified, and simulations show that the finite- sample coverage of our bounds quickly converges to the theoretical, asymptotic level. As an intermediate step, we derive sufficient conditions for asymptotic independence between empirical distribution functions formed by splitting a realization of a stochastic process, of independent interest.

Stochastic Cubic Regularization for Fast Nonconvex Optimization

This paper proposes a stochastic variant of a classic algorithm—the cubic-regularized Newton method [Nesterov and Polyak 2006]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only \mathcal{\tilde{O}}(\epsilon^{-3.5}) stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed as efficiently as stochastic gradients. This improves upon the \mathcal{\tilde{O}}(\epsilon^{-4}) rate of stochastic gradient descent. Our rate matches the best-known result for finding local minima without requiring any delicate acceleration or variance-reduction techniques.

Intriguing Properties of Adversarial Examples

It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically focused on the fact that neural networks operate on high dimensional data, they overfit, or they are too linear. Here we argue that the origin of adversarial examples is primarily due to an inherent uncertainty that neural networks have about their predictions. We show that the functional form of this uncertainty is independent of architecture, dataset, and training protocol; and depends only on the statistics of the logit differences of the network, which do not change significantly during training. This leads to adversarial error having a universal scaling, as a power-law, with respect to the size of the adversarial perturbation. We show that this universality holds for a broad range of datasets (MNIST, CIFAR10, ImageNet, and random data), models (including state-of-the-art deep networks, linear models, adversarially trained networks, and networks trained on randomly shuffled labels), and attacks (FGSM, step l.l., PGD). Motivated by these results, we study the effects of reducing prediction entropy on adversarial robustness. Finally, we study the effect of network architectures on adversarial sensitivity. To do this, we use neural architecture search with reinforcement learning to find adversarially robust architectures on CIFAR10. Our resulting architecture is more robust to white \emph{and} black box attacks compared to previous attempts.

Clustering with feature selection using alternating minimization, Application to computational biology

This paper deals with unsupervised clustering with feature selection. The problem is to estimate both labels and a sparse projection matrix of weights. To address this combinatorial non-convex problem maintaining a strict control on the sparsity of the matrix of weights, we propose an alternating minimization of the Frobenius norm criterion. We provide a new efficient algorithm named K-sparse which alternates k-means with projection-gradient minimization. The projection-gradient step is a method of splitting type, with exact projection on the \ell^1 ball to promote sparsity. The convergence of the gradient-projection step is addressed, and a preliminary analysis of the alternating minimization is made. The Frobenius norm criterion converges as the number of iterates in Algorithm K-sparse goes to infinity. Experiments on Single Cell RNA sequencing datasets show that our method significantly improves the results of PCA k-means, spectral clustering, SIMLR, and Sparcl methods, and achieves a relevant selection of genes. The complexity of K-sparse is linear in the number of samples (cells), so that the method scales up to large datasets.

DLVM: A modern compiler infrastructure for deep learning

Many current approaches to deep learning make use of high-level toolkits such as TensorFlow, Torch, or Caffe. Toolkits such as Caffe have a layer-based programming framework with hard-coded gradients specified for each layer type, making research using novel layer types problematic. Toolkits such as Torch and TensorFlow define a computation graph in a host language such as Python, where each node represents a linear algebra operation parallelized as a compute kernel on GPU and stores the result of evaluation; some of these toolkits subsequently perform runtime interpretation over that graph, storing the results of forward calculations and reverse-accumulated gradients at each node. This approach is more flexible, but these toolkits take a very limited and ad-hoc approach to performing optimization. Also problematic are the facts that most toolkits lack type safety, and target only a single (usually GPU) architecture, limiting users’ abilities to make use of heterogeneous and emerging hardware architectures. We introduce a novel framework for high-level programming that addresses all of the above shortcomings.

A Simple Derivation of the Heap’s Law from the Generalized Zipf’s Law

I reproduce a rather simple formal derivation of the Heaps’ law from the generalized Zipf’s law, which I previously published in Russian.

Signatures of the Many-body Localized Regime in Two Dimensions
A cross-vendor and cross-state analysis of the GPS-probe data latency
Bayesian Inference of Selection in the Wright-Fisher Diffusion Model
The Unit Acquisition Number of a Graph
The Lattice of Amoebas
Random matrices with prescribed eigenvalues and expectation values for random quantum states
Tangent: Automatic Differentiation Using Source Code Transformation in Python
Curve-Structure Segmentation from Depth Maps: A CNN-based Approach and Its Application to Exploring Cultural Heritage Objects
Differential Sensitivity Analysis of Variational Inequalities with Locally Lipschitz Continuous Solution Operators
Algorithms to Approximate Column-Sparse Packing Problems
Adjusting for bias introduced by instrumental variable estimation in the Cox Proportional Hazards Model
Jamming and condensation in one-dimensional driven flow
On f- and h- vectors of relative simplicial complexes
Recurrent Autoregressive Networks for Online Multi-Object Tracking
Sparse Randomized Kaczmarz for Support Recovery of Jointly Sparse Corrupted Multiple Measurement Vectors
Fast gradient descent method for convex optimization problems with an oracle that generates a $(δ,L)$-model of a function in a requested point
Parameter-driven models for time series of count data
Stability Analysis of TDD Networks Revisited: A trade-off between Complexity and Precision
Pathwise superhedging on prediction sets
Space-time random walk loop measures
Bandwidth Adaptive & Error Resilient MBR Exact Repair Regenerating Codes
On the Discrimination-Generalization Tradeoff in GANs
About simple variational splines from the Hamiltonian viewpoint
The extended power distribution: A new distribution on $(0, 1)$
Fairness and Transmission-Aware Caching and Delivery Policies in OFDMA-Based HetNets
RubyStar: A Non-Task-Oriented Mixture Model Dialog System
Block-Sparse Recurrent Neural Networks
Learning to Imagine Manipulation Goals for Robot Task Planning
Optimal Brownian Stopping between radially symmetric marginals in general dimensions
Cutoff for lamplighter chains on fractals
Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis
Spatiotemporal pattern extraction by spectral analysis of vector-valued observables
A New Hybrid-parameter Recurrent Neural Networks for Online Handwritten Chinese Character Recognition
Deep Fault Analysis and Subset Selection in Solar Power Grids
Multi-label Image Recognition by Recurrently Discovering Attentional Regions
Enumeration of lozenge tilings of a hexagon with a shamrock missing on the symmetry axis
Heuristic Search for Structural Constraints in Data Association
Traffic Prediction Based on Random Connectivity in Deep Learning with Long Short-Term Memory
Multilevel Monte Carlo for Smoothing via Transport Methods
Revealing structure components of the retina by deep learning networks
Random matrices: Probability of Normality
Optimal Auction For Edge Computing Resource Management in Mobile Blockchain Networks: A Deep Learning Approach
Tightness for the Cover Time of $\mathbf{S}^{2}$
A note on two conjectures that strengthen the four colour theorem
List colouring of graphs and generalized Dyck paths
A compressed dynamic self-index for highly repetitive text collections
Transductive Zero-Shot Hashing via Coarse-to-Fine Similarity Mining
Learning Sparse Visual Representations with Leaky Capped Norm Regularizers
The regularity of the linear drift in negatively curved spaces
Constructive Discrepancy Minimization with Hereditary L2 Guarantees
Flexible Bayesian Dynamic Modeling of Covariance and Correlation Matrices
Dimension Estimation Using Random Connection Models
Un résultat intrigant en commande sans modèle
LatentPoison – Adversarial Attacks On The Latent Space
Spectral Unmixing with Multiple Dictionaries
Universal consistency and minimax rates for online Mondrian Forests
Competing first passage percolation on random graphs with finite variance degrees
The Prime Grid. Introducing a geometric representation of natural numbers
Universality of critically pinned interfaces in 2-dimensional isotropic random media
Run Compressed Rank/Select for Large Alphabets
Transient and Slim versus Recurrent and Fat: Random Walks and the Trees they Grow
Rainbow matchings in Dirac bipartite graphs
Improving Hypernymy Extraction with Distributional Semantic Classes
An Analysis of Privacy-Aware Personalization Signals by Using Online Evaluation Methods
Coupling in the queue with impatience: case of several servers
polyDB: A Database for Polytopes and Related Objects
Ramsey graphs induce subgraphs of quadratically many sizes
Multi-dimensional BSDEs whose terminal values are bounded and have bounded Malliavin derivatives
Marginal Release Under Local Differential Privacy
Inference of signals with unknown correlation structure from non-linear measurements
Multi-User Frequency-Selective Hybrid MIMO Demonstrated Using 60 GHz RF Modules
RPYFMM: Parallel Adaptive Fast Multipole Method for Rotne-Prager-Yamakawa Tensor in Biomolecular Hydrodynamics Simulations
Variational Gaussian Dropout is not Bayesian
Cadlag stability and approximation of reversible coalescing-fragmentating Wasserstein dynamics
Stochastic Stability in Max-Product and Max-Plus Systems with Markovian Jumps
Sufficient conditions for Hamilton-connected graphs in terms of (signless Laplacian) spectral radius
Intelligent Fault Analysis in Electrical Power Grids
Location-Aided Coordinated Analog Precoding for Uplink Multi-User Millimeter Wave Systems
Optimal Control of Storage Regeneration with Repair Codes
Recency-weighted Markovian inference
Training coupled spin-torque nano-oscillators to classify patterns in real-time
Proof of a conjecture of Morales-Pak-Panova on reverse plane partitions
Matrix-normal models for fMRI analysis
Learning K-way D-dimensional Discrete Code For Compact Embedding Representations
Curing Epidemics on Networks using a Polya Contagion Model
Lower bounds over Boolean inputs for deep neural networks with ReLU gates
Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs
Functional central limit theorems for rough volatility
Offline signature authenticity verification through unambiguously connected skeleton segments
Real-Time Bi-directional Electric Vehicle Charging Control with Distribution Grid Implementation
Exploration in NetHack with Secret Discovery
Private and Online Optimization of Piecewise Lipschitz Functions