Linking Generative Adversarial Learning and Binary Classification

In this note, we point out a basic link between generative adversarial (GA) training and binary classification — any powerful discriminator essentially computes an (f-)divergence between real and generated samples. The result, repeatedly re-derived in decision theory, has implications for GA Networks (GANs), providing an alternative perspective on training f-GANs by designing the discriminator loss function.
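
As a rough illustration of the link described above, the sketch below checks the best-known special case numerically: with the logarithmic loss and balanced classes, the value attained by the Bayes-optimal discriminator equals twice the Jensen-Shannon divergence minus log 4. The two discrete distributions and all function names are assumptions made for this example; the note itself covers general f-divergences and other discriminator losses.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence between p and q (natural log)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def optimal_discriminator_value(p, q):
    """E_p[log D*(x)] + E_q[log(1 - D*(x))] with D*(x) = p(x) / (p(x) + q(x))."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log(pi / (pi + qi))
        if qi > 0:
            total += qi * math.log(qi / (pi + qi))
    return total

# Hypothetical "real" and "generated" distributions over three outcomes.
p_real = [0.5, 0.3, 0.2]
p_gen = [0.2, 0.3, 0.5]

value = optimal_discriminator_value(p_real, p_gen)
# The two printed numbers agree up to floating-point error.
print(value, 2 * js(p_real, p_gen) - math.log(4))
```

Swapping the log loss for another proper loss would, under the same construction, yield a different divergence, which is the design freedom the note points to.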

Interacting Attention-gated Recurrent Networks for Recommendation

Capturing the temporal dynamics of user preferences over items is important for recommendation. Existing methods mainly assume that all time steps in the user-item interaction history are equally relevant to recommendation, which does not hold in real-world scenarios where user-item interactions can often happen accidentally. More importantly, they learn user and item dynamics separately, thus failing to capture their joint effects on user-item interactions. To better model user and item dynamics, we present the Interacting Attention-gated Recurrent Network (IARN), which adopts the attention model to measure the relevance of each time step. In particular, we propose a novel attention scheme to learn the attention scores of user and item history in an interacting way, thereby accounting for the dependencies between user and item dynamics in shaping user-item interactions. By doing so, IARN can selectively memorize different time steps of a user’s history when predicting her preferences over different items. Our model can therefore provide meaningful interpretations for recommendation results, which could be further enhanced by auxiliary features. Extensive validation on real-world datasets shows that IARN consistently outperforms state-of-the-art methods.

Knowledge Transfer Between Artificial Intelligence Systems

We consider the fundamental question: how a legacy ‘student’ Artificial Intelligence (AI) system could learn from a legacy ‘teacher’ AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here ‘learning’ is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the ‘student’ AI system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the ‘student’ system can successfully and non-iteratively learn $k \ll n$ new examples from the ‘teacher’ (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.

Sequence Prediction with Neural Segmental Models

Segments that span contiguous parts of inputs, such as phonemes in speech, named entities in sentences, and actions in videos, occur frequently in sequence prediction problems. Segmental models, a class of models that explicitly hypothesizes segments, have allowed the exploration of rich segment features for sequence prediction. However, segmental models suffer from slow decoding, hampering the use of computationally expensive features. In this thesis, we introduce discriminative segmental cascades, a multi-pass inference framework that allows us to improve accuracy by adding higher-order features and neural segmental features while maintaining efficiency. We also show that instead of including more features to obtain better accuracy, segmental cascades can be used to speed up training and decoding. Segmental models, similarly to conventional speech recognizers, are typically trained in multiple stages. In the first stage, a frame classifier is trained with manual alignments, and then in the second stage, segmental models are trained with manual alignments and the outputs of the frame classifier. However, obtaining manual alignments is time-consuming and expensive. We explore end-to-end training for segmental models with various loss functions, and show how end-to-end training with marginal log loss can eliminate the need for detailed manual alignments. We draw connections between the marginal log loss and a popular end-to-end training approach called connectionist temporal classification. We present a unifying framework for various end-to-end graph search-based models, such as hidden Markov models, connectionist temporal classification, and segmental models. Finally, we discuss possible extensions of segmental models to large-vocabulary sequence prediction tasks.

Auto-G-Computation of Causal Effects on a Network

Methods for inferring average causal effects have traditionally relied on two key assumptions: (i) first, that the intervention received by one unit cannot causally influence the outcome of another, i.e. that there is no interference between units; and (ii) that units can be organized into non-overlapping groups such that outcomes of units in separate groups are independent. In this paper, we develop new statistical methods for causal inference based on a single realization of a network of connected units for which neither assumption (i) nor (ii) holds. The proposed approach allows both for arbitrary forms of interference, whereby the outcome of a unit may depend (directly or indirectly) on interventions received by other units with whom a network path through connected units exists; and long range dependence, whereby outcomes for any two units likewise connected by a path may be dependent. Under the standard assumptions of consistency and lack of unobserved confounding adapted to the network setting, statistical inference is further made tractable by an assumption that the outcome vector defined on the network is a single realization of a certain Markov random field (MRF). As we show, this assumption allows inferences about various network causal effects including direct and spillover effects via the auto-g-computation algorithm, a network generalization of Robins’ well-known g-computation algorithm previously described for causal inference under assumptions (i) and (ii).

Improving Landmark Localization with Semi-Supervised Learning

We present two techniques to improve landmark localization from partially annotated datasets. Our primary goal is to leverage the common situation where precise landmark locations are only provided for a small data subset, but where class labels for classification tasks related to the landmarks are more abundantly available. We propose a new architecture for landmark localization, where training with class labels acts as an auxiliary signal to guide the landmark localization on unlabeled data. A key aspect of our approach is that errors can be backpropagated through a complete landmark localization model. We also propose and explore an unsupervised learning technique for landmark localization based on having a model predict equivariant landmarks with respect to transformations applied to the image. We show that this technique, used as additional regularization, improves landmark prediction considerably and can learn effective detectors even when only a small fraction of the dataset has labels for landmarks. We present results on two toy datasets and three real datasets, with hands and faces, respectively, showing the performance gain of our method on each.

Unsupervised Generative Modeling Using Matrix Product States

Generative modeling, which learns a joint probability distribution from training data and generates samples according to it, is an important task in machine learning and artificial intelligence. Inspired by the probabilistic interpretation of quantum physics, we propose a generative model using matrix product states, a tensor network originally proposed for describing (particularly one-dimensional) entangled quantum states. Our model enjoys efficient learning by utilizing the density matrix renormalization group method, which allows the dimensions of the tensors to be adjusted dynamically, and offers an efficient direct sampling approach, Zipper, for generative tasks. We apply our method to generative modeling of several standard datasets, including the principled Bars and Stripes, random binary patterns, and the MNIST handwritten digits, to illustrate the abilities of our model, and discuss its features as well as drawbacks relative to popular generative models such as the Hopfield model, Boltzmann machines, and generative adversarial networks. Our work sheds light on many interesting directions for future exploration in the development of quantum-inspired algorithms for unsupervised machine learning, which could potentially be realized by a quantum device.

Deep learning from crowds

Over the last few years, deep learning has revolutionized the field of machine learning by dramatically improving the state-of-the-art in various domains. However, as the size of supervised artificial neural networks grows, typically so does the need for larger labeled datasets. Recently, crowdsourcing has established itself as an efficient and cost-effective solution for labeling large sets of data in a scalable manner, but it often requires aggregating labels from multiple noisy contributors with different levels of expertise. In this paper, we address the problem of learning deep neural networks from crowds. We begin by describing an EM algorithm for jointly learning the parameters of the network and the confusion matrices of the different annotators for classification settings. Then, a novel general-purpose crowd layer is proposed, which allows us to train deep neural networks end-to-end, directly from the noisy labels of multiple annotators, using backpropagation. We empirically show that the proposed approach is able to internally capture the reliability and biases of different annotators and achieve new state-of-the-art results for various crowdsourced datasets across different settings, namely classification, regression and sequence labeling.
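
To make the idea of annotator-specific confusion matrices more concrete, here is a minimal sketch of how a network's class probabilities could be mapped to the label distribution expected from one annotator. The function name, the example matrices, and the probabilistic parametrization are assumptions for illustration only; the crowd layer proposed in the paper may be parametrized differently.

```python
def annotator_label_distribution(class_probs, confusion):
    """Distribution over the labels one annotator is expected to give.

    class_probs[k]  : model probability that the true class is k
    confusion[k][j] : assumed probability that this annotator answers j
                      when the true class is k (each row sums to 1)
    """
    n_labels = len(confusion[0])
    return [
        sum(class_probs[k] * confusion[k][j] for k in range(len(class_probs)))
        for j in range(n_labels)
    ]

# Hypothetical model output for one input with two classes.
class_probs = [0.9, 0.1]

# Hypothetical annotators: one reliable, one biased toward label 1.
reliable = [[1.0, 0.0], [0.0, 1.0]]
biased = [[0.6, 0.4], [0.0, 1.0]]

print(annotator_label_distribution(class_probs, reliable))
print(annotator_label_distribution(class_probs, biased))
```

In an end-to-end setting, the per-annotator matrices would be trainable parameters and the loss would compare each annotator's predicted distribution with the label that annotator actually gave.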

SPECTRE: Supporting Consumption Policies in Window-Based Parallel Complex Event Processing

Distributed Complex Event Processing (DCEP) is a paradigm to infer the occurrence of complex situations in the surrounding world from basic events like sensor readings. In doing so, DCEP operators detect event patterns on their incoming event streams. To yield high operator throughput, data parallelization frameworks divide the incoming event streams of an operator into overlapping windows that are processed in parallel by a number of operator instances. In doing so, the basic assumption is that the different windows can be processed independently from each other. However, consumption policies enforce that events can only be part of one pattern instance; then, they are consumed, i.e., removed from further pattern detection. That implies that the constituent events of a pattern instance detected in one window are excluded from all other windows as well, which breaks the data parallelism between different windows. In this paper, we tackle this problem by means of speculation: Based on the likelihood of an event’s consumption in a window, subsequent windows may speculatively suppress that event. We propose the SPECTRE framework for speculative processing of multiple dependent windows in parallel. Our evaluations show an up to linear scalability of SPECTRE with the number of CPU cores.

Neural Networks Regularization Through Invariant Features Learning

Training deep neural networks is known to require a large number of training samples. However, in many applications only a few training samples are available. In this work, we tackle the issue of training neural networks for classification tasks when few training samples are available. We attempt to solve this issue by proposing a regularization term that constrains the hidden layers of a network to learn class-wise invariant features. In our regularization framework, learning invariant features is generalized to class membership, where samples of the same class should have the same feature representation. Numerical experiments on MNIST and its variants show that our proposal is more efficient when few training samples are available. Moreover, we show an intriguing property of representation learning within neural networks. The source code of our framework is freely available at https://…/learning-class-invariant-features.
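
One simple way to express "samples of the same class should have the same feature representation" as a penalty is the mean squared distance between hidden features of same-class pairs. The sketch below is only an assumed form of such a term, with a hypothetical function name; the regularizer actually used in the paper may differ.

```python
def class_invariance_penalty(features, labels):
    """Mean squared Euclidean distance over all pairs sharing a label.

    features : list of feature vectors (lists of floats), one per sample
    labels   : list of class labels, aligned with `features`
    Returns 0.0 when no two samples share a class.
    """
    total, pairs = 0.0, 0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                total += sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical hidden-layer features for four samples in two classes.
features = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 5.0]]
labels = [0, 0, 1, 1]
print(class_invariance_penalty(features, labels))  # 0.5
```

During training, a term like this would be added to the classification loss with a weighting coefficient, so that features of same-class samples are pulled together in the hidden layers.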

Polar Transformer Networks

Convolutional neural networks (CNNs) are equivariant with respect to translation; a translation in the input causes a translation in the output. Attempts to generalize equivariance have concentrated on rotations. In this paper, we combine the idea of the spatial transformer with the canonical coordinate representations of groups (polar transform) to realize a network that is invariant to translation, and equivariant to rotation and scale. A conventional CNN is used to predict the origin of a polar transform. The polar transform is performed in a differentiable way, similar to the Spatial Transformer Networks, and the resulting polar representation is fed into a second CNN. The model is trained end-to-end with a classification loss. We apply the method to variations of MNIST, obtained by perturbing it with clutter, translation, rotation, and scaling. We achieve state-of-the-art performance on rotated MNIST, with fewer parameters and faster training time than previous methods, and we outperform all tested methods on the SIM2MNIST dataset, which we introduce.
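
The reason a polar representation helps can be seen on a single point: in log-polar coordinates around a fixed origin, rotating and scaling the input become plain shifts, which a translation-equivariant CNN can handle. The sketch below is an illustration under that assumption, using a log-radius axis and hypothetical helper names; it is not the paper's differentiable image transform.

```python
import math

def log_polar(x, y, origin=(0.0, 0.0)):
    """Return (log radius, angle) of the point (x, y) relative to `origin`."""
    dx, dy = x - origin[0], y - origin[1]
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def rotate_and_scale(x, y, angle, scale):
    """Rotate (x, y) about the origin by `angle` radians, then scale it."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

# Hypothetical point and transformation.
x, y = 3.0, 1.0
angle, scale = 0.4, 2.0

log_r, theta = log_polar(x, y)
log_r2, theta2 = log_polar(*rotate_and_scale(x, y, angle, scale))

# Scaling shifts the log-radius by log(scale); rotation shifts the angle.
print(log_r2 - log_r, math.log(scale))
print(theta2 - theta, angle)
```

In general the angular shift holds modulo 2π; the example values are chosen so that no wraparound occurs.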

Convolutional Gaussian Processes

We present a practical way of introducing convolutional structure into Gaussian processes, making them more suited to high-dimensional inputs like images. The main contribution of our work is the construction of an inter-domain inducing point approximation that is well-tailored to the convolutional kernel. This allows us to gain the generalisation benefit of a convolutional kernel, together with fast but accurate posterior inference. We investigate several variations of the convolutional kernel, and apply it to MNIST and CIFAR-10, which have both been known to be challenging for Gaussian processes. We also show how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance. We hope that this illustration of the usefulness of a marginal likelihood will help automate discovering architectures in larger models.

Deep and Confident Prediction for Time Series at Uber

Reliable uncertainty estimation for time series prediction is critical in many fields, including physics, biology, and manufacturing. At Uber, probabilistic time series forecasting is used for robust prediction of the number of trips during special events, driver incentive allocation, and real-time anomaly detection across millions of metrics. Classical time series models are often used in conjunction with a probabilistic formulation for uncertainty estimation. However, such models are hard to tune, scale, and add exogenous variables to. Motivated by the recent resurgence of Long Short Term Memory networks, we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation. We provide detailed experiments of the proposed solution on completed trips data, and successfully apply it to large-scale time series anomaly detection at Uber.

Anderson localization transitions with and without random potentials
Just Take the Average! An Embarrassingly Simple $2^n$-Time Algorithm for SVP (and CVP)
A Stochastic Lagrangian particle system for the Navier-Stokes equations
On the sharp lower bounds of Zagreb indices of graphs with given number of cut vertices
Estimating the epidemic risk using non-uniformly sampled contact data
A relation between conditional entropy and conditional expectation to evaluate secrecy systems
Decentralized and Recursive Identification for Cooperative Manipulation of Unknown Rigid Body with Local Measurements
Time-Indexed Relaxations for the Online Bipartite Matching Problem
Optimizing for Measure of Performance in Max-Margin Parsing
Model-Based Control Using Koopman Operators
Opening the Black Box of Financial AI with CLEAR-Trade: A CLass-Enhanced Attentive Response Approach for Explaining and Visualizing Deep Learning-Driven Stock Market Prediction
The rank function of a positroid and non-crossing partitions
Shaping and Trimming Branch-and-bound Trees
Robust Semi-Cooperative Multi-Agent Coordination in the Presence of Stochastic Disturbances
An active-learning algorithm that combines sparse polynomial chaos expansions and bootstrap for structural reliability analysis
On the Triangle Clique Cover and $K_t$ Clique Cover Problems
Evaluating Partisan Gerrymandering in Wisconsin
Deep Ordinal Ranking for Multi-Category Diagnosis of Alzheimer’s Disease using Hippocampal MRI data
Covers of Query Results
Dynamic Multiscale Tree Learning Using Ensemble Strong Classifiers for Multi-label Segmentation of Medical Images with Lesions
The Unintended Consequences of Overfitting: Training Data Inference Attacks
Factoring in the Chicken McNugget monoid
A second order primal-dual method for nonsmooth convex composite optimization
Machine Learning and Social Robotics for Detecting Early Signs of Dementia
Crank-Nicolson scheme for stochastic differential equations driven by fractional Brownian motions
PageNet: Page Boundary Extraction in Historical Handwritten Documents
A Comparative Study of 2D Numerical Methods with GPU Computing
Deep Learning Techniques for Music Generation – A Survey
Unified Formulations for Combined-Cycle Units
Exploring and Exploiting Diversity for Image Segmentation
Antenna Selection in MIMO Cognitive Radio-Inspired NOMA Systems
Using Cross-Model EgoSupervision to Learn Cooperative Basketball Intention
Mean-field theory of Bayesian clustering
The Voynich Manuscript is Written in Natural Language: The Pahlavi Hypothesis
Learning to Compose Domain-Specific Transformations for Data Augmentation
Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records
Conjunctive management of surface and groundwater under severe drought: A case study in southern Iran
Deep Convolutional Neural Network for Age Estimation based on VGG-Face Model
An accelerated proximal iterative hard thresholding method for $\ell_0$ minimization
Parameterized complexity of machine scheduling: 15 open problems
Throughput Optimal Decentralized Scheduling of Multi-Hop Networks with End-to-End Deadline Constraints: II Wireless Networks with Interference
Probabilistic Rule Realization and Selection
A Neural Language Model for Dynamically Representing the Meanings of Unknown Words and Entities in a Discourse
On the Relationship between Ideal Cluster Points and Ideal Limit Points
BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks
Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction Mention Extraction
Group-level Emotion Recognition using Transfer Learning from Face Identification
Heavy tail and light tail of Cox-Ingersoll-Ross processes with regime-switching
A Compact Kernel Approximation for 3D Action Recognition
User Assignment with Distributed Large Intelligent Surface (LIS) Systems
Graphical criteria for positive solutions to linear systems
Wireless Networks for Mobile Edge Computing: Spatial Modeling and Latency Analysis (Extended version)
Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification
On the approximation of the boundary layers for the controllability problem of nonlinear singularly perturbed systems
Active Sampling for Large-scale Information Retrieval Evaluation
Blind image deblurring using class-adapted image priors
Information Theory and the Length Distribution of all Discrete Systems
Spoken English Intelligibility Remediation with PocketSphinx Alignment and Feature Extraction Improves Substantially over the State of the Art
Optimal Sub-sampling with Influence Functions
Temporal Pattern Discovery for Accurate Sepsis Diagnosis in ICU Patients
Detecting animals in African Savanna with UAVs and the crowds
Scene Text Recognition with Sliding Convolutional Character Models
Information-theoretic analysis of the directional influence between cellular processes
Concurrence Topology of Some Cancer Genomics Data
Energy-aware Mode Selection for Throughput Maximization in RF-Powered D2D Communications
Compressive Sensing Techniques for Next-Generation Wireless Communications
Proceedings Eighth International Symposium on Games, Automata, Logics and Formal Verification
Information-Propogation-Enhanced Neural Machine Translation by Relation Model
Some Sufficient Conditions for Finding a Nesting of the Normalized Matching Posets of Rank 3
On-the-fly Historical Handwritten Text Annotation
Parameterizations for Ensemble Kalman Inversion
Automatic Document Image Binarization
Cross-Domain Image Retrieval with Attention Modeling
Radial Line Fourier Descriptor for Handwritten Word Representation
Quantum Advantage from Conjugated Clifford Circuits
CNN-Based Projected Gradient Descent for Consistent Image Reconstruction
Towards Automated Cadastral Boundary Delineation from UAV Data
Generalized twisted centralizer codes
Soft Proposal Networks for Weakly Supervised Object Localization
An inner-loop free solution to inverse problems using deep neural networks
Symmetric Variational Autoencoder and Connections to Adversarial Learning
Depression and Self-Harm Risk Assessment in Online Forums
The low-rank hurdle model
Invariant, super and quasi-martingale functions of a Markov process
Optimal Number of Transmit Antennas for Secrecy Enhancement in Massive MIMOME Channels
Clustering of Data with Missing Entries using Non-convex Fusion Penalties
Synthetic Medical Images from Dual Generative Adversarial Networks
Progress-Space Tradeoffs in Single-Writer Memory Implementations
Input-to-State Stability with Respect to Boundary Disturbances for a Class of Semi-linear Parabolic Equations
From subKautz digraphs to cyclic Kautz digraphs
A multiplicative coalescent with asynchronous multiple mergers
Localisation, Communication and Networking with VLC: Challenges and Opportunities
Measuring the Similarity of Sentential Arguments in Dialog
Nonzero-sum games of optimal stopping and generalised Nash equilibrium
Towards Neural Machine Translation with Latent Tree Attention
An Influence-Receptivity Model for Topic based Information Cascades