Boosted Generative Models

We propose a new approach for using unsupervised boosting to create an ensemble of generative models, where models are trained in sequence to correct earlier mistakes. Our meta-algorithmic framework can leverage any existing base learner that permits likelihood evaluation, including recent latent variable models. Further, our approach allows the ensemble to include discriminative models trained to distinguish real data from model-generated data. We show theoretical conditions under which incorporating a new model in the ensemble will improve the fit and empirically demonstrate the effectiveness of boosting on density estimation and sample generation on synthetic and benchmark real datasets.

Active Learning Using Uncertainty Information

Many active learning methods belong to the retraining-based approaches, which select one unlabeled instance, add it to the training set with each of its possible labels, retrain the classification model, and evaluate the criterion on which the selection is based. However, since the true label of the selected instance is unknown, these methods resort to calculating the average-case or worst-case performance with respect to the unknown label. In this paper, we propose a different method to solve this problem. In particular, our method aims to make use of the uncertainty information to enhance the performance of retraining-based models. We apply our method to two state-of-the-art algorithms and carry out extensive experiments on a wide variety of real-world datasets. The results clearly demonstrate the effectiveness of the proposed method and indicate it can reduce human labeling efforts in many real-life applications.
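
As a rough illustration of the average-case and worst-case aggregation mentioned above (not the method proposed in the paper), the sketch below scores one candidate instance. It assumes that retraining with each possible label has already produced a loss per label, and that the current model supplies a probability for each label. The function names and the numbers are hypothetical.

```python
from typing import Dict


def average_case(losses: Dict[str, float], probs: Dict[str, float]) -> float:
    """Expected loss over the unknown label, weighted by the model's label probabilities."""
    return sum(probs[label] * losses[label] for label in losses)


def worst_case(losses: Dict[str, float]) -> float:
    """Largest loss over all possible labels (pessimistic aggregation)."""
    return max(losses.values())


# Hypothetical losses obtained after retraining with each candidate label.
losses = {"pos": 0.2, "neg": 0.6}
# Hypothetical label probabilities from the current model.
probs = {"pos": 0.75, "neg": 0.25}

avg = average_case(losses, probs)
worst = worst_case(losses)
print(f"average-case: {avg:.2f}, worst-case: {worst:.2f}")
```

A selection rule would compute one such score for every unlabeled instance and pick the instance with the best score; which aggregation to use is exactly the choice the abstract describes.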

Multimodal Clustering for Community Detection

Multimodal clustering is an unsupervised technique for mining interesting patterns in n-adic binary relations or n-mode networks. Among the different types of such generalized patterns one can find biclusters and formal concepts (maximal bicliques) for the 2-mode case, triclusters and triconcepts for the 3-mode case, closed n-sets for the n-mode case, etc. Object-attribute biclustering (OA-biclustering) for mining large binary datatables (formal contexts or 2-mode networks) arose by the end of the last decade due to the intractability of computation problems related to formal concepts; this type of pattern was proposed as a meaningful and scalable approximation of formal concepts. In this paper, our aim is to present recent advances in OA-biclustering and its extensions to mining multi-mode communities in the SNA setting. We also discuss the connection between clustering coefficients known in the SNA community for 1-mode and 2-mode networks and OA-bicluster density, the main quality measure of an OA-bicluster. Our experiments with large real-world 2-, 3-, and 4-mode networks show that this type of pattern is suitable for community detection in multi-mode cases within reasonable time, even though the number of corresponding n-cliques is still unknown due to computational difficulties. An interpretation of OA-biclusters for 1-mode networks is provided as well.
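
To make the density measure concrete, here is a minimal sketch. It assumes the usual definition from the OA-biclustering literature, which the abstract itself does not spell out: for a pair (g, m) in the relation, the OA-bicluster is (m', g'), where m' is the set of objects having attribute m and g' is the set of attributes of object g, and its density is the fraction of cells of m' x g' that belong to the relation. The toy context and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# A formal context stored as: object -> set of attributes it has.
Context = Dict[str, Set[str]]


def oa_bicluster(context: Context, g: str, m: str) -> Tuple[Set[str], Set[str]]:
    """Return (m', g') for a pair (g, m) that belongs to the relation."""
    if m not in context[g]:
        raise ValueError("the pair (g, m) must belong to the relation")
    extent = {obj for obj, attrs in context.items() if m in attrs}  # m'
    intent = set(context[g])  # g'
    return extent, intent


def density(context: Context, extent: Set[str], intent: Set[str]) -> float:
    """Share of (object, attribute) cells of extent x intent present in the relation."""
    filled = sum(len(context[obj] & intent) for obj in extent)
    return filled / (len(extent) * len(intent))


# Hypothetical toy context with three objects and three attributes.
context = {
    "g1": {"a", "b"},
    "g2": {"a", "c"},
    "g3": {"b"},
}

extent, intent = oa_bicluster(context, "g1", "a")
rho = density(context, extent, intent)
print(sorted(extent), sorted(intent), rho)
```

In this toy context the bicluster generated by (g1, a) covers objects g1 and g2 and attributes a and b; three of its four cells are filled, so the density is 0.75. A density of 1 would mean that the rectangle is completely filled.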

Improving Machine Learning Ability with Fine-Tuning

Item Response Theory (IRT) allows for measuring ability of Machine Learning models as compared to a human population. However, it is difficult to create a large dataset to train the ability of deep neural network models (DNNs). We propose fine-tuning as a new training process, where a model pre-trained on a large dataset is fine-tuned with a small supplemental training set. Our results show that fine-tuning can improve the ability of a state-of-the-art DNN model for Recognizing Textual Entailment tasks.

Learning What Data to Learn

Machine learning is essentially the science of playing with data. An adaptive data selection strategy that dynamically chooses different data at different training stages can reach a more effective model in a more efficient way. In this paper, we propose a deep reinforcement learning framework, which we call Neural Data Filter (NDF), to explore automatic and adaptive data selection in the training process. In particular, NDF takes advantage of a deep neural network to adaptively select and filter important data instances from a sequential stream of training data, such that the future accumulative reward (e.g., the convergence speed) is maximized. In contrast to previous studies in data selection, which are mainly based on heuristic strategies, NDF is quite generic and thus can be widely suitable for many machine learning tasks. Taking neural network training with stochastic gradient descent (SGD) as an example, comprehensive experiments with respect to various neural network models (e.g., multi-layer perceptron networks, convolutional neural networks and recurrent neural networks) and several applications (e.g., image classification and text understanding) demonstrate that NDF-powered SGD can achieve accuracy comparable to that of the standard SGD process while using less data and fewer iterations.

Deep Clustering using Auto-Clustering Output Layer

In this paper, we propose a novel method to enrich the representation provided to the output layer of feedforward neural networks in the form of an auto-clustering output layer (ACOL) which enables the network to naturally create sub-clusters under the provided main class labels. In addition, a novel regularization term is introduced which allows ACOL to encourage the neural network to reveal its own explicit clustering objective. While the underlying process of finding the subclasses is completely unsupervised, semi-supervised learning is also possible based on the provided classification objective. The results show that ACOL can achieve a 99.2% clustering accuracy for the semi-supervised case when partial class labels are presented and a 96% accuracy for the unsupervised clustering case. These findings represent a paradigm shift especially when it comes to harnessing the power of deep networks for primary and secondary clustering applications in large datasets.

Improving the Neural GPU Architecture for Algorithm Learning

Algorithm learning is a core problem in artificial intelligence with significant implications for the level of automation that machines can achieve. Recently, deep learning methods have been emerging for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, which is capable of learning multiplication. We present several improvements to the Neural GPU that substantially reduce training time and improve generalization. We introduce a generally applicable technique for using hard nonlinearities with saturation cost. We also introduce a technique of diagonal gates that can be applied to active-memory models. The proposed architecture is the first capable of learning decimal multiplication end-to-end.

Deep Forest: Towards An Alternative to Deep Neural Networks

In this paper, we propose gcForest, a decision tree ensemble approach with performance highly competitive with deep neural networks. In contrast to deep neural networks, which require great effort in hyper-parameter tuning, gcForest is much easier to train. In fact, even when gcForest is applied to different data from different domains, excellent performance can be achieved with almost the same hyper-parameter settings. The training process of gcForest is efficient and scalable. In our experiments its training time on a PC is comparable to that of deep neural networks running with GPU facilities, and the efficiency advantage may be more apparent because gcForest lends itself naturally to parallel implementation. Furthermore, in contrast to deep neural networks, which require large-scale training data, gcForest can work well even when only small-scale training data are available. Moreover, as a tree-based approach, gcForest should be easier to analyze theoretically than deep neural networks.

Distributionally Robust Semi-supervised Learning

We propose a novel method for semi-supervised learning based on data-driven distributionally robust optimization (DRO) using optimal transport metrics. Our proposed method improves generalization error by using the unlabeled data to restrict the support of the worst-case distribution in our DRO formulation. We enable the implementation of the DRO formulation by proposing a stochastic gradient descent algorithm that makes the training procedure easy to implement. We demonstrate the improvement in generalization error in semi-supervised extensions of regularized logistic regression and square-root LASSO. Finally, we include a discussion of the large-sample behavior of the optimal uncertainty region in the DRO formulation. Our discussion exposes important aspects such as the role of dimension reduction in semi-supervised learning.

Deep and Hierarchical Implicit Models

Implicit probabilistic models are a very flexible class for modeling data. They define a process to simulate observations, and unlike traditional models, they do not require a tractable likelihood function. In this paper, we develop two families of models: hierarchical implicit models and deep implicit models. They combine the idea of implicit densities with hierarchical Bayesian modeling and deep neural networks. The use of implicit models with Bayesian analysis has in general been limited by our ability to perform accurate and scalable inference. We develop a variational inference algorithm for implicit models. Key to our method is specifying a variational family that is also implicit. This matches the model’s flexibility and allows for accurate approximation of the posterior. Our method scales up implicit models to sizes previously not possible and opens the door to new modeling designs. We demonstrate diverse applications: a large-scale physical simulator for predator-prey populations in ecology; a Bayesian generative adversarial network for discrete data; and a deep implicit model for text generation.

Complex Networks: from Classical to Quantum

The Infinite Server Problem

Strong Chain Rules for Min-Entropy under Few Bits Spoiled

Memory-Efficient Global Refinement of Decision-Tree Ensembles and its Application to Face Alignment

The computational landscape of general physical theories

Depth Separation for Neural Networks

Multi-UAV Routing for Persistent Intelligence Surveillance & Reconnaissance Missions

Don’t Fear the Reaper: Refuting Bostrom’s Superintelligence Argument

Bayesian nonparametric generative models for causal inference with missing at random covariates

Formal Synthesis of Control Strategies for Positive Monotone Systems

Understanding Convolution for Semantic Segmentation

SGD Learns the Conjugate Kernel Class of the Network

Local Synchronization of Sampled-Data Systems on Lie Groups

Multi-agent systems and decentralized artificial superintelligence

Semi-parametric Network Structure Discovery Models

Practical issues in decoy-state quantum key distribution based on the central limit theorem

Competing Bandits: Learning under Competition

Image Analysis Using a Dual-Tree $M$-Band Wavelet Transform

Fast Threshold Tests for Detecting Discrimination

Non-Concave Network Utility Maximization in Connectionless Networks: A Fully Distributed Traffic Allocation Algorithm

Synergistic Computation of Planar Maxima and Convex Hull

Optimal rates of estimation for multi-reference alignment

A multi-strategy optimizer for arbitrary generic functions in multidimensional space

An algorithm for minimization of arbitrary generic functions in one dimension over a finite domain

Diameter-Based Active Learning

DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition

Estimating the reproductive number, total outbreak size, and reporting rates for Zika epidemics in South and Central America

Nearly Maximally Predictive Features and Their Dimensions

Optimal Experiment Design for Causal Discovery from Fixed Number of Experiments

eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys

Comparison of Confidence Interval Estimators: an Index Approach

Millimeter Wave Beam-Selection Using Out-of-Band Spatial Information

Learning Latent Networks in Vector Auto Regressive Models

Maximum Size of a Family of Pairwise Graph-Different Permutations

Depth Creates No Bad Local Minima

Private and Secure Coordination of Match-Making for Heavy-Duty Vehicle Platooning

Model-based reinforcement learning in differential graphical games

Can Boltzmann Machines Discover Cluster Updates?

The Shattered Gradients Problem: If resnets are the answer, then what is the question?

SPDE Limits for the age structure of a population

Enabling Sparse Winograd Convolution by Native Pruning

Market-Driven Energy Storage Planning for Microgrids with Renewable Energy Systems Using Stochastic Programming

Accurate, Scalable and Parallel Structure from Motion

The Active Atlas: Combining 3D Anatomical Models with Texture Detectors

A Roadmap for a Rigorous Science of Interpretability

Bridging Finite and Super Population Causal Inference

An efficient approach to suppress the negative role of contrarian oscillators in synchronization

The Bressoud-Göllnitz-Gordon Theorem for Overpartitions of even moduli

Process Progress Estimation and Phase Detection

Fluctuation of Dynamical Robustness in a Networked Oscillators System

Show, Attend and Interact: Perceivable Human-Robot Social Interaction through Neural Attention Q-Network

An Inexact Proximal Alternating Direction Method for Non-convex and Non-smooth Matrix Factorization and Beyond

Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

Nontrivial standing wave state in frequency-weighted Kuramoto model

Critical behaviour in two-dimensional Coulomb Glass at zero temperature

Super-Trajectory for Video Segmentation

Single-lead f-wave extraction using diffusion geometry

Selective Video Cutout using Global Pyramid Models and Local Uncertainty Propagation

Joint Spatio-Temporal Boundary Detection and Boundary Flow Prediction with a Fully Convolutional Siamese Network

Foundations of gauge and perspective duality

Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations

Scene Flow to Action Map: A New Representation for RGB-D based Action Recognition with Convolutional Neural Networks

Scaffolding Networks for Teaching and Learning to Comprehend

Towards Deeper Understanding of Variational Autoencoding Models

Complexity of short generating functions

Sampled-Data Boundary Feedback Control of 1-D Hyperbolic PDEs with Non-Local Terms

The computational complexity of integer programming with alternations

On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation

3D Shape Segmentation via Shape Fully Convolutional Networks

MIML-FCN+: Multi-instance Multi-label Learning via Fully Convolutional Networks with Privileged Information

Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning

Cascade one-vs-rest detection network for fine-grained recognition without part annotations

Significant Pattern Mining on Continuous Variables

Moments of continuous-state branching processes with or without immigration

II-FCN for skin lesion analysis towards melanoma detection

The arctangent law for a certain random time related to a one-dimensional diffusion

Learning rates for classification with Gaussian kernels

Widely-Linear Precoding for Large-Scale MIMO with IQI: Algorithms and Performance Analysis

Optimal algorithms for smooth and strongly convex distributed optimization in networks

Stability of Synchrony against Local Intermittent Fluctuations in Tree-like Power Grids

Algorithmic stability and hypothesis complexity

An Extensive Technique to Detect and Analyze Melanoma: A Challenge at the International Symposium on Biomedical Imaging (ISBI) 2017

Learning Discrete Representations via Information Maximizing Self Augmented Training

Solving Boundary Value Problem for a Nonlinear Stationary Controllable System with Synthesizing Control

Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work

Bayesian Verification under Model Uncertainty

Stacked Thompson Bandits

Billion-scale similarity search with GPUs

On relaxed stochastic optimal control for stochastic differential equations driven by G-Brownian motion

Analysing Congestion Problems in Multi-agent Reinforcement Learning

Efficient simulation of high dimensional Gaussian vectors

On the reconstruction of polytopes

Weakly- and Semi-Supervised Object Detection with Expectation-Maximization Algorithm

Extension complexity of stable set polytopes of bipartite graphs

Optimal Categorical Attribute Transformation for Granularity Change in Relational Databases for Binary Decision Problems in Educational Data Mining

Multi-scale Lipschitz percolation of increasing events for Poisson random walks

A Globally Linearly Convergent Method for Pointwise Quadratically Supportable Convex-Concave Saddle Point Problems

Finite-size-induced transitions to synchrony in oscillator ensembles with nonlinear global coupling

MILD: Multi-Index hashing for Loop closure Detection

General Bayesian inference schemes in infinite mixture models

ShaResNet: reducing residual network parameter number by sharing weights

NOMA Meets Finite Resolution Analog Beamforming in Massive MIMO and Millimeter-Wave Networks

Compound Poisson approximation to estimate the Lévy density

Nash and Wardrop equilibria in aggregative games with coupling constraints

Robust Budget Allocation via Continuous Submodular Functions

Lowest Unique Bid Auctions with Resubmission Opportunities

Fused Gaussian Process for Very Large Spatial Data

Unsupervised Triplet Hashing for Fast Image Retrieval

Jamming-Resistant Receivers for the Massive MIMO Uplink

Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning

Privacy-enhancing Aggregation of Internet of Things Data via Sensors Grouping

Computing non-stationary $(s, S)$ policies using mixed integer linear programming

The Complexity of Translationally-Invariant Low-Dimensional Spin Lattices in 3D

Learning Deep Nearest Neighbor Representations Using Differentiable Boundary Trees

Efficient Learning for Crowdsourced Regression

Reduced Modeling of Unknown System Trajectories

Multi-Sensor Multi-object Tracking with the Generalized Labeled Multi-Bernoulli Filter

Proportional Representation in Vote Streams

Deep Semi-Random Features for Nonlinear Function Approximation

Low-rank Label Propagation for Semi-supervised Learning with 100 Millions Samples

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Predicting Slice-to-Volume Transformation in Presence of Arbitrary Subject Motion

Bridging the Gap Between Value and Policy Based Reinforcement Learning

Minimax density estimation for growing dimension

Semiparametric Estimation of Symmetric Mixture Models with Monotone and Log-Concave Densities

Lipschitz Optimisation for Lipschitz Interpolation

Binary Search in Graphs Revisited

Asymptotic Exponentiality of the First Exit Time of the Shiryaev-Roberts Diffusion with Constant Positive Drift

Power-law out of time order correlation functions in the SYK model

Defective Coloring on Classes of Perfect Graphs

On the energy landscape of spherical spin glasses

Eulerian idempotent, pre-Lie logarithm and combinatorics of trees