Learning how to Active Learn: A Deep Reinforcement Learning Approach

Active learning aims to select a small subset of data for annotation such that a classifier learned on the data is highly accurate. This is usually done using heuristic selection methods; however, the effectiveness of such methods is limited and, moreover, the performance of heuristics varies between datasets. To address these shortcomings, we introduce a novel formulation by reframing active learning as a reinforcement learning problem and explicitly learning a data selection policy, where the policy takes the role of the active learning heuristic. Importantly, our method allows the selection policy learned using simulation on one language to be transferred to other languages. We demonstrate our method using cross-lingual named entity recognition, observing uniform improvements over traditional active learning.
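
For context, the kind of heuristic that the learned policy is meant to replace can be sketched as entropy-based uncertainty sampling. This is a minimal illustration, not the paper's method; the function names and the toy probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(predictions, budget):
    """Return indices of the `budget` most uncertain items (highest entropy)."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:budget]

# Toy predicted label distributions for four unlabeled items (made up).
predictions = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
]
chosen = select_for_annotation(predictions, budget=2)
```

A fixed rule like this ignores everything except the current model's confidence, which is one reason its performance can vary between datasets.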

Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition

Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities, with mission-critical applications in forensics, security and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of large intra-class variations of heterogeneous face images and limited training samples of cross-modality face image pairs. This paper proposes a novel approach, namely Wasserstein CNN (convolutional neural network, or WCNN for short), to learn invariant features between near-infrared and visual face images (i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in the visual spectrum. The high-level layer is divided into three parts, i.e., the NIR layer, the VIS layer and the NIR-VIS shared layer. The first two layers aim to learn modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning thus aims to minimize the Wasserstein distance between the NIR distribution and the VIS distribution for invariant deep feature representation of heterogeneous face images. To avoid the over-fitting problem on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of the WCNN network to reduce the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods.
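
To make the idea of a distance between feature distributions concrete, the following sketch computes the 1-Wasserstein distance between two equal-size one-dimensional samples, which reduces to the mean gap between sorted values. It is a generic illustration only: the abstract does not say which form of the Wasserstein distance WCNN uses, and the feature values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D samples: mean gap between sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

nir_feature = [0.2, 0.9, 0.5, 1.4]   # hypothetical NIR activations
vis_feature = [1.2, 1.9, 1.5, 2.4]   # the same values shifted by 1.0
shift = wasserstein_1d(nir_feature, vis_feature)
```

Here the two samples differ only by a constant offset, so the distance equals that offset (up to floating-point rounding); driving such a distance toward zero pulls the two feature distributions together.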

Prune the Convolutional Neural Networks with Sparse Shrink

Nowadays, it is still difficult to adapt Convolutional Neural Network (CNN) based models for deployment on embedded devices. The heavy computation and large memory footprint of CNN models become the main burden in real applications. In this paper, we propose a ‘Sparse Shrink’ algorithm to prune an existing CNN model. By analyzing the importance of each channel via sparse reconstruction, the algorithm is able to prune redundant feature maps accordingly. The resulting pruned model thus directly saves computational resources. We have evaluated our algorithm on CIFAR-100. As shown in our experiments, we can reduce the number of parameters by 56.77% and the number of multiplications by 73.84% in total, with only a minor decrease in accuracy. These results demonstrate the effectiveness of our ‘Sparse Shrink’ algorithm.
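
The arithmetic behind channel pruning can be sketched as follows: removing a feature map shrinks both the layer that produces it and the layer that consumes it. The channel widths below are hypothetical and unrelated to the paper's CIFAR-100 model, and the sketch only counts weights; it does not implement the sparse-reconstruction importance analysis.

```python
def conv_params(c_in, c_out, k):
    """Weights in a k x k convolution layer (bias terms ignored)."""
    return c_in * c_out * k * k

def total_params(channels, k=3):
    """Parameters of a plain conv stack; `channels` lists widths from input to output."""
    return sum(conv_params(a, b, k) for a, b in zip(channels, channels[1:]))

original = [3, 64, 128, 256]   # hypothetical channel widths
pruned = [3, 32, 64, 256]      # two hidden layers pruned to half width
saved = 1 - total_params(pruned) / total_params(original)
```

In this toy stack, halving two hidden layers removes roughly 55% of the weights, because the middle layer loses channels on both its input and its output side.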

An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data

Feature selection has been studied widely in the literature. However, the efficacy of the selection criteria for low sample size applications is neglected in most cases. Most of the existing feature selection criteria are based on sample similarity. However, distance measures become insignificant for high dimensional low sample size (HDLSS) data. Moreover, the variance of a feature with a few samples is pointless unless it represents the data distribution efficiently. Instead of looking at the samples in groups, we evaluate their efficiency in a pairwise fashion. In our investigation, we noticed that considering a pair of samples at a time and selecting the features that bring them closer or put them far away is a better choice for feature selection. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method with low sample size, which outperforms many other state-of-the-art feature selection methods.

Data-driven modelling and validation of aircraft inbound-stream at some major European airports

This paper presents an exhaustive study of the arrivals process at eight important European airports. Using inbound traffic data, we define, compare, and contrast a data-driven Poisson and PSRA point process. Although there is sufficient evidence that the interarrivals might follow an exponential distribution, this finding does not directly translate to evidence that the arrivals stream is Poisson. The main reason is that finite-capacity constraints impose a correlation structure on the arrivals stream, which a Poisson model cannot capture. We show the weaknesses and, to some extent, the difficulties of using a Poisson process to model the arrivals stream with good approximation. On the other hand, our innovative non-parametric, data-driven PSRA model predicts quite well and captures important properties of the typical arrivals stream.
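
The contrast between the two point processes can be illustrated with a small simulation. PSRA is read here as pre-scheduled random arrivals, i.e. evenly spaced scheduled times each perturbed by an independent random delay; this reading, the uniform delay distribution, and every parameter value below are assumptions for illustration, not the paper's data-driven model.

```python
import random

def poisson_arrivals(n, rate, rng):
    """Arrival times with i.i.d. exponential gaps (a homogeneous Poisson process)."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times

def psra_arrivals(n, rate, max_delay, rng):
    """Scheduled slots i/rate, each perturbed by an independent uniform delay."""
    return sorted(i / rate + rng.uniform(0.0, max_delay) for i in range(n))

def count_variance(times, window, horizon):
    """Sample variance of the number of arrivals per window over [0, horizon)."""
    counts = [0] * int(horizon / window)
    for t in times:
        idx = int(t / window)
        if idx < len(counts):
            counts[idx] += 1
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)

rng = random.Random(0)
n, rate, window = 20000, 1.0, 10.0
horizon = 0.9 * n / rate  # stay inside the simulated range
var_poisson = count_variance(poisson_arrivals(n, rate, rng), window, horizon)
var_psra = count_variance(psra_arrivals(n, rate, 5.0, rng), window, horizon)
```

Under these assumptions the per-window counts of the scheduled stream fluctuate far less than Poisson counts with the same rate, which is one way a schedule-induced correlation structure shows up in the data.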

Learning non-parametric Markov networks with mutual information

We propose a method for learning Markov network structures for continuous data without invoking any assumptions about the distribution of the variables. The method makes use of previous work on a non-parametric estimator for mutual information which is used to create a non-parametric test for multivariate conditional independence. This independence test is then combined with an efficient constraint-based algorithm for learning the graph structure. The performance of the method is evaluated on several synthetic data sets and it is shown to learn considerably more accurate structures than competing methods when the dependencies between the variables involve non-linearities.

Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval

Cross-modal hashing is usually regarded as an effective technique for large-scale textual-visual cross retrieval, where data from different modalities are mapped into a shared Hamming space for matching. Most of the traditional textual-visual binary encoding methods only consider holistic image representations and fail to model descriptive sentences. This renders existing methods inappropriate to handle the rich semantics of informative cross-modal data for quality textual-visual search tasks. To address the problem of hashing cross-modal data with semantic-rich cues, in this paper, a novel integrated deep architecture is developed to effectively encode the detailed semantics of informative images and long descriptive sentences, named as Textual-Visual Deep Binaries (TVDB). In particular, region-based convolutional networks with long short-term memory units are introduced to fully explore image regional details while semantic cues of sentences are modeled by a text convolutional network. Additionally, we propose a stochastic batch-wise training routine, where high-quality binary codes and deep encoding functions are efficiently optimized in an alternating manner. Experiments are conducted on three multimedia datasets, i.e. Microsoft COCO, IAPR TC-12, and INRIA Web Queries, where the proposed TVDB model significantly outperforms state-of-the-art binary coding methods in the task of cross-modal retrieval.
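
Matching in a shared Hamming space can be sketched in a few lines: once text and images are encoded as binary codes, retrieval is a ranking by the number of differing bits. The 8-bit codes and names below are hypothetical, and the deep encoders that would produce such codes are not shown.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def retrieve(query_code, database, top_k):
    """Return the `top_k` database keys closest to the query in Hamming distance."""
    return sorted(database, key=lambda key: hamming(query_code, database[key]))[:top_k]

# Hypothetical 8-bit codes for three images and one text query.
image_codes = {"img_a": 0b10110010, "img_b": 0b10110100, "img_c": 0b01001101}
text_query = 0b10110011
nearest = retrieve(text_query, image_codes, top_k=2)
```

The XOR-and-count comparison is what makes hashing attractive at scale: it is cheap compared with distances on real-valued features.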

Stochastic Optimization with Bandit Sampling

Many stochastic optimization algorithms work by estimating the gradient of the cost function on the fly by sampling datapoints uniformly at random from a training set. However, the estimator might have a large variance, which inadvertently slows down the convergence rate of the algorithms. One way to reduce this variance is to sample the datapoints from a carefully selected non-uniform distribution. Previous work minimizes an upper bound of the variance, but the gap between this upper bound and the optimal variance may remain large. In this work, we propose a novel non-uniform sampling approach that uses the multi-armed bandit framework. Theoretically, we show that our algorithm asymptotically approximates the optimal variance within a factor of 3. Empirically, we show that using this datapoint-selection technique results in a significant reduction of the convergence time and variance of several stochastic optimization algorithms such as SGD and SAGA. This approach for sampling datapoints is general, and can be used in conjunction with any algorithm that uses an unbiased gradient estimation; we expect it to have broad applicability beyond the specific examples explored in this work.
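
The reason non-uniform sampling can reduce variance without introducing bias can be checked exactly on a toy example. If datapoint i is drawn with probability p_i and its gradient is reweighted by 1/(n p_i), the estimator's expectation is still the average gradient. The sketch below uses hypothetical scalar gradients and a fixed sampling distribution; it does not implement the bandit algorithm that learns the distribution.

```python
def estimator_moments(grads, probs):
    """Exact mean and variance of g_i / (n * p_i) when index i is drawn with probability p_i.

    Assumes every p_i is strictly positive.
    """
    n = len(grads)
    values = [g / (n * p) for g, p in zip(grads, probs)]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, var

grads = [1.0, 2.0, 3.0, 10.0]          # hypothetical per-datapoint scalar gradients
uniform = [0.25] * 4
total = sum(abs(g) for g in grads)
proportional = [abs(g) / total for g in grads]

mean_u, var_u = estimator_moments(grads, uniform)
mean_p, var_p = estimator_moments(grads, proportional)
```

Both distributions give the same expected gradient, but sampling proportionally to gradient magnitude removes the variance entirely in this scalar case. In practice the magnitudes are unknown and change during training, which is the gap an adaptive sampling scheme has to fill.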

Multi-Generator Generative Adversarial Nets

We propose in this paper a novel approach to address the mode collapse problem in Generative Adversarial Nets (GANs) by training many generators. The training procedure is formulated as a minimax game among many generators, a classifier, and a discriminator. The generators produce data to fool the discriminator while staying within the decision boundary defined by the classifier as much as possible; the classifier estimates the probability that a sample came from each of the generators; and the discriminator estimates the probability that a sample came from the training data rather than from all generators. We develop a theoretical analysis to show that at the equilibrium of this system, the Jensen-Shannon divergence between the equally weighted mixture of all generators’ distributions and the real data distribution is minimal, while the Jensen-Shannon divergence among the generators’ distributions is maximal. The generators can be trained efficiently by utilizing parameter sharing, thus adding minimal cost to the basic GAN model. We conduct extensive experiments on synthetic and real-world large-scale data sets (CIFAR-10 and STL-10) to evaluate the effectiveness of our proposed method. Experimental results demonstrate the superior performance of our approach in generating diverse and visually appealing samples over the latest state-of-the-art GAN variants.
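
The equilibrium property described above can be illustrated on a toy discrete example with two modes. The distributions below are hypothetical; the sketch only evaluates the Jensen-Shannon divergence and does not train any generator.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

data = [0.5, 0.5]      # two equally likely modes
gen_1 = [1.0, 0.0]     # generator specialised on mode 0
gen_2 = [0.0, 1.0]     # generator specialised on mode 1
mixture = [(a + b) / 2 for a, b in zip(gen_1, gen_2)]

js_mixture_vs_data = js(mixture, data)
js_between_generators = js(gen_1, gen_2)
```

When each generator covers a different mode, the mixture matches the data exactly (divergence zero) while the generators are as far apart as the divergence allows (log 2 in nats).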

Asking Too Much? The Rhetorical Role of Questions in Political Discourse
Generative Statistical Models with Self-Emergent Grammar of Chord Sequences
On the clique number of the square of a line graph and its relation to Ore-degree
Analyzing Boltzmann Samplers for Bose-Einstein Condensates with Dirichlet Generating Functions
ISS-MULT: Intelligent Sample Selection for Multi-Task Learning in Question Answering
Corpus-level Fine-grained Entity Typing
Parallelizing Over Artificial Neural Network Training Runs with Multigrid
Low-Dimensionality of Noise-Free RSS and its Application in Distributed Massive MIMO
Nodal Statistics of Planar Random Waves
The generalized distance spectrum of a graph and applications
Percolation thresholds and fractal dimensions for square and cubic lattices with long-range correlated defects
Reinforced Video Captioning with Entailment Rewards
Non-Archimedean pseudodifferential operators with variable coefficients and Feller Semigroups
Complete Minors of Self-Complementary Graphs
Shortcut-Stacked Sentence Encoders for Multi-Domain Inference
GPLAC: Generalizing Vision-Based Robotic Skills using Weakly Labeled Images
Multibiometric Secure System Based on Deep Learning
Forbidden subgraphs for graphs of bounded spectral radius, with applications to equiangular lines
An Approximate ML Detector for MIMO Channels Corrupted by Phase Noise
Odd Multiway Cut in Directed Acyclic Graphs
Analyzing Controllability of Bilinear Systems on Symmetric Groups: Mapping Lie Brackets to Permutations
Representation of asymptotic values for nonexpansive stochastic control systems
Unconstrained Face Detection and Open-Set Face Recognition Challenge
Fast Approximate Data Assimilation for High-Dimensional Problems
Temporal Context Network for Activity Localization in Videos
Towards A Novel Unified Framework for Developing Formal, Network and Validated Agent-Based Simulation Models of Complex Adaptive Systems
Verification & Validation of Agent Based Simulations using the VOMAS (Virtual Overlay Multi-agent System) approach
Beyond the technical challenges for deploying Machine Learning solutions in a software company
Derivative-Based Optimization with a Non-Smooth Simulated Criterion
Clustered Colouring in Minor-Closed Classes
Successive Quadratic Upper-Bounding for Discrete Mean-Risk Minimization and Network Interdiction
Local connectivity modulates multi-scale relaxation dynamics in a metallic glass-forming system
Structural patterns of information cascades and their implications for dynamics and semantics
Investigating Reinforcement Learning Agents for Continuous State Space Environments
Learning a Repression Network for Precise Vehicle Search
Simultaneous Nash Equilibrium Seeking and Social Cost Minimization in Graphical $N$-coalition Non-cooperative Games
Extending Bayesian structural time-series estimates of causal impact to many-household conservation initiatives
Harmonic Index of Total Graphs of Some Graphs
The bidirectional ballot polytope
Nonparametric Poisson regression from independent and weakly dependent observations by model selection
Robust Conditional Probabilities
First-passage time asymptotics over moving boundaries for random walk bridges
Strong geodetic problem on Cartesian products of graphs
Strong geodetic number of complete bipartite graphs and of graphs with specified diameter
Evidence from web-based dietary search patterns to the role of B12 deficiency in chronic pain
Towards a Concurrent and Distributed Route Selection for Payment Channel Networks
Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN
FoveaNet: Perspective-aware Urban Scene Parsing
Large Cayley graphs of small diameter
An information-theoretic approach for selecting arms in clinical trials
An improvement of Tukey’s HSD with application to ranking institutions
Subsets of posets minimising the number of chains
Ordered multiplicity inverse eigenvalue problem for graphs on six vertices
Cycle reversions and dichromatic number in tournaments
Scheduling and Power Control for V2V Broadcast Communications with Adjacent Channel Interference
Hierarchical space-time modeling of exceedances with an application to rainfall data
Fast Low-Rank Bayesian Matrix Completion with Hierarchical Gaussian Prior Models
Replica Bounds by Combinatorial Interpolation for Diluted Spin Systems
Weakly Supervised Image Annotation and Segmentation with Objects and Attributes
Optimal control of a Vlasov-Poisson plasma by an external magnetic field – The basics for variational calculus
Multiscale Strategies for Computing Optimal Transport
Cramér’s Estimate for the Reflected Process Revisited
Flexible Multiple Base Station Association and Activation for Downlink Heterogeneous Networks
An Unsupervised Game-Theoretic Approach to Saliency Detection
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning
A simple permutoassociahedron
Impact of Mobility-on-Demand on Traffic Congestion: Simulation-based Study
Recovering Covariance from Functional Fragments
Nonlinear cross-spectrum analysis via the local Gaussian correlation
Unified View on Lévy White Noises: General Integrability Conditions and Applications to Linear SPDE
Covert Communication with Channel-State Information at the Transmitter
Chernoff approximation for semigroups generated by killed Feller processes and Feynman formulae for time-fractional Fokker-Planck-Kolmogorov equations
Exact Boundary Controllability for the Boussinesq Equation with Variable Coefficients
Chance-Constrained Combinatorial Optimization with a Probability Oracle and Its Application to Probabilistic Set Covering
Random walks in the hyperbolic plane and the question mark function
Adversarial Divergences are Good Task Losses for Generative Modeling
Equivalence of weak and strong modes of measures on topological vector spaces
Asymptotics for Hankel determinants associated to a Hermite weight with a varying discontinuity
Stein’s method for multivariate Brownian approximations of sums under dependence
Critical threshold for ancestral reconstruction by maximum parsimony on general phylogenies
Impossibility of $n-1$-strong-equilibrium for Distributed Consensus with Rational Agents
Fast Scene Understanding for Autonomous Driving
Semantic Instance Segmentation with a Discriminative Loss Function
Self-Correcting Variable-Metric Algorithms for Nonsmooth Optimization
Robust Computer Algebra, Theorem Proving, and Oracle AI
Neural-based Context Representation Learning for Dialog Act Classification
High Dimensional Inference in Partially Linear Models
Dual Ore’s theorem on distributive intervals of finite groups
Decomposition spaces and restriction species
The canonical join complex for biclosed sets
Belief Propagation, Bethe Approximation and Polynomials
Cascade Adversarial Machine Learning Regularized with a Unified Embedding