Progressive Boosting for Class Imbalance

Pattern recognition applications often suffer from skewed data distributions between classes, which may vary during operations w.r.t. the design data. Two-class classification systems designed using skewed data tend to recognize the majority class better than the minority class of interest. Several data-level techniques have been proposed to alleviate this issue by up-sampling minority samples or under-sampling majority samples. However, some informative samples may be neglected by random under-sampling, and adding synthetic positive samples through up-sampling adds to training complexity. In this paper, a new ensemble learning algorithm called Progressive Boosting (PBoost) is proposed that progressively inserts uncorrelated groups of samples into a Boosting procedure to avoid loss of information while generating a diverse pool of classifiers. Base classifiers in this ensemble are generated from one iteration to the next, using subsets from a validation set that grows gradually in size and imbalance. Consequently, PBoost is more robust to unknown and variable levels of skew in operational data, and has lower computational complexity than Boosting ensembles in the literature. In PBoost, a new loss factor is proposed to avoid bias of performance towards the negative class. Using this loss factor, the weight update of samples and classifier contribution in final predictions are set based on the ability to recognize both classes. Using the proposed loss factor instead of standard accuracy can avoid biasing performance in any Boosting ensemble. The proposed approach was validated and compared using synthetic data, videos from the FIA dataset that emulates face re-identification applications, and the KEEL collection of datasets. Results show that PBoost can outperform state-of-the-art techniques in terms of both accuracy and complexity over different levels of imbalance and overlap between classes.
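
To illustrate why standard accuracy can bias a Boosting ensemble towards the negative class, the following minimal sketch compares accuracy with a class-sensitive score on skewed toy data. The abstract does not define the PBoost loss factor, so the F-measure used here is only an assumed stand-in for a metric that reflects both classes, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples labeled correctly (ignores class skew)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the positive (minority) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Skewed toy data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
always_negative = [0] * 100

print(accuracy(y_true, always_negative))   # 0.95
print(f_measure(y_true, always_negative))  # 0.0
```

A classifier that never detects the minority class still scores 0.95 on accuracy, while the class-sensitive score drops to zero. This is the kind of bias a loss factor based on both classes is meant to avoid.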

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE-trained generative neural dialog models (G) is that they tend to produce ‘safe’ and generic responses (‘I don’t know’, ‘I can’t tell’). In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses. However, D is not useful in practice since it cannot be deployed to have real conversations with users. Our work aims to achieve the best of both worlds (the practical usefulness of G and the strong performance of D) via knowledge transfer from D to G. Our primary contribution is an end-to-end trainable generative visual dialog model, where G receives gradients from D as a perceptual (not adversarial) loss of the sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) approximation to the discrete distribution, specifically an RNN augmented with a sequence of GS samplers, coupled with the straight-through gradient estimator to enable end-to-end differentiability. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding along with a metric learning loss to aid D in better capturing semantic similarities in answer responses. Overall, our proposed model outperforms the state of the art on the VisDial dataset by a significant margin (2.67% on recall@10).
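
The Gumbel-Softmax sampler mentioned above can be sketched as follows. This is a forward-pass-only illustration in plain Python, not the model from the paper: the logits, the temperature value, and the function names are assumptions. The gradient side of the straight-through estimator needs an autodiff framework and is only described in a comment.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw a relaxed (soft) one-hot sample from a categorical distribution."""
    noisy = []
    for logit in logits:
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both log() calls finite
        gumbel = -math.log(-math.log(u))     # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def straight_through_forward(soft_sample):
    """Forward value of the straight-through estimator: a hard one-hot vector.

    In an autodiff framework the backward pass would reuse the gradient of
    soft_sample; that part is not shown here.
    """
    winner = max(range(len(soft_sample)), key=lambda i: soft_sample[i])
    return [1.0 if i == winner else 0.0 for i in range(len(soft_sample))]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
soft = gumbel_softmax_sample([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
hard = straight_through_forward(soft)
print(soft, hard)
```

Lower temperatures push the soft sample closer to a one-hot vector, at the cost of higher-variance gradients.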

Embedding Feature Selection for Large-scale Hierarchical Classification

Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features, posing several big data challenges. Feature selection, which aims to select the subset of discriminant features, is an effective strategy for dealing with the large-scale HC problem. It speeds up the training process, reduces the prediction time and minimizes the memory requirements by compressing the total size of learned model weight vectors. The majority of studies have also shown feature selection to be competent and successful in improving the classification accuracy by removing irrelevant features. In this work, we investigate various filter-based feature selection methods for dimensionality reduction to solve the large-scale HC problem. Our experimental evaluation on text and image datasets with varying distributions of features, classes and instances shows up to 3x speed-up on massive datasets and up to 45% lower memory requirements for storing the weight vectors of the learned model, without any significant loss (and with improvement for some datasets) in classification accuracy. Source Code: https://…/featureselection.
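
A filter-based selector scores each feature independently of any classifier and keeps only the top-ranked ones. The sketch below uses mutual information as the filter criterion. The abstract does not list the criteria studied, so this choice, the toy data, and the function names are assumptions.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information (in nats) between a discrete feature and the class label."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_joint = count / n
        p_f = feature_counts[f] / n
        p_c = label_counts[c] / n
        mi += p_joint * math.log(p_joint / (p_f * p_c))
    return mi


def select_top_k(rows, labels, k):
    """Return the (sorted) indices of the k features with the highest filter score."""
    num_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(num_features)
    ]
    ranked = sorted(range(num_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 tracks the label, feature 1 is constant,
# feature 2 is independent of the label.
rows = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 1, 1]]
labels = ["a", "a", "b", "b"]
print(select_top_k(rows, labels, k=1))  # [0]
```

Only the weights for the selected feature indices then need to be learned and stored, which is where the training-time and memory savings come from.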

Classifying Documents within Multiple Hierarchical Datasets using Multi-Task Learning

Multi-task learning (MTL) is a supervised learning paradigm in which the prediction models for several related tasks are learned jointly to achieve better generalization performance. When there are only a few training examples per task, MTL considerably outperforms traditional single-task learning (STL) in terms of prediction accuracy. In this work we develop an MTL-based approach for classifying documents that are archived within dual concept hierarchies, namely, DMOZ and Wikipedia. We solve the multi-class classification problem by defining one-versus-rest binary classification tasks for each of the different classes across the two hierarchical datasets. Instead of learning a linear discriminant for each of the different tasks independently, we use an MTL approach with relationships between the different tasks across the datasets established using the non-parametric, lazy, nearest neighbor approach. We also develop and evaluate a transfer learning (TL) approach and compare the MTL (and TL) methods against the standard single-task learning and semi-supervised learning approaches. Our empirical results demonstrate the strength of our developed methods, which show an improvement especially when there are fewer training examples per classification task.

Text Summarization using Abstract Meaning Representation

Summarization of large texts is still an open problem in language processing. In this work we develop a full-fledged pipeline to generate summaries of news articles using the Abstract Meaning Representation (AMR). We first generate the AMR graphs of stories, then extract summary graphs from the story graphs, and finally generate sentences from the summary graph. For extracting summary AMRs from the story AMRs we use a two-step process. First, we find important sentences from the text and then extract the summary AMRs from those selected sentences. We outperform the previous methods using AMR for summarization by more than 3 ROUGE-1 points. On the CNN-Dailymail corpus we achieve results competitive with the strong lead-3 baseline up to the summary graph extraction step.
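
For readers unfamiliar with the two reference points named above, the lead-3 baseline and the ROUGE-1 score can be sketched as follows. This is a simplified illustration: it tokenizes on whitespace and skips the stemming and other preprocessing that standard ROUGE tooling may apply, and the function names and example strings are hypothetical.

```python
from collections import Counter


def lead_3(sentences):
    """Lead-3 baseline: use the first three sentences as the summary."""
    return " ".join(sentences[:3])


def rouge_1(candidate, reference):
    """Unigram-overlap ROUGE-1 as (recall, precision, F1) on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


story = [
    "The council approved the budget.",
    "The vote was close.",
    "Schools receive most of the funds.",
    "A protest followed the meeting.",
]
summary = lead_3(story)
recall, precision, f1 = rouge_1(summary, "The council approved the school budget.")
print(summary)
print(recall, precision, f1)
```

Lead-3 is a strong baseline on news corpora because the opening sentences of an article tend to carry its main points.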

A General-Purpose Tagger with Convolutional Neural Networks

We present a general-purpose tagger based on convolutional neural networks (CNN), used for both composing word vectors and encoding context information. The CNN tagger is robust across different tagging tasks: without task-specific tuning of hyper-parameters, it achieves state-of-the-art results in part-of-speech tagging, morphological tagging and supertagging. The CNN tagger is also robust against the out-of-vocabulary problem: it performs well on artificially unnormalized texts.

Label-Dependencies Aware Recurrent Neural Networks

In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite such great success, their ability to model sequence labeling is still limited. This has led research toward solutions where RNNs are combined with models which have already proved effective in this domain, such as CRFs. In this work we propose a solution that is far simpler but very effective: an evolution of the simple Jordan RNN, where labels are re-injected as input into the network and converted into embeddings, in the same way as words. We compare this RNN variant to all the other RNN models (Elman and Jordan RNN, LSTM and GRU) on two well-known tasks of Spoken Language Understanding (SLU). Thanks to label embeddings and their combination at the hidden layer, the proposed variant, which uses more parameters than Elman and Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than other RNNs but also outperforms sophisticated CRF models.
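
The idea of re-injecting labels as embeddings can be sketched as a greedy tagging loop. This is only an illustration under strong assumptions: the weights are random rather than trained, a single previous label is fed back (the abstract does not state how many are used), and all names, sizes, and the start-label convention are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (stand-in for trained parameters)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def tag_sequence(word_ids, params):
    """Greedy tagging where the previous predicted label is fed back as an embedding."""
    prev_label = params["start_label"]
    labels = []
    for word_id in word_ids:
        # Words and labels are both looked up in embedding tables,
        # then concatenated to form the input of the hidden layer.
        x = params["word_emb"][word_id] + params["label_emb"][prev_label]
        hidden = [math.tanh(v) for v in matvec(params["w_hidden"], x)]
        scores = matvec(params["w_out"], hidden)
        prev_label = max(range(len(scores)), key=lambda i: scores[i])
        labels.append(prev_label)
    return labels


rng = random.Random(0)
vocab_size, num_labels, emb_dim, hidden_dim = 5, 3, 4, 6
params = {
    "word_emb": make_matrix(vocab_size, emb_dim, rng),
    # One extra row serves as the embedding of a start-of-sequence label.
    "label_emb": make_matrix(num_labels + 1, emb_dim, rng),
    "w_hidden": make_matrix(hidden_dim, 2 * emb_dim, rng),
    "w_out": make_matrix(num_labels, hidden_dim, rng),
    "start_label": num_labels,
}
print(tag_sequence([0, 3, 1, 4], params))
```

The recurrence here runs through the predicted labels rather than through the hidden state, which is the Jordan-style connection the abstract builds on.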

Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation

With growing interest in adversarial machine learning, it is important for machine learning practitioners and users to understand how their models may be attacked. We propose a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a deep neural network (DNN) model, built on top of the TensorFlow library. Adversarial-Playground provides users an efficient and effective experience in exploring techniques generating adversarial examples, which are inputs crafted by an adversary to fool a machine learning system. To enable Adversarial-Playground to generate quick and accurate responses for users, we use two primary tactics: (1) We propose a faster variant of the state-of-the-art Jacobian saliency map approach that maintains a comparable evasion rate. (2) Our visualization does not transmit the generated adversarial images to the client, but rather only the matrix describing the sample and the vector representing classification likelihoods. The source code along with the data from all of our experiments are available at https://…/AdversarialDNN-Playground.

Deep Alignment Network: A convolutional neural network for robust face alignment

In this paper, we propose Deep Alignment Network (DAN), a robust face alignment method based on a deep neural network architecture. DAN consists of multiple stages, where each stage improves the locations of the facial landmarks estimated by the previous stage. Our method uses entire face images at all stages, contrary to the recently proposed face alignment methods that rely on local patches. This is possible thanks to the use of landmark heatmaps which provide visual information about landmark locations estimated at the previous stages of the algorithm. The use of entire face images rather than patches allows DAN to handle face images with large variation in head pose and difficult initializations. An extensive evaluation on two publicly available datasets shows that DAN reduces the state-of-the-art failure rate by up to 70%. Our method has also been submitted for evaluation as part of the Menpo challenge.
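
A landmark heatmap of the kind described above can be sketched as an image whose intensity peaks at each estimated landmark and decays with distance. The abstract does not give the exact formula, so the inverse-distance form below, the image size, and the function name are assumptions.

```python
import math


def landmark_heatmap(landmarks, width, height):
    """Heatmap whose value at each pixel decays with distance to the nearest landmark."""
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(math.hypot(x - lx, y - ly) for lx, ly in landmarks)
            row.append(1.0 / (1.0 + nearest))  # equals 1.0 exactly at a landmark
        heatmap.append(row)
    return heatmap


# Two hypothetical landmark estimates, given as (x, y) pixel coordinates.
landmarks = [(2, 1), (5, 4)]
heatmap = landmark_heatmap(landmarks, width=8, height=6)
for row in heatmap:
    print(" ".join(f"{value:.2f}" for value in row))
```

Because the heatmap has the same spatial layout as the face image, it can be passed to the next stage as an extra input channel, so that stage sees the whole face together with the current landmark estimates.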

Assessing the Linguistic Productivity of Unsupervised Deep Neural Networks

Increasingly, cognitive scientists have demonstrated interest in applying tools from deep learning. One use for deep learning is in language acquisition, where it is useful to know if a linguistic phenomenon can be learned through domain-general means. To assess whether unsupervised deep learning is appropriate, we first pose a smaller question: Can unsupervised neural networks apply linguistic rules productively, using them in novel situations? We draw from the literature on determiner/noun productivity by training an unsupervised autoencoder network and measuring its ability to combine nouns with determiners. Our simple autoencoder creates combinations it has not previously encountered and produces a degree of overlap matching that of adults. While this preliminary work does not provide conclusive evidence for productivity, it warrants further investigation with more complex models. Further, this work helps lay the foundations for future collaboration between the deep learning and cognitive science communities.

Heat content and horizontal mean curvature on the Heisenberg group
An optimal $(ε,δ)$-approximation scheme for the mean of random variables with bounded relative variance
Visual attention models for scene text recognition
Random Flag Complexes and Asymptotic Syzygies
3D spherical-cap fitting procedure for (truncated) sessile nano- and micro-droplets & -bubbles
Stochastic Gradient Monomial Gamma Sampler
UCB and InfoGain Exploration via $\boldsymbol{Q}$-Ensembles
Density Deconvolution for Generalized Skew-Symmetric Distributions
Time-dependent shortest paths in bounded treewidth graphs
Facial Emotion Detection Using Convolutional Neural Networks and Representational Autoencoder Units
Emergent Network Modularity
The Convergence of Markov chain Monte Carlo Methods: From the Metropolis method to Hamiltonian Monte Carlo
Geometric Multi-Model Fitting with a Convex Relaxation Algorithm
Deep learning for extracting protein-protein interactions from biomedical literature
The Prolific Proportion of Permutations
Controller-jammer game models of Denial of Service in control systems operating over packet-dropping links
Dynamic Bayesian Multitaper Spectral Analysis
Random Search for Hyperparameters using Determinantal Point Processes
Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
Pascal Eigenspaces and Invariant Sequences of the First or Second Kind
Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach
Distributed Active State Estimation with User-Specified Accuracy
Dual representations of Laplace transforms of Brownian excursion and generalized meanders
Progressions and Paths in Colorings of $\mathbb Z$
Global-Local Airborne Mapping (GLAM): Reconstructing a City from Aerial Videos
Preserving Privacy of Finite Impulse Response Systems
Stochastic continuity of random fields governed by a system of stochastic PDEs
Fishnet Statistics for Strength Scaling of Nacreous Imbricated Lamellar Materials
Dynamical patterns in individual trajectories toward extremism
Sample-Efficient Learning of Mixtures
Defective 3-Paintability of Planar Graphs
Hyperplane Clustering Via Dual Principal Component Pursuit
DeepKey: An EEG and Gait Based Dual-Authentication System
An Upper Bound of 7n/6 for the Minimum Size 2EC on Cubic 3-Edge Connected Graphs
Robust and efficient validation of the linear hexahedral element
Profit Maximization for Online Advertising Demand-Side Platforms
Equilateral $p$-gons in $\mathbb R^d$ and deformed spheres and mod $p$ Fadell-Husseini index
Approximation Algorithms for Minimizing Maximum Sensor Movement for Line Barrier Coverage in the Plane
Understanding Betting Strategy
Block gluing intensity of bidimensional SFT: computability of the entropy and periodic points
Optimal Attack against Cyber-Physical Control Systems with Reactive Attack Mitigation
Markov Chain Monte Carlo Methods for Bayesian Data Analysis in Astronomy
Group Sparse Precoding for Cloud-RAN with Multiple User Antennas
Retrosynthetic reaction prediction using neural sequence-to-sequence models
Volume Calculation of CT lung Lesions based on Halton Low-discrepancy Sequences
A Minimal Solution for Two-view Focal-length Estimation using Two Affine Correspondences
On the real zeros of random trigonometric polynomials with dependent coefficients
Learning Pairwise Disjoint Simple Languages from Positive Examples
Stochastic Multi-objective Optimization on a Budget: Application to multi-pass wire drawing with quantified uncertainties
Compression Fractures Detection on CT
A method and tool for combining differential or inclusive measurements obtained with simultaneously constrained uncertainties
Some new designs with prescribed automorphism groups
Limitations on Variance-Reduction and Acceleration Schemes for Finite Sum Optimization
A Frame Tracking Model for Memory-Enhanced Dialogue Systems
Realization of Biquadratic Impedance as Five-Element Bridge Networks
Binary extremal self-dual codes of length $60$ and related codes
On the Q-linear Convergence of a Majorized Proximal ADMM for Convex Composite Programming and Its Applications to Regularized Logistic Regression
Vertex-disjoint directed cycles of prescribed length in tournaments with given minimum out-degree
Analytical lower bounds for the size of elementary trapping sets of variable-regular LDPC codes with any girth and irregular ones with girth 8
Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC
Clustering Spectrum of hierarchical scale-free networks
Blockchain based trust & authentication for decentralized sensor networks
Martingale decompositions and weak differential subordination in UMD Banach spaces
Minimizing the waiting time for a one-way shuttle service
Performance of DF Incremental Relaying with Energy Harvesting Relays in Underlay CRNs
Lyapunov-based Model Reference Adaptive Controller Design for a Class of Nonlinear Fractional Order Systems
Second Order Step by Step Sliding mode Observer for Fault Estimation in a Class of Nonlinear Fractional Order Systems
Ehrhart tensor polynomials
Precoder Design for Signal Superposition in MIMO-NOMA Multicell Networks
Specifying a positive threshold function via extremal points
Multi-View Kernels for Low-Dimensional Modeling of Seismic Events
Robust approximate Bayesian inference with an application to linear mixed models
Specifying Transaction Control to Serialize Concurrent Program Executions
Joint Fractional Time Allocation and Beamforming for Downlink Multiuser MISO Systems
Sampling-based vs. Design-based Uncertainty in Regression Analysis
5G Radio Access above 6 GHz
Sparse and Constrained Stochastic Predictive Control for Networked Systems
Understanding and Eliminating the Large-kernel Effect in Blind Deconvolution
Hypergraph $F$-designs for arbitrary $F$
SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation
GAN and VAE from an Optimal Transport Point of View
Dividends with random profitability rate
Face Alignment Using K-Cluster Regression Forests With Weighted Splitting
Robust Online Multi-Task Learning with Correlative and Personalized Structures
Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space
Efficient Antihydrogen Detection in Antimatter Physics by Deep Learning
Information Bottleneck in Control Tasks with Recurrent Spiking Neural Networks
Online Adaptive Machine Learning Based Algorithm for Implied Volatility Surface Modeling
Why Condorcet Consistency is Essential
Convergence analysis of the block Gibbs sampler for Bayesian probit linear mixed models
Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext
Contraction and uniform convergence of isotonic regression
Added value of morphological features to breast lesion diagnosis in ultrasound
Attributed Network Embedding for Learning in a Dynamic Environment
Director Field Analysis (DFA): Exploring Local White Matter Geometric Structure in diffusion MRI
Marmara Turkish Coreference Corpus and Coreference Resolution Baseline
Shape Parameter Estimation
Disproof of a packing conjecture of Alon and Spencer
StreetStyle: Exploring world-wide clothing styles from millions of photos