Black-Box Optimization in Machine Learning with Trust Region Based Derivative Free Algorithm

In this work, we utilize a Trust Region based Derivative Free Optimization (DFO-TR) method to directly maximize the Area Under Receiver Operating Characteristic Curve (AUC), which is a nonsmooth, noisy function. We show that AUC is a smooth function, in expectation, if the distributions of the positive and negative data points obey a jointly normal distribution. The practical performance of this algorithm is compared to three prominent Bayesian optimization methods and random search. The presented numerical results show that DFO-TR surpasses Bayesian optimization and random search on various black-box optimization problems, such as maximizing AUC and hyperparameter tuning.
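To make the "nonsmooth" claim concrete, here is a minimal sketch (not the paper's code) of the empirical AUC of a linear scorer, computed as the fraction of correctly ranked positive/negative pairs with ties counted as one half. The function names `empirical_auc` and `linear_auc` and the toy data are assumptions made for illustration only.

```python
def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def linear_auc(w, X_pos, X_neg):
    """Empirical AUC of the linear scorer x -> w . x (hypothetical helper)."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return empirical_auc([score(x) for x in X_pos], [score(x) for x in X_neg])


# Toy data, invented for illustration.
X_pos = [(2.0, 1.0), (1.0, 3.0)]
X_neg = [(1.0, 1.0), (3.0, 0.0)]

# The objective is piecewise constant in w: it jumps when a pair changes
# order and is flat in between, so gradients carry no useful information.
for w in [(0.0, 1.0), (0.01, 1.0), (0.02, 1.0)]:
    print(w, linear_auc(w, X_pos, X_neg))
```

Because the value only changes when some pair swaps order, a derivative-free method that works from function values alone is a natural fit for this objective.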

Ensemble representation learning: an analysis of fitness and survival for wrapper-based genetic programming methods

Recently we proposed a general, ensemble-based feature engineering wrapper (FEW) that was paired with a number of machine learning methods to solve regression problems. Here, we adapt FEW for supervised classification and perform a thorough analysis of fitness and survival methods within this framework. Our tests demonstrate that two fitness metrics, one introduced as an adaptation of the silhouette score, outperform the more commonly used Fisher criterion. We analyze survival methods and demonstrate that ε-lexicase survival works best across our test problems, followed by random survival, which outperforms both tournament and deterministic crowding. We conduct hyper-parameter optimization for several classification methods using a large set of problems to benchmark the ability of FEW to improve data representations. The results show that FEW can improve the best classifier performance on several problems. We show that FEW generates readable and meaningful features for a biomedical problem with different ML pairings.
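For readers unfamiliar with ε-lexicase survival, the following is a rough sketch of one common formulation, not FEW's actual implementation. It assumes a per-case error matrix, a per-case ε equal to the median absolute deviation of the errors on that case, and survivors drawn without replacement; all function names are hypothetical.

```python
import random
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers."""
    m = median(values)
    return median([abs(v - m) for v in values])


def epsilon_lexicase_pick(pool, errors, epsilons, rng):
    """Pick one individual from pool by filtering on randomly ordered cases."""
    candidates = list(pool)
    cases = list(range(len(epsilons)))
    rng.shuffle(cases)
    for c in cases:
        best = min(errors[i][c] for i in candidates)
        # Keep everyone within epsilon of the best error on this case.
        candidates = [i for i in candidates if errors[i][c] <= best + epsilons[c]]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


def epsilon_lexicase_survival(errors, n_survivors, seed=0):
    """Return indices of survivors; errors[i][c] is individual i's error on case c."""
    rng = random.Random(seed)
    n_cases = len(errors[0])
    # Assumption: epsilon is computed once per case over the whole population.
    epsilons = [mad([row[c] for row in errors]) for c in range(n_cases)]
    pool = list(range(len(errors)))
    survivors = []
    while pool and len(survivors) < n_survivors:
        winner = epsilon_lexicase_pick(pool, errors, epsilons, rng)
        survivors.append(winner)
        pool.remove(winner)  # survival: each individual is kept at most once
    return survivors
```

Unlike tournament survival, no aggregate fitness is computed here: an individual survives by being near-best on whichever cases happen to come first in the random ordering, which tends to preserve specialists.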

Fast Spectral Ranking for Similarity Search

Despite the success of deep learning on representing images for particular object retrieval, recent studies show that the learned representations still lie on manifolds in a high dimensional space. Therefore, nearest neighbor search cannot be expected to be optimal for this task. Even if a nearest neighbor graph is computed offline, exploring the manifolds online remains expensive. This work introduces an explicit embedding reducing manifold search to Euclidean search followed by dot product similarity search. We show this is equivalent to linear graph filtering of a sparse signal in the frequency domain, and we introduce a scalable offline computation of an approximate Fourier basis of the graph. We improve the state of the art on standard particular object retrieval datasets including a challenging one containing small objects. At a scale of 10^5 images, the offline cost is only a few hours, while query time is comparable to standard similarity search.

Metalearning for Feature Selection

A general formulation of optimization problems in which various candidate solutions may use different feature-sets is presented, encompassing supervised classification, automated program learning and other cases. A novel characterization of the concept of a ‘good quality feature’ for such an optimization problem is provided, and a proposal is made for integrating quality-based feature selection into metalearning, wherein the quality of a feature for a problem is estimated using knowledge about related features in the context of related problems. Results are presented regarding extensive testing of this ‘feature metalearning’ approach on supervised text classification problems; it is demonstrated that, in this context, feature metalearning can provide significant and sometimes dramatic speedup over standard feature selection heuristics.

Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks

Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situations. Temporal data arising in these real-world applications often involve a mixture of long-term and short-term patterns, for which traditional approaches such as autoregressive models and Gaussian processes may fail. In this paper, we propose a novel deep learning framework, namely Long- and Short-term Time-series network (LSTNet), to address this open challenge. LSTNet uses the Convolutional Neural Network (CNN) to extract short-term local dependency patterns among variables, and the Recurrent Neural Network (RNN) to discover long-term patterns and trends. In our evaluation on real-world data with complex mixtures of repetitive patterns, LSTNet achieved significant performance improvements over several state-of-the-art baseline methods.

Nonparametric Variational Auto-encoders for Hierarchical Representation Learning

The recently developed variational autoencoders (VAEs) have proved to be an effective confluence of the rich representational power of neural networks with Bayesian methods. However, most work on VAEs uses a rather simple prior over the latent variables, such as the standard normal distribution, thereby restricting their applications to relatively simple phenomena. In this work, we propose hierarchical nonparametric variational autoencoders, which combine tree-structured Bayesian nonparametric priors with VAEs, to enable infinite flexibility of the latent representation space. Both the neural parameters and Bayesian priors are learned jointly using tailored variational inference. The resulting model induces a hierarchical structure of latent semantic concepts underlying the data corpus, and infers accurate representations of data instances. We apply our model in video representation learning. Our method is able to discover highly interpretable activity hierarchies, and obtain improved clustering accuracy and generalization capacity based on the learned rich representations.

PriMaL: A Privacy-Preserving Machine Learning Method for Event Detection in Distributed Sensor Networks

This paper introduces PriMaL, a general PRIvacy-preserving MAchine-Learning method for reducing the privacy cost of information transmitted through a network. Distributed sensor networks are often used for automated classification and detection of abnormal events in high-stakes situations, e.g. fire in buildings, earthquakes, or crowd disasters. Such networks might transmit privacy-sensitive information, e.g. GPS location of smartphones, which might be disclosed if the network is compromised. Privacy concerns might slow down the adoption of the technology, in particular in the scenario of social sensing where participation is voluntary; thus, solutions are needed that improve privacy without compromising on the event detection accuracy. PriMaL is implemented as a machine-learning layer that works on top of an existing event detection algorithm. Experiments are run in a general simulation framework, for several network topologies and parameter values. The privacy footprint of state-of-the-art event detection algorithms is compared within the proposed framework. Results show that PriMaL is able to reduce the privacy cost of a distributed event detection algorithm below that of the corresponding centralized algorithm, within the bounds of some assumptions about the protocol. Moreover, the performance of the distributed algorithm is not statistically worse than that of the centralized algorithm.

One-Shot Imitation Learning

Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large set of tasks, and each task has many instantiations. For example, a task could be to stack all blocks on a table into a single tower, another task could be to place all blocks on a table into two-block towers, etc. In each case, different instances of the task would consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration. At test time, a demonstration of a single instance of a new task is presented, and the neural net is expected to perform well on new instances of this new task. The use of soft attention allows the model to generalize to conditions and tasks unseen in the training data. We anticipate that by training this model on a much greater variety of tasks and settings, we will obtain a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks. Videos available at

License Plate Detection and Recognition Using Deeply Learned Convolutional Neural Networks

This work details Sighthound's fully automated license plate detection and recognition system. The core technology of the system is built using a sequence of deep Convolutional Neural Networks (CNNs) interlaced with accurate and efficient algorithms. The CNNs are trained and fine-tuned so that they are robust under different conditions (e.g. variations in pose, lighting, occlusion, etc.) and can work across a variety of license plate templates (e.g. sizes, backgrounds, fonts, etc.). For quantitative analysis, we show that our system outperforms the leading license plate detection and recognition technology, i.e. ALPR, on several benchmarks. Our system is available to developers through the Sighthound Cloud API at https://…/cloud

Approximating the Sachdev-Ye-Kitaev model with Majorana wires

Dance Dance Convolution

A Comparison of deep learning methods for environmental sound

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

Non-robust phase transitions in the generalized clock model on trees

Poly-logarithmic localization for random walks among random obstacles

Learning Correspondence Structures for Person Re-identification

Distributed Constraint Problems for Utilitarian Agents with Privacy Concerns, Recast as POMDPs

A Unified Performance Analysis of the Effective Capacity of Dispersed Spectrum Cognitive Radio Systems over Generalized Fading Channels

SCALPEL: Extracting Neurons from Calcium Imaging Data

Multi-style Generative Network for Real-time Transfer

Decentralized Optimal Control for Connected Automated Vehicles at Intersections Including Left and Right Turns

Nuisance parameter based sample size re-estimation incorporating prior information

CSI: A Hybrid Deep Model for Fake News

On the dimension of downsets of integer partitions and compositions

Electric field control of emergent electrodynamics in quantum spin ice

Active Decision Boundary Annotation with Deep Generative Models

Learning to Generate Samples from Noise through Infusion Training

A Conditional Density Estimation Partition Model Using Logistic Gaussian Processes

Difference sets disjoint from a subgroup

Collapsibility to a subcomplex of given dimension is NP-complete

Evidence of the Poisson/Gaudin-Mehta phase transition for banded matrices on global scales

SORT: Second-Order Response Transform for Visual Recognition

Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields

Recovery of the starting times of delayed signals

The Use of Autoencoders for Discovering Patient Phenotypes

Colouring exact distance graphs of chordal graphs

New reconstruction and data processing methods for regression and interpolation analysis of multidimensional big data

Bohr sets and multiplicative diophantine approximation

Sparse Channel Estimation for Massive MIMO System Based on Dirichlet Process and Combined Message Passing

Regularity of Schroedinger’s functional equation and mean field PDEs for h-path processes

Recurrent Topic-Transition GAN for Visual Paragraph Generation

Encouraging LSTMs to Anticipate Actions Very Early

Predictive Control of Autonomous Kites in Tow Test Experiments

Cross-modal Deep Metric Learning with Multi-task Regularization

An Investigation of Three-point Shooting through an Analysis of NBA Player Tracking Data

Pattern Division Multiple Access with Large-scale Antenna Array

A Simple Online Parameter Estimation Technique with Asymptotic Guarantees

Energy Efficient Joint Resource Allocation and Power Control for D2D Communications

Energy Efficient Power Control for the Two-tier Networks with Small Cells and Massive MIMO

The Minimum Distance Estimation with Multiple Integral in Panel Data

High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks

A CMDP-based Approach for Energy Efficient Power Allocation in Massive MIMO Systems

Energy Efficient Power Allocation in Massive MIMO Systems based on Standard Interference Function

Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems

Stochastic Primal Dual Coordinate Method with Non-Uniform Sampling Based on Optimality Violations

On Jacobian group and complexity of I-graph I(n,k,l) through Chebyshev polynomials

Bayesian Nonparametric Inference for M/G/1 Queueing Systems

SNR Degradation due to Carrier Frequency Offset in OFDM based Amplify-and-Forward Relay Systems

Pseudorehearsal in value function approximation

SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules

The Logarithm Map, its Limits and Frechet Means in Orthant Spaces

Simplified Frequency Offset Estimation for MIMO OFDM Systems

Interval observer for uncertain time-varying SIR-SI model of vector-borne disease

Full-duplex Amplify-and-Forward Relaying: Power and Location Optimization

Frequency Offset Estimation for OFDM Systems with a Novel Frequency Domain Training Sequence

Deep LSTM for Large Vocabulary Continuous Speech Recognition

Prescribed Performance Control for Signal Temporal Logic Specifications

Connected sums of $Z$-knotted triangulations

Limit shapes of stable configurations of a generalized Bulgarian solitaire

An exponential limit shape of random $q$-proportion Bulgarian solitaire

Decomposition techniques applied to the Clique-Stable set Separation problem

On the Interplay between Strong Regularity and Graph Densification

Mean path length invariance in multiple light scattering

Layer-wise training of deep networks using kernel similarity

Interest-Driven Discovery of Local Process Models

Non-scaling displacement distributions as may be seen in fluorescence correlation spectroscopy

Evolving Parsimonious Networks by Mixing Activation Functions

Knowledge distillation using unlabeled mismatched images

MRI-based Surgical Planning for Lumbar Spinal Stenosis

Deep generative-contrastive networks for facial expression recognition

Proposal Flow: Semantic Correspondences from Object Proposals

Universality for critical heavy-tailed network models: Metric structure of maximal components

Vertex connectivity of the power graph of a finite cyclic group

Performance analysis of RF-FSO multi-hop networks

A Deterministic Global Optimization Method for Variational Inference

Layers and Matroids for the Traveling Salesman’s Paths

Non-Convex Rank/Sparsity Regularization and Local Minima

Transversal fluctuations of the ASEP, stochastic six vertex model, and Hall-Littlewood Gibbsian line ensembles

Exact Affine OBDDs

Optimal DoF region of the K-User MISO BC with Partial CSIT

GP-GAN: Towards Realistic High-Resolution Image Blending

Overcoming model simplifications when quantifying predictive uncertainty

Disorder chaos in some diluted spin glass models

Improving Person Re-identification by Attribute and Identity Learning

Gibbs Reference Prior for Robust Gaussian Process Emulation

Convergence of Brownian Motions on Metric Measure Spaces Under Riemannian Curvature-Dimension Conditions

A Hybrid Feasibility Constraints-Guided Search to the Two-Dimensional Bin Packing Problem with Due Dates

Sufficient Dimension Reduction via Random-Partitions for Large-p-Small-n Problem

A Note on the Tree Augmentation Problem

Focusing inside Disordered Media with the Generalized Wigner-Smith Operator

Linear combinations of Rademacher random variables

Riccati observers for velocity-aided attitude estimation of accelerated vehicles using coupled velocity measurements

ZM-Net: Real-time Zero-shot Image Manipulation Network

Statistical Topology and the Random Interstellar Medium

Poisson Malliavin calculus in Hilbert space with an application to SPDE

Black-Box Data-efficient Policy Search for Robotics

Motzkin Numbers: an Operational Point of View

On the controllability of the Navier-Stokes equation in spite of boundary layers

Linear Convergence of Stochastic Frank Wolfe Variants

Robust classification of different fingerprint copies with deep neural networks for database penetration rate reduction

Heavy Tails for an Alternative Stochastic Perpetuity Model

The unit theorem for finite-dimensional algebras

GG-mixed Poisson distributions as mixed geometric laws and related limit theorems

Resilient Monotone Submodular Function Maximization

Convergence rates in the central limit theorem for weighted sums of Bernoulli random fields

From safe screening rules to working sets for faster Lasso-type solvers

An Accelerated Analog Neuromorphic Hardware System Emulating NMDA- and Calcium-Based Non-Linear Dendrites

Just-in-Time Batch Scheduling Problem with Two-dimensional Bin Packing Constraints

Linearly many rainbow trees in properly edge-coloured complete graphs

Sequential Detection of Three-Dimensional Signals under Dependent Noise

Renormalization: a quasi-shuffle approach

Targeting Bayes factors with direct-path non-equilibrium thermodynamic integration

Controllability to Equilibria of the 1-D Fokker-Planck Equation with Zero-Flux Boundary Condition

Recursive computation of coprime factorizations

On Classical Control and Smart Cities

Phytoplankton Hotspot Prediction With an Unsupervised Spatial Community Model

On inverse optimal control via polynomial optimization

On functional determinants of matrix differential operators with degenerate zero modes

How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)

Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments

Point-to-line polymers and orthogonal Whittaker functions

Stochastic control on the half-line and applications to the optimal dividend/consumption problem

Construction of Directed 2K Graphs

On The Projection Operator to A Three-view Cardinality Constrained Set