MemGEN: Memory is All You Need

We propose a new learning paradigm called Deep Memory. It has the potential to completely revolutionize the Machine Learning field. Surprisingly, this paradigm has not been reinvented yet, unlike Deep Learning. At the core of this approach is the \textit{Learning By Heart} principle, well studied in primary schools all over the world. Inspired by poem recitation, or by \pi decimal memorization, we propose a concrete algorithm that mimics human behavior. We implement this paradigm on the task of generative modeling, and apply it to images, natural language and even the \pi decimals as long as one can print them as text. The proposed algorithm even generated this paper, in a one-shot learning setting. In carefully designed experiments, we show that the generated samples are indistinguishable from the training examples, as measured by any statistical test or metric.

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine learning and predictive modeling, distributed optimization methods have recently garnered ample attention, in particular when either observations or features are distributed, but not both. We propose a general stochastic algorithm where observations, features, and gradient components can be sampled in a double distributed setting, i.e., with both features and observations distributed. Very technical analyses establish convergence properties of the algorithm under different conditions on the learning rate (diminishing to zero or constant). Computational experiments in Spark demonstrate a superior performance of our algorithm versus a benchmark in early iterations of the algorithm, which is due to the stochastic components of the algorithm.

Scaling Ordered Stream Processing on Shared-Memory Multicores

Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple opportunities for parallelizing its execution, in the form of data, pipeline and task parallelism. On the other hand, many important applications require that processing of the stream be ordered, where inputs are processed in the same order as they arrive. There is a fundamental conflict between ordered processing and parallelizing the streaming computation. This paper focuses on the problem of effectively parallelizing ordered streaming computations on a shared-memory multicore machine. We first address the key challenges in exploiting data parallelism in the ordered setting. We present a low-latency, non-blocking concurrent data structure to order outputs produced by concurrent workers on an operator. We also propose a new approach to parallelizing partitioned stateful operators that can handle load imbalance across partitions effectively and mostly avoid delays due to ordering. We illustrate the trade-offs and effectiveness of our concurrent data-structures on micro-benchmarks and streaming queries from the TPCx-BB benchmark. We then present an adaptive runtime that dynamically maps the exposed parallelism in the computation to that of the machine. We propose several intuitive scheduling heuristics and compare them empirically on the TPCx-BB queries. We find that for streaming computations, heuristics that exploit as much pipeline parallelism as possible perform better than those that seek to exploit data parallelism.
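
The ordering requirement described above can be illustrated with a sequence-numbered reorder buffer. This is only a minimal single-threaded sketch: the abstract describes a low-latency, non-blocking concurrent data structure, which this is not. The class name `ReorderBuffer`, the `push` method, and the assumption that every input carries a unique sequence number starting at 0 are all hypothetical.

```python
import heapq


class ReorderBuffer:
    """Releases results in input order even if workers finish out of order.

    Assumption: each input has a unique sequence number 0, 1, 2, ...
    """

    def __init__(self):
        self._heap = []      # (sequence number, result) pairs waiting for release
        self._next_seq = 0   # sequence number of the next result to emit

    def push(self, seq, result):
        """Accept one worker result and return the results that are now releasable."""
        heapq.heappush(self._heap, (seq, result))
        released = []
        # Drain the heap for as long as its smallest entry is the one we wait for.
        while self._heap and self._heap[0][0] == self._next_seq:
            released.append(heapq.heappop(self._heap)[1])
            self._next_seq += 1
        return released


# Workers finish in the order 2, 0, 1, 3; outputs still leave in input order.
buf = ReorderBuffer()
ordered = []
for seq, res in [(2, "c"), (0, "a"), (1, "b"), (3, "d")]:
    ordered.extend(buf.push(seq, res))
print(ordered)  # ['a', 'b', 'c', 'd']
```

The sketch also shows where ordering delays come from: result 2 is held back until results 0 and 1 have arrived.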

Single Stream Parallelization of Recurrent Neural Networks for Low Power and Fast Inference

As neural network algorithms show high performance in many applications, their efficient inference on mobile and embedded systems is of great interest. When a single stream recurrent neural network (RNN) is executed for a personal user in embedded systems, it demands a large number of DRAM accesses because the network size is usually much bigger than the cache size and the weights of an RNN are used only once at each time step. We overcome this problem by parallelizing the algorithm and executing it multiple time steps at a time. This approach also reduces the power consumption by lowering the number of DRAM accesses. QRNN (Quasi Recurrent Neural Networks) and SRU (Simple Recurrent Unit) based recurrent neural networks are used for implementation. The experiments for SRU showed speed-ups of about 300% and 930% when 4 and 16 time steps, respectively, are processed at a time on an ARM CPU based system.

Guide Me: Interacting with Deep Networks

Interaction and collaboration between humans and intelligent machines have become increasingly important as machine learning methods move into real-world applications that involve end users. While much prior work lies at the intersection of natural language and vision, such as image captioning or image generation from text descriptions, less focus has been placed on the use of language to guide or improve the performance of a learned visual processing algorithm. In this paper, we explore methods to flexibly guide a trained convolutional neural network through user input to improve its performance during inference. We do so by inserting a layer that acts as a spatio-semantic guide into the network. This guide is trained to modify the network’s activations, either directly via an energy minimization scheme or indirectly through a recurrent model that translates human language queries to interaction weights. Learning the verbal interaction is fully automatic and does not require manual text annotations. We evaluate the method on two datasets, showing that guiding a pre-trained network can improve performance, and provide extensive insights into the interaction between the guide and the CNN.

The Price is Right: Predicting Prices with Product Images

In this work, we build an ensemble of machine learning models to predict the price of a product given its image, and visualize the features that result in higher or lower price predictions. We collect two novel datasets of product images and their MSRP prices for this purpose: a bicycle dataset and a car dataset. We set baselines for price regression using linear regression on histogram of oriented gradients (HOG) and convolutional neural network (CNN) features, and a baseline for price segment classification using a multiclass SVM. For our main models, we train several deep CNNs using both transfer learning and our own architectures, for both regression and classification. We achieve strong results on both datasets, with deep CNNs significantly outperforming other models in a variety of metrics. Finally, we use several recently-developed methods to visualize the image features that result in higher or lower prices.

How an Electrical Engineer Became an Artificial Intelligence Researcher, a Multiphase Active Contours Analysis

This essay examines how what is considered to be artificial intelligence (AI) has changed over time and come to intersect with the expertise of the author. Initially, AI developed on a separate trajectory, both topically and institutionally, from pattern recognition, neural information processing, decision and control systems, and allied topics by focusing on symbolic systems within computer science departments rather than on continuous systems in electrical engineering departments. The separate evolutions continued throughout the author’s lifetime, with some crossover in reinforcement learning and graphical models, but were shocked into converging by the virality of deep learning, thus making an electrical engineer into an AI researcher. Now that this convergence has happened, opportunity exists to pursue an agenda that combines learning and reasoning bridged by interpretable machine learning models.

Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data

Machine-learning algorithms have gained popularity in recent years in the field of ecological modeling due to their promising results in predictive performance of classification problems. While the application of such algorithms has been highly simplified in recent years due to their well-documented integration in commonly used statistical programming languages such as R, there are several practical challenges in the field of ecological modeling related to unbiased performance estimation, optimization of algorithms using hyperparameter tuning and spatial autocorrelation. We address these issues in the comparison of several widely used machine-learning algorithms such as Boosted Regression Trees (BRT), k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM) to traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like generalized additive models (GAM). Different nested cross-validation methods including hyperparameter tuning methods are used to evaluate model performances with the aim of obtaining bias-reduced performance estimates. As a case study, the spatial distribution of the forest disease Diplodia sapinea in the Basque Country in Spain is investigated using common environmental variables such as temperature, precipitation, soil or lithology as predictors. Results show that GAM and RF (mean AUROC estimates 0.708 and 0.699) outperform all other methods in predictive accuracy. The effect of hyperparameter tuning saturates at around 50 iterations for this data set. The AUROC differences between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) performance estimates of the GAM and RF are 0.167 (24%) and 0.213 (30%), respectively. It is recommended to also use spatial partitioning for cross-validation hyperparameter tuning of spatial data.

PIMKL: Pathway Induced Multiple Kernel Learning

Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, therefore we have very little understanding about the mechanisms that lead to the prediction provided. While opaqueness concerning machine behaviour might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a novel methodology to classify samples reliably that can, at the same time, provide a pathway-based molecular fingerprint of the signature that underlies the classification. PIMKL exploits prior knowledge in the form of molecular interaction networks and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning algorithm (MKL), an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels for prediction of a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.

Fast and Robust Subspace Clustering Using Random Projections

Over the past several decades, subspace clustering has been receiving increasing interest and continuous progress. However, due to the lack of scalability and/or robustness, existing methods still have difficulty in dealing with data that simultaneously possesses three characteristics: high-dimensional, massive and grossly corrupted. To tackle the scalability and robustness issues simultaneously, in this paper we propose to consider a problem called compressive robust subspace clustering, which is to perform robust subspace clustering with compressed data that is generated by projecting the original high-dimensional data onto a lower-dimensional subspace chosen at random. Given these random projections, the proposed method, row space pursuit (RSP), recovers not only the authentic row space, which provably leads to correct clustering results under certain conditions, but also the gross errors possibly existing in data. The compressive nature of the random projections gives our RSP high computational and storage efficiency, and the recovery property enables RSP to deal with grossly corrupted data. Extensive experiments on high-dimensional and/or large-scale datasets show that RSP can maintain comparable accuracies to prevalent methods with significant reductions in the computational time.

Transductive Unbiased Embedding for Zero-Shot Learning

Most existing Zero-Shot Learning (ZSL) methods suffer from a strong bias problem, in which instances of unseen (target) classes tend to be categorized as one of the seen (source) classes. As a result, they yield poor performance after being deployed in the generalized ZSL settings. In this paper, we propose a straightforward yet effective method named Quasi-Fully Supervised Learning (QFSL) to alleviate the bias problem. Our method follows the way of transductive learning, which assumes that both the labeled source images and unlabeled target images are available for training. In the semantic embedding space, the labeled source images are mapped to several fixed points specified by the source categories, and the unlabeled target images are forced to be mapped to other points specified by the target categories. Experiments conducted on AwA2, CUB and SUN datasets demonstrate that our method outperforms existing state-of-the-art approaches by a huge margin of 9.3~24.5% following generalized ZSL settings, and by a large margin of 0.2~16.2% following conventional ZSL settings.

Learning to Adapt: Meta-Learning for Model-Based Control

Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations can cause proficient but narrowly-learned policies to fail at test time. In this work, we propose to learn how to quickly and effectively adapt online to new situations as well as to perturbations. To enable sample-efficient meta-learning, we consider learning online adaptation in the context of model-based reinforcement learning. Our approach trains a global model such that, when combined with recent data, the model can be rapidly adapted to the local context. Our experiments demonstrate that our approach can enable simulated agents to adapt their behavior online to novel terrains, to a crippled leg, and in highly-dynamic environments.

Parallel Grid Pooling for Data Augmentation

Convolutional neural network (CNN) architectures utilize downsampling layers, which restrict the subsequent layers to learn spatially invariant features while reducing computational costs. However, such a downsampling operation makes it impossible to use the full spectrum of input features. Motivated by this observation, we propose a novel layer called parallel grid pooling (PGP) which is applicable to various CNN models. PGP performs downsampling without discarding any intermediate feature. It works as data augmentation and is complementary to commonly used data augmentation techniques. Furthermore, we demonstrate that a dilated convolution can naturally be represented using PGP operations, which suggests that the dilated convolution can also be regarded as a type of data augmentation technique. Experimental results based on popular image classification benchmarks demonstrate the effectiveness of the proposed method. Code is available at: https://…/akitotakeki
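
The idea of downsampling without discarding any feature can be sketched by splitting a feature map into one subsampled branch per grid offset. This is a simplified illustration under assumptions, not the PGP layer as the authors define it: the function name `parallel_grid_split`, the nested-list representation of a 2D feature map, and the default stride of 2 are all hypothetical.

```python
def parallel_grid_split(feature_map, stride=2):
    """Split a 2D map into stride*stride subsampled maps, one per grid offset.

    Ordinary strided downsampling keeps only the branch with offset (0, 0);
    here every offset is kept, so no input value is thrown away.
    """
    branches = []
    for dy in range(stride):
        for dx in range(stride):
            # Take every `stride`-th row starting at dy, and within each of
            # those rows every `stride`-th column starting at dx.
            branches.append([row[dx::stride] for row in feature_map[dy::stride]])
    return branches


feature_map = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
branches = parallel_grid_split(feature_map)
print(branches[0])  # [[0, 2], [8, 10]]
print(branches[3])  # [[5, 7], [13, 15]]
```

Each branch is a shifted, downsampled view of the same input, which is one way to read the claim that the layer acts like data augmentation.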

Learning to generate classifiers

We train a network to generate mappings between training sets and classification policies (a ‘classifier generator’) by conditioning on the entire training set via an attentional mechanism. The network is directly optimized for test set performance on a training set of related tasks, which is then transferred to unseen ‘test’ tasks. We use this to optimize for performance in the low-data and unsupervised learning regimes, and obtain significantly better performance in the 10-50 datapoint regime than support vector classifiers, random forests, XGBoost, and k-nearest neighbors on a range of small datasets.

On the Resistance of Neural Nets to Label Noise

We investigate the behavior of convolutional neural networks (CNN) in the presence of label noise. We show empirically that CNN prediction for a given test sample depends on the labels of the training samples in its local neighborhood. This is similar to the way that the K-nearest neighbors (K-NN) classifier works. With this understanding, we derive an analytical expression for the expected accuracy of a K-NN, and hence a CNN, classifier for any level of noise. In particular, we show that K-NN, and CNN, are resistant to label noise that is randomly spread across the training set, but are very sensitive to label noise that is concentrated. Experiments on real datasets validate our analytical expression by showing that it matches the empirical results for varying degrees of label noise.
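
The resistance to randomly spread label noise can be illustrated with a simple binomial calculation. The abstract does not give the authors' expression, so this sketch is not it. It rests on assumptions chosen for illustration: binary labels, an odd K, all K neighbors truly belonging to the correct class, and each training label flipped independently with probability `flip_prob`. The function name is hypothetical.

```python
from math import comb


def knn_accuracy_under_uniform_noise(k, flip_prob):
    """Probability that a K-NN majority vote is still correct.

    Assumes binary labels, odd k, k neighbors of the correct class, and
    independent label flips with probability flip_prob. The vote is correct
    when at most (k - 1) // 2 of the k neighbor labels were flipped.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that the majority vote cannot tie")
    max_flips = (k - 1) // 2
    return sum(
        comb(k, i) * flip_prob**i * (1 - flip_prob) ** (k - i)
        for i in range(max_flips + 1)
    )


# With 20% of labels flipped at random, a larger neighborhood absorbs more noise.
for k in (1, 3, 11):
    print(k, round(knn_accuracy_under_uniform_noise(k, 0.2), 3))
```

Under the same assumptions, concentrated noise is the opposite extreme: if all K neighbors of a test sample carry flipped labels, the vote is wrong regardless of the overall noise rate.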

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.
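
The consistency guarantee follows from monotonicity: if the joint value never decreases when any per-agent value increases, then each agent picking its own greedy action also maximises the joint value. The sketch below checks this by brute force with the simplest monotonic mixer, a weighted sum with non-negative weights. It is an illustration under assumptions, not QMIX itself, which uses a non-linear mixing network. All function names and the example numbers are hypothetical.

```python
from itertools import product


def mix(agent_values, weights, bias=0.0):
    """Monotonic mixer: non-negative weights make Q_tot non-decreasing in each Q_i."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep the mixer monotonic")
    return sum(w * q for w, q in zip(weights, agent_values)) + bias


def greedy_joint_action(per_agent_q, weights):
    """Centralised view: brute-force argmax of Q_tot over all joint actions."""
    action_spaces = [range(len(q)) for q in per_agent_q]
    return max(
        product(*action_spaces),
        key=lambda joint: mix([q[a] for q, a in zip(per_agent_q, joint)], weights),
    )


def decentralised_actions(per_agent_q):
    """Decentralised view: every agent maximises its own values independently."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


# Two agents with 3 and 2 actions; the values are made up for the example.
per_agent_q = [[1.0, 3.0, 2.0], [0.5, -1.0]]
weights = [0.7, 2.0]
print(greedy_joint_action(per_agent_q, weights))  # (1, 0)
print(decentralised_actions(per_agent_q))         # (1, 0)
```

The brute-force search grows exponentially with the number of agents, while the decentralised argmax is linear in it, which is why the monotonicity constraint makes maximisation tractable.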

Online Regression with Model Selection

Online learning algorithms have a wide variety of applications in large scale machine learning problems due to their low computational and memory requirements. However, standard online learning methods still suffer from some issues such as lower convergence rates and limited capability to select features or to recover the true features. In this paper, we present a novel framework for online learning based on running averages and introduce a series of online versions of some popular existing offline algorithms such as Adaptive Lasso, Elastic Net and Feature Selection with Annealing. We prove the equivalence between our online methods and their offline counterparts and give theoretical feature selection and convergence guarantees for some of them. In contrast to the existing online methods, the proposed methods can extract a model with any desired sparsity level at any time. Numerical experiments indicate that our new methods enjoy high feature selection accuracy and a fast convergence rate, compared with standard stochastic algorithms and offline learning algorithms. We also present some applications to large datasets where again the proposed framework shows competitive results compared to popular online and offline algorithms.
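
The equivalence between online and offline estimates can be illustrated on the smallest possible case. The sketch below fits ordinary least squares with one feature and an intercept from running averages alone, so the stored state has constant size and the fitted line matches the offline solution on the data seen so far. It is an assumption-laden illustration, not the authors' framework, which covers penalised and feature-selection methods. The class and method names are hypothetical.

```python
class RunningAverageRegression:
    """One-feature least squares maintained from running averages only."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.mean_xx = 0.0
        self.mean_xy = 0.0

    def update(self, x, y):
        """Fold one observation into the running averages, then discard it."""
        self.n += 1
        step = 1.0 / self.n
        self.mean_x += step * (x - self.mean_x)
        self.mean_y += step * (y - self.mean_y)
        self.mean_xx += step * (x * x - self.mean_xx)
        self.mean_xy += step * (x * y - self.mean_xy)

    def coefficients(self):
        """Return (slope, intercept) of the least-squares line seen so far."""
        var_x = self.mean_xx - self.mean_x**2
        if var_x == 0.0:
            raise ValueError("need at least two distinct x values")
        slope = (self.mean_xy - self.mean_x * self.mean_y) / var_x
        intercept = self.mean_y - slope * self.mean_x
        return slope, intercept


model = RunningAverageRegression()
for x in range(5):
    model.update(float(x), 2.0 * x + 1.0)  # noiseless line y = 2x + 1
slope, intercept = model.coefficients()
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```

Because the averages summarise all past data, coefficients can be requested at any time without revisiting earlier observations.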

Substitute Teacher Networks: Learning with Almost No Supervision

Learning through experience is time-consuming, inefficient and often bad for your cortisol levels. To address this problem, a number of recently proposed teacher-student methods have demonstrated the benefits of private tuition, in which a single model learns from an ensemble of more experienced tutors. Unfortunately, the cost of such supervision restricts good representations to a privileged minority. Unsupervised learning can be used to lower tuition fees, but runs the risk of producing networks that require extracurriculum learning to strengthen their CVs and create their own LinkedIn profiles. Inspired by the logo on a promotional stress ball at a local recruitment fair, we make the following three contributions. First, we propose a novel almost no supervision training algorithm that is effective, yet highly scalable in the number of student networks being supervised, ensuring that education remains affordable. Second, we demonstrate our approach on a typical use case: learning to bake, developing a method that tastily surpasses the current state of the art. Finally, we provide a rigorous quantitative analysis of our method, proving that we have access to a calculator. Our work calls into question the long-held dogma that life is the best teacher.

Convexity of Solvability Set of Power Distribution Networks
Multiresolution analysis of point processes and statistical thresholding for wavelet-based intensity estimation
Tracy-Widom fluctuations in 2D random Schrodinger operators
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Unleashing and Speeding Up Readers in Atomic Object Implementations
Joint Person Segmentation and Identification in Synchronized First- and Third-person Videos
Nonlinear Constitutive Models for Nano-scale Heat Conduction
Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision
Mortality in a heterogeneous population – Lee-Carter’s methodology
Computationally efficient likelihood inference in exponential families when the maximum likelihood estimator does not exist
Improve the performance of transfer learning without fine-tuning using dissimilarity-based multi-view learning for breast cancer histology images
Prefix-Free Parsing for Building Big BWTs
Robustness of the Sobol’ indices to distributional uncertainty
Bayesian Goodness of Fit Tests: A Conversation for David Mumford
Detection, localisation and tracking of pallets using machine learning techniques and 2D range data
Efficient First-Order Algorithms for Adaptive Signal Denoising
DIY Human Action Data Set Generation
Reduction principle for functionals of vector random fields
High-Dimensional Discovery Under non-Gaussianity
A note on the determinant of the walk matrix of a graph
Two-Stream Neural Networks for Tampered Face Detection
Deep Recurrent Neural Networks for Product Attribute Extraction in eCommerce
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking
FutureMapping: The Computational Structure of Spatial AI Systems
Robust Cross-lingual Hypernymy Detection using Dependency Context
Deep learning-based virtual histology staining using auto-fluorescence of label-free tissue
Quasisymmetric uniformization and heat kernel estimates
Pancreas Segmentation in CT and MRI Images via Domain Specific Network Designing and Recurrent Neural Contextual Learning
A simple canonical form for nonlinear programming problems and its use
Simulation Methods for Stochastic Storage Problems: A Statistical Learning Perspective
Improved Linear Programs for Discrete Barycenters
Task-Driven Super Resolution: Object Detection in Low-resolution Images
Counting Phylogenetic Networks with Few Reticulation Vertices: Tree-Child and Normal Networks
Deep Cascade Multi-task Learning for Slot Filling in Chinese E-commerce Shopping Guide Assistant
Large Multi-scale Spatial Kriging Using Tree Shrinkage Priors
Local Equivalence Problem in Hidden Markov Model
Learning View-Specific Deep Networks for Person Re-Identification
On the classification of linear complementary dual codes
A Variant on the Feline Josephus Problem
Irregular triangulations of $K_{12s}$ in orientable surfaces
Efficient and Deep Person Re-Identification using Multi-Level Similarity
Two-stage approaches to the analysis of occupancy data II. The heterogeneous model and conditional likelihood
Automatic Generation of Chinese Short Product Titles for Mobile Display
DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer
Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation
Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition
A barrier-type method for multiobjective optimization
Stabilisation of dynamics of oscillatory systems by non-autonomous perturbation
Contrast-Oriented Deep Neural Networks for Salient Object Detection
Interactive 3D Visualization for Theoretical Virtual Observatories
Cross-modal Deep Variational Hand Pose Estimation
Learning Structure and Strength of CNN Filters for Small Sample Size Training
Fine-Grained Attention Mechanism for Neural Machine Translation
Observer-based Adaptive Optimal Output Containment Control problem of Linear Heterogeneous Multi-agent Systems with Relative Output Measurements
Scalable Deep Learning Logo Detection
Remarks on superconcentration and Gamma calculus. Application to Spin Glasses
Strong geodetic cores and Cartesian product graphs
On the Diameter of Tree Associahedra
Critical temperature of Heisenberg models on regular trees, via random loops
A Viscosity Approach to Stochastic Differential Games of Control and Stopping Involving Impulsive Control
Space of isospectral periodic tridiagonal matrices
Exact Asymptotic Formulas for the Heat Kernels of Space and Time-Fractional Equations
A Rule for Committee Selection with Soft Diversity Constraints
Reconstruction Network for Video Captioning
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
Supercongruences for the $(p-1)$th Apéry number
Minimax Estimation of Quadratic Fourier Functionals
Co-evolution and morphogenetic systems
Log-moment estimators for the generalized Linnik and Mittag-Leffler distributions with applications to financial modeling
Hydrodynamics for symmetric exclusion in contact with reservoirs
Arctic curves for paths with arbitrary starting points: a tangent method approach
An integral characterization of the Dirichlet process
Statistical Non-linear Model, Achievable Rates and Signal Detection for Photon-level Photomultiplier Receiver
Ioffe-Regel criterion of Anderson localization in the model of resonant point scatterers
Memory effects in the ion conductor Rb$_{2}$Ti$_{2}$O$_{5}$
Weighted graphs and complex Gaussian free fields
3D Pose Estimation and 3D Model Retrieval for Objects in the Wild
Predicting Future Instance Segmentations by Forecasting Convolutional Features
Representation of distributionally robust chance-constraints
A random critical point separates brittle and ductile yielding transitions in amorphous materials
Automatically augmenting an emotion dataset improves classification using audio
Reusing Neural Speech Representations for Auditory Emotion Recognition
GradAscent at EmoInt-2017: Character- and Word-Level Recurrent Neural Network Models for Tweet Emotion Intensity Detection
Neural codes, decidability, and a new local obstruction to convexity
SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters
Towards Quantum Machine Learning with Tensor Networks
Universality of non-normality in real complex networks
Multi-modal Disease Classification in Incomplete Datasets Using Geometric Matrix Completion
The eigenvalues of stochastic blockmodel graphs
Asymptotics in bond percolation on expanders
Asymmetry in energy versus spin transport in certain interacting, disordered systems
Learning to Anonymize Faces for Privacy Preserving Action Detection
Joint Optimization Framework for Learning with Noisy Labels