Espresso: Efficient Forward Propagation for BCNNs

There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bit-wise computations for efficient execution. These techniques provide a speed-up of matrix-multiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks (\approx 2 orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://…/espresso.
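The bit-packing idea mentioned in the abstract can be illustrated with a small sketch. This is not Espresso's code or API; the function names `pack_bits` and `binary_dot` are hypothetical, and the sketch only assumes the standard XNOR-and-popcount identity for vectors with entries in {-1, +1}.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one integer.

    Bit i of the result is 1 when values[i] == +1 and 0 when it is -1.
    """
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed forms."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")     # popcount of the agreement mask
    # Each match contributes +1 and each mismatch -1 to the dot product.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
packed_result = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Here `packed_result` and `reference` agree, while the packed version touches one machine word instead of eight separate values, which is where both the memory saving and the speed-up described above would come from.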

Accelerated Distributed Nesterov Gradient Descent

This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. We develop an Accelerated Distributed Nesterov Gradient Descent (Acc-DNGD) method. When the objective function is convex and L-smooth, we show that it achieves an O(\frac{1}{t^{1.4-\epsilon}}) convergence rate for all \epsilon\in(0,1.4). We also show the convergence rate can be improved to O(\frac{1}{t^2}) if the objective function is a composition of a linear map and a strongly-convex and smooth function. When the objective function is \mu-strongly convex and L-smooth, we show that it achieves a linear convergence rate of O([ 1 - O( (\frac{\mu}{L})^{5/7} )]^t), where \frac{L}{\mu} is the condition number of the objective.

Accelerated Inference for Latent Variable Models

Inference of latent feature models in the Bayesian nonparametric setting is generally difficult, especially in high dimensional settings, because it usually requires proposing features from some prior distribution. In special cases, where the integration is tractable, we could sample feature assignments according to a predictive likelihood. However, this still may not be efficient in high dimensions. We present a novel method to accelerate the mixing of latent variable model inference by proposing feature locations from the data, as opposed to the prior. This sampling method is efficient for proper mixing of the Markov chain Monte Carlo sampler, computationally attractive because this method can be performed in parallel, and is theoretically guaranteed to converge to the posterior distribution as its limiting distribution.

AIDE: An algorithm for measuring the accuracy of probabilistic inference algorithms

Approximate probabilistic inference algorithms are central to many fields. Examples include sequential Monte Carlo inference in robotics, variational inference in machine learning, and Markov chain Monte Carlo inference in statistics. A key problem faced by practitioners is measuring the accuracy of an approximate inference algorithm on a specific dataset. This paper introduces the auxiliary inference divergence estimator (AIDE), an algorithm for measuring the accuracy of approximate inference algorithms. AIDE is based on the observation that inference algorithms can be treated as probabilistic models and the random variables used within the inference algorithm can be viewed as auxiliary variables. This view leads to a new estimator for the symmetric KL divergence between the output distributions of two inference algorithms. The paper illustrates application of AIDE to algorithms for inference in regression, hidden Markov, and Dirichlet process mixture models. The experiments show that AIDE captures the qualitative behavior of a broad class of inference algorithms and can detect failure modes of inference algorithms that are missed by standard heuristics.

RankPL: A Qualitative Probabilistic Programming Language

In this paper we introduce RankPL, a modeling language that can be thought of as a qualitative variant of a probabilistic programming language with a semantics based on Spohn’s ranking theory. Broadly speaking, RankPL can be used to represent and reason about processes that exhibit uncertainty expressible by distinguishing ‘normal’ from ‘surprising’ events. RankPL allows (iterated) revision of rankings over alternative program states and supports various types of reasoning, including abduction and causal inference. We present the language, its denotational semantics, and a number of practical examples. We also discuss an implementation of RankPL that is available for download.

Accelerated Hierarchical Density Clustering

We present an accelerated algorithm for hierarchical density based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable density clusters, and eliminating the need for the difficult to tune distance scale parameter. This makes accelerated HDBSCAN* the default choice for density based clustering. Library available at: https://…/hdbscan

Fast Change Point Detection on Dynamic Social Networks

A number of real world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model – where a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step-approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real world networks.

Forward Thinking: Building Deep Random Forests

The success of deep neural networks has inspired many to wonder whether other learners could benefit from deep, layered architectures. We present a general framework called forward thinking for deep learning that generalizes the architectural flexibility and sophistication of deep neural networks while also allowing for (i) different types of learning functions in the network, other than neurons, and (ii) the ability to adaptively deepen the network as needed to improve results. This is done by training one layer at a time, and once a layer is trained, the input data are mapped forward through the layer to create a new learning problem. The process is then repeated, transforming the data through multiple layers, one at a time, rendering a new dataset, which is expected to be better behaved, and on which a final output layer can achieve good performance. In the case where the neurons of deep neural nets are replaced with decision trees, we call the result a Forward Thinking Deep Random Forest (FTDRF). We demonstrate a proof of concept by applying FTDRF on the MNIST dataset. We also provide a general mathematical formulation that allows for other types of deep learning problems to be considered.

Recurrent Additive Networks

We introduce recurrent additive networks (RANs), a new gated RNN which is distinguished by the use of purely additive latent state updates. At every time step, the new state is computed as a gated component-wise sum of the input and the previous state, without any of the non-linearities commonly used in RNN transition dynamics. We formally show that RAN states are weighted sums of the input vectors, and that the gates only contribute to computing the weights of these sums. Despite this relatively simple functional form, experiments demonstrate that RANs outperform both LSTMs and GRUs on benchmark language modeling problems. This result shows that many of the non-linear computations in LSTMs and related networks are not essential, at least for the problems we consider, and suggests that the gates are doing more of the computational work than previously understood.
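The additive state update described above can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the parameter names (`W_cx`, `W_ih`, `W_ix`, `W_fh`, `W_fx`), the sigmoid gates, and the identity output are hypothetical choices made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def ran_step(x, h_prev, c_prev, p):
    """One additive recurrent step; p holds hypothetical weight matrices."""
    x_tilde = matvec(p["W_cx"], x)  # linear projection of the input
    i_gate = [sigmoid(z) for z in add(matvec(p["W_ih"], h_prev), matvec(p["W_ix"], x))]
    f_gate = [sigmoid(z) for z in add(matvec(p["W_fh"], h_prev), matvec(p["W_fx"], x))]
    # Gated component-wise sum of the input and the previous state,
    # with no non-linearity applied to the state itself.
    c = [i * xt + f * cp for i, xt, f, cp in zip(i_gate, x_tilde, f_gate, c_prev)]
    h = list(c)  # identity output, an assumption made for this sketch
    return h, c


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {"W_cx": identity, "W_ih": zeros, "W_ix": zeros, "W_fh": zeros, "W_fx": zeros}

h1, c1 = ran_step([2.0, 4.0], [0.0, 0.0], [0.0, 0.0], params)
h2, c2 = ran_step([0.0, 0.0], h1, c1, params)
```

With all gate weights set to zero, both gates equal 0.5, so each state is a plain weighted sum of past inputs: the second state is the first input scaled by 0.25.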

MITHRIL: Mining Sporadic Associations for Cache Prefetching

The growing pressure on cloud application scalability has accentuated storage performance as a critical bottleneck. Although cache replacement algorithms have been extensively studied, cache prefetching (reducing latency by retrieving items before they are actually requested) remains an underexplored area. Existing approaches to history-based prefetching, in particular, provide too few benefits for real systems for the resources they cost. We propose MITHRIL, a prefetching layer that efficiently exploits historical patterns in cache request associations. MITHRIL is inspired by sporadic association rule mining and only relies on the timestamps of requests. Through evaluation of 135 block-storage traces, we show that MITHRIL is effective, giving an average 55% hit ratio increase over LRU and PROBABILITY GRAPH, and a 36% hit ratio gain over AMP, at reasonable cost. We further show that MITHRIL can supplement any cache replacement algorithm and be readily integrated into existing systems. Furthermore, we demonstrate the improvement comes from MITHRIL being able to capture mid-frequency blocks.

CrossNets: A New Approach to Complex Learning

We propose a novel neural network structure called CrossNets, which considers architectures on directed acyclic graphs. This structure builds on previous generalizations of feed forward models, such as ResNets, by allowing for all forward cross connections between layers (both adjacent and non-adjacent). The addition of cross connections within the network increases information flow across the whole network, leading to better training and testing performance. The superior performance of the network is tested against four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN. We conclude with a proof of convergence for CrossNets to a local minimum for error, where weights for connections are chosen through backpropagation with momentum.

Shallow Updates for Deep Reinforcement Learning

Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyper parameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach — the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
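A regularized least squares update of a last linear layer can be sketched as follows. The abstract does not give the exact form of the regularization term, so this example assumes a ridge-style penalty that pulls the solution toward the current weights `w_prior` with strength `lam`; the function names and that specific form are assumptions for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def regularized_ls(features, targets, w_prior, lam):
    """Minimize ||Phi w - y||^2 + lam * ||w - w_prior||^2 in closed form.

    Solves (Phi^T Phi + lam I) w = Phi^T y + lam w_prior.
    """
    d = len(w_prior)
    A = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(f[i] * y for f, y in zip(features, targets)) + lam * w_prior[i]
         for i in range(d)]
    return solve(A, b)


features = [[1.0, 0.0], [0.0, 1.0]]   # last-hidden-layer activations (toy data)
targets = [2.0, 4.0]                  # regression targets (toy data)
w_plain = regularized_ls(features, targets, [0.0, 0.0], lam=0.0)
w_shrunk = regularized_ls(features, targets, [0.0, 0.0], lam=1.0)
```

With `lam=0.0` the update fits the batch exactly, while `lam=1.0` lands halfway between the batch solution and the prior weights, which is the kind of damping a regularization term of this sort is meant to provide.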

Statistical inference using SGD

We present a novel method for frequentist statistical inference in M-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
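The basic ingredient, averaging the iterates of SGD run with a fixed step size, can be illustrated on a toy mean-estimation problem. This sketch does not reproduce the paper's inference procedure or its scaling; the function name `averaged_sgd_mean`, the burn-in length, and the squared loss are assumptions chosen for the example.

```python
import random


def averaged_sgd_mean(samples, step=0.1, burn_in=1000):
    """Run fixed-step SGD on 0.5 * (theta - x)**2 and average the iterates."""
    theta = 0.0
    total = 0.0
    count = 0
    for t, x in enumerate(samples):
        grad = theta - x            # stochastic gradient at the current sample
        theta -= step * grad        # fixed step size, no decay
        if t >= burn_in:            # discard early iterates (assumed burn-in)
            total += theta
            count += 1
    return total / count


rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(20000)]
estimate = averaged_sgd_mean(data)
sample_mean = sum(data) / len(data)
```

Individual iterates keep fluctuating around the optimum because the step size never shrinks, but their average settles close to the sample mean.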

Shake-Shake regularization

The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://…/shake-shake.
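The replacement of a plain sum by a stochastic affine combination can be sketched for two parallel branches. This is a minimal illustration, not the released code: the function name `shake_combine`, the uniform sampling of the coefficient, and the use of 0.5 at evaluation time are assumptions made for the example.

```python
import random


def shake_combine(branch_a, branch_b, training, rng=random):
    """Combine two branch outputs with a random affine weight.

    During training the weight alpha is drawn uniformly from [0, 1);
    at evaluation time the expected value 0.5 is used (an assumption).
    """
    alpha = rng.random() if training else 0.5
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(branch_a, branch_b)]


branch_a = [1.0, 2.0, 3.0]
branch_b = [3.0, 2.0, 1.0]
eval_out = shake_combine(branch_a, branch_b, training=False)
train_out = shake_combine(branch_a, branch_b, training=True, rng=random.Random(0))
```

Because the weights sum to one, every training output lies between the two branch outputs, and the evaluation output is their average.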

Annealed Generative Adversarial Networks

We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed β-GAN, in corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space combined with annealing are key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, β-GAN with a fixed annealing schedule was stable and did not suffer from mode collapse.

Infrastructure for Usable Machine Learning: The Stanford DAWN Project

Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application development, from data preparation and labeling to productionization and monitoring. In this document, we outline opportunities for infrastructure supporting usable, end-to-end machine learning applications in the context of the nascent DAWN (Data Analytics for What’s Next) project at Stanford.

Learning from Complementary Labels

Collecting labeled data is costly and thus is a critical bottleneck in real-world classification tasks. To mitigate the problem, we consider a complementary label, which specifies a class that a pattern does not belong to. Collecting complementary labels would be less laborious than ordinary labels since users do not have to carefully choose the correct class from many candidate classes. However, complementary labels are less informative than ordinary labels and thus a suitable approach is needed to better learn from complementary labels. In this paper, we show that an unbiased estimator of the classification risk can be obtained only from complementary labels, if a loss function satisfies a particular symmetric condition. We theoretically prove the estimation error bounds for the proposed method, and experimentally demonstrate the usefulness of the proposed algorithms.

Improved Clustering with Augmented k-means

Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, k-means is a very important algorithm for this. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the cluster belonging probabilities are used to control the subsequent re-estimation of cluster means. Observations which can’t be firmly identified into clusters are excluded from the re-estimation step. This can be valuable when the data exhibit many characteristics of real datasets such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by more accurately classifying observations into known clusters and / or converging in fewer iterations. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be available with this report.

AIXIjs: A Software Demo for General Reinforcement Learning

Reinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn’t explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents.

A novel algorithmic approach to Bayesian Logic Regression

Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless logic regression has remained less well known than other approaches to epistatic association mapping. Here we will adopt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila where we identify several interesting epistatic effects.

W2VLDA: Almost Unsupervised System for Aspect Based Sentiment Analysis

With the increase of online customer opinions in specialised websites and social networks, the necessity of automatic systems to help to organise and classify customer reviews by domain-specific aspect/categories and sentiment polarity is more important than ever. Supervised approaches to Aspect Based Sentiment Analysis obtain good results for the domain/language they are trained on, but obtaining manually labelled data for training supervised systems for all domains and languages tends to be very costly and time consuming. In this work we describe W2VLDA, an unsupervised system based on topic modelling that, combined with some other unsupervised methods and a minimal configuration, performs aspect/category classification, aspect-terms/opinion-words separation and sentiment polarity classification for any given domain and language. We also evaluate the performance of the aspect and sentiment classification in the multilingual SemEval 2016 task 5 (ABSA) dataset. We show competitive results for several languages (English, Spanish, French and Dutch) and domains (hotels, restaurants, electronic-devices).

Minimax Statistical Learning and Domain Adaptation with Wasserstein Distances

As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, we describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. In particular, we prove a generalization bound that involves the covering number properties of the original ERM problem. As an illustrative example, we provide generalization guarantees for domain adaptation problems where the Wasserstein distance between the source and target domain distributions can be reliably estimated from unlabeled samples.

Nonparametric Online Regression while Learning the Metric

We study algorithms for online nonparametric regression that learn the directions along which the regression function is smoother. Our algorithm learns the Mahalanobis metric based on the gradient outer product matrix \boldsymbol{G} of the regression function (automatically adapting to the effective rank of this matrix), while simultaneously bounding the regret —on the same data sequence— in terms of the spectrum of \boldsymbol{G}. As a preliminary step in our analysis, we generalize a nonparametric online learning algorithm by Hazan and Megiddo by enabling it to compete against functions whose Lipschitzness is measured with respect to an arbitrary Mahalanobis metric.

Streaming Sparse Gaussian Process Approximations

Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML

Deep De-Aliasing for Fast Compressive Sensing MRI

Simultaneous Multiple Surface Segmentation Using Deep Learning

A New 3D Segmentation Methodology for Lumbar Vertebral Bodies for the Measurement of BMD and Geometry

Sparse Coding on Stereo Video for Object Detection

A New 3D Method to Segment the Lumbar Vertebral Bodies and to Determine Bone Mineral Density and Geometry

Local Information with Feedback Perturbation Suffices for Dictionary Learning in Neural Circuits

Data-driven Optimal Transport Cost Selection for Distributionally Robust Optimization

Clustering under Local Stability: Bridging the Gap between Worst-Case and Beyond Worst-Case Analysis

Improved Very-short-term Wind Forecasting using Atmospheric Classification

A Lightweight Approach for On-the-Fly Reflectance Estimation

Relaxed Wasserstein with Applications to GANs

Competing Bose-Glass physics with disorder-induced Bose-Einstein condensation in the doped $S=1$ antiferromagnet Ni(Cl$_{1-x}$Br$_x$)$_2$-4SC(NH$_2$)$_2$ at high magnetic fields

Doubly Robust Data-Driven Distributionally Robust Optimization

Hall-Littlewood RSK field

Nesterov’s Acceleration For Second Order Method

Model-Based Planning in Discrete Action Spaces

Large System Analysis of Power Normalization Techniques in Massive MIMO

On the coupling time of the heat-bath process for the Fortuin-Kasteleyn random-cluster model

Proximal Methods for Sparse Optimal Scoring and Discriminant Analysis

Hypothesis Testing via Euclidean Separation

The High-Dimensional Geometry of Binary Neural Networks

Smoothed and Average-case Approximation Ratios of Mechanisms: Beyond the Worst-case Analysis

Multi-Stage Variational Auto-Encoders for Coarse-to-Fine Image Generation

Ensemble Adversarial Training: Attacks and Defenses

Machine learning modeling for time series problem: Predicting flight ticket prices

Towards Real World Human Parsing: Multiple-Human Parsing in the Wild

PixColor: Pixel Recursive Colorization

Two-temperature logistic regression based on the Tsallis divergence

Quantum versus classical simultaneity in communication complexity

Space Complexity of Fault Tolerant Register Emulations

Securing Deep Neural Nets against Adversarial Attacks with Moving Target Defense

How to Train Your DRAGAN

Families of vectors without antipodal pairs

GAR: An efficient and scalable Graph-based Activity Regularization for semi-supervised learning

Active Sampling for Graph-Cognizant Classification via Expected Model Change

Quadruplet Network with One-Shot Learning for Visual Tracking

Iteration-complexity of a Jacobi-type non-Euclidean ADMM for multi-block linearly constrained nonconvex programs

Uncertainty in Economic Growth and Inequality

Coexistence of RF-powered IoT and a Primary Wireless Network with Secrecy Guard Zones

Recurrent Scene Parsing with Perspective Understanding in the Loop

Diffusion-based neuromodulation can eliminate catastrophic forgetting in simple neural networks

End-to-end Planning of Fixed Millimeter-Wave Networks

Speedup from a different parametrization within the Neural Net algorithm

Modeling spatial social complex networks for dynamical processes

SVM via Saddle Point Optimization: New Bounds and Distributed Algorithms

BRPL: Backpressure RPL for High-throughput and Mobile IoTs

Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression

PrivMin: Differentially Private MinHash for Jaccard Similarity Computation

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Stochastic Recursive Gradient Algorithm for Nonconvex Optimization

Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

High-Dimensional Bayesian Geostatistics

Search Engine Guided Non-Parametric Neural Machine Translation

Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement learning

Conflict-free vertex-connections of graphs

Non-Linear Phase-Shifting of Haar Wavelets for Run-Time All-Frequency Lighting

Polar Coding for Parallel Gaussian Channel

Bayesian Belief Updating of Spatiotemporal Seizure Dynamics

Fast and simple algorithms for computing both $LCS_{k}$ and $LCS_{k+}$

Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Gaze Distribution Analysis and Saliency Prediction Across Age Groups

Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization

Performance Evaluation of Optimal Radio Access Technology Selection Algorithms for LTE-WiFi Network

Deep Sparse Coding Using Optimized Linear Expansion of Thresholds

Contract Design for Energy Demand Response

Toric weak Fano varieties associated to building sets

Event-Triggered Algorithms for Leader-Follower Consensus of Networked Euler-Lagrange Agents

Personalized Ranking for Context-Aware Venue Suggestion

Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

Non-existence of antipodal cages of even girth

Formalized Lambek Calculus in Higher Order Logic (HOL4)

Applications of multiplicative number theory to uniform distribution and ergodic Ramsey theory

Dynamic Analysis of the Arrow Distributed Directory Protocol in General Networks

Forecasting Hand and Object Locations in Future Frames

Critical Contours: An Invariant Linking Image Flow with Salient Surface Organization

Full-Duplex Bidirectional Secure Communications under Perfect and Distributionally Ambiguous Eavesdropper’s CSI

Combining tabu search and graph reduction to solve the maximum balanced biclique problem

Phase-Shifting Separable Haar Wavelets and Applications

Why You Should Charge Your Friends for Borrowing Your Stuff

Ensemble Sampling

Calibrating Black Box Classification Models through the Thresholding Method

Stability of cross-validation and minmax-optimal number of folds

A Dynkin game on assets with incomplete information on the return

On Capacity of Noncoherent MIMO with Asymmetric Link Strengths

Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning

Honey Bee Dance Modeling in Real-time using Machine Learning

Stabilizing Adversarial Nets With Prediction Methods

Mixed Membership Word Embeddings for Computational Social Science

Broadcasting in Noisy Radio Networks

Spelling Correction as a Foreign Language

Instrument-Armed Bandits

Generalizing the Role of Determinization in Probabilistic Planning

Gradient Flows: Applications to Classification, Image Denoising, and Riemannian MCMC

Incorporating Depth into both CNN and CRF for Indoor Semantic Segmentation

Balanced Policy Evaluation and Learning

DeepMasterPrint: Generating Fingerprints for Presentation Attacks

Ergodicity of stochastic differential equations with jumps and singular coefficients

On asymptotically minimax nonparametric detection of signal in Gaussian white noise

Equating $k$ Maximum Degrees in Graphs without Short Cycles

Algebraic Aspects of Conditional Independence and Graphical Models

Answers to Holm’s questions for high order free arrangements

More results on the distance (signless) Laplacian eigenvalues of graphs

Structured Image Classification from Conditional Random Field with Deep Class Embedding

Generative Partition Networks for Multi-Person Pose Estimation

Learning Semantic Relatedness From Human Feedback Using Metric Learning

The Do’s and Don’ts for CNN-based Face Verification

Direct Search Methods on Reductive Homogeneous Spaces

Sketched Answer Set Programming

On uniquely k-list colorable planar graphs, graphs on surfaces, and regular graphs

Powerful sets: a generalisation of binary matroids

Exponential Capacity in an Autoencoder Neural Network with a Hidden Layer

Parallel Streaming Wasserstein Barycenters

Additive Combinatorics: A Menu of Research Problems

Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning

Griffiths Singularities in the Random Quantum Ising Antiferromagnet: A Tree Tensor Network Renormalization Group Study

A spatial epidemic model with site contamination

Image Segmentation by Iterative Inference from Conditional Score Estimation

Monotonicity of average return probabilities for random walks in random environments

Analytical Methods and Field Theory for Disordered Systems

Experience enrichment based task independent reward model

Spatially Controlled Relay Beamforming: $2$-Stage Optimal Policies

Improved Algorithms for Matrix Recovery from Rank-One Projections

Nonautonomous Young differential equations revisited

Nice latent variable models have log-rank

On Stackelberg Mixed Strategies

Report of the HPC Correctness Summit, Jan 25–26, 2017, Washington, DC

Parallel and in-process compilation of individuals for genetic programming on GPU

On some polynomials and series of Bloch-Polya Type

A truncation model for estimating Species Richness

Sandpile Groups of Random Bipartite Graphs

Classification and Retrieval of Digital Pathology Scans: A New Dataset

Testing Degree Corrections in Stochastic Block Models

Testing hypotheses on a tree: new error rates and controlling strategies

Classification of toric manifolds over an $n$-cube with one vertex cut

Corrupted Sensing with Sub-Gaussian Measurements

A Note on the Information-Theoretic-(in)Security of Fading Generated Secret Keys

Random walks among time increasing conductances: heat kernel estimates

Integrable structure of products of finite complex Ginibre random matrices

On the Phase Transition of Corrupted Sensing

An Overview of Massive MIMO Research at the University of Bristol

Categorical relations between Langlands dual quantum affine algebras: Doubly laced types

Building Emotional Machines: Recognizing Image Emotions through Deep Neural Networks

A note on the number of edges in a Hamiltonian graph with no repeated cycle length

Parameterized Complexity of the List Coloring Reconfiguration Problem with Graph Parameters

Subgradients of Minimal Time Functions without Calmness

Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network

Note on Evolution and Forecasting of Requirements: Communications Example

Detection Estimation and Grid matching of Multiple Targets with Single Snapshot Measurements

Batch Size Matters: A Diffusion Approximation Framework on Nonconvex Stochastic Gradient Descent

Learning to Rank Using Localized Geometric Mean Metrics

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

Hypergroups derived from random walks on some infinite graphs

Many-Body-Localization: Strong Disorder perturbative approach for the Local Integrals of Motion

Self-Fulfilling Signal of an Endogenous State in Network Congestion Games

Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk

Semiparametric Efficient Empirical Higher Order Influence Function Estimators

Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction

Learning Robust Object Recognition Using Composed Scenes from Generative Models

Rao-Blackwellized Particle Smoothing as Message Passing

Classification Using Proximity Catch Digraphs (Technical Report)

Multi-output Polynomial Networks and Factorization Machines

External powers of tensor products as representations of general linear groups

A copula approach for dependence modeling in multivariate nonparametric time series

Deep Reinforcement Learning with Relative Entropy Stochastic Search

View-Invariant Recognition of Action Style Self-Dissimilarity

Nonconvex homogenization for one-dimensional controlled random walks in random potential

Construction of strongly regular Cayley graphs based on three-valued Gauss periods

Computer vision-based food calorie estimation: dataset, method, and experiment

Resonant Near-Field Effects in Photonic Glasses

On the Efficient Simulation of the Left-Tail of the Sum of Correlated Log-normal Variates

Dynamics Based 3D Skeletal Hand Tracking

Speed and fluctuations for some driven dimer models

From optimal transport to generative modeling: the VEGAN cookbook

Near-feasible stable matchings with budget constraints

An approximate empirical Bayesian method for large-scale linear-Gaussian inverse problems

The Widom-Rowlinson Model on the Delaunay Graph

ReFACTor: Practical Low-Rank Matrix Estimation Under Column-Sparsity

Streaming Binary Sketching based on Subspace Tracking and Diagonal Uniformization

LOGAN: Evaluating Privacy Leakage of Generative Models Using Generative Adversarial Networks

CayleyNets: Graph Convolutional Neural Networks with Complex Rational Spectral Filters

An affine scaling method using a class of differential barrier functions

A decoding algorithm for Twisted Gabidulin codes

Quantitative stochastic homogenization and regularity theory of parabolic equations

A Linear-Time Kernel Goodness-of-Fit Test

Individualized Risk Prognosis for Critical Care Patients: A Multi-task Gaussian Process Model

Clique-Width for Graph Classes Closed under Complementation

Controllability of evolution equations with memory

An Inexact Newton-like conditional gradient method for constrained nonlinear systems

Semantic Softmax Loss for Zero-Shot Learning

Compressed Sensing with Prior Information via Maximizing Correlation

A Regularized Framework for Sparse and Structured Neural Attention

An Out-of-the-box Full-network Embedding for Convolutional Neural Networks

On the Convergence of the Accelerated Riccati Iteration Method

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

An Asynchronous Distributed Framework for Large-scale Learning Based on Parameter Exchanges

VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning

Exact Recovery with Symmetries for the Doubly-Stochastic Relaxation

Learning to Associate Words and Images Using a Large-scale Graph

Convolutional Networks with MuxOut Layers as Multi-rate Systems for Image Upscaling

Follow the Signs for Robust Stochastic Optimization

Robust Localized Multi-view Subspace Clustering

An improvement of the asymptotic Elias bound for non-binary codes

Backprop without Learning Rates Through Coin Betting

A unified view of entropy-regularized Markov decision processes

Right-sided multifractal spectra indicate small-worldness in networks

Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs

Information-theoretic analysis of generalization capability of learning algorithms

Sparse hierarchical interaction learning with epigraphical projection

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

Regularizing deep networks using efficient layerwise adversarial training

On deep holes of generalized projective Reed-Solomon codes

Comparing the Finite-Time Performance of Simulation-Optimization Algorithms

Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

Stabilizing GAN Training with Multiple Random Projections

Concrete Dropout

Counting De Bruijn sequences as perturbations of linear recursions

Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels

Machine-learning-assisted correction of correlated qubit errors in a topological code

Real Time Image Saliency for Black Box Classifiers

On-the-fly Operation Batching in Dynamic Computation Graphs

Symmetry Breaking in the Congest Model: Time- and Message-Efficient Algorithms for Ruling Sets

Finite Blocklength Rates over a Fading Channel with CSIT and CSIR

SmartPaste: Learning to Adapt Source Code

Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Networks

A Framework for Sharing Confidential Research Data, Applied to Investigating Differential Pay by Race in the U. S. Government

A unified approach to interpreting model predictions

The effects of noise and time delay on the synchronization of the Kuramoto model in small-world networks

Newton polytopes and symmetric Grothendieck polynomials

Block building programming for symbolic regression

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

Reducing Reparameterization Gradient Variance

Dynamic Partition of Complex Networks

Facial Affect Estimation in the Wild Using Deep Residual and Convolutional Networks