Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car

As part of a complete software stack for autonomous driving, NVIDIA has created a neural-network-based system, known as PilotNet, which outputs steering angles given images of the road ahead. PilotNet is trained using road images paired with the steering angles generated by a human driving a data-collection car. It derives the necessary domain knowledge by observing human drivers. This eliminates the need for human engineers to anticipate what is important in an image and foresee all the necessary rules for safe driving. Road tests demonstrated that PilotNet can successfully perform lane keeping in a wide variety of driving conditions, regardless of whether lane markings are present. The goal of the work described here is to explain what PilotNet learns and how it makes its decisions. To this end we developed a method for determining which elements in the road image most influence PilotNet’s steering decision. Results show that PilotNet indeed learns to recognize relevant objects on the road. In addition to learning the obvious features such as lane markings, edges of roads, and other cars, PilotNet learns more subtle features that would be hard for engineers to anticipate and program, for example, bushes lining the edge of the road and atypical vehicle classes.

Deep Text Classification Can be Fooled

Deep neural networks (DNNs) play a key role in many applications. Current studies focus on crafting adversarial samples against DNN-based image classifiers by introducing imperceptible perturbations to the input. However, DNNs for natural language processing have not received the attention they deserve. In fact, the existing perturbation algorithms for images cannot be directly applied to text. This paper presents a simple but effective method to attack DNN-based text classifiers. Three perturbation strategies, namely insertion, modification, and removal, are designed to generate an adversarial sample for a given text. Computing the cost gradients determines what should be inserted, modified, or removed, where to insert, and how to modify. The experimental results show that the adversarial samples generated by our method can fool a state-of-the-art model into misclassifying them as any desired class without compromising their utility. At the same time, the introduced perturbations are difficult to perceive. Our study demonstrates that DNN-based text classifiers are also prone to adversarial sample attacks.

Topically Driven Neural Language Model

Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics.

Estimating Random-X Prediction Error of Regression Models

The areas of model selection and model evaluation for predictive modeling have received extensive treatment in the statistics literature, leading to both theoretical advances and practical methods based on covariance penalties and other approaches. However, the majority of this work, and especially the practical approaches, are based on the ‘Fixed-X assumption’, where covariate values are assumed to be non-random and known. By contrast, in most modern predictive modeling applications, it is more reasonable to take the ‘Random-X’ view, where future prediction points are random and new. In this paper we concentrate on examining the applicability of the covariance-penalty approaches to this problem. We propose a decomposition of the Random-X prediction error that clarifies the additional error due to Random-X, which is present in both the variance and bias components of the error. This decomposition is general, but we focus on its application to the fundamental case of least squares regression. We show how to quantify the excess variance under some assumptions using standard random-matrix results, leading to a covariance penalty approach we term RCp. When the variance of the error is unknown, using the standard unbiased estimate leads to an approach we term \hat{RCp}, which is closely related to existing methods MSEP and GCV. To account for excess bias, we propose to take only the bias component of the ordinary cross validation (OCV) estimate, resulting in a hybrid penalty we term RCp^+. We demonstrate by theoretical analysis and simulations that this approach is consistently superior to OCV, although the difference is typically small.

A Generalization of Convolutional Neural Networks to Graph-Structured Data

This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable, and can also be used on data with varying graph structure. Furthermore, this generalization can be applied to many standard regression or classification problems by learning the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state-of-the-art on the Merck molecular activity data set.

Missing Data and Prediction

Missing data are a common problem for both the construction and implementation of a prediction algorithm. Pattern mixture kernel submodels (PMKS) – a series of submodels for every missing data pattern that are fit using only data from that pattern – are a computationally efficient remedy for both stages. Here we show that PMKS yield the most predictive algorithm among all standard missing data strategies. Specifically, we show that the expected loss of a forecasting algorithm is minimized when each pattern-specific loss is minimized. Simulations and a re-analysis of the SUPPORT study confirm that PMKS generally outperform zero-imputation, mean-imputation, complete-case analysis, complete-case submodels, and even multiple imputation (MI). The degree of improvement is highly dependent on the missingness mechanism and the effect size of missing predictors. When the data are Missing at Random (MAR), MI can yield comparable forecasting performance but generally requires a larger computational cost. We see that predictions from the PMKS are equivalent to the limiting predictions for a MI procedure that uses a mean model dependent on missingness indicators (the MIMI model). Consequently, the MIMI model can be used to assess the MAR assumption in practice. The focus of this paper is on out-of-sample prediction behavior; implications for model inference are only briefly explored.
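
The idea of fitting one submodel per missing data pattern, using only the rows that share that pattern, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_pattern_submodels` and `predict_with_pattern` are hypothetical, ordinary least squares stands in for whatever submodel the paper uses, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

def fit_pattern_submodels(X, y):
    """Fit one least-squares submodel per missing-data pattern (illustrative).

    X is a 2-D float array with np.nan marking missing entries; y is 1-D.
    Returns a dict mapping each pattern (tuple of bools, True = missing)
    to (observed column indices, coefficients with intercept first).
    """
    missing = np.isnan(X)
    models = {}
    for pattern in set(map(tuple, missing.tolist())):
        mask = np.array(pattern)
        rows = np.all(missing == mask, axis=1)   # rows sharing this pattern
        cols = np.flatnonzero(~mask)             # columns observed in it
        design = np.column_stack([np.ones(rows.sum()), X[rows][:, cols]])
        coef, *_ = np.linalg.lstsq(design, y[rows], rcond=None)
        models[pattern] = (cols, coef)
    return models

def predict_with_pattern(models, x):
    """Predict for one row using the submodel that matches its pattern."""
    pattern = tuple(np.isnan(x).tolist())
    cols, coef = models[pattern]  # raises KeyError for an unseen pattern
    return coef[0] + x[cols] @ coef[1:]

# Toy data: the relationship differs between the two patterns.
nan = np.nan
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.],
              [0., nan], [1., nan], [2., nan]])
y = np.array([1., 3., 4., 6., 5., 7., 9.])
models = fit_pattern_submodels(X, y)
```

Each new observation is routed to the submodel for its own pattern, so no imputation step is needed at prediction time. A pattern that never occurred in training has no submodel in this sketch; handling that case is left open here.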

Accelerating Stochastic Gradient Descent

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d’Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.

Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts

Spatial disease mapping using Directed Acyclic Graph Auto-Regressive (DAGAR) models

A structure theorem for product sets in extra special groups

Sub-string/Pattern Matching in Sub-linear Time Using a Sparse Fourier Transform Approach

Pre-computed Liquid Spaces with Generative Neural Networks and Optical Flow

Multi-View Dynamic Facial Action Unit Detection

Robust Estimators and Test-Statistics for One-Shot Device Testing Under the Exponential Distribution

New robust statistical procedures for polytomous logistic regression models

Best finite constrained approximations of one-dimensional probabilities

Weak Convergence of Stationary Empirical Processes

Automatic Compositor Attribution in the First Folio of Shakespeare

Generalized subspace subcodes with application in cryptology

Models of fault-tolerant distributed computation via dynamic epistemic logic

Stochastic Optimization from Distributed, Streaming Data in Rate-limited Networks

Diffeomorphic random sampling using optimal information transport

Reinforcement Learning-based Thermal Comfort Control for Vehicle Cabins

From Uncoded Prefetching to Coded Prefetching in Coded Caching

A bijection between the set of nesting-similarity classes and L & P matchings

Prediction and Inference with Missing Data in Patient Alert Systems

List colourings of multipartite hypergraphs

Adaptive Cost Function for Pointcloud Registration

Clusters’ size-degree distribution for bond percolation

From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

A Robust Utility Learning Framework via Inverse Optimization

An ensemble-based online learning algorithm for streaming data

Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks

Spatio-temporal Person Retrieval via Natural Language Queries

Conditioning your quantile function

Structured Production System (extended abstract)

Uplink performance of multi-antenna cellular networks with co-operative base stations and user-centric clustering

Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization

Unsupervised Geometric Learning of Hyperspectral Images

Probabilistic Existence of Large Sets of Designs

Anisotropic twicing for single particle reconstruction using autocorrelation analysis

A Flexible Framework for Hypothesis Testing in High-dimensions

Smoothed nonparametric two-sample tests

On Improving Deep Reinforcement Learning for POMDPs

Exact Algorithms via Multivariate Subroutines

Other Topics You May Also Agree or Disagree: Modeling Inter-Topic Preferences using Tweets and Matrix Factorization

Stochastic Orthant-Wise Limited-Memory Quasi-Newton Method

Joint Hybrid Precoder and Combiner Design for mmWave Spatial Multiplexing Transmission

High-Dimensional Variable Selection and Prediction under Competing Risks with Application to SEER-Medicare Linked Data

Hybrid Precoder and Combiner Design with One-Bit Quantized Phase Shifters in mmWave MIMO Systems

Secure Precise Wireless Transmission with Random-Subcarrier-Selection-based Directional Modulation Transmit Antenna Array

Iterative Hybrid Precoder and Combiner Design for mmWave MIMO-OFDM Systems

A Framework for Algorithm Stability

A second-order stochastic maximum principle for generalized mean-field control problem

Transmit Filter and Artificial Noise Design for Secure MIMO-OFDM Systems

Replica Symmetry Breaking in Compressive Sensing

A new method of joint nonparametric estimation of probability density and its support

Optimal location of resources for biased movement of species: the 1D case

Measurement Matrix Design for Phase Retrieval Based on Mutual Information

Ranking in evolving complex networks

Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database

Airway segmentation from 3D chest CT volumes based on volume of interest using gradient vector flow

Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading

Filmor Theorem for integers

The loss surface of deep and wide neural networks

Canonical RDEs and general semimartingales as rough paths

Local $h$-vectors of Quasi-Geometric and Barycentric Subdivisions

Riemannian Optimization for Skip-Gram Negative Sampling

Beyond the network of plants volatile organic compounds

SphereFace: Deep Hypersphere Embedding for Face Recognition

Systematizing Decentralization and Privacy: Lessons from 15 years of research and deployments

Bootstrap-Based Inference for Cube Root Consistent Estimators

Exploiting random projections and sparsity with random forests and gradient boosting methods — Application to multi-label and multi-output learning, random forest model compression and leveraging input sparsity

Understanding the Feedforward Artificial Neural Network Model From the Perspective of Network Flow

AutoDIAL: Automatic DomaIn Alignment Layers

Stochastic Quasi-Fejér Block-Coordinate Fixed Point Iterations With Random Sweeping II: Mean-Square and Linear Convergence

Constraint-based inverse modeling of metabolic networks: a proof of concept

Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts

A Faster Patch Ordering Method for Image Denoising

A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations

Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation

Hybrid Precoder and Combiner Design for Secure Transmission in mmWave MIMO Systems

Triangle-free graphs that do not contain an induced subdivision of $K_4$ are 3-colorable

Misdirected Registration Uncertainty

Improved Algorithms for Computing the Cycle of Minimum Cost-to-Time Ratio in Directed Graphs

Multimodal MRI brain tumor segmentation using random forests with features learned from fully convolutional neural network

Sudoku Rectangle Completion

Compact Descriptors for Video Analysis: the Emerging MPEG Standard

Coverage and Rate Analysis of Super Wi-Fi Networks Using Stochastic Geometry

Perpetual integrals convergence and extinctions in population dynamics

Quadratically-Regularized Optimal Transport on Graphs

Cooling-Rate Effects in Sodium Silicate Glasses: Bridging the Gap between Molecular Dynamics Simulations and Experiments

New region force for variational models in image segmentation and high dimensional data clustering

Punny Captions: Witty Wordplay in Image Descriptions

Density solutions to a class of integro-differential equations

Generalized G-estimation and Model Selection

Estimating the coefficients of a mixture of two linear regressions by expectation maximization

Optimal excess-of-loss reinsurance and investment problem for an insurer with default risk under a stochastic volatility model

Decremental Data Structures for Connectivity and Dominators in Directed Graphs

Gravitational allocation for uniform points on the sphere

Exploring Application Performance on Emerging Hybrid-Memory Supercomputers

Experimental Two-dimensional Quantum Walk on a Photonic Chip

C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

Idle Period Propagation in Message-Passing Applications

Relative Error Tensor Low Rank Approximation