Deep Learning: A Bayesian Perspective

Deep learning is a form of machine learning for nonlinear high-dimensional data reduction and prediction. A Bayesian probabilistic perspective provides a number of advantages: specifically, statistical interpretation and properties, more efficient algorithms for optimisation and hyper-parameter tuning, and an explanation of predictive performance. Traditional high-dimensional statistical techniques, including principal component analysis (PCA), partial least squares (PLS), reduced rank regression (RRR), and projection pursuit regression (PPR), are shown to be shallow learners. Their deep learning counterparts exploit multiple layers of data reduction, which leads to performance gains. Stochastic gradient descent (SGD) training and optimisation and Dropout (DO) provide model and variable selection. Bayesian regularization is central to finding networks and provides a framework for optimal bias-variance trade-off to achieve good out-of-sample performance. Constructing good Bayesian predictors in high dimensions is discussed. To illustrate our methodology, we provide an analysis of first-time international bookings on Airbnb. Finally, we conclude with directions for future research.
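
The contrast between shallow and deep learners can be illustrated with a minimal sketch, assuming each layer of data reduction has the generic form z = f(Wx + b). The function names, the tanh activation, and the random untrained weights below are illustrative assumptions and are not taken from the paper; the sketch only shows how a shallow learner applies one reduction while a deep learner composes several.

```python
import math
import random


def layer(x, W, b, f=math.tanh):
    """One data-reduction layer: z_j = f(sum_i W[j][i] * x[i] + b[j])."""
    return [f(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def predictor(x, params):
    """Compose layers in order; a single (W, b) pair gives a shallow learner."""
    z = x
    for W, b in params:
        z = layer(z, W, b)
    return z


def random_params(sizes, rng):
    """Untrained Gaussian weights for layer sizes such as [10, 6, 3] (assumption)."""
    return [
        ([[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(10)]

# Shallow: one reduction from 10 inputs to 3 features.
shallow = predictor(x, random_params([10, 3], rng))

# Deep: two stacked reductions, 10 -> 6 -> 3.
deep = predictor(x, random_params([10, 6, 3], rng))
```

Both calls reduce the same 10-dimensional input to 3 features; the only difference is the number of composed layers. In practice the weights would be fitted, for example by SGD, rather than drawn at random.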

Selective Inference for Multi-Dimensional Multiple Change Point Detection

We consider the problem of multiple change point (CP) detection from a multi-dimensional sequence. We are mainly interested in the situation where changes are observed only in a subset of the dimensions at each CP. In such a situation, we need to select not only the time points but also the dimensions where changes actually occur. In this paper we study a class of multi-dimensional multiple CP detection algorithms for this task. Our main contribution is to introduce a statistical framework for controlling the false detection probability of this class of CP detection algorithms. The key idea is to regard a CP detection problem as a selective inference problem, and to derive the sampling distribution of the test statistic under the condition that those CPs are detected by applying the algorithm to the data. By using an analytical tool recently developed in the selective inference literature, we show that, for a wide class of multi-dimensional multiple CP detection algorithms, it is possible to exactly (non-asymptotically) control the false detection probability at the desired significance level.
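
The selection step described above, choosing both a time point and a subset of dimensions, can be sketched as follows. This is a hypothetical single-CP detector based on a scaled difference of means, written only to illustrate the kind of data-driven selection involved. It is not the class of algorithms studied in the paper, and it does not implement the selective inference correction; the names `cusum_stat` and `detect_cp` and the parameter `k` are assumptions.

```python
def cusum_stat(series, t):
    """Scaled absolute difference of means before and after split t (1 <= t < n)."""
    n = len(series)
    left = sum(series[:t]) / t
    right = sum(series[t:]) / (n - t)
    return abs(left - right) * ((t * (n - t)) / n) ** 0.5


def detect_cp(data, k):
    """Pick the split t and the k dimensions with the largest summed statistic.

    data: list of dimensions, each a list of n observations.
    Returns (t, sorted list of selected dimension indices).
    """
    n = len(data[0])
    best = None
    for t in range(1, n):
        # Rank dimensions by their statistic at this candidate split.
        stats = sorted(((cusum_stat(s, t), d) for d, s in enumerate(data)),
                       reverse=True)
        top = stats[:k]
        score = sum(v for v, _ in top)
        if best is None or score > best[0]:
            best = (score, t, sorted(d for _, d in top))
    return best[1], best[2]


# Toy sequence: dimensions 0 and 2 shift at t = 5, dimension 1 does not.
data = [
    [0.0] * 5 + [3.0] * 5,
    [0.1, -0.1] * 5,
    [1.0] * 5 + [-2.0] * 5,
]
t_hat, dims = detect_cp(data, k=2)
```

Because `t_hat` and `dims` are chosen by maximising the statistic over the same data, testing them afterwards with an unconditional null distribution would overstate significance. Conditioning on the selection event, as the abstract describes, is what addresses this.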

Latent Attention Networks

Deep neural networks are able to solve tasks across a variety of domains and modalities of data. Despite many empirical successes, we lack the ability to clearly understand and interpret the learned internal mechanisms that contribute to such effective behaviors or, more critically, failure modes. In this work, we present a general method for visualizing an arbitrary neural network’s inner mechanisms and their power and limitations. Our dataset-centric method produces visualizations of how a trained network attends to components of its inputs. The computed ‘attention masks’ support improved interpretability by highlighting which input attributes are critical in determining output. We demonstrate the effectiveness of our framework on a variety of deep neural network architectures in domains from computer vision, natural language processing, and reinforcement learning. The primary contribution of our approach is an interpretable visualization of attention that provides unique insights into the network’s underlying decision-making process irrespective of the data modality.
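
The notion of an attention mask over input components can be illustrated with a simple stand-in. The sketch below uses occlusion sensitivity, a different and much simpler technique than the learned method summarised above: each input component is replaced by a baseline value and scored by how much the output changes. The function `occlusion_mask`, the `baseline` parameter, and `toy_model` are hypothetical names introduced for illustration.

```python
def occlusion_mask(model, x, baseline=0.0):
    """Score each input component by the output change when it is occluded.

    Scores are normalised to sum to 1 when at least one component matters.
    """
    reference = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline  # occlude one component at a time
        scores.append(abs(model(perturbed) - reference))
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


def toy_model(x):
    # Hypothetical stand-in for a trained network: depends strongly on x[0],
    # weakly on x[2], and ignores x[1].
    return 2.0 * x[0] + 0.0 * x[1] + 0.5 * x[2]


mask = occlusion_mask(toy_model, [1.0, 1.0, 1.0])
```

The resulting mask assigns the largest weight to the first component and zero weight to the second, matching how the toy model uses its inputs.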

