Deep Learning: A Bayesian Perspective

Deep learning is a form of machine learning for nonlinear high-dimensional data reduction and prediction. A Bayesian probabilistic perspective provides a number of advantages: statistical interpretation and properties, more efficient algorithms for optimisation and hyper-parameter tuning, and an explanation of predictive performance. Traditional high-dimensional statistical techniques, including principal component analysis (PCA), partial least squares (PLS), reduced rank regression (RRR), and projection pursuit regression (PPR), are shown to be shallow learners. Their deep learning counterparts exploit multiple layers of data reduction, which leads to performance gains. Stochastic gradient descent (SGD) training and optimisation and Dropout (DO) provide model and variable selection. Bayesian regularization is central to finding networks and provides a framework for optimal bias-variance trade-off to achieve good out-of-sample performance. Constructing good Bayesian predictors in high dimensions is discussed. To illustrate our methodology, we provide an analysis of first-time international bookings on Airbnb. Finally, we conclude with directions for future research.


Selective Inference for Multi-Dimensional Multiple Change Point Detection

We consider the problem of multiple change point (CP) detection from a multi-dimensional sequence. We are mainly interested in the situation where changes are observed only in a subset of multiple dimensions at each CP. In such a situation, we need to select not only the time points but also the dimensions where changes actually occur. In this paper, we study a class of multi-dimensional multiple CP detection algorithms for this task. Our main contribution is to introduce a statistical framework for controlling the false detection probability of this class of CP detection algorithms. The key idea is to regard a CP detection problem as a selective inference problem, and derive the sampling distribution of the test statistic under the condition that those CPs are detected by applying the algorithm to the data. By using an analytical tool recently developed in the selective inference literature, we show that, for a wide class of multi-dimensional multiple CP detection algorithms, it is possible to exactly (non-asymptotically) control the false detection probability at the desired significance level.


Latent Attention Networks

Deep neural networks are able to solve tasks across a variety of domains and modalities of data. Despite many empirical successes, we lack the ability to clearly understand and interpret the learned internal mechanisms that contribute to such effective behaviors or, more critically, failure modes. In this work, we present a general method for visualizing an arbitrary neural network’s inner mechanisms and their power and limitations. Our dataset-centric method produces visualizations of how a trained network attends to components of its inputs. The computed ‘attention masks’ support improved interpretability by highlighting which input attributes are critical in determining output. We demonstrate the effectiveness of our framework on a variety of deep neural network architectures in domains from computer vision, natural language processing, and reinforcement learning. The primary contribution of our approach is an interpretable visualization of attention that provides unique insights into the network’s underlying decision-making process irrespective of the data modality.


Tensor Contraction Layers for Parsimonious Deep Nets

The Entropy Power Inequality with quantum memory

Provenance Filtering for Multimedia Phylogeny

NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

F-index of graphs based on four operations related to the lexicographic product

Machine Assisted Analysis of Vowel Length Contrasts in Wolof

Function Assistant: A Tool for NL Querying of APIs

Monodromy in Kazhdan-Lusztig cells in affine type A

The Mixing method: coordinate descent for low-rank semidefinite programming

Simplices for Numeral Systems

Classical properties of algebras using a new graph association

Randomized Constraints Consensus for Distributed Robust Linear Programming

Optimal paths on the road network as directed polymers

Personalized Pancreatic Tumor Growth Prediction via Group Learning

Generic Secure Repair for Distributed Storage

Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks

Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling

Morphological Embeddings for Named Entity Recognition in Morphologically Rich Languages

A Vision System for Multi-View Face Recognition

Authorship Verification based on Compression-Models

CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks

Knowledge Representation in Bicategories of Relations

Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks

Generalized non-crossing Partitions and Buildings

Integrated Deep and Shallow Networks for Salient Object Detection

PixelGAN Autoencoders

Bias-Variance Tradeoff of Graph Laplacian Regularizer

On the complexity of k-rainbow cycle colouring problems

On Unifying Deep Generative Models

SAR Image Despeckling Using a Convolutional

Rank Persistence: Assessing the Temporal Performance of Real-World Person Re-Identification

Disordered BKT transition and superinsulation

Recursive Cross-Domain Face/Sketch Generation from Limited Facial Parts

Probabilistic aspects of the theory of vertex algebras

Higher-order meshing of implicit geometries – part I: Integration and interpolation in cut elements

Exception-Based Knowledge Updates

Learning-based Surgical Workflow Detection from Intra-Operative Signals

Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

An efficient global optimization algorithm for maximizing the sum of two generalized Rayleigh quotients

Image Restoration from Patch-based Compressed Sensing Measurement

Dynamic Steerable Blocks in Deep Residual Networks

On a Global Objective Prior from Score Rules

Quantum key distribution protocol with pseudorandom bases

Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech

Facies classification from well logs using an inception convolutional network

Exploring the complexity of layout parameters in tournaments and semi-complete digraphs

Dual-reference Face Retrieval: What Does He/She Look Like at Age ‘X’?

An adaptive Newton algorithm for optimal control problems with application to optimal electrode design

Robust Deep Learning via Reverse Cross-Entropy Training and Thresholding Test

Exploiting Multiple-Antenna Techniques for Non-Orthogonal Multiple Access

WiFi based trajectory alignment, calibration and easy site survey using smart phones and foot-mounted IMUs

Joint Matrix-Tensor Factorization for Knowledge Base Inference

ICABiDAS: Intuition Centred Architecture for Big Data Analysis and Synthesis

Improved high-dimensional prediction with Random Forests by the use of co-data

Complete solution of an optimization problem in tropical semifield

Isometries of almost-Riemannian structures on Lie groups

Computer aided synthesis: a game theoretic approach

Tyler shape depth

The role of asymptotic functions in network optimization and feasibility studies

Testing Gaussian Process with Applications to Super-Resolution

Weight Sharing is Crucial to Succesful Optimization

Fast approximate Bayesian inference for stable differential equation models

Vertex-disjoint cycles in tournaments

An improved Krylov eigenvalue strategy using the FEAST algorithm with inexact system solves

Hashtag-centric Immersive Search on Social Media

Temporal Action Labeling using Action Sets

Characterization of quadratic Cauchy-Stieltjes Kernel families by orthogonality of polynomials

Streaming Bayesian inference: theoretical limits and mini-batch approximate message-passing

Shalom’s property $H_{\mathrm{FD}}$ and extensions by $\mathbb{Z}$ of locally finite groups (Jérémie Brieussel and Tianyi Zheng)

Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?

Automating Carotid Intima-Media Thickness Video Interpretation with Convolutional Neural Networks

Millimeter Wave LOS Coverage Enhancements with Coordinated High-Rise Access Points

Parameter identification in Markov chain choice models

Stochastic Model Predictive Control: Output-Feedback, Duality and Guaranteed Performance

Prosodic Event Recognition using Convolutional Neural Networks with Context Information

Long range dependence of heavy tailed random functions

Conjecture $\mathcal{O}$ holds for the odd symplectic Grassmannian

Shuffle-compatible permutation statistics

On the fourth moment condition for Rademacher chaos

Double-Edge Factor Graphs: Definition, Properties, and Examples

Learning Bayes networks using interventional path queries in polynomial time and sample complexity

Hyperparameter Optimization: A Spectral Approach

Temporal Task Planning and Intermittent Communication Control of Mobile Robot Networks
