The (Un)reliability of saliency methods

Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step, adding a constant shift to the input data, to show that a transformation with no effect on the model can cause numerous methods to produce incorrect attributions. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.
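
The constant-shift argument can be illustrated with a toy case. The sketch below is not the paper's experimental setup: it assumes a hypothetical linear model f(x) = w . x + b, and every name in it is illustrative. A second model receives inputs shifted by a constant and has its bias adjusted to cancel the shift, so both models produce the same output. The plain gradient is unchanged, while the gradient-times-input attribution changes.

```python
# Toy illustration with a hypothetical linear model f(x) = w . x + b.
# All names and values are assumptions made for this sketch.

def predict(w, b, x):
    """Output of the linear model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def gradient_saliency(w, x):
    """Gradient of the output w.r.t. the input; for a linear model this is w."""
    return list(w)

def gradient_times_input(w, x):
    """Elementwise product of the gradient and the input."""
    return [wi * xi for wi, xi in zip(w, x)]

w = [2.0, -1.0, 0.5]
b = 0.25
x = [1.0, 3.0, -2.0]
shift = 4.0  # constant added to every input feature

# Second model: sees shifted inputs, with the bias adjusted to cancel the shift.
x_shifted = [xi + shift for xi in x]
b_shifted = b - shift * sum(w)

out_original = predict(w, b, x)
out_shifted = predict(w, b_shifted, x_shifted)

grad_original = gradient_saliency(w, x)
grad_shifted = gradient_saliency(w, x_shifted)

gxi_original = gradient_times_input(w, x)
gxi_shifted = gradient_times_input(w, x_shifted)

print(out_original == out_shifted)    # the two models agree on the prediction
print(grad_original == grad_shifted)  # the gradient attribution is unchanged
print(gxi_original, gxi_shifted)      # gradient-times-input differs
```

In this toy case the two models are functionally identical, so any difference between the two attributions comes from the attribution method rather than from the model.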

Structured Generative Adversarial Networks

We study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive labeled instances as supervision or are unable to accurately control the semantics of generated samples. We propose structured generative adversarial networks (SGANs) for semi-supervised conditional generative modeling. SGAN assumes the data x is generated conditioned on two independent latent variables: y that encodes the designated semantics, and z that contains other factors of variation. To ensure disentangled semantics in y and z, SGAN builds two collaborative games in the hidden space to minimize the reconstruction error of y and z, respectively. Training SGAN also involves solving two adversarial games that have their equilibrium concentrating at the true joint data distributions p(x, z) and p(x, y), avoiding distributing the probability mass diffusely over data space, a problem that MLE-based methods may suffer from. We assess SGAN by evaluating its trained networks and its performance on downstream tasks. We show that SGAN delivers a highly controllable generator and disentangled representations; it also establishes state-of-the-art results across multiple datasets when applied to semi-supervised image classification (1.27%, 5.73%, 17.26% error rates on MNIST, SVHN and CIFAR-10 using 50, 1000 and 4000 labels, respectively). Benefiting from the separate modeling of y and z, SGAN can generate images that have high visual quality and strictly follow the designated semantics, and can be extended to a wide spectrum of applications, such as style transfer.

Conditional fiducial models

The fiducial is not unique in general, but we prove that in a restricted class of models it is uniquely determined by the sampling distribution of the data. In particular, it does not depend on the choice of a data generating model. The arguments lead to a generalization of the classical formula found by Fisher (1930). The restricted class includes cases with discrete distributions, the case of the shape parameter in the Gamma distribution, and also the case of the correlation coefficient in a bivariate Gaussian model. One of the examples can also be used in a pedagogical context to demonstrate possible difficulties with likelihood-, Bayesian-, and bootstrap-inference. Examples that demonstrate non-uniqueness are also presented. It is explained that they can be seen as cases with restrictions on the parameter space. Motivated by this, the concept of a conditional fiducial model is introduced. This class of models includes the common case of iid samples from a one-parameter model investigated by Hannig (2013), the structural group models investigated by Fraser (1968), and also certain models discussed by Fisher (1973) in his final writing on the subject.

Neural Discrete Representation Learning

Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of ‘posterior collapse’ — where the latents are ignored when they are paired with a powerful autoregressive decoder — typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
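
The discretisation step can be sketched as a nearest-neighbour lookup in a codebook, which is the standard vector quantisation operation. This is an assumption about the mechanism, since the abstract does not spell out the details. The function name `quantize` and the codebook values are hypothetical, and the sketch omits training entirely.

```python
# Minimal sketch of vector quantisation as a nearest-neighbour codebook lookup.
# The function name and the codebook values are assumptions for illustration.

def quantize(encoder_output, codebook):
    """Return (index, code vector) of the codebook entry closest to
    encoder_output in squared Euclidean distance."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    index = min(
        range(len(codebook)),
        key=lambda k: squared_distance(encoder_output, codebook[k]),
    )
    return index, codebook[index]

codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
index, code = quantize([0.9, 1.2], codebook)
print(index, code)
```

The integer `index` is the discrete code; a decoder would receive the corresponding vector `code` in place of the continuous encoder output.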

A Classification-Based Perspective on GAN Distributions

A fundamental, and still largely unanswered, question in the context of Generative Adversarial Networks (GANs) is whether GANs are actually able to capture the key characteristics of the datasets they are trained on. The current approaches to examining this issue require significant human supervision, such as visual inspection of sampled images, and often offer only fairly limited scalability. In this paper, we propose new techniques that employ a classification-based perspective to evaluate synthetic GAN distributions and their capability to accurately reflect the essential properties of the training data. These techniques require only minimal human supervision and can easily be scaled and adapted to evaluate a variety of state-of-the-art GANs on large, popular datasets. Our analysis indicates that GANs have significant problems in reproducing the more distributional properties of the training dataset. In particular, the diversity of such synthetic data is orders of magnitude smaller than that of the true data.

SPARK: Static Program Analysis Reasoning and Retrieving Knowledge

Program analysis is a technique to reason about programs without executing them, and it has various applications in compilers, integrated development environments, and security. In this work, we present a machine learning pipeline that induces a security analyzer for programs by example. The security analyzer determines whether a program is secure or insecure based on symbolic rules that were deduced by our machine learning pipeline. The machine learning pipeline has two stages, consisting of a Recurrent Neural Network (RNN) and an Extractor that converts an RNN to symbolic rules. To evaluate the quality of the learned symbolic rules, we propose a sampling-based similarity measurement between two infinite regular languages. We conduct a case study using real-world data. In this work, we discuss the limitations of existing techniques and possible improvements in the future. The results show that with sufficient training data and a fair distribution of program paths it is feasible to deduce symbolic security rules for the OpenJDK library with millions of lines of code.

PS-DBSCAN: An Efficient Parallel DBSCAN Algorithm Based on Platform Of AI (PAI)

We present PS-DBSCAN, a communication-efficient parallel DBSCAN algorithm that combines the disjoint-set data structure and the Parameter Server framework in Platform of AI (PAI). Since data points within the same cluster may be distributed over different workers, which results in several disjoint sets, merging them incurs large communication costs. In our algorithm, we employ a fast global union approach to union the disjoint sets to alleviate the communication burden. Experiments on datasets of different scales demonstrate that PS-DBSCAN outperforms PDSDBSCAN with 2-10 times speedup on communication efficiency. We have released our PS-DBSCAN in an algorithm platform called Platform of AI (PAI – ) in Alibaba Cloud. We have also demonstrated how to use the method in PAI.
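
The role of the disjoint-set structure can be sketched in a single process. This is a generic union-find, not the PS-DBSCAN global union approach and not Parameter Server code. The worker names and the point pairs are hypothetical; each pair stands for two points that a worker has decided belong to the same cluster.

```python
# Single-process sketch of merging per-worker cluster fragments with a
# disjoint-set (union-find). Worker names and point pairs are hypothetical.

class DisjointSet:
    def __init__(self):
        self.parent = {}

    def find(self, item):
        """Return the representative of item's set, compressing the path."""
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; the smaller root id wins."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[max(root_a, root_b)] = min(root_a, root_b)

# Each worker reports pairs of points it found to be in the same cluster.
worker_pairs = {
    "worker_a": [(1, 2), (2, 3)],
    "worker_b": [(3, 4), (5, 6)],
}

ds = DisjointSet()
for pairs in worker_pairs.values():
    for a, b in pairs:
        ds.union(a, b)

# Final cluster label of every point is the representative of its set.
labels = {point: ds.find(point) for point in sorted(ds.parent)}
print(labels)
```

Point 3 appears on both workers, so the fragments {1, 2, 3} and {3, 4} collapse into one cluster, while {5, 6} stays separate. In a distributed setting each such merge implies communication, which is the cost the abstract says the algorithm aims to reduce.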

Accountability of AI Under the Law: The Role of Explanation

The ubiquity of systems using artificial intelligence or ‘AI’ has brought increasing attention to how those systems should be regulated. The choice of how to regulate AI systems will require care. AI systems have the potential to synthesize large amounts of data, allowing for greater levels of personalization and precision than ever before; applications range from clinical decision support to autonomous driving and predictive policing. That said, there exist legitimate concerns about the intentional and unintentional negative consequences of AI systems. There are many ways to hold AI systems accountable. In this work, we focus on one: explanation. Questions about a legal right to explanation from AI systems were recently debated in the EU General Data Protection Regulation, and thus thinking carefully about when and how explanation from AI systems might improve accountability is timely. In this work, we review contexts in which explanation is currently required under the law, and then list the technical considerations that must be addressed if we desire AI systems that can provide the kinds of explanations that are currently required of humans.

A mathematical framework for graph signal processing of time-varying signals

We propose a general framework from which to understand the design of filters for time-series signals supported on graphs. We organize linear, time-invariant filters into three increasingly restrictive classes of operators: linear time-invariant filters, linear time-invariant filters which commute with a graph operator, and linear time-invariant filters which are functions of a graph operator. Using spectral theory, we show that these yield $\mathcal{O}(n^2)$, $\mathcal{O}(n)$, and $\mathcal{O}(1)$ design parameters respectively. We consider arbitrary graph operators so as to accommodate non-self-adjoint weight operators and all classes of graph Laplacian-based operators. We provide an example application of each class of filter.
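
The most restrictive class can be illustrated on the graph part alone, ignoring the time dimension. A polynomial in a graph operator S is a function of S, is described by a fixed number of coefficients regardless of the graph size, and commutes with S. The sketch below assumes a hypothetical 3-node weighted directed cycle as the (non-symmetric) operator; the helper names and coefficients are illustrative and do not come from the paper.

```python
# Sketch: a polynomial filter H = c0*I + c1*S + c2*S^2 of a graph operator S
# commutes with S. The operator and coefficients are hypothetical.

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def polynomial_filter(operator, coefficients):
    """Return sum_k coefficients[k] * operator^k."""
    n = len(operator)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # S^0 = I
    result = [[0] * n for _ in range(n)]
    for c in coefficients:
        result = [
            [result[i][j] + c * power[i][j] for j in range(n)]
            for i in range(n)
        ]
        power = matmul(power, operator)
    return result

# Weighted directed 3-cycle: a non-symmetric graph operator.
S = [[0, 1, 0],
     [0, 0, 2],
     [3, 0, 0]]

H = polynomial_filter(S, [1, -2, 4])

print(H)
print(matmul(H, S) == matmul(S, H))  # the filter commutes with the operator
```

Here three coefficients specify the filter whatever the number of nodes, which is the sense in which a function of the operator has a number of design parameters independent of n.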

Metrics for Deep Generative Models

Neural samplers such as variational autoencoders (VAEs) or generative adversarial networks (GANs) approximate distributions by transforming samples from a simple random source (the latent space) to samples from a more complex distribution represented by a dataset. While the manifold hypothesis implies that the density induced by a dataset contains large regions of low density, the training criteria of VAEs and GANs will make the latent space densely covered. Consequently, points that are separated by low-density regions in observation space will be pushed together in latent space, making stationary distances poor proxies for similarity. We transfer ideas from Riemannian geometry to this setting, letting the distance between two points be the shortest path on a Riemannian manifold induced by the transformation. The method yields a principled distance measure, provides a tool for visual inspection of deep generative models, and an alternative to linear interpolation in latent space. In addition, it can be applied for robot movement generalization using previously learned skills. The method is evaluated on a synthetic dataset with known ground truth; on a simulated robot arm dataset; on human motion capture data; and on a generative model of handwritten digits.

ResBinNet: Residual Binary Neural Network

Recent efforts on training light-weight binary neural networks offer promising execution/memory efficiency. This paper introduces ResBinNet, which is a composition of two interlinked methodologies aiming to address the slow convergence speed and limited accuracy of binary convolutional neural networks. The first method, called residual binarization, learns a multi-level binary representation for the features within a certain neural network layer. The second method, called temperature adjustment, gradually binarizes the weights of a particular layer. The two methods jointly learn a set of soft-binarized parameters that improve the convergence rate and accuracy of binary neural networks. We corroborate the applicability and scalability of ResBinNet by implementing a prototype hardware accelerator. The accelerator is reconfigurable in terms of the numerical precision of the binarized features, offering a trade-off between runtime and inference accuracy.
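
The idea of a multi-level binary representation can be sketched as repeatedly binarizing the residual left by the previous levels. This sketch makes one assumption that is not taken from the abstract: each level's scale is set to the mean absolute residual. That choice is hypothetical and is made here only so the example needs no training; the function names are illustrative as well.

```python
# Sketch of a multi-level (residual) binary approximation of a feature vector.
# ASSUMPTION: each level's scale is the mean absolute residual. This choice is
# hypothetical and is used only to keep the example training-free.

def residual_binarize(features, levels):
    """Approximate features as a sum of `levels` scaled +/-1 vectors."""
    residual = list(features)
    approximation = [0.0] * len(features)
    for _ in range(levels):
        scale = sum(abs(r) for r in residual) / len(residual)
        signs = [1.0 if r >= 0 else -1.0 for r in residual]
        approximation = [a + scale * s for a, s in zip(approximation, signs)]
        residual = [r - scale * s for r, s in zip(residual, signs)]
    return approximation

def squared_error(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

features = [0.9, -0.3, 0.1, -1.5]
errors = [
    squared_error(features, residual_binarize(features, levels))
    for levels in (1, 2, 3)
]
print(errors)  # the approximation error shrinks as levels are added
```

Each extra level costs one more bit per feature and reduces the approximation error on this example, which mirrors the trade-off between runtime and accuracy that the abstract attributes to the numerical precision of the binarized features.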

Lifelong Learning by Adjusting Priors

In representational lifelong learning an agent aims to continually learn to solve novel tasks while updating its representation in light of previous tasks. Under the assumption that future tasks are ‘related’ to previous tasks, representations should be learned in such a way that they capture the common structure across learned tasks, while allowing the learner sufficient flexibility to adapt to novel aspects of a new task. We develop a framework for lifelong learning in deep neural networks that is based on generalization bounds, developed within the PAC-Bayes framework. Learning takes place through the construction of a distribution over networks based on the tasks seen so far, and its utilization for learning a new task. Thus, prior knowledge is incorporated through setting a history-dependent prior for novel tasks. We develop a gradient-based algorithm implementing these ideas, based on minimizing an objective function motivated by generalization bounds, and demonstrate its effectiveness through numerical examples. In addition to establishing the improved performance available through lifelong learning, we demonstrate the intuitive way by which prior information is manifested at different levels of the network.

Distributed Graph Clustering and Sparsification

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design methods for graph clustering. Most of these methods, however, are based on complicated spectral techniques or convex optimisation, and cannot be directly applied for clustering many networks that occur in practice, whose information is often collected on different sites. Designing a simple and distributed clustering algorithm is of great interest, and has wide applications for processing big datasets. In this paper we present a simple and distributed algorithm for graph clustering: for a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds, and recovers a partition of the graph close to optimal. One of the main components behind our algorithm is a sampling scheme that, given a dense graph as input, produces a sparse subgraph that provably preserves the cluster-structure of the input. Compared with previous sparsification algorithms that require Laplacian solvers or involve combinatorial constructions, this component is easy to implement in a distributed way and runs fast in practice.

Bayesian latent Gaussian graphical models for mixed data with marginal prior information
On the Steady State of Continuous Time Stochastic Opinion Dynamics
Correcting Nuisance Variation using Wasserstein Distance
An algebraic formulation of the locality principle in renormalisation
Set-to-Set Hashing with Applications in Visual Recognition
Multivariate stochastic integrals with respect to independently scattered random measures on δ-rings
Balas formulation for the union of polytopes is optimal
Multi-Mention Learning for Reading Comprehension with Neural Cascades
Moderate maximal inequalities for the Ornstein-Uhlenbeck process
Acceleration of tensor-product operations for high-order finite element methods
Sparse-View X-Ray CT Reconstruction Using $\ell_1$ Prior with Learned Transform
Variance-Aware Optimal Power Flow
Feynman-Kac formula for the stochastic Bessel operator
Weight-Based Variable Ordering in the Context of High-Level Consistencies
Does Phase Matter For Monaural Source Separation?
Binary Bouncy Particle Sampler
A Comparison of Feature-Based and Neural Scansion of Poetry
Deep Air Learning: Interpolation, Prediction, and Feature Analysis of Fine-grained Air Quality
Monotone bargaining is Nash-solvable
Deep Active Learning over the Long Tail
Learning Linear Dynamical Systems via Spectral Filtering
Selective inference for the problem of regions via multiscale bootstrap
Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting
Automatic Query Image Disambiguation for Content-Based Image Retrieval
Running Time Analysis of the (1+1)-EA for OneMax and LeadingOnes under Bit-wise Noise
Energy-Delay Efficient Power Control in Wireless Networks
The Computational Complexity of Finding Separators in Temporal Graphs
Deep Reinforcement Learning for Resource Allocation in V2V Communications
The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques
From which world is your graph?
Analysis of Approximate Stochastic Gradient Using Quadratic Constraints and Sequential Semidefinite Programs
Partial correlation graphs and the neighborhood lattice
The quasi principal rank characteristic sequence
AxonDeepSeg: automatic axon and myelin segmentation from microscopy data using convolutional neural networks
In-Bed Pose Estimation: Deep Learning with Shallow Dataset
Towards Neural Machine Translation with Partially Aligned Corpora
Wireless Network Simplification: The Performance of Routing
Cost-Efficient and Robust On-Demand Video Transcoding Using Heterogeneous Cloud Services
Stationary Harmonic Measure and DLA in the Upper half Plane
Genetic Policy Optimization
On sets of zero stationary harmonic measure
A Simply Exponential Upper Bound on the Maximum Number of Stable Matchings
Sparsity, variance and curvature in multi-armed bandits
A Taught-Obesrve-Ask (TOA) Method for Object Detection with Critical Supervision
Elasticutor: Rapid Elasticity for Realtime Stateful Stream Processing
Rainbow saturation and graph capacities
Dual Language Models for Code Mixed Speech Recognition
Optimal Pricing-Based Edge Computing Resource Management in Mobile Blockchain
A Socially-Aware Incentive Mechanism for Mobile Crowdsensing Service Market
Shadow Tomography of Quantum States
Competition and Cooperation Analysis for Data Sponsored Market: A Network Effects Model
Divisor graph of complement of Gamma(R)
On Automata Recognizing Birecurrent Sets
Multi-Glimpse LSTM with Color-Depth Feature Fusion for Human Detection
Existence and uniqueness for Mean Field Games with state constraints
Edge precoloring extension of hypercubes
Compressing Word Embeddings via Deep Compositional Code Learning
Moving Block and Tapered Block Bootstrap for Functional Time Series with an Application to the K-Sample Mean Problem
Restricted extension of sparse partial edge colorings of hypercubes
The Minimum Distance of Some Narrow-Sense Primitive BCH Codes
On the Capacity of SWIPT Systems with a Nonlinear Energy Harvesting Circuit
Estimation of Zipf parameter by means of a sequence of counts of different words
k-server via multiscale entropic regularization
Metric-locating-dominating partitions in graphs
Stratified exponential integrator for modulated nonlinear Schrödinger equations
Cost-Optimal Operation of Energy Storage Units: Impact of Uncertainties and Robust Estimator
$Ω$-Net: Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks
Ubiquity of macroscopic chaos in balanced networks of spiking neurons
One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis
On determinantal ideals and algebraic dependence
A Rudimentary Model for Low-Latency Anonymous Communication Systems
Motion Artifact Detection in Confocal Laser Endomicroscopy Images
End-to-end Flow Correlation Tracking with Spatial-temporal Attention
Spintronics based Stochastic Computing for Efficient Bayesian Inference System
Structured Variational Inference for Coupled Gaussian Processes
Proximal-Like Incremental Aggregated Gradient Method with Linear Convergence under Bregman Distance Growth Conditions
Exact controllability of stochastic differential equations with multiplicative noise
The Varchenko Determinant of a Coxeter Arrangement
Learning Filterbanks from Raw Speech for Phone Recognition
The Bane of Low-Dimensionality Clustering
Invariance entropy for a class of partially hyperbolic sets
Optimal actuator design based on shape calculus
Vertices with the Second Neighborhood Property in Eulerian Digraphs
New Bounds on the Biplanar Crossing Number of Low-dimensional Hypercubes
Convolutional Drift Networks for Video Classification
Robust Decoding from 1-Bit Compressive Sampling with Least Squares
Lonely runners in function fields
Background Subtraction via Fast Robust Matrix Completion
Toward real-time data query systems in HEP
A Tight Approximation for Fully Dynamic Bin Packing without Bundling
Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning
Bayesian Nonparametric Mixed Effects Models in Microbiome Data Analysis
Degree-regular triangulations of surfaces
Distributed Unmixing of Hyperspectral Data With Sparsity Constraint
The Robustness of LWPP and WPP, with an Application to Graph Reconstruction
Renormalization Methods for Random Walk in a Strong Mixing Environment: Kalikow and T Conditions Reloaded