On The Robustness of a Neural Network

With the development of neural-network-based machine learning and its usage in mission-critical applications, voices are rising against the \textit{black box} aspect of neural networks as it becomes crucial to understand their limits and capabilities. With the rise of neuromorphic hardware, it is even more critical to understand how a neural network, as a distributed system, tolerates the failures of its computing nodes (neurons) and its communication channels (synapses). Experimentally assessing the robustness of neural networks involves the quixotic venture of testing all the possible failures on all the possible inputs, which ultimately hits a combinatorial explosion for the former and the impossibility of gathering all the possible inputs for the latter. In this paper, we prove an upper bound on the expected error of the output when a subset of neurons crashes. This bound involves dependencies on the network parameters that can be seen as being too pessimistic in the average case. It involves a polynomial dependency on the Lipschitz coefficient of the neurons' activation function, and an exponential dependency on the depth of the layer where a failure occurs. We back up our theoretical results with experiments illustrating the extent to which our prediction matches the dependencies between the network parameters and robustness. Our results show that the robustness of neural networks to the average crash can be estimated without the need either to test the network on all failure configurations or to access the training set used to train the network, both of which are practically impossible requirements.

A Unified Joint Matrix Factorization Framework for Data Integration

Nonnegative matrix factorization (NMF) is a powerful tool in exploratory data analysis for discovering hidden features and part-based patterns in high-dimensional data. NMF and its variants have been successfully applied in diverse fields such as pattern recognition, signal processing, data mining, and bioinformatics. Recently, NMF has been extended to analyze multiple matrices simultaneously. However, a unified framework is still lacking. In this paper, we introduce a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration. Next, we present four update algorithms to solve this framework. The merits and demerits of these algorithms are systematically explored. Furthermore, extensive computational experiments using both synthetic data and real data demonstrate the effectiveness of the JMF framework and related algorithms on pattern recognition and data mining.

DReLUs: Dual Rectified Linear Units

Rectified Linear Units (ReLUs) are widely used in feed-forward neural networks, and in convolutional neural networks in particular. However, they are rarely found in recurrent neural networks due to the unboundedness and the positive image of the rectified linear activation function. In this paper, we introduce Dual Rectified Linear Units (DReLUs), a novel type of rectified unit that comes with an unbounded positive and negative image. We show that we can successfully replace the tanh activation function in the recurrent step of quasi-recurrent neural networks. In addition, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. Therefore, we are able to stack up to eight quasi-recurrent layers, making it possible to improve the current state-of-the-art in character-level language modeling over architectures based on shallow Long Short-Term Memory (LSTM).
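
The abstract above does not give the formula for a dual rectified unit. The sketch below assumes one common formulation, in which the unit takes two pre-activations and returns the difference of their rectified values; the function names `relu` and `drelu` and the sample inputs are illustrative assumptions, not taken from the text. It only shows how such a unit can produce both positive and negative outputs while remaining exactly zero when both inputs are negative.

```python
def relu(x: float) -> float:
    """Standard rectified linear unit: max(0, x)."""
    return max(0.0, x)


def drelu(a: float, b: float) -> float:
    """Assumed dual rectified form: relu(a) - relu(b).

    The output is positive when the first rectified input dominates,
    negative when the second dominates, and exactly zero when both
    pre-activations are non-positive (a sparse activation).
    """
    return relu(a) - relu(b)


# Illustrative pre-activation pairs (hypothetical values).
samples = [(2.0, 0.5), (0.5, 2.0), (-1.0, -3.0)]
outputs = [drelu(a, b) for a, b in samples]
```

Under this assumed form the output range is unbounded in both directions, in contrast to tanh, which saturates at -1 and 1.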

An Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model

We study the active learning problem of top-k ranking from multi-wise comparisons under the popular multinomial logit model. Our goal is to identify the top-k items with high probability by adaptively querying sets for comparisons and observing the noisy output of the most preferred item from each comparison. To achieve this goal, we design a new active ranking algorithm without using any information about the underlying items’ preference scores. We also establish a matching lower bound on the sample complexity even when the set of preference scores is given to the algorithm. These two results together show that the proposed algorithm is instance optimal (up to logarithmic factors). Our work extends the existing literature on rank aggregation in three directions. First, instead of studying a static problem with fixed data, we investigate the top-k ranking problem in an active learning setting. Second, we provide instance optimality, which is a much stronger theoretical guarantee. Finally, we extend the pairwise comparison to the multi-wise comparison, which has not been fully explored in the ranking literature.
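
To make the query model concrete, the sketch below simulates a single multi-wise comparison under the multinomial logit model, where an item is reported as most preferred with probability proportional to its preference score within the queried set. It does not implement the paper's ranking algorithm; the function names, the score values, and the fixed random seed are all illustrative assumptions.

```python
import random


def mnl_choice_probabilities(scores, subset):
    """Probability that each item in `subset` is reported as most preferred.

    Under the multinomial logit model this is score_i / sum of scores in the set.
    """
    total = sum(scores[i] for i in subset)
    return {i: scores[i] / total for i in subset}


def query_comparison(scores, subset, rng):
    """Simulate one noisy multi-wise comparison and return the winning item."""
    items = list(subset)
    weights = [scores[i] for i in items]
    return rng.choices(items, weights=weights, k=1)[0]


# Hypothetical preference scores for four items (illustrative values only).
scores = {"a": 4.0, "b": 2.0, "c": 1.0, "d": 1.0}
queried = ["a", "b", "c"]
probs = mnl_choice_probabilities(scores, queried)

# Repeat the same query many times and count how often each item wins.
rng = random.Random(0)
wins = {item: 0 for item in queried}
for _ in range(10_000):
    wins[query_comparison(scores, queried, rng)] += 1
```

An active algorithm would choose which sets to query next based on counts like `wins`, which is the adaptive part that this sketch leaves out.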

Tensor Regression Networks

To date, most convolutional neural network architectures output predictions by flattening 3rd-order activation tensors, and applying fully-connected output layers. This approach has two drawbacks: (i) we lose rich, multi-modal structure during the flattening process and (ii) fully-connected layers require many parameters. We present the first attempt to circumvent these issues by expressing the output of a neural network directly as the result of a multi-linear mapping from an activation tensor to the output. By imposing low-rank constraints on the regression tensor, we can efficiently solve problems for which existing solutions are badly parametrized. Our proposed tensor regression layer replaces flattening operations and fully-connected layers by leveraging multi-modal structure in the data and expressing the regression weights via a low-rank tensor decomposition. Additionally, we combine tensor regression with tensor contraction to further increase efficiency. Augmenting the VGG and ResNet architectures, we demonstrate large reductions in the number of parameters with negligible impact on performance on the ImageNet dataset.
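
The parameter saving from a low-rank regression tensor can be illustrated with simple counting. The sketch below assumes the regression weights are stored in Tucker form (one core tensor plus one factor matrix per mode, including the output mode); the abstract only says "low-rank tensor decomposition", so the choice of Tucker, the activation shape, the ranks, and the function names are assumptions made for illustration. Bias terms are ignored in both counts.

```python
from math import prod


def flatten_fc_params(activation_shape, n_outputs):
    """Weights in a fully-connected layer applied to the flattened activation."""
    return prod(activation_shape) * n_outputs


def tucker_regression_params(activation_shape, n_outputs, ranks):
    """Weights in a regression tensor stored in (assumed) Tucker form.

    The weight tensor has one mode per activation dimension plus one output
    mode, so `ranks` must have len(activation_shape) + 1 entries.
    """
    dims = list(activation_shape) + [n_outputs]
    if len(ranks) != len(dims):
        raise ValueError("need one rank per activation mode plus the output mode")
    core = prod(ranks)                                   # core tensor
    factors = sum(d * r for d, r in zip(dims, ranks))    # one factor per mode
    return core + factors


# Hypothetical sizes: a 7 x 7 x 512 activation tensor mapped to 1000 outputs.
activation_shape = (7, 7, 512)
n_outputs = 1000
ranks = (3, 3, 64, 100)

dense = flatten_fc_params(activation_shape, n_outputs)
low_rank = tucker_regression_params(activation_shape, n_outputs, ranks)
ratio = dense / low_rank
```

The ratio depends entirely on the chosen ranks, and lower ranks save more parameters at the cost of a more restricted regression tensor.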

General Latent Feature Modeling for Data Exploration Tasks

This paper introduces a general Bayesian nonparametric latent feature model suitable to perform automatic exploratory analysis of heterogeneous datasets, where the attributes describing each object can be either discrete, continuous or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while it can be inferred in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure in the data. Third, the latent features in the model are binary-valued variables, easing the interpretability of the obtained latent features in data exploration tasks.

Supermetric Search

Metric search is concerned with the efficient evaluation of queries in metric spaces. In general, a large space of objects is arranged in such a way that, when a further object is presented as a query, those objects most similar to the query can be efficiently found. Most mechanisms rely upon the triangle inequality property of the metric governing the space. The triangle inequality property is equivalent to a finite embedding property, which states that any three points of the space can be isometrically embedded in two-dimensional Euclidean space. In this paper, we examine a class of semimetric spaces that are finitely four-embeddable in three-dimensional Euclidean space. In mathematics this property has been extensively studied and is generally known as the four-point property. All spaces with the four-point property are metric spaces, but they also have some stronger geometric guarantees. We coin the term supermetric space for them because, in terms of metric search, they are significantly more tractable. Supermetric spaces include all those governed by Euclidean, Cosine, Jensen-Shannon and Triangular distances, and are thus commonly used within many domains. In previous work we have given a generic mathematical basis for the supermetric property and shown how it can improve indexing performance for a given exact search structure. Here we present a full investigation into its use within a variety of different hyperplane partition indexing structures, and go on to show some more of its flexibility by examining a search structure whose partition and exclusion conditions are tailored, at each node, to suit the individual reference points and data set present there. Among the results given, we show a new best performance for exact search using a well-known benchmark.
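
The stated equivalence between the triangle inequality and three-point embeddability can be checked numerically. The sketch below places the first two points on the x-axis and solves for the third; a real solution exists exactly when the three distances satisfy the triangle inequality. It covers only the three-point case, not the four-point property the paper studies, and the function name and tolerance are illustrative assumptions.

```python
import math


def embed_triangle(d_ab, d_ac, d_bc, tol=1e-12):
    """Return 2D coordinates for points a, b, c with the given pairwise
    distances, or None if no isometric embedding in the plane exists.

    a is placed at the origin and b on the positive x-axis; c is then
    determined (up to reflection) by its distances to a and b.
    """
    if d_ab <= 0:
        raise ValueError("d_ab must be positive for this construction")
    x = (d_ab ** 2 + d_ac ** 2 - d_bc ** 2) / (2 * d_ab)
    y_squared = d_ac ** 2 - x ** 2
    if y_squared < -tol:
        return None  # triangle inequality violated: no real solution for y
    y = math.sqrt(max(y_squared, 0.0))
    return (0.0, 0.0), (d_ab, 0.0), (x, y)


# Distances 3, 4, 5 satisfy the triangle inequality and embed in the plane.
embedded = embed_triangle(3.0, 4.0, 5.0)

# Distances 1, 1, 3 violate it (1 + 1 < 3), so no embedding exists.
not_embedded = embed_triangle(1.0, 1.0, 3.0)
```

The analogous four-point test would place a fourth point in three dimensions, which is the stronger guarantee the abstract refers to.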

The Advantage of Evidential Attributes in Social Networks

Nowadays, there are many approaches designed for the task of detecting communities in social networks. Among them, some methods only consider the topological graph structure, while others make use of both the graph structure and the node attributes. In real-world networks, there are many uncertain and noisy attributes in the graph. In this paper, we present how we detect communities in graphs with uncertain attributes. In the first step, numerical, probabilistic, and evidential attributes are generated according to the graph structure. In the second step, noise is added to the attributes. We perform experiments on graphs with different types of attributes and compare the detection results in terms of the Normalized Mutual Information (NMI) values. The experimental results show that clustering with evidential attributes gives better results than clustering with probabilistic or numerical attributes. This illustrates the advantages of evidential attributes.

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

Domain adaptation is an important open problem in deep reinforcement learning (RL). In many scenarios of interest, data is hard to obtain, so agents may learn a source policy in a setting where data is readily available, with the hope that it generalises well to the target domain. We propose a new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before learning to act. DARLA’s vision is based on learning a disentangled representation of the observed environment. Once DARLA can see, it is able to acquire source policies that are robust to many domain shifts – even with no access to the target domain. DARLA significantly outperforms conventional baselines in zero-shot domain adaptation scenarios, an effect that holds across a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms (DQN, A3C and EC).

TensorLayer: A Versatile Library for Efficient Deep Learning Development

Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia, among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning the optimization process, preprocessing and organizing data, etc. TensorLayer is a versatile Python library that aims at helping researchers and engineers efficiently develop deep learning systems. It offers rich abstractions for neural networks, model and data management, and parallel workflow mechanisms. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub, and has helped people from academia and industry develop real-world applications of deep learning.

Monte-Carlo acceleration: importance sampling and hybrid dynamic systems
Combinatorial and Arithmetical Properties of the Restricted and Associated Bell and Factorial Numbers
Analogs of Linguistic Structure in Deep Representations
A note on marginal correlation based screening
Automatic Image Transformation for Inducing Affect
Patch-based Carcinoma Detection on Confocal Laser Endomicroscopy Images – A Cross-Site Robustness Assessment
Speeding-up ProbLog’s Parameter Learning
How much baseline correction do we need in ERP research? Extended GLM model can replace baseline correction while lifting its limits
The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations
Persistent Cache-oblivious Streaming Indexes
Rook theory of the finite general linear group
Fast Label Extraction in the CDAWG
Physical problem solving: Joint planning with symbolic, geometric, and dynamic constraints
An Algorithm for the 2D Radix-2 Sliding Window Fourier Transform
An improved approach to Bayesian computer model calibration and prediction
Bayesian hierarchical weighting adjustment and survey inference
Estimating parameters associated with monotone properties
Quality-Driven Resource Allocation for Full-Duplex Delay-Constrained Wireless Video Transmissions
Optimal Testing of Self-Driving Cars
Proceedings Sixteenth Conference on Theoretical Aspects of Rationality and Knowledge
Efficient Yet Deep Convolutional Neural Networks for Semantic Segmentation
Navigability with Imperfect Information
Positive Semidefinite Univariate Matrix Polynomials
SLEEPNET: Automated Sleep Staging System via Deep Learning
Dragon: A Computation Graph Virtual Machine Based Deep Learning Framework
Polynomial-Time Approximation Schemes for k-Center and Bounded-Capacity Vehicle Routing in Metrics with Bounded Highway Dimension
A Preamble Collision Resolution Scheme via Tagged Preambles for Cellular IoT/M2M Communications
A Change-Sensitive Algorithm for Maintaining Maximal Bicliques in a Dynamic Bipartite Graph
MMGAN: Manifold Matching Generative Adversarial Network for Generating Images
A universal tree-based network with the minimum number of reticulations
An Exploration of Approaches to Integrating Neural Reranking Models in Multi-Stage Ranking Architectures
A General and Yet Efficient Scheme for Sub-Nyquist Radar Processing
Thermoelectricity near Anderson localization transitions
Unsupervised Motion Artifact Detection in Wrist-Measured Electrodermal Activity Data
Fast Deep Matting for Portrait Animation on Mobile Phone
Fast calculation of entropy with Zhang’s estimator
Online Wideband Spectrum Sensing Using Sparsity
Gamma-positivity and Rees product homology
Variable Selection for High-dimensional Generalized Linear Models using an Iterated Conditional Modes/Medians Algorithm
Practical Adversarial Combinatorial Bandit Algorithm via Compression of Decision Sets
Graph-Based Classification of Omnidirectional Images
Hybrid Precoding in Millimeter Wave Systems: How Many Phase Shifters Are Needed?
On Non-Orthogonal Multiple Access with Finite-Alphabet Inputs in Z-Channels
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities
Simultaneous Sparse Approximation Using an Iterative Method with Adaptive Thresholding
Optimal Control with State Constraints for Stochastic Evolution Equation with Jumps in Hilbert Space
Cascaded Scene Flow Prediction using Semantic Segmentation
Learning Sparse Representations in Reinforcement Learning with Sparse Coding
Discrete Latent Factor Model for Cross-Modal Hashing
Asymmetric Deep Supervised Hashing
Integrating car path optimization with train formation plan: a non-linear binary programming model and simulated annealing based heuristics
Asymptotic forecast uncertainty and the unstable subspace in the presence of additive model error
Directed, cylindric and radial Brownian webs
On permutation-invariance of limit theorems
Prior specification for binary Markov mesh models
Structure-Preserving Image Super-resolution via Contextualized Multi-task Learning
Declarative Sequential Pattern Mining of Care Pathways
RankIQA: Learning from Rankings for No-reference Image Quality Assessment
Can string kernels pass the test of time in Native Language Identification?
Modelling the Scene Dependent Imaging in Cameras with a Deep Neural Network
A hierarchical Bayesian model for predicting host-parasite interactions using phylogenetic information
Performance Comparison of Various STM Concurrency Control Protocols Using Synchrobench
Notes on optimal approximations for importance sampling
Deep Interactive Region Segmentation and Captioning
The reverse mathematics of wqos and bqos
On Generalizations of ($k_1,k_2$)-runs
Updating Singular Value Decomposition for Rank One Matrix Perturbation
High-Dimensional Simplexes for Supermetric Search
Edge-coloring linear hypergraphs with medium-sized edges
Time Warping and Interpolation Operators for Piecewise Smooth Maps
Product recognition in store shelves as a sub-graph isomorphism problem
Prediction of amino acid side chain conformation using a deep neural network
Sequential design of experiments to estimate a probability of exceeding a threshold in a multi-fidelity stochastic simulator
A Novel Transfer Learning Approach upon Hindi, Arabic, and Bangla Numerals using Convolutional Neural Networks
Reduction of Overfitting in Diabetes Prediction Using Deep Learning Neural Network
Composition problems for braids: Membership, Identity and Freeness
What You Sketch Is What You Get: 3D Sketching using Multi-View Deep Volumetric Prediction
Maximum entropy based non-negative optoacoustic tomographic image reconstruction
Location of maximizers of eigenfunctions of fractional Schrödinger’s equation
A Harmony Search Based Wrapper Feature Selection Method for Holistic Bangla word Recognition
Detecting and classifying lesions in mammograms with Deep Learning
Gaussian Processes for Individualized Continuous Treatment Rule Estimation
A Note on Implementing a Special Case of the LEAR Covariance Model in Standard Software
Caching Policy for Cache-enabled D2D Communications by Learning User Preference
Wave equation with a coloured stable noise
Quasi-stationarity and quasi-ergodicity for discrete-time Markov chains with absorbing boundaries moving periodically
Cooperative Global Robust Output Regulation for a Class of Nonlinear Multi-Agent Systems by Distributed Event-Triggered Control
Non-Stationary Bandits with Habituation and Recovery Dynamics
SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set
Context-Independent Polyphonic Piano Onset Transcription with an Infinite Training Dataset
Comparing the speed of convergence for Wong-Zakai-type approximations of Itô and Stratonovich stochastic differential equations
Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal Associations
An augmented Lagrange method for ill-posed elliptic state constrained optimal control problems with sparse controls
Delocalization of eigenvectors of random matrices. Lecture notes
An Optimal Control Formulation of Pulse-Based Control Using Koopman Operator
Equilibrium Liquidity Premia
Implicit Entity Linking in Tweets
MST in O(1) Rounds of the Congested Clique
Randomized Similar Triangles Method: A Unifying Framework for Accelerated Randomized Optimization Methods (Coordinate Descent, Directional Search, Derivative-Free Method)
A new family of MRD-codes
Expected number and distribution of critical points of real Lefschetz pencils
Reproducing kernels and choices of associated feature spaces, in the form of $L^{2}$-spaces
Dynamic Clustering Algorithms via Small-Variance Analysis of Markov Chain Mixture Models
Making the best of data derived from a daily practice in clinical legal medicine for research and practice – the example of Spe3dLab
Fast Distributed Approximation for Max-Cut
Loop expansion around the Bethe approximation through the $M$-layer construction
A weak law of large numbers for estimating the correlation in bivariate Brownian semistationary processes
A central limit theorem for the realised covariation of a bivariate Brownian semistationary process
On the proximity operator of the sum of two convex functions
Asymptotic variance for Random walk Metropolis chains in high dimensions: logarithmic growth via the Poisson equation
Sensitivity analysis of variational inequalities via twice epi-differentiability and proto-differentiability of the proximity operator
Markov Chain Monte Carlo sampling for conditional tests: A link between permutation tests and algebraic statistics
A Guided Spatial Transformer Network for Histology Cell Differentiation
Extracting High-Dimensional Dynamics from Limited Data
Inference from Randomized Transmissions by Many Backscatter Sensors
On the ‘Poisson Trick’ and its Extensions for Fitting Multinomial Regression Models
$p_c$, $p_u$ and graph limits
Robust Pricing and Hedging around the Globe
Unique Continuation through Hyperplane for Higher Order Parabolic and Schrödinger Equations
A Robust Multi-Batch L-BFGS Method for Machine Learning
Direct Load Control of Thermostatically Controlled Loads Based on Sparse Observations Using Deep Reinforcement Learning
Interpatient Respiratory Motion Model Transfer for Virtual Reality Simulations of Liver Punctures
Video Highlight Prediction Using Audience Chat Reactions
Quantum machine learning: a classical perspective