With the development of neural networks based machine learning and their usage in mission critical applications, voices are rising against the \textit{black box} aspect of neural networks as it becomes crucial to understand their limits and capabilities. With the rise of neuromorphic hardware, it is even more critical to understand how a neural network, as a distributed system, tolerates the failures of its computing nodes, neurons, and its communication channels, synapses. Experimentally assessing the robustness of neural networks involves the quixotic venture of testing all the possible failures, on all the possible inputs, which ultimately hits a combinatorial explosion for the first, and the impossibility to gather all the possible inputs for the second. In this paper, we prove an upper bound on the expected error of the output when a subset of neurons crashes. This bound involves dependencies on the network parameters that can be seen as being too pessimistic in the average case. It involves a polynomial dependency on the Lipschitz coefficient of the neurons activation function, and an exponential dependency on the depth of the layer where a failure occurs. We back up our theoretical results with experiments illustrating the extent to which our prediction matches the dependencies between the network parameters and robustness. Our results show that the robustness of neural networks to the average crash can be estimated without the need to neither test the network on all failure configurations, nor access the training set used to train the network, both of which are practically impossible requirements.
Nonnegative matrix factorization (NMF) is a powerful tool in data exploratory analysis by discovering the hidden features and part-based patterns from high-dimensional data. NMF and its variants have been successfully applied into diverse fields such as pattern recognition, signal processing, data mining, bioinformatics and so on. Recently, NMF has been extended to analyze multiple matrices simultaneously. However, a unified framework is still lacking. In this paper, we introduce a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration. Next, we present four update algorithms to solve this framework. The merits and demerits of these algorithms are systematically explored. Furthermore, extensive computational experiments using both synthetic data and real data demonstrate the effectiveness of JMF framework and related algorithms on pattern recognition and data mining.
Rectified Linear Units (ReLUs) are widely used in feed-forward neural networks, and in convolutional neural networks in particular. However, they can be rarely found in recurrent neural networks due to the unboundedness and the positive image of the rectified linear activation function. In this paper, we introduce Dual Rectified Linear Units (DReLUs), a novel type of rectified unit that comes with a positive and negative image that is unbounded. We show that we can successfully replace the tanh activation function in the recurrent step of quasi recurrent neural networks. In addition, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. Therefore, we are able to stack up to eight quasi recurrent layers, making it possible to improve the current state-of-the-art in character-level language modeling over architectures based on shallow Long Short-Term Memory (LSTM).
We study the active learning problem of top-$k$ ranking from multi-wise comparisons under the popular multinomial logit model. Our goal is to identify the top-$k$ items with high probability by adaptively querying sets for comparisons and observing the noisy output of the most preferred item from each comparison. To achieve this goal, we design a new active ranking algorithm without using any information about the underlying items’ preference scores. We also establish a matching lower bound on the sample complexity even when the set of preference scores is given to the algorithm. These two results together show that the proposed algorithm is instance optimal (up to logarithmic factors). Our work extends the existing literature on rank aggregation in three directions. First, instead of studying a static problem with fixed data, we investigate the top-$k$ ranking problem in an active learning setting. Second, we provide the instance optimality, which is a much stronger theoretical guarantee. Finally, we extend the pairwise comparison to the multi-wise comparison, which has not been fully explored in ranking literature.
To date, most convolutional neural network architectures output predictions by flattening 3rd-order activation tensors, and applying fully-connected output layers. This approach has two drawbacks: (i) we lose rich, multi-modal structure during the flattening process and (ii) fully-connected layers require many parameters. We present the first attempt to circumvent these issues by expressing the output of a neural network directly as the the result of a multi-linear mapping from an activation tensor to the output. By imposing low-rank constraints on the regression tensor, we can efficiently solve problems for which existing solutions are badly parametrized. Our proposed tensor regression layer replaces flattening operations and fully-connected layers by leveraging multi-modal structure in the data and expressing the regression weights via a low rank tensor decomposition. Additionally, we combine tensor regression with tensor contraction to further increase efficiency. Augmenting the VGG and ResNet architectures, we demonstrate large reductions in the number of parameters with negligible impact on performance on the ImageNet dataset.
This paper introduces a general Bayesian non- parametric latent feature model suitable to per- form automatic exploratory analysis of heterogeneous datasets, where the attributes describing each object can be either discrete, continuous or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while can be inferred in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure in the data. Third, the latent features in the model are binary-valued variables, easing the interpretability of the obtained latent features in data exploration tasks.
Metric search is concerned with the efficient evaluation of queries in metric spaces. In general,a large space of objects is arranged in such a way that, when a further object is presented as a query, those objects most similar to the query can be efficiently found. Most mechanisms rely upon the triangle inequality property of the metric governing the space. The triangle inequality property is equivalent to a finite embedding property, which states that any three points of the space can be isometrically embedded in two-dimensional Euclidean space. In this paper, we examine a class of semimetric space which is finitely four-embeddable in three-dimensional Euclidean space. In mathematics this property has been extensively studied and is generally known as the four-point property. All spaces with the four-point property are metric spaces, but they also have some stronger geometric guarantees. We coin the term supermetric space as, in terms of metric search, they are significantly more tractable. Supermetric spaces include all those governed by Euclidean, Cosine, Jensen-Shannon and Triangular distances, and are thus commonly used within many domains. In previous work we have given a generic mathematical basis for the supermetric property and shown how it can improve indexing performance for a given exact search structure. Here we present a full investigation into its use within a variety of different hyperplane partition indexing structures, and go on to show some more of its flexibility by examining a search structure whose partition and exclusion conditions are tailored, at each node, to suit the individual reference points and data set present there. Among the results given, we show a new best performance for exact search using a well-known benchmark.
Nowadays, there are many approaches designed for the task of detecting communities in social networks. Among them, some methods only consider the topological graph structure, while others take use of both the graph structure and the node attributes. In real-world networks, there are many uncertain and noisy attributes in the graph. In this paper, we will present how we detect communities in graphs with uncertain attributes in the first step. The numerical, probabilistic as well as evidential attributes are generated according to the graph structure. In the second step, some noise will be added to the attributes. We perform experiments on graphs with different types of attributes and compare the detection results in terms of the Normalized Mutual Information (NMI) values. The experimental results show that the clustering with evidential attributes gives better results comparing to those with probabilistic and numerical attributes. This illustrates the advantages of evidential attributes.
Domain adaptation is an important open problem in deep reinforcement learning (RL). In many scenarios of interest data is hard to obtain, so agents may learn a source policy in a setting where data is readily available, with the hope that it generalises well to the target domain. We propose a new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before learning to act. DARLA’s vision is based on learning a disentangled representation of the observed environment. Once DARLA can see, it is able to acquire source policies that are robust to many domain shifts – even with no access to the target domain. DARLA significantly outperforms conventional baselines in zero-shot domain adaptation scenarios, an effect that holds across a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms (DQN, A3C and EC).
Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning optimization process, preprocessing and organizing data, etc. TensorLayer is a versatile Python library that aims at helping researchers and engineers efficiently develop deep learning systems. It offers rich abstractions for neural networks, model and data management, and parallel workflow mechanism. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub, and has helped people from academia and industry develop real-world applications of deep learning.