In this work we propose a deep architecture for the classification of multivariate time series. By means of a recurrent and untrained reservoir we generate a vectorial representation that embeds the temporal relationships in the data. To overcome the limitations of the reservoir vanishing memory, we introduce a bidirectional reservoir, whose last state captures also the past dependencies in the input. We apply dimensionality reduction to the final reservoir states to obtain compressed fixed size representations of the time series. These are subsequently fed into a deep feedforward network, which is trained to perform the final classification. We test our architecture on benchmark datasets and on a real-world use-case of blood samples classification. Results show that our method performs better than a standard echo state network, and it can be trained much faster than a fully-trained recurrent network.
Principal component regression is a linear regression model with principal components as regressors. This type of modelling is particularly useful for prediction in settings with high-dimensional covariates. Surprisingly, the existing literature treating of Bayesian approaches is relatively sparse. In this paper, we aim at filling some gaps through the following practical contribution: we introduce a Bayesian approach with detailed guidelines for a straightforward implementation. The approach features two characteristics that we believe are important. First, it effectively involves the relevant principal components in the prediction process. This is achieved in two steps. The first one is model selection; the second one is to average out the predictions obtained from the selected models according to model averaging mechanisms, allowing to account for model uncertainty. The model posterior probabilities are required for model selection and model averaging. For this purpose, we include a procedure leading to an efficient reversible jump algorithm. The second characteristic of our approach is whole robustness, meaning that the impact of outliers on inference gradually vanishes as they approach plus or minus infinity. The conclusions obtained are consequently consistent with the majority of observations (the bulk of the data).
There is an increasing interest in exploiting mobile sensing technologies and machine learning techniques for mental health monitoring and intervention. Researchers have effectively used contextual information, such as mobility, communication and mobile phone usage patterns for quantifying individuals’ mood and wellbeing. In this paper, we investigate the effectiveness of neural network models for predicting users’ level of stress by using the location information collected by smartphones. We characterize the mobility patterns of individuals using the GPS metrics presented in the literature and employ these metrics as input to the network. We evaluate our approach on the open-source StudentLife dataset. Moreover, we discuss the challenges and trade-offs involved in building machine learning models for digital mental health and highlight potential future work in this direction.
The motion analysis of human skeletons is crucial for human action recognition, which is one of the most active topics in computer vision. In this paper, we propose a fully end-to-end action-attending graphic neural network (A$^2$GNN) for skeleton-based action recognition, in which each irregular skeleton is structured as an undirected attribute graph. To extract high-level semantic representation from skeletons, we perform the local spectral graph filtering on the constructed attribute graphs like the standard image convolution operation. Considering not all joints are informative for action analysis, we design an action-attending layer to detect those salient action units (AUs) by adaptively weighting skeletal joints. Herein the filtering responses are parameterized into a weighting function irrelevant to the order of input nodes. To further encode continuous motion variations, the deep features learnt from skeletal graphs are gathered along consecutive temporal slices and then fed into a recurrent gated network. Finally, the spectral graph filtering, action-attending and recurrent temporal encoding are integrated together to jointly train for the sake of robust action recognition as well as the intelligibility of human actions. To evaluate our A$^2$GNN, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D dataset. The experimental results demonstrate that our network achieves the state-of-the-art performances.
Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image. In this paper, a new model is proposed for generating explanations by utilizing localized grounding of constituent phrases in generated explanations to ensure image relevance. Specifically, we introduce a phrase-critic model to refine (re-score/re-rank) generated candidate explanations and employ a relative-attribute inspired ranking loss using ‘flipped’ phrases as negative examples for training. At test time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image.
Generative Adversarial Networks (GANs) convergence in a high-resolution setting with a computational constrain of GPU memory capacity (from 12GB to 24 GB) has been beset with difficulty due to the known lack of convergence rate stability. In order to boost network convergence of DCGAN (Deep Convolutional Generative Adversarial Networks) and achieve good-looking high-resolution results we propose a new layered network structure, HR-DCGAN, that incorporates current state-of-the-art techniques for this effect. A novel dataset, CZ Faces (CZF), containing human faces from different ethnical groups in a wide variety of illumination conditions and image resolutions is introduced. We conduct extensive experiments on CelebA and CZF.
Very important breakthroughs in data-centric machine learning algorithms led to impressive performance in transactional point applications such as detecting anger in speech, alerts from a Face Recognition system, or EKG interpretation. Non-transactional applications, e.g. medical diagnosis beyond the EKG results, require AI algorithms that integrate deeper and broader knowledge in their problem-solving capabilities, e.g. integrating knowledge about anatomy and physiology of the heart with EKG results and additional patient findings. Similarly, for military aerial interpretation, where knowledge about enemy doctrines on force composition and spread helps immensely in situation assessment beyond image recognition of individual objects. The Double Deep Learning approach advocates integrating data-centric machine self-learning techniques with machine-teaching techniques to leverage the power of both and overcome their corresponding limitations. To take AI to the next level, it is essential that we rebalance the roles of data and knowledge. Data is important but knowledge- deep and commonsense- are equally important. An initiative is proposed to build Wikipedia for Smart Machines, meaning target readers are not human, but rather smart machines. Named ReKopedia, the goal is to develop methodologies, tools, and automatic algorithms to convert humanity knowledge that we all learn in schools, universities and during our professional life into Reusable Knowledge structures that smart machines can use in their inference algorithms. Ideally, ReKopedia would be an open source shared knowledge repository similar to the well-known shared open source software code repositories. Examples in the article are based on- or inspired by- real-life non-transactional AI systems I deployed over decades of AI career that benefit hundreds of millions of people around the globe.
In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance. Inspired by the way humans utilize semantic knowledge between objects of interests, we propose a framework that incorporates knowledge graphs for describing the relationships between multiple labels. Our model learns an information propagation mechanism from the semantic label space, which can be applied to model the interdependencies between seen and unseen class labels. With such investigation of structured knowledge graphs for visual reasoning, we show that our model can be applied for solving multi-label classification and ML-ZSL tasks. Compared to state-of-the-art approaches, comparable or improved performances can be achieved by our method.
We propose a simple yet effective technique to simplify the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which will reduce the computational cost both in the training and decoding, and potentially accelerate decoding in real-world applications. Surprisingly, experimental results demonstrate that most of time we only need to update fewer than 5% of the weights at each back propagation pass. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The model simplification results show that we could adaptively simplify the model which could often be reduced by around 9x, without any loss on accuracy or even with improved accuracy.
The development of the mlpack C++ machine learning library (http://www.mlpack.org ) has required the design and implementation of a flexible, robust optimization system that is able to solve the types of arbitrary optimization problems that may arise all throughout machine learning problems. In this paper, we present the generic optimization framework that we have designed for mlpack. A key priority in the design was ease of implementation of both new optimizers and new objective functions to be optimized; therefore, implementation of a new optimizer requires only one method and implementation of a new objective function requires at most four functions. This leads to simple and intuitive code, which, for fast prototyping and experimentation, is of paramount importance. When compared to optimization frameworks of other libraries, we find that mlpack’s supports more types of objective functions, is able to make optimizations that other frameworks do not, and seamlessly supports user-defined objective functions and optimizers.
Local Binary Pattern (LBP) is a traditional descriptor for texture analysis that gained attention in the last decade. Being robust to several properties such as invariance to illumination translation and scaling, LBPs achieved state-of-the-art results in several applications. However, LBPs are not able to capture high-level features from the image, merely encoding features with low abstraction levels. In this work, we propose Deep LBP, which borrow ideas from the deep learning community to improve LBP expressiveness. By using parametrized data-driven LBP, we enable successive applications of the LBP operators with increasing abstraction levels. We validate the relevance of the proposed idea in several datasets from a wide range of applications. Deep LBP improved the performance of traditional and multiscale LBP in all cases.
Machine learning models are vulnerable to adversarial examples: minor, in many cases imperceptible, perturbations to classification inputs. Among other suspected causes, adversarial examples exploit ML models that offer no well-defined indication as to how well a particular prediction is supported by training data, yet are forced to confidently extrapolate predictions in areas of high entropy. In contrast, Bayesian ML models, such as Gaussian Processes (GP), inherently model the uncertainty accompanying a prediction in the well-studied framework of Bayesian Inference. This paper is first to explore adversarial examples and their impact on uncertainty estimates for Gaussian Processes. To this end, we first present three novel attacks on Gaussian Processes: GPJM and GPFGS exploit forward derivatives in GP latent functions, and Latent Space Approximation Networks mimic the latent space representation in unsupervised GP models to facilitate attacks. Further, we show that these new attacks compute adversarial examples that transfer to non-GP classification models, and vice versa. Finally, we show that GP uncertainty estimates not only differ between adversarial examples and benign data, but also between adversarial examples computed by different algorithms.
A user can be represented as what he/she does along the history. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data itself due to limited human instinct. Recent works usually use RNN-based methods to give an overall embedding of a behavior sequence, which then could be exploited by the downstream applications. However, this can only preserve very limited information, or aggregated memories of a person. When a downstream application requires to facilitate the modeled user features, it may lose the integrity of the specific highly correlated behavior of the user, and introduce noises derived from unrelated behaviors. This paper proposes an attention based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Heterogeneous user behaviors are considered in our model that we project all types of behaviors into multiple latent semantic spaces, where influence can be made among the behaviors via self-attention. Downstream applications then can use the user behavior vectors via vanilla attention. Experiments show that ATRank can achieve better performance and faster training process. We further explore ATRank to use one unified model to predict different types of user behaviors at the same time, showing a comparable performance with the highly optimized individual models.
We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values, which may be obtained from simulation (in the case where one marginal is known) or resampling, guarantee that the test has nominal size, and we provide local power analyses, uniformly over classes of densities whose mutual information satisfies a lower bound. Our ideas may be extended to provide a new goodness-of-fit tests of normal linear models based on assessing the independence of our vector of covariates and an appropriately-defined notion of an error vector. The theory is supported by numerical studies on both simulated and real data.