We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors on different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
Traffic flow prediction is an important research issue for solving the traffic congestion problem in an Intelligent Transportation System (ITS). Traffic congestion is one of the most serious problems in a city, and it can be predicted in advance by analyzing traffic flow patterns. Such prediction is possible by analyzing real-time transportation data from correlated roads and vehicles. This article first gives a brief introduction to the transportation data and surveys the state-of-the-art prediction methods. Then, we verify whether prediction performance can be improved by fitting actual data to optimize the parameters of the traffic flow prediction model. This verification is conducted by comparing the optimized time series prediction model with the normal time series prediction model. This means that in the era of big data, accurate use of the data becomes the focus of studying traffic flow prediction to solve the congestion problem. Finally, experimental results of a case study are provided to verify the existence of such a performance improvement, and the research challenges of this data-analytics-based prediction are presented and discussed.
Mobile edge computing (MEC) is a promising approach for enabling cloud-computing capabilities at the edge of cellular networks. Nonetheless, security is becoming an increasingly important issue in MEC-based applications. In this paper, we propose a deep-learning-based model to detect security threats. The model uses unsupervised learning to automate the detection process, and uses location information as an important feature to improve detection performance. Our proposed model can be used to detect malicious applications, which are a serious security threat, at the edge of a cellular network. Extensive experiments are carried out with 10 different datasets, the results of which illustrate that our deep-learning-based model achieves an average accuracy gain of 6% compared with state-of-the-art machine learning algorithms.
This work presents an introduction to feature-based time-series analysis. The time series as a data type is first described, along with an overview of the interdisciplinary time-series analysis literature. I then summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis is given to emerging research that facilitates wide comparison of feature-based representations that allow us to understand the properties of a time-series dataset that make it suited to a particular feature-based representation or analysis algorithm. The future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning to aid understanding of the complex dynamical patterns in the time series we measure from the world.
Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
Batch normalization (BN) has become a de facto standard for training deep convolutional networks. However, BN accounts for a significant fraction of training run-time and is difficult to accelerate, since it is a memory-bandwidth-bound operation. This drawback of BN motivates us to explore recently proposed weight normalization algorithms (WN algorithms), i.e. weight normalization, normalization propagation and weight normalization with translated ReLU. These algorithms do not slow down training iterations and were experimentally shown to outperform BN on relatively small networks and datasets. However, it is not clear whether these algorithms could replace BN in practical, large-scale applications. We answer this question by providing a detailed comparison of BN and WN algorithms using a ResNet-50 network trained on ImageNet. We found that although WN achieves better training accuracy, the final test accuracy is significantly lower ($\approx 6\%$) than that of BN. This result demonstrates the surprising strength of the BN regularization effect, which we were unable to compensate for using standard regularization techniques like dropout and weight decay. We also found that training deep networks with WN algorithms is significantly less stable than with BN, limiting their practical applications.
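As a minimal sketch of the first WN algorithm named above, the usual weight-normalization reparameterization writes each weight vector as w = g * v / ||v||, so the scale g and the direction v are learned separately and no batch statistics are needed. The function names and the example values below are illustrative assumptions, not taken from the paper.

```python
import math


def weight_normalize(v, g):
    """Return w = g * v / ||v|| (weight-normalization reparameterization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction vector v must be non-zero")
    return [g * x / norm for x in v]


def neuron_output(x, v, g, b):
    """Pre-activation of a single neuron whose weights are weight-normalized."""
    w = weight_normalize(v, g)
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical example values: direction v, scale g.
v = [3.0, 4.0]
w = weight_normalize(v, g=2.0)
# The norm of w is fixed by g alone, independent of the length of v.
w_norm = math.sqrt(sum(x * x for x in w))
```

Because the normalization acts on the weights rather than on activations, it involves no extra pass over the mini-batch, which is consistent with the abstract's remark that these algorithms do not slow down training iterations.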
In this work, we present tensor-based linear and non-linear models for hyperspectral data classification and analysis. By exploiting principles of tensor algebra, we introduce new classification architectures, the weight parameters of which satisfy the {\it rank}-1 canonical decomposition property. Then, we introduce learning algorithms to train both the linear and the non-linear classifier such that i) the error over the training samples is minimized and ii) the weight coefficients satisfy the {\it rank}-1 canonical decomposition property. The advantages of the proposed classification model are that i) it reduces the number of parameters required and thus reduces the respective number of training samples required to properly train the model, ii) it provides a physical interpretation of the effect of the model coefficients on the classification output and iii) it retains the spatial and spectral coherency of the input samples. To address issues related to linear classification, which is characterized by low capacity since it can only produce rules that are linear in the input space, we introduce non-linear classification models based on a modification of a feedforward neural network. We call the proposed architecture {\it rank}-1 Feedforward Neural Network (FNN), since its weights satisfy the {\it rank}-1 canonical decomposition property. Appropriate learning algorithms are also proposed to train the network. Experimental results and comparisons with state-of-the-art classification methods, both linear (e.g., SVM) and non-linear (e.g., deep learning), indicate that the proposed scheme outperforms them, especially in cases where a small number of training samples is available. Furthermore, the proposed tensor-based classifiers are evaluated with respect to their capabilities in dimensionality reduction.
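The parameter saving claimed above can be illustrated with a small sketch. It assumes, for simplicity, a two-mode (matrix) input rather than a full hyperspectral tensor, and a weight matrix constrained to the outer product of two vectors a and b; all names and values are hypothetical and not from the paper.

```python
def outer(a, b):
    """Outer product of two vectors: the rank-1 weight matrix W[i][j] = a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


def full_response(X, W):
    """Linear response <W, X> using an explicit weight matrix."""
    return sum(W[i][j] * X[i][j] for i in range(len(W)) for j in range(len(W[0])))


def rank1_response(X, a, b):
    """Same response computed as a^T X b, without ever forming W."""
    return sum(
        a[i] * sum(X[i][j] * b[j] for j in range(len(b)))
        for i in range(len(a))
    )


# Hypothetical 2 x 3 input patch and rank-1 factors.
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
a = [0.5, -1.0]
b = [1.0, 0.0, 2.0]

rank1_value = rank1_response(X, a, b)
full_value = full_response(X, outer(a, b))

# The rank-1 model stores len(a) + len(b) weights instead of len(a) * len(b).
n_params_rank1 = len(a) + len(b)
n_params_full = len(a) * len(b)
```

For an input with more modes the same idea applies with one factor vector per mode, so the weight count grows with the sum of the mode sizes rather than with their product.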
Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode multi-modal inputs, in terms of both image processing and natural language processing. The algorithm further needs to learn how to reason over this multi-modal representation so that it can answer the questions correctly. This paper presents a survey of different approaches proposed to solve the problem of Visual Question Answering. We also describe the current state-of-the-art model in a later part of the paper. In particular, the paper describes the approaches taken by various algorithms to extract image features and text features, and the way these are employed to predict answers. We also briefly discuss the experiments performed to evaluate the VQA models and report their performance on diverse datasets, including the newly released VQA2.0[8].
With the recent increase in data online, discovering meaningful opportunities can be time-consuming and complicated for many individuals. To overcome this data overload challenge, we present a novel text-content-based recommender system as a valuable tool to predict user interests. To that end, we develop a specific procedure to create user models and item feature-vectors, where items are described in free text. The user model is generated by soliciting a few keywords from a user and expanding those keywords into a list of weighted near-synonyms. The item feature-vectors are generated from the textual descriptions of the items, using modified tf-idf values of the users’ keywords and their near-synonyms. Once the users are modeled and the items are abstracted into feature vectors, the system returns the maximum-similarity items as recommendations to that user. Our experimental evaluation shows that our method of creating the user models and item feature-vectors results in higher precision and accuracy in comparison to well-known feature-vector-generating methods like GloVe and Word2Vec. It also shows that stemming and the use of a modified version of tf-idf increase the accuracy and precision by 2% and 3%, respectively, compared to non-stemming and the standard tf-idf definition. Moreover, the evaluation results show that updating the user model from usage histories improves the precision and accuracy of the system. This recommender system has been developed as part of the Agnes application, which runs on iOS and Android platforms and is accessible through the Agnes website.
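The pipeline described above can be sketched roughly as follows. The abstract does not specify the modified tf-idf or the similarity measure, so this sketch assumes a standard smoothed tf-idf and cosine similarity instead; the user model, the item descriptions and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (no stemming in this sketch)."""
    return text.lower().split()


def item_vectors(items, keywords):
    """One tf-idf vector per item, restricted to the user's keyword list."""
    docs = {name: Counter(tokenize(text)) for name, text in items.items()}
    n_docs = len(docs)
    vectors = {}
    for name, counts in docs.items():
        total = sum(counts.values())
        vec = []
        for kw in keywords:
            df = sum(1 for c in docs.values() if kw in c)
            # Smoothed idf: an assumption, not the paper's modified tf-idf.
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            tf = counts[kw] / total if total else 0.0
            vec.append(tf * idf)
        vectors[name] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(user_model, items, top_k=1):
    """Rank items by similarity between the user model and item vectors."""
    keywords = list(user_model)
    user_vec = [user_model[k] for k in keywords]
    vecs = item_vectors(items, keywords)
    ranked = sorted(vecs, key=lambda n: cosine(user_vec, vecs[n]), reverse=True)
    return ranked[:top_k]


# Hypothetical user model: keyword plus weighted near-synonyms.
user_model = {"hiking": 1.0, "trekking": 0.8, "trail": 0.6}
items = {
    "outdoor_club": "weekly hiking and trekking trips on a mountain trail",
    "chess_club": "chess tournaments and strategy lessons every week",
    "book_club": "monthly reading group for fiction lovers",
}
top = recommend(user_model, items)
```

Restricting the item vectors to the user's keywords and near-synonyms keeps them short and makes each dimension directly interpretable, which is one plausible reading of the procedure in the abstract.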
Comparing, or benchmarking, of optimization algorithms is a complicated task that involves many subtle considerations to yield a fair and unbiased evaluation. In this paper, we systematically review the benchmarking process of optimization algorithms, and discuss the challenges of fair comparison. We provide suggestions for each step of the comparison process and highlight the pitfalls to avoid when evaluating the performance of optimization algorithms. We also discuss various methods of reporting the benchmarking results. Finally, some suggestions for future research are presented to improve the current benchmarking process.
The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.
Structural nested mean models (SNMMs) are among the fundamental tools for inferring causal effects of time-dependent exposures from longitudinal studies. With binary outcomes, however, current methods for estimating multiplicative and additive SNMM parameters suffer from variation dependence between the causal SNMM parameters and the non-causal nuisance parameters. Estimation methods for logistic SNMMs do not suffer from this dependence. Unfortunately, in contrast with the multiplicative and additive models, unbiased estimation of the causal parameters of a logistic SNMM relies on additional modeling assumptions even when the treatment probabilities are known. These difficulties have hindered the uptake of SNMMs in epidemiological practice, where binary outcomes are common. We solve the variation dependence problem for the binary multiplicative SNMM by a reparametrization of the non-causal nuisance parameters. Our novel nuisance parameters are variation independent of the causal parameters, and hence allow the fitting of a multiplicative SNMM by unconstrained maximum likelihood. This reparametrization also allows one to construct true (i.e. congenial) doubly robust estimators of the causal parameters. Along the way, we prove that an additive SNMM with binary outcomes does not admit a variation independent parametrization, thus explaining why we restrict ourselves to the multiplicative SNMM.
In this paper, we present a deep extension of Sparse Subspace Clustering, termed Deep Sparse Subspace Clustering (DSSC). Regularized by the unit sphere distribution assumption for the learned deep features, DSSC can infer a new data affinity matrix by simultaneously satisfying the sparsity principle of SSC and the nonlinearity given by neural networks. One of the appealing advantages of DSSC is that when original real-world data do not meet the class-specific linear subspace distribution assumption, DSSC can employ neural networks to make the assumption valid through their hierarchical nonlinear transformations. To the best of our knowledge, this is among the first deep-learning-based subspace clustering methods. Extensive experiments are conducted on four real-world datasets to show that the proposed DSSC is significantly superior to 12 existing methods for subspace clustering.
We show that univariate and symmetric multivariate Hawkes processes are only weakly causal: the true log-likelihoods of real and reversed event time vectors are almost equal, thus parameter estimation via maximum likelihood only weakly depends on the direction of the arrow of time. In ideal (synthetic) conditions, tests of goodness of parametric fit unambiguously reject backward event times, which implies that inferring kernels from time-symmetric quantities, such as the autocovariance of the event rate, only rarely produces statistically significant fits. Finally, we find that fitting financial data with many-parameter kernels may yield significant fits for both arrows of time for the same event time vector, sometimes favouring the backward time direction. This goes to show that a significant fit of Hawkes processes to real data with flexible kernels does not imply a definite arrow of time unless one tests it.
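The comparison of forward and reversed event times can be sketched for the simplest case. The snippet below assumes a univariate Hawkes process with an exponential kernel, i.e. intensity lambda(t) = mu + sum over past events of alpha * beta * exp(-beta * (t - t_i)), and evaluates its log-likelihood on [0, T] for a time vector and for its reversal T - t_i. The kernel choice, the parameter values and the event times are illustrative assumptions, not fitted values or data from the paper.

```python
import math


def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood on [0, T] of a univariate exponential-kernel Hawkes process.

    `times` must be sorted in increasing order and lie in [0, T].
    """
    loglik = -mu * T
    decay_sum = 0.0  # sum over earlier events of exp(-beta * (t - t_i))
    prev = None
    for t in times:
        if prev is not None:
            # Recursive update of the kernel sum, O(n) instead of O(n^2).
            decay_sum = math.exp(-beta * (t - prev)) * (decay_sum + 1.0)
        loglik += math.log(mu + alpha * beta * decay_sum)
        # Integrated kernel contribution of this event up to time T.
        loglik -= alpha * (1.0 - math.exp(-beta * (T - t)))
        prev = t
    return loglik


def reverse_times(times, T):
    """Event times as seen with the arrow of time reversed."""
    return sorted(T - t for t in times)


# Hypothetical event times and parameters, chosen only for illustration.
T = 5.0
times = [0.5, 0.7, 0.8, 2.9, 3.0, 3.05, 4.6]
params = dict(mu=0.5, alpha=0.6, beta=2.0)

forward_ll = hawkes_loglik(times, T, **params)
backward_ll = hawkes_loglik(reverse_times(times, T), T, **params)
```

Evaluating both values at the same parameters only compares likelihoods; the abstract's point is that a goodness-of-fit test on each direction is what is needed to decide whether a definite arrow of time is supported.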
Recurrent neural networks (RNNs) are a vital modeling technique that relies on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding their analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network’s internal state representation to target predicting future observations. Predictive-State Decoders are simple to implement and easily incorporated into existing training pipelines via additional loss regularization. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, imitation learning, and reinforcement learning. In each, our method improves the statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.
Learning that takes into account the full distribution of the data, referred to as generative learning, is not feasible with deep neural networks (DNNs) because they model only the conditional distribution of the outputs given the inputs. Current solutions are either based on joint probability models facing difficult estimation problems or learn two separate networks, mapping inputs to outputs (recognition) and vice-versa (generation). We propose an intermediate approach. First, we show that forward computation in DNNs with logistic sigmoid activations corresponds to a simplified approximate Bayesian inference in a directed probabilistic multi-layer model. This connection allows a DNN to be interpreted as a probabilistic model of the output and all hidden units given the input. Second, we propose that in order for the recognition and generation networks to be more consistent with the joint model of the data, the weights of the recognition and generation networks should be related by transposition. We demonstrate in a tentative experiment that such a coupled pair can be learned generatively, modelling the full distribution of the data, and has enough capacity to perform well in both recognition and generation.
Cloud computing has permeated the information technology industry in the last few years, and it is now emerging in scientific environments. Science user communities are demanding a broad range of computing resources, such as local clusters, high-performance computing systems, and computing grids, to satisfy the needs of high-performance applications. Different computational models produce different workloads, and the cloud is already considered a promising paradigm. The scheduling and allocation of resources is always a challenging matter in any form of computation, and clouds are no exception. Science applications have unique features that differentiate their workloads; hence, their requirements have to be taken into consideration when building a Science Cloud. This paper discusses the main scheduling and resource allocation challenges for any Infrastructure as a Service provider supporting scientific applications.
A new prior is proposed for representation learning, which can be combined with other priors in order to help disentangle abstract factors from each other. It is inspired by the phenomenon of consciousness seen as the formation of a low-dimensional combination of a few concepts constituting a conscious thought, i.e., consciousness as awareness at a particular time instant. This provides a powerful constraint on the representation in that such low-dimensional thought vectors can correspond to statements about reality which are true, highly probable, or very useful for taking decisions. The fact that a few elements of the current state can be combined into such a predictive or useful statement is a strong constraint and deviates considerably from the maximum likelihood approaches to modelling data and how states unfold in the future based on an agent’s actions. Instead of making predictions in the sensory (e.g. pixel) space, the consciousness prior allows the agent to make predictions in the abstract space, with only a few dimensions of that space being involved in each of these predictions. The consciousness prior also makes it natural to map conscious states to natural language utterances or to express classical AI knowledge in the form of facts and rules, although the conscious states may be richer than what can be expressed easily in the form of a sentence, a fact or a rule.