We introduce a method for automatically selecting the path, or syllabus, that a neural network follows through a curriculum so as to maximise learning efficiency. A measure of the amount that the network learns from each data sample is provided as a reward signal to a nonstationary multi-armed bandit algorithm, which then determines a stochastic syllabus. We consider a range of signals derived from two distinct indicators of learning progress: rate of increase in prediction accuracy, and rate of increase in network complexity. Experimental results for LSTM networks on three curricula demonstrate that our approach can significantly accelerate learning, in some cases halving the time required to attain a satisfactory performance level.
Deep reinforcement learning has achieved many impressive results in recent years. However, tasks with sparse rewards or long horizons continue to pose significant challenges. To tackle these important problems, we propose a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks. Our approach brings together some of the strengths of intrinsic motivation and hierarchical methods: the learning of useful skill is guided by a single proxy reward, the design of which requires very minimal domain knowledge about the downstream tasks. Then a high-level policy is trained on top of these skills, providing a significant improvement of the exploration and allowing to tackle sparse rewards in the downstream tasks. To efficiently pre-train a large span of skills, we use Stochastic Neural Networks combined with an information-theoretic regularizer. Our experiments show that this combination is effective in learning a wide span of interpretable skills in a sample-efficient way, and can significantly boost the learning performance uniformly across a wide range of downstream tasks.
Real-world relations among entities can often be observed and determined by different perspectives/views. For example, the decision made by a user on whether to adopt an item relies on multiple aspects such as the contextual information of the decision, the item’s attributes, the user’s profile and the reviews given by other users. Different views may exhibit multi-way interactions among entities and provide complementary information. In this paper, we introduce a multi-tensor-based approach that can preserve the underlying structure of multi-view data in a generic predictive model. Specifically, we propose structural factorization machines (SFMs) that learn the common latent spaces shared by multi-view tensors and automatically adjust the importance of each view in the predictive model. Furthermore, the complexity of SFMs is linear in the number of parameters, which make SFMs suitable to large-scale problems. Extensive experiments on real-world datasets demonstrate that the proposed SFMs outperform several state-of-the-art methods in terms of prediction accuracy and computational cost.
This work introduces the concept of parametric Gaussian processes (PGPs), which is built upon the seemingly self-contradictory idea of making Gaussian processes parametric. Parametric Gaussian processes, by construction, are designed to operate in ‘big data’ regimes where one is interested in quantifying the uncertainty associated with noisy data. The proposed methodology circumvents the well-established need for stochastic variational inference, a scalable algorithm for approximating posterior distributions. The effectiveness of the proposed approach is demonstrated using an illustrative example with simulated data and a benchmark dataset in the airline industry with approximately $6$ million records.
In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. In this paper, we propose the Correlational Recurrent Neural Network (CorrRNN), a novel temporal fusion model for fusing multiple input modalities that are inherently temporal in nature. Key features of our proposed model include: (i) simultaneous learning of the joint representation and temporal dependencies between modalities, (ii) use of multiple loss terms in the objective function, including a maximum correlation loss term to enhance learning of cross-modal information, and (iii) the use of an attention model to dynamically adjust the contribution of different input modalities to the joint representation. We validate our model via experimentation on two different tasks: video- and sensor-based activity classification, and audio-visual speech recognition. We empirically analyze the contributions of different components of the proposed CorrRNN model, and demonstrate its robustness, effectiveness and state-of-the-art performance on multiple datasets.
It has been believed that stochastic feedforward neural networks (SFNNs) have several advantages beyond deterministic deep neural networks (DNNs): they have more expressive power allowing multi-modal mappings and regularize better due to their stochastic nature. However, training large-scale SFNN is notoriously harder. In this paper, we aim at developing efficient training methods for SFNN, in particular using known architectures and pre-trained parameters of DNN. To this end, we propose a new intermediate stochastic model, called Simplified-SFNN, which can be built upon any baseline DNNand approximates certain SFNN by simplifying its upper latent units above stochastic ones. The main novelty of our approach is in establishing the connection between three models, i.e., DNN->Simplified-SFNN->SFNN, which naturally leads to an efficient training procedure of the stochastic models utilizing pre-trained parameters of DNN. Using several popular DNNs, we show how they can be effectively transferred to the corresponding stochastic models for both multi-modal and classification tasks on MNIST, TFD, CASIA, CIFAR-10, CIFAR-100 and SVHN datasets. In particular, we train a stochastic model of 28 layers and 36 million parameters, where training such a large-scale stochastic network is significantly challenging without using Simplified-SFNN
One of the major advantages in using Deep Learning for Finance is to embed a large collection of information into investment decisions. A way to do that is by means of compression, that lead us to consider a smaller feature space. Several studies are proving that non-linear feature reduction performed by Deep Learning tools is effective in price trend prediction. The focus has been put mainly on Restricted Boltzmann Machines (RBM) and on output obtained by them. Few attention has been payed to Auto-Encoders (AE) as an alternative means to perform a feature reduction. In this paper we investigate the application of both RBM and AE in more general terms, attempting to outline how architectural and input space characteristics can affect the quality of prediction.
In recent times, data is growing rapidly in every domain such as news, social media, banking, education, etc. Due to the excessiveness of data, there is a need of automatic summarizer which will be capable to summarize the data especially textual data in original document without losing any critical purposes. Text summarization is emerged as an important research area in recent past. In this regard, review of existing work on text summarization process is useful for carrying out further research. In this paper, recent literature on automatic keyword extraction and text summarization are presented since text summarization process is highly depend on keyword extraction. This literature includes the discussion about different methodology used for keyword extraction and text summarization. It also discusses about different databases used for text summarization in several domains along with evaluation matrices. Finally, it discusses briefly about issues and research challenges faced by researchers along with future direction.
Developing satisfactory methodology for the analysis of Markov random field is a very challenging task. Indeed, due to the Markovian dependence structure, the normalizing constant of the fields cannot be computed using standard analytical or numerical methods. This forms a central issue for any statistical approach as the likelihood is an integral part of the procedure. Furthermore, such unobserved fields cannot be integrated out and the likelihood evaluation becomes a doubly intractable problem. This report gives an overview of some of the methods used in the literature to analyse such observed or unobserved random fields.
Business Intelligence and Analytics (BI&A) is the process of extracting and predicting business-critical insights from data. Traditional BI focused on data collection, extraction, and organization to enable efficient query processing for deriving insights from historical data. With the rise of big data and cloud computing, there are many challenges and opportunities for the BI. Especially with the growing number of data sources, traditional BI\&A are evolving to provide intelligence at different scales and perspectives – operational BI, situational BI, self-service BI. In this survey, we review the evolution of business intelligence systems in full scale from back-end architecture to and front-end applications. We focus on the changes in the back-end architecture that deals with the collection and organization of the data. We also review the changes in the front-end applications, where analytic services and visualization are the core components. Using a uses case from BI in Healthcare, which is one of the most complex enterprises, we show how BI\&A will play an important role beyond the traditional usage. The survey provides a holistic view of Business Intelligence and Analytics for anyone interested in getting a complete picture of the different pieces in the emerging next generation BI\&A solutions.
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high complexity of some algorithms. For instance, some algorithms may have linear complexity but they require the domain knowledge in order to determine their input parameters. Distributed clustering techniques constitute a very good alternative to the big data challenges (e.g.,Volume, Variety, Veracity, and Velocity). Usually these techniques consist of two phases. The first phase generates local models or patterns and the second one tends to aggregate the local results to obtain global models. While the first phase can be executed in parallel on each site and, therefore, efficient, the aggregation phase is complex, time consuming and may produce incorrect and ambiguous global clusters and therefore incorrect models. In this paper we propose a new distributed clustering approach to deal efficiently with both phases; generation of local results and generation of global models by aggregation. For the first phase, our approach is capable of analysing the datasets located in each site using different clustering techniques. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. For the evaluation, we use two well-known clustering algorithms; K-Means and DBSCAN. One of the key outputs of this distributed clustering technique is that the number of global clusters is dynamic; no need to be fixed in advance. Experimental results show that the approach is scalable and produces high quality results.