Automated Curriculum Learning for Neural Networks

We introduce a method for automatically selecting the path, or syllabus, that a neural network follows through a curriculum so as to maximise learning efficiency. A measure of the amount that the network learns from each data sample is provided as a reward signal to a nonstationary multi-armed bandit algorithm, which then determines a stochastic syllabus. We consider a range of signals derived from two distinct indicators of learning progress: rate of increase in prediction accuracy, and rate of increase in network complexity. Experimental results for LSTM networks on three curricula demonstrate that our approach can significantly accelerate learning, in some cases halving the time required to attain a satisfactory performance level.
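
As a rough illustration of the idea in this abstract, the sketch below treats each curriculum task as a bandit arm and samples the next task from a stochastic policy. It uses a plain Exp3-style update with uniform exploration, which is an assumption made here for brevity: the abstract only says that a nonstationary bandit is used. The class name, the parameters `lr` and `explore`, and the requirement that rewards be scaled to [0, 1] are all hypothetical.

```python
import math
import random


class Exp3Syllabus:
    """Exp3-style bandit over curriculum tasks (illustrative stand-in only)."""

    def __init__(self, n_tasks, lr=0.1, explore=0.05, seed=0):
        self.log_weights = [0.0] * n_tasks  # one log-weight per task
        self.lr = lr                        # step size (assumed value)
        self.explore = explore              # uniform exploration mass (assumed value)
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over log-weights, mixed with a uniform distribution.
        top = max(self.log_weights)
        exp_w = [math.exp(w - top) for w in self.log_weights]
        total = sum(exp_w)
        k = len(exp_w)
        return [(1 - self.explore) * e / total + self.explore / k for e in exp_w]

    def sample_task(self):
        # Draw the next task index from the current stochastic syllabus.
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, task, reward):
        # Importance-weighted update: only the sampled task's weight changes.
        p = self.probabilities()[task]
        self.log_weights[task] += self.lr * reward / p


# Toy usage: task 2 is assumed to yield the most learning progress.
toy_progress = [0.1, 0.2, 0.9]  # hypothetical rewards in [0, 1]
syllabus = Exp3Syllabus(n_tasks=3)
for _ in range(200):
    task = syllabus.sample_task()
    syllabus.update(task, toy_progress[task])
```

In a real training loop the reward would come from a learning-progress signal measured on the network, not from a fixed table as in this toy usage.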

Stochastic Neural Networks for Hierarchical Reinforcement Learning

Deep reinforcement learning has achieved many impressive results in recent years. However, tasks with sparse rewards or long horizons continue to pose significant challenges. To tackle these important problems, we propose a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks. Our approach brings together some of the strengths of intrinsic motivation and hierarchical methods: the learning of useful skills is guided by a single proxy reward, the design of which requires very minimal domain knowledge about the downstream tasks. Then a high-level policy is trained on top of these skills, significantly improving exploration and making it possible to tackle sparse rewards in the downstream tasks. To efficiently pre-train a large span of skills, we use Stochastic Neural Networks combined with an information-theoretic regularizer. Our experiments show that this combination is effective in learning a wide span of interpretable skills in a sample-efficient way, and can significantly boost the learning performance uniformly across a wide range of downstream tasks.

Learning from Multi-View Structural Data via Structural Factorization Machines

Real-world relations among entities can often be observed and determined from different perspectives/views. For example, the decision made by a user on whether to adopt an item relies on multiple aspects such as the contextual information of the decision, the item’s attributes, the user’s profile and the reviews given by other users. Different views may exhibit multi-way interactions among entities and provide complementary information. In this paper, we introduce a multi-tensor-based approach that can preserve the underlying structure of multi-view data in a generic predictive model. Specifically, we propose structural factorization machines (SFMs) that learn the common latent spaces shared by multi-view tensors and automatically adjust the importance of each view in the predictive model. Furthermore, the complexity of SFMs is linear in the number of parameters, which makes SFMs suitable for large-scale problems. Extensive experiments on real-world datasets demonstrate that the proposed SFMs outperform several state-of-the-art methods in terms of prediction accuracy and computational cost.
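
For background on the linear-complexity claim, the sketch below shows the standard second-order factorization machine, not the structural variant proposed in this abstract. It evaluates the pairwise interaction term both naively and with the well-known linear-time identity, so the two can be compared. The function names and the toy parameters are hypothetical.

```python
def fm_predict_naive(w0, w, v, x):
    """Second-order FM prediction with an explicit loop over feature pairs."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))  # <v_i, v_j>
            y += dot * x[i] * x[j]
    return y


def fm_predict_linear(w0, w, v, x):
    """Same prediction using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2],
    which costs O(n * k) instead of O(n^2 * k)."""
    n, k = len(x), len(v[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


# Toy parameters: three features, two latent factors (values are arbitrary).
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, 2.0, 3.0]
naive = fm_predict_naive(0.0, [0.0, 0.0, 0.0], v, x)
fast = fm_predict_linear(0.0, [0.0, 0.0, 0.0], v, x)
```

Both functions return the same value on the toy input; only the second scales linearly in the number of features.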

Parametric Gaussian Process Regression for Big Data

This work introduces the concept of parametric Gaussian processes (PGPs), which is built upon the seemingly self-contradictory idea of making Gaussian processes parametric. Parametric Gaussian processes, by construction, are designed to operate in ‘big data’ regimes where one is interested in quantifying the uncertainty associated with noisy data. The proposed methodology circumvents the well-established need for stochastic variational inference, a scalable algorithm for approximating posterior distributions. The effectiveness of the proposed approach is demonstrated using an illustrative example with simulated data and a benchmark dataset in the airline industry with approximately 6 million records.

Deep Multimodal Representation Learning from Temporal Data

In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. In this paper, we propose the Correlational Recurrent Neural Network (CorrRNN), a novel temporal fusion model for fusing multiple input modalities that are inherently temporal in nature. Key features of our proposed model include: (i) simultaneous learning of the joint representation and temporal dependencies between modalities, (ii) use of multiple loss terms in the objective function, including a maximum correlation loss term to enhance learning of cross-modal information, and (iii) the use of an attention model to dynamically adjust the contribution of different input modalities to the joint representation. We validate our model via experimentation on two different tasks: video- and sensor-based activity classification, and audio-visual speech recognition. We empirically analyze the contributions of different components of the proposed CorrRNN model, and demonstrate its robustness, effectiveness and state-of-the-art performance on multiple datasets.

Simplified Stochastic Feedforward Neural Networks

It has been believed that stochastic feedforward neural networks (SFNNs) have several advantages beyond deterministic deep neural networks (DNNs): they have more expressive power allowing multi-modal mappings and regularize better due to their stochastic nature. However, training large-scale SFNNs is notoriously hard. In this paper, we aim at developing efficient training methods for SFNNs, in particular using known architectures and pre-trained parameters of DNNs. To this end, we propose a new intermediate stochastic model, called Simplified-SFNN, which can be built upon any baseline DNN and approximates a certain SFNN by simplifying its upper latent units above stochastic ones. The main novelty of our approach is in establishing the connection between three models, i.e., DNN->Simplified-SFNN->SFNN, which naturally leads to an efficient training procedure of the stochastic models utilizing pre-trained parameters of the DNN. Using several popular DNNs, we show how they can be effectively transferred to the corresponding stochastic models for both multi-modal and classification tasks on MNIST, TFD, CASIA, CIFAR-10, CIFAR-100 and SVHN datasets. In particular, we train a stochastic model of 28 layers and 36 million parameters, where training such a large-scale stochastic network is significantly challenging without using Simplified-SFNN.

On Feature Reduction using Deep Learning for Trend Prediction in Finance

One of the major advantages of using Deep Learning for Finance is the ability to embed a large collection of information into investment decisions. One way to do that is by means of compression, which leads us to consider a smaller feature space. Several studies are showing that non-linear feature reduction performed by Deep Learning tools is effective in price trend prediction. The focus has been put mainly on Restricted Boltzmann Machines (RBM) and on the output obtained by them. Little attention has been paid to Auto-Encoders (AE) as an alternative means to perform feature reduction. In this paper we investigate the application of both RBM and AE in more general terms, attempting to outline how architectural and input space characteristics can affect the quality of prediction.

Automatic Keyword Extraction for Text Summarization: A Survey

In recent times, data has been growing rapidly in every domain, such as news, social media, banking and education. Because of this excess of data, there is a need for an automatic summarizer capable of summarizing the data, especially the textual data in an original document, without losing any critical content. Text summarization has emerged as an important research area in the recent past. In this regard, a review of existing work on the text summarization process is useful for carrying out further research. In this paper, recent literature on automatic keyword extraction and text summarization is presented, since the text summarization process is highly dependent on keyword extraction. This literature includes a discussion of the different methodologies used for keyword extraction and text summarization. It also discusses the different databases used for text summarization in several domains, along with evaluation metrics. Finally, it briefly discusses the issues and research challenges faced by researchers, along with future directions.

A review on statistical inference methods for discrete Markov random fields

Developing satisfactory methodology for the analysis of Markov random fields is a very challenging task. Indeed, due to the Markovian dependence structure, the normalizing constant of the fields cannot be computed using standard analytical or numerical methods. This forms a central issue for any statistical approach as the likelihood is an integral part of the procedure. Furthermore, such unobserved fields cannot be integrated out and the likelihood evaluation becomes a doubly intractable problem. This report gives an overview of some of the methods used in the literature to analyse such observed or unobserved random fields.

Next Generation Business Intelligence and Analytics: A Survey

Business Intelligence and Analytics (BI&A) is the process of extracting and predicting business-critical insights from data. Traditional BI focused on data collection, extraction, and organization to enable efficient query processing for deriving insights from historical data. With the rise of big data and cloud computing, there are many challenges and opportunities for BI. Especially with the growing number of data sources, traditional BI&A is evolving to provide intelligence at different scales and perspectives: operational BI, situational BI, and self-service BI. In this survey, we review the evolution of business intelligence systems in full scale, from back-end architecture to front-end applications. We focus on the changes in the back-end architecture that deals with the collection and organization of the data. We also review the changes in the front-end applications, where analytic services and visualization are the core components. Using a use case from BI in healthcare, which is one of the most complex enterprises, we show how BI&A will play an important role beyond the traditional usage. The survey provides a holistic view of Business Intelligence and Analytics for anyone interested in getting a complete picture of the different pieces in the emerging next generation BI&A solutions.

Efficient Large Scale Clustering based on data partitioning

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high dimensionality, heterogeneity, and the high complexity of some algorithms. For instance, some algorithms may have linear complexity but require domain knowledge in order to determine their input parameters. Distributed clustering techniques constitute a very good alternative for addressing the big data challenges (e.g., Volume, Variety, Veracity, and Velocity). Usually these techniques consist of two phases. The first phase generates local models or patterns and the second one aggregates the local results to obtain global models. While the first phase can be executed in parallel on each site and is therefore efficient, the aggregation phase is complex and time consuming, and may produce incorrect and ambiguous global clusters and therefore incorrect models. In this paper we propose a new distributed clustering approach to deal efficiently with both phases: generation of local results and generation of global models by aggregation. For the first phase, our approach is capable of analysing the datasets located in each site using different clustering techniques. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. For the evaluation, we use two well-known clustering algorithms: K-Means and DBSCAN. One of the key outputs of this distributed clustering technique is that the number of global clusters is dynamic and need not be fixed in advance. Experimental results show that the approach is scalable and produces high quality results.
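
The two-phase structure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' aggregation method: each site runs a small K-Means locally, and the global phase merges local centroids that lie within a hypothetical `radius` of each other, so the number of global clusters is not fixed in advance. All function names, the merging rule, and the toy data are assumptions.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means on a list of coordinate tuples (local phase)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def merge_centroids(centroids, radius):
    """Aggregation phase (assumed rule): greedily merge centroids closer than radius."""
    merged = []  # list of (centroid, number of local centroids absorbed)
    for c in centroids:
        for i, (m, n) in enumerate(merged):
            if math.dist(c, m) <= radius:
                # Running mean of the centroids absorbed so far.
                merged[i] = (tuple((a * n + b) / (n + 1) for a, b in zip(m, c)), n + 1)
                break
        else:
            merged.append((c, 1))
    return [m for m, _ in merged]


def distributed_cluster(sites, k_local, radius):
    """Run the local phase on every site, then aggregate the local centroids."""
    local = [c for points in sites for c in kmeans(points, k_local)]
    return merge_centroids(local, radius)


# Toy data: two sites, each holding points near (0, 0) and near (10, 10).
sites = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
    [(0.5, 0.5), (1.0, 1.0), (0.0, 0.5), (10.5, 10.5), (11.0, 11.0), (10.0, 10.5)],
]
global_centroids = distributed_cluster(sites, k_local=2, radius=2.0)
```

Only the local centroids travel between phases in this sketch, which keeps the aggregation input small; a real system would also need to handle sites that use different local algorithms, as the abstract mentions.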

Weakly-Supervised Spatial Context Networks

Estimating Local Interactions Among Many Agents Who Observe Their Neighbors

Constant Modulus Beamforming via Convex Optimization

Functional Regression with Unknown Manifold Structures

Automatic Classification of the Complexity of Nonfiction Texts in Portuguese for Early School Years

Automatic semantic role labeling on non-revised syntactic trees of journalistic texts

Quenched central limit theorem rates of convergence for one-dimensional random walks in random environments

Tight Lower Bounds for Differentially Private Selection

A probabilistic data-driven model for planar pushing

Semantically Consistent Regularization for Zero-Shot Recognition

Matching Media Contents with User Profiles by means of the Dempster-Shafer Theory

On transversality condition for overtaking optimality in infinite horizon control problem

Nonlinear consensus protocols with applications to quantized systems

DRAW: Deep networks for Recognizing styles of Artists Who illustrate children’s books

CERN: Confidence-Energy Recurrent Network for Group Activity Recognition

Controlling Lipschitz functions

Probing Slow Relaxation and Many-Body Localization in Two-Dimensional Quasi-Periodic Systems

A Method to Guarantee Local Convergence for Sequential Quadratic Programming with Poor Hessian Approximation

New Properties of Numbers of Plane Graphs

Action Unit Detection with Region Adaptation, Multi-labeling Learning and Optimal Temporal Fusing

A semidiscrete version of the Petitot model as a plausible model for anthropomorphic image reconstruction and pattern recognition

Online Nonparametric Anomaly Detection based on Geometric Entropy Minimization

Data-efficient Deep Reinforcement Learning for Dexterous Manipulation

WRPN: Training and Inference using Wide Reduced-Precision Networks

Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning

Three Graph Duals and A Bijection

Network Information Science

Control Synthesis of Nonlinear Sampled Switched Systems using Euler’s Method

Minkowski Operations of Sets with Application to Robot Localization

3D mean Projective Shape Change for Face Differentiation from Multiple Digital Camera Images

Detecting Visual Relationships with Deep Relational Networks

DOPE: Distributed Optimization for Pairwise Energies

Iterative Soft/Hard Thresholding with Homotopy Continuation for Sparse Recovery

On graphs with $m(\partial^L_1)=n-3$

Large-scale distributed Kalman filtering via an optimization approach

Feature Sensitive Curve Registration by Maximizing Kernel based Matching Scores

Improving Pairwise Ranking for Multi-label Image Classification

Resolution-Adaptive Hybrid MIMO Architectures for Millimeter Wave Communications

Restoration of Atmospheric Turbulence-distorted Images via RPCA and Quasiconformal Maps

Federated Tensor Factorization for Computational Phenotyping

Packing tree degree sequences

Minimum polyhedron with $n$ vertices

EAST: An Efficient and Accurate Scene Text Detector

Exponential stability of modified truncated EM method for stochastic differential equations

Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering

struc2vec: Learning Node Representations from Structural Identity

Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation

Mining Object Parts from CNNs via Active Question-Answering

A Bell state in a Penning Trap as a quantum simulator of the factorization problem

On the Spectral Properties of Symmetric Functions

Time, Frequency & Time-Varying Causality Measures in Neuroscience

Computing nearest stable matrix pairs

Recovering the Structure of Random Linear Graphs

Finite-time attitude synchronization with a discontinuous protocol

On the modeling of neural cognition for social network applications

Nearly resolution V plans on blocks of small size

Massively parallel MCMC for Bayesian hierarchical models

Pyramidal Gradient Matching for Optical Flow Estimation

Error Vector Magnitude Analysis in Generalized Fading with Co-Channel Interference

Persian Wordnet Construction using Supervised Learning

Reconstruction of three-dimensional porous media using generative adversarial neural networks

Observability of large-scale Boolean control networks via network aggregations

Error Bounds for Uplink and Downlink 3D Localization in 5G mmWave Systems

Sparse Bayesian vector autoregressions in huge dimensions

Nonlinear Unknown Input Observability: The General Analytic Solution

Non-Linear Least-Squares Optimization of Rational Filters for the Solution of Interior Eigenvalue Problems

A Multi-type Preferential Attachment Model

Gaussian autoregressive process with dependent innovations. Some asymptotic results

Learning Deep CNN Denoiser Prior for Image Restoration

Simultaneous Stereo Video Deblurring and Scene Flow Estimation

Scavenger 0.1: A Theorem Prover Based on Conflict Resolution

Unfolding and Shrinking Neural Machine Translation Ensembles

Online Video Deblurring via Dynamic Temporal Blending Network

Phase Retrieval via Sparse Wirtinger Flow

Uplink Multiuser Massive MIMO Systems with Low-Resolution ADCs: A Coding-Theoretic Approach

Energy Efficiency in Cell-Free Massive MIMO with Zero-Forcing Precoding Design

Impact Of Content Features For Automatic Online Abuse Detection

Enumeration Complexity of Poor Man’s Propositional Dependence Logic

Phase reduction approach to synchronization of nonlinear oscillators

Automatic segmentation of MR brain images with a convolutional neural network

Interpretable Explanations of Black Boxes by Meaningful Perturbation

The MATLAB Toolbox SciXMiner: User’s Manual and Programmer’s Guide

Weak convergence of a non-Markov transition probability estimator with applications to expected lengths of stay

$L^p$-valued stochastic convolution integral driven by Volterra noise

$b$-symbol distance distribution of repeated-root cyclic codes

Weighted k-Server Bounds via Combinatorial Dichotomies

Speeding up Consensus by Chasing Fast Decisions

Gang-GC: Locality-aware Parallel Data Placement Optimizations for Key-Value Storages

A Domain Specific Language for Performance Portable Molecular Dynamics Algorithms

Continuously tempered Hamiltonian Monte Carlo

Beliefs and Probability in Bacchus’ l.p. Logic: A 3-Valued Logic Solution to Apparent Counter-intuition

Pedestrian Positioning Using WiFi Fingerprints and a Foot-mounted Inertial Sensor

Optimized Data Pre-Processing for Discrimination Prevention

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices

Quality Aware Network for Set to Set Recognition

Reconstruction of 3-D Rigid Smooth Curves Moving Free when Two Traceable Points Only are Available

Extremal attractors of Liouville copulas

Deep Learning for Multi-Task Medical Image Segmentation in Multiple Modalities

An ad-hoc modified Likelihood Function Applied to Optimization of Data Analysis in Atomic Spectroscopy

Portable, high-performance containers for HPC

Enhancement of Physical Layer Security Using Destination Artificial Noise Based on Outage Probability

Source-Sensitive Belief Change

ENWalk: Learning Network Features for Spam Detection in Twitter

Big jobs arrive early: From critical queues to random graphs

What we really want to find by Sentiment Analysis: The Relationship between Computational Models and Psychological State

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

Effective Resistances and Kirchhoff index of Prism Graphs

Stochastic control of mean-field SPDEs with jumps

Forecasting Human Dynamics from Static Images

Node-centric community detection in multilayer networks with layer-coverage diversification bias

Solving the L1 regularized least square problem via a box-constrained smooth minimization

Directivity-Beamwidth Tradeoff of Massive MIMO Uplink Beamforming for High Speed Train Communication

The Space of Transferable Adversarial Examples