Neural Granger Causality for Nonlinear Time Series

While most classical approaches to Granger causality detection assume linear dynamics, many interactions in applied domains, like neuroscience and genomics, are inherently nonlinear. In these cases, using linear models may lead to inconsistent estimation of Granger causal interactions. We propose a class of nonlinear methods by applying structured multilayer perceptrons (MLPs) or recurrent neural networks (RNNs) combined with sparsity-inducing penalties on the weights. By encouraging specific sets of weights to be zero—in particular through the use of convex group-lasso penalties—we can extract the Granger causal structure. To further contrast with traditional approaches, our framework naturally enables us to efficiently capture long-range dependencies between series either via our RNNs or through an automatic lag selection in the MLP. We show that our neural Granger causality methods outperform state-of-the-art nonlinear Granger causality methods on the DREAM3 challenge data. This data consists of nonlinear gene expression and regulation time courses with only a limited number of time points. The successes we show in this challenging dataset provide a powerful example of how deep learning can be useful in cases that go beyond prediction on large datasets. We likewise demonstrate our methods in detecting nonlinear interactions in a human motion capture dataset.
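The group-sparsity idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first-layer weights of a network predicting one target series are grouped by input series, and that a proximal (group soft-thresholding) step is applied to each group. The function names, the threshold value, and the toy weights are hypothetical, and the abstract does not state which optimizer the authors use.

```python
import math

def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2: shrinks the whole group at once."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        # The entire group is set to zero, removing that input series.
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]

def granger_causes(input_groups, threshold):
    """Map each input series to True if its weight group survives shrinkage."""
    shrunk = {name: group_soft_threshold(ws, threshold)
              for name, ws in input_groups.items()}
    return {name: any(w != 0.0 for w in ws) for name, ws in shrunk.items()}

# Hypothetical first-layer weights for two candidate input series.
weights = {"x1": [0.9, -0.4, 0.3], "x2": [0.02, -0.01, 0.03]}
result = granger_causes(weights, threshold=0.1)
```

In this toy case the group for `x1` has a norm well above the threshold and is kept, while the group for `x2` is zeroed out, which would be read as "no Granger-causal effect of `x2` on the target" under the assumptions above.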

A Unified View of Causal and Non-causal Feature Selection

In this paper, we unify causal and non-causal feature selection methods based on the Bayesian network framework. We first show that the objectives of causal and non-causal feature selection methods are equal: both aim to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We demonstrate that causal and non-causal feature selection make different assumptions about the dependency among features to find the Markov blanket, and that their algorithms represent different levels of approximation for finding the Markov blanket. In this framework, we are able to analyze the sample and error bounds of causal and non-causal methods. We conducted extensive experiments to show the correctness of our theoretical analysis.

Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data

Estimating causal models from observational data is a crucial task in data analysis. For continuous-valued data, Shimizu et al. have proposed a linear acyclic non-Gaussian model to understand the data generating process, and have shown that their model is identifiable when the sample size is sufficiently large. However, situations in which continuous and discrete variables coexist in the same problem are common in practice. Most existing causal discovery methods either ignore the discrete data and apply a continuous-valued algorithm or discretize all the continuous data and then apply a discrete Bayesian network approach. These methods may lose important information when discrete data are ignored, or introduce approximation error due to discretization. In this paper, we define a novel hybrid causal model which consists of both continuous and discrete variables. The model assumes: (1) the value of a continuous variable is a linear function of its parent variables plus a non-Gaussian noise, and (2) each discrete variable is a logistic variable whose distribution parameters depend on the values of its parent variables. In addition, we derive the BIC scoring function for model selection. The new discovery algorithm can learn causal structures from mixed continuous and discrete data without discretization. We empirically demonstrate the power of our method through thorough simulations.

Pattern Localization in Time Series through Signal-To-Model Alignment in Latent Space

In this paper, we study the problem of locating a predefined sequence of patterns in a time series. In particular, the studied scenario assumes a theoretical model is available that contains the expected locations of the patterns. This problem is found in several contexts, and it is commonly solved by first synthesizing a time series from the model, and then aligning it to the true time series through dynamic time warping. We propose a technique that increases the similarity of both time series before aligning them, by mapping them into a latent correlation space. The mapping is learned from the data through a machine-learning setup. Experiments on data from non-destructive testing demonstrate that the proposed approach shows significant improvements over the state of the art.
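The baseline alignment step mentioned in this abstract, dynamic time warping, can be sketched as follows. This is a textbook DTW distance with an absolute-difference local cost, not the latent-space method the paper proposes. The series `synthetic` and `measured` are made-up stand-ins for a model-generated signal and a recorded one.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical data: `measured` is a time-stretched copy of `synthetic`.
synthetic = [0.0, 1.0, 2.0, 1.0, 0.0]
measured = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
distance = dtw_distance(synthetic, measured)
```

Because `measured` only repeats samples of `synthetic`, the warping path can match every sample exactly, so the distance here is zero; real signals that differ in shape as well as timing would not align this cleanly, which is the gap the abstract's latent mapping is meant to reduce.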

Quantum Variational Autoencoder

Variational autoencoders (VAEs) are powerful generative models with the salient ability to perform inference. Here, we introduce a quantum variational autoencoder (QVAE): a VAE whose latent generative process is implemented as a quantum Boltzmann machine (QBM). We show that our model can be trained end-to-end by maximizing a well-defined loss function: a ‘quantum’ lower-bound to a variational approximation of the log-likelihood. We use quantum Monte Carlo (QMC) simulations to train and evaluate the performance of QVAEs. To achieve the best performance, we first create a VAE platform with discrete latent space generated by a restricted Boltzmann machine (RBM). Our model achieves state-of-the-art performance on the MNIST dataset when compared against similar approaches that only involve discrete variables in the generative process. We consider QVAEs with a smaller number of latent units to be able to perform QMC simulations, which are computationally expensive. We show that QVAEs can be trained effectively in regimes where quantum effects are relevant despite training via the quantum bound. Our findings open the way to the use of quantum computers to train QVAEs to achieve competitive performance for generative models. Placing a QBM in the latent space of a VAE leverages the full potential of current and next-generation quantum computers as sampling devices.

Truth Validation with Evidence

In the modern era, abundant information is easily accessible from various sources; however, only a few of these sources are reliable, as most contain unverified content. We develop a system to validate the truthfulness of a given statement together with underlying evidence. The proposed system provides supporting evidence when the statement is tagged as false. Our work relies on an inference method on a knowledge graph (KG) to identify the truthfulness of statements. In order to extract the evidence of falseness, the proposed algorithm takes into account combined knowledge from the KG and ontologies. The system shows very good results as it provides valid and concise evidence. The quality of the KG plays a role in the performance of the inference method, which explicitly affects the performance of our evidence-extracting algorithm.

Horovod: fast and easy distributed deep learning in TensorFlow

Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library’s API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://…/horovod.
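The "ring reduction" communication pattern named in this abstract is commonly realized as a ring all-reduce. The following is a single-process simulation of that pattern, not Horovod's implementation or API. It assumes, for simplicity, that each of the n simulated workers holds a vector of exactly n numbers, so that each chunk is one element. The function name and data are hypothetical.

```python
def ring_allreduce(worker_data):
    """Simulate ring all-reduce (sum) over n workers, each holding n values."""
    n = len(worker_data)
    data = [list(v) for v in worker_data]

    # Phase 1, reduce-scatter: after n - 1 steps, worker r holds the full
    # sum for chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r - step) % n, data[r][(r - step) % n])
                 for r in range(n)]
        for dest, idx, value in sends:
            data[dest][idx] += value

    # Phase 2, all-gather: the completed chunks travel around the ring,
    # overwriting the partial sums on each worker.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n, data[r][(r + 1 - step) % n])
                 for r in range(n)]
        for dest, idx, value in sends:
            data[dest][idx] = value
    return data

# Hypothetical per-worker gradients.
gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_allreduce(gradients)
```

Each worker only ever talks to its right-hand neighbor and sends one chunk per step, which is why the per-worker traffic in this scheme stays roughly constant as workers are added.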

Tree-CNN: A Deep Convolutional Neural Network for Lifelong Learning

In recent years, Convolutional Neural Networks (CNNs) have shown remarkable performance in many computer vision tasks such as object recognition and detection. However, complex training issues, such as ‘catastrophic forgetting’ and hyper-parameter tuning, make incremental learning in CNNs a difficult challenge. In this paper, we propose a hierarchical deep neural network, with CNNs at multiple levels, and a corresponding training method for lifelong learning. The network grows in a tree-like manner to accommodate the new classes of data without losing the ability to identify the previously trained classes. The proposed network was tested on the CIFAR-10 and CIFAR-100 datasets, and compared against the method of fine-tuning specific layers of a conventional CNN. We obtained comparable accuracies and achieved 40% and 20% reductions in training effort on CIFAR-10 and CIFAR-100, respectively. The network was able to organize the incoming classes of data into feature-driven super-classes. Our model improves upon existing hierarchical CNN models by adding the capability of self-growth and also yields important observations on feature-selective classification.

Variational Autoencoders for Collaborative Filtering

We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models, which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm have information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.
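The two ingredients this abstract highlights, a multinomial likelihood and an annealed regularization parameter, can be sketched as below. The abstract does not give the exact objective, so the form used here (log-likelihood minus a weight `beta` times a KL term, with `beta` increased linearly up to a cap) is an assumption, as are all function names and default values.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of item scores."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_sum for v in logits]

def multinomial_log_likelihood(clicks, logits):
    """Sum of click counts times log-probabilities (up to a constant)."""
    return sum(c * lp for c, lp in zip(clicks, log_softmax(logits)))

def annealed_beta(step, anneal_steps, beta_max):
    """Assumed linear schedule: 0 at step 0, capped at beta_max."""
    return min(beta_max, beta_max * step / anneal_steps)

def objective(clicks, logits, kl, step, anneal_steps=1000, beta_max=0.2):
    """Assumed objective: log-likelihood minus the annealed weight times KL."""
    beta = annealed_beta(step, anneal_steps, beta_max)
    return multinomial_log_likelihood(clicks, logits) - beta * kl

# Hypothetical user who clicked item 0 out of two equally scored items.
ll = multinomial_log_likelihood([1, 0], [0.0, 0.0])
```

With two equally scored items the clicked item has probability one half, so `ll` is the negative natural log of 2; early in training the KL term contributes nothing under this schedule and its weight grows with `step`.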

Learning Latent Features with Pairwise Penalties in Matrix Completion

Low-rank matrix completion (MC) has achieved great success in many real-world data applications. A latent feature model formulation is usually employed and, to improve prediction performance, the similarities between latent variables can be exploited by pairwise learning, e.g., the graph regularized matrix factorization (GRMF) method. However, existing GRMF approaches often use a squared L2 norm to measure the pairwise difference, which may be overly influenced by dissimilar pairs and lead to inferior prediction. To fully empower pairwise learning for matrix completion, we propose a general optimization framework that allows a rich class of (non-)convex pairwise penalty functions. A new and efficient algorithm is further developed to uniformly solve the optimization problem, with a theoretical convergence guarantee. In an important situation where the latent variables form a small number of subgroups, its statistical guarantee is also fully characterized. In particular, we theoretically characterize the complexity-regularized maximum likelihood estimator, as a special case of our framework. It has a better error bound when compared to the standard trace-norm regularized matrix completion. We conduct extensive experiments on both synthetic and real datasets to demonstrate the superior performance of this general framework.

Online Machine Learning in Big Data Streams

The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: In the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as the fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is a reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields, online algorithms, online learning, and distributed data processing are hugely dominant in current research and development with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail.

Spectral Normalization for Generative Adversarial Networks

One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to the previous training stabilization techniques.
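Spectral normalization divides a weight matrix by its largest singular value. One standard way to estimate that value cheaply is power iteration, sketched below in plain Python. The abstract itself does not spell out the estimation procedure, so treat the iteration count, function names, and example matrix as illustrative assumptions. The sketch does not handle an all-zero matrix.

```python
import math

def _norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def spectral_norm(W, iterations=50):
    """Estimate the largest singular value of W (list of rows) by power iteration."""
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # u <- W v, normalized
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_len = _norm(u)
        u = [x / u_len for x in u]
        # v <- W^T u, normalized
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v_len = _norm(v)
        v = [x / v_len for x in v]
    # With v close to the top right-singular vector, ||W v|| approximates sigma.
    return _norm([sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)])

def spectrally_normalize(W, iterations=50):
    """Return W scaled so that its largest singular value is about 1."""
    sigma = spectral_norm(W, iterations)
    return [[w / sigma for w in row] for row in W]

# Hypothetical 2x2 weight matrix with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)
W_sn = spectrally_normalize(W)
```

For this diagonal example the estimate converges to 3, and the normalized matrix has largest singular value close to 1. In a training loop one would typically carry the vector `v` over between updates and run only a few iterations per step, which is one way the technique can stay computationally light.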

Information Theory: A Tutorial Introduction

Shannon’s mathematical theory of communication defines fundamental limits on how much information can be transmitted between the different components of any man-made or biological system. This paper is an informal but rigorous introduction to the main ideas implicit in Shannon’s theory. An annotated reading list is provided for further reading.

Towards an Engine for Lifelong Interactive Knowledge Learning in Human-Machine Conversations

Although chatbots have been very popular in recent years, they still have some serious weaknesses which limit the scope of their applications. One major weakness is that they cannot learn new knowledge during the conversation process, i.e., their knowledge is fixed beforehand and cannot be expanded or updated during conversation. In this paper, we propose to build a general knowledge learning engine for chatbots to enable them to continuously and interactively learn new knowledge during conversations. As time goes by, they become more and more knowledgeable and better and better at learning and conversation. We model the task as an open-world knowledge base completion problem and propose a novel technique called lifelong interactive learning and inference (LiLi) to solve it. LiLi works by imitating how humans acquire knowledge and perform inference during an interactive conversation. Our experimental results show LiLi is highly promising.

Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs

Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called ‘heterogeneous information networks’ or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a ranking function over edge and node weights. For users, it is difficult to select the value k. We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, return as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. It can be stopped anytime, but may have to continue until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) that the users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query, and incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times in the order of milliseconds for tree patterns on large networks with millions of nodes and edges.

Fair Clustering Through Fairlets
JU_KS@SAIL_CodeMixed-2017: Sentiment Analysis for Indian Code Mixed Social Media Texts
Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers
Image Transformer
Bayesian variable selection in linear dynamical systems
Generalized McKean-Vlasov (Mean Field) Control: a stochastic maximum principle and a transport perspective
Stochastic Wasserstein Barycenters
Cross-topic Argument Mining from Heterogeneous Sources Using Attention-based Neural Networks
Prediction of spatial functional random processes: Comparing functional and spatio-temporal kriging approaches
ASP: A Fast Adversarial Attack Example Generation Framework based on Adversarial Saliency Prediction
Learning to Count Objects in Natural Images for Visual Question Answering
Optimal Shelter Location-Allocation during Evacuation with Uncertainties: A Scenario-Based Approach
Maximum-A-Posteriori Signal Recovery with Prior Information: Applications to Compressive Sensing
A comparison of machine learning techniques for taxonomic classification of teeth from the Family Bovidae
The TAP-Plefka variational principle for the spherical SK model
Schur Ring over Group $\Z_{2}^{n}$, Circulant $S-$Sets Invariant by Decimation and Hadamard Matrices
A Faster FPTAS for #Knapsack
Masked Conditional Neural Networks for Automatic Sound Events Recognition
Duality Gap in Interval Linear Programming
Detecting Anomalous Faces with ‘No Peeking’ Autoencoders
A Model Free Perspective for Linear Regression: Uniform-in-model Bounds for Post Selection Inference
MPC-Inspired Neural Network Policies for Sequential Decision Making
Optimal Actuator Location for Semi-linear Systems
Distributed Stochastic Optimization via Adaptive Stochastic Gradient Descent
Chain Posets
Robust Eco-Driving Control of Autonomous Vehicles Connected to Traffic Lights
ISEC: Iterative over-Segmentation via Edge Clustering
Disentangling Aspect and Opinion Words in Target-based Sentiment Analysis using Lifelong Learning
Auto-Encoding Total Correlation Explanation
Combinatorial minimal surfaces in pseudomanifolds
A Comparison of Constraint Handling Techniques for Dynamic Constrained Optimization Problems
Improving Power Grid Resilience Through Predictive Outage Estimation
A Reputation-based Stackelberg Game Model to Enhance Secrecy Rate in Spectrum Leasing to Selfish IoT Devices
An Anytime Algorithm for Task and Motion MDPs
New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code
Rapid Bayesian optimisation for synthesis of short polymer fiber materials
Parameter-free Network Sparsification and Data Reduction by Minimal Algorithmic Information Loss
Inferring relevant features: from QFT to PCA
Train on Validation: Squeezing the Data Lemon
Homotopy type of Neighborhood Complexes of Kneser graphs, $KG_{2,k}$
Articulatory information and Multiview Features for Large Vocabulary Continuous Speech Recognition
Algorithmic Complexity and Reprogrammability of Chemical Structure Networks
A Parameterized Strongly Polynomial Algorithm for Block Structured Integer Programs
On the maximal number of real embeddings of spatial minimally rigid graphs
Generalizing Bottleneck Problems
A Reallocation Algorithm for Online Split Packing of Circles
Constrained Convolutional-Recurrent Networks to Improve Speech Quality with Low Impact on Recognition Accuracy
Detecting truth on components
SpaRTA – Tracking across occlusions via global partitioning of 3D clouds of points
Joint Estimation of Room Geometry and Modes with Compressed Sensing
Mean field rough differential equations
Deep Generative Model for Joint Alignment and Word Representation
Asymptotic lower bounds for modular and semimodular lattices
The martingale problem for anisotropic nonlocal operators
Training Deep Face Recognition Systems with Synthetic Data
Neuroscientific User Models: The Source of Uncertain User Feedback and Potentials for Improving Web Personalisation
Dealing with Uncertainties in User Feedback: Strategies Between Denying and Accepting
PRoST: Distributed Execution of SPARQL Queries Using Mixed Partitioning Strategies
The existence of designs II
A complete hand-drawn sketch vectorization framework
Parallel Tempering for the planted clique problem
Changing times to optimise reachability in temporal graphs
Online LZ77 Parsing and Matching Statistics with RLBWTs
Recognizing Cuneiform Signs Using Graph Based Methods
An Image Processing based Object Counting Approach for Machine Vision Application
Convergence of a degenerate microscopic dynamics to the porous medium equation
3D Regression Neural Network for the Quantification of Enlarged Perivascular Spaces in Brain MRI
Robust estimation in controlled branching processes: Bayesian estimators via disparities
A compact topology for $σ$-algebra convergence
Measuring Human-perceived Similarity in Heterogeneous Collections
Learning beyond datasets: Knowledge Graph Augmented Neural Networks for Natural language Processing
Instance-based Inductive Deep Transfer Learning by Cross-Dataset Querying with Locality Sensitive Hashing
Bayesian cross-validation of geostatistical models
Large deviation for extremes of branching random walk with regularly varying displacements
Monte Carlo Q-learning for General Game Playing
Analysis of Usage in the Tourism Domain
Weak Dynamic Coloring of Planar Graphs
Continuous dependence of the pressure field with respect to endpoints for ideal incompressible fluids
Dynamics of a stochastically perturbed prey-predator system with modified Leslie-Gower and Holling type II schemes incorporating a prey refuge
Paxos Consensus, Deconstructed and Abstracted (Extended Version)
A Mismatched Joint Source-Channel Coding Perspective of Probabilistic Amplitude Shaping: Achievable Rates and Error Exponents
Nonparametric Bayesian estimation of multivariate Hawkes processes
WHInter: A Working set algorithm for High-dimensional sparse second order Interaction models
Disentangling by Factorising
Some determinants of path generating functions, II
The N-Tuple Bandit Evolutionary Algorithm for Game Agent Optimisation
Improved GQ-CNN: Deep Learning Model for Planning Robust Grasps
Abductive reasoning as the basis to reproduce expert criteria in ECG Atrial Fibrillation identification
Optimal Hybrid Full-Duplex/Half-Duplex Scheme of the Buffer Aided Relay System
Dropout Model Evaluation in MOOCs
Hitting probabilities of a Brownian flow with Radial Drift
WebEye – Automated Collection of Malicious HTTP Traffic
Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis
On the extremal Betti numbers of binomial edge ideals of block graphs
Gray codes and symmetric chains
Parameterized Algorithms for Zero Extension and Metric Labelling Problems
Inverter Probing for Power Distribution Network Topology Processing
Improving the Florentine algorithms: recovering algorithms for Motzkin and Schröder paths
Learning Implicit Communication Strategies for the Purpose of Illicit Collusion
Policy Evaluation and Optimization with Continuous Treatments
Fast dynamics in glass-forming salol investigated by dielectric spectroscopy
Fluency Over Adequacy: A Pilot Study in Measuring User Trust in Imperfect MT
High-dimensional covariance matrix estimation using a low-rank and diagonal decomposition
Online Continuous Submodular Maximization
Bayesian Models for Unit Discovery on a Very Low Resource Language
Learning Patterns for Detection with Multiscale Scan Statistics
Variance-based Gradient Compression for Efficient Distributed Deep Learning
A Centrality Measure for Cycles and Subgraphs II
Artificial intelligence and pediatrics: A synthetic mini review
Diversity is All You Need: Learning Skills without a Reward Function