Principled reasoning about the identifiability of causal effects from non-experimental data is an important application of graphical causal models. We present an algorithmic framework for efficiently testing, constructing, and enumerating $m$-separators in ancestral graphs (AGs), a class of graphical causal models that can represent uncertainty about the presence of latent confounders. Furthermore, we prove a reduction from causal effect identification by covariate adjustment to $m$-separation in a subgraph for directed acyclic graphs (DAGs) and maximal ancestral graphs (MAGs). Jointly, these results yield constructive criteria that characterize all adjustment sets as well as all minimal and minimum adjustment sets for identification of a desired causal effect with multivariate exposures and outcomes in the presence of latent confounding. Our results extend several existing solutions for special cases of these problems. Our efficient algorithms allowed us to empirically quantify the identifiability gap between covariate adjustment and the do-calculus in random DAGs, covering a wide range of scenarios. Implementations of our algorithms are provided in the R package dagitty.
In this paper, we propose deep learning techniques for econometrics, specifically for causal inference and for estimating individual as well as average treatment effects. The contribution of this paper is twofold: 1. For generalized neighbor matching to estimate individual and average treatment effects, we analyze the use of autoencoders for dimensionality reduction while maintaining the local neighborhood structure among the data points in the embedding space. This deep learning based technique is shown to perform better than simple k nearest neighbor matching for estimating treatment effects, especially when the data points have several features/covariates but reside in a low dimensional manifold in high dimensional space. We also observe better performance than manifold learning methods for neighbor matching. 2. Propensity score matching is one specific and popular way to perform matching in order to estimate average and individual treatment effects. We propose the use of deep neural networks (DNNs) for propensity score matching, and present a network called PropensityNet for this. This is a generalization of the logistic regression technique traditionally used to estimate propensity scores and we show empirically that DNNs perform better than logistic regression at propensity score matching. Code for both methods will be made available shortly on Github at: https://…/vikas84bf
The aim this study is discussed on the detection and correction of data containing the additive outlier (AO) on the model ARIMA (p, d, q). The process of detection and correction of data using an iterative procedure popularized by Box, Jenkins, and Reinsel (1994). By using this method we obtained an ARIMA models were fit to the data containing AO, this model is added to the original model of ARIMA coefficients obtained from the iteration process using regression methods. This shows that there is an improvement of forecasting error rate data.
There has been a wide interest to extend univariate and multivariate nonparametric procedures to clustered and hierarchical data. Traditionally, parametric mixed models have been used to account for the correlation structures among the dependent observational units. In this work we extend multivariate nonparametric procedures for one-sample and several samples location problems to clustered data settings. The results are given for a general score function, but with an emphasis on spatial sign and rank methods. Mixed models notation involving design matrices for fixed and random effects is used throughout. The asymptotic variance formulas and limiting distributions of the test statistics under the null hypothesis and under a sequence of alternatives are derived, as well as the limiting distributions for the corresponding estimates. The approach based on a general score function also shows, for example, how $M$-estimates behave with clustered data. Efficiency studies demonstrate practical advantages and disadvantages of the use of spatial sign and rank scores, and their weighted versions. Small sample procedures based on sign change and permutation principles are discussed. Further development of nonparametric methods for cluster correlated data would benefit from the notation already familiar to statisticians working under normality assumptions.
The question regarding the relationship between money stock and GDP in the Euro Area is still under debate. In this paper we address the theme by resorting to Granger-causality spectral estimation and inference in the frequency domain. We propose a new bootstrap test on unconditional and conditional Granger-causality, as well as on their difference, to catch particularly prominent causality cycles. The null hypothesis is that each causality or causality difference is equal to the median across frequencies. In a dedicated simulation study, we prove that our tool is able to disambiguate causalities significantly larger than the median even in presence of a rich causality structure. Our results hold until the stationary bootstrap of Politis & Romano (1994) is consistent on the underlying stochastic process. By this method, we point out that in the Euro Area money and output co-implied before the financial crisis of 2008, while after the crisis the only significant direction is from money to output with a shortened period.
Data protection constraints frequently require distributed analysis of data, i.e. individual-level data remains at many different sites, but analysis nevertheless has to be performed jointly. The data exchange is often handled manually, requiring explicit permission before transfer, i.e. the number of data calls and the amount of data should be limited. Thus, only simple summary statistics are typically transferred and aggregated with just a single call, but this does not allow for complex statistical techniques, e.g., automatic variable selection for prognostic signature development. We propose a multivariable regression approach for building a prognostic signature by automatic variable selection that is based on aggregated data from different locations in iterative calls. To minimize the amount of transferred data and the number of calls, we also provide a heuristic variant of the approach. To further strengthen data protection, the approach can also be combined with a trusted third party architecture. We evaluate our proposed method in a simulation study comparing our results to the results obtained with the pooled individual data. The proposed method is seen to be able to detect covariates with true effect to a comparable extent as a method based on individual data, although the performance is moderately decreased if the number of sites is large. In a typical scenario, the heuristic decreases the number of data calls from more than 10 to 3. To make our approach widely available for application, we provide an implementation on top of the DataSHIELD framework.
Machine translation is a popular test bed for research in neural sequence-to-sequence models but despite much recent research, there is still a lack of understanding of these models. Practitioners report performance degradation with large beams, the under-estimation of rare words and a lack of diversity in the final translations. Our study relates some of these issues to the inherent uncertainty of the task, due to the existence of multiple valid translations for a single source sentence, and to the extrinsic uncertainty caused by noisy training data. We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations. Our results show that search works remarkably well but that the models tend to spread too much probability mass over the hypothesis space. Next, we propose tools to assess model calibration and show how to easily fix some shortcomings of current models. We release both code and multiple human reference translations for two popular benchmarks.
Smart environments integrates various types of technologies, including cloud computing, fog computing, and the IoT paradigm. In such environments, it is essential to organize and manage efficiently the broad and complex set of heterogeneous resources. For this reason, resources classification and categorization becomes a vital issue in the control system. In this paper we make an exhaustive literature survey about the various computing systems and architectures which defines any type of ontology in the context of smart environments, considering both, authors that explicitly propose resources categorization and authors that implicitly propose some resources classification as part of their system architecture. As part of this research survey, we have built a table that summarizes all research works considered, and which provides a compact and graphical snapshot of the current classification trends. The goal and primary motivation of this literature survey has been to understand the current state of the art and identify the gaps between the different computing paradigms involved in smart environment scenarios. As a result, we have found that it is essential to consider together several computing paradigms and technologies, and that there is not, yet, any research work that integrates a merged resources classification, taxonomy or ontology required in such heterogeneous scenarios.
Recovering a function or high-dimensional parameter vector from indirect measurements is a central task in various scientific areas. Several methods for solving such inverse problems are well developed and well understood. Recently, novel algorithms using deep learning and neural networks for inverse problems appeared. While still in their infancy, these techniques show astonishing performance for applications like low-dose CT or various sparse data problems. However, theoretical results for deep learning in inverse problems are missing so far. In this paper, we establish such a convergence analysis for the proposed NETT (Network Tikhonov) approach to inverse problems. NETT considers regularized solutions having small value of a regularizer defined by a trained neural network. Opposed to existing deep learning approaches, our regularization scheme enforces data consistency also for the actual unknown to be recovered. This is beneficial in case the unknown to be recovered is not sufficiently similar to available training data. We present a complete convergence analysis for NETT, where we derive well-posedness results and quantitative error estimates, and propose a possible strategy for training the regularizer. Numerical results are presented for a tomographic sparse data problem using the $\ell^q$-norm of auto-encoder as trained regularizer, which demonstrate good performance of NETT even for unknowns of different type from the training data.
Recent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit usage of the dynamics model. We present model-based value expansion, which controls for uncertainty in the model by only allowing imagination to fixed depth. By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.
In this paper, we propose a listwise approach for constructing user-specific rankings in recommendation systems in a collaborative fashion. We contrast the listwise approach to previous pointwise and pairwise approaches, which are based on treating either each rating or each pairwise comparison as an independent instance respectively. By extending the work of (Cao et al. 2007), we cast listwise collaborative ranking as maximum likelihood under a permutation model which applies probability mass to permutations based on a low rank latent score matrix. We present a novel algorithm called SQL-Rank, which can accommodate ties and missing data and can run in linear time. We develop a theoretical framework for analyzing listwise ranking methods based on a novel representation theory for the permutation model. Applying this framework to collaborative ranking, we derive asymptotic statistical rates as the number of users and items grow together. We conclude by demonstrating that our SQL-Rank method often outperforms current state-of-the-art algorithms for implicit feedback such as Weighted-MF and BPR and achieve favorable results when compared to explicit feedback algorithms such as matrix factorization and collaborative ranking.
With advancements in sensor technology, a heterogeneous set of data, containing samples of scalar, waveform signal, image, or even structured point cloud are becoming increasingly popular. Developing a statistical model, representing the behavior of the underlying system based upon such a heterogeneous set of data can be used in monitoring, control, and optimization of the system. Unfortunately, available methods only focus on the scalar and curve data and do not provide a general framework that can integrate different sources of data to construct a model. This paper poses the problem of estimating a process output, measured by a scalar, curve, an image, or a point cloud by a set of heterogeneous process variables such as scalar process setting, sensor readings, and images. We introduce a general multiple tensor on tensor regression (MTOT) approach in which each set of input data (predictor) as well as the output measurements are represented by tensors. We formulate a linear regression model between the input and output tensors and estimate the parameters by minimizing a least square loss function. In order to avoid overfitting and to reduce the number of parameters to be estimated, we decompose the model parameters using several bases, spanning the input and output spaces. Next, we learn both the bases and their spanning coefficients when minimizing the loss function using an alternating least square (ALS) algorithm. We show that such a minimization has a closed-form solution in each iteration and can be computed very efficiently. Through several simulation and case studies, we evaluate the performance of the proposed method. The results reveal the advantage of the proposed method over some benchmarks in the literature in terms of the mean square prediction error.
This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distin- guishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing. XNMT is available open-source at https://…/xnmt
Product recommendation systems are important for major movie studios during the movie greenlight process and as part of machine learning personalization pipelines. Collaborative Filtering (CF) models have proved to be effective at powering recommender systems for online streaming services with explicit customer feedback data. CF models do not perform well in scenarios in which feedback data is not available, in cold start situations like new product launches, and situations with markedly different customer tiers (e.g., high frequency customers vs. casual customers). Generative natural language models that create useful theme-based representations of an underlying corpus of documents can be used to represent new product descriptions, like new movie plots. When combined with CF, they have shown to increase the performance in cold start situations. Outside of those cases though in which explicit customer feedback is available, recommender engines must rely on binary purchase data, which materially degrades performance. Fortunately, purchase data can be combined with product descriptions to generate meaningful representations of products and customer trajectories in a convenient product space in which proximity represents similarity. Learning to measure the distance between points in this space can be accomplished with a deep neural network that trains on customer histories and on dense vectorizations of product descriptions. We developed a system based on Collaborative (Deep) Metric Learning (CML) to predict the purchase probabilities of new theatrical releases. We trained and evaluated the model using a large dataset of customer histories, and tested the model for a set of movies that were released outside of the training window. Initial experiments show gains relative to models that do not train on collaborative preferences.
The problem of machine learning with missing values is common in many areas. A simple approach is to first construct a dataset without missing values simply by discarding instances with missing entries or by imputing a fixed value for each missing entry, and then train a prediction model with the new dataset. A drawback of this naive approach is that the uncertainty in the missing entries is not properly incorporated in the prediction. In order to evaluate prediction uncertainty, the multiple imputation (MI) approach has been studied, but the performance of MI is sensitive to the choice of the probabilistic model of the true values in the missing entries, and the computational cost of MI is high because multiple models must be trained. In this paper, we propose an alternative approach called the Interval-based Prediction Uncertainty Bounding (IPUB) method. The IPUB method represents the uncertainties due to missing entries as intervals, and efficiently computes the lower and upper bounds of the prediction results when all possible training sets constructed by imputing arbitrary values in the intervals are considered. The IPUB method can be applied to a wide class of convex learning algorithms including penalized least-squares regression, support vector machine (SVM), and logistic regression. We demonstrate the advantages of the IPUB method by comparing it with an existing method in numerical experiment with benchmark datasets.
This paper presents a distance-based discriminative framework for learning with probability distributions. Instead of using kernel mean embeddings or generalized radial basis kernels, we introduce embeddings based on dissimilarity of distributions to some reference distributions denoted as templates. Our framework extends the theory of similarity of \citet{balcan2008theory} to the population distribution case and we prove that, for some learning problems, Wasserstein distance achieves low-error linear decision functions with high probability. Our key result is to prove that the theory also holds for empirical distributions. Algorithmically, the proposed approach is very simple as it consists in computing a mapping based on pairwise Wasserstein distances and then learning a linear decision function. Our experimental results show that this Wasserstein distance embedding performs better than kernel mean embeddings and computing Wasserstein distance is far more tractable than estimating pairwise Kullback-Leibler divergence of empirical distributions.
Research on multi-robot systems has demonstrated promising results in manifold applications and domains. Still, efficiently learning an effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. hyper-redundant and groups of robot). To alleviate this problem, we present Q-CP a cooperative model-based reinforcement learning algorithm, which exploits action values to both (1) guide the exploration of the state space and (2) generate effective policies. Specifically, we exploit Q-learning to attack the curse-of-dimensionality in the iterations of a Monte-Carlo Tree Search. We implement and evaluate Q-CP on different stochastic cooperative (general-sum) games: (1) a simple cooperative navigation problem among 3 robots, (2) a cooperation scenario between a pair of KUKA YouBots performing hand-overs, and (3) a coordination task between two mobile robots entering a door. The obtained results show the effectiveness of Q-CP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.
Multilayer graphs encode different kind of interactions between the same set of entities. When one wants to cluster such a multilayer graph, the natural question arises how one should merge the information different layers. We introduce in this paper a one-parameter family of matrix power means for merging the Laplacians from different layers and analyze it in expectation in the stochastic block model. We show that this family allows to recover ground truth clusters under different settings and verify this in real world data. While computing the matrix power mean can be very expensive for large graphs, we introduce a numerical scheme to efficiently compute its eigenvectors for the case of large sparse graphs.
In this paper, we present a theoretical framework for understanding vector embedding, a fundamental building block of many deep learning models, especially in NLP. We discover a natural unitary-invariance in vector embeddings, which is required by the distributional hypothesis. This unitary-invariance states the fact that two embeddings are essentially equivalent if one can be obtained from the other by performing a relative-geometry preserving transformation, for example a rotation. This idea leads to the Pairwise Inner Product (PIP) loss, a natural unitary-invariant metric for the distance between two embeddings. We demonstrate that the PIP loss captures the difference in functionality between embeddings. By formulating the embedding training process as matrix factorization under noise, we reveal a fundamental bias-variance tradeoff in dimensionality selection. With tools from perturbation and stability theory, we provide an upper bound on the PIP loss using the signal spectrum and noise variance, both of which can be readily inferred from data. Our framework sheds light on many empirical phenomena, including the existence of an optimal dimension, and the robustness of embeddings against over-parametrization. The bias-variance tradeoff of PIP loss explicitly answers the fundamental open problem of dimensionality selection for vector embeddings.