In this paper, we provide a novel approach to capture causal interaction in a dynamical system from time-series data. In \cite{sinha_IT_CDC2016}, we showed that the existing measures of information transfer, namely directed information, Granger causality, and transfer entropy, fail to capture true causal interaction in dynamical systems, and we proposed a new definition of information transfer that does. The main contribution of this paper is to show that the definition of information transfer proposed in \cite{sinha_IT_CDC2016}\cite{sinha_IT_ICC} can be computed from time-series data. We use a transfer-operator-theoretic framework involving the Perron-Frobenius and Koopman operators for the data-driven approximation of the system dynamics and for the computation of information transfer. Several examples involving linear and nonlinear system dynamics are presented to verify the efficiency of the developed algorithm.
Causal consistency for key-value stores has two main requirements: (1) do not make a version visible if some of its dependencies are invisible, as doing so may violate causal consistency in the future, and (2) make a version visible as soon as possible so that clients have the most recent information (to the extent feasible). These two requirements conflict with each other. Existing key-value stores that provide causal consistency (or detection of causal violations) take a static approach to the trade-off between these requirements. Depending on the choice, this approach assists some applications and penalizes others. We propose an alternative in which the system provides a set of tracking groups and checking groups. This allows each application to choose the settings that suit it best. Furthermore, these groups can be changed dynamically based on application requirements.
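Requirement (1) above can be illustrated with a minimal sketch. It is not the tracking-group or checking-group mechanism proposed in the abstract; it only shows the basic rule that a replicated write stays buffered until every version it depends on is visible. The `Store` class, its fields, and the integer version numbers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy single-replica store; every name here is a hypothetical assumption."""
    visible: dict = field(default_factory=dict)   # key -> (version, value)
    pending: list = field(default_factory=list)   # buffered, not yet visible writes

    def _deps_visible(self, deps):
        # A dependency (key, version) is satisfied once the store exposes
        # that key at the same or a newer version.
        return all(k in self.visible and self.visible[k][0] >= v
                   for k, v in deps)

    def receive(self, key, version, value, deps=()):
        """Buffer a replicated write, then expose every write that is ready."""
        self.pending.append((key, version, value, tuple(deps)))
        progressed = True
        while progressed:  # a newly visible write may unblock other writes
            progressed = False
            for write in list(self.pending):
                k, ver, val, d = write
                if self._deps_visible(d):
                    if k not in self.visible or self.visible[k][0] < ver:
                        self.visible[k] = (ver, val)
                    self.pending.remove(write)
                    progressed = True


store = Store()
store.receive("reply", 1, "thanks!", deps=[("post", 1)])  # arrives before its dependency
early = "reply" in store.visible
store.receive("post", 1, "hello")                          # dependency arrives
late = "reply" in store.visible
```

In this sketch the reply is held back until the post it depends on is visible, which is the delay that requirement (2) tries to minimize.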
Predictive process monitoring is concerned with the analysis of events produced during the execution of a process in order to predict the future state of ongoing cases thereof. Existing techniques in this field are able to predict, at each step of a case, the likelihood that the case will end up in an undesired outcome. These techniques, however, do not take into account what process workers may do with the generated predictions in order to decrease the likelihood of undesired outcomes. This paper proposes a framework for prescriptive process monitoring, which extends predictive monitoring approaches with concepts of alarms, interventions, compensations, and mitigation effects. The framework incorporates a parameterized cost model to assess the cost-benefit tradeoffs of applying prescriptive process monitoring in a given setting. The paper also outlines an approach to optimize the generation of alarms given a dataset and a set of cost model parameters. The proposed approach is empirically evaluated using a range of real-life event logs.
Semiparametric inference with missing outcome data (including causal inference) is based on partially specified models which are not of direct interest (e.g., a model for the missingness/treatment assignment mechanism). Different classes of estimators exist, which are more or less robust to misspecification of these models. Another type of threat to the validity of the inference occurs in situations where some observations are contaminated (generated by some nuisance distribution). Classical semiparametric inference is not robust to such contamination, and a single observation may have an arbitrarily large effect on bias as measured by the influence function. We introduce inverse probability weighted, double robust and outcome regression estimators of location and scale parameters, which are robust to contamination in the sense that their influence function is bounded. We give asymptotic properties and study finite sample behaviour. Our simulated experiments show that contamination can be a more serious threat to the quality of inference than model misspecification. An interesting aspect of our results is that the auxiliary outcome model used to adjust for ignorable missingness (confounding) is also useful to protect against contamination. We also illustrate through a case study how both adjustment for ignorable missingness and protection against contamination are achieved through weighting schemes, which can be contrasted to gain further insights.
Random Differential Equations provide a natural extension of Ordinary Differential Equations to the stochastic setting. We show how, and under which conditions, every equilibrium state of a Random Differential Equation (RDE) can be described by a Structural Causal Model (SCM) while preserving the causal semantics. This provides an SCM that captures the stochastic and causal behavior of the RDE, which can model both cycles and confounders. This enables the study of the equilibrium states of the RDE by applying the theory and statistical tools available for SCMs, for example, marginalizations and Markov properties, as we illustrate by means of an example. Our work thus provides a direct connection between two fields that so far have been developing in isolation.
This document will review the most prominent proposals using multilayer convolutional architectures. Importantly, the various components of a typical convolutional network will be discussed through a review of different approaches that base their design decisions on biological findings and/or sound theoretical bases. In addition, the different attempts at understanding ConvNets via visualizations and empirical studies will be reviewed. The ultimate goal is to shed light on the role of each layer of processing involved in a ConvNet architecture, distill what we currently understand about ConvNets and highlight critical open problems.
Recent neural network and language models rely on softmax distributions with an extremely large number of categories. Since calculating the softmax normalizing constant in this context is prohibitively expensive, there is a growing literature on efficiently computable but biased estimates of the softmax. In this paper we propose the first unbiased algorithms for maximizing the softmax likelihood whose work per iteration is independent of the number of classes and datapoints (and no extra work is required at the end of each epoch). We show that our proposed unbiased methods comprehensively outperform the state-of-the-art on seven real-world datasets.
The great advances of learning-based approaches in image processing and computer vision are largely based on deeply nested networks that compose linear transfer functions with suitable non-linearities. Interestingly, the most frequently used non-linearities in imaging applications (variants of the rectified linear unit) are uncommon in low-dimensional approximation problems. In this paper we propose a novel non-linear transfer function, called lifting, which is motivated by a related technique in convex optimization. A lifting layer increases the dimensionality of the input, naturally yields a linear spline when combined with a fully connected layer, and therefore closes the gap between low- and high-dimensional approximation problems. Moreover, applying the lifting operation to the loss layer of the network allows us to handle non-convex and flat (zero-gradient) cost functions. We analyze the proposed lifting theoretically, exemplify interesting properties in synthetic experiments and demonstrate its effectiveness in deep learning approaches to image classification and denoising.
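The claim that a lifting layer followed by a fully connected layer yields a linear spline can be sketched for a scalar input. The abstract does not spell out the construction, so the following is an assumption: the input is written as a convex combination of its two neighbouring grid points (here called `labels`), and those two weights become the lifted vector. The function names `lift` and `linear`, the grid, and the weights `theta` are all hypothetical.

```python
from bisect import bisect_right


def lift(x, labels):
    """Lift a scalar x to len(labels) dimensions (assumed hat-function weights).

    Assumes labels is strictly increasing and labels[0] <= x <= labels[-1].
    """
    # Index l of the interval [labels[l], labels[l + 1]] that contains x.
    l = min(max(bisect_right(labels, x) - 1, 0), len(labels) - 2)
    lam = (x - labels[l]) / (labels[l + 1] - labels[l])
    lifted = [0.0] * len(labels)
    lifted[l] = 1.0 - lam      # weight of the left grid point
    lifted[l + 1] = lam        # weight of the right grid point
    return lifted


def linear(weights, vec):
    """A fully connected layer with a single output unit and no bias."""
    return sum(w * v for w, v in zip(weights, vec))


labels = [-1.0, 0.0, 0.5, 2.0]
theta = [t * t for t in labels]       # layer weights: target values at the grid points
y = linear(theta, lift(0.3, labels))  # piecewise-linear interpolation of t*t at x = 0.3
```

Under these assumptions the composition `linear(theta, lift(x, labels))` is piecewise linear in `x` and takes the value `theta[i]` at `labels[i]`, so learning `theta` amounts to fitting a linear spline on the chosen grid.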
We introduce an effective model to overcome the problem of mode collapse when training Generative Adversarial Networks (GANs). Firstly, we propose a new generator objective that better tackles mode collapse. We also apply an independent autoencoder (AE) to constrain the generator and treat its reconstructed samples as ‘real’ samples, which slows down the convergence of the discriminator, reduces the vanishing gradient problem, and stabilizes the model. Secondly, using the mappings between latent and data spaces provided by the AE, we further regularize the AE by the relative distance between the latent and data samples to explicitly prevent the generator from falling into mode collapse. This idea arose when we found a new way to visualize mode collapse on the MNIST dataset. To the best of our knowledge, our method is the first to propose and successfully apply the relative distance of latent and data samples for stabilizing GANs. Thirdly, our proposed model, namely Generative Adversarial Autoencoder Networks (GAAN), is stable and suffers from neither vanishing gradients nor mode collapse, as empirically demonstrated on synthetic, MNIST, MNIST-1K, CelebA and CIFAR-10 datasets. Experimental results show that our method can approximate multi-modal distributions well and achieves better results than state-of-the-art methods on these benchmark datasets. Our model implementation is published here: https://…/gaan
Stance detection is a subproblem of sentiment analysis where the stance of the author of a piece of natural language text towards a particular target (either explicitly stated in the text or not) is explored. The stance output is usually given as Favor, Against, or Neither. In this paper, we target stance detection on sports-related tweets and present the performance results of our SVM-based stance classifiers on such tweets. First, we describe three versions of our proprietary tweet data set annotated with stance information, all of which are made publicly available for research purposes. Next, we evaluate SVM classifiers using different feature sets for stance detection on this data set. The employed features are based on unigrams, bigrams, hashtags, external links, emoticons, and lastly, named entities. The results indicate that joint use of the features based on unigrams, hashtags, and named entities by SVM classifiers is a plausible approach for the stance detection problem on sports-related tweets.
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\big( \frac{1}{\varepsilon^2 m} \big)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.
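The two iteration bounds above can be compared numerically. The sketch below evaluates only the leading-order terms, dropping all constants and logarithmic factors, so the numbers are indicative rather than exact; the function names and the sample values of $\varepsilon$, $m$, and $\alpha$ are hypothetical.

```python
def byzantine_sgd_iterations(eps, m, alpha):
    """Leading-order T = 1/(eps^2 m) + alpha^2/eps^2 (constants and logs dropped)."""
    return 1.0 / (eps**2 * m) + alpha**2 / eps**2


def minibatch_sgd_iterations(eps, m):
    """Leading-order T = 1/(eps^2 m) (constants dropped)."""
    return 1.0 / (eps**2 * m)


def overhead(eps, m, alpha):
    """Ratio of the two bounds; algebraically equal to 1 + alpha^2 * m."""
    return byzantine_sgd_iterations(eps, m, alpha) / minibatch_sgd_iterations(eps, m)


eps, m = 0.1, 100  # illustrative values, not taken from the paper
for alpha in (0.0, 0.01, 0.1, 0.25):
    print(f"alpha={alpha:<5} overhead factor ~ {overhead(eps, m, alpha):.2f}")
```

Because the ratio is $1 + \alpha^2 m$, the Byzantine term only dominates the bound once $\alpha$ exceeds roughly $1/\sqrt{m}$; below that threshold the cost of tolerating Byzantine machines is at most a constant factor in this leading-order view.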