We consider the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. This problem has appeared in the literature as probabilistic latent semantic analysis (pLSA), and has connections to non-negative matrix factorization. When the simplicity of the variable is measured through its cardinality, we show that a solution to this latent variable discovery problem can be used to distinguish direct causal relations from spurious correlations among almost all joint distributions on simple causal graphs with two observed variables. Conjecturing that a similar identifiability result holds with Shannon entropy, we study a loss function that trades off the entropy of the latent variable against the conditional mutual information of the observed variables. We then propose a latent variable discovery algorithm, LatentSearch, and show that its stationary points are the stationary points of our loss function. We experimentally show that LatentSearch can indeed be used to distinguish direct causal relations from spurious correlations.
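The trade-off described in this abstract can be made concrete with a small sketch. It assumes the loss has the form I(X;Y|Z) + beta * H(Z), measured in bits; the abstract does not state the exact weighting, so the weight `beta`, the function name `latent_loss`, and the dictionary layout of the joint distribution are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


def latent_loss(joint, beta):
    """Return I(X;Y|Z) + beta * H(Z) in bits.

    `joint` maps (x, y, z) tuples to probabilities that sum to 1.
    The additive form and the weight `beta` are assumptions of this sketch.
    """
    pz = defaultdict(float)
    pxz = defaultdict(float)
    pyz = defaultdict(float)
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p

    # Shannon entropy of the latent variable.
    h_z = -sum(p * math.log2(p) for p in pz.values() if p > 0)

    # Conditional mutual information of the observed variables given Z.
    cmi = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            cmi += p * math.log2(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
    return cmi + beta * h_z


# Z copies X = Y: the observed variables are independent given Z, but H(Z) = 1 bit.
copy_latent = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Constant Z: H(Z) = 0, but X and Y stay fully dependent (I(X;Y|Z) = 1 bit).
constant_latent = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}

loss_copy = latent_loss(copy_latent, beta=0.5)
loss_constant = latent_loss(constant_latent, beta=0.5)
```

With `beta = 0.5`, the copying latent variable pays only for its entropy, while the constant one pays the full conditional mutual information, which shows how the weight decides between a richer latent variable and residual dependence.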
This paper introduces new effect parameters for factorial survival designs with possibly right-censored time-to-event data. In the special case of a two-sample design they coincide with the concordance or Wilcoxon parameter in survival analysis. More generally, the new parameters describe treatment or interaction effects, and we develop estimates and tests to infer their presence. We rigorously study the asymptotic properties by means of empirical process techniques and additionally suggest wild bootstrapping for a consistent and distribution-free application of the inference procedures. The small sample performance is discussed based on simulation results. The practical usefulness of the developed methodology is exemplified on a data example about patients with colon cancer by conducting one- and two-factorial analyses.
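For orientation, the two-sample concordance (Wilcoxon) parameter mentioned above is commonly written as P(A > B) + 0.5 * P(A = B). The sketch below estimates it from fully observed samples only. It deliberately ignores right-censoring, which is the case the paper actually addresses, and the function name `concordance` and the direction of the comparison are assumptions made for illustration.

```python
def concordance(sample_a, sample_b):
    """Estimate P(A > B) + 0.5 * P(A = B) from two fully observed samples.

    Assumption of this sketch: no censoring. Censored data would need a
    different estimator than this simple pairwise count.
    """
    score = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5  # ties count half
    return score / (len(sample_a) * len(sample_b))


# Group A always survives longer than group B.
dominant = concordance([3, 4], [1, 2])
# Identical samples: no tendency either way.
balanced = concordance([1, 2], [1, 2])
```

A value of 0.5 indicates no tendency towards longer or shorter times in either group, and values near 0 or 1 indicate a strong effect.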
Singular Spectrum Analysis (SSA) or Singular Value Decomposition (SVD) are often used to de-noise univariate time series or to study their spectral profile. Both techniques rely on the eigendecomposition of the correlation matrix estimated after embedding the signal into its delayed coordinates. In this work we show that the eigenvectors can be used to calculate the coefficients of a set of filters which form a filter bank. The properties of these filters are derived. In particular we show that their outputs can be grouped according to their frequency response. Furthermore, the frequency at the maximum of each frequency response and the corresponding eigenvalue can provide a power spectrum estimation of the time series. Two different applications illustrate how both characteristics can be applied to analyze wideband signals in order to achieve narrow-band signals or to infer their frequency occupation.
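The claim that eigenvectors act as filter coefficients can be illustrated in the smallest possible setting. The sketch assumes an embedding dimension of 2, where the lag-correlation matrix [[1, r], [r, 1]] (with r nonzero) always has eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2). Projecting the delay vectors onto an eigenvector is then a two-tap FIR filter. The helper names `project` and `gain` are hypothetical, and the derivation in the paper covers the general case.

```python
import math

# Eigenvectors of the 2x2 lag-correlation matrix [[1, r], [r, 1]], r != 0.
S = 1.0 / math.sqrt(2.0)
EIGENVECTORS = [(S, S), (S, -S)]


def project(signal, vec):
    """Project each delay vector (x[n], ..., x[n+m-1]) onto `vec`.

    This is an FIR filter whose coefficients are the eigenvector entries.
    """
    m = len(vec)
    return [
        sum(vec[j] * signal[n + j] for j in range(m))
        for n in range(len(signal) - m + 1)
    ]


def gain(vec, freq):
    """Magnitude response at a normalized frequency in cycles per sample."""
    re = sum(c * math.cos(2 * math.pi * freq * j) for j, c in enumerate(vec))
    im = sum(c * math.sin(2 * math.pi * freq * j) for j, c in enumerate(vec))
    return math.hypot(re, im)


# The first eigenvector passes the constant (zero-frequency) component,
# the second one removes it entirely.
constant = [1.0, 1.0, 1.0, 1.0]
low_output = project(constant, EIGENVECTORS[0])
high_output = project(constant, EIGENVECTORS[1])
```

In this toy case the first filter is low-pass (maximum gain at frequency 0) and the second is high-pass (maximum gain at frequency 0.5), which is the kind of grouping by frequency response that the abstract refers to.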
Latent truth discovery, LTD for short, refers to the problem of aggregating multiple claims from various sources in order to estimate the plausibility of statements about entities. In the absence of a ground truth, this problem is highly challenging, when some sources provide conflicting claims and others no claims at all. In this work we provide an unsupervised stochastic inference procedure on top of a model that combines restricted Boltzmann machines with feed-forward neural networks to accurately infer the reliability of sources as well as the plausibility of statements about entities. In comparison to prior work our approach stands out (1) by allowing the incorporation of arbitrary features about sources and claims, (2) by generalizing from reliability per source towards a reliability function, and thus (3) enabling the estimation of source reliability even for sources that have provided no or very few claims, (4) by building on efficient and scalable stochastic inference algorithms, and (5) by outperforming the state-of-the-art by a considerable margin.
Adversarial attacks find perturbations that can fool models into misclassifying images. Previous works have succeeded in generating noisy or edge-rich adversarial perturbations, at the cost of degraded image quality. Such perturbations, even when they are small in scale, are usually easy for humans to spot. In contrast, we propose Harmonic Adversarial Attack Methods (HAAM), which generate edge-free perturbations by using harmonic functions. The edge-free property guarantees that the generated adversarial images preserve visual quality, even when the perturbations have large magnitudes. Experiments also show that adversaries generated by HAAM often have higher success rates when transferring between models. In addition, we find that harmonic perturbations can simulate natural phenomena such as natural lighting and shadows. They could therefore help find corner cases for given models, as a first step toward improving them.
Prediction and explanation are key objects in supervised machine learning, where predictive models are known as black boxes and explanatory models are known as glass boxes. Explanation provides the necessary and sufficient information to interpret the model output in terms of the model input. It includes assessments of model output dependence on important input variables and measures of input variable importance to model output. High dimensional model representation (HDMR), also known as the generalized functional ANOVA expansion, provides useful insight into the input-output behavior of supervised machine learning models. This article gives applications of HDMR in supervised machine learning. The first application is characterizing information leakage in “big-data” settings. The second application is reduced-order representation of elementary symmetric polynomials. The third application is analysis of variance with correlated variables. The last application is estimation of HDMR from kernel machine and decision tree black box representations. These results suggest HDMR to have broad utility within machine learning as a glass box representation.
Machine learning has become a basic tool in scientific research and for the development of technologies with significant impact on society. Such methods make it possible to discover regularities in data and make predictions without explicit knowledge of the rules governing the system under analysis. However, a price must be paid for exploiting such modeling flexibility: machine learning methods are usually black boxes, meaning that it is difficult to fully understand what the machine is doing and how. This constrains the applicability of such methods and forgoes the possibility of gathering novel scientific insights from experimental data. Our research aims to open the black box of recurrent neural networks, an important family of neural networks suitable for processing sequential data. Here, we propose a novel methodology that provides a mechanistic interpretation of their behaviour when used to solve computational tasks. The methodology is based on mathematical constructs called excitable network attractors, which are models represented as networks in phase space composed of stable attractors and excitable connections between them. As the behaviour of recurrent neural networks depends on training and on the inputs driving the autonomous system, we introduce an algorithm to extract network attractors directly from a trajectory generated by the neural network while solving tasks. Simulations conducted on a controlled benchmark highlight the relevance of the proposed methodology for interpreting the behaviour of recurrent neural networks on tasks that involve learning a finite number of stable states.
Cross-dataset transfer learning is an important problem in person re-identification (Re-ID). Unfortunately, not too many deep transfer Re-ID models exist for realistic settings of practical Re-ID systems. We propose a purely deep transfer Re-ID model consisting of a deep convolutional neural network and an autoencoder. The latent code is divided into metric embedding and nuisance variables. We then utilize an unsupervised training method that does not rely on co-training with non-deep models. Our experiments show improvements over both the baseline and competitors’ transfer learning models.
Principal Filter Analysis (PFA) is an elegant, easy-to-implement, yet effective methodology for neural network compression. PFA exploits the intrinsic correlation between filter responses within network layers to recommend a smaller network footprint. We propose two compression algorithms: the first allows a user to specify the proportion of the original spectral energy that should be preserved in each layer after compression, while the second is a parameter-free approach that automatically selects the compression used at each layer. Both algorithms are evaluated against several architectures and datasets, and we show considerable compression rates without compromising accuracy; e.g., for VGG-16 on CIFAR-10 and CIFAR-100, PFA achieves compression rates of 8x and 3x with accuracy gains of 0.4 and 1.4 percentage points, respectively. In our tests we also demonstrate that networks compressed with PFA achieve an accuracy that is very close to the empirical upper bound for a given compression ratio.
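The first algorithm above rests on an energy threshold, which can be sketched in a few lines. The sketch assumes that the eigenvalues of the correlation matrix of a layer's filter responses are already available and that "spectral energy" means their sum. The function name `filters_to_keep` is hypothetical, and how PFA decides which specific filters to remove is not covered here.

```python
def filters_to_keep(eigenvalues, energy_fraction):
    """Smallest number of leading eigenvalues whose sum reaches the target.

    `eigenvalues` are assumed to come from the correlation matrix of one
    layer's filter responses; `energy_fraction` is the share of their sum
    that should be preserved (between 0 and 1).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= energy_fraction * total:
            return count
    return len(ordered)


# A layer with four filters whose responses are strongly correlated:
# most of the energy sits in the first two directions.
layer_spectrum = [4.0, 2.0, 1.0, 1.0]
keep_75 = filters_to_keep(layer_spectrum, 0.75)
keep_90 = filters_to_keep(layer_spectrum, 0.90)
```

Here preserving 75% of the energy suggests a layer of two filters, while 90% requires all four, so the user-chosen fraction directly controls the recommended layer width.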
This work proposes an adaptive trace lasso regularized L1-norm based graph cut method for dimensionality reduction of hyperspectral images, called ‘Trace Lasso-L1 Graph Cut’ (TL-L1GC). The underlying idea of this method is to generate the optimal projection matrix by considering both the sparsity and the correlation of the data samples. The conventional L2-norm used in the objective function is sensitive to noise and outliers. Therefore, in this work the L1-norm is utilized as a robust alternative to the L2-norm. In addition, to further improve the results, we use a trace lasso penalty function with the L1GC method. It adaptively balances the L2-norm and the L1-norm by considering the data correlation along with the sparsity. We obtain the optimal projection matrix by maximizing the ratio of between-class dispersion to within-class dispersion using the L1-norm with trace lasso as the penalty. Furthermore, an iterative procedure is proposed to solve the TL-L1GC optimization problem. The effectiveness of the proposed method is evaluated on two benchmark hyperspectral image (HSI) datasets.
Multivariate signal processing is often based on dimensionality reduction techniques. We propose a new method, Dynamical Component Analysis (DyCA), leading to a classification of the underlying dynamics and, for a certain type of dynamics, to a signal subspace representing the dynamics of the data. In this paper the algorithm is derived, leading to a generalized eigenvalue problem of correlation matrices. The application of DyCA to high-dimensional chaotic signals is presented for both simulated data and real EEG data of epileptic seizures.
This study improves the performance of neural named entity recognition by a margin of up to 11% in F-score on the example of a low-resource language like German, thereby outperforming existing baselines and establishing a new state-of-the-art on each single open-source dataset. Rather than designing deeper and wider hybrid neural architectures, we gather all available resources and perform a detailed optimization and grammar-dependent morphological processing consisting of lemmatization and part-of-speech tagging prior to exposing the raw data to any training process. We test our approach in a threefold monolingual experimental setup of a) single, b) joint, and c) optimized training and shed light on the dependency of downstream-tasks on the size of corpora used to compute word embeddings.
In this article we propose a new supervised ensemble learning method called Data Shared Adaptive Bootstrap Aggregated (AdaBag) Lasso for capturing low-dimensional useful features for word-based sentiment analysis and mining problems. The literature on ensemble methods is very rich in both statistics and machine learning. The algorithm is a substantial upgrade of the Data Shared Lasso uplift algorithm. The most significant conceptual addition to the existing literature lies in the final selection of the bag of predictors through a special bootstrap aggregation scheme. We apply the algorithm to one simulated dataset and perform dimension reduction on grouped IMDb data (drama, comedy and horror) to extract a reduced set of word features for predicting sentiment ratings of movie reviews, demonstrating different aspects. We also compare the performance of the present method with classical Principal Components with associated Linear Discrimination (PCA-LD) as a baseline. The algorithm has a few limitations. First, the algorithm workflow does not incorporate online sequential data acquisition, and it does not use sentence-based models, which are common in ANN algorithms. As a consequence, our results show a slightly higher error rate compared to the reported state-of-the-art.
We explore methods for option discovery based on variational inference and make two algorithmic contributions. First: we highlight a tight connection between variational option discovery methods and variational autoencoders, and introduce Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection. In VALOR, the policy encodes contexts from a noise distribution into trajectories, and the decoder recovers the contexts from the complete trajectories. Second: we propose a curriculum learning approach where the number of contexts seen by the agent increases whenever the agent’s performance is strong enough (as measured by the decoder) on the current set of contexts. We show that this simple trick stabilizes training for VALOR and prior variational option discovery methods, allowing a single agent to learn many more modes of behavior than it could with a fixed context distribution. Finally, we investigate other topics related to variational option discovery, including fundamental limitations of the general approach and the applicability of learned options to downstream tasks.
An understanding of the nature of objects could help robots to solve both high-level abstract tasks and improve performance at lower-level concrete tasks. Although deep learning has facilitated progress in image understanding, a robot’s performance in problems like object recognition often depends on the angle from which the object is observed. Traditionally, robot sorting tasks rely on a fixed top-down view of an object. By changing its viewing angle, a robot can select a more semantically informative view leading to better performance for object recognition. In this paper, we introduce the problem of semantic view selection, which seeks to find good camera poses to gain semantic knowledge about an observed object. We propose a conceptual formulation of the problem, together with a solvable relaxation based on clustering. We then present a new image dataset consisting of around 10k images representing various views of 144 objects under different poses. Finally we use this dataset to propose a first solution to the problem by training a neural network to predict a ‘semantic score’ from a top view image and camera pose. The views predicted to have higher scores are then shown to provide better clustering results than fixed top-down views.
High-quality Automatic Speech Recognition (ASR) is a prerequisite for speech-based applications and research. While state-of-the-art ASR software is freely available, language-dependent acoustic models are lacking for languages other than English, due to the limited amount of freely available training data. We train acoustic models for German with Kaldi on two datasets, which are both distributed under a Creative Commons license. The resulting model is freely redistributable, lowering the cost of entry for German ASR. The models are trained on a total of 412 hours of German read speech data, and we achieve a relative word error reduction of 26% by adding data from the Spoken Wikipedia Corpus to the previously best freely available German acoustic model recipe and dataset. Our best model achieves a word error rate of 14.38 on the Tuda-De test set. Due to the large number of speakers and the diversity of topics included in the training data, our model is robust against speaker variation and topic shift.
Selective clustering annotated using modes of projections (SCAMP) is a new clustering algorithm for data in $\mathbb{R}^p$. SCAMP is motivated from the point of view of non-parametric mixture modeling. Rather than maximizing a classification likelihood to determine cluster assignments, SCAMP casts clustering as a search and selection problem. One consequence of this problem formulation is that the number of clusters is $\textbf{not}$ a SCAMP tuning parameter. The search phase of SCAMP consists of finding sub-collections of the data matrix, called candidate clusters, that obey shape constraints along each coordinate projection. An extension of the dip test of Hartigan and Hartigan (1985) is developed to assist the search. Selection occurs by scoring each candidate cluster with a preference function that quantifies prior belief about the mixture composition. Clustering proceeds by selecting candidates to maximize their total preference score. SCAMP concludes by annotating each selected cluster with labels that describe how cluster-level statistics compare to certain dataset-level quantities. SCAMP can be run multiple times on a single data matrix. Comparison of annotations obtained across iterations provides a measure of clustering uncertainty. Simulation studies and applications to real data are considered. A C++ implementation with an R interface is available online at https://…/scamp.
In this work we introduce the class of beta autoregressive fractionally integrated moving average models for continuous random variables taking values in the continuous unit interval $(0,1)$. The proposed model accommodates a set of regressors and a long-range dependent time series structure. We derive the partial likelihood estimator for the parameters of the proposed model and obtain the associated score vector and Fisher information matrix. We also prove the consistency and asymptotic normality of the estimator under mild conditions. Hypothesis testing, diagnostic tools and forecasting are also proposed. A Monte Carlo simulation is considered to evaluate the finite sample performance of the partial likelihood estimators and to study some of the proposed tests. An empirical application is also presented and discussed.
Factorial experiments in research on memory, language, and in other areas are often analyzed using analysis of variance (ANOVA). However, for experimental factors with more than two levels, the ANOVA omnibus F-test is not informative about the source of a main effect or interaction. This is unfortunate as researchers typically have specific hypotheses about which condition means differ from each other. A priori contrasts (i.e., comparisons planned before the sample means are known) between specific conditions or combinations of conditions are the appropriate way to represent such hypotheses in the statistical model. Many researchers have pointed out that contrasts should be ‘tested instead of, rather than as a supplement to, the ordinary “omnibus” F test’ (Hayes, 1973, p. 601). In this tutorial, we explain the mathematics underlying different kinds of contrasts (i.e., treatment, sum, repeated, Helmert, and polynomial contrasts), discuss their properties, and demonstrate how they are applied in the R System for Statistical Computing (R Core Team, 2018). In this context, we explain the generalized inverse, which is needed to compute the weight coefficients for contrasts that test hypotheses that are not covered by the default set of contrasts. A detailed understanding of contrast coding is crucial for successful and correct specification in linear models (including linear mixed models). Contrasts defined a priori yield far more precise confirmatory tests of experimental hypotheses than the standard omnibus F-test.
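The role of the inverse in contrast coding can be shown with a small worked example. The tutorial itself works in R; the sketch below uses Python with exact fractions, purely for illustration. It assumes a three-level factor and a square, invertible hypothesis matrix, in which case the generalized inverse reduces to the ordinary inverse. Each row of `HYPOTHESES` states one hypothesis as weights on the condition means; the names `invert` and `HYPOTHESES` are hypothetical.

```python
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inverse of a square matrix with exact rational arithmetic."""
    n = len(matrix)
    # Augment the matrix with the identity, then reduce the left half to identity.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


third = Fraction(1, 3)
# Rows: the grand mean, then the repeated (successive-difference) hypotheses
# mu2 - mu1 and mu3 - mu2, written as weights on (mu1, mu2, mu3).
HYPOTHESES = [
    [third, third, third],
    [-1, 1, 0],
    [0, -1, 1],
]

# Columns of the inverse: the intercept column followed by the contrast codes.
contrast_matrix = invert(HYPOTHESES)
```

The second and third columns of `contrast_matrix` are (-2/3, 1/3, 1/3) and (-1/3, -1/3, 2/3), the familiar coding for repeated contrasts. Inverting the hypothesis matrix is what turns the comparisons a researcher wants to test into the numeric codes a linear model needs.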
Beetle antennae search (BAS) is an efficient meta-heuristic algorithm. However, the results to which BAS converges rely heavily on the random beetle direction chosen in every iteration. More specifically, different random seeds may lead to different optimized results. In addition, the step-size update rule of BAS cannot guarantee that the objective decreases during the iterative process. To solve these problems, this paper proposes the Beetle Swarm Antennae Search algorithm (BSAS), which combines a swarm intelligence algorithm with a feedback-based step-size update strategy. BSAS employs k beetles, rather than one, to find a better position at each move. The step size is updated only when all k beetles return without a better choice. Experiments are carried out on building system identification. The results show that BSAS effectively avoids the influence of the random beetle direction. In addition, the estimation errors decrease as the number of beetles increases.
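The two ideas named in this abstract, probing with k beetles per move and shrinking the step only when none of them improves, can be sketched as follows. This is a simplified illustration rather than the published algorithm: it probes both sides of each random direction at a distance equal to the step size and moves to the best probe, whereas the original method may keep a separate antenna length. The function name `bsas_minimize`, the parameter defaults, and the shrink factor are assumptions.

```python
import math
import random


def bsas_minimize(objective, start, k=5, step=1.0, shrink=0.5,
                  iterations=200, seed=0):
    """Simplified beetle-swarm search (illustrative, not the authors' code)."""
    rng = random.Random(seed)
    best = list(start)
    best_value = objective(best)
    for _ in range(iterations):
        candidates = []
        for _ in range(k):
            # Each beetle draws its own random unit direction.
            direction = [rng.gauss(0.0, 1.0) for _ in best]
            norm = math.sqrt(sum(d * d for d in direction)) or 1.0
            unit = [d / norm for d in direction]
            # Probe on both sides of the current position.
            left = [b + step * u for b, u in zip(best, unit)]
            right = [b - step * u for b, u in zip(best, unit)]
            candidates.append(min((left, right), key=objective))
        challenger = min(candidates, key=objective)
        value = objective(challenger)
        if value < best_value:
            # Feedback: accept the move only if the objective improves.
            best, best_value = challenger, value
        else:
            # No beetle found a better position, so shrink the step size.
            step *= shrink
    return best, best_value


def sphere(point):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in point)


start_point = [3.0, -4.0]
found_point, found_value = bsas_minimize(sphere, start_point)
```

Because a move is accepted only when it lowers the objective, the returned value can never exceed the starting value. This is the property that the feedback-based step-size update is meant to provide.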
We present a ‘pull’ approach to approximate products of Gaussian mixtures within message updates for Nonparametric Belief Propagation (NBP) inference. Existing NBP methods often represent messages between continuous-valued latent variables as Gaussian mixture models. To avoid computational intractability in loopy graphs, NBP necessitates an approximation of the product of such mixtures. Sampling-based product approximations have shown effectiveness for NBP inference. However, such approximations used within the traditional ‘push’ message update procedures quickly become computationally prohibitive for multi-modal distributions over high-dimensional variables. In contrast, we propose a ‘pull’ method, as the Pull Message Passing for Nonparametric Belief propagation (PMPNBP) algorithm, and demonstrate its viability for efficient inference. We report results using an experiment from an existing NBP method, PAMPAS, for inferring the pose of an articulated structure in clutter. Results from this illustrative problem found PMPNBP has a greater ability to efficiently scale the number of components in its mixtures and, consequently, improve inference accuracy.
Recently, link prediction has attracted increasing attention from various disciplines such as computer science, bioinformatics and economics. In this problem, unknown links between nodes are discovered based on various kinds of information such as network topology, profile information and user-generated content. Most previous research has focused on the structural features of networks, while recent research indicates that contextual information can change the network topology. Although a number of valuable studies combine structural and content information, they face scalability issues due to feature engineering, because the majority of the extracted features are obtained by a supervised or semi-supervised algorithm. Moreover, the existing features are not general enough to perform well on different networks with heterogeneous structures. In addition, most previous studies address undirected and unweighted networks. In this paper, a novel link prediction framework called ‘DeepLink’ is presented, based on deep learning techniques. In contrast to previous approaches, which fail to automatically extract the best features for link prediction, deep learning reduces manual feature engineering. In this framework, both the structural and content information of the nodes are employed. The framework can use different structural feature vectors, which are prepared by various link prediction methods. It considers all proximity orders that are present in a network during structural feature learning. We have evaluated the performance of DeepLink on two real social network datasets, Telegram and irBlogs. On both datasets, the proposed framework outperforms several structural and hybrid approaches to the link prediction problem.