Models for analyzing multivariate data sets with missing values require strong, often unassessable, assumptions. The most common of these is that the mechanism that created the missing data is ignorable, a twofold assumption whose form depends on the mode of inference. Its first part, which is the focus here, requires under the Bayesian and direct likelihood paradigms that the missing data be missing at random (MAR); in contrast, the frequentist-likelihood paradigm demands that the missing data mechanism always produce MAR data, a condition known as missing always at random (MAAR). Under certain regularity conditions, assuming MAAR leads to an assumption that can be tested using the observed data alone, namely that the missing data indicators depend only on fully observed variables. Here, we propose three diagnostic procedures that not only indicate when this assumption is invalid but also suggest which variables are the most likely culprits. Although MAAR is not a necessary condition to ensure validity under the Bayesian and direct likelihood paradigms, it is sufficient, and evidence of its violation should encourage the statistician to conduct a targeted sensitivity analysis.
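To make the testable implication concrete, here is a crude sketch. It is not one of the three proposed diagnostics. Within each level of a fully observed discrete variable, it compares the missingness rate of a target variable across the observed values of a partially observed candidate. If missingness depends only on fully observed variables, these gaps should be near zero. The function name, the use of `None` for missing entries, and the single discrete stratifier are all assumptions made for illustration.

```python
from collections import defaultdict

def missingness_gap(rows, target, candidate, stratifier):
    """Largest within-stratum gap in P(target is missing) across the observed
    values of a partially observed `candidate` variable. If missingness of
    `target` depends only on the fully observed `stratifier`, every gap should
    be close to zero; a large gap flags `candidate` as a possible culprit.
    Missing entries are encoded as None (an assumption of this sketch)."""
    # counts[stratum][candidate value] = [n rows with target missing, n rows]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row in rows:
        value = row[candidate]
        if value is None:  # only rows with the candidate observed are usable
            continue
        cell = counts[row[stratifier]][value]
        cell[0] += int(row[target] is None)
        cell[1] += 1
    gap = 0.0
    for by_value in counts.values():
        rates = [missing / total for missing, total in by_value.values()]
        if len(rates) >= 2:
            gap = max(gap, max(rates) - min(rates))
    return gap
```

In practice a formal test, such as a logistic regression of the missingness indicator or a permutation test, would replace the raw gap.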
This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant process prior. Within a cluster, all time series are modeled jointly using a novel ‘temporally-coupled’ extension of the Chinese restaurant process mixture. Markov chain Monte Carlo techniques are used to obtain samples from the posterior distribution, which are then used to form predictive inferences. We apply the technique to challenging prediction and imputation tasks using seasonal flu data from the US Centers for Disease Control and Prevention, demonstrating competitive imputation performance and improved forecasting accuracy compared with several state-of-the-art baselines. We also show that the model discovers interpretable clusters in datasets with hundreds of time series using macroeconomic data from the Gapminder Foundation.
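The first modeling step, partitioning the $N$ series with a Chinese restaurant process prior, can be illustrated by drawing one partition from the prior. This sketch samples only from the prior. The temporally-coupled within-cluster model and the MCMC posterior updates are not shown, and the function name is hypothetical.

```python
import random

def sample_crp_partition(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process with
    concentration alpha > 0. Each item joins an existing cluster with
    probability proportional to its size, or opens a new cluster with
    probability proportional to alpha."""
    rng = random.Random(seed)
    labels, sizes = [], []
    for _ in range(n_items):
        weights = sizes + [alpha]  # existing clusters, then a new one
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)        # open a new cluster
        else:
            sizes[k] += 1          # join cluster k
        labels.append(k)
    return labels
```

Larger `alpha` favors more clusters. With a very small `alpha`, essentially all series fall into a single cluster.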
Data-driven predictive analytics are in use today across a number of industrial applications, but further integration is hindered by the requirement that model training and test data follow similar distributions. This paper addresses the need to learn from possibly nonstationary data streams, i.e., under concept drift, a phenomenon commonly seen in practical applications. A simple dual-learner ensemble strategy, the alternating learners framework, is proposed. A long-memory model learns stable concepts from a long relevant time window, while a short-memory model learns transient concepts from a small recent window. The difference in prediction performance between these two models is monitored and drives an alternating policy to select, update and reset the two models. The method features an online updating mechanism to maintain ensemble accuracy, and a concept-dependent trigger to focus on relevant data. Empirical studies show that the method tracks and predicts effectively when the streaming data carry abrupt and/or gradual changes.
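The select/update/reset cycle can be sketched with running-mean predictors standing in for the two learners. The window sizes, the error margin, and the rule "reset the long model once the short model has won every step of the monitoring window" are illustrative assumptions and are not taken from the paper.

```python
from collections import deque

class AlternatingMeanLearners:
    """Toy dual-learner sketch: a long-memory mean and a short-memory mean.
    The model that has been more accurate recently is selected for prediction;
    if the short model wins throughout the monitoring window, the long model is
    reset to the short window (a crude drift trigger)."""

    def __init__(self, short_window=5, monitor=5, margin=0.0):
        self.long = []                         # long-memory training data
        self.short = deque(maxlen=short_window)
        self.err_diff = deque(maxlen=monitor)  # recent (long_err - short_err)
        self.margin = margin

    @staticmethod
    def _mean(values):
        return sum(values) / len(values)

    def predict(self):
        if not self.long:
            return 0.0  # no data yet (arbitrary default)
        if self.err_diff and self._mean(self.err_diff) > self.margin:
            return self._mean(self.short)      # select the short-memory model
        return self._mean(self.long)           # otherwise the long-memory model

    def update(self, y):
        if self.long:  # score both models on the new observation first
            long_err = abs(self._mean(self.long) - y)
            short_err = abs(self._mean(self.short) - y)
            self.err_diff.append(long_err - short_err)
        self.long.append(y)
        self.short.append(y)
        full = len(self.err_diff) == self.err_diff.maxlen
        if full and min(self.err_diff) > self.margin:
            self.long = list(self.short)       # reset: drop stale long-term data
            self.err_diff.clear()
```

On a stream that jumps from 0 to 10, the long model is reset shortly after the jump, so its prediction returns to the new level instead of averaging over both regimes.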
We review recent advances in modal regression studies using kernel density estimation. Modal regression is an alternative approach for investigating the relationship between a response variable and its covariates. Specifically, modal regression summarizes the interactions between the response variable and covariates using the conditional mode or local modes. We first describe the underlying model of modal regression and its estimators based on kernel density estimation. We then review the asymptotic properties of the estimators and strategies for choosing the smoothing bandwidth. We also discuss useful algorithms and similar alternative approaches for modal regression, and propose future directions in this field.
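One widely used algorithm for computing a local conditional mode from a kernel density estimate is a partial mean-shift iteration in the response direction. This sketch assumes Gaussian kernels and fixed bandwidths `hx` and `hy`. The function names are hypothetical, and bandwidth selection is not addressed.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u)

def conditional_mode(xs, ys, x0, y_start, hx=0.5, hy=0.5, iters=200, tol=1e-10):
    """Local conditional mode of Y given X = x0 via partial mean-shift on a
    Gaussian product-kernel density estimate. Starting from y_start, each step
    moves y to a kernel-weighted average of the observed responses."""
    wx = [gaussian((x0 - x) / hx) for x in xs]  # covariate weights, fixed at x0
    y = y_start
    for _ in range(iters):
        w = [a * gaussian((y - yi) / hy) for a, yi in zip(wx, ys)]
        y_new = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

With responses clustered at 0 and 5, different starting points converge to the two local modes. The kernel-weighted conditional mean in that example is 2.5, a value no response takes.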
We identify a strong equivalence between neural-network-based machine learning (ML) methods and the formulation of statistical data assimilation (DA), known to be a problem in statistical physics. DA, as used widely in the physical and biological sciences, systematically transfers information in observations to a model of the processes producing the observations. The correspondence is that the layer label in the ML setting is the analog of time in the DA setting. Utilizing aspects of this equivalence, we discuss how to establish the global minimum of the cost functions in the ML context, using a variational annealing method from DA. This provides a design method for optimal networks for ML applications and may serve as the basis for understanding the success of ‘deep learning’. Results from an ML example are presented. When the layer label is taken to be continuous, the Euler-Lagrange equation for the ML optimization problem is an ordinary differential equation, and we see that the problem being solved is a two-point boundary value problem. The use of continuous layers is denoted ‘deepest learning’. The Hamiltonian version provides a direct rationale for backpropagation as a solution method for the canonical momentum; however, it suggests that other solution methods are preferable.
Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this by deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.
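For reference, the competitive process between the two networks is commonly written as the minimax game of the original GAN formulation. The notation is assumed: a discriminator $D$, a generator $G$, a data distribution $p_{\mathrm{data}}$, a noise distribution $p_z$, and the generator's induced distribution $p_g$. For a fixed generator, the optimal discriminator has the closed form shown on the right.

```latex
\min_{G}\max_{D}\; V(D,G)
  = \mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z\sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right],
\qquad
D^{*}_{G}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_{g}(x)}.
```

The backpropagation signal reaching the generator is the gradient of this objective taken through the discriminator.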
Causal inference on multiple non-independent outcomes raises serious challenges, because multivariate techniques that properly account for the outcomes’ dependence structure need to be considered. We focus on the case of binary outcomes, framing our discussion in the potential outcome approach to causal inference. We define causal effects of treatment on joint outcomes by introducing the notion of product outcomes. We also discuss a decomposition of the causal effect on product outcomes into intrinsic and extrinsic causal effects, which respectively provide information on the treatment effect on the intrinsic (product) structure of the product outcomes and on the outcomes’ dependence structure. We propose a log-mean linear regression approach for modeling the distribution of the potential outcomes, which is particularly appealing because all the causal estimands of interest and the decomposition into intrinsic and extrinsic causal effects can be easily derived from the model parameters. The method is illustrated in two randomized experiments concerning (i) the effect of administering oral pre-surgery morphine on pain intensity after surgery; and (ii) the effect of honey on nocturnal cough and sleep difficulty associated with childhood upper respiratory tract infections.
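For two binary outcomes, one decomposition consistent with this description treats the risk ratio on the product outcome $Y_1Y_2$ as a product of two factors. The first is the product of the marginal risk ratios (intrinsic). The second is the ratio of the dependence ratios $p_{12}/(p_1 p_2)$ under treatment and control (extrinsic). This sketch assumes that definition and works directly from known joint probabilities. It does not fit the log-mean linear model, and the function name is hypothetical.

```python
def product_outcome_effects(joint_treated, joint_control):
    """joint_t[(y1, y2)] = P(Y1(t) = y1, Y2(t) = y2) for binary potential
    outcomes. Returns (total, intrinsic, extrinsic) risk ratios for the
    product outcome Y1*Y2, with total == intrinsic * extrinsic."""
    def summaries(joint):
        p12 = joint[(1, 1)]
        p1 = joint[(1, 0)] + joint[(1, 1)]
        p2 = joint[(0, 1)] + joint[(1, 1)]
        return p12, p1, p2

    p12_t, p1_t, p2_t = summaries(joint_treated)
    p12_c, p1_c, p2_c = summaries(joint_control)
    total = p12_t / p12_c                       # risk ratio on Y1*Y2
    intrinsic = (p1_t / p1_c) * (p2_t / p2_c)   # product of marginal risk ratios
    psi_t = p12_t / (p1_t * p2_t)               # dependence ratio, treated
    psi_c = p12_c / (p1_c * p2_c)               # dependence ratio, control
    extrinsic = psi_t / psi_c                   # effect on the dependence structure
    return total, intrinsic, extrinsic
```

When both arms have independent outcomes, the extrinsic factor is 1 and the whole effect is intrinsic.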
We use decision trees to build a helpdesk agent reference network to facilitate on-the-job advising of junior or less experienced staff on how to better address telecommunication customer fault reports. Such reports generate field measurements and remote measurements which, when coupled with location data and client attributes and fused with organization-level statistics, can produce models of how support should be provided. Beyond decision support, these models can help identify staff who can act as advisors, based on the quality, consistency and predictability with which they handle complex troubleshooting reports. Advisor staff models are then used to guide less experienced staff in their decision making; thus, we advocate the deployment of a simple mechanism that exploits the availability of staff with a sound track record at the helpdesk to act as dormant tutors.
This paper addresses the land cover classification task for remote sensing images by deep self-taught learning. Our self-taught learning approach learns suitable feature representations of the input data using sparse representation and undercomplete dictionary learning. We propose a deep learning framework that extracts representations in multiple layers and uses the output of the deepest layer as input to a classification algorithm. We evaluate our approach using a multispectral Landsat 5 TM image of a study area in the north of Novo Progresso (South America) and the Zurich Summer Data Set provided by the University of Zurich. Experiments indicate that features learned by a deep self-taught learning framework can be used for classification and improve the results compared with classification using the original feature representation.
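The sparse representation step computes a sparse code for each input with respect to a dictionary. This sketch uses ISTA (iterative shrinkage-thresholding), a standard solver for the $\ell_1$-penalised least-squares problem. ISTA is swapped in here for illustration, and the sketch assumes the dictionary has already been learned. The dictionary-learning update and the layer stacking are not shown, and the names are hypothetical.

```python
import math

def sparse_code(x, D, lam=0.1, iters=500):
    """ISTA for min_a 0.5 * ||x - D a||^2 + lam * ||a||_1.
    x: list of length d; D: d x k dictionary given as a list of rows
    (k < d for an undercomplete dictionary). Step size 1/L with
    L = ||D||_F^2, an upper bound on the gradient's Lipschitz constant."""
    d, k = len(D), len(D[0])
    L = sum(v * v for row in D for v in row)
    a = [0.0] * k
    for _ in range(iters):
        # residual r = x - D a
        r = [x[i] - sum(D[i][j] * a[j] for j in range(k)) for i in range(d)]
        # gradient step on the quadratic term: a + (1/L) D^T r
        g = [a[j] + sum(D[i][j] * r[i] for i in range(d)) / L for j in range(k)]
        # proximal step for the L1 penalty (soft-thresholding)
        a = [math.copysign(max(abs(gj) - lam / L, 0.0), gj) for gj in g]
    return a
```

In a deep variant, the codes from one layer would serve as the inputs to the next layer's sparse coding. The codes from the deepest layer would then be passed to the classifier.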
Sea level change, one of the most dire impacts of anthropogenic global warming, will affect a large proportion of the world’s population. However, sea level change is not uniform in time and space, and the skill of conventional prediction methods is limited due to the ocean’s internal variability on timescales from weeks to decades. Here we study the potential of neural network methods, which have been used successfully in other applications but have rarely been applied to this task. We develop a combination of a convolutional neural network (CNN) and a recurrent neural network (RNN) to analyse both the spatial and the temporal evolution of sea level and to suggest an independent, accurate method to predict interannual sea level anomalies (SLA). We test our method for the northern and equatorial Pacific Ocean, using gridded altimeter-derived SLA data. We show that the network designs used outperform a simple regression and that adding a CNN improves the skill significantly. The predictions are stable over several years.
Deep learning typically requires training a very capable architecture on large datasets. However, many important learning problems demand the ability to draw valid inferences from small datasets, and such problems pose a particular challenge for deep learning. In this regard, research on ‘meta-learning’ is being actively conducted. Recent work has suggested a Memory Augmented Neural Network (MANN) for meta-learning. MANN is an implementation of a Neural Turing Machine (NTM) with the ability to rapidly assimilate new data in its memory and use these data to make accurate predictions. In models such as MANN, the input data samples and their appropriate labels from the previous step are bound together in the same memory locations. This often leads to memory interference when performing a task, as these models have to retrieve a feature of an input from a certain memory location and read only the label information bound to that location. In this paper, we address this issue by presenting a more robust MANN. We revisit the idea of meta-learning and propose a new memory augmented neural network that explicitly splits the external memory into feature and label memories. The feature memory is used to store the features of input data samples and the label memory stores their labels. Hence, when predicting the label of a given input, our model uses its feature memory unit as a reference to extract the stored feature of the input, and based on that feature, it retrieves the label information of the input from the label memory unit. For the network to function in this framework, we design a new memory-writing module that encodes label information into the label memory in accordance with the meta-learning task structure. Here, we demonstrate that our model outperforms MANN by a large margin in supervised one-shot classification tasks using the Omniglot and MNIST datasets.
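The read path of a split memory can be illustrated with two stores that share slot indices. The query is matched against the feature memory only, and the label is read from the label memory at the matching slot. This is a hard, argmax-based toy. The actual model uses learned, differentiable reading and a dedicated writing module, neither of which is shown, and the class and method names are hypothetical.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

class SplitMemory:
    """Toy external memory with separate feature and label stores that share
    slot indices. Reading finds the most similar stored feature, then reads the
    label from the *label* memory at the same slot."""

    def __init__(self):
        self.feature_memory = []
        self.label_memory = []

    def write(self, feature, label):
        self.feature_memory.append(list(feature))
        self.label_memory.append(label)

    def read_label(self, query):
        sims = [cosine(query, f) for f in self.feature_memory]
        slot = max(range(len(sims)), key=sims.__getitem__)
        return self.label_memory[slot]
```

Because feature similarity is computed without touching the stored labels, the label store cannot interfere with the match.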
In this study, we present Swift Linked Data Miner, an interruptible algorithm that can directly mine an online Linked Data source (e.g., a SPARQL endpoint) for OWL 2 EL class expressions to extend an ontology with new SubClassOf: axioms. The algorithm works by downloading only a small part of the Linked Data source at a time, building a smart index in memory and swiftly iterating over the index to mine axioms. We propose a transformation function from mined axioms to RDF Data Shapes. We show, by means of a crowdsourcing experiment, that most of the axioms mined by Swift Linked Data Miner are correct and can be added to an ontology. We provide a ready-to-use Protégé plugin implementing the algorithm, to support ontology engineers in their daily modeling work.
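The core idea of mining SubClassOf axioms from instance data can be sketched with atomic classes and an in-memory index from classes to their instances. A candidate `sub SubClassOf: sup` is kept when enough instances of `sub` are also instances of `sup`. The support and confidence thresholds and the function name are illustrative assumptions. The actual algorithm mines OWL 2 EL class expressions incrementally from a remote endpoint.

```python
from collections import defaultdict
from itertools import permutations

def mine_subclass_axioms(types_of, min_support=2, min_confidence=0.9):
    """types_of: dict instance -> set of class names.
    Returns sorted (sub, sup, confidence) triples for candidate
    'sub SubClassOf: sup' axioms."""
    members = defaultdict(set)  # class -> instances (a tiny in-memory index)
    for inst, classes in types_of.items():
        for c in classes:
            members[c].add(inst)
    axioms = []
    for sub, sup in permutations(members, 2):
        n_sub = len(members[sub])
        overlap = len(members[sub] & members[sup])
        if n_sub >= min_support and overlap / n_sub >= min_confidence:
            axioms.append((sub, sup, overlap / n_sub))
    return sorted(axioms)
```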
Consider a polynomial optimisation problem whose instances vary continuously over time. We propose to use a coordinate-descent algorithm for solving such time-varying optimisation problems. In particular, we focus on relaxations of transmission-constrained problems in power systems. Using the example of the alternating-current optimal power flows (ACOPF), we bound from above the difference between the current approximate optimal cost generated by our algorithm and the optimal cost of a relaxation using the most recent data, by a function of the properties of the instance and the rate of change of the instance over time. We also bound the number of floating-point operations that need to be performed between two updates to guarantee that the error is bounded from above by a given constant.
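The tracking idea can be illustrated on a much simpler problem than an ACOPF relaxation. The sketch uses a strictly convex quadratic $\tfrac12 x^\top Q x - b_t^\top x$ whose linear term drifts over time. Between two data updates, only a fixed budget of warm-started coordinate-descent sweeps is spent. The choice of problem, the sweep budget and the function names are assumptions for illustration.

```python
def coordinate_sweep(Q, b, x):
    """One cyclic pass of exact coordinate minimisation for
    f(x) = 0.5 x^T Q x - b^T x with Q symmetric positive definite."""
    n = len(x)
    for i in range(n):
        off_diag = sum(Q[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - off_diag) / Q[i][i]
    return x

def track(Q, b_sequence, sweeps_per_update=3):
    """Warm-started tracking of a time-varying optimum: between two
    updates of the data b, only `sweeps_per_update` sweeps are performed."""
    x = [0.0] * len(Q)
    history = []
    for b in b_sequence:
        for _ in range(sweeps_per_update):
            coordinate_sweep(Q, b, x)
        history.append(list(x))
    return history
```

The tracking error settles at a level set by how fast `b` drifts relative to the per-sweep contraction. This mirrors, in a toy setting, the dependence of the bound on the rate of change of the instance.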
Reducing labeling costs in supervised learning is a critical issue in many practical machine learning applications. In this paper, we consider positive-confidence (Pconf) classification, the problem of training a binary classifier only from positive data equipped with confidence. Pconf classification can be regarded as a discriminative extension of one-class classification (which aims at ‘describing’ the positive class), with the ability to tune hyper-parameters for ‘classifying’ positive and negative samples. Pconf classification is also related to positive-unlabeled (PU) classification (which uses hard-labeled positive data and unlabeled data), but it allows us to avoid estimating the class priors, a critical bottleneck in typical PU classification methods. For the Pconf classification problem, we provide a simple empirical risk minimization framework and give a formulation for linear-in-parameter models that can be implemented easily and computationally efficiently. We also theoretically establish the consistency and generalization error bounds for Pconf classification, and demonstrate the practical usefulness of the proposed method through experiments.
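The sketch below minimises the Pconf empirical risk over a one-dimensional linear-in-parameter model by plain gradient descent. It assumes the risk takes the form $\frac1n\sum_i \big[\ell(f(x_i)) + \frac{1-r_i}{r_i}\,\ell(-f(x_i))\big]$ with $r_i = p(y=+1\mid x_i)$ and the logistic loss $\ell$. The class prior is dropped because it only rescales the risk. The learning rate, epoch count and function names are illustrative choices.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_pconf_linear(xs, conf, lr=0.05, epochs=5000):
    """Fit f(x) = w*x + b from positive-only data with confidences
    conf[i] = p(y=+1 | x_i) by gradient descent on the Pconf risk
        mean_i [ l(f(x_i)) + (1 - r_i) / r_i * l(-f(x_i)) ],
    where l(z) = log(1 + exp(-z)) is the logistic loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, r in zip(xs, conf):
            z = w * x + b
            # d/dz l(z) = -sigmoid(-z);  d/dz l(-z) = sigmoid(z)
            dz = -sigmoid(-z) + (1.0 - r) / r * sigmoid(z)
            gw += dz * x
            gb += dz
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b
```

In the test below, the confidences come from the logistic model $r = \sigma(2x)$. Each per-sample term is then minimised at $f(x_i) = \operatorname{logit}(r_i) = 2x_i$, so the fitted parameters should approach $w = 2$, $b = 0$ even though no negative samples are used.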
Recent advances in model compression have provided procedures for compressing large neural networks to a fraction of their original size while retaining most, if not all, of their accuracy. However, all of these approaches rely on access to the original training set, which might not always be possible if the network to be compressed was trained on a very large dataset, or on a dataset whose release poses privacy or safety concerns, as may be the case for biometrics tasks. We present a method for data-free knowledge distillation, which is able to compress deep neural networks trained on large-scale datasets to a fraction of their size, leveraging only some extra metadata to be provided with a pretrained model release. We also explore different kinds of metadata that can be used with our method, and discuss the tradeoffs involved in using each of them.
In the change point detection task, the likelihood ratio test (LRT) is applied sequentially in a sliding-window procedure. High LRT values indicate a change in the parametric distribution of the data sequence. Correspondingly, the LRT values require a predefined bound for their maximum. This maximum has an unknown distribution and may be calibrated with a multiplier bootstrap. The bootstrap procedure convolves independent components of the likelihood function with random weights, which makes it possible to estimate the LRT distribution empirically. For this empirical distribution of the LRT, we show convergence rates to the distribution of the true maximum.
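The procedure can be sketched for the simplest case: a shift in the mean of Gaussian data with known unit variance. At each split of a sliding window of two halves of length `h`, the statistic $2\log\Lambda = \tfrac{h}{2}(\bar y_L - \bar y_R)^2$ is computed. The maximum over splits is compared with a threshold obtained from a Gaussian multiplier bootstrap of centred observations. This is a simplified scheme under those assumptions, not the exact procedure analysed, and the function names are hypothetical.

```python
import random
import statistics

def lrt_scan(y, h):
    """2 * log-likelihood ratio for a mean shift at each split t, comparing
    the windows y[t-h:t] and y[t:t+h] under Gaussian noise with unit variance."""
    scan = {}
    for t in range(h, len(y) - h + 1):
        diff = statistics.fmean(y[t - h:t]) - statistics.fmean(y[t:t + h])
        scan[t] = 0.5 * h * diff * diff
    return scan

def bootstrap_threshold(y, h, level=0.95, n_boot=200, seed=0):
    """Calibrate the maximum of the scan with a Gaussian multiplier bootstrap:
    centred observations are multiplied by i.i.d. N(0, 1) weights and the
    scan maximum is recomputed; the empirical `level` quantile is returned."""
    rng = random.Random(seed)
    mean = statistics.fmean(y)
    centred = [v - mean for v in y]
    maxima = sorted(
        max(lrt_scan([rng.gauss(0.0, 1.0) * e for e in centred], h).values())
        for _ in range(n_boot)
    )
    return maxima[int(level * (n_boot - 1))]
```

Centring with the global mean inflates the bootstrap threshold when a large change is present. The calibration is therefore conservative in this simplified form.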
The importance of geo-spatial data in critical applications such as emergency response, transportation and agriculture has prompted the adoption of the recent GeoSPARQL standard in many RDF processing engines. In addition to large repositories of geo-spatial data (e.g., LinkedGeoData and OpenStreetMap), spatial data is also routinely found in automatically constructed knowledge bases such as Yago and WikiData. While there have been research efforts on efficient processing of spatial data in RDF/SPARQL, very little effort has gone into building end-to-end systems that can holistically handle complex SPARQL queries along with spatial filters. In this paper, we present Streak, an RDF data management system designed to support a wide range of queries with spatial filters, including complex joins, top-k queries and higher-order relationships over spatially enriched databases. Streak introduces several novel features, such as a careful identifier encoding strategy for spatial and non-spatial entities, a semantics-aware Quad-tree index that allows for early termination, and a clever use of adaptive query processing with zero plan-switch cost. We show that Streak can scale to some of the largest publicly available semantic data resources, such as Yago3 and LinkedGeoData, which contain spatial entities and quantifiable predicates useful for result ranking. For experimental evaluations, we focus on top-k distance join queries and demonstrate that Streak outperforms popular spatial join algorithms as well as state-of-the-art end-to-end systems like Virtuoso and PostgreSQL.
Offline change point estimation is traditionally performed by optimizing over the data set of interest: each data point is considered as the true location parameter and a data-fit criterion is computed. The data point that minimizes the criterion is then declared the change point estimate. For estimating multiple change points, the procedures are analogous in spirit but significantly more involved in execution. Since change points are local discontinuities, only data points close to the actual change point provide useful information for estimation, while data points far away are superfluous, to the point where using only a few points close to the true parameter is just as precise as using the full data set. Leveraging this ‘locality principle’, we introduce a two-stage procedure for the problem at hand, which in the first stage uses a sparse subsample to obtain pilot estimates of the underlying change points, and in the second stage refines these estimates by sampling densely in appropriately defined neighborhoods around them. We establish that this method achieves the same rate of convergence and even virtually the same asymptotic distribution as the analysis of the full data, while reducing computational complexity to $O(N^{1/2})$ time ($N$ being the length of the data set), as opposed to at least $O(N)$ time for all current procedures, making it promising for the analysis of exceedingly long data sets with adequately spaced-out change points. The main results are established under a signal-plus-noise model with independent and identically distributed error terms, but extensions to dependent data settings, as well as to procedures with more than two stages, are also provided. The performance of our procedure, which is coined ‘intelligent sampling’, is illustrated on both synthetic and real Internet data streams.
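The two stages can be sketched for a single change in mean. Stage one fits a least-squares step function to every $\lfloor\sqrt N\rfloor$-th point. Stage two refits using every point within two subsample spacings of the pilot estimate, so only $O(\sqrt N)$ points are touched in total. The single change point, the least-squares criterion and the neighbourhood width are assumptions of this sketch, and the function names are hypothetical.

```python
import math

def fit_single_change(values):
    """Least-squares change point for a piecewise-constant mean: returns the
    split k minimising SSE(values[:k]) + SSE(values[k:])."""
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(i, j):  # squared deviations of values[i:j] from their mean
        s, s2, m = prefix[j] - prefix[i], prefix_sq[j] - prefix_sq[i], j - i
        return s2 - s * s / m

    return min(range(1, n), key=lambda k: sse(0, k) + sse(k, n))

def intelligent_sampling(y):
    """Two-stage estimate of one change point in y (length N):
    stage 1 fits a sparse subsample (every ~sqrt(N)-th point);
    stage 2 refits densely in a neighbourhood of the pilot estimate."""
    n = len(y)
    step = max(1, int(math.sqrt(n)))
    idx = list(range(0, n, step))
    pilot = idx[fit_single_change([y[i] for i in idx])]
    lo, hi = max(0, pilot - 2 * step), min(n, pilot + 2 * step)
    return lo + fit_single_change(y[lo:hi])
```

The returned index is the first position of the new regime. The neighbourhood of two spacings on each side tolerates a pilot estimate that is off by one subsample point.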
We propose a novel pooling strategy that learns how to adaptively rank deep convolutional features to select more informative representations. To this end, we use discriminative analysis to project the features onto a space whose dimensionality is determined by the number of classes in the dataset under study. This maps the notion of labels in the feature space into instances in the projected space. We employ these projected distances as a measure to rank the existing features with respect to their specific discriminant power for each individual class. We then apply multipartite ranking to score the separability of the instances and aggregate one-versus-all scores to compute an overall distinction score for each feature. For pooling, we pick the features with the highest scores in a pooling window instead of using maximum, average or stochastic assignments. Our experiments on various benchmarks confirm that the proposed multipartite pooling strategy consistently improves the performance of deep convolutional networks through better generalization of the trained models to test-time data.
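The final pooling rule can be illustrated in isolation. In each window, the sketch keeps the activation with the highest score rather than the largest activation. Here the scores are simply given per position. Computing them by discriminative projection and multipartite ranking is not shown, and the function name and non-overlapping windows are assumptions.

```python
def score_pool(activations, scores, window=2):
    """Pool a 2-D activation map by keeping, in each non-overlapping
    window x window block, the activation whose score is highest
    (instead of the largest activation, as in max pooling)."""
    h, w = len(activations), len(activations[0])
    pooled = []
    for i in range(0, h - window + 1, window):
        row = []
        for j in range(0, w - window + 1, window):
            cells = [(scores[i + a][j + b], activations[i + a][j + b])
                     for a in range(window) for b in range(window)]
            row.append(max(cells, key=lambda c: c[0])[1])
        pooled.append(row)
    return pooled
```

In the test below, max pooling would return `[[9.0, 8.0]]`, whereas score-based pooling keeps the better-scored activations.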
Transfer learning is a popular practice in deep neural networks, but fine-tuning a large number of parameters is hard due to the complex wiring of neurons between splitting layers and the imbalanced distributions of data in the pretrained and transferred domains. Reconstructing the original wiring for the target domain is a heavy burden due to the size of the interconnections across neurons. We propose a distributed scheme that tunes the convolutional filters individually while backpropagating them jointly by means of basic probability assignment. Some of the most recent advances in evidence theory show that, in a wide variety of imbalanced regimes, optimizing suitable objective functions derived from contingency matrices prevents biases towards high-prior class distributions. Therefore, the original filters are gradually transferred based on their individual contributions to the overall performance on the target domain. This largely reduces the expected complexity of transfer learning while greatly improving precision. Our experiments on standard benchmarks and scenarios confirm the consistent improvement of our distributed deep transfer learning strategy.
A common practice in most deep convolutional neural architectures is to employ fully-connected layers followed by a Softmax activation to minimize cross-entropy loss for classification. Recent studies show that substituting the Softmax objective with, or augmenting it by, the cost functions of support vector machines or linear discriminant analysis is highly beneficial for improving classification performance in hybrid neural networks. We propose a novel paradigm to link the optimization of several hybrid objectives through unified backpropagation. This greatly alleviates the burden of extensive boosting for independent objective functions or the complex formulation of multiobjective gradients. Hybrid loss functions are linked by basic probability assignment from evidence theory. We conduct experiments for a variety of scenarios and standard datasets to evaluate the advantage of our proposed unification approach in delivering consistent improvements in the classification performance of deep convolutional neural networks.
In this paper, we deal with the task of building a dynamic ensemble of chain classifiers for multi-label classification. To do so, we propose two classifier chain algorithms that are able to change the label order of the chain without rebuilding the entire model. Such models allow the instance-specific chain order to be anticipated without a significant increase in computational burden. The proposed chain models are built using the Naive Bayes classifier and the nearest neighbour approach as base single-label classifiers. To exploit the benefits of the proposed algorithms, we developed a simple heuristic that allows the system to find a relatively good label order. The heuristic sorts labels according to the label-specific classification quality obtained during the validation phase, and tries to minimise the phenomenon of error propagation in the chain. The experimental results show that the proposed model based on the Naive Bayes classifier combined with the above-mentioned heuristic is an efficient tool for building dynamic chain classifiers.
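The ordering heuristic and chained prediction can be sketched independently of the base classifiers. Labels with the highest validation quality go first. Each classifier then sees the predictions already made for earlier labels. Here the base classifiers are arbitrary callables rather than Naive Bayes or nearest-neighbour models, and the function names and quality scores are hypothetical.

```python
def chain_order(val_quality):
    """Order labels for a classifier chain: labels with the highest
    validation quality come first, so less reliable predictions feed
    fewer downstream classifiers (limiting error propagation)."""
    return sorted(val_quality, key=lambda label: val_quality[label], reverse=True)

def predict_chain(x, order, classifiers):
    """classifiers[label](x, earlier) -> 0/1, where `earlier` maps labels
    already predicted in the chain to their predicted values."""
    earlier = {}
    for label in order:
        earlier[label] = classifiers[label](x, dict(earlier))
    return earlier
```

Because the order is just a list passed at prediction time, it can change per instance without retraining, provided the base models can accept any subset of earlier labels.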
The optimization-based design of renewable energy systems is a computationally demanding task because of the high temporal fluctuation of supply and demand time series. To reduce these time series, the aggregation of typical operation periods has become common. The problem with this method is that these aggregated typical periods are modeled independently and cannot exchange energy. Therefore, seasonal storage cannot be adequately taken into account, although this will be necessary for energy systems with a high share of renewable generation. To address this issue, this paper proposes a novel mathematical description of storage inventories based on the superposition of inter-period and intra-period states. Inter-period states connect the typical periods and are able to account for their sequence. The approach has been applied to different energy system configurations. The results show that a significant reduction in computational load can also be achieved for long-term storage-based energy system models in comparison to optimization models based on the full annual time series.
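The superposition can be illustrated outside an optimization model. The state of charge at hour $t$ of calendar day $d$ is the inter-period state at the start of day $d$ plus the intra-period state of the typical period assigned to that day. The inter-period state is then advanced by that period's net change. This lossless sketch omits self-discharge and storage bounds, and its names are hypothetical.

```python
from itertools import accumulate

def storage_profile(typical_flows, day_to_period, soc_start=0.0):
    """typical_flows[k]: net charging per time step in typical period k.
    day_to_period[d]: which typical period represents calendar day d.
    Returns the state of charge at every time step of the horizon as
    inter-period state (start of day d) + intra-period state within day d."""
    # intra-period states: cumulative charge within each typical period, from 0
    intra = [list(accumulate(flows)) for flows in typical_flows]
    soc, inter = [], soc_start
    for k in day_to_period:
        soc.extend(inter + s for s in intra[k])
        inter += intra[k][-1]  # carry the period's net change to the next day
    return soc
```

Only the intra-period states depend on the handful of typical periods. The inter-period states form one short sequence per calendar day, which is what allows seasonal storage to be represented.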
Since the publication of ‘Complex Contagions and the Weakness of Long Ties’ in 2007, complex contagions have been studied across an enormous variety of social domains. In reviewing this decade of research, we discuss recent advancements in applied studies of complex contagions, particularly in the domains of health, innovation diffusion, social media, and politics. We also discuss how these empirical studies have spurred complementary advancements in the theoretical modeling of contagions, which concern the effects of network topology on diffusion, as well as the effects of individual-level attributes and thresholds. In synthesizing these developments, we suggest three main directions for future research. The first concerns the study of how multiple contagions interact within the same network and across networks, in what may be called an ecology of contagions. The second concerns the study of how the structure of thresholds and their behavioral consequences can vary by individual and social context. The third area concerns the roles of diversity and homophily in the dynamics of complex contagion, including both diversity of demographic profiles among local peers, and the broader notion of structural diversity within a network. Throughout this discussion, we make an effort to highlight the theoretical and empirical opportunities that lie ahead.
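The interplay of thresholds and network topology mentioned above can be illustrated with the deterministic threshold model commonly used for complex contagions, in which a node adopts only after at least two neighbours have. The ring-lattice construction, the threshold value and the function names are assumptions for illustration.

```python
def complex_contagion(neighbors, seeds, threshold=2):
    """Deterministic threshold model: a node adopts once at least
    `threshold` of its neighbours have adopted. Returns the final adopters."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node not in adopted and sum(n in adopted for n in nbrs) >= threshold:
                adopted.add(node)
                changed = True
    return adopted

def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours on each side."""
    return {i: {(i + d) % n for d in range(-k, k + 1) if d != 0} for i in range(n)}
```

Seeding two adjacent nodes spreads the contagion across a clustered lattice, where ties are wide enough to supply two adopting neighbours. It stalls on a plain cycle, where each node has only one neighbour on each side.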