Cost-Sensitive Online Classification has drawn extensive attention in recent years, where the main approach is to directly online optimize two well-known cost-sensitive metrics: (i) weighted sum of sensitivity and specificity; (ii) weighted misclassification cost. However, previous existing methods only considered first-order information of data stream. It is insufficient in practice, since many recent studies have proved that incorporating second-order information enhances the prediction performance of classification models. Thus, we propose a family of cost-sensitive online classification algorithms with adaptive regularization in this paper. We theoretically analyze the proposed algorithms and empirically validate their effectiveness and properties in extensive experiments. Then, for better trade off between the performance and efficiency, we further introduce the sketching technique into our algorithms, which significantly accelerates the computational speed with quite slight performance loss. Finally, we apply our algorithms to tackle several online anomaly detection tasks from real world. Promising results prove that the proposed algorithms are effective and efficient in solving cost-sensitive online classification problems in various real-world domains.
In this paper, localized information privacy (LIP) is proposed, as a new privacy definition, which allows statistical aggregation while protecting users’ privacy without relying on a trusted third party. The notion of context-awareness is incorporated in LIP by the introduction of priors, which enables the design of privacy-preserving data aggregation with knowledge of priors. We show that LIP relaxes the Localized Differential Privacy (LDP) notion by explicitly modeling the adversary’s knowledge. However, it is stricter than $2\epsilon$-LDP and $\epsilon$-mutual information privacy. The incorporation of local priors allows LIP to achieve higher utility compared to other approaches. We then present an optimization framework for privacy-preserving data aggregation, with the goal of minimizing the expected squared error while satisfying the LIP privacy constraints. Utility-privacy tradeoffs are obtained under several models in closed-form. We then validate our analysis by {numerical analysis} using both synthetic and real-world data. Results show that our LIP mechanism provides better utility-privacy tradeoffs than LDP and when the prior is not uniformly distributed, the advantage of LIP is even more significant.
Most of the literature around text classification treats it as a supervised learning problem: given a corpus of labeled documents, train a classifier such that it can accurately predict the classes of unseen documents. In industry, however, it is not uncommon for a business to have entire corpora of documents where few or none have been classified, or where existing classifications have become meaningless. With web content, for example, poor taxonomy management can result in labels being applied indiscriminately, making filtering by these labels unhelpful. Our work aims to make it possible to classify an entire corpus of unlabeled documents using a human-in-the-loop approach, where the content owner manually classifies just one or two documents per category and the rest can be automatically classified. This ‘few-shot’ learning approach requires rich representations of the documents such that those that have been manually labeled can be treated as prototypes, and automatic classification of the rest is a simple case of measuring the distance to prototypes. This approach uses pre-trained word embeddings, where documents are represented using a simple weighted average of constituent word embeddings. We have tested the accuracy of the approach on existing labeled datasets and provide the results here. We have also made code available for reproducing the results we got on the 20 Newsgroups dataset.
Design of experiments is a fundamental topic in applied statistics with a long history. Yet its application is often limited by the complexity and costliness of constructing experimental designs in the first place. For example, in optimal design, constructing an experimental design involves searching the high-dimensional input space — a computationally expensive procedure that only guarantees a locally optimal solution. This is a difficult problem that, typically, can only be ‘solved’ by changing the optimality criterion to be based on a simplified model. Such approximations are sometimes justifiable but rarely broadly desirable. In this work, we introduce a novel approach to the challenging design problem. We will take a probabilistic view of the problem by representing the optimal design as being one element (or a subset of elements) of a probability space. Given a suitable distribution on this space, a generative process can be specified from which stochastic design realizations can be drawn. We describe a scenario where the classical (point estimate) optimal design solution coincides with the mode of the generative process we specify. We conclude with outlining an algorithm for drawing such design realizations, its extension to sequential design, and applying the techniques developed to constructing space-filling designs for Stochastic Gradient Descent and entropy-optimal designs for Gaussian process regression.
Inverse problems play a central role for many classical computer vision and image processing tasks. A key challenge in solving an inverse problem is to find an appropriate prior to convert an ill-posed problem into a well-posed task. Many of the existing priors, like total variation, are based on ad-hoc assumptions that have difficulties to represent the actual distribution of natural images. In this work, we propose the Adaptive Quantile Sparse Image (AQuaSI) prior. It is based on a quantile filter, can be used as a joint filter on guidance data, and be readily plugged into a wide range of numerical optimization algorithms. We demonstrate the efficacy of the proposed prior in joint RGB/depth upsampling, on RGB/NIR image restoration, and in a comparison with related regularization by denoising approaches.
The extensive use of social media for sharing and obtaining information has resulted in the development of topic detection models to facilitate the comprehension of the overwhelming amount of short and distributed posts. Probabilistic topic models, such as Latent Dirichlet allocation, represent topics as sets of terms that are useful for many automated processes. However, the determination of what a topic is about is left as a further task. Alternatively, techniques that produce summaries are human comprehensible, but less suitable for automated processing. This work proposes an approach that utilizes Linked Open Data (LOD) resources to extract semantically represented topics from collections of microposts. The proposed approach utilizes entity linking to identify the elements of topics from microposts. The elements are related through co-occurrence graphs, which are processed to yield topics. The topics are represented using an ontology that is introduced for this purpose. A prototype of the approach is used to identify topics from 11 datasets consisting of more than one million posts collected from Twitter during various events, such as the 2016 US election debates and the death of Carrie Fisher. The characteristics of the approach and more than 5 thousand generated topics are described in detail. A human evaluation of topics from 30 randomly selected intervals resulted in a precision of 81.0% and F1 score of 93.3%. Furthermore, they are compared with topics generated from the same datasets with two different kinds of topic models. The potentials of semantic topics in revealing information, that is not otherwise easily observable, is demonstrated with semantic queries of various complexities.
The problem of comparing concepts of dependence in general rough sets with those in probability theory had been initiated by the present author in some of her recent papers. This problem relates to the identification of the limitations of translating between the methodologies and possibilities in the identification of concepts. Comparison of ideas of dependence in the approaches had been attempted from a set-valuation based minimalist perspective by the present author. The deviant probability framework has been the result of such an approach. Other Bayesian reasoning perspectives (involving numeric valuations) and frequentist approaches are also known. In this research, duality results are adapted to demonstrate the possibility of improved comparisons across implications between ontologically distinct concepts in a common logic-based framework by the present author. Both positive and negative results are proved that delimit possible comparisons in a clearer way by her.
A large class of linear memory differential equations in one dimension, where the evolution depends on the whole history, can be equivalently described as a projection of a Markov process living in a higher dimensional space. Starting with such a memory equation, we propose an explicit construction of the corresponding Markov process. From a physical point of view the Markov process can be understood as a change of the type of some quasiparticles along one-way loops. Typically, the arising Markov process does not have the detailed balance property. The method leads to a more realistic modeling of memory equations. Moreover, it carries over the large number of investigation tools for Markov processes to memory equations, like the calculation of the equilibrium state, the asymptotic behavior and so on. The method can be used for an approximative solution of some degenerate memory equations like delay differential equations.
We propose and analyze a novel adaptive step size variant of the Davis-Yin three operator splitting, a method that can solve optimization problems composed of a sum of a smooth term for which we have access to its gradient and an arbitrary number of potentially non-smooth terms for which we have access to their proximal operator. The proposed method leverages local information of the objective function, allowing for larger step sizes while preserving the convergence properties of the original method. It only requires two extra function evaluations per iteration and does not depend on any step size hyperparameter besides an initial estimate. We provide a convergence rate analysis of this method, showing sublinear convergence rate for general convex functions and linear convergence under stronger assumptions, matching the best known rates of its non adaptive variant. Finally, an empirical comparison with related methods on 6 different problems illustrates the computational advantage of the adaptive step size strategy.
This paper provides an entire inference procedure for the autoregressive model under (conditional) heteroscedasticity of unknown form with a finite variance. We first establish the asymptotic normality of the weighted least absolute deviations estimator (LADE) for the model. Second, we develop the random weighting (RW) method to estimate its asymptotic covariance matrix, leading to the implementation of the Wald test. Third, we construct a portmanteau test for model checking, and use the RW method to obtain its critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE) is proposed and proved to have the same efficiency as its infeasible counterpart. The importance of our entire methodology based on the feasible ALADE is illustrated by simulation results and the real data analysis on three U.S. economic data sets.
Support Vector Machine (SVM) is an efficient classification approach, which finds a hyperplane to separate data from different classes. This hyperplane is determined by support vectors. In existing SVM formulations, the objective function uses L2 norm or L1 norm on slack variables. The number of support vectors is a measure of generalization errors. In this work, we propose a Minimal SVM, which uses L0.5 norm on slack variables. The result model further reduces the number of support vectors and increases the classification performance.