Sketched Subspace Clustering (Sketch-SC)
The immense amount of daily generated and communicated data presents unique challenges for its processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data. Subspace clustering (SC) is a relatively recent method that can successfully classify nonlinearly separable data in a multitude of settings. In spite of their high clustering accuracy, SC methods incur prohibitively high computational complexity when processing large volumes of high-dimensional data. Inspired by random sketching approaches for dimensionality reduction, the present paper introduces a randomized scheme for SC, termed Sketch-SC, tailored for large volumes of high-dimensional data. Sketch-SC accelerates the computationally heavy parts of state-of-the-art SC approaches by compressing the data matrix across both dimensions using random projections, thus enabling fast and accurate large-scale SC. Performance analysis as well as extensive numerical tests on real data corroborate the potential of Sketch-SC and its competitive performance relative to state-of-the-art scalable SC approaches. …
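
To make the two-sided compression concrete, here is a minimal sketch of a Gaussian random projection applied to both dimensions of a data matrix. The function name `sketch_both_dims` and the sketch sizes are hypothetical illustrations, not the paper's exact pipeline; Sketch-SC then runs an SC solver on such compressed data.

```python
import numpy as np

def sketch_both_dims(X, r, c, rng=None):
    """Compress a D x N data matrix X along both dimensions with
    Gaussian random projections: rows D -> r, columns N -> c.
    Returns the r x c sketch S @ X @ T. Illustrative only; the
    actual Sketch-SC method applies specific SC solvers to the
    compressed data."""
    rng = np.random.default_rng(rng)
    D, N = X.shape
    # Scaled Gaussian sketching matrices (Johnson-Lindenstrauss style).
    S = rng.standard_normal((r, D)) / np.sqrt(r)   # row compression
    T = rng.standard_normal((N, c)) / np.sqrt(c)   # column compression
    return S @ X @ T

# Toy usage: 10,000 points in 2,000 dimensions sketched to 200 x 500.
X = np.random.randn(2000, 10000)
X_sketch = sketch_both_dims(X, r=200, c=500, rng=0)
print(X_sketch.shape)  # (200, 500)
```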

Horseshoe Estimator
This paper proposes a new approach to sparse-signal detection called the horseshoe estimator. We show that the horseshoe is a close cousin of the lasso in that it arises from the same class of multivariate scale mixtures of normals, but that it is almost universally superior to the double-exponential prior at handling sparsity. A theoretical framework is proposed for understanding why the horseshoe is a better default “sparsity” estimator than those that arise from powered-exponential priors. Comprehensive numerical evidence is presented to show that the difference in performance can often be large. Most importantly, we show that the horseshoe estimator corresponds quite closely to the answers one would get if one pursued a full Bayesian model-averaging approach using a “two-groups” model: a point mass at zero for noise, and a continuous density for signals. Surprisingly, this correspondence holds both for the estimator itself and for the classification rule induced by a simple threshold applied to the estimator. We show how the resulting thresholded horseshoe can also be viewed as a novel Bayes multiple-testing procedure. …
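
As a small illustration of the scale-mixture construction described above, the sketch below draws from the horseshoe prior, beta_i | lambda_i ~ N(0, (tau*lambda_i)^2) with half-Cauchy local scales lambda_i, and computes the shrinkage weights kappa_i = 1/(1 + tau^2 lambda_i^2) that govern how much a unit-variance normal observation is pulled toward zero. The function names are hypothetical; for tau = 1 the kappa_i follow a Beta(1/2, 1/2) density, the horseshoe shape the estimator is named for.

```python
import numpy as np

def horseshoe_prior_draws(p, tau=1.0, rng=None):
    """Draw a p-vector from the horseshoe prior:
        beta_i | lambda_i ~ N(0, (tau * lambda_i)^2),
        lambda_i ~ C+(0, 1)  (half-Cauchy local scales)."""
    rng = np.random.default_rng(rng)
    lam = np.abs(rng.standard_cauchy(p))  # half-Cauchy via |Cauchy|
    return rng.standard_normal(p) * tau * lam, lam

def shrinkage_weights(lam, tau=1.0):
    """kappa_i = 1 / (1 + tau^2 lambda_i^2): the posterior mean of a
    unit-variance normal observation y_i is (1 - kappa_i) * y_i, so
    kappa_i near 1 means 'shrink to zero' and near 0 means 'leave
    the signal alone'."""
    return 1.0 / (1.0 + (tau * lam) ** 2)

beta, lam = horseshoe_prior_draws(100_000, rng=0)
kappa = shrinkage_weights(lam)
# Mass piles up near both ends: signals survive, noise vanishes.
print(np.mean(kappa < 0.05), np.mean(kappa > 0.95))
```

The two printed fractions show the bimodal shrinkage profile that distinguishes the horseshoe from the double-exponential (lasso) prior, which shrinks everything by a comparable amount.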

subgraph2vec
In this paper, we present subgraph2vec, a novel approach for learning latent representations of rooted subgraphs from large graphs, inspired by recent advancements in Deep Learning and Graph Kernels. These latent representations encode semantic substructure dependencies in a continuous vector space, which is easily exploited by statistical models for tasks such as graph classification, clustering, link prediction and community detection. subgraph2vec leverages local information obtained from neighbourhoods of nodes to learn their latent representations in an unsupervised fashion. We demonstrate that subgraph vectors learnt by our approach can be used in conjunction with classifiers such as CNNs, SVMs and relational data clustering algorithms to achieve significantly superior accuracies. Also, we show that the subgraph vectors can be used for building a deep learning variant of the Weisfeiler-Lehman graph kernel. Our experiments on several benchmark and large-scale real-world datasets reveal that subgraph2vec achieves significant improvements in accuracy over existing graph kernels on both supervised and unsupervised learning tasks. Specifically, on two real-world program analysis tasks, namely, code clone and malware detection, subgraph2vec outperforms state-of-the-art kernels by more than 17% and 4%, respectively. …
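
To illustrate what "rooted subgraphs" means here, the following sketch performs the Weisfeiler-Lehman relabeling that identifies the degree-d rooted subgraph around each node, assuming networkx is available. This covers only the vocabulary-extraction step; subgraph2vec then trains skip-gram embeddings over these labels with radial contexts, which is omitted here, and the function name `wl_rooted_subgraphs` is a hypothetical label.

```python
import hashlib
import networkx as nx

def wl_rooted_subgraphs(G, depth):
    """Weisfeiler-Lehman relabeling: levels[d][v] identifies the
    rooted subgraph of degree d around node v. These labels form the
    'vocabulary' over which subgraph2vec-style embeddings would be
    trained; only the extraction step is sketched here."""
    # Degree-0 subgraph = the node's own label (fall back to degree).
    labels = {v: str(G.nodes[v].get("label", G.degree[v])) for v in G}
    levels = [dict(labels)]
    for _ in range(depth):
        new = {}
        for v in G:
            neigh = sorted(labels[u] for u in G.neighbors(v))
            sig = labels[v] + "|" + ",".join(neigh)
            # Hash the signature so the compound labels stay short.
            new[v] = hashlib.md5(sig.encode()).hexdigest()[:8]
        labels = new
        levels.append(dict(labels))
    return levels

# Toy usage on a 5-node path graph.
G = nx.path_graph(5)
for d, lab in enumerate(wl_rooted_subgraphs(G, depth=2)):
    print("degree", d, lab)
```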
