User engagement in social networks depends critically on the number of online actions their users take in the network. Can we design an algorithm that finds when to incentivize users to take actions to maximize the overall activity in a social network? In this paper, we model the number of online actions over time using multidimensional Hawkes processes, derive an alternate representation of these processes based on stochastic differential equations (SDEs) with jumps and, exploiting this alternate representation, address the above question from the perspective of stochastic optimal control of SDEs with jumps. We find that the optimal level of incentivized actions depends linearly on the current level of overall actions. Moreover, the coefficients of this linear relationship can be found by solving a matrix Riccati differential equation, which can be solved efficiently, and a first order differential equation, which has a closed form solution. As a result, we are able to design an efficient online algorithm, Cheshire, to sample the optimal times of the users’ incentivized actions. Experiments on both synthetic and real data gathered from Twitter show that our algorithm is able to consistently maximize the number of online actions more effectively than the state of the art.
Large networks of queueing systems model important real-world systems such as MapReduce clusters, web-servers, hospitals, call-centers and airport passenger terminals.To model such systems accurately we must infer queueing parameters from data. Unfortunately, for many queueing networks there is no clear way to proceed with parameter inference from data. Approximate Bayesian computation could offer a straight-forward way to infer parameters for such networks if we could simulate data quickly enough. We present a computationally efficient method for simulating from a very general set of queueing networks with the R package queuecomputer. Remarkable speedups of more than 2 orders of magnitude are observed relative to the popular DES packages simmer and simpy. We replicate output from these packages to validate the package. The package is modular and integrates well with the popular R package dplyr. Complex queueing networks with tandem, parallel and fork/join topologies can easily be built with these two packages together. We show how to use this package with two examples: a call-centre and an airport terminal.
While Multiple Instance (MI) data are point patterns — sets or multi-sets of unordered points — appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
Evaluating similarity between graphs is of major importance in several computer vision and pattern recognition problems, where graph representations are often used to model objects or interactions between elements. The choice of a distance or similarity metric is, however, not trivial and can be highly dependent on the application at hand. In this work, we propose a novel metric learning method to evaluate distance between graphs that leverages the power of convolutional neural networks, while exploiting concepts from spectral graph theory to allow these operations on irregular graphs. We demonstrate the potential of our method in the field of connectomics, where neuronal pathways or functional connections between brain regions are commonly modelled as graphs. In this problem, the definition of an appropriate graph similarity function is critical to unveil patterns of disruptions associated with certain brain disorders. Experimental results on the ABIDE dataset show that our method can learn a graph similarity metric tailored for a clinical application, improving the performance of a simple k-nn classifier by 11.9% compared to a traditional distance metric.
Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation (C-TMLE) procedure. The original implementation/instantiation of the C-TMLE template can be presented as a greedy forward stepwise C-TMLE algorithm. It does not scale well when the number $p$ of covariates increases drastically. This motivates the introduction of a novel instantiation of the C-TMLE template where the covariates are pre-ordered. Its time complexity is $\mathcal{O}(p)$ as opposed to the original $\mathcal{O}(p^2)$, a remarkable gain. We propose two pre-ordering strategies and suggest a rule of thumb to develop other meaningful strategies. Because it is usually unclear a priori which pre-ordering strategy to choose, we also introduce another implementation/instantiation called SL-C-TMLE algorithm that enables the data-driven choice of the better pre-ordering strategy given the problem at hand. Its time complexity is $\mathcal{O}(p)$ as well. The computational burden and relative performance of these algorithms were compared in simulation studies involving fully synthetic data or partially synthetic data based on a real world large electronic health database; and in analyses of three real, large electronic health databases. In all analyses involving electronic health databases, the greedy C-TMLE algorithm is unacceptably slow. Simulation studies indicate our scalable C-TMLE and SL-C-TMLE algorithms work well.
Generative adversarial nets (GANs) are good at generating realistic images and have been extended for semi-supervised classification. However, under a two-player formulation, existing work shares competing roles of identifying fake samples and predicting labels via a single discriminator network, which can lead to undesirable incompatibility. We present triple generative adversarial net (Triple-GAN), a flexible game-theoretical framework for classification and class-conditional generation in semi-supervised learning. Triple-GAN consists of three players – a generator, a discriminator and a classifier, where the generator and classifier characterize the conditional distributions between images and labels, and the discriminator solely focuses on identifying fake image-label pairs. With designed utilities, the distributions characterized by the classifier and generator both concentrate to the data distribution under nonparametric assumptions. We further propose unbiased regularization terms to make the classifier and generator strongly coupled and some biased techniques to boost the performance of Triple-GAN in practice. Our results on several datasets demonstrate the promise in semi-supervised learning, where Triple-GAN achieves comparable or superior performance than state-of-the-art classification results among DGMs; it is also able to disentangle the classes and styles and transfer smoothly on the data level via interpolation on the latent space class-conditionally.
In this paper, we address the problem of recovering arbitrary-shaped data clusters from massive datasets. We present DBMSTClu a new density-based non-parametric method working on a limited number of linear measurements i.e. a sketched version of the similarity graph $G$ between the $N$ objects to cluster. Unlike $k$-means, $k$-medians or $k$-medoids algorithms, it does not fail at distinguishing clusters with particular structures. No input parameter is needed contrarily to DBSCAN or the Spectral Clustering method. DBMSTClu as a graph-based technique relies on the similarity graph $G$ which costs theoretically $O(N^2)$ in memory. However, our algorithm follows the dynamic semi-streaming model by handling $G$ as a stream of edge weight updates and sketches it in one pass over the data into a compact structure requiring $O(\operatorname{poly} \operatorname{log} (N))$ space. Thanks to the property of the Minimum Spanning Tree (MST) for expressing the underlying structure of a graph, our algorithm successfully detects the right number of non-convex clusters by recovering an approximate MST from the graph sketch of $G$. We provide theoretical guarantees on the quality of the clustering partition and also demonstrate its advantage over the existing state-of-the-art on several datasets.
Distributed minimum spanning tree (MST) problem is one of the most central and fundamental problems in distributed graph algorithms. Garay et al. \cite{GKP98,KP98} devised an algorithm with running time $O(D + \sqrt{n} \cdot \log^* n)$, where $D$ is the hop-diameter of the input $n$-vertex $m$-edge graph, and with message complexity $O(m + n^{3/2})$. Peleg and Rubinovich \cite{PR99} showed that the running time of the algorithm of \cite{KP98} is essentially tight, and asked if one can achieve near-optimal running time **together with near-optimal message complexity**. In a recent breakthrough, Pandurangan et al. \cite{PRS16} answered this question in the affirmative, and devised a **randomized** algorithm with time $\tilde{O}(D+ \sqrt{n})$ and message complexity $\tilde{O}(m)$. They asked if such a simultaneous time- and message-optimality can be achieved by a **deterministic** algorithm. In this paper, building upon the work of \cite{PRS16}, we answer this question in the affirmative, and devise a **deterministic** algorithm that computes MST in time $O((D + \sqrt{n}) \cdot \log n)$, using $O(m \cdot \log n + n \log n \cdot \log^* n)$ messages. The polylogarithmic factors in the time and message complexities of our algorithm are significantly smaller than the respective factors in the result of \cite{PRS16}. Also, our algorithm and its analysis are very **simple** and self-contained, as opposed to rather complicated previous sublinear-time algorithms \cite{GKP98,KP98,E04b,PRS16}.
Probabilistic modeling provides the capability to represent and manipulate uncertainty in data, models, decisions and predictions. We are concerned with the problem of learning probabilistic models of dynamical systems from measured data. Specifically, we consider learning of probabilistic nonlinear state space models. There is no closed-form solution available for this problem, implying that we are forced to use approximations. In this tutorial we will provide a self-contained introduction to one of the state-of-the-art methods—the particle Metropolis-Hastings algorithm—which has proven to offer very practical approximations. This is a Monte Carlo based method, where the so-called particle filter is used to guide a Markov chain Monte Carlo method through the parameter space. One of the key merits of the particle Metropolis-Hastings method is that it is guaranteed to converge to the ‘true solution’ under mild assumptions, despite being based on a practical implementation of a particle filter (i.e., using a finite number of particles). We will also provide a motivating numerical example illustrating the method which we have implemented in an in-house developed modeling language, serving the purpose of abstracting away the underlying mathematics of the Monte Carlo approximations from the user. This modeling language will open up the power of sophisticated Monte Carlo methods, including particle Metropolis-Hastings, to a large group of users without requiring them to know all the underlying mathematical details.
We employ unsupervised machine learning techniques to learn latent parameters which best describe states of the two-dimensional Ising model and the three-dimensional XY model. These methods range from principle component analysis to artificial neural network based variational autoencoders. The states are sampled using a Monte-Carlo simulation above and below the critical temperature. We find that the predicted latent parameters correspond to the known order parameters. The latent representation of the states of the models in question are clustered, which makes it possible to identify phases without prior knowledge of their existence or the underlying Hamiltonian. Furthermore, we find that the reconstruction loss function can be used as a universal identifier for phase transitions.
A method for online tensor dictionary learning is proposed. With the assumption of separable dictionaries, tensor contraction is used to diminish a $N$-way model of $\mathcal{O}\left(L^N\right)$ into a simple matrix equation of $\mathcal{O}\left(NL^2\right)$ with a real-time capability. To avoid numerical instability due to inversion of sparse matrix, a class of stochastic gradient with memory is formulated via a least-square solution to guarantee convergence and robustness. Both gradient descent with exact line search and Newton’s method are discussed and realized. Extensions onto how to deal with bad initialization and outliers are also explained in detail. Experiments on two synthetic signals confirms an impressive performance of our proposed method.
The development of Smart Grid in Norway in specific and Europe/US in general will shortly lead to the availability of massive amount of fine-grained spatio-temporal consumption data from domestic households. This enables the application of data mining techniques for traditional problems in power system. Clustering customers into appropriate groups is extremely useful for operators or retailers to address each group differently through dedicated tariffs or customer-tailored services. Currently, the task is done based on demographic data collected through questionnaire, which is error-prone. In this paper, we used three different clustering techniques (together with their variants) to automatically segment electricity consumers based on their consumption patterns. We also proposed a good way to extract consumption patterns for each consumer. The grouping results were assessed using four common internal validity indexes. We found that the combination of Self Organizing Map (SOM) and k-means algorithms produce the most insightful and useful grouping. We also discovered that grouping quality cannot be measured effectively by automatic indicators, which goes against common suggestions in literature.
This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require to establish a correspondence to English for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly-supervised data in various languages to train a multi-layer convolutional network and demonstrate the importance of using pre-training of such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse – but still acceptable – performance when compared to the single language model, while benefiting from better generalization properties across languages.
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question if similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, and on many tasks even beats supervised models, highlighting the robustness of the produced sentence embeddings.
Online learning to rank is an important problem in machine learning, information retrieval, recommender systems, and display advertising. Many provably efficient algorithms have been developed for this problem recently, under specific click models. A click model is a model of how users click on a list of documents. Though these results are significant, the proposed algorithms have limited application because they are designed for specific click models, and do not have guarantees beyond them. To overcome this challenge, we propose a novel algorithm, which we call MergeRank, for learning to rank in a class of click models that satisfy mild technical assumptions. This class encompasses two most fundamental click models, the cascade and position-based models. We derive a gap-dependent upper bound on the expected $n$-step regret of MergeRank and evaluate it on web search queries. We observe that MergeRank performs better than ranked bandits and is more robust than CascadeKL-UCB, an existing algorithm for learning to rank in the cascade model.
Advertisements