Several studies have shown that the network traffic that is generated by a visit to a website over Tor reveals information specific to the website through the timing and sizes of network packets. By capturing traffic traces between users and their Tor entry guard, a network eavesdropper can leverage this meta-data to reveal which website Tor users are visiting. The success of such attacks heavily depends on the particular set of traffic features that are used to construct the fingerprint. Typically, these features are manually engineered and, as such, any change introduced to the Tor network can render these carefully constructed features ineffective. In this paper, we show that an adversary can automate the feature engineering process, and thus automatically deanonymize Tor traffic by applying our novel method based on deep learning. We evaluate our approach on a dataset comprised of more than three million network traces, which is the largest dataset of web traffic ever gathered for website fingerprinting, and find that the performance achieved by deep learning techniques is comparable to known approaches which include various research efforts spanning over multiple years. Furthermore, it eliminates the need for feature design and selection which is a tedious work and has been one of the main focus of prior work. We conclude that the ability to automatically construct the most relevant traffic features and perform accurate traffic recognition makes our deep learning based approach an efficient, flexible and robust technique for website fingerprinting.
This chapter provides an accessible introduction for point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and the key concepts in point processes. We then introduce the Hawkes process, its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data – we show how to model retweet cascades using a Hawkes self-exciting process. We presents a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available as an online appendix
Although personalized recommendation has been investigated for decades, the wide adoption of Latent Factor Models (LFM) has made the explainability of recommendations a critical issue to both the research community and practical application of recommender systems. For example, in many practical systems the algorithm just provides a personalized item recommendation list to the users, without persuasive personalized explanation about why such an item is recommended while another is not. Unexplainable recommendations introduce negative effects to the trustworthiness of recommender systems, and thus affect the effectiveness of recommendation engines. In this work, we investigate explainable recommendation in aspects of data explainability, model explainability, and result explainability, and the main contributions are as follows: 1. Data Explainability: We propose Localized Matrix Factorization (LMF) framework based Bordered Block Diagonal Form (BBDF) matrices, and further applied this technique for parallelized matrix factorization. 2. Model Explainability: We propose Explicit Factor Models (EFM) based on phrase-level sentiment analysis, as well as dynamic user preference modeling based on time series analysis. In this work, we extract product features and user opinions towards different features from large-scale user textual reviews based on phrase-level sentiment analysis techniques, and introduce the EFM approach for explainable model learning and recommendation. 3. Economic Explainability: We propose the Total Surplus Maximization (TSM) framework for personalized recommendation, as well as the model specification in different types of online applications. Based on basic economic concepts, we provide the definitions of utility, cost, and surplus in the application scenario of Web services, and propose the general framework of web total surplus calculation and maximization.
SafePredict is a novel meta-algorithm that works with any base prediction algorithm for online data to guarantee an arbitrarily chosen correctness rate, $1-\epsilon$, by allowing refusals. Allowing refusals means that the meta-algorithm may refuse to emit a prediction produced by the base algorithm on occasion so that the error rate on non-refused predictions does not exceed $\epsilon$. The SafePredict error bound does not rely on any assumptions on the data distribution or the base predictor. When the base predictor happens not to exceed the target error rate $\epsilon$, SafePredict refuses only a finite number of times. When the error rate of the base predictor changes through time SafePredict makes use of a weight-shifting heuristic that adapts to these changes without knowing when the changes occur yet still maintains the correctness guarantee. Empirical results show that (i) SafePredict compares favorably with state-of-the art confidence based refusal mechanisms which fail to offer robust error guarantees; and (ii) combining SafePredict with such refusal mechanisms can in many cases further reduce the number of refusals. Our software (currently in Python) is included in the supplementary material.
This paper introduces a new probabilistic architecture called Sum-Product Graphical Model (SPGM). SPGMs combine traits from Sum-Product Networks (SPNs) and Graphical Models (GMs): Like SPNs, SPGMs always enable tractable inference using a class of models that incorporate context specific independence. Like GMs, SPGMs provide a high-level model interpretation in terms of conditional independence assumptions and corresponding factorizations. Thus, the new architecture represents a class of probability distributions that combines, for the first time, the semantics of graphical models with the evaluation efficiency of SPNs. We also propose a novel algorithm for learning both the structure and the parameters of SPGMs. A comparative empirical evaluation demonstrates competitive performances of our approach in density estimation.
We define a generalization of the arithmetic mean to bounded well-ordered sequences of real numbers. We show that every probability space admits a well-ordered sequences of points such that the measure of each measurable subset is equal to the frequency with which the sequence is in this subset. We include an argument suggested by Woodin that the club filter on $\omega_1$ does not admit such a sequence of order type $\omega_1$.
Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to log n-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential parameters being much bigger than the sample size. The analysis gives some insights why multilayer feedforward neural networks perform well in practice. Interestingly, the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that scaling the network depth with the logarithm of the sample size is natural.
Image classification is one of the fundamental problems in computer vision. Owing to the availability of large image datasets like ImageNet and YFCC100M, a plethora of research has been conducted to do high precision image classification and many remarkable achievements have been made. The success of most existing methods hinges on a basic hypothesis that the testing image set has the same distribution as the training image set. However, in many real applications, we cannot guarantee the validity of the i.i.d. hypothesis since the testing image set is unseen. It is thus desirable to learn an image classifier, which can perform well even in non-i.i.d. situations. In this paper, we propose a novel Causally Regularized Logistic Regression (CRLR) algorithm to address the non-i.i.d. problem without knowing testing data information by searching for causal features. The causal features refer to characteristics truly determining whether a special object belongs to a category or not. Algorithmically, we propose a causal regularizer for causal feature identification by jointly optimizing it with a logistic loss term. Assisted with the causal regularizer, we can estimate the causal contribution (causal effect) of each focal image feature (viewed as a treatment variable) by sample reweighting which ensures the distributions of all remaining image features between images with different focal feature levels are close. The resultant classifier will be based on the estimated causal contributions of the features, rather than traditional correlation-based contributions. To validate the e effectiveness of our CRLR algorithm, we manually construct a new image dataset from YFCC100M, simulating various non-i.i.d. situations in the real world, and conduct extensive experiments for image classification. Experimental results clearly demonstrate that our CRLR algorithm outperforms the state-of-the-art methods.
Actual causation is concerned with the question ‘what caused what?’. Consider a transition between two subsequent observations within a system of elements. Even under perfect knowledge of the system, a straightforward answer to this question may not be available. Counterfactual accounts of actual causation based on graphical models, paired with system interventions, have demonstrated initial success in addressing specific problem cases. We present a formal account of actual causation, applicable to discrete dynamical systems of interacting elements, that considers all counterfactual states of a state transition from t-1 to t. Within such a transition, causal links are considered from two complementary points of view: we can ask if any occurrence at time t has an actual cause at t-1, but also if any occurrence at time t-1 has an actual effect at t. We address the problem of identifying such actual causes and actual effects in a principled manner by starting from a set of basic requirements for causation (existence, composition, information, integration, and exclusion). We present a formal framework to implement these requirements based on system manipulations and partitions. This framework is used to provide a complete causal account of the transition by identifying and quantifying the strength of all actual causes and effects linking two occurrences. Finally, we examine several exemplary cases and paradoxes of causation and show that they can be illuminated by the proposed framework for quantifying actual causation.
In an era where big data is becoming the norm, we are becoming less concerned with the quantity of the data for our models, but rather the quality. With such large amounts of data collected from multiple heterogeneous sources comes the associated problems, often missing views. As most models could not handle whole view missing problem, it brings up a significant challenge when conducting any multi-view analysis, especially when used in the context of very large and heterogeneous datasets. However if dealt with properly, joint learning from these complementary sources can be advantageous. In this work, we present a method for imputing missing views based on generative adversarial networks called VIGAN which combines cross-domain relations given unpaired data with multi-view relations given paired data. In our model, VIGAN first learns bidirectional mapping between view X and view Y using a cycle-consistent adversarial network. Moreover, we incorporate a denoising multimodal autoencoder to refine the initial approximation by making use of the joint representation. Empirical results give evidence indicating VIGAN offers competitive results compared to other methods on both numeric and image data.
Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a \emph{BadNet}) that has state-of-the-art performance on the user’s training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our US street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of {25}\% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and—because the behavior of neural networks is difficult to explicate—stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.
We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance relation, which does not require any manual annotation. We relate transformations of images to transformations of the representations. More specifically, we look for the representation that satisfies such relation rather than the transformations that match a given representation. In this paper, we use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the total number of visual primitives in each tile to that in the whole image. These two transformations are combined in one constraint and used to train a neural network with a contrastive loss. The proposed task produces representations that perform on par or exceed the state of the art in transfer learning benchmarks.