Automated Feature Extraction for Website Fingerprinting through Deep Learning

Several studies have shown that the network traffic that is generated by a visit to a website over Tor reveals information specific to the website through the timing and sizes of network packets. By capturing traffic traces between users and their Tor entry guard, a network eavesdropper can leverage this meta-data to reveal which website Tor users are visiting. The success of such attacks heavily depends on the particular set of traffic features that are used to construct the fingerprint. Typically, these features are manually engineered and, as such, any change introduced to the Tor network can render these carefully constructed features ineffective. In this paper, we show that an adversary can automate the feature engineering process, and thus automatically deanonymize Tor traffic by applying our novel method based on deep learning. We evaluate our approach on a dataset comprised of more than three million network traces, which is the largest dataset of web traffic ever gathered for website fingerprinting, and find that the performance achieved by deep learning techniques is comparable to known approaches which include various research efforts spanning over multiple years. Furthermore, it eliminates the need for feature design and selection which is a tedious work and has been one of the main focus of prior work. We conclude that the ability to automatically construct the most relevant traffic features and perform accurate traffic recognition makes our deep learning based approach an efficient, flexible and robust technique for website fingerprinting.

A Tutorial on Hawkes Processes for Events in Social Media

This chapter provides an accessible introduction for point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and the key concepts in point processes. We then introduce the Hawkes process, its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data – we show how to model retweet cascades using a Hawkes self-exciting process. We presents a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available as an online appendix

Explainable Recommendation: Theory and Applications

Although personalized recommendation has been investigated for decades, the wide adoption of Latent Factor Models (LFM) has made the explainability of recommendations a critical issue to both the research community and practical application of recommender systems. For example, in many practical systems the algorithm just provides a personalized item recommendation list to the users, without persuasive personalized explanation about why such an item is recommended while another is not. Unexplainable recommendations introduce negative effects to the trustworthiness of recommender systems, and thus affect the effectiveness of recommendation engines. In this work, we investigate explainable recommendation in aspects of data explainability, model explainability, and result explainability, and the main contributions are as follows: 1. Data Explainability: We propose Localized Matrix Factorization (LMF) framework based Bordered Block Diagonal Form (BBDF) matrices, and further applied this technique for parallelized matrix factorization. 2. Model Explainability: We propose Explicit Factor Models (EFM) based on phrase-level sentiment analysis, as well as dynamic user preference modeling based on time series analysis. In this work, we extract product features and user opinions towards different features from large-scale user textual reviews based on phrase-level sentiment analysis techniques, and introduce the EFM approach for explainable model learning and recommendation. 3. Economic Explainability: We propose the Total Surplus Maximization (TSM) framework for personalized recommendation, as well as the model specification in different types of online applications. Based on basic economic concepts, we provide the definitions of utility, cost, and surplus in the application scenario of Web services, and propose the general framework of web total surplus calculation and maximization.

SafePredict: A Meta-Algorithm for Machine Learning That Uses Refusals to Guarantee Correctness

SafePredict is a novel meta-algorithm that works with any base prediction algorithm for online data to guarantee an arbitrarily chosen correctness rate, 1-\epsilon, by allowing refusals. Allowing refusals means that the meta-algorithm may refuse to emit a prediction produced by the base algorithm on occasion so that the error rate on non-refused predictions does not exceed \epsilon. The SafePredict error bound does not rely on any assumptions on the data distribution or the base predictor. When the base predictor happens not to exceed the target error rate \epsilon, SafePredict refuses only a finite number of times. When the error rate of the base predictor changes through time SafePredict makes use of a weight-shifting heuristic that adapts to these changes without knowing when the changes occur yet still maintains the correctness guarantee. Empirical results show that (i) SafePredict compares favorably with state-of-the art confidence based refusal mechanisms which fail to offer robust error guarantees; and (ii) combining SafePredict with such refusal mechanisms can in many cases further reduce the number of refusals. Our software (currently in Python) is included in the supplementary material.

Sum-Product Graphical Models

This paper introduces a new probabilistic architecture called Sum-Product Graphical Model (SPGM). SPGMs combine traits from Sum-Product Networks (SPNs) and Graphical Models (GMs): Like SPNs, SPGMs always enable tractable inference using a class of models that incorporate context specific independence. Like GMs, SPGMs provide a high-level model interpretation in terms of conditional independence assumptions and corresponding factorizations. Thus, the new architecture represents a class of probability distributions that combines, for the first time, the semantics of graphical models with the evaluation efficiency of SPNs. We also propose a novel algorithm for learning both the structure and the parameters of SPGMs. A comparative empirical evaluation demonstrates competitive performances of our approach in density estimation.

The transfinite mean

We define a generalization of the arithmetic mean to bounded well-ordered sequences of real numbers. We show that every probability space admits a well-ordered sequences of points such that the measure of each measurable subset is equal to the frequency with which the sequence is in this subset. We include an argument suggested by Woodin that the club filter on \omega_1 does not admit such a sequence of order type \omega_1.

Nonparametric regression using deep neural networks with ReLU activation function

Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to log n-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential parameters being much bigger than the sample size. The analysis gives some insights why multilayer feedforward neural networks perform well in practice. Interestingly, the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that scaling the network depth with the logarithm of the sample size is natural.

On Image Classification: Correlation v.s. Causality

Image classification is one of the fundamental problems in computer vision. Owing to the availability of large image datasets like ImageNet and YFCC100M, a plethora of research has been conducted to do high precision image classification and many remarkable achievements have been made. The success of most existing methods hinges on a basic hypothesis that the testing image set has the same distribution as the training image set. However, in many real applications, we cannot guarantee the validity of the i.i.d. hypothesis since the testing image set is unseen. It is thus desirable to learn an image classifier, which can perform well even in non-i.i.d. situations. In this paper, we propose a novel Causally Regularized Logistic Regression (CRLR) algorithm to address the non-i.i.d. problem without knowing testing data information by searching for causal features. The causal features refer to characteristics truly determining whether a special object belongs to a category or not. Algorithmically, we propose a causal regularizer for causal feature identification by jointly optimizing it with a logistic loss term. Assisted with the causal regularizer, we can estimate the causal contribution (causal effect) of each focal image feature (viewed as a treatment variable) by sample reweighting which ensures the distributions of all remaining image features between images with different focal feature levels are close. The resultant classifier will be based on the estimated causal contributions of the features, rather than traditional correlation-based contributions. To validate the e effectiveness of our CRLR algorithm, we manually construct a new image dataset from YFCC100M, simulating various non-i.i.d. situations in the real world, and conduct extensive experiments for image classification. Experimental results clearly demonstrate that our CRLR algorithm outperforms the state-of-the-art methods.

What caused what? An irreducible account of actual causation

Actual causation is concerned with the question ‘what caused what?’. Consider a transition between two subsequent observations within a system of elements. Even under perfect knowledge of the system, a straightforward answer to this question may not be available. Counterfactual accounts of actual causation based on graphical models, paired with system interventions, have demonstrated initial success in addressing specific problem cases. We present a formal account of actual causation, applicable to discrete dynamical systems of interacting elements, that considers all counterfactual states of a state transition from t-1 to t. Within such a transition, causal links are considered from two complementary points of view: we can ask if any occurrence at time t has an actual cause at t-1, but also if any occurrence at time t-1 has an actual effect at t. We address the problem of identifying such actual causes and actual effects in a principled manner by starting from a set of basic requirements for causation (existence, composition, information, integration, and exclusion). We present a formal framework to implement these requirements based on system manipulations and partitions. This framework is used to provide a complete causal account of the transition by identifying and quantifying the strength of all actual causes and effects linking two occurrences. Finally, we examine several exemplary cases and paradoxes of causation and show that they can be illuminated by the proposed framework for quantifying actual causation.

VIGAN: Missing View Imputation with Generative Adversarial Networks

In an era where big data is becoming the norm, we are becoming less concerned with the quantity of the data for our models, but rather the quality. With such large amounts of data collected from multiple heterogeneous sources comes the associated problems, often missing views. As most models could not handle whole view missing problem, it brings up a significant challenge when conducting any multi-view analysis, especially when used in the context of very large and heterogeneous datasets. However if dealt with properly, joint learning from these complementary sources can be advantageous. In this work, we present a method for imputing missing views based on generative adversarial networks called VIGAN which combines cross-domain relations given unpaired data with multi-view relations given paired data. In our model, VIGAN first learns bidirectional mapping between view X and view Y using a cycle-consistent adversarial network. Moreover, we incorporate a denoising multimodal autoencoder to refine the initial approximation by making use of the joint representation. Empirical results give evidence indicating VIGAN offers competitive results compared to other methods on both numeric and image data.

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a \emph{BadNet}) that has state-of-the-art performance on the user’s training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our US street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of {25}\% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and—because the behavior of neural networks is difficult to explicate—stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.

Representation Learning by Learning to Count

We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance relation, which does not require any manual annotation. We relate transformations of images to transformations of the representations. More specifically, we look for the representation that satisfies such relation rather than the transformations that match a given representation. In this paper, we use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the total number of visual primitives in each tile to that in the whole image. These two transformations are combined in one constraint and used to train a neural network with a contrastive loss. The proposed task produces representations that perform on par or exceed the state of the art in transfer learning benchmarks.

On a Formal Model of Safe and Scalable Self-driving Cars
Low-dimensional lonely branching random walks die out
The Peterson recurrence formula for the chromatic discriminant of a graph
Expressions for the entropy of binomial-type distributions
Approximate nearest neighbors search without false negatives for $l_2$
Non-indexability of the Stochastic Appointment Scheduling Problem
Performance Gains of Optimal Antenna Deployment for Massive MIMO Systems
On Minimum Bisection and Related Cut Problems in Trees and Tree-Like Graphs
A Tight Configuration-Component Based Hybrid Model for Combined-Cycle Units in MISO Day-Ahead Market
STNet: Selective Tuning of Convolutional Networks for Object Localization
A Method with Feedback for Aggregation of Group Incomplete Pair-Wise Comparisons
Practical Evaluation of the Lasp Programming Model at Large Scale – An Experience Report
Cold Fusion: Training Seq2Seq Models Together with Language Models
Urn models with two types of strategies
Approximating the Minimum $k$-Section Width in Bounded-Degree Trees with Linear Diameter
Decentralized Event-Driven Algorithms for Multi-Agent Persistent Monitoring
PiCANet: Learning Pixel-wise Contextual Attention in ConvNets and Its Application in Saliency Detection
Applications of James-Stein Shrinkage (I): Variance Reduction without Bias
Energy Harvesting Based Secure Two-Way Communication Using an Untrusted Relay
Applications of James-Stein Shrinkage (II): Bias Reduction in Instrumental Variable Estimation
The p-convolution forest: a method for solving graphical models with additive probabilistic equations
Arrangements of Pseudocircles: Triangles and Drawings
Optimal control of a delayed HIV model
Sharpness-aware Low dose CT denoising using conditional generative adversarial network
Constructing Words with High Distinct Square Densities
Co-design of Safe and Efficient Networked Control Systems in Factory Automation with State-dependent Wireless Fading Channels
Trajectory Optimization for Completion Time Minimization in UAV-Enabled Multicasting
On the analysis of incomplete spectra in random matrix theory through an extension of the Jimbo-Miwa-Ueno differential
Deconstructing Type III
Sequential Monte Carlo algorithms for a class of outer measures
Towards Automatic Construction of Diverse, High-quality Image Dataset
A Novel Network NOMA Scheme for Downlink Coordinated Three-Point Systems
Game Efficiency through Linear Programming Duality
Sparsity Invariant CNNs
ProbFlow: Joint Optical Flow and Uncertainty Estimation
Handling Homographs in Neural Machine Translation
Simplified Cooperative Detection for Multi-Receiver Molecular Communication
Learning Efficient Convolutional Networks through Network Slimming
Large-Scale User Modeling with Recurrent Neural Networks for Music Discovery on Multiple Time Scales
Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection
On the binomial interpolated triangles
Stacked transfer learning for tropical cyclone intensity prediction
Large deviation estimates for branching random walks
A Self-Stabilizing General De Bruijn Graph
Probabilistic Tri-level Market Models for Demand Side Management in Power Distribution Systems
Mixed Deterministic and Random Optimal Control of Linear Stochastic Systems with Quadratic Costs
Golden Years, Golden Shores: A Study of Elders in Online Travel Communities
Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets
Long-Short Range Context Neural Networks for Language Modeling
Color and Gradient Features for Text Segmentation from Video Frames
A variant of the Lovász-Theta number based on projection matrices
The Continuous Hint Factory – Providing Hints in Vast and Sparsely Populated Edit Distance Spaces
Synchronization of organ pipes
Graphs with orders at least 5 with orders between 20 and 32
On Solving the Quadratic Shortest Path Problem
Automatic detection and decoding of honey bee waggle dances
Contrast and visual saliency similarity induced index for image quality assessment
Activity Recognition based on a Magnitude-Orientation Stream Network
Distortion Distribution of Neural Spike Train Sequence Matching with Optogenetics
On optimal control in a model of rigid-viscoplastic media with Dirichlet boundary conditions
Computing the poset of layers of a toric arrangement
Coded caching schemes for Flexible Memory Sizes
On the existence of directional derivatives for strongly cone-paraconvex mappings
K-orbit closures and Barbasch-Evens-Magyar varieties
CNN Fixations: An unraveling approach to visualize the discriminative image regions
Tags2Parts: Discovering Semantic Regions from Shape Tags
The Graph of Critical Pairs of a Crown
Learning Combinations of Sigmoids Through Gradient Estimation
A Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition
Berge’s Conjecture and Aharoni-Hartman-Hoffman’s Conjecture for locally in-semicomplete digraphs
What does 2D geometric information really tell us about 3D face shape?
Annealed scaling for a charged polymer in dimensions two and higher
A Deterministic Nonsmooth Frank Wolfe Algorithm with Coreset Guarantees
Simple polytopes without small separators, II: Thurston’s bound
WordSup: Exploiting Word Annotations for Character based Text Detection
Upward Partitioned Book Embeddings
Twin Networks: Using the Future as a Regularizer
Optimal Water-Power Flow Problem: Formulation and Distributed Optimal Solution