Measuring Personalization of Web Search

Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing level of personalization is leading to concerns about Filter Bubble effects, where certain users are simply unable to access information that the search engines’ algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions. First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users on Google Web Search and 100 users on Bing. We find that, on average, 11.7% of results show differences due to personalization on Google, while 15.8% of results are personalized on Bing, but that this varies widely by search query and by result ranking. Third, we investigate the user features used to personalize on Google Web Search and Bing. Surprisingly, we only find measurable personalization as a result of searching with a logged in account and the IP address of the searching user. Our results are a first step towards understanding the extent and effects of personalization on Web search engines today.
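
The core comparison can be sketched as follows. This is an illustration only, not the authors' implementation: the abstract does not name its metrics, so the Jaccard overlap, the per-position comparison, the subtraction of a control-versus-control noise baseline, and all URLs below are assumptions.

```python
def jaccard(results_a, results_b):
    """Set overlap of two result lists (1.0 means the same URLs, in any order)."""
    set_a, set_b = set(results_a), set(results_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def changed_fraction(results_a, results_b):
    """Fraction of rank positions whose URL differs between two lists."""
    length = max(len(results_a), len(results_b))
    if length == 0:
        return 0.0
    same = sum(1 for a, b in zip(results_a, results_b) if a == b)
    return (length - same) / length


# Hypothetical result lists for one query issued at the same time.
control_1 = ["example.com/a", "example.com/b", "example.com/c", "example.com/d"]
control_2 = ["example.com/a", "example.com/b", "example.com/c", "example.com/f"]
test_user = ["example.com/a", "example.com/c", "example.com/b", "example.com/e"]

# Two identical control accounts estimate the noise unrelated to personalization.
noise = changed_fraction(control_1, control_2)
observed = changed_fraction(test_user, control_1)

# Assumed attribution rule: only differences above the noise floor count.
personalization = max(0.0, observed - noise)
overlap = jaccard(test_user, control_1)
```

Comparing two control accounts first is what lets differences be attributed to personalization rather than to noise such as result churn between requests.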

Distributed Transfer Linear Support Vector Machines

Transfer learning has been developed to improve the performance of different but related tasks in machine learning. However, such processes become less efficient as the size of the training data and the number of tasks increase. Moreover, privacy can be violated, as some tasks may contain sensitive and private data that are communicated between nodes and tasks. We propose a consensus-based distributed transfer learning framework, where several tasks aim to find the best linear support vector machine (SVM) classifiers in a distributed network. With the alternating direction method of multipliers, tasks can achieve better classification accuracies more efficiently and privately, as each node and each task trains with its own data, and only decision variables are transferred between different tasks and nodes. Numerical experiments on MNIST datasets show that the knowledge transferred from the source tasks can be used to decrease the risks of the target tasks that lack training data or have unbalanced training labels. We show that the risks of the target tasks in the nodes without the data of the source tasks can also be reduced using the information transferred from the nodes that contain the data of the source tasks. We also show that the target tasks can enter and leave in real time without rerunning the whole algorithm.

A new look at clustering through the lens of deep convolutional neural networks

Classification and clustering have been studied separately in machine learning and computer vision. Inspired by the recent success of deep learning models in solving various vision problems (e.g., object recognition, semantic segmentation) and the fact that humans serve as the gold standard in assessing clustering algorithms, here, we advocate for a unified treatment of the two problems and suggest that hierarchical frameworks that progressively build complex patterns on top of the simpler ones (e.g., convolutional neural networks) offer a promising solution. We do not dwell much on the learning mechanisms in these frameworks, as they are still a matter of debate with respect to biological constraints. Instead, we emphasize the compositionality of real-world structures and objects. In particular, we show that CNNs, trained end to end using backpropagation with noisy labels, are able to cluster data points belonging to several overlapping shapes, and do so much better than the state-of-the-art algorithms. The main takeaway from our study is that mechanisms of human vision, particularly the hierarchical organization of the visual ventral stream, should be taken into account in clustering algorithms (e.g., for learning representations in an unsupervised manner or with minimum supervision) to reach human-level clustering performance. This by no means suggests that other methods lack merit. For example, methods relying on pairwise affinities (e.g., spectral clustering) have been very successful in many cases but still fail in some cases (e.g., overlapping clusters).

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks. In this problem, we consider two types of generalizations: to previously unseen instructions and to longer sequences of instructions. For generalization over unseen instructions, we propose a new objective which encourages learning correspondences between similar subtasks by making analogies. For generalization over sequential instructions, we present a hierarchical architecture where a meta controller learns to use the acquired skills for executing the instructions. To deal with delayed reward, we propose a new neural architecture in the meta controller that learns when to update the subtask, which makes learning more efficient. Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions.

Face Clustering: Representation and Pairwise Constraints

Clustering face images according to their identity has two important applications: (i) grouping a collection of face images when no external labels are associated with images, and (ii) indexing for efficient large scale face retrieval. The clustering problem is composed of two key parts: face representation and choice of similarity for grouping faces. We first propose a representation based on ResNet, which has been shown to perform very well in image classification problems. Given this representation, we design a clustering algorithm, Conditional Pairwise Clustering (ConPaC), which directly estimates the adjacency matrix based only on the similarity between face images. This allows a dynamic selection of the number of clusters and retains pairwise similarity between faces. ConPaC formulates the clustering problem as a Conditional Random Field (CRF) model and uses Loopy Belief Propagation to find an approximate solution for maximizing the posterior probability of the adjacency matrix. Experimental results on two benchmark face datasets (LFW and IJB-B) show that ConPaC outperforms well known clustering algorithms such as k-means, spectral clustering and approximate rank-order. Additionally, our algorithm can naturally incorporate pairwise constraints to obtain a semi-supervised version that leads to improved clustering performance. We also propose a k-NN variant of ConPaC, which has a linear time complexity given a k-NN graph, suitable for large datasets.

An Overview of Multi-Task Learning in Deep Neural Networks

Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.

A Mixture Model for Learning Multi-Sense Word Embeddings

Word embeddings are now a standard technique for inducing meaning representations for words. To obtain good representations, it is important to take into account the different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings. Our model generalizes previous work in that it allows inducing different weights for the different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.

Deep Generative Models for Relational Data with Side Information

We present a probabilistic framework for overlapping community discovery and link prediction for relational data, given as a graph. The proposed framework has: (1) a deep architecture which enables us to infer multiple layers of latent features/communities for each node, providing superior link prediction performance on more complex networks and better interpretability of the latent features; and (2) a regression model which allows directly conditioning the node latent features on the side information available in the form of node attributes. Our framework handles both (1) and (2) via a clean, unified model, which enjoys full local conjugacy via data augmentation, and facilitates efficient inference via closed form Gibbs sampling. Moreover, inference cost scales in the number of edges, which is attractive for massive but sparse networks. Our framework is also easily extendable to model weighted networks with count-valued edges. We compare with various state-of-the-art methods and report results, both quantitative and qualitative, on several benchmark data sets.

One Model To Learn Them All

Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.

An Automatic Approach for Document-level Topic Model Evaluation

Topic models jointly learn topics and document-level topic distribution. Extrinsic evaluation of topic models tends to focus exclusively on topic-level evaluation, e.g. by assessing the coherence of topics. We demonstrate that there can be large discrepancies between topic- and document-level model quality, and that basing model evaluation on topic-level analysis can be highly misleading. We propose a method for automatically predicting topic model quality based on analysis of document-level topic allocations, and provide empirical evidence for its robustness.

Dynamic Filters in Graph Convolutional Networks

Convolutional neural networks (CNNs) have massively impacted visual recognition in 2D images, and are now ubiquitous in state-of-the-art approaches. While CNNs naturally extend to other domains, such as audio and video, where data is also organized in rectangular grids, they do not easily generalize to other types of data such as 3D shape meshes, social network graphs or molecular graphs. To handle such data, we propose a novel graph-convolutional network architecture that builds on a generic formulation that relaxes the 1-to-1 correspondence between filter weights and data elements around the center of the convolution. The main novelty of our architecture is that the shape of the filter is a function of the features in the previous network layer, which is learned as an integral part of the neural network. Experimental evaluations on digit recognition, semi-supervised document classification, and 3D shape correspondence yield state-of-the-art results, significantly improving over previous work for shape correspondence.

Learning with Feature Evolvable Streams

Learning with streaming data has attracted much attention during the past few years. Though most studies consider data streams with fixed features, in practice the features may evolve. For example, features of data gathered by limited-lifespan sensors will change when these sensors are replaced by new ones. In this paper, we propose a novel learning paradigm, Feature Evolvable Streaming Learning, where old features vanish and new features appear. Rather than relying only on the current features, we attempt to recover the vanished features and exploit them to improve performance. Specifically, we learn two models from the recovered features and the current features, respectively. To benefit from the recovered features, we develop two ensemble methods. In the first method, we combine the predictions from the two models and theoretically show that, with the assistance of old features, the performance on new features can be improved. In the second method, we dynamically select the best single prediction and establish a better performance guarantee when the best model switches. Experiments on both synthetic and real data validate the effectiveness of our proposal.
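
The two ensemble ideas can be illustrated with a short sketch. It is not the authors' algorithm: the exponential weighting of cumulative squared loss, the learning rate `eta`, the deterministic pick of the currently better model, and the function name `ensemble_stream` are all assumptions made for illustration.

```python
import math


def ensemble_stream(preds_recovered, preds_current, targets, eta=1.0):
    """Run two ensembles over a stream of (old-model, new-model, target) triples.

    Returns (combined, selected): the weighted-average predictions and the
    predictions of whichever single model has the lower cumulative loss so far.
    """
    cumulative_loss = [0.0, 0.0]  # [recovered-feature model, current-feature model]
    combined, selected = [], []
    for p_old, p_new, y in zip(preds_recovered, preds_current, targets):
        # Assumed weighting: exponential in cumulative loss, shifted for stability.
        best = min(cumulative_loss)
        raw = [math.exp(-eta * (loss - best)) for loss in cumulative_loss]
        total = sum(raw)
        w_old, w_new = raw[0] / total, raw[1] / total

        combined.append(w_old * p_old + w_new * p_new)      # method 1: combine
        selected.append(p_old if w_old >= w_new else p_new)  # method 2: select

        # Losses are revealed only after predicting, as in online learning.
        cumulative_loss[0] += (p_old - y) ** 2
        cumulative_loss[1] += (p_new - y) ** 2
    return combined, selected


# Hypothetical stream: the old model is accurate, the new model is still warming up.
combined, selected = ensemble_stream(
    preds_recovered=[1.0, 1.0, 1.0],
    preds_current=[0.0, 0.5, 0.9],
    targets=[1.0, 1.0, 1.0],
)
```

In this toy stream the combined prediction starts at the plain average and drifts toward the model trained on recovered features as its loss advantage accumulates.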

Value-Decomposition Networks For Cooperative Multi-Agent Learning

We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the ‘lazy agent’ problem, which arises due to partial observability. We address these problems by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions. We perform an experimental evaluation across a range of partially-observable multi-agent domains and show that learning such value-decompositions leads to superior results, in particular when combined with weight sharing, role information and information channels.
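
A toy example may help show why decomposing the team value is useful. It assumes the simplest possible decomposition, a plain sum of agent-wise values, and uses hand-written value tables instead of learned networks; the agents, actions, and numbers are hypothetical.

```python
from itertools import product

# Hypothetical agent-wise action values, each for one fixed local observation.
agent_q = [
    {"left": 0.2, "right": 1.0},
    {"left": 0.7, "right": 0.1},
    {"stay": 0.4, "push": 0.9},
]


def team_value(joint_action):
    """Team value under the assumed additive decomposition: sum of agent values."""
    return sum(q[action] for q, action in zip(agent_q, joint_action))


# Decentralized execution: each agent greedily maximizes its own value.
decentralized = tuple(max(q, key=q.get) for q in agent_q)

# Centralized reference: brute-force search over every joint action.
centralized = max(product(*agent_q), key=team_value)
```

With an additive decomposition the per-agent greedy choices coincide with the best joint action, so agents can act on local observations alone while training still uses the single joint reward.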

L2 Regularization versus Batch and Weight Normalization

Batch Normalization is a commonly used trick to improve the training of deep neural networks. These neural networks use L2 regularization, also called weight decay, ostensibly to prevent overfitting. However, we show that L2 regularization has no regularizing effect when combined with normalization. Instead, regularization has an influence on the scale of weights, and thereby on the effective learning rate. We investigate this dependence, both in theory, and experimentally. We show that popular optimization methods such as ADAM only partially eliminate the influence of normalization on the learning rate. This leads to a discussion on other ways to mitigate this issue.
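
The link between weight scale and effective learning rate can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses a single linear unit followed by batch normalization with no epsilon and no learned scale or shift, a made-up linear loss, and finite-difference gradients.

```python
import math


def normalized_outputs(w, X):
    """Batch-normalize the pre-activations w.x over the batch."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    mean = sum(z) / len(z)
    std = math.sqrt(sum((zi - mean) ** 2 for zi in z) / len(z))
    return [(zi - mean) / std for zi in z]


def loss(w, X, t):
    """Hypothetical loss: a fixed linear readout of the normalized outputs."""
    return sum(ti * yi for ti, yi in zip(t, normalized_outputs(w, X)))


def numeric_grad(w, X, t, h=1e-6):
    """Central-difference gradient of the loss with respect to w."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((loss(up, X, t) - loss(down, X, t)) / (2 * h))
    return grad


def norm(v):
    return math.sqrt(sum(x * x for x in v))


X = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3], [1.5, 1.5]]  # toy batch
t = [1.0, 0.0, -1.0, 0.5]                               # toy readout weights
w = [0.7, -0.4]
w_scaled = [10.0 * wi for wi in w]

# Scaling the weights leaves the normalized outputs unchanged ...
same_output = all(
    abs(a - b) < 1e-9
    for a, b in zip(normalized_outputs(w, X), normalized_outputs(w_scaled, X))
)

# ... but shrinks the gradient by the same factor.
grad_ratio = norm(numeric_grad(w, X, t)) / norm(numeric_grad(w_scaled, X, t))
```

Because the function computed is unchanged while the gradient shrinks in proportion to the weight norm, a fixed learning rate moves the direction of small weights more than that of large ones. Weight decay keeps the norm small, which is consistent with the abstract's claim that it acts on the effective learning rate.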

Twigraph: Discovering and Visualizing Influential Words between Twitter Profiles

The social media craze is on an ever increasing spree, and people are connected with each other like never before, but these vast connections are visually unexplored. We propose a methodology, Twigraph, to explore the connections between persons using their Twitter profiles. First, we propose a hybrid approach of recommending social media profiles, articles, and advertisements to a user. The profiles are recommended based on the similarity score between the user profile and the profile under evaluation. The similarity between a set of profiles is investigated by finding the top influential words that cause a high similarity, through an Influence Term Metric for each word. Then, we group profiles of various domains such as politics, sports, and entertainment based on the similarity score through a novel clustering algorithm. The connectivity between profiles is visualized using word graphs that help in finding the words that connect a set of profiles and the profiles that are connected to a word. Finally, we analyze the top influential words over a set of profiles through clustering by finding the similarity of those profiles, which makes it possible to break down a Twitter profile with a lot of followers into fine-level word connections using word graphs. The proposed method was implemented on datasets comprising 1.1 M Tweets obtained from Twitter. Experimental results show that the resultant influential words were highly representative of the relationship between two profiles or a set of profiles.

Multifractality without fine-tuning in a Floquet quasiperiodic chain
Slow Dynamics in Translation-Invariant Quantum Lattice Models
Hierarchical Label Inference for Video Classification
Distance weighted discrimination of face images for gender classification
On optimal tests for rotational symmetry against new classes of hyperspherical distributions
Internal Stabilization of a Class of Parabolic Integro-Differential Equations: Application to Viscoelastic Fluids
Strong Solutions of Stochastic Models for Viscoelastic Flows of Oldroyd Type
Existence of weak martingale solution of Nematic Liquid Crystals driven by Pure Jump Noise
Conjunctions of Among Constraints
Towards a Realistic Assessment of Multiple Antenna HCNs: Residual Additive Transceiver Hardware Impairments and Channel Aging
Generalization for Adaptively-chosen Estimators via Stable Median
Learning Disjunctions of Predicates
Approximate Best-Response Dynamics in Random Interference Games
Ensembling Factored Neural Machine Translation Models for Automatic Post-Editing and Quality Estimation
Spatial Coding Techniques for Molecular MIMO
On M-ary Distributed Detection for Power Constraint Wireless Sensor Networks
Symplectomorphic registration with phase space regularization by entropy spectrum pathways
Benchmarking measures of network controllability on canonical graph models
Breaking the 3/2 barrier for unit distances in three dimensions
On Structural Controllability of Symmetric (Brain) Networks
Community interactions determine role of species in parasite spread amplification: the ecomultiplex network model
Bib2vec: An Embedding-based Search System for Bibliographic Information
Deriving Compact Laws Based on Algebraic Formulation of a Data Set
Exact Simulation for Multivariate Itô Diffusions
Deal or No Deal? End-to-End Learning for Negotiation Dialogues
A Quantile Estimate Based on Local Curve Fitting
Regularity in mixed-integer convex representability
Some conditions on 5-cycles that make planar graphs 4-choosable
AI-Powered Social Bots
On the splitting types of bundles of logarithmic vector fields along plane curves
Spectral Domain Sampling of Graph Signals
Veiled Attributes of the Variational Autoencoder
The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge
Distributed-Memory Parallel Algorithms for Counting and Listing Triangles in Big Graphs
A Fully Trainable Network with RNN-based Pooling
Interference-Alignment and Soft-Space-Reuse Based Cooperative Transmission for Multi-cell Massive MIMO Networks
Interactive 3D Modeling with a Generative Adversarial Network
Improving Scalability of Inductive Logic Programming via Pruning and Best-Effort Optimisation
The distance between a naive cumulative estimator and its least concave majorant
Base Station Selection for Massive MIMO Networks with Two-stage Precoding
Nerves, minors, and piercing numbers
Average Length of Cycles in Rectangular Lattice
On large groups of symmetries of finite graphs embedded in spheres
Parameterized Verification of Algorithms for Oblivious Robots on a Ring
The spectral expansion approach to index transforms and connections with the theory of diffusion processes
Long-distance spin transport in a disordered magnetic insulator
Structured Best Arm Identification with Fixed Confidence
Conditions for Unique Reconstruction of Sparse Signals Using Compressive Sensing Methods
Self-ensembling for domain adaptation
Constructing edge-disjoint spanning trees in augmented cubes
Optimal Transport for Diffeomorphic Registration
Successive Cancellation Decoding of Single Parity-Check Product Codes
Invariance Feedback Entropy of Uncertain Control Systems
[1, 2]-sets and [1, 2]-total Sets in Trees with Algorithms
Multispectral and Hyperspectral Image Fusion Using a 3-D-Convolutional Neural Network
From Propositional Logic to Plausible Reasoning: A Uniqueness Theorem
Concurrent Geometric Multicasting
Wireless Link Capacity under Shadowing and Fading
Declarative Modeling for Building a Cloud Federation and Cloud Applications
Perceptual Generative Adversarial Networks for Small Object Detection
Sparsity Order Estimation from a Single Compressed Observation Vector
Phylogenetic diversity and biodiversity indices on phylogenetic networks
Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models
Substitution-based structures with absolutely continuous spectrum
High-voltage solution in radial power networks: Existence, properties and equivalent algorithms
Pathwise large deviations for the Rough Bergomi model
Modeling and Analysis of Switching Diffusion Systems: Past-Dependent Switching with a Countable State Space
Sequential quasi-Monte Carlo: Introduction for Non-Experts, Dimension Reduction, Application to Partly Observed Diffusion Processes
Precoded Chebyshev-NLMS based pre-distorter for nonlinear LED compensation in NOMA-VLC
NOMA in Distributed Antenna System for Max-Min Fairness and Max-Sum-Rate
Optimal Online Two-way Trading with Bounded Number of Transactions
$\textsf{S}^3T$: An Efficient Score-Statistic for Spatio-Temporal Surveillance
Biased Bagging for Unsupervised Domain Adaptation
Robotic Ironing with 3D Perception and Force/Torque Feedback in Household Environments
Taylor Expansions of the Value Function Associated with a Bilinear Optimal Control Problem
A Survey on Non-Orthogonal Multiple Access for 5G Networks: Research Challenges and Future Trends
Active learning in annotating micro-blogs dealing with e-reputation
Distributed Estimation of Oscillations in Power Systems: an Extended Kalman Filtering Approach
Local Feature Descriptor Learning with Adaptive Siamese Network
Phaseless Reconstruction from Space-Time Samples