DAWT: Densely Annotated Wikipedia Texts across multiple languages

In this work, we open up the DAWT dataset – Densely Annotated Wikipedia Texts across multiple languages. The annotations include labeled text mentions mapping to entities (represented by their Freebase machine ids) as well as the type of the entity. The data set contains total of 13.6M articles, 5.0B tokens, 13.8M mention entity co-occurrences. DAWT contains 4.8 times more anchor text to entity links than originally present in the Wikipedia markup. Moreover, it spans several languages including English, Spanish, Italian, German, French and Arabic. We also present the methodology used to generate the dataset which enriches Wikipedia markup in order to increase number of links. In addition to the main dataset, we open up several derived datasets including mention entity co-occurrence counts and entity embeddings, as well as mappings between Freebase ids and Wikidata item ids. We also discuss two applications of these datasets and hope that opening them up would prove useful for the Natural Language Processing and Information Retrieval communities, as well as facilitate multi-lingual research.

Controllable Text Generation

Generic generation and manipulation of text is challenging and has limited success compared to recent deep generative modeling in visual domain. This paper aims at generating plausible natural language sentences, whose attributes are dynamically controlled by learning disentangled latent representations with designated semantics. We propose a new neural generative model which combines variational auto-encoders and holistic attribute discriminators for effective imposition of semantic structures. With differentiable approximation to discrete text samples, explicit constraints on independent attribute controls, and efficient collaborative learning of generator and discriminators, our model learns highly interpretable representations from even only word annotations, and produces realistic sentences with desired attributes. Quantitative evaluation validates the accuracy of sentence and attribute generation.

A Laplacian Framework for Option Discovery in Reinforcement Learning

Representation learning and option discovery are two of the biggest challenges in reinforcement learning (RL). Proto-RL is a well known approach for representation learning in MDPs. The representations learned with this framework are called proto-value functions (PVFs). In this paper we address the option discovery problem by showing how PVFs implicitly define options. We do it by introducing eigenpurposes, intrinsic reward functions derived from the learned representations. The options discovered from eigenpurposes traverse the principal directions of the state space. They are useful for multiple tasks because they are independent of the agents’ intentions. Moreover, by capturing the diffusion process of a random walk, different options act at different time scales, making them helpful for exploration strategies. We demonstrate features of eigenpurposes in traditional tabular domains as well as in Atari 2600 games.

Self-Paced Multitask Learning with Shared Knowledge

This paper introduces self-paced task selection to multitask learning, where instances from more closely related tasks are selected in a progression of easier-to-harder tasks, to emulate an effective human education strategy but applied to multitask machine learning. We develop the mathematical foundation for the approach based on an iterative selection of the most appropriate task, learning the task parameters, and updating the shared knowledge, optimizing a new bi-convex loss function. This proposed method applies quite generally, including to multitask feature learning, multitask learning with alternating structure optimization and multitask manifold regularization learning. Results show that in each of the above formulations self-paced (easier-to-harder) task selection outperforms the baseline version of these methods in all the experiments.

Co-Clustering for Multitask Learning

This paper presents a new multitask learning framework that learns a shared representation among the tasks, incorporating both task and feature clusters. The jointly-induced clusters yield a shared latent subspace where task relationships are learned more effectively and more generally than in state-of-the-art multitask learning methods. The proposed general framework enables the derivation of more specific or restricted state-of-the-art multitask methods. The paper also proposes a highly-scalable multitask learning algorithm, based on the new framework, using conjugate gradient descent and generalized \textit{Sylvester equations}. Experimental results on synthetic and benchmark datasets show that the proposed method systematically outperforms several state-of-the-art multitask learning methods.

End-to-End Task-Completion Neural Dialogue Systems

This paper presents an end-to-end learning framework for task-completion neural dialogue systems, which leverages supervised and reinforcement learning with various deep-learning models. The system is able to interface with a structured database, and interact with users for assisting them to access information and complete tasks such as booking movie tickets. Our experiments in a movie-ticket booking domain show the proposed system outperforms a modular-based dialogue system and is more robust to noise produced by other components in the system.

Unsupervised Basis Function Adaptation for Reinforcement Learning

When using reinforcement learning (RL) algorithms to evaluate a policy it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on the accuracy of the VF estimate, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is a large amount of interest in the potential for allowing RL algorithms to adaptively generate approximation architectures. We investigate a method of adapting approximation architectures which uses feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. This method is ‘unsupervised’ in the sense that it makes no direct reference to reward or the VF estimate. We introduce an algorithm based upon this idea which adapts a state aggregation approximation architecture on-line. A common method of scoring a VF estimate is to weight the squared Bellman error of each state-action by the probability of that state-action occurring. Adopting this scoring method, and assuming S states, we demonstrate theoretically that – provided (1) the number of cells X in the state aggregation architecture is of order \sqrt{S}\log_2{S}\ln{S} or greater, (2) the policy and transition function are close to deterministic, and (3) the prior for the transition function is uniformly distributed – our algorithm, used in conjunction with a suitable RL algorithm, can guarantee a score which is arbitrarily close to zero as S becomes large. It is able to do this despite having only O(X \log_2S) space complexity and negligible time complexity. The results take advantage of certain properties of the stationary distributions of Markov chains.

Outlier Cluster Formation in Spectral Clustering

Outlier detection and cluster number estimation is an important issue for clustering real data. This paper focuses on spectral clustering, a time-tested clustering method, and reveals its important properties related to outliers. The highlights of this paper are the following two mathematical observations: first, spectral clustering’s intrinsic property of an outlier cluster formation, and second, the singularity of an outlier cluster with a valid cluster number. Based on these observations, we designed a function that evaluates clustering and outlier detection results. In experiments, we prepared two scenarios, face clustering in photo album and person re-identification in a camera network. We confirmed that the proposed method detects outliers and estimates the number of clusters properly in both problems. Our method outperforms state-of-the-art methods in both the 128-dimensional sparse space for face clustering and the 4,096-dimensional non-sparse space for person re-identification.

A Framework for Time-Consistent, Risk-Averse Model Predictive Control: Theory and Algorithms

In this paper we present a framework for risk-averse model predictive control (MPC) of linear systems affected by multiplicative uncertainty. Our key innovation is to consider time-consistent, dynamic risk metrics as objective functions to be minimized. This framework is axiomatically justified in terms of time-consistency of risk assessments, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk preferences from risk-neutral to worst case. Within this framework, we propose and analyze an online risk-averse MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk metrics, we cast the computation of the MPC control law as a convex optimization problem amenable to real-time implementation. Simulation results are presented and discussed.

Deconvolving Feedback Loops in Recommender Systems

Collaborative filtering is a popular technique to infer users’ preferences on new content based on the collective information of all users preferences. Recommender systems then use this information to make personalized suggestions to users. When users accept these recommendations it creates a feedback loop in the recommender system, and these loops iteratively influence the collaborative filtering algorithm’s predictions over time. We investigate whether it is possible to identify items affected by these feedback loops. We state sufficient assumptions to deconvolve the feedback loops while keeping the inverse solution tractable. We furthermore develop a metric to unravel the recommender system’s influence on the entire user-item rating matrix. We use this metric on synthetic and real-world datasets to (1) identify the extent to which the recommender system affects the final rating matrix, (2) rank frequently recommended items, and (3) distinguish whether a user’s rated item was recommended or an intrinsic preference. Our results indicate that it is possible to recover the ratings matrix of intrinsic user preferences using a single snapshot of the ratings matrix without any temporal information.

When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user’s network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold \tau. In contrast to previous work where \tau is assumed to be quite close to 1, we focus on recommendation applications where \tau is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small \tau. To the best of our knowledge, there is no practical solution for computing all user pairs with, say \tau = 0.2 on large social networks, even using the power of distributed algorithms. Our work directly addresses this challenge by introducing a new algorithm — WHIMP — that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the ‘wedge-sampling’ approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP’s scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.

On the Behavior of Convolutional Nets for Feature Extraction

Convolutional neural networks (CNN) are representation learning techniques that achieve state-of-the-art performance on almost every image-related, machine learning task. Applying the representation languages build by these models to tasks beyond the one they were originally trained for is a field of interest known as transfer learning for feature extraction. Through this approach, one can apply the image descriptors learnt by a CNN after processing millions of images to any dataset, without an expensive training phase. Contributions to this field have so far focused on extracting CNN features from layers close to the output (e.g., fully connected layers), particularly because they work better when used out-of-the-box to feed a classifier. Nevertheless, the rest of CNN features is known to encode a wide variety of visual information, which could be potentially exploited on knowledge representation and reasoning tasks. In this paper we analyze the behavior of each feature individually, exploring their intra/inter class activations for all classes of three different datasets. From this study we learn that low and middle level features behave very differently to high level features, the former being more descriptive and the latter being more discriminant. We show how low and middle level features can be used for knowledge representation purposes both by their presence or by their absence. We also study how much noise these features may encode, and propose a thresholding approach to discard most of it. Finally, we discuss the potential implications of these results in the context of knowledge representation using features extracted from a CNN.

FeUdal Networks for Hierarchical Reinforcement Learning

We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and gains power and efficacy by decoupling end-to-end learning across multiple levels — allowing it to utilise different resolutions of time. Our framework employs a Manager module and a Worker module. The Manager operates at a lower temporal resolution and sets abstract goals which are conveyed to and enacted by the Worker. The Worker generates primitive actions at every tick of the environment. The decoupled structure of FuN conveys several benefits — in addition to facilitating very long timescale credit assignment it also encourages the emergence of sub-policies associated with different goals set by the Manager. These properties allow FuN to dramatically outperform a strong baseline agent on tasks that involve long-term credit assignment or memorisation. We demonstrate the performance of our proposed system on a range of tasks from the ATARI suite and also from a 3D DeepMind Lab environment.

Everware toolkit. Supporting reproducible science and challenge-driven education

Modern science clearly demands for a higher level of reproducibility and collaboration. To make research fully reproducible one has to take care of several aspects: research protocol description, data access, environment preservation, workflow pipeline, and analysis script preservation. Version control systems like git help with the workflow and analysis scripts part. Virtualization techniques like Docker or Vagrant can help deal with environments. Jupyter notebooks are a powerful platform for conducting research in a collaborative manner. We present project Everware that seamlessly integrates git repository management systems such as Github or Gitlab, Docker and Jupyter helping with a) sharing results of real research and b) boosts education activities. With the help of Everware one can not only share the final artifacts of research but all the depth of the research process. This been shown to be extremely helpful during organization of several data analysis hackathons and machine learning schools. Using Everware participants could start from an existing solution instead of starting from scratch. They could start contributing immediately. Everware allows its users to make use of their own computational resources to run the workflows they are interested in, which leads to higher scalability of the toolkit.

EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

Efficient exploration in high-dimensional environments remains a key challenge in reinforcement learning (RL). Deep reinforcement learning methods have demonstrated the ability to learn with highly general policy classes for complex tasks with high-dimensional inputs, such as raw images. However, many of the most effective exploration techniques rely on tabular representations, or on the ability to construct a generative model over states and actions. Both are exceptionally difficult when these inputs are complex and high dimensional. On the other hand, it is comparatively easy to build discriminative models on top of complex states such as images using standard deep neural networks. This paper introduces a novel approach, EX2, which approximates state visitation densities by training an ensemble of discriminators, and assigns reward bonuses to rarely visited states. We demonstrate that EX2 achieves comparable performance to the state-of-the-art methods on low-dimensional tasks, and its effectiveness scales into high-dimensional state spaces such as visual domains without hand-designing features or density models.

A Layered Architecture for Erasure-Coded Consistent Distributed Storage

Motivated by emerging applications to the edge computing paradigm, we introduce a two-layer erasure-coded fault-tolerant distributed storage system offering atomic access for read and write operations. In edge computing, clients interact with an edge-layer of servers that is geographically near; the edge-layer in turn interacts with a back-end layer of servers. The edge-layer provides low latency access and temporary storage for client operations, and uses the back-end layer for persistent storage. Our algorithm, termed Layered Data Storage (LDS) algorithm, offers several features suitable for edge-computing systems, works under asynchronous message-passing environments, supports multiple readers and writers, and can tolerate f_1 < n_1/2 and f_2 < n_2/3 crash failures in the two layers having n_1 and n_2 servers, respectively. We use a class of erasure codes known as regenerating codes for storage of data in the back-end layer. The choice of regenerating codes, instead of popular choices like Reed-Solomon codes, not only optimizes the cost of back-end storage, but also helps in optimizing communication cost of read operations, when the value needs to be recreated all the way from the back-end. The two-layer architecture permits a modular implementation of atomicity and erasure-code protocols; the implementation of erasure-codes is mostly limited to interaction between the two layers. We prove liveness and atomicity of LDS, and also compute performance costs associated with read and write operations. Further, in a multi-object system running N independent instances of LDS, where only a small fraction of the objects undergo concurrent accesses at any point during the execution, the overall storage cost is dominated by that of persistent storage in the back-end layer, and is given by \Theta(N).

A note on conditional covariance matrices for elliptical distributions

Depth Estimation using Modified Cost Function for Occlusion Handling

Workload Analysis of Blue Waters

On the asymptotic behavior of the price of anarchy: Is selfish routing bad in highly congested networks?

Computable Randomness is Inherently Imprecise

Atomic Norm Minimization for Modal Analysis from Random and Compressed Samples

On the Fine-grained Complexity of One-Dimensional Dynamic Programming

A splitter theorem for connected clutters

Performance Analysis of a Hybrid Downlink-Uplink Cooperative NOMA Scheme

The Hilton–Zhao Conjecture is True for Graphs with Maximum Degree 4

Simultaneous global exact controllability in projection

Bayesian inference for generalized extreme value distribution with Gaussian copula dependence

Eliciting Private User Information for Residential Demand Response

Application of Bayes’ theorem for pulse shape discrimination

Compositional Falsification of Cyber-Physical Systems with Machine Learning Components

A Restaurant Process Mixture Model for Connectivity Based Parcellation of the Cortex

Belief Propagation in Conditional RBMs for Structured Prediction

Optimization of distributions differences for classification

Estimating the resolution of real images

A Comparative Study of Word Embeddings for Reading Comprehension

The M33 Synoptic Stellar Survey. II. Mira Variables

Statistical Properties of the Risk-Transfer Formula in the Affordable Care Act

Scalable Deep Traffic Flow Neural Networks for Urban Traffic Congestion Prediction

Optimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets

Active Learning for Cost-Sensitive Classification

Exponential Moving Average Model in Parallel Speech Recognition Training

A Novel Multi-task Deep Learning Model for Skin Lesion Segmentation and Classification

Recent Developments on the Moment Problem

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

An example concerning set addition in F_2^n

Analytical and simulation studies of pedestrian flow at a crossing with random update rule

Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression

Large-Scale Evolution of Image Classifiers

On Generalized Progressive Hybrid Censoring in presence of competing risks

Interval Estimation of the Unknown Exponential Parameter Based on Time Truncated Data

Skin Lesion Classification using Class Activation Map

Gauge Optimization via ADMM for Approximate Inference

Calorimetric Power Measurements in the EAST ECRH System

On the small time asymptotics of 3D stochastic primitive equations

Full-Duplex Operations in Wireless Powered Communication Networks

Good cyclic codes and the uncertainty principle

Sequential Plan Recognition

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Zero-Delay Source-Channel Coding with a One-Bit ADC Front End and Correlated Side Information at the Receiver

Employing Spectral Domain Features for Efficient Collaborative Filtering

Kolmogorov Equations and Weak Order Analysis for SPDES with Nonlinear Diffusion Coefficient

Adversarial Examples for Semantic Image Segmentation

A New Test of Multivariate Nonlinear Causality

Differentially Private Bayesian Learning on Distributed Data

Deep artifact learning for compressed sensing and parallel MRI

On Parameterized Complexity of Group Activity Selection Problems on Social Networks

Statistical biases in measurements with multiple candidates

Deep Learning with Domain Adaptation for Accelerated Projection Reconstruction MR

Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos

Dynamic State Warping

Symmetric Laplacians, Quantum Density Matrices and their Von-Neumann Entropy

Why is it hard to beat $O(n^2)$ for Longest Common Weakly Increasing Subsequence?

Runtime Optimization of Join Location in Parallel Data Management Systems

On the intersection graph of ideals of $\mathbb{Z}_m$

Equivalence of Lattice Orbit Polytopes

A simple thermodynamic model for the hydrogen phase diagram

Efficient Network Measurements through Approximated Windows

Sum-set Inequalities from Aligned Image Sets: Instruments for Robust GDoF Bounds

A Survey on Content-Aware Video Analysis for Sports

Weighted Growing Simplicial Complexes

Multicast Transmissions in Directional mmWave Communications

Learning Identifiable Gaussian Bayesian Networks in Polynomial Time and Sample Complexity

Parallel energy-stable phase field crystal simulations based on domain decomposition methods

Stochastic Separation Theorems

Virtually fibering random right-angled Coxeter groups

Preserving Confidentiality in The Gaussian Broadcast Channel Using Compute-and-Forward

EmotioNet Challenge: Recognition of facial expressions of emotion in the wild

$p$-adic analogues of hypergeometric identities

On linear-quadratic control theory of implicit difference equations

Learning Graphical Games from Behavioral Data: Sufficient and Necessary Conditions

Denoising Adversarial Autoencoders

Context Aware Query Image Representation for Particular Object Retrieval

Deep Collaborative Learning for Visual Recognition

Inconsistency of Template Estimation with the Fréchet mean in Quotient Space

A Bayesian computer model analysis of Robust Bayesian analyses

Incident Light Frequency-based Image Defogging Algorithm

Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization

Contextuality in Canonical Systems of Random Variables

Machine Learning on Sequential Data Using a Recurrent Weighted Average

The Global Optimization Geometry of Nonsymmetric Matrix Factorization and Sensing

Percentile Policies for Tracking of Markovian Random Processes with Asymmetric Cost and Observation

On squares of cyclic codes

Estimating Spatial Econometrics Models with Integrated Nested Laplace Approximation

Actor-Critic Reinforcement Learning with Simultaneous Human Control and Feedback

Downlink Cellular Network Analysis with LOS/NLOS Propagation and Elevated Base Stations

On the MISO Channel with Feedback: Can Infinitely Massive Antennas Achieve Infinite Capacity?

Instance Flow Based Online Multiple Object Tracking

Bridging Saliency Detection to Weakly Supervised Object Detection Based on Self-paced Curriculum Learning