VAIN: Attentional Multi-agent Predictive Modeling

Multi-agent predictive modeling is an essential step for understanding physical, social and team-play systems. Recently, Interaction Networks (INs) were proposed for the task of modeling multi-agent physical systems. However, INs scale with the number of interactions in the system (typically quadratic or higher order in the number of agents). In this paper we introduce VAIN, a novel attentional architecture for multi-agent predictive modeling that scales linearly with the number of agents. We show that VAIN is effective for multi-agent predictive modeling. Our method is evaluated on tasks from challenging multi-agent prediction domains (chess and soccer) and outperforms competing multi-agent approaches.

User Intent Classification using Memory Networks: A Comparative Analysis for a Limited Data Scenario

In this report, we provide a comparative analysis of different techniques for user intent classification for the task of app recommendation. We analyse the performance of different models and architectures for multi-label classification over a dataset with a relatively large number of classes and only a handful of examples of each class. We focus, in particular, on memory network architectures, and compare how well the different versions perform under the task constraints. Since the classifier is meant to serve as a module in a practical dialog system, it needs to be able to work with limited training data and incorporate new data on the fly. We devise a 1-shot learning task to test the models under the above constraint. We conclude that relatively simple versions of memory networks perform better than other approaches. However, for tasks with very limited data, simple non-parametric methods perform comparably, without needing the extra training data.

Topic Modeling for Classification of Clinical Reports

Electronic health records (EHRs) contain important clinical information about patients. Efficient and effective use of this information could supplement or even replace manual chart review as a means of studying and improving the quality and safety of healthcare delivery. However, some of these clinical data are in the form of free text and require pre-processing before use in automated systems. A common free text data source is radiology reports, typically dictated by radiologists to explain their interpretations. We sought to demonstrate machine learning classification of computed tomography (CT) imaging reports into binary outcomes, i.e. positive and negative for fracture, using regular text classification and classifiers based on topic modeling. Topic modeling provides interpretable themes (topic distributions) in reports, a representation that is more compact than the commonly used bag-of-words representation and can be processed faster than raw text in subsequent automated processes. We demonstrate new classifiers based on this topic modeling representation of the reports. The aggregate topic classifier (ATC) and the confidence-based topic classifier (CTC) use a single topic that is determined from the training dataset based on different measures to classify the reports in the test dataset. Alternatively, the similarity-based topic classifier (STC) measures the similarity between the reports’ topic distributions to determine the predicted class. Our proposed topic modeling-based classifier systems are shown to be competitive with existing text classification techniques and provide an efficient and interpretable representation.
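
The similarity-based idea can be illustrated with a minimal sketch. The abstract does not say which similarity measure or decision rule STC uses, so the cosine similarity, the nearest-neighbour rule, the function names and the toy topic distributions below are all assumptions made for illustration only.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_by_similarity(train, query):
    """Return the label of the most similar training report (assumed 1-NN rule).

    train: list of (topic_distribution, label) pairs.
    query: topic distribution of the report to classify.
    """
    best_label, best_score = None, -1.0
    for dist, label in train:
        score = cosine_similarity(dist, query)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, hand-made topic distributions over three topics (not real data).
train_reports = [
    ([0.80, 0.15, 0.05], "positive"),
    ([0.10, 0.20, 0.70], "negative"),
]
prediction = predict_by_similarity(train_reports, [0.70, 0.20, 0.10])
print(prediction)
```

In practice the topic distributions would come from a fitted topic model rather than being written by hand.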

Dualing GANs

Generative adversarial nets (GANs) are a promising technique for modeling a distribution from samples. It is, however, well known that GAN training suffers from instability due to the nature of its maximin formulation. In this paper, we explore ways to tackle the instability problem by dualizing the discriminator. We start from linear discriminators, in which case conjugate duality provides a mechanism to reformulate the saddle point objective into a maximization problem, such that both the generator and the discriminator of this ‘dualing GAN’ act in concert. We then demonstrate how to extend this intuition to non-linear formulations. For GANs with linear discriminators our approach is able to remove the instability in training, while for GANs with nonlinear discriminators our approach provides an alternative to the commonly used GAN training algorithm.

A review and comparative study on functional time series techniques

This paper reviews the main estimation and prediction results derived in the context of functional time series, when Hilbert and Banach spaces are considered, especially in the context of autoregressive processes of order one (ARH(1) and ARB(1) processes, where H and B are a Hilbert and a Banach space, respectively). In particular, we pay attention to the estimation and prediction results, and statistical tests, derived in both parametric and non-parametric frameworks. A comparative study between different ARH(1) prediction approaches is developed in the simulation study undertaken.

pyRecLab: A Software Library for Quick Prototyping of Recommender Systems

This paper introduces pyRecLab, a software library written in C++ with Python bindings that allows users to quickly train, test and develop recommender systems. Although there are several software libraries for this purpose, only a few let developers get started quickly with the most traditional methods, permitting them to try different parameters and approach several tasks without a significant loss of performance. The few libraries that have all these features are available in languages such as Java, Scala or C#, which is a disadvantage for less experienced programmers who are more used to the popular Python programming language. In this article we introduce details of pyRecLab and present a performance analysis in terms of error metrics (MAE and RMSE) and train/test time. We benchmark it against the popular Java-based library LibRec, showing similar results. We expect programmers with little experience and people interested in quickly prototyping recommender systems to benefit from pyRecLab.

SPLBoost: An Improved Robust Boosting Algorithm Based on Self-paced Learning

It is known that Boosting can be interpreted as a gradient descent technique to minimize an underlying loss function. Specifically, the underlying loss being minimized by the traditional AdaBoost is the exponential loss, which has been shown to be very sensitive to random noise/outliers. Therefore, several Boosting algorithms, e.g., LogitBoost and SavageBoost, have been proposed to improve the robustness of AdaBoost by replacing the exponential loss with specially designed robust loss functions. In this work, we present a new way to robustify AdaBoost, i.e., incorporating the robust learning idea of Self-paced Learning (SPL) into the Boosting framework. Specifically, we design a new robust Boosting algorithm based on the SPL regime, i.e., SPLBoost, which can be easily implemented by slightly modifying off-the-shelf Boosting packages. Extensive experiments and a theoretical characterization are also carried out to illustrate the merits of the proposed SPLBoost.
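
The sensitivity of the exponential loss can be seen in a small numerical sketch. It compares the exponential loss with the logistic loss on the margin y*f(x), and then applies the standard hard-threshold self-paced weighting (keep a sample only if its loss is below a threshold). The abstract does not state which SPL regime SPLBoost uses, so the hard weighting, the threshold value and the function names here are assumptions, not the authors' algorithm.

```python
import math

def exponential_loss(margin):
    """Exponential loss exp(-margin), the loss minimized by AdaBoost."""
    return math.exp(-margin)

def logistic_loss(margin):
    """Logistic loss log(1 + exp(-margin)), which grows only linearly."""
    return math.log1p(math.exp(-margin))

def hard_spl_weights(losses, threshold):
    """Hard self-paced weights: 1 for 'easy' samples, 0 otherwise (assumed regime)."""
    return [1 if loss < threshold else 0 for loss in losses]

# Margins of three toy samples; the last one is badly misclassified,
# as a mislabeled point or an outlier would be.
margins = [2.0, 0.5, -5.0]
exp_losses = [exponential_loss(m) for m in margins]
log_losses = [logistic_loss(m) for m in margins]
weights = hard_spl_weights(exp_losses, threshold=10.0)

for m, e, l, w in zip(margins, exp_losses, log_losses, weights):
    print(f"margin={m:5.1f}  exp_loss={e:9.3f}  logistic_loss={l:6.3f}  spl_weight={w}")
```

The misclassified sample dominates the exponential loss, while a zero self-paced weight removes it from the current training round.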

FA*IR: A Fair Top-k Ranking Algorithm

We present a formal problem definition and an algorithm to solve the Fair Top-k Ranking problem. The problem consists of creating a ranking of k elements out of a pool of n >> k candidates. The objective is to maximize utility, and maximization is subject to a ranked group fairness constraint. Our definition of ranked group fairness uses the standard notion of protected group to extend the concept of group fairness. It ensures that every prefix of the rank contains a number of protected candidates that is statistically indistinguishable from a given target proportion, or exceeds it. The utility objective favors rankings in which every candidate included in the ranking is more qualified than any candidate not included, and rankings in which candidates are sorted by decreasing qualifications. We describe an efficient algorithm for this problem, which is tested on a series of existing datasets, as well as new datasets. Experimentally, this approach yields a ranking that is similar to the so-called ‘color-blind’ ranking, while respecting the fairness criteria. To the best of our knowledge, FA*IR is the first algorithm grounded in statistical tests that can be used to mitigate biases in ranking against an under-represented group.
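
One way to read the prefix condition is as a one-sided binomial test applied to every prefix of the ranking. The sketch below follows that reading: for a prefix of length k it computes the smallest number of protected candidates that is not significantly below the target proportion p. The significance level alpha, the function names and the omission of any multiple-testing adjustment are assumptions of this sketch, which checks a given ranking rather than constructing one.

```python
from math import comb  # requires Python 3.8+

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))

def min_protected(k, p, alpha):
    """Smallest m with P(X <= m) > alpha for X ~ Binomial(k, p)."""
    for m in range(k + 1):
        if binom_cdf(m, k, p) > alpha:
            return m
    return k

def satisfies_prefix_test(is_protected, p, alpha):
    """Check every prefix of a ranking against the binomial threshold.

    is_protected: list of booleans, one per ranked candidate, best first.
    """
    count = 0
    for k, flag in enumerate(is_protected, start=1):
        count += int(flag)
        if count < min_protected(k, p, alpha):
            return False
    return True

print(min_protected(4, 0.5, 0.1))
print(satisfies_prefix_test([False, False, False, True], 0.5, 0.1))
print(satisfies_prefix_test([False, False, False, False], 0.5, 0.1))
```

With p = 0.5 and alpha = 0.1, the first three positions may all be non-protected, but a fourth non-protected candidate in a row fails the test.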

Programmable Agents

We build deep RL agents that execute declarative programs expressed in a formal language. The agents learn to ground the terms in this language in their environment, and can generalize their behavior at test time to execute new programs that refer to objects that were not referenced during training. The agents develop disentangled interpretable representations that allow them to generalize to a wide variety of zero-shot semantic tasks.

Outlier Regularization for Vector Data and L21 Norm Robustness

In many real-world applications, data contain outliers. One popular approach is to use the L2,1 norm as a robust error/loss function. However, the robustness of the L2,1 norm is not yet well understood. In this paper, we propose a new Vector Outlier Regularization (VOR) framework to understand and analyze the robustness of the L2,1 norm. Our VOR function defines a data point to be an outlier if it is outside a threshold with respect to a theoretical prediction, and regularizes it, i.e., pulls it back to the threshold line. We then prove that the L2,1 function is the limiting case of this VOR with the usual least square/L2 error function as the threshold shrinks to zero. One interesting property of VOR is that how far an outlier lies away from its theoretically predicted value does not affect the final regularization and analysis results. This VOR property unmasks one of the most peculiar properties of the L2,1 norm function: the effects of outliers seem to be independent of how outlying they are. If an outlier is moved further away from the intrinsic manifold/subspace, the final analysis results do not change. VOR provides a new way to understand and analyze the robustness of the L2,1 norm function. Applying VOR to matrix factorization leads to a new VORPCA model. We give a comprehensive comparison with trace-norm based L21-norm PCA to demonstrate the advantages of VORPCA.
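
For readers unfamiliar with the L2,1 norm, the sketch below shows its definition on a set of residual vectors: the sum of the Euclidean norms of the per-point residuals, compared with the usual squared L2 error. It illustrates the definition only, not the VOR function or the paper's limiting result. Treating each inner list as one data point's residual is an assumed convention, and the numbers are toy values.

```python
import math

def l21_norm(residuals):
    """Sum over data points of the Euclidean norm of each residual vector."""
    return sum(math.sqrt(sum(x * x for x in r)) for r in residuals)

def squared_l2_error(residuals):
    """Sum over data points of the squared Euclidean norm (least squares)."""
    return sum(sum(x * x for x in r) for r in residuals)

# One residual vector per data point (toy values).
inlier_only = [[3.0, 4.0]]
with_far_outlier = [[3.0, 4.0], [30.0, 40.0]]

print(l21_norm(inlier_only), squared_l2_error(inlier_only))
print(l21_norm(with_far_outlier), squared_l2_error(with_far_outlier))
```

A residual ten times larger adds ten times as much to the L2,1 norm but a hundred times as much to the squared L2 error, which is why the L2,1 norm is commonly used as a robust loss.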

THUMT: An Open Source Toolkit for Neural Machine Translation

This paper introduces THUMT, an open-source toolkit for neural machine translation (NMT) developed by the Natural Language Processing Group at Tsinghua University. THUMT implements the standard attention-based encoder-decoder framework on top of Theano and supports three training criteria: maximum likelihood estimation, minimum risk training, and semi-supervised training. It features a visualization tool for displaying the relevance between hidden states in neural networks and contextual words, which helps to analyze the internal workings of NMT. Experiments on Chinese-English datasets show that THUMT using minimum risk training significantly outperforms GroundHog, a state-of-the-art toolkit for NMT.

Inference in Deep Networks in High Dimensions

Deep generative networks provide a powerful tool for modeling complex data in a wide range of applications. In inverse problems that use these networks as generative priors on data, one must often perform inference of the inputs of the networks from the outputs. Inference is also required for sampling during stochastic training on these generative models. This paper considers inference in a deep stochastic neural network where the parameters (e.g., weights, biases and activation functions) are known and the problem is to estimate the values of the input and hidden units from the output. While several approximate algorithms have been proposed for this task, there are few analytic tools that can provide rigorous guarantees on the reconstruction error. This work presents a novel and computationally tractable output-to-input inference method called Multi-Layer Vector Approximate Message Passing (ML-VAMP). The proposed algorithm, derived from expectation propagation, extends earlier AMP methods that are known to achieve the replica predictions for optimality in simple linear inverse problems. Our main contribution shows that the mean-squared error (MSE) of ML-VAMP can be exactly predicted in a certain large system limit (LSL) where the number of layers is fixed and weight matrices are random and orthogonally-invariant with dimensions that grow to infinity. ML-VAMP is thus a principled method for output-to-input inference in deep networks with a rigorous and precise performance achievability result in high dimensions.

Multi-Label Annotation Aggregation in Crowdsourcing
Scalable Co-Optimization of Morphology and Control in Embodied Machines
On comparing clusterings: an element-centric framework unifies overlaps and hierarchy
Faster Algorithms for Mean-Payoff Parity Games
Performance Analysis of Inband FD-D2D Communications with Imperfect SI Cancellation for Wireless Video Distribution
Recognizing and testing isomorphism of Cayley graphs over an abelian group of order $4p$ in polynomial time
A dynamic model for the two-parameter Dirichlet process
A Comparison of Resampling and Recursive Partitioning Methods in Random Forest for Estimating the Asymptotic Variance Using the Infinitesimal Jackknife
Causal Dantzig: fast inference in linear structural equation models with hidden variables under additive interventions
Compressive optical interferometry
On the Union-Closed Sets Conjecture
Satellite Imagery Feature Detection using Deep Convolutional Neural Network: A Kaggle Competition
Infinite Mixture Model of Markov Chains
Bernoulli Correlations and Cut Polytopes
On heavy-tail phenomena in some large deviations problems
Flexible High-Dimensional Unsupervised Learning with Missing Data
Spectral statistics for product matrix ensembles of Hermite type with external source
Second order logic on random rooted trees
Unsure When to Stop? Ask Your Semantic Neighbors
Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets
meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
Maximal Planar Subgraphs of Fixed Girth in Random Graphs
Comments on ‘Finite-SNR Diversity-Multiplexing Tradeoff for Network Coded Cooperative OFDMA Systems’
Using deep learning to reveal the neural code for images in primary visual cortex
Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning
Anisotropic Challenges in Pedestrian Flow Modeling
On the Optimality of Secure Communication Without Using Cooperative Jamming
Hybrid Spatio-Temporal Artificial Noise Design for Secure MIMOME-OFDM Systems
A Bayesian algorithm for detecting identity matches and fraud in image databases
Descents and des-Wilf Equivalence of Permutations Avoiding Certain Vincular and Barred Patterns
Mean-field optimal control problem of SDDE driven by fractional Brownian motion
A Location-Sentiment-Aware Recommender System for Both Home-Town and Out-of-Town Users
The Complexity of Campaigning
Lattice model for Fast Diffusion Equation
Low Resolution Face Recognition Using a Two-Branch Deep Convolutional Neural Network Architecture
Learning-based Ensemble Average Propagator Estimation
Solutions of SPDE’s associated with a stochastic flow
On excluded minors for classes of graphical matroids
Fast multi-frame image super-resolution based on MRF
Learning Graphical Models Using Multiplicative Weights
Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation
Short-Term Forecasting of Passenger Demand under On-Demand Ride Services: A Spatio-Temporal Deep Learning Approach
Non-commutative association schemes and their fusion association schemes
Controlled Reflected SDEs and Neumann Problem for Backward SPDEs
Nonparametric estimation of the kernel function of symmetric stable moving average random functions
Efficient and Accurate Machine-Learning Interpolation of Atomic Energies in Compositions with Many Species
Statistical Consistency of Kernel PCA with Random Features
Nonasymptotic convergence of stochastic proximal point algorithms for constrained convex optimization
Mining Significant Microblogs for Misinformation Identification: An Attention-based Approach
Spontaneous collective synchronization in the Kuramoto model with additional non-local interactions
Markov semi-groups generated by elliptic operators with divergence-free drift
Session Analysis using Plan Recognition
Fast Load Balancing Approach for Growing Clusters by Bioinformatics
Convex geometries on AT-free graphs and an application to generating the AT-free orders
Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods
Clustering-Based Quantisation for PDE-Based Image Compression
Frank-Wolfe Optimization for Symmetric-NMF under Simplicial Constraint
Testing for Change in Stochastic Volatility with Long Range Dependence
Consistency of the plug-in functional predictor of the Ornstein-Uhlenbeck process in Hilbert and Banach spaces
On the Representation of Involutive Jamesian Functions
Learning Markov Models from Closed Loop Data-sets
Shellability of posets of labeled partitions and arrangements defined by root systems
Improving text classification with vectors of reduced precision
Lattice Codes for Physical Layer Communications
A Thorough Formalization of Conceptual Spaces
Anticipating stochastic equation of two-dimensional second grade fluids
Almost-equidistant sets
On the Apparent Yield Stress in Non-Brownian Magnetorheological Fluids
Mixed Effect Dirichlet-Tree Multinomial for Longitudinal Microbiome Data and Weight Prediction
Domain Specific Semantic Validation of Annotations
A Perturbation Scheme for Passivity Verification and Enforcement of Parameterized Macromodels
Rotor-angle versus voltage instability in the third-order model
Distributed PCP Theorems for Hardness of Approximation in P
Investigating Exploratory Search Activities based on the Stratagem Level in Digital Libraries
Proposal of a new quantum annealing schedule for studying transverse-field-based quantum versus classical annealing of the Ising model: a case study of the Ising spin glass model on the square lattice by the Monte Carlo simulation
An efficient Algorithm to partition a Sequence of Integers into Subsets with equal Sums
Massive Connectivity with Massive MIMO-Part II: Achievable Rate Characterization
Massive Connectivity with Massive MIMO-Part I: Device Activity Detection and Channel Estimation
A New Multiple Access Technique for 5G: Power Domain Sparse Code Multiple Access (PSMA)
Quasi-homogeneity of the moduli space of stable maps to homogeneous spaces
Eccentricities in the flip-graphs of polygons
First Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems
Towards Proof Synthesis Guided by Neural Machine Translation for Intuitionistic Propositional Logic
Approximating the Volume of Tropical Polytopes is Difficult
On mitigating the analytical limitations of finely stratified experiments
Advanced Steel Microstructure Classification by Deep Learning Methods
A Hybrid Method of Combinatorial Search and Gradient Descent for Discrete Optimization
Asymptotic properties of a componentwise ARH(1) plug-in predictor
Stanley-Reisner rings of Buchsbaum complexes with a free action by an abelian group
Optimal modularity and memory capacity of neural networks
Orthogonal Compaction Using Additional Bends
Unperturbed: spectral analysis beyond Davis-Kahan
A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification
On the existence of specified cycles in bipartite tournaments
A Divergence Bound for Hybrids of MCMC and Variational Inference and an Application to Langevin Dynamics and SGVI
A comparative study of breast surface reconstruction for aesthetic outcome assessment
Relative distance between tracers as a measure of diffusivity within moving aggregates
Combining edge and cloud computing for mobility analytics
Robust and Efficient Transfer Learning with Hidden-Parameter Markov Decision Processes
Grounded Language Learning in a Simulated 3D World
Technical Report for Real-Time Certified Probabilistic Pedestrian Forecasting
On the Integrality Gap of the Prize-Collecting Steiner Forest LP
Adaptive OFDM Index Modulation for Two-Hop Relay-Assisted Networks
A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization
The Distribution of Knots in the Petaluma Model