Effective Inference for Generative Neural Parsing

Generative neural models have recently achieved state-of-the-art results for constituency parsing. However, without a feasible search procedure, their use has so far been limited to reranking the output of external parsers in which decoding is more tractable. We describe an alternative to the conventional action-level beam search used for discriminative neural models that enables us to decode directly in these generative models. We then show that by improving our basic candidate selection strategy and using a coarse pruning function, we can improve accuracy while exploring significantly less of the search space. Applied to the model of Choe and Charniak (2016), our inference procedure obtains 92.56 F1 on section 23 of the Penn Treebank, surpassing prior state-of-the-art results for single-model systems.

A Shared Task on Bandit Learning for Machine Translation

We introduce and describe the results of a novel shared task on bandit learning for machine translation. The task was organized jointly by Amazon and Heidelberg University for the first time at the Second Conference on Machine Translation (WMT 2017). The goal of the task is to encourage research on learning machine translation from weak user feedback instead of human references or post-edits. On each of a sequence of rounds, a machine translation system is required to propose a translation for an input, and receives a real-valued estimate of the quality of the proposed translation for learning. This paper describes the shared task’s learning and evaluation setup, using services hosted on Amazon Web Services (AWS), the data and evaluation metrics, and the results of various machine translation architectures and learning protocols.

Adapting Sequence Models for Sentence Correction

In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches. Our strongest sequence-to-sequence model improves over our strongest phrase-based statistical machine translation model, with access to the same data, by 6 M2 (0.5 GLEU) points. Additionally, in the data environment of the standard CoNLL-2014 setup, we demonstrate that modeling (and tuning against) diffs yields similar or better M2 scores with simpler models and/or significantly less data than previous sequence-to-sequence approaches.

Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability

Tartan (TRT), a hardware accelerator for inference with Deep Neural Networks (DNNs), is presented and evaluated on Convolutional Neural Networks. TRT exploits the variable per-layer precision requirements of DNNs to deliver execution time that is proportional to the precision p in bits used per layer for convolutional and fully-connected layers. Prior art has demonstrated an accelerator with the same execution performance only for convolutional layers. Experiments on image classification CNNs show that on average across all networks studied, TRT outperforms a state-of-the-art bit-parallel accelerator by 1.90x without any loss in accuracy while it is 1.17x more energy efficient. TRT requires no network retraining while it enables trading off accuracy for additional improvements in execution performance and energy efficiency. For example, if a 1% relative loss in accuracy is acceptable, TRT is on average 2.04x faster and 1.25x more energy efficient than a conventional bit-parallel accelerator. Also presented is a Tartan configuration that processes 2 bits at a time; it requires less area than the 1-bit configuration and improves efficiency to 1.24x over the bit-parallel baseline while being 73% faster for convolutional layers and 60% faster for fully-connected layers.

Learning to Teach Reinforcement Learning Agents

In this article we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance, and the importance of reward discounting in advising. The experiments show the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel RL algorithm capable of learning when to advise, adapting to the student and the task at hand. Furthermore, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.
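
As an illustration of the CV statistic mentioned above, the following minimal Python sketch computes the standard coefficient of variation (standard deviation divided by mean) for two sets of episode returns. The function name and the return values are hypothetical and chosen only for illustration; the abstract does not state how the authors compute or apply the statistic.

```python
import statistics


def coefficient_of_variation(returns):
    """Population standard deviation divided by the mean of the returns."""
    mean = statistics.fmean(returns)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(returns) / mean


# Hypothetical episode returns for two candidate teacher policies.
steady_teacher = [100.0, 110.0, 90.0, 100.0]
erratic_teacher = [200.0, 20.0, 180.0, 40.0]

cv_steady = coefficient_of_variation(steady_teacher)
cv_erratic = coefficient_of_variation(erratic_teacher)
```

In this made-up example the erratic policy has the higher mean return but also a much higher CV, which is the kind of trade-off a mean-only comparison would hide.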

An Open Source C++ Implementation of Multi-Threaded Gaussian Mixture Models, k-Means and Expectation Maximisation

Modelling of multivariate densities is a core component in many signal processing, pattern recognition and machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the EM and k-means algorithms into a MapReduce-like framework. Furthermore, the implementation uses several techniques to improve numerical stability and modelling accuracy. We demonstrate that the multi-threaded implementation achieves a speedup of an order of magnitude on a recent 16 core machine, and that it can achieve higher modelling accuracy than a previously well-established publicly accessible implementation. The multi-threaded implementation is included as a user-friendly class in recent releases of the open source Armadillo C++ linear algebra library. The library is provided under the permissive Apache 2.0 license, allowing unencumbered use in commercial products.
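
To make the MapReduce-like reformulation concrete, here is a minimal Python sketch of a single k-means update written as a map step (per-chunk partial sums and counts) followed by a reduce step (adding the partials). It is not the Armadillo implementation, which is in C++; all function names are hypothetical, and the chunks are processed sequentially here, although each `map_chunk` call is independent and could run in its own thread.

```python
from functools import reduce


def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_chunk(chunk, centroids):
    """Map step: per-cluster coordinate sums and point counts for one chunk."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def combine(a, b):
    """Reduce step: add two partial results element-wise."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_step(chunks, centroids):
    """One k-means update; an empty cluster keeps its previous centroid."""
    partials = [map_chunk(chunk, centroids) for chunk in chunks]
    sums, counts = reduce(combine, partials)
    return [
        [s / n for s in row] if n else list(old)
        for row, n, old in zip(sums, counts, centroids)
    ]


# Hypothetical data split into two chunks, one per worker.
chunks = [[(0.0, 0.0), (10.0, 10.0)], [(1.0, 1.0), (11.0, 11.0)]]
new_centroids = kmeans_step(chunks, [[0.0, 0.0], [10.0, 10.0]])
```

Because the partial results combine by simple addition, the order in which chunks finish does not affect the updated centroids.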

Toward the Starting Line: A Systems Engineering Approach to Strong AI

Artificial General Intelligence (AGI) or Strong AI aims to create machines with human-like or human-level intelligence, which is still a very ambitious goal when compared to the existing computing and AI systems. After many hype cycles and lessons from AI history, it is clear that a big conceptual leap is needed for crossing the starting line to kick-start mainstream AGI research. This position paper aims to make a small conceptual contribution toward reaching that starting line. After a broad analysis of the AGI problem from different perspectives, a system-theoretic and engineering-based research approach is introduced, which builds upon the existing mainstream AI and systems foundations. Several promising cross-fertilization opportunities between systems disciplines and AI research are identified. Specific potential research directions are discussed.

Deep Co-Space: Sample Mining Across Feature Transformation for Semi-Supervised Learning

Aiming at improving performance of visual classification in a cost-effective manner, this paper proposes an incremental semi-supervised learning paradigm called Deep Co-Space (DCS). Unlike many conventional semi-supervised learning methods, which usually operate within a fixed feature space, our DCS gradually propagates information from labeled samples to unlabeled ones along with deep feature learning. We regard deep feature learning as a series of steps pursuing feature transformation, i.e., projecting the samples from a previous space into a new one, which tends to select the reliable unlabeled samples with respect to this setting. Specifically, for each unlabeled image instance, we measure its reliability by calculating the category variations of feature transformation from two different neighborhood variation perspectives, and merge them into a unified sample mining criterion derived from the Hellinger distance. Then, those samples that keep a stable correlation to their neighboring samples (i.e., having small category variation in distribution) across the successive feature space transformations automatically receive labels and are incorporated into the model for incremental training of the classifier. Our extensive experiments on standard image classification benchmarks (e.g., Caltech-256 and SUN-397) demonstrate that the proposed framework is capable of effectively mining from large-scale unlabeled images, which boosts image classification performance and achieves promising results compared to other semi-supervised learning methods.
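
For reference, the standard Hellinger distance between two discrete probability distributions can be sketched in Python as below. This shows only the textbook distance; the abstract does not give the exact form of the DCS sample mining criterion, and the function name and the example category distributions are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2.0)


# Hypothetical category distributions of one sample's neighborhood,
# before and after a feature transformation.
before = [0.7, 0.2, 0.1]
after_stable = [0.65, 0.25, 0.10]
after_shifted = [0.1, 0.2, 0.7]

small_variation = hellinger(before, after_stable)
large_variation = hellinger(before, after_shifted)
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support, so a small value indicates that the category distribution changed little across the transformation.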

Modeling and Forecasting the Evolution of Preferences over Time: A Hidden Markov Model of Travel Behavior

Literature suggests that preferences, as denoted by taste parameters and consideration sets, may evolve over time in response to changes in demographic and situational variables, psychological, sociological and biological constructs, and available alternatives and their attributes. However, existing representations typically overlook the influence of past experiences on present preferences. This study develops, applies and tests a hidden Markov model with a discrete choice kernel to model and forecast the evolution of individual preferences and behaviors over long-range forecasting horizons. The hidden states denote different preferences, i.e., the modes considered in the choice set and the sensitivity to level-of-service attributes. The evolutionary path of those hidden states is hypothesized to be a first-order Markov process. The framework is applied to study the evolution of travel mode preferences, or modality styles, over time, in response to a major change in the public transportation system. We use longitudinal travel diary data from Santiago, Chile. The dataset consists of four one-week pseudo travel diaries collected before and after the introduction of Transantiago, a complete redesign of the public transportation system in the city. Our model identifies four modality styles in the population: drivers, bus users, bus-metro users, and auto-metro users. The modality styles differ in terms of the travel modes that they consider and their sensitivity to level-of-service attributes. At the population level, there are significant shifts in the distribution of individuals across modality styles before and after the change in the system, but the distribution is relatively stable in the periods after the change. Finally, a comparison between the proposed dynamic framework and comparable static frameworks reveals differences in aggregate forecasts for different policy scenarios.
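
The first-order Markov assumption on the hidden states can be illustrated with a minimal Python sketch that propagates a population-level distribution over the four modality styles by one step. The transition probabilities and the initial distribution are hypothetical placeholders, not estimates from the study, and the sketch uses a fixed transition matrix as a simplifying assumption; the abstract does not say how the transitions are parameterized.

```python
def step(distribution, transition):
    """One first-order Markov step: next[j] = sum_i distribution[i] * transition[i][j]."""
    n = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


styles = ["drivers", "bus users", "bus-metro users", "auto-metro users"]

# Hypothetical transition probabilities; row i gives the probabilities of
# moving from style i to each style in the next period.
transition = [
    [0.90, 0.02, 0.03, 0.05],
    [0.05, 0.70, 0.20, 0.05],
    [0.03, 0.07, 0.85, 0.05],
    [0.10, 0.02, 0.08, 0.80],
]

# Hypothetical initial shares of the population in each style.
initial = [0.25, 0.25, 0.25, 0.25]
next_period = step(initial, transition)
```

Under the first-order assumption, the distribution in the next period depends only on the current distribution and the transition matrix, not on earlier periods.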

A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, making a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research.

Data-Driven Nested Stochastic Robust Optimization: A General Computational Framework and Algorithm for Optimization under Uncertainty in the Big Data Era

A novel data-driven nested stochastic robust optimization (DDNSRO) framework is proposed to systematically and automatically handle labeled multi-class uncertainty data in optimization problems. Uncertainty realizations in large datasets are often collected from various conditions, which are encoded by class labels. A group of Dirichlet process mixture models is employed for uncertainty modeling from the multi-class uncertainty data. The proposed data-driven nonparametric uncertainty model could automatically adjust its complexity based on the data structure and complexity, thus accurately capturing the uncertainty information. A DDNSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different classes of data; robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A tailored column-and-constraint generation algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on strategic planning of process networks are presented to demonstrate the applicability of the proposed framework.

Recurrent Ladder Networks

We propose a recurrent extension of the Ladder network, which is motivated by the inference required in hierarchical latent variable models. We demonstrate that the recurrent Ladder is able to handle a wide variety of complex learning tasks that benefit from iterative inference and temporal modeling. The architecture shows close-to-optimal results on temporal modeling of video data, competitive results on music modeling, and improved perceptual grouping based on higher order abstractions, such as stochastic textures and motion cues. We present results for fully supervised, semi-supervised, and unsupervised tasks. The results suggest that the proposed architecture and principles are powerful tools for learning a hierarchy of abstractions, handling temporal information, and modeling relations and interactions between objects.

Generator Reversal

We consider the problem of training generative models with deep neural networks as generators, i.e. to map latent codes to data points. Whereas the dominant paradigm combines simple priors over codes with complex deterministic models, we propose instead to use more flexible code distributions. These distributions are estimated non-parametrically by reversing the generator map during training. The benefits include: more powerful generative models, better modeling of latent structure and explicit control of the degree of generalization.

Patterns of Multistakeholder Recommendation

Recommender systems are personalized information systems. However, in many settings, the end-user of the recommendations is not the only party whose needs must be represented in recommendation generation. Incorporating this insight gives rise to the notion of multistakeholder recommendation, in which the interests of multiple parties are represented in recommendation algorithms and evaluation. In this paper, we identify patterns of stakeholder utility that characterize different multistakeholder recommendation applications, and provide a taxonomy of the different possible systems, only some of which have currently been implemented.

The Role of Mastery Learning in Intelligent Tutoring Systems: Principal Stratification on a Latent Variable

Mastery learning–the idea that students’ mastery of target skills should govern their advancement through a curriculum–lies at the heart of the Cognitive Tutor, a computer program designed to help teach. This paper uses log data from a large-scale effectiveness trial of the Cognitive Tutor Algebra I curriculum to estimate the role mastery learning plays in the tutor’s effect, using principal stratification. A continuous principal stratification analysis models treatment effect as a function of students’ potential adherence to mastery learning. However, adherence is not observed, but may be measured as a latent variable in an item response model. This paper describes a model for mastery learning in the Cognitive Tutor that includes an item response model in the principal stratification framework, and finds that the treatment effect may in fact decrease with adherence to mastery, or may be nearly unrelated on average.

Sparse Deep Nonnegative Matrix Factorization

Nonnegative matrix factorization is a powerful technique to realize dimension reduction and pattern recognition through single-layer data representation learning. Deep learning, however, with its carefully designed hierarchical structure, is able to combine hidden features to form more representative features for pattern recognition. In this paper, we propose sparse deep nonnegative matrix factorization models to analyze complex data for more accurate classification and better feature interpretation. Such models are designed to learn localized features or generate more discriminative representations for samples in distinct classes by imposing an L_1-norm penalty on the columns of certain factors. By extending the one-layer model into a multi-layer one with sparsity, we provide a hierarchical way to analyze big data and extract hidden features intuitively due to nonnegativity. We adopt Nesterov’s accelerated gradient algorithm to accelerate the computing process, with a convergence rate of O(1/k^2) after k iterations. We also analyze the computational complexity of our framework to demonstrate its efficiency. To improve performance on linearly inseparable data, we also consider incorporating popular nonlinear functions into this framework and explore their performance. We apply our models to two benchmark image datasets, demonstrating that our models can achieve competitive or better classification performance and produce intuitive interpretations compared with the typical NMF and competing multi-layer models.
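
The O(1/k^2) rate can be illustrated with a generic sketch of Nesterov's accelerated gradient method on a small smooth convex problem. This is not the paper's sparse deep NMF solver: the quadratic objective, the starting point, and all names below are assumptions chosen for illustration, and the step size 1/L with the usual momentum sequence is one common variant of the method.

```python
import math


def nesterov_minimize(grad, x0, lipschitz, iterations):
    """Accelerated gradient descent with step size 1/L and momentum sequence t_k."""
    x = list(x0)  # current iterate x_k
    y = list(x0)  # extrapolated point at which the gradient is evaluated
    t = 1.0
    for _ in range(iterations):
        g = grad(y)
        x_next = [yi - gi / lipschitz for yi, gi in zip(y, g)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_next
        y = [xn + momentum * (xn - xo) for xn, xo in zip(x_next, x)]
        x, t = x_next, t_next
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(a_i * x_i^2), minimized at x = 0.
curvatures = [1.0, 100.0]


def objective(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(curvatures, x))


def gradient(x):
    return [a * xi for a, xi in zip(curvatures, x)]


start = [1.0, 1.0]
steps = 50
result = nesterov_minimize(gradient, start, lipschitz=max(curvatures), iterations=steps)

# Known bound for this variant: f(x_k) - f* <= 2 * L * ||x_0 - x*||^2 / (k + 1)^2.
bound = 2.0 * max(curvatures) * sum(s * s for s in start) / (steps + 1) ** 2
```

The bound shrinks quadratically in the iteration count, compared with the O(1/k) guarantee of plain gradient descent on the same class of problems.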

Understanding Aesthetics in Photography using Deep Convolutional Neural Networks
An introduction to the qualitative and quantitative theory of homogenization
On hypergroups arising from perturbation of semigroups
Robust and flexible estimation of data-dependent stochastic mediation effects: a proposed method and example in a randomized trial setting
Fundamental Limits on Latency in Transceiver Cache-Aided HetNets
A Locally Adapting Technique for Boundary Detection using Image Segmentation
On the unbalanced cut problem and the generalized Sherrington-Kirkpatrick model
Recursive Variational Bayesian Dual Estimation for Nonlinear Dynamics and Non-Gaussian Observations
Bandit Convex Optimization for Scalable and Dynamic IoT Management
Deep Kernels for Optimizing Locomotion Controllers
Sectoring in Multi-cell Massive MIMO Systems
Learning from Video and Text via Large-Scale Discriminative Clustering
Early Fusion Strategy for Entity-Relationship Retrieval
Sensitivity Analysis for Unmeasured Confounding in Meta-Analyses
A stronger topology for the Brownian web
Convergence of first-order methods via the convex conjugate
Asymptotic Analysis of Mean Field Games with Small Common Noise
On Equivalence of M$^\natural$-concavity of a Set Function and Submodularity of Its Conjugate
Benchmarking 6DOF Urban Visual Localization in Changing Conditions
How Often Should CSI be Updated for Massive MIMO Systems with Massive Connectivity?
Beamspace Channel Estimation in mmWave Systems via Cosparse Image Reconstruction Technique
MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension
Object Detection of Satellite Images Using Multi-Channel Higher-order Local Autocorrelation
STD-PD: Generating Synthetic Training Data for Pedestrian Detection in Unannotated Videos
Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
Rotundus: triangulations, Chebyshev polynomials, and Pfaffians
Ensemble Performance of Biometric Authentication Systems Based on Secret Key Generation
The Convergence of Least-Squares Progressive Iterative Approximation with Singular Iterative Matrix
Almost Everywhere Matrix Recovery
Adaptive Inferential Method for Monotone Graph Invariants
The critical group of the Kneser graph on $2$-subsets of an $n$-element set
Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation
Counting Planar Eulerian Orientations
Network Formation in the Sky: Unmanned Aerial Vehicles for Multi-hop Wireless Backhauling
Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising
Localizing Actions from Video Labels and Pseudo-Annotations
Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions
A spectrahedral representation of the first derivative relaxation of the positive semidefinite cone
Efficient Algorithms for Non-convex Isotonic Regression through Submodular Optimization
Super-replication with proportional transaction cost under model uncertainty
Empirical Bayes Estimators for High-Dimensional Sparse Vectors
Quantum Enhanced Classical Sensor Networks
Learning to Predict Charges for Criminal Cases with Legal Basis
An application of proof mining to the proximal point algorithm in CAT(0) spaces
Group Re-Identification via Unsupervised Transfer of Sparse Features Encoding
Generalized deconvolution procedure for structural modeling of turbulence
Newton-Raphson Consensus under asynchronous and lossy communications for peer-to-peer networks
Sparse Identification and Estimation of High-Dimensional Vector AutoRegressive Moving Averages
Linear Programming Formulations of Singular Stochastic Control Problems: Time-Homogeneous Problems
Maximal Bootstrap Percolation Time on the Hypercube via Generalised Snake-in-the-Box
Extracting Event-Centric Document Collections from Large-Scale Web Archives
Mathematical Programming formulations for the efficient solution of the $k$-sum approval voting problem
Improving coreference resolution with automatically predicted prosodic information
A weighting strategy for Active Shape Models
Consistency models with global operation sequencing and their composition (extended version)
Flocking estimates for the Cucker-Smale model with time lag and hierarchical leadership
Small simplicial complexes with prescribed torsion in homology
Optimal tests for circular reflective symmetry about an unknown central direction
Central limit theorems for the real zeros of Weyl polynomials
Simplified Energy Landscape for Modularity Using Total Variation
The WILDTRACK Multi-Camera Person Dataset
Multi-point Gaussian states, quadratic-exponential cost functionals, and large deviations estimates for linear quantum stochastic systems
Monomial tropical cones for multicriteria optimization
Self-Synchronization in Duty-cycled Internet of Things (IoT) Applications
A Minimum-Cost Flow Model for Workload Optimization on Cloud Infrastructure
Empirical Evaluation of Abstract Argumentation: Supporting the Need for Bipolar and Probabilistic Approaches
Two Hilbert schemes in computer vision
Pell and Clapeyron Words as Stable Trajectories in Dynamical Systems
Compressive Sensing with Cross-Validation and Stop-Sampling for Sparse Polynomial Chaos Expansions
Macroscopic loops in the loop $O(n)$ model at Nienhuis’ critical point
On Robust Stability of Switched Systems in the Context of Filippov Solutions
Generic second-order macroscopic traffic node model for general multi-input multi-output road junctions via a dynamic system approach
Out-degree reducing partitions of digraphs
Centrality measures for graphons
Nash equilibria for game contingent claims with utility-based hedging