Data-driven Natural Language Generation: Paving the Road to Success

We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) the lack of reliable automatic evaluation metrics for NLG, and (b) the scarcity of high-quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more reliable metric. The second problem is addressed by presenting a novel framework for developing and evaluating a high-quality corpus for NLG training.

(Machine) Learning to Do More with Less

Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard ‘fully supervised’ approach (that relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called ‘weakly supervised’ technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail — both analytically and numerically — with a focus on the robustness to issues of mischaracterizing the training samples. Weakly supervised networks turn out to be remarkably insensitive to systematic mismodeling. Furthermore, we demonstrate that the event level outputs for weakly versus fully supervised networks are probing different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the output from the two types of networks. For concreteness, we apply this technology to a signature of beyond the Standard Model physics to demonstrate that all these impressive features continue to hold in a scenario of relevance to the LHC. Example code is provided at https://…/master.
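
As a rough illustration of the contrast drawn in the abstract above, the sketch below compares a per-event loss with a loss that sees only a class ratio. It assumes the weak loss is the absolute difference between the batch-averaged network output and the known signal fraction. That form, and the function names `fully_supervised_loss` and `weakly_supervised_loss`, are assumptions made for illustration and are not taken from the linked example code.

```python
def fully_supervised_loss(predictions, labels):
    """Mean squared error against per-event truth labels (0 = background, 1 = signal)."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(predictions)


def weakly_supervised_loss(predictions, signal_fraction):
    """Distance between the batch-averaged output and the known class ratio.

    Assumed form for illustration: no per-event label is used, only the
    fraction of signal events in the batch.
    """
    mean_prediction = sum(predictions) / len(predictions)
    return abs(mean_prediction - signal_fraction)


# Hypothetical network outputs for a batch of four events.
batch_outputs = [0.9, 0.1, 0.8, 0.2]

# Fully supervised: needs the truth label of every event.
full = fully_supervised_loss(batch_outputs, [1, 0, 1, 0])

# Weakly supervised: needs only the class ratio of the batch.
weak = weakly_supervised_loss(batch_outputs, signal_fraction=0.5)
```

Under this assumed form, a constant output equal to the signal fraction also gives zero loss on a single batch, so a usable setup would presumably need several training samples with different class ratios.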

Neural SLAM

We present an approach for agents to learn representations of a global map from sensor data, to aid their exploration in new environments. To achieve this, we embed procedures mimicking those of traditional Simultaneous Localization and Mapping (SLAM) into the soft attention based addressing of external memory architectures, in which the external memory acts as an internal representation of the environment. This structure encourages the evolution of SLAM-like behaviors inside a completely differentiable deep neural network. We show that this approach can help reinforcement learning agents to successfully explore new environments where long-term memory is essential. We validate our approach in both challenging grid-world environments and preliminary Gazebo experiments.

Learning to Learn: Meta-Critic Networks for Sample Efficient Learning

We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For supervised learning, this corresponds to the novel idea of a trainable task-parametrised loss generator. This meta-critic approach provides a route to knowledge transfer that can flexibly deal with few-shot and semi-supervised conditions for both reinforcement and supervised learning. Promising results are shown on both reinforcement and supervised learning problems.

Distributional Adversarial Networks

We propose a framework for adversarial training that relies on a sample rather than a single sample point as the fundamental unit of discrimination. Inspired by discrepancy measures and two-sample tests between probability distributions, we propose two such distributional adversaries that operate and predict on samples, and show how they can be easily implemented on top of existing models. Various experimental results show that generators trained with our distributional adversaries are much more stable and are remarkably less prone to mode collapse than traditional models trained with pointwise prediction discriminators. The application of our framework to domain adaptation also results in considerable improvement over recent state-of-the-art.

Machine Listening Intelligence

This manifesto paper will introduce machine listening intelligence, an integrated research framework for acoustic and musical signals modelling, based on signal processing, deep learning and computational musicology.

Online Convolutional Dictionary Learning

While a number of different algorithms have recently been proposed for convolutional dictionary learning, this remains an expensive problem. The single biggest impediment to learning from large training sets is the memory requirements, which grow at least linearly with the size of the training set since all existing methods are batch algorithms. The work reported here addresses this limitation by extending online dictionary learning ideas to the convolutional context.

Bayesian Semisupervised Learning with Deep Generative Models

Neural network based generative models with discriminative components are a powerful approach for semi-supervised learning. However, these techniques a) cannot account for model uncertainty in the estimation of the model’s discriminative component and b) lack flexibility to capture complex stochastic patterns in the label generation process. To avoid these problems, we first propose to use a discriminative component with stochastic inputs for increased noise flexibility. We show how an efficient Gibbs sampling procedure can marginalize the stochastic inputs when inferring missing labels in this model. Following this, we extend the discriminative component to be fully Bayesian and produce estimates of uncertainty in its parameter values. This opens the door for semi-supervised Bayesian active learning.

A Fixed-Point of View on Gradient Methods for Big Data

Using their interpretation as fixed-point iterations, we review first order gradient methods for minimizing convex objective functions. Due to their conceptual and algorithmic simplicity, first order gradient methods are widely used in machine learning methods involving massive datasets. In particular, stochastic first order methods are considered the de-facto standard for training deep neural networks. Studying these methods within fixed-point theory provides us with powerful tools for analyzing the convergence properties of a wide range of gradient methods. In particular, first order methods using inexact or noisy gradients, such as in stochastic gradient descent, can be studied using well-known results on inexact fixed-point iterations. Moreover, as illustrated clearly in this paper, the fixed-point picture allows an elegant derivation of accelerations for basic gradient methods. In particular, we show how gradient descent can be accelerated by a fixed-point preserving transformation of an operator associated with the objective function.
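
The fixed-point reading of gradient descent can be made concrete with a small sketch. For a differentiable convex function f and step size α, the operator T(x) = x − α∇f(x) has the minimizers of f as its fixed points, and gradient descent is the iteration x ← T(x). The helper names, the quadratic test function, and the step size below are hypothetical choices for illustration and are not taken from the paper.

```python
def gradient_operator(grad, step):
    """Return T(x) = x - step * grad(x); T(x) = x exactly when grad(x) = 0."""
    def T(x):
        return [xi - step * gi for xi, gi in zip(x, grad(x))]
    return T


def fixed_point_iterate(T, x0, tol=1e-10, max_iter=10_000):
    """Apply x <- T(x) until successive iterates differ by less than tol."""
    x = list(x0)
    for k in range(max_iter):
        x_next = T(x)
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next, k + 1
        x = x_next
    return x, max_iter


# Illustrative convex objective: f(x) = 0.5 * (x0 - 1)^2 + 2 * (x1 + 3)^2,
# whose unique minimizer is (1, -3).
def grad_f(x):
    return [x[0] - 1.0, 4.0 * (x[1] + 3.0)]


# The gradient is 4-Lipschitz, so any step below 2 / 4 = 0.5 makes T a contraction here.
T = gradient_operator(grad_f, step=0.2)
x_star, n_iter = fixed_point_iterate(T, [0.0, 0.0])
```

Running the iteration from the origin drives `x_star` towards (1, −3), the point left unchanged by `T`.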

Summarization of ICU Patient Motion from Multimodal Multiview Videos
Approximate Quantum Error Correction Revisited: Introducing the Alphabit
Asymptotic dimensioning of stochastic service systems
You Are How You Walk: Uncooperative MoCap Gait Identification for Video Surveillance with Incomplete and Noisy Data
Robust Regulation of Infinite-Dimensional Port-Hamiltonian Systems
On the tightness of Gaussian concentration for convex functions
Cooperative Vehicle Speed Fault Diagnostics and Correction
The application of deep convolutional neural networks to ultrasound for modelling of dynamic states within human skeletal muscle
Toward Computation and Memory Efficient Neural Network Acoustic Models with Binary Weights and Activations
On the heat content for the poisson kernels over sets of finite perimeter
A Markov decision process approach to optimizing cancer therapy using multiple modalities
User Clustering for Multicast Precoding in Multi-Beam Satellite Systems
All properly ergodic Markov chains over a free group are orbit equivalent
Parameterized Algorithms for Partitioning Graphs into Highly Connected Clusters
Berry-Esseen Theorem and Quantitative homogenization for the Random Conductance Model with degenerate Conductances
Grid-forming Control for Power Converters based on Matching of Synchronous Machines
Real-time Distracted Driver Posture Classification
Asymptotic results in weakly increasing subsequences in random words
Fighting biases with dynamic boosting
Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects
Mallows Permutations and Finite Dependence
Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold
Metric duality between positive definite kernels and boundary processes
Multi-district preference modelling
Flow-free Video Object Segmentation
Toward Inverse Control of Physics-Based Sound Synthesis
Chord Label Personalization through Deep Learning of Integrated Harmonic Interval-based Representations
Transforming Musical Signals through a Genre Classifying Convolutional Neural Network
Music Signal Processing Using Vector Product Neural Networks
Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets
Talking Drums: Generating drum grooves with neural networks
Audio Spectrogram Representations for Processing with Convolutional Neural Networks
Frame-Based Continuous Lexical Semantics through Exponential Family Tensor Factorization and Semantic Proto-Roles
Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition
On the analysis of personalized medication response and classification of case vs control patients in mobile health studies: the mPower case study
R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
Online Reweighted Least Squares Algorithm for Sparse Recovery and Application to Short-Wave Infrared Imaging
Multi-scale Multi-band DenseNets for Audio Source Separation
Defining Equitable Geographic Districts in Road Networks via Stable Matching
Path Integral Networks: End-to-End Differentiable Optimal Control
CS591 Report: Application of siamesa network in 2D transformation
Distributed model predictive control for continuous-time nonlinear systems based on suboptimal ADMM
Actor-Critic Sequence Training for Image Captioning
A Novel Tool to Evaluate the Accuracy of Predicting Survival in Cystic Fibrosis
Statistical methods for evaluating the time-dependent sensitivity and specificity of biomarker-driven dynamic decision-making
A sharp recovery condition for sparse signals with partial support information via orthogonal matching pursuit
Token Jumping in minor-closed classes
Recovery of signals by a weighted $\ell_2/\ell_1$ minimization under arbitrary prior support information
Simultaneous Lightwave Information and Power Transfer (SLIPT) for Indoor IoT Applications
Deep learning bank distress from news and numerical financial data
Quantum Bernstein’s Theorem and the Hyperoctahedral Quantum Group
Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images
On magic factors in Stein’s method for compound Poisson approximation
Central limit theorem and Diophantine approximations
Machine Learning Approaches to Energy Consumption Forecasting in Households
Counting chambers in restricted Coxeter arrangements
Co-salient Object Detection Based on Deep Saliency Networks and Seed Propagation over an Integrated Graph
Forward Backward Stochastic Differential Equation Games with Delay and Noisy Memory
Management of a hydropower system via convex duality
Asymptotics for the Discrete-Time Average of the Geometric Brownian Motion and Asian Options
CLT for fluctuations of $β$-ensembles with general potential
Comparing Information-Theoretic Measures of Complexity in Boltzmann Machines
Thermal conductivity for a system of harmonic oscillators in a magnetic field with noise
Improving Distributed Representations of Tweets – Present and Future
Depth and regularity modulo a principal ideal
Plane Graphs are Facially-non-repetitively $10^{4 \cdot10^7}$-Choosable
Power-Based Direction-of-Arrival Estimation Using a Single Multi-Mode Antenna
Speaker Identification in the Shouted Environment Using Suprasegmental Hidden Markov Models
Image classification using local tensor singular value decompositions
On the relation between representations and computability
Composition of Gray Isometries
Topology Reconstruction of Dynamical Networks via Constrained Lyapunov Equations
Thermodynamic efficiency of learning a rule in neural networks
Enhancing speaker identification performance under the shouted talking condition using second-order circular hidden Markov models
Independence characterization for Wishart and Kummer matrices
Iterative Spectral Clustering for Unsupervised Object Localization
Employing Second-Order Circular Suprasegmental Hidden Markov Models to Enhance Speaker Identification Performance in Shouted Talking Environments
Hausdorff Dimension of the Record Set of a Fractional Brownian Motion
Stein approximation for functionals of independent random sequences
Talking Condition Recognition in Stressful and Emotional Talking Environments Based on CSPHMM2s
Stronger Baselines for Trustable Results in Neural Machine Translation
Speaking Style Authentication Using Suprasegmental Hidden Markov Models
Indoor UAV scheduling with Restful Task Assignment Algorithm
On arithmetic of one class of plane maps
A Deep Multimodal Approach for Cold-start Music Recommendation
From Individual Motives to Partial Consensus: A Dynamic Game Model
Refractory period in network models of excitable nodes: self-sustaining stable dynamics, extended scaling region and oscillatory behavior
On Sampling Edges Almost Uniformly
Bounds on Information Combining With Quantum Side Information
Speaker Identification Investigation and Analysis in Unbiased and Biased Emotional Talking Environments
Using Second-Order Hidden Markov Model to Improve Speaker Identification Recognition Performance under Neutral Condition
Dynamical selection of Nash equilibria using Experience Weighted Attraction Learning: emergence of heterogeneous mixed equilibria
Multi-tap Digital Canceller for Full-Duplex Applications
Numerical Semigroups and Codes
Speaker Identification in each of the Neutral and Shouted Talking Environments based on Gender-Dependent Approach Using SPHMMs
New Lower Bounds on the Generalized Hamming Weights of AG Codes
Numerical assessment of two-level domain decomposition preconditioners for incompressible Stokes and elasticity equations
Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
Many-body localization caused by temporal disorder
Feature uncertainty bounding schemes for large robust nonlinear SVM classifiers
A note on selective inference after likelihood- or test-based model selection in linear models
Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation
Robust Face Tracking using Multiple Appearance Models and Graph Relational Learning
On the Bickel-Rosenblatt test of goodness-of-fit for the residuals of autoregressive processes
Generalization Error Bounds for Extreme Multi-class Classification
Cooperative Slotted ALOHA for Massive M2M Random Access Using Directional Antennas
Structural Analysis and Optimal Design of Distributed System Throttlers
Classification of Population Using Voronoi Area Based Density Estimation
Linear Estimation of Treatment Effects in Demand Response: An Experimental Design Approach
New Fairness Metrics for Recommendation that Embrace Differences
Election forensic analysis of the Turkish Constitutional Referendum 2017
Heitler-London model for acceptor-acceptor interactions in doped semiconductors
Runaway Feedback Loops in Predictive Policing
Quantum computation with indefinite causal structures
Automatic Mapping of French Discourse Connectives to PDTB Discourse Relations
What’s Mine is Yours: Pretrained CNNs for Limited Training Sonar ATR
Rational Trust Modeling
Superdiffusions with large mass creation — construction and growth estimates
Generalising Random Forest Parameter Optimisation to Include Stability and Cost
Approximate Maximin Shares for Groups of Agents
Importance sampling and delayed acceptance via a Peskun type ordering
Scale-Aware Face Detection
Towards Understanding the Dynamics of Generative Adversarial Networks
Automatic Face Image Quality Prediction
Fast model-fitting of Bayesian variable selection regression using the iterative complex factorization algorithm