Probabilistic Model-Agnostic Meta-Learning

Meta-learning for few-shot learning entails acquiring a prior over previous tasks and experiences, such that new tasks can be learned from small amounts of data. However, a critical challenge in few-shot learning is task ambiguity: even when a powerful prior can be meta-learned from a large number of prior tasks, a small dataset for a new task can simply be too ambiguous to acquire a single accurate model (e.g., a classifier) for that task. In this paper, we propose a probabilistic meta-learning algorithm that can sample models for a new task from a model distribution. Our approach extends model-agnostic meta-learning, which adapts to new tasks via gradient descent, to incorporate a parameter distribution that is trained via a variational lower bound. At meta-test time, our algorithm adapts via a simple procedure that injects noise into gradient descent, and at meta-training time, the model is trained such that this stochastic adaptation procedure produces samples from the approximate model posterior. Our experimental results show that our method can sample plausible classifiers and regressors in ambiguous few-shot learning problems.

Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts under Precision and Recall Constraints

Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document-level. Therefore, we propose a novel approach that detects documents rather than concepts where quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of the previously unseen data where considerable gains in document-level recall can be achieved, while upholding precision at the same time. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.

A bootstrap test for equality of variances

We introduce a bootstrap procedure to test the hypothesis H_o that K+1 variances are homogeneous. The procedure uses a variance-based statistic, and is derived from a normal-theory test for equality of variances. The test equivalently expresses the hypothesis as H_o: \mathbf{\eta}=(\eta_1,\ldots,\eta_{K+1})^T=\mathbf{0}, where the \eta_i’s are log contrasts of the population variances. A box-type acceptance region is constructed to test the hypothesis H_o. Simulation results indicate that our method is generally superior to the Shoemaker and Levene tests, and to the bootstrapped version of the Levene test, in controlling the Type I and Type II errors.
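To make the idea of a bootstrap test on log variances concrete, here is a minimal sketch. It is not the box-type acceptance region described in the abstract: it swaps in a generic pooled-residual bootstrap, and the statistic (the largest deviation of a group's log sample variance from the average log variance), the function names, the number of resamples, and the toy data are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, variance


def log_variance_spread(groups):
    """Largest deviation of a group's log sample variance from the mean log variance."""
    # The floor guards against log(0) if a resample happens to be constant.
    logs = [math.log(max(variance(g), 1e-300)) for g in groups]
    center = mean(logs)
    return max(abs(v - center) for v in logs)


def bootstrap_variance_test(groups, num_resamples=999, seed=0):
    """Approximate p-value for the null hypothesis that all groups share one variance."""
    rng = random.Random(seed)
    observed = log_variance_spread(groups)

    # Under the null hypothesis, centered residuals from all groups are pooled.
    pooled = []
    for g in groups:
        g_mean = mean(g)
        pooled.extend(x - g_mean for x in g)

    exceed = 0
    for _ in range(num_resamples):
        # Redraw every group, keeping its original size, from the pooled residuals.
        resampled = [rng.choices(pooled, k=len(g)) for g in groups]
        if log_variance_spread(resampled) >= observed:
            exceed += 1
    return (exceed + 1) / (num_resamples + 1)


# Toy data (hypothetical): two groups with very different spread,
# and a shifted copy of the first group with the same spread.
small_spread = [0.1 * i for i in range(20)]
large_spread = [10.0 * i for i in range(20)]
shifted_copy = [0.1 * i + 5.0 for i in range(20)]

p_different = bootstrap_variance_test([small_spread, large_spread])
p_similar = bootstrap_variance_test([small_spread, shifted_copy])
```

A small `p_different` is evidence against equal variances, while `p_similar` stays large because shifting a group changes its mean but not its spread.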

Semi-Supervised Learning via Compact Latent Space Clustering

We present a novel cost function for semi-supervised learning of neural networks that encourages compact clustering of the latent space to facilitate separation. The key idea is to dynamically create a graph over embeddings of labeled and unlabeled samples of a training batch to capture underlying structure in feature space, and use label propagation to estimate its high and low density regions. We then devise a cost function based on Markov chains on the graph that regularizes the latent space to form a single compact cluster per class, without disturbing existing clusters during optimization. We evaluate our approach on three benchmarks and compare to the state of the art with promising results. Our approach combines the benefits of graph-based regularization with efficient, inductive inference, does not require modifications to a network architecture, and can thus be easily applied to existing networks to enable an effective use of unlabeled data.

Scalable Bayesian Nonparametric Clustering and Classification

We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is ‘embarrassingly parallel’ and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating data sets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and favorable classification performance relative to other widely used competing classifiers.

Re-evaluating evaluation

Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation — since there is no harm (computational cost aside) from including all available tasks and agents.

Dimensionality-Driven Learning with Noisy Labels

Datasets with significant proportions of noisy (incorrect) class labels present challenges for training accurate Deep Neural Networks (DNNs). We propose a new perspective for understanding DNN generalization for such datasets, by investigating the dimensionality of the deep representation subspace of training samples. We show that from a dimensionality perspective, DNNs exhibit quite distinctive learning styles when trained with clean labels versus when trained with a proportion of noisy labels. Based on this finding, we develop a new dimensionality-driven learning strategy, which monitors the dimensionality of subspaces during training and adapts the loss function accordingly. We empirically demonstrate that our approach is highly tolerant to significant proportions of noisy labels, and can effectively learn low-dimensional local subspaces that capture the data distribution.

Multi-Source Neural Machine Translation with Missing Data

Multi-source translation is an approach that exploits multiple inputs (e.g. in two different languages) to increase translation accuracy. In this paper, we examine approaches for multi-source neural machine translation (NMT) using an incomplete multilingual corpus in which some translations are missing. In practice, many multilingual corpora are not complete because of the difficulty of providing translations in all of the relevant languages (for example, in TED talks, most English talks only have subtitles for a small portion of the languages that TED supports). Existing studies on multi-source translation did not explicitly handle such situations. This study focuses on the use of incomplete multilingual corpora in multi-encoder NMT and mixture of NMT experts and examines a very simple implementation where missing source translations are replaced by a special symbol <NULL>. These methods allow us to use incomplete corpora both at training time and test time. In experiments with real incomplete multilingual corpora of TED Talks, the multi-source NMT with the <NULL> tokens achieved higher translation accuracy, as measured by BLEU, than any one-to-one NMT system.
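The <NULL> replacement described above can be sketched as a small preprocessing step. This is only an illustration under stated assumptions: the dictionary layout of an example, the function name `fill_missing_sources`, and the toy sentences are hypothetical and do not come from the paper.

```python
NULL_TOKEN = "<NULL>"


def fill_missing_sources(example, source_langs, null_token=NULL_TOKEN):
    """Return one source sentence per language, using the null token where none exists."""
    # A missing key, None, or an empty string all count as a missing translation.
    return [example.get(lang) or null_token for lang in source_langs]


# Toy corpus (hypothetical): the second talk has no French subtitle.
corpus = [
    {"es": "hola mundo", "fr": "bonjour le monde", "en": "hello world"},
    {"es": "gracias", "en": "thank you"},
]
source_langs = ["es", "fr"]

multi_source_inputs = [fill_missing_sources(ex, source_langs) for ex in corpus]
```

Every example then has the same number of source inputs, so an incomplete corpus can be fed to a multi-encoder model without dropping sentences.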

GP-RVM: Genetic Programing-based Symbolic Regression Using Relevance Vector Machine

This paper proposes a hybrid basis function construction method (GP-RVM) for the Symbolic Regression (SR) problem, which combines an extended version of Genetic Programming called Kaizen Programming with the Relevance Vector Machine (RVM) to evolve an optimal set of basis functions. Unlike traditional evolutionary algorithms, where a single individual is a complete solution, our method proposes a solution based on a linear combination of basis functions built from individuals during the evolving process. RVM, which is a sparse Bayesian kernel method, selects suitable functions to constitute the basis. RVM determines the posterior weight of a function by evaluating its quality and sparsity. The solution produced by GP-RVM is a sparse Bayesian linear model of the coefficients of many non-linear functions. Our hybrid approach is focused on nonlinear white-box models, selecting the right combination of functions to build robust predictions without prior knowledge about the data. Experimental results show that GP-RVM outperforms conventional methods, which suggests that it is an efficient and accurate technique for solving SR. The computational complexity of GP-RVM scales as O(M^{3}), where M is the number of functions in the basis set and is typically much smaller than the number N of training patterns.

MEBN-RM: A Mapping between Multi-Entity Bayesian Network and Relational Model

Multi-Entity Bayesian Network (MEBN) is a knowledge representation formalism combining Bayesian Networks (BN) with First-Order Logic (FOL). MEBN has sufficient expressive power for general-purpose knowledge representation and reasoning. Developing a MEBN model to support a given application is a challenge, requiring definition of entities, relationships, random variables, conditional dependence relationships, and probability distributions. When available, data can be invaluable both to improve performance and to streamline development. By far the most common format for available data is the relational database (RDB). Relational databases describe and organize data according to the Relational Model (RM). Developing a MEBN model from data stored in an RDB therefore requires mapping between the two formalisms. This paper presents MEBN-RM, a set of mapping rules between key elements of MEBN and RM. We identify links between the two languages (RM and MEBN) and define four levels of mapping from elements of RM to elements of MEBN. These definitions are implemented in the MEBN-RM algorithm, which converts a relational schema in RM to a partial MEBN model. Through this research, the software has been released as a MEBN-RM open-source software tool. The method is illustrated through two example use cases using MEBN-RM to develop MEBN models: a Critical Infrastructure Defense System and a Smart Manufacturing System.

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. A final section of the paper shows that all of our main results extend to Q-learning applied to high dimensional optimal stopping problems.
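For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of TD(0) with linear function approximation. The two-state deterministic chain, the one-hot features, the discount factor, and the constant step size are assumptions chosen so the result can be checked by hand; none of them come from the paper.

```python
def td0_linear(transitions, features, gamma, alpha, num_steps, start_state=0):
    """Estimate weights theta such that dot(theta, features[s]) approximates V(s)."""
    dim = len(features[start_state])
    theta = [0.0] * dim
    state = start_state
    for _ in range(num_steps):
        next_state, reward = transitions[state]
        phi = features[state]
        phi_next = features[next_state]
        value = sum(t * f for t, f in zip(theta, phi))
        next_value = sum(t * f for t, f in zip(theta, phi_next))
        # TD error: one-step bootstrapped target minus the current estimate.
        td_error = reward + gamma * next_value - value
        # Stochastic-gradient-style step along the current feature vector.
        theta = [t + alpha * td_error * f for t, f in zip(theta, phi)]
        state = next_state
    return theta


# Hypothetical chain: state 0 pays reward 1 and moves to state 1,
# state 1 pays reward 0 and moves back to state 0.
transitions = {0: (1, 1.0), 1: (0, 0.0)}
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # one-hot, so theta[s] is V(s)

theta = td0_linear(transitions, features, gamma=0.5, alpha=0.1, num_steps=5000)
```

With a discount of 0.5 the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0) give V(0) = 4/3 and V(1) = 2/3, which is what `theta` approaches.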

Studying the Difference Between Natural and Programming Language Corpora

Code corpora, as observed in large software systems, are now known to be far more repetitive and predictable than natural language corpora. But why? Does the difference simply arise from the syntactic limitations of programming languages? Or does it arise from the differences in authoring decisions made by the writers of these natural and programming language texts? We conjecture that the differences are not entirely due to syntax, but also arise from the fact that reading and writing code is un-natural for humans, and requires substantial mental effort; so, people prefer to write code in ways that are familiar to both reader and writer. To support this argument, we present results from two sets of studies: 1) a first set aimed at attenuating the effects of syntax, and 2) a second, aimed at measuring repetitiveness of text written in other settings (e.g. second language, technical/specialized jargon), which are also effortful to write. We find that this repetition in source code is not entirely the result of grammar constraints, and thus some repetition must result from human choice. While the evidence we find of similar repetitive behavior in technical and learner corpora does not conclusively show that such language is used by humans to mitigate difficulty, it is consistent with that theory.

Causal Interventions for Fairness

Most approaches in algorithmic fairness constrain machine learning methods so the resulting predictions satisfy one of several intuitive notions of fairness. While this may help private companies comply with non-discrimination laws or avoid negative publicity, we believe it is often too little, too late. By the time the training data is collected, individuals in disadvantaged groups have already suffered from discrimination and lost opportunities due to factors out of their control. In the present work we focus instead on interventions such as a new public policy, and in particular, how to maximize their positive effects while improving the fairness of the overall system. We use causal methods to model the effects of interventions, allowing for potential interference: each individual’s outcome may depend on who else receives the intervention. We demonstrate this with an example of allocating a budget of teaching resources using a dataset of schools in New York City.

Predictive Analysis on Twitter: Techniques and Applications

Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis for insights have been carried out using Twitter data in various domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis involving aspects such as sentiment, emotion, and the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and relate a few success stories.

Dempsterian-Shaferian Belief Network From Data

Shenoy and Shafer {Shenoy:90} demonstrated that, for both Dempster-Shafer Theory and probability theory, marginals of joint belief distributions can be calculated efficiently (by so-called local computations) provided that the joint distribution can be decomposed (factorized) into a belief network. A number of algorithms exist for decomposing a probabilistic joint belief distribution into a Bayesian (belief) network from data. For example, Spirtes, Glymour and Scheines {Spirtes:90b} formulated a conjecture that a direct dependence test and a head-to-head meeting test would suffice to construct a Bayesian network from data in such a way that Pearl’s concept of d-separation {Geiger:90} applies. This paper is intended to transfer the approach of Spirtes, Glymour and Scheines {Spirtes:90b} onto the ground of the Dempster-Shafer Theory (DST). For this purpose, a frequentist interpretation of the DST developed in {Klopotek:93b} is exploited. A special notion of conditionality for DST is introduced and demonstrated to behave with respect to Pearl’s d-separation {Geiger:90} in much the same way as conditional probability (though some differences, such as non-uniqueness, are evident). Based on this, an algorithm analogous to that from {Spirtes:90b} is developed. The notion of a partially oriented graph (pog) is introduced and within this graph the notion of p-d-separation is defined. If the direct dependence test and the head-to-head meeting test are used to orient the pog, then its p-d-separation is shown to be equivalent to Pearl’s d-separation for any compatible dag.

Towards Dependability Metrics for Neural Networks

Neural networks and other data-engineered models are instrumental in developing automated driving components such as perception or intention prediction. The safety-critical aspect of such a domain makes dependability of neural networks a central concern for long-lived systems. Hence, it is of great importance to support the development team in evaluating important dependability attributes of the machine learning artifacts during their development process. So far, there is no systematic framework available in which a neural network can be evaluated against these important attributes. In this paper, we address this challenge by proposing eight metrics that characterize the robustness, interpretability, completeness, and correctness of machine learning artifacts, enabling the development team to efficiently identify dependability issues.

Understanding Batch Normalization

Batch normalization is a ubiquitous deep learning technique that normalizes activations in intermediate layers. It is associated with improved accuracy and faster learning, but despite its enormous success there is little consensus regarding why it works. We aim to rectify this and take an empirical approach to understanding batch normalization. Our primary observation is that the higher learning rates that batch normalization enables have a regularizing effect that dramatically improves generalization of normalized networks, which is both demonstrated empirically and motivated theoretically. We show how activations become large and how the convolutional channels become increasingly ill-behaved for layers deep in unnormalized networks, and how this results in larger input-independent gradients. Beyond just gradient scaling, we demonstrate how the learning rate in unnormalized networks is further limited by the magnitude of activations growing exponentially with network depth for large parameter updates, a problem batch normalization trivially avoids. Motivated by recent results in random matrix theory, we argue that ill-conditioning of the activations is due to fluctuations in random initialization, shedding new light on classical initialization schemes and their consequences.
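As a reminder of what "normalizes activations" means in practice, here is a minimal sketch of the normalization step alone. It assumes a batch stored as a list of rows, uses the biased (divide by batch size) variance, and omits the learnable scale and shift as well as the running statistics used at inference time; the function name and the `eps` value are illustrative choices, not taken from the paper.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize each feature (column) of a batch to roughly zero mean and unit variance."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)
    ]
    # eps keeps the division stable when a feature has near-zero variance.
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for row in batch
    ]


# Toy activations (hypothetical): the second feature is 200 times larger than the first.
activations = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_normalize(activations)
```

After normalization both columns are on the same scale, regardless of how large the raw activations were.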

Population growth with correlated generation times at the single-cell level
Randomly Perturbed Ergodic Averages
Data Summarization at Scale: A Two-Stage Submodular Approach
Embedding Transfer for Low-Resource Medical Named Entity Recognition: A Case Study on Patient Mobility
Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings
Towards Riemannian Accelerated Gradient Methods
Unbiased Estimation of the Value of an Optimized Policy
Estimation of Mittag-Leffler Parameters
Domain Adversarial Training for Accented Speech Recognition
Weak dynamic monopolies in social graphs
Training Augmentation with Adversarial Examples for Robust Speech Recognition
Integral uniform global asymptotic stability and non-coercive Lyapunov functions
Stein Variational Gradient Descent Without Gradient
Parameter estimation for fractional Poisson processes
Structural Rounding: Approximation Algorithms for Graphs Near an Algorithmically Tractable Class
Partial vertex covers and the complexity of some problems concerning static and dynamic monopolies
Performance of Hierarchical Sparse Detectors for Massive MTC
Evaluating surgical skills from kinematic data using convolutional neural networks
A stratified age-period-cohort model for spatial heterogeneity in all-cause mortality
Quantum accelerated approach to the thermal state of classical spin systems with applications to pattern-retrieval in the Hopfield neural network
On the rate of convergence of empirical barycentres in metric spaces: curvature, convexity and extendible geodesics
Discovering space – Grounding spatial topology and metric regularity in a naive agent’s sensorimotor experience
A supercongruence concerning truncated hypergeometric series ${}_nF_{n-1}$
Inertial lower bounds for the orthogonal and projective ranks of a graph
Connectedness of projective codes in the Grassmann graph
Downlink Interference Management in Dense Interference-Aware Drone Small Cells Networks Using Mean-Field Game Theory
A Challenge Set for French --> English Machine Translation
Speaker-Follower Models for Vision-and-Language Navigation
Alignment-free sequence comparison using absent words
A Study of EV BMS Cyber Security Based on Neural Network SOC Prediction
POTs: The revolution will not be optimized
Multiobjective Test Problems with Degenerate Pareto Fronts
Efficient semantic image segmentation with superpixel pooling
Dwarf in a Giant: Enabling Scalable, High-Resolution HPC Energy Monitoring for Real-Time Profiling and Analytics
Gradient Method for Optimization on Riemannian Manifolds with Lower Bounded Curvature
Locally Recoverable codes from algebraic curves with separated variables
An Experimental Mathematics Approach to the Area Statistic of Parking Functions
Correlation bounds for fields and matroids
Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning
Scalable Multi-Class Bayesian Support Vector Machines for Structured and Unstructured Data
Super-Resolution using Convolutional Neural Networks without Any Checkerboard Artifacts
On a characterization of the Grassmann graphs
The log-Lévy moment problem via Berg-Urbanik semigroups
Multichannel social signatures and persistent features of ego networks
Path-Level Network Transformation for Efficient Architecture Search
Fast Approximate Counting and Leader Election in Populations
Cooperative Authentication in Underwater Acoustic Sensor Networks
Role of Symmetry in Irrational Choice
On the spectral determinations of the connected multicone graphs $ K_r\bigtriangledown sK_t $
Spectral Network Embedding: A Fast and Scalable Method via Sparsity
Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
Nonparametric Density Flows for MRI Intensity Normalisation
Learning Multi-Modal Self-Awareness Models for Autonomous Vehicles from Human Driving
Code Design for Non-Coherent Detection of Frame Headers in Precoded Satellite Systems
The $A_α$-spectral radius of graphs with given degree sequence
On Predictive Density Estimation under $α$-divergence Loss
Performance evaluation of a novel relay assisted hybrid FSO / RF communication system with receive diversity
Fast Consensus Protocols in the Asynchronous Poisson Clock Model with Edge Latencies
Inference for a constrained parameter in presence of an uncertain constraint
Designing Experiments to Measure Incrementality on Facebook
Fault-Tolerant Control of Linear Quantum Stochastic Systems
Generative Adversarial Networks for Realistic Synthesis of Hyperspectral Samples
Investigating Spatiotemporal Dynamics and Synchrony of Influenza Epidemics in Australia: An Agent-Based Modelling Approach
$k$-Sets and Rectilinear Crossings in Complete Uniform Hypergraphs
Accessing eigenstate spin-glass order from reduced density matrices
AI-based Two-Stage Intrusion Detection for Software Defined IoT Networks
Branching random walk in the presence of a hard wall
On the Effect of Inter-observer Variability for a Reliable Estimation of Uncertainty of Medical Image Segmentation
Shape Robust Text Detection with Progressive Scale Expansion Network
Ermes: Emoji-Powered Representation Learning for Cross-Lingual Sentiment Classification
Irreversible Work Reduction by Disorder in Many-Body Quantum Systems
Undirected network models with degree heterogeneity and homophily
Grouped Gaussian Processes for Solar Power Prediction
PMU Placement Optimization for Smart Grid Observability and State Estimation
Segment-Based Credit Scoring Using Latent Clusters in the Variational Autoencoder
Recursive Estimation of Dynamic RSS Fields Based on Crowdsourcing and Gaussian Processes
Information-Maximizing Sampling to Promote Tracking-by-Detection
Multi-cell Hybrid Millimeter Wave Systems: Pilot Contamination and Interference Mitigation
Importance weighted generative networks
Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements
Removing Algorithmic Discrimination (With Minimal Individual Error)
Fast Distributed Deep Learning via Worker-adaptive Batch Sizing
Large scale classification in deep neural network with Label Mapping
Simplifying Reward Design through Divide-and-Conquer
Conditional probability calculation using restricted Boltzmann machine with application to system identification
Secure and Decentralized Swarm Behavior with Autonomous Agents for Smart Cities
Structured Actor-Critic for Managing and Dispensing Public Health Inventory
Quantitative Assessment of Robotic Swarm Coverage
Stochastic Block Models are a Discrete Surface Tension
Splitting loops and necklaces: Variants of the square peg problem
Pinned, locked, pushed, and pulled traveling waves in structured environments
Interlinked Convolutional Neural Networks for Face Parsing
Accelerating Greedy Coordinate Descent Methods
Understanding State level Variations in U.S. Infant Mortality: 2000 to 2015
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
Prioritized Threshold Allocation for Distributed Frequency Response
Shape theorem and surface fluctuation for Poisson cylinders
An introduction to stochastic processes associated with resistance forms and their scaling limits
Strain localisation above the yielding point in cyclically deformed glasses
The effect of the choice of neural network depth and breadth on the size of its hypothesis space
Fault Tolerant Control for Networked Mobile Robots
On Bayesian inferential tasks with infinite-state jump processes: efficient data augmentation
Reference Model of Multi-Entity Bayesian Networks for Predictive Situation Awareness
Progressive Reasoning by Module Composition
NumtaDB – Assembled Bengali Handwritten Digits
Joint Power Allocation in Interference-Limited Networks via Distributed Coordinated Learning
Deep Reinforcement Learning for General Video Game AI
Deep Ordinal Regression Network for Monocular Depth Estimation
A power series identity and Bessel-type integrals over unitary groups
Optimal Energy Consumption Forecast for Grid Responsive Buildings: A Sensitivity Analysis
Identifying Heritable Communities of Microbiome by Root-Unifrac and Wishart Distribution
Bayesian Inference for Diffusion Processes: Using Higher-Order Approximations for Transition Densities
Deep Variational Reinforcement Learning for POMDPs
Action4D: Real-time Action Recognition in the Crowd and Clutter
Human-aided Multi-Entity Bayesian Networks Learning from Relational Data
Simulating the stochastic dynamics and cascade failure of power networks
A Likelihood-based Alternative to Null Hypothesis Significance Testing
Finding Convincing Arguments Using Scalable Bayesian Preference Learning
Gaussian Mixture Reduction for Time-Constrained Approximate Inference in Hybrid Bayesian Networks
Outcome identification in electronic health records using predictions from an enriched Dirichlet process mixture
Polar Code Moderate Deviation: Recovering the Scaling Exponent
Localized Structured Prediction
A Comparative Study on Unsupervised Domain Adaptation Approaches for Coffee Crop Mapping
Maximum and minimum nullity of a tree degree sequence
Resource Provisioning and Scheduling Algorithm for Meeting Cost and Deadline-Constraints of Scientific Workflows in IaaS Clouds
On Maximizing Safety in Stochastic Aircraft Trajectory Planning with Uncertain Thunderstorm Development
Variational Implicit Processes
$d_{\mathcal{X}}$-Private Mechanisms for Linear Queries
Efficient Collection of Connected Vehicle Data based on Compressive Sensing
Universal Conditional Machine
Adversarial Attack on Graph Structured Data
Design of CMOS-memristor Circuits for LSTM architecture
A new family of bijections for planar maps
Localization-Driven Correlated States of Two Isolated Interacting Helical Edges
Mitigating Bias in Adaptive Data Gathering via Differential Privacy
Proofs of two conjectures on Catalan triangle numbers
Fractional Fokker-Planck equation from non-singular kernel operators
Inhomogeneous Fokker-Planck equation as an homogeneous one from deformed derivative framework
Spatial Frequency Loss for Learning Convolutional Autoencoders
New Hybrid Neuro-Evolutionary Algorithms for Renewable Energy and Facilities Management Problems
Traffic state estimation using stochastic Lagrangian dynamics
Fast Context-Annotated Classification of Different Types of Web Service Descriptions
New mechanism for repeated posted price auction with a strategic buyer without discounting
D2D Communications Underlaying Wireless Powered Communication Networks