Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks

Collecting large training datasets, annotated with high quality labels, is a costly process. This paper proposes a novel framework for training deep convolutional neural networks from noisy labeled datasets. The problem is formulated using an undirected graphical model that represents the relationship between noisy and clean labels, trained in a semi-supervised setting. In the proposed structure, the inference over latent clean labels is tractable and is regularized during training using auxiliary sources of information. The proposed model is applied to the image labeling problem and is shown to be effective in labeling unseen images as well as reducing label noise in training on CIFAR-10 and MS COCO datasets.

Natural Language Generation for Spoken Dialogue System using RNN Encoder-Decoder Networks

Natural language generation (NLG) is a critical component in a spoken dialogue system. This paper presents a Recurrent Neural Network based Encoder-Decoder architecture, in which an LSTM-based decoder is introduced to select and aggregate semantic elements produced by an attention mechanism over the input elements, and to produce the required utterances. The proposed generator can be jointly trained on both sentence planning and surface realization to produce natural language sentences. The proposed model was extensively evaluated on four different NLG datasets. The experimental results showed that the proposed generators not only consistently outperform the previous methods across all the NLG domains but also show an ability to generalize from a new, unseen domain and learn from multi-domain datasets.

Deep Learning for Hate Speech Detection in Tweets

Hate speech detection on Twitter is critical for applications like controversial event extraction, building AI chatterbots, content recommendation, and sentiment analysis. We define this task as being able to classify a tweet as racist, sexist or neither. The complexity of the natural language constructs makes this task very challenging. We perform extensive experiments with multiple deep learning architectures to learn semantic word embeddings to handle this complexity. Our experiments on a benchmark dataset of 16K annotated tweets show that such deep learning methods outperform state-of-the-art char/word n-gram methods by ~18 F1 points.

A fast algorithm for the gas station problem

In the gas station problem we want to find the cheapest path between two vertices of an n-vertex graph. Our car has a specific fuel capacity and at each vertex we can fill our car with gas, with the fuel cost depending on the vertex. Furthermore, we are allowed at most \Delta stops for refuelling. In this short paper we provide an algorithm solving the problem in O(\Delta n^2 + n^2\log{n}) steps improving an earlier result by Khuller, Malekian and Mestre.
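
The problem statement above can be made concrete with a small brute-force baseline. This is not the algorithm of the paper: it is a plain Dijkstra search over (vertex, fuel, stops used) states, and it assumes integer edge lengths equal to fuel consumed, non-negative prices per fuel unit, and an empty tank at the start. The function name `cheapest_trip` and its parameters are hypothetical.

```python
import heapq

def cheapest_trip(n, edges, prices, capacity, max_stops, source, target):
    """Cheapest fuel cost from source to target, or None if unreachable.

    Assumptions (not from the paper): undirected edges (u, v, d) with integer
    fuel consumption d, an initially empty tank, and non-negative prices.
    """
    adj = [[] for _ in range(n)]
    for u, v, d in edges:
        adj[u].append((v, d))
        adj[v].append((u, d))
    best = {(source, 0, 0): 0}          # state (vertex, fuel, stops) -> cost
    heap = [(0, source, 0, 0)]
    while heap:
        cost, v, fuel, stops = heapq.heappop(heap)
        if v == target:
            return cost                  # first time the target is popped is optimal
        if cost > best.get((v, fuel, stops), float("inf")):
            continue                     # stale heap entry
        moves = []
        # Drive along an edge if the tank holds enough fuel.
        for w, d in adj[v]:
            if d <= fuel:
                moves.append((cost, w, fuel - d, stops))
        # Refuel here (one stop), buying any whole number of units.
        if stops < max_stops:
            for k in range(1, capacity - fuel + 1):
                moves.append((cost + k * prices[v], v, fuel + k, stops + 1))
        for new_cost, w, f, s in moves:
            if new_cost < best.get((w, f, s), float("inf")):
                best[(w, f, s)] = new_cost
                heapq.heappush(heap, (new_cost, w, f, s))
    return None

# Path 0 - 1 - 2 with edge lengths 2 and 2, tank capacity 3.
edges = [(0, 1, 2), (1, 2, 2)]
print(cheapest_trip(3, edges, [1, 5, 0], 3, 2, 0, 2))
```

The state space grows with the tank capacity, so this sketch is only useful for checking small instances by hand.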

Supervised Quantile Normalisation

Quantile normalisation is a popular normalisation method for data subject to unwanted variations such as images, speech, or genomic data. It applies a monotonic transformation to the feature values of each sample to ensure that after normalisation, they follow the same target distribution for each sample. Choosing a ‘good’ target distribution remains, however, largely empirical and heuristic, and is usually done independently of the subsequent analysis of normalised data. We propose instead to couple the quantile normalisation step with the subsequent analysis, and to optimise the target distribution jointly with the other parameters in the analysis. We illustrate this principle on the problem of estimating a linear model over normalised data, and show that it leads to a particular low-rank matrix regression problem that can be solved efficiently. We illustrate the potential of our method, which we term SUQUAN, on simulated data, images and genomic data, where it outperforms standard quantile normalisation.
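
As a minimal sketch of the standard (unsupervised) quantile normalisation step described above, the following replaces each value of a sample by the target value of the same rank. It does not implement SUQUAN's joint optimisation of the target; the helper names `quantile_normalise` and `mean_target` are hypothetical, and the rank-wise mean is only one conventional choice of target.

```python
def quantile_normalise(sample, target):
    """Map one sample's values onto a target distribution by rank."""
    if len(sample) != len(target):
        raise ValueError("sample and target must have the same length")
    target_sorted = sorted(target)
    # Feature indices ordered by increasing value (ties broken by position).
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    normalised = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        normalised[idx] = target_sorted[rank]
    return normalised

def mean_target(samples):
    """A conventional target: the rank-wise mean of the sorted samples."""
    sorted_samples = [sorted(s) for s in samples]
    return [sum(col) / len(samples) for col in zip(*sorted_samples)]

samples = [[5, 2, 3], [4, 1, 9]]
target = mean_target(samples)
normalised = [quantile_normalise(s, target) for s in samples]
print(target, normalised)
```

After normalisation every sample contains exactly the target values, arranged according to the sample's original ranking, which is the monotonic transformation the abstract refers to.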

A graph model of message passing processes

In this paper we consider a graph model of message passing processes and present a method for the verification of such processes. The method is illustrated by an example: the verification of a sliding window protocol.

Integer Echo State Networks: Hyperdimensional Reservoir Computing

We propose an integer approximation of Echo State Networks (ESN) based on the mathematics of hyperdimensional computing. The reservoir of the proposed Integer Echo State Network (intESN) contains only n-bit integers and replaces the recurrent matrix multiply with an efficient cyclic shift operation. Such an architecture results in dramatic improvements in memory footprint and computational efficiency, with minimal performance loss. Our architecture naturally supports the usage of the trained reservoir in symbolic processing tasks of analogy making and logical inference.
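
The idea of replacing the recurrent matrix multiply with a cyclic shift can be sketched as follows. The abstract does not give the update rule, so the form used here (shift the state by one position, add an integer-encoded input, then clip to a bounded integer range) is an assumption, as are the names `int_esn_step` and `kappa`.

```python
def int_esn_step(state, encoded_input, kappa):
    """One reservoir update on integer vectors (illustrative assumption).

    state, encoded_input: equal-length lists of integers.
    kappa: clipping bound, so every entry stays in [-kappa, kappa].
    """
    # Cyclic shift by one position stands in for the recurrent matrix multiply.
    shifted = state[-1:] + state[:-1]
    # Add the encoded input and clip so the entries fit in a few bits.
    return [max(-kappa, min(kappa, s + u)) for s, u in zip(shifted, encoded_input)]

state = [1, -2, 3, 0]
state = int_esn_step(state, [1, 1, -1, -1], kappa=3)
print(state)
```

Because the update only uses a shift, an integer addition and a clip, the state never leaves a small integer range, which is where the memory and compute savings claimed above would come from.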

Discriminative k-shot learning using probabilistic models

This paper introduces a probabilistic framework for k-shot image classification. The goal is to generalise from an initial large-scale classification task to a separate task comprising new classes and small numbers of examples. The new approach not only leverages the feature-based representation learned by a neural network from the initial task (representational transfer), but also information about the form of the classes (concept transfer). The concept information is encapsulated in a probabilistic model for the final layer weights of the neural network which then acts as a prior when probabilistic k-shot learning is performed. Surprisingly, simple probabilistic models and inference schemes outperform many existing k-shot learning approaches and compare favourably with the state-of-the-art method in terms of error-rate. The new probabilistic methods are also able to accurately model uncertainty, leading to well calibrated classifiers, and they are easily extensible and flexible, unlike many recent approaches to k-shot learning.

Deep Mutual Learning

Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast execution requirements. In this paper, we present a deep mutual learning (DML) strategy where, rather than one-way transfer between a static pre-defined teacher and a student, an ensemble of students learn collaboratively and teach each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, it is revealed that no prior powerful teacher network is necessary: mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
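
One way to picture two students teaching each other is a per-student loss that combines the usual supervised term with a term pulling the student's predictions towards its peer's. The abstract does not state the loss, so the specific form below (cross-entropy plus a KL divergence from the peer's predictive distribution) is an assumption, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Supervised loss for a single example with an integer class label."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_losses(logits_a, logits_b, label):
    """Loss of each student: supervised term plus a mimicry term (assumed form)."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    loss_a = cross_entropy(p_a, label) + kl_divergence(p_b, p_a)
    loss_b = cross_entropy(p_b, label) + kl_divergence(p_a, p_b)
    return loss_a, loss_b

print(mutual_losses([2.0, 0.0, -1.0], [0.5, 0.5, 0.0], label=0))
```

When both students already agree, the mimicry terms vanish and each loss reduces to plain cross-entropy; in a training loop each student would take a gradient step on its own loss while treating the peer's predictions as fixed.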

DiracNets: Training Very Deep Neural Networks Without Skip-Connections

Deep neural networks with skip-connections, such as ResNet, show excellent performance in various image classification benchmarks. It has been observed, though, that the initial motivation behind them, training deeper networks, does not actually hold true, and the benefits come from increased capacity rather than from depth. Motivated by this, and inspired by ResNet, we propose a simple Dirac weight parameterization, which allows us to train very deep plain networks without skip-connections, and achieve nearly the same performance. This parameterization has a minor computational cost at training time and no cost at all at inference. We are able to achieve 95.5% accuracy on CIFAR-10 with a 34-layer deep plain network, surpassing 1001-layer deep ResNet, and approaching Wide ResNet. Our parameterization also mostly eliminates the need for careful initialization in residual and non-residual networks. The code and models for our experiments are available at https://…/diracnets

Fader Networks: Manipulating Images by Sliding Attributes

This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state of the art, which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and scales nicely to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images.

Energy Harvesting Networks with General Utility Functions: Near Optimal Online Policies

A Learning Based Optimal Human Robot Collaboration with Linear Temporal Logic Constraints

Localization-protected order in spin chains with non-Abelian discrete symmetries

Improved Algorithms for MST and Metric-TSP Interdiction

A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit

A problem on partial sums in abelian groups

Biased Importance Sampling for Deep Neural Network Training

Learning Time-Efficient Deep Architectures with Budgeted Super Networks

Interference Modeling for Cellular Networks under Beamforming Transmission

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

A Lower Bound for Nonadaptive, One-Sided Error Testing of Unateness of Boolean Functions over the Hypercube

The Sample Complexity of Online One-Class Collaborative Filtering

A Latent Trait Model for Multivariate Longitudinal Data With Two Sources of Measurement Error

Descriptions of Objectives and Processes of Mechanical Learning

Metropolis-Hastings reversiblizations of non-reversible Markov chains

Low-Rank Matrix Approximation in the Infinity Norm

Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers

Tropical Combinatorial Nullstellensatz and Fewnomials Testing

Two monads on the category of graphs

Megapixel Size Image Creation using Generative Adversarial Networks

Blood capillaries and vessels segmentation in optical coherence tomography angiogram using fuzzy C-means and Curvelet transform

An Inertial Parallel and Asynchronous Fixed-Point Iteration for Convex Optimization

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

Inexact Gradient Projection and Fast Data Driven Compressed Sensing

Using GPI-2 for Distributed Memory Parallelization of the Caffe Toolbox to Speed up Deep Neural Network Training

Bayesian $l_0$ Regularized Least Squares

Low Subpacketization Schemes for Coded Caching

From patterned response dependency to structured covariate dependency: categorical-pattern-matching

Splines over integer quotient rings

Probabilistic response and rare events in Mathieu’s equation under correlated parametric excitation

Optimal repair of Reed-Solomon codes: Achieving the cut-set bound

Subjective fairness: Fairness is in the eye of the beholder

Superhuman Accuracy on the SNEMI3D Connectomics Challenge

Concentration inequalities for polynomials of contracting Ising models

Assessment of Future Changes in Intensity-Duration-Frequency Curves for Southern Ontario using North American (NA)-CORDEX Models with Nonstationary Methods

Diversified Top-k Partial MaxSAT Solving

Cohen-Macaulay vertex-weighted digraphs

Teaching Machines to Describe Images via Natural Language Feedback

On the Hausdorff dimension of pinned distance sets

Semantic Refinement GRU-based Neural Language Generation for Spoken Dialogue Systems

Scalable Generalized Linear Bandits: Online Computation and Hashing

Faster Spatially Regularized Correlation Filters for Visual Tracking

Asymptotic Outage Analysis of HARQ-IR over Time-Correlated Nakagami-$m$ Fading Channels

Order preserving pattern matching on trees and DAGs

Shape and Positional Geometry of Multi-Object Configurations

Cross-modal Common Representation Learning by Hybrid Transfer Network

Coding Method for Parallel Iterative Linear Solver

Stability, shards, and preprojective algebras

Woon’s tree and sums over compositions

Network Capacity Bound for Personalized PageRank in Multimodal Networks

A spectral characterisation of t-designs

Efficient learning with robust gradient descent

Characterization of the community structure in a large-scale production network in Japan

On the super edge-magicness of graphs of equal order and size

Depth Structure Preserving Scene Image Generation

On pre-Hamiltonian cycles in balanced bipartite digraphs

Partition-free families of sets

Item-Item Music Recommendations With Side Information

Selling Complementary Goods: Dynamics, Efficiency and Revenue

An Effective Approach for Point Clouds Registration Based on the Hard and Soft Assignments

${\mathcal L}^1$ limit solutions in impulsive control

Automatic Differentiation using Constraint Handling Rules in Prolog

A sufficient condition for pre-Hamiltonian cycles in bipartite digraphs

Optimality conditions for minimizers at infinity in polynomial programming

The ELEGANT NMR Spectrometer

Krylov Subspace Recycling for Fast Iterative Least-Squares in Machine Learning

Polish Read Speech Corpus for Speech Tools and Services

Enumeration of Restricted Words and Linear Recurrence Equations

Sesqui-type branching processes

Learning to Compute Word Embeddings on the Fly

On the Bernstein-Von Mises Theorem for High Dimensional Nonlinear Bayesian Inverse Problems

Transfer Learning for Speech Recognition on a Budget

Data Analysis in Multimedia Quality Assessment: Revisiting the Statistical Tests

Sinkhorn-AutoDiff: Tractable Wasserstein Learning of Generative Models

Completing graphs to metric spaces

Modeling and Design of Millimeter-Wave Networks for Highway Vehicular Communication

Multi-point Codes from the GGS Curves

Using of heterogeneous corpora for training of an ASR system

TransFlow: Unsupervised Motion Flow by Joint Geometric and Pixel-level Estimation

One button machine for automating feature engineering in relational databases

A Composition Theorem for Randomized Query Complexity

Triangle-free graphs of tree-width t are ceil((t + 3)/2)-colorable

The Size of the Sync Basin Revisited

Approximating first-passage time distributions via sequential Bayesian computation

Optimal Slotted ALOHA under Delivery Deadline Constraint for Multiple-Packet Reception

Grounding Symbols in Multi-Modal Instructions

Enhancing workflow-nets with data for trace completion

Spectral gaps of simplicial complexes without large missing faces

Discovering Discrete Latent Topics with Neural Variational Inference

More new classes of permutation trinomials over $\mathbb{F}_{2^n}$

A new stochastic STDP Rule in a neural Network Model

Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules

New goodness-of-fit diagnostics for conditional discrete response models

Blind nonnegative source separation using biological neural networks

Equivariant Quantum Cohomology of the Odd Symplectic Grassmannian

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Multiscale unfolding of real networks by geometric renormalization

Line Profile Based Segmentation Algorithm for Touching Corn Kernels

Benchmark problems for phase retrieval

Learning Disentangled Representations with Semi-Supervised Deep Generative Models

Large deviations in presence of small noise for delay differential equations at an instability