Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

We propose a new algorithm for training generative adversarial networks to jointly learn latent codes for both identities (e.g. individual humans) and observations (e.g. specific photographs). In practice, this means that by fixing the identity portion of latent codes, we can generate diverse images of the same subject, and by fixing the observation portion we can traverse the manifold of subjects while maintaining contingent aspects such as lighting and pose. Our algorithm features a pairwise training scheme in which each sample from the generator consists of two images with a common identity code. Corresponding samples from the real dataset consist of two distinct photographs of the same subject. In order to fool the discriminator, the generator must produce images that are photorealistic, distinct, and appear to depict the same person. We augment both the DCGAN and BEGAN approaches with Siamese discriminators to accommodate pairwise training. Experiments with human judges and an off-the-shelf face verification system demonstrate our algorithm’s ability to generate convincing, identity-matched photographs.
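
The pairwise sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian prior, the code dimensions, the concatenation order, and the helper name `sample_pair` are all assumptions made for the example.

```python
import random

def sample_pair(identity_dim, observation_dim, rng):
    """Return two latent codes that share one identity part (hypothetical helper).

    Each code is the concatenation [identity | observation]; only the
    observation part differs between the two codes.
    """
    identity = [rng.gauss(0.0, 1.0) for _ in range(identity_dim)]
    obs_a = [rng.gauss(0.0, 1.0) for _ in range(observation_dim)]
    obs_b = [rng.gauss(0.0, 1.0) for _ in range(observation_dim)]
    return identity + obs_a, identity + obs_b

rng = random.Random(0)
z1, z2 = sample_pair(identity_dim=4, observation_dim=3, rng=rng)
# z1 and z2 would be fed to the same generator to produce one image pair.
```

Fixing the first four entries and resampling the rest corresponds to generating new images of the same subject; fixing the last three and resampling the first four corresponds to changing the subject while keeping the contingent factors.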

Prediction Measures in Nonlinear Beta Regression Models

Nonlinear models are frequently applied to determine the optimal supply of natural gas to a given residential unit based on economic and technical factors, or used to fit nonlinear data from biochemical and pharmaceutical assays. In this article we propose PRESS statistics and prediction coefficients, namely P^2 statistics, for a class of nonlinear beta regression models. We aim to use both prediction coefficients and goodness-of-fit measures as model selection criteria. To this end, we introduce model selection criteria based on robust pseudo-R^2 statistics for beta regression models under nonlinearity. Monte Carlo simulation results on the finite-sample behavior of both the prediction-based model selection criteria P^2 and the pseudo-R^2 statistics are provided. Three applications to real data are presented. The linear application relates to the distribution of natural gas for home usage in São Paulo, Brazil. Given the economic risk of overestimating or underestimating the distribution of gas, it has been necessary to construct prediction limits; selecting the best predicted and fitted model with which to construct the best prediction limits is the aim of the first application. Additionally, the two nonlinear applications highlight the importance of considering both the goodness of prediction and the goodness of fit of the competing models.
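
To make the idea of a PRESS statistic concrete, the sketch below computes it by explicit leave-one-out refitting. It assumes an ordinary least-squares straight line rather than a beta regression model, and the R^2-style coefficient 1 - PRESS/SST is one common form of a prediction coefficient; the P^2 statistic defined in the article may differ. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def press(xs, ys):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total

def prediction_coefficient(xs, ys):
    """R^2-style coefficient 1 - PRESS / SST (an assumed form)."""
    my = sum(ys) / len(ys)
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - press(xs, ys) / sst

# Illustrative data only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2]
p2 = prediction_coefficient(xs, ys)
```

Because each point is predicted by a model that never saw it, PRESS is at least as large as the in-sample residual sum of squares, which is why it is used to judge predictive rather than fitted quality.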

Variable Screening for High Dimensional Time Series

Variable selection is a widely studied problem in high dimensional statistics, primarily since estimating the precise relationship between the covariates and the response is of great importance in many scientific disciplines. However, most of the theory and methods developed towards this goal for the linear model invoke the assumption of iid sub-Gaussian covariates and errors. This paper analyzes the theoretical properties of Sure Independence Screening (SIS) (Fan and Lv [J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2008) 849-911]) for high dimensional linear models with dependent and/or heavy tailed covariates and errors. We also introduce a generalized least squares screening (GLSS) procedure which utilizes the serial correlation present in the data. By utilizing this serial correlation when estimating our marginal effects, GLSS is shown to outperform SIS in many cases. For both procedures we prove sure screening properties, which depend on the moment conditions and the strength of dependence in the error and covariate processes, amongst other factors. Additionally, the combination of these screening procedures with the adaptive Lasso is analyzed. Dependence is quantified by functional dependence measures (Wu [Proc. Natl. Acad. Sci. USA 102 (2005) 14150-14154]), and the results rely on the use of Nagaev type and exponential inequalities for dependent random variables. We also conduct simulations to demonstrate the finite sample performance of these procedures, and include a real data application of forecasting the US inflation rate.
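
The screening step of SIS can be sketched as ranking covariates by the absolute value of their marginal correlation with the response and keeping the top few. The toy data below are iid Gaussian for simplicity, which is not the dependent, heavy-tailed setting the paper studies, and GLSS is not shown. The function names, sample sizes, and coefficients are assumptions for illustration.

```python
import math
import random

def correlation(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sis(columns, y, keep):
    """Return the indices of the `keep` covariates with largest |marginal correlation|."""
    scores = [abs(correlation(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

# Toy example: 50 covariates, only covariates 0 and 7 affect the response.
rng = random.Random(0)
n, p = 200, 50
columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [3.0 * columns[0][i] - 2.0 * columns[7][i] + rng.gauss(0.0, 1.0) for i in range(n)]
selected = sis(columns, y, keep=5)
```

A second-stage method such as the adaptive Lasso would then be run on the `selected` covariates only.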

Living Together: Mind and Machine Intelligence

In this paper we consider the nature of the machine intelligences we have created in the context of our human intelligence. We suggest that the fundamental difference between human and machine intelligence comes down to embodiment factors. We define embodiment factors as the ratio between an entity’s ability to communicate information vs compute information. We speculate on the role of embodiment factors in driving our own intelligence and consciousness. We briefly review dual process models of cognition and cast machine intelligence within that framework, characterising it as a dominant System Zero, which can drive behaviour through interfacing with us subconsciously. Driven by concerns about the consequence of such a system we suggest prophylactic courses of action that could be considered. Our main conclusion is that it is not sentient intelligence we should fear but non-sentient intelligence.

Convergence Analysis of Batch Normalization for Deep Neural Nets

Batch normalization (BN) is so effective in accelerating the convergence of neural network training that it has become common practice. We propose a generalization of BN, the diminishing batch normalization (DBN) algorithm. We provide a convergence analysis showing that the DBN algorithm converges to a stationary point with respect to the trainable parameters. We analyze a two-layer model with linear activation. The main challenge of the analysis is the fact that some parameters are updated by gradient while others are not. In the numerical experiments, we use models with more layers and ReLU activation. We observe that DBN outperforms the original BN algorithm on the MNIST, NI and CIFAR-10 datasets with reasonably complex FNN and CNN models.

Wasserstein Learning of Deep Generative Point Process Models

Point processes are becoming very popular in modeling asynchronous sequential data due to their sound mathematical foundation and strength in modeling a variety of real-world phenomena. Currently, they are often characterized via an intensity function, which limits the model’s expressiveness because of unrealistic assumptions on the parametric form used in practice. Furthermore, they are learned via a maximum likelihood approach, which is prone to failure on multi-modal distributions of sequences. In this paper, we propose an intensity-free approach to point process modeling that transforms a nuisance process into a target one. Furthermore, we train the model using a likelihood-free approach that leverages the Wasserstein distance between point processes. Experiments on various synthetic and real-world data substantiate the superiority of the proposed point process model over conventional ones.

Compressing Recurrent Neural Network with Tensor Train

Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems. However, most state-of-the-art RNNs have millions of parameters and require substantial computational resources for training and for prediction on new data. This paper proposes an alternative RNN model that reduces the number of parameters significantly by representing the weight parameters in the Tensor Train (TT) format. We implement the TT-format representation for several RNN architectures such as the simple RNN and the Gated Recurrent Unit (GRU). We compare our proposed RNN models with uncompressed RNN models on sequence classification and sequence prediction tasks. Our proposed TT-format RNNs preserve performance while reducing the number of RNN parameters by up to a factor of 40.
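
Where the parameter savings of a TT-format weight matrix come from can be seen by counting entries. In the sketch below, a matrix of size (m_1 ... m_d) x (n_1 ... n_d) is stored as d cores, with core k of shape r_{k-1} x m_k x n_k x r_k and boundary ranks equal to 1. The factorization and ranks used are illustrative assumptions, not the configurations from the paper.

```python
from math import prod

def dense_params(row_factors, col_factors):
    """Number of entries in the full weight matrix."""
    return prod(row_factors) * prod(col_factors)

def tt_params(row_factors, col_factors, ranks):
    """Number of entries across all TT cores.

    ranks must have len(row_factors) + 1 entries, with ranks[0] == ranks[-1] == 1.
    """
    d = len(row_factors)
    if len(col_factors) != d or len(ranks) != d + 1:
        raise ValueError("factor and rank lists have inconsistent lengths")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(ranks[k] * row_factors[k] * col_factors[k] * ranks[k + 1] for k in range(d))

# A 256 x 1024 matrix factored as (4*4*4*4) x (4*8*8*4) with all inner ranks 3.
rows, cols, ranks = (4, 4, 4, 4), (4, 8, 8, 4), (1, 3, 3, 3, 1)
full = dense_params(rows, cols)      # 262144
compact = tt_params(rows, cols, ranks)  # 48 + 288 + 288 + 48 = 672
```

The count grows with the sum of the core sizes rather than the product of the matrix dimensions, so the achievable compression depends strongly on how small the TT ranks can be kept without hurting accuracy.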

PatchNet: Interpretable Neural Networks for Image Classification

The ability to visually understand and interpret learned features from complex predictive models is crucial for their acceptance in sensitive areas such as health care. To move closer to this goal of truly interpretable complex models, we present PatchNet, a network that restricts global context for image classification tasks in order to easily provide visual representations of learned texture features on a predetermined local scale. We demonstrate how PatchNet provides visual heatmap representations of the learned features, and we mathematically analyze the behavior of the network during convergence. We also present a version of PatchNet that is particularly well suited for lowering false positive rates in image classification tasks. We apply PatchNet to the classification of textures from the Describable Textures Dataset and to the ISBI-ISIC 2016 melanoma classification challenge.

TwiInsight: Discovering Topics and Sentiments from Social Media Datasets

Social media platforms contain a wealth of information that provides opportunities to explore hidden patterns or unknown correlations and to understand people’s satisfaction with what they are discussing. As one showcase, in this paper we present TwiInsight, a system that extracts insight from Twitter data. Unlike other Twitter analysis systems, TwiInsight automatically extracts the popular topics under different categories (e.g., healthcare, food, technology, sports and transport) discussed on Twitter via topic modeling, and also identifies correlated topics across categories. It also discovers people’s opinions on the tweets and topics via sentiment analysis. The system employs an intuitive and informative visualization to show the uncovered insight. Furthermore, we implement and compare six of the most popular algorithms: three for sentiment analysis and three for topic modeling.

Sluice networks: Learning what to share between loosely related tasks

Multi-task learning is partly motivated by the observation that humans bring to bear what they know about related problems when solving new ones. Similarly, deep neural networks can profit from related tasks by sharing parameters with other networks. However, humans do not consciously decide to transfer knowledge between tasks (and are typically not aware of the transfer). In machine learning, it is hard to estimate whether sharing will lead to improvements, especially if tasks are only loosely related. To overcome this, we introduce Sluice Networks, a general framework for multi-task learning where trainable parameters control the amount of sharing, including which parts of the models to share. Our framework goes beyond and generalizes over previous proposals in enabling hard or soft sharing of all combinations of subspaces, layers, and skip connections. We perform experiments on three task pairs from natural language processing, and across seven different domains, using data from OntoNotes 5.0, and achieve up to 15% average error reductions over common approaches to multi-task learning. We analyze when the architecture is particularly helpful, as well as its ability to fit noise. We show that a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing, and b) while sluice networks easily fit noise, they are robust across domains in practice.
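
One way to picture parameters that control the amount of sharing is a small mixing matrix applied to the hidden activations of two task networks. The sketch below is a simplified illustration under that assumption and is not the sluice network architecture itself: the 2x2 matrix `alpha` is fixed by hand here, whereas in a trained model such weights would be learned, and the function name is hypothetical.

```python
def soft_share(h_a, h_b, alpha):
    """Mix two tasks' hidden activations with a 2x2 weight matrix.

    alpha[i][j] is the weight that task i's output places on task j's input
    (task 0 = A, task 1 = B).
    """
    out_a = [alpha[0][0] * a + alpha[0][1] * b for a, b in zip(h_a, h_b)]
    out_b = [alpha[1][0] * a + alpha[1][1] * b for a, b in zip(h_a, h_b)]
    return out_a, out_b

h_a, h_b = [1.0, 2.0], [3.0, 4.0]

# Identity weights: no sharing, each task keeps its own activations.
no_sharing = soft_share(h_a, h_b, [[1.0, 0.0], [0.0, 1.0]])

# Equal weights: full sharing, both tasks see the same averaged activations.
full_sharing = soft_share(h_a, h_b, [[0.5, 0.5], [0.5, 0.5]])
```

Values between these two extremes give partial sharing, which is the regime of interest when tasks are only loosely related.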

Visualizing LSTM decisions

Long Short-Term Memory (LSTM) recurrent neural networks are notorious for being uninterpretable ‘black boxes’. In the medical domain, where LSTMs have shown promise, this is especially concerning because it is imperative to understand the decisions made by machine learning models in such acute situations. This study employs techniques used in the Convolutional Neural Network domain to elucidate the operations that LSTMs perform on time series. The visualization techniques include input saliency by means of occlusion and derivatives, class mode visualization, and temporal outputs. Moreover, we demonstrate that LSTMs appear to extract features similar to those extracted by wavelets. We found that derivative-based input saliency is a poor approximation and that occlusion is a better approach. Moreover, analyzing LSTMs on different sets of data provides novel interpretations.
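
Occlusion-based input saliency can be sketched as follows: replace one time step at a time with a baseline value and record how much the model output changes. The `model` here is a stand-in callable rather than an LSTM, and the zero baseline and function name are assumptions for illustration.

```python
def occlusion_saliency(model, series, baseline=0.0):
    """Absolute change in model output when each time step is occluded in turn."""
    reference = model(series)
    saliency = []
    for t in range(len(series)):
        occluded = list(series)
        occluded[t] = baseline  # hide time step t
        saliency.append(abs(reference - model(occluded)))
    return saliency

# Stand-in model that only looks at time step 2.
def toy_model(series):
    return 2.0 * series[2]

scores = occlusion_saliency(toy_model, [1.0, 2.0, 3.0, 4.0])
# scores == [0.0, 0.0, 6.0, 0.0]
```

The method needs only forward passes, so it applies to any model that maps a sequence to a score, at the cost of one extra evaluation per time step.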

Look, Listen and Learn

We consider the question: what can be learnt by looking at and listening to a large amount of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself — the correspondence between the visual and the audio streams, and we introduce a novel ‘Audio-Visual Correspondence’ learning task that makes use of this. Training visual and audio networks from scratch, without any additional supervision other than the raw unconstrained videos themselves, is shown to successfully solve this task, and, more interestingly, result in good vision and audio representations. These features set the new state-of-the-art on two sound classification benchmarks, and perform on par with the state-of-the-art self-supervised approaches on ImageNet classification. We also demonstrate that the network is able to localize objects in both modalities, as well as perform fine-grained recognition tasks.

Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions

We examine the Bayes-consistency of a recently proposed 1-nearest-neighbor-based multiclass learning algorithm. This algorithm is derived from sample compression bounds and enjoys the statistical advantages of tight, fully empirical generalization bounds, as well as the algorithmic advantages of runtime and memory savings. We prove that this algorithm is strongly Bayes-consistent in metric spaces with finite doubling dimension — the first consistency result for an efficient nearest-neighbor sample compression scheme. Rather surprisingly, we discover that this algorithm continues to be Bayes-consistent even in a certain infinite-dimensional setting, in which the basic measure-theoretic conditions on which classic consistency proofs hinge are violated. This is all the more surprising, since it is known that k-NN is not Bayes-consistent in this setting. We pose several challenging open problems for future research.

Learning to Succeed while Teaching to Fail: Privacy in Closed Machine Learning Systems

Security, privacy, and fairness have become critical in the era of data science and machine learning. More and more we see that achieving universally secure, private, and fair systems is practically impossible. We have seen, for example, how generative adversarial networks can be used to learn about the expected private training data; how the exploitation of additional data can reveal private information in the original data; and how seemingly unrelated features can teach us about each other. Confronted with this challenge, in this paper we open a new line of research, where security, privacy, and fairness are learned and used in a closed environment. The goal is to ensure that a given entity (e.g., a company or the government), trusted to infer certain information from our data, is blocked from inferring protected information from it. For example, a hospital might be allowed to produce a diagnosis for a patient (the positive task) without being able to infer the gender of the subject (the negative task). Similarly, a company can guarantee that internally it is not using the provided data for any undesired task, an important goal that does not contradict the virtually impossible challenge of blocking everybody from the undesired task. We design a system that learns to succeed on the positive task while simultaneously failing at the negative one, and illustrate this with challenging cases where the positive task is actually harder than the negative one being blocked. Fairness with respect to the information in the negative task is often obtained automatically as a result of the proposed approach. The particular framework and examples open the door to security, privacy, and fairness in very important closed scenarios, ranging from private data accumulation companies like social networks to law enforcement and hospitals.

Detecting Adversarial Examples in Deep Networks with Adaptive Noise Reduction

Deep neural networks (DNNs) play a key role in many applications. Unsurprisingly, they have also become a potential target for adversaries. Some studies have demonstrated that DNN classifiers can be fooled by adversarial examples, which are crafted by introducing perturbations into an original sample. Accordingly, some powerful defense techniques have been proposed against adversarial examples. However, existing defense techniques require modifying the target model or depend, to different degrees, on prior knowledge of attack techniques. In this paper, we propose a straightforward method for detecting adversarial image examples. It does not require any prior knowledge of attack techniques and can be directly deployed on unmodified off-the-shelf DNN models. Specifically, we treat the perturbation to an image as a kind of noise and introduce two classical image processing techniques, scalar quantization and a spatial smoothing filter, to reduce its effect. The two-dimensional entropy of the image is employed as a metric to implement adaptive noise reduction for different kinds of images. As a result, an adversarial example can be effectively detected by comparing the classification results of a given sample and its denoised version. Thousands of adversarial examples, crafted with different attack techniques against state-of-the-art DNN models, are used to evaluate the proposed method. The experiments show that our detection method achieves an overall recall of 93.73% and an overall precision of 95.45% without referring to any prior knowledge of attack techniques.
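
The detection rule described above, comparing the label of a sample with the label of its denoised version, can be sketched with scalar quantization alone. This is a simplified illustration: the smoothing filter and the entropy-based choice of quantization level are omitted, the classifier is a toy stand-in rather than a DNN, and the function names, pixel values, and threshold are assumptions.

```python
def quantize(pixels, levels):
    """Scalar quantization of pixel values in [0, 255] to `levels` evenly spaced values."""
    step = 255.0 / (levels - 1)
    return [round(p / step) * step for p in pixels]

def is_adversarial(classify, pixels, levels=4):
    """Flag a sample whose label changes after noise reduction."""
    return classify(pixels) != classify(quantize(pixels, levels))

# Toy stand-in classifier with a hypothetical brightness threshold.
def toy_classify(pixels):
    return "bright" if sum(pixels) > 512 else "dark"

clean = [0, 0, 255, 255]      # classified "dark"
perturbed = [2, 2, 255, 255]  # small perturbation flips the label to "bright"

clean_flagged = is_adversarial(toy_classify, clean)
perturbed_flagged = is_adversarial(toy_classify, perturbed)
```

In this toy case the quantizer maps the perturbed image back onto the clean one, so only the perturbed input shows a label change. The classifier is used purely as a black box, which is what allows the rule to sit in front of an unmodified model.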

Continual Learning in Generative Adversarial Nets

Developments in deep generative models have allowed for tractable learning of high-dimensional data distributions. While the employed learning procedures typically assume that training data is drawn i.i.d. from the distribution of interest, it may be desirable to model distinct distributions which are observed sequentially, such as when different classes are encountered over time. Although conditional variations of deep generative models permit multiple distributions to be modeled by a single network in a disentangled fashion, they are susceptible to catastrophic forgetting when the distributions are encountered sequentially. In this paper, we adapt recent work in reducing catastrophic forgetting to the task of training generative adversarial networks on a sequence of distinct distributions, enabling continual generative modeling.

Community Detection with Graph Neural Networks

We study data-driven methods for community detection in graphs. This estimation problem is typically formulated in terms of the spectrum of certain operators, as well as via posterior inference under certain probabilistic graphical models. Focusing on random graph families such as the Stochastic Block Model, recent research has unified these two approaches, and identified both statistical and computational signal-to-noise detection thresholds. We embed the resulting class of algorithms within a generic family of graph neural networks and show that they can reach those detection thresholds in a purely data-driven manner, without access to the underlying generative models and with no parameter assumptions. The resulting model is also tested on real datasets, requiring fewer computational steps and performing significantly better than rigid parametric models.

Asymmetry-Induced Synchronization in Oscillator Networks

Superbosonization in disorder and chaos: The role of anomalies

Upstream Causes of Downstream Effects

Joint Uplink/Downlink Resource Allocation and Data Offloading in OFDMA-Based Wireless Powered HetNets

Constraining the clustering transition for colorings of sparse random graphs

Brazilian Network of PhDs Working with Probability and Statistics

Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

Hodge theory in combinatorics

Compatible extensions and consistent closures: a fuzzy approach

pix2code: Generating Code from a Graphical User Interface Screenshot

Universal 3D Wearable Fingerprint Targets: Advancing Fingerprint Reader Evaluations

Exponential decay of connection probabilities for subcritical Voronoi percolation in $\mathbb{R}^d$

Predicting stock market movements using network science: An information theoretic approach

On Central-Peripheral Appendage Numbers of Uniform Central Graphs

Liquid Cloud Storage

Capacitated Bounded Cardinality Hub Routing Problem: Model and Solution Algorithm

Multivariate generalized Pareto distributions: parametrizations, representations, and properties

Quadratic obstructions to small-time local controllability for scalar-input differential systems

Fair Allocation based on Diminishing Differences

Permutation Tests for Infection Graphs

GP-Unet: Lesion Detection from Weak Labels with a 3D Regression Network

Grouped multivariate and functional time series forecasting: An application to annuity pricing

Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding

Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices

Training with Confusion for Fine-Grained Visual Classification

Use of Knowledge Graph in Rescoring the N-Best List in Automatic Speech Recognition

Selective inference for effect modification via the lasso

The theory of Turing patterns on time varying networks

Parallel Stochastic Gradient Descent with Sound Combiners

On the consistency between model selection and link prediction in networks

Can Everyone Benefit from Social Integration?

Compressed and Penalized Linear Regression

Latent Human Traits in the Language of Social Media: An Open-Vocabulary Approach

Poincaré Embeddings for Learning Hierarchical Representations

Capacity of Molecular Channels with Imperfect Particle-Intensity Modulation and Detection

Unrolled Optimization with Deep Priors

Spectral Simplicity of Apparent Complexity, Part I: The Nondiagonalizable Metadynamics of Prediction

Detection Algorithms for Communication Systems Using Deep Learning

Learning multiple visual domains with residual adapters

An Elementary Proof for the Structure of Derivatives in Probability Measures

Neural Network Memory Architectures for Autonomous Robot Navigation

Asymptotically optimal codebooks based on generalized Jacobi sums

Ambiguity set and learning via Bregman and Wasserstein

A divide and conquer method for symbolic regression

Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

On Controllable Abundance Of Saturated-input Linear Discrete Systems

Multiple Images Recovery Using a Single Affine Transformation

Cofactors and eigenvectors of banded Toeplitz matrices: Trench formulas via skew Schur polynomials

Universally Optimal Designs for the Two-dimensional Interference Model

Learning from partial correction

Effective injury prediction in professional soccer with GPS data and machine learning

Visual Semantic Planning using Deep Successor Representations

An Investigation of the Different Levels of Poverty and the Corresponding Variance in Student Academic Prosperity

Peng’s stochastic maximum principle for mean-field SDEs involving laws with jumps

Universal Style Transfer via Feature Transforms

Two bounds for generalized $3$-connectivity of Cartesian product graphs

Local Monotonic Attention Mechanism for End-to-End Speech Recognition

An Improved Secretive Coded Caching Scheme exploiting Common Demands

acebayes: An R Package for Bayesian Optimal Design of Experiments via Approximate Coordinate Exchange

Towards seamless multi-view scene analysis from satellite to street-level

FRK: An R Package for Spatial and Spatio-Temporal Prediction with Large Datasets

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

Semi-Bandits with Knapsacks

A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data

Distributed Synthesis for Parameterized Temporal Logics

Noncommutative Bell polynomials and the dual immaculate basis

Consistent Multitask Learning with Nonlinear Output Relations

Black-Box Attacks against RNN based Malware Detection Algorithms

Long time behaviour and mean-field limit of Atlas models

Dimension improvement in Dhar’s refutation of the Eden conjecture

Topological dimension tunes activity patterns in hierarchical modular network models

Capacity Outer Bound and Degrees of Freedom of Wiener Phase Noise Channels with Oversampling

Total weight choosability for Halin graphs

Reference String Extraction Using Line-Based Conditional Random Fields

Evolutionary game of coalition building under external pressure

Computational Methods for Path-based Robust Flows

Stochastic spikes and strong noise limits of stochastic differential equations

Transformation of Python Applications into Function-as-a-Service Deployments

Distributed Testing of Conductance

Disorder-induced dephasing in backscattering-free quantum transport

Correlation Alignment by Riemannian Metric for Domain Adaptation

Unmasking the abnormal events in video

A generalization of Kleiner’s Theorem to Measures with exponential tail

Logical Learning Through a Hybrid Neural Network with Auxiliary Inputs

Salient Object Detection with Semantic Priors

Unbiasing Truncated Backpropagation Through Time

Parallel Accelerated Vector Similarity Calculations for Genomics Applications

Parallel Accelerated Custom Correlation Coefficient Calculations for Genomics Applications

Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

XOR-Sampling for Network Design with Correlated Stochastic Events

Qualification Conditions in Semi-algebraic Programming

Reduced α-stable dynamics for multiple time scale systems forced with correlated additive and multiplicative Gaussian white noise

3D Convolutional Neural Networks for Brain Tumor Segmentation: A Comparison of Multi-resolution Architectures

Randomized Composable Coresets for Matching and Vertex Cover

Enhanced Experience Replay Generation for Efficient Reinforcement Learning

Generalized Pascal triangle for binomial coefficients of words

Optimization in large graphs: Toward a better future?

How hard can it be? Estimating the difficulty of visual search in an image

Algorithms and hardness results for happy coloring problems

Music Playlist Continuation by Learning from Hand-Curated Examples and Song Features

Sensitivity analysis of the utility maximization problem with respect to model perturbations

The Marginal Value of Adaptive Gradient Methods in Machine Learning

Coupling of Brownian motions in Banach spaces

Data and uncertainty in extreme risks; a nonlinear expectations approach

D-vine quantile regression with discrete variables

Improvements to Frank-Wolfe optimization for multi-detector multi-object tracking

Nordhaus-Gaddum-type theorem for conflict-free connection number of graphs

Identification and isotropy characterization of deformed random fields through excursion sets

Explaining Transition Systems through Program Induction

Behavior of digital sequences through exotic numeration systems

Exact adaptive confidence intervals for linear regression coefficients

A Note on Uniform Integrability of Random Variables in a Probability Space and Sublinear Expectation Space

A Colonel Blotto Game for Interdependence-Aware Cyber-Physical Systems Security in Smart Cities

Distributed Precoding Systems in Multi-Gateway Multibeam Satellites: Regularization and Coarse Beamforming

On random stable partitions

Counting the number of non-zero coefficients in rows of generalized Pascal triangles

Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers

Efficient and principled score estimation

A Coalgebraic Paige-Tarjan Algorithm

A Short Proof for a Lower Bound on the Zero Forcing Number

Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues

Classification of Aerial Photogrammetric 3D Point Clouds

An evolutionary strategy for DeltaE – E identification

Better Text Understanding Through Image-To-Text Transfer

A Derandomized Algorithm for RP-ADMM with Symmetric Gauss-Seidel Method

Exponential error rates of SDP for block models: Beyond Grothendieck’s inequality

Unifying and Generalizing Methods for Removing Unwanted Variation Based on Negative Controls

Information Theoretic Principles of Universal Discrete Denoising

Rank-Metric Codes and Zeta Functions

Killing (absorption) versus survival in random motion

Ridesourcing Car Detection by Transfer Learning

Rare Events of Transitory Queues

Stable Limit Theorems for Empirical Processes under Conditional Neighborhood Dependence

Reinforcement Learning with a Corrupted Reward Channel

Scaling relations in the diffusive infiltration in fractals

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

Continuous State-Space Models for Optimal Sepsis Treatment – a Deep Reinforcement Learning Approach

Symbolic LTLf Synthesis

Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues

Deep Learning of Grammatically-Interpretable Representations Through Question-Answering

Fast and Differentially Private Algorithms for Decentralized Collaborative Machine Learning

Preserving Privacy while Broadcasting: $k$-Limited-Access Schemes

On The Multiparty Communication Complexity of Testing Triangle-Freeness

Thinking Fast and Slow with Deep Learning and Tree Search

Knowledge Acquisition, Representation & Manipulation in Decision Support Systems

Stratification of Markov Chain Monte Carlo