Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge

We consider the use of Deep Learning methods for modeling complex phenomena like those occurring in natural physical processes. With the large amount of data gathered on these phenomena, the data-intensive paradigm could begin to challenge more traditional approaches elaborated over the years in fields like maths or physics. However, despite considerable successes in a variety of application domains, the machine learning field is not yet ready to handle the level of complexity required by such problems. Using an example application, namely Sea Surface Temperature Prediction, we show how general background knowledge gained from physics could be used as a guideline for designing efficient Deep Learning models. In order to motivate the approach and to assess its generality, we demonstrate a formal link between the solution of a class of differential equations underlying a large family of physical phenomena and the proposed model. Experiments and comparisons with a series of baselines, including a state-of-the-art numerical approach, are then provided.

Self-Similarity Based Time Warping

In this work, we explore the problem of aligning two time-ordered point clouds which are spatially transformed and re-parameterized versions of each other. This has a diverse array of applications such as cross modal time series synchronization (e.g. MOCAP to video) and alignment of discretized curves in images. Most other works that address this problem attempt to jointly uncover a spatial alignment and correspondences between the two point clouds, or to derive local invariants to spatial transformations such as curvature before computing correspondences. By contrast, we sidestep spatial alignment completely by using self-similarity matrices (SSMs) as a proxy to the time-ordered point clouds, since self-similarity matrices are blind to isometries and respect global geometry. Our algorithm, dubbed ‘Isometry Blind Dynamic Time Warping’ (IBDTW), is simple and general, and we show that its associated dissimilarity measure lower bounds the L1 Gromov-Hausdorff distance between the two point sets when restricted to warping paths. We also present a local, partial alignment extension of IBDTW based on the Smith Waterman algorithm. This eliminates the need for tedious manual cropping of time series, which is ordinarily necessary for global alignment algorithms to function properly.
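
The claim that self-similarity matrices are blind to isometries can be checked numerically. The following is a minimal sketch, not the IBDTW algorithm itself: it only builds the SSM of a small 2D curve and of a rotated and translated copy, and compares the two. The helper names (`self_similarity_matrix`, `rotate_and_shift`), the example curve, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

def self_similarity_matrix(points):
    """Pairwise Euclidean distances between all points of a time-ordered cloud."""
    return [[math.dist(p, q) for q in points] for p in points]

def rotate_and_shift(points, angle, shift):
    """Apply a 2D rotation followed by a translation (an isometry of the plane)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]

# A short sampled parabola and an isometric copy of it (hypothetical example data).
curve = [(t, t * t) for t in (0.0, 0.5, 1.0, 1.5, 2.0)]
moved = rotate_and_shift(curve, angle=0.7, shift=(3.0, -1.0))

ssm_a = self_similarity_matrix(curve)
ssm_b = self_similarity_matrix(moved)

# Largest entrywise difference between the two SSMs; it should vanish
# up to floating-point rounding because distances survive isometries.
max_diff = max(abs(a - b) for ra, rb in zip(ssm_a, ssm_b) for a, b in zip(ra, rb))
print(max_diff)
```

Because the two matrices agree, any alignment computed from SSMs alone cannot depend on how either point cloud is rotated or translated, which is why no explicit spatial alignment step is needed.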

Residual Gated Graph ConvNets

Graph-structured data such as functional brain networks, social networks, gene regulatory networks, and communications networks have prompted interest in generalizing neural networks to graph domains. In this paper, we are interested in designing efficient neural network architectures for graphs with variable length. Several existing works such as Scarselli et al. (2009); Li et al. (2016) have focused on recurrent neural networks (RNNs) to solve this task. A recent, different approach was proposed in Sukhbaatar et al. (2016), where a vanilla graph convolutional neural network (ConvNet) was introduced. We believe the latter approach to be a better paradigm for solving graph learning problems because ConvNets are better suited to deep networks than RNNs. For this reason, we propose the most generic class of residual multi-layer graph ConvNets that make use of an edge gating mechanism, as proposed in Marcheggiani & Titov (2017). Gated edges appear to be a natural property in the context of graph learning tasks, as the system has the ability to learn which edges are important or not for the task to solve. We apply several graph neural models to two basic network science tasks: subgraph matching and semi-supervised clustering for graphs with variable length. Numerical results show the performance of the new model.

Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs

Organisations store huge amounts of data from multiple heterogeneous sources in the form of Knowledge Graphs (KGs). One of the ways to query these KGs is to use SPARQL queries over a database engine. Since SPARQL follows exact match semantics, the queries may return too few or no results. Recent works have proposed query relaxation where the query engine judiciously replaces a query predicate with similar predicates using weighted relaxation rules mined from the KG. The space of possible relaxations is potentially too large to fully explore and users are typically interested in only top-k results, so such query engines use top-k algorithms for query processing. However, they may still process all the relaxations, many of whose answers do not contribute towards top-k answers. This leads to computation overheads and delayed response times. We propose Spec-QP, a query planning framework that speculatively determines which relaxations will have their results in the top-k answers. Only these relaxations are processed using the top-k operators. We, therefore, reduce the computation overheads and achieve faster response times without adversely affecting the quality of results. We tested Spec-QP over two datasets – XKG and Twitter, to demonstrate the efficiency of our planning framework at reducing runtimes with reasonable accuracy for query engines supporting relaxations.

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

Neural networks are usually not the tool of choice for nonparametric high-dimensional problems where the number of input features is much larger than the number of observations. Though neural networks can approximate complex multivariate functions, they generally require a large number of training observations to obtain reasonable fits, unless one can learn the appropriate network structure. In this manuscript, we show that neural networks can be applied successfully to high-dimensional settings if the true function falls in a low dimensional subspace, and proper regularization is used. We propose fitting a neural network with a sparse group lasso penalty on the first-layer input weights, which results in a neural net that only uses a small subset of the original features. In addition, we characterize the statistical convergence of the penalized empirical risk minimizer to the optimal neural network: we show that the excess risk of this penalized estimator only grows with the logarithm of the number of input features; and we show that the weights of irrelevant features converge to zero. Via simulation studies and data analyses, we show that these sparse-input neural networks outperform existing nonparametric high-dimensional estimation methods when the data has complex higher-order interactions.
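
To make the penalty concrete, here is a minimal sketch of a sparse group lasso term on first-layer input weights, assuming the common form lam * ((1 - alpha) * sum of group L2 norms + alpha * L1 norm), where each group collects the weights leaving one input feature. The function name, the parameters `lam` and `alpha`, and the toy weight matrix are assumptions for illustration and are not taken from the paper.

```python
import math

def sparse_group_lasso_penalty(first_layer, lam=0.1, alpha=0.5):
    """Sparse group lasso on first-layer weights.

    first_layer[j] holds the weights leaving input feature j, so each
    input feature forms one group. lam and alpha are hypothetical tuning
    parameters: lam scales the penalty, alpha mixes L1 and group terms.
    """
    # Group term: sum of Euclidean norms, pushes whole features to zero.
    group_term = sum(math.sqrt(sum(w * w for w in group)) for group in first_layer)
    # L1 term: sum of absolute values, pushes individual weights to zero.
    l1_term = sum(abs(w) for group in first_layer for w in group)
    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)

# Three input features feeding two hidden units (toy values).
weights = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
penalty = sparse_group_lasso_penalty(weights, lam=1.0, alpha=0.5)

# Features whose whole weight group is zero are effectively unused by the network.
active = [j for j, group in enumerate(weights) if any(w != 0.0 for w in group)]
print(penalty, active)
```

In training, a term of this kind would be added to the empirical risk; a feature is dropped once its entire group of first-layer weights reaches zero, which is how the fitted network ends up using only a small subset of the inputs.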

Event Representations with Tensor-based Compositions

Robust and flexible event representations are important to many core areas in language understanding. Scripts were proposed early on as a way of representing sequences of events for such understanding, and have recently attracted renewed attention. However, obtaining effective representations for modeling script-like event sequences is challenging. It requires representations that can capture event-level and scenario-level semantics. We propose a new tensor-based composition method for creating event representations. The method captures more subtle semantic interactions between an event and its entities and yields representations that are effective at multiple event-related tasks. With the continuous representations, we also devise a simple schema generation method which produces better schemas compared to a prior discrete representation based method. Our analysis shows that the tensors capture distinct usages of a predicate even when there are only subtle differences in their surface realizations.

Proximal Alternating Direction Network: A Globally Converged Deep Unrolling Framework

Deep learning models have gained great success in many real-world applications. However, most existing networks are designed in a heuristic manner and thus lack rigorous mathematical principles and derivations. Several recent studies build deep structures by unrolling a particular optimization model that involves task information. Unfortunately, due to the dynamic nature of network parameters, the resulting deep propagation networks do not possess the nice convergence property of the original optimization scheme. This paper provides a novel proximal unrolling framework to establish deep models by integrating experimentally verified network architectures and rich cues of the tasks. More importantly, we prove in theory that 1) the propagation generated by our unrolled deep model globally converges to a critical point of a given variational energy, and 2) the proposed framework is still able to learn priors from training data to generate a convergent propagation even when task information is only partially available. Indeed, these theoretical results are the best we can ask for, unless stronger assumptions are enforced. Extensive experiments on various real-world applications verify the theoretical convergence and demonstrate the effectiveness of the designed deep models.

Genetic Algorithms for Evolving Deep Neural Networks

In recent years, deep learning methods applying unsupervised learning to train deep layers of neural networks have achieved remarkable results in numerous fields. In the past, many genetic algorithms based methods have been successfully applied to training neural networks. In this paper, we extend previous work and propose a GA-assisted method for deep learning. Our experimental results indicate that this GA-assisted approach improves the performance of a deep autoencoder, producing a sparser neural network.

Repulsion Loss: Detecting Pedestrians in a Crowd

Detecting individual pedestrians in a crowd remains a challenging problem since the pedestrians often gather together and occlude each other in real-world scenarios. In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: the attraction by target, and the repulsion by other surrounding objects. The repulsion term prevents the proposal from shifting to surrounding objects thus leading to more crowd-robust localization. Our detector trained by repulsion loss outperforms all the state-of-the-art methods with a significant improvement in occlusion cases.

Understanding Deep Learning Generalization by Maximum Entropy

Deep learning achieves remarkable generalization capability with an overwhelming number of model parameters. Theoretical understanding of deep learning generalization has received recent attention yet remains not fully explored. This paper attempts to provide an alternative understanding from the perspective of maximum entropy. We first derive two feature conditions under which softmax regression strictly applies the maximum entropy principle. A DNN is then regarded as approximating the feature conditions with multilayer feature learning, and is proved to be a recursive solution towards the maximum entropy principle. The connection between DNNs and maximum entropy explains why typical designs such as shortcuts and regularization improve model generalization, and provides guidance for future model development.

Detecting independence of random vectors II. Distance multivariance and Gaussian multivariance

We introduce two new measures for the dependence of n ≥ 2 random variables: ‘distance multivariance’ and ‘total distance multivariance’. Both measures are based on the weighted L^2-distance of quantities related to the characteristic functions of the underlying random variables. They extend distance covariance (introduced by Székely, Rizzo and Bakirov) and generalized distance covariance (introduced in part I) from pairs of random variables to n-tuples of random variables. We show that total distance multivariance can be used to detect the independence of n random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Based on our theoretical results, we present a test for independence of multiple random vectors which is consistent against all alternatives.

Detecting independence of random vectors I. Generalized distance covariance and Gaussian covariance

Distance covariance is a quantity to measure the dependence of two random vectors. We show that the original concept introduced and developed by Székely, Rizzo and Bakirov can be embedded into a more general framework based on symmetric Lévy measures and the corresponding real-valued continuous negative definite functions. The Lévy measures replace the weight functions used in the original definition of distance covariance. All essential properties of distance covariance are preserved in this new framework and some proofs are streamlined. From a practical point of view this allows less restrictive moment conditions on the underlying random variables and one can use other distance functions than the Euclidean distance, e.g. the Minkowski distance. Most importantly, it serves as the basic building block for distance multivariance, a quantity to measure and estimate dependence of multiple random vectors, which is introduced in the companion paper [Detecting independence of random vectors II: Distance multivariance and Gaussian multivariance] to the present article.
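
For orientation, the classical sample distance covariance can be computed from double-centered distance matrices. The sketch below implements that standard estimator for scalar samples, with the distance passed in as a function so that another distance can be substituted; it is not the generalized Lévy-measure framework of the paper. The function names and the example data are assumptions for illustration.

```python
def centered_distance_matrix(sample, dist):
    """Distance matrix with row means and column means removed and grand mean added back."""
    n = len(sample)
    d = [[dist(a, b) for b in sample] for a in sample]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[j][k] - row_mean[j] - row_mean[k] + grand_mean for k in range(n)]
            for j in range(n)]

def sample_distance_covariance_sq(xs, ys, dist=lambda a, b: abs(a - b)):
    """Squared sample distance covariance: mean of the entrywise product of centered matrices."""
    n = len(xs)
    a = centered_distance_matrix(xs, dist)
    b = centered_distance_matrix(ys, dist)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / (n * n)

xs = [0.0, 1.0, 2.0, 3.0]
dependent = sample_distance_covariance_sq(xs, [x * x for x in xs])  # y is a function of x
constant = sample_distance_covariance_sq(xs, [5.0] * len(xs))       # y carries no information
print(dependent, constant)
```

The statistic is strictly positive for the nonlinearly dependent pair and exactly zero when one sample is constant, which is the behaviour an independence test built on it relies on.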

Hidden Tree Markov Networks: Deep and Wide Learning for Structured Data

The paper introduces the Hidden Tree Markov Network (HTN), a neuro-probabilistic hybrid fusing the representation power of generative models for trees with the incremental and discriminative learning capabilities of neural networks. We put forward a modular architecture in which multiple generative models of limited complexity are trained to learn structural feature detectors whose outputs are then combined and integrated by neural layers at a later stage. In this respect, the model is both deep, thanks to the unfolding of the generative models on the input structures, as well as wide, given the potentially large number of generative modules that can be trained in parallel. Experimental results show that the proposed approach can outperform state-of-the-art syntactic kernels as well as generative kernels built on the same probabilistic model as the HTN.

Visual and Textual Sentiment Analysis Using Deep Fusion Convolutional Neural Networks

Sentiment analysis is attracting more and more attention and has become a very active research topic due to its potential applications in personalized recommendation, opinion mining, etc. Most of the existing methods are based on either textual or visual data and cannot achieve satisfactory results, as it is very hard to extract sufficient information from a single modality. Inspired by the observation that there exists a strong semantic correlation between visual and textual data in social media, we propose an end-to-end deep fusion convolutional neural network to jointly learn textual and visual sentiment representations from training examples. Information from the two modalities is fused in a pooling layer and fed into fully-connected layers to predict the sentiment polarity. We evaluate the proposed approach on two widely used data sets. Results show that our method achieves promising results compared with state-of-the-art methods, which demonstrates its competitiveness.

Universal Denoising Networks: A Novel CNN-based Network Architecture for Image Denoising

We design a novel network architecture for learning discriminative image models that are employed to efficiently tackle the problem of grayscale and color image denoising. Based on the proposed architecture, we introduce two different variants. The first network involves convolutional layers as a core component, while the second one relies instead on non-local filtering layers and thus it is able to exploit the inherent non-local self-similarity property of natural images. As opposed to most of the existing neural networks, which require the training of a specific model for each considered noise level, the proposed networks are able to handle a wide range of different noise levels, while they are very robust when the noise degrading the latent image does not match the statistics of the one used during training. The latter argument is supported by results that we report on publicly available images corrupted by unknown noise and which we compare against solutions obtained by alternative state-of-the-art methods. At the same time the introduced networks achieve excellent results under additive white Gaussian noise (AWGN), which are comparable to the current state-of-the-art network, while they depend on a more shallow architecture with the number of trained parameters being one order of magnitude smaller. These properties make the proposed networks ideal candidates to serve as sub-solvers on restoration methods that deal with general inverse imaging problems such as deblurring, demosaicking, superresolution, etc.

Adversarial Network Embedding

Learning low-dimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities and other high-order proximities. However, apart from objectives that capture network structural properties, most of them lack additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate their contribution to learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure preserving component and an adversarial learning component. The former component aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to state-of-the-art approaches on benchmark network embedding tasks.

Functional Map of the World

We present a new dataset, Functional Map of the World (fMoW), which aims to inspire the development of machine learning models capable of predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features. The metadata provided with each image enables reasoning about location, time, sun angles, physical sizes, and other features when making predictions about objects in the image. Our dataset consists of over 1 million images from over 200 countries. For each image, we provide at least one bounding box annotation containing one of 63 categories, including a ‘false detection’ category. We present an analysis of the dataset along with baseline approaches that reason about metadata and temporal views. Our data, code, and pretrained models have been made publicly available.

Autoencoder Node Saliency: Selecting Relevant Latent Representations

The autoencoder is an artificial neural network model that learns hidden representations of unlabeled data. With a linear transfer function it is similar to principal component analysis (PCA). While both methods use weight vectors for linear transformations, the autoencoder does not come with any indication similar to the eigenvalues in PCA that are paired with the eigenvectors. We propose a novel supervised node saliency (SNS) method that ranks the hidden nodes by comparing class distributions of latent representations against a fixed reference distribution. The latent representations of a hidden node can be described using a one-dimensional histogram. We apply normalized entropy difference (NED) to measure the ‘interestingness’ of the histograms, and derive a property of NED values that identifies a good classifying node. By applying our methods to real data sets, we demonstrate the ability of SNS to explain what the trained autoencoders have learned.

Domain Generalization by Marginal Transfer Learning

Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy.

Non-local Neural Networks

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.
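
The phrase "a weighted sum of the features at all positions" can be sketched in a few lines. The version below is a simplified illustration under stated assumptions: the pairwise weights are a softmax over dot products of the raw features, and there are no learned projections or residual connection. The function name `non_local` and the toy features are hypothetical and are not the authors' code.

```python
import math

def non_local(features):
    """Return, for each position, a softmax-weighted sum of the features at all positions.

    features: list of equal-length vectors, one per position.
    Assumption: weights come from dot-product similarity between raw features.
    """
    responses = []
    for xi in features:
        # Similarity of position i to every position j (including itself).
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum over all positions, computed per feature dimension.
        responses.append([sum(w * xj[d] for w, xj in zip(weights, features))
                          for d in range(len(xi))])
    return responses

feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
resp = non_local(feats)
print(resp)
```

Unlike a convolution, every output position here depends on every input position in a single step, which is what lets such a block capture long-range dependencies.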

The Combinatorics of Higher Derivatives of Implicit Functions
Robust Environmental Mapping by Mobile Sensor Networks
Optimistic Robust Optimization With Applications To Machine Learning
Non-Gaussian Autoregressive Processes with Tukey g-and-h Transformations
Treatment Effect Quantification for Time-to-event Endpoints — Estimands, Analysis Strategies, and beyond
Dropping Activation Outputs with Localized First-layer Deep Network for Enhancing User Privacy and Data Security
On k-neighbor separated permutations
Subgroup Identification and Interpretation with Bayesian Nonparametric Models in Health Care Claims Data
Path properties of the solution to the stochastic heat equation with Lévy noise
Transition density estimates for diagonal systems of SDEs driven by cylindrical $α$-stable processes
Convergence of Finite Element Methods for Singular Stochastic Control
On estimating the alphabet size of a discrete random source
Collective behavior of oscillating electric dipoles
Review on Parameter Estimation in HMRF
Unbiased Simulation for Optimizing Stochastic Function Compositions
Neural 3D Mesh Renderer
Edge Estimation with Independent Set Oracles
On Nearest Neighbors in Non Local Means Denoising
Finding Differentially Covarying Needles in a Temporally Evolving Haystack: A Scan Statistics Perspective
Chaos expansion of 2D parabolic Anderson model
CVXR: An R Package for Disciplined Convex Optimization
Pure state ‘really’ informationally complete with rank-1 POVM
Domination structure for number three
An Enhanced Middleware for Collaborative Privacy in IPTV Recommender Services
On the Distortion of Voting with Multiple Representative Candidates
Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time
Non-spanning lattice 3-polytopes
Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards
Dynamic Distributed Storage for Scaling Blockchains
$S^4$Net: Single Stage Salient-Instance Segmentation
Groupwise Maximin Fair Allocation of Indivisible Goods
A deep learning-based method for relative location prediction in CT scan images
On statistical approaches to generate Level 3 products from remote sensing retrievals
Bounds on Fractional Repetition Codes using Hypergraphs
Generating Thematic Chinese Poetry with Conditional Variational Autoencoder
High-Dimensional Multivariate Posterior Consistency Under Global-Local Shrinkage Priors
Towards a More Reliable Privacy-preserving Recommender System
HybridTune: Spatio-temporal Data and Model Driven Performance Diagnosis for Big Data Systems
Multi-Image Semantic Matching by Mining Consistent Features
Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method
Cross Temporal Recurrent Networks for Ranking Question Answer Pairs
Fullie and Wiselie: A Dual-Stream Recurrent Convolutional Attention Model for Activity Recognition
Asymptotic independence of regenerative processes with dependent cycles
The Pontryagin Maximum Principle in the Wasserstein Space
Massive MIMO for Drone Communications: Applications, Case Studies and Future Directions
Mondrian Processes for Flow Cytometry Analysis
Transferring Agent Behaviors from Videos via Motion GANs
Controllability under positivity constraints of semilinear heat equations
Induced subgraphs of graphs with large chromatic number. XI. Orientations
Asymptotically Minimax Robust Hypothesis Testing
JamBot: Music Theory Aware Chord Based Generation of Polyphonic Music with LSTMs
A two-dimensional decomposition approach for matrix completion through gossip
Regret Analysis for Continuous Dueling Bandit
Fully Convolutional Neural Networks for Page Segmentation of Historical Document Images
Construction of asymptotically good locally repairable codes via automorphism groups of function fields
Multiplicity One property of The Length Spectra of Simple Regular Periodic Graphs
Wick order, spreadability and exchangeability for monotone commutation relations
Approximating Geometric Knapsack via L-packings
Invariant measures of discrete interacting particle systems: Algebraic aspects
Residual Parameter Transfer for Deep Domain Adaptation
Partially Observed Functional Data: The Case of Systematically Missing Parts
Total Variation-Based Dense Depth from Multi-Camera Array
The Application of Preconditioned Alternating Direction Method of Multipliers in Depth from Focal Stack
On the Turán number of ordered forests
Using stochastic computation graphs formalism for optimization of sequence-to-sequence model
A New Approach for Solving the Market Clearing Problem With Uniform Purchase Price and Curtailable Block Orders
Variational Probability Flow for Biologically Plausible Training of Deep Neural Networks
Non-uniform Replication
The Turan number of 2P_7
On $P_5$-free Chordal bipartite graphs
The Hidden Binary Search Tree: A Balanced Rotation-Free Search Tree in the AVL RAM Model
Model-based Clustering with Sparse Covariance Matrices
Continuity equation in LlogL for the 2D Euler equations under the enstrophy measure
Beyond Accuracy Optimization: On the Value of Item Embeddings for Student Job Recommendations
Data Assimilation for a Geological Process Model Using the Ensemble Kalman Filter
Receptive Field Block Net for Accurate and Fast Object Detection
Localisation in a growth model with interaction
Ultra-Reliable Low Latency Communication (URLLC) using Interface Diversity
A Geometric Approach to Spectral Analysis
Energy-Efficient Transmission Strategies for CoMP Downlink – Overview, Extension, and Numerical Comparison
Presentations of the saturated cluster modular groups of finite mutation type $X_6$ and $X_7$
Hierarchical internal representation of spectral features in deep convolutional networks trained for EEG decoding
Efficient Multi-Person Pose Estimation with Provable Guarantees
Uniqueness of Dirichlet forms related to infinite systems of interacting Brownian motions
Why ‘Redefining Statistical Significance’ Will Not Improve Reproducibility and Could Make the Replication Crisis Worse
Approaching Miscorrection-free Performance of Product and Generalized Product Codes
Jaccard analysis and LASSO-based feature selection for location fingerprinting with limited computational complexity
On the EM-Tau algorithm: a new EM-style algorithm with partial E-steps
Evaluation of bioinspired algorithms for the solution of the job scheduling problem
Efficient Implementation of a Recognition System Using the Cortex Ventral Stream Model
A smartphone application to measure the quality of pest control spraying machines via image analysis
Discussion among Different Methods of Updating Model Filter in Object Tracking
Robust Object Tracking Based on Self-adaptive Search Area
UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss
Application of generative autoencoder in de novo molecular design
Approximation Algorithms for Rectangle Packing Problems (PhD Thesis)
Accurate Semidefinite Programming Models for Optimal Power Flow in Distribution Systems
Mobility and Popularity-Aware Coded Small-Cell Caching
Integrable Combinatorics
Dynamic topologies of activity-driven temporal networks with memory
Universal minimal flows of generalized Ważewski dendrites
Revisiting Connected Vertex Cover: FPT Algorithms and Lossy Kernels
Constructive Preference Elicitation over Hybrid Combinatorial Spaces
Training large margin host-pathogen protein-protein interaction predictors
SilNet: Single- and Multi-View Reconstruction by Learning from Silhouettes
A New Algorithm to Fit Exponential Decays
Effective Strategies in Zero-Shot Neural Machine Translation
Quantifying Performance of Bipedal Standing with Multi-channel EMG
Two-Archive Evolutionary Algorithm for Constrained Multi-Objective Optimization
Effective Use of Bidirectional Language Modeling for Medical Named Entity Recognition
Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival
Asymptotic Close To Optimal Joint Resource Allocation and Power Control in the Uplink of Two-cell Networks
10Sent: A Stable Sentiment Analysis Method Based on the Combination of Off-The-Shelf Approaches
Dimension Drop for Transient Random Walks on Galton-Watson Trees in Random Environments
Kullback-Leibler Principal Component for Tensors is not NP-hard
Aperture Supervision for Monocular Depth Estimation
Randomization Bias in Field Trials to Evaluate Targeting Methods
Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent
Cellular Automata Simulation on FPGA for Training Neural Networks with Virtual World Imagery
Time-Limited Toeplitz Operators on Abelian Groups: Applications in Information Theory and Subspace Approximation
Fine-Grained I/O Complexity via Reductions: New lower bounds, faster algorithms, and a time hierarchy
Local cohomology and the multi-graded regularity of FI$^m$-modules
Deep Learning for Real-time Gravitational Wave Detection and Parameter Estimation with Advanced LIGO Data
A Compositional Treatment of Iterated Open Games
WAYLA – Generating Images from Eye Movements
Low-Complexity Integer-Forcing Methods for Block Fading MIMO Multiple-Access Channels