**Learning Equations for Extrapolation and Control**

We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.

**Identifying Causal Effects with the R Package causaleffect**

Do-calculus is concerned with estimating the interventional distribution of an action from the observed joint probability distribution of the variables in a given causal structure. All identifiable causal effects can be derived using the rules of do-calculus, but the rules themselves do not give any direct indication whether the effect in question is identifiable or not. Shpitser and Pearl constructed an algorithm for identifying joint interventional distributions in causal models, which contain unobserved variables and induce directed acyclic graphs. This algorithm can be seen as a repeated application of the rules of do-calculus and known properties of probabilities, and it ultimately either derives an expression for the causal distribution, or fails to identify the effect, in which case the effect is non-identifiable. In this paper, the R package causaleffect is presented, which provides an implementation of this algorithm. Functionality of causaleffect is also demonstrated through examples.
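To make "an expression for the causal distribution" concrete, here is a minimal sketch of the simplest identifiable case: a single observed variable Z confounds X and Y, so the back-door adjustment P(y | do(x)) = Σ_z P(y | x, z) P(z) applies. This is not the causaleffect package or the Shpitser-Pearl algorithm; the joint distribution is invented, and the names `joint`, `marginal`, `backdoor_effect`, and `conditional` are hypothetical.

```python
from collections import defaultdict

# Hypothetical joint distribution P(z, x, y) over binary variables,
# keyed as (z, x, y). The numbers are invented for illustration.
joint = {
    (0, 0, 0): 0.24, (0, 0, 1): 0.06,
    (0, 1, 0): 0.04, (0, 1, 1): 0.06,
    (1, 0, 0): 0.06, (1, 0, 1): 0.04,
    (1, 1, 0): 0.10, (1, 1, 1): 0.40,
}

def marginal(joint, keep):
    """Sum the joint over every variable except the index positions in `keep`."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in keep)] += p
    return out

def backdoor_effect(joint, x, y):
    """P(y | do(x)) = sum_z P(y | x, z) P(z), assuming Z blocks all back-door paths."""
    p_z = marginal(joint, (0,))
    p_zx = marginal(joint, (0, 1))
    total = 0.0
    for (z,), pz in p_z.items():
        total += joint[(z, x, y)] / p_zx[(z, x)] * pz
    return total

def conditional(joint, x, y):
    """Observational P(y | x), shown only for comparison with the adjusted value."""
    p_x = marginal(joint, (1,))
    p_xy = marginal(joint, (1, 2))
    return p_xy[(x, y)] / p_x[(x,)]

print(backdoor_effect(joint, x=1, y=1))  # interventional quantity
print(conditional(joint, x=1, y=1))      # plain conditional, differs under confounding
```

On this toy table the two numbers differ, which is the reason an identification step is needed before estimation.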

**Simplifying Probabilistic Expressions in Causal Inference**

Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the R package causaleffect.

**Evaluating Ex Ante Counterfactual Predictions Using Ex Post Causal Inference**

We derive a formal, decision-based method for comparing the performance of counterfactual treatment regime predictions using the results of experiments that give relevant information on the distribution of treated outcomes. Our approach allows us to quantify and assess the statistical significance of differential performance for optimal treatment regimes estimated from structural models, extrapolated treatment effects, expert opinion, and other methods. We apply our method to evaluate optimal treatment regimes for conditional cash transfer programs across countries where predictions are generated using data from experimental evaluations in other countries and pre-program data in the country of interest.

**Neural Ordinary Differential Equations**

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a blackbox differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
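The forward pass described above can be sketched in a few lines: the hidden state follows dh/dt = f(h, t, θ), and the output is the state at the end of the integration interval. The sketch below is an illustration under simplifying assumptions, not the authors' implementation: a fixed-step Euler loop stands in for the black-box solver, there is no backpropagation through the solver, and `derivative_net`, `odeint_euler`, and `linear_decay` are hypothetical names.

```python
import math

def derivative_net(h, t, params):
    """Hypothetical one-layer 'network' giving dh/dt = tanh(W h + b)."""
    W, b = params
    return [
        math.tanh(sum(W[i][j] * h[j] for j in range(len(h))) + b[i])
        for i in range(len(h))
    ]

def odeint_euler(f, h0, t0, t1, params, steps=100):
    """Integrate dh/dt = f(h, t, params) from t0 to t1 with fixed-step Euler."""
    h = list(h0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        dh = f(h, t, params)
        h = [hi + dt * di for hi, di in zip(h, dh)]
        t += dt
    return h

def linear_decay(h, t, params):
    """dh/dt = -h, whose exact solution is h(t) = h(0) * exp(-t)."""
    return [-hi for hi in h]

params = ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1])
h0 = [1.0, -1.0]

# With a single step of size 1, Euler reduces to a residual update h + f(h).
one_block = odeint_euler(derivative_net, h0, 0.0, 1.0, params, steps=1)

# More steps trade compute for accuracy, using the same parameters.
fine = odeint_euler(derivative_net, h0, 0.0, 1.0, params, steps=1000)
```

The `steps` argument is where this toy version exposes the precision-versus-speed trade-off; an adaptive solver would choose the step size per input instead.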

**Neural Code Comprehension: A Learnable Representation of Code Semantics**

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation, treating it like sentences written in a natural language. However, none of the existing methods are sufficient to comprehend program semantics robustly, due to structural features such as function calls, branching, and interchangeable order of statements. In this paper, we propose a novel processing technique to learn code semantics, and apply it to a variety of program analysis tasks. In particular, we stipulate that a robust distributional hypothesis of code applies to both human- and machine-generated programs. Following this hypothesis, we define an embedding space, inst2vec, based on an Intermediate Representation (IR) of the code that is independent of the source programming language. We provide a novel definition of contextual flow for this IR, leveraging both the underlying data- and control-flow of the program. We then analyze the embeddings qualitatively using analogies and clustering, and evaluate the learned representation on three different high-level tasks. We show that with a single RNN architecture and pre-trained fixed embeddings, inst2vec outperforms specialized approaches for performance prediction (compute device mapping, optimal thread coarsening); and algorithm classification from raw code (104 classes), where we set a new state-of-the-art.

**Forest Packing: Fast, Parallel Decision Forests**

Machine learning has an emerging critical role in high-performance computing to modulate simulations, extract knowledge from massive data, and replace numerical models with efficient approximations. Decision forests are a critical tool because they provide insight into model operation that is critical to interpreting learned results. While decision forests are trivially parallelizable, the traversals of tree data structures incur many random memory accesses and are very slow. We present memory packing techniques that reorganize learned forests to minimize cache misses during classification. The resulting layout is hierarchical. At low levels, we pack the nodes of multiple trees into contiguous memory blocks so that each memory access fetches data for multiple trees. At higher levels, we use leaf cardinality to identify the most popular paths through a tree and collocate those paths in cache lines. We extend this layout with out-of-order execution and cache-line prefetching to increase memory throughput. Together, these optimizations increase the performance of classification in ensembles by a factor of four over an optimized C++ implementation and a factor of 50 over a popular R language implementation.

**Learning from Chunk-based Feedback in Neural Machine Translation**

We empirically investigate learning from partial feedback in neural machine translation (NMT), when partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced solely based on chunk-level user feedback. We conduct a series of simulation experiments to test the effectiveness of the proposed method. Our results show that chunk-level feedback outperforms sentence-based feedback by up to 2.61% BLEU absolute.

**SMarTplan: a Task Planner for Smart Factories**

Smart factories are on the verge of becoming the new industrial paradigm, wherein optimization permeates all aspects of production, from concept generation to sales. To fully pursue this paradigm, flexibility in the production means as well as in their timely organization is of paramount importance. AI is playing a major role in this transition, but the scenarios encountered in practice might be challenging for current tools. Task planning is one example where AI enables more efficient and flexible operation through an online automated adaptation and rescheduling of the activities to cope with new operational constraints and demands. In this paper we present SMarTplan, a task planner specifically conceived to deal with real-world scenarios in the emerging smart factory paradigm. Including both special-purpose and general-purpose algorithms, SMarTplan is based on current automated reasoning technology and it is designed to tackle complex application domains. In particular, we show its effectiveness on a logistic scenario, by comparing its specialized version with the general purpose one, and extending the comparison to other state-of-the-art task planners.

**Instance-Level Explanations for Fraud Detection: A Case Study**

Fraud detection is a difficult problem that can benefit from predictive modeling. However, the verification of a prediction is challenging; for a single insurance policy, the model only provides a prediction score. We present a case study where we reflect on different instance-level model explanation techniques to aid a fraud detection team in their work. To this end, we designed two novel dashboards combining various state-of-the-art explanation techniques. These enable the domain expert to analyze and understand predictions, dramatically speeding up the process of filtering potential fraud cases. Finally, we discuss the lessons learned and outline open research issues.

**Restricted Boltzmann Machines: Introduction and Review**

The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.
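For reference, the standard binary restricted Boltzmann machine can be written down as follows. The notation is chosen here for illustration and is not taken from the article: visible units $v \in \{0,1\}^n$, hidden units $h \in \{0,1\}^m$, interaction weights $W$, and biases $b$ and $c$.

```latex
E(v, h) = -\, v^{\top} W h - b^{\top} v - c^{\top} h,
\qquad
p(v, h) = \frac{\exp\bigl(-E(v, h)\bigr)}{Z},
\qquad
Z = \sum_{v', h'} \exp\bigl(-E(v', h')\bigr),

p(h_j = 1 \mid v) = \sigma\Bigl(c_j + \sum_{i=1}^{n} v_i W_{ij}\Bigr),
\qquad
p(v_i = 1 \mid h) = \sigma\Bigl(b_i + \sum_{j=1}^{m} W_{ij} h_j\Bigr),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}.
```

Because interactions exist only between visible and hidden units, the hidden units are conditionally independent given the visible ones and vice versa, which is what the factorized conditionals express.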

**Deep Neural Decision Trees**

Deep neural networks have been proven powerful at processing perceptual data, such as images and audio. However, for tabular data, tree-based models are more popular. A nice property of tree-based models is their natural interpretability. In this work, we present Deep Neural Decision Trees (DNDT), tree models realised by neural networks. A DNDT is intrinsically interpretable, as it is a tree. Yet as it is also a neural network (NN), it can be easily implemented in NN toolkits, and trained with gradient descent rather than greedy splitting. We evaluate DNDT on several tabular datasets, verify its efficacy, and investigate similarities and differences between DNDT and vanilla decision trees. Interestingly, DNDT self-prunes at both split and feature-level.

**A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress**

Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem at hand. The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates how the current methods mitigate these challenges. We further discuss the extensions of traditional IRL methods: (i) inaccurate and incomplete perception, (ii) incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. This discussion concludes with some broad advances in the research area and currently open research questions.

**Tensor-Tensor Product Toolbox**

Tensors are higher-order extensions of matrices. In recent work [Kilmer and Martin, 2011], the authors introduced the notion of the t-product, a generalization of matrix multiplication for tensors of order three. The multiplication is based on a convolution-like operation, which can be implemented efficiently using the Fast Fourier Transform (FFT). Under the t-product, third-order tensors have a linear algebraic structure similar to that of matrices. For example, there is a tensor SVD (t-SVD), which is computable. By using some properties of the FFT, we obtain a more efficient way of computing the t-product and t-SVD in [C. Lu, et al., 2018]. We develop a Matlab toolbox to implement several basic operations on tensors based on the t-product. The toolbox is available at

https://…/tproduct.
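The "convolution-like operation" can be made concrete with a small sketch. For A of size n1 × n2 × n3 and B of size n2 × l × n3, each output tube C(i, j, :) is the sum over k of the circular convolution of A(i, k, :) with B(k, j, :). The code below implements that definition directly with nested lists; it is not the toolbox's Matlab code and deliberately skips the FFT speed-up, and the function names are hypothetical.

```python
def circ_conv(a, b):
    """Circular convolution of two equal-length tubes (lists of numbers)."""
    n = len(a)
    return [sum(a[s] * b[(t - s) % n] for s in range(n)) for t in range(n)]

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x l x n3), indexed as A[i][k][t]."""
    n1, n2, n3 = len(A), len(B), len(A[0][0])
    l = len(B[0])
    C = [[[0.0] * n3 for _ in range(l)] for _ in range(n1)]
    for i in range(n1):
        for j in range(l):
            for k in range(n2):
                # Matrix-multiplication pattern, with scalar products
                # replaced by circular convolutions of tubes.
                tube = circ_conv(A[i][k], B[k][j])
                C[i][j] = [c + x for c, x in zip(C[i][j], tube)]
    return C

def identity_tensor(n, n3):
    """Identity for the t-product: first frontal slice is I, the rest are zero."""
    return [[[1.0 if (i == j and t == 0) else 0.0 for t in range(n3)]
             for j in range(n)] for i in range(n)]

A = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
     [[7.0, 8.0, 9.0], [1.0, 0.0, -1.0]]]
C = t_product(A, identity_tensor(2, 3))  # equals A, up to floating-point signs of zero
```

When n3 = 1 the circular convolution is ordinary multiplication, so the t-product reduces to the usual matrix product. An FFT-based version would transform along the third axis, multiply the frontal slices, and transform back.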

**In situ TensorView: In situ Visualization of Convolutional Neural Networks**

Convolutional Neural Networks (CNNs) are complex systems. They are trained so they can adapt their internal connections to recognize images, texts and more. It is both interesting and helpful to visualize the dynamics within such deep artificial neural networks so that people can understand how these artificial networks are learning and making predictions. In the field of scientific simulations, visualization tools like Paraview have long been utilized to provide insights and understanding. We present in situ TensorView to visualize the training and functioning of CNNs as if they were systems of scientific simulations. In situ TensorView is a loosely coupled, open in situ visualization framework that provides multiple viewers to help users visualize and understand their networks. It leverages the co-processing capability of Paraview to provide real-time visualization during the training and prediction phases. This avoids heavy I/O overhead when visualizing large dynamic systems. Only a small number of lines of code are injected into the TensorFlow framework. The visualization can provide guidance for adjusting the architecture of networks or compressing pre-trained networks. We showcase visualizing the training of LeNet-5 and VGG16 using in situ TensorView.

**Meta Continual Learning**

Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in the current deep neural networks by a problem called catastrophic forgetting, where training on new tasks tends to severely degrade performance on previous tasks. One way to lessen the impact of the forgetting problem is to constrain parameters that are important to previous tasks to stay close to the optimal parameters. Recently, multiple competitive approaches for computing the importance of the parameters with respect to the previous tasks have been presented. In this paper, we propose a learning-to-optimize algorithm for mitigating catastrophic forgetting. Instead of trying to formulate a new constraint function ourselves, we propose to train another neural network to predict parameter update steps that respect the importance of parameters to the previous tasks. In the proposed meta-training scheme, the update predictor is trained to minimize loss on a combination of current and past tasks. We show experimentally that the proposed approach works in the continual learning setting.
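The hand-designed constraint that this abstract contrasts itself with can be sketched as a quadratic penalty: the new-task loss plus (λ/2) Σ_i Ω_i (θ_i − θ*_i)², where θ* are the parameters after the previous task and Ω_i is a per-parameter importance. The sketch below shows only that baseline idea on a toy quadratic loss, not the learned update predictor the paper proposes; every name and number is hypothetical.

```python
def penalized_grad(theta, new_task_grad, theta_star, importance, lam=1.0):
    """Gradient of new_task_loss(theta) + (lam/2) * sum_i w_i * (theta_i - theta_star_i)^2."""
    g = new_task_grad(theta)
    return [
        gi + lam * w * (t - s)
        for gi, t, s, w in zip(g, theta, theta_star, importance)
    ]

def sgd(theta, grad_fn, lr=0.1, steps=500):
    """Plain gradient descent; returns the final parameter list."""
    for _ in range(steps):
        g = grad_fn(theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

# Toy new task: loss = 0.5 * sum_i (theta_i - target_i)^2.
target = [1.0, 1.0]

def new_task_grad(theta):
    return [t - y for t, y in zip(theta, target)]

theta_star = [0.0, 0.0]   # parameters after the previous task (made up)
importance = [10.0, 0.0]  # first parameter matters to the old task, second does not

theta = sgd(
    list(theta_star),
    lambda th: penalized_grad(th, new_task_grad, theta_star, importance),
)
# The important parameter stays near its old value; the unimportant one
# moves freely to the new-task optimum.
print(theta)
```

In this toy setting each coordinate settles at target / (1 + importance), which makes visible how a large importance pins a parameter to its previous value.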

• A Graph-Theoretic Analysis of Distributed Replicator Dynamic

• Relating the cut distance and the weak* topology for graphons

• State equation from the spectral structure of human brain activity

• GOE Statistics for Levy Matrices

• Quadratic Approximation of Generalized Tribonacci Sequences

• No Threshold graphs are cospectral

• Records from partial comparisons and discrete approximations

• Deterministic $O(1)$-Approximation Algorithms to 1-Center Clustering with Outliers

• Faster SGD training by minibatch persistency

• Opportunistic Scheduling in Underlay Cognitive Radio based Systems: User Selection Probability Analysis

• Statistical Optimal Transport via Geodesic Hubs

• Couplings for determinantal point processes and their reduced Palm distributions with a view to quantifying repulsiveness

• Reducing Property Graph Queries to Relational Algebra for Incremental View Maintenance

• Quantum Nash equilibrium in the thermodynamic limit

• A Reputation System for Artificial Societies

• Movement-efficient Sensor Deployment in Wireless Sensor Networks with Limited Communication Range

• Rate-Memory Trade-Off for Caching and Delivery of Correlated Sources

• Hybrid Coordination and Control for Multiagent Systems with Input Constraints

• Simultaneous Signal Subspace Rank and Model Selection with an Application to Single-snapshot Source Localization

• Cluster-robust Standard Errors for Linear Regression Models with Many Controls

• Recommending Scientific Videos based on Metadata Enrichment using Linked Open Data

• A Novel Mobile Data Contract Design with Time Flexibility

• Estimation from Non-Linear Observations via Convex Programming with Application to Bilinear Regression

• A variational approach to Data Assimilation in the Solar Wind

• Dynamic Multi-Level Multi-Task Learning for Sentence Simplification

• Canonical Tensor Decomposition for Knowledge Base Completion

• End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings

• Variance Reduced Three Operator Splitting

• On pathwise quadratic variation for cadlag functions

• A one-shot quantum joint typicality lemma

• Inner bounds via simultaneous decoding in quantum network information theory

• Efficient data augmentation for multivariate probit models with panel data: An application to general practitioner decision-making about contraceptives

• Unsupervised Deep Multi-focus Image Fusion

• COUNTDOWN – three, two, one, low power! A Run-time Library for Energy Saving in MPI Communication Primitives

• vsgoftest: An R Package for Goodness-of-Fit Testing Based on Kullback-Leibler Divergence

• Learning Conditioned Graph Structures for Interpretable Visual Question Answering

• NISQ circuit compilers: search space structure and heuristics

• PaMpeR: Proof Method Recommendation System for Isabelle/HOL

• Magnetic Resonance Spectroscopy Quantification using Deep Learning

• ASIC Implementation of Time-Domain Digital Backpropagation with Deep-Learned Chromatic Dispersion Filters

• Self-adaptive Privacy Concern Detection for User-generated Content

• Solving Fractional Polynomial Problems by Polynomial Optimization Theory

• Painting and Correspondence Coloring of Squares of Planar Graphs with no 4-cycles

• Unsupervised Imitation Learning

• Agent-Mediated Social Choice

• Independent graph of the finite group

• Stable Gaussian Process based Tracking Control of Euler-Lagrange Systems

• Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task

• Mixed batches and symmetric discriminators for GAN training

• LIL type behaviour of multivariate Levy processes at zero

• Belousov-Zhabotinsky reaction in liquid marbles

• When Is the Achievable Rate Region Convex in Two-User Massive MIMO Systems

• Letter to the Editor

• FRnet-DTI: Convolutional Neural Networks for Drug-Target Interaction

• Surrogate Outcomes and Transportability

• Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings

• Markov chains with heavy-tailed increments and asymptotically zero drift

• Approximation Strategies for Incomplete MaxSAT

• The determinant of the second additive compound of a square matrix: a formula and applications

• Semi-supervised Hashing for Semi-Paired Cross-View Retrieval

• Automatic segmentation of prostate zones

• Properization

• Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models

• Large-Scale Stochastic Sampling from the Probability Simplex

• Feature learning based on visual similarity triplets in medical image analysis: A case study of emphysema in chest CT scans

• FineTag: Multi-label Retrieval of Attributes at Fine-grained Level in Images

• Cooperative Queuing Policies for Effective Human-Multi-Robot Interaction

• Gradient flow approach to local mean-field spin systems

• Infrared and Visible Image Fusion with ResNet and zero-phase component analysis

• Positioning Data-Rate Trade-off in mm-Wave Small Cells and Service Differentiation for 5G Networks

• ConFusion: Sensor Fusion for Complex Robotic Systems using Nonlinear Optimization

• Facing Multiple Attacks in Adversarial Patrolling Games with Alarmed Targets

• Modality Distillation with Multiple Stream Networks for Action Recognition

• Diffeomorphic brain shape modelling using Gauss-Newton optimisation

• Improving brain computer interface performance by data augmentation with conditional Deep Convolutional Generative Adversarial Networks

• Nivat’s Conjecture and Pattern Complexity in Algebraic Subshifts

• Online Linear Quadratic Control

• End-to-End Speech Recognition From the Raw Waveform

• Breaking the 6/5 threshold for sums and products modulo a prime

• Enhancing Identification of Causal Effects by Pruning

• Itemsets of interest for negative association rules

• Distributed Optimization over Directed Graphs with Row Stochasticity and Constraint Regularity

• Learning to Update for Object Tracking

• Transfer Learning with Human Corneal Tissues: An Analysis of Optimal Cut-Off Layer

• A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification

• Optimizing Leader Influence in Networks through Selection of Direct Followers

• A new distance-regular graph of diameter $3$ on $1024$ vertices

• Cancer Metastasis Detection With Neural Conditional Random Field

• A model-driven approach for a new generation of adaptive libraries

• Effect of Hyper-Parameter Optimization on the Deep Learning Model Proposed for Distributed Attack Detection in Internet of Things Environment

• Capacitor Based Activity Sensing for Kinetic Powered Wearable IoTs

• Impact of Building-Level Motor Protection on Power System Transient Behaviors

• MoE-SPNet: A Mixture-of-Experts Scene Parsing Network

• Bayesian Sequential Inference in Dynamic Survival Models

• Fast Mixing of Metropolis-Hastings with Unimodal Targets

• Matrix valued inverse problems on graphs with application to elastodynamic networks

• Response Generation by Context-aware Prototype Editing

• Defective and Clustered Colouring of Sparse Graphs

• EmotionX-DLC: Self-Attentive BiLSTM for Detecting Sequential Emotions in Dialogue

• Translating MFM into FOL: towards plant operation planning

• Deep neural network based sparse measurement matrix for image compressed sensing

• On the Metric Distortion of Embedding Persistence Diagrams into Reproducing Kernel Hilbert Spaces

• Complete regular dessins and skew-morphisms of cyclic groups

• Maximum average degree and relaxed coloring

• On the Cauchy problem for parabolic integro-differential equations in generalized Hölder spaces

• The strong chromatic index of $(3,Δ)$-bipartite graphs

• Covering 2-connected 3-regular graphs with disjoint paths

• Strong chromatic index of graphs with maximum degree four

• VirtualHome: Simulating Household Activities via Programs

• Thermodynamics of the Minimum Description Length on Community Detection

• Maximally Invariant Data Perturbation as Explanation

• Theoretical Analysis of Image-to-Image Translation with Adversarial Learning

• Emotional Conversation Generation Orientated Syntactically Constrained Bidirectional-asynchronous Framework

• Private Text Classification

• Optimization over Nonnegative and Convex Polynomials With and Without Semidefinite Programming

• Smoothed SVD-based Beamforming for FBMC/OQAM Systems Based on Frequency Spreading

• Fast Multiple Landmark Localisation Using a Patch-based Iterative Network

• Soft Sampling for Robust Object Detection

• Classification of remote sensing images using attribute profiles and feature profiles from different trees: a comparative study

• Repetition Estimation

• GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

• A Web of Blocks

• Using Mode Connectivity for Loss Landscape Analysis

• Towards Gene Expression Convolutions using Gene Interaction Graphs

• Bayesian monotonic errors-in-variables models with applications to pathogen susceptibility testing

• On the Bias of Reed-Muller Codes over Odd Prime Fields

• Comparative Analysis of Neural QA models on SQuAD

• Deconvolving convolution neural network for cell detection

• Proportional Choosability: A New List Analogue of Equitable Coloring

• High-frequency analysis of parabolic stochastic PDEs

• A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation

• Overlapping Clustering Models, and One (class) SVM to Bind Them All

• Reconstruction methods for networks: the case of economic and financial systems

• Bayesian Prediction of Future Street Scenes through Importance Sampling based Optimization

• Delegated Search Approximates Efficient Search

• The domination number of plane triangulations

• Paths in ordered trees

• Implementation of Peridynamics utilizing HPX — the C++ standard library for parallelism and concurrency

• Designing Optimal Binary Rating Systems

• Cyclic triangle factors in regular tournaments

• Some remarks on the bias distribution analysis of discrete-time identification algorithms based on pseudo-linear regressions

• Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images

• Learning to Decode 7T-like MR Image Reconstruction from 3T MR Images

• The Minimax Learning Rate of Normal and Ising Undirected Graphical Models

• Manifold Learning & Stacked Sparse Autoencoder for Robust Breast Cancer Classification from Histopathological Images

• Learning Distributed Representations from Reviews for Collaborative Filtering

• Combining Word Feature Vector Method with the Convolutional Neural Network for Slot Filling in Spoken Language Understanding

• Continuous-variable quantum neural networks

• The Off-Topic Memento Toolkit

• Strong coupling limit of the Polaron measure and the Pekar process

• Age-Minimal Transmission for Energy Harvesting Sensors with Finite Batteries: Online Policies

• Beyond Local Nash Equilibria for Adversarial Networks

• Coupled Fluid Density and Motion from Single Views

• The graphs with all but two eigenvalues equal to $2$ or $-1$

• A Hybrid Fuzzy Regression Model for Optimal Loss Reserving in Insurance

• Emergent Open-Endedness from Contagion of the Fittest

• On the relation between Sion’s minimax theorem and existence of Nash equilibrium in asymmetric multi-players zero-sum game with only one alien

• Two Stream Self-Supervised Learning for Action Recognition

• G2D: from GTA to Data

• A Proof of Delta Conjecture

• A Scalable Machine Learning Approach for Inferring Probabilistic US-LI-RADS Categorization

• Semantic Image Retrieval by Uniting Deep Neural Networks and Cognitive Architectures

• Implicit Quantile Networks for Distributional Reinforcement Learning

• Maximum a Posteriori Policy Optimisation

• Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network

• Deep Sequence Learning with Auxiliary Information for Traffic Prediction

• Deep Learning based Estimation of Weaving Target Maneuvers

• Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control

• A One-Sided Classification Toolkit with Applications in the Analysis of Spectroscopy Data

• DeepTerramechanics: Terrain Classification and Slip Estimation for Ground Robots via Deep Learning

• A Graph Transduction Game for Multi-target Tracking

• Pressure Predictions of Turbine Blades with Deep Learning

• Understanding Patch-Based Learning by Explaining Predictions

• Task Driven Generative Modeling for Unsupervised Domain Adaptation: Application to X-ray Image Segmentation

• DropBack: Continuous Pruning During Training

• Multilingual Scene Character Recognition System using Sparse Auto-Encoder for Efficient Local Features Representation in Bag of Features

• An optimized system to solve text-based CAPTCHA

• DFNet: Semantic Segmentation on Panoramic Images with Dynamic Loss Weights and Residual Fusion Block

• Auto-Meta: Automated Gradient Based Meta Learner Search

• Distributional Advantage Actor-Critic

• Localizing and Quantifying Damage in Social Media Images

• A maximal energy pointset configuration problem