Causal Inference with Noisy and Missing Covariates via Matrix Factorization

Valid causal inference in observational studies often requires controlling for confounders. However, in practice measurements of confounders may be noisy, and can lead to biased estimates of causal effects. We show that we can reduce the bias caused by measurement noise using a large number of noisy measurements of the underlying confounders. We propose the use of matrix factorization to infer the confounders from noisy covariates, a flexible and principled framework that adapts to missing values, accommodates a wide variety of data types, and can augment many causal inference methods. We bound the error for the induced average treatment effect estimator and show it is consistent in a linear regression setting, using Exponential Family Matrix Completion preprocessing. We demonstrate the effectiveness of the proposed procedure in numerical experiments with both synthetic data and real clinical data.

Exploration in Structured Reinforcement Learning

We address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could be potentially exploited to minimize the exploration of suboptimal (state, action) pairs. For any arbitrary structure, we derive problem-specific regret lower bounds satisfied by any learning algorithm. These lower bounds are made explicit for unstructured MDPs and for those whose transition probabilities and average reward function are Lipschitz continuous w.r.t. the state and action. For Lipschitz MDPs, the bounds are shown not to scale with the sizes S and A of the state and action spaces, i.e., they are smaller than c \log T where T is the time horizon and the constant c only depends on the Lipschitz structure, the span of the bias function, and the minimal action sub-optimality gap. This contrasts with unstructured MDPs where the regret lower bound typically scales as SA \log T . We devise DEL (Directed Exploration Learning), an algorithm that matches our regret lower bounds. We further simplify the algorithm for Lipschitz MDPs, and show that the simplified version is still able to efficiently exploit the structure.

Dual-Primal Graph Convolutional Networks

In recent years, there has been a surge of interest in developing deep learning methods for non-Euclidean structured data such as graphs. In this paper, we propose Dual-Primal Graph CNN, a graph convolutional architecture that alternates convolution-like operations on the graph and its dual. Our approach allows learning both vertex and edge features and generalizes the previous graph attention (GAT) model. We provide extensive experimental validation showing state-of-the-art results on a variety of tasks tested on established graph benchmarks, including CORA and Citeseer citation networks as well as MovieLens, Flixter, Douban and Yahoo Music graph-guided recommender systems.

Efficient Time-Evolving Stream Processing at Scale

Time-evolving stream datasets are ubiquitous in real-world applications, and their inherent hot keys often evolve over time. Nevertheless, few existing solutions can provide efficient load balancing on these time-evolving datasets while preserving low memory overhead. In this paper, we present a novel grouping approach (named FISH) that provides efficient time-evolving stream processing at scale. The key insight of this work is that the keys of time-evolving stream data can have a skewed distribution within any bounded time interval. This makes it possible to accurately identify the recent hot keys for real-time load balancing within a bounded scope. We therefore propose an epoch-based recent hot key identification with specialized intra-epoch frequency counting (for maintaining low memory overhead) and inter-epoch hotness decaying (for suppressing superfluous computation). We also propose to heuristically infer accurate information about remote workers through computation rather than communication for cost-efficient worker assignment. We have integrated our approach into Apache Storm. Our results on a cluster of 128 nodes for both synthetic and real-world stream datasets show that FISH significantly outperforms the state of the art, reducing the average and the 99th percentile latency by 87.12% and 76.34% (vs. W-Choices) and memory overhead by 99.96% (vs. Shuffle Grouping).
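
The epoch-based idea described above (count keys within an epoch, then decay hotness between epochs) can be illustrated with a minimal sketch. This is not the FISH implementation: the class name `EpochHotKeys`, the parameters `epoch_size`, `decay`, and `threshold`, and the pruning cutoff are all assumptions made for illustration.

```python
from collections import Counter


class EpochHotKeys:
    """Toy tracker: per-epoch counts plus hotness that decays across epochs."""

    def __init__(self, epoch_size=4, decay=0.5, threshold=2.0):
        self.epoch_size = epoch_size  # tuples per epoch (hypothetical parameter)
        self.decay = decay            # inter-epoch decay factor (hypothetical)
        self.threshold = threshold    # hotness needed to be reported as hot
        self.counts = Counter()       # intra-epoch frequency counts
        self.hotness = {}             # decayed hotness carried across epochs
        self.seen = 0                 # tuples observed in the current epoch

    def observe(self, key):
        """Count one occurrence of key and close the epoch when it is full."""
        self.counts[key] += 1
        self.seen += 1
        if self.seen == self.epoch_size:
            self._close_epoch()

    def _close_epoch(self):
        # Decay the old hotness, then add the counts from the finished epoch.
        updated = {}
        for k in set(self.hotness) | set(self.counts):
            h = self.hotness.get(k, 0.0) * self.decay + self.counts.get(k, 0)
            if h >= 0.5:  # assumed cutoff: forget faded keys to bound memory
                updated[k] = h
        self.hotness = updated
        self.counts = Counter()
        self.seen = 0

    def hot_keys(self):
        """Keys whose decayed hotness currently meets the threshold."""
        return sorted(k for k, h in self.hotness.items() if h >= self.threshold)


tracker = EpochHotKeys()
for key in "aaabbbbc":
    tracker.observe(key)
print(tracker.hot_keys())
```

Because only the current epoch's counts and the surviving decayed scores are stored, memory in this sketch stays bounded by the number of recently active keys rather than by the full key space.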

Psychological State in Text: A Limitation of Sentiment Analysis

Starting with the idea that sentiment analysis models should be able to predict not only positive or negative but also other psychological states of a person, we implement a sentiment analysis model to investigate the relationship between the model and emotional state. We first examine psychological measurements of 64 participants and ask them to write a book report about a story. After that, we train our sentiment analysis model using crawled movie review data. We finally evaluate participants’ writings, using the pretrained model as a form of transfer learning. The results show that the sentiment analysis model performs well at predicting a score, but the score does not have any correlation with the participants’ self-checked sentiment.

Minnorm training: an algorithm for training overcomplete deep neural networks

In this work, we propose a new training method for finding minimum weight norm solutions in over-parameterized neural networks (NNs). This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem wherein the sum of the norm of the weights in each layer of the network is minimized, under the constraint of exactly fitting training data. It draws inspiration from support vector machines (SVMs), which are able to generalize well, despite often having an infinite number of free parameters in their primal form, and from recent theoretical generalization bounds on NNs which suggest that lower norm solutions generalize better. To solve this constrained optimization problem, our method employs Lagrange multipliers that act as integrators of error over training and identify ‘support vector’-like examples. The method can be implemented as a wrapper around gradient based methods and uses standard back-propagation of gradients from the NN for both regression and classification versions of the algorithm. We provide theoretical justifications for the effectiveness of this algorithm in comparison to early stopping and L_2-regularization using simple, analytically tractable settings. In particular, we show faster convergence to the max-margin hyperplane in a shallow network (compared to vanilla gradient descent); faster convergence to the minimum-norm solution in a linear chain (compared to L_2-regularization); and initialization-independent generalization performance in a deep linear network. Finally, using the MNIST dataset, we demonstrate that this algorithm can boost test accuracy and identify difficult examples in real-world datasets.

Dense Information Flow for Neural Machine Translation

Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework. From the optimization perspective, residual connections are adopted to improve learning performance for both encoder and decoder in most of these deep architectures, and advanced attention connections are applied as well. Inspired by the success of the DenseNet model in computer vision problems, in this paper, we propose a densely connected NMT architecture (DenseNMT) that is able to train more efficiently for NMT. The proposed DenseNMT not only allows dense connection in creating new features for both encoder and decoder, but also uses the dense attention structure to improve attention quality. Our experiments on multiple datasets show that DenseNMT structure is more competitive and efficient.

Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression

In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ a factorized training process and then combine predictions from distributed experts. The state-of-the-art aggregation models, however, either provide inconsistent predictions or require a time-consuming aggregation process. We first prove the inconsistency of typical aggregations using disjoint or random data partitions, and then present a consistent yet efficient aggregation model for large-scale GP. The proposed model inherits the advantages of aggregations, e.g., closed-form inference and aggregation, parallelization and distributed computing. Furthermore, theoretical and empirical analyses reveal that the new aggregation model performs better due to the consistent predictions that converge to the true underlying function when the training size approaches infinity.
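
To make "combine predictions from distributed experts" concrete, here is a minimal sketch of the standard product-of-experts rule, one of the simplest aggregation schemes, which weights each expert's Gaussian prediction by its precision. It is not the model proposed in the abstract, and the function name `poe_aggregate` is an assumption for illustration.

```python
def poe_aggregate(means, variances):
    """Combine Gaussian expert predictions at one test point by precision weighting."""
    if not means or len(means) != len(variances):
        raise ValueError("need at least one expert and one variance per mean")
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    # The aggregated precision is the sum of the expert precisions.
    variance = 1.0 / total_precision
    # The aggregated mean is the precision-weighted average of the expert means.
    mean = variance * sum(p * m for p, m in zip(precisions, means))
    return mean, variance


# Two equally confident experts that disagree about the mean.
print(poe_aggregate([1.0, 3.0], [1.0, 1.0]))
```

Note that the aggregated variance in this rule shrinks with every added expert regardless of whether the experts agree, which is a commonly cited source of overconfidence in simple aggregations.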

Diagnosis of Anomaly in the Dynamic State Estimator of a Power System using System Decomposition

In a state estimator, the presence of malicious or simply corrupt sensor data or bad data is detected by the high value of normalized measurement residuals that exceeds the threshold value, determined by the \chi^2 distribution. However, high normalized residuals can also be caused by another type of anomaly, namely gross modeling or topology error. In this paper we propose a method to distinguish between these two sources of anomalies: 1) malicious sensor data and 2) modeling error. The anomaly detector will start with assuming a case of malicious data and suspect some of the individual measurements corresponding to the highest normalized residuals to be ‘malicious’, unless proved otherwise. Then, choosing a change of basis, the state space is transformed and decomposed into ‘observable’ and ‘unobservable’ parts with respect to these ‘suspicious’ measurements. We argue that, while the anomaly due to malicious data can only affect the ‘observable’ part of the states, there exists no such restriction for anomalies due to modeling error. Numerical results illustrate how the proposed anomaly diagnosis based on Kalman decomposition can successfully distinguish between the two types of anomalies.

On Multi-Layer Basis Pursuit, Efficient Algorithms and Convolutional Neural Networks

Parsimonious representations in data modeling are ubiquitous and central for processing information. Motivated by the recent Multi-Layer Convolutional Sparse Coding (ML-CSC) model, we herein generalize the traditional Basis Pursuit regression problem to a multi-layer setting, introducing similar sparse enforcing penalties at different representation layers in a symbiotic relation between synthesis and analysis sparse priors. We propose and analyze different iterative algorithms to solve this new problem in practice. We prove that the presented multi-layer Iterative Soft Thresholding (ML-ISTA) and multi-layer Fast ISTA (ML-FISTA) converge to the global optimum of our multi-layer formulation at a rate of \mathcal{O}(1/k) and \mathcal{O}(1/k^2), respectively. We further show how these algorithms effectively implement particular recurrent neural networks that generalize feed-forward architectures without any increase in the number of parameters. We demonstrate the different architectures resulting from unfolding the iterations of the proposed multi-layer pursuit algorithms, providing a principled way to construct deep recurrent CNNs from feed-forward ones. We demonstrate the emerging constructions by training them in an end-to-end manner, consistently improving the performance of classical networks without introducing extra filters or parameters.
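
As background for the multi-layer algorithms named above, the following is a minimal sketch of classical single-layer ISTA for the problem min_x 0.5 ||y - D x||^2 + lam ||x||_1. It is not ML-ISTA or ML-FISTA; the function names and the plain list-of-rows matrix format are assumptions for illustration.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D, y, lam, step, iters=100):
    """Iterative soft thresholding for 0.5 * ||y - D x||^2 + lam * ||x||_1.

    D is a list of rows. The step should be at most 1 / L, where L is the
    largest eigenvalue of D^T D.
    """
    n_rows, n_cols = len(D), len(D[0])
    x = [0.0] * n_cols
    for _ in range(iters):
        # Residual r = D x - y.
        r = [sum(D[i][j] * x[j] for j in range(n_cols)) - y[i] for i in range(n_rows)]
        # Gradient of the smooth term: g = D^T r.
        g = [sum(D[i][j] * r[i] for i in range(n_rows)) for j in range(n_cols)]
        # Gradient step followed by shrinkage.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n_cols)]
    return x


# With an identity dictionary the minimizer is the soft-thresholded signal.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(ista(identity, [3.0, -0.5, 1.5], lam=1.0, step=1.0, iters=10))
```

Each iteration is a linear map followed by a pointwise nonlinearity, which is the structural link to network layers that unfolding-based constructions rely on.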

NLP-assisted software testing: a systematic review

Context: To reduce the manual effort of extracting test cases from natural-language requirements, many approaches based on Natural Language Processing (NLP) have been proposed in the literature. Given the large number of approaches in this area, and since many practitioners are eager to utilize such techniques, it is important to synthesize and provide an overview of the state-of-the-art in this area. Objective: Our objective is to summarize the state-of-the-art in NLP-assisted software testing, which could benefit practitioners seeking to utilize those NLP-based techniques and benefit researchers by providing an overview of the research landscape. Method: To address the above need, we conducted a survey in the form of a systematic literature mapping (classification) and systematic literature review. After compiling an initial pool of 57 papers, we conducted a systematic voting, and our final pool included 50 technical papers. Results: This review paper provides an overview of contribution types in the papers, types of NLP approaches used to assist software testing, types of required input requirements, and a review of tool support in this area. Among our results are the following: (1) only 2 of the 28 tools (7%) presented in the papers are available for download; (2) a larger ratio of the papers (23 of 50) provided a shallow exposure to the NLP aspects (almost no details). Conclusion: We believe that this paper would benefit both practitioners and researchers by serving as an ‘index’ to the body of knowledge in this area. The results could help practitioners by enabling them to utilize any of the existing NLP-based techniques to reduce the cost of test-case design and decrease the amount of human resources spent on test activities. Initial insights, after sharing this review with some of our industrial collaborators, show that this review can indeed be useful and beneficial to practitioners.

Stress Test Evaluation for Natural Language Inference

Natural language inference (NLI) is the task of determining if a natural language hypothesis can be inferred from a given premise in a justifiable manner. NLI was proposed as a benchmark task for natural language understanding. Existing models perform well at standard datasets for NLI, achieving impressive results across different genres of text. However, the extent to which these models understand the semantic content of sentences is unclear. In this work, we propose an evaluation methodology consisting of automatically constructed ‘stress tests’ that allow us to examine whether systems have the ability to make real inferential decisions. Our evaluation of six sentence-encoder models on these stress tests reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena, and suggests important directions for future work in this area.

Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction

Time series prediction has been studied in a variety of domains. However, it is still challenging to predict future series given historical observations and past exogenous data. Existing methods either fail to consider the interactions among different components of exogenous variables which may affect the prediction accuracy, or cannot model the correlations between exogenous data and target data. Besides, the inherent temporal dynamics of exogenous data are also related to the target series prediction, and thus should be considered as well. To address these issues, we propose an end-to-end deep learning model, i.e., Hierarchical attention-based Recurrent Highway Network (HRHN), which incorporates spatio-temporal feature extraction of exogenous variables and temporal dynamics modeling of target variables into a single framework. Moreover, by introducing the hierarchical attention mechanism, HRHN can adaptively select the relevant exogenous features at different semantic levels. We carry out comprehensive empirical evaluations with various methods over several datasets, and show that HRHN outperforms the state of the art in time series prediction, especially in capturing sudden changes and sudden oscillations of time series.

Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling

Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. In this paper, we study the nature of diffusion and damping effect of nonlocal networks by doing the spectrum analysis on the weight matrices of the well-trained networks, and propose a new formulation of the nonlocal block. The new block not only learns the nonlocal interactions but also has stable dynamics and thus allows deeper nonlocal structures. Moreover, we interpret our formulation from the general nonlocal modeling perspective, where we make connections between the proposed nonlocal network and other nonlocal models, such as nonlocal diffusion processes and nonlocal Markov jump processes.

Emotion Detection in Text: a Review

In recent years, emotion detection in text has become more popular due to its vast potential applications in marketing, political science, psychology, human-computer interaction, artificial intelligence, etc. Access to a huge amount of textual data, especially opinionated and self-expression text, has also played a special role in bringing attention to this field. In this paper, we review the work that has been done in identifying emotion expressions in text and argue that although many techniques, methodologies, and models have been created to detect emotion in text, there are various reasons that make these methods insufficient. Although there is an essential need to improve the design and architecture of current systems, factors such as the complexity of human emotions and the use of implicit and metaphorical language in expressing them lead us to think that just re-purposing standard methodologies will not be enough to capture these complexities, and it is important to pay attention to the linguistic intricacies of emotion expression.

Optimal Clustering under Uncertainty

Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixtures of Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.

Idealised Bayesian Neural Networks Cannot Have Adversarial Examples: Theoretical and Empirical Study

We prove that idealised discriminative Bayesian neural networks, capturing perfect epistemic uncertainty, cannot have adversarial examples: Techniques for crafting adversarial examples will necessarily fail to generate perturbed images which fool the classifier. This suggests why MC dropout-based techniques have been observed to be fairly robust to adversarial examples. We support our claims mathematically and empirically. We experiment with HMC on synthetic data derived from MNIST for which we know the ground truth image density, showing that near-perfect epistemic uncertainty correlates to density under image manifold, and that adversarial images lie off the manifold. Using our new-found insights we suggest a new attack for MC dropout-based models by looking for imperfections in uncertainty estimation, and also suggest a mitigation. Lastly, we demonstrate our mitigation on a cats-vs-dogs image classification task with a VGG13 variant.

Locally Interpretable Models and Effects based on Supervised Partitioning (LIME-SUP)

Supervised Machine Learning (SML) algorithms such as Gradient Boosting, Random Forest, and Neural Networks have become popular in recent years due to their increased predictive performance over traditional statistical methods. This is especially true with large data sets (millions or more observations and hundreds to thousands of predictors). However, the complexity of the SML models makes them opaque and hard to interpret without additional tools. There has been a lot of interest recently in developing global and local diagnostics for interpreting and explaining SML models. In this paper, we propose locally interpretable models and effects based on supervised partitioning (trees) referred to as LIME-SUP. This is in contrast with the KLIME approach that is based on clustering the predictor space. We describe LIME-SUP based on fitting trees to the fitted response (LIME-SUP-R) as well as the derivatives of the fitted response (LIME-SUP-D). We compare the results with KLIME and describe its advantages using simulation and real data.

DAQN: Deep Auto-encoder and Q-Network

Deep reinforcement learning methods usually require a large number of training images and executed actions to obtain sufficient results. When such a method is extended to a real task in a real environment with an actual robot, it requires even more training images due to the complexity or noise of the input images, and executing a large number of actions on the real robot also becomes a serious problem. Therefore, we propose an extended deep reinforcement learning method that applies a generative model to initialize the network, reducing the number of training trials. In this paper, we use a deep Q-network as the deep reinforcement learning method and a deep auto-encoder as the generative model. We conducted experiments on three different tasks: a cart-pole game, an Atari game, and a real game with an actual robot. The proposed method trained more efficiently than the previous method on all tasks, and in particular 2.5 times faster on a task with real environment images.

A Novel Framework for Recurrent Neural Networks with Enhancing Information Processing and Transmission between Units

This paper proposes a novel framework for recurrent neural networks (RNNs), inspired by human memory models from cognitive neuroscience, to enhance information processing and transmission between adjacent RNN units. The proposed framework consists of three stages: working memory, forget, and long-term store. The first stage takes input data into sensory memory and transfers it to working memory for preliminary treatment. The second stage mainly focuses on proactively forgetting the secondary information rather than the primary information in working memory. Finally, we obtain the long-term store, normally using some kind of RNN unit. Our framework, which is generalized and simple, is evaluated on 6 datasets which fall into 3 different tasks, corresponding to text classification, image classification and language modelling. Experiments reveal that our framework can clearly improve the performance of traditional recurrent neural networks. An exploratory task shows the ability of our framework to correctly forget the secondary information.

Detecting Adversarial Examples via Key-based Network

Though deep neural networks have achieved state-of-the-art performance in visual classification, recent studies have shown that they are all vulnerable to the attack of adversarial examples. Small and often imperceptible perturbations to the input images are sufficient to fool the most powerful deep neural networks. Various defense methods have been proposed to address this issue. However, they either require knowledge on the process of generating adversarial examples, or are not robust against new attacks specifically designed to penetrate the existing defense. In this work, we introduce key-based network, a new detection-based defense mechanism to distinguish adversarial examples from normal ones based on error correcting output codes, using the binary code vectors produced by multiple binary classifiers applied to randomly chosen label-sets as signatures to match normal images and reject adversarial examples. In contrast to existing defense methods, the proposed method does not require knowledge of the process for generating adversarial examples and can be applied to defend against different types of attacks. For the practical black-box and gray-box scenarios, where the attacker does not know the encoding scheme, we show empirically that key-based network can effectively detect adversarial examples generated by several state-of-the-art attacks.
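
The signature-matching step described above can be sketched as follows. This is an illustrative toy, not the authors' system: in the abstract the bits come from trained binary classifiers, whereas here the bit vector is assumed to be given, and the names `build_codebook` and `detect` and the `max_distance` tolerance are hypothetical.

```python
import random


def build_codebook(num_classes, num_bits, seed=0):
    """Derive one binary codeword per class from randomly chosen label-sets."""
    rng = random.Random(seed)  # the seed plays the role of the secret key
    # Bit i is defined by a random subset of classes (label-set of classifier i).
    label_sets = [
        {c for c in range(num_classes) if rng.random() < 0.5}
        for _ in range(num_bits)
    ]
    codebook = {
        c: tuple(int(c in s) for s in label_sets) for c in range(num_classes)
    }
    return label_sets, codebook


def detect(bits, codebook, max_distance=0):
    """Return the best-matching class, or None to reject the input."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    best_class, best_dist = None, None
    for c, code in codebook.items():
        # Hamming distance between the observed bits and this class signature.
        dist = sum(b != k for b, k in zip(bits, code))
        if best_dist is None or dist < best_dist:
            best_class, best_dist = c, dist
    return best_class if best_dist <= max_distance else None


# Hand-written codebook for three classes and four binary classifiers.
codebook = {0: (0, 0, 1, 1), 1: (0, 1, 0, 1), 2: (1, 0, 0, 1)}
print(detect((0, 1, 0, 1), codebook))  # matches the signature of class 1
print(detect((1, 1, 1, 0), codebook))  # matches no signature, so it is rejected
```

An input is accepted only if its bit pattern is close to a valid codeword. An attacker who does not know the label-sets cannot easily tell which bit patterns are valid, which is the intuition behind the black-box and gray-box settings mentioned in the abstract.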

Autoencoders Learn Generative Linear Models

Recent progress in learning theory has led to the emergence of provable algorithms for training certain families of neural networks. Under the assumption that the training data is sampled from a suitable generative model, the weights of the trained networks obtained by these algorithms recover (either exactly or approximately) the generative model parameters. However, the large majority of these results are only applicable to supervised learning architectures. In this paper, we complement this line of work by providing a series of results for unsupervised learning with neural networks. Specifically, we study the familiar setting of shallow autoencoder architectures with shared weights. We focus on three generative models for the data: (i) the mixture-of-gaussians model, (ii) the sparse coding model, and (iii) the non-negative sparsity model. All three models are widely studied in the machine learning literature. For each of these models, we rigorously prove that under suitable choices of hyperparameters, architectures, and initialization, the autoencoder weights learned by gradient descent can successfully recover the parameters of the corresponding model. To our knowledge, this is the first result that rigorously studies the dynamics of gradient descent for weight-sharing autoencoders. Our analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as unsupervised feature training mechanisms for a wide range of datasets, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.

Targeted Kernel Networks: Faster Convolutions with Attentive Regularization

We propose Attentive Regularization (AR), a method to constrain the activation maps of kernels in Convolutional Neural Networks (CNNs) to specific regions of interest (ROIs). Each kernel learns a location of specialization along with its weights through standard backpropagation. A differentiable attention mechanism requiring no additional supervision is used to optimize the ROIs. Traditional CNNs of different types and structures can be modified with this idea into equivalent Targeted Kernel Networks (TKNs), while keeping the network size nearly identical. By restricting kernel ROIs, we reduce the number of sliding convolutional operations performed throughout the network in its forward pass, speeding up both training and inference. We evaluate our proposed architecture on both synthetic and natural tasks across multiple domains. TKNs obtain significant improvements over baselines, requiring less computation (around an order of magnitude) while achieving superior performance.

Study and development of a Computer-Aided Diagnosis system for classification of chest x-ray images using convolutional neural networks pre-trained for ImageNet and data augmentation
Garbage Collection in Concurrent Sets
A Column Generation Algorithm for Vehicle Scheduling and Routing Problems
Studying Politically Vulnerable Communities Online: Ethical Dilemmas, Questions, and Solutions
Machine learning of quantum phase transitions
Analysis of regularized Nyström subsampling for regression functions of low smoothness
Short rainbow cycles in sparse graphs
Convergence to the Mean Field Game Limit: A Case Study
Low-Overhead Hierarchically-Sparse Channel Estimation for Multiuser Wideband Massive MIMO
Minmax Regret 1-Sink for Aggregate Evacuation Time on Path Networks
Wideband Massive MIMO Channel Estimation via Sequential Atomic Norm Minimization
Learning Semantic Sentence Embeddings using Pair-wise Discriminator
k-Space Deep Learning for Parallel MRI: Application to Time-Resolved MR Angiography
Admissible Abstractions for Near-optimal Task and Motion Planning
NAM: Non-Adversarial Unsupervised Domain Mapping
AID++: An Updated Version of AID on Scene Classification
ProFlow: Learning to Predict Optical Flow
Identification of Conduit Countries and Community Structures in the Withholding Tax Networks
Tunable degenerate two-dimensional optomechanical system
Echo state networks are universal
Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014
Jackknife Empirical Likelihood Methods for Gini Correlations and their Equality Testing
On majorization of closed walks vector of trees with given degree sequences
Design and evaluation of a genomics variant analysis pipeline using GATK Spark tools
Chromatic numbers of directed hypergraphs with no ‘bad’ cycles
Building Advanced Dialogue Managers for Goal-Oriented Dialogue Systems
Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction
Efficient Two-Level Scheduling for Concurrent Graph Processing
A Note on Many-server Fluid Models with Time-varying Arrivals
Low Cost Edge Sensing for High Quality Demosaicking
ECG encryption and identification based security solution on the Zynq SoC for connected health systems
Research Challenges in Nextgen Service Orchestration
Scaling Up Large-Scale Graph Processing for GPU-Accelerated Heterogeneous Systems
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately and Affordably
An Efficient Graph Accelerator with Parallel Data Conflict Management
TI-CNN: Convolutional Neural Networks for Fake News Detection
Eye in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Network
Cramér’s estimate for stable processes with power drift
How does climate change influence regional stability
Second-Order Asymptotically Optimal Statistical Classification
Contextualize, Show and Tell: A Neural Visual Storyteller
Content-based Video Relevance Prediction Challenge: Data, Protocol, and Baseline
Spanning trees with at most 2 branch vertices in claw-free graphs
Bandwidth selection for kernel density estimators of multivariate level sets and highest density regions
Partitioning transitive tournaments into isomorphic digraphs
Data-Free/Data-Sparse Softmax Parameter Estimation with Structured Class Geometries
Closed-loop Bayesian Semantic Data Fusion for Collaborative Human-Autonomy Target Search
Explainable Social Contextual Image Recommendation with Hierarchical Attention
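Representaciones Circulares de Grafos Simples Conexos y el Rango Mínimo Semidefinido de un Delta Grafo (Circular Representations of Simple Connected Graphs and the Minimum Semidefinite Rank of a Delta Graph)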
An Interpretable Deep Hierarchical Semantic Convolutional Neural Network for Lung Nodule Malignancy Classification
Learning and Generalizing Motion Primitives from Driving Data for Path-Tracking Applications
Primal-Dual Frank-Wolfe for Constrained Stochastic Programs with Convex and Non-convex Objectives
An Enhancement of Decimation Process using Fast Cascaded Integrator Comb (CIC) Filter
Quantifying the dynamics of topical fluctuations in language
Fast Exact Univariate Kernel Density Estimation
Some coefficient sequences related to the descent polynomial
Synthesis methods for reversible circuits consisting of NOT, CNOT and 2-CNOT gates (Ph.D. thesis)
Excessive Backlog Probabilities of Two Parallel Queues
Deep Pepper: Expert Iteration based Chess agent in the Reinforcement Learning Setting
AutoRally: An open platform for aggressive autonomous driving
A Geometric Approach for Real-time Monitoring of Dynamic Large Scale Graphs: AS-level graphs illustrated
Hybrid Data-Sharing and Compression Strategy for Downlink Cloud Radio Access Network
Rejection Sampling for Tempered Lévy Processes
Ill-posed Estimation in High-Dimensional Models with Instrumental Variables
Estimating Local Daytime Population Density from Census and Payroll Data
Robust Seriation and Applications to Cancer Genomics
Capacity of Single-Server Single-Message Private Information Retrieval with Coded Side Information
Scraping and Preprocessing Commercial Auction Data for Fraud Classification
Signal and Noise Statistics Oblivious Orthogonal Matching Pursuit
Binary Classification with Karmic, Threshold-Quasi-Concave Metrics
On Minrank and Forbidden Subgraphs
Quality-Assured Synchronized Task Assignment in Crowdsourcing
On an Exact Penalty Result and New Constraint Qualifications for Mathematical Programs with Vanishing Constraints
Squeeze-and-Excitation on Spatial and Temporal Deep Feature Space for Action Recognition
Simple Fast Vectorial Solution to The Rigid 3D Registration Problem
AP18-OLR Challenge: Three Tasks and Their Baselines
Multiplex Communities and the Emergence of International Conflict
Anomalous cumulative inertia in human behaviour
Accounting for the Neglected Dimensions of AI Progress
GamePad: A Learning Environment for Theorem Proving
Semantic-Aware Generative Adversarial Nets for Unsupervised Domain Adaptation in Chest X-ray Segmentation
Random integral matrices: universality of surjectivity and the cokernel
BoxNet: Deep Learning Based Biomedical Image Segmentation Using Boxes Only Annotation
Asynchronous Batch and PIR Codes from Hypergraphs
Does the brain represent words? An evaluation of brain decoding studies of language understanding
Efficient Entropy for Policy Gradient with Multidimensional Action Space
Fast Locality Sensitive Hashing for Beam Search on GPU
On Minimum Cost Sparsest Input-Connectivity for Controllability of Linear Systems
Monocular Depth Estimation with Augmented Ordinal Depth Relationships
Sequential sampling of junction trees for decomposable graphs
Federated Learning with Non-IID Data
SCAN: Sliding Convolutional Attention Network for Scene Text Recognition
Efficient Interactive Search for Geo-tagged Multimedia Data
Variable Selection for Nonparametric Learning with Power Series Kernels
Introduction to Network Games with Linear Best Responses
Paracontrolled quasi-geostrophic equation with space-time white noise
Minimax adaptive wavelet estimator for the simultaneous blind deconvolution with fractional Gaussian noise
CubeSLAM: Monocular 3D Object Detection and SLAM without Prior Models
Intrinsic Isometric Manifold Learning with Application to Localization
Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
Bayesian approach to model-based extrapolation of nuclear observables
Return of the Infinitesimal Jackknife
A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models
Spatially Localized Atlas Network Tiles Enables 3D Whole Brain Segmentation from Limited Data
The Externalities of Exploration and How Data Diversity Helps Exploitation
Extension Complexity of the Correlation Polytope
Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling
Run Procrustes, Run! On the convergence of accelerated Procrustes Flow
Efficient, Certifiably Optimal High-Dimensional Clustering
The Stock Market Has Grown Unstable Since February 2018
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7
The lemniscate tree of a random polynomial
Generalized modes in Bayesian inverse problems
Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training
Performance Based Cost Functions for End-to-End Speech Separation
Semi-Recurrent CNN-based VAE-GAN for Sequential Data Generation
Smoothness of continuous state branching with immigration semigroups
Backpropagation for Implicit Spectral Densities
A Tradeoff between the Sub-Packetization Size and the Repair Bandwidth for Reed-Solomon Codes
Near-perfect clique-factors in sparse pseudorandom graphs
Machines hear better when they have ears
Extreme values of CUE characteristic polynomials: a numerical study
A Highly Parallel FPGA Implementation of Sparse Neural Network Training
Comply/Constrain Subtraction
Improving Dialogue Act Classification for Spontaneous Arabic Speech and Instant Messages at Utterance Level
Bayesian nonparametric inference for the covariate-adjusted ROC curve
The unicyclic graphs with the second smallest normalized Laplacian eigenvalue no less than $1-\frac{\sqrt{6}}{3}$