Deep Reinforcement Learning for Programming Language Correction

Novice programmers often struggle with the formal syntax of programming languages. To assist them, we design a novel programming language correction framework amenable to reinforcement learning. The framework allows an agent to mimic human actions for text navigation and editing. We demonstrate that the agent can be trained through self-exploration directly from the raw input, that is, program text itself, without any knowledge of the formal syntax of the programming language. We leverage expert demonstrations for one tenth of the training data to accelerate training. The proposed technique is evaluated on 6975 erroneous C programs with typographic errors, written by students during an introductory programming course. Our technique fixes 14% more programs and 29% more compiler error messages relative to those fixed by a state-of-the-art tool, DeepFix, which uses a fully supervised neural machine translation approach.

Nonparametric Quantile-Based Causal Discovery

Telling cause from effect using observational data is a challenging problem, especially in the bivariate case. Contemporary methods often assume an independence between the cause and the generating mechanism of the effect given the cause. From this postulate, they derive asymmetries to uncover causal relationships. In this work, we propose such an approach, based on the link between Kolmogorov complexity and quantile scoring. We use a nonparametric conditional quantile estimator based on copulas to implement our procedure, thus avoiding restrictive assumptions about the joint distribution between cause and effect. In an extensive study on real and synthetic data, we show that quantile copula causal discovery (QCCD) compares favorably to state-of-the-art methods, while at the same time being computationally efficient and scalable.

A-Tree: A Bounded Approximate Index Structure

Index structures are one of the most important tools that DBAs leverage in order to improve the performance of analytics and transactional workloads. However, with the explosion of data that is constantly being generated in a wide variety of domains including autonomous vehicles, Internet of Things (IoT) devices, and E-commerce sites, building several indexes can often become prohibitive and consume valuable system resources. In fact, a recent study has shown that indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a state-of-the-art in-memory DBMS. This overhead consumes valuable and expensive main memory, and limits the amount of space that a database has available to store new data or process existing data. In this paper, we present a novel approximate index structure called A-Tree. At the core of our index is a tunable error parameter that allows a DBA to balance lookup performance and space consumption. To navigate this tradeoff, we provide a cost model that helps the DBA choose an appropriate error parameter given either (1) a lookup latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using a variety of real-world datasets, we show that our index structure is able to provide performance that is comparable to full index structures while reducing the storage footprint by orders of magnitude.

Compressed Anomaly Detection with Multiple Mixed Observations

We consider a collection of independent random variables that are identically distributed, except for a small subset which follows a different, anomalous distribution. We study the problem of detecting which random variables in the collection are governed by the anomalous distribution. Recent work proposes to solve this problem by conducting hypothesis tests based on mixed observations (e.g. linear combinations) of the random variables. Recognizing the connection between taking mixed observations and compressed sensing, we view the problem as recovering the ‘support’ (index set) of the anomalous random variables from multiple measurement vectors (MMVs). Many algorithms have been developed for recovering jointly sparse signals and their support from MMVs. We establish the theoretical and empirical effectiveness of these algorithms at detecting anomalies. We also extend the LASSO algorithm to an MMV version for our purpose. Further, we perform experiments on synthetic data, consisting of samples from the random variables, to explore the trade-off between the number of mixed observations per sample and the number of samples required to detect anomalies.

A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees

The cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure. The Monte-Carlo version of the CE method employs the naive sample averaging technique which is inefficient, both computationally and space wise. We provide a novel stochastic approximation version of the CE method, where the sample averaging is replaced with incremental geometric averaging. This approach can save considerable computational and storage costs. Our algorithm is incremental in nature and possesses additional attractive features such as accuracy, stability, robustness and convergence to the global optimum for a particular class of objective functions. We evaluate the algorithm on a variety of global optimization benchmark problems and the results obtained corroborate our theoretical findings.

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies by soft probabilities between every two tokens, but they are not effective and efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, ‘reinforced self-attention (ReSA)’, for the mutual benefit of each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called ‘reinforced sequence sampling (RSS)’, selecting tokens in parallel and trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, ‘reinforced self-attention network (ReSAN)’, solely based on ReSA. It achieves state-of-the-art performance on both Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.

Nested LSTMs

We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t, NLSTM memory cells use the concatenation (f_t \odot c_{t-1}, i_t \odot g_t) as input to an inner LSTM (or NLSTM) memory cell, and set c^{outer}_t = h^{inner}_t. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer term dependencies compared with the higher-level units of a stacked LSTM.

ConvCSNet: A Convolutional Compressive Sensing Framework Based on Deep Learning

Compressive sensing (CS), aiming to reconstruct an image/signal from a small set of random measurements has attracted considerable attentions in recent years. Due to the high dimensionality of images, previous CS methods mainly work on image blocks to avoid the huge requirements of memory and computation, i.e., image blocks are measured with Gaussian random matrices, and the whole images are recovered from the reconstructed image blocks. Though efficient, such methods suffer from serious blocking artifacts. In this paper, we propose a convolutional CS framework that senses the whole image using a set of convolutional filters. Instead of reconstructing individual blocks, the whole image is reconstructed from the linear convolutional measurements. Specifically, the convolutional CS is implemented based on a convolutional neural network (CNN), which performs both the convolutional CS and nonlinear reconstruction. Through end-to-end training, the sensing filters and the reconstruction network can be jointly optimized. To facilitate the design of the CS reconstruction network, a novel two-branch CNN inspired from a sparsity-based CS reconstruction model is developed. Experimental results show that the proposed method substantially outperforms previous state-of-the-art CS methods in term of both PSNR and visual quality.

The k-PDTM : a coreset for robust geometric inference

Analyzing the sub-level sets of the distance to a compact sub-manifold of R d is a common method in TDA to understand its topology. The distance to measure (DTM) was introduced by Chazal, Cohen-Steiner and M{\’e}rigot in [7] to face the non-robustness of the distance to a compact set to noise and outliers. This function makes possible the inference of the topology of a compact subset of R d from a noisy cloud of n points lying nearby in the Wasserstein sense. In practice, these sub-level sets may be computed using approximations of the DTM such as the q-witnessed distance [10] or other power distance [6]. These approaches lead eventually to compute the homology of unions of n growing balls, that might become intractable whenever n is large. To simultaneously face the two problems of large number of points and noise, we introduce the k-power distance to measure (k-PDTM). This new approximation of the distance to measure may be thought of as a k-coreset based approximation of the DTM. Its sublevel sets consist in union of k-balls, k << n, and this distance is also proved robust to noise. We assess the quality of this approximation for k possibly dramatically smaller than n, for instance k = n 1 3 is proved to be optimal for 2-dimensional shapes. We also provide an algorithm to compute this k-PDTM.

Deep Learning Works in Practice. But Does it Work in Theory?

Deep learning relies on a very specific kind of neural networks: those superposing several neural layers. In the last few years, deep learning achieved major breakthroughs in many tasks such as image analysis, speech recognition, natural language processing, and so on. Yet, there is no theoretical explanation of this success. In particular, it is not clear why the deeper the network, the better it actually performs. We argue that the explanation is intimately connected to a key feature of the data collected from our surrounding universe to feed the machine learning algorithms: large non-parallelizable logical depth. Roughly speaking, we conjecture that the shortest computational descriptions of the universe are algorithms with inherently large computation times, even when a large number of computers are available for parallelization. Interestingly, this conjecture, combined with the folklore conjecture in theoretical computer science that P \neq NC, explains the success of deep learning.

De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices

Sparse principal component analysis (sPCA) has become one of the most widely used techniques for dimensionality reduction in high-dimensional datasets. The main challenge underlying sPCA is to estimate the first vector of loadings of the population covariance matrix, provided that only a certain number of loadings are non-zero. In this paper, we propose confidence intervals for individual loadings and for the largest eigenvalue of the population covariance matrix. Given an independent sample X^i \in\mathbb R^p, i = 1,...,n, generated from an unknown distribution with an unknown covariance matrix \Sigma_0, our aim is to estimate the first vector of loadings and the largest eigenvalue of \Sigma_0 in a setting where p\gg n. Next to the high-dimensionality, another challenge lies in the inherent non-convexity of the problem. We base our methodology on a Lasso-penalized M-estimator which, despite non-convexity, may be solved by a polynomial-time algorithm such as coordinate or gradient descent. We show that our estimator achieves the minimax optimal rates in \ell_1 and \ell_2-norm. We identify the bias in the Lasso-based estimator and propose a de-biased sparse PCA estimator for the vector of loadings and for the largest eigenvalue of the covariance matrix \Sigma_0. Our main results provide theoretical guarantees for asymptotic normality of the de-biased estimator. The major conditions we impose are sparsity in the first eigenvector of small order \sqrt{n}/\log p and sparsity of the same order in the columns of the inverse Hessian matrix of the population risk.

Solving estimating equations with copulas

Thanks to their aptitude to capture complex dependence structures, copulas are frequently used to glue random variables into a joint model with arbitrary one-dimensional margins. More recently, they have been applied to solve statistical learning problems such as regressions or classification. Framing such approaches as solutions of estimating equations, we generalize them in a unified framework. We derive consistency, asymptotic normality, and validity of the bootstrap for copula-based Z-estimators. We further illustrate the versatility such estimators through theoretical and simulated examples.

Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach

The robustness of neural networks to adversarial examples has received great attention due to security implications. Despite various attack approaches to crafting visually imperceptible adversarial examples, little has been developed towards a comprehensive measure of robustness. In this paper, we provide a theoretical justification for converting robustness analysis into a local Lipschitz constant estimation problem, and propose to use the Extreme Value Theory for efficient evaluation. Our analysis yields a novel robustness metric called CLEVER, which is short for Cross Lipschitz Extreme Value for nEtwork Robustness. The proposed CLEVER score is attack-agnostic and computationally feasible for large neural networks. Experimental results on various networks, including ResNet, Inception-v3 and MobileNet, show that (i) CLEVER is aligned with the robustness indication measured by the \ell_2 and \ell_\infty norms of adversarial examples from powerful attacks, and (ii) defended networks using defensive distillation or bounded ReLU indeed achieve better CLEVER scores. To the best of our knowledge, CLEVER is the first attack-independent robustness metric that can be applied to any neural network classifier.

Learning to Classify from Impure Samples
Sometimes You Want to Go Where Everybody Knows your Name
A Rational Distributed Process-level Account of Independence Judgment
Enhanced Max-Min SINR for Uplink Cell-Free Massive MIMO Systems
Cell-Free Massive MIMO with Limited Backhaul
DeepDTA: Deep Drug-Target Binding Affinity Prediction
Generating Wikipedia by Summarizing Long Sequences
A novel methodology on distributed representations of proteins using their interacting ligands
A Framework for the Dynamic Programming Principle and Martingale-generated Control Correspondences
Noise-robust quantum sensing via optimal multi-probe spectroscopy
On the Horadam symbol elements
Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains
Multidimensional Scaling of Noisy High Dimensional Data
Energy-efficient Deployment of Relay Nodes in Wireless Sensor Networks using Evolutionary Techniques
Low-rank Bandit Methods for High-dimensional Dynamic Pricing
A scalable estimate of the extra-sample prediction error via approximate leave-one-out
FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling
Multivariate Specification Tests Based on a Dynamic Rosenblatt Transform
The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval
REOH: Runtime Energy Optimization for Heterogeneous Systems
Kernel Distillation for Gaussian Processes
Cataloging the Visible Universe through Bayesian Inference at Petascale
Learning Video-Story Composition via Recurrent Neural Network
Stochastic Optimization and Control Framework for 5G Network Slicing with Effective Isolation
Optimal Configurations in Coverage Control with Polynomial Costs
An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path
Visually Explainable Recommendation
On the Optimal Recovery Threshold of Coded Matrix Multiplication
Paraphrase-Supervised Models of Compositionality
Netizen-Style Commenting on Fashion Photos: Dataset and Diversity Measures
Action Recognition with Visual Attention on Skeleton Images
Positiveness of the permanent of 4-dimensional polystochastic matrices of order 4
Demonstration of the Relationship between Sensitivity and Identifiability for Inverse Uncertainty Quantification
A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360 Video
Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph
Probability of detection of an extraneous mobile object by autonomous unmanned underwater vehicles as a solution of the Buffon problem
SESR: Single Image Super Resolution with Recursive Squeeze and Excitation Networks
Privacy-Preserving Secret Shared Computations using MapReduce
A Survey of Recent Advances in Texture Representation
Incidence structures near configurations of type $(n_3)$
A Continuous – Time Quantum Walk for Attributed Graphs Matching
An Infinitesimal Probabilistic Model for Principal Component Analysis of Manifold Valued Data
QRMW: Quantum representation of multi wavelength images
Fast and Accurate Reconstruction of Compressed Color Light Field
A CNN-based Spatial Feature Fusion Algorithm for Hyperspectral Imagery Classification
Landau-Ginzburg theory of cortex dynamics: Scale-free avalanches emerge at the edge of synchronization
Multi-factor approximation of rough volatility models
Security of NEQR Quantum Image by Using Quantum Fourier Transform with Blind Trent
Synchronization Detection and Recovery of Steganographic Messages with Adversarial Learning
Noncoherent LDPC-Coded Physical-Layer Network Coding using Multitone FSK
Data driven time scale in Gaussian quasi-likelihood inference
Noise contrastive estimation: asymptotics, comparison with MC-MLE
On the computability of graphons
On the probability that a stationary Gaussian process with spectral gap remains non-negative on a long interval
A structure theorem for generalized affine buildings
Probabilistic Recurrent State-Space Models
Deux améliorations concurrentes des PID
Efficient Algorithms for Measuring the Funnel-likeness of DAGs
Deep Multi-view Learning to Rank
Hardness, Approximability, and Fixed-Parameter Tractability of the Clustered Shortest-Path Tree Problem
Extremal values of the Sackin balance index for rooted binary trees
Robust Designs of Beamforming and Power Splitting for Distributed Antenna Systems with Wireless Energy Harvesting
On the size of the set $AA+A$
A Variable Density Sampling Scheme for Compressive Fourier Transform Interferometry
Hierarchical restricted isometry property for Kronecker product measurements
Robust 3D Human Motion Reconstruction Via Dynamic Template Construction
Weighted Nonlocal Total Variation in Image Processing
From Benedict Cumberbatch to Sherlock Holmes: Character Identification in TV series without a Script
Counting Cells in Time-Lapse Microscopy using Deep Neural Networks
Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks
Irreducible 4-critical triangle-free toroidal graphs
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations
Characteristic polynomials of modified permutation matrices at microscopic scale
Constant Factor Time Optimal Multi-Robot Routing on High-Dimensional Grids in Mostly Sub-Quadratic Time
Parameterized Power Vertex Cover
New Size Hierarchies for Two Way Automata
Deep Predictive Models in Interactive Music
A New Interweave Cognitive Radio Scheme for Out-band Energy Harvesting Systems
Lifted Filtering via Exchangeable Decomposition
Remanufacturing cost analysis under uncertain core quality and return conditions: extreme and non-extreme scenarios
Analysis of Coded Selective-Repeat ARQ via Matrix Signal-Flow Graphs
Learning Families of Formal Languages from Positive and Negative Information
Probabilistic representation for solutions to nonlinear Fokker-Planck equations
Eigenvectors of a matrix under random perturbation
Volumetric Densely Dilated Spatial Pooling ConNets for Prostate Segmentation
Tropical optimization techniques in multi-criteria decision making with Analytical Hierarchy Process
Classifying Conversation in Digital Communication
The Barycenter Method for Direct Optimization
Smoluchowski-Kramers approximation for the damped stochastic wave equation with multiplicative noise in any spatial dimension
Strong coordination of signals and actions over noisy channels with two-sided state information
Assessing student’s achievement gap between ethnic groups in Brazil
A family of OWA operators based on Faulhaber’s formulas
Multi-Layer Competitive-Cooperative Framework for Performance Enhancement of Differential Evolution
On the construction of unbiased estimators for the group testing problem
Message Transmission over Classical Quantum Channels with a Jammer with Side Information: Message Transmission Capacity and Resources
Pilot-Assisted Short-Packet Transmission over Multiantenna Fading Channels: A 5G Case Study
Power models, energy models and libraries for energy-efficient concurrent data structures and algorithms
A Distribution-Free Test of Independence and Its Application to Variable Selection
Feature Decomposition Based Saliency Detection in Electron Cryo-Tomograms
A Novel Centralized Strategy for Coded Caching with Non-uniform Demands
Calculus of convex polyhedra and polyhedral convex functions by utilizing a multiple objective linear programming solver
Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks
Pressureless Euler alignment system with control
Universal Level Statistics of the Out-of-Time-Ordered Operator
Contextuality Analysis of the Double Slit Experiment (With a Glimpse Into Three Slits)
Model compression for faster structural separation of macromolecules captured by Cellular Electron Cryo-Tomography
Drawdown and drawup for fractional Brownian motion with trend
ILPS at TREC 2017 Common Core Track
Hypercube Packings and Coverings with Higher Dimensional Rooks