Rule Primality, Minimal Generating Sets, Turing-Universality and Causal Decomposition in Elementary Cellular Automata

We introduce several concepts, such as prime and composite rules, along with tools and methods for causal composition and decomposition. We discover and prove new universality results in ECA, namely, that the Boolean composition of ECA rules 51 and 118, and of rules 170, 15 and 118, can emulate ECA rule 110 and are thus Turing-universal coupled systems. We construct the 4-colour Turing-universal cellular automaton that carries the Boolean composition of the 2 and 3 ECA rules emulating ECA rule 110 under multi-scale coarse-graining. We find that rules generating the ECA rulespace by Boolean composition are of low complexity and comprise prime rules implementing basic operations that, when composed, enable complex behaviour. We also find a candidate minimal set with only 38 ECA prime rules (and several other small sets) capable of generating all other (non-trivially symmetric) 88 ECA rules under Boolean composition.
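As a rough illustration of the objects in this abstract, the sketch below applies an ECA rule given by its Wolfram number and chains several rules. Treating "Boolean composition" as applying one rule after another, the order of application, and the periodic boundary are all assumptions of this sketch, not definitions taken from the paper. The helper names `eca_step` and `compose` are hypothetical.

```python
def eca_step(rule, cells):
    """One synchronous update of an ECA rule (Wolfram number) on a ring of cells."""
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = 4 * left + 2 * centre + right  # neighbourhood read as a 3-bit number
        out.append((rule >> index) & 1)        # output is bit `index` of the rule number
    return out


def compose(rules, cells):
    """Assumed composition: apply the rules in sequence, first list element first."""
    for rule in rules:
        cells = eca_step(rule, cells)
    return cells


state = [0, 1, 1, 0, 1, 0, 0, 1]
print(eca_step(51, state))        # rule 51 complements every cell
print(eca_step(170, state))       # rule 170 copies the right neighbour (shift left)
print(compose([51, 118], state))  # one assumed composite step of rules 51 and 118
```

Rules such as 51 (complement), 170 (shift) and 15 (shift and complement) are the kind of basic operations the abstract calls low-complexity building blocks.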

Time Series Analysis via Matrix Estimation

We consider the task of interpolating and forecasting a time series in the presence of noise and missing data. As the main contribution of this work, we introduce an algorithm that transforms the observed time series into a matrix, utilizes singular value thresholding to simultaneously recover missing values and de-noise observed entries, and performs linear regression to make predictions. We argue that this method provides meaningful imputation and forecasting for a large class of models: finite sum of harmonics (which approximate stationary processes), non-stationary sublinear trends, Linear Time-Invariant (LTI) systems, and their additive mixtures. In general, our algorithm recovers the hidden state of dynamics based on its noisy observations, like that of a Hidden Markov Model (HMM), provided the dynamics obey the above stated models. We demonstrate on synthetic and real-world datasets that our algorithm outperforms standard software packages not only in the presence of significantly missing data with high levels of noise, but also when the packages are given the underlying model while our algorithm remains oblivious. This is in line with the finite sample analysis for these model classes.

Adversarially Learned One-Class Classifier for Novelty Detection

Novelty detection is the process of identifying the observation(s) that differ in some respect from the training observations (the target class). In reality, the novelty class is often absent during training, poorly sampled or not well defined. Therefore, one-class classifiers can efficiently model such problems. However, due to the unavailability of data from the novelty class, training an end-to-end deep network is a cumbersome task. In this paper, inspired by the success of generative adversarial networks for training deep models in unsupervised and semi-supervised settings, we propose an end-to-end architecture for one-class classification. Our architecture is composed of two deep networks, each of which is trained by competing with the other while collaborating to understand the underlying concept in the target class, and then to classify the testing samples. One network works as the novelty detector, while the other supports it by enhancing the inlier samples and distorting the outliers. The intuition is that the enhanced inliers and distorted outliers are much easier to separate than the original samples. The proposed framework applies to different related applications of anomaly and outlier detection in images and videos. The results on the MNIST and Caltech-256 image datasets, along with the challenging UCSD Ped2 dataset for video anomaly detection, illustrate that our proposed method learns the target class effectively and is superior to the baseline and state-of-the-art methods.

Partial Distance Correlation Screening for High Dimensional Time Series

High dimensional time series datasets are becoming increasingly common in various fields such as economics, finance, meteorology, and neuroscience. Given this ubiquity of time series data, it is surprising that very few works on variable screening are directly applicable to time series data, and even fewer methods have been developed that utilize the unique aspects of time series data. This paper introduces several model-free screening methods developed specifically to deal with dependent and/or heavy-tailed response and covariate time series. These methods are based on the distance correlation and the partial distance correlation. Methods are developed both for univariate response models, such as nonlinear autoregressive models with exogenous predictors, and multivariate response models such as linear or nonlinear VAR models. Sure screening properties are proved for our methods, which depend on the moment conditions and the strength of dependence in the response and covariate processes, amongst other factors. Dependence is quantified by functional dependence measures (Wu [Proc. Natl. Acad. Sci. USA 102 (2005) 14150-14154]) and β-mixing coefficients, and the results rely on the use of Nagaev and Rosenthal type inequalities for dependent random variables. Finite sample performance of our methods is shown through extensive simulation studies, and we include an application to macroeconomic forecasting.

Learning with Abandonment

Consider a platform that wants to learn a personalized policy for each user, but the platform faces the risk of a user abandoning the platform if she is dissatisfied with the actions of the platform. For example, a platform is interested in personalizing the number of newsletters it sends, but faces the risk that the user unsubscribes forever. We propose a general thresholded learning model for scenarios like this, and discuss the structure of optimal policies. We describe salient features of optimal personalization algorithms and how feedback the platform receives impacts the results. Furthermore, we investigate how the platform can efficiently learn the heterogeneity across users by interacting with a population and provide performance guarantees.

Teacher Improves Learning by Selecting a Training Subset

We call a learner super-teachable if a teacher can trim down an iid training set while making the learner learn even better. We provide sharp super-teaching guarantees on two learners: the maximum likelihood estimator for the mean of a Gaussian, and the large margin classifier in 1D. For general learners, we provide a mixed-integer nonlinear programming-based algorithm to find a super-teaching set. Empirical experiments show that our algorithm is able to find good super-teaching sets for both regression and classification problems.
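To make the idea of super-teaching concrete for the Gaussian-mean learner, here is a minimal sketch in which a teacher who knows the true mean searches for a subset of an iid sample whose sample mean is closer to it. The exhaustive search is a stand-in chosen for brevity, not the mixed-integer programming approach of the abstract. The names `mle_mean` and `best_teaching_subset`, the sample size, and the seed are illustrative assumptions.

```python
import itertools
import random


def mle_mean(sample):
    """Maximum likelihood estimate of a Gaussian mean: the sample mean."""
    return sum(sample) / len(sample)


def best_teaching_subset(sample, true_mean, min_size=1):
    """Brute-force search for the subset whose MLE is closest to true_mean."""
    best = tuple(sample)
    best_err = abs(mle_mean(best) - true_mean)
    for size in range(min_size, len(sample)):
        for subset in itertools.combinations(sample, size):
            err = abs(mle_mean(subset) - true_mean)
            if err < best_err:
                best, best_err = subset, err
    return list(best), best_err


rng = random.Random(0)
true_mean = 0.0
sample = [rng.gauss(true_mean, 1.0) for _ in range(10)]
subset, subset_err = best_teaching_subset(sample, true_mean)
full_err = abs(mle_mean(sample) - true_mean)
print(len(subset), subset_err, full_err)
```

The search is exponential in the sample size, so it is only practical for tiny samples.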

GPU Accelerated Sub-Sampled Newton's Method

First order methods, which solely rely on gradient information, are commonly used in diverse machine learning (ML) and data analysis (DA) applications. This is attributed to the simplicity of their implementations, as well as low per-iteration computational/storage costs. However, they suffer from significant disadvantages; most notably, their performance degrades with increasing problem ill-conditioning. Furthermore, they often involve a large number of hyper-parameters, and are notoriously sensitive to parameters such as the step-size. By incorporating additional information from the Hessian, second-order methods have been shown to be resilient to many such adversarial effects. However, these advantages of using curvature information come at the cost of higher per-iteration costs, which, in ‘big data’ regimes, can be computationally prohibitive. In this paper, we show that, contrary to conventional belief, second-order methods, when implemented appropriately, can be more efficient than first-order alternatives in many large-scale ML/DA applications. In particular, in convex settings, we consider variants of classical Newton's method in which the Hessian and/or the gradient are randomly sub-sampled. We show that by effectively leveraging the power of GPUs, such randomized Newton-type algorithms can be significantly accelerated, and can easily outperform state-of-the-art implementations of existing techniques in popular ML/DA software packages such as TensorFlow. Additionally, these randomized methods incur a small memory overhead compared to first-order methods. In particular, we show that for million-dimensional problems, our GPU accelerated sub-sampled Newton's method achieves a higher test accuracy in milliseconds as compared with tens of seconds for first order alternatives.

Interpreting Complex Regression Models

Interpretation of machine-learning-induced models is critical for feature engineering, debugging, and, arguably, compliance. Yet best-of-breed machine learning models tend to be very complex. This paper presents a method for model interpretation whose main benefit is that the simple interpretations it provides are always grounded in actual sets of learning examples. The method is validated on the task of interpreting a complex regression model in the context of both an academic problem (predicting the year in which a song was recorded) and an industrial one (predicting mail user churn).

Learning Anonymized Representations with Adversarial Neural Networks

Statistical methods protecting sensitive information or the identity of the data owner have become critical to ensure privacy of individuals as well as of organizations. This paper investigates anonymization methods based on representation learning and deep neural networks, motivated by novel information-theoretic bounds. We introduce a novel training objective for simultaneously training a predictor over target variables of interest (the regular labels) while preventing an intermediate representation from being predictive of the private labels. The architecture is based on three sub-networks: one going from input to representation, one from representation to predicted regular labels, and one from representation to predicted private labels. The training procedure aims at learning representations that preserve the relevant part of the information (about regular labels) while dismissing information about the private labels, which correspond to the identity of a person. We demonstrate the success of this approach for two distinct classification versus anonymization tasks (handwritten digits and sentiment analysis).

Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

We consider the problem of fully decentralized multi-agent reinforcement learning (MARL), where the agents are located at the nodes of a time-varying communication network. Specifically, we assume that the reward functions of the agents might correspond to different tasks, and are only known to the corresponding agent. Moreover, each agent makes individual decisions based on both the information observed locally and the messages received from its neighbors over the network. Within this setting, the collective goal of the agents is to maximize the globally averaged return over the network through exchanging information with their neighbors. To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are massively large. Under the decentralized structure, the actor step is performed individually by each agent with no need to infer the policies of others. For the critic step, we propose a consensus update via communication over the network. Our algorithms are fully incremental and can be implemented in an online fashion. Convergence analyses of the algorithms are provided when the value functions are approximated within the class of linear functions. Extensive simulation results with both linear and nonlinear function approximations are presented to validate the proposed algorithms. Our work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation, with provable convergence guarantees.
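The consensus idea behind the critic step can be sketched as repeated weighted averaging of each agent's parameter vector with those of its neighbors. This sketch shows only the averaging part, and it assumes a fixed, doubly stochastic weight matrix rather than the time-varying network of the abstract. The function name `consensus_step` and all numbers are illustrative.

```python
def consensus_step(params, weights):
    """Each agent replaces its parameters by a weighted average over all agents.

    params[i] is agent i's parameter vector; weights[i][j] is the weight agent i
    puts on agent j (zero for non-neighbours). Rows and columns sum to one here.
    """
    n = len(params)
    dim = len(params[0])
    return [
        [sum(weights[i][j] * params[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


# Three fully connected agents with a symmetric, doubly stochastic weight matrix.
weights = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
params = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
for _ in range(50):
    params = consensus_step(params, weights)
print(params)  # every agent ends up close to the network-wide average
```

In an actual actor-critic loop, each agent would interleave a local update from its own reward with this averaging; that local update is omitted here.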

GraphRNN: A Deep Generative Model for Graphs

Modeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies that exist between edges in a given graph. Here we propose GraphRNN, a deep autoregressive model that addresses the above challenges and approximates any distribution of graphs with minimal assumptions about their structure. GraphRNN learns to generate graphs by training on a representative set of graphs and decomposes the graph generation process into a sequence of node and edge formations, conditioned on the graph structure generated so far. In order to quantitatively evaluate the performance of GraphRNN, we introduce a benchmark suite of datasets, baselines and novel evaluation metrics based on Maximum Mean Discrepancy, which measure distances between sets of graphs. Our experiments show that GraphRNN significantly outperforms all baselines, learning to generate diverse graphs that match the structural characteristics of a target set, while also scaling to graphs 50 times larger than previous deep models.
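The decomposition of a graph into a sequence of node and edge formations can be illustrated as follows: each new node is described by a binary vector saying which earlier nodes it connects to. The breadth-first node ordering and the helper names `bfs_order`, `to_sequence`, and `from_sequence` are assumptions of this sketch. No learning is involved; it only shows the sequence representation a model could be trained on.

```python
from collections import deque


def bfs_order(adj, start):
    """Breadth-first node ordering of an undirected graph given as {node: set}."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in sorted(adj[node]):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order


def to_sequence(adj, order):
    """For the i-th node in the ordering, list its edges to the i earlier nodes."""
    return [
        [1 if prev in adj[node] else 0 for prev in order[:i]]
        for i, node in enumerate(order)
    ]


def from_sequence(seq):
    """Rebuild a graph (with nodes relabelled 0..n-1) from the edge vectors."""
    adj = {i: set() for i in range(len(seq))}
    for i, row in enumerate(seq):
        for j, bit in enumerate(row):
            if bit:
                adj[i].add(j)
                adj[j].add(i)
    return adj


cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
order = bfs_order(cycle, 0)
seq = to_sequence(cycle, order)
rebuilt = from_sequence(seq)
print(order, seq)
```

Different node orderings give different sequences for the same graph, which is the non-uniqueness the abstract refers to.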

Extremely Fast Decision Tree

We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree—‘Extremely Fast Decision Tree’, a minor modification to the MOA implementation of Hoeffding Tree—obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost.
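For context, Hoeffding-style trees decide when to split by comparing an observed difference in split merit against the Hoeffding bound. The sketch below shows that bound and the classic best-versus-second-best test. It is background for the family of algorithms, not the specific test used by Hoeffding Anytime Tree, which the abstract does not spell out. The function names and numeric values are illustrative.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Deviation epsilon such that, with probability at least 1 - delta, the mean
    of n observations with range value_range is within epsilon of its expectation."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the observed gain gap exceeds the Hoeffding bound."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


eps = hoeffding_bound(1.0, 1e-7, 1000)
print(eps)
print(should_split(0.30, 0.10, 1.0, 1e-7, 1000))
print(should_split(0.30, 0.25, 1.0, 1e-7, 1000))
```

The bound shrinks as more examples arrive, so a small but real gain gap eventually triggers a split.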

Syntax-Directed Variational Autoencoder for Structured Data

Deep generative models have been enjoying success in modeling continuous data. However, it remains challenging to capture the representations for discrete structures with formal grammars and semantics, e.g., computer programs and molecular structures. How to generate both syntactically and semantically correct data still remains largely an open problem. Inspired by compiler theory, where the syntax and semantics check is done via syntax-directed translation (SDT), we propose a novel syntax-directed variational autoencoder (SD-VAE) by introducing stochastic lazy attributes. This approach converts the offline SDT check into on-the-fly generated guidance for constraining the decoder. Compared to the state-of-the-art methods, our approach enforces constraints on the output space so that the output will be not only syntactically valid, but also semantically reasonable. We evaluate the proposed model with applications in programming languages and molecules, including reconstruction and program/molecule optimization. The results demonstrate the effectiveness of incorporating syntactic and semantic constraints in discrete generative models, which is significantly better than current state-of-the-art approaches.

Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates. This has been a notable problem in training deep RL agents to perform web-based tasks, such as booking flights or replying to emails, where a single mistake can ruin the entire sequence of actions. A common remedy is to ‘warm-start’ the agent by pre-training it to mimic expert demonstrations, but this is prone to overfitting. Instead, we propose to constrain exploration using demonstrations. From each demonstration, we induce high-level ‘workflows’ which constrain the allowable actions at each time step to be similar to those in the demonstration (e.g., ‘Step 1: click on a textbox; Step 2: enter some text’). Our exploration policy then learns to identify successful workflows and samples actions that satisfy these workflows. Workflows prune out bad exploration directions and accelerate the agent’s ability to discover rewards. We use our approach to train a novel neural policy designed to handle the semi-structured nature of websites, and evaluate on a suite of web tasks, including the recent World of Bits benchmark. We achieve new state-of-the-art results, and show that workflow-guided exploration improves sample efficiency over behavioral cloning by more than 100x.

Single Image Super-Resolution via Cascaded Multi-Scale Cross Network

Deep convolutional neural networks have achieved significant improvements in accuracy and speed for single image super-resolution. However, as the depth of the network grows, the information flow is weakened and training becomes increasingly difficult. On the other hand, most models adopt a single-stream structure, which makes it difficult to integrate complementary contextual information under different receptive fields. To improve information flow and to capture sufficient knowledge for reconstructing the high-frequency details, we propose a cascaded multi-scale cross network (CMSC) in which a sequence of subnetworks is cascaded to infer high resolution features in a coarse-to-fine manner. In each cascaded subnetwork, we stack multiple multi-scale cross (MSC) modules to fuse complementary multi-scale information in an efficient way as well as to improve information flow across the layers. Meanwhile, by introducing residual-features learning in each stage, the relative information between high-resolution and low-resolution features is fully utilized to further boost reconstruction performance. We train the proposed network with cascaded-supervision and then assemble the intermediate predictions of the cascade to achieve high quality image reconstruction. Extensive quantitative and qualitative evaluations on benchmark datasets illustrate the superiority of our proposed method over state-of-the-art super-resolution methods.

Convolutional Neural Networks combined with Runge-Kutta Methods

A convolutional neural network for image classification can be constructed on mathematical principles, since it models the ventral stream of the visual cortex, which is regarded as a multi-period dynamical system. In this paper, a new point of view is proposed for constructing network models, which also provides a direction for finding inspiration for, or explanations of, neural networks. If each period in the ventral stream is deemed to be a dynamical system with time as the independent variable, there should be a set of ordinary differential equations (ODEs) for this system. Runge-Kutta methods are a common means of solving ODEs. Thus, network models ought to be built using these methods. Moreover, convolutional networks can be employed to emulate the increments within every time-step. The model constructed in this way is named the Runge-Kutta Convolutional Neural Network (RKNet). Following this idea, Dense Convolutional Networks (DenseNets) and Residual Networks (ResNets) were converted into RKNets. To demonstrate the feasibility of RKNets, these variants were evaluated on the benchmark datasets CIFAR and ImageNet. The experimental results show that the RKNets transformed from DenseNets achieved similar or even higher parameter efficiency. The success of the experiments indicates that Runge-Kutta methods can be used to construct convolutional neural networks for image classification efficiently. Furthermore, network models might be structured more rationally in the future based on RKNet and prior knowledge.
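The numerical idea the abstract builds on can be seen on a scalar ODE: a forward Euler step has the same shape as a residual update (state plus increment), while a classical fourth-order Runge-Kutta step combines several increment evaluations. In the sketch below the increment function is a fixed scalar function rather than a learned convolutional block. That choice, the test equation dy/dt = -y, and the function names are all assumptions made for illustration.

```python
import math


def euler_step(f, t, y, h):
    """Forward Euler: new state = old state + one increment (residual-like)."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """Classical fourth-order Runge-Kutta step using four increment evaluations."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, y0, t_end, n_steps):
    """Advance y from t = 0 to t_end with n_steps equal steps."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    return -y  # dy/dt = -y, exact solution y(t) = exp(-t)


exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, decay, 1.0, 1.0, 10) - exact)
rk4_err = abs(integrate(rk4_step, decay, 1.0, 1.0, 10) - exact)
print(euler_err, rk4_err)
```

With the same number of steps, the higher-order scheme is far more accurate on this problem, which is the kind of efficiency argument the abstract carries over to network design.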

One Big Net For Everything

I apply recent work on ‘learning to think’ (2015) and on PowerPlay (2011) to the incremental training of an increasingly general problem solver, continually learning to solve new tasks without forgetting previous skills. The problem solver is a single recurrent neural network (or similar general purpose computer) called ONE. ONE is unusual in the sense that it is trained in various ways, e.g., by black box optimization / reinforcement learning / artificial evolution as well as supervised / unsupervised learning. For example, ONE may learn through neuroevolution to control a robot through environment-changing actions, and learn through unsupervised gradient descent to predict future inputs and vector-valued reward signals as suggested in 1990. User-given tasks can be defined through extra goal-defining input patterns, also proposed in 1990. Suppose ONE has already learned many skills. Now a copy of ONE can be re-trained to learn a new skill, e.g., through neuroevolution without a teacher. Here it may profit from re-using previously learned subroutines, but it may also forget previous skills. Then ONE is retrained in PowerPlay style (2011) on stored input/output traces of (a) ONE’s copy executing the new skill and (b) previous instances of ONE whose skills are still considered worth memorizing. Simultaneously, ONE is retrained on old traces (even those of unsuccessful trials) to become a better predictor, without additional expensive interaction with the environment. More and more control and prediction skills are thus collapsed into ONE, like in the chunker-automatizer system of the neural history compressor (1991). This forces ONE to relate partially analogous skills (with shared algorithmic information) to each other, creating common subroutines in the form of shared subnetworks of ONE, to greatly speed up subsequent learning of additional, novel but algorithmically related skills.

Time Series Learning using Monotonic Logical Properties

We propose a new paradigm for time-series learning where users implicitly specify families of signal shapes by choosing monotonic parameterized signal predicates. These families of predicates (also called specifications) can be seen as infinite Boolean feature vectors that are able to leverage a user’s domain expertise and have the property that as the parameter values increase, the specification becomes easier to satisfy. In the presence of multiple parameters, monotonic specifications admit trade-off curves in the parameter space, akin to Pareto fronts in multi-objective optimization, that separate the specifications that are satisfied from those that are not satisfied. Viewing monotonic specifications (and their trade-off curves) as ‘features’ for time-series data, we develop a principled way to define a distance measure between signals through the lens of a monotonic specification. A unique feature of this approach is that a simple Boolean predicate based on the monotonic specification can be used to explain why any two traces (or sets of traces) have a given distance. Given a simple enough specification, this enables relaying at a high level ‘why’ two signals have a certain distance and what kind of signals lie between them. We conclude by demonstrating our technique with two case studies that illustrate how simple monotonic specifications can be used to craft desirable distance measures.
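A one-parameter toy case may help fix ideas. Take the specification "the signal never exceeds h", which becomes easier to satisfy as h grows. Monotonicity means the boundary between unsatisfied and satisfied parameter values can be found by bisection, using only the predicate. Comparing two signals by the gap between their boundaries is an illustrative choice for this sketch, not the distance defined in the paper, and all function names are hypothetical.

```python
def satisfies(signal, h):
    """Monotonic predicate: true when the signal stays at or below h."""
    return all(x <= h for x in signal)


def boundary(signal, lo, hi, tol=1e-6):
    """Smallest satisfying h, found by bisection.

    Assumes satisfies(signal, lo) is False and satisfies(signal, hi) is True;
    bisection is valid only because the predicate is monotonic in h.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if satisfies(signal, mid):
            hi = mid
        else:
            lo = mid
    return hi


def spec_distance(a, b, lo, hi):
    """Illustrative distance: gap between the two signals' boundaries."""
    return abs(boundary(a, lo, hi) - boundary(b, lo, hi))


calm = [0.1, 0.4, 0.3]
spiky = [0.2, 0.9, 0.5]
print(boundary(spiky, 0.0, 2.0), spec_distance(calm, spiky, 0.0, 2.0))
```

The distance here has a direct reading: the two signals differ by how high a ceiling each one needs.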

Meta Multi-Task Learning for Sequence Modeling

Semantic composition functions have been playing a pivotal role in neural representation learning of text sequences. In spite of their success, most existing models suffer from the underfitting problem: they use the same shared compositional function on all the positions in the sequence, thereby lacking expressive power due to incapacity to capture the richness of compositionality. Besides, the composition functions of different tasks are independent and learned from scratch. In this paper, we propose a new sharing scheme of composition function across multiple tasks. Specifically, we use a shared meta-network to capture the meta-knowledge of semantic composition and generate the parameters of the task-specific semantic composition models. We conduct extensive experiments on two types of tasks, text classification and sequence tagging, which demonstrate the benefits of our approach. Besides, we show that the shared meta-knowledge learned by our proposed model can be regarded as off-the-shelf knowledge and easily transferred to new tasks.

Wide Compression: Tensor Ring Nets

Deep neural networks have demonstrated state-of-the-art performance in a variety of real-world applications. In order to obtain performance gains, these networks have grown larger and deeper, containing millions or even billions of parameters and over a thousand layers. The trade-off is that these large architectures require an enormous amount of memory, storage, and computation, thus limiting their usability. Inspired by the recent tensor ring factorization, we introduce Tensor Ring Networks (TR-Nets), which significantly compress both the fully connected layers and the convolutional layers of deep neural networks. Our results show that our TR-Nets approach is able to compress LeNet-5 by 11x without losing accuracy, and can compress the state-of-the-art Wide ResNet by 243x with only 2.3% degradation in CIFAR-10 image classification. Overall, this compression scheme shows promise in scientific computing and deep learning, especially for emerging resource-constrained devices such as smartphones, wearables, and IoT devices.
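Where the compression comes from can be seen by counting parameters. In a tensor ring format, a tensor with mode sizes n_1, ..., n_d is stored as d cores of size r x n_k x r, instead of all of its entries. The sketch below assumes a uniform ring rank r. The reshaping of a 1024 x 1024 weight matrix into eight modes and the rank of 10 are illustrative assumptions, not the paper's settings.

```python
import math


def dense_params(shape):
    """Number of entries of the full tensor."""
    return math.prod(shape)


def tensor_ring_params(shape, rank):
    """One core of size rank x n_k x rank per mode (uniform ring rank assumed)."""
    return sum(rank * n * rank for n in shape)


# Hypothetical example: a 1024 x 1024 fully connected weight matrix, reshaped
# into an 8-mode tensor with mode sizes 4, 8, 8, 4, 4, 8, 8, 4.
shape = (4, 8, 8, 4, 4, 8, 8, 4)
rank = 10
ratio = dense_params(shape) / tensor_ring_params(shape, rank)
print(dense_params(shape), tensor_ring_params(shape, rank), round(ratio, 1))
```

The count shows storage only; the accuracy retained at a given rank has to be measured empirically, as the abstract reports.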

Attention-Aware Generative Adversarial Networks (ATA-GANs)

In this work, we present a novel approach for training Generative Adversarial Networks (GANs). Using the attention maps produced by a Teacher-Network, we are able to improve the quality of the generated images as well as perform weak object localization on the generated images. To this end, we generate images of HEp-2 cells captured with Indirect Immunofluorescence (IIF) and study the ability of our network to perform weak localization of the cell. Firstly, we demonstrate that whilst GANs can learn the mapping between the input domain and the target distribution efficiently, the discriminator network is not able to detect the regions of interest. Secondly, we present a novel attention transfer mechanism which allows us to force the discriminator to put emphasis on the regions of interest via transfer learning. Thirdly, we show that this leads to more realistic images, as the discriminator learns to put emphasis on the area of interest. Fourthly, the proposed method allows one to generate both images and attention maps, which can be useful for data annotation, e.g., in object detection.

SAFFRON: an adaptive algorithm for online control of the false discovery rate

In the online false discovery rate (FDR) problem, one observes a possibly infinite sequence of p-values P_1,P_2,\dots, each testing a different null hypothesis, and an algorithm must pick a sequence of rejection thresholds \alpha_1,\alpha_2,\dots in an online fashion, effectively rejecting the k-th null hypothesis whenever P_k \leq \alpha_k. Importantly, \alpha_k must be a function of the past, cannot depend on P_k or any of the later unseen p-values, and must be chosen to guarantee that for any time t, the FDR up to time t is less than some pre-determined quantity \alpha \in (0,1). In this work, we present a powerful new framework for online FDR control that we refer to as SAFFRON. Like older alpha-investing (AI) algorithms, SAFFRON starts off with an error budget, called alpha-wealth, that it intelligently allocates to different tests over time, earning back some wealth on making a new discovery. However, unlike older methods, SAFFRON’s threshold sequence is based on a novel estimate of the alpha fraction that it allocates to true null hypotheses. In the offline setting, algorithms that employ an estimate of the proportion of true nulls are called adaptive methods, and SAFFRON can be seen as an online analogue of the famous offline Storey-BH adaptive procedure. Just as Storey-BH is typically more powerful than the Benjamini-Hochberg (BH) procedure under independence, we demonstrate that SAFFRON is also more powerful than its non-adaptive counterparts, such as LORD and other generalized alpha-investing algorithms. Further, a monotone version of the original AI algorithm, which is often more stable and powerful than the original, is recovered as a special case of SAFFRON. Lastly, the derivation of SAFFRON provides a novel template for deriving new online FDR rules.

A New Algorithm for Finding Closest Pair of Vectors

Given n vectors x_0, x_1, \ldots, x_{n-1} in \{0,1\}^{m}, how can we find two vectors whose pairwise Hamming distance is minimum? This problem is known as the Closest Pair Problem. If these vectors are generated uniformly at random except that two of them are correlated with Pearson-correlation coefficient \rho, then the problem is called the Light Bulb Problem. In this work, we propose a novel coding-based scheme for the Closest Pair Problem. We design both randomized and deterministic algorithms, which achieve the best-known running time when the minimum distance is very small compared to the length of the input vectors. When applied to the Light Bulb Problem, our algorithms yield state-of-the-art deterministic running time when the Pearson-correlation coefficient \rho is very large.
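To pin down the problem statement, here is the straightforward quadratic-time baseline that compares every pair of binary vectors. It is not the coding-based scheme of the abstract, only the reference point such algorithms improve on. The function names are illustrative.

```python
from itertools import combinations


def hamming(a, b):
    """Number of coordinates in which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(vectors):
    """Indices (i, j) of a pair with minimum Hamming distance, by brute force."""
    return min(
        combinations(range(len(vectors)), 2),
        key=lambda ij: hamming(vectors[ij[0]], vectors[ij[1]]),
    )


vectors = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
i, j = closest_pair(vectors)
print(i, j, hamming(vectors[i], vectors[j]))
```

This baseline costs on the order of n^2 * m bit comparisons for n vectors of length m.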

Cuttlefish: A Lightweight Primitive for Adaptive Query Processing

Modern data processing applications execute increasingly sophisticated analysis that requires operations beyond traditional relational algebra. As a result, operators in query plans grow in diversity and complexity. Designing query optimizer rules and cost models to choose physical operators for all of these novel logical operators is impractical. To address this challenge, we develop Cuttlefish, a new primitive for adaptively processing online query plans that explores candidate physical operator instances during query execution and exploits the fastest ones using multi-armed bandit reinforcement learning techniques. We prototype Cuttlefish in Apache Spark and adaptively choose operators for image convolution, regular expression matching, and relational joins. Our experiments show Cuttlefish-based adaptive convolution and regular expression operators can reach 72-99% of the throughput of an all-knowing oracle that always selects the optimal algorithm, even when individual physical operators are up to 105x slower than the optimal. Additionally, Cuttlefish achieves join throughput improvements of up to 7.5x compared with Spark SQL’s query optimizer.
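The explore-and-exploit loop described above can be sketched with a very simple bandit policy: try each candidate operator, keep a running mean of its observed cost, and mostly pick the cheapest so far. The epsilon-greedy policy, the class name `OperatorTuner`, and the simulated latencies are assumptions of this sketch. It does not reproduce Cuttlefish's API or its actual bandit algorithm.

```python
import random


class OperatorTuner:
    """Epsilon-greedy choice among candidate operators by observed cost."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.counts = [0] * n_arms
        self.mean_cost = [0.0] * n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self):
        # Try every operator once before trusting the running means.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        return min(range(len(self.counts)), key=lambda a: self.mean_cost[a])  # exploit

    def observe(self, arm, cost):
        # Incremental update of the running mean cost of the chosen operator.
        self.counts[arm] += 1
        self.mean_cost[arm] += (cost - self.mean_cost[arm]) / self.counts[arm]


# Simulated per-call latencies of three hypothetical physical operators.
latency = [5.0, 1.0, 3.0]
noise = random.Random(1)
tuner = OperatorTuner(n_arms=3)
for _ in range(500):
    arm = tuner.choose()
    tuner.observe(arm, latency[arm] * noise.uniform(0.9, 1.1))
print(tuner.counts)
```

In a real query engine the observed cost would be measured runtime or throughput per batch rather than a simulated number.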

Improved MapReduce and Streaming Algorithms for k-Center Clustering (with Outliers)

We present efficient MapReduce and Streaming algorithms for the k-center problem with and without outliers. Our algorithms exhibit an approximation factor which is arbitrarily close to the best possible, given enough resources.
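For reference, the k-center objective asks for k centers that minimize the largest distance from any point to its nearest center. The sketch below is the classical sequential farthest-first heuristic, a well-known 2-approximation. It is not the MapReduce or streaming algorithm of the abstract and it ignores outliers. The function name and the sample points are illustrative.

```python
import math


def k_center_greedy(points, k):
    """Farthest-first traversal: repeatedly add the point farthest from the
    current centers. Returns the centers and the resulting covering radius."""
    centers = [points[0]]
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=lambda i: dist[i])
        centers.append(points[idx])
        # Each point keeps its distance to the nearest center chosen so far.
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers, max(dist)


points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 9)]
centers, radius = k_center_greedy(points, 3)
print(centers, radius)
```

Distributed and streaming variants typically run a procedure of this kind on small summaries of the data rather than on the full input.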

Max-Mahalanobis Linear Discriminant Analysis Networks

A deep neural network (DNN) consists of a nonlinear transformation from an input to a feature representation, followed by a common softmax linear classifier. Though many efforts have been devoted to designing a proper architecture for the nonlinear transformation, little investigation has been done on the classifier part. In this paper, we show that a properly designed classifier can improve robustness to adversarial attacks and lead to better prediction results. Specifically, we define a Max-Mahalanobis distribution (MMD) and theoretically show that if the input is distributed as an MMD, the linear discriminant analysis (LDA) classifier will have the best robustness to adversarial examples. We further propose a novel Max-Mahalanobis linear discriminant analysis (MM-LDA) network, which explicitly maps a complicated data distribution in the input space to an MMD in the latent feature space and then applies LDA to make predictions. Our results demonstrate that the MM-LDA networks are significantly more robust to adversarial attacks, and have better performance in class-biased classification.

‘You are no Jack Kennedy’: On Media Selection of Highlights from Presidential Debates
The Sprague-Grundy function for some selective compound games
Machine learning based hyperspectral image analysis: A survey
STRIPStream: Integrating Symbolic Planners and Blackbox Samplers
Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction
Detection of Sparse Mixtures: Higher Criticism and Scan Statistic
A Robust Power Grid Defense Model Considering Load Demand and Wind Generation Uncertainties
A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos
Longitudinal Face Aging in the Wild – Recent Deep Learning Approaches
Bayesian Semiparametric Functional Mixed Models for Serially Correlated Functional Data, with Application to Glaucoma Data
The JHU Speech LOREHLT 2017 System: Cross-Language Transfer for Situation-Frame Detection
Conflict-Aware Replicated Data Types
Quantum walks and the size of the graph
A DIRT-T Approach to Unsupervised Domain Adaptation
Estimating Graphlet Statistics via Lifting
Contextual Bandits with Stochastic Experts
Interacting partially directed self-avoiding walk: a probabilistic perspective
Data-driven brain network models predict individual variability in behavior
A Generalized Discrete-Time Altafini Model
Edge-Based Recognition of Novel Objects for Robotic Grasping
No Blind Spots: Full-Surround Multi-Object Tracking for Autonomous Vehicles using Cameras & LiDARs
Sensitivity and Generalization in Neural Networks: an Empirical Study
Behavioral-clinical phenotyping with type 2 diabetes self-monitoring data
Diffusion Maps meet Nyström
Is Generator Conditioning Causally Related to GAN Performance?
A Walk with SGD
Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks
Bandwidth Partitioning and Downlink Analysis in mmWave Integrated Access and Backhaul for 5G
Measuring the Demand Effects of Formal and Informal Communication: Evidence from Online Markets for Illicit Drugs
Superpixel based Class-Semantic Texton Occurrences for Natural Roadside Vegetation Segmentation
Facial Expression Analysis under Partial Occlusion: A Survey
Improved Regularity Model-based EDA for Many-objective Optimization
Spatially Constrained Location Prior for Scene Parsing
Erdős-Burgess constant of the direct product of cyclic semigroups
IGD Indicator-based Evolutionary Algorithm for Many-objective Optimization Problems
Multispectral Image Intrinsic Decomposition via Low Rank Constraint
Constrained Image Generation Using Binarized Neural Networks with Decision Procedures
Residual Dense Network for Image Super-Resolution
Automatic adaptation of MCMC algorithms
On Pseudo-disk Hypergraphs
Stochastic Gradient Descent on Highly-Parallel Architectures
Uniform semimodular lattice and valuated matroid
Bound on the diameter of split metacyclic groups
Observer-Based Controllers For Incrementally Quadratic Nonlinear Systems with Disturbances: Continuous-time and Event-triggered Cases
A Twofold Siamese Network for Real-Time Object Tracking
PSO-based Fuzzy Markup Language for Student Learning Performance Evaluation and Educational Application
Integrable spin chains with random interactions
Adaptive Deep Learning through Visual Domain Localization
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
On the sizes of $k$-edge-maximal $r$-uniform hypergraphs
Combining historical data and bookmakers’ odds in modelling football scores
Minimax Distribution Estimation in Wasserstein Distance
Hypergeometry inspired by irrationality questions
Localisation Transition in the Driven Aubry-André Model
Classifying surface probe images in strongly correlated electronic systems via machine learning
Deep learning for conifer/deciduous classification of airborne LiDAR 3D point clouds representing individual trees
Weisfeiler-Leman meets Homomorphisms
Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization
The effect of transmission-line dynamics on grid-forming dispatchable virtual oscillator control
A Block-wise, Asynchronous and Distributed ADMM Algorithm for General Form Consensus Optimization
Approximation of Kolmogorov-Smirnov Test Statistics
Importance of initial conditions in the polarization of complex networks
Water from Two Rocks: Maximizing the Mutual Information
N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification
Improving Recall of In Situ Sequencing by Self-Learned Features and a Graphical Model
Semi-Smooth Newton Algorithm for Non-Convex Penalized Linear Regression
Dimensionally Tight Running Time Bounds for Second-Order Hamiltonian Monte Carlo
Powers of tight Hamilton cycles in randomly perturbed hypergraphs
A quasi-physical dynamic reduced order model for thermospheric mass density via Hermitian Space Dynamic Mode Decomposition
Product Kernel Interpolation for Scalable Gaussian Processes
Muon Hunter: a Zooniverse project
Scalable Private Learning with PATE
Free-breathing cardiac MRI using bandlimited manifold modelling
Correlating Cellular Features with Gene Expression using CCA
Permissive Barrier Certificates for Safe Stabilization Using Sum-of-squares
Color-disjoint rainbow spanning trees of edge-colored graphs
Generating retinal flow maps from structural optical coherence tomography with artificial intelligence
Circular support in random sorting networks
The Archimedean limit of random sorting networks
A Dataset To Evaluate The Representations Learned By Video Prediction Models
Detecting Comma-shaped Clouds for Severe Weather Forecasting using Shape and Motion
DID: Distributed Incremental Block Coordinate Descent for Nonnegative Matrix Factorization
Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization
$2$-groups behaving as automorphism groups of regular $3$-polytopes
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
OhioState at SemEval-2018 Task 7: Exploiting Data Augmentation for Relation Classification in Scientific Papers using Piecewise Convolutional Neural Networks
On the discrete analog of gamma-Lomax distribution: properties and applications
Efficient nonparametric causal inference with missing exposure information
On the Broadcast Routing Problem
Measuring quantum discord using the most distinguishable steered states
Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs
Optimal Containment of Epidemics over Temporal Activity-Driven Networks
The Mutual Information in Random Linear Estimation Beyond i.i.d. Matrices
Diffusion Based Molecular Communication with Limited Molecule Production Rate
Group Divisible Designs with $λ_1=3$ and Large Second Index
Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method
A Framework in CRM Customer Lifecycle: Identify Downward Trend and Potential Issues Detection
Dynamic Bidding for Advance Commitments in Truckload Brokerage Markets
Cylindric Reverse Plane Partitions and 2D TQFT
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System
Sparse Network Estimation for Dynamical Spatio-temporal Array Models
Deep Neural Network for Learning to Rank Query-Text Pairs
Enhancing Gaussian Estimation of Distribution Algorithm by Exploiting Evolution Direction with Archive
Bayesian linear inverse problems in regularity scales
Bayesian inverse problems with partial observations
Using Information Invariants to Compare Swarm Algorithms and General Multi-Robot Algorithms: A Technical Report
The Complexity of the Possible Winner Problem over Partitioned Preferences
A generating function for the Euler numbers of the second kind and its application
Evaluating and Tuning n-fold Integer Programming
On decompositions and approximations of conjugate partial-symmetric complex tensors
Exchangeable interval hypergraphs and limits of ordered discrete structures
Disorder-free weak dynamic localization in deformable lattices
Distributions associated with simultaneous multiple hypothesis testing
An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization
Online Learning of Quantum States
Building Instance Classification Using Street View Images
Cakewalk Sampling
Functional Gradient Boosting based on Residual Network Perception
Retrodirective Large Antenna Energy Beamforming in Backscatter Multi-User Networks
Random walks in doubly random scenery
Exact spectral asymptotics of fractional processes
Multiclass Common Spatial Pattern for EEG based Brain Computer Interface with Adaptive Learning Classifier
Power efficient Spiking Neural Network Classifier based on memristive crossbar network for spike sorting application
On 1-factors with prescribed lengths in tournaments
Global phase diagram of Coulomb-interacting anisotropic Weyl semimetals with disorder
Graphs with equal domination and covering numbers
Evolutionary Spectra Based on the Multitaper Method with Application to Stationarity Test
First derivatives at the optimum analysis (fdao): An approach to estimate the uncertainty in nonlinear regression involving stochastically independent variables
Seeing Small Faces from Robust Anchor’s Perspective
One Single Deep Bidirectional LSTM Network for Word Sense Disambiguation of Text Data
On asymptotic formulae in some sum-product questions
Active Learning with Logged Data
Growth of periodic Grigorchuk groups
Minimizing Flow Completion Times using Adaptive Routing over Inter-Datacenter Wide Area Networks
Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Robustly Complete Reach-and-Stay Control Synthesis for Switched Systems via Interval Analysis
Conditionally Independent Multiresolution Gaussian Processes
Cache-Aided Fog Radio Access Networks with Partial Connectivity
Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection
Revisiting the poverty of the stimulus: hierarchical generalization without a hierarchical bias in recurrent neural networks
Pareto optimal multi-robot motion planning
Can a Chatbot Determine My Diet?: Addressing Challenges of Chatbot Application for Meal Recommendation
Quenched invariance principles for orthomartingale-like sequences
Submodularity on Hypergraphs: From Sets to Sequences
Dynamic Effective Resistances and Approximate Schur Complement on Separable Graphs
More Virtuous Smoothing
Testability of high-dimensional linear models with non-sparse structures
Multi-Commodity Flow with In-Network Processing
Prototyping Virtual Reality Serious Games for Building Earthquake Preparedness: The Auckland City Hospital Case Study
Limits on representing Boolean functions by linear combinations of simple functions: thresholds, ReLUs, and low-degree polynomials
Optimal airline de-ice scheduling
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
Averaging Stochastic Gradient Descent on Riemannian Manifolds
Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning
Did You Really Just Have a Heart Attack? Towards Robust Detection of Personal Health Mentions in Social Media
PBGAN: Partial Binarization of Deconvolution Based Generators
Splay states in low-dimensional hypercubic lattice
Surrogate Scoring Rules and a Dominant Truth Serum for Information Elicitation
Antifragility for Intelligent Autonomous Systems
Pappus’s Theorem in Grassmannian Gr(3,C^n)
Short Block-length Codes for Ultra-Reliable Low-Latency Communications
Timeliness in Lossless Block Coding
Millionaire: A Hint-guided Approach for Crowdsourcing
Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network
Variance Reduction Methods for Sublinear Reinforcement Learning
Analysis of Langevin Monte Carlo via convex optimization
Language Distribution Prediction based on Batch Markov Monte Carlo Simulation with Migration
Deep Feed-forward Sequential Memory Networks for Speech Synthesis
AI4AI: Quantitative Methods for Classifying Host Species from Avian Influenza DNA Sequence
Quantifier-free descriptions for quantifier solutions to interval linear systems of relations
On the local asymptotic stabilization of the nonlinear systems with small time-varying perturbations by state-feedback control
Towards evaluating emergent behavior of the Internet of Things using large scale simulation techniques
Output feedback stable stochastic predictive control with hard control constraints
A representer theorem for deep neural networks
Comments on ‘Fractional Extreme Value Adaptive Training Method: Fractional Steepest Descent Approach’
An Asymptotic Series for an Integral
Controllability and observability for non-autonomous evolution equations: the averaged Hautus test
Infill asymptotics for estimators of the integral of the extreme value index function of the Brown-Resnick processes
Depth Masked Discriminative Correlation Filter
Experimental observation of time singularity in classical-to-quantum chaos transition
The linkedness of cubical polytopes
2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning
EiTAKA at SemEval-2018 Task 1: An Ensemble of N-Channels ConvNet and XGboost Regressors for Emotion Analysis of Tweets
Second-Order Necessary Conditions for Optimal Control with Recursive Utilities
An algorithm for computing Fréchet means on the sphere
Scalable kernel-based variable selection with sparsistency
The minimum number of Hamilton cycles in a hamiltonian threshold graph of a prescribed order
Comments on ‘Design of fractional-order variants of complex LMS and NLMS algorithms for adaptive channel equalization’
Categorical relations between Langlands dual quantum affine algebras: Exceptional cases
HBST: A Hamming Distance embedding Binary Search Tree for Visual Place Recognition
O-Minimal Invariants for Linear Loops
Wealth Inequality and the Price of Anarchy
Bayesian regional flood frequency analysis for large catchments
A family of extremum seeking laws for a unicycle model with a moving target: theoretical and experimental studies
Gender Aware Spoken Language Translation Applied to English-Arabic
Constructing Category-Specific Models for Monocular Object-SLAM
AMUSE: Multilingual Semantic Parsing for Question Answering over Linked Data
Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking
Dimension-free Information Concentration via Exp-Concavity
A Decomposition Algorithm for Sparse Generalized Eigenvalue Problem
Marked Self-Exciting Point Process Modelling of Information Diffusion on Twitter
The replica symmetric phase of random constraint satisfaction problems
DP-3-coloring of some planar graphs
A Model of Free Will for Artificial Entities
Rare events in networks with internal and external noise
Random Walks on Polytopes of Constant Corank
Dimension of CPT posets
On the Well-posedness of a Generalized Moment Problem and Its Numerical Solution
On the centroid of increasing trees
Averaging of density kernel estimators
DropLasso: A robust variant of Lasso for single cell RNA-seq data
Using Curvilinear Features in Focus for Registering a Single Image to a 3D Object
Modeling Precipitation Extremes using Log-Histospline
Bayesian Sample Size Determination for Planning Hierarchical Bayes Small Area Estimates
Incentivizing Wi-Fi Network Crowdsourcing: A Contract Theoretic Approach
Persuading Perceval; Information Provision in a Sequential Search Setting
Quantum reflections, random walks and cut-off
Improving Graph Convolutional Networks with Non-Parametric Activation Functions
Environmental Policy Regulation and Corporate Compliance in a Spatial Evolutionary Game Model
Principled Bayesian Minimum Divergence Inference
An efficient explicit full discrete scheme for strong approximation of stochastic Allen-Cahn equation
Publishing a Quality Context-aware Annotated Corpus and Lexicon for Harassment Research
Stochastic Hyperparameter Optimization through Hypernetworks
Classification of breast cancer histology images using transfer learning
Tone Biased MMR Text Summarization
Self Super-Resolution for Magnetic Resonance Images using Deep Networks
Self-organizing maps and generalization: an algorithmic description of Numerosity and Variability Effects
On sparsity of the solution to a random quadratic optimization problem
Objective Bayesian analysis of neutrino masses and hierarchy
Observable atomic consistency for CvRDTs
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
On Strong NP-Completeness of Rational Problems
Effect of Random Time Changes on Loewner Hulls
Addressing Function Approximation Error in Actor-Critic Methods
In-database connected component analysis
One-step Targeted Maximum Likelihood for Time-to-event Outcomes
Disentangling the independently controllable factors of variation by interacting with the world
Can the Stochastic Wave Equation with Strong Drift Hit Zero?
Adaptive Geospatial Joins for Modern Hardware
Controlling Human Utilization of Failure-Prone Systems via Taxes
Retrieval-Augmented Convolutional Neural Networks for Improved Robustness against Adversarial Examples
Online Coloring of Short Intervals
Conjugacy growth of commutators
Estimation of Local Degree Distributions via Local Weighted Averaging and Monte Carlo Cross-Validation
Missing Data in Sparse Transition Matrix Estimation for Sub-Gaussian Vector Autoregressive Processes
Typical and Generic Ranks in Matrix Completion
Best Arm Identification for Contaminated Bandits
Representations of Sparse Distributed Networks: A Locality-Sensitive Approach