Nonlinear Distributional Gradient Temporal-Difference Learning

We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has been shown to outperform the regular approach in a recent study \citep{bellemare2017distributional}. In this paper, we design two new algorithms, called distributional GTD2 and distributional TDC, by applying the Cram\'{e}r distance to the distributional version of the Bellman error objective function; the resulting algorithms inherit the advantages of both the nonlinear gradient TD algorithms and the distributional RL approach. We prove asymptotic almost-sure convergence to a locally optimal solution for general smooth function approximators, including the neural networks that have been widely used in recent studies to solve real-life RL problems. The per-step computational complexity is linear in the number of parameters of the function approximator, so the algorithms can be implemented efficiently for neural networks.
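
For readers unfamiliar with the Cramér distance, the sketch below computes it between two categorical return distributions that share an evenly spaced support, as is common in categorical distributional RL. It is the square root of the integrated squared difference between the two CDFs. This covers only the distance itself, not the distributional GTD2/TDC updates; the function name and the shared-support assumption are illustrative.

```python
import math


def cramer_distance(p, q, spacing):
    """Cramér (l2) distance between two categorical distributions.

    Assumes p and q are probability vectors over the same support
    z_0 < z_1 < ... < z_{n-1} with a constant gap `spacing`.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    cdf_p = cdf_q = 0.0
    total = 0.0
    # Both CDFs are step functions, constant on each interval [z_i, z_{i+1}),
    # so the integral of (F_p - F_q)^2 reduces to a sum times the spacing.
    # The last term is zero for normalized inputs, since both CDFs reach 1.
    for p_i, q_i in zip(p, q):
        cdf_p += p_i
        cdf_q += q_i
        total += (cdf_p - cdf_q) ** 2
    return math.sqrt(spacing * total)


# Point mass at z_0 versus point mass at z_2, with unit spacing.
print(cramer_distance([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], spacing=1.0))  # sqrt(2)
```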

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding optimal demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing an equivalence to the set cover problem, and use this equivalence to develop an efficient algorithm for determining the set of maximally-informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: benchmarking active learning IRL algorithms and developing an IRL algorithm that, rather than assuming demonstrations are i.i.d., uses counterfactual reasoning over informative demonstrations to learn more efficiently.
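
The set-cover view suggests the standard greedy approximation: repeatedly add the demonstration that covers the most constraints not yet covered. In the sketch below the constraint sets are made up (in the paper's setting they would be derived from the demonstrator's policy and reward-equivalence class), and the demonstration ids are hypothetical, so it illustrates only the covering step rather than the authors' full algorithm.

```python
def greedy_set_cover(universe, candidates):
    """Greedy approximation to set cover.

    universe: the constraints that must all be covered.
    candidates: dict mapping a (hypothetical) demonstration id to the set
    of constraints that demonstration would reveal.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the demonstration covering the most still-uncovered constraints.
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("some constraints cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gain
    return chosen


demos = {
    "demo_1": {"c1", "c2"},
    "demo_2": {"c2", "c3", "c4"},
    "demo_3": {"c4"},
}
print(greedy_set_cover({"c1", "c2", "c3", "c4"}, demos))  # ['demo_2', 'demo_1']
```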

Sampling-Free Variational Inference of Bayesian Neural Nets

We propose a new Bayesian Neural Net (BNN) formulation that affords variational inference for which the evidence lower bound (ELBO) is analytically tractable subject to a tight approximation. We achieve this tractability by decomposing ReLU nonlinearities into an identity function and a Kronecker delta function. We demonstrate formally that assigning the outputs of these functions to separate latent variables allows representing the neural network likelihood as the composition of a chain of linear operations. Performing variational inference on this construction enables closed-form computation of the evidence lower bound. It can thus be maximized without requiring Monte Carlo sampling to approximate the problematic expected log-likelihood term. The resultant formulation boils down to stochastic gradient descent, where the gradients are not distorted by any factor besides minibatch selection. This amends a long-standing disadvantage of BNNs relative to deterministic nets. Experiments on four benchmark data sets show that the cleaner gradients provided by our construction yield a steeper learning curve, achieving higher prediction accuracies for a fixed epoch budget.
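
The decomposition can be checked on a tiny bias-free network: writing ReLU(a) as a times a binary gate (the role the abstract assigns to the Kronecker delta) and holding the gates fixed leaves only linear maps. The sketch shows this deterministic identity only; the paper's actual contribution, treating the gates as separate latent variables inside variational inference, is not reproduced, and the weights are arbitrary.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * x_j for w, x_j in zip(row, x)) for row in W]


def relu_net(W1, W2, x):
    """Ordinary two-layer ReLU network without biases."""
    hidden = [max(0.0, a) for a in matvec(W1, x)]
    return matvec(W2, hidden)


def gates(W1, x):
    """Binary gate per hidden unit: 1 where the pre-activation is positive."""
    return [1.0 if a > 0 else 0.0 for a in matvec(W1, x)]


def gated_linear_net(W1, W2, x, gate_values):
    """Same network with ReLU written as identity(a) * gate; with the gates
    held fixed, every remaining operation is linear in x."""
    hidden = [g * a for g, a in zip(gate_values, matvec(W1, x))]
    return matvec(W2, hidden)


W1 = [[1.0, -1.0], [0.5, 2.0]]
W2 = [[1.0, 1.0]]
x = [1.0, 2.0]
print(relu_net(W1, W2, x), gated_linear_net(W1, W2, x, gates(W1, x)))
```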

Learning Sampling Policies for Domain Adaptation

We address the problem of semi-supervised domain adaptation of classification algorithms through deep Q-learning. The core idea is to consider the predictions of a source domain network on target domain data as noisy labels, and learn a policy to sample from this data so as to maximize classification accuracy on a small annotated reward partition of the target domain. Our experiments show that learned sampling policies construct labeled sets that improve accuracies of visual classifiers over baselines.

CapProNet: Deep Feature Learning via Orthogonal Projections onto Capsule Subspaces

In this paper, we formalize the idea behind capsule nets of using a capsule vector rather than a neuron activation to predict the label of samples. To this end, we propose to learn a group of capsule subspaces onto which an input feature vector is projected. Then the lengths of resultant capsules are used to score the probability of belonging to different classes. We train such a Capsule Projection Network (CapProNet) by learning an orthogonal projection matrix for each capsule subspace, and show that each capsule subspace is updated until it contains input feature vectors corresponding to the associated class. Only a negligible computing overhead is incurred to train the network in low-dimensional capsule subspaces or through an alternative hyper-power iteration to estimate the normalization matrix. Experimental results on image datasets show that the presented model can greatly improve the performance of state-of-the-art ResNet backbones by 10-20\% at the same level of computing and memory costs.
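
Scoring a class amounts to measuring how much of the feature vector lies in that class's subspace. The paper learns a projection matrix of the form $W(W^{\top}W)^{-1}W^{\top}$ per subspace; the sketch below obtains the same projection length by orthonormalizing the columns of $W$ with Gram-Schmidt instead, a substitution made for readability. The subspaces and input are toy values and no training is shown.

```python
import math


def orthonormal_basis(vectors):
    """Gram-Schmidt: orthonormal basis for the span of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coef = sum(a * b for a, b in zip(q, w))
            w = [a - coef * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def capsule_length(subspace_columns, x):
    """Length of the orthogonal projection of feature x onto one capsule subspace."""
    basis = orthonormal_basis(subspace_columns)
    return math.sqrt(sum(sum(q_k * x_k for q_k, x_k in zip(q, x)) ** 2 for q in basis))


def classify(subspaces, x):
    """Predict the class whose capsule subspace gives the longest projection."""
    lengths = [capsule_length(cols, x) for cols in subspaces]
    return max(range(len(lengths)), key=lengths.__getitem__)


subspaces = [
    [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]],  # class 0: spans the first two axes
    [[0.0, 0.0, 1.0]],                    # class 1: spans the third axis
]
x = [3.0, 4.0, 12.0]
print([capsule_length(cols, x) for cols in subspaces], classify(subspaces, x))
```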

Episodic Memory Deep Q-Networks

Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interaction with the environments to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch onto good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method can lead to better sample efficiency and is more likely to find good policies. It requires only 1/5 of the interactions of DQN to achieve state-of-the-art performance on many Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms.
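
One way to read "episodic memory supervising the agent" is sketched below: a table stores the best discounted return observed for each state-action pair, and that value enters the loss as an extra regression target next to the usual one-step TD target. The class and function names, the weight `lam`, and the exact combined squared-error form are assumptions made for illustration; real Atari states would need to be hashed or projected before serving as keys.

```python
class EpisodicMemory:
    """Tabular memory of the highest discounted return seen per (state, action)."""

    def __init__(self):
        self.best_return = {}

    def update_episode(self, transitions, gamma):
        """transitions: list of (state, action, reward) for one finished episode."""
        ret = 0.0
        # Walk backwards so each step's discounted return can be accumulated.
        for state, action, reward in reversed(transitions):
            ret = reward + gamma * ret
            key = (state, action)
            self.best_return[key] = max(self.best_return.get(key, float("-inf")), ret)

    def lookup(self, state, action):
        return self.best_return.get((state, action))


def combined_loss(q_value, td_target, memory_target, lam):
    """Squared TD error plus a weighted pull toward the remembered best return."""
    loss = (q_value - td_target) ** 2
    if memory_target is not None:
        loss += lam * (q_value - memory_target) ** 2
    return loss


memory = EpisodicMemory()
memory.update_episode([("s0", "a", 0.0), ("s1", "b", 1.0)], gamma=0.5)
memory.update_episode([("s0", "a", 0.0), ("s1", "b", 0.0)], gamma=0.5)
print(memory.lookup("s0", "a"))  # 0.5: the better of the two episodes is kept
print(combined_loss(1.0, 0.5, memory.lookup("s0", "a"), lam=0.1))
```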

Deep Generative Markov State Models

We propose a deep generative Markov State Model (DeepGenMSM) learning framework for inference of metastable dynamical systems and prediction of trajectories. After unsupervised training on time series data, the model contains (i) a probabilistic encoder that maps from high-dimensional configuration space to a small-sized vector indicating the membership to metastable (long-lived) states, (ii) a Markov chain that governs the transitions between metastable states and facilitates analysis of the long-time dynamics, and (iii) a generative part that samples the conditional distribution of configurations in the next time step. The model can be operated in a recursive fashion to generate trajectories to predict the system evolution from a defined starting state and propose new configurations. The DeepGenMSM is demonstrated to provide accurate estimates of the long-time kinetics and generate valid distributions for molecular dynamics (MD) benchmark systems. Remarkably, we show that DeepGenMSMs are able to make long time-steps in molecular configuration space and generate physically realistic structures in regions that were not seen in training data.
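
The Markov-chain component is what makes long-time analysis cheap: from a transition matrix between metastable states estimated at lag time $\tau$, one can read off the stationary populations and the implied timescales $t_i = -\tau / \ln|\lambda_i|$ of the slow processes. The sketch assumes a small, ergodic and aperiodic transition matrix (needed for the power iteration to converge) with made-up numbers; it covers only this standard MSM analysis, not the encoder or the generator.

```python
import math


def stationary_distribution(T, tol=1e-12, max_iter=100_000):
    """Power iteration pi <- pi T for a row-stochastic matrix T."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


def implied_timescale(eigenvalue, lag):
    """Relaxation time of the process associated with a non-unit eigenvalue."""
    return -lag / math.log(abs(eigenvalue))


# Two metastable states; for a 2x2 stochastic matrix the second
# eigenvalue is trace(T) - 1.
T = [[0.9, 0.1], [0.2, 0.8]]
lambda_2 = T[0][0] + T[1][1] - 1.0
print(stationary_distribution(T), implied_timescale(lambda_2, lag=1.0))
```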

Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval

This paper presents the Entity-Duet Neural Ranking Model (EDRM), which introduces knowledge graphs to neural search systems. EDRM represents queries and documents by their words and entity annotations. The semantics from knowledge graphs are integrated in the distributed representations of their entities, while the ranking is conducted by interaction-based neural ranking networks. The two components are learned end-to-end, making EDRM a natural combination of entity-oriented search and neural information retrieval. Our experiments on a commercial search log demonstrate the effectiveness of EDRM. Our analyses reveal that knowledge graph semantics significantly improve the generalization ability of neural ranking models.
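
Interaction-based rankers score a query-document pair from the matrix of similarities between their term (and, in EDRM, entity) embeddings. The sketch below shows K-NRM-style kernel pooling, one common way to turn that matrix into ranking features; that EDRM uses exactly this pooling, the kernel centers `mus`, and the width `sigma` are assumptions here, and the toy two-dimensional vectors stand in for learned word and entity embeddings.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def kernel_pooling(query_vecs, doc_vecs, mus, sigma):
    """One soft-match feature per RBF kernel centered at each mu.

    For every query embedding, sum the kernel responses over all document
    embeddings (a soft term frequency), take the log, and sum over the query.
    """
    features = []
    for mu in mus:
        total = 0.0
        for q in query_vecs:
            soft_tf = sum(
                math.exp(-((cosine(q, d) - mu) ** 2) / (2 * sigma ** 2))
                for d in doc_vecs
            )
            total += math.log(max(soft_tf, 1e-10))  # clamp avoids log(0)
        features.append(total)
    return features


query = [[1.0, 0.0]]
doc = [[1.0, 0.0], [0.0, 1.0]]
print(kernel_pooling(query, doc, mus=[1.0, 0.0], sigma=0.1))
```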

Robust Optimization over Multiple Domains

Machine learning has recently become important for cloud computing services. Users of cloud computing can benefit from the sophisticated machine learning models provided by the service. Since users with the same problem can come from different domains, an ideal model has to be applicable across multiple domains. In this work, we address this challenge by developing a framework of robust optimization. Instead of minimizing the empirical risk, we aim to learn a model optimized with respect to an adversarial distribution over multiple domains. In addition to convex models, we analyze the convergence rate of learning a robust non-convex model, given the dominant performance of non-convex models in many real-world applications. Furthermore, we demonstrate that both the robustness of the framework and the convergence rate can be enhanced by introducing appropriate regularizers for the adversarial distribution. The empirical study on real-world fine-grained visual categorization and digit recognition tasks verifies the effectiveness and efficiency of the proposed framework.
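
A minimal sketch of the min-max idea, under assumptions not taken from the paper: each domain $i$ has a toy quadratic loss $L_i(w) = (w - c_i)^2$, the adversarial distribution $p$ over domains is regularized by an entropy term with temperature $\tau$ (so its best response has the closed form $p_i \propto \exp(L_i(w)/\tau)$), and the model takes plain gradient steps on the $p$-weighted loss. With two symmetric domains the model settles at the point that balances them rather than favoring either.

```python
import math


def adversarial_weights(losses, tau):
    """Entropy-regularized best response: p_i proportional to exp(L_i / tau)."""
    top = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss_i - top) / tau) for loss_i in losses]
    z = sum(exps)
    return [e / z for e in exps]


def robust_fit(centers, tau=1.0, lr=0.05, steps=500, w=0.5):
    """Gradient descent on the p-weighted loss for toy domain losses (w - c_i)^2."""
    for _ in range(steps):
        losses = [(w - c) ** 2 for c in centers]
        p = adversarial_weights(losses, tau)
        # Gradient of sum_i p_i L_i(w) with p at its best response.
        grad = sum(p_i * 2.0 * (w - c) for p_i, c in zip(p, centers))
        w -= lr * grad
    return w


print(robust_fit([-1.0, 1.0]))  # close to 0, balancing both domains
```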

Conditional Network Embeddings

Network embeddings map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that ‘similar’ nodes are mapped onto nearby points, so that the embedding can be used for purposes such as link prediction (if ‘similar’ means ‘more likely to be connected’) or classification (if ‘similar’ means ‘being more likely to have the same label’). In recent years, various methods for network embedding have been introduced. These methods all follow a similar strategy: defining a notion of similarity between nodes (typically deeming nodes more similar if they are nearby in the network in some metric), defining a distance measure in the embedding space, and minimizing a loss function that penalizes large distances for similar nodes or small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties, such as (approximate) multipartiteness, certain degree distributions, or certain kinds of assortativity. To overcome this difficulty, we introduce a conceptual innovation to the literature on network embedding, proposing to create embeddings that maximally add information with respect to such structural properties (e.g., node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. Finally, we demonstrate that combining information about such structural properties with a Euclidean embedding provides superior performance across a range of link prediction tasks. Moreover, we demonstrate the potential of our approach for network visualization.
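
To make the Bayesian idea concrete: the structural information enters as a prior link probability $P_{ij}$ (for instance derived from node degrees or block densities), and the embedding only has to explain what that prior does not. The sketch assumes a form in which the distance $d_{ij}$ between two embedded nodes has a half-normal density with a small scale $\sigma_1$ for linked pairs and a larger scale $\sigma_2$ for unlinked pairs, combined with the prior by Bayes' rule; the parameter values are arbitrary.

```python
import math


def half_normal_pdf(d, sigma):
    """Density of a half-normal distribution with scale sigma, for d >= 0."""
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-d * d / (2.0 * sigma * sigma))


def link_posterior(prior, distance, sigma1=1.0, sigma2=2.0):
    """P(link | distance) by Bayes' rule, with sigma1 < sigma2 so that
    nearby pairs become more likely to be linked than the prior alone says."""
    linked = prior * half_normal_pdf(distance, sigma1)
    unlinked = (1.0 - prior) * half_normal_pdf(distance, sigma2)
    return linked / (linked + unlinked)


# A pair with a high structural prior and one with a low prior, at the same distance.
print(link_posterior(0.5, 0.5), link_posterior(0.05, 0.5))
```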

Learning to Multitask

Multitask learning has shown promising performance in many applications, and many multitask models have been proposed. To identify an effective multitask model for a given multitask problem, we propose a learning framework called learning to multitask (L2MT). To this end, L2MT exploits historical multitask experience, organized as a training set consisting of several tuples, each of which contains a multitask problem with multiple tasks, a multitask model, and the relative test error. Based on this training set, L2MT first uses a proposed layerwise graph neural network to learn task embeddings for all the tasks in a multitask problem, and then learns an estimation function that estimates the relative test error from the task embeddings and the representation of the multitask model, which is based on a unified formulation. Given a new multitask problem, the estimation function is used to identify a suitable multitask model. Experiments on benchmark datasets show the effectiveness of the proposed L2MT framework.

Diverse Few-Shot Text Classification with Multiple Metrics

We study few-shot learning in natural language domains. Unlike many existing works that apply either metric-based or optimization-based meta-learning to image domains with low inter-task variance, we consider a more realistic setting in which tasks are diverse. This setting poses tremendous difficulties for existing state-of-the-art metric-based algorithms, since a single metric is insufficient to capture complex task variations in natural language domains. To alleviate the problem, we propose an adaptive metric learning approach that automatically determines the best weighted combination from a set of metrics obtained from meta-training tasks for a newly seen few-shot task. Extensive quantitative evaluations on real-world sentiment analysis and dialog intent classification datasets demonstrate that the proposed method performs favorably against state-of-the-art few-shot learning algorithms in terms of predictive accuracy. We make our code and data available for further study.
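
The adaptive combination can be pictured as follows: each metric produces a score per class for a query example, and a task-specific weight vector (here a softmax over hypothetical logits) mixes those scores before the arg-max. How the paper derives the task-specific weights from a few-shot task is not reproduced; the logits below are fixed by hand.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_scores(per_metric_scores, weight_logits):
    """per_metric_scores[m][c]: score that metric m assigns to class c."""
    weights = softmax(weight_logits)
    n_classes = len(per_metric_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, per_metric_scores))
        for c in range(n_classes)
    ]


def predict(per_metric_scores, weight_logits):
    scores = combined_scores(per_metric_scores, weight_logits)
    return max(range(len(scores)), key=scores.__getitem__)


# Two metrics that disagree; the task-specific weights decide.
scores = [[0.9, 0.2], [0.1, 0.7]]
print(predict(scores, [2.0, 0.0]), predict(scores, [0.0, 2.0]))  # 0 1
```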

Physically optimizing inference

Data is scaling exponentially in fields ranging from genomics to neuroscience to economics. A central question is whether modern machine learning methods can be applied to construct predictive models based on large data sets drawn from complex, natural systems like cells and brains. In machine learning, the predictive power or generalizability of a model is determined by the statistics of training data. In this paper, we ask how predictive inference is impacted when training data is generated by the statistical behavior of a physical system. We develop an information-theoretic analysis of a canonical problem, spin network inference. Our analysis reveals the essential role that thermal fluctuations play in determining the efficiency of predictive inference. Thermal noise drives a system to explore a range of configurations providing ‘raw’ information for a learning algorithm to construct a predictive model. Conversely, thermal energy degrades information by blurring energetic differences between network states. In general, spin networks have an intrinsic optimal temperature at which inference becomes maximally efficient. Simple active learning protocols allow optimization of network temperature, without prior knowledge, to dramatically increase the efficiency of inference. Our results reveal a fundamental link between physics and information and show how the physical environment can be tuned to optimize the efficiency of machine learning.
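
To see what training data generated by a physical system looks like, the sketch draws configurations of a small Ising-type spin network by Metropolis sampling at a chosen temperature. At low temperature the samples collapse onto the lowest-energy states, while at high temperature they wander over many configurations, which is the trade-off the abstract describes. The couplings, temperatures, and sweep counts are arbitrary, and the paper's information-theoretic analysis is not reproduced.

```python
import math
import random


def energy(spins, couplings):
    """Ising energy E = -sum_{(i,j)} J_ij s_i s_j for spins in {-1, +1}."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def metropolis_samples(n, couplings, temperature, n_samples, sweeps=10, seed=0):
    """Draw n_samples configurations, with `sweeps` sweeps of single-spin
    Metropolis updates between consecutive samples."""
    rng = random.Random(seed)
    neighbors = {i: [] for i in range(n)}
    for (i, j), J in couplings.items():
        neighbors[i].append((j, J))
        neighbors[j].append((i, J))
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    samples = []
    for _ in range(n_samples):
        for _ in range(sweeps * n):
            i = rng.randrange(n)
            # Flipping spin i changes the energy by 2 * s_i * sum_j J_ij s_j.
            delta = 2.0 * spins[i] * sum(J * spins[j] for j, J in neighbors[i])
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                spins[i] = -spins[i]
        samples.append(list(spins))
    return samples


triangle = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
cold = metropolis_samples(3, triangle, temperature=0.05, n_samples=20)
hot = metropolis_samples(3, triangle, temperature=50.0, n_samples=20)
print([energy(s, triangle) for s in cold[-5:]], [energy(s, triangle) for s in hot[-5:]])
```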

GEN Model: An Alternative Approach to Deep Neural Network Models

In this paper, we introduce an alternative approach to deep learning models, namely the GEN (Genetic Evolution Network) Model. Instead of building one single deep model, GEN adopts a genetic-evolutionary learning strategy to build a group of unit models generation by generation. Significantly different from the well-known representation learning models with extremely deep structures, the unit models in GEN have a much shallower architecture. In the training process, a subset of unit models from each generation is selected based on their performance to evolve and generate the child models of the next generation. GEN has significant advantages over existing deep representation learning models in terms of learning effectiveness, efficiency, and the interpretability of both the learning process and the learned results. Extensive experiments have been done on diverse benchmark datasets, and the experimental results demonstrate the outstanding performance of GEN compared with state-of-the-art baseline methods in both effectiveness and efficiency.

Reconciled Polynomial Machine: A Unified Representation of Shallow and Deep Learning Models

In this paper, we introduce a new machine learning model, namely the reconciled polynomial machine, which can provide a unified representation of existing shallow and deep machine learning models. The reconciled polynomial machine predicts the output by computing the inner product of the feature kernel function and the variable reconciling function. We analyze several concrete models, including Linear Models, FM, MVM, Perceptron, MLP and Deep Neural Networks, all of which can be reduced to reconciled polynomial machine representations. Based on these reduced representations, we also analyze the learning error of these models in detail from the function approximation perspective.
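
As an illustration of the "inner product of a feature kernel and a reconciling function" view, the sketch rewrites a second-order factorization machine (FM, one of the models the abstract lists) as an inner product between polynomial features of the input and one coefficient per monomial built from the FM parameters. This reading of the representation is an assumption, not the paper's notation; the parameter values are arbitrary.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fm_predict(x, w0, w, V):
    """Second-order factorization machine:
    y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    y = w0 + dot(w, x)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            y += dot(V[i], V[j]) * x[i] * x[j]
    return y


def reconciled_form(x, w0, w, V):
    """The same prediction as an inner product of two vectors indexed by the
    same monomials: a polynomial feature map of x, and the parameters
    'reconciled' into one coefficient per monomial."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    features = [1.0] + list(x) + [x[i] * x[j] for i, j in pairs]
    coefficients = [w0] + list(w) + [dot(V[i], V[j]) for i, j in pairs]
    return dot(features, coefficients)


x = [1.0, 2.0, -1.0]
w0, w = 0.5, [0.1, -0.2, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
print(fm_predict(x, w0, w, V), reconciled_form(x, w0, w, V))
```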

Deep Loopy Neural Network Model for Graph Structured Data Representation Learning

Existing deep learning models may encounter great challenges in handling graph structured data. In this paper, we introduce a new deep learning model specifically for graph data, namely the deep loopy neural network. Significantly different from previous deep models, the deep loopy neural network contains a large number of loops created by the extensive connections among nodes in the input graph data, which makes model learning an infeasible task. To resolve this problem, we introduce a new learning algorithm specifically for the deep loopy neural network. Instead of learning the model variables on the original model, the proposed algorithm back-propagates errors through the edges of a group of extracted spanning trees. Extensive numerical experiments on several real-world graph datasets demonstrate the effectiveness of both the proposed model and the learning algorithm in handling graph data.
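
Back-propagating over spanning trees first requires extracting them. The sketch takes a breadth-first spanning tree rooted at a chosen node; varying the root is one simple way to obtain a group of different trees. How the paper selects its trees is not stated in the abstract, so this choice, like the toy graph, is an assumption.

```python
from collections import deque


def bfs_spanning_tree(adjacency, root):
    """Edges (parent, child) of a breadth-first spanning tree of a connected graph."""
    parent = {root: None}
    queue = deque([root])
    edges = []
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in parent:  # first time v is reached: keep this edge
                parent[v] = u
                edges.append((u, v))
                queue.append(v)
    return edges


# A triangle 0-1-2 with a pendant node 3: the loop 0-1-2 must be cut.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
trees = [bfs_spanning_tree(graph, root) for root in graph]
for tree in trees:
    print(tree)
```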

On Deep Ensemble Learning from a Function Approximation Perspective

In this paper, we propose a general ensemble learning framework based on deep learning models. Given a group of unit models, the proposed deep ensemble learning framework effectively combines their learning results via a multilayered ensemble model. When the mathematical mappings of the unit models are bounded, sigmoidal and discriminatory, we demonstrate that the deep ensemble learning framework can universally approximate any function from the input space to the output space. Meanwhile, to achieve such performance, the deep ensemble learning framework also imposes a strict constraint on the number of involved unit models. According to the theoretical proof provided in this paper, given an input feature space of dimension d, the required number of unit models is 2d if the ensemble model involves a single layer. Furthermore, as the ensemble component goes deeper, the required number of unit models is proved to decrease exponentially.

GADAM: Genetic-Evolutionary ADAM for Deep Neural Network Optimization

Deep neural network learning can be formulated as a non-convex optimization problem. Existing optimization algorithms, e.g., Adam, can learn the models fast, but may easily get stuck in local optima. In this paper, we introduce a novel optimization algorithm, namely GADAM (Genetic-Evolutionary Adam). GADAM learns deep neural network models based on a number of unit models, generation by generation: it trains the unit models with Adam and evolves them into new generations with a genetic algorithm. We show that GADAM can effectively escape local optima during learning to obtain better solutions, and prove that GADAM also achieves very fast convergence. Extensive experiments on various benchmark datasets demonstrate the effectiveness and efficiency of the GADAM algorithm.
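
A compact sketch of the train-then-evolve loop on a toy 2-D objective: each member of a population takes a round of Adam steps, then the better half survives and the rest is refilled by uniform crossover with Gaussian mutation. The abstract does not specify GADAM's selection, crossover, or mutation operators, or how Adam's moment estimates carry across generations; the choices here (including resetting the moments every generation) are assumptions.

```python
import math
import random


def adam_steps(theta, grad_fn, steps, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates from theta (moment estimates start at zero)."""
    theta = list(theta)
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for k in range(len(theta)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            v_hat = v[k] / (1 - beta2 ** t)
            theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def evolve(population, loss_fn, rng, mutation_scale=0.1):
    """Keep the better half; refill with uniform crossover plus Gaussian mutation."""
    ranked = sorted(population, key=loss_fn)
    parents = ranked[: max(2, len(ranked) // 2)]
    children = []
    while len(parents) + len(children) < len(population):
        a, b = rng.sample(parents, 2)
        child = [(p if rng.random() < 0.5 else q) + rng.gauss(0.0, mutation_scale)
                 for p, q in zip(a, b)]
        children.append(child)
    return parents + children


def gadam(loss_fn, grad_fn, population, generations, steps_per_generation, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        population = [adam_steps(p, grad_fn, steps_per_generation) for p in population]
        population = evolve(population, loss_fn, rng)
    return min(population, key=loss_fn)


def loss(p):  # toy objective with its minimum at (3, -1)
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


def grad(p):
    return [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)]


start = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0], [4.0, -3.0]]
print(gadam(loss, grad, start, generations=5, steps_per_generation=200))
```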

Tell Me Something New: a new framework for asynchronous parallel learning

We present a novel approach for parallel computation in the context of machine learning that we call ‘Tell Me Something New’ (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe ‘something new’. TMSN does not require synchronization or a head node and is highly resilient against failing machines or laggards. We demonstrate the utility of TMSN by applying it to learning boosted trees. We show that our implementation is 10 times faster than XGBoost and LightGBM on the splice-site prediction problem.
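
A single-process simulation can illustrate the broadcast rule: each worker keeps the best score it knows about, announces a result to its peers only when that result beats its known best by a margin (its "something new"), and adopts any better result it hears about. No worker waits on another and there is no head node. The `Worker` class, the `margin` parameter, and the use of a scalar score are assumptions; in the paper's boosted-tree setting the broadcast content would be richer than a single number.

```python
class Worker:
    def __init__(self, name):
        self.name = name
        self.best = float("-inf")  # best score this worker knows about
        self.broadcasts = 0

    def observe(self, score, peers, margin):
        """Handle a locally computed result; broadcast only if it is new enough."""
        if score > self.best + margin:
            self.best = score
            self.broadcasts += 1
            for peer in peers:
                if peer is not self:
                    peer.receive(score)

    def receive(self, score):
        """Adopt a peer's result if it beats what we already know."""
        if score > self.best:
            self.best = score


workers = [Worker(f"w{k}") for k in range(3)]
workers[0].observe(0.50, workers, margin=0.05)  # broadcast: first result
workers[1].observe(0.52, workers, margin=0.05)  # gain too small: stays quiet
workers[2].observe(0.70, workers, margin=0.05)  # broadcast: clearly better
print([w.best for w in workers], sum(w.broadcasts for w in workers))
```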

Norm-Preservation: Why Residual Networks Can Become Extremely Deep

Augmenting deep neural networks with skip connections, as introduced in the so-called ResNet architecture, surprised the community by enabling the training of networks of more than 1000 layers with significant performance gains. It has been shown that identity skip connections eliminate singularities and improve the optimization landscape of the network. This paper deciphers ResNet by analyzing the effect of skip connections in the backward path and sets forth new theoretical results on the advantages of identity skip connections in deep neural networks. We prove that the skip connections in the residual blocks facilitate preserving the norm of the gradient and lead to well-behaved and stable back-propagation, which is a desirable feature from an optimization perspective. We also show that, perhaps surprisingly, as more residual blocks are stacked, the network becomes more norm-preserving. Traditionally, norm-preservation is enforced on the network only at the beginning of training, by using initialization techniques. However, we show that identity skip connections retain norm-preservation during the training procedure. Our theoretical arguments are supported by extensive empirical evidence. Can we push for more norm-preservation? We answer this question by proposing zero-phase whitening of the fully-connected layer and adding norm-preserving transition layers. Our numerical investigations demonstrate that the learning dynamics and the performance of ResNets can be improved by making them even more norm-preserving through changing only a few blocks in very deep residual networks. Our results and the introduced modification for ResNet, referred to as Procrustes ResNets, can be used as a guide for studying more complex architectures such as DenseNet, training deeper networks, and inspiring new architectures.
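
The norm-preservation argument can be probed numerically in a simplified, linear setting: back-propagating a gradient through blocks whose Jacobian is $I + \epsilon A$ (an identity skip plus a small residual branch) barely changes its norm, whereas plain layers with Jacobian $A$ rescale it freely. The sketch assumes linear blocks with random Gaussian weights and no nonlinearity or normalization, so it only illustrates the intuition behind the paper's results rather than reproducing them.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def transpose_matvec(M, g):
    """Compute M^T g, the backward pass through a linear map with Jacobian M."""
    n_rows, n_cols = len(M), len(M[0])
    return [sum(M[i][j] * g[i] for i in range(n_rows)) for j in range(n_cols)]


def backprop_norm_ratio(depth, n, residual, eps=0.05, seed=0):
    """||gradient at the input|| / ||gradient at the output|| after `depth` blocks."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    start = norm(g)
    for _ in range(depth):
        A = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(n)]
        if residual:
            # Identity skip connection plus a small residual branch.
            J = [[(1.0 if i == j else 0.0) + eps * A[i][j] for j in range(n)]
                 for i in range(n)]
        else:
            J = A
        g = transpose_matvec(J, g)
    return norm(g) / start


print(backprop_norm_ratio(20, 8, residual=True), backprop_norm_ratio(20, 8, residual=False))
```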

Two geometric input transformation methods for fast online reinforcement learning with neural nets

We apply neural nets with ReLU gates in online reinforcement learning. Our goal is to train these networks incrementally, without computationally expensive experience replay. By studying how individual neural nodes behave in online training, we recognize that the global nature of ReLU gates can cause undesirable interference in each node’s learning behavior. We propose reducing such interference with two efficient input transformation methods that are geometric in nature and match the geometric property of ReLU gates well. The first is tile coding, a classic binary encoding scheme originally designed for local generalization based on the topological structure of the input space. The second (EmECS) is a new method we introduce; it is based on geometric properties of convex sets and a topological embedding of the input space into the boundary of a convex set. We discuss the behavior of the network when it operates on the transformed inputs. We also compare it experimentally with neural nets that do not use the same input transformations and with the classic algorithm of tile coding plus a linear function approximator; on several online reinforcement learning tasks, we show that the neural net with tile coding or EmECS can achieve not only faster learning but also more accurate approximations. Our results strongly suggest that geometric input transformations of this type can be effective for interference reduction and take us a step closer to fully incremental reinforcement learning with neural nets.
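
Tile coding is standard enough to sketch directly: several grids (tilings), each shifted by a fraction of a tile width, partition the input space, and an input is encoded by the one active tile in each tiling, giving a sparse binary vector. The sketch assumes 2-D inputs in $[0, 1)^2$, uniform diagonal offsets (asymmetric offsets are often preferred in practice), and a flat index per tile; it does not cover EmECS.

```python
def tile_code(x, y, n_tilings, tiles_per_dim):
    """Indices of the active tile in each tiling for a point (x, y) in [0, 1)^2."""
    tile_width = 1.0 / tiles_per_dim
    side = tiles_per_dim + 1  # one extra row/column absorbs the offset
    active = []
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings
        col = int((x + offset) / tile_width)
        row = int((y + offset) / tile_width)
        active.append(t * side * side + row * side + col)
    return active


def binary_features(active, n_tilings, tiles_per_dim):
    """Expand active indices into the sparse binary vector fed to the network."""
    side = tiles_per_dim + 1
    features = [0] * (n_tilings * side * side)
    for index in active:
        features[index] = 1
    return features


a = tile_code(0.42, 0.77, n_tilings=4, tiles_per_dim=10)
b = tile_code(0.43, 0.77, n_tilings=4, tiles_per_dim=10)
print(a, b, len(set(a) & set(b)))  # nearby points share most active tiles
```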

Unsupervised Learning of Neural Networks to Explain Neural Networks

This paper presents an unsupervised method to learn a neural network, namely an explainer, to interpret a pre-trained convolutional neural network (CNN), i.e., explaining knowledge representations hidden in middle conv-layers of the CNN. Given feature maps of a certain conv-layer of the CNN, the explainer acts like an auto-encoder, which first disentangles the feature maps into object-part features and then inverts object-part features back to features of higher conv-layers of the CNN. More specifically, the explainer contains interpretable conv-layers, where each filter disentangles the representation of a specific object part from chaotic input feature maps. As a paraphrase of CNN features, the disentangled representations of object parts help people understand the logic inside the CNN. We also train the explainer to use object-part features to reconstruct features of higher CNN layers, in order to minimize loss of information during the feature disentanglement. More crucially, we train the explainer via network distillation without using any annotations of sample labels, object parts, or textures for supervision. We have applied our method to different types of CNNs for evaluation, and the explainers have significantly boosted the interpretability of CNN features.

AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search

We present AlphaX, a fully automated agent that designs complex neural architectures from scratch. AlphaX explores the exponentially large search space with a novel distributed Monte Carlo Tree Search (MCTS) and a Meta-Deep Neural Network (DNN). MCTS intrinsically improves search efficiency by automatically balancing exploration and exploitation at each state, while the Meta-DNN predicts network accuracy to guide the search and to provide an estimated reward for preemptive backpropagation in the distributed setup. As the search progresses, AlphaX also generates the training data for the Meta-DNN, so the Meta-DNN is learned end-to-end. In searching for NASNet-style architectures, AlphaX found several promising architectures with up to 1% higher accuracy than NASNet using only 17 GPUs for 5 days, demonstrating up to a 23.5x speedup over the original NASNet search, which used 500 GPUs for 4 days.
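
The exploration-exploitation balance in MCTS is usually implemented with the UCB1 rule (UCT): pick the child maximizing its mean value plus $c\sqrt{\ln N / n}$, where $N$ is the parent's visit count and $n$ the child's. The sketch assumes that standard rule and tracks values as plain totals; in AlphaX the values would also incorporate the Meta-DNN's predicted accuracy, and the distributed, preemptive backpropagation is not shown. Child names and numbers are illustrative.

```python
import math


def uct_select(children, c=math.sqrt(2)):
    """Pick a child by UCB1. children: list of (name, total_value, visits).

    Unvisited children are expanded first; otherwise the score is the mean
    value (exploitation) plus an exploration bonus that shrinks with visits.
    """
    parent_visits = sum(visits for _, _, visits in children)
    best_name, best_score = None, float("-inf")
    for name, total_value, visits in children:
        if visits == 0:
            return name
        score = total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


children = [("conv3x3", 9.0, 10), ("maxpool", 0.0, 1)]
print(uct_select(children), uct_select(children, c=0.1))  # maxpool conv3x3
```

A large exploration constant favors the barely tried `maxpool` branch; shrinking it lets the well-performing `conv3x3` branch win.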

Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly

When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic difficulties. Principal curves act as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret bound and performance on a toy example and seismic data.

Processing of missing data by neural networks

We propose a general, theoretically justified mechanism for processing missing data by neural networks. Our idea is to replace the typical neuron response in the first hidden layer by its expected value. This approach can be applied to various types of networks at minimal cost in their modification. Moreover, in contrast to recent approaches, it does not require complete data for training. Experimental results on different types of architectures show that our method gives better results than typical imputation strategies and other methods dedicated to incomplete data.
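
The key computation has a closed form: if a neuron's pre-activation is Gaussian with mean $\mu$ and standard deviation $\sigma$, then $\mathbb{E}[\mathrm{ReLU}] = \mu\Phi(\mu/\sigma) + \sigma\varphi(\mu/\sigma)$, where $\Phi$ and $\varphi$ are the standard normal CDF and density. The sketch assumes the missing coordinates are modeled by independent Gaussians with given (hypothetical) means and variances, a simplification whose density model may be poorer than the paper's; observed coordinates contribute deterministically.

```python
import math


def expected_relu(mu, sigma):
    """E[max(0, Z)] for Z ~ N(mu, sigma^2)."""
    if sigma == 0.0:
        return max(0.0, mu)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


def expected_neuron_response(w, b, x, miss_mean, miss_var):
    """Expected ReLU output of one first-layer neuron w.x + b.

    x uses None for missing entries; miss_mean/miss_var give the (hypothetical)
    per-coordinate Gaussian model assumed for those entries, independent
    across coordinates.
    """
    mu, variance = b, 0.0
    for w_i, x_i, m_i, v_i in zip(w, x, miss_mean, miss_var):
        if x_i is None:
            mu += w_i * m_i
            variance += w_i * w_i * v_i
        else:
            mu += w_i * x_i
    return expected_relu(mu, math.sqrt(variance))


print(expected_neuron_response([1.0, 1.0], 0.0, [1.0, None], [0.0, -1.0], [0.0, 1.0]))
```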

Structural Regularity Exploring and Controlling: A Network Reconstruction Perspective
Abstractive Text Classification Using Sequence-to-convolution Neural Networks
A PTAS for a Class of Stochastic Dynamic Programs
STS Classification with Dual-stream CNN
Effects of Memristors on Fully Differential Transimpedance Amplifier Performance
Exp-Concavity of Proper Composite Losses
CMOS-Memristive Analog Multiplier Design
Analysis of Multilayer Perceptron with Rectifier Linear Unit Activation Function
Learning Attentional Communication for Multi-Agent Cooperation
Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models
Human-guided data exploration using randomisation
Minimax Lower Bounds for Cost Sensitive Classification
Task-Agnostic Meta-Learning for Few-shot Learning
A Deep Structure of Person Re-Identification using Multi-Level Gaussian Models
Balancing Shared Autonomy with Human-Robot Communication
An Online RFID Localization in the Manufacturing Shopfloor
On a General Class of Discrete Bivariate Distributions
Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration
A Lyapunov-based Approach to Safe Reinforcement Learning
Object Localization and Motion Transfer learning with Capsules
Predicting drug response of tumors from integrated genomic profiles by deep neural networks
A Doob-type maximal inequality and its applications to various stochastic processes
Density-Adaptive Kernel based Re-Ranking for Person Re-Identification
The UN Parallel Corpus Annotated for Translation Direction
RGB-Depth SLAM Review
Adaptive Spectral Graph Convolutional Networks for Skeleton-Based Action Recognition
Bayesian Modeling and Computation for Analyte Quantification in Complex Mixtures Using Raman Spectroscopy
Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer
Stacked Propensity Score Functions for Observational Cohorts with Oversampled Exposed Subjects
Learning Graph-Level Representations with Gated Recurrent Neural Networks
The Generalized Lasso Problem and Uniqueness
Periodicity of Grover walks on distance-regular graphs
Analog multiplier design with CMOS-memristor circuits
Variability analysis of Memristor-based Sigmoid Function
Perceptron Linear Function Design with CMOS-Memristive Circuits
Sense amplifier design using CMOS-memristor circuits
Instrumentation Amplifier design: Comparison of CMOS-memristive to CMOS design
BourGAN: Generative Networks with Metric Embeddings
An Extended Poisson Family of Life Distribution: A Unified Approach in Competitive and Complementary Risks
Incidence hypergraphs: The categorical inconsistency of set-systems and a characterization of quiver exponentials
The anatomy of a Web of Trust: the Bitcoin-OTC market
An Evaluation of Trajectory Prediction Approaches and Notes on the TrajNet Benchmark
Measuring neuronal avalanches in disordered systems with absorbing states
Functional response regression with funBART: an analysis of patient-specific stillbirth risk
Learning a face space for experiments on human identity
Security Performance Analysis of Physical Layer over Fisher-Snedecor $\mathcal{F}$ Fading Channels
Do You Like What I Like? Similarity Estimation in Proximity-based Mobile Social Networks
On Attention Models for Human Activity Recognition
Learning Hierarchical Visual Representations in Deep Neural Networks Using Hierarchical Linguistic Labels
Long-term face tracking in the wild using deep learning
Regularized Loss Minimizers with Local Data Obfuscation
Capturing human category representations by sampling in deep feature spaces
Fuel Economy and Emission Testing for Connected and Automated Vehicles Using Real-world Driving Datasets
On testing substitutability
Comments on ‘Momentum fractional LMS for power signal parameter estimation’
The probabilities of extinction in a branching random walk on a strip
Heterogeneous Multi-output Gaussian Process Prediction
Latent Space Non-Linear Statistics
Learning to Detect
Micro Water-Energy Nexus: Optimal Demand-Side Management and Quasi-Convex Hull Relaxation
Nonparametric Bayesian Deep Networks with Local Competition
Bayesian Bootstrap Inference for the ROC Surface
Do Neural Network Cross-Modal Mappings Really Bridge Modalities?
Generative Creativity: Adversarial Learning for Bionic Design
Predictive Estimation of the Optimal Signal Strength from Unmanned Aerial Vehicle over Internet of Things Using ANN
Bitcoin price and its marginal cost of production: support for a fundamental value
Predicting Strategic Voting Behavior with Poll Information
A hybrid index model for efficient spatio-temporal search in HBase
Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions
Adaptively Pruning Features for Boosted Decision Trees
Fast Object Classification in Single-pixel Imaging
Analytic moment and Laplace transform formulae for the quasi-stationary distribution of the Shiryaev diffusion on an interval
Structure of Cubic Lehman Matrices
Sequential adaptive elastic net approach for single-snapshot source localization
Chief complaint classification with recurrent neural networks
Reliable counting of weakly labeled concepts by a single spiking neuron model
Partitioning SKA Dataflows for Optimal Graph Execution
Optimizing the F-measure for Threshold-free Salient Object Detection
Wildest Faces: Face Detection and Recognition in Violent Settings
ACR: a cluster-based routing protocol for VANET
Decompositions into spanning rainbow structures
Reinforcement Learning of Theorem Proving
Well-posedness of monotone semilinear SPDEs with semimartingale noise
Transduction with Matrix Completion Using Smoothed Rank Function
Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate
Regularization with Metric Double Integrals of Functions with Values in a Set of High-Dimensional Vectors
DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding
Disc-aware Ensemble Network for Glaucoma Screening from Fundus Image
Learning Pixel-wise Labeling from the Internet without Human Interaction
Autonomous discovery of the goal space to learn a parameterized skill
End-to-end driving simulation via angle branched network
An optimal approximation of discrete random variables with respect to the Kolmogorov distance
Optimal Consumption in the Stochastic Ramsey Problem without Boundedness Constraints
Neural networks with dynamical coefficients and adjustable connections on the basis of integrated backpropagation
Two-stage quality adaptive fingerprint image enhancement using Fuzzy c-means clustering based fingerprint quality analysis
Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition
Counting copies of a fixed subgraph in $F$-free graphs
Integral representation of the global minimizer
Estimation of Non-Normalized Mixture Models and Clustering Using Deep Representation
Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation
Free-rider Episode Screening via Dual Partition Model
Fast Disparity Estimation using Dense Networks
M-estimation with the Trimmed l1 Penalty
Number Sequence Prediction Problems and Computational Powers of Neural Network Models
A Compositional Approach to Network Algorithms
Contour location via entropy reduction leveraging multiple information sources
A Tunable Base Station Cooperation Scheme for Poisson Cellular Networks
New methods for incorporating network cyclic structures to improve community detection
Optimal DR-Submodular Maximization and Applications to Provable Mean Field Inference
A bistable belief dynamics model for radicalization within sectarian conflict
Semisupervised Learning on Heterogeneous Graphs and its Applications to Facebook News Feed
Learning to Repair Software Vulnerabilities with Generative Adversarial Networks
Projection-Free Bandit Convex Optimization
Self-Training Ensemble Networks for Zero-Shot Image Recognition
Solving the Rubik’s Cube Without Human Knowledge
Metric for Automatic Machine Translation Evaluation based on Universal Sentence Representations
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces
Quantum critical behavior of a three-dimensional superfluid-Mott glass transition
Using permutations to quantify and correct for confounding in machine learning predictions
Fast Kernel Approximations for Latent Force Models and Convolved Multiple-Output Gaussian processes
PCA by Optimisation of Symmetric Functions has no Spurious Local Optima
PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits
Adversarial Structure Matching Loss for Image Segmentation
On Robustness Analysis of a Dynamic Average Consensus Algorithm to Communication Delay
Subspace Selection via DR-Submodular Maximization on Lattices
Model Inference with Stein Density Ratio Estimation
Butterfly-Net: Optimal Function Representation Based on Convolutional Neural Networks
Closed Walk Sampler: An Efficient Method for Estimating Eigenvalues of Large Graphs
DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors
Multi-view Sentence Representation Learning
My camera can see through fences: A deep learning approach for image de-fencing
Overcoming catastrophic forgetting problem by weight consolidation and long-term memory
Acceleration of Non-Linear Minimisation with PyTorch
A case study of hurdle and generalized additive models in astronomy: the escape of ionizing radiation
DeepLogic: End-to-End Logical Reasoning
Reduction of power grid fluctuations by communication between smart devices
Can machine learning identify interesting mathematics? An exploration using empirically observed laws
Efficient Online Portfolio with Logarithmic Regret
Designing communication systems via iterative improvement: error correction coding with Bayes decoder and codebook optimized for source symbol error
Method G: Uncertainty Quantification for Distributed Data Problems using Generalized Fiducial Inference
Incept-N: A Convolutional Neural Network based Classification Approach for Predicting Nationality from Facial Features
Computing Kantorovich-Wasserstein Distances on $d$-dimensional histograms using $(d+1)$-partite graphs
Assessing Health Care Interventions via an Interrupted Time Series Model: Study Power and Design Considerations
Wasserstein Coresets for Lipschitz Costs
Learning to Collaborate for User-Controlled Privacy
Asset Price Bubbles: An Option-based Indicator
Robust Handling of Polysemy via Sparse Representations
More green space is related to less antidepressant prescription rates in the Netherlands: A Bayesian geoadditive quantile regression approach
Batch Normalization in the final layer of generative networks
Political Discussion and Leanings on Twitter: the 2016 Italian Constitutional Referendum
The impact of geometry on many body localization
Modeling trend in temperature volatility using generalized LASSO
Testing Alignment of Node Attributes with Network Structure Through Label Propagation
Do Diffusion Protocols Govern Cascade Growth?
Prediction in Projection: A new paradigm in delay-coordinate reconstruction
Cancer Research UK Drug Discovery Process Mining
Spherical harmonics entropy for optimal 3D modeling
Implementation of Chua’s chaotic oscillator with an HP memristor
Conflict-free connections: algorithm and complexity
On channel sounding with switched arrays in fast time-varying channels
ALVEC: Auto-scaling by Lotka Volterra Elastic Cloud: A QoS aware Non Linear Dynamical Allocation Model
Poster: Resource Allocation with Conflict Resolution for Vehicular Sidelink Broadcast Communications
Finite point configurations in the plane, rigidity and Erdos problems
Fixed-PSNR Lossy Compression for Scientific Data
Efficient simulation of Gaussian Markov random fields by Chebyshev polynomial approximation
Why is the HLR theory particle-hole symmetric?
Combinatorial Properties of Metrically Homogeneous Graphs