**How to estimate time-varying Vector Autoregressive Models? A comparison of two methods**

The ubiquity of mobile devices has led to a surge in intensive longitudinal (or time series) data on individuals. This is an exciting development, because personalized models both naturally tackle the issue of heterogeneity between people and increase the validity of models for applications. A popular model for time series is the Vector Autoregressive (VAR) model, in which each variable is modeled as a linear function of all variables at previous time points. A key assumption of this model is that the parameters of the true data-generating model are constant (or stationary) across time. The most straightforward way to check for time-varying parameters is to fit a model that allows for them. In the present paper we compare two methods to estimate time-varying VAR models: the first uses a spline approach to allow for time-varying parameters, the second uses kernel smoothing. We report the performance of both methods and their stationary counterparts in an extensive simulation study that reflects situations typically encountered in practice. We compare the performance of stationary and time-varying models and discuss the theoretical characteristics of all methods in light of the simulation results. In addition, we provide a step-by-step tutorial for both methods, showing how to estimate a time-varying VAR model on an openly available individual time series dataset.
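The kernel-smoothing idea can be sketched in a few lines. The following is an illustrative toy, not the authors' implementation: a single AR(1) coefficient that drifts over time is estimated locally with Gaussian-kernel-weighted least squares.

```python
import numpy as np

# Toy sketch of kernel-smoothed time-varying AR estimation (assumed setup,
# not the paper's code): the AR(1) coefficient drifts linearly over time.
rng = np.random.default_rng(0)
T = 500
phi_true = np.linspace(0.2, 0.8, T)          # time-varying coefficient
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi_true[t] * y[t - 1] + rng.normal(scale=0.5)

def tv_ar1_estimate(y, t0, bandwidth=0.1):
    """Kernel-weighted least-squares estimate of phi at time point t0 in [0, 1]."""
    T = len(y)
    times = np.arange(1, T) / T
    w = np.exp(-0.5 * ((times - t0) / bandwidth) ** 2)   # Gaussian kernel weights
    x, target = y[:-1], y[1:]
    return np.sum(w * x * target) / np.sum(w * x * x)

# The local estimates should track the drift: larger phi near the end.
phi_early = tv_ar1_estimate(y, 0.1)
phi_late = tv_ar1_estimate(y, 0.9)
```

The bandwidth controls the bias-variance trade-off: a small bandwidth tracks parameter changes quickly but with noisy estimates, a large one approaches the stationary fit.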

**Interpretable R-CNN**

This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network. By interpretable models, we focus on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously during detection, without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet-based detection systems. The AOGParsing operator aims to harness both the explainable rigor of top-down hierarchical and compositional grammar models and the discriminative power of bottom-up deep neural networks through end-to-end training. In detection, a bounding box is interpreted by the best parse tree derived on-the-fly from the AOG, which is treated as the extractive rationale generated for interpreting the detection. In learning, we propose a folding-unfolding method to train the AOG and ConvNet end-to-end. In experiments, we build on top of R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets, obtaining performance comparable to state-of-the-art methods.

**Neural Network Gradient Hamiltonian Monte Carlo**

Hamiltonian Monte Carlo is a widely used algorithm for sampling from posterior distributions of complex Bayesian models. It can efficiently explore high-dimensional parameter spaces guided by simulated Hamiltonian flows. However, the algorithm requires repeated gradient calculations, and these computations become increasingly burdensome as data sets scale. We present a method to substantially reduce the computational burden by using a neural network to approximate the gradient. First, we prove that the proposed method still maintains convergence to the true distribution even though the approximated gradient no longer comes from a Hamiltonian system. Second, we conduct experiments on synthetic examples and real data sets to validate the proposed method.
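The structure of the algorithm can be sketched with a minimal HMC loop in which the gradient is a pluggable function; in the paper that function would be a trained neural-network surrogate, while the sketch below (an assumed toy, not the authors' code) uses the exact gradient of a 2-D standard Gaussian. The Metropolis correction at the end is what preserves exactness even when the gradient is approximate.

```python
import numpy as np

# Minimal HMC sketch with a pluggable gradient; grad_log_p could be swapped
# for a neural-network approximation of the gradient, as the paper proposes.
rng = np.random.default_rng(1)

def grad_log_p(x):          # stand-in for the (possibly approximate) gradient
    return -x               # gradient of log N(0, I)

def log_p(x):
    return -0.5 * np.sum(x ** 2)

def hmc_step(x, grad_fn, step=0.2, n_leapfrog=10):
    p = rng.normal(size=x.shape)
    x_new = x.copy()
    p_new = p + 0.5 * step * grad_fn(x_new)              # leapfrog: half momentum step
    for _ in range(n_leapfrog - 1):
        x_new = x_new + step * p_new
        p_new = p_new + step * grad_fn(x_new)
    x_new = x_new + step * p_new
    p_new = p_new + 0.5 * step * grad_fn(x_new)
    # Metropolis accept/reject keeps the chain exact even with an inexact gradient.
    log_accept = (log_p(x_new) - 0.5 * np.sum(p_new ** 2)) \
               - (log_p(x) - 0.5 * np.sum(p ** 2))
    return x_new if np.log(rng.uniform()) < log_accept else x

x = np.zeros(2)
samples = []
for _ in range(3000):
    x = hmc_step(x, grad_log_p)
    samples.append(x)
samples = np.array(samples)
```

The sample mean and variance should match the standard Gaussian target; replacing `grad_log_p` with a cheaper approximation changes the acceptance rate but not the stationary distribution.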

**On Optimal Generalizability in Parametric Learning**

We consider the parametric learning problem, where the objective of the learner is determined by a parametric loss function. Employing empirical risk minimization, possibly with regularization, the inferred parameter vector will be biased toward the training samples. In practice, such bias is measured by cross validation, where the data set is partitioned into a training set used for training and a validation set, which is not used in training and is left to measure the out-of-sample performance. A classical cross validation strategy is leave-one-out cross validation (LOOCV), where one sample is left out for validation and training is done on the rest of the samples, and this process is repeated for every sample. LOOCV is rarely used in practice due to its high computational complexity. In this paper, we first develop a computationally efficient approximate LOOCV (ALOOCV) and provide theoretical guarantees for its performance. Then we use ALOOCV to provide an optimization algorithm for finding the regularizer in the empirical risk minimization framework. In our numerical experiments, we illustrate the accuracy and efficiency of ALOOCV as well as our proposed framework for the optimization of the regularizer.
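The costly baseline that ALOOCV approximates is easy to state in code: refit the model n times, each time holding out one sample. Below is a toy sketch for ridge regression (hypothetical data, not from the paper), making the O(n) full refits explicit.

```python
import numpy as np

# Exact leave-one-out cross-validation for ridge regression: the expensive
# baseline that ALOOCV is designed to approximate. Toy data, assumed setup.
rng = np.random.default_rng(2)
n, d, lam = 60, 3, 0.5
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def loocv_error(X, y, lam):
    errs = []
    for i in range(len(y)):                   # n full refits: the bottleneck
        mask = np.arange(len(y)) != i
        w = ridge_fit(X[mask], y[mask], lam)
        errs.append((y[i] - X[i] @ w) ** 2)
    return np.mean(errs)

err = loocv_error(X, y, lam)
```

For linear smoothers like ridge there is a classical closed-form shortcut (the held-out residual equals the in-sample residual divided by 1 minus the leverage), but for general parametric losses no such identity exists, which is the gap approximate LOOCV schemes target.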

**A Deep Learning Approach for Expert Identification in Question Answering Communities**

In this paper, we describe an effective convolutional neural network framework for identifying experts in question answering communities. The approach combines user feature representations with question feature representations to compute scores, such that the user with the highest score is predicted to be the expert for the question. Unlike prior work, this method does not rely on measuring the quality of answer content to identify experts; it requires only the question sentence and user embedding features. Remarkably, our model can be applied to different languages and different domains. The proposed framework is trained on two datasets: Stack Overflow and Zhihu. The Top-1 accuracy results of our experiments show that our framework outperforms the best baseline framework for expert identification.

**Optimizing Kernel Machines using Deep Learning**

Building highly non-linear and non-parametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring non-linear models through the construction of similarity functions from data. These methods are particularly preferred in cases where the training data sizes are limited and when prior knowledge of the data similarities is available. Despite their usefulness, they are limited by their computational complexity and their inability to support end-to-end learning with a task-specific objective. On the other hand, deep neural networks have become the de facto solution for end-to-end inference in several learning paradigms. In this article, we explore the idea of using deep architectures to perform kernel machine optimization, for both computational efficiency and end-to-end inference. To this end, we develop the DKMO (Deep Kernel Machine Optimization) framework, which creates an ensemble of dense embeddings using Nystrom kernel approximations and utilizes deep learning to generate task-specific representations through the fusion of the embeddings. Intuitively, the filters of the network are trained to fuse information from an ensemble of linear subspaces in the RKHS. Furthermore, we introduce kernel dropout regularization to enable improved training convergence. Finally, we extend this framework to the multiple kernel case, by coupling a global fusion layer with pre-trained deep kernel machines for each of the constituent kernels. Using case studies with limited training data and no explicit feature sources, we demonstrate the effectiveness of our framework over conventional model inference techniques.
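The Nystrom step that produces the dense embeddings can be sketched directly: pick a small set of landmark points, and build features whose inner products approximate the full kernel matrix. This is a generic illustration of the Nystrom method (toy data and parameters assumed), not the DKMO pipeline itself.

```python
import numpy as np

# Sketch of a Nystrom embedding: approximate an RBF kernel matrix with
# features built from a small set of landmark points. Assumed toy setup.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
landmarks = X[rng.choice(200, size=20, replace=False)]

def rbf(A, B, gamma=0.05):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

C = rbf(X, landmarks)                 # (n, m) cross-kernel
W = rbf(landmarks, landmarks)         # (m, m) landmark kernel
vals, vecs = np.linalg.eigh(W)
vals = np.clip(vals, 1e-10, None)
embed = C @ (vecs / np.sqrt(vals))    # features: embed @ embed.T = C W^+ C.T

K_approx = embed @ embed.T
K_exact = rbf(X, X)
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
```

Each such embedding is an ordinary dense matrix, which is what lets a downstream network consume and fuse several of them with standard layers.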

**Sliced Wasserstein Distance for Learning Gaussian Mixture Models**

Gaussian mixture models (GMM) are powerful parametric tools with many applications in machine learning and computer vision. Expectation maximization (EM) is the most popular algorithm for estimating the GMM parameters. However, EM guarantees only convergence to a stationary point of the log-likelihood function, which could be arbitrarily worse than the optimal solution. Inspired by the relationship between the negative log-likelihood function and the Kullback-Leibler (KL) divergence, we propose an alternative formulation for estimating the GMM parameters using the sliced Wasserstein distance, which gives rise to a new algorithm. Specifically, we propose minimizing the sliced Wasserstein distance between the mixture model and the data distribution with respect to the GMM parameters. In contrast to the KL-divergence, the energy landscape for the sliced Wasserstein distance is better behaved and therefore more suitable for a stochastic gradient descent scheme to obtain the optimal GMM parameters. We show that our formulation results in parameter estimates that are more robust to random initializations and demonstrate that it can estimate high-dimensional data distributions more faithfully than the EM algorithm.
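The sliced Wasserstein distance itself is simple to compute between point clouds: project onto random one-dimensional directions, where optimal transport reduces to sorting, and average. The sketch below illustrates the distance on toy data; it is not the paper's GMM fitting procedure.

```python
import numpy as np

# Sliced Wasserstein distance between two point clouds (assumed toy setup):
# average 1-D Wasserstein distances over random projection directions.
rng = np.random.default_rng(4)

def sliced_wasserstein(X, Y, n_proj=100):
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)       # random unit direction
        x_proj = np.sort(X @ theta)          # 1-D optimal transport is a sort
        y_proj = np.sort(Y @ theta)
        total += np.mean((x_proj - y_proj) ** 2)
    return np.sqrt(total / n_proj)

A = rng.normal(size=(500, 2))
B = rng.normal(size=(500, 2)) + np.array([3.0, 0.0])   # shifted cloud
near = sliced_wasserstein(A, rng.normal(size=(500, 2)))
far = sliced_wasserstein(A, B)
```

Because every step (projection, sorting with fixed assignments, squared differences) is differentiable almost everywhere, the distance lends itself to the stochastic gradient scheme the abstract describes.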

**Deep Epitome for Unravelling Generalized Hamming Network: A Fuzzy Logic Interpretation of Deep Learning**

This paper gives a rigorous analysis of trained Generalized Hamming Networks (GHNs) proposed by Fan (2017) and discloses an interesting finding about GHNs: stacked convolution layers in a GHN are equivalent to a single yet wide convolution layer. The revealed equivalence, on the theoretical side, can be regarded as a constructive manifestation of the universal approximation theorem (Cybenko, 1989; Hornik, 1991). In practice, it has profound and multi-fold implications. For network visualization, the constructed deep epitomes at each layer provide a visualization of the network's internal representation that does not rely on the input data. Moreover, deep epitomes allow the direct extraction of features in just one step, without resorting to the regularized optimizations used in existing visualization tools.
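For purely linear stacked convolutions (no activations in between, which is a simplification of the GHN setting), the collapse into a single wide layer is elementary: convolution is associative, so applying kernel k2 after k1 equals one convolution with the composed kernel k1 * k2. A minimal 1-D numerical check:

```python
import numpy as np

# Associativity of convolution: two stacked linear conv layers equal one
# layer whose kernel is the convolution of the two kernels.
rng = np.random.default_rng(5)
x = rng.normal(size=50)
k1 = rng.normal(size=3)
k2 = rng.normal(size=5)

stacked = np.convolve(np.convolve(x, k1, mode="full"), k2, mode="full")
single = np.convolve(x, np.convolve(k1, k2, mode="full"), mode="full")
```

The composed kernel has support 3 + 5 - 1 = 7, i.e., the single equivalent layer is wider than either constituent, matching the "single yet wide" phrasing; the paper's contribution is showing that an analogous collapse holds for trained GHNs, which is far less obvious.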

**Revisiting Simple Neural Networks for Learning Representations of Knowledge Graphs**

We address the problem of learning vector representations for entities and relations in Knowledge Graphs (KGs) for Knowledge Base Completion (KBC). This problem has received significant attention in the past few years and multiple methods have been proposed. Most of the existing methods in the literature use a predefined and characteristic scoring function for evaluating the correctness of KG triples. These scoring functions distinguish correct triples (high score) from incorrect ones (low score). However, their performance varies across different datasets. In this work, we demonstrate that a simple neural network based score function can consistently achieve near state-of-the-art performance and adapt to different datasets.

**A Fast and Robust TSVM for Pattern Classification**

The twin support vector machine (TSVM) is a powerful learning algorithm that solves a pair of smaller SVM-type problems. However, some issues remain to be solved before it can be used in real applications, e.g., low efficiency and sensitivity to noisy data. In this paper, we propose a Fast and Robust TSVM (FR-TSVM) to address these issues. In FR-TSVM, we propose an effective fuzzy membership function to ease the effect of noisy inputs. We apply the fuzzy membership to each input instance and reformulate the TSVM such that different input instances can make different contributions to the learning of the separating hyperplanes. To further speed up the training procedure, we develop an efficient coordinate descent algorithm with shrinking to solve the pair of quadratic programming problems (QPPs) involved in FR-TSVM. Moreover, the theoretical foundations of the proposed model are analyzed in detail. Experimental results on several artificial and benchmark datasets indicate that FR-TSVM not only achieves fast training but also shows robust classification performance.

**Z-Forcing: Training Stochastic Recurrent Networks**

Many efforts have been devoted to training generative latent variable models with autoregressive decoders, such as recurrent neural networks (RNNs). Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech. We unify successful ideas from recently proposed architectures into a stochastic recurrent model: each step in the sequence is associated with a latent variable that is used to condition the recurrent dynamics for future steps. Training is performed with amortized variational inference where the approximate posterior is augmented with an RNN that runs backward through the sequence. In addition to maximizing the variational lower bound, we ease training of the latent variables by adding an auxiliary cost which forces them to reconstruct the state of the backward recurrent network. This provides the latent variables with a task-independent objective that enhances the performance of the overall model. We found this strategy to perform better than alternative approaches such as KL annealing. Although conceptually simple, our model achieves state-of-the-art results on standard speech benchmarks such as TIMIT and Blizzard and competitive performance on sequential MNIST. Finally, we apply our model to language modeling on the IMDB dataset, where the auxiliary cost helps in learning interpretable latent variables. Source code: https://…/zforcing_nips17

**DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images**

Disentangling factors of variation has always been a challenging problem in representation learning. Existing algorithms suffer from many limitations, such as unpredictable disentangling factors, poor quality of images generated from encodings, lack of identity information, etc. In this paper, we propose a supervised algorithm called DNA-GAN that tries to disentangle different attributes of images. The latent representations of images are DNA-like, in which each individual piece represents an independent factor of variation. By annihilating the recessive piece and swapping a certain piece of two latent representations, we obtain another two different representations which can be decoded into images. In order to obtain realistic images and also disentangled representations, we introduce a discriminator for adversarial training. Experiments on the Multi-PIE and CelebA datasets demonstrate the effectiveness of our method and its advantage in overcoming the limitations of existing methods.

**Can clone detection support quality assessments of requirements specifications?**

Due to their pivotal role in software engineering, considerable effort is spent on the quality assurance of software requirements specifications. As they are mainly described in natural language, relatively few means of automated quality assessment exist. However, we found that clone detection, a technique widely applied to source code, is promising for assessing one important quality aspect in an automated way, namely redundancy that stems from copy&paste operations. This paper describes a large-scale case study that applied clone detection to 28 requirements specifications with a total of 8,667 pages. We report on the amount of redundancy found in real-world specifications, discuss its nature as well as its consequences, and evaluate to what extent existing code clone detection approaches can be applied to assess the quality of requirements specifications in practice.

**Squeeze-SegNet: A new fast Deep Convolutional Neural Network for Semantic Segmentation**

Recent research on deep convolutional neural networks has focused on improving accuracy, yielding significant advances. Initially limited to classification tasks, these networks have, with contributions from the scientific community, become very useful in higher-level tasks such as object detection and pixel-wise semantic segmentation. Deep learning architectures have thus advanced the state of the art in segmentation accuracy, but they remain difficult to deploy in embedded systems, as is the case for autonomous driving. We present a new deep fully convolutional neural network for pixel-wise semantic segmentation, which we call Squeeze-SegNet. The architecture follows an encoder-decoder style: we use a SqueezeNet-like encoder and a decoder formed by our proposed squeeze-decoder module and an upsampling layer that uses downsampling indices as in SegNet, and we add a deconvolution layer to produce the final multi-channel feature map. On datasets such as CamVid and Cityscapes, our network achieves SegNet-level accuracy with about 10 times fewer parameters than SegNet.

**Accelerated Alternating Projections for Robust Principal Component Analysis**

We study robust PCA for the fully observed setting, which is about separating a low rank matrix *L* and a sparse matrix *S* from their sum *D = L + S*. In this paper, a new algorithm, termed accelerated alternating projections, is introduced for robust PCA, which accelerates the existing alternating projections method proposed in [Netrapalli, Praneeth, et al., 2014]. Let *L_k* and *S_k* be the current estimates of the low rank matrix and the sparse matrix, respectively. The algorithm achieves significant acceleration by first projecting *D - S_k* onto a low dimensional subspace before obtaining the new estimate of *L* via truncated SVD. An exact recovery guarantee has been established, showing linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other state-of-the-art algorithms for robust PCA.
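The basic (non-accelerated) alternating projections scheme is short to sketch: alternate a hard-thresholding step that projects the residual onto sparse matrices with a rank-r truncated SVD for the low rank part. The fixed threshold below is a crude assumption chosen for this toy (corruption magnitudes are known to exceed 10); real implementations use decaying, data-driven thresholds.

```python
import numpy as np

# Toy alternating projections for robust PCA: D = L_true + S_true, with
# L_true rank 2 and S_true holding large corruptions on ~5% of entries.
rng = np.random.default_rng(6)
n, r = 50, 2
L_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
S_true = np.zeros((n, n))
idx = rng.random((n, n)) < 0.05
signs = rng.choice([-1.0, 1.0], size=idx.sum())
S_true[idx] = signs * rng.uniform(10, 20, size=idx.sum())
D = L_true + S_true

L = np.zeros_like(D)
for _ in range(30):
    # Project D - L onto sparse matrices: keep only large residual entries.
    resid = D - L
    S = np.where(np.abs(resid) > 5.0, resid, 0.0)   # assumed fixed threshold
    # Project D - S onto rank-r matrices via truncated SVD.
    U, sing, Vt = np.linalg.svd(D - S, full_matrices=False)
    L = (U[:, :r] * sing[:r]) @ Vt[:r]

rel_err = np.linalg.norm(L - L_true) / np.linalg.norm(L_true)
```

The full SVD here costs O(n^3) per iteration; the acceleration in the paper comes from projecting onto a low dimensional subspace first, so only a small SVD is needed.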

**Variational Adaptive-Newton Method for Explorative Learning**

We present the Variational Adaptive Newton (VAN) method, a black-box optimization method especially suitable for explorative-learning tasks such as active learning and reinforcement learning. Similar to Bayesian methods, VAN estimates a distribution that can be used for exploration, but requires computations that are similar to continuous optimization methods. Our theoretical contribution reveals that VAN is a second-order method that unifies existing methods in the distinct fields of continuous optimization, variational inference, and evolution strategies. Our experimental results show that VAN performs well on a wide variety of learning tasks. This work presents a general-purpose explorative-learning method that has the potential to improve learning in areas such as active learning and reinforcement learning.

**Advances in Variational Inference**

Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully used in various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
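The core optimization problem behind the scalable and amortized variants surveyed here can be shown in a few lines: fit a simple variational distribution to a target by stochastic-gradient ascent on a Monte Carlo estimate of the ELBO. The sketch below (an assumed toy, not from the review) fits a Gaussian q to a 1-D Gaussian target using the reparameterization trick.

```python
import numpy as np

# Toy stochastic variational inference: fit q = N(mu, sigma^2) to an
# unnormalized target log p(x) = -(x - 3)^2 / 2 via reparameterized
# gradients of the ELBO = E_q[log p(x)] + entropy(q).
rng = np.random.default_rng(7)

mu, log_sigma = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    eps = rng.normal(size=32)
    sigma = np.exp(log_sigma)
    x = mu + sigma * eps                 # reparameterization trick
    grad_logp = -(x - 3.0)               # d log p / d x at the samples
    # Chain rule: dx/dmu = 1, dx/dlog_sigma = sigma * eps;
    # the entropy term contributes +1 to the log_sigma gradient.
    mu += lr * np.mean(grad_logp)
    log_sigma += lr * (np.mean(grad_logp * eps * sigma) + 1.0)
```

At convergence q matches the target (mu near 3, sigma near 1); swapping in an intractable model's log-density and an inference network for (mu, log_sigma) gives the amortized VI setting the review describes.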

**Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time**

Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model known as the Recurrent Neural Network-Replicated Softmax Model (RNN-RSM), where the topics discovered at each time step influence topic discovery in subsequent time steps. We account for the temporal ordering of documents by explicitly modeling a joint distribution of latent topical dependencies over time, using distributional estimators with temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP research, we demonstrate that, compared to state-of-the-art topic models, RNN-RSM shows better generalization, topic interpretation, evolution, and trends. We also propose a way to quantify the capability of a dynamic topic model to capture word evolution in topics over time.

**Learning to Predict with Big Data**

Big spatio-temporal datasets, available through both open and administrative data sources, offer significant potential for social science research. The magnitude of the data allows for increased resolution and analysis at the individual level. One of the issues researchers face with such data is the stationarity assumption, which poses several challenges for quantifying uncertainty and bias. While there are recent advances in forecasting techniques for highly granular temporal data, little attention is given to segmenting the time series and finding homogeneous patterns. In this paper, it is proposed to estimate behavioral profiles of individuals' activities over time using Gaussian Process based models. In particular, the aim is to investigate how individuals or groups may be clustered according to the model parameters. This Bayesian non-parametric method is then tested by looking at the predictability of the segments, using a combination of models to fit different parts of the temporal profiles. Model validity is then tested on a set of held-out data. The dataset consists of half-hourly energy consumption records from smart meters from more than 100,000 households in the UK and covers the period from 2015 to 2016. The methodological approach developed in the paper may be easily applied to datasets of similar structure and granularity, for example social media data, and may lead to improved accuracy in the prediction of social dynamics and behavior.

**Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance**

In many modern machine learning applications, the outcome is expensive or time-consuming to collect while the predictor information is easy to obtain. Semi-supervised learning (SSL) aims at utilizing large amounts of 'unlabeled' data along with small amounts of 'labeled' data to improve the efficiency of a classical supervised approach. Though numerous SSL classification and prediction procedures have been proposed in recent years, no methods currently exist to evaluate the prediction performance of a working regression model. In the context of developing phenotyping algorithms derived from electronic medical records (EMR), we present an efficient two-step estimation procedure for evaluating a binary classifier based on various prediction performance measures in the semi-supervised (SS) setting. In step I, the labeled data is used to obtain a non-parametrically calibrated estimate of the conditional risk function. In step II, SS estimates of the prediction accuracy parameters are constructed based on the estimated conditional risk function and the unlabeled data. We demonstrate that under mild regularity conditions the proposed estimators are consistent and asymptotically normal. Importantly, the asymptotic variance of the SS estimators is always smaller than that of the supervised counterparts under correct model specification. We also correct for potential overfitting bias in the SS estimators in finite samples with cross-validation and develop a perturbation resampling procedure to approximate their distributions. Our proposals are evaluated through extensive simulation studies and illustrated with two real EMR studies aiming to develop phenotyping algorithms for rheumatoid arthritis and multiple sclerosis.

**Variational Bi-LSTMs**

Recurrent neural networks like long short-term memory (LSTM) are important architectures for sequential prediction tasks. LSTMs (and RNNs in general) model sequences along the forward time direction. Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data. In the training of Bi-LSTMs, the forward and backward paths are learned independently. We propose a variant of the Bi-LSTM architecture, which we call Variational Bi-LSTM, that creates a channel between the two paths during training (the channel may be omitted during inference), thus optimizing the two paths jointly. We arrive at this joint objective for our model by minimizing a variational lower bound of the joint likelihood of the data sequence. Our model acts as a regularizer and encourages the two networks to inform each other in making their respective predictions using distinct information. We perform ablation studies to better understand the different components of our model and evaluate the method on various benchmarks, showing state-of-the-art performance.

**Markov Decision Processes with Continuous Side Information**

We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP for that episode. Our setting is motivated by applications in healthcare where baseline measurements of a patient at the start of a treatment episode form the context that may provide information about how the patient might respond to treatment decisions. We propose algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under an assumption that the unobserved MDP parameters vary smoothly with the observed context. We also give lower and upper PAC bounds under the smoothness assumption. Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs. For the linear setting, we give a PAC learning algorithm based on KWIK learning techniques.

• A learning problem that is independent of the set theory ZFC axioms

• Joint Gaussian Processes for Biophysical Parameter Retrieval

• Unsupervised patient representations from clinical notes with interpretable classification decisions

• Characterizations and Enumerations of Patterns of Signed Shifts

• Tree Projections and Constraint Optimization Problems: Fixed-Parameter Tractability and Parallel Algorithms

• Controllable Abstractive Summarization

• LAA LTE and WiFi based Smart Grid Metering Infrastructure in 3.5 GHz Band

• Towards Dual-functional Radar-Communication Systems: Optimal Waveform Design

• Revisiting Normalized Gradient Descent: Evasion of Saddle Points

• CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

• Goal-Driven Query Answering for Existential Rules with Equality

• A visual search engine for Bangladeshi laws

• Considering Durations and Replays to Improve Music Recommender Systems

• Weakly-supervised Semantic Parsing with Abstract Examples

• Regularization and Hierarchical Prior Distributions for Adjustment with Health Care Claims Data: Rethinking Comorbidity Scores

• Private Information Retrieval from Storage Constrained Databases — Coded Caching meets PIR

• Loss Functions for Multiset Prediction

• Linear response and moderate deviations: hierarchical approach. III

• Neural Network Dynamics Models for Control of Under-actuated Legged Millirobots

• C-WSL: Count-guided Weakly Supervised Localization

• SI-ADMM: A Stochastic Inexact ADMM Framework for Resolving Structured Stochastic Convex Programs

• Modeling Semantic Relatedness using Global Relation Vectors

• Improved quantum backtracking algorithms through effective resistance estimates

• The KPZ Limit of ASEP with Boundary

• An Accelerated Communication-Efficient Primal-Dual Optimization Framework for Structured Machine Learning

• The $(2,2)$ and $(4,3)$ properties in families of fat sets in the plane

• Making spanning graphs

• Simulating Action Dynamics with Neural Process Networks

• The Value of Communication in Synthesizing Controllers given an Information Structure

• Geometric integrators and the Hamiltonian Monte Carlo method

• Supervised and Unsupervised Transfer Learning for Question Answering

• A bilinear Bogolyubov theorem

• Quotientopes

• On the Numerical Solution of Fourth-Order Linear Two-Point Boundary Value Problems

• Automatic Conflict Detection in Police Body-Worn Video

• Rate-Compatible Punctured Polar (RCPP) Codes Based On Hierarchical Puncturing

• Linear and quadratic uniformity of the Möbius function over $\mathbb{F}_q[t]$

• The Dispersion Bias

• Kernel Conditional Exponential Family

• LIUBoost : Locality Informed Underboosting for Imbalanced Data Classification

• Velocity variations at Columbia Glacier captured by particle filtering of oblique time-lapse images

• A Novel SDASS Descriptor for Fully Encoding the Information of 3D Local Surface

• Effective Filtering on a Random Slow Manifold

• Bridging Source and Target Word Embeddings for Neural Machine Translation

• A New Perspective on Robust $M$-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing

• Error bounds for Approximations of Markov chains

• Normal Approximation by Stein’s Method under Sublinear Expectations

• FARM-Test: Factor-Adjusted Robust Multiple Testing with False Discovery Control

• Semiblind subgraph reconstruction in Gaussian graphical models

• On the anti-Kekulé problem of cubic graphs

• Sparse Combinatorial Group Testing for Low-Energy Massive Random Access

• Influential Sample Selection: A Graph Signal Processing Approach

• Recurrent Neural Networks as Weighted Language Recognizers

• IKBT: solving closed-form Inverse Kinematics with Behavior Tree

• On the Anti-Jamming Performance of the NR-DCSK System

• Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization

• Physical Layer Security Schemes for Full-Duplex Cooperative Systems: State of the Art and Beyond

• The landscape of the spiked tensor model

• The Chromatic Number of the Disjointness Graph of the Double Chain

• Modular Resource Centric Learning for Workflow Performance Prediction

• Deep Inception-Residual Laplacian Pyramid Networks for Accurate Single Image Super-Resolution

• A Sequential Neural Encoder with Latent Structured Description for Modeling Sentences

• TorusE: Knowledge Graph Embedding on a Lie Group

• A characterization of finite abelian groups via sets of lengths in transfer Krull monoids

• On Mubayi’s Conjecture and conditionally intersecting sets

• Human and Machine Speaker Recognition Based on Short Trivial Events

• Robust Real-Time Multi-View Eye Tracking

• Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

• Hibikino-Musashi@Home 2017 Team Description Paper

• A Public Image Database for Benchmark of Plant Seedling Classification Algorithms

• A Machine Learning Approach to Modeling Human Migration

• Modeling Binary Time Series Using Gaussian Processes with Application to Predicting Sleep States

• Aicyber’s System for NLPCC 2017 Shared Task 2: Voting of Baselines

• Tracking Typological Traits of Uralic Languages in Distributed Language Representations

• Deterministic Distributed Edge-Coloring with Fewer Colors

• On the Utility of Context (or the Lack Thereof) for Object Detection

• Coloring intersection hypergraphs of pseudo-disks

• The best defense is a good offense: Countering black box attacks by predicting slightly wrong labels

• A Convex Parametrization of a New Class of Universal Kernel Functions for use in Kernel Learning

• No Reference Stereoscopic Video Quality Assessment Using Joint Motion and Depth Statistics

• Efficient Estimation of Generalization Error and Bias-Variance Components of Ensembles

• Fisher information matrix of binary time series

• A Lie bracket approximation approach to distributed optimization over directed graphs

• A Descent on Simple Graphs — from Complete to Cycle — and Algebraic Properties of Their Spectra

• Sparse identification of nonlinear dynamics for model predictive control in the low-data limit

• A Generally Applicable, Highly Scalable Measurement Computation and Optimization Approach to Sequential Model-Based Diagnosis

• Note on Representing attribute reduction and concepts in concept lattice using graphs

• Convolutional Neural Networks and Data Augmentation for Spectral-Spatial Classification of Hyperspectral Images

• Investigating Inner Properties of Multimodal Representation and Semantic Compositionality with Brain-based Componential Semantics

• MAMoC: Multisite Adaptive Offloading Framework for Mobile Cloud Applications

• A Correlation Based Feature Representation for First-Person Activity Recognition

• Two-Sample Test for Sparse High Dimensional Multinomial Distributions

• Trees of self-avoiding walks

• A balanced non-partitionable Cohen-Macaulay complex

• Dual-Path Convolutional Image-Text Embedding

• Detecting and assessing contextual change in diachronic text documents using context volatility

• Good and safe uses of AI Oracles

• Sound Event Detection in Synthetic Audio: Analysis of the DCASE 2016 Task Results

• A Stochastic Resource-Sharing Network for Electric Vehicle Charging

• Fully-dynamic risk-indifference pricing and no-good-deal bounds

• Dialogue Act Recognition via CRF-Attentive Structured Network

• An Extended Sensitivity Analysis for Heterogeneous Unmeasured Confounding

• (2+1)-dimensional interface dynamics: mixing time, hydrodynamic limit and Anisotropic KPZ growth

• Mitigating Clipping Effects on Error Floors under Belief Propagation Decoding of Polar Codes

• PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

• People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting

• New support for the value 5/2 for the spin glass lower critical dimension at zero magnetic field

• Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

• A bijective proof of the enumeration of maps in higher genus

• On consistent vertex nomination schemes

• Interpreting Deep Visual Representations via Network Dissection

• Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features

• P-spline smoothing for spatial data collected worldwide

• Sharp non-asymptotic Concentration Inequalities for the Approximation of the Invariant Measure of a Diffusion

• Gaussian width bounds with applications to arithmetic progressions in random settings

• Parsimonious Model-Based Clustering with Covariates

• Quantitative Benchmarks and New Directions for Noise Power Estimation Methods in ISM Radio Environment

• Relating the wave-function collapse with Euler’s formula

• Spatial Joint Species Distribution Modeling using Dirichlet Processes

• A Tractable Product Channel Model for Line-of-Sight Scenarios

• A Friendly Smoothed Analysis of the Simplex Method

• On laws of large numbers in $L^2$ for supercritical branching Markov processes beyond $\lambda$-positivity

• On joint distribution of range and terminal value of a Brownian motion

• Unsupervised Morphological Expansion of Small Datasets for Improving Word Embeddings

• An Unsupervised Approach for Mapping between Vector Spaces

• Hydra: a C++11 framework for data analysis in massively parallel platforms

• Novel decision-theoretic and risk-stratification metrics of predictive performance: Application to deciding who should undergo genetic testing

• Motif-based Convolutional Neural Network on Graphs

• Brain Extraction from Normal and Pathological Images: A Joint PCA/Image-Reconstruction Approach

• Bayesian optimal designs for dose-response curves with common parameters

• Contextual Object Detection with a Few Relevant Neighbors

• Classification of binary self-dual [76, 38, 14] codes with an automorphism of order 9

• CSWA: Aggregation-Free Spatial-Temporal Community Sensing

• Fighting fish and two-stack sortable permutations

• BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

• Exact Limits of Inference in Coalescent Models

• Extremes of multifractional Brownian motion

• Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations