**Effective optimization using sample persistence: A case study on quantum annealers and various Monte Carlo optimization methods**

We present and apply a general-purpose, multi-start algorithm for improving the performance of low-energy samplers used for solving optimization problems. The algorithm iteratively fixes the value of a large portion of the variables to values that have a high probability of being optimal. The resulting problems are smaller and less connected, and samplers tend to give better low-energy samples for these problems. The algorithm is trivially parallelizable, since each start in the multi-start algorithm is independent, and could be applied to any heuristic solver that can be run multiple times to give a sample. We present results for several classes of hard problems solved using simulated annealing, path-integral quantum Monte Carlo, parallel tempering with isoenergetic cluster moves, and a quantum annealer, and show that the success metrics as well as the scaling are improved substantially. When combined with this algorithm, the quantum annealer’s scaling was substantially improved for native Chimera graph problems. In addition, with this algorithm the scaling of the time to solution of the quantum annealer is comparable to the Hamze–de Freitas–Selby algorithm on the weak-strong cluster problems introduced by Boixo et al. Parallel tempering with isoenergetic cluster moves was able to consistently solve 3D spin glass problems with 8000 variables when combined with our method, whereas without our method it could not solve any.
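The fix-and-reduce step at the heart of this approach can be sketched for an Ising problem with local fields `h` and couplings `J`; this is a minimal illustration in our own notation, and the 0.9 persistence threshold is an arbitrary choice for the example, not a value from the paper:

```python
import numpy as np

def persistent_variables(samples, threshold=0.9):
    """Spin variables whose value persists in at least a `threshold` fraction
    of the low-energy samples returned by the heuristic solver."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)              # lies in [-1, 1] for +/-1 spins
    return {i: int(np.sign(mean[i])) for i in range(samples.shape[1])
            if abs(mean[i]) >= 2 * threshold - 1}

def reduce_problem(h, J, fixed):
    """Fix the persistent spins and absorb them into the local fields of the
    remaining free spins, yielding a smaller, less connected subproblem."""
    n = len(h)
    free = [i for i in range(n) if i not in fixed]
    h_new = np.array([h[i] + sum(J[i, j] * s for j, s in fixed.items())
                      for i in free])
    return h_new, J[np.ix_(free, free)], free
```

Each multi-start run would alternate sampling with this reduction; since runs are independent, they parallelize trivially.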

**Semi-supervised Text Categorization Using Recursive K-means Clustering**

In this paper, we present a semi-supervised learning algorithm for the classification of text documents, together with a method for labeling unlabeled documents. The method follows a divide-and-conquer strategy: the K-means algorithm is applied recursively to partition the combined labeled and unlabeled collection until each partition contains labeled documents of a single class. Once such clusters are obtained, their centroids are taken as cluster representatives, and the nearest-neighbor rule is used to classify an unknown document. A series of experiments on the 20Newsgroups dataset demonstrates the advantage of the proposed model over other recent state-of-the-art models.
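The recursive partitioning and nearest-centroid rule can be sketched as follows; this is a minimal NumPy illustration with a plain Lloyd's k-means and our own function names, not the paper's exact procedure or feature representation:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assignments and centroids for k clusters."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = X[assign == c].mean(axis=0)
    return assign, centroids

def recursive_partition(X, y, idx, leaves, k=2, depth=0, max_depth=8):
    """Split a partition until its labeled points belong to a single class."""
    labels = {y[i] for i in idx if y[i] is not None}
    if len(labels) <= 1 or len(idx) < k or depth >= max_depth:
        leaves.append((idx, labels.pop() if labels else None))
        return
    assign, _ = kmeans(X[idx], k, seed=depth)
    for c in range(k):
        sub = [idx[j] for j in range(len(idx)) if assign[j] == c]
        if sub:
            recursive_partition(X, y, sub, leaves, k, depth + 1, max_depth)

def classify(x, X, leaves):
    """Nearest-centroid rule over the single-class leaf partitions."""
    best, best_d = None, np.inf
    for idx, label in leaves:
        d = np.linalg.norm(x - X[idx].mean(axis=0))
        if label is not None and d < best_d:
            best, best_d = label, d
    return best
```

Unlabeled documents inherit the class of the leaf they fall into, which is how the clustering doubles as a labeling method.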

**Auto-Encoding User Ratings via Knowledge Graphs in Recommendation Scenarios**

In the last decade, driven also by the availability of unprecedented computational power and storage capabilities in cloud environments, we have witnessed the proliferation of new algorithms, methods, and approaches in two areas of artificial intelligence: knowledge representation and machine learning. On the one side, the generation of structured data on the Web at a high rate led to the creation and publication of so-called knowledge graphs. On the other side, deep learning emerged as one of the most promising approaches for generating and training models applicable to a wide variety of fields. More recently, autoencoders have proven their strength in various scenarios, playing a fundamental role in unsupervised learning. In this paper, we investigate how to exploit the semantic information encoded in a knowledge graph to build connections between units in a neural network, leading to a new method, SEM-AUTO, that extracts and weighs semantic features which can eventually be used to build a recommender system. Since adding content-based side information may mitigate the cold-user problem, we test how our approach behaves in the presence of only a few ratings from a user on the MovieLens 1M dataset and compare results with BPRSLIM.
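The idea of wiring network connections from a knowledge graph can be illustrated with a toy masked autoencoder. Everything here is hypothetical (sizes, wiring, training loop); it only shows the mechanism of tying each hidden unit to a KG feature via a connectivity mask, not SEM-AUTO's actual architecture:

```python
import numpy as np

# Hypothetical toy wiring: 4 items, 3 knowledge-graph features (e.g., genres).
# mask[f, i] = 1 iff the knowledge graph links item i to feature f, so each
# hidden unit is semantically tied to one KG feature.
mask = np.array([[1., 1., 0., 0.],
                 [0., 1., 1., 0.],
                 [0., 0., 1., 1.]])

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=mask.shape) * mask       # items -> features
W_out = rng.normal(scale=0.1, size=mask.T.shape) * mask.T  # features -> items

r = np.array([1.0, 0.0, 1.0, 1.0])      # one user's binarised rating vector
initial_error = np.linalg.norm(W_out @ (W_in @ r) - r)

for _ in range(500):                    # plain gradient descent, squared error
    h = W_in @ r                        # KG-feature activations for this user
    err = W_out @ h - r
    W_out -= 0.1 * np.outer(err, h) * mask.T      # masked update keeps the
    W_in -= 0.1 * np.outer(W_out.T @ err, r) * mask  # KG-absent weights at zero

final_error = np.linalg.norm(W_out @ (W_in @ r) - r)
```

After training, the hidden activations `h` give per-feature weights for the user that a downstream recommender could consume.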

**Irregular Convolutional Neural Networks**

Convolutional kernels are basic and vital components of deep Convolutional Neural Networks (CNNs). In this paper, we equip convolutional kernels with shape attributes to generate deep Irregular Convolutional Neural Networks (ICNNs). Whereas a traditional CNN applies regular (square) convolutional kernels, our approach trains irregular kernel shapes to better fit the geometric variations of input features. In other words, shapes are learnable parameters in addition to weights. The kernel shapes and weights are learned simultaneously during end-to-end training with the standard back-propagation algorithm. Experiments on semantic segmentation validate the effectiveness of the proposed ICNN.

**Methods for Interpreting and Understanding Deep Neural Networks**

This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. It introduces some recently proposed techniques of interpretation, along with theory, tricks and recommendations, to make most efficient use of these techniques on real data. It also discusses a number of practical applications.

**A Deep Neural Architecture for Sentence-level Sentiment Classification in Twitter Social Networking**

This paper introduces a novel deep learning framework including a lexicon-based approach for sentence-level prediction of sentiment label distribution. We propose to first apply semantic rules and then use a Deep Convolutional Neural Network (DeepCNN) for character-level embeddings in order to increase information for word-level embedding. After that, a Bidirectional Long Short-Term Memory Network (Bi-LSTM) produces a sentence-wide feature representation from the word-level embedding. We evaluate our approach on three Twitter sentiment classification datasets. Experimental results show that our model can improve the classification accuracy of sentence-level sentiment analysis in Twitter social networking.

**Invariant Causal Prediction for Sequential Data**

We investigate the problem of inferring the causal variables of a response $Y$ from a set of $d$ predictors $X$. Classical ordinary least squares regression includes all predictors that reduce the variance of $Y$. Using only the causal parents instead leads to models that have the advantage of remaining invariant under interventions; loosely speaking, they lead to invariance across different ‘environments’ or ‘heterogeneity patterns’. More precisely, the conditional distribution of $Y$ given its causal variables remains constant for all observations. Recent work exploits such stability to infer causal relations from data with different but known environments. We show here that even without knowledge of the environments or heterogeneity pattern, inferring causal relations is possible for time-ordered (or any other type of sequentially ordered) data. In particular, this allows the detection of instantaneous causal relations in multivariate linear time series, in contrast to the concept of Granger causality. Besides novel methodology, we provide statistical confidence bounds and asymptotic detection results for inferring causal variables, and we present an application to monetary policy in macroeconomics.
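The invariance idea can be sketched crudely: regress the response on a candidate predictor subset, check whether the residuals stay stable across consecutive time blocks, and intersect all subsets that pass. The score below is our own simplified between-block/within-block variance ratio, standing in for the paper's proper statistical tests:

```python
import numpy as np
from itertools import combinations

def invariance_score(X, y, S, n_blocks=5):
    """OLS residuals of y on X[:, S], scored by how much their mean drifts
    across consecutive time blocks (small score => looks invariant)."""
    if S:
        Xs = X[:, list(S)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        res = y - Xs @ beta
    else:
        res = y - y.mean()
    blocks = np.array_split(res, n_blocks)
    between = np.var([b.mean() for b in blocks])
    within = np.mean([b.var() for b in blocks]) + 1e-12
    return between / within

def causal_parents(X, y, threshold=0.05):
    """Intersect every predictor subset whose residuals look invariant."""
    d = X.shape[1]
    accepted = [set(S) for k in range(d + 1)
                for S in combinations(range(d), k)
                if invariance_score(X, y, S) < threshold]
    return set.intersection(*accepted) if accepted else set()

# Toy series with a hidden environment shifting over time: x0 causes y, while
# x1 is a descendant of y whose relation to y changes with the environment.
rng = np.random.default_rng(0)
shift = np.repeat([0.0, 2.0, -2.0, 1.0, -1.0], 100)
x0 = shift + rng.normal(size=500)
y = 2.0 * x0 + rng.normal(size=500)
x1 = y + np.repeat([0.0, 3.0, -3.0, 2.0, -2.0], 100) + 0.3 * rng.normal(size=500)
parents = causal_parents(np.column_stack([x0, x1]), y)
```

Only the subset containing the true parent leaves residuals that look the same in every block; subsets involving the unstable descendant are rejected.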

**A Contemporary Overview of Probabilistic Latent Variable Models**

In this paper we provide a conceptual overview of latent variable models within a probabilistic modeling framework, an overview that emphasizes the compositional nature and the interconnectedness of the seemingly disparate models commonly encountered in statistical practice.

**There and Back Again: A General Approach to Learning Sparse Models**

We propose a simple and efficient approach to learning sparse models. Our approach consists of (1) projecting the data into a lower-dimensional space, (2) learning a dense model in the lower-dimensional space, and then (3) recovering the sparse model in the original space via compressive sensing. We apply this approach to Non-negative Matrix Factorization (NMF), tensor decomposition, and linear classification, showing that it obtains substantial compression with negligible loss in accuracy on real data, along with significant speedups. Our main theoretical contribution is to show the following result for NMF: if the original factors are sparse, then their projections are the sparsest solutions to the projected NMF problem. This explains why our method works for NMF and shows an interesting new property of random projections: they can preserve the solutions of non-convex optimization problems such as NMF.
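The three-step pipeline can be sketched for sparse linear regression. This is a minimal NumPy illustration with a textbook Orthogonal Matching Pursuit as the compressive-sensing decoder; the dimensions and the choice of OMP are our own, not the paper's:

```python
import numpy as np

def omp(A, b, sparsity):
    """Orthogonal Matching Pursuit: greedily recover a sparse x with A x ~ b."""
    residual, support = b.astype(float), []
    x = np.zeros(A.shape[1])
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(1)
d, k, n, s = 200, 100, 500, 3
w_true = np.zeros(d)
w_true[[5, 17, 120]] = [2.0, -1.5, 1.0]

X = rng.normal(size=(n, d))
y = X @ w_true + 0.01 * rng.normal(size=n)

P = rng.normal(size=(k, d)) / np.sqrt(d)               # (1) random projection
w_dense, *_ = np.linalg.lstsq(X @ P.T, y, rcond=None)  # (2) dense low-dim model
w_sparse = omp(P, w_dense, s)                          # (3) solve P w ~ w_dense
```

The dense model is learned entirely in the `k`-dimensional space; the original `d`-dimensional weights are touched only at recovery time.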

**Automated text summarisation and evidence-based medicine: A survey of two domains**

The practice of evidence-based medicine (EBM) urges medical practitioners to utilise the latest research evidence when making clinical decisions. Because of the massive and growing volume of published research on various medical topics, practitioners often find themselves overloaded with information. As such, natural language processing research has recently begun exploring medical domain-specific automated text summarisation (ATS) techniques, targeted at condensing large medical texts. However, the development of effective summarisation techniques for this task requires cross-domain knowledge. We present a survey of EBM, the domain-specific needs of EBM, automated summarisation techniques, and how they have been applied so far. We envision that this survey will serve as a first resource for the development of future operational text summarisation techniques for EBM.

**Automatic Synonym Discovery with Knowledge Bases**

Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input to find other names that are synonymous, ignoring the fact that a name string often refers to multiple entities (e.g., ‘apple’ could refer to both Apple Inc. and the fruit). Moreover, most existing methods require training data manually created by domain experts to construct supervised-learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually curated synonyms for each entity stored in a knowledge base not only form a set of name strings that disambiguate the meaning of each other, but can also serve as ‘distant’ supervision to help determine important features for the task. We propose a novel framework, called DPE, that integrates two kinds of mutually complementing signals for synonym discovery: distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they mutually enhance each other during training. At inference time, both signals are utilized to discover synonyms for the given entities. Experimental results demonstrate the effectiveness of the proposed framework.

**Do GANs actually learn the distribution? An empirical study**

Do GANs (Generative Adversarial Nets) actually learn the target distribution? The foundational paper of Goodfellow et al. (2014) suggested they do, given sufficiently large deep nets, sample size, and computation time. A recent theoretical analysis by Arora et al. (ICML 2017) raised doubts about whether the same holds when the discriminator has finite size. It showed that the training objective can approach its optimum value even if the generated distribution has very low support; in other words, the training objective is unable to prevent mode collapse. The current note reports experiments suggesting that such problems are not merely theoretical. It presents empirical evidence that well-known GAN approaches do learn distributions of fairly low support, and thus presumably are not learning the target distribution. The main technical contribution is a new proposed test, based on the famous birthday paradox, for estimating the support size of the generated distribution.
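The birthday-paradox test can be illustrated on a toy discrete ‘generator’. The real test judges near-duplicate images by inspection; the inversion formula below assumes a roughly uniform distribution, under which a batch of size $s$ yields about $s(s-1)/(2N)$ colliding pairs when the support has size $N$:

```python
import numpy as np

def count_collisions(batch):
    """Number of colliding (duplicate) pairs in a batch of samples."""
    _, counts = np.unique(batch, return_counts=True)
    return int(sum(c * (c - 1) // 2 for c in counts))

def birthday_support_estimate(sampler, batch_size, trials=20, seed=0):
    """Invert the expected collision count s(s-1)/(2N) to estimate N."""
    rng = np.random.default_rng(seed)
    collisions = np.mean([count_collisions(sampler(rng, batch_size))
                          for _ in range(trials)])
    if collisions == 0:
        return float('inf')   # support too large to resolve at this batch size
    return batch_size * (batch_size - 1) / (2 * collisions)

# Toy 'generator' with support exactly 1000; a GAN check would replace this
# with generated images plus near-duplicate detection.
gen = lambda rng, s: rng.integers(0, 1000, size=s)
estimate = birthday_support_estimate(gen, batch_size=200)
```

Seeing frequent duplicates in modest batches is thus direct evidence of small support, which is the phenomenon the note reports for well-known GANs.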

**Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets**

This paper introduces a framework for speeding up Bayesian inference in the presence of large datasets. We design a Markov chain whose transition kernel uses an unknown fraction, of fixed size, of the available data, randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation (ABC) literature, the subsampling process is guided by fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC, is a generic and flexible approach which, in contrast to existing scalable methodologies, preserves the simplicity of the Metropolis-Hastings algorithm. Even though exactness is lost, i.e., the chain distribution only approximates the target, we theoretically study and quantify this bias, and show on a diverse set of examples that the method yields excellent performance when the computational budget is limited. We also show that, when it is available and cheap to compute, setting the summary statistic to the maximum likelihood estimator is supported by theoretical arguments.
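A heavily simplified sketch of the idea, for a Gaussian mean with a flat prior: Metropolis-Hastings runs on a rescaled subsampled likelihood, and the subsample is periodically refreshed with a bias toward subsamples whose summary statistic matches the full-data statistic. The selection scheme, temperature, and step sizes are our own illustrative choices, not the algorithm as specified in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(3.0, 1.0, size=10_000)   # large dataset; infer its mean theta
n, m = len(data), 200                      # m = fixed subsample size
full_stat = data.mean()                    # summary statistic of the full data

def informed_subsample(k=30, temp=50.0):
    """Draw k candidate subsamples; keep one with probability favouring those
    whose summary statistic stays faithful to the full-data statistic."""
    cands = [rng.choice(n, size=m, replace=False) for _ in range(k)]
    scores = np.array([-temp * abs(data[i].mean() - full_stat) for i in cands])
    p = np.exp(scores - scores.max())
    return cands[rng.choice(k, p=p / p.sum())]

def log_lik(theta, idx):
    """Subsampled Gaussian log-likelihood, rescaled to the full data size."""
    return -(n / m) * 0.5 * np.sum((data[idx] - theta) ** 2)

theta, idx, chain = 0.0, informed_subsample(), []
for t in range(2000):                      # plain Metropolis-Hastings updates
    if t % 20 == 0:
        idx = informed_subsample()         # randomly refresh the subsample
    prop = theta + 0.02 * rng.normal()
    if np.log(rng.uniform()) < log_lik(prop, idx) - log_lik(theta, idx):
        theta = prop
    chain.append(theta)
posterior_mean = np.mean(chain[1000:])
```

Each MCMC step touches only `m` of the `n` data points, which is where the computational saving comes from; the informed refresh limits the bias that naive subsampling would introduce.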

**StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge**

Today, massive amounts of streaming data from smart devices need to be analyzed automatically to realize the Internet of Things. The Complex Event Processing (CEP) paradigm promises low-latency pattern detection on event streams. However, CEP systems need to be extended with Machine Learning (ML) capabilities such as online training and inference in order to be able to detect fuzzy patterns (e.g., outliers) and to improve pattern recognition accuracy during runtime using incremental model training. In this paper, we propose a distributed CEP system denoted as StreamLearner for ML-enabled complex event detection. The proposed programming model and data-parallel system architecture enable a wide range of real-world applications and allow for dynamically scaling up and out system resources for low-latency, high-throughput event processing. We show that the DEBS Grand Challenge 2017 case study (i.e., anomaly detection in smart factories) integrates seamlessly into the StreamLearner API. Our experiments verify scalability and high event throughput of StreamLearner.

**Uncertainty Decomposition in Bayesian Neural Networks with Latent Variables**

Bayesian neural networks (BNNs) with latent variables are probabilistic models which can automatically identify complex stochastic patterns in the data. We describe and study in these models a decomposition of predictive uncertainty into its epistemic and aleatoric components. First, we show how such a decomposition arises naturally in a Bayesian active learning scenario by following an information theoretic approach. Second, we use a similar decomposition to develop a novel risk sensitive objective for safe reinforcement learning (RL). This objective minimizes the effect of model bias in environments whose stochastic dynamics are described by BNNs with latent variables. Our experiments illustrate the usefulness of the resulting decomposition in active learning and safe RL settings.
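The decomposition itself is easy to state for classification: given predictive distributions from several draws of the network parameters, the total predictive entropy splits into the expected entropy under the draws (aleatoric) plus the remainder, the mutual information between the prediction and the parameters (epistemic). A minimal sketch:

```python
import numpy as np

def entropy(p, axis=-1):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

def decompose_uncertainty(probs):
    """probs: (n_posterior_samples, n_classes) predictive distributions, one
    row per draw of the network parameters (and latent variables).

    total      H( E_w[p(y|x,w)] )   -- predictive entropy
    aleatoric  E_w[ H(p(y|x,w)) ]   -- expected data noise
    epistemic  total - aleatoric    -- mutual information I(y; w | x)
    """
    total = entropy(probs.mean(axis=0))
    aleatoric = entropy(probs, axis=-1).mean()
    return total, aleatoric, total - aleatoric

# Draws that are individually confident but disagree: high epistemic,
# low aleatoric uncertainty -- a point worth querying in active learning.
probs = np.array([[0.99, 0.01], [0.01, 0.99]])
total, alea, epi = decompose_uncertainty(probs)
```

Active learning would query points with large epistemic terms, while the risk-sensitive RL objective penalizes them as model bias.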

• The Optimal Route and Stops for a Group of Users in a Road Network

• Formation Maneuvering Control of Multiple Nonholonomic Robotic Vehicles: Theory and Experimentation

• Synchronization in Dynamic Networks

• Growing Linear Consensus Networks Endowed by Spectral Systemic Performance Measures

• On a conjecture in second-order optimality conditions

• Cover Tree Compressed Sensing for Fast MR Fingerprint Recovery

• Time series experiments and causal estimands: exact randomization tests and trading

• A practical fpt algorithm for Flow Decomposition and transcript assembly

• Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks

• Control Synthesis for High-Dimensional Systems With Counting Constraints

• Precise deviations for Cox processes with shot noise

• Full Randomness in the Higher Difference Structure of Two-state Markov Chains

• Preserving Intermediate Objectives: One Simple Trick to Improve Learning for Hierarchical Models

• Collaborative Deep Learning in Fixed Topology Networks

• On Sampling Strategies for Neural Network-based Collaborative Filtering

• On the numerical rank of radial basis function kernel matrices in high dimension

• Fundamental Matrix Estimation: A Study of Error Criteria

• Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling

• A Note on a Communication Game

• Reservoir Computing on the Hypersphere

• High-dimensional Linear Regression for Dependent Observations with Application to Nowcasting

• Tree-Residue Vertex-Breaking: a new tool for proving hardness

• Deep Mixture of Diverse Experts for Large-Scale Visual Recognition

• Joint and Competitive Caching Designs in Large-Scale Multi-Tier Wireless Multicasting Networks

• Random-field-induced disordering mechanism in a disordered ferromagnet: Between the Imry-Ma and the standard disordering mechanism

• Encoder-Decoder Shift-Reduce Syntactic Parsing

• On Validity of Reed Conjecture for {P_5, Flag^C}-free graphs

• Multi-agent constrained optimization of a strongly convex function over time-varying directed networks

• Large-Scale Human Activity Mapping using Geo-Tagged Videos

• Cluster Based Symbolic Representation for Skewed Text Categorization

• Online Participatory Sensing in Double Auction Environment with Location Information

• The Semantic Information Method for Maximum Mutual Information and Maximum Likelihood of Tests, Estimations, and Mixture Models

• Twisted Recurrence via Polynomial Walks

• Notes on Random Walks in the Cauchy Domain of Attraction

• A Variational EM Method for Pole-Zero Modeling of Speech with Mixed Block Sparse and Gaussian Excitation

• Optimal Feedback Selection for Structurally Cyclic Systems with Dedicated Actuators and Sensors

• ISTA-Net: Iterative Shrinkage-Thresholding Algorithm Inspired Deep Network for Image Compressive Sensing

• Justifications in Constraint Handling Rules for Logical Retraction in Dynamic Algorithms

• Thinnable Ideals and Invariance of Cluster Points

• Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset

• Sparsity-Based STAP Design Based on Alternating Direction Method with Gain/Phase Errors

• Martingale-coboundary decomposition for stationary random fields

• A Regress-Later Algorithm for Backward Stochastic Differential Equations

• Behavior of Accelerated Gradient Methods Near Critical Points of Nonconvex Problems

• Temporal-related Convolutional-Restricted-Boltzmann-Machine capable of learning relational order via reinforcement learning procedure?

• Fog Computing in Medical Internet-of-Things: Architecture, Implementation, and Applications

• On integer network synthesis problem with tree-metric cost

• FAIR: A Hadoop-based Hybrid Model for Faculty Information Retrieval System

• Robust Sparse Covariance Estimation by Thresholding Tyler’s M-Estimator

• Online Power Control for Block i.i.d. Energy Harvesting Channels

• On generalizations of $p$-sets and their applications

• A splitter theorem for 3-connected 2-polymatroids

• Intrinsic Ultracontractivity of Non-local Dirichlet forms on Unbounded Open Sets

• Decomposing Motion and Content for Natural Video Sequence Prediction

• Uncertainty quantification and design for noisy matrix completion – a unified framework

• Sparsity Enables Estimation of both Subcortical and Cortical Activity from MEG and EEG

• An Algorithm for Supervised Driving of Cooperative Semi-Autonomous Vehicles (Extended)

• Minimum Connected Transversals in Graphs: New Hardness Results and Tractable Cases Using the Price of Connectivity

• Development of structural correlations and synchronization from adaptive rewiring in networks of Kuramoto oscillators

• Simplifying the Kohlberg Criterion on the Nucleolus: A Correct Approach

• Target contrastive pessimistic risk for robust domain adaptation

• Efficient and accurate monitoring of the depth information in a Wireless Multimedia Sensor Network based surveillance

• Finding optimal finite biological sequences over finite alphabets: the OptiFin toolbox

• Count-Based Exploration in Feature Space for Reinforcement Learning

• Expected volumes of Gaussian polytopes, external angles, and multiple order statistics

• One random jump and one permutation: sufficient conditions to chaotic, statistically faultless, and large throughput PRNG for FPGA

• Interactive Exploration and Discovery of Scientific Publications with PubVis

• Merging real and virtual worlds: An analysis of the state of the art and practical evaluation of Microsoft Hololens

• Flexible Rectified Linear Units for Improving Convolutional Neural Networks

• Specifying Non-Markovian Rewards in MDPs Using LDL on Finite Traces (Preliminary Version)

• Random Forests for Industrial Device Functioning Diagnostics Using Wireless Sensor Networks

• Detection of Falls and Selected Actions in Digital Image Sequences

• Matrix Hilbert Space

• Self-Learning Phase Boundaries by Active Contours

• Some new results on the self-dual [120,60,24] code

• Steiner Point Removal with Distortion $O(\log k)$

• Survival probabilities and maxima of sums of correlated increments with applications to one-dimensional cellular automata

• Large sets avoiding linear patterns

• Scalable multimodal convolutional networks for brain tumour segmentation

• ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools

• A Security Framework for Wireless Sensor Networks: Theory and Practice

• Restricted size Ramsey number for $P_3$ versus cycles

• A Unified Analysis of Stochastic Optimization Methods Using Jump System Theory and Quadratic Constraints

• Revenue Loss in Shrinking Markets

• Value Asymptotics in Dynamic Games on Large Horizons

• An algorithm to find maximum area polygons circumscribed about a convex polygon

• Photometric Stereo by Hemispherical Metric Embedding

• Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

• Perfectly Dominating the Lattice Graph of $\mathbb{Z}^{3}$ with Squares

• Phase retrieval using alternating minimization in a batch setting

• Image transformations on locally compact spaces

• Faster ICA by preconditioning with Hessian approximations

• Strong Converses Are Just Edge Removal Properties

• Smith and Critical groups of Polar Graphs

• A preference elicitation interface for collecting rich recommender datasets

• Listing Words in Free Groups

• Robust Video-Based Eye Tracking Using Recursive Estimation of Pupil Characteristics

• Sparse Output Feedback Synthesis via Proximal Alternating Linearization Method

• Dickman approximation in simulation, summations and perpetuities

• English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor

• A Proof of Vivo-Pato-Oshanin’s Conjecture on the Fluctuation of von Neumann Entropy

• Dr.VAE: Drug Response Variational Autoencoder

• A sequential surrogate method for reliability analysis based on radial basis function

• IS-ASGD: Importance Sampling Accelerated Asynchronous SGD on Multi-Core Systems

• End-to-end Learning of Image based Lane-Change Decision

• NOMA in 5G Systems: Exciting Possibilities for Enhancing Spectral Efficiency

• Phase transition for a non-attractive infection process in heterogeneous environment

• An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform

• YoTube: Searching Action Proposal via Recurrent and Static Regression Networks

• Asymptotic Existence of Fair Divisions for Groups

• YouTube-8M Video Understanding Challenge Approach and Applications

• Ramanujan-type congruences for certain weighted 7-colored partitions

• Multi-level SVM Based CAD Tool for Classifying Structural MRIs

• Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription

• Survival probabilities of high-dimensional stochastic SIS and SIR models with random edge weights

• Testing normality for unconditionally heteroscedastic macroeconomic variables

• Interferometric control of the photon-number distribution

• Spatial Risk Measure for Max-Stable and Max-Mixture Processes

• Few-shot Object Detection

• New procedures for discrete tests with proven false discovery rate control

• Lebesgue and gaussian measure of unions of basic semi-algebraic sets

• Deep Semantics-Aware Photo Adjustment

• Efficient Manifold and Subspace Approximations with Spherelets

• Adaptive Strategies for The Open-Pit Mine Optimal Scheduling Problem

• Top-down Transformation Choice

• Multilevel Monte Carlo Method for Statistical Model Checking of Hybrid Systems

• Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates

• State-by-state Minimax Adaptive Estimation for Nonparametric Hidden Markov Models

• Wideband DOA Estimation through Projection Matrix Interpolation

• Estimation of species relative abundances and habitat preferences using opportunistic data

• On the Komlós, Major and Tusnády strong approximation for some classes of random iterates

• Quantum thermostatted disordered systems and sensitivity under compression

• A hypothesis testing approach for communication over entanglement assisted compound quantum channel

• Data depth and rank-based tests for covariance and spectral density matrices

• A Publish/Subscribe System Using Causal Broadcast Over Dynamically Built Spanning Trees

• Handling PDDL3.0 State Trajectory Constraints with Temporal Landmarks

• Unemployment estimation: Spatial point referenced methods and models

• Multi-Label Learning with Label Enhancement

• An adaptive prefix-assignment technique for symmetry reduction

• The Boolean Solution Problem from the Perspective of Predicate Logic – Extended Version

• On tree-decompositions of one-ended graphs

• Universal limits of substitution-closed permutation classes

• A Meta-Learning Approach to One-Step Active Learning

• Semantically Informed Multiview Surface Refinement

• On concentration properties of disordered Hamiltonians

• Monotonicity of functionals of random polytopes

• Location of the spectrum of Kronecker random matrices

• High-dimensional classification by sparse logistic regression

• Beyond Moore-Penrose Part I: Generalized Inverses that Minimize Matrix Norms

• Recurrence and Ergodicity of Switching Diffusions with Past-Dependent Switching Having A Countable State Space

• Deep Semantic Classification for 3D LiDAR Data

• GPU-acceleration for Large-scale Tree Boosting

• Extremes of $L^p$-norm of Vector-valued Gaussian processes with Trend

• Dynamic Load Balancing for PIC code using Eulerian/Lagrangian partitioning

• Ordered and Delayed Adversaries and How to Work against Them on Shared Channel

• Bounds on the length of a game of Cops and Robbers

• Metastable Behavior of Bootstrap Percolation on Galton-Watson Trees

• On risk averse competitive equilibrium

• Counting Restricted Homomorphisms via Möbius Inversion over Matroid Lattices

• Nonseparable Multinomial Choice Models in Cross-Section and Panel Data

• Ergodic aspects of some Ornstein-Uhlenbeck type processes related to Lévy processes

• Approximate Steepest Coordinate Descent

• Bounds on the Satisfiability Threshold for Power Law Distributed Random SAT

• Image Processing in Floriculture Using a robotic Mobile Platform

• On branching-point selection for triple products in spatial branch-and-bound: the hull relaxation

• Optimal choice problem and its solutions

• Challenges to estimating contagion effects from observational data

• Learning to Map Vehicles into Bird’s Eye View

• Edge of spiked beta ensembles, stochastic Airy semigroups and reflected Brownian motions

• Iterative Random Forests to detect predictive and stable high-order interactions

• Preasymptotic Convergence of Randomized Kaczmarz Method

• Is the Riemann zeta function in a short interval a 1-RSB spin glass?

• Paths in hypergraphs: a rescaling phenomenon

• Inverse Ising inference by combining Ornstein-Zernike theory with deep learning

• Distributed compression through the lens of algorithmic information theory: a primer

• Efficiency of quantum versus classical annealing in non-convex learning problems

• Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

• Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability

• Deep Network Flow for Multi-Object Tracking

• Complexity of the Regularized Newton Method

• Non-Orthogonal Multiple Access combined with Random Linear Network Coded Cooperation

• Cognitive Subscore Trajectory Prediction in Alzheimer’s Disease

• Towards the Evolution of Multi-Layered Neural Networks: A Dynamic Structured Grammatical Evolution Approach

• On Signal Reconstruction from FROG Measurements

• Spectrally-normalized margin bounds for neural networks

• GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium

• A Simulator for Hedonic Games

• Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog