LoIDE: a web-based IDE for Logic Programming – Preliminary Technical Report

Logic-based paradigms are nowadays widely used in many different fields, thanks in part to the availability of robust tools and systems that allow the development of real-world and industrial applications. In this work we present LoIDE, an advanced and modular web editor for logic-based languages that also integrates with state-of-the-art solvers.

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.
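To make the factorization behind this abstract concrete, here is a minimal sketch of plain NMF with the standard multiplicative updates for the squared Euclidean cost. It is not the Bayesian NMF (BNMF) or the BNMF-HMM estimator the abstract proposes. The toy matrix `V`, the rank, the iteration count, and all function names are assumptions made for illustration; in a speech setting `V` would typically be a nonnegative magnitude spectrogram.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iters=200, seed=0, eps=1e-12):
    """Factor V (n x m, nonnegative) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation (an arbitrary choice).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    start = frobenius_error(v, w, h)
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h, start, frobenius_error(v, w, h)

# Hypothetical 4 x 6 nonnegative matrix built from two additive parts.
V = [
    [1, 0, 1, 0, 1, 0],
    [0, 2, 0, 2, 0, 2],
    [2, 0, 2, 0, 2, 0],
    [0, 6, 0, 6, 0, 6],
]
W, H, err_start, err_end = nmf(V, rank=2)
print(err_start, err_end)
```

Because the updates only multiply by nonnegative ratios, `W` and `H` stay nonnegative, which is what allows the columns of `W` to be read as additive basis spectra.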

Foundations of Complex Event Processing

Complex Event Processing (CEP) has emerged as the unifying field for technologies that require processing and correlating heterogeneous distributed data sources in real-time. CEP finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. However, existing CEP frameworks are based on ad-hoc solutions that do not rely on solid theoretical ground, making them hard to understand, extend or generalize. Moreover, they are usually presented as application programming interfaces documented by examples, and using each of them requires learning a different set of skills. In this paper we embark on the task of giving a rigorous framework to CEP. As a starting point, we propose a formal language for specifying complex events, called CEPL, that contains the common features used in the literature and has a simple and denotational semantics. We also formalize the so-called selection strategies, which are the cornerstone of CEP and had only been presented as by-design extensions to existing frameworks. With a well-defined semantics at hand, we study how to efficiently evaluate CEPL for processing complex events. We provide optimization results based on rewriting formulas to a normal form that simplifies the evaluation of filters. Furthermore, we introduce a formal computational model for CEP based on transducers and symbolic automata, called match automata, that captures the regular core of CEPL, i.e. formulas with unary predicates. By using rewriting techniques and automata-based translations, we show that formulas in the regular core of CEPL can be evaluated using constant time per event followed by constant-delay enumeration of the output (under data complexity). By gathering these results together, we propose a framework for efficiently evaluating CEPL, establishing solid foundations for future CEP systems.

‘How May I Help You?’: Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts

Given the increasing popularity of customer service dialogue on Twitter, analysis of conversation data is essential to understand trends in customer and agent behavior for the purpose of automating customer service interactions. In this work, we develop a novel taxonomy of fine-grained ‘dialogue acts’ frequently observed in customer service, showcasing acts that are more suited to the domain than the more generic existing taxonomies. Using a sequential SVM-HMM model, we model conversation flow, predicting the dialogue act of a given turn in real-time. We characterize differences between customer and agent behavior in Twitter customer service conversations, and investigate the effect of testing our system on different customer service industries. Finally, we use a data-driven approach to predict important conversation outcomes: customer satisfaction, customer frustration, and overall problem resolution. We show that the type and location of certain dialogue acts in a conversation have a significant effect on the probability of desirable and undesirable outcomes, and present actionable rules based on our findings. The patterns and rules we derive can be used as guidelines for outcome-driven automated customer service platforms.

An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems

We introduce a conceptual framework and an interventional calculus to reconstruct the dynamics of, steer, and manipulate systems based on their intrinsic algorithmic probability using the universal principles of the theory of computability and algorithmic information. By applying sequences of controlled interventions to systems and networks, we estimate how changes in their algorithmic information content are reflected in positive/negative shifts towards and away from randomness. The strong connection between approximations to algorithmic complexity (the size of the shortest generating mechanism) and causality induces a sequence of perturbations ranking the network elements by their steering capabilities. This new dimension unmasks a separation between causal and non-causal components, providing a suite of powerful parameter-free algorithms of wide applicability, ranging from optimal dimension reduction and maximal randomness analysis to system control. We introduce methods for reprogramming systems that do not require full knowledge of or access to the system’s actual kinetic equations or any probability distributions. A causal interventional analysis of synthetic and regulatory biological networks reveals how the algorithmic reprogramming qualitatively reshapes the system’s dynamic landscape. For example, during cellular differentiation we find a decrease in the number of elements corresponding to a transition away from randomness, and a combination of the system’s intrinsic properties and its intrinsic capabilities to be algorithmically reprogrammed can reconstruct an epigenetic landscape. The interventional calculus is broadly applicable to predictive causal inference of systems such as networks and is of relevance to a variety of machine and causal learning techniques driving model-based approaches to better understand and manipulate complex systems.

A Causal And-Or Graph Model for Visibility Fluent Reasoning in Human-Object Interactions

Tracking humans that are interacting with other subjects or the environment remains unsolved in visual tracking, because the visibility of the humans of interest in videos is unknown and might vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover complete human trajectories in crowded scenes with frequent human interactions. In this work, we consider the visibility status of a subject as a fluent variable, whose changes are mostly attributed to the subject’s interactions with its surroundings, e.g., crossing behind another object, entering a building, or getting into a vehicle. We introduce a Causal And-Or Graph (C-AOG) to represent the cause-effect relations between an object’s visibility fluents and its activities, and develop a probabilistic graph model to jointly reason about visibility fluent changes (e.g., from visible to invisible) and track humans in videos. We formulate the above joint task as an iterative search for a feasible causal graph structure that enables fast search algorithms, e.g., dynamic programming. We apply the proposed method to challenging video sequences to evaluate its capability of estimating visibility fluent changes of subjects and tracking subjects of interest over time. Results with comparisons demonstrate that our method clearly outperforms the alternative trackers and can recover complete trajectories of humans in complicated scenarios with frequent human interactions.

Process-oriented Iterative Multiple Alignment for Medical Process Mining

Adapted from biological sequence alignment, trace alignment is a process mining technique used to visualize and analyze workflow data. Any analysis done with this method, however, is affected by the alignment quality. The best existing trace alignment techniques use progressive guide-trees to heuristically approximate the optimal alignment in O(N^2L^2) time. These algorithms are heavily dependent on the selected guide-tree metric, often return sum-of-pairs-score-reducing errors that interfere with interpretation, and are computationally intensive for large datasets. To alleviate these issues, we propose process-oriented iterative multiple alignment (PIMA), which contains specialized optimizations to better handle workflow data. We demonstrate that PIMA is a flexible framework capable of achieving better sum-of-pairs score than existing trace alignment algorithms in only O(NL^2) time. We applied PIMA to analyzing medical workflow data, showing how iterative alignment can better represent the data and facilitate the extraction of insights from data visualization.
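The sum-of-pairs score used above as the quality measure can be sketched as follows. The scoring values (`match`, `mismatch`, `gap`), the convention of ignoring gap-gap pairs, and the one-character activity codes are assumptions for illustration, not the scheme used by PIMA.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1, gap_char="-"):
    """Score an alignment (equal-length rows) by summing over every pair
    of rows in every column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned traces must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == gap_char and b == gap_char:
                continue  # assumption: gap-gap pairs contribute nothing
            if a == gap_char or b == gap_char:
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total

# Three hypothetical traces over activities A-D, aligned two different ways.
good = ["ABCD", "AB-D", "A-CD"]
poor = ["ABCD", "ABD-", "ACD-"]
print(sum_of_pairs(good), sum_of_pairs(poor))  # prints: 4 -1
```

Scoring a fixed alignment this way costs on the order of N^2 L operations for N traces and L columns, since each column contributes one term per pair of traces.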

Statistical inference on random dot product graphs: a survey

The random dot product graph (RDPG) is an independent-edge random graph that is analytically tractable and, simultaneously, either encompasses or can successfully approximate a wide range of random graphs, from relatively simple stochastic block models to complex latent position graphs. In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices. We examine the analogues, in graph inference, of several canonical tenets of classical Euclidean inference: in particular, we summarize a body of existing results on the consistency and asymptotic normality of the adjacency and Laplacian spectral embeddings, and the role these spectral embeddings can play in the construction of single- and multi-sample hypothesis tests for graph data. We investigate several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome. We outline requisite background and current open problems in spectral graph inference.
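As a small illustration of the model and of the adjacency spectral embedding, the sketch below samples a graph whose edge probabilities are inner products of latent positions, then recovers a one-dimensional embedding as the top eigenvector scaled by the square root of its eigenvalue. Restricting to one dimension, using power iteration, and the specific latent positions are all assumptions made to keep the example dependency-free; the general embedding uses the top d eigenpairs.

```python
import math
import random

def sample_rdpg(positions, rng):
    """Sample a symmetric, hollow adjacency matrix with independent edges,
    where P(edge i~j) is the inner product of positions i and j."""
    n = len(positions)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = sum(u * v for u, v in zip(positions[i], positions[j]))
            adj[i][j] = adj[j][i] = 1 if rng.random() < p else 0
    return adj

def top_embedding(matrix, iters=500):
    """One-dimensional spectral embedding sqrt(lambda_1) * v_1, computed by
    power iteration. Assumes a symmetric nonnegative matrix whose largest
    eigenvalue dominates in magnitude."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(t * t for t in w))
        if lam == 0.0:
            break
        v = [t / lam for t in w]
    return [math.sqrt(lam) * t for t in v]

# Hypothetical scalar latent positions; all pairwise products lie in [0, 1].
xs = [0.2, 0.5, 0.8, 0.9]

# Embedding the edge-probability matrix itself recovers the positions.
P = [[a * b for b in xs] for a in xs]
print(top_embedding(P))

# Embedding one sampled graph only gives a noisy estimate, especially at
# this tiny size.
A = sample_rdpg([[x] for x in xs], random.Random(0))
print(top_embedding(A))
```

The consistency and asymptotic normality results summarized in the survey concern how closely the embedding of a sampled graph approaches the true positions as the number of nodes grows.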

Subset Labeled LDA for Large-Scale Multi-Label Classification

Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm that addresses multi-label learning tasks. Previous work has shown it to perform on par with other state-of-the-art multi-label methods. Nonetheless, with increasing label set sizes LLDA encounters scalability issues. In this work, we introduce Subset LLDA, a simple variant of the standard LLDA algorithm that not only scales effectively to problems with hundreds of thousands of labels but also improves over the LLDA state of the art. We conduct extensive experiments on eight data sets, with label set sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior–LDA, Dep–LDA), as well as the state of the art in extreme multi-label classification. The results show a steady advantage of our method over the other LLDA algorithms and competitive results compared to the extreme multi-label classification algorithms.

Relevant Ensemble of Trees

Tree ensembles are flexible predictive models that can capture relevant variables and to some extent their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and obtains a more complex and less interpretable model. In this paper we present a novel method for obtaining compact tree ensembles by growing a large pool of trees in parallel with many independent boosting threads and then selecting a small subset and updating their leaf weights by loss optimization. We allow for the trees in the initial pool to have different depths which further helps with generalization. Experiments on real datasets show that the obtained model has usually a smaller loss than boosting, which is also reflected in a lower misclassification error on the test set.

Applying Machine Learning Methods to Enhance the Distribution of Social Services in Mexico

The Government of Mexico’s social development agency, SEDESOL, is responsible for the administration of social services and has the mission of lifting Mexican families out of poverty. One key challenge they face is matching people who have social service needs with the services SEDESOL can provide accurately and efficiently. In this work we describe two specific applications implemented in collaboration with SEDESOL to enhance their distribution of social services. The first problem relates to systematic underreporting on applications for social services, which makes it difficult to identify where to prioritize outreach. Responding that five people reside in a home when only three do is a type of underreporting that could occur while a social worker conducts a home survey with a family to determine their eligibility for services. The second involves approximating multidimensional poverty profiles across households. That is, can we characterize different types of vulnerabilities — for example, food insecurity and lack of health services — faced by those in poverty? We detail the problem context, available data, our machine learning formulation, experimental results, and effective feature sets. As far as we are aware this is the first time government data of this scale has been used to combat poverty within Mexico. We found that survey data alone can suggest potential underreporting. Further, we found geographic features useful for housing and service related indicators and transactional data informative for other dimensions of poverty. The results from our machine learning system for estimating poverty profiles will directly help better match 7.4 million individuals to social programs.

Deep Automated Multi-task Learning

Multi-task learning (MTL) has recently contributed to learning better representations in service of various NLP tasks. MTL aims at improving the performance of a primary task, by jointly training on a secondary task. This paper introduces automated tasks, which exploit the sequential nature of the input data, as secondary tasks in an MTL model. We explore next word prediction, next character prediction, and missing word completion as potential automated tasks. Our results show that training on a primary task in parallel with a secondary automated task improves both the convergence speed and accuracy for the primary task. We suggest two methods for augmenting an existing network with automated tasks and establish better performance in topic prediction, sentiment analysis, and hashtag recommendation. Finally, we show that the MTL models can perform well on datasets that are small and colloquial by nature.

Data Innovation for International Development: An overview of natural language processing for qualitative data analysis

The availability and collection of quantitative data, access to it, and its limitations often make qualitative data the resource upon which development programs heavily rely. Both traditional interview data and social media analysis can provide rich contextual information and are essential for research, appraisal, monitoring and evaluation. These data may be difficult to process and analyze both systematically and at scale. This, in turn, limits the capacity for timely, data-driven decision-making, which is essential in fast-evolving complex social systems. In this paper, we discuss the potential of using natural language processing to systematize analysis of qualitative data, and to inform quick decision-making in the development context. We illustrate this with interview data generated in a format of micro-narratives for the UNDP Fragments of Impact project.

Representation Learning on Graphs: Methods and Applications

Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph convolutional networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.

Markov Brains: A Technical Introduction

Markov Brains are a class of evolvable artificial neural networks (ANN). They differ from conventional ANNs in many aspects, but the key difference is that instead of a layered architecture, with each node performing the same function, Markov Brains are networks built from individual computational components. These computational components interact with each other, receive inputs from sensors, and control motor outputs. The function of the computational components, their connections to each other, as well as connections to sensors and motors are all subject to evolutionary optimization. Here we describe in detail how a Markov Brain works, what techniques can be used to study them, and how they can be evolved.

Label propagation for clustering

Label propagation is a heuristic method initially proposed for community detection in networks, while the method can be adopted also for other types of network clustering and partitioning. Among all the approaches and techniques described in this book, label propagation is neither the most accurate nor the most robust method. It is, however, without doubt one of the simplest and fastest clustering methods. Label propagation can be implemented with a few lines of programming code and applied to networks with hundreds of millions of nodes and edges on a standard computer, which is true only for a handful of other methods in the literature. In this chapter, we present the basic framework of label propagation, review different advances and extensions of the original method, and highlight its equivalences with other approaches. We show how label propagation can be used effectively for large-scale community detection, graph partitioning, identification of structurally equivalent nodes and other network structures. We conclude the chapter with a summary of the label propagation methods and suggestions for future research.
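The claim that label propagation fits in a few lines of code can be illustrated with the sketch below. Every node starts with its own label and repeatedly adopts the label most frequent among its neighbours. The fixed node order and the deterministic tie-breaking rule (keep the current label if it is tied for most frequent, otherwise take the largest tied label) are simplifying assumptions chosen so the run is reproducible; the method is usually described with random node order and random tie-breaking. The example graph is hypothetical.

```python
from collections import Counter

def label_propagation(adj, max_sweeps=100):
    """adj maps each node to a list of its neighbours (undirected graph)."""
    labels = {v: v for v in adj}  # every node starts in its own cluster
    for _ in range(max_sweeps):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated nodes keep their label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue  # current label is already a most frequent one
            labels[v] = max(candidates)  # deterministic tie-break (assumption)
            changed = True
        if not changed:
            break  # no node wants to switch: converged
    return labels

# Two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the single edge 3-4.
graph = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
print(label_propagation(graph))
# prints: {0: 3, 1: 3, 2: 3, 3: 3, 4: 7, 5: 7, 6: 7, 7: 7}
```

Each sweep touches every edge a constant number of times, which is why the method scales to very large networks. The outcome does depend on the update order and tie-breaking, so different choices can merge the two cliques into one cluster.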

Bayesian nonparametric Principal Component Analysis

Principal component analysis (PCA) is a very popular tool for dimension reduction. The selection of the number of significant components is essential but often based on practical heuristics that depend on the application. Only a few works have proposed a probabilistic approach able to infer the number of significant components. To this end, this paper introduces a Bayesian nonparametric principal component analysis (BNP-PCA). The proposed model projects observations onto a random orthogonal basis which is assigned a prior distribution defined on the Stiefel manifold. The prior on factor scores involves an Indian buffet process to model the uncertainty related to the number of components. The parameters of interest as well as the nuisance parameters are finally inferred within a fully Bayesian framework via Monte Carlo sampling. A study of the (in)consistency of the marginal maximum a posteriori estimator of the latent dimension is carried out. A new estimator of the subspace dimension is proposed. Moreover, for the sake of statistical significance, a Kolmogorov-Smirnov test based on the posterior distribution of the principal components is used to refine this estimate. The behaviour of the algorithm is first studied on various synthetic examples. Finally, the proposed BNP dimension reduction approach is shown to be easily yet efficiently coupled with clustering or latent factor models within a unique framework.

Semi-supervised learning

Semi-supervised learning deals with the problem of how, if possible, to take advantage of a large amount of unlabelled data to perform classification in situations where, typically, labelled data are scarce. Even though this is not always possible (it depends on how useful knowledge of the distribution of the unlabelled data is for inferring the labels), several algorithms have been proposed recently. A new algorithm is proposed that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule as the size of the unlabelled data tends to infinity. The set of necessary assumptions, although reasonable, shows that semi-parametric classification only works for very well-conditioned problems.

AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms

In this paper, we present the first-of-its-kind machine learning (ML) system, called AI Programmer, that can automatically generate full software programs requiring only minimal human guidance. At its core, AI Programmer uses genetic algorithms (GA) coupled with a tightly constrained programming language that minimizes the overhead of its ML search space. Part of AI Programmer’s novelty stems from (i) its unique system design, including an embedded, hand-crafted interpreter for efficiency and security and (ii) its augmentation of GAs to include instruction-gene randomization bindings and programming language-specific genome construction and elimination techniques. We provide a detailed examination of AI Programmer’s system design, several examples detailing how the system works, and experimental data demonstrating its software generation capabilities and performance using only mainstream CPUs.
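The genetic-algorithm loop at the core of such a system (selection, crossover, mutation, and keeping the best individual) can be sketched generically as below. This is not AI Programmer's design. The genome here is a plain character string scored against a hypothetical target, whereas the abstract describes genomes that encode programs in a constrained language executed by an embedded interpreter. Population size, mutation rate, and all names are assumptions.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "

def fitness(candidate, target):
    """Number of positions where the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, target))

def evolve(target, pop_size=60, generations=200, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in target) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, target), reverse=True)
        history.append(fitness(pop[0], target))
        if pop[0] == target:
            break
        parents = pop[: pop_size // 2]   # truncation selection
        children = [pop[0]]              # elitism: carry the best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(target))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = "".join(
                rng.choice(ALPHABET) if rng.random() < mutation_rate else ch
                for ch in child
            )
            children.append(child)
        pop = children
    best = max(pop, key=lambda c: fitness(c, target))
    return best, history

TARGET = "hello world"
best, history = evolve(TARGET)
print(best, history[-1])
```

Because the best individual is copied unchanged into the next generation, the best fitness per generation never decreases. In a program-synthesis setting the fitness function would instead run each candidate program and score its output.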

Memory Augmented Control Networks

Planning problems in partially observable environments cannot be solved directly with convolutional networks and require some form of memory. However, even memory networks with sophisticated addressing schemes are unable to learn intelligent reasoning satisfactorily due to the complexity of simultaneously learning to access memory and plan. To mitigate these challenges we introduce the Memory Augmented Control Network (MACN). The proposed network architecture consists of three main parts. The first part uses convolutions to extract features and the second part uses a neural network-based planning module to pre-plan in the environment. The third part uses a network controller that learns to store those specific instances of past information that are necessary for planning. The performance of the network is evaluated in discrete grid world environments for path planning in the presence of simple and complex obstacles. We show that our network learns to plan and can generalize to new environments.

A Hierarchical Probabilistic Model for Facial Feature Detection

Facial feature detection from facial images has attracted great attention in the field of computer vision. It is a nontrivial task since the appearance and shape of the face tend to change under different conditions. In this paper, we propose a hierarchical probabilistic model that can infer the true locations of facial features given the image measurements, even when the face exhibits significant expression and pose variation. The hierarchical model implicitly captures the lower-level shape variations of facial components using a mixture model. Furthermore, at the higher level, it also learns the joint relationship among facial components, the facial expression, and the pose information through automatic structure learning and parameter estimation of the probabilistic model. Experimental results on benchmark databases demonstrate the effectiveness of the proposed hierarchical probabilistic model.

Relational Marginal Problems: Theory and Estimation

In the propositional setting, the marginal problem is to find a (maximum-entropy) distribution that has some given marginals. We study this problem in a relational setting and make the following contributions. First, we compare two different notions of relational marginals. Second, we show a duality between the resulting relational marginal problems and the maximum likelihood estimation of the parameters of relational models, which generalizes a well-known duality from the propositional setting. Third, by exploiting the relational marginal formulation, we present a statistically sound method to learn the parameters of relational models that will be applied in settings where the number of constants differs between the training and test data. Furthermore, based on a relational generalization of marginal polytopes, we characterize cases where the standard estimators based on a feature’s number of true groundings need to be adjusted and we quantitatively characterize the consequences of these adjustments. Fourth, we prove bounds on expected errors of the estimated parameters, which allows us to lower-bound, among other things, the effective sample size of relational training data.

ZhuSuan: A Library for Bayesian Deep Learning

In this paper we introduce ZhuSuan, a Python probabilistic programming library for Bayesian deep learning, which conjoins the complementary advantages of Bayesian methods and deep learning. ZhuSuan is built upon TensorFlow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan is distinguished by its deep roots in Bayesian inference, thus supporting various kinds of probabilistic models, including both the traditional hierarchical Bayesian models and recent deep generative models. We use running examples to illustrate probabilistic programming in ZhuSuan, including Bayesian logistic regression, variational auto-encoders, deep sigmoid belief networks and Bayesian recurrent neural networks.

IBM Deep Learning Service

Deep learning driven by large neural network models is overtaking traditional machine learning methods for understanding unstructured and perceptual data domains such as speech, text, and vision. At the same time, the ‘as-a-Service’-based business model on the cloud is fundamentally transforming the information technology industry. These two trends, deep learning and ‘as-a-service’, are colliding to give rise to a new business model for cognitive application delivery: deep learning as a service in the cloud. In this paper, we discuss the details of the software architecture behind IBM’s deep learning as a service (DLaaS). DLaaS provides developers the flexibility to use popular deep learning libraries such as Caffe, Torch and TensorFlow in the cloud in a scalable and resilient manner with minimal effort. The platform uses a distribution and orchestration layer that facilitates learning from a large amount of data in a reasonable amount of time across compute nodes. A resource provisioning layer enables flexible job management on heterogeneous resources, such as graphics processing units (GPUs) and central processing units (CPUs), in an infrastructure as a service (IaaS) cloud.

E^2BoWs: An End-to-End Bag-of-Words Model via Deep Convolutional Neural Network

The traditional Bag-of-visual-Words (BoWs) model is commonly generated in many steps, including local feature extraction, codebook generation, and feature quantization. These steps are relatively independent of each other and are hard to optimize jointly. Moreover, the dependency on hand-crafted local features makes the BoWs model ineffective at conveying high-level semantics. These issues largely hinder the performance of the BoWs model in large-scale image applications. To overcome these issues, we propose an End-to-End BoWs (E^2BoWs) model based on a Deep Convolutional Neural Network (DCNN). Our model takes an image as input, then identifies and separates the semantic objects in it, and finally outputs visual words with high semantic discriminative power. Specifically, our model first generates Semantic Feature Maps (SFMs) corresponding to different object categories through convolutional layers, then introduces Bag-of-Words Layers (BoWL) to generate visual words for each individual feature map. We also introduce a novel learning algorithm to reinforce the sparsity of the generated E^2BoWs model, which further ensures time and memory efficiency. We evaluate the proposed E^2BoWs model on several image search datasets including CIFAR-10, CIFAR-100, MIRFLICKR-25K and NUS-WIDE. Experimental results show that our method achieves promising accuracy and efficiency compared with recent deep learning based retrieval works.

A Generalized Framework for Kullback-Leibler Markov Aggregation

This paper proposes an information-theoretic cost function for aggregating a Markov chain via a (possibly stochastic) mapping. The cost function is motivated by two objectives: 1) The process obtained by observing the Markov chain through the mapping should be close to a Markov chain, and 2) the aggregated Markov chain should retain as much of the temporal dependence structure of the original Markov chain as possible. We discuss properties of this parameterized cost function and show that it contains the cost functions previously proposed by Deng et al., Xu et al., and Geiger et al. as special cases. We moreover discuss these special cases providing a better understanding and highlighting potential shortcomings: For example, the cost function proposed by Geiger et al. is tightly connected to approximate probabilistic bisimulation, but leads to trivial solutions if optimized without regularization. We furthermore propose a simple heuristic to optimize our cost function for deterministic aggregations and illustrate its performance on a set of synthetic examples.

Leveraging Distributional Semantics for Multi-Label Learning

We present a novel and scalable label embedding framework for large-scale multi-label learning, called ExMLDS (Extreme Multi-Label Learning using Distributional Semantics). Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings for natural language processing tasks. Learning such embeddings can be reduced to a certain matrix factorization. Our approach is novel in that it highlights interesting connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data. The framework can also be easily extended to incorporate auxiliary information such as label-label correlations; this is crucial especially when there are many missing labels in the training data. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed learning methods perform favorably compared to several baselines and state-of-the-art methods for large-scale multi-label learning.

A convergent relaxation of the Douglas-Rachford algorithm

This paper proposes an algorithm for solving structured optimization problems, which covers both the backward-backward and the Douglas-Rachford algorithms as special cases, and analyzes its convergence. The set of fixed points of the algorithm is characterized in several cases. Convergence criteria of the algorithm in terms of general fixed point operators are established. When applied to nonconvex feasibility problems, including the inconsistent case, the algorithm is shown to converge locally linearly under mild assumptions on the regularity of the individual sets and of the collection of sets, which need not intersect. In this special case, we refine known linear convergence criteria for the Douglas-Rachford algorithm (DR). As a consequence, for feasibility with one of the sets being affine, we establish criteria for linear and sublinear convergence of convex combinations of the alternating projection and the DR methods. These results seem to be new. We also demonstrate the seemingly improved numerical performance of this algorithm compared to the RAAR algorithm for both consistent and inconsistent sparse feasibility problems.
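
To make the phrase "convex combinations of the alternating projection and the DR methods" concrete, here is a minimal sketch on a toy feasibility problem with two lines in the plane (the x-axis and the diagonal), which intersect only at the origin. The sets, the weight `lam`, and all function names are assumptions chosen for illustration; the relaxation analyzed in the paper may be defined differently.

```python
def project_x_axis(p):
    """Projection onto A = {(x, 0)}."""
    return (p[0], 0.0)

def project_diagonal(p):
    """Projection onto B = {(t, t)}."""
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def alternating_projection_step(p):
    """T_AP(p) = P_A(P_B(p))."""
    return project_x_axis(project_diagonal(p))

def douglas_rachford_step(p):
    """T_DR(p) = p - P_B(p) + P_A(2 P_B(p) - p)."""
    pb = project_diagonal(p)
    reflected = (2 * pb[0] - p[0], 2 * pb[1] - p[1])
    pa = project_x_axis(reflected)
    return (p[0] - pb[0] + pa[0], p[1] - pb[1] + pa[1])

def combined_step(p, lam):
    """Convex combination lam * T_DR + (1 - lam) * T_AP, with lam in [0, 1]."""
    ap = alternating_projection_step(p)
    dr = douglas_rachford_step(p)
    return (lam * dr[0] + (1 - lam) * ap[0],
            lam * dr[1] + (1 - lam) * ap[1])

def iterate(p, lam, steps):
    for _ in range(steps):
        p = combined_step(p, lam)
    return p

# Hypothetical starting point; lam = 0 gives pure alternating projections,
# lam = 1 gives pure Douglas-Rachford.
final_point = iterate((3.0, -2.0), lam=0.5, steps=100)
```

On this toy problem every iterate map is linear, so the iterates contract geometrically toward the intersection point at the origin.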

Guided Deep Reinforcement Learning for Swarm Systems

In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities, such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target.

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger ‘teacher’ network as input and outputs a compressed ‘student’ network derived from the ‘teacher’ network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large ‘teacher’ model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward, a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input ‘teacher’ network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller ‘teacher’ networks can be used to rapidly speed up training on larger ‘teacher’ networks.
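
The abstract says only that the reward is "a score based on the accuracy and compression of the network" and gives no formula. The sketch below shows one plausible shape for such a score: a compression term multiplied by the student-to-teacher accuracy ratio. The specific form `c * (2 - c)`, the function name, and the example numbers are all assumptions for illustration, not values or definitions taken from this text.

```python
def compression_reward(student_params, teacher_params,
                       student_accuracy, teacher_accuracy):
    """Illustrative reward: higher for smaller students that keep accuracy.

    Assumed form (not confirmed by the abstract):
        c      = 1 - student_params / teacher_params   (relative compression)
        reward = c * (2 - c) * (student_accuracy / teacher_accuracy)
    """
    if teacher_params <= 0 or teacher_accuracy <= 0:
        raise ValueError("teacher_params and teacher_accuracy must be positive")
    c = 1.0 - student_params / teacher_params
    compression_term = c * (2.0 - c)  # increasing in c on [0, 1], flattening near 1
    accuracy_term = student_accuracy / teacher_accuracy
    return compression_term * accuracy_term

# Hypothetical numbers: a 10x smaller student that matches the teacher's accuracy.
reward = compression_reward(student_params=2_000_000,
                            teacher_params=20_000_000,
                            student_accuracy=0.92,
                            teacher_accuracy=0.92)
```

A policy-gradient method would use a scalar like this as the return for the sequence of layer-removal or layer-shrinking actions that produced the student.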

Sequence to Sequence Learning for Event Prediction

This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively. Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.

Coupled Ensembles of Neural Networks

In this paper, we investigate the architecture of deep convolutional networks. Building on existing state-of-the-art models, we propose a reconfiguration of the model parameters into several parallel branches at the global network level, with each branch being a standalone CNN. We show that this arrangement is an efficient way to significantly reduce the number of parameters without losing performance or to significantly improve the performance with the same number of parameters. The use of branches brings an additional form of regularization. In addition to the split into parallel branches, we propose a tighter coupling of these branches by placing the ‘fuse (averaging) layer’ before the Log-Likelihood and SoftMax layers during training. This gives another significant performance improvement, the tighter coupling favouring the learning of better representations, even at the level of the individual branches. We refer to this branched architecture as ‘coupled ensembles’. The approach is very generic and can be applied with almost any DCNN architecture. With coupled ensembles of DenseNet-BC and a parameter budget of 25M, we obtain error rates of 2.92%, 15.68% and 1.50% respectively on the CIFAR-10, CIFAR-100 and SVHN tasks. For the same budget, DenseNet-BC has error rates of 3.46%, 17.18%, and 1.8% respectively. With ensembles of coupled ensembles of DenseNet-BC networks, with 50M total parameters, we obtain error rates of 2.72%, 15.13% and 1.42% respectively on these tasks.
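
The placement of the fuse (averaging) layer described in this abstract can be sketched in a few lines. The example contrasts averaging the branches' raw class scores before the softmax with the classical ensemble that averages per-branch probabilities afterwards. The assumption that the fused quantities are raw class scores (logits), the function names, and the toy scores are illustrative choices, not details confirmed by the abstract.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of class scores."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in scores))
    return [v - log_norm for v in scores]

def fuse_before_softmax(branch_scores):
    """Average class scores across branches, then apply log-softmax."""
    n = len(branch_scores)
    fused = [sum(vals) / n for vals in zip(*branch_scores)]
    return log_softmax(fused)

def fuse_after_softmax(branch_scores):
    """Classical ensembling for contrast: average per-branch probabilities."""
    n = len(branch_scores)
    probs = [[math.exp(v) for v in log_softmax(b)] for b in branch_scores]
    averaged = [sum(vals) / n for vals in zip(*probs)]
    return [math.log(p) for p in averaged]

# Hypothetical scores from two branches over three classes.
branches = [[2.0, 0.5, -1.0], [1.0, 1.5, 0.0]]
early = fuse_before_softmax(branches)
late = fuse_after_softmax(branches)
```

With early fusion, a single loss on the fused output sends gradients through all branches at once, which is one way to read the "tighter coupling" mentioned above.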

Embedding Deep Networks into Visual Explanations
Synthesis of surveillance strategies via belief abstraction
General Phase Regularized Reconstruction using Phase Cycling
A Rule-Based Approach to Analyzing Database Schema Objects with Datalog
Road Friction Estimation for Connected Vehicles using Supervised Machine Learning
The Uncertainty Bellman Equation and Exploration
Secrecy Rate of Distributed Cooperative MIMO in the Presence of Multi-Antenna Eavesdropper
Zero pattern matrix rings, reachable pairs in digraphs, and Sharp’s topological invariant $τ$
$ε$-Lexicase selection: a probabilistic and multi-objective analysis of lexicase selection in continuous domains
Differential Privacy on Finite Computers
Zero-Shot Learning to Manage a Large Number of Place-Specific Compressive Change Classifiers
Design, Modeling, and Geometric Control on SE(3) of a Fully-Actuated Hexarotor for Aerial Interaction
Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue
Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems
Combining Search with Structured Data to Create a More Engaging User Experience in Open Domain Dialogue
Multi-Agent Distributed Lifelong Learning for Collective Knowledge Acquisition
Deep Scattering: Rendering Atmospheric Clouds with Radiance-Predicting Neural Networks
Impatient random walk
Robust estimation in single index models with asymmetric errors
NIMA: Neural Image Assessment
Grade Prediction with Temporal Course-wise Influence
Joint Parsing of Cross-view Scenes with Spatio-temporal Semantic Parse Graphs
To Go or Not To Go? A Near Unsupervised Learning Approach For Robot Navigation
Spectral Radii of Truncated Circular Unitary Matrices
Learning Sampling Distributions for Robot Motion Planning
Augmenting End-to-End Dialog Systems with Commonsense Knowledge
Detection of Transition Times from Single-particle-tracking Trajectories
Channel Access Method Classification For Cognitive Radio Applications
New approach to optimal control of stochastic Volterra integral equations
Codes over Affine Algebras with a Finite Commutative Chain coefficient Ring
Multivariable codes in principal ideal polynomial quotient rings with applications to additive modular bivariate codes over $\mathbb{F}_4$
Acquiring Background Knowledge to Improve Moral Value Prediction
Machine learning technique to find quantum many-body ground states of bosons on a lattice
Long-Term Ensemble Learning of Visual Place Classifiers
Some improved bounds on two energy-like invariants of some derived graphs
Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification
Cooperative Network Synchronization: Asymptotic Analysis
Reliability of Multicast under Random Linear Network Coding
sPIN: High-performance streaming Processing in the Network
Role of Morphology Injection in Statistical Machine Translation
Performance Analysis of FSO System with Spatial Diversity and Relays for M-QAM over Log-Normal Channel
Challenges and potentials for visible light communications: State of the art
Performance analysis of dual-hop optical wireless communication systems over k-distribution turbulence channel with pointing error
The Multiscale Bowler-Hat Transform for Blood Vessel Enhancement in Retinal Images
Hopping charge transport in amorphous semiconductors with the spatially correlated exponential density of states
Constrained Bayesian Optimization for Automatic Chemical Design
The generalised random dot product graph
Miquel dynamics for circle patterns
The Geometric Block Model
Regularization and Variable Selection with Copula Prior
Some variations on Random Survival Forest with application to Cancer Research
Semi-Static and Sparse Variance-Optimal Hedging
AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
Performance Evaluation of Spatial Complementary Code Keying Modulation in MIMO Systems
Semi-Static Variance-Optimal Hedging in Stochastic Volatility Models with Fourier Representation
Convergence Analysis of Parallel Multi-block ADMM via Discrete-time Recurrent Neural Networks
Information Geometry of Quantum Resources
An alternative to continuous univariate distributions supported on a bounded interval: The BMT distribution
DeepLung: 3D Deep Convolutional Nets for Automated Pulmonary Nodule Detection and Classification
On Isoperimetric Stability
Method for Mode Mixing Separation in Empirical Mode Decomposition
Forecasting of commercial sales with large scale Gaussian Processes
Multivariate Gaussian Network Structure Learning
Speech Dereverberation Using Nonnegative Convolutive Transfer Function and Spectro temporal Modeling
Nonnegative HMM for Babble Noise Derived from Speech HMM: Application to Speech Enhancement
Efficient Statistically Accurate Algorithms for the Fokker-Planck Equation in Large Dimensions
SKOS Concepts and Natural Language Concepts: an Analysis of Latent Relationships in KOSs
Multi-Modal Multi-Task Deep Learning for Autonomous Driving
Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification
Rigorous Analysis for Efficient Statistically Accurate Algorithms for Solving Fokker-Planck Equations in Large Dimensions
Generalized PMC model for the hybrid diagnosis of multiprocessor systems
Character Distributions of Classical Chinese Literary Texts: Zipf’s Law, Genres, and Epochs
Hybrid Fault diagnosis capability analysis of Hypercubes under the PMC model and MM* model
Millimeter Wave Channel Measurements and Implications for PHY Layer Design
Computation of graphical derivatives of normal cone maps to conic constraints without nondegeneracy and PDC
Hierarchical Gated Recurrent Neural Tensor Network for Answer Triggering
Learning Mixtures of Multi-Output Regression Models by Correlation Clustering for Multi-View Data
A Sharp Lower Bound for Mixed-membership Estimation
MOL-Eye: A New Metric for the Performance Evaluation of a Molecular Signal
Multi-Entity Dependence Learning with Rich Context via Conditional Variational Auto-encoder
A generalization of the Log Lindley distribution — its properties and applications
An adsorbed gas estimation model for shale gas reservoirs via statistical learning
Power-law exponent in multiplicative Langevin equation with temporally correlated noise
Convergence Analysis of Processes with Valiant Projection Operators in Hilbert Space
Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models
Joining Jolie to Docker – Orchestration of Microservices on a Containers-as-a-Service Layer
Reassessing Accuracy Rates of Median Decisions
Reinforcement Learning Based Conversational Search Assistant
A phase-field approach for the interface reconstruction in a nonlinear elliptic problem arising from cardiac electrophysiology
Mapping temporal-network percolation to weighted, static event graphs
On the Strong Feller Property of Stochastic Delay Differential Equations with Singular Drift
Large deviation principles and fluctuation theorems for currents in semi-Markov processes
Type II balanced truncation for deterministic bilinear control systems
Automatic Tool Landmark Detection for Stereo Vision in Robot-Assisted Retinal Surgery
On Inductive Abilities of Latent Factor Models for Relational Learning
An Improved Fatigue Detection System Based on Behavioral Characteristics of Driver
Neural Affine Grayscale Image Denoising
Semi-infinite Plücker relations and Weyl modules
Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks
Modeling Smooth Backgrounds and Generic Localized Signals with Gaussian Processes
A Categorical Approach for Recognizing Emotional Effects of Music
MERF: Morphology-based Entity and Relational Entity Extraction Framework for Arabic
Nonparametric Shape-restricted Regression
Cost-Based Assessment of Partitioning Algorithms of Agent-Based Systems on Hybrid Cloud Environments
Lexico-minimum Replica Placement in Multitrees
Optimal Battery Control Under Cycle Aging Mechanisms
FlashProfile: Interactive Synthesis of Syntactic Profiles
Flexible Computing Services for Comparisons and Analyses of Classical Chinese Poetry
Facial Feature Tracking under Varying Facial Expressions and Face Poses based on Restricted Boltzmann Machines
The Stochastic Geometry Analyses of Cellular Networks with α-Stable Self-Similarity
Modeling Co-location in Multi-Operator mmWave Networks with Spectrum Sharing
Towards Building a Knowledge Base of Monetary Transactions from a News Collection
Indistinguishability and Energy Sensitivity of Asymptotically Gaussian Compressed Encryption
Joint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution from a Blurred Image Sequence
Sim-to-real Transfer of Visuo-motor Policies for Reaching in Clutter: Domain Randomization and Adaptation with Modular Networks
Douglas-Rachford splitting and ADMM for nonconvex optimization: new convergence results and accelerated versions
Settling Payments Fast and Private: Efficient Decentralized Routing for Path-Based Transactions
Anticipating Information Needs Based on Check-in Activity
Adaptive Laplace Mechanism: Differential Privacy Preservation in Deep Learning
New Algorithms for Minimizing the Weighted Number of Tardy Jobs On a Single Machine
A Family of Partially Ordered Sets with Small Balance Constant
Finite-Alphabet Precoding for Massive MU-MIMO with Low-resolution DACs
Local Minimizers and Second-Order Conditions in Composite Piecewise Programming via Directional Derivatives
Quotient-complete arc-transitive latin square graphs from groups
Wavepacket Dynamics in One-Dimensional System with Long-Range Correlated Disorder
Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition
Variational formulas, Busemann functions, and fluctuation exponents for the corner growth model with exponential weights
Direction-Aware Semi-Dense SLAM
Social Style Characterization from Egocentric Photo-streams
GHK mirror symmetry, the Knutson-Tao hive cone, and Littlewood-Richardson coefficients
Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model for Short Text Multi-class Classification Problems
Parameter Regimes in Partial Functional Panel Regression
Multiplicity in the Lucas-Uzawa model with externalities
StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection
On the Restricted Isometry of the Columnwise Khatri-Rao Product
Learning Disordered Topological Phases by Statistical Recovery of Symmetry
MAX-consensus in open multi-agent systems with gossip interactions
Distinguishing graphs of maximum valence 3
Unification of graph products and compatibility with switching
Bounds on Binary Locally Repairable Codes Tolerating Multiple Erasures
Minimal Effort Back Propagation for Convolutional Neural Networks
Entrenched time delays versus accelerating opinion dynamics: are advanced democracies inherently unstable?
Globally simple Heffter arrays and orthogonal cyclic cycle decompositions
Direct Pose Estimation with a Monocular Camera
Toward a full-scale neural machine translation in production: the Booking.com use case
Order of approximation in the central limit theorem for associated random variables and a moderate deviation result
Stable Recovery of Structured Signals From Corrupted Sub-Gaussian Measurements
A note on the penalty parameter in Nitsche’s method for unfitted boundary value problems
Beyond SIFT using Binary features for Loop Closure Detection
Constructive approximate extremum value theorem for function spaces
A Democratically-Optimal Budgeting Algorithm
Autoencoder-Driven Weather Clustering for Source Estimation during Nuclear Events
Deletion theorem and combinatorics of hyperplane arrangements
Decentralized Collision-Free Control of Multiple Robots in 2D and 3D Spaces
Neonatal Seizure Detection using Convolutional Neural Networks
Dual Prediction-Correction Methods for Linearly Constrained Time-Varying Convex Programs
Stochastic Stability of Reinforcement Learning in Positive-Utility Games
Microscopy Cell Segmentation via Adversarial Neural Networks
Continuous Multimodal Emotion Recognition Approach for AVEC 2017
Recognizing Objects In-the-wild: Where Do We Stand?
Estimating the Variance of Measurement Errors in Running Variables of Sharp Regression Discontinuity Designs
Depression Scale Recognition from Audio, Visual and Text Analysis
Combinational neural network using Gabor filters for the classification of handwritten digits
On a generalization of Matérn hard-core processes with applications to max-stable processes
Use of Information, Memory and Randomization in Asynchronous Gathering
Building an Effective Data Warehousing for Financial Sector
From Electrical Power Flows to Unsplittable Flows: A QPTAS for OPF with Discrete Demands in Line Distribution Networks
Rapid Fading Due to Human Blockage in Pedestrian Crowds at 5G Millimeter-Wave Frequencies
Variational Gaussian Approximation for Poisson Data
On stochastic integrals with controlled growth of their containing range
Self-embeddings of trees
Non-Clairvoyant Scheduling to Minimize Max Flow Time on a Machine with Setup Times
Localization game on geometric and planar graphs
A combinatorial characterisation of embedded polar spaces
Bayesian analysis of three parameter singular and absolute continuous Marshall-Olkin bivariate Pareto distribution
Learning a Fully Convolutional Network for Object Recognition using very few Data
Limitations of Cross-Lingual Learning from Image Search
All orthogonal arrays from quantum states
Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks
Normal Integration: A Survey
Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations
Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification
Variational Methods for Normal Integration
A bijection between phylogenetic trees and plane oriented recursive trees
Multi-Person Pose Estimation via Column Generation
Examples of Itô càdlàg rough paths
On contractible edges in convex decompositions
A generalized major index statistic on tableaux
Orthogonal stochastic duality functions from Lie algebra representations
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
Learning Depth-Three Neural Networks in Polynomial Time
LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation
Video Object Segmentation Without Temporal Information
Vehicle Tracking in Wide Area Motion Imagery via Stochastic Progressive Association Across Multiple Frames (SPAAM)
Local decoding and testing of polynomials over grids
Efficient simulation of Brown-Resnick processes based on variance reduction of Gaussian processes
Counting Steiner triple systems with classical parameters and prescribed rank
Target-adaptive CNN-based pansharpening
Cache-Aware Lock-Free Concurrent Hash Tries
Rotation Adaptive Visual Object Tracking with Motion Consistency
Network Deployment for Maximal Energy Efficiency in Uplink with Zero-Forcing
Game Total Domination Critical Graphs
MacWilliams’ extension theorem for infinite rings
Managing Price Uncertainty in Prosumer-Centric Energy Trading: A Prospect-Theoretic Stackelberg Game Approach