If you did not already know

Temporal Overdrive Recurrent Neural Network
In this work we present a novel recurrent neural network architecture designed to model systems characterized by multiple characteristic timescales in their dynamics. The proposed network is composed of several recurrent groups of neurons that are trained to separately adapt to each timescale, in order to improve the system identification process. We test our framework on time series prediction tasks and we show some promising, preliminary results achieved on synthetic data. To evaluate the capabilities of our network, we compare the performance with several state-of-the-art recurrent architectures. …

Extreme Value Learning (EVL)
The novel unseen classes can be formulated as the extreme values of known classes. This inspired the recent works on open-set recognition \cite{Scheirer_2013_TPAMI,Scheirer_2014_TPAMIb,EVM}, which, however, have no way of naming the novel unseen classes. To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual features to semantic space. To model the margin and coverage distributions of each class, Vocabulary-informed Learning (ViL) is adopted by using a vast open vocabulary in the semantic space. Essentially, by incorporating EVL and ViL, we propose, for the first time, a novel semantic embedding paradigm, Vocabulary-informed Extreme Value Learning (ViEVL), which embeds the visual features into semantic space in a probabilistic way. The learned embedding can be directly used to solve supervised learning, zero-shot and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed frameworks. …

Skip-Gram Model
A technique whereby n-grams are still stored to model language, but tokens are allowed to be skipped. …
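
To make the idea of skipping tokens concrete, here is a minimal sketch in Python. It assumes the common "k-skip-n-gram" reading, in which each n-gram may skip at most k tokens in total; the function name `skip_grams` and its parameters are illustrative and not taken from the source.

```python
from itertools import combinations


def skip_grams(tokens, n=2, k=1):
    """Return all n-grams over `tokens` that skip at most k tokens in total.

    Assumption: this follows the usual k-skip-n-gram definition; with k=0
    it reduces to ordinary contiguous n-grams.
    """
    grams = []
    for start in range(len(tokens)):
        # The first token is fixed; the remaining n-1 tokens are chosen,
        # in order, from the next n-1+k positions.
        window = tokens[start + 1 : start + n + k]
        for rest in combinations(window, n - 1):
            grams.append((tokens[start],) + rest)
    return grams


sentence = ["a", "b", "c", "d"]
plain_bigrams = skip_grams(sentence, n=2, k=0)
one_skip_bigrams = skip_grams(sentence, n=2, k=1)
```

With `k=0` only adjacent pairs are produced; with `k=1` pairs such as `("a", "c")` also appear, which is what lets the model capture slightly longer-range context.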


Distilled News

Measuring the Progress of AI Research

This pilot project collects problems and metrics/datasets from the AI research literature, and tracks progress on them. You can use this Notebook to see how things are progressing in specific subfields or AI/ML as a whole, as a place to report new results you’ve obtained, as a place to look for problems that might benefit from having new datasets/metrics designed for them, or as a source to build on for data science projects. At EFF, we’re ultimately most interested in how this data can influence our understanding of the likely implications of AI. To begin with, we’re focused on gathering it.

Using csvkit to Summarize Data: A Quick Example

As data analysts, we’re frequently presented with comma-separated value files and tasked with reporting insights. While it’s tempting to import that data directly into R or Python in order to perform data munging and exploratory data analysis, there are also a number of utilities to examine, fix, slice, transform, and summarize data through the command line. In particular, csvkit is a suite of Python-based utilities for working with CSV files from the terminal. For this post, we will grab data using wget, subset rows containing a particular value, and summarize the data in different ways. The goal is to take data on criminal activity, group by a particular offense type, and develop counts to understand the frequency distribution.
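
The filter-then-count workflow described above can be sketched with the Python standard library alone. This is not csvkit itself; the inline sample data, the column names `offense` and `district`, and the helper names are all hypothetical and stand in for the downloaded crime file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the downloaded crime data.
SAMPLE = """offense,district
BURGLARY,1
ROBBERY,2
BURGLARY,2
ASSAULT,1
BURGLARY,1
"""


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


def count_by(rows, column):
    """Frequency distribution of the values found in `column`."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
offense_counts = count_by(rows, "offense")
district_one = filter_rows(rows, "district", "1")
district_one_counts = count_by(district_one, "offense")
```

`offense_counts.most_common()` then gives the offenses ordered by frequency, which is the kind of summary the post builds on the command line.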

Julia vs R and Python: what does Stack Overflow Developer Survey 2017 tell us?

TLDR: Most Julia programmers also use Python. However, among all languages R is the one whose users are most likely to also develop in Julia. Recently Stack Overflow has made public the results of Developer Survey 2017. It is definitely an interesting data set. In this post I analyzed the answers to the question ‘Which of the following languages have you done extensive development work in over the past year, and which do you want to work in over the next year?’ from the perspective of the Julia language against other programming languages. We actually get two variables of interest: 1) what was used and 2) what is planned to be used.

dbplyr 1.1.0

I’m pleased to announce the release of the dbplyr package, which now contains all dplyr code related to connecting to databases. This shouldn’t affect you-as-a-user much, but it makes dplyr simpler, and makes it easier to release improvements just for database related code.

Using the TensorFlow API: An Introductory Tutorial Series

This post summarizes and links to a great multi-part tutorial series on learning the TensorFlow API for building a variety of neural networks, as well as a bonus tutorial on backpropagation from the beginning.

GLM with H2O in R

Below is an example showing how to fit a Generalized Linear Model with H2O in R. The output is much more comprehensive than the one generated by the generic R glm().

R Packages worth a look

Nearest Centroid (NC) Sampling (NCSampling)
Provides functionality for performing Nearest Centroid (NC) Sampling. The NC sampling procedure was developed for forestry applications and selects plots for ground measurement so as to maximize the efficiency of imputation estimates. It uses multiple auxiliary variables and multivariate clustering to search for an optimal sample. Further details are given in Melville G. & Stone C. (2016) <doi:10.1080/00049158.2016.1218265>.

SQL Server R Database Interface (DBI) and ‘dplyr’ SQL Backend (RSQLServer)
Utilises the ‘jTDS’ project’s ‘JDBC’ 3.0 ‘SQL Server’ driver to extend ‘DBI’ classes and methods. The package also implements a ‘SQL’ backend to the ‘dplyr’ package.

A Way of Writing Functions that Quote their Arguments (quotedargs)
A facility for writing functions that quote their arguments, may sometimes evaluate them in the environment where they were quoted, and may pass them as quoted to other functions.

Easily Build and Evaluate Machine Learning Models (easyml)
Easily build and evaluate machine learning models on a dataset. Machine learning models supported include penalized linear models, penalized linear models with interactions, random forest, support vector machines, neural networks, and deep neural networks.

Sequential Invariant Causal Prediction (seqICP)
Contains an implementation of invariant causal prediction for sequential data. The main function in the package is ‘seqICP’, which performs linear sequential invariant causal prediction and has guaranteed type I error control. For non-linear dependencies the package also contains a non-linear method ‘seqICPnl’, which accepts any regression procedure as input and performs tests based on a permutation approach that is only approximately correct. To test whether an individual set S is invariant, the package contains the subroutines ‘seqICP.s’ and ‘seqICPnl.s’ corresponding to the respective main methods.

What’s new on arXiv

Exterior Distance Function

We introduce and study the exterior distance function (EDF) and the corresponding exterior point method (EPM) for convex optimization. The EDF is a classical Lagrangian for an equivalent problem obtained from the initial one by monotone transformation of both the objective function and the constraints. The constraints transformation is scaled by a positive scaling parameter. Thus, the EDF is a particular realization of the Nonlinear Rescaling (NR) principle. Along with the ‘center’, the EDF has two extra tools: the barrier (scaling) parameter and the vector of Lagrange multipliers. We show that EPM generates a primal-dual sequence, which converges to the primal-dual solution in value under minimal assumptions on the input data. Moreover, the convergence takes place under any fixed interior point as a ‘center’ and any fixed positive scaling parameter, just due to the Lagrange multipliers update. If the second order sufficient optimality condition is satisfied, then the EPM converges with Q-linear rate under any fixed interior point as a ‘center’ and any fixed, but large enough, positive scaling parameter.

Simulation optimization: A review of algorithms and applications

Simulation Optimization (SO) refers to the optimization of an objective function subject to constraints, both of which can be evaluated through a stochastic simulation. To address specific features of a particular simulation—discrete or continuous decisions, expensive or cheap simulations, single or multiple outputs, homogeneous or heterogeneous noise—various algorithms have been proposed in the literature. As one can imagine, there exist several competing algorithms for each of these classes of problems. This document emphasizes the difficulties in simulation optimization as compared to mathematical programming, makes reference to state-of-the-art algorithms in the field, examines and contrasts the different approaches used, reviews some of the diverse applications that have been tackled by these methods, and speculates on future directions in the field.

Domain reduction techniques for global NLP and MINLP optimization

Optimization solvers routinely utilize presolve techniques, including model simplification, reformulation and domain reduction techniques. Domain reduction techniques are especially important in speeding up convergence to the global optimum for challenging nonconvex nonlinear programming (NLP) and mixed-integer nonlinear programming (MINLP) optimization problems. In this work, we survey the various techniques used for domain reduction of NLP and MINLP optimization problems. We also present a computational analysis of the impact of these techniques on the performance of various widely available global solvers on a collection of 1740 test problems.

Hierarchical Model for Long-term Video Prediction

Video prediction has been an active topic of research in the past few years. Many algorithms focus on pixel-level predictions, which generate results that blur and disintegrate within a few frames. In this project, we use a hierarchical approach for long-term video prediction. We aim at estimating high-level structure in the input frame first, then predicting how that structure grows in the future. Finally, we use an image analogy network to recover a realistic image from the predicted structure. Our method is largely adapted from the work by Villegas et al. The method is built with a combination of LSTMs and analogy-based convolutional auto-encoder networks. Additionally, in order to generate more realistic frame predictions, we also adopt an adversarial loss. We evaluate our method on the Penn Action dataset, and demonstrate good results on high-level long-term structure prediction.

DE-PACRR: Exploring Layers Inside the PACRR Model

Recent neural IR models have demonstrated deep learning’s utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model’s components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable.

Topometric Localization with Deep Learning

Compared to LiDAR-based localization methods, which provide high accuracy but rely on expensive sensors, visual localization approaches only require a camera and thus are more cost-effective, while their accuracy and reliability are typically inferior to LiDAR-based methods. In this work, we propose a vision-based localization approach that learns from LiDAR-based localization methods by using their output as training data, thus combining a cheap, passive sensor with an accuracy that is on par with LiDAR-based localization. The approach consists of two deep networks trained on visual odometry and topological localization, respectively, and a successive optimization to combine the predictions of these two networks. We evaluate the approach on a new challenging pedestrian-based dataset captured over the course of six months in varying weather conditions with a high degree of noise. The experiments demonstrate that the localization errors are up to 10 times smaller than with traditional vision-based localization methods.

When Neurons Fail

We view a neural network as a distributed system in which neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase. We give tight bounds on the number of neurons that can fail without harming the result of a computation. To determine our bounds, we leverage the fact that neural activation functions are Lipschitz-continuous. Our bound is on a quantity we call the Forward Error Propagation, which captures how much error is propagated by a neural network when a given number of components fail. Computing this quantity only requires looking at the topology of the network, whereas experimentally assessing the robustness of a network requires the costly experiment of looking at all possible inputs and testing all possible configurations of the network corresponding to different failure situations, facing a discouraging combinatorial explosion. We distinguish the case of neurons that can fail and stop their activity (crashed neurons) from the case of neurons that can fail by transmitting arbitrary values (Byzantine neurons). Interestingly, as we show in the paper, our bound can easily be extended to the case where synapses can fail. We show how our bound can be leveraged to quantify the effect of memory cost reduction on the accuracy of a neural network and to estimate the amount of information any neuron needs from its preceding layer, thereby enabling a boosting scheme that prevents neurons from waiting for unnecessary signals. We finally discuss the trade-off between neural network robustness and learning cost.

Fast Algorithms for Learning Latent Variables in Graphical Models

We study the problem of learning latent variables in Gaussian graphical models. Existing methods for this problem assume that the precision matrix of the observed variables is the superposition of a sparse and a low-rank component. In this paper, we focus on the estimation of the low-rank component, which encodes the effect of marginalization over the latent variables. We introduce fast, proper learning algorithms for this problem. In contrast with existing approaches, our algorithms are manifestly non-convex. We support their efficacy via a rigorous theoretical analysis, and show that our algorithms match the best possible in terms of sample complexity, while achieving computational speed-ups over existing methods. We complement our theory with several numerical experiments.

Exploring Generalization in Deep Learning

With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures explain different observed phenomena.

Orthogonal Symmetric Chain Decompositions of Hypercubes
Symmetric Chain Decompositions of Products of Posets with Long Chains
Semidefinite Programming and Nash Equilibria in Bimatrix Games
New insights into non-central beta distributions
Group Synchronization on Grids
Illuminating Pedestrians via Simultaneous Detection & Segmentation
Coverage Probability Fails to Ensure Reliable Inference
MolecuLeNet: A continuous-filter convolutional neural network for modeling quantum interactions
Empirical priors and posterior concentration rates for a monotone density
Neural Question Answering at BioASQ 5B
Parareal Algorithm Implementation and Simulation in Julia
Detecting Small Signs from Large Images
Using Frame Theoretic Convolutional Gridding for Robust Synthetic Aperture Sonar Imaging
Invariant Causal Prediction for Nonlinear Models
Learning Local Feature Aggregation Functions with Backpropagation
Treewidth Bounds for Planar Graphs Using Three-Sided Brambles
Corrigendum for ‘Second-order reflected backward stochastic differential equations’ and ‘Second-order BSDEs with general reflection and game options under uncertainty’
Robust Sonar ATR Through Bayesian Pose Corrected Sparse Classification
Developing Bug-Free Machine Learning Systems With Formal Mathematics
Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study
The Minor Fall, the Major Lift: Inferring Emotional Valence of Musical Chords through Lyrics
Relating Complexity-theoretic Parameters with SAT Solver Performance
Do Deep Neural Networks Suffer from Crowding?
SUNNY-CP and the MiniZinc Challenge
Self-Sustaining Caching Stations: Towards Cost-Effective 5G-Enabled Vehicular Networks
Dense Non-rigid Structure-from-Motion Made Easy – A Spatial-Temporal Smoothness based Solution
Refined Cyclic Sieving on Words for the Major Index Statistic
Preservation of quantum Fisher information and geometric phase of a single qubit system in a dissipative reservoir through the addition of qubits
An Isomorphism between Lyapunov Exponents and Shannon’s Channel Capacity
A combinatorial proof of the smoothness of catalecticant schemes associated to complete intersections
Laplace deconvolution in the presence of indirect long-memory data
A Unified approach for Conventional Zero-shot, Generalized Zero-shot and Few-shot Learning
Fast and accurate classification of echocardiograms using deep learning
To slow, or not to slow? New science in sub-second networks
Fast and robust tensor decomposition with applications to dictionary learning
Proceedings of the First International Workshop on Deep Learning and Music
A stable Langevin model with diffusive-reflective boundary conditions
Memory-augmented Chinese-Uyghur Neural Machine Translation
Material Recognition CNNs and Hierarchical Planning for Biped Robot Locomotion on Slippery Terrain
Deviation inequalities for convex functions motivated by the Talagrand conjecture
Large-scale Datasets: Faces with Partial Occlusions and Pose Variations in the Wild
Sensitivity analysis for network aggregative games
Mixing time of an unaligned Metropolis algorithm on the square
Controlled Tactile Exploration and Haptic Object Recognition
Two-Stage Hybrid Day-Ahead Solar Forecasting
PasMoQAP: A Parallel Asynchronous Memetic Algorithm for solving the Multi-Objective Quadratic Assignment Problem
Beyond Moore-Penrose Part II: The Sparse Pseudoinverse
PSK Precoding in Multi-User MISO Systems
Minimum BER Precoding in 1-Bit Massive MIMO Systems
Power- and Spectral Efficient Communication System Design Using 1-Bit Quantization
Independent motion detection with event-driven cameras
MMSE precoder for massive MIMO using 1-bit quantization
DFE/THP duality for FBMC with highly frequency selective channels
Spatial Coding Based on Minimum BER in 1-Bit Massive MIMO Systems
Typical Approximation Performance for Maximum Coverage Problem
Spectral shaping with low resolution signals
On efficiently solving the subproblems of a level-set method for fused lasso problems
Fountain Codes under Maximum Likelihood Decoding
Beamforming and Scheduling for mmWave Downlink Sparse Virtual Channels With Non-Orthogonal and Orthogonal Multiple Access
Hamilton-Jacobi equations for optimal control on networks with entry or exit costs
Extrinsic Gaussian processes for regression and classification on manifolds
Centralized and Distributed Sparsification for Low-Complexity Message Passing Algorithm in C-RAN Architectures
Computing denumerants in numerical 3-semigroups
Dynamics of a planar Coulomb gas
Equilibrium large deviations for mean-field systems with translation invariance
The Complexity of Counting Surjective Homomorphisms and Compactions
A decentralized approach to multi-agent MILPs: finite-time feasibility and performance guarantees
Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis
Landscape of Configurational Density of States for Discrete Large Systems
NOMA based Random Access with Multichannel ALOHA
On the R-superlinear convergence of the KKT residues generated by the augmented Lagrangian method for convex composite conic programming
Approximate Reflection Symmetry in a Point Set: Theory and Algorithm with an Application
NOMA: Principles and Recent Results
Recurrent Residual Learning for Action Recognition
A universal law for Voronoi cell volumes in infinitely large maps
Forecasting and Granger Modelling with Non-linear Dynamical Dependencies
Gabor frames and deep scattering networks in audio processing
Evolution of quantum entanglement with disorder in fractional quantum Hall liquids
archivist: An R Package for Managing, Recording and Restoring Data Analysis Results
The Second Leaper Theorem
A special case of completion invariance for the $c_2$ invariant of a graph
Large deviations for stochastic models of two-dimensional second grade fluids driven by Lévy noise
Hypergraphs with vanishing Turán density in uniformly dense hypergraphs
Rate-Distortion Classification for Self-Tuning IoT Networks
Unsupervised Feature Selection Based on Space Filling Concept
Critical properties of disordered XY model on sparse random graphs
Constant composition codes derived from linear codes
Detecting in-plane tension induced crystal plasticity transition with nanoindentation
Determinants of Random Block Hankel Matrices
Invariant components of synergy, redundancy, and unique information among three variables
Cross-Country Skiing Gears Classification using Deep Learning
Subspace Clustering with the Multivariate-t Distribution
Classical Music Clustering Based on Acoustic Features
Reexamining Low Rank Matrix Factorization for Trace Norm Regularization
The multipartite Ramsey number for the 3-path of length three
Robust and Efficient Parametric Spectral Estimation in Atomic Force Microscopy
Training a Fully Convolutional Neural Network to Route Integrated Circuits
Combinatorial approach to detection of fixed points, periodic orbits, and symbolic dynamics
Graphs that contain multiply transitive matchings

Document worth reading: “The ALAMO approach to machine learning”

ALAMO is a computational methodology for learning algebraic functions from data. Given a data set, the approach begins by building a low-complexity, linear model composed of explicit non-linear transformations of the independent variables. Linear combinations of these non-linear transformations allow a linear model to better approximate complex behavior observed in real processes. The model is refined as additional data are obtained adaptively through error maximization sampling using derivative-free optimization. Models built using ALAMO can enforce constraints on the response variables to incorporate first-principles knowledge. The ability of ALAMO to generate simple and accurate models for a number of reaction problems is demonstrated. The error maximization sampling is compared with Latin hypercube designs to demonstrate its sampling efficiency. ALAMO’s constrained regression methodology is used to further refine concentration models, resulting in models that perform better on validation data and satisfy upper and lower bounds placed on model outputs.

Book Memo: “A Primer on Process Mining”

Practical Skills with Python and Graphviz
The main goal of this book is to explain the core ideas of process mining, and to demonstrate how they can be implemented using just some basic tools that are available to any computer scientist or data scientist. It describes how to analyze event logs in order to discover the behavior of real-world business processes. The end result can often be visualized as a graph, and the book explains how to use Python and Graphviz to render these graphs intuitively. Overall, it enables the reader to implement process mining techniques on his or her own, independently of any specific process mining tool. An introduction to two popular process mining tools, namely Disco and ProM, is also provided. The book will be especially valuable for self-study or as a precursor to a more advanced text. Practitioners and students will be able to follow along on their own, even if they have no prior knowledge of the topic. After reading this book, they will be able to more confidently proceed to the research literature if needed.

Book Memo: “Guide to Convolutional Neural Networks”

A Practical Application to Traffic-Sign Detection and Classification
This must-read text/reference introduces the fundamental concepts of convolutional neural networks (ConvNets), offering practical guidance on using libraries to implement ConvNets in applications of traffic sign detection and classification. The work presents techniques for optimizing the computational efficiency of ConvNets, as well as visualization techniques to better understand the underlying processes. The proposed models are also thoroughly evaluated from different perspectives, using exploratory and quantitative analysis.
Topics and features: explains the fundamental concepts behind training linear classifiers and feature learning; discusses the wide range of loss functions for training binary and multi-class classifiers; illustrates how to derive ConvNets from fully connected neural networks, and reviews different techniques for evaluating neural networks; presents a practical library for implementing ConvNets, explaining how to use a Python interface for the library to create and assess neural networks; describes two real-world examples of the detection and classification of traffic signs using deep learning methods; examines a range of varied techniques for visualizing neural networks, using a Python interface; provides self-study exercises at the end of each chapter, in addition to a helpful glossary, with relevant Python scripts supplied at an associated website.
This self-contained guide will benefit those who seek to both understand the theory behind deep learning, and to gain hands-on experience in implementing ConvNets in practice. As no prior background knowledge in the field is required to follow the material, the book is ideal for all students of computer vision and machine learning, and will also be of great interest to practitioners working on autonomous cars and advanced driver assistance systems.

R Packages worth a look

RStudio Addin for Searching Packages in CRAN Database Based on Keywords (CRANsearcher)
One of the strengths of R is its vast package ecosystem. Indeed, R packages extend from visualization to Bayesian inference and from spatial analyses to pharmacokinetics (<https://…/> ). There is probably not an area of quantitative research that isn’t represented by at least one R package. At the time of this writing, there are more than 10,000 active CRAN packages. Because of this massive ecosystem, it is important to have tools to search and learn about packages related to your personal R needs. For this reason, we developed an RStudio addin capable of searching available CRAN packages directly within RStudio.

Parses Web Pages using Postlight Mercury (postlightmercury)
This is a wrapper for the Mercury Parser API. The Mercury Parser is a single API endpoint that takes a URL and gives you back the content reliably and easily. With just one API request, Mercury takes any web article and returns only the relevant content — headline, author, body text, relevant images and more — free from any clutter. It’s reliable, easy-to-use and free. See the webpage here: <https://…/>.

Analysis of Time-Ordered Event Data with Missed Observations (intRvals)
Calculates event rates and compares means and variances of groups of interval data corrected for missed arrival observations.

Utilities for Delaying Function Execution (later)
Executes arbitrary R or C functions some time after the current time, after the R execution stack has emptied.

Simplification of Surface Triangular Meshes with Associated Distributed Data (meshsimp)
Iterative simplification strategy for surface triangular meshes (2.5D meshes) with associated data. Each iteration corresponds to an edge collapse where the selection of the edge to contract is driven by a cost functional that depends both on the geometry of the mesh and on the distribution of the data locations over the mesh. The library can handle both zero and higher genus surfaces. The package has been designed to be fully compatible with the R package ‘fdaPDE’, which implements regression models with partial differential regularizations, making use of the Finite Element Method. In the future, the functionalities provided by the current package may be directly integrated into ‘fdaPDE’.

What’s new on arXiv

Effective optimization using sample persistence: A case study on quantum annealers and various Monte Carlo optimization methods

We present and apply a general-purpose, multi-start algorithm for improving the performance of low-energy samplers used for solving optimization problems. The algorithm iteratively fixes the value of a large portion of the variables to values that have a high probability of being optimal. The resulting problems are smaller and less connected, and samplers tend to give better low-energy samples for these problems. The algorithm is trivially parallelizable, since each start in the multi-start algorithm is independent, and could be applied to any heuristic solver that can be run multiple times to give a sample. We present results for several classes of hard problems solved using simulated annealing, path-integral quantum Monte Carlo, parallel tempering with isoenergetic cluster moves, and a quantum annealer, and show that the success metrics as well as the scaling are improved substantially. When combined with this algorithm, the quantum annealer’s scaling was substantially improved for native Chimera graph problems. In addition, with this algorithm the scaling of the time to solution of the quantum annealer is comparable to the Hamze–de Freitas–Selby algorithm on the weak-strong cluster problems introduced by Boixo et al. Parallel tempering with isoenergetic cluster moves was able to consistently solve 3D spin glass problems with 8000 variables when combined with our method, whereas without our method it could not solve any.
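
One way to picture the variable-fixing step is the sketch below. It is an assumption about how "persistence" could be measured, not the authors' procedure: a variable is fixed when its most common value appears in at least a given fraction of the samples. The function name `persistent_variables` and the `threshold` parameter are hypothetical.

```python
from collections import Counter


def persistent_variables(samples, threshold=1.0):
    """Return {index: value} for variables whose most common value occurs
    in at least `threshold` (a fraction) of the samples.

    Assumption: `samples` is a list of equal-length sequences, for example
    spin configurations with entries +1 / -1 returned by a sampler.
    """
    fixed = {}
    for i in range(len(samples[0])):
        value, count = Counter(s[i] for s in samples).most_common(1)[0]
        if count / len(samples) >= threshold:
            fixed[i] = value
    return fixed


# Three hypothetical low-energy samples over four variables.
samples = [(1, -1, 1, 1), (1, -1, -1, 1), (1, 1, -1, 1)]
unanimous = persistent_variables(samples)
majority = persistent_variables(samples, threshold=0.6)
```

The fixed variables would then be substituted into the problem, leaving a smaller instance for the next round of sampling; that substitution depends on the problem encoding and is left out here.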

Semi-supervised Text Categorization Using Recursive K-means Clustering

In this paper, we present a semi-supervised learning algorithm for classification of text documents. A method of labeling unlabeled text documents is presented. The presented method is based on the divide-and-conquer principle. It uses a recursive K-means algorithm for partitioning the collection of both labeled and unlabeled data. The K-means algorithm is applied recursively on each partition until a desired level of partitioning is achieved, such that each partition contains labeled documents of a single class. Once the desired clusters are obtained, the respective cluster centroids are considered as representatives of the clusters and the nearest neighbor rule is used for classifying an unknown text document. A series of experiments has been conducted to bring out the superiority of the proposed model over other recent state-of-the-art models on the 20Newsgroups dataset.
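
The recursive partitioning and nearest-centroid rule described above can be sketched as follows. This is a toy illustration under several assumptions that are not in the abstract: documents are already numeric feature vectors, K is fixed at 2, K-means is seeded deterministically with the first distinct points, and a partition that cannot be split falls back to its majority label. All function names are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(p[i] for p in points) / len(points)
                 for i in range(len(points[0])))


def kmeans(points, k, iters=20):
    """Plain K-means; returns a list of index lists, one per cluster."""
    centroids = []
    for p in points:  # deterministic seeding: first k distinct points
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for idx, p in enumerate(points):
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(idx)
        centroids = [mean([points[i] for i in c]) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return clusters


def recursive_kmeans(points, labels, k=2):
    """Split until each partition holds labeled points of a single class.

    labels[i] is a class name, or None for an unlabeled point.
    Returns a list of (centroid, label) pairs.
    """
    classes = sorted({l for l in labels if l is not None})
    if len(classes) <= 1:
        return [(mean(points), classes[0] if classes else None)]
    clusters = [c for c in kmeans(points, k) if c]
    if len(clusters) < 2:  # cannot split further: use the majority label
        known = [l for l in labels if l is not None]
        return [(mean(points), max(classes, key=known.count))]
    model = []
    for c in clusters:
        model += recursive_kmeans([points[i] for i in c],
                                  [labels[i] for i in c], k)
    return model


def classify(point, model):
    """Nearest-centroid rule over the centroids that carry a label."""
    labeled = [(c, l) for c, l in model if l is not None]
    return min(labeled, key=lambda cl: math.dist(point, cl[0]))[1]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", None, "a", "b", None, "b"]
model = recursive_kmeans(points, labels)
```

Here the unlabeled points take part in forming the clusters but not in naming them, which is how unlabeled data can help shape the centroids used for classification.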

Auto-Encoding User Ratings via Knowledge Graphs in Recommendation Scenarios

In the last decade, driven also by the availability of unprecedented computational power and storage capabilities in cloud environments, we have witnessed the proliferation of new algorithms, methods, and approaches in two areas of artificial intelligence: knowledge representation and machine learning. On the one side, the generation of a high rate of structured data on the Web led to the creation and publication of the so-called knowledge graphs. On the other side, deep learning emerged as one of the most promising approaches in the generation and training of models that can be applied to a wide variety of application fields. More recently, autoencoders have proven their strength in various scenarios, playing a fundamental role in unsupervised learning. In this paper, we investigate how to exploit the semantic information encoded in a knowledge graph to build connections between units in a Neural Network, thus leading to a new method, SEM-AUTO, to extract and weigh semantic features that can eventually be used to build a recommender system. As adding content-based side information may mitigate the cold user problem, we tested how our approach behaves in the presence of a few ratings from a user on the Movielens 1M dataset and compare results with BPRSLIM.

Irregular Convolutional Neural Networks

Convolutional kernels are basic and vital components of deep Convolutional Neural Networks (CNN). In this paper, we equip convolutional kernels with shape attributes to generate deep Irregular Convolutional Neural Networks (ICNN). Compared to traditional CNNs that apply regular convolutional kernels such as $3\times3$, our approach trains irregular kernel shapes to better fit the geometric variations of input features. In other words, shapes are learnable parameters in addition to weights. The kernel shapes and weights are learned simultaneously during end-to-end training with the standard back-propagation algorithm. Experiments on semantic segmentation are carried out to validate the effectiveness of our proposed ICNN.

Methods for Interpreting and Understanding Deep Neural Networks

This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. It introduces some recently proposed interpretation techniques, along with theory, tricks and recommendations, to make the most efficient use of these techniques on real data. It also discusses a number of practical applications.

A Deep Neural Architecture for Sentence-level Sentiment Classification in Twitter Social Networking

This paper introduces a novel deep learning framework including a lexicon-based approach for sentence-level prediction of sentiment label distribution. We propose to first apply semantic rules and then use a Deep Convolutional Neural Network (DeepCNN) for character-level embeddings in order to increase information for word-level embedding. After that, a Bidirectional Long Short-Term Memory Network (Bi-LSTM) produces a sentence-wide feature representation from the word-level embedding. We evaluate our approach on three Twitter sentiment classification datasets. Experimental results show that our model can improve the classification accuracy of sentence-level sentiment analysis in Twitter social networking.

Invariant Causal Prediction for Sequential Data

We investigate the problem of inferring the causal variables of a response $Y$ from a set of $d$ predictors $(X^1,\dots,X^d)$. Classical ordinary least squares regression includes all predictors that reduce the variance of $Y$. Using only the causal parents instead leads to models that have the advantage of remaining invariant under interventions, i.e., loosely speaking they lead to invariance across different ‘environments’ or ‘heterogeneity patterns’. More precisely, the conditional distribution of $Y$ given its causal variables remains constant for all observations. Recent work exploits such stability to infer causal relations from data with different but known environments. We show here that even without knowledge of the environments or heterogeneity pattern, inferring causal relations is possible for time-ordered (or any other type of sequentially ordered) data. In particular, this makes it possible to detect instantaneous causal relations in multivariate linear time series, in contrast to the concept of Granger causality. Besides novel methodology, we provide statistical confidence bounds and asymptotic detection results for inferring causal variables, and we present an application to monetary policy in macroeconomics.

A Contemporary Overview of Probabilistic Latent Variable Models

In this paper we provide a conceptual overview of latent variable models within a probabilistic modeling framework, an overview that emphasizes the compositional nature and the interconnectedness of the seemingly disparate models commonly encountered in statistical practice.

There and Back Again: A General Approach to Learning Sparse Models

We propose a simple and efficient approach to learning sparse models. Our approach consists of (1) projecting the data into a lower dimensional space, (2) learning a dense model in the lower dimensional space, and then (3) recovering the sparse model in the original space via compressive sensing. We apply this approach to Non-negative Matrix Factorization (NMF), tensor decomposition and linear classification, showing that it obtains $10\times$ compression with negligible loss in accuracy on real data, and obtains up to $5\times$ speedups. Our main theoretical contribution is to show the following result for NMF: if the original factors are sparse, then their projections are the sparsest solutions to the projected NMF problem. This explains why our method works for NMF and shows an interesting new property of random projections: they can preserve the solutions of non-convex optimization problems such as NMF.

Automated text summarisation and evidence-based medicine: A survey of two domains

The practice of evidence-based medicine (EBM) urges medical practitioners to utilise the latest research evidence when making clinical decisions. Because of the massive and growing volume of published research on various medical topics, practitioners often find themselves overloaded with information. As such, natural language processing research has recently begun to explore medical domain-specific automated text summarisation (ATS) techniques, targeted at the task of condensing large medical texts. However, the development of effective summarisation techniques for this task requires cross-domain knowledge. We present a survey of EBM, the domain-specific needs for EBM, automated summarisation techniques, and how they have been applied hitherto. We envision that this survey will serve as a first resource for the development of future operational text summarisation techniques for EBM.

Automatic Synonym Discovery with Knowledge Bases

Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input to find other names that are synonymous, ignoring the fact that a name string can often refer to multiple entities (e.g., ‘apple’ could refer to both Apple Inc and the fruit apple). Moreover, most existing methods require training data manually created by domain experts to construct supervised-learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually curated synonyms for each entity stored in a knowledge base not only form a set of name strings that disambiguate one another's meaning, but can also serve as ‘distant’ supervision to help determine important features for the task. We propose a novel framework, called DPE, to integrate two kinds of mutually complementing signals for synonym discovery, i.e., distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other in the training stage. At the inference stage, both signals are used to discover synonyms for the given entities. Experimental results demonstrate the effectiveness of the proposed framework.

Do GANs actually learn the distribution? An empirical study

Do GANs (Generative Adversarial Nets) actually learn the target distribution? The foundational paper of Goodfellow et al. (2014) suggested they do, given sufficiently large deep nets, sample size, and computation time. A recent theoretical analysis by Arora et al. (to appear at ICML 2017) raised doubts about whether the same holds when the discriminator has finite size. It showed that the training objective can approach its optimum value even if the generated distribution has very low support; in other words, the training objective is unable to prevent mode collapse. The current note reports experiments suggesting that such problems are not merely theoretical. It presents empirical evidence that well-known GAN approaches do learn distributions of fairly low support, and thus presumably are not learning the target distribution. The main technical contribution is a newly proposed test, based upon the famous birthday paradox, for estimating the support size of the generated distribution.
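
The birthday-paradox idea behind such a test can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's procedure: samples are discrete values compared by exact equality (a real image generator would need some similarity check instead), and the function names are hypothetical. It relies only on the standard fact that a uniform distribution over N items starts producing a repeat in a batch with probability about 1/2 once the batch size is on the order of sqrt(N), so a batch size s with frequent collisions suggests a support of roughly s^2.

```python
import random

def collision_probability(support_size, batch_size):
    """Exact probability that a uniform batch contains at least one repeat."""
    p_distinct = 1.0
    for i in range(batch_size):
        p_distinct *= max(0.0, 1.0 - i / support_size)
    return 1.0 - p_distinct

def empirical_collision_rate(sample, batch_size, trials):
    """Fraction of batches drawn with sample() that contain a duplicate."""
    hits = 0
    for _ in range(trials):
        batch = [sample() for _ in range(batch_size)]
        if len(set(batch)) < batch_size:
            hits += 1
    return hits / trials

def estimate_support(sample, trials=500, max_batch=1000):
    """Smallest batch size s whose collision rate reaches 1/2, plus the
    order-of-magnitude support estimate s**2. Returns None if not found."""
    for s in range(2, max_batch + 1):
        if empirical_collision_rate(sample, s, trials) >= 0.5:
            return s, s * s
    return None

# Stand-in "generator": uniform over 365 distinct outputs (hypothetical).
rng = random.Random(0)
generator = lambda: rng.randrange(365)

print(round(collision_probability(365, 23), 4))   # classic birthday setting
print(empirical_collision_rate(generator, 23, 2000))
print(estimate_support(generator))
```

The estimate is deliberately coarse: it recovers the support size only up to a constant factor, which is enough to tell a generator with a few hundred distinct outputs from one with millions.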

Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets

This paper introduces a framework for speeding up Bayesian inference conducted in the presence of large datasets. We design a Markov chain whose transition kernel uses an unknown fraction of fixed size of the available data that is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation (ABC) literature, the subsampling process is guided by the fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC, is a generic and flexible approach which, contrary to existing scalable methodologies, preserves the simplicity of the Metropolis-Hastings algorithm. Even though exactness is lost, i.e. the chain distribution only approximates the target, we study and quantify this bias theoretically and show on a diverse set of examples that it yields excellent performance when the computational budget is limited. If available and cheap to compute, we show that setting the summary statistics as the maximum likelihood estimator is supported by theoretical arguments.

StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge

Today, massive amounts of streaming data from smart devices need to be analyzed automatically to realize the Internet of Things. The Complex Event Processing (CEP) paradigm promises low-latency pattern detection on event streams. However, CEP systems need to be extended with Machine Learning (ML) capabilities such as online training and inference in order to be able to detect fuzzy patterns (e.g., outliers) and to improve pattern recognition accuracy during runtime using incremental model training. In this paper, we propose a distributed CEP system denoted as StreamLearner for ML-enabled complex event detection. The proposed programming model and data-parallel system architecture enable a wide range of real-world applications and allow for dynamically scaling up and out system resources for low-latency, high-throughput event processing. We show that the DEBS Grand Challenge 2017 case study (i.e., anomaly detection in smart factories) integrates seamlessly into the StreamLearner API. Our experiments verify scalability and high event throughput of StreamLearner.

Uncertainty Decomposition in Bayesian Neural Networks with Latent Variables

Bayesian neural networks (BNNs) with latent variables are probabilistic models which can automatically identify complex stochastic patterns in the data. We describe and study in these models a decomposition of predictive uncertainty into its epistemic and aleatoric components. First, we show how such a decomposition arises naturally in a Bayesian active learning scenario by following an information theoretic approach. Second, we use a similar decomposition to develop a novel risk sensitive objective for safe reinforcement learning (RL). This objective minimizes the effect of model bias in environments whose stochastic dynamics are described by BNNs with latent variables. Our experiments illustrate the usefulness of the resulting decomposition in active learning and safe RL settings.
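
One simple way to see what an epistemic/aleatoric split looks like is the law of total variance, sketched below. This variance-based form is used here as an illustrative stand-in and is an assumption; the abstract itself describes an information-theoretic derivation. The function name `decompose_variance` and the layout of `samples` are hypothetical: each row holds predictions drawn under one sampled set of network weights, with the latent noise varying within the row.

```python
from statistics import fmean, pvariance

def decompose_variance(samples):
    """samples[i][j]: j-th prediction drawn with the i-th weight sample.
    All rows are assumed to have the same length.
    Returns (aleatoric, epistemic, total) variances."""
    per_weight_mean = [fmean(row) for row in samples]
    per_weight_var = [pvariance(row) for row in samples]
    aleatoric = fmean(per_weight_var)        # E_W[ Var(y | W) ]: noise that remains
    epistemic = pvariance(per_weight_mean)   # Var_W( E[y | W] ): disagreement between models
    total = pvariance([y for row in samples for y in row])
    return aleatoric, epistemic, total

# Two weight samples, two noise draws each (toy numbers).
aleatoric, epistemic, total = decompose_variance([[0.0, 2.0], [4.0, 6.0]])
print(aleatoric, epistemic, total)  # 1.0 4.0 5.0
```

With equal-length rows and population variances the two parts add up to the total exactly, which is what makes the split usable as a budget: the epistemic part can shrink with more data, the aleatoric part cannot.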

The Optimal Route and Stops for a Group of Users in a Road Network
Formation Maneuvering Control of Multiple Nonholonomic Robotic Vehicles: Theory and Experimentation
Synchronization in Dynamic Networks
Growing Linear Consensus Networks Endowed by Spectral Systemic Performance Measures
On a conjecture in second-order optimality conditions
Cover Tree Compressed Sensing for Fast MR Fingerprint Recovery
Time series experiments and causal estimands: exact randomization tests and trading
A practical fpt algorithm for Flow Decomposition and transcript assembly
Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks
Control Synthesis for High-Dimensional Systems With Counting Constraints
Precise deviations for Cox processes with shot noise
Full Randomness in the Higher Difference Structure of Two-state Markov Chains
Preserving Intermediate Objectives: One Simple Trick to Improve Learning for Hierarchical Models
Collaborative Deep Learning in Fixed Topology Networks
On Sampling Strategies for Neural Network-based Collaborative Filtering
On the numerical rank of radial basis function kernel matrices in high dimension
Fundamental Matrix Estimation: A Study of Error Criteria
Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling
A Note on a Communication Game
Reservoir Computing on the Hypersphere
High-dimensional Linear Regression for Dependent Observations with Application to Nowcasting
Tree-Residue Vertex-Breaking: a new tool for proving hardness
Deep Mixture of Diverse Experts for Large-Scale Visual Recognition
Joint and Competitive Caching Designs in Large-Scale Multi-Tier Wireless Multicasting Networks
Random-field-induced disordering mechanism in a disordered ferromagnet: Between the Imry-Ma and the standard disordering mechanism
Encoder-Decoder Shift-Reduce Syntactic Parsing
On Validity of Reed Conjecture for {P_5, Flag^C}-free graphs
Multi-agent constrained optimization of a strongly convex function over time-varying directed networks
Large-Scale Human Activity Mapping using Geo-Tagged Videos
Cluster Based Symbolic Representation for Skewed Text Categorization
Online Participatory Sensing in Double Auction Environment with Location Information
The Semantic Information Method for Maximum Mutual Information and Maximum Likelihood of Tests, Estimations, and Mixture Models
Twisted Recurrence via Polynomial Walks
Notes on Random Walks in the Cauchy Domain of Attraction
A Variational EM Method for Pole-Zero Modeling of Speech with Mixed Block Sparse and Gaussian Excitation
Optimal Feedback Selection for Structurally Cyclic Systems with Dedicated Actuators and Sensors
ISTA-Net: Iterative Shrinkage-Thresholding Algorithm Inspired Deep Network for Image Compressive Sensing
Justifications in Constraint Handling Rules for Logical Retraction in Dynamic Algorithms
Thinnable Ideals and Invariance of Cluster Points
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset
Sparsity-Based STAP Design Based on Alternating Direction Method with Gain/Phase Errors
Martingale-coboundary decomposition for stationary random fields
A Regress-Later Algorithm for Backward Stochastic Differential Equations
Behavior of Accelerated Gradient Methods Near Critical Points of Nonconvex Problems
Temporal-related Convolutional-Restricted-Boltzmann-Machine capable of learning relational order via reinforcement learning procedure?
Fog Computing in Medical Internet-of-Things: Architecture, Implementation, and Applications
On integer network synthesis problem with tree-metric cost
FAIR: A Hadoop-based Hybrid Model for Faculty Information Retrieval System
Robust Sparse Covariance Estimation by Thresholding Tyler’s M-Estimator
Online Power Control for Block i.i.d. Energy Harvesting Channels
On generalizations of $p$-sets and their applications
A splitter theorem for 3-connected 2-polymatroids
Intrinsic Ultracontractivity of Non-local Dirichlet forms on Unbounded Open Sets
Decomposing Motion and Content for Natural Video Sequence Prediction
Uncertainty quantification and design for noisy matrix completion – a unified framework
Sparsity Enables Estimation of both Subcortical and Cortical Activity from MEG and EEG
An Algorithm for Supervised Driving of Cooperative Semi-Autonomous Vehicles (Extended)
Minimum Connected Transversals in Graphs: New Hardness Results and Tractable Cases Using the Price of Connectivity
Development of structural correlations and synchronization from adaptive rewiring in networks of Kuramoto oscillators
Simplifying the Kohlberg Criterion on the Nucleolus: A Correct Approach
Target contrastive pessimistic risk for robust domain adaptation
Efficient and accurate monitoring of the depth information in a Wireless Multimedia Sensor Network based surveillance
Finding optimal finite biological sequences over finite alphabets: the OptiFin toolbox
Count-Based Exploration in Feature Space for Reinforcement Learning
Expected volumes of Gaussian polytopes, external angles, and multiple order statistics
One random jump and one permutation: sufficient conditions to chaotic, statistically faultless, and large throughput PRNG for FPGA
Interactive Exploration and Discovery of Scientific Publications with PubVis
Merging real and virtual worlds: An analysis of the state of the art and practical evaluation of Microsoft Hololens
Flexible Rectified Linear Units for Improving Convolutional Neural Networks
Specifying Non-Markovian Rewards in MDPs Using LDL on Finite Traces (Preliminary Version)
Random Forests for Industrial Device Functioning Diagnostics Using Wireless Sensor Networks
Detekcja upadku i wybranych akcji na sekwencjach obrazów cyfrowych (Detection of Falls and Selected Actions in Digital Image Sequences)
Matrix Hilbert Space
Self-Learning Phase Boundaries by Active Contours
Some new results on the self-dual [120,60,24] code
Steiner Point Removal with Distortion $O(\log k)$
Survival probabilities and maxima of sums of correlated increments with applications to one-dimensional cellular automata
Large sets avoiding linear patterns
Scalable multimodal convolutional networks for brain tumour segmentation
ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools
A Security Framework for Wireless Sensor Networks: Theory and Practice
Restricted size Ramsey number for $P_3$ versus cycles
A Unified Analysis of Stochastic Optimization Methods Using Jump System Theory and Quadratic Constraints
Revenue Loss in Shrinking Markets
Value Asymptotics in Dynamic Games on Large Horizons
An algorithm to find maximum area polygons circumscribed about a convex polygon
Photometric Stereo by Hemispherical Metric Embedding
Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context
Perfectly Dominating the Lattice Graph of $\mathbb{Z}^{3}$ with Squares
Phase retrieval using alternating minimization in a batch setting
Image transformations on locally compact spaces
Faster ICA by preconditioning with Hessian approximations
Strong Converses Are Just Edge Removal Properties
Smith and Critical groups of Polar Graphs
A preference elicitation interface for collecting rich recommender datasets
Listing Words in Free Groups
Robust Video-Based Eye Tracking Using Recursive Estimation of Pupil Characteristics
Sparse Output Feedback Synthesis via Proximal Alternating Linearization Method
Dickman approximation in simulation, summations and perpetuities
English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor
A Proof of Vivo-Pato-Oshanin’s Conjecture on the Fluctuation of von Neumann Entropy
Dr.VAE: Drug Response Variational Autoencoder
A sequential surrogate method for reliability analysis based on radial basis function
IS-ASGD: Importance Sampling Accelerated Asynchronous SGD on Multi-Core Systems
End-to-end Learning of Image based Lane-Change Decision
NOMA in 5G Systems: Exciting Possibilities for Enhancing Spectral Efficiency
Phase transition for a non-attractive infection process in heterogeneous environment
An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform
YoTube: Searching Action Proposal via Recurrent and Static Regression Networks
Asymptotic Existence of Fair Divisions for Groups
YouTube-8M Video Understanding Challenge Approach and Applications
Ramanujan-type congruences for certain weighted 7-colored partitions
Multi-level SVM Based CAD Tool for Classifying Structural MRIs
Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription
Survival probabilities of high-dimensional stochastic SIS and SIR models with random edge weights
Testing normality for unconditionally heteroscedastic macroeconomic variables
Interferometric control of the photon-number distribution
Spatial Risk Measure for Max-Stable and Max-Mixture Processes
Few-shot Object Detection
New procedures for discrete tests with proven false discovery rate control
Lebesgue and gaussian measure of unions of basic semi-algebraic sets
Deep Semantics-Aware Photo Adjustment
Efficient Manifold and Subspace Approximations with Spherelets
Adaptive Strategies for The Open-Pit Mine Optimal Scheduling Problem
Top-down Transformation Choice
Multilevel Monte Carlo Method for Statistical Model Checking of Hybrid Systems
Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates
State-by-state Minimax Adaptive Estimation for Nonparametric Hidden Markov Models
Wideband DOA Estimation through Projection Matrix Interpolation
Estimation of species relative abundances and habitat preferences using opportunistic data
On the Komlós, Major and Tusnády strong approximation for some classes of random iterates
Quantum thermostatted disordered systems and sensitivity under compression
A hypothesis testing approach for communication over entanglement assisted compound quantum channel
Data depth and rank-based tests for covariance and spectral density matrices
A Publish/Subscribe System Using Causal Broadcast Over Dynamically Built Spanning Trees
Handling PDDL3.0 State Trajectory Constraints with Temporal Landmarks
Unemployment estimation: Spatial point referenced methods and models
Multi-Label Learning with Label Enhancement
An adaptive prefix-assignment technique for symmetry reduction
The Boolean Solution Problem from the Perspective of Predicate Logic – Extended Version
On tree-decompositions of one-ended graphs
Universal limits of substitution-closed permutation classes
A Meta-Learning Approach to One-Step Active Learning
Semantically Informed Multiview Surface Refinement
On concentration properties of disordered Hamiltonians
Monotonicity of functionals of random polytopes
Location of the spectrum of Kronecker random matrices
High-dimensional classification by sparse logistic regression
Beyond Moore-Penrose Part I: Generalized Inverses that Minimize Matrix Norms
Recurrence and Ergodicity of Switching Diffusions with Past-Dependent Switching Having A Countable State Space
Deep Semantic Classification for 3D LiDAR Data
GPU-acceleration for Large-scale Tree Boosting
Extremes of $L^p$-norm of Vector-valued Gaussian processes with Trend
Dynamic Load Balancing for PIC code using Eulerian/Lagrangian partitioning
Ordered and Delayed Adversaries and How to Work against Them on Shared Channel
Bounds on the length of a game of Cops and Robbers
Metastable Behavior of Bootstrap Percolation on Galton-Watson Trees
On risk averse competitive equilibrium
Counting Restricted Homomorphisms via Möbius Inversion over Matroid Lattices
Nonseparable Multinomial Choice Models in Cross-Section and Panel Data
Ergodic aspects of some Ornstein-Uhlenbeck type processes related to Lévy processes
Approximate Steepest Coordinate Descent
Bounds on the Satisfiability Threshold for Power Law Distributed Random SAT
Image Processing in Floriculture Using a robotic Mobile Platform
On branching-point selection for triple products in spatial branch-and-bound: the hull relaxation
Optimal choice problem and its solutions
Challenges to estimating contagion effects from observational data
Learning to Map Vehicles into Bird’s Eye View
Edge of spiked beta ensembles, stochastic Airy semigroups and reflected Brownian motions
Iterative Random Forests to detect predictive and stable high-order interactions
Preasymptotic Convergence of Randomized Kaczmarz Method
Is the Riemann zeta function in a short interval a 1-RSB spin glass?
Paths in hypergraphs: a rescaling phenomenon
Inverse Ising inference by combining Ornstein-Zernike theory with deep learning
Distributed compression through the lens of algorithmic information theory: a primer
Efficiency of quantum versus classical annealing in non-convex learning problems
Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention
Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability
Deep Network Flow for Multi-Object Tracking
Complexity of the Regularized Newton Method
Non-Orthogonal Multiple Access combined with Random Linear Network Coded Cooperation
Cognitive Subscore Trajectory Prediction in Alzheimer’s Disease
Towards the Evolution of Multi-Layered Neural Networks: A Dynamic Structured Grammatical Evolution Approach
On Signal Reconstruction from FROG Measurements
Spectrally-normalized margin bounds for neural networks
GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium
A Simulator for Hedonic Games
Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog