End-to-End Neural Ad-hoc Ranking with Kernel Pooling

This paper proposes K-NRM, a kernel-based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine’s query log demonstrate the improvements of K-NRM over prior feature-based and neural-based state-of-the-art methods, and explain the source of K-NRM’s advantage: its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches.
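
The pipeline described in this abstract (translation matrix, kernel pooling, ranking layer) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation. The abstract does not spell out the kernel form, so the Gaussian (RBF) kernels, the cosine similarity used for the translation matrix, the log-sum pooling over query words, the tanh scoring layer, and all names (`translation_matrix`, `kernel_pooling`, `mus`, `sigma`, `ranking_score`) are assumptions made here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def translation_matrix(query_vecs, doc_vecs):
    """One row per query word, one column per document word."""
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]

def kernel_pooling(matrix, mus, sigma=0.1):
    """Return one soft-match feature per kernel centre in `mus` (assumed RBF kernels)."""
    features = []
    for mu in mus:
        total = 0.0
        for row in matrix:
            # Soft count of document words whose similarity is close to mu.
            soft_count = sum(
                math.exp(-((m - mu) ** 2) / (2 * sigma ** 2)) for m in row
            )
            # Clamp before the log so an empty kernel does not produce -inf.
            total += math.log(max(soft_count, 1e-10))
        features.append(total)
    return features

def ranking_score(features, weights, bias=0.0):
    """Combine the kernel features into a single score in (-1, 1)."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)) + bias)
```

In a trained model the embeddings, `weights`, and `bias` would be learned jointly from the pairwise ranking loss; here they are plain inputs so the data flow stays visible.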

Observational Learning by Reinforcement Learning

Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. In particular, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by the observations of another agent performing a task in a shared environment. The other agent is only observed through the effect of its actions on the environment and never explicitly modeled. Two key aspects are borrowed from observational learning: i) the observer’s behaviour needs to change as a result of viewing a ‘teacher’ (another agent) and ii) the observer needs to be motivated somehow to engage in making use of the other agent’s behaviour. The latter is naturally modeled by RL, by correlating the learning agent’s reward with the teacher agent’s behaviour.

Index Search Algorithms for Databases and Modern CPUs

Over the years, many different indexing techniques and search algorithms have been proposed, including CSS-trees, CSB+ trees, k-ary binary search, and fast architecture sensitive tree search. There have also been papers on how best to set the many different parameters of these index structures, such as the node size of CSB+ trees. These indices have been proposed because CPU speeds have been increasing at a dramatically higher rate than memory speeds, giving rise to the Von Neumann CPU–Memory bottleneck. To hide the long latencies caused by memory access, it has become very important to make good use of the features of modern CPUs. In order to drive down the average number of CPU clock cycles required to execute CPU instructions, and thus increase throughput, it has become important to achieve good utilization of CPU resources. Some of these are the data and instruction caches, and the translation lookaside buffers. But it has also become important to avoid branch misprediction penalties, and to utilize the vectorization provided by CPUs in the form of SIMD instructions. While the layout of index structures has been heavily optimized for the data cache of modern CPUs, the instruction cache has been neglected so far. In this paper, we present NitroGen, a framework for utilizing code generation for speeding up index traversal in main memory database systems. By bringing together data and code, we make index structures use the dormant resource of the instruction cache. We show how to combine index compilation with previous approaches, such as binary tree search, cache-sensitive tree search, and the architecture-sensitive tree search presented by Kim et al.

Nonlinear probability. A theory with incompatible stochastic variables

In 1991 J.F. Aarnes introduced the concept of quasi-measures in a compact topological space $\Omega$ and established the connection between quasi-states on $C(\Omega)$ and quasi-measures in $\Omega$. This work solved the linearity problem of quasi-states on $C^*$-algebras formulated by R.V. Kadison in 1965. The answer is that a quasi-state need not be linear, so a quasi-state need not be a state. We introduce nonlinear measures in a space $\Omega$ which is a generalization of a measurable space. In this more general setting we are still able to define integration and establish a representation theorem for the corresponding functionals. A probabilistic language is chosen since we feel that the subject should be of some interest to probabilists. In particular we point out that the theory allows for incompatible stochastic variables. The need for incompatible variables is well known in quantum mechanics, but the need seems natural also in other contexts as we try to explain by a questionary example. Keywords and phrases: Epistemic probability, Integration with respect to measures and other set functions, Banach algebras of continuous functions, Set functions and measures on topological spaces, States, Logical foundations of quantum mechanics.

NPGLM: A Non-Parametric Method for Temporal Link Prediction

In this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given its features that have been extracted at the current network snapshot. To this end, we introduce a probabilistic non-parametric approach, called ‘Non-Parametric Generalized Linear Model’ (NP-GLM), which infers the hidden underlying probability distribution of the link advent time given its features. We then present a learning algorithm for NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on both synthetic data and the real-world Sina Weibo social network demonstrate the effectiveness of NP-GLM in solving the temporal link prediction problem vis-a-vis competitive baselines.

GM-Net: Learning Features with More Efficiency

Deep Convolutional Neural Networks (CNNs) are capable of learning unprecedentedly effective features from images. Some researchers have sought to enhance parameter efficiency using grouped convolution. However, the relation between the optimal number of convolutional groups and the recognition performance remains an open problem. In this paper, we propose a series of Basic Units (BUs) and a two-level merging strategy to construct deep CNNs, referred to as a joint Grouped Merging Net (GM-Net), which can produce joint grouped and reused deep features while maintaining the feature discriminability for classification tasks. Our GM-Net architectures with the proposed BU_A (dense connection) and BU_B (straight mapping) lead to a significant reduction in the number of network parameters and obtain performance improvements in image classification tasks. Extensive experiments are conducted to validate the superior performance of GM-Net over state-of-the-art methods on benchmark datasets, e.g., MNIST, CIFAR-10, CIFAR-100 and SVHN.

MEC: Memory-efficient Convolution for Deep Neural Network

Convolution is a critical component in modern deep neural networks, and several algorithms for convolution have therefore been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods have been proposed, including im2col-based convolution, FFT-based convolution, and the Winograd-based algorithm. However, all these indirect methods have high memory overhead, which degrades performance and offers a poor trade-off between performance and memory consumption. In this work, we propose a memory-efficient convolution, or MEC, with compact lowering, which reduces memory overhead substantially and accelerates the convolution process. MEC lowers the input matrix in a simple yet efficient and compact way (i.e., with much less memory overhead), and then executes multiple small matrix multiplications in parallel to complete the convolution. Additionally, the reduced memory footprint improves memory sub-system efficiency, improving performance. Our experimental results show that MEC reduces memory consumption significantly with good speedup on both mobile and server platforms, compared with other indirect convolution algorithms.
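
To make the memory overhead of the indirect methods concrete, the following is a minimal sketch of im2col-based convolution, the baseline named in this abstract. It is not MEC itself, whose compact lowering is not specified here. It assumes a single-channel input, stride 1, no padding, and the cross-correlation convention common in deep learning; the names `im2col` and `conv2d_im2col` are chosen for illustration.

```python
def im2col(image, kh, kw):
    """Lower a 2D image into one row per output position (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    rows = []
    for i in range(out_h):
        for j in range(out_w):
            # Each row is a flattened kh x kw patch, so overlapping pixels are copied.
            rows.append(
                [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            )
    return rows

def conv2d_im2col(image, kernel):
    """Convolve by multiplying the lowered matrix with the flattened kernel."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    lowered = im2col(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    flat = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in lowered]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]
```

For a 3x3 input and a 2x2 kernel, the lowered matrix holds 4 x 4 = 16 values against 9 in the input. That duplication of overlapping patches is the overhead a more compact lowering aims to reduce.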

Multi-scale streaming anomalies detection for time series

In the class of streaming anomaly detection algorithms for univariate time series, the size of the sliding window over which various statistics are calculated is an important parameter. To address the anomalous variation in the scale of the pseudo-periodicity of time series, we define a streaming multi-scale anomaly score with a streaming PCA over a multi-scale lag-matrix. We define three methods of aggregation of the multi-scale anomaly scores. We evaluate their performance on the Yahoo! and Numenta benchmark datasets for unsupervised anomaly detection. To the best of the authors’ knowledge, this is the first time multi-scale streaming anomaly detection has been proposed and systematically studied.

Concept Drift and Anomaly Detection in Graph Streams

Graph representations offer powerful and intuitive ways to describe data in a multitude of application domains. Here, we consider stochastic processes generating graphs and propose a methodology for detecting changes in stationarity of such processes. The methodology is general and considers a process generating attributed graphs with a variable number of vertices/edges, without the need to assume one-to-one correspondence between vertices at different time steps. The methodology acts by embedding every graph of the stream into a vector domain, where a conventional multivariate change detection procedure can be easily applied. We ground the soundness of our proposal by proving several theoretical results. In addition, we provide a specific implementation of the methodology and evaluate its effectiveness on several detection problems involving attributed graphs representing biological molecules and drawings. Experimental results are contrasted with respect to suitable baseline methods, demonstrating the competitiveness of our approach.

Ensemble Framework for Real-time Decision Making

This paper introduces a new framework for real-time decision making in video games. An Ensemble agent is a compound agent composed of multiple agents, each with its own tasks or goals to achieve. Usually when dealing with real-time decision making, reactive agents are used; that is, agents that return a decision based on the current state. While reactive agents are very fast, most games require more than just a rule-based agent to achieve good results. Deliberative agents (agents that use a forward model to search future states) are very useful in games with no hard time limit, such as Go or Backgammon, but generally take too long for real-time games. The Ensemble framework addresses this issue by allowing the agent to be both deliberative and reactive at the same time. This is achieved by breaking up the game-play into logical roles and having highly focused components for each role, with each component disregarding anything outwith its own role. Reactive agents can be used where a reactive agent is suited to the role, and where a deliberative approach is required, branching is kept to a minimum by the removal of all extraneous factors, enabling an informed decision to be made within a much smaller time-frame. An Arbiter is used to combine the component results, allowing high performing agents to be created from simple, efficient components.

Adaptive Huber Regression: Optimality and Phase Transition

Big data are often contaminated by outliers and heavy-tailed errors, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our framework is able to handle heavy-tailed data with bounded $(1+\delta)$-th moment for any $\delta > 0$. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when $\delta \geq 1$, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime $0 < \delta < 1$, and the transition is smooth and optimal. Moreover, a nonasymptotic Bahadur representation for finite-sample inference is derived when the variance is finite. Numerical experiments lend further support to our obtained theory.
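
The role of the robustification parameter can be illustrated with the standard Huber loss in the simplest setting, estimating a location (mean) rather than full regression coefficients. This is a minimal sketch under that simplification, not the paper's estimator. The names `huber_loss`, `huber_score`, `huber_location`, and `tau` are chosen for illustration, and the rule for adapting `tau` to sample size, dimension and moments is not given in the abstract, so it is left to the caller.

```python
def huber_loss(r, tau):
    """Quadratic for |r| <= tau, linear beyond it."""
    a = abs(r)
    return 0.5 * r * r if a <= tau else tau * a - 0.5 * tau * tau

def huber_score(r, tau):
    """Derivative of the Huber loss: the residual clipped to [-tau, tau]."""
    return max(-tau, min(tau, r))

def huber_location(xs, tau, iters=500):
    """Minimize the average Huber loss over a location m by gradient descent.

    The average loss has a 1-Lipschitz gradient, so a unit step size is safe.
    """
    m = sum(xs) / len(xs)  # start from the sample mean
    for _ in range(iters):
        m += sum(huber_score(x - m, tau) for x in xs) / len(xs)
    return m
```

With data `[0, 0, 0, 0, 100]`, a small `tau` pulls the estimate toward the bulk of the data, while a very large `tau` never clips and returns the sample mean. This is the bias versus robustness tradeoff that the choice of `tau` controls.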

A latent variable model for survival time prediction with censoring and diverse covariates

Fulfilling the promise of precision medicine requires accurately and precisely classifying disease states. For cancer, this includes prediction of survival time from a surfeit of covariates. Such data presents an opportunity for improved prediction, but also a challenge due to high dimensionality. Furthermore, disease populations can be heterogeneous. Integrative modeling is sensible, as the underlying hypothesis is that joint analysis of multiple covariates provides greater explanatory power than separate analyses. We propose an integrative latent variable model that combines factor analysis for various data types and an exponential Cox proportional hazards model for continuous survival time with informative censoring. The factor and Cox models are connected through low-dimensional latent variables that can be interpreted and visualized to identify subpopulations. We use this model to predict survival time. We demonstrate this model’s utility in simulation and on four Cancer Genome Atlas datasets: diffuse lower-grade glioma, glioblastoma multiforme, lung adenocarcinoma, and lung squamous cell carcinoma. These datasets have small sample sizes, high-dimensional diverse covariates, and high censorship rates. We compare the predictions from our model to two alternative models. Our model outperforms in simulation and is competitive on real datasets. Furthermore, the low-dimensional visualization for diffuse lower-grade glioma displays known subpopulations.

Tunneling probe of fluctuating superconductivity in disordered thin films
Individualized Treatment Effects with Censored Data via Fully Nonparametric Bayesian Accelerated Failure Time Models
Most Ligand-Based Benchmarks Measure Overfitting Rather than Accuracy
Analog CMOS-based Resistive Processing Unit for Deep Neural Network Training
$q$-Stirling numbers revisited
On the directed Oberwolfach Problem with equal cycle lengths: the odd case
Models for Configuration Space in a Simplicial Complex
Universality and correlations in individuals wandering through an online extremist space
Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
On the joint asymptotic distribution of the restricted estimators in multivariate regression model
Word-Entity Duet Representations for Document Ranking
On convergence of the sample correlation matrices in high-dimensional data
Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
Multi-objective, Decentralized Dynamic Virtual Machine Consolidation using ACO Metaheuristic in Computing Clouds
Pseudocodeword-Free Criterion for Codes with Cycle-Free Tanner Graph
Passive Classification of Source Printer using Text-line-level Geometric Distortion Signatures from Scanned Images of Printed Documents
Asymptotics of free fermions in a quadratic well at finite temperature and the Moshe-Neuberger-Shapiro random matrix model
BB-Graph: A New Subgraph Isomorphism Algorithm for Efficiently Querying Big Graph Databases
Zeckendorf’s Theorem and Fibonacci Coding for Modules
A Bootstrap Method for Sinusoid Detection in Colored Noise and Uneven Sampling. Application to Exoplanet Detection
Crowdsourcing with Sparsely Interacting Workers
Arrays of (locality-sensitive) Count Estimators (ACE): High-Speed Anomaly Detection via Cache Lookups
Fixed-point-free involutions and Schur P-positivity
Stable limit laws and structure of the scaling function for reaction-diffusion in random environment
Certain Properties Related to Well Posedness of Switching Diffusions
Graph-based Neural Multi-Document Summarization
Single- and Multiple-Shell Uniform Sampling Schemes for Diffusion MRI Using Spherical Codes
Computing maximum cliques in $B_2$-EPG graphs
Reputation blackboard systems
Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models
Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking
Ergodic Fading MIMO Dirty Paper and Broadcast Channels: Capacity Bounds and Lattice Strategies
Recognition of Grasp Points for Clothes Manipulation under unconstrained Conditions
Toward Real-Time Decentralized Reinforcement Learning using Finite Support Basis Functions
Representation Learning using Event-based STDP
Towards a Recommender System for Undergraduate Research
Using Convolutional Neural Networks in Robots with Limited Computational Resources: Detecting NAO Robots while Playing Soccer
New Parsimonious Multivariate Spatial Model: Spatial Envelope
Compact Tensor Pooling for Visual Question Answering
Solving the Rubik’s Cube Optimally is NP-complete
Freeness and The Partial Transposes of Wishart Random Matrices
Neural-based Natural Language Generation in Dialogue using RNN Encoder-Decoder with Semantic Aggregation
Your Click Knows It: Predicting User Purchase through Improved User-Item Pairwise Relationship
Multi-Modal Trip Hazard Affordance Detection On Construction Sites
Deep Learning Autoencoder Approach for Handwritten Arabic Digits Recognition
The dynamic three-dimensional Anderson localization of optical fields in active percolating systems
Improvements of Plachky-Steinebach theorem
Hurwitz Theory of Elliptic Orbifolds
Variational inference for coupled Hidden Markov Models applied to the joint detection of copy number variations
Spectral and Energy Efficiency of Multi-pair Massive MIMO Relay Network with Hybrid Processing
Iterative Splitting Methods for Coulomb Collisions in Plasma Simulations
Cross-language Learning with Adversarial Neural Networks: Application to Community Question Answering
GPGPU Acceleration of the KAZE Image Feature Extraction Algorithm
Comicolorization: Semi-automatic Manga Colorization
Unlocking datasets by calibrating populations of models to data density: a study in atrial electrophysiology
A geometric proof of the polarization property
Saliency Guided End-to-End Learning for Weakly Supervised Object Detection
The modular SAXS data correction sequence for solids and dispersions
Optimal modification of the LRT for the equality of two high-dimensional covariance matrices
Variable-to-Fixed Length Homophonic Coding Suitable for Asymmetric Channel Coding
Object Detection Using Deep CNNs Trained on Synthetic Images
Agreement Protocols on an Arbitrary Network in the Presence of a Mobile Adversary
Critical eigenstates and their properties in one and two dimensional quasicrystals
JaTeCS an open-source JAva TExt Categorization System
Maxent-Stress Optimization of 3D Biomolecular Models
Approximating Sparsest Cut in Low Rank Graphs via Embeddings from Approximately Low-Dimensional Spaces
Multi-Level and Multi-Scale Feature Aggregation Using Sample-level Deep Convolutional Neural Networks for Music Classification
On Performance of Quantized Transceiver in Multiuser Massive MIMO Downlinks
Irregular independence and irregular domination
Mild solutions to the dynamic programming equation for stochastic optimal control problems
New lower bounds for $t$-coverings
Structure Learning in Motor Control: A Deep Reinforcement Learning Model
Many Touchings Force Many Crossings
Enriching Existing Test Collections with OXPath
A giant with feet of clay: on the validity of the data that feed machine learning in medicine
The $W,Z$ scale functions kit for first passage problems of spectrally negative Levy processes, and applications to the optimization of dividends
Optimal control of non-autonomous SEIRS models with vaccination and treatment
Discrete Approximation of Two-Stage Stochastic and Distributionally Robust Linear Complementarity Problems
Fractal-Dimensional Properties of Subordinators
Metric dimension of Andrasfai graphs
Intrinsic Capacity
Adaptive Multilevel Monte Carlo Approximation of Distribution Functions
Probabilistically-Shaped Coded Modulation with Hard Decision Decoding and Staircase Codes
An Unsupervised Method to Assess the Global Horizontal Irradiance from Photovoltaic Power Measurements
On the Performance of Ultra-Reliable Decode and Forward Relaying Under the Finite Blocklength
The monoids of the patience sorting algorithm
gk: An R Package for the g-and-k and generalised g-and-h Distributions
Stance Detection in Turkish Tweets
The Augmentation Property of Binary Matrices for the Binary and Boolean Rank
Aircraft routing and crew pairing: updated algorithms at Air France
Learnable pooling with Context Gating for video classification
Expert and Non-Expert Opinion about Technological Unemployment
Sparse and Smooth Prior for Bayesian Linear Regression with Application to ETEX Data
Minimum Cost Feedback Selection for Arbitrary Pole Placement in Structured Systems
Hippocampal Spike-Timing Correlations Lead to Hexagonal Grid Fields
Class-specific image denoising using importance sampling
cGAN-based Manga Colorization Using a Single Training Image
Genetic Algorithm with Optimal Recombination for the Asymmetric Travelling Salesman Problem
Improved upper bounds in the moving sofa problem
Trade-off preservation in inverse multi-objective convex optimization
Combined Task and Motion Planning as Classical AI Planning
Non-Nudgable Subgroups of Permutations
Faster Monte-Carlo Algorithms for Fixation Probability of the Moran Process on Undirected Graphs
Exact Learning of Juntas from Membership Queries
Faster batched range minimum queries
Graphcut Texture Synthesis for Single-Image Superresolution
On vertex-disjoint paths in regular graphs
Exact Coupling of Random Walks on Polish Groups
Comparing deep neural networks against humans: object recognition when the signal gets weaker
Ensembles of Models and Metrics for Robust Ranking of Homologous Proteins
Online Convolutional Sparse Coding
The Theory is Predictive, but is it Complete? An Application to Human Perception of Randomness
The effect of the spatial domain in FANOVA models with ARH(1) error term
A sharp oracle inequality for Graph-Slope
Deep Interest Network for Click-Through Rate Prediction
Higher-order derivative of intersection local time for two independent fractional Brownian motions
Two-Stream Convolutional Networks for Dynamic Texture Synthesis
An Improved Second Order Poincaré Inequality for Functionals of Gaussian Fields
A Generative Model of Group Conversation
Optimal Unconstrained Pulse Inputs to the Bergman Minimal Model
Obstacle Numbers of Planar Graphs
Disjoint pairs in set systems with restricted intersection
Language That Matters: Statistical Inferences for Polarity Identification in Natural Language
Constant Composition Codes as Subcodes of Linear Codes
Secret Sharing and Shared Information
Improved Optimization of Finite Sums with Minibatch Stochastic Variance Reduced Proximal Iterations
Uncertainty-Aware Organ Classification for Surgical Data Science Applications in Laparoscopy
UAV-Enabled Wireless Power Transfer: Trajectory Design and Energy Region Characterization
Local bandwidth selection for kernel density estimation in bifurcating Markov chain model
The Capacity of Cache Aided Private Information Retrieval
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction
Combined Heat and Power Unit Commitment with Smart Parking Lots of Plug-in Electric Vehicles