This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine’s query log demonstrate the improvements of K-NRM over prior feature-based and neural-based states-of-the-art, and explain the source of K-NRM’s advantage: Its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches.
Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. Especially, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by the observations of an other agent performing a task in a shared environment. The other agent is only observed through the effect of its actions on the environment and never explicitly modeled. Two key aspects are borrowed from observational learning: i) the observer behaviour needs to change as a result of viewing a ‘teacher’ (another agent) and ii) the observer needs to be motivated somehow to engage in making use of the other agent’s behaviour. The later is naturally modeled by RL, by correlating the learning agent’s reward with the teacher agent’s behaviour.
Over the years, many different indexing techniques and search algorithms have been proposed, including CSS-trees, CSB+ trees, k-ary binary search, and fast architecture sensitive tree search. There have also been papers on how best to set the many different parameters of these index structures, such as the node size of CSB+ trees. These indices have been proposed because CPU speeds have been increasing at a dramatically higher rate than memory speeds, giving rise to the Von Neumann CPU–Memory bottleneck. To hide the long latencies caused by memory access, it has become very important to well-utilize the features of modern CPUs. In order to drive down the average number of CPU clock cycles required to execute CPU instructions, and thus increase throughput, it has become important to achieve a good utilization of CPU resources. Some of these are the data and instruction caches, and the translation lookaside buffers. But it also has become important to avoid branch misprediction penalties, and utilize vectorization provided by CPUs in the form of SIMD instructions. While the layout of index structures has been heavily optimized for the data cache of modern CPUs, the instruction cache has been neglected so far. In this paper, we present NitroGen, a framework for utilizing code generation for speeding up index traversal in main memory database systems. By bringing together data and code, we make index structures use the dormant resource of the instruction cache. We show how to combine index compilation with previous approaches, such as binary tree search, cache-sensitive tree search, and the architecture-sensitive tree search presented by Kim et al.
In 1991 J.F. Aarnes introduced the concept of quasi-measures in a compact topological space $\Omega$ and established the connection between quasi-states on $C (\Omega)$ and quasi-measures in $\Omega$. This work solved the linearity problem of quasi-states on $C^*$-algebras formulated by R.V. Kadison in 1965. The answer is that a quasi-state need not be linear, so a quasi-state need not be a state. We introduce nonlinear measures in a space $\Omega$ which is a generalization of a measurable space. In this more general setting we are still able to define integration and establish a representation theorem for the corresponding functionals. A probabilistic language is choosen since we feel that the subject should be of some interest to probabilists. In particular we point out that the theory allows for incompatible stochastic variables. The need for incompatible variables is well known in quantum mechanics, but the need seems natural also in other contexts as we try to explain by a questionary example. Keywords and phrases: Epistemic probability, Integration with respect to mea- sures and other set functions, Banach algebras of continuous functions, Set func- tions and measures on topological spaces, States, Logical foundations of quantum mechanics.
In this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given its features that have been extracted at the current network snapshot. To this end, we introduce a probabilistic non-parametric approach, called ‘Non-Parametric Generalized Linear Model’ (NP-GLM), which infers the hidden underlying probability distribution of the link advent time given its features. We then present a learning algorithm for NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on both synthetic data and real-world Sina Weibo social network demonstrate the effectiveness of NP-GLM in solving temporal link prediction problem vis-a-vis competitive baselines.
Deep Convolutional Neural Networks (CNNs) are capable of learning unprecedentedly effective features from images. Some researchers have struggled to enhance the parameters’ efficiency using grouped convolution. However, the relation between the optimal number of convolutional groups and the recognition performance remains an open problem. In this paper, we propose a series of Basic Units (BUs) and a two-level merging strategy to construct deep CNNs, referred to as a joint Grouped Merging Net (GM-Net), which can produce joint grouped and reused deep features while maintaining the feature discriminability for classification tasks. Our GM-Net architectures with the proposed BU_A (dense connection) and BU_B (straight mapping) lead to significant reduction in the number of network parameters and obtain performance improvement in image classification tasks. Extensive experiments are conducted to validate the superior performance of the GM-Net than the state-of-the-arts on the benchmark datasets, e.g., MNIST, CIFAR-10, CIFAR-100 and SVHN.
Convolution is a critical component in modern deep neural networks, thus several algorithms for convolution have been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods have been proposed including im2col-based convolution, FFT-based convolution, or Winograd-based algorithm. However, all these indirect methods have high memory-overhead, which creates performance degradation and offers a poor trade-off between performance and memory consumption. In this work, we propose a memory-efficient convolution or MEC with compact lowering, which reduces memory-overhead substantially and accelerates convolution process. MEC lowers the input matrix in a simple yet efficient/compact way (i.e., much less memory-overhead), and then executes multiple small matrix multiplications in parallel to get convolution completed. Additionally, the reduced memory footprint improves memory sub-system efficiency, improving performance. Our experimental results show that MEC reduces memory consumption significantly with good speedup on both mobile and server platforms, compared with other indirect convolution algorithms.
In the class of streaming anomaly detection algorithms for univariate time series, the size of the sliding window over which various statistics are calculated is an important parameter. To address the anomalous variation in the scale of the pseudo-periodicity of time series, we define a streaming multi-scale anomaly score with a streaming PCA over a multi-scale lag-matrix. We define three methods of aggregation of the multi-scale anomaly scores. We evaluate their performance on Yahoo! and Numenta dataset for unsupervised anomaly detection benchmark. To the best of authors’ knowledge, this is the first time a multi-scale streaming anomaly detection has been proposed and systematically studied.
Graph representations offer powerful and intuitive ways to describe data in a multitude of application domains. Here, we consider stochastic processes generating graphs and propose a methodology for detecting changes in stationarity of such processes. The methodology is general and considers a process generating attributed graphs with a variable number of vertices/edges, without the need to assume one-to-one correspondence between vertices at different time steps. The methodology acts by embedding every graph of the stream into a vector domain, where a conventional multivariate change detection procedure can be easily applied. We ground the soundness of our proposal by proving several theoretical results. In addition, we provide a specific implementation of the methodology and evaluate its effectiveness on several detection problems involving attributed graphs representing biological molecules and drawings. Experimental results are contrasted with respect to suitable baseline methods, demonstrating the competitiveness of our approach.
This paper introduces a new framework for real-time decision making in video games. An Ensemble agent is a compound agent composed of multiple agents, each with its own tasks or goals to achieve. Usually when dealing with real-time decision making, reactive agents are used; that is agents that return a decision based on the current state. While reactive agents are very fast, most games require more than just a rule-based agent to achieve good results. Deliberative agents—agents that use a forward model to search future states—are very useful in games with no hard time limit, such as Go or Backgammon, but generally take too long for real-time games. The Ensemble framework addresses this issue by allowing the agent to be both deliberative and reactive at the same time. This is achieved by breaking up the game-play into logical roles and having highly focused components for each role, with each component disregarding anything outwith its own role. Reactive agents can be used where a reactive agent is suited to the role, and where a deliberative approach is required, branching is kept to a minimum by the removal of all extraneous factors, enabling an informed decision to be made within a much smaller time-frame. An Arbiter is used to combine the component results, allowing high performing agents to be created from simple, efficient components.
Big data are often contaminated by outliers and heavy-tailed errors, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our framework is able to handle heavy-tailed data with bounded $(1 \! + \! \delta)$-th moment for any $\delta\!>\!0$. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when $\delta \!\geq\! 1$, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime $0 \!<\! \delta \!<\! 1$ and the transition is smooth and optimal. Moreover, a nonasymptotic Bahadur representation for finite-sample inference is derived when the variance is finite. Numerical experiments lend further support to our obtained theory.
Fulfilling the promise of precision medicine requires accurately and precisely classifying disease states. For cancer, this includes prediction of survival time from a surfeit of covariates. Such data presents an opportunity for improved prediction, but also a challenge due to high dimensionality. Furthermore, disease populations can be heterogeneous. Integrative modeling is sensible, as the underlying hypothesis is that joint analysis of multiple covariates provides greater explanatory power than separate analyses. We propose an integrative latent variable model that combines factor analysis for various data types and an exponential Cox proportional hazards model for continuous survival time with informative censoring. The factor and Cox models are connected through low-dimensional latent variables that can be interpreted and visualized to identify subpopulations. We use this model to predict survival time. We demonstrate this model’s utility in simulation and on four Cancer Genome Atlas datasets: diffuse lower-grade glioma, glioblastoma multiforme, lung adenocarcinoma, and lung squamous cell carcinoma. These datasets have small sample sizes, high-dimensional diverse covariates, and high censorship rates. We compare the predictions from our model to two alternative models. Our model outperforms in simulation and is competitive on real datasets. Furthermore, the low-dimensional visualization for diffuse lower-grade glioma displays known subpopulations.