In this paper we describe a method to detect event descriptions in different news articles and to model the semantics of events and their components using RDF representations. We compare these descriptions to solve a cross-document event coreference task. Our component approach to event semantics defines identity and granularity of events at different levels. It performs close to state-of-the-art approaches on the cross-document event coreference task, while outperforming other works when assuming similar quality of event detection. We demonstrate how granularity and identity are interconnected and we discuss how semantic anomaly could be used to define differences between coreference, subevent and topical relations.
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution to a posterior, minimizing the Kullback-Leibler divergence between these two distributions. (2) We demonstrate that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models. (3) We also propose SGD with momentum for sampling and show how to adjust the damping coefficient accordingly. (4) We analyze MCMC algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we quantify the approximation errors due to finite learning rates. Finally (5), we use the stochastic process perspective to give a short proof of why Polyak averaging is optimal. Based on this idea, we propose a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler.
We present DAPIP, a Programming-By-Example system that learns to program with APIs to perform data transformation tasks. We design a domain-specific language (DSL) that allows for arbitrary concatenations of API outputs and constant strings. The DSL consists of three family of APIs: regular expression-based APIs, lookup APIs, and transformation APIs. We then present a novel neural synthesis algorithm to search for programs in the DSL that are consistent with a given set of examples. The search algorithm uses recently introduced neural architectures to encode input-output examples and to model the program search in the DSL. We show that synthesis algorithm outperforms baseline methods for synthesizing programs on both synthetic and real-world benchmarks.
As a highlighting research topic in the multimedia area, cross-media retrieval aims to capture the complex correlations among multiple media types. Learning better shared representation and distance metric for multimedia data is important to boost the cross-media retrieval. Motivated by the strong ability of deep neural network in feature representation and comparison functions learning, we propose the Unified Network for Cross-media Similarity Metric (UNCSM) to associate cross-media shared representation learning with distance metric in a unified framework. First, we design a two-pathway deep network pretrained with contrastive loss, and employ double triplet similarity loss for fine-tuning to learn the shared representation for each media type by modeling the relative semantic similarity. Second, the metric network is designed for effectively calculating the cross-media similarity of the shared representation, by modeling the pairwise similar and dissimilar constraints. Compared to the existing methods which mostly ignore the dissimilar constraints and only use sample distance metric as Euclidean distance separately, our UNCSM approach unifies the representation learning and distance metric to preserve the relative similarity as well as embrace more complex similarity functions for further improving the cross-media retrieval accuracy. The experimental results show that our UNCSM approach outperforms 8 state-of-the-art methods on 4 widely-used cross-media datasets.
In this work, we consider an extension of graphical models to random graphs, trees, and other objects. To do this, many fundamental concepts for multivariate random variables (e.g., marginal variables, Gibbs distribution, Markov properties) must be extended to other mathematical objects; it turns out that this extension is possible, as we will discuss, if we have a consistent, complete system of projections on a given object. Each projection defines a marginal random variable, allowing one to specify independence assumptions between them. Furthermore, these independencies can be specified in terms of a small subset of these marginal variables (which we call the atomic variables), allowing the compact representation of independencies by a directed graph. Projections also define factors, functions on the projected object space, and hence a projection family defines a set of possible factorizations for a distribution; these can be compactly represented by an undirected graph. The invariances used in graphical models are essential for learning distributions, not just on multivariate random variables, but also on other objects. When they are applied to random graphs and random trees, the result is a general class of models that is applicable to a broad range of problems, including those in which the graphs and trees have complicated edge structures. These models need not be conditioned on a fixed number of vertices, as is often the case in the literature for random graphs, and can be used for problems in which attributes are associated with vertices and edges. For graphs, applications include the modeling of molecules, neural networks, and relational real-world scenes; for trees, applications include the modeling of infectious diseases, cell fusion, the structure of language, and the structure of objects in visual scenes. Many classic models are particular instances of this framework.
Backwards analysis, first popularized by Seidel, is often the simplest most elegant way of analyzing a randomized algorithm. It applies to incremental algorithms where elements are added incrementally, following some random permutation, e.g., incremental Delauney triangulation of a pointset, where points are added one by one, and where we always maintain the Delauney triangulation of the points added thus far. For backwards analysis, we think of the permutation as generated backwards, implying that the $i$th point in the permutation is picked uniformly at random from the $i$ points not picked yet in the backwards direction. Backwards analysis has also been applied elegantly by Chan to the randomized linear time minimum spanning tree algorithm of Karger, Klein, and Tarjan. The question considered in this paper is how much randomness we need in order to trust the expected bounds obtained using backwards analysis, exactly and approximately. For the exact case, it turns out that a random permutation works if and only if it is minwise, that is, for any given subset, each element has the same chance of being first. Minwise permutations are known to have $\Theta(n)$ entropy, and this is then also what we need for exact backwards analysis. However, when it comes to approximation, the two concepts diverge dramatically. To get backwards analysis to hold within a factor $\alpha$, the random permutation needs entropy $\Omega(n/\alpha)$. This contrasts with minwise permutations, where it is known that a $1+\varepsilon$ approximation only needs $\Theta(\log (n/\varepsilon))$ entropy. Our negative result for backwards analysis essentially shows that it is as abstract as any analysis based on full randomness.
Search engines play an important role in our everyday lives by assisting us in finding the information we need. When we input a complex query, however, results are often far from satisfactory. In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned. We train this neural network with reinforcement learning. The actions correspond to selecting terms to build a reformulated query, and the reward is the document recall. We evaluate our approach on three datasets against strong baselines and show a relative improvement of 5-20% in terms of recall. Furthermore, we present a simple method to estimate a conservative upper-bound performance of a model in a particular environment and verify that there is still large room for improvements.
The task of next POI recommendation has been studied extensively in recent years. However, developing an unified recommendation framework to incorporate multiple factors associated with both POIs and users remains challenging, because of the heterogeneity nature of these information. Further, effective mechanisms to handle cold-start and endow the system with interpretability are also difficult topics. Inspired by the recent success of neural networks in many areas, in this paper, we present a simple but effective neural network framework for next POI recommendation, named NEXT. NEXT is an unified framework to learn the hidden intent regarding user’s next move, by incorporating different factors in an unified manner. Specifically, in NEXT, we incorporate meta-data information and two kinds of temporal contexts (i.e., time interval and visit time). To leverage sequential relations and geographical influence, we propose to adopt DeepWalk, a network representation learning technique, to encode such knowledge. We evaluate the effectiveness of NEXT against state-of-the-art alternatives and neural networks based solutions. Experimental results over three publicly available datasets demonstrate that NEXT significantly outperforms baselines in real-time next POI recommendation. Further experiments demonstrate the superiority of NEXT in handling cold-start. More importantly, we show that NEXT provides meaningful explanation of the dimensions in hidden intent space.
Monte Carlo (MC) sampling methods are widely applied in Bayesian inference, system simulation and optimization problems. The Markov Chain Monte Carlo (MCMC) algorithms are a well-known class of MC methods which generate a Markov chain with the desired invariant distribution. In this document, we focus on the Metropolis-Hastings (MH) sampler, which can be considered as the atom of the MCMC techniques, introducing the basic notions and different properties. We describe in details all the elements involved in the MH algorithm and the most relevant variants. Several improvements and recent extensions proposed in the literature are also briefly discussed, providing a quick but exhaustive overview of the current Metropolis-based sampling’s world.
The preceding three decades have seen the emergence, rise, and proliferation of machine learning (ML). From half-recognised beginnings in perceptrons, neural nets, and decision trees, algorithms that extract correlations (that is, patterns) from a set of data points have broken free from their origin in computational cognition to embrace all forms of problem solving, from voice recognition to medical diagnosis to automated scientific research and driverless cars, and it is now widely opined that the real industrial revolution lies less in mobile phone and similar than in the maturation and universal application of ML. Among the consequences just might be the triumph of anti-realism over realism.