Word embeddings aims to map sense of the words into a lower dimensional vector space in order to reason over them. Training embeddings on domain specific data helps express concepts more relevant to their use case but comes at a cost of accuracy when data is less. Our effort is to minimise this by infusing syntactic knowledge into the embeddings. We propose a graph based embedding algorithm inspired from node2vec. Experimental results have shown that our algorithm improves the syntactic strength and gives robust performance on meagre data.
Highly regularized LSTMs that model the auto-regressive conditional factorization of the joint probability distribution of words achieve state-of-the-art results in language modeling. These models have an implicit bias towards predicting the next word from a given context. We propose a new regularization term based on decoding words in the context from the predicted distribution of the next word. With relatively few additional parameters, our model achieves absolute improvements of 1.7\% and 2.3\% over the current state-of-the-art results on the Penn Treebank and WikiText-2 datasets.
Efficient stochastic simulation algorithms are of paramount importance to the study of spreading phenomena on complex networks. Using insights and analytical results from network science, we discuss how the structure of contacts affects the efficiency of current algorithms. We show that algorithms believed to require $\mathcal{O}(\log N)$ or even $\mathcal{O}(1)$ operations per update—where $N$ is the number of nodes—display instead a polynomial scaling for networks that are either dense or sparse and heterogeneous. This significantly affects the required computation time for simulations on large networks. To circumvent the issue, we propose a node-based method combined with a composition and rejection algorithm, a sampling scheme that has an average-case complexity of $\mathcal{O} [\log(\log N)]$ per update for general networks. This systematic approach is first set-up for Markovian dynamics, but can also be adapted to a number of non-Markovian processes and can enhance considerably the study of a wide range of dynamics on networks.
Explaining why an answer is (or is not) returned by a query is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for answering such questions for queries with negation (first- order queries). Specifically, we introduce a graph-based provenance model that, while syntactic in nature, supports reverse reasoning and is proven to encode a wide range of provenance models from the literature. The implementation of this model in our PUG (Provenance Unification through Graphs) system takes a provenance question and Datalog query as an input and generates a Datalog program that computes an explanation, i.e., the part of the provenance that is relevant to answer the question. Furthermore, we demonstrate how a desirable factorization of provenance can be achieved by rewriting an input query. We experimentally evaluate our approach demonstrating its efficiency.
Following a particular news story online is an important but difficult task, as the relevant information is often scattered across different domains/sources (e.g., news articles, blogs, comments, tweets), presented in various formats and language styles, and may overlap with thousands of other stories. In this work we join the areas of topic tracking and entity disambiguation, and propose a framework named Story Disambiguation – a cross-domain story tracking approach that builds on real-time entity disambiguation and a learning-to-rank framework to represent and update the rich semantic structure of news stories. Given a target news story, specified by a seed set of documents, the goal is to effectively select new story-relevant documents from an incoming document stream. We represent stories as entity graphs and we model the story tracking problem as a learning-to-rank task. This enables us to track content with high accuracy, from multiple domains, in real-time. We study a range of text, entity and graph based features to understand which type of features are most effective for representing stories. We further propose new semi-supervised learning techniques to automatically update the story representation over time. Our empirical study shows that we outperform the accuracy of state-of-the-art methods for tracking mixed-domain document streams, while requiring fewer labeled data to seed the tracked stories. This is particularly the case for local news stories that are easily over shadowed by other trending stories, and for complex news stories with ambiguous content in noisy stream environments.
Internet has brought about a tremendous increase in content of all forms and, in that, video content constitutes the major backbone of the total content being published as well as watched. Thus it becomes imperative for video recommendation engines such as Hulu to look for novel and innovative ways to recommend the newly added videos to their users. However, the problem with new videos is that they lack any sort of metadata and user interaction so as to be able to rate the videos for the consumers. To this effect, this paper introduces the several techniques we develop for the Content Based Video Relevance Prediction (CBVRP) Challenge being hosted by Hulu for the ACM Multimedia Conference 2018. We employ different architectures on the CBVRP dataset to make use of the provided frame and video level features and generate predictions of videos that are similar to the other videos. We also implement several ensemble strategies to explore complementarity between both the types of provided features. The obtained results are encouraging and will impel the boundaries of research for multimedia based video recommendation systems.
In this paper, we consider the problem of detecting the presence (or absence) of an unknown but structured signal from the space-time outputs of an array under strong, non-white interference. Our motivation is the detection of a communication signal in jamming, where often the training portion is known but the data portion is not. We assume that the measurements are corrupted by additive white Gaussian noise of unknown variance and a few strong interferers, whose number, powers, and array responses are unknown. We also assume the desired signals array response is unknown. To address the detection problem, we propose several GLRT-based detection schemes that employ a probabilistic signal model and use the EM algorithm for likelihood maximization. Numerical experiments are presented to assess the performance of the proposed schemes.
Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications, which is usually very costly to compute. Inspired by the recent success of neural network approaches to several graph applications, such as node classification and graph classification, we propose a novel neural network-based approach to address this challenging while classical graph problem, with the hope to alleviate the computational burden while preserving a good performance. Our model generalizes to unseen graphs, and in the worst case runs in linear time with respect to the number of nodes in two graphs. Taking GED computation as an example, experimental results on three real graph datasets demonstrate the effectiveness and efficiency of our approach. Specifically, our model achieves smaller error and great time reduction compared against several approximate algorithms on GED computation. To the best of our knowledge, we are among the first to adopt neural networks to model the similarity between two graphs, and provide a new direction for future research on graph similarity computation and graph similarity search.
Several recent papers investigate Active Learning (AL) for mitigating the data dependence of deep learning for natural language processing. However, the applicability of AL to real-world problems remains an open question. While in supervised learning, practitioners can try many different methods, evaluating each against a validation set before selecting a model, AL affords no such luxury. Over the course of one AL run, an agent annotates its dataset exhausting its labeling budget. Thus, given a new task, an active learner has no opportunity to compare models and acquisition functions. This paper provides a large scale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions. We find that across all settings, Bayesian active learning by disagreement, using uncertainty estimates provided either by Dropout or Bayes-by Backprop significantly improves over i.i.d. baselines and usually outperforms classic uncertainty sampling.
Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred. Previous works using an additional no-answer option attempt to extract answers and detect unanswerable questions simultaneously, but they have struggled to produce high-quality answers and often misclassify questions. In this paper, we propose a novel read-then-verify system that combines a base neural reader with a sentence-level answer verifier trained to further validate if the predicted answer is entailed by input snippets. Moreover, we augment the base reader with two auxiliary losses to better handle answer extraction and no-answer detection respectively, and investigate three different architectures for the answer verifier. Our experiments on the SQuAD 2.0 dataset show that our system can achieve a score of 74.8 F1 on the development set, outperforming the previous best published model by more than 7 points.
In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The contribution in this paper is a general criterion to evaluate the consistency of a set of statistical models with respect to observed data. This is achieved by automatically gauging the models’ ability to generate data that is similar to the observed data. Importantly, the criterion follows from the model class itself and is therefore directly applicable to a broad range of inference problems with varying data types. The proposed data consistency criterion is illustrated and evaluated using three synthetic and two real data sets.
The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.
An importance sampling and bagging approach to solving the support vector machine (SVM) problem in the context of large databases is presented and evaluated. Our algorithm builds on the nearest neighbors ideas presented in Camelo at al. (2015). As in that reference, the goal of the present proposal is to achieve a faster solution of the SVM problem without a significance loss in the prediction error. The performance of the methodology is evaluated in benchmark examples and theoretical aspects of subsample methods are discussed.
We analyze the uncertainties in the minimum norm solution of full-rank regression problems, arising from Gaussian linear models, computed by randomized (row-wise sampling and, more generally, sketching) algorithms. From a deterministic perspective, our structural perturbation bounds imply that least squares problems are less sensitive to multiplicative perturbations than to additive perturbations. From a probabilistic perspective, our expressions for the total expectation and variance with regard to both model- and algorithm-induced uncertainties, are exact, hold for general sketching matrices, and make no assumptions on the rank of the sketched matrix. The relative differences between the total bias and variance on the one hand, and the model bias and variance on the other hand, are governed by two factors: (i) the expected rank deficiency of the sketched matrix, and (ii) the expected difference between projectors associated with the original and the sketched problems. A simple example, based on uniform sampling with replacement, illustrates the statistical quantities.
Advertisements