This paper introduces a simple and explicit measure of word importance in a global context, including very small contexts (10+ sentences). After generating a word-vector space containing both 2-gram clauses and single tokens, it became clear that more contextually significant words disproportionately define clause meanings. Using this simple relationship in a weighted bag-of-words sentence embedding model results in sentence vectors that outperform the state-of-the-art for subjectivity/objectivity analysis, as well as paraphrase detection, and fall within those produced by state-of-the-art models for six other transfer learning tests. The metric was then extended to a sentence/document summarizer, an improved (and context-aware) cosine distance and a simple document stop word identifier. The sigmoid-global context weighted bag of words is presented as a new baseline for sentence embeddings. Context is Everything: Finding Meaning Statistically in Semantic Spaces

Advertisements