Brain2Text google
Nowadays, the Internet represents a vast informational space, growing exponentially and the problem of search for relevant data becomes essential as never before. The algorithm proposed in the article allows to perform natural language queries on content of the document and get comprehensive meaningful answers. The problem is partially solved for English as SQuAD contains enough data to learn on, but there is no such dataset in Russian, so the methods used by scientists now are not applicable to Russian. Brain2 framework allows to cope with the problem – it stands out for its ability to be applied on small datasets and does not require impressive computing power. The algorithm is illustrated on Sberbank of Russia Strategy’s text and assumes the use of a neuromodel consisting of 65 mln synapses. The trained model is able to construct word-by-word answers to questions based on a given text. The existing limitations are its current inability to identify synonyms, pronoun relations and allegories. Nevertheless, the results of conducted experiments showed high capacity and generalisation ability of the suggested approach. …

Thrill google
This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using the induced sorting principle, and the third part distributed external memory suffix sorting with a new distributed algorithmic big data framework named Thrill. …

MINT google
We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values, which may be obtained from simulation (in the case where one marginal is known) or resampling, guarantee that the test has nominal size, and we provide local power analyses, uniformly over classes of densities whose mutual information satisfies a lower bound. Our ideas may be extended to provide a new goodness-of-fit tests of normal linear models based on assessing the independence of our vector of covariates and an appropriately-defined notion of an error vector. The theory is supported by numerical studies on both simulated and real data. …

Advertisements