Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software (interlineaR)
Interlinearized glossed texts (IGT) are used in descriptive linguistics to represent the morphological analysis of a text through a morpheme-by-morpheme gloss. ‘InterlineaR’ provides a set of functions that target several popular IGT formats (‘SIL Toolbox’, ‘EMELD XML’) and turn an IGT into a set of data frames following a relational model (the tables represent the different linguistic units: texts, sentences, words, morphemes). The same pieces of software (‘SIL FLEX’, ‘SIL Toolbox’) typically produce dictionaries of the morphemes used in the glosses. ‘InterlineaR’ also provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model that represents the dictionary entries, the sense(s) attached to the entries, the example(s) attached to senses, etc.
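The relational model described above can be pictured with a small sketch. The Python below is not the package's API (the package itself is written for R); the function name `flatten_igt`, the column names, and the toy corpus are all invented for illustration. It only shows how nested texts, sentences, words, and morphemes might be flattened into four tables linked by identifier columns.

```python
def flatten_igt(texts):
    """Flatten nested interlinear texts into four tables linked by id columns."""
    tables = {"texts": [], "sentences": [], "words": [], "morphemes": []}
    for text in texts:
        text_id = len(tables["texts"]) + 1
        tables["texts"].append({"text_id": text_id, "title": text["title"]})
        for sentence in text["sentences"]:
            sentence_id = len(tables["sentences"]) + 1
            tables["sentences"].append({
                "sentence_id": sentence_id,
                "text_id": text_id,  # foreign key to the texts table
                "translation": sentence["translation"],
            })
            for word in sentence["words"]:
                word_id = len(tables["words"]) + 1
                tables["words"].append({
                    "word_id": word_id,
                    "sentence_id": sentence_id,  # foreign key to sentences
                    "form": word["form"],
                })
                for form, gloss in word["morphemes"]:
                    tables["morphemes"].append({
                        "morpheme_id": len(tables["morphemes"]) + 1,
                        "word_id": word_id,  # foreign key to words
                        "form": form,
                        "gloss": gloss,
                    })
    return tables


# Invented toy corpus: one text, one sentence, two words, three morphemes.
corpus = [{
    "title": "Toy text",
    "sentences": [{
        "translation": "The dogs ran.",
        "words": [
            {"form": "dogs", "morphemes": [("dog", "dog"), ("-s", "PL")]},
            {"form": "ran", "morphemes": [("ran", "run.PST")]},
        ],
    }],
}]

tables = flatten_igt(corpus)
```

Each row of a lower-level table carries the identifier of its parent, so the morphemes of a given sentence can be recovered by joining `morphemes` to `words` on `word_id` and then filtering on `sentence_id`.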

A Collection of Techniques Correcting for Sample Selection Bias (sambia)
A collection of techniques for correcting statistical models for sample selection bias. In particular, the package provides the resampling-based methods ‘stochastic inverse-probability oversampling’ and ‘parametric inverse-probability bagging’, which generate synthetic observations to correct classifiers for biased samples resulting from stratified random sampling. For further information, see Krautenbacher, Theis, and Fuchs (2017) <doi:10.1155/2017/7847531>. The methods may also be used for other purposes where weighting and generation of new observations are needed.
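To illustrate the inverse-probability idea behind these methods, here is a minimal Python sketch of plain inverse-probability resampling. It is not the package's algorithm and does not generate synthetic observations; it only redraws existing rows with weight 1/p, where p is the assumed known probability that a row's stratum was sampled. The function name, the stratum labels, and the sampling probabilities are invented for the example.

```python
import random


def inverse_probability_resample(rows, sampling_prob, n_draws, seed=0):
    """Redraw rows with replacement, weighting each by 1 / P(stratum was sampled)."""
    rng = random.Random(seed)
    weights = [1.0 / sampling_prob[row["stratum"]] for row in rows]
    return rng.choices(rows, weights=weights, k=n_draws)


# Invented biased sample: cases were sampled with probability 0.5 and
# controls with probability 0.05, so controls are under-represented.
rows = (
    [{"stratum": "case", "x": i} for i in range(50)]
    + [{"stratum": "control", "x": i} for i in range(50)]
)
sampling_prob = {"case": 0.5, "control": 0.05}

resampled = inverse_probability_resample(rows, sampling_prob, n_draws=2000)
control_share = sum(r["stratum"] == "control" for r in resampled) / len(resampled)
```

In the biased sample, controls make up half of the rows. After resampling, their expected share is 1000/1100, about 0.91, which reflects the ten-fold lower probability with which they were originally sampled.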

Efficient Gibbs-Sampler for Markov-Modulated-Poisson-Processes (MMPPsampler)
Efficient implementation of the Gibbs sampler by Fearnhead and Sherlock (2006) <DOI:10.1111/j.1467-9868.2006.00566.x> for the Markov modulated Poisson process that uses ‘C++’ via the ‘Rcpp’ interface. Fearnhead and Sherlock proposed an exact Gibbs sampler for performing Bayesian inference on Markov modulated Poisson processes (MMPPs). This package is an efficient implementation of their proposal for binned data. Furthermore, the package contains an efficient implementation of the hierarchical MMPP framework proposed by Clausen, Adams, and Briers (2017) <https://…/Master_thesis_Henry.pdf>, which is tailored towards inference on network flow arrival data and extends Fearnhead and Sherlock’s Gibbs sampler. Both frameworks benefit greatly from routines that are optimised for this specific problem in order to remain scalable and efficient for large amounts of input data. These optimised routines include matrix exponentiation and multiplication, and endpoint-conditioned Markov process sampling. Both implementations require an input vector that contains the binned observations, the length of a binning interval, the number of states of the hidden Markov process, and loose prior hyperparameters. In return, the user receives the desired number of sample trajectories of the hidden Markov process as well as the likelihood of each trajectory.
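To make the input format concrete, the following Python sketch simulates binned observations from a two-state Markov modulated Poisson process. It is a forward simulation of data only, not the Gibbs sampler and not the package's interface; the function names, the rates, and the bin length are assumptions chosen for illustration.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (small means only)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_binned_mmpp(leave_rates, arrival_rates, bin_length, n_bins, seed=0):
    """Return arrival counts per bin for a two-state MMPP started in state 0."""
    rng = random.Random(seed)
    state, now = 0, 0.0
    # The hidden chain stays in a state for an exponentially distributed time.
    next_switch = rng.expovariate(leave_rates[state])
    counts = []
    for b in range(n_bins):
        bin_end = (b + 1) * bin_length
        expected = 0.0  # integrated arrival rate over this bin
        while next_switch < bin_end:
            expected += arrival_rates[state] * (next_switch - now)
            now = next_switch
            state = 1 - state  # two states only: flip between 0 and 1
            next_switch = now + rng.expovariate(leave_rates[state])
        expected += arrival_rates[state] * (bin_end - now)
        now = bin_end
        # Given the hidden path, the bin count is Poisson with this mean.
        counts.append(poisson_draw(rng, expected))
    return counts


# Invented parameters: a quiet state (1 arrival per unit time) and a busy
# state (10 per unit time), observed in 200 bins of length 1.
counts = simulate_binned_mmpp(
    leave_rates=(0.1, 0.2), arrival_rates=(1.0, 10.0), bin_length=1.0, n_bins=200
)
```

The vector `counts`, together with the bin length (1.0) and the number of hidden states (2), corresponds to the kind of input the description lists: binned observations, the length of a binning interval, and the number of states of the hidden Markov process.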
