ALINE Algorithm (ALINE) google
The ALINE algorithm (Kondrak, 2000) assigns a similarity score to pairs of phonetically-transcribed words on the basis of the decomposition of phonemes into elementary phonetic features. The algorithm was originally designed to identify and align cognates in vocabularies of related languages. Nevertheless, thanks to its grounding in universal phonetic principles, the algorithm can be used for estimating the similarity of any pair of words. The principal component of ALINE is a function that calculates the similarity of two phonemes that are expressed in terms of about a dozen multi-valued phonetic features (Place, Manner, Voice, etc.). The phonetic features are assigned salience weights that express their relative importance. Feature values are encoded as oating-point numbers in the range [0;1]. For example, the feature Manner can take any of the following seven values: stop = 1.0, affricate = 0.9, fricative = 0.8, approximant = 0.6, high vowel = 0.4, mid vowel = 0.2, and low vowel = 0.0. The numerical values re ect the distances between vocal organs during speech production. The overall similarity score is the sum of individual similarity scores between pairs of phonemes in an optimal alignment of two words, which is computed by a dynamic programming algorithm (Wagner and Fischer, 1974). A constant insertion/deletion penalty is applied for each unaligned phoneme. Another constant penalty is set to reduce relative importance of vowel—as opposed to consonant— phoneme matches. The similarity value is normalized by the length of the longer word. ALINE’s behavior is controlled by a number of parameters: the maximum phonemic score, the insertion/ deletion penalty, the vowel penalty, and the feature salience weights. The parameters have default settings for the cognate matching task, but these settings can be optimized (tuned) on a development set that includes both positive and negative examples of similar words.
Greg Kondrak
Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification


Three-Mode Principal Components Analysis google
In multivariate analysis the data have usually two way and/or two modes. This book treats prinicipal component analysis of data which can be characterised by three-ways and/or modes, like subjects by variables by conditions or occasions. The book extends the work on three-mode factor analysis by Tucker and the work on individual differences scaling by Carroll and colleagues. The many examples give a true feeling of the working of the techniques. …

PRESS google
Nonlinear models are frequently applied to determine the optimal supply natural gas to a given residential unit based on economical and technical factors, or used to fit biochemical and pharmaceutical assay nonlinear data. In this article we propose PRESS statistics and prediction coefficients for a class of nonlinear beta regression models, namely $P^2$ statistics. We aim at using both prediction coefficients and goodness-of-fit measures as a scheme of model select criteria. In this sense, we introduce for beta regression models under nonlinearity the use of the model selection criteria based on robust pseudo-$R^2$ statistics. Monte Carlo simulation results on the finite sample behavior of both prediction-based model selection criteria $P^2$ and the pseudo-$R^2$ statistics are provided. Three applications for real data are presented. The linear application relates to the distribution of natural gas for home usage in S\~ao Paulo, Brazil. Faced with the economic risk of too overestimate or to underestimate the distribution of gas has been necessary to construct prediction limits and to select the best predicted and fitted model to construct best prediction limits it is the aim of the first application. Additionally, the two nonlinear applications presented also highlight the importance of considering both goodness-of-predictive and goodness-of-fit of the competitive models. …

Advertisements