varbvs google
We introduce varbvs, a suite of functions written in R and MATLAB for regression analysis of large-scale data sets using Bayesian variable selection methods. We have developed numerical optimization algorithms based on variational approximation methods that make it feasible to apply Bayesian variable selection to very large data sets. With a focus on examples from genome-wide association studies, we demonstrate that varbvs scales well to data sets with hundreds of thousands of variables and thousands of samples, and has features that facilitate rapid data analysis. Moreover, varbvs allows for extensive model customization, which can be used to incorporate external information into the analysis. We expect that the combination of an easy-to-use interface and robust, scalable algorithms for posterior computation will encourage broader use of Bayesian variable selection in areas of applied statistics and computational biology. The most recent R and MATLAB source code is available for download from GitHub (https://…/varbvs ), and the R package can be installed from CRAN (https://…/package=varbvs ). …
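
Since the abstract emphasizes an easy-to-use interface, here is a minimal sketch of what a call to the R package might look like. The simulated data, the dimensions, and the reading of fit$alpha are illustrative assumptions, not an example taken from the paper.

library(varbvs)

## Simulated data standing in for a genome-wide association study:
## n samples, p candidate variables, 10 truly nonzero effects.
set.seed(1)
n <- 500
p <- 2000
X <- matrix(rnorm(n * p), n, p)
beta <- rep(0, p)
beta[sample(p, 10)] <- rnorm(10)
y <- drop(X %*% beta + rnorm(n))

## Fit the variational approximation. Z = NULL means no covariates
## beyond the intercept; family = "binomial" would handle
## case-control outcomes instead.
fit <- varbvs(X, Z = NULL, y, family = "gaussian")
summary(fit)

## fit$alpha holds posterior inclusion probabilities (one column per
## candidate prior log-odds setting); high values flag selected variables.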

Confidence-Weighted Linear Classification google
We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both the classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training. …
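
To make the "Gaussian over parameter vectors" idea concrete, below is a runnable R sketch of a confidence-weighted-style online learner. For a simple closed form it uses the closely related AROW update (Crammer et al., 2009) with a diagonal covariance, not the exact CW rule from this paper; the structure is the same: each instance moves the mean (with confident, low-variance weights moving less) and shrinks the variance along the observed features.

## Confidence-weighted-style online learner: a Gaussian N(mu, S) over
## weight vectors, with the covariance S kept diagonal. This sketch uses
## the related AROW update for its simple closed form; the exact CW rule
## from the paper chooses the step size differently.
cw_style_train <- function(X, y, r = 1) {
  d  <- ncol(X)
  mu <- rep(0, d)   # mean of the distribution over weights
  S  <- rep(1, d)   # diagonal of the covariance (per-weight confidence)
  for (i in seq_len(nrow(X))) {
    x  <- X[i, ]
    yi <- y[i]                      # labels in {-1, +1}
    m  <- sum(mu * x)               # margin under the mean weights
    v  <- sum(S * x * x)            # predictive variance ("confidence")
    if (yi * m < 1) {               # nonzero hinge loss: update
      b     <- 1 / (v + r)
      alpha <- (1 - yi * m) * b
      mu <- mu + alpha * yi * S * x # uncertain weights move more
      S  <- S - b * (S * x)^2       # variance shrinks along x
    }
  }
  list(mu = mu, S = S)
}

## Toy usage:
set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5)
y <- ifelse(X %*% c(2, -1, 0, 0, 1) > 0, 1, -1)
fit <- cw_style_train(X, y)
mean(sign(X %*% fit$mu) == y)   # training accuracy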

Large Vocabulary Continuous Speech Recognition System (LVCSR) google
The search problem in LVCSR can be simply stated: find the most probable sequence of words given a sequence of acoustic observations, an acoustic model and a language model. This is a demanding problem since word boundary information is not available in continuous speech and each word in the dictionary may be hypothesized to start at each frame of acoustic data. The problem is further complicated by the vocabulary size (typically 65,000 words) and the structure imposed on the search space by the language model. Direct evaluation of all the possible word sequences is impossible (given the large vocabulary) and an efficient search algorithm will consider only a very small subset of all possible utterance models. Typically, the effective size of the search space is reduced through pruning of unlikely hypotheses and/or the elimination of repeated computations. …
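
The extend-score-prune structure described above can be illustrated with a toy beam search. Real decoders hypothesize HMM states at every acoustic frame; in this R sketch each step emits one whole word, and the acoustic and language-model scorers are placeholder assumptions, but the pruning of unlikely hypotheses works the same way.

## Toy beam search over word sequences. Each hypothesis carries a word
## sequence and its accumulated log-probability; at every step each
## surviving hypothesis is extended by every vocabulary word, scored,
## and all but the `beam` best are pruned.
beam_decode <- function(n_steps, vocab, acoustic_lp, lm_lp, beam = 3) {
  hyps <- list(list(words = character(0), logp = 0))
  for (t in seq_len(n_steps)) {
    new_hyps <- list()
    for (h in hyps) {
      prev <- if (length(h$words) > 0) tail(h$words, 1) else "<s>"
      for (w in vocab) {
        ## Combine acoustic and language-model evidence in log space.
        lp <- h$logp + acoustic_lp(t, w) + lm_lp(prev, w)
        new_hyps[[length(new_hyps) + 1]] <- list(words = c(h$words, w),
                                                 logp = lp)
      }
    }
    ## Pruning keeps work per step at O(beam * |vocab|) instead of
    ## letting the hypothesis set grow as |vocab|^t.
    scores <- vapply(new_hyps, function(h) h$logp, numeric(1))
    hyps <- new_hyps[order(scores, decreasing = TRUE)[seq_len(beam)]]
  }
  hyps[[1]]  # highest-scoring surviving hypothesis
}

## Placeholder scorers (illustrative assumptions only):
vocab <- c("yes", "no", "maybe")
acoustic_lp <- function(t, w) log(runif(1))        # stand-in acoustic model
lm_lp <- function(prev, w) -log(length(vocab))     # uniform bigram LM
set.seed(1)
beam_decode(4, vocab, acoustic_lp, lm_lp)$words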
