*Generalized Ridge Regression (with special advantage for p >> n cases)* (**bigRR**)

The package fits large-scale (generalized) ridge regression for various response distributions. The shrinkage parameters (lambdas) can be pre-specified or estimated using an internal update routine (fitting a heteroscedastic effects model, or HEM). Any subset of the model parameters can be shrunk. The package has a particular computational advantage when the number of shrinkage parameters exceeds the number of observations. For example, it is well suited to fitting large-scale omics data, such as high-throughput genotype data (genomics), gene expression data (transcriptomics), metabolomics data, etc.
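One common source of a computational advantage when p exceeds n is the dual form of the ridge estimator, which solves an n x n system instead of a p x p one. The following is a minimal Python sketch of that identity for ordinary ridge regression with a single lambda. It is not the bigRR API, and whether bigRR is implemented exactly this way is an assumption; the function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_primal(x, y, lam):
    """beta = (X'X + lam I_p)^{-1} X'y, which solves a p x p system."""
    n, p = len(x), len(x[0])
    xtx = [[sum(x[i][r] * x[i][c] for i in range(n)) + (lam if r == c else 0.0)
            for c in range(p)] for r in range(p)]
    xty = [sum(x[i][r] * y[i] for i in range(n)) for r in range(p)]
    return solve(xtx, xty)


def ridge_dual(x, y, lam):
    """beta = X'(XX' + lam I_n)^{-1} y, which solves only an n x n system."""
    n, p = len(x), len(x[0])
    gram = [[sum(x[i][k] * x[j][k] for k in range(p)) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, y)
    return [sum(x[i][r] * alpha[i] for i in range(n)) for r in range(p)]


# Hypothetical toy data: n = 3 observations, p = 6 predictors (p > n).
X = [[1.0, 0.0, 2.0, -1.0, 0.5, 3.0],
     [0.0, 1.0, -1.0, 2.0, 1.5, 0.0],
     [2.0, 1.0, 0.0, 1.0, -0.5, 1.0]]
y = [1.0, -2.0, 0.5]
b_primal = ridge_primal(X, y, lam=0.7)
b_dual = ridge_dual(X, y, lam=0.7)
```

Both functions return the same coefficients up to rounding error, but the dual version only ever factorizes a matrix whose size is the number of observations.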

*Lightweight Portable Message Queue Using ‘SQLite’* (**liteq**)

Temporary and permanent message queues for R. Built on top of ‘SQLite’ databases. ‘SQLite’ provides locking, and makes it possible to detect crashed consumers. Crashed jobs can be automatically marked as ‘failed’, or put in the queue again, potentially a limited number of times.
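To make the idea concrete, here is a minimal sketch of an SQLite-backed queue using Python's standard `sqlite3` module. It is not the liteq API: the table layout, the function names and the `max_attempts` parameter are all hypothetical. Crash detection itself is not shown; the `requeue_or_fail` helper is assumed to be called for a job already known to have crashed.

```python
import sqlite3

# Autocommit mode, so transactions are opened and closed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE queue (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        body     TEXT    NOT NULL,
        status   TEXT    NOT NULL DEFAULT 'ready',
        attempts INTEGER NOT NULL DEFAULT 0
    )
""")


def publish(conn, body):
    """Append a message to the queue."""
    conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))


def consume(conn):
    """Claim the oldest ready message, or return None if nothing is ready."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, body FROM queue WHERE status = 'ready' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE queue SET status = 'working', attempts = attempts + 1 "
                "WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def ack(conn, msg_id):
    """Remove a message whose job finished successfully."""
    conn.execute("DELETE FROM queue WHERE id = ?", (msg_id,))


def requeue_or_fail(conn, msg_id, max_attempts=2):
    """Put a crashed job back in the queue, or mark it failed once retries run out."""
    conn.execute(
        "UPDATE queue SET status = CASE WHEN attempts < ? THEN 'ready' "
        "ELSE 'failed' END WHERE id = ?",
        (max_attempts, msg_id),
    )


publish(conn, "job-1")
first = consume(conn)            # first attempt
requeue_or_fail(conn, first[0])  # 1 < 2 attempts, so the job is ready again
second = consume(conn)           # second attempt
requeue_or_fail(conn, second[0]) # retries exhausted, so the job is marked failed
final_status = conn.execute(
    "SELECT status FROM queue WHERE id = ?", (first[0],)
).fetchone()[0]
```

Keeping the attempt counter in the same row as the message is what lets the retry limit be enforced with a single `UPDATE`.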

*Bayesian Variable Selection Using Simplified Shotgun Stochastic Search with Screening (S5)* (**BayesS5**)

In p >> n settings, full posterior sampling using existing Markov chain Monte Carlo (MCMC) algorithms is highly inefficient and often not feasible from a practical perspective. To overcome this problem, we propose a scalable stochastic search algorithm called the Simplified Shotgun Stochastic Search (S5), which aims to rapidly explore interesting regions of model space and find the maximum a posteriori (MAP) model. S5 also provides an approximation of the posterior probability of each model (including the marginal inclusion probabilities).

*Threshold Estimation Approaches* (**tea**)

Different approaches for selecting the threshold in generalized Pareto distributions (GPD). Most of them are based on minimizing the AMSE criterion or at least on reducing the bias of the assumed GPD model. Others are heuristically motivated by searching for stable sample paths, i.e. a nearly constant region of the tail index estimator with respect to k, the number of data points in the tail. A third class is motivated by graphical inspection. In addition to the very helpful eva package, which includes many goodness-of-fit tests for the generalized Pareto distribution, the sequential testing procedure provided in Thompson et al. (2009) <doi:10.1016/j.coastaleng.2009.06.003> is also implemented here.
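The "stable sample path" idea can be illustrated with the Hill estimator, one well-known tail index estimator, evaluated for every k. The Python sketch below is not the tea package's code, and choosing the Hill estimator here is an assumption made for illustration; the function name and the synthetic sample (exact quantiles of a Pareto distribution with shape 2, so the true tail index is 0.5) are hypothetical.

```python
import math


def hill_estimates(data):
    """Hill estimate of the tail index for every k = 1, ..., n - 1.

    Entry k - 1 of the result uses the k largest observations; all values
    in `data` must be strictly positive.
    """
    logs = sorted((math.log(v) for v in data), reverse=True)
    estimates = []
    running = 0.0
    for k in range(1, len(logs)):
        running += logs[k - 1]                  # sum of the k largest log-values
        estimates.append(running / k - logs[k])  # minus the (k + 1)-th largest
    return estimates


# Synthetic sample: quantiles of a Pareto distribution with shape alpha = 2.
n, alpha = 200, 2.0
sample = [((i - 0.5) / n) ** (-1.0 / alpha) for i in range(1, n + 1)]
path = hill_estimates(sample)

# Inspect a few points of the sample path; a flat stretch suggests a usable k.
for k in (10, 50, 100):
    print(k, round(path[k - 1], 3))
```

In practice the path is plotted against k, and the threshold is taken from a range of k where the estimate stays roughly constant.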

*Miscellaneous Basic Statistical Functions* (**statoo**)

A collection of miscellaneous statistical functions for probability distributions: ‘dbern’, ‘pbern’, ‘qbern’, ‘rbern’ for the Bernoulli distribution, and ‘distr2name’, ‘name2distr’ for distribution names; probability density estimation (‘densityfun’); most frequent value estimation (‘mfv’, ‘mfv1’); calculation of the Hellinger distance (‘hellinger’); use of classical kernels (‘kernelfun’, ‘kernel_properties’).
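For readers unfamiliar with the density / distribution / quantile / random-generation convention, here is a small Python sketch of the four Bernoulli functions, plus the Hellinger distance between two discrete distributions. These are not the statoo functions: the names are hypothetical, and the discrete form of the Hellinger distance is an assumption chosen for brevity.

```python
import math
import random


def bern_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p)."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0


def bern_cdf(q, p):
    """P(X <= q) for X ~ Bernoulli(p)."""
    if q < 0:
        return 0.0
    if q < 1:
        return 1.0 - p
    return 1.0


def bern_quantile(level, p):
    """Smallest x in {0, 1} with P(X <= x) >= level."""
    return 0 if level <= 1.0 - p else 1


def bern_sample(n, p, rng):
    """Draw n Bernoulli(p) variates using the supplied random.Random instance."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * total)


draws = bern_sample(5, 0.3, random.Random(42))
distance = hellinger([0.3, 0.7], [0.5, 0.5])
```

The Hellinger distance is 0 for identical distributions and 1 for distributions with disjoint support.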

*Visualization of Categorical Response Models* (**EffectStars**)

The package provides functions to visualize regression models with a categorical response. The effects of the covariates are plotted as star plots to give a visual impression of the fitted model.

*‘Ionicons’ Icon Pack* (**ionicons**)

Provides icons from the ‘Ionicons’ icon pack (<http://…/>). Functions are provided to get icons as PNG files or as raw matrices. This is useful when you want to embed raster icons in a report or a graphic.

*Mutate Data Frames with Random Variates* (**dmutate**)

Work within the ‘dplyr’ workflow to add random variates to your data frame. Variates can be added at any level of an existing column. Also, bounds can be specified for simulated variates.
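The two ideas in this description, one draw per level of an existing column and bounds on the simulated values, can be sketched in plain Python. This is not the dmutate API: the function names, the column names `ID` and `ETA`, and the use of simple redrawing to enforce the bounds are all assumptions made for illustration.

```python
import math
import random


def bounded_normal(rng, mean, sd, lower, upper):
    """Redraw from N(mean, sd) until the value falls inside [lower, upper]."""
    while True:
        value = rng.gauss(mean, sd)
        if lower <= value <= upper:
            return value


def add_variate(rows, name, by, rng, mean=0.0, sd=1.0,
                lower=-math.inf, upper=math.inf):
    """Return copies of `rows` with a new column holding one draw per level of `by`."""
    draws = {}
    out = []
    for row in rows:
        key = row[by]
        if key not in draws:
            # First time this level is seen: simulate its variate once.
            draws[key] = bounded_normal(rng, mean, sd, lower, upper)
        out.append({**row, name: draws[key]})
    return out


# Hypothetical data: three records belonging to two subjects.
records = [{"ID": 1, "time": 0}, {"ID": 1, "time": 1}, {"ID": 2, "time": 0}]
simulated = add_variate(records, "ETA", by="ID", rng=random.Random(1),
                        lower=-1.0, upper=1.0)
```

Rows sharing an `ID` receive the same `ETA`, and every simulated value lies within the requested bounds; the input rows are left unmodified.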
