R Packages

|R Packages| = 6670

A

A3 Accurate, Adaptable, and Accessible Error Metrics for Predictive Models
Supplies tools for tabulating and analyzing the results of predictive models. The methods employed are applicable to virtually any predictive model and make comparisons between different methodologies straightforward.
abc Tools for Approximate Bayesian Computation (ABC)
Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates and for calculating the misclassification probabilities of different models.
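For example, a minimal rejection-ABC sketch (toy data simulated here; the abc() call follows the package documentation):
    library(abc)
    # Toy example: infer a normal mean from simulated (parameter, summary) pairs
    set.seed(1)
    param   <- data.frame(mu = runif(10000, -5, 5))         # draws from the prior
    sumstat <- data.frame(m = rnorm(10000, param$mu, 0.3))  # simulated summary statistics
    target  <- c(m = 1.2)                                   # observed summary statistic
    fit <- abc(target = target, param = param, sumstat = sumstat,
               tol = 0.05, method = "rejection")
    summary(fit)  # posterior summary of mu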
abc.data Data Only: Tools for Approximate Bayesian Computation (ABC)
Contains data which are used by functions of the ‘abc’ package.
ABCanalysis Computed ABC Analysis
For a given data set, the package provides a novel method of computing precise limits to acquire subsets which are easily interpreted. Closely related to the Lorenz curve, the ABC curve visualizes the data by graphically representing the cumulative distribution function. Based on an ABC analysis, the algorithm calculates, with the help of the ABC curve, the optimal limits by exploiting the mathematical properties of the distribution of the analyzed items. The data, which must contain positive values, are divided into three disjoint subsets A, B and C: subset A comprises the most profitable values, i.e. the largest data values (“the important few”); subset B comprises values where the profit equals the effort required to obtain it; and subset C comprises the non-profitable values, i.e. the smallest data values (“the trivial many”).
abcrf Approximate Bayesian Computation via Random Forests
Performs Approximate Bayesian Computation (ABC) model choice via random forests.
abctools Tools for ABC Analyses
Tools for approximate Bayesian computation, including summary statistic selection and assessing coverage. See also the accompanying article ‘An R Package for Tuning Approximate Bayesian Computation Analyses’.
abe Augmented Backward Elimination
Performs augmented backward elimination and checks the stability of the obtained model. Augmented backward elimination combines significance- or information-based criteria with the change-in-estimate criterion to either select the optimal model for prediction purposes or to serve as a tool to obtain a practically sound, highly interpretable model. More details can be found in Dunkler et al. (2014) <doi:10.1371/journal.pone.0113677>.
abnormality Measure a Subject’s Abnormality with Respect to a Reference Population
Contains the functions to implement the methodology and considerations laid out by Marks et al. in the manuscript Measuring Abnormality in High Dimensional Spaces: Applications in Biomechanical Gait Analysis. As of 2/27/2018 this paper has been submitted and is under scientific review. Using high-dimensional datasets to measure a subject’s overall level of abnormality as compared to a reference population is often needed in outcomes research. Utilizing applications in instrumented gait analysis, that article demonstrates how using data that is inherently non-independent to measure overall abnormality may bias results. A methodology is introduced to address this bias to accurately measure overall abnormality in high dimensional spaces. While this methodology is in line with previous literature, it differs in two major ways. Advantageously, it can be applied to datasets in which the number of observations is less than the number of features/variables, and it can be abstracted to practically any number of domains or dimensions. After applying the proposed methodology to the original data, the researcher is left with a set of uncorrelated variables (i.e. principal components) with which overall abnormality can be measured without bias. Different considerations are discussed in that article in deciding the appropriate number of principal components to keep and the aggregate distance measure to utilize.
abodOutlier Angle-Based Outlier Detection
Performs angle-based outlier detection on a given data frame. Three methods are available: a full but slow implementation using all the data, which has cubic complexity; a fully randomized one, which is far more efficient; and another using k-nearest neighbours. These algorithms are especially well suited for outlier detection in high-dimensional data.
abstractr An R-Shiny Application for Creating Visual Abstracts
An R-Shiny application to create visual abstracts for original research. A variety of user-defined options and formatting choices are included.
abtest Bayesian A/B Testing
Provides functions for Bayesian A/B testing including prior elicitation options based on Kass and Vaidyanathan (1992) <doi:10.1111/j.2517-6161.1992.tb01868.x>.
Ac3net Inferring Directional Conservative Causal Core Gene Networks
Infers directional conservative causal core (gene) networks. It is an advanced version of the C3NET algorithm in that it provides directional networks. Gokmen Altay (2018) <doi:10.1101/271031>, bioRxiv.
ACA Abrupt Change-Point or Aberration Detection in Point Series
Offers an interactive function for the detection of abrupt change-points or aberrations in point series.
accelmissing Missing Value Imputation for Accelerometer Data
Imputation of missing count values in accelerometer data. The methodology includes both parametric and semi-parametric multiple imputation under the zero-inflated Poisson lognormal model. The package also provides multiple functions to pre-process accelerometer data prior to imputation, including detecting wearing and non-wearing time, selecting valid days and subjects, and creating plots.
accSDA Accelerated Sparse Discriminant Analysis
Implementation of sparse linear discriminant analysis, which is a supervised classification method for multiple classes. Various novel optimization approaches to this problem are implemented including alternating direction method of multipliers (ADMM), proximal gradient (PG) and accelerated proximal gradient (APG) (See Atkins et al. <arXiv:1705.07194>). Functions for performing cross validation are also supplied along with basic prediction and plotting functions. Sparse zero variance discriminant analysis (SZVD) is also included in the package (See Ames and Hong, <arXiv:1401.5492>). See the github wiki for a more extended description.
ACDm Tools for Autoregressive Conditional Duration Models
Package for Autoregressive Conditional Duration (ACD, Engle and Russell, 1998) models. Creates trade, price or volume durations from transaction (tick) data, performs diurnal adjustments, and fits and tests various ACD models.
Acinonyx High-Performance Interactive Graphics System: iPlots eXtreme
Acinonyx (a genus of cheetah, named for its speed) is the codename for the next generation of the high-performance interactive graphics system iPlots eXtreme. It is a continuation of the iPlots project, allowing visualization and exploratory analysis of large data. Due to its highly flexible design and focus on speed optimization, it can also be used as a general graphics system (e.g. it is the fastest R graphics device if you have a good GPU) and as an interactive toolkit. It is a complete re-write of iPlots from scratch, taking the best from the iPlots design and focusing on speed and flexibility. Compared to the previous iPlots project, the main focus is on:
• speed and scalability to support large data (it uses OpenGL, optimized native code and object sharing to allow visualization of millions of data points)
• enhanced support for adding statistical models to plots with full interactivity
• seamless integration in GUIs (Windows and Mac OS X)
ACMEeqtl Estimation of Interpretable eQTL Effect Sizes Using a Log of Linear Model
We use a non-linear model, termed ACME, that reflects a parsimonious biological model for allelic contributions of cis-acting eQTLs. Maximum-likelihood parameters are estimated with a non-linear least-squares algorithm. The ACME model provides interpretable effect size estimates and p-values with well-controlled Type-I error. Includes both R and (much faster) C implementations. For more details see Palowitch et al. (2017) <doi:10.1111/biom.12810>.
AcousticNDLCodeR Coding Sound Files for Use with NDL
Makes acoustic cues for use with the R packages ‘ndl’ or ‘ndl2’. The package implements functions used in the PLOS ONE paper: Denis Arnold, Fabian Tomaschek, Konstantin Sering, Florence Lopez, and R. Harald Baayen (accepted), ‘Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit’, PLOS ONE. More details can be found in the paper and the supplement. ‘ndl’ is available on CRAN. ‘ndl2’ is available by request from <konstantin.sering@uni-tuebingen.de>.
acp Autoregressive Conditional Poisson
Time series analysis of count data using the Autoregressive Conditional Poisson (ACP) model.
AcrossTic A Cost-Minimal Regular Spanning Subgraph with TreeClust
Constructs a minimum-cost regular spanning subgraph as part of a non-parametric two-sample test for equality of distribution.
acrt Autocorrelation Robust Testing
Functions for testing affine hypotheses on the regression coefficient vector in regression models with autocorrelated errors.
AdapEnetClass A Class of Adaptive Elastic Net Methods for Censored Data
Provides new approaches to variable selection for the accelerated failure time (AFT) model.
adapr Implementation of an Accountable Data Analysis Process
Tracks reading and writing within R scripts that are organized into a directed acyclic graph. Contains an interactive shiny application adaprApp(). Uses Git and file hashes to track version histories of input and output.
AdapSamp Adaptive Sampling Algorithms
For distributions whose probability density functions are log-concave, the Adaptive Rejection Sampling (ARS) algorithm of Gilks and Wild (1992) <doi:10.2307/2347565> can be used for sampling. For others, one can use the Modified Adaptive Rejection Sampling (MARS) algorithm of Martino and Míguez (2011) <doi:10.1007/s11222-010-9197-9>, the Concave-Convex Adaptive Rejection Sampling (CCARS) algorithm of Görür and Teh (2011) <doi:10.1198/jcgs.2011.09058>, or the Adaptive Slice Sampling (ASS) algorithm of Radford M. Neal (2003) <doi:10.1214/aos/1056562461>. The package provides four main functions, rARS(), rMARS(), rCCARS(), and rASS(), which implement sampling based on the algorithms above.
adaptalint Check Code Style Painlessly
Infer the code style (which style rules are followed and which ones are not) from one package and use it to check another. This makes it easier to find and correct the most important problems first.
adaptDA Adaptive Mixture Discriminant Analysis
The adaptive mixture discriminant analysis (AMDA) adapts a model-based classifier to situations in which a class represented in the test set may not have been encountered earlier, in the learning phase.
AdaptGauss Gaussian Mixture Models (GMM)
Multimodal distributions can be modelled as a mixture of components. The model is derived using Pareto Density Estimation (PDE) to estimate the pdf. PDE has been designed in particular to identify groups/classes in a dataset. Precise limits for the classes can be calculated using Bayes’ theorem. The model can be verified with a QQ plot and a Chi-squared test.
adaptiveGPCA Adaptive Generalized PCA
Implements adaptive gPCA, as described in: Fukuyama, J. (2017) <arXiv:1702.00501>. The package also includes functionality for applying the method to ‘phyloseq’ objects so that the method can be easily applied to microbiome data and a ‘shiny’ app for interactive visualization.
AdaptiveSparsity Adaptive Sparsity Models
Implements the Figueiredo EM algorithm for adaptive sparsity (Jeffreys prior) (see Figueiredo, M.A.T., ‘Adaptive sparseness for supervised learning’, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1150-1159, Sept. 2003) and the Wong algorithm for adaptively sparse Gaussian geometric models (see Wong, Eleanor, Suyash Awate, and P. Thomas Fletcher, ‘Adaptive Sparsity in Gaussian Graphical Models’, Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 311-319, 2013).
adaptMT Adaptive P-Value Thresholding for Multiple Hypothesis Testing with Side Information
Implementation of adaptive p-value thresholding (AdaPT), including both a framework that allows the user to specify any algorithm to learn local false discovery rate and a pool of convenient functions that implement specific algorithms. See Lei, Lihua and Fithian, William (2016) <arXiv:1609.06035>.
AdaSampling Adaptive Sampling for Positive Unlabeled and Label Noise Learning
Implements the adaptive sampling procedure, a framework for both positive unlabeled learning and learning with class label noise. Yang, P., Ormerod, J., Liu, W., Ma, C., Zomaya, A., Yang, J. (2018) <doi:10.1109/TCYB.2018.2816984>.
ADCT Adaptive Design in Clinical Trials
Implements existing adaptive design methods for clinical trials. The package includes functions for calculating power and stopping boundaries (sample size) for two-group group sequential designs, adaptive designs with co-primary endpoints, biomarker-informed adaptive designs, etc.
addhaz Binomial and Multinomial Additive Hazards Models
Functions to fit the binomial and multinomial additive hazards models and to calculate the contribution of diseases/conditions to the disability prevalence, as proposed by Nusselder and Looman (2004) <DOI:10.1353/dem.2004.0017>.
addhazard Fit Additive Hazards Models for Survival Analysis
Contains tools to fit additive hazards models to data from random sampling, two-phase sampling, and two-phase sampling with auxiliary information. The package provides regression parameter estimates and their model-based and robust standard errors. It also offers tools to predict individual-specific hazards.
additiveDEA Additive Data Envelopment Analysis Models
Provides functions for calculating efficiency with two types of additive Data Envelopment Analysis models: (i) Generalized Efficiency Measures: unweighted additive model (Cooper et al., 2007 <doi:10.1007/978-0-387-45283-8>), Range Adjusted Measure (Cooper et al., 1999, <doi:10.1023/A:1007701304281>), Bounded Adjusted Measure (Cooper et al., 2011 <doi:10.1007/s11123-010-0190-2>), Measure of Inefficiency Proportions (Cooper et al., 1999 <doi:10.1023/A:1007701304281>), and the Lovell-Pastor Measure (Lovell and Pastor, 1995 <doi:10.1016/0167-6377(95)00044-5>); and (ii) the Slacks-Based Measure (Tone, 2001 <doi:10.1016/S0377-2217(99)00407-5>). The functions provide several options: (i) constant and variable returns to scale; (ii) fixed (non-controllable) inputs and/or outputs; (iii) bounding the slacks so that unrealistically large slack values are avoided; and (iv) calculating the efficiency of specific Decision-Making Units (DMUs), rather than of the whole sample. Package additiveDEA also provides a function for reducing computation time when datasets are large.
ADDT A Package for Analysis of Accelerated Destructive Degradation Test Data
Accelerated destructive degradation tests (ADDT) are often used to collect necessary data for assessing the long-term properties of polymeric materials. Based on the collected data, a thermal index (TI) is estimated. The TI can be useful for material rating and comparison. This package performs the least squares (LS) and maximum likelihood (ML) procedures for estimating TI for polymeric materials. The LS approach is a two-step approach that is currently used in industrial standards, while the ML procedure is widely used in the statistical literature. The ML approach allows one to do statistical inference such as quantifying uncertainties in estimation, hypothesis testing, and predictions. Two publicly available datasets are provided to allow users to experiment and practice with the functions.
adeba Adaptive Density Estimation by Bayesian Averaging
Univariate and multivariate non-parametric kernel density estimation with adaptive bandwidth using a Bayesian approach to Abramson’s square root law.
adegraphics An S4 Lattice-Based Package for the Representation of Multivariate Data
Graphical functionalities for the representation of multivariate data. It is a complete re-implementation of the functions available in the ‘ade4’ package.
adepro A Shiny Application for the (Audio-)Visualization of Adverse Event Profiles
The name of this package is an abbreviation for Animation of Adverse Event Profiles and refers to a shiny application which (audio-)visualizes adverse events occurring in clinical trials. As this data is usually considered sensitive, this tool is provided as a stand-alone application that can be launched from any local machine on which the data is stored.
adept Adaptive Empirical Pattern Transformation
Performs fast, accurate segmentation of walking strides from high-density data collected with a wearable accelerometer during continuous walking.
adespatial Multivariate Multiscale Spatial Analysis
Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran’s Eigenvectors Maps, MEM).
adjclust Adjacency-Constrained Clustering of a Block-Diagonal Similarity Matrix
Implements a constrained version of hierarchical agglomerative clustering, in which each observation is associated to a position, and only adjacent clusters can be merged. Typical application fields in bioinformatics include Genome-Wide Association Studies or Hi-C data analysis, where the similarity between items is a decreasing function of their genomic distance. Taking advantage of this feature, the implemented algorithm is time and memory efficient. This algorithm is described in Chapter 4 of Alia Dehman (2015) <https://…/tel-01288568v1>.
adjustedcranlogs Remove Automated and Repeated Downloads from ‘RStudio’ ‘CRAN’ Download Logs
Adjusts output of ‘cranlogs’ package to account for ‘CRAN’-wide daily automated downloads and re-downloads caused by package updates.
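A minimal sketch, assuming the exported function is adj_cran_downloads() with ‘cranlogs’-style arguments:
    library(adjustedcranlogs)
    # Adjusted daily download counts for one package over the last month
    # (function name and arguments assumed to mirror cranlogs::cran_downloads())
    dl <- adj_cran_downloads("ggplot2", when = "last-month")
    head(dl)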
admisc Adrian Dusa’s Miscellaneous
Contains functions used across the packages ‘QCA’, ‘DDIwR’, and ‘venn’. Interprets and translates DNF (Disjunctive Normal Form) expressions, for both binary and multi-value crisp sets, and extracts information (set names, set values) from those expressions. Other functions check whether a vector is possibly numeric (even if the numbers reside in a character vector) and coerce it to numeric, or check whether the numbers are whole. The package also offers, among many other things, a highly flexible recoding function.
ADMM Algorithms using Alternating Direction Method of Multipliers
Provides algorithms to solve popular optimization problems in statistics, such as regression or denoising, based on the Alternating Direction Method of Multipliers (ADMM). See Boyd et al. (2010) <doi:10.1561/2200000016> for a complete introduction to the method.
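For instance, a lasso sketch (assuming the package exports an admm.lasso(A, b, lambda) solver that returns the coefficient vector in $x):
    library(ADMM)
    set.seed(1)
    A  <- matrix(rnorm(100 * 20), 100, 20)            # design matrix
    x0 <- c(rep(2, 3), rep(0, 17))                    # sparse ground truth
    b  <- as.vector(A %*% x0 + rnorm(100, sd = 0.1))  # noisy response
    fit <- admm.lasso(A, b, lambda = 0.5)             # solve the lasso problem via ADMM
    round(fit$x, 2)                                   # recovered coefficients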
ADMMnet Regularized Model with Selecting the Number of Non-Zeros
Fits linear and Cox models regularized with net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalties, and their adaptive forms, such as adaptive lasso and net adjusting for signs of linked coefficients. In addition, it treats the number of non-zero coefficients as another tuning parameter and selects it simultaneously with the regularization parameter. The package uses a one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of the coefficients.
ADMMsigma Penalized Precision Matrix Estimation via ADMM
Estimates a penalized precision matrix via the alternating direction method of multipliers (ADMM) algorithm. It currently supports a general elastic-net penalty that allows for both ridge and lasso-type penalties as special cases. This package is an alternative to the ‘glasso’ package. See Boyd et al (2010) <doi:10.1561/2200000016> for details regarding the estimation method.
adnuts No-U-Turn MCMC Sampling for ‘ADMB’ and ‘TMB’ Models
Bayesian inference using the no-U-turn (NUTS) algorithm by Hoffman and Gelman (2014) <http://…/hoffman14a.html>. Designed for ‘AD Model Builder’ (‘ADMB’) models, or when R functions for log-density and log-density gradient are available, such as ‘Template Model Builder’ (‘TMB’) models and other special cases. Functionality is similar to ‘Stan’, and the ‘rstan’ and ‘shinystan’ packages are used for diagnostics and inference.
adoption Modelling Adoption Process in Marketing
The classical Bass (1969) <doi:10.1287/mnsc.15.5.215> model and the agent based models, such as that by Goldenberg, Libai and Muller (2010) <doi:10.1016/j.ijresmar.2009.06.006> have been two different approaches to model adoption processes in marketing. These two approaches can be unified by explicitly modelling the utility functions. This package provides a GUI that allows, in a unified way, the modelling of these two processes and other processes.
adoptr Adaptive Optimal Two-Stage Designs in R
Optimize one- or two-arm, two-stage designs for clinical trials with respect to several pre-implemented objective criteria, or implement custom objectives. Optimization under uncertainty and conditional (given stage-one outcome) constraints are supported.
ADPclust Fast Clustering Using Adaptive Density Peak Detection
An implementation of the ADPclust clustering procedure (Fast Clustering Using Adaptive Density Peak Detection), building on and improving the idea of Rodriguez and Laio (2014). ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid selection and parameter optimization algorithm, which finds the number of clusters and the cluster centroids by comparing average silhouettes on a grid of candidate clustering results. It also includes a user-interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional density-distance plot.
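A minimal sketch of the automatic variant (adpclust() assumed as the main entry point, per the package documentation):
    library(ADPclust)
    # Automatic centroid selection and parameter optimization on a numeric data set
    ans <- adpclust(iris[, 1:4])
    str(ans, max.level = 1)  # inspect the returned clustering object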
ADPF Use Least Squares Polynomial Regression and Statistical Testing to Improve Savitzky-Golay
Takes a vector or matrix of data and smooths it with an improved Savitzky-Golay transform. The Savitzky-Golay method for data smoothing and differentiation calculates convolution weights using Gram polynomials that exactly reproduce the results of least-squares polynomial regression. Use of the Savitzky-Golay method requires specification of both filter length and polynomial degree to calculate convolution weights. For maximum smoothing of statistical noise in data, polynomials with low degrees are desirable, while a high polynomial degree is necessary for accurate reproduction of peaks in the data. Extension of the least-squares regression formalism with statistical testing of additional terms of polynomial degree added to a heuristically chosen minimum for each data window leads to an adaptive-degree polynomial filter (ADPF). Based on noise reduction for data that consist of pure noise and on signal reproduction for data that are purely signal, ADPF performed nearly as well as the optimally chosen fixed-degree Savitzky-Golay filter and outperformed sub-optimally chosen Savitzky-Golay filters. For synthetic data consisting of noise and signal, ADPF outperformed both optimally chosen and sub-optimally chosen fixed-degree Savitzky-Golay filters. See Barak, P. (1995) <doi:10.1021/ac00113a006> for more information.
adpss Design and Analysis of Locally or Globally Efficient Adaptive Designs
Provides functions for planning and conducting a clinical trial with adaptive sample size determination. Maximal statistical efficiency is exploited even when dramatic or multiple adaptations are made. Such a trial consists of adaptive determination of sample size at an interim analysis and implementation of a frequentist statistical test at the interim and final analyses with a prefixed significance level. The required assumptions for the stage-wise test statistics are independent and stationary increments and normality. Predetermination of the adaptation rule is not required.
advclust Object Oriented Advanced Clustering
S4 object-oriented implementation of advanced fuzzy clustering and fuzzy consensus clustering. Techniques provided by this package are Fuzzy C-Means, Gustafson-Kessel (Babuska version), Gath-Geva, sum voting consensus, product voting consensus, and Borda voting consensus. The package also provides visualization via biplots and radar plots.
AEDForecasting Change Point Analysis in ARIMA Forecasting
Incorporates change point analysis into ARIMA forecasting.
afc Generalized Discrimination Score
This is an implementation of the Generalized Discrimination Score (also known as Two Alternatives Forced Choice Score, 2AFC) for various representations of forecasts and verifying observations. The Generalized Discrimination Score is a generic forecast verification framework which can be applied to any of the following verification contexts: dichotomous, polychotomous (ordinal and nominal), continuous, probabilistic, and ensemble. A comprehensive description of the Generalized Discrimination Score, including all equations used in this package, is provided by Mason and Weigel (2009) <doi:10.1175/MWR-D-10-05069.1>.
afCEC Active Function Cross-Entropy Clustering
Active function cross-entropy clustering partitions n-dimensional data into clusters by finding the parameters of the mixed generalized multivariate normal distribution that optimally approximates the scattering of the data in the n-dimensional space, whose density function is of the form: p_1·N(μ_1, σ̂_1, σ_1, f_1) + … + p_k·N(μ_k, σ̂_k, σ_k, f_k). The above-mentioned generalization is performed by introducing so-called ‘f-adapted Gaussian densities’ (i.e. ordinary Gaussian densities adapted by the ‘active function’). Additionally, active function cross-entropy clustering performs automatic reduction of unnecessary clusters. For more information please refer to P. Spurek, J. Tabor, K. Byrski, ‘Active function Cross-Entropy Clustering’ (2017) <doi:10.1016/j.eswa.2016.12.011>.
afex Analysis of Factorial Experiments
Convenience functions for analyzing factorial experiments using ANOVA or mixed models. aov_ez(), aov_car(), and aov_4() allow specification of between, within (i.e., repeated-measures), or mixed between-within (i.e., split-plot) ANOVAs for data in long format (i.e., one observation per row), aggregating multiple observations per individual and cell of the design. mixed() fits mixed models using lme4::lmer() and computes p-values for all fixed effects using either Kenward-Roger or Satterthwaite approximation for degrees of freedom (LMM only), parametric bootstrap (LMMs and GLMMs), or likelihood ratio tests (LMMs and GLMMs). afex uses type 3 sums of squares as default (imitating commercial statistical software).
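For example, a mixed between-within (split-plot) ANOVA on the obk.long data set shipped with afex:
    library(afex)
    data(obk.long)  # long-format example data: one between factor, two within factors
    aov_ez(id = "id", dv = "value", data = obk.long,
           between = "treatment", within = c("phase", "hour"))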
affluenceIndex Affluence Indices
Computes the statistical indices of affluence (richness) and constructs bootstrap confidence intervals for these indices. Also computes the Wolfson polarization index.
AFheritability The Attributable Fraction (AF) Described as a Function of Disease Heritability, Prevalence and Intervention Specific Factors
AFfunction() returns an estimate of the Attributable Fraction (AF) and a plot of the AF as a function of heritability, disease prevalence, size of the target group, and intervention effect. Since the AF is a function of several factors, a shiny app is used to better illustrate how the relationship between the AF and heritability depends on several other factors. The app is run by the function runShinyApp(). For more information see Dahlqwist E et al. (2019) <doi:10.1007/s00439-019-02006-8>.
AFM Atomic Force Microscope Image Analysis
Provides analysis of Atomic Force Microscope images, such as Power Spectrum Density, roughness against length scale, variogram and variance, and fractal dimension and scale.
after Run Code in the Background
Run an R function in the background, possibly after a delay. The current version uses the Tcl event loop and was ported from the ‘tcltk2’ package.
aftgee Accelerated Failure Time Model with Generalized Estimating Equations
Features both rank-based and least-squares estimates for the Accelerated Failure Time (AFT) model. For rank-based estimation, it provides approaches that include the computationally efficient Gehan weight and general weights such as the logrank weight. For least-squares estimation, the estimating equation is solved with Generalized Estimating Equations (GEE). Moreover, in multivariate cases, the dependence working correlation structure can be specified in the GEE setting.
aggregation p-Value Aggregation Methods
Contains functionality for performing the following methods of p-value aggregation: Fisher’s method [Fisher, RA (1932, ISBN: 9780028447308)], the Lancaster method (weighted Fisher’s method) [Lancaster, HO (1961, <doi:10.1111/j.1467-842X.1961.tb00058.x>)], and Sidak correction (minimum p-value method with correction) [Sidak, Z (1967, <doi:10.1080/01621459.1967.10482935>)].
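A minimal sketch (fisher(), lancaster() and sidak() are assumed to be the exported function names, with lancaster() taking a weight vector):
    library(aggregation)
    p <- c(0.01, 0.20, 0.05)
    fisher(p)                  # Fisher's method
    lancaster(p, c(2, 1, 1))   # weighted Fisher's (Lancaster) method
    sidak(p)                   # minimum p-value with Sidak correction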
agRee Various Methods for Measuring Agreement
Bland-Altman plot and scatter plot with identity line for visualization and point and interval estimates for different metrics related to reproducibility/repeatability/agreement including the concordance correlation coefficient, intraclass correlation coefficient, within-subject coefficient of variation, smallest detectable difference, and mean normalized smallest detectable difference.
AgreementInterval Agreement Interval of Two Measurement Methods
A tool for calculating the agreement interval of two measurement methods (Jason Liao (2015) <doi:10.1515/ijb-2014-0030>) and presenting the results in plots with the discordance rate and/or a clinically meaningful limit to quantify agreement quality.
agriTutorial Tutorial Analysis of Some Agricultural Experiments
Example software for the analysis of data from designed experiments, especially agricultural crop experiments. The basics of the statistical analysis of designed experiments are discussed using real examples from agricultural field trials. Various statistical methods are exemplified using a range of R packages. The experimental data is made available as separate data sets for each example, and the R analysis code is provided as example code that can be readily extended as required.
AhoCorasickTrie Fast Searching for Multiple Keywords in Multiple Texts
Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the number of keywords (N) and the size of the text (M). Compare this to the naive approach which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. If you want to search multiple texts with the same trie, the function will take a list or vector of texts and return a list of matches to each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported.
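For example, a single build-and-search call (AhoCorasickSearch() assumed as the exported function):
    library(AhoCorasickTrie)
    # Build the trie from the keywords and search both texts in one pass
    keywords <- c("he", "she", "his", "hers")
    texts    <- c("ushers", "hershey")
    AhoCorasickSearch(keywords, texts)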
ahp Analytical Hierarchy Process (AHP) with R
Model and analyse complex decision-making problems using the Analytic Hierarchy Process (AHP).
AHR Estimation and Testing of Average Hazard Ratios
Methods for estimation of multivariate average hazard ratios as defined by Kalbfleisch and Prentice. The underlying survival functions of the event of interest in each group can be estimated using either the (weighted) Kaplan-Meier estimator or the Aalen-Johansen estimator for the transition probabilities in Markov multi-state models. Right-censored and left-truncated data is supported. Moreover, the difference in restricted mean survival can be estimated.
Ake Associated Kernel Estimations
Continuous and discrete (count or categorical) estimation of density, probability mass function (pmf) and regression functions are performed using associated kernels. The cross-validation technique and the local Bayesian procedure are also implemented for bandwidth selection.
akmedoids Anchored Kmedoids for Longitudinal Data Clustering
Advances a novel adaptation of the longitudinal k-means clustering technique (Genolini et al. (2015) <doi:10.18637/jss.v065.i04>) for grouping trajectories based on the similarities of their long-term trends, and determines the optimal solution based on the Calinski-Harabasz criterion (Calinski and Harabasz (1974) <doi:10.1080/03610927408827101>). Includes functions to extract descriptive statistics and generate a visualisation of the resulting groups, drawing methods from the ‘ggplot2’ library (Wickham H. (2016) <doi:10.1007/978-3-319-24277-4>). The package also includes a number of other useful functions for exploring and manipulating longitudinal data prior to the clustering process.
albopictus Age-Structured Population Dynamics Model
Implements discrete time deterministic and stochastic age-structured population dynamics models described in Erguler and others (2016) <doi:10.1371/journal.pone.0149282> and Erguler and others (2017) <doi:10.1371/journal.pone.0174293>.
ALEPlot Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots
Visualizes the main effects of individual predictor variables and their second-order interaction effects in black-box supervised learning models. The package creates either Accumulated Local Effects (ALE) plots and/or Partial Dependence (PD) plots, given a fitted supervised learning model.
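A minimal sketch (argument layout per the package documentation, ALEPlot(X, X.model, pred.fun, J, K), where pred.fun wraps predict()):
    library(ALEPlot)
    fit <- lm(mpg ~ ., data = mtcars)
    # pred.fun must accept the model and new data and return numeric predictions
    pred_fun <- function(X.model, newdata) predict(X.model, newdata = newdata)
    X <- mtcars[, -1]  # predictor columns only
    # Main-effect ALE plot for predictor "wt" (J is its column index)
    ALEPlot(X, fit, pred.fun = pred_fun, J = which(names(X) == "wt"), K = 40)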
algorithmia Allows you to Easily Interact with the Algorithmia Platform
The company Algorithmia houses the largest marketplace of online algorithms. This package provides REST wrappers that make it easy to call algorithms on the Algorithmia platform and to access files and directories in the Algorithmia data API. To learn more about the services they offer and the algorithms in the platform visit <http://algorithmia.com>. More information for developers can be found at <http://developers.algorithmia.com>.
algstat Algebraic statistics in R
Provides functionality for algebraic statistics in R. Current applications include exact inference in log-linear models for contingency table data, analysis of ranked and partially ranked data, and general-purpose tools for multivariate polynomials, building on the mpoly package. To aid in the process, algstat has ports to Macaulay2, Bertini, LattE-integrale and 4ti2.
alignfigR Visualizing Multiple Sequence Alignments with ‘ggplot2’
Create extensible figures of multiple sequence alignments, using the ‘ggplot2’ plotting engine. ‘alignfigR’ will create a baseline figure of a multiple sequence alignment which can be fully customized to the user’s liking with standard ‘ggplot2’ features.
AlignStat Comparison of Alternative Multiple Sequence Alignments
Methods for comparing two alternative multiple sequence alignments (MSAs) to determine whether they align homologous residues in the same columns as one another. Similarities and differences are classified into conserved gaps, conserved sequence, merges, splits, or shifts of one MSA relative to the other. Summarising these categories for each MSA column yields information on which sequence regions are agreed upon by both MSAs, and which differ. Several plotting functions enable easy visualisation of the comparison data for analysis.
alineR Alignment of Phonetic Sequence Using the ‘ALINE’ Algorithm
Functions are provided to calculate the ‘ALINE’ distance between a cognate pair. The score is based on phonetic features represented using the Unicode-compliant International Phonetic Alphabet (IPA). Parameterized feature weights are used to determine the optimal alignment, and functions are provided to estimate optimum values. This project was funded by the National Science Foundation Cultural Anthropology Program (Grant number SBS-1030031) and the University of Maryland College of Behavioral and Social Sciences.
allanvar Allan Variance Analysis
A collection of tools for stochastic sensor error characterization using the Allan Variance technique originally developed by D. Allan.
alluvial Alluvial Diagrams
Creating alluvial diagrams (also known as parallel sets plots) for multivariate and time series-like data.
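For example, using the Titanic data (mirroring the package’s well-known example):
    library(alluvial)
    tit <- as.data.frame(Titanic)
    # One ribbon per Class/Sex/Age/Survived combination, width proportional to frequency
    alluvial(tit[, 1:4], freq = tit$Freq,
             col = ifelse(tit$Survived == "Yes", "orange", "grey"))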
alpaca Fit GLMs with High-Dimensional k-Way Fixed Effects
Provides a routine to concentrate out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (GLM). The package is based on the algorithm proposed by Stammann (2018) <arXiv:1707.01815> and is restricted to non-linear GLMs estimated by maximum likelihood. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine. The package also includes robust and multi-way clustered standard errors.
alphaOutlier Obtain Alpha-Outlier Regions for Well-Known Probability Distributions
Given the parameters of a distribution, the package uses the concept of alpha-outliers by Davies and Gather (1993) to flag outliers in a data set. See Davies, L. and Gather, U. (1993), ‘The identification of multiple outliers’, JASA, 88(423), 782-792, <doi:10.1080/01621459.1993.10476339> for details.
alphashape3d Implementation of the 3D Alpha-Shape for the Reconstruction of 3D Sets from a Point Cloud
Implementation in R of the alpha-shape of a finite set of points in three-dimensional space. The alpha-shape generalizes the convex hull and makes it possible to recover the shape of non-convex and even non-connected sets in 3D, given a random sample of points taken from it. Besides the computation of the alpha-shape, this package provides functions to compute the volume of the alpha-shape, identify the connected components, and facilitate three-dimensional graphical visualization of the estimated set.
alphastable Inference for Stable Distribution
Performs the following tasks: (1) computes the probability density function and distribution function of a univariate stable distribution; (2) generates realizations from univariate stable, truncated stable, multivariate elliptically contoured stable, and bivariate strictly stable distributions; (3) estimates the parameters of univariate symmetric stable, skew stable, Cauchy, multivariate elliptically contoured stable, and multivariate strictly stable distributions; and (4) estimates the parameters of mixtures of symmetric stable and mixtures of Cauchy distributions.
AlphaVantageClient Wrapper for Alpha Vantage API
Download data from the Alpha Vantage API (<https://…/> ). Alpha Vantage is a RESTful API which provides various financial data, including stock prices and technical indicators. There is documentation for the underlying API available here: <https://…/>. To get access to this API, the user needs to first claim an API key: <https://…/>.
alphavantager Lightweight R Interface to the Alpha Vantage API
Alpha Vantage has free historical financial information. All you need to do is get a free API key at <https://www.alphavantage.co>. Then you can use the R interface to retrieve free equity information. Refer to the Alpha Vantage website for more information.
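A minimal sketch (av_api_key() and av_get() per the package documentation; the key below is a placeholder):
    library(alphavantager)
    av_api_key("YOUR_API_KEY")  # placeholder: claim a free key on the Alpha Vantage site
    # Daily price series for one ticker via the TIME_SERIES_DAILY endpoint
    av_get(symbol = "AAPL", av_fun = "TIME_SERIES_DAILY")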
alR Arc Lengths of Statistical Functions
Estimation, regression and classification using arc lengths.
altmeta Alternative Meta-Analysis Methods
Provides alternative statistical methods for meta-analysis, including new heterogeneity tests, estimators of between-study variance, and heterogeneity measures that are robust to outliers.
ambient A Generator of Multidimensional Noise
Generation of natural-looking noise has many applications within simulation, procedural generation, and art, to name a few. The ‘ambient’ package provides an interface to the ‘FastNoise’ C++ library and allows for efficient generation of Perlin, simplex, Worley, cubic, value, and white noise with optional perturbation, in either 2, 3, or 4 (in the case of simplex and white noise) dimensions.
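For instance, a quick look at 2D Perlin noise (noise_perlin() assumed from the package’s noise_* generators):
    library(ambient)
    m <- noise_perlin(dim = c(256, 256))  # 256 x 256 matrix of Perlin noise values
    image(m, col = grey.colors(64), axes = FALSE)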
AMCTestmakeR Generate LaTeX Code for Auto-Multiple-Choice (AMC)
Generate LaTeX code for use with the free Optical Mark Recognition software Auto Multiple Choice (AMC). More specifically, this package provides functions that take question and answer texts as input and output the LaTeX code for AMC.
amelie Anomaly Detection with Normal Probability Functions
Implements anomaly detection as binary classification for cross-sectional data. Uses maximum likelihood estimates and normal probability functions to classify observations as anomalous. The method is presented in the following lecture from the Machine Learning course by Andrew Ng: <https://…/>, and is also described in: Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, Jaideep Srivastava (2003) <doi:10.1137/1.9781611972733.3>.
AMIAS Alternating Minimization Induced Active Set Algorithms
An implementation of alternating minimization induced active set (AMIAS) method for solving the L0 regularized learning problems. It includes a piecewise smooth estimator by minimizing the least squares function with constraints on the number of kink points in the discrete derivatives. It also includes generalized structural sparsity via composite L0 penalty. Both time series and image segmentation can be handled by this package.
ammistability Additive Main Effects and Multiplicative Interaction Model Stability Parameters
Computes various stability parameters from Additive Main Effects and Multiplicative Interaction (AMMI) analysis results such as Modified AMMI Stability Value (MASV), Sums of the Absolute Value of the Interaction Principal Component Scores (SIPC), Sum Across Environments of Genotype-Environment Interaction Modelled by AMMI (AMGE), Sum Across Environments of Absolute Value of Genotype-Environment Interaction Modelled by AMMI (AV_(AMGE)), AMMI Stability Index (ASI), Modified ASI (MASI), AMMI Based Stability Parameter (ASTAB), Annicchiarico’s D Parameter (DA), Zhang’s D Parameter (DZ), Averages of the Squared Eigenvector Values (EV), Stability Measure Based on Fitted AMMI Model (FA), Absolute Value of the Relative Contribution of IPCs to the Interaction (Za). Further calculates the Simultaneous Selection Index for Yield and Stability from the computed stability parameters. See the vignette for complete list of citations for the methods implemented.
aMNLFA Automated Fitting of Moderated Nonlinear Factor Analysis Through the ‘Mplus’ Program
Automated generation, running, and interpretation of moderated nonlinear factor analysis models for obtaining scores from observed variables. This package creates ‘Mplus’ input files which may be run iteratively to test two different types of covariate effects on items: (1) latent variable impact (both mean and variance); and (2) differential item functioning. After sequentially testing for all effects, it also creates a final model by including all significant effects after adjusting for multiple comparisons. Finally, the package creates a scoring model which uses the final values of parameter estimates to generate latent variable scores.
ampd An Algorithm for Automatic Peak Detection in Noisy Periodic and Quasi-Periodic Signals
A method for automatic detection of peaks in noisy periodic and quasi-periodic signals. This method, called automatic multiscale-based peak detection (AMPD), is based on the calculation and analysis of the local maxima scalogram, a matrix comprising the scale-dependent occurrences of local maxima.
analyz Model Layer for Automatic Data Analysis
Class with methods to read and execute R commands described as steps in a CSV file.
anapuce Tools for Microarray Data Analysis
Functions for normalisation and differential analysis of microarray data, and for the local False Discovery Rate.
anfis Adaptive Neuro Fuzzy Inference System in R
The package implements an ANFIS Type 3 Takagi and Sugeno fuzzy if-then rule network with the following features: (1) an independent number of membership functions (MF) for each input, with different extensible MF types; (2) Type 3 Takagi and Sugeno fuzzy if-then rules; (3) full rule combinations, e.g. 2 inputs with 2 membership functions -> 4 fuzzy rules; (4) hybrid learning, i.e. gradient descent for the premises and least-squares estimation for the consequents; (5) multiple outputs.
aniDom Inferring Dominance Hierarchies and Estimating Uncertainty
Provides: (1) Tools to infer dominance hierarchies based on calculating Elo scores, but with custom functions to improve estimates in animals with relatively stable dominance ranks. (2) Tools to plot the shape of the dominance hierarchy and estimate the uncertainty of a given data set.
anipaths Animate Paths
Animation of observed trajectories using spline-based interpolation (see for example, Buderman, F. E., Hooten, M. B., Ivan, J. S. and Shenk, T. M. (2016) <doi:10.1111/2041-210X.12465>, ‘A functional model for characterizing long-distance movement behaviour’, Methods Ecol Evol). Intended for use in exploratory data analysis, and perhaps for preparation of presentations.
ANLP Build Text Prediction Model
Tools to sample and clean text data, build N-gram models, apply the backoff algorithm, etc.
anMC Compute High Dimensional Orthant Probabilities
Computationally efficient method to estimate orthant probabilities of high-dimensional Gaussian vectors. Further implements a function to compute conservative estimates of excursion sets under Gaussian random field priors.
ANN2 Artificial Neural Networks for Anomaly Detection
Training of general classification and regression neural networks using gradient descent. Special features include a function for training autoencoders as well as an implementation of replicator neural networks; for details see Hawkins et al. (2002) <doi:10.1007/3-540-46145-0_17>. Multiple activation and cost functions (including Huber and pseudo-Huber) are included, as well as L1 and L2 regularization, momentum, early stopping and the possibility to specify a learning rate schedule. The package contains a vectorized gradient descent implementation which facilitates faster training through batch learning.
AnnuityRIR Annuity Random Interest Rates
Annuity Random Interest Rates proposes different techniques for the approximation of the present and final value of a unitary annuity-due or annuity-immediate considering interest rate as a random variable. Cruz Rambaud et al. (2017) <doi:10.1007/978-3-319-54819-7_16>. Cruz Rambaud et al. (2015) <doi:10.23755/rm.v28i1.25>.
anocva A Non-Parametric Statistical Test to Compare Clustering Structures
Provides ANOCVA (ANalysis Of Cluster VAriability), a non-parametric statistical test to compare clustering structures with applications in functional magnetic resonance imaging data (fMRI). The ANOCVA allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering.
ANOM Analysis of Means
Analysis of means (ANOM) as used in technometrical computing. The package takes results from multiple comparisons with the grand mean (obtained with multcomp, SimComp, nparcomp, or MCPAN) or corresponding simultaneous confidence intervals as input and produces ANOM decision charts that illustrate which group means deviate significantly from the grand mean.
anomalous Anomalous time series package for R
It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. A common use-case is to identify servers that are behaving unusually. Methods in this package compute a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then a robust principal component decomposition is used on the features, and various bivariate outlier detection methods are applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and alpha-hulls. For demo purposes, this package contains both synthetic and real data from Yahoo.
anomalous-acm Anomalous time series package for R (ACM)
See the ‘anomalous’ package above; the functionality and description are identical.
anomaly Detecting Anomalies in Data
An implementation of CAPA (Collective And Point Anomaly) by Fisch, Eckley and Fearnhead (2018) <arXiv:1806.01947> for the detection of anomalies in time series data. The package also contains Kepler lightcurve data and shows how CAPA can be applied to detect exoplanets.
anomalyDetection Implementation of Augmented Network Log Anomaly Detection Procedures
Implements procedures to aid in detecting network log anomalies. By combining various multivariate analytic approaches relevant to network anomaly detection, it provides cyber analysts efficient means to detect suspected anomalies requiring further evaluation.
AnomalyDetection Anomaly Detection with R
AnomalyDetection is an open-source R package for detecting anomalies that is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. It can be used in a wide variety of contexts, for example detecting anomalies in system metrics after a new software release, in user engagement after an A/B test, or in problems in econometrics, financial engineering, and the political and social sciences.
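For example, on the example series bundled with the package (mirroring the package README):
    library(AnomalyDetection)
    data(raw_data)  # example minute-level time series shipped with the package
    res <- AnomalyDetectionTs(raw_data, max_anoms = 0.02,
                              direction = "both", plot = TRUE)
    res$anoms  # data frame of detected anomalies
    res$plot   # plot of the series with anomalies highlighted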
anominate Alpha-NOMINATE Ideal Point Estimator
Fits ideal point model described in Carroll, Lewis, Lo, Poole and Rosenthal (2013), ‘The Structure of Utility in Models of Spatial Voting,’ American Journal of Political Science 57(4): 1008–1028, <doi:10.1111/ajps.12029>.
anonymizer Anonymize Data Containing Personally Identifiable Information
Allows users to quickly and easily anonymize data containing Personally Identifiable Information (PII) through convenience functions.
ANOVAShiny Interactive Document for Working with Analysis of Variance
An interactive document on the topic of one-way and two-way analysis of variance using the ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package functions as well as at <https://…/>.
antaresViz Antares Visualizations
Visualize results generated by Antares, a powerful software developed by RTE to simulate and study electric power systems (more information about Antares here: <https://antares.rte-france.com>). This package provides functions that create interactive charts to help Antares users visually explore the results of their simulations. You can see the results of several Antares studies here: <http://…/>.
antiword Extract Text from Microsoft Word Documents
Wraps the ‘AntiWord’ utility to extract text from Microsoft Word documents. The utility only supports the old ‘doc’ format, not the new xml based ‘docx’ format.
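For example (the file name is hypothetical):
    library(antiword)
    text <- antiword("report.doc")  # extract the plain text of a legacy .doc file
    cat(text)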
anyLib Install and Load Any Package from CRAN, Bioconductor or Github
Simplifies package handling by installing and loading a list of packages, whether they are on CRAN, Bioconductor or GitHub. For GitHub packages, if you do not provide the full path including the maintainer name (e.g. ‘achateigner/topReviGO’), ‘anyLib’ will be able to load the package but not to install it.
anytime Anything to ‘POSIXct’ Converter
Convert input in character, integer, or numeric form into ‘POSIXct’ objects, using one of a number of predefined formats, and relying on Boost facilities for date and time parsing.
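For example:
    library(anytime)
    anytime("2016-09-12 07:08:09")  # character input
    anytime(20160912L)              # integer input in compact yyyymmdd form
    anydate("2016/09/12")           # Date-returning counterpart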
Aoptbdtvc A-Optimal Block Designs for Comparing Test Treatments with Controls
A collection of functions to construct A-optimal block designs for comparing test treatments with one or more control(s). Mainly A-optimal balanced treatment incomplete block designs, weighted A-optimal balanced treatment incomplete block designs, A-optimal group divisible treatment designs and A-optimal balanced bipartite block designs can be constructed using the package. The designs are constructed using algorithms based on linear integer programming. To the best of our knowledge, these facilities to construct A-optimal block designs for comparing test treatments with one or more controls are not available in the existing R packages. For more details on designs for tests versus control(s) comparisons, please see Hedayat, A. S. and Majumdar, D. (1984) <doi:10.1080/00401706.1984.10487989> A-Optimal Incomplete Block Designs for Control-Test Treatment Comparisons, Technometrics, 26, 363-370, and Mandal, B. N., Gupta, V. K. and Parsad, Rajender (2017) <doi:10.1080/03610926.2015.1071394> Balanced treatment incomplete block designs through integer programming, Communications in Statistics – Theory and Methods, 46(8), 3728-3737.
apa Format Outputs of Statistical Tests According to APA Guidelines
Formatter functions in the ‘apa’ package take the return value of a statistical test function, e.g. a call to chisq.test() and return a string formatted according to the guidelines of the APA (American Psychological Association).
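A minimal sketch (chisq_apa() assumed as the formatter paired with chisq.test(), following the package’s naming scheme):
    library(apa)
    # Format a chi-squared test of independence as an APA-style string
    tab <- table(mtcars$cyl, mtcars$am)
    chisq_apa(chisq.test(tab))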
ApacheLogProcessor Process the Apache Web Server Log Files
Provides capabilities to process Apache HTTPD log files. The main functionalities are extracting data from access and error log files into data frames.
apc Age-Period-Cohort Analysis
Functions for age-period-cohort analysis. The data can be organised in matrices indexed by age-cohort, age-period or cohort-period. The data can include dose and response or just doses. The statistical model is a generalized linear model (GLM) allowing for 3, 2, 1 or 0 of the age-period-cohort factors. The canonical parametrisation of Kuang, Nielsen and Nielsen (2008) is used, so the analysis does not rely on ad hoc identification. See also the accompanying article ‘apc: An R Package for Age-Period-Cohort Analysis’.
apcf Adapted Pair Correlation Function
The adapted pair correlation function transfers the concept of the pair correlation function from point patterns to patterns of objects of finite size and irregular shape (e.g. lakes within a country). This is a reimplementation of the method suggested by Nuske et al. (2009) <doi:10.1016/j.foreco.2009.09.050> using the libraries ‘GEOS’ and ‘GDAL’ directly instead of through ‘PostGIS’.
apdesign An Implementation of the Additive Polynomial Design Matrix
An implementation of the additive polynomial (AP) design matrix. It constructs and appends an AP design matrix to a data frame for use with longitudinal data subject to seasonality.
APfun Geo-Processing Base Functions
Base tools for facilitating the creation of geo-processing functions in R.
aphid Analysis with Profile Hidden Markov Models
Designed for the development and application of hidden Markov models and profile HMMs for biological sequence analysis. Contains functions for multiple and pairwise sequence alignment, model construction and parameter optimization, file import/export, implementation of the forward, backward and Viterbi algorithms for conditional sequence probabilities, tree-based sequence weighting, and sequence simulation. Features a wide variety of potential applications including database searching, gene-finding and annotation, phylogenetic analysis and sequence classification.
APML0 Augmented and Penalized Minimization Method L0
Fits linear and Cox models regularized with L0, lasso (L1), elastic-net (L1 and L2), or net (L1 and Laplacian) penalties, and their adaptive forms, such as adaptive lasso / elastic-net and net adjusting for signs of linked coefficients. It solves the L0 penalty problem by simultaneously selecting regularization parameters and the number of non-zero coefficients. This augmented and penalized minimization method provides an approximate solution to the L0 penalty problem, but runs as fast as the L1 regularization problem. The package uses a one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of the coefficients. It can deal with very high-dimensional data and has superior selection performance.
apng Convert Png Files into Animated Png
Convert several png files into a single animated png file. This package exports only a single function, ‘apng’: call it with a vector of png file names to convert them into one animated png file.
apollo Tools for Estimating Discrete Choice Models
The Choice Modelling Centre at the University of Leeds has developed flexible estimation code for choice models in R. Users are able to write their own likelihood functions or use a mix of already available ones. Mixing, in the form of random coefficients and components, is allowed for all models. Both classical and Bayesian estimation are available. Multi-threaded processing is supported. For more information on discrete choice models see Train, K. (2009) <isbn:978-0-521-74738-7>.
APPEstimation Adjusted Prediction Model Performance Estimation
Calculating predictive model performance measures adjusted for predictor distributions using density ratio method (Sugiyama et al., (2012, ISBN:9781139035613)). L1 and L2 error for continuous outcome and C-statistics for binomial outcome are computed.
approxmatch Approximately Optimal Fine Balance Matching with Multiple Groups
Tools for constructing a matched design with multiple comparison groups. Further specifications of refined covariate balance restriction and exact match on covariate can be imposed. Matches are approximately optimal in the sense that the cost of the solution is at most twice the optimal cost, Crama and Spieksma (1992) <doi:10.1016/0377-2217(92)90078-N>.
apricom Tools for the a Priori Comparison of Regression Modelling Strategies
Tools to compare several model adjustment and validation methods prior to application in a final analysis.
APtools Average Positive Predictive Values (AP) for Binary Outcomes and Censored Event Times
We provide tools to estimate two prediction performance metrics, the average positive predictive values (AP) as well as the well-known AUC (the area under the receiver operator characteristic curve), for risk scores or markers. The outcome of interest is either binary or a censored event time. Note that for censored event times, the estimated AP and AUC are time-dependent for pre-specified time interval(s). A function that compares the APs of two risk scores/markers is also included. Optional outputs include positive predictive values and true positive fractions at the specified marker cut-off values, and a plot of the time-dependent AP versus time (available for event time data).
AR Another Look at the Acceptance-Rejection Method
In mathematics, ‘rejection sampling’ is a basic technique used to generate observations from a distribution. It is also commonly called the ‘Acceptance-Rejection method’ or ‘Accept-Reject algorithm’ and is a type of Monte Carlo method. The method is based on the observation that to sample a random variable one can sample uniformly from the 2D Cartesian graph and keep the samples in the region under the graph of its density function. Package ‘AR’ can generate/simulate random data from a probability density function by the Acceptance-Rejection method, and it is also a useful teaching resource for graphical presentation of the method. From a practical point of view, the user needs a constant for the Acceptance-Rejection method, which ‘AR’ can compute using optimization tools. Several numerical examples are provided to illustrate the graphical presentation of the Acceptance-Rejection method.
ar.matrix Simulate Auto Regressive Data from Precision Matrices
Using sparse precision matrices and Cholesky factorization, simulates data that are auto-regressive.
arabicStemR Arabic Stemmer for Text Analysis
Allows users to stem Arabic texts for text analysis.
arc Association Rule Classification
Implements the Classification Based on Association Rules (CBA) algorithm for association rule classification (ARC). The package also contains several convenience methods that automatically set CBA parameters (minimum confidence, minimum support), and it natively handles numeric attributes by integrating a pre-discretization step. The rule generation phase is handled by the ‘arules’ package.
ARCensReg Fitting Univariate Censored Linear Regression Model with Autoregressive Errors
Fits a univariate left- or right-censored linear regression model with autoregressive errors under the normal distribution. It provides estimates and standard errors of the parameters, supports prediction of future observations and missing values in the dependent variable, and produces convergence plots when at least one censored observation exists.
ArCo Artificial Counterfactual Package
Set of functions to analyse and estimate Artificial Counterfactual models from Carvalho, Masini and Medeiros (2016) <DOI:10.2139/ssrn.2823687>.
areaplot Stacked Area Plot
Produce a stacked area plot, or add polygons to an existing plot. The data can be a numeric vector, table, matrix, data frame, or a time-series object. Supports formula syntax and data can be plotted as proportions, so stacked areas equal 1.
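For instance, the built-in WorldPhones matrix can be drawn as a stacked area chart; the prop argument name follows the description above but is an assumption worth checking:

    library(areaplot)
    areaplot(WorldPhones)               # stacked counts, one area per region
    areaplot(WorldPhones, prop = TRUE)  # stacked proportions summing to 1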
arena2r Plots, Summary Statistics and Tools for Arena Simulation Users
Reads Arena <https://…/> CSV output files and generates nice tables and plots. The package contains a Shiny App that can be used to interactively visualize Arena’s results.
ArfimaMLM Arfima-MLM Estimation For Repeated Cross-Sectional Data
Functions to facilitate the estimation of Arfima-MLM models for repeated cross-sectional data and pooled cross-sectional time-series data (see Lebo and Weber 2015). The estimation procedure uses double filtering with Arfima methods to account for autocorrelation in repeated cross-sectional data followed by multilevel modeling (MLM) to estimate aggregate as well as individual-level parameters simultaneously.
argon2 Secure Password Hashing
Utilities for secure password hashing via the argon2 algorithm. It is a relatively new hashing algorithm and is believed to be very secure. The ‘argon2’ implementation included in the package is the reference implementation. The package also includes some utilities that should be useful for digest authentication, including a wrapper of ‘blake2b’. For similar R packages, see sodium and ‘bcrypt’. See <https://…/Argon2> or <https://…/430.pdf> for more information.
ArgumentCheck Improved Communication to Users with Respect to Problems in Function Arguments
The typical process of checking arguments in functions is iterative. In this process, an error may be returned and the user may fix it only to receive another error on a different argument. ‘ArgumentCheck’ facilitates a more helpful way to perform argument checks allowing the programmer to run all of the checks and then return all of the errors and warnings in a single message.
ARHT Adaptable Regularized Hotelling’s T^2 Test for High-Dimensional Data
Perform the Adaptable Regularized Hotelling’s T^2 test (ARHT) proposed by Li et al. (2016) <arXiv:1609.08725>. Both one-sample and two-sample mean tests are available with various probabilistic alternative prior models. It contains a function to consistently estimate higher-order moments of the population covariance spectral distribution using the spectrum of the sample covariance matrix (Bai et al. (2010) <doi:10.1111/j.1467-842X.2010.00590.x>). In addition, it contains a function to sample approximately from 3-variate chi-squared random vectors with a given correlation matrix when the degrees of freedom are large.
ari Automated R Instructor
Create videos from ‘R Markdown’ documents, or images and audio files. These images can come from image files or HTML slides, and the audio files can be provided by the user or computer voice narration can be created using ‘Amazon Polly’. The purpose of this package is to allow users to create accessible, translatable, and reproducible lecture videos. See <https://…/> for more information.
aricode Efficient Computations of Standard Clustering Comparison Measures
Implements an efficient O(n) algorithm based on bucket-sorting for fast computation of standard clustering comparison measures. Available measures include adjusted Rand index (ARI), normalized information distance (NID), normalized mutual information (NMI), normalized variation information (NVI) and entropy, as described in Vinh et al (2009) <doi:10.1145/1553374.1553511>.
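A short sketch comparing a k-means clustering of iris with the true species labels, assuming ARI() and NMI() as the exported measure functions:

    library(aricode)
    cl <- kmeans(iris[, 1:4], centers = 3)$cluster
    ARI(cl, iris$Species)  # adjusted Rand index
    NMI(cl, iris$Species)  # normalized mutual information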
arkdb Archive and Unarchive Databases Using Flat Files
Flat text files provide a more robust, compressible, and portable way to store tables. This package provides convenient functions for exporting tables from relational database connections into compressed text files and streaming those text files back into a database without requiring the whole table to fit in working memory.
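A rough sketch of the round trip, assuming ark() and unark() as the two main verbs and an ‘RSQLite’ connection:

    library(arkdb)
    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), "mydb.sqlite")
    dir.create("archive", showWarnings = FALSE)
    ark(con, dir = "archive")  # export every table to compressed text files
    dbDisconnect(con)
    con2 <- dbConnect(RSQLite::SQLite(), "copy.sqlite")
    unark(list.files("archive", full.names = TRUE), con2)  # stream them back in
    dbDisconnect(con2)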
arpr Advanced R Pipes
Provides convenience functions for programming with magrittr pipes. Conditional pipes, a string prefixer and a function to pipe the given object into a specific argument given by character name are currently supported. It is named after the dadaist Hans Arp, a friend of Rene Magritte.
arqas Application in R for Queueing Analysis and Simulation
Provides functions to compute the main characteristics of the following queueing models: M/M/1, M/M/s, M/M/1/k, M/M/s/k, M/M/1/Inf/H, M/M/s/Inf/H, M/M/s/Inf/H with Y replacements, M/M/Inf, open Jackson networks and closed Jackson networks. Moreover, it is also possible to simulate similar queueing models with any type of arrival or service distribution: G/G/1, G/G/s, G/G/1/k, G/G/s/k, G/G/1/Inf/H, G/G/s/Inf/H, G/G/s/Inf/H with Y replacements, open networks and closed networks. Finally, it contains functions for fitting data to a statistical distribution.
arrangements Fast Generators and Iterators for Permutations, Combinations and Partitions
Fast generators and iterators for permutations, combinations and partitions. The iterators allow users to generate arrangements in a memory efficient manner and the generated arrangements are in lexicographical (dictionary) order. Permutations and combinations can be drawn with/without replacement and support multisets. It has been demonstrated that ‘arrangements’ outperforms most of the existing packages of similar kind. Some benchmarks could be found at <https://…/benchmark.html>.
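A flavor of the generator and iterator interfaces, as recalled from the package README (the getnext() method name is an assumption to verify):

    library(arrangements)
    permutations(3)                        # all 3! permutations of 1:3
    combinations(x = letters[1:4], k = 2)  # 2-element subsets of a, b, c, d
    it <- ipermutations(3)                 # iterator: memory-efficient generation
    it$getnext()                           # first permutation in dictionary order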
arsenal An Arsenal of ‘R’ Functions for Large-Scale Statistical Summaries
An Arsenal of ‘R’ functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in ‘R’ and ‘RStudio’ and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types ‘by’ the levels of a categorical variable; modelsum(), which performs simple model fits on the same endpoint for many variables (univariate or adjusted for standard covariates); and freqlist(), a powerful frequency table across many categorical variables.
ART Aligned Rank Transform for Nonparametric Factorial Analysis
An implementation of the Aligned Rank Transform technique for factorial analysis (see references below for details) including models with missing terms (unsaturated factorial models). The function first computes a separate aligned ranked response variable for each effect of the user-specified model, and then runs a classic ANOVA on each of the aligned ranked responses. For further details, see Higgins, J. J. and Tashtoush, S. (1994). An aligned rank transform test for interaction. Nonlinear World 1 (2), pp. 201-211. Wobbrock, J.O., Findlater, L., Gergle, D. and Higgins,J.J. (2011). The Aligned Rank Transform for nonparametric factorial analyses using only ANOVA procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’11). New York: ACM Press, pp. 143-146. <doi:10.1145/1978942.1978963>.
artfima Fit ARTFIMA Model
Fit and simulate ARTFIMA. Theoretical autocovariance function and spectral density function for stationary ARTFIMA.
ARTIVA Time-Varying DBN Inference with the ARTIVA (Auto Regressive TIme VArying) Model
Reversible Jump MCMC (RJ-MCMC) sampling for approximating the posterior distribution of a time-varying regulatory network, under the Auto Regressive TIme VArying (ARTIVA) model (for a detailed description of the algorithm, see Lebre et al., BMC Systems Biology, 2010). Starting from time-course gene expression measurements for a gene of interest (referred to as the ‘target gene’) and a set of genes (referred to as ‘parent genes’) which may explain the expression of the target gene, the ARTIVA procedure identifies temporal segments for which a set of interactions occur between the ‘parent genes’ and the ‘target gene’. The time points that delimit the different temporal segments are referred to as changepoints (CP).
arules Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.
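A typical Apriori run on the bundled ‘Groceries’ transactions looks roughly like this (the support and confidence thresholds are arbitrary):

    library(arules)
    data("Groceries")
    rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5))
    inspect(head(sort(rules, by = "lift"), 3))  # strongest rules by lift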
arulesCBA Classification Based on Association Rules
Provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier.
aRxiv Interface to the arXiv API
An interface to the API for arXiv, a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.
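A small sketch of a query; the field syntax follows the arXiv API:

    library(aRxiv)
    arxiv_count('cat:stat.ML AND abs:"random forest"')           # number of hits
    recs <- arxiv_search('cat:stat.ML AND abs:"random forest"', limit = 5)
    recs$title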
as.color Assign Random Colors to Unique Items in a Vector
The as.color function takes an R vector of any class as an input, and outputs a vector of unique hexadecimal color values that correspond to the unique input values. This is most handy when overlaying points and lines for data that correspond to different levels or factors. The function will also print the random seed used to generate the colors. If you like the color palette generated, you can save the seed and reuse those colors.
asciiSetupReader Reads ‘SPSS’ and ‘SAS’ Files from ASCII Data Files (.txt) and Setup Files (.sps or .sas)
Lets you open an ‘SPSS’ or ‘SAS’ data file using a .txt file that has the data and a .sps or .sas file with setup instructions. This works only with a txt-sps or txt-sas pair in which the setup file contains instructions to read that text file. It will NOT open other text files, .sav, .por, or ‘SAS’ files.
ashr Methods for Adaptive Shrinkage, using Empirical Bayes
The R package ‘ashr’ implements an Empirical Bayes approach for large-scale hypothesis testing and false discovery rate (FDR) estimation based on the methods proposed in M. Stephens, 2016, ‘False discovery rates: a new deal’, <DOI:10.1093/biostatistics/kxw041>. These methods can be applied whenever two sets of summary statistics (estimated effects and standard errors) are available, just as ‘qvalue’ can be applied to previously computed p-values. Two main interfaces are provided: ash(), which is more user-friendly; and ash.workhorse(), which has more options and is geared toward advanced users. Both ash() and ash.workhorse() provide a flexible modeling interface that can accommodate a variety of likelihoods (e.g., normal, Poisson) and mixture priors (e.g., uniform, normal).
asht Applied Statistical Hypothesis Tests
Some hypothesis test functions with a focus on non-asymptotic methods that have matching confidence intervals.
ASICS Automatic Statistical Identification in Complex Spectra
With a set of pure metabolite spectra, ASICS quantifies metabolites concentration in a complex spectrum. The identification of metabolites is performed by fitting a mixture model to the spectra of the library with a sparse penalty. The method and its statistical properties are described in Tardivel et al. (2017) <doi:10.1007/s11306-017-1244-5>.
AsioHeaders Asio C++ Header Files
Asio is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach. ‘Asio’ is also included in Boost, but requires linking when used with Boost. Used standalone, it is header-only, provided a recent-enough compiler is available. ‘Asio’ is written and maintained by Christopher M. Kohlhoff and is released under the ‘Boost Software License’, Version 1.0.
ASIP Automated Satellite Image Processing
Perform complex satellite image processes automatically and efficiently. The package currently supports satellite images from the most widely used Landsat 4, 5, 7 and 8 and ASTER L1T data. The primary uses of this package are given below. 1. Conversion of optical bands to top-of-atmosphere reflectance. 2. Conversion of thermal bands to corresponding temperature images. 3. Derive application-oriented products directly from source satellite image bands. 4. Compute user-defined equations and produce corresponding image products. 5. Other basic tools for satellite image processing. References: i. Chander and Markham (2003) <doi:10.1109/TGRS.2003.818464>. ii. Roy et al. (2014) <doi:10.1016/j.rse.2014.02.001>. iii. Abrams (2000) <doi:10.1080/014311600210326>.
askpass Safe Password Entry for R, Git, and SSH
Cross-platform utilities for prompting the user for credentials or a passphrase, for example to authenticate with a server or read a protected key. Includes native programs for MacOS and Windows, hence no ‘tcltk’ is required. Password entry can be invoked in two different ways: directly from R via the askpass() function, or indirectly as password-entry back-end for ‘ssh-agent’ or ‘git-credential’ via the SSH_ASKPASS and GIT_ASKPASS environment variables. Thereby the user can be prompted for credentials or a passphrase if needed when R calls out to git or ssh.
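Direct usage from R is a one-liner:

    # prompts interactively without echoing the secret to the console
    pw <- askpass::askpass("Enter your passphrase: ")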
aSPC An Adaptive Sum of Powered Correlation Test (aSPC) for Global Association Between Two Random Vectors
The aSPC test is designed to test global association between two groups of variables potentially with moderate to high dimension (e.g. in hundreds). The aSPC is particularly useful when the association signals between two groups of variables are sparse.
aSPU Adaptive Sum of Powered Score Test
R code for the (adaptive) Sum of Powered Score (‘SPU’ and ‘aSPU’) tests, inverse variance weighted Sum of Powered Score (‘SPUw’ and ‘aSPUw’) tests, and gene-based and pathway-based association tests: the pathway-based Sum of Powered Score tests (‘SPUpath’) and adaptive ‘SPUpath’ (‘aSPUpath’) test, the Gene-based Association Test that uses an extended Simes procedure (‘GATES’), the Hybrid Set-based Test (‘HYST’), and an extended version of the ‘GATES’ test for pathway-based association testing (‘GATES-Simes’). The tests can be used with genetic and other data sets with covariates. The response variable can be binary or quantitative.
asremlPlus Augments the Use of ‘Asreml’ in Fitting Mixed Models
Provides functions that assist in automating the testing of terms in mixed models when ‘asreml’ is used to fit the models. The package ‘asreml’ is marketed by ‘VSNi’ (http://www.vsni.co.uk ) as ‘asreml-R’ and provides a computationally efficient algorithm for fitting mixed models using Residual Maximum Likelihood. The content falls into the following natural groupings: (i) Data, (ii) Object manipulation functions, (iii) Model modification functions, (iv) Model testing functions, (v) Model diagnostics functions, (vi) Prediction production and presentation functions, (vii) Response transformation functions, and (viii) Miscellaneous functions. A history of the fitting of a sequence of models is kept in a data frame. Procedures are available for choosing models that conform to the hierarchy or marginality principle and for displaying predictions for significant terms in tables and graphs.
ASSA Applied Singular Spectrum Analysis (ASSA)
Functions to model and decompose time series into principal components using singular spectrum analysis (de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>; de Carvalho et al (2012) <doi:10.1016/j.econlet.2011.09.007>).
AssayCorrector Detection and Correction of Spatial Bias in HTS Screens
(1) Detects plate-specific spatial bias by identifying rows and columns of all plates of the assay affected by this bias (following the results of the Mann-Whitney U test) as well as assay-specific spatial bias by identifying well locations (i.e., well positions scanned across all plates of a given assay) affected by this bias (also following the results of the Mann-Whitney U test); (2) Allows one to correct plate-specific spatial bias using either the additive or multiplicative PMP (Partial Mean Polish) method (the most appropriate spatial bias model can be either specified by the user or determined by the program following the results of the Kolmogorov-Smirnov two-sample test) to correct the assay measurements as well as to correct assay-specific spatial bias by carrying out robust Z-scores within each plate of the assay and then traditional Z-scores across well locations.
assertive.data Assertions to Check Properties of Data
A set of predicates and assertions for checking the properties of (country independent) complex data types. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.data.us Assertions to Check Properties of Strings
A set of predicates and assertions for checking the properties of US-specific complex data types. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.files Assertions to Check Properties of Files
A set of predicates and assertions for checking the properties of files and connections. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.numbers Assertions to Check Properties of Numbers
A set of predicates and assertions for checking the properties of numbers. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.properties Assertions to Check Properties of Variables
A set of predicates and assertions for checking the properties of variables, such as length, names and attributes. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.reflection Assertions for Checking the State of R
A set of predicates and assertions for checking the state and capabilities of R, the operating system it is running on, and the IDE being used. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.sets Assertions to Check Properties of Sets
A set of predicates and assertions for checking the properties of sets. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.strings Assertions to Check Properties of Strings
A set of predicates and assertions for checking the properties of strings. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.types Assertions to Check Types of Variables
A set of predicates and assertions for checking the types of variables. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertr Assertive programming for R analysis pipelines
The assertr package supplies a suite of functions designed to verify assumptions about data early in a dplyr/magrittr analysis pipeline so that data errors are spotted early and can be addressed quickly.
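For example, checks can be chained ahead of the analysis so the pipeline halts on bad data:

    library(assertr)
    library(magrittr)
    mtcars %>%
      verify(nrow(.) > 10) %>%                    # sanity-check the row count
      assert(within_bounds(0, Inf), mpg, wt) %>%  # values must be non-negative
      insist(within_n_sds(3), mpg) %>%            # flag extreme outliers
      lm(mpg ~ wt, data = .)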
assist A Suite of R Functions Implementing Spline Smoothing Techniques
A comprehensive package for fitting various non-parametric/semi-parametric linear/nonlinear fixed/mixed smoothing spline models.
ASSOCShiny Interactive Document for Working with Association Rule Mining Analysis
An interactive document on the topic of association rule mining analysis using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
assortnet Calculate the Assortativity Coefficient of Weighted and Binary Networks
Functions to calculate the assortment of vertices in social networks. This can be measured on both weighted and binary networks, with discrete or continuous vertex values.
AST Age-Spatial-Temporal Model
Fits a model to adjust for and capture additional variation across three dimensions (age group, time, and space) in the residuals of a prediction model such as a linear regression or mixed model. Details are given in Foreman et al. (2015) <doi:10.1186/1478-7954-10-1>.
asus Adaptive SURE Thresholding Using Side Information
Provides the ASUS procedure for estimating a high dimensional sparse parameter in the presence of auxiliary data that encode side information on sparsity. It is a robust data combination procedure in the sense that even when pooling non-informative auxiliary data ASUS would be at least as efficient as competing soft thresholding based methods that do not use auxiliary data. For more information, please see the website <http://…/ASUS.htm> and the accompanying paper.
asVPC Average Shifted Visual Predictive Checks
Visual predictive checks are a well-known method for validating nonlinear mixed-effects models, especially in pharmacometrics. Average shifted visual predictive checks combine visual predictive checks with the idea of the average shifted histogram.
asymmetry The Slide-Vector Model for Multidimensional Scaling of Asymmetric Data
The slide-vector model is provided in this package together with functions for the analysis and graphical display of asymmetry. The slide vector model is a scaling model for asymmetric data. A distance model is fitted to the symmetric part of the data whereas the asymmetric part of the data is represented by projections of the coordinates onto the slide-vector. The slide-vector points in the direction of large asymmetries in the data. The distance is modified in such a way that the distance between two points that are parallel to the slide-vector is larger in the direction of this vector. The distance is smaller in the opposite direction. If the line connecting two points is perpendicular to the slide-vector the difference between the two projections is zero. In this case the distance between the two points is symmetric. The algorithm for fitting this model is derived from the majorization approach to multidimensional scaling.
atable Create Tables for Reporting Clinical Trials
Calculates descriptive statistics and hypothesis tests, and arranges the results in a table ready for reporting with LaTeX or Word.
ATE Inference for Average Treatment Effects using Covariate Balancing
Nonparametric estimation and inference for average treatment effects based on covariate balancing.
atlas Stanford ‘ATLAS’ Search Engine API
Stanford ‘ATLAS’ (Advanced Temporal Search Engine) is a powerful tool that allows constructing cohorts of patients extremely quickly and efficiently. This package is designed to interface directly with an instance of ‘ATLAS’ search engine and facilitates API queries and data dumps. Prerequisite is a good knowledge of the temporal language to be able to efficiently construct a query. More information available at <https://…/start>.
ATR Alternative Tree Representation
Plot party trees in left-right orientation instead of the classical top-down layout.
aTSA Alternative Time Series Analysis
Contains tools for testing and analyzing time series data and fitting popular time series models such as ARIMA, moving average and Holt-Winters. Most functions also provide nice, clear outputs in the manner of SAS, mirroring the identify, estimate and forecast statements of PROC ARIMA.
attachment Deal with Dependencies
Tools to help manage dependencies during package development. The package can retrieve all dependencies used in R files in the ‘R’ directory, in Rmd files in the ‘vignettes’ directory and in the ‘roxygen2’ documentation of functions. There is a function to update the DESCRIPTION file of your package and a function to create a file with the R commands needed to install all dependencies of your package. All functions to retrieve dependencies of R scripts and Rmd files can be used independently of package development.
attempt Easy Condition Handling
A friendlier condition handler, inspired by ‘purrr’ mappers and based on ‘rlang’. ‘attempt’ extends and facilitates condition handling by providing a consistent grammar, and provides a set of easy to use functions for common tests and conditions. ‘attempt’ only depends on ‘rlang’, and focuses on speed, so it can be easily integrated in other functions and used in data analysis.
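A flavor of the grammar, adapted from the package's documented mapper-style handlers (treat the exact helpers as assumptions):

    library(attempt)
    # run an expression, mapping the error to a friendly message
    try_catch(log("a"), .e = ~ paste0("Error caught: ", .x))
    # wrap an expression with a custom error message
    attempt(log("a"), msg = "Something went wrong")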
attrCUSUM Tools for Attribute VSI CUSUM Control Chart
An implementation of tools for the design of attribute variable sampling interval (VSI) cumulative sum charts. It currently provides information for monitoring a mean increase, such as the average number of samples to signal, the average time to signal, a matrix of transient probabilities, and suitable control limits when the data follow a (zero-inflated) Poisson or binomial distribution. The functions can easily be applied to other count processes, and the tools might be extended to more complicated cumulative sum control charts in future work.
auctestr Statistical Testing for AUC Data
Performs statistical testing to compare predictive models based on multiple observations of the A’ statistic (also known as Area Under the Receiver Operating Characteristic Curve, or AUC). Specifically, it implements a testing method based on the equivalence between the A’ statistic and the Wilcoxon statistic. For more information, see Hanley and McNeil (1982) <doi:10.1148/radiology.143.1.7063747>.
auditor Model Audit – Verification, Validation, and Error Analysis
Provides an easy-to-use unified interface for creating validation plots for any model. The ‘auditor’ helps avoid the repetitive work of writing the code needed to create residual plots. These visualizations allow one to assess and compare the goodness of fit, performance, and similarity of models.
augmentedRCBD Analysis of Augmented Randomised Complete Block Designs
Functions for analysis of data generated from experiments in augmented randomised complete block design according to Federer, W.T. (1961) <doi:10.2307/2527837>. Computes analysis of variance, adjusted means, descriptive statistics, genetic variability statistics etc. Further includes data visualization and report generation functions.
augSIMEX Analysis of Data with Mixed Measurement Error and Misclassification in Covariates
Implementation of the augmented Simulation-Extrapolation (SIMEX) algorithm proposed by Yi et al. (2015) <doi:10.1080/01621459.2014.922777> for analyzing the data with mixed measurement error and misclassification. The main function provides a similar summary output as that of glm() function. Both parametric and empirical SIMEX are considered in the package.
aurelius Generates PFA Documents from R Code and Optionally Runs Them
Provides tools for converting R objects and syntax into the Portable Format for Analytics (PFA). Allows for testing validity and runtime behavior of PFA documents through rPython and Titus, a more complete implementation of PFA for Python. The Portable Format for Analytics is a specification for event-based processors that perform predictive or analytic calculations and is aimed at helping smooth the transition from statistical model development to large-scale and/or online production. See <http://dmg.org/pfa> for more information.
AurieLSHGaussian Creates a Neighbourhood Using Locality Sensitive Hashing for Gaussian Projections
Uses locality sensitive hashing to create a neighbourhood graph for a data set and calculates the adjusted Rand index value for the same. It uses Gaussian random planes to decide the nature of a given point. Datar, Mayur, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni (2004) <doi:10.1145/997817.997857>.
auRoc Various Methods to Estimate the AUC
Estimate the AUC using a variety of methods as follows: (1) frequentist nonparametric methods based on the Mann-Whitney statistic or kernel methods; (2) frequentist parametric methods using the likelihood ratio test based on higher-order asymptotic results, the signed log-likelihood ratio test, the Wald test, or the approximate ‘t’ solution to the Behrens-Fisher problem; (3) Bayesian parametric MCMC methods.
auto.pca Automatic Variable Reduction Using Principal Component Analysis
PCA is performed by eigenvalue decomposition of a data correlation matrix. The package automatically determines the number of factors by the eigenvalue-greater-than-1 rule and returns uncorrelated variables based on the rotated component scores, such that within each principal component the variable with the highest variance is selected. It is useful for helping non-statisticians select variables. For more information, see the <http://…/ijcem_032013_06.pdf> web page.
autoBagging Learning to Rank Bagging Workflows with Metalearning
A framework for automated machine learning. Concretely, the focus is on the optimisation of bagging workflows. A bagging workflow is composed of three phases: (i) generation: which and how many predictive models to learn; (ii) pruning: after learning a set of models, the worst ones are cut off from the ensemble; and (iii) integration: how the models are combined for predicting a new observation. autoBagging optimises these processes by combining metalearning and a learning-to-rank approach to learn from metadata. It automatically ranks 63 bagging workflows by exploiting past performance and dataset characterization. A complete description of the method can be found in: Pinto, F., Cerqueira, V., Soares, C., Mendes-Moreira, J. (2017): ‘autoBagging: Learning to Rank Bagging Workflows with Metalearning’ arXiv preprint arXiv:1706.09367.
automagic Automagically Document and Install Packages Necessary to Run R Code
Parse R code in a given directory for R packages and attempt to install them from CRAN or GitHub. Optionally use a dependencies file for tighter control over which package versions to install.
automl Deep Learning with Metaheuristic
Fits anything from a simple regression to a highly customizable deep neural network, either with gradient descent or a metaheuristic, using automatic hyperparameter tuning and custom cost functions. A mix inspired by common deep learning tricks and Particle Swarm Optimization.
AutoModel Automated Hierarchical Multiple Regression with Assumptions Checking
A set of functions that automates the process and produces reasonable output for hierarchical multiple regression models. It allows you to specify predictor blocks, from which it generates all of the linear models, and checks the assumptions of the model, producing the requisite plots and statistics to allow you to judge the suitability of the model.
AutoPipe Automated Transcriptome Classifier Pipeline: Comprehensive Transcriptome Analysis
An unsupervised fully-automated pipeline for transcriptome analysis, with a supervised option to identify characteristic genes from predefined subclasses. We rely on the ‘pamr’ <http://…/pamr.html> clustering algorithm to cluster the data and then draw a heatmap of the clusters with the most significant and least significant genes according to the ‘pamr’ algorithm. This yields easy-to-grasp heatmaps that show, for each cluster, its most defining genes.
autoplotly Automatic Generation of Interactive Visualizations for Popular Statistical Results
Functionalities to automatically generate interactive visualizations for popular statistical results supported by ‘ggfortify’, such as time series, PCA, clustering and survival analysis, with ‘plotly.js’ <https://plot.ly/> and ‘ggplot2’ style. The generated visualizations can also be easily extended using ‘ggplot2’ and ‘plotly’ syntax while staying interactive.
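For example, a PCA result becomes an interactive plot in one call, following the package's typical usage:

    library(autoplotly)
    # interactive PCA plot colored by species, built on ggfortify + plotly
    autoplotly(prcomp(iris[, 1:4]), data = iris, colour = 'Species')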
AutoregressionMDE Minimum Distance Estimation in Autoregressive Model
Considers an autoregressive model of order p where the distribution function of the innovations is unknown, but the innovations are independent and symmetrically distributed. The package contains a function named ARMDE which takes X (a vector of n observations) and p (the order of the model) as input arguments and returns the minimum distance estimator of the parameters in the model.
autoSEM Performs Specification Search in Structural Equation Models
Implements multiple heuristic search algorithms for automatically creating structural equation models.
autoshiny Automatic Transformation of an ‘R’ Function into a ‘shiny’ App
Static code compilation of a ‘shiny’ app given an R function (into ‘ui.R’ and ‘server.R’ files or into a ‘shiny’ app object). See examples at <https://…/autoshiny>.
av Working with Audio and Video
Bindings to the ‘FFmpeg’ <http://…/> AV library for working with audio and video in R. Generate high-quality video files by capturing images from the R graphics device combined with a custom audio stream. This package interfaces directly with the C API and does not require any command line utilities.
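A minimal sketch: render frames from the graphics device straight into an mp4 (argument names as recalled from the package documentation):

    library(av)
    av_capture_graphics({
      for (i in 1:10) plot(rnorm(100), main = paste("frame", i))
    }, output = "demo.mp4", width = 720, height = 480, framerate = 2)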
available Check if the Title of a Package is Available, Appropriate and Interesting
Check if a given package name is available to use. It checks the name’s validity, whether it is used on ‘GitHub’, ‘CRAN’ and ‘Bioconductor’, and for unintended meanings by querying Urban Dictionary, ‘Wiktionary’ and Wikipedia.
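Checking a candidate name is a single call; the valid_package_name() helper is as recalled and worth verifying:

    library(available)
    available("coolpkg")           # validity plus CRAN/Bioconductor/GitHub checks
    valid_package_name("coolpkg")  # formal validity only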
aVirtualTwins Adaptation of Virtual Twins Method from Jared Foster
Research on subgroups in randomized clinical trials with a binary outcome and two treatment groups. This is an adaptation of the Jared Foster method.
AWR AWS’ Java ‘SDK’ for R
Installs the compiled Java modules of the Amazon Web Services (‘AWS’) ‘SDK’ to be used in downstream R packages interacting with ‘AWS’. See <https://…/sdk-for-java> for more information on the ‘AWS’ ‘SDK’ for Java.
AWR.Kinesis Amazon ‘Kinesis’ Consumer Application for Stream Processing
Fetching data from Amazon ‘Kinesis’ Streams using the Java-based ‘MultiLangDaemon’ interacting with Amazon Web Services (‘AWS’) for easy stream processing from R. For more information on ‘Kinesis’, see <https://…/kinesis>.
AWR.KMS A Simple Client to the ‘AWS’ Key Management Service
Encrypt plain text and decrypt cipher text using encryption keys hosted at the Amazon Web Services (‘AWS’) Key Management Service (‘KMS’); see <https://…/kms> for more information.
aws.alexa Client for the Amazon Alexa Web Information Services API
Use the Amazon Alexa Web Information Services API to find information about domains, including the kind of content they carry, how popular they are (rank and traffic history), sites linking to them, and more. See <https://…/> for more information.
aws.cloudtrail AWS CloudTrail Client Package
A simple client package for the Amazon Web Services (‘AWS’) ‘CloudTrail’ ‘API’ <https://…/>.
aws.comprehend ‘AWS Comprehend’ Client Package
Client for ‘AWS Comprehend’ <https://…/comprehend>, a cloud natural language processing service that can perform a number of quantitative text analyses, including language detection, sentiment analysis, and feature extraction.
aws.iam AWS IAM Client Package
A simple client for the Amazon Web Services (‘AWS’) Identity and Access Management (‘IAM’) ‘API’ <https://…/>.
aws.kms ‘AWS Key Management Service’ Client Package
Client package for the ‘AWS Key Management Service’ <https://…/>, a cloud service for managing encryption keys.
aws.polly Client for AWS Polly
A client for AWS Polly <http://…/polly>, a speech synthesis service.
aws.s3 AWS S3 Client Package
A simple client package for the Amazon Web Services (AWS) Simple Storage Service (S3) REST API <https://…/>.
aws.ses AWS SES Client Package
A simple client package for the Amazon Web Services (AWS) Simple Email Service (SES) <http://…/> REST API.
aws.signature Amazon Web Services Request Signatures
Generates request signatures for Amazon Web Services (AWS) APIs.
aws.sns AWS SNS Client Package
A simple client package for the Amazon Web Services (AWS) Simple Notification Service (SNS) API.
aws.sqs AWS SQS Client Package
A simple client package for the Amazon Web Services (AWS) Simple Queue Service (SQS) API.
aws.transcribe Client for ‘AWS Transcribe’
Client for ‘AWS Transcribe’ <https://…/transcribe>, a cloud transcription service that can convert an audio media file in English and other languages into a text transcript.
aws.translate Client for ‘AWS Translate’
A client for ‘AWS Translate’ <https://…/translate>, a machine translation service that will convert a text input in one language into a text output in another language.
awsjavasdk Boilerplate R Access to the Amazon Web Services (‘AWS’) Java SDK
Provides boilerplate access to all of the classes included in the Amazon Web Services (‘AWS’) Java Software Development Kit (SDK) via package:’rJava’. According to Amazon, the ‘SDK helps take the complexity out of coding by providing Java APIs for many AWS services including Amazon S3, Amazon EC2, DynamoDB, and more’. You can read more about the included Java code on Amazon’s website: <https://…/>.
awspack Amazon Web Services Bundle Package
A bundle of all of ‘cloudyr’ project <http://…/> packages for Amazon Web Services (‘AWS’) <https://…/>. It depends upon all of the ‘cloudyr’ project’s ‘AWS’ packages. It is mainly useful for installing the entire suite of packages; more likely than not you will only want to load individual packages one at a time.
AzureGraph Simple Interface to ‘Microsoft Graph’
A simple interface to the ‘Microsoft Graph’ API <https://…/overview>. ‘Graph’ is a comprehensive framework for accessing data in various online Microsoft services. Currently, this package aims to provide an R interface only to the ‘Azure Active Directory’ part, with a view to supporting interoperability of R and ‘Azure’: users, groups, registered apps and service principals. However, it can easily be extended to cover other services.
AzureKeyVault Key and Secret Management in ‘Azure’
Manage keys, certificates, secrets, and storage accounts in Microsoft’s ‘Key Vault’ service: <https://…/key-vault>. Provides facilities to store and retrieve secrets, use keys to encrypt, decrypt, sign and verify data, and manage certificates. Integrates with the ‘AzureAuth’ package to enable authentication with a certificate, and with the ‘openssl’ package for importing and exporting.
AzureKusto Interface to ‘Kusto’/’Azure Data Explorer’
An interface to ‘Azure Data Explorer’, also known as ‘Kusto’, a fast, highly scalable data exploration service from Microsoft: <https://…/>. Includes ‘DBI’ and ‘dplyr’ interfaces, with the latter modelled after the ‘dbplyr’ package, whereby queries are translated from R into the native ‘KQL’ query language and executed lazily. On the admin side, the package extends the object framework provided by ‘AzureRMR’ to support creation and deletion of databases, and management of database principals.
AzureML Discover, Publish and Consume Web Services on Microsoft Azure Machine Learning
Provides an interface with Microsoft Azure to easily publish functions and trained models as web services, and to discover and consume web services.
AzureRMR Interface to ‘Azure Resource Manager’
A lightweight but powerful R interface to the ‘Azure Resource Manager’ REST API. The package exposes classes and methods for ‘OAuth’ authentication and working with subscriptions and resource groups. It also provides functionality for creating and deleting ‘Azure’ resources and deploying templates. While ‘AzureRMR’ can be used to manage any ‘Azure’ service, it can also be extended by other packages to provide extra functionality for specific services.

B

BACCT Bayesian Augmented Control for Clinical Trials
Implements the Bayesian Augmented Control (BAC, a.k.a. Bayesian historical data borrowing) method under the clinical trial setting by calling ‘Just Another Gibbs Sampler’ (‘JAGS’) software. In addition, the ‘BACCT’ package evaluates user-specified decision rules by computing the type-I error/power, or the probability of a correct go/no-go decision at interim look. The evaluation can be presented numerically or graphically. Users need to have ‘JAGS’ 4.0.0 or newer installed due to a compatibility issue with the ‘rjags’ package. Currently, the package implements the BAC method for binary outcomes only. Support for continuous and survival endpoints will be added in future releases. We would like to thank AbbVie’s Statistical Innovation group and Clinical Statistics group for their support in developing the ‘BACCT’ package.
bacistool Bayesian Classification and Information Sharing (BaCIS) Tool for the Design of Multi-Group Phase II Clinical Trials
Provides the design of multi-group phase II clinical trials with binary outcomes using the hierarchical Bayesian classification and information sharing (BaCIS) model. Subgroups are classified into two clusters on the basis of their outcomes mimicking the hypothesis testing framework. Subsequently, information sharing takes place within subgroups in the same cluster, rather than across all subgroups. This method can be applied to the design and analysis of multi-group clinical trials with binary outcomes.
backpipe Backward Pipe Operator
Provides a backward-pipe operator for ‘magrittr’ (%<%) or ‘pipeR’ (%<<%) that allows performing operations from right to left. This is useful where right-to-left ordering is natural, as commonly observed with nested structures such as trees/directories and markup languages such as HTML and XML.
backports Reimplementations of Functions Introduced Since R-3.0.0
Provides implementations of functions which have been introduced in R since version 3.0.0. The backports are conditionally exported, so R resolves a function name to the version shipped with R (if available) and falls back to the implemented backport otherwise. This way package developers can make use of the new functions without worrying about the minimum required R version.
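In a package, the documented pattern is to import the needed backports at load time; a sketch, with trimws() standing in for any backported function:

    # in R/zzz.R, with 'backports' listed in Imports
    .onLoad <- function(libname, pkgname) {
      backports::import(pkgname, "trimws")  # or backports::import(pkgname) for all
    }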
backShift Learning Causal Cyclic Graphs from Unknown Shift Interventions
Code for ‘backShift’, an algorithm to estimate the connectivity matrix of a directed (possibly cyclic) graph with hidden variables. The underlying system is required to be linear, and we assume that observations under different shift interventions are available. For more details, see <http://…/1506.02494>.
bacr Bayesian Adjustment for Confounding
Estimating the average causal effect based on the Bayesian Adjustment for Confounding (BAC) algorithm.
badger Badge for R Package
Query information and generate badges for use in README files and GitHub Pages.
bain Bayes Factors for Informative Hypotheses
Computes approximated adjusted fractional Bayes factors for equality, inequality, and about equality constrained hypotheses. S3 methods are available for specific types of lm() models, namely ANOVA, ANCOVA, and multiple regression, and for the t_test(). The statistical underpinnings are described in Hoijtink, Mulder, van Lissa, and Gu, (2018) <doi:10.31234/osf.io/v3shc>, Gu, Mulder, and Hoijtink, (2018) <doi:10.1111/bmsp.12110>, Hoijtink, Gu, and Mulder, (2018) <doi:10.1111/bmsp.12145>, and Hoijtink, Gu, Mulder, and Rosseel, (2018) <doi:10.1037/met0000187>.
bairt Bayesian Analysis of Item Response Theory Models
Bayesian estimation of the two- and three-parameter models of item response theory (IRT). An interactive web application is also provided for MCMC estimation and model-fit assessment of the IRT models.
BalanceCheck Balance Check for Multiple Covariates in Matched Observational Studies
Two practical tests are provided for assessing whether multiple covariates in a treatment group and a matched control group are balanced in observational studies.
BALD Robust Loss Development Using MCMC
Bayesian analysis of loss development on insurance triangles or ‘BALD’ is a Bayesian model of developing aggregate loss triangles in property casualty insurance. This actuarial model makes use of a heteroskedastic and skewed t-likelihood with endogenous degrees of freedom, employs model averaging by means of Reversible Jump MCMC, and accommodates a structural break in the path of the consumption of benefits. Further, the model is capable of incorporating expert information in the calendar year effect. In an accompanying vignette, this model is applied to two widely studied General Liability and Auto Bodily Injury Liability loss triangles. For a description of the methodology, see Frank A. Schmid (2010) <doi:10.2139/ssrn.1501706>.
Ball Statistical Inference and Sure Independence Screening via Ball Statistics
Hypothesis tests and sure independence screening (SIS) procedure based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data. The ball divergence and ball covariance based distribution-free tests are implemented to examine equality of multivariate distributions and independence between random vectors of arbitrary dimensions. Furthermore, a generic non-parametric SIS procedure based on ball correlation and all of its variants are implemented to tackle the challenge in the context of ultra high dimensional data.
BAMBI Bivariate Angular Mixture Models
Fit (using Bayesian methods) and simulate mixtures of univariate and bivariate angular distributions.
bamlss Bayesian Additive Models for Location Scale and Shape (and Beyond)
R infrastructure for Bayesian regression models.
bamp Bayesian Age-Period-Cohort Modeling and Prediction
Bayesian Age-Period-Cohort Modeling and Prediction using efficient Markov Chain Monte Carlo Methods. This is the R version of the previous BAMP software as described in Volker Schmid and Leonhard Held (2007) <DOI:10.18637/jss.v021.i08> Bayesian Age-Period-Cohort Modeling and Prediction – BAMP, Journal of Statistical Software 21:8. This package includes checks of convergence using Gelman’s R.
BANFF Bayesian Network Feature Finder
Provides efficient Bayesian nonparametric models for network feature selection.
bang Bayesian Analysis, No Gibbs
Provides functions for the Bayesian analysis of some simple commonly-used models, without using Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling. The ‘rust’ package <https://…/package=rust> is used to simulate a random sample from the required posterior distribution. At the moment three conjugate hierarchical models are available: beta-binomial, gamma-Poisson and a 1-way analysis of variance (ANOVA).
bannerCommenter Make Banner Comments with a Consistent Format
A convenience package for use while drafting code. It facilitates making stand-out comment lines decorated with bands of characters. The input text strings are converted into R comment lines, suitably formatted. These are then displayed in a console window and, if possible, automatically transferred to a clipboard ready for pasting into an R script. Designed to save time when drafting R scripts that will need to be navigated and maintained by other programmers.
BarBorGradient Function Minimum Approximator
Tool to find where a function attains its lowest value (minimum). The function can have any number of dimensions. Recommended use is with eps=10^-10, but it can be run with 10^-20, although this depends on the function. Two more methods are included in this package: a simple gradient method (Gradmod) and the Powell method (Powell). These are not recommended for use; their purpose is purely comparison.
Barnard Barnard’s Unconditional Test
Barnard’s unconditional test for 2×2 contingency tables.
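A sketch of a call, assuming an interface that takes the four cell counts of the 2×2 table (the barnard.test() signature is from memory):

    library(Barnard)
    # cell counts of the 2x2 contingency table; assumed argument order
    barnard.test(8, 3, 4, 9)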
BART Bayesian Additive Regression Trees
Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285> and Sparapani, Logan, McCulloch and Laud (2016) <doi:10.1002/sim.6893>.
bartMachine Bayesian Additive Regression Trees
An advanced implementation of Bayesian Additive Regression Trees with expanded features for data analysis and visualization.
bartMachineJARs bartMachine JARs
These are bartMachine’s Java dependency libraries. Note: this package has no functionality of its own and should not be installed as a standalone package without bartMachine.
Barycenter Wasserstein Barycenter
Computation of a Wasserstein Barycenter. The package implements a method described in Cuturi (2014) ‘Fast Computation of Wasserstein Barycenters’. The paper is available at <http://…/cuturi14.pdf>. To speed up the computation time the main iteration step is based on ‘RcppArmadillo’.
BAS Bayesian Model Averaging using Bayesian Adaptive Sampling
Package for Bayesian Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors or the Liang et al. hyper-g priors (JASA 2008), or mixtures of g-priors in GLMs of Li and Clyde 2015. Other model selection criteria include AIC and BIC. Sampling probabilities may be updated based on the sampled models using sampling without replacement, or an MCMC algorithm can sample models using the BAS tree structure as an efficient hash table. Allows uniform or beta-binomial prior distributions on models, and variables may be forced to always be included.
basad Bayesian Variable Selection with Shrinking and Diffusing Priors
Provides a Bayesian variable selection approach using continuous spike and slab prior distributions. The prior choices here are motivated by the shrinking and diffusing priors studied in Narisetty & He (2014) <DOI:10.1214/14-AOS1207>.
base2grob Convert Base Plot to ‘grob’ Object
Convert a base plot function call (using expression or formula) to a ‘grob’ object compatible with the ‘grid’ ecosystem. With this package we are able to, e.g., use ‘cowplot’ to align base plots with ‘ggplot’ objects and use ‘ggsave’ to export base plots to file.
base64url Fast and URL-Safe Base64 Encoder and Decoder
In contrast to RFC3548, the 62nd character (‘+’) is replaced with ‘-‘, the 63rd character (‘/’) is replaced with ‘_’. Furthermore, the encoder does not fill the string with trailing ‘=’. The resulting encoded strings comply to the regular expression pattern ‘[A-Za-z0-9_-]’ and thus are safe to use in URLs or for file names.
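Round-tripping a string looks like this:

    library(base64url)
    enc <- base64_urlencode("path/to a file?.txt")
    enc                    # contains only [A-Za-z0-9_-]
    base64_urldecode(enc)  # recovers the original string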
basefun Infrastructure for Computing with Basis Functions
Some very simple infrastructure for basis functions.
basicMCMCplots Trace Plots, Density Plots and Chain Comparisons for MCMC Samples
Provides a function for examining posterior MCMC samples from a single chain using trace plots and density plots, and from multiple chains by comparing posterior medians and credible intervals from each chain. These plotting functions have a variety of options, such as figure sizes, legends, parameters to plot, and saving plots to file. Functions interface with the NIMBLE software package, see de Valpine, Turek, Paciorek, Anderson-Bergman, Temple Lang and Bodik (2017) <doi:10.1080/10618600.2016.1172487>.
basicspace Recovering a Basic Space from Issue Scales
Conducts Aldrich-McKelvey and Blackbox Scaling (Poole et al 2016) <doi:10.18637/jss.v069.i07> to recover latent dimensions of judgment.
basictabler Construct Rich Tables for Output to ‘HTML’/’Excel’
Easily create tables from data frames/matrices. Create/manipulate tables row-by-row, column-by-column or cell-by-cell. Use common formatting/styling to output rich tables as ‘HTML’, ‘HTML widgets’ or to ‘Excel’.
basicTrendline Add Trendline of Basic Regression Models to Plot
Add trendlines of basic linear or nonlinear regression models to a plot and show the model equation, as simply as possible.
basket Basket Trial Analysis
Implementation of multisource exchangeability models for Bayesian analyses of prespecified subgroups arising in the context of basket trial design and monitoring. The R ‘basket’ package facilitates implementation of the binary, symmetric multi-source exchangeability model (MEM) with posterior inference arising from both exact computation and Markov chain Monte Carlo sampling. Analysis output includes full posterior samples as well as posterior probabilities, highest posterior density (HPD) interval boundaries, effective sample sizes (ESS), mean and median estimations, posterior exchangeability probability matrices, and maximum a posteriori MEMs. In addition to providing ‘basketwise’ analyses, the package includes similar calculations for ‘clusterwise’ analyses for which subgroups are combined into meta-baskets, or clusters, using graphical clustering algorithms that treat the posterior exchangeability probabilities as edge weights. In addition plotting tools are provided to visualize basket and cluster densities as well as their exchangeability. References include Hyman, D.M., Puzanov, I., Subbiah, V., Faris, J.E., Chau, I., Blay, J.Y., Wolf, J., Raje, N.S., Diamond, E.L., Hollebecque, A. and Gervais, R (2015) <doi:10.1056/NEJMoa1502309>; Hobbs, B.P. and Landin, R. (2018) <doi:10.1002/sim.7893>; Hobbs, B.P., Kane, M.J., Hong, D.S. and Landin, R. (2018) <doi:10.1093/annonc/mdy457>; and Kaizer, A.M., Koopmeiners, J.S. and Hobbs, B.P. (2017) <doi:10.1093/biostatistics/kxx031>.
BASS Bayesian Adaptive Spline Surfaces
Bayesian fitting and sensitivity analysis methods for adaptive spline surfaces. Built to handle continuous and categorical inputs as well as functional or scalar output. An extension of the methodology in Denison, Mallick and Smith (1998) <doi:10.1023/A:1008824606259>.
bastah Big Data Statistical Analysis for High-Dimensional Models
Big data statistical analysis for high-dimensional models, made possible by modifying lasso.proj() in the ‘hdi’ package to replace its nodewise regression with sparse precision matrix computation using ‘BigQUIC’.
BatchExperiments Statistical Experiments on Batch Computing Clusters
Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.
BatchGetSymbols Downloads and Organizes Financial Data for Multiple Tickers
Makes it easy to download trade data for a large number of tickers from Yahoo or Google Finance.
BatchJobs Batch Computing with R
Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.
batchscr Batch Script Helpers
Handy frameworks, such as error handling and log generation, for batch scripts. Use case: in scripts running on remote servers, set an error-handling mechanism for downloading and uploading, and record an operation log.
batchtools Tools for Computation on Batch Systems
As a successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high performance computing systems managed by schedulers ‘IBM Spectrum LSF’ (<http://…/> ), ‘OpenLava’ (<http://…/> ), ‘Univa Grid Engine’/’Oracle Grid Engine’ (<http://…/> ), ‘Slurm’ (<http://…/> ), ‘Torque/PBS’ (<http://…/> ), or ‘Docker Swarm’ (<https://…/> ). A multicore and socket mode allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.
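A minimal local sketch of the Map abstraction; makeRegistry(file.dir = NA) creates a throwaway registry on the default backend:

    library(batchtools)
    reg <- makeRegistry(file.dir = NA)              # temporary registry
    batchMap(function(x) x^2, x = 1:10, reg = reg)  # define 10 jobs
    submitJobs(reg = reg)                           # run on configured backend
    waitForJobs(reg = reg)
    reduceResults(function(a, b) a + b, reg = reg)  # sum of squares: 385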
BaTFLED3D Bayesian Tensor Factorization Linked to External Data
BaTFLED is a machine learning algorithm designed to make predictions and determine interactions in data that varies along three independent modes. For example, BaTFLED was developed to predict the growth of cell lines when treated with drugs at different doses. The first mode corresponds to cell lines and incorporates predictors such as cell line genomics and growth conditions. The second mode corresponds to drugs and incorporates predictors indicating known targets and structural features. The third mode corresponds to dose, and there are no dose-specific predictors (although the algorithm is capable of including predictors for the third mode if present). See ‘BaTFLED3D_vignette.rmd’ for a simulated example.
batteryreduction An R Package for Data Reduction by Battery Reduction
Battery reduction is a method used in data reduction. It uses Gram-Schmidt orthogonal rotations to find a subset of variables that best represents the original set of variables.
bayesAB Fast Bayesian Methods for AB Testing
bayesAB provides a suite of functions that allow the user to analyze A/B test data in a Bayesian framework. bayesAB is intended to be a drop-in replacement for common frequentist hypothesis tests such as the t-test and chi-squared test. Bayesian methods provide several benefits over frequentist methods in the context of A/B tests, namely in interpretability. Instead of p-values you get direct probabilities on whether A is better than B (and by how much). Instead of point estimates, your posterior distributions are parametrized random variables which can be summarized any number of ways. Bayesian tests are also immune to ‘peeking’ and are thus valid whenever a test is stopped.
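A sketch of a Bernoulli A/B test via the package's bayesTest() function, with a Beta(1, 1) prior (data simulated here):

    library(bayesAB)
    A <- rbinom(250, 1, 0.25)  # simulated conversions, variant A
    B <- rbinom(250, 1, 0.20)  # simulated conversions, variant B
    ab <- bayesTest(A, B, priors = c(alpha = 1, beta = 1),
                    distribution = "bernoulli")
    summary(ab)  # reports P(A > B) instead of a p-value
    plot(ab)     # prior, posterior and lift distributions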
bayesammi Bayesian Estimation of the Additive Main Effects and Multiplicative Interaction Model
Performs Bayesian estimation of the additive main effects and multiplicative interaction (AMMI) model. The method is explained in Crossa, J., Perez-Elizalde, S., Jarquin, D., Cotes, J.M., Viele, K., Liu, G. and Cornelius, P.L. (2011) (<doi:10.2135/cropsci2010.06.0343>).
BayesBinMix Bayesian Estimation of Mixtures of Multivariate Bernoulli Distributions
Fully Bayesian inference for estimating the number of clusters and related parameters for heterogeneous binary data.
bayesboot An Implementation of Rubin’s (1981) Bayesian Bootstrap
Functions for performing the Bayesian bootstrap as introduced by Rubin (1981) <doi:10.1214/aos/1176345338> and for summarizing the result. The implementation can handle both summary statistics that work on a weighted version of the data and summary statistics that work on a resampled data set.
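For example, a Bayesian bootstrap posterior for a sample mean:

    library(bayesboot)
    b <- bayesboot(mtcars$mpg, mean)  # statistic applied to resampled data
    summary(b)                        # posterior mean, HDI, etc.
    plot(b)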
BayesBridge Bridge Regression
Bayesian bridge regression.
bayesCL Bayesian Inference on a GPU using OpenCL
Bayesian Inference on a GPU. The package currently supports sampling from PolyaGamma, Multinomial logit and Bayesian lasso.
BayesCombo Bayesian Evidence Combination
Combine diverse evidence across multiple studies to test a high level scientific theory. The methods can also be used as an alternative to a standard meta-analysis.
BayesCTDesign Two Arm Bayesian Clinical Trial Design with and Without Historical Control Data
A set of functions to help clinical trial researchers calculate power and sample size for two-arm Bayesian randomized clinical trials that do or do not incorporate historical control data. At some point during the design process, a clinical trial researcher who is designing a basic two-arm Bayesian randomized clinical trial needs to make decisions about power and sample size within the context of hypothesized treatment effects. Through simulation, the simple_sim() function will estimate power and other user-specified clinical trial characteristics at user-specified sample sizes given user-defined scenarios about treatment effect, control group characteristics, and outcome. If the clinical trial researcher has access to historical control data, then the researcher can design a two-arm Bayesian randomized clinical trial that incorporates the historical data. In such a case, the researcher needs to work through the potential consequences of historical and randomized control differences on trial characteristics, in addition to working through issues regarding power in the context of sample size, treatment effect size, and outcome. If a researcher designs a clinical trial that will incorporate historical control data, the researcher needs the randomized controls to be from the same population as the historical controls. What if this is not the case when the designed trial is implemented? During the design phase, the researcher needs to investigate the negative effects of possible historic/randomized control differences on power, type I error, and other trial characteristics. Using this information, the researcher should design the trial to mitigate these negative effects. Through simulation, the historic_sim() function will estimate power and other user-specified clinical trial characteristics at user-specified sample sizes given user-defined scenarios about historical and randomized control differences as well as treatment effects and outcomes. The results from historic_sim() and simple_sim() can be printed with print_table() and graphed with plot_table() methods. Outcomes considered are Gaussian, Poisson, Bernoulli, Lognormal, Weibull, and Piecewise Exponential.
bayesdfa Bayesian Dynamic Factor Analysis (DFA) with ‘Stan’
Implements Bayesian dynamic factor analysis with ‘Stan’. Dynamic factor analysis is a dimension reduction tool for multivariate time series. ‘bayesdfa’ extends conventional dynamic factor models in several ways. First, extreme events may be estimated in the latent trend by modeling process error with a student-t distribution. Second, autoregressive and moving average components can be optionally included. Third, the estimated dynamic factors can be analyzed with hidden Markov models to evaluate support for latent regimes.
bayesDP Tools for the Bayesian Discount Prior Function
Functions for augmenting data with historical controls using the Bayesian discount prior function for 1 arm and 2 arm clinical trials.
BayesESS Determining Effective Sample Size
Determines effective sample size of a parametric prior distribution in Bayesian models. To learn more about Bayesian effective sample size, see: Morita, S., Thall, P. F., & Muller, P. (2008) <https://…/25502095>.
BayesFactor Computation of Bayes Factors for Common Designs
A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.
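For instance, a Bayes-factor paired t-test on R's built-in sleep data:

    library(BayesFactor)
    bf <- ttestBF(x = sleep$extra[sleep$group == 1],
                  y = sleep$extra[sleep$group == 2], paired = TRUE)
    bf      # evidence for the alternative over the null
    1 / bf  # evidence in the opposite direction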
BayesFactorExtras Extra functions for use with the BayesFactor R package
BayesFactorExtras is an R package which contains extra features related to the BayesFactor package, such as plots and analysis reports.
BayesFM Bayesian Inference for Factor Modeling
Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: Bayesian Exploratory Factor Analysis (befa), an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling. More approaches will be included in future releases of this package.
BayesGOF Bayesian Modeling via Goodness of Fit
Non-parametric method for learning prior distribution starting with parametric (subjective) prior. It performs four interconnected tasks: (i) characterizes the uncertainty of the elicited prior; (ii) exploratory diagnostic for checking prior-data conflict; (iii) computes the final statistical prior density estimate; and (iv) performs macro- and micro-inference. Primary reference is Mukhopadhyay, S. and Fletcher, D. (2017, Technical Report).
BayesH Bayesian Regression Model with Mixture of Two Scaled Inverse Chi Square as Hyperprior
Functions to perform a Bayesian regression model with a mixture of two scaled inverse chi-squared distributions as the hyperprior for the variance of each regression coefficient.
BayesianFROC FROC Analysis by Bayesian Approaches
For details please see the vignettes in this package. This package aims to provide new methods for so-called Free-response Receiver Operating Characteristic (FROC) analysis. The ultimate aim of FROC analysis is to compare observer performances, i.e., characteristics such as the area under the curve (AUC) or figure of merit (FOM). In this package, only the notion of AUC is used for modality comparison. In the radiological FROC context, a modality is an imaging method such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), or Positron Emission Tomography (PET), and the question is which imaging method is better at detecting lesions from shadows in radiographs. To address such modality comparisons, this package provides new methods using hierarchical Bayesian models proposed by the package author. Using this package, one can at least conclude which imaging method is better for finding lesions in radiographs for one's own data. The fit of FROC statistical models is sometimes poor; this can easily be checked by drawing FROC curves and comparing them with the points constructed from False Positive Fractions (FPFs) and True Positive Fractions (TPFs), which validates goodness of fit intuitively. Such validation is also implemented via the chi-squared goodness-of-fit statistic in the Bayesian context, in which the parameter is not deterministic; by integrating over the posterior predictive measure, the desired value is obtained. To compare imaging methods, i.e., modalities, the AUC is evaluated for each modality. FROC analysis was developed by Dev Chakraborty; the FROC model in his 1989 paper relies on maximum likelihood methodology. This package modifies that model and provides an alternative Bayesian FROC model; strictly speaking, his model does not coincide with the models in this package. The author hopes that medical researchers use not only frequentist methods but also alternative Bayesian methods. In medical research, many problems are considered only under frequentist notions, such as p-values, but p-values are sometimes misunderstood; Bayesian methods provide simple, direct, intuitive answers to research questions. To learn how to use this package, execute the following from the R (RStudio) console: demo(demo_MRMC, package = ’BayesianFROC’); demo(demo_srsc, package = ’BayesianFROC’); demo(demo_stan, package = ’BayesianFROC’); demo(demo_drawcurves_srsc, package = ’BayesianFROC’); demo_Bayesian_FROC(). References: Dev Chakraborty (1989) <doi:10.1118/1.596358> Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data. Pre-print: Issei Tsunoda, Bayesian models for free-response receiver operating characteristic analysis. Combining frequentist methods with Bayesian methods can yield more reliable answers to research questions.
BayesianGLasso Bayesian Graphical Lasso
Implements a data-augmented block Gibbs sampler for simulating the posterior distribution of concentration matrices for specifying the topology and parameterization of a Gaussian Graphical Model (GGM). This sampler was originally proposed in Wang (2012) <doi:10.1214/12-BA729>.
BayesianNetwork Bayesian Network Modeling and Analysis
A Shiny web application for creating interactive Bayesian Network models, learning the structure and parameters of Bayesian networks, and utilities for classical network analysis.
BayesianTools General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
General-purpose MCMC and SMC samplers, as well as plot and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. Implemented samplers include various Metropolis MCMC variants (including adaptive and/or delayed rejection MH), the T-walk, two differential evolution MCMCs, two DREAM MCMCs, and a sequential Monte Carlo (SMC) particle filter.
bayesImageS Bayesian Methods for Image Segmentation using a Potts Model
Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, and approximate Bayesian computation (ABC).
BayesLCA Bayesian Latent Class Analysis
Bayesian Latent Class Analysis using several different methods.
bayeslm Efficient Sampling for Gaussian Linear Regression with Arbitrary Priors
Efficient sampling for Gaussian linear regression with arbitrary priors.
bayesloglin Bayesian Analysis of Contingency Table Data
The function MC3() searches for log-linear models with the highest posterior probability. The function gibbsSampler() is a blocked Gibbs sampler for sampling from the posterior distribution of the log-linear parameters. The functions findPostMean() and findPostCov() compute the posterior mean and covariance matrix for decomposable models, for which these quantities are available in closed form.
bayeslongitudinal Adjust Longitudinal Regression Models Using Bayesian Methodology
Adjusts longitudinal regression models using Bayesian methodology for covariance structures of compound symmetry (CS), first-order autoregressive AR(1), and autoregressive moving average ARMA(1,1).
BayesMAMS Designing Bayesian Multi-Arm Multi-Stage Studies
Calculating Bayesian sample sizes for multi-arm trials where several experimental treatments are compared to a common control, perhaps even at multiple stages.
bayesmeta Bayesian Random-Effects Meta-Analysis
A collection of functions for deriving the posterior distribution of the two parameters in a random-effects meta-analysis, and for evaluating joint and marginal posterior probability distributions, predictive distributions, etc.
BayesNetBP Bayesian Network Belief Propagation
Belief propagation methods in Bayesian Networks to propagate evidence through the network. The implementation of these methods are based on the article: Cowell, RG (2005). Local Propagation in Conditional Gaussian Bayesian Networks <http://…/>.
BayesPiecewiseICAR Hierarchical Bayesian Model for a Hazard Function
Fits a piecewise exponential hazard to survival data using a Hierarchical Bayesian model with an Intrinsic Conditional Autoregressive formulation for the spatial dependency in the hazard rates for each piece. This function uses Metropolis-Hastings-Green MCMC to allow the number of split points to vary. This function outputs graphics that display the histogram of the number of split points and the trace plots of the hierarchical parameters. The function outputs a list that contains the posterior samples for the number of split points, the location of the split points, and the log hazard rates corresponding to these splits. Additionally, this outputs the posterior samples of the two hierarchical parameters, Mu and Sigma^2.
bayesplot Plotting for Bayesian Models
Plotting functions for posterior analysis, model checking, and MCMC diagnostics. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with Stan.
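A small sketch using the package's bundled example draws:

    library(bayesplot)
    draws <- example_mcmc_draws()  # demo array: iterations x chains x parameters
    mcmc_trace(draws, pars = c("alpha", "sigma"))              # trace plots
    mcmc_areas(draws, pars = c("alpha", "sigma"), prob = 0.9)  # posterior areas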
bayesreg Bayesian Regression Models with Continuous Shrinkage Priors
Fits linear or logistic regression model using Bayesian continuous shrinkage prior distributions. Handles ridge, lasso, horseshoe and horseshoe+ regression with logistic, Gaussian, Laplace or Student-t distributed targets.
Bayesrel Bayesian Reliability Estimation
So far, it provides the most common single-test reliability estimates: Coefficient Alpha, Guttman’s lambda-2/-4/-6, the greatest lower bound, and McDonald’s Omega. The Bayesian estimates are provided with credible intervals. The method for the Bayesian estimates, except for omega, is sampling from the posterior inverse Wishart distribution for the covariance-matrix-based measures; see Murphy (2007) <https://…/murphy-2007.pdf>. For omega, Gibbs sampling from the joint conditional distributions of a single-factor model is used; see Lee (2007, ISBN:978-0-470-02424-9). Methods for the glb are from Moltner and Revelle (2018) <https://…/glb.algebraic>; lambda-4 is from Benton (2015) <doi:10.1007/978-3-319-07503-7_19>; the principal factor analysis is from Schlegel (2017) <https://…/>; and the analytic alpha interval is from Bonnett and Wright (2014) <doi:10.1002/job.1960>.
BayesRS Bayes Factors for Hierarchical Linear Models with Continuous Predictors
Runs hierarchical linear Bayesian models. Samples from the posterior distributions of model parameters in JAGS (Just Another Gibbs Sampler; Plummer, 2003, <http://…/> ). Computes Bayes factors for group parameters of interest with the Savage-Dickey density ratio (Wetzels, Raaijmakers, Jakab, Wagenmakers, 2009, <doi:10.3758/PBR.16.4.752>).
BayesS5 Bayesian Variable Selection Using Simplified Shotgun Stochastic Search with Screening (S5)
In p >> n settings, full posterior sampling using existing Markov chain Monte Carlo (MCMC) algorithms is highly inefficient and often not feasible from a practical perspective. To overcome this problem, we propose a scalable stochastic search algorithm, the Simplified Shotgun Stochastic Search (S5), aimed at rapidly exploring interesting regions of the model space and finding the maximum a posteriori (MAP) model. The S5 also provides an approximation of the posterior probability of each model (including the marginal inclusion probabilities).
BayesSenMC Different Models of Posterior Distributions of Adjusted Odds Ratio
Generates different posterior distributions of adjusted odds ratio under different priors of sensitivity and specificity, and plots the models for comparison. It also provides estimations for the specifications of the models using diagnostics of exposure status with a non-linear mixed effects model. It implements the methods that are first proposed by Chu et al. (2006) <doi:10.1016/j.annepidem.2006.04.001> and Chu et al. (2010) <doi:10.1177/0272989X09353452>.
BayesSpec Bayesian Spectral Analysis Techniques
An implementation of methods for spectral analysis using the Bayesian framework. It includes functions for modelling spectrum as well as appropriate plotting and output estimates. There is segmentation capability with RJ MCMC (Reversible Jump Markov Chain Monte Carlo). The package takes these methods predominantly from the 2012 paper ‘AdaptSPEC: Adaptive Spectral Estimation for Nonstationary Time Series’ <DOI:10.1080/01621459.2012.716340>.
BayesSummaryStatLM MCMC Sampling of Bayesian Linear Models via Summary Statistics
Methods for generating Markov Chain Monte Carlo (MCMC) posterior samples of Bayesian linear regression model parameters that require only summary statistics of data as input. Summary statistics are useful for systems with very limited amounts of physical memory. The package provides two functions: one function that computes summary statistics of data and one function that carries out the MCMC posterior sampling for Bayesian linear regression models where summary statistics are used as input. The function read.regress.data.ff utilizes the R package ‘ff’ to handle data sets that are too large to fit into a user’s physical memory, by reading in data in chunks.
bayestestR Understand and Describe Bayesian Models and Posterior Distributions
Provides utilities to describe posterior distributions and Bayesian models. It includes point-estimates such as Maximum A Posteriori (MAP), measures of dispersion (Highest Density Interval – HDI; Kruschke, 2014 <doi:10.1016/B978-0-12-405888-0.09999-2>) and indices used for null-hypothesis testing (such as ROPE percentage and pd).
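For example, applied directly to a vector of posterior draws:

    library(bayestestR)
    posterior <- rnorm(4000, mean = 0.4, sd = 0.2)  # toy posterior sample
    hdi(posterior, ci = 0.89)      # highest density interval
    p_direction(posterior)         # probability of direction (pd)
    describe_posterior(posterior)  # point estimate, HDI, ROPE, pd in one call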
BayesTree Bayesian Additive Regression Trees
Implementation of BART: Bayesian Additive Regression Trees, as described in Chipman, George and McCulloch (2010).
BayesTreePrior Bayesian Tree Prior Simulation
Provides a way to simulate from the prior distribution of Bayesian trees by Chipman et al. (1998) <DOI:10.2307/2669832>. The prior distribution of Bayesian trees is highly dependent on the design matrix X, so using the hyperparameters suggested by Chipman et al. (1998) <DOI:10.2307/2669832> is not recommended and could lead to an unexpected prior distribution. This work is part of the author’s master’s thesis (in revision, expected 2016) and a journal publication in progress.
BayesVarSel Bayes Factors, Model Choice and Variable Selection in Linear Models
Conceived to calculate Bayes factors in linear models and then to provide a formal Bayesian answer to testing and variable selection problems. From a theoretical side, the emphasis in this package is placed on the prior distributions, and it allows a wide range of them: Jeffreys (1961); Zellner and Siow (1980) <doi:10.1007/bf02888369>; Zellner and Siow (1984); Zellner (1986) <doi:10.2307/2233941>; Fernandez et al. (2001) <doi:10.1016/s0304-4076(00)00076-2>; Liang et al. (2008) <doi:10.1198/016214507000001337>; and Bayarri et al. (2012) <doi:10.1214/12-aos1013>. The interaction with the package is through a friendly interface that syntactically mimics the well-known lm() command of R. The resulting objects can be easily explored, providing the user with valuable information (like marginal, joint and conditional inclusion probabilities of potential variables; the highest posterior probability model, HPM; the median probability model, MPM) about the structure of the true (data-generating) model. Additionally, this package incorporates abilities to handle problems with a large number of potential explanatory variables through parallel and heuristic versions of the main commands, Garcia-Donato and Martinez-Beneito (2013) <doi:10.1080/01621459.2012.742443>.
bayesvl Visually Learning the Graphical Structure of Bayesian Networks and Performing MCMC with ‘Stan’
Provides functions, for pedagogical purposes, for visually learning Bayesian networks and Markov chain Monte Carlo (MCMC) computations. It enables users to: a) create and examine the (starting) graphical structure of Bayesian networks; b) create random Bayesian networks using a dataset with customized constraints; c) generate ‘Stan’ code for the structures of Bayesian networks for sampling the data and learning parameters; d) plot the network graphs; e) perform Markov chain Monte Carlo computations and produce graphs for posterior checks. The package refers to one reference item, which describes the methods and algorithms: Vuong, Quan-Hoang and La, Viet-Phuong (2019) <doi:10.31219/osf.io/w5dx6> The ‘bayesvl’ R package. Open Science Framework (May 18).
bayfoxr Global Bayesian Foraminifera Core Top Calibration
A Bayesian, global planktic foraminifera core top calibration to modern sea-surface temperatures. Includes four calibration models, considering species-specific calibration parameters and seasonality.
bazar Miscellaneous Basic Functions
A collection of miscellaneous functions for copying objects to the clipboard (‘Copy’); manipulating strings (‘concat’, ‘mgsub’, ‘trim’, ‘verlan’); loading or showing packages (‘library_with_rep’, ‘require_with_rep’, ‘sessionPackages’); creating or testing for named lists (‘nlist’, ‘as.nlist’, ‘is.nlist’), formulas (‘is.formula’), empty objects (‘as.empty’, ‘is.empty’), whole numbers (‘as.wholenumber’, ‘is.wholenumber’); testing for equality (‘almost.equal’, ‘almost.zero’); getting modified versions of usual functions (‘rle2’, ‘sumNA’); making a pause or a stop (‘pause’, ‘stopif’); and others (‘erase’, ‘%nin%’, ‘unwhich’).
bbw Blocked Weighted Bootstrap
The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is used. The method was developed by Accion Contra la Faim, Brixton Health, Concern Worldwide, Global Alliance for Improved Nutrition, UNICEF Sierra Leone, UNICEF Sudan and Valid International. It has been tested by the Centers for Disease Control (CDC) using infant and young child feeding (IYCF) data. See Cameron et al (2008) <doi:10.1162/rest.90.3.414> for application of the bootstrap to cluster samples. See Aaron et al (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.
BCEA Bayesian Cost Effectiveness Analysis
Produces an economic evaluation of a Bayesian model in the form of MCMC simulations. Given suitable variables of cost and effectiveness / utility for two or more interventions, BCEA computes the most cost-effective alternative and produces graphical summaries and probabilistic sensitivity analysis.
BCEE The Bayesian Causal Effect Estimation Algorithm
Implementation of the Bayesian Causal Effect Estimation algorithm, a data-driven method for the estimation of the causal effect of a continuous exposure on a continuous outcome. For more details, see Talbot et al. (2015).
bcf Causal Inference for a Binary Treatment and Continuous Outcome using Bayesian Causal Forests
Causal inference for a binary treatment and continuous outcome using Bayesian Causal Forests. See Hahn, Murray and Carvalho (2017) <arXiv:1706.09523> for additional information. This implementation relies on code originally accompanying Pratola et al. (2013) <arXiv:1309.1906>.
bcgam Bayesian Constrained Generalised Linear Models
Fits generalised partial linear regression models using a Bayesian approach, where shape and smoothness constraints are imposed on nonparametrically modelled predictors through shape-restricted splines, and no constraints are imposed on optional parametrically modelled covariates. See Meyer et al. (2011) <doi:10.1080/10485252.2011.597852> for more details. IMPORTANT: before installing ‘bcgam’, you need to install ‘Rtools’ (Windows) or ‘Xcode’ (Mac OS X). These are required for the correct installation of ‘nimble’ (<https://…/download> ).
bcpa Behavioral change point analysis of animal movement
The Behavioral Change Point Analysis (BCPA) is a method of identifying hidden shifts in the underlying parameters of a time series, developed specifically to be applied to animal movement data which is irregularly sampled. The method is based on: E. Gurarie, R. Andrews and K. Laidre, ‘A novel method for identifying behavioural changes in animal movement data’ (2009), Ecology Letters 12(5): 395-408.
bcROCsurface Bias-Corrected Methods for Estimating the ROC Surface of Continuous Diagnostic Tests
Bias-corrected estimation methods for the receiver operating characteristic (ROC) surface and the volume under the ROC surface (VUS) under the missing at random (MAR) assumption.
bcrypt ‘Blowfish’ Password Hashing Algorithm
An R interface to the ‘OpenBSD Blowfish’ password hashing algorithm, as described in ‘A Future-Adaptable Password Scheme’ by ‘Niels Provos’. The implementation is derived from the ‘py-bcrypt’ module for Python which is a wrapper for the ‘OpenBSD’ implementation.
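For example, hashing and verifying a password (hashpw() and checkpw() are the package's documented functions):

    library(bcrypt)
    hash <- hashpw("correct horse battery staple")  # salted bcrypt hash
    checkpw("correct horse battery staple", hash)   # TRUE
    checkpw("wrong password", hash)                 # FALSE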
bcs Bayesian Compressive Sensing Using Laplace Priors
A Bayesian method for solving the compressive sensing problem. In particular, this package implements the algorithm ‘Fast Laplace’ found in the paper ‘Bayesian Compressive Sensing Using Laplace Priors’ by Babacan, Molina, Katsaggelos (2010) <DOI:10.1109/TIP.2009.2032894>.
bdchecks Biodiversity Data Checks
Supplies a Shiny app and a set of functions to perform and manage data checks for biodiversity data.
bdclean A User-Friendly Biodiversity Data Cleaning App for the Inexperienced R User
Provides features to manage the complete workflow for biodiversity data cleaning: uploading data, gathering input from users (in order to adjust cleaning procedures), cleaning data and, finally, generating various reports and several versions of the data. Facilitates user-level data cleaning, designed for the inexperienced R user. T Gueta et al (2018) <doi:10.3897/biss.2.25564>. T Gueta et al (2017) <doi:10.3897/tdwgproceedings.1.20311>.
BDEsize Efficient Determination of Sample Size in Balanced Design of Experiments
Provides the sample size for balanced designs of experiments and three graphs: detectable standardized effect size vs power, sample size vs detectable standardized effect size, and sample size vs power. The sample size is computed in order to detect a certain standardized effect size with given power at the significance level. The three graphs show the mutual relationship between the sample size, power and the detectable standardized effect size; by investigating them, one can check which effects are sensitive to the efficient sample size determination. Lenth, R.V. (2006-9) <http://…/Power>; Lim, Yong Bin (1998); Marvin, A., Kastenbaum, A. and Hoel, D.G. (1970) <doi:10.2307/2334851>; Montgomery, Douglas C. (2013, ISBN: 0849323312).
BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC
Provides statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements in the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889> and Mohammadi et al. (2017) <doi:10.1111/rssc.12171>. To speed up the computations, the BDMCMC sampling algorithms are implemented in parallel using OpenMP in C++.
bdlp Transparent and Reproducible Artificial Data Generation
The main function generateDataset() processes a user-supplied .R file that contains metadata parameters in order to generate actual data. The metadata parameters have to be structured in the form of metadata objects, the format of which is outlined in the package vignette. This approach allows artificial data to be generated in a transparent and reproducible manner.
bdots Bootstrapped Differences of Time Series
Analyze differences among time series curves with Oleson et al.’s modified p-value technique.
bdpopt Optimisation of Bayesian Decision Problems
Optimisation of the expected utility in single-stage and multi-stage Bayesian decision problems. The expected utility is estimated by simulation. For single-stage problems, JAGS is used to draw MCMC samples.
bdvis Biodiversity Data Visualizations
Biodiversity data visualizations using R, helpful for understanding the completeness of a biodiversity inventory; the extent of geographical, taxonomic and temporal coverage; and gaps and biases in the data.
BDWreg Bayesian Inference for Discrete Weibull Regression
A Bayesian regression model for discrete response, where the conditional distribution is modelled via a discrete Weibull distribution. This package provides an implementation of Metropolis-Hastings and Reversible-Jump algorithms to draw samples from the posterior. It covers a wide range of regularizations through any two-parameter prior. Examples are Laplace (lasso), Gaussian (ridge), Uniform, Cauchy and customized priors like a mixture of priors. An extensive visual toolbox is included to check the validity of the results as well as several measures of goodness-of-fit.
BE Bioequivalence Study Data Analysis
Analyze bioequivalence study data with industrial strength. Sample size can be determined for various crossover designs, such as the 2×2 design, 2×4 design, 4×4 design, Balaam design, two-sequence dual design, and William design. Reference: Chow SC, Liu JP. Design and Analysis of Bioavailability and Bioequivalence Studies. 3rd ed. (2009, ISBN:978-1-58488-668-6).
beam Fast Bayesian Inference in Large Gaussian Graphical Models
Fast Bayesian inference of marginal and conditional independence structures between variables from high-dimensional data (Leday and Richardson (2018) <arXiv:1803.08155>).
beast Bayesian Estimation of Change-Points in the Slope of Multivariate Time-Series
Assume that a temporal process is composed of contiguous segments with differing slopes and replicated noise-corrupted time series measurements are observed. The unknown mean of the data generating process is modelled as a piecewise linear function of time with an unknown number of change-points. The package infers the joint posterior distribution of the number and position of change-points as well as the unknown mean parameters per time-series by MCMC sampling. A-priori, the proposed model uses an overfitting number of mean parameters but, conditionally on a set of change-points, only a subset of them influences the likelihood. An exponentially decreasing prior distribution on the number of change-points gives rise to a posterior distribution concentrating on sparse representations of the underlying sequence, but also available is the Poisson distribution. See Papastamoulis et al (2017) <arXiv:1709.06111> for a detailed presentation of the method.
beginr Functions for R Beginners
Useful functions for R beginners, including hints for the arguments of the ‘plot()’ function, self-defined functions for error bars, user-customized pair plots and histogram plots, enhanced linear regression figures, etc. This package could be helpful to R experts as well.
behaviorchange Tools for Behavior Change Researchers and Professionals
Contains specialised analyses and visualisation tools for behavior change science. These facilitate conducting determinant studies (for example, using confidence interval-based estimation of relevance, CIBER, or CIBERlite plots) and systematically developing, reporting, and analysing interventions (for example, using acyclic behavior change diagrams). This package is especially useful for researchers in the field of behavior change or health psychology and to behavior change professionals such as intervention developers and prevention workers.
belg Boltzmann Entropy of a Landscape Gradient
Calculates the Boltzmann entropy of a landscape gradient. It uses the analytical method created by Gao, P., Zhang, H. and Li, Z., 2018 (<doi:10.1111/tgis.12315>).
benchr High-Precision Measurement of R Expression Execution Time
Provides infrastructure to accurately measure and compare the execution time of R expressions.
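For example, comparing two equivalent expressions with the package's benchmark() function (the times argument is assumed from its microbenchmark-style interface):

    library(benchr)
    x <- runif(1e5)
    benchmark(sqrt(x), x ^ 0.5, times = 100)  # timing summary per expression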
bentcableAR Bent-Cable Regression for Independent Data or Autoregressive Time Series
Included are two main interfaces for fitting and diagnosing bent-cable regressions for autoregressive time-series data or independent data (time series or otherwise): ‘bentcable.ar()’ and ‘bentcable.dev.plot()’. Some components in the package can also be used as stand-alone functions. The bent cable (linear-quadratic-linear) generalizes the broken stick (linear-linear), which is also handled by this package. Version 0.2 corrects a glitch in the computation of confidence intervals for the CTP. References that were updated from Versions 0.2.1 and 0.2.2 appear in Version 0.2.3 and up. Version 0.3.0 improves robustness of the error-message producing mechanism. It is the author’s intention to distribute any future updates via GitHub.
Bergm Bayesian Exponential Random Graph Models
Set of tools to analyse Bayesian exponential random graph models.
BeSS Best Subset Selection for Sparse Generalized Linear Model and Cox Model
An implementation of best subset selection in generalized linear model and Cox proportional hazard model via the primal dual active set algorithm. The algorithm formulates coefficient parameters and residuals as primal and dual variables and utilizes efficient active set selection strategies based on the complementarity of the primal and dual variables.
bestNormalize Normalizing Transformation Functions
Estimate a suite of normalizing transformations, including a new technique based on ranks which can guarantee normally distributed transformed data if there are no ties: Ordered Quantile Normalization. The package is built to estimate the best normalizing transformation for a vector consistently and accurately. It implements the Box-Cox transformation, the Yeo-Johnson transformation, three types of Lambert WxF transformations, and the Ordered Quantile normalization transformation.
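A short sketch of the main workflow:

    library(bestNormalize)
    x <- rgamma(200, shape = 2)  # skewed input
    bn <- bestNormalize(x)       # picks the best transformation for x
    bn$chosen_transform          # e.g. ordered quantile (ORQ) normalization
    x_t <- predict(bn)                                    # transformed data
    x_back <- predict(bn, newdata = x_t, inverse = TRUE)  # back-transform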
betaboost Boosting Beta Regression
Implements boosting beta regression for potentially high-dimensional data (Mayr et al., 2018 <doi:10.1093/ije/dyy093>). The ‘betaboost’ package uses the same parametrization as ‘betareg’ (Cribari-Neto and Zeileis, 2010 <doi:10.18637/jss.v034.i02>) to make results directly comparable. The underlying algorithms are implemented via the R add-on packages ‘mboost’ (Hofner et al., 2014 <doi:10.1007/s00180-012-0382-5>) and ‘gamboostLSS’ (Mayr et al., 2012 <doi:10.1111/j.1467-9876.2011.01033.x>).
betacal Beta Calibration
Fit beta calibration models and obtain calibrated probabilities from them.
betas Standardized Beta Coefficients
Computes standardized beta coefficients and corresponding standard errors for the following models: linear regression models with numerical covariates only; linear regression models with numerical and factorial covariates; weighted linear regression models; and robust linear regression models with numerical covariates only.
beyondWhittle Bayesian Spectral Inference for Stationary Time Series
Implementations of a Bayesian parametric (autoregressive), a Bayesian nonparametric (Whittle likelihood with Bernstein-Dirichlet prior) and a Bayesian semiparametric (autoregressive likelihood with Bernstein-Dirichlet correction) procedure are provided. The work is based on the corrected parametric likelihood by C. Kirch et al (2017) <arXiv:1701.04846>. It was supported by DFG grant KI 1443/3-1.
bfork Basic Unix Process Control
Wrappers for fork()/waitpid() meant to allow R users to quickly and easily fork child processes and wait for them to finish.
bfp Bayesian Fractional Polynomials
Implements the Bayesian paradigm for fractional polynomial models under the assumption of normally distributed error terms.
bgeva Binary Generalized Extreme Value Additive Models
Routine for fitting regression models for binary rare events with linear and nonlinear covariate effects when using the quantile function of the Generalized Extreme Value random variable.
BGLR Bayesian Generalized Linear Regression
Bayesian Generalized Linear Regression.
bgsmtr Bayesian Group Sparse Multi-Task Regression
Fits a Bayesian group-sparse multi-task regression model using Gibbs sampling. The hierarchical prior encourages shrinkage of the estimated regression coefficients at both the gene and SNP level. The model has been applied successfully to imaging phenotypes of dimension up to 100; it can be used more generally for multivariate (non-imaging) phenotypes.
BH Boost C++ Header Files
Boost provides free peer-reviewed portable C++ source libraries. A large part of Boost is provided as C++ template code which is resolved entirely at compile-time without linking. This package aims to provide the most useful subset of Boost libraries for template use among CRAN packages. By placing these libraries in this package, we offer a more efficient distribution system for CRAN, as replication of this code in the sources of other packages is avoided.
BHPMF Uncertainty Quantified Matrix Completion using Bayesian Hierarchical Matrix Factorization
Fills the gaps of a matrix by incorporating hierarchical side information, while providing uncertainty quantification.
bhrcr Bayesian Hierarchical Regression on Clearance Rates in the Presence of Lag and Tail Phases
An implementation of the Bayesian Clearance Estimator (Fogarty et al. (2015) <doi:10.1111/biom.12307>). It takes serial measurements of a response on an individual (e.g., parasite load after treatment) that is decaying over time and performs Bayesian hierarchical regression of the clearance rates on the given covariates. This package provides tools to calculate WWARN PCE (WorldWide Antimalarial Resistance Network’s Parasite Clearance Estimator) estimates of the clearance rates as well.
bib2df Parse a BibTeX File to a Tibble
Parse a BibTeX file to a tidy tibble (trimmed down version of data.frame) to make it accessible for further analysis and visualization.
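For example (the file path is illustrative; columns follow BibTeX fields in upper case):

    library(bib2df)
    refs <- bib2df("references.bib")  # one row per BibTeX entry
    refs$TITLE
    table(refs$CATEGORY)              # entry types: ARTICLE, BOOK, ...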
BiBitR R Wrapper for Java Implementation of BiBit
A simple R wrapper for the Java BiBit algorithm from ‘A biclustering algorithm for extracting bit-patterns from binary datasets’ from Domingo et al. (2011) <DOI:10.1093/bioinformatics/btr464>. An adaption for the BiBit algorithm which allows noise in the biclusters is also included.
BibPlots Plot Functions for JIF (Journal Impact Factor) and Paper Percentiles
Currently, the package provides two functions for plotting and analyzing bibliometric data (JIF and paper percentile values). Further extension to more plot variants is planned.
biclique Maximal Complete Bipartite Graphs
A tool for enumerating maximal complete bipartite graphs. The input should be an edge list file or a binary matrix file; the output is the set of maximal complete bipartite graphs. The algorithms used can be found in Y Zhang et al., BMC Bioinformatics 2014, 15:110 <doi:10.1186/1471-2105-15-110>.
bife Binary Choice Models with Fixed Effects
Estimates fixed effects binary choice models (logit and probit) with potentially many individual fixed effects and computes average partial effects. Incidental parameter bias can be reduced with a bias-correction proposed by Hahn and Newey (2004) <doi:10.1111/j.1468-0262.2004.00533.x>.
BIGDAWG Case-Control Analysis of Multi-Allelic Loci
Data sets and functions for chi-squared Hardy-Weinberg and case-control association tests of highly polymorphic genetic data [e.g., human leukocyte antigen (HLA) data]. Performs association tests at multiple levels of polymorphism (haplotype, locus and HLA amino-acids) as described in Pappas DJ, Marin W, Hollenbach JA, Mack SJ (2016) <doi:10.1016/j.humimm.2015.12.006>. Combines rare variants to a common class to account for sparse cells in tables as described by Hollenbach JA, Mack SJ, Thomson G, Gourraud PA (2012) <doi:10.1007/978-1-61779-842-9_14>.
bigdist Store Distance Matrices on Disk
Provides utilities to compute, store and access distance matrices on disk as file-backed matrices provided by the ‘bigstatsr’ package. File-backed distance matrices are stored as a symmetric matrix to facilitate out-of-memory operations on the file-backed matrix, while the in-memory ‘dist’ object stores only the lower-triangle elements. ‘disto’ provides a unified interface to work with in-memory and disk-based distance matrices.
bigFastlm Fast Linear Models for Objects from the ‘bigmemory’ Package
A reimplementation of the fastLm() functionality of ‘RcppEigen’ for big.matrix objects for fast out-of-memory linear model fitting.
bigIntegerAlgos R Tool for Factoring Big Integers
Features the multiple polynomial quadratic sieve algorithm for factoring large integers and a vectorized factoring function that returns the complete factorization of an integer. Utilizes the C library GMP (GNU Multiple Precision Arithmetic) and classes created by Antoine Lucas et al. found in the ‘gmp’ package.
bigKRLS Optimized Kernel Regularized Least Squares
Functions for Kernel-Regularized Least Squares optimized for speed and memory usage are provided along with visualization tools. For working papers, sample code, and recent presentations visit <https://…/>.
biglasso Big Lasso: Extending Lasso Model Fitting to Big Data in R
Extend lasso and elastic-net model fitting for ultrahigh-dimensional, multi-gigabyte data sets that cannot be loaded into memory. Compared to existing lasso-fitting packages, it preserves equivalently fast computation speed but is much more memory-efficient, thus allowing for very powerful big data analysis even with only a single laptop.
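A minimal sketch: biglasso() operates on a big.matrix design matrix from the ‘bigmemory’ package:

    library(bigmemory)
    library(biglasso)
    x <- matrix(rnorm(1000 * 100), 1000, 100)
    y <- rnorm(1000)
    X <- as.big.matrix(x)  # file- or RAM-backed design matrix
    fit <- biglasso(X, y)  # memory-efficient lasso solution path
    plot(fit)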
bigmatch Making Optimal Matching Size-Scalable Using Optimal Calipers
Implements optimal matching with near-fine balance in large observational studies with the use of optimal calipers to get a sparse network. The caliper is optimal in the sense that it is as small as possible such that a matching exists. Glover, F. (1967). <DOI:10.1002/nav.3800140304>. Katriel, I. (2008). <DOI:10.1287/ijoc.1070.0232>. Rosenbaum, P.R. (1989). <DOI:10.1080/01621459.1989.10478868>. Yang, D., Small, D. S., Silber, J. H., and Rosenbaum, P. R. (2012). <DOI:10.1111/j.1541-0420.2011.01691.x>.
bigReg Generalized Linear Models (GLM) for Large Data Sets
Allows the user to carry out GLM on very large data sets. Data can be created using the data_frame() function and appended to the object with object$append(data); data_frame and data_matrix objects are available that allow the user to store large data on disk. The data is stored as doubles in binary format and any character columns are transformed to factors and then stored as numeric (binary) data, while a look-up table is stored in a separate .meta_data file in the same folder. The data is stored in blocks, and the GLM regression algorithm is modified to carry out a MapReduce-like algorithm to fit the model. The functions bglm(), summary() and bglm_predict() are available for creating and post-processing of models. The library requires Armadillo to be installed on your system. It probably won’t function on Windows, since multi-core processing is done using mclapply(), which forks R on Unix/Linux type operating systems.
bigrquery An Interface to Google’s BigQuery API
Easily talk to Google’s BigQuery database from R.
bigRR Generalized Ridge Regression (with special advantage for p >> n cases)
The package fits large-scale (generalized) ridge regression for various distributions of response. The shrinkage parameters (lambdas) can be pre-specified or estimated using an internal update routine (fitting a heteroscedastic effects model, or HEM). It allows any subset of parameters in the model to be shrunk. It has a special computational advantage for the cases when the number of shrinkage parameters exceeds the number of observations. For example, the package is very useful for fitting large-scale omics data, such as high-throughput genotype data (genomics), gene expression data (transcriptomics), metabolomics data, etc.
BigSEM Constructing Large Systems of Structural Equations
Construct large systems of structural equations using the two-stage penalized least squares (2SPLS) method proposed by Chen, Zhang and Zhang (2016).
bigsnpr Analysis of Massive SNP Arrays
Easy-to-use, efficient, flexible and scalable tools for the analysis of massive SNP arrays. Preprint: Privé et al. (2017) <doi:10.1101/190926>.
bigstatsr Statistical Tools for Filebacked Big Matrices
Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides for instance matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more. A scientific paper associated with this package is in preparation.
bigstep Stepwise Selection for Large Data Sets
Selects linear models for large data sets using a modified stepwise procedure and modern selection criteria (like modifications of the Bayesian Information Criterion). Selection can be performed on data which exceed RAM capacity. A special selection strategy is available that is faster than the classical stepwise procedure.
bigtcr Nonparametric Analysis of Bivariate Gap Time with Competing Risks
For studying recurrent disease and death with competing risks, comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events. Alternatively, comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrent disease. This package implements a nonparametric estimator for the conditional cumulative incidence function and a nonparametric conditional bivariate cumulative incidence function for the bivariate gap times proposed in Huang et al. (2016) <doi:10.1111/biom.12494>.
bigtime Sparse Estimation of Large Time Series Models
Estimation of large Vector AutoRegressive (VAR), Vector AutoRegressive with Exogenous Variables X (VARX) and Vector AutoRegressive Moving Average (VARMA) Models with Structured Lasso Penalties, see Nicholson, Bien and Matteson (2017) <arXiv:1412.5250v2> and Wilms, Basu, Bien and Matteson (2017) <arXiv:1707.09208>.
billboarder Create Interactive Chart with the JavaScript ‘Billboard’ Library
Provides an ‘htmlwidgets’ interface to ‘billboard.js’, a re-usable easy interface JavaScript chart library, based on D3 v4+. Chart types include line charts, scatterplots, bar charts, pie/donut charts and gauge charts. All charts are interactive, and a proxy method is implemented to smoothly update a chart without rendering it again in ‘shiny’ apps.
bimixt Estimates Mixture Models for Case-Control Data
Estimates non-Gaussian mixture models of case-control data. The four types of models supported are binormal, two component constrained, two component unconstrained, and four component. The most general model is the four component model, under which both cases and controls are distributed according to a mixture of two unimodal distributions. In the four component model, the two component distributions of the control mixture may be distinct from the two components of the case mixture distribution. In the two component unconstrained model, the components of the control and case mixtures are the same; however the mixture probabilities may differ for cases and controls. In the two component constrained model, all controls are distributed according to one of the two components while cases follow a mixture distribution of the two components. In the binormal model, cases and controls are distributed according to distinct unimodal distributions. These models assume that Box-Cox transformed case and control data with a common lambda parameter are distributed according to Gaussian mixture distributions. Model parameters are estimated using the expectation-maximization (EM) algorithm. Likelihood ratio test comparison of nested models can be performed using the lr.test function. AUC and PAUC values can be computed for the model-based and empirical ROC curves using the auc and pauc functions, respectively. The model-based and empirical ROC curves can be graphed using the roc.plot function. Finally, the model-based density estimates can be visualized by plotting a model object created with the bimixt.model function.
BimodalIndex The Bimodality Index
Defines the functions used to compute the bimodal index as defined by Wang et al. (2009) <https://…/>.
Binarize Binarization of One-Dimensional Data
Provides methods for the binarization of one-dimensional data and some visualization functions.
BinarybalancedCut Threshold Cut Point of Probability for a Binary Classifier Model
Allows the user to view the optimal probability cut-off point at which sensitivity and specificity meet; this is a good way to minimize both Type I and Type II error for a binary classifier when determining the probability threshold.
BinaryEMVS Variable Selection for Binary Data Using the EM Algorithm
Implements variable selection for high dimensional datasets with a binary response variable using the EM algorithm. Both probit and logit models are supported. Also included is a useful function to generate high dimensional data with correlated variables.
BinaryEPPM Mean and Variance Modeling of Binary Data
Modeling under- and over-dispersed binary data using extended Poisson process models (EPPM).
binaryGP Fit and Predict a Gaussian Process Model with (Time-Series) Binary Response
Allows estimation and prediction for a binary Gaussian process model. The mean function can be assumed to have a time-series structure. The estimation methods for the unknown parameters are based on penalized quasi-likelihood/penalized quasi-partial likelihood and restricted maximum likelihood. The predicted probability and its confidence interval are computed by the Metropolis-Hastings algorithm. More details can be seen in Sung et al (2017) <arXiv:1705.02511>.
binaryLogic Binary Logic
Convert to binary numbers (base 2). Provides shift, rotate and summary operations, based on a logical vector representation.
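For example (shift/rotate helpers as suggested by the description; exact names follow the package documentation):

    library(binaryLogic)
    b <- as.binary(12)  # 1100, stored as a logical vector
    shiftLeft(b, 1)     # 11000 == 24
    rotate(b, 1)
    summary(b)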
binb ‘binb’ is not ‘Beamer’
A collection of ‘LaTeX’ styles using ‘Beamer’ customization for pdf-based presentation slides in ‘RMarkdown’. At present it contains ‘RMarkdown’ adaptations of the LaTeX ‘Metropolis’ theme (formerly ‘mtheme’) by Matthias Vogelgesang and others (now included in ‘TeXLive’), and the ‘IQSS’ theme by Ista Zahn (which is included here). Additional (free) fonts may be needed: ‘Metropolis’ prefers ‘Fira’, and ‘IQSS’ requires ‘Libertinus’.
bindr Parametrized Active Bindings
Provides a simple interface for creating active bindings where the bound function accepts additional arguments.
bindrcpp An ‘Rcpp’ Interface to Active Bindings
Provides an easy way to fill an environment with active bindings that call a C++ function.
binman A Binary Download Manager
Tools and functions for managing the download of binary files. Binary repositories are defined in ‘YAML’ format. Defining new pre-download, download and post-download templates allows additional repositories to be added.
binnednp Nonparametric Estimation for Interval-Grouped Data
Kernel density and distribution estimation for interval-grouped data (Reyes, Francisco-Fernandez and Cao 2016, 2017) <doi:10.1080/10485252.2016.1163348>, <doi:10.1007/s11749-017-0523-9>, (Gonzalez-Andujar, Francisco-Fernandez, Cao, Reyes, Urbano, Forcella and Bastida 2016) <doi:10.1111/wre.12216> and nonparametric estimation of seedling emergence indices (Cao, Francisco-Fernandez, Anand, Bastida and Gonzalez-Andujar 2011) <doi:10.1017/S002185961100030X>.
binomen ‘Taxonomic’ Specification and Parsing Methods
Includes functions for working with taxonomic data, including functions for combining, separating, and filtering taxonomic groups by any rank or name. Allows standard (SE) and non-standard evaluation (NSE).
BinQuasi Analyzing Replicated ChIP Sequencing Data Using Quasi-Likelihood
Identify peaks in ChIP-seq data with biological replicates using a one-sided quasi-likelihood ratio test in quasi-Poisson or quasi-negative binomial models.
binsmooth Generate PDFs and CDFs from Binned Data
Provides several methods for generating density functions based on binned data. Data are assumed to be nonnegative, but the bin widths need not be uniform, and the top bin may be unbounded. All PDF smoothing methods maintain the areas specified by the binned data. (Equivalently, all CDF smoothing methods interpolate the points specified by the binned data.) An estimate for the mean of the distribution may be supplied as an optional argument, which greatly improves the reliability of statistics computed from the smoothed density functions. Methods include step function, recursive subdivision, and optimized spline.
binsreg Binscatter Estimation and Inference
Provides tools for statistical analysis using the binscatter methods developed by Cattaneo, Crump, Farrell and Feng (2019a) <arXiv:1902.09608> and Cattaneo, Crump, Farrell and Feng (2019b) <arXiv:1902.09615>. Binscatter provides a flexible way of describing the mean relationship between two variables based on partitioning/binning of the independent variable of interest. binsreg() implements binscatter estimation and robust (pointwise and uniform) inference of regression functions and derivatives thereof, with particular focus on constructing binned scatter plots. binsregtest() implements hypothesis testing procedures for parametric functional forms of and nonparametric shape restrictions on the regression function. binsregselect() implements data-driven procedures for selecting the number of bins for binscatter estimation. All the commands allow for covariate adjustment, smoothness restrictions and clustering.
binst Data Preprocessing, Binning for Classification and Regression
Various supervised and unsupervised binning tools including using entropy, recursive partition methods and clustering.
BiocManager Access the Bioconductor Project Package Repository
A convenient tool to install and update Bioconductor packages.
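A minimal sketch of the intended workflow (the target package ‘limma’ is just an example):

    install.packages("BiocManager")   # BiocManager itself lives on CRAN
    BiocManager::install("limma")     # install a Bioconductor package
    BiocManager::version()            # which Bioconductor version is in use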
Biocomb Feature Selection and Classification with the Embedded Validation Procedures for Biomedical Data Analysis
Contains functions for data analysis with an emphasis on biological data, including several algorithms for feature ranking and feature selection, and classification algorithms with embedded validation procedures. The functions can deal with numerical as well as nominal features. Also includes functions for calculating feature AUC (Area Under the ROC Curve) and HUM (hypervolume under manifold) values and for constructing 2D and 3D ROC curves. Biocomb provides the calculation of Area Above the RCC (AAC) values and the construction of Relative Cost Curves (RCC) to estimate classifier performance under unequal misclassification costs. Biocomb has a special function to deal with missing values, including different imputing schemes.
biogeo Point Data Quality Assessment and Coordinate Conversion
Functions for error detection and correction in point data quality datasets that are used in species distribution modelling. Includes functions for parsing and converting coordinates into decimal degrees from various formats.
Bioi Biological Image Analysis
Single linkage clustering and connected component analyses are often performed on biological images. ‘Bioi’ provides a set of functions for performing these tasks. This functionality is implemented in several key functions that extend from one to many dimensions. The single linkage clustering method implemented here can be used on n-dimensional data sets, while connected component analyses are limited to 3 or fewer dimensions.
bioplots Visualization of Overlapping Results with Heatmap
Visualization of complex biological datasets is essential to understand complementary aspects of biology in the big data era. In addition, analyzing multiple datasets enables deeper and more accurate understanding of biological processes. Multiple datasets produce multiple analysis results, and these overlaps are usually visualized in a Venn diagram. bioplots is a tiny R package that generates a heatmap to visualize overlaps instead of using a Venn diagram.
biorxivr Search and Download Papers from the bioRxiv Preprint Server
The bioRxiv preprint server (<http://www.biorxiv.org>) is a website where scientists can post preprints of scholarly texts in biology. Users can search and download PDFs in bulk from the preprint server. The text of abstracts is stored as raw text within R, and PDFs can easily be saved and imported for text mining with packages such as ‘tm’.
Bios2cor From Biological Sequences and Simulations to Correlation Analysis
The package is dedicated to the computation and analysis of correlation/co-variation in multiple sequence alignments and in side chain motions during molecular dynamics simulations. Features include the ability to compute correlation/co-variation using a variety of scoring functions between either sequence positions in alignments or side chain dihedral angles in molecular dynamics simulations, and to analyze the correlation/co-variation matrix through a variety of tools including network representation and principal components analysis. In addition, several utility functions are based on the R graphical environment to provide friendly tools for help in data interpretation. Examples of sequence co-variation analysis and utility tools are provided in: Pele J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. (2014) <doi:10.1002/prot.24570>. This work was supported by the French National Research Agency (Grant number: ANR-11-BSV2-026).
bioset Convert a Matrix of Raw Values into Nice and Tidy Data
Functions to help deal with raw measurement data, such as reading and transforming raw values organised in matrices, calculating and converting concentrations, and calculating the precision of duplicates, triplicates, etc. It is compatible with, and builds on top of, ‘tidyverse’ packages.
biospear Biomarker Selection in Penalized Regression Models
Provides tools for developing and validating prediction models, estimating the expected survival of patients, and visualizing them graphically. Most of the implemented methods are based on penalized regressions such as: the lasso (Tibshirani R (1996)), the elastic net (Zou H et al. (2005) <doi:10.1111/j.1467-9868.2005.00503.x>), the adaptive lasso (Zou H (2006) <doi:10.1198/016214506000000735>), stability selection (Meinshausen N et al. (2010) <doi:10.1111/j.1467-9868.2010.00740.x>), some extensions of the lasso (Ternes et al. (2016) <doi:10.1002/sim.6927>), and some methods for the interaction setting (Ternes N et al. (2016) <doi:10.1002/bimj.201500234>). A function for generating simulated survival data sets is also provided.
biostat3 Utility Functions, Datasets and Extended Examples for Survival Analysis
Utility functions, datasets and extended examples for survival analysis. This includes a range of other packages, some simple wrappers for time-to-event analyses, datasets, and extensive examples in HTML with R scripts. The package also supports the course Biostatistics III entitled ‘Survival analysis for epidemiologists in R’.
bipartite Visualising bipartite networks and calculating some (ecological) indices
Bipartite provides functions to visualise webs and calculate a series of indices commonly used to describe patterns in ecological webs. It focuses on webs consisting of only two trophic levels, e.g. pollination webs or predator-prey webs. Visualisation is important to get an idea of what we are actually looking at, while the indices summarise different aspects of the web’s topology.
bipartiteD3 Interactive Bipartite Graphs
Generates interactive bipartite graphs using the D3 library. Designed for use with the ‘bipartite’ analysis package. Sources the open source ‘vis-js’ library (<http://…/>). Adapted from examples at <https://…/NPashaP> (released under GPL-3).
BiplotGUI Interactive Biplots in R
Provides a GUI with which users can construct and interact with biplots.
birdnik Connector for the Wordnik API
A connector to the API for ‘Wordnik’ <https://www.wordnik.com>, a dictionary service that also provides bigram generation, word frequency data, and a whole host of other functionality.
biscale Tools and Palettes for Bivariate Thematic Mapping
Provides a ‘ggplot2’ centric approach to bivariate mapping. This is a technique that maps two quantities simultaneously rather than the single value that most thematic maps display. The package provides a suite of tools for calculating breaks using multiple different approaches, a selection of palettes appropriate for bivariate mapping and a scale function for ‘ggplot2’ calls that adds those palettes to maps. A tool for creating bivariate legends is also included.
bisque Approximate Bayesian Inference via Sparse Grid Quadrature Evaluation (BISQuE) for Hierarchical Models
Implementation of the ‘bisque’ strategy for approximate Bayesian posterior inference. See Hewitt and Hoeting (2019) <arXiv:1904.07270> for complete details. ‘bisque’ combines conditioning with sparse grid quadrature rules to approximate marginal posterior quantities of hierarchical Bayesian models. The resulting approximations are computationally efficient for many hierarchical Bayesian models. The ‘bisque’ package allows approximate posterior inference for custom models; users only need to specify the conditional densities required for the approximation.
bitops Bitwise Operations
Functions for bitwise operations on integer vectors.
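For example, the basic operations work elementwise on integer vectors:

    library(bitops)
    bitAnd(12L, 10L)   # 8  (1100 & 1010 = 1000)
    bitOr(12L, 10L)    # 14 (1100 | 1010 = 1110)
    bitXor(12L, 10L)   # 6  (1100 ^ 1010 = 0110)
    bitShiftL(1L, 4L)  # 16 (shift left by four bits)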
BiTrinA Binarization and Trinarization of One-Dimensional Data
Provides methods for the binarization and trinarization of one-dimensional data and some visualization functions.
bitsqueezr Quantize Floating-Point Numbers for Improved Compressibility
Provides an implementation of floating-point quantization algorithms for use in precision-preserving compression, similar to the approach taken in the ‘netCDF operators’ (NCO) software package and described in Zender (2016) <doi:10.5194/gmd-2016-63>.
biva Business Intelligence
An interactive ‘shiny’ application for working with different kinds of data. Visuals for the data are provided, along with runtime examples.
bivariate Bivariate Probability Distributions
Provides alternatives to persp() for plotting bivariate functions, including both step and continuous functions. Also, provides convenience functions for constructing and plotting bivariate probability distributions. Currently, only normal distributions are supported but other probability distributions are likely to be added in the near future.
Bivariate.Pareto Bivariate Pareto Models
Perform competing risks analysis under bivariate Pareto models. See Shih et al. (2018, to appear).
BivRegBLS Tolerance Intervals and Errors-in-Variables Regressions in Method Comparison Studies
Assess the agreement in method comparison studies by tolerance intervals and errors-in-variables regressions. The Ordinary Least Square regressions (OLSv and OLSh), the Deming Regression (DR), and the (Correlated)-Bivariate Least Square regressions (BLS and CBLS) can be used with unreplicated or replicated data. BLS and CBLS are the two main functions to estimate a regression line, while XY.plot and MD.plot are the two main graphical functions to display, respectively, an (X,Y) plot or an (M,D) plot with the BLS or CBLS results. Assuming no proportional bias, the (M,D) plot (Bland-Altman plot) may be simplified by calculating horizontal line intervals with tolerance intervals (beta-expectation (type I) or beta-gamma content (type II)).
bivrp Bivariate Residual Plots with Simulation Polygons
Generates bivariate residual plots with simulation polygons for any diagnostics and bivariate model from which functions to extract the desired diagnostics, simulate new data and refit the models are available.
biwavelet Conduct Univariate and Bivariate Wavelet Analyses
This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gilbert P. Compo. This package can be used to perform univariate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) analyses.
bkmr Bayesian Kernel Machine Regression
Implementation of a statistical approach for estimating the joint health effects of multiple concurrent exposures.
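A hedged sketch of a typical fit using the package's simulated-data helper; the names SimData() and kmbayes() follow the package tutorial and should be treated as assumptions here:

    library(bkmr)
    set.seed(111)
    dat <- SimData(n = 100, M = 4)   # simulated exposures Z, covariates X, outcome y
    fit <- kmbayes(y = dat$y, Z = dat$Z, X = dat$X, iter = 1000, varsel = TRUE)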
BKPC Bayesian Kernel Projection Classifier
Bayesian kernel projection classifier is a nonlinear multicategory classifier which performs the classification of the projections of the data to the principal axes of the feature space. A Gibbs sampler is implemented to find the posterior distributions of the parameters.
blackbox Black Box Optimization and Exploration of Parameter Space
Performs prediction of a response function from simulated response values, allowing black-box optimization of functions estimated with some error. blackbox includes a simple user interface for such applications, as well as more specialized functions designed to be called by the Migraine software (see URL). The latter functions are used for prediction of likelihood surfaces and implied likelihood ratio confidence intervals, and for exploration of predictor space of the surface. Prediction of the response is based on ordinary kriging (with residual error) of the input. Estimation of smoothing parameters is performed by generalized cross validation.
blaise Read and Write FWF Files in the Blaise Format
Can be used to read and write a fixed-width file (fwf) with an accompanying Blaise datamodel. When supplying a datamodel for writing, the dataframe will be automatically converted to that format and checked for compatibility. Supports dataframes, tibbles and LaF objects.
BlandAltmanLeh Plots (slightly extended) Bland-Altman plots
Bland-Altman Plots using base graphics as well as ggplot2, slightly extended by confidence intervals, with detailed return values and a sunflowerplot option for data with ties.
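A minimal sketch with two hypothetical measurement series (bland.altman.plot() is, to the best of our knowledge, the package's main plotting function):

    library(BlandAltmanLeh)
    a <- rnorm(30)
    b <- a + rnorm(30, sd = 0.3)      # a second, noisier measurement of the same quantity
    bland.altman.plot(a, b, conf.int = 0.95)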
blandr Bland-Altman Method Comparison
Carries out Bland-Altman analyses (also known as Tukey mean-difference plots) as described by JM Bland and DG Altman in 1986 <doi:10.1016/S0140-6736(86)90837-8>. This package was created in 2015 because existing Bland-Altman analysis functions did not calculate confidence intervals; it rectifies this and creates reproducible plots.
blastula Easily Send HTML Email Messages
Compose and send out responsive HTML email messages that render perfectly across a range of email clients and device sizes. Messages are composed using ‘Markdown’ and a text interpolation system that allows for the injection of evaluated R code within the message body, footer, and subject line. Helper functions let the user insert embedded images, web link buttons, and ‘ggplot2’ plot objects into the message body. Messages can be sent through an ‘SMTP’ server or through the ‘Mailgun’ API service <http://…/>.
blatr Send Emails Using ‘Blat’ for Windows
A wrapper around the ‘Blat’ command line SMTP mailer for Windows. ‘Blat’ is public domain software, but be sure to read the license before use. It can be found at the Blat website <http://www.blat.net>.
blavaan Bayesian Latent Variable Analysis
Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models.
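A hedged sketch of a Bayesian confirmatory factor analysis, reusing ‘lavaan’ model syntax and the HolzingerSwineford1939 data shipped with ‘lavaan’:

    library(blavaan)
    data("HolzingerSwineford1939", package = "lavaan")
    model <- ' visual =~ x1 + x2 + x3 '   # one latent factor, lavaan syntax
    fit <- bcfa(model, data = HolzingerSwineford1939)
    summary(fit)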
BLCOP Black-Litterman and Copula Opinion Pooling Frameworks
An implementation of the Black-Litterman model and Attilio Meucci’s copula opinion pooling framework.
blendedLink A New Link Function that Blends Two Specified Link Functions
A new link function that equals one specified link function up to a cutover then a linear rescaling of another specified link function. For use in glm() or glm2(). The intended use is in binary regression, in which case the first link should be set to ‘log’ and the second to ‘logit’. This ensures that fitted probabilities are between 0 and 1 and that exponentiated coefficients can be interpreted as relative risks for probabilities up to the cutoff.
Blendstat Joint Analysis of Experiments with Mixtures and Random Effects
Performs a joint analysis of experiments with mixtures and random effects, assuming a process variable represented by a covariate; see Kalirajan K P (1990) <doi:10.1080/757582835>.
blink Record Linkage for Empirically Motivated Priors
An implementation of the model in Steorts (2015) <DOI:10.1214/15-BA965SI>, which performs Bayesian entity resolution for categorical and text data, for any distance function defined by the user. In addition, the precision and recall are in the package to allow one to compare to any other comparable method such as logistic regression, Bayesian additive regression trees (BART), or random forests. The experiments are reproducible and illustrated using a simple vignette.
blkbox Data Exploration with Multiple Machine Learning Algorithms
Allows data to be processed by multiple machine learning algorithms at the same time and enables feature selection by a single algorithm or by combinations of multiple algorithms. An easy-to-use tool for k-fold cross validation and nested cross validation.
BLModel Black-Litterman Posterior Distribution
Posterior distribution in the Black-Litterman model is computed from a prior distribution given in the form of a time series of asset returns and a continuous distribution of views provided by the user as an external function.
blob A Simple S3 Class for Representing Vectors of Binary Data (‘BLOBS’)
R’s raw vector is useful for storing a single binary object. What if you want to put a vector of them in a data frame? The blob package provides the blob object, a list of raw vectors, suitable for use as a column in a data frame.
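A minimal sketch (blob() collects raw vectors into a list-like column; the tibble call just shows it sitting in a data frame):

    library(blob)
    x <- blob(as.raw(c(0x01, 0x02)), as.raw(0xff))   # a length-2 blob
    tibble::tibble(id = 1:2, payload = x)            # blob as a column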
BlockFeST Bayesian Calculation of Region-Specific Fixation Index to Detect Local Adaptation
An R implementation of an extension of the ‘BayeScan’ software (Foll, 2008) <DOI:10.1534/genetics.108.092221> for codominant markers, adding the option to group individual SNPs into pre-defined blocks. A typical application of this new approach is the identification of genomic regions, genes, or gene sets containing one or more SNPs that evolved under directional selection.
blockRAR Block Design for Response-Adaptive Randomization
Computes power for response-adaptive randomization with a block design that captures both the time and treatment effect. T. Chandereng, R. Chappell (2019) <arXiv:1904.07758>.
blockseg Two Dimensional Change-Points Detection
Segments a matrix into blocks with constant values.
blorr Tools for Developing Binary Logistic Regression Models
Tools designed to make it easier for beginner and intermediate users to build and validate binary logistic regression models. Includes bivariate analysis, comprehensive regression output, model fit statistics, variable selection procedures, model validation techniques and a ‘shiny’ app for interactive model building.
Blossom Functions for making statistical comparisons with distance-function based permutation tests
Blossom is an R package with functions for making statistical comparisons with distance-function based permutation tests, developed by P.W. Mielke, Jr. and colleagues at Colorado State University, and for testing parameters estimated in linear models with permutation procedures, developed by B. S. Cade and colleagues at the Fort Collins Science Center, U.S. Geological Survey. This implementation in R allows for numerous improvements not supported by the Cade and Richards Fortran implementation, including the use of categorical predictor variables in most routines.
Blossom Statistical Package for R
BLPestimatoR Performs a BLP Demand Estimation
Provides the estimation algorithm for the demand estimation described in Berry, Levinsohn and Pakes (1995) <DOI:10.2307/2171802>. The routine uses analytic gradients and offers a large number of implemented integration methods and optimization routines.
blsAPI Request Data From The U.S. Bureau of Labor Statistics API
Allows users to request data for one or multiple series through the U.S. Bureau of Labor Statistics API. Users provide parameters as specified in http://…/api_signature.htm and the function returns a JSON string.
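A hedged sketch of a single-series request (the series ID is illustrative, taken from common examples; the function returns raw JSON to parse yourself):

    library(blsAPI)
    response <- blsAPI('LAUCN040010000000005')   # request one series by ID
    parsed <- rjson::fromJSON(response)          # parse the returned JSON string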
BLSM Bayesian Latent Space Model
Provides a Bayesian latent space model for complex networks, either weighted or unweighted. Given an observed input graph, the estimates for the latent coordinates of the nodes are obtained through a Bayesian MCMC algorithm. The overall likelihood of the graph depends on a fundamental probability equation, which is defined so that ties are more likely to exist between nodes whose latent space coordinates are close. The package is mainly based on the model by Hoff, Raftery and Handcock (2002) <doi:10.1198/016214502388618906> and contains some extra features (e.g., removal of the Procrustean step, weights implemented as coefficients of the latent distances, 3D plots). The original code related to the above model was retrieved from <https://…/>. Users can inspect the MCMC simulation, create and customize insightful graphical representations or apply clustering techniques.
BMA Bayesian Model Averaging
Package for Bayesian model averaging for linear models, generalized linear models and survival models (Cox regression).
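For instance, averaging over linear models of fuel economy (bicreg() is the package's linear-model workhorse):

    library(BMA)
    x <- mtcars[, c("wt", "hp", "disp", "qsec")]
    y <- mtcars$mpg
    fit <- bicreg(x, y)   # posterior model probabilities via BIC approximation
    summary(fit)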
BMAmevt Multivariate Extremes: Bayesian Estimation of the Spectral Measure
Toolkit for Bayesian estimation of the dependence structure in Multivariate Extreme Value parametric models.
BMisc Miscellaneous Functions for Panel Data, Quantiles, and Printing Results
These are miscellaneous functions for working with panel data, quantiles, and printing results. For panel data, the package includes functions for making a panel data balanced (that is, dropping missing individuals that have missing observations in any time period), converting id numbers to row numbers, and to treat repeated cross sections as panel data under the assumption of rank invariance. For quantiles, there are functions to make ecdf functions from a set of data points (this is particularly useful when a distribution function is created in several steps) and to combine distribution functions based on some external weights; these distribution functions can easily be inverted to obtain quantiles. Finally, there are several other miscellaneous functions for obtaining weighted means, weighted distribution functions, and weighted quantiles; to generate summary statistics and their differences for two groups; and to drop covariates from formulas.
bmixture Bayesian Estimation for Finite Mixture of Distributions
Provides statistical tools for Bayesian estimation for finite mixture of distributions, mainly mixture of Gamma, Normal and t-distributions.
bmlm Bayesian Multilevel Mediation
Easy estimation of Bayesian multilevel mediation models with Stan.
bmotif Counting Motifs in Bipartite Networks
Counts occurrences of motifs in bipartite networks, as well as the number of times each node appears in each unique position within motifs. Intended for use in ecology, but its methods are general and can be applied to any bipartite network.
bnclassify Learning Bayesian Network Classifiers from Data
Implementation of different algorithms for learning discrete Bayesian network classifiers from data, including wrapper algorithms and those based on Chow-Liu’s algorithm.
BNDataGenerator Data Generator based on Bayesian Network Model
Data generator based on a Bayesian network model.
BNN Bayesian Neural Network for High-Dimensional Nonlinear Variable Selection
Performs Bayesian variable selection for high-dimensional nonlinear systems; can also be used to test nonlinearity for a general regression problem. The computation can be accelerated using multiple CPUs. You can refer to Liang, F., Li, Q. and Zhou, L. (2017) at <https://…/SAMSI_DPDA-Liang.pdf> for details. The publication ‘Bayesian Neural Networks for Selection of Drug Sensitive Genes’ will appear in the Journal of the American Statistical Association soon.
bnnSurvival Bagged k-Nearest Neighbors Survival Prediction
Implements a bootstrap aggregated (bagged) version of the k-nearest neighbors survival probability prediction method (Lowsky et al. 2013). In addition to the bootstrapping of training samples, the features can be subsampled in each baselearner to break the correlation between them. The Rcpp package is used to speed up the computation.
bnormnlr Bayesian Estimation for Normal Heteroscedastic Nonlinear Regression Models
Implementation of Bayesian estimation in normal heteroscedastic nonlinear regression models following Cepeda-Cuervo (2001).
bnpa Bayesian Networks & Path Analysis
Proposes a hybrid approach that uses the computational and statistical resources of Bayesian networks to learn a network structure from a data set, using four different algorithms, together with the robustness of the statistical methods in structural equation modeling to check the goodness of fit of the model to the data. An intermediate algorithm joins the features of the ‘bnlearn’ and ‘lavaan’ R packages. The Bayesian network structure learning algorithms used are ‘Hill-Climbing’, ‘Max-Min Hill-Climbing’, ‘Restricted Maximization’ and ‘Tabu Search’.
BNPmix Algorithms for Pitman-Yor Process Mixtures
Contains different algorithms for estimating both univariate and multivariate Pitman-Yor process mixture models, and Griffiths-Milne dependent Dirichlet process mixture models. Pitman-Yor process mixture models are flexible Bayesian nonparametric models for density estimation. Estimation can be done via an importance conditional sampler, via a slice sampler, as done by Walker (2007) <doi:10.1080/03610910601096262>, or using a marginal sampler, as in Escobar and West (1995) <doi:10.2307/2291069> and extensions. The package also contains procedures to estimate a GM-dependent Dirichlet process mixture model via the importance conditional sampler.
BNPMIXcluster Bayesian Nonparametric Model for Clustering with Mixed Scale Variables
A Bayesian nonparametric approach for clustering that is capable of combining different types of variables (continuous, ordinal and nominal) and also accommodates different sampling probabilities in a complex survey design. The model is based on a location mixture model with a Poisson-Dirichlet process prior on the location parameters of the associated latent variables. The package performs the clustering model described in Carmona, C., Nieto-Barajas, L. E., Canale, A. (2016) <http://…/1612.00083>.
BNPTSclust A Bayesian Nonparametric Algorithm for Time Series Clustering
Performs the algorithm for time series clustering described in Nieto-Barajas and Contreras-Cristan (2014).
BNSL Bayesian Network Structure Learning
From a given dataframe, this package learns its Bayesian network structure based on a selected score.
bnspatial Spatial Implementation of Bayesian Networks and Mapping
Package for the spatial implementation of Bayesian networks and mapping in geographical space. It makes maps of the expected value (or most likely state) given known and unknown conditions, maps of uncertainty measured as either the coefficient of variation or the Shannon index (entropy), and maps of the probability associated with any state of any node of the network. Some additional features are provided as well, such as parallel processing options, data discretization routines and function wrappers designed for users with minimal knowledge of the R programming language.
bnstruct Bayesian Network Structure Learning from Data with Missing Values
Bayesian network structure learning from data with missing values. The package implements the Silander-Myllymaki complete search, the Max-Min Hill-climbing heuristic search, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC and BIC. The package also implements methods for generating and using bootstrap samples, imputed data, and inference.
bnviewer Interactive Visualization of Bayesian Networks
Interactive visualization of Bayesian Networks. The ‘bnviewer’ package reads various structure learning algorithms provided by the ‘bnlearn’ package and allows you to view them interactively.
boclust A Clustering Method Based on Boosting on Single Attributes
An overlap clustering algorithm for ultra-high-dimensional categorical data.
BoltzMM Boltzmann Machines with MM Algorithms
Provides probability computation, data generation, and model estimation for fully-visible Boltzmann machines. It follows the methods described in Nguyen and Wood (2016a) <doi:10.1162/NECO_a_00813> and Nguyen and Wood (2016b) <doi:10.1109/TNNLS.2015.2425898>.
BonEV An Improved Multiple Testing Procedure for Controlling False Discovery Rates
An improved multiple testing procedure for controlling false discovery rates, developed from the Bonferroni procedure with integrated estimates from the Benjamini-Hochberg procedure and Storey’s q-value procedure. It controls false discovery rates by controlling the expected number of false discoveries.
bookdown Authoring Books with R Markdown
Output formats and utilities for authoring books with R Markdown.
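A minimal sketch, run from a book project directory that already contains an index.Rmd (the output format shown is one of several the package provides):

    bookdown::render_book("index.Rmd", output_format = "bookdown::gitbook")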
bookdownplus Generate Varied Books and Documents with R ‘bookdown’ Package
A collection and selector of R ‘bookdown’ templates. ‘bookdownplus’ helps you write academic journal articles, guitar books, chemical equations, mails, calendars, and diaries. ‘bookdownplus’ extends the features of ‘bookdown’ and simplifies the procedure. Users only have to choose a template, clarify the book title and author name, and then focus on writing the text. No need to struggle with YAML and LaTeX.
BoolFilter Optimal Estimation of Partially Observed Boolean Dynamical Systems
Tools for optimal and approximate state estimation as well as network inference of Partially-Observed Boolean Dynamical Systems.
boostmtree Boosted Multivariate Trees for Longitudinal Data
Implements Friedman’s gradient descent boosting algorithm for longitudinal data using multivariate tree base learners. A time-covariate interaction effect is modeled using penalized B-splines (P-splines) with estimated adaptive smoothing parameter.
bootcluster Bootstrapping Estimates of Clustering Stability
Implementation of the bootstrapping approach for the estimation of clustering stability on observation and cluster level, as well as its application in estimating the number of clusters.
bootnet Bootstrap Methods for Various Network Estimation Routines
Bootstrap standard errors on various network estimation routines, such as EBICglasso from the qgraph package and IsingFit from the IsingFit package.
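A hedged sketch of the usual two-step workflow (myData stands in for a hypothetical respondent-by-item data frame):

    library(bootnet)
    net <- estimateNetwork(myData, default = "EBICglasso")   # fit the network
    boots <- bootnet(net, nBoots = 1000)                     # bootstrap edge weights
    plot(boots)                                              # bootstrapped CIs per edge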
bootsPLS Bootstrap Subsamplings of Sparse Partial Least Squares – Discriminant Analysis for Classification and Signature Identification
Bootstrap Subsamplings of sparse Partial Least Squares – Discriminant Analysis (sPLS-DA) for Classification and Signature Identification. The method is applicable to any classification problem with more than 2 classes. It relies on bootstrap subsamplings of sPLS-DA and provides tools to select the most stable variables (defined as the ones consistently selected over the bootstrap subsamplings) and to predict the class of test samples.
bootstrapFP Bootstrap Algorithms for Finite Population Inference
Finite population bootstrap algorithms to estimate the variance of the Horvitz-Thompson estimator for single-stage sampling. For a survey of bootstrap methods for finite populations, see Mashreghi et al. (2016) <doi:10.1214/16-SS113>.
bootTimeInference Robust Performance Hypothesis Testing with the Sharpe Ratio
Applied researchers often test for the difference of the Sharpe ratios of two investment strategies. A very popular tool to this end is the test of Jobson and Korkie, which has been corrected by Memmel. Unfortunately, this test is not valid when returns have tails heavier than the normal distribution or are of a time series nature. Instead, we propose the use of robust inference methods. In particular, we suggest constructing a studentized time series bootstrap confidence interval for the difference of the Sharpe ratios and declaring the two ratios different if zero is not contained in the obtained interval. This approach has the advantage that one can simply resample from the observed data, as opposed to some null-restricted data.
boottol Bootstrap Tolerance Levels for Credit Scoring Validation Statistics
Used to create bootstrap tolerance levels for the Kolmogorov-Smirnov (KS) statistic, the area under receiver operator characteristic curve (AUROC) statistic, and the Gini coefficient for each score cutoff.
BootWPTOS Test Stationarity using Bootstrap Wavelet Packet Tests
Provides significance tests for second-order stationarity for time series using bootstrap wavelet packet tests.
bor Transforming Behavioral Observation Records into Data Matrices
Transforms focal observations’ data, where different types of social interactions can be recorded by multiple observers, into asymmetric data matrices. Each cell in these matrices provides counts on the number of times a specific type of social interaction was initiated by the row subject and directed to the column subject.
Boruta Wrapper Algorithm for All Relevant Feature Selection
An all relevant feature selection wrapper algorithm. It finds relevant features by comparing original attributes’ importance with importance achievable at random, estimated using their permuted copies.
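For example, confirming which iris measurements are relevant for predicting the species:

    library(Boruta)
    set.seed(1)
    bor <- Boruta(Species ~ ., data = iris)   # importance vs. permuted 'shadow' copies
    print(bor)                                # attributes confirmed or rejected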
BoSSA A Bunch of Structure and Sequence Analysis
Reads and plots phylogenetic placements obtained using the ‘epa’, ‘pplacer’ and ‘guppy’ software.
boxcoxmix Response Transformations for Random Effect and Variance Component Models
Response transformations for overdispersed generalized linear models and variance component models using nonparametric profile maximum likelihood estimation. The main function is optim.boxcox().
bpa Basic Pattern Analysis
Run basic pattern analyses on character sets, digits, or combined input containing both characters and numeric digits. Useful for data cleaning and for identifying columns containing multiple or nonstandard formats.
bpnreg Bayesian Projected Normal Regression Models for Circular Data
Fitting Bayesian multiple and mixed-effect regression models for circular data based on the projected normal distribution. Both continuous and categorical predictors can be included. Sampling from the posterior is performed via an MCMC algorithm. Posterior descriptives of all parameters, model fit statistics and Bayes factors for hypothesis tests of inequality constrained hypotheses are provided. See Cremers, Mulder & Klugkist (2018) <doi:10.1111/bmsp.12108> and Nuñez-Antonio & Gutiérrez-Peña (2014) <doi:10.1016/j.csda.2012.07.025>.
bpp Computations Around Bayesian Predictive Power
Implements functions to update Bayesian Predictive Power Computations after not stopping a clinical trial at an interim analysis. Such an interim analysis can either be blinded or unblinded. Code is provided for Normally distributed endpoints with known variance, with a prominent example being the hazard ratio.
BradleyTerryScalable Fits the Bradley-Terry Model to Potentially Large and Sparse Networks of Comparison Data
Facilities are provided for fitting the simple, unstructured Bradley-Terry model to networks of binary comparisons. The implemented methods are designed to scale well to large, potentially sparse, networks. A fairly high degree of scalability is achieved through the use of EM and MM algorithms, which are relatively undemanding in terms of memory usage (relative to some other commonly used methods such as iterative weighted least squares, for example). Both maximum likelihood and Bayesian MAP estimation methods are implemented. The package provides various standard methods for a newly defined ‘btfit’ model class, such as the extraction and summarisation of model parameters and the simulation of new datasets from a fitted model. Tools are also provided for reshaping data into the newly defined ‘btdata’ class, and for analysing the comparison network, prior to fitting the Bradley-Terry model. This package complements, rather than replaces, the existing ‘BradleyTerry2’ package. (BradleyTerry2 has rather different aims, which are mainly the specification and fitting of ‘structured’ Bradley-Terry models in which the strength parameters depend on covariates.)
braidReports Visualize Combined Action Response Surfaces and Report BRAID Analyses
Provides functions to generate, format, and style surface plots for visualizing combined action data. Also provides functions for reporting on a BRAID analysis, including plotting curve-shifts, calculating IAE values, and producing full BRAID analysis reports.
braidrm Fitting Dose Response with the BRAID Combined Action Model
Contains functions for evaluating, analyzing, and fitting combined action dose response surfaces with the Bivariate Response to Additive Interacting Dose (BRAID) model of combined action.
brainKCCA Region-Level Connectivity Network Construction via Kernel Canonical Correlation Analysis
Designed to calculate connections between (or among) brain regions and plot connection lines. A summary function is included to summarize group-level connectivity networks. Kang, Jian (2016) <doi:10.1016/j.neuroimage.2016.06.042>.
brant Test for Parallel Regression Assumption
Tests the parallel regression assumption for ordinal logit models generated with the function polr() from the package MASS.
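A minimal sketch using the classic housing example from ‘MASS’ (brant() takes the fitted polr model directly):

    library(MASS)
    library(brant)
    m <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
    brant(m)   # tests the parallel regression (proportional odds) assumption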
braQCA Bootstrapped Robustness Assessment for Qualitative Comparative Analysis
Test the robustness of a user’s Qualitative Comparative Analysis solutions to randomness, using the bootstrapped assessment: baQCA(). This package also includes a function that provides recommendations for improving solutions to reach typical significance levels: brQCA(). After applying recommendations from brQCA(), QCAdiff() shows which cases are excluded from the final result.
brea Bayesian Recurrent Event Analysis
A function to produce MCMC samples for posterior inference in semiparametric Bayesian discrete time competing risks recurrent events models.
breakDown Break Down Plots
Break Down Plots are inspired by the waterfall plots created by the ‘xgboostExplainer’ package (see <https://…/xgboostExplainer>). The idea behind Break Down Plots is to decompose a model prediction for a single observation. Break Down Plots show the contribution of every variable present in the model. Such plots work for binary classifiers and general regression models.
breakfast Multiple Change-Point Detection and Segmentation
Performs multiple change-point detection in data sequences, or data sequence segmentation, using computationally efficient multiscale methods. This version only implements the ‘Tail-Greedy Unbalanced Haar’ change-point detection methodology; more methods will be added in future versions. To start with, see the function segment.mean.
BreakoutDetection Breakout Detection via Robust E-Statistics
BreakoutDetection is an open-source R package that makes breakout detection simple and fast. The BreakoutDetection package can be used in a wide variety of contexts, for example detecting breakouts in user engagement after an A/B test, detecting behavioral change, or for problems in econometrics, financial engineering, and the political and social sciences.
brglm2 Bias Reduction in Generalized Linear Models
Estimation and inference from generalized linear models based on various methods for bias reduction. The brglmFit fitting method can achieve reduction of estimation bias either through the adjusted score equations approach in Firth (1993) <https://…/80.1.27> and Kosmidis and Firth (2009) <https://…/asp055>, or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as in Cordeiro and McCullagh (1991) <http://…/2345592>. In the special case of generalized linear models for binomial and multinomial responses, the adjusted score equations approach returns estimates with improved frequentist properties that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation). Estimation in all cases takes place via a quasi Fisher scoring algorithm, and S3 methods for the construction of confidence intervals for the reduced-bias estimates are provided.
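A hedged sketch on the endometrial data shipped with the package, where maximum likelihood estimates are infinite due to quasi-complete separation but the bias-reduced fit stays finite:

    library(brglm2)
    data("endometrial", package = "brglm2")
    fit <- glm(HG ~ NV + PI + EH, family = binomial("logit"),
               data = endometrial, method = "brglmFit")
    summary(fit)   # finite, bias-reduced coefficient estimates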
bridgedist An Implementation of the Bridge Distribution with Logit-Link as in Wang and Louis (2003)
An implementation of the bridge distribution with logit-link in R. In Wang and Louis (2003) <doi:10.1093/biomet/90.4.765>, such a univariate bridge distribution was derived as the distribution of the random intercept that ‘bridged’ a marginal logistic regression and a conditional logistic regression. The conditional and marginal regression coefficients are a scalar multiple of each other. Such is not the case if the random intercept distribution was Gaussian.
briqr Interface to the ‘Briq’ API
An interface to the ‘Briq’ API <https://briq.github.io>. ‘Briq’ is a tool that aims to promote employee engagement by helping employees recognize and reward each other. Employees can praise and thank one another (for achieving a company goal, for example) by giving virtual credits (known as ‘briqs’ or ‘bqs’) that can be redeemed for various rewards. The ‘Briq’ API lets you create, read, update and delete users, user groups, transactions and messages. This package provides functions that simplify getting the users, user groups and transactions of your organization into R.
BRISC Fast Inference for Large Spatial Datasets using BRISC
Fits univariate spatial regression models for large datasets using Nearest Neighbor Gaussian Processes, with Bootstrap for Rapid Inference on Spatial Covariances (BRISC), detailed in Saha and Datta (2018) <doi:10.1002/sta4.184>.
briskaR Biological Risk Assessment
A spatio-temporal exposure-hazard model for assessing biological risk and impact. The model is based on stochastic geometry for describing the landscape and the exposed individuals, a dispersal kernel for the dissemination of contaminants and an ecotoxicological equation.
brlrmr Bias Reduction with Missing Binary Response
Provides two main functions, il() and fil(). The il() function implements the EM algorithm developed by Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068> to estimate the parameters of a logistic regression model with missing responses when the missing data mechanism is nonignorable. The fil() function implements the algorithm proposed by Maity et al. (2017+) <https://…/brlrmr> to reduce the bias produced by the method of Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068>.
brm Binary Regression Model
Fits novel models for the conditional relative risk, risk difference and odds ratio.
brms Bayesian Regression Models using Stan
Write and fit Bayesian generalized linear mixed models using Stan for full Bayesian inference.
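For example, a Poisson mixed model on the epilepsy data that ships with the package (this mirrors the package's introductory example):

    library(brms)
    fit <- brm(count ~ zAge + zBase * Trt + (1 | patient),
               data = epilepsy, family = poisson())
    summary(fit)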
broom Convert Statistical Analysis Objects into Tidy Data Frames
Convert statistical analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like dplyr, tidyr and ggplot2. The package provides three S3 generics: tidy, which summarizes a model’s statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.
http://…/broom-intro
http://…/broom-slides
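For instance, the three generics on a simple linear model:

    library(broom)
    fit <- lm(mpg ~ wt + qsec, data = mtcars)
    tidy(fit)      # one row per coefficient
    augment(fit)   # original data plus fitted values and residuals
    glance(fit)    # one-row summary of model-level statistics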
broom.mixed Tidying Methods for Mixed Models
Convert fitted objects from various R mixed-model packages into tidy data frames along the lines of the ‘broom’ package. The package provides three S3 generics for each model: tidy(), which summarizes a model’s statistical findings such as coefficients of a regression; augment(), which adds columns to the original data such as predictions, residuals and cluster assignments; and glance(), which provides a one-row summary of model-level statistics.
broomExtra Grouped Statistical Analyses in a Tidy Way
Collection of functions to assist ‘broom’ and ‘broom.mixed’ package-related data analysis workflows. In particular, the generic functions tidy(), glance(), and augment() choose appropriate S3 methods from these two packages depending on which package exports the needed method. Additionally, ‘grouped_’ variants of the generics provide a convenient way to execute functions across a combination of grouping variable(s) in a dataframe.
brotli A Compression Format Optimized for the Web
A lossless compressed data format that compresses data using a combination of the LZ77 algorithm and Huffman coding, with efficiency comparable to the best currently available general-purpose compression methods. Brotli is similar in speed to deflate but offers more dense compression.
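A minimal round trip (brotli_compress() and brotli_decompress() operate on raw vectors):

    library(brotli)
    buf <- serialize(iris, NULL)                  # any R object as a raw vector
    comp <- brotli_compress(buf)
    identical(unserialize(brotli_decompress(comp)), iris)   # TRUE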
Brq Bayesian Analysis of Quantile Regression Models
Bayesian estimation and variable selection for quantile regression models.
brr Bayesian Inference on the Ratio of Two Poisson Rates
Implementation of the Bayesian inference for the two independent Poisson samples model, using the semi-conjugate family of prior distributions.
brt Biological Relevance Testing
Analyses of large-scale -omics datasets commonly use p-values as the indicators of statistical significance. However, considering p-values alone neglects the importance of effect size (i.e., the mean difference between groups) in determining the biological relevance of a significant difference. Here, we present a novel algorithm for computing a new statistic, the biological relevance testing (BRT) index, in the frequentist hypothesis testing framework to address this problem.
Brundle Normalisation Tools for Inter-Condition Variability of ChIP-Seq Data
Inter-sample condition variability is a key challenge of normalising ChIP-seq data. This implementation uses either spike-in or a second factor as a control for normalisation. Input can either be from ‘DiffBind’ or a matrix formatted for ‘DESeq2’. The output is either a ‘DiffBind’ object or the default ‘DESeq2’ output. Either can then be processed as normal. Supporting manuscript Guertin, Markowetz and Holding (2017) <doi:10.1101/182261>.
bsearchtools Binary Search Tools
Exposes the binary search functions of the C++ standard library (std::lower_bound, std::upper_bound) plus other convenience functions, allowing faster lookups on sorted vectors.
BSGS Bayesian Sparse Group Selection
The integration of Bayesian variable and sparse group variable selection approaches for regression models.
BSGW Bayesian Survival Model using Generalized Weibull Regression
Bayesian survival model using Weibull regression on both scale and shape parameters.
bshazard Nonparametric Smoothing of the Hazard Function
The function estimates the hazard function nonparametrically from a survival object (possibly adjusted for covariates). The smoothed estimate is based on B-splines from the perspective of generalized linear mixed models. Left-truncated and right-censored data are allowed.
BSL Bayesian Synthetic Likelihood with Graphical Lasso
Bayesian synthetic likelihood (BSL, Price et al. (2018) <doi:10.1080/10618600.2017.1302882>) is an alternative to standard, non-parametric approximate Bayesian computation (ABC). BSL assumes a multivariate normal distribution for the summary statistic likelihood and is suitable when the distribution of the model summary statistics is sufficiently regular. This package provides a Metropolis-Hastings Markov chain Monte Carlo implementation of BSL and of BSL with graphical lasso (BSLasso, An et al. (2018) <https://…/>), which is computationally more efficient when the dimension of the summary statistic is high. Extensions to this package are planned.
BSPADATA Bayesian Proposal to Fit Spatial Econometric Models
The purpose of this package is to fit the three spatial econometric models proposed in Anselin (1988, ISBN:9024737354) in the homoscedastic and the heteroscedastic case. The fit is made through MCMC algorithms and an observational working variables approach.
bsplinePsd Bayesian Nonparametric Spectral Density Estimation Using B-Spline Priors
Implementation of a Metropolis-within-Gibbs MCMC algorithm to flexibly estimate the spectral density of a stationary time series. The algorithm updates a nonparametric B-spline prior using the Whittle likelihood to produce pseudo-posterior samples and is based on the work presented by Edwards, Meyer, and Christensen (2017) <arXiv:1707.04878>.
bssm Bayesian Inference of State Space Models
Efficient methods for Bayesian inference of state space models via particle Markov chain Monte Carlo and importance sampling type corrected Markov chain Monte Carlo. Gaussian, Poisson, binomial, or negative binomial observation densities and Gaussian state dynamics, as well as general non-linear Gaussian models are supported.
btb Beyond the Border
Kernel density estimation dedicated to urban geography.
BTdecayLasso Bradley-Terry Model with Exponential Time Decayed Log-Likelihood and Adaptive Lasso
Applies the Bradley-Terry model to estimate teams’ abilities in paired comparison data. An exponentially decayed log-likelihood function is applied for dynamic approximation of current rankings, and a Lasso penalty is applied for variance reduction and grouping. The main algorithm applies the Augmented Lagrangian Method described by Masarotto and Varin (2012) <doi:10.1214/12-AOAS581>.
btergm Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood
Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov Chain Monte Carlo maximum likelihood. Goodness of fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs.
BTR Training and Analysing Asynchronous Boolean Models
Tools for inferring asynchronous Boolean models from single-cell expression data.
bucky Bucky’s Archive for Data Analysis in the Social Sciences
Provides functions for various statistical techniques commonly used in the social sciences, including functions to compute clustered robust standard errors, combine results across multiply-imputed data sets, and simplify the addition of robust and clustered robust standard errors. The package was originally developed, in part, to assist porting of replication code from ‘Stata’ and attempts to replicate default options from ‘Stata’ where possible.
BUCSS Bias and Uncertainty Corrected Sample Size
Implements a method of correcting for publication bias and uncertainty when planning sample sizes in a future study from an original study.
buildmer Stepwise Elimination and Term Reordering for Mixed-Effects Regression
Finds the largest possible regression model that will still converge for various types of regression analyses (including mixed models and generalized additive models) and then optionally performs stepwise elimination similar to the forward and backward effect selection methods in SAS, based on the change in log-likelihood, Akaike’s Information Criterion, or the Bayesian Information Criterion.
bulletcp Automatic Groove Identification via Bayesian Changepoint Detection
Provides functionality to automatically detect groove locations via a Bayesian changepoint detection method to be used in the data preprocessing step of forensic bullet matching algorithms. The methods in this package are based on those in Stephens (1994) <doi:10.2307/2986119>. Bayesian changepoint detection will simply be an option in the function from the package ‘bulletxtrctr’ which identifies the groove locations.
BullsEyeR Topic Modelling
Helps with initial processing such as converting text to lower case; removing punctuation, numbers and stop words; stemming; sparsity control; and term frequency-inverse document frequency processing. Helps in recognizing domain- or corpus-specific stop words. Makes use of ‘ldatuning’ output to pick the optimal number of topics for topic modelling. Helps in extracting dominant words or keywords that represent the context/topics of the content of each document.
bullwhipgame Bullwhip Effect Demo in Shiny
The bullwhipgame is an educational game whose purpose is the illustration and exploration of the bullwhip effect, i.e., the increase in demand variability along the supply chain. Marchena Marlene (2010) <arXiv:1009.3977>.
bupaR Business Process Analytics in R
Functionalities for process analysis in R. This package implements an S3 class for event log objects and related handler functions, and imports related packages for subsetting event data, computation of descriptive statistics, handling of Petri net objects and visualization of process maps.
bustt Bus and Transit Time Calculations
Calculate and work with times and schedules for buses, trains, etc. in transit data. Answers questions like: What is the time between any train arrival at 59-Street Columbus Circle on Saturdays? What is the time between trains for stops along the A Train on weekdays?
bvarsv Bayesian Analysis of a Vector Autoregressive Model with Stochastic Volatility and Time-Varying Parameters
R/C++ implementation of the model proposed by Primiceri (‘Time Varying Structural Vector Autoregressions and Monetary Policy’, Review of Economic Studies, 2005), with a focus on generating posterior predictive distributions.
BVSNLP Bayesian Variable Selection in High Dimensional Settings using Non-Local Prior
Variable/feature selection in high- or ultra-high-dimensional settings has gained a lot of attention recently, especially in cancer genomic studies. This package provides a Bayesian approach to tackle this problem, exploiting mixtures of point masses at zero and nonlocal priors to improve the performance of variable selection and coefficient estimation. It performs variable selection for binary response and survival time response datasets, which are widely used in the biostatistics and bioinformatics communities. Benefiting from parallel computing ability, it reports the necessary outcomes of Bayesian variable selection, such as the Highest Posterior Probability Model (HPPM), the Median Probability Model (MPM) and the posterior inclusion probability for each of the covariates in the model. The option to use Bayesian Model Averaging (BMA) is also part of this package and can be exploited for predictive power measurements in real datasets.
bwd Backward Procedure for Change-Point Detection
Implements a backward procedure for single and multiple change point detection proposed by Shin et al. <arXiv:1812.10107>. The backward approach is particularly useful for detecting short and sparse signals, which are common in copy number variation (CNV) detection.
BWStest Baumgartner Weiss Schindler Test of Equal Distributions
Performs the ‘Baumgartner-Weiss-Schindler’ two-sample test of equal probability distributions.
bytescircle Statistics About Bytes Contained in a File as a Circle Plot
Shows statistics about the bytes contained in a file as a circle graph of deviations from the mean in sigma increments. The function can be useful for statistically analyzing the content of files at a glance: text files are shown as a green centered crown, compressed and encrypted files should be shown as equally distributed variations with a very low CV (sigma/mean), and other types of files can be classified between these two categories depending on their text vs binary content, which can be useful to quickly determine how information is stored inside them (databases, multimedia files, etc).
bzinb Bivariate Zero-Inflated Negative Binomial Model Estimator
Provides maximum likelihood estimation of the Bivariate Zero-Inflated Negative Binomial (BZINB) model or the nested model parameters. Also estimates the underlying correlation of a pair of count data. See Cho, H., Preisser, J., Liu, C., and Wu, D. (in preparation) for details.

C

c060 Extended Inference for Lasso and Elastic-Net Regularized Cox and Generalized Linear Models
c060 provides additional functions to perform stability selection, model validation and parameter tuning for ‘glmnet’ models.
c2c Compare Two Classifications or Clustering Solutions of Varying Structure
Compare two classifications or clustering solutions that may or may not have the same number of classes, and that might have hard or soft (fuzzy, probabilistic) membership. Calculate various metrics to assess how the clusters compare to each other. The calculations are simple, but provide a handy tool for users unfamiliar with matrix multiplication. This package is not geared towards traditional accuracy assessment for classification/ mapping applications – the motivating use case is for comparing a probabilistic clustering solution to a set of reference or existing class labels that could have any number of classes (that is, without having to degrade the probabilistic clustering to hard classes).
c3 C3.js’ Chart Library
Create interactive charts with the ‘C3.js’ <http://…/> charting library. All plot types in ‘C3.js’ are available and include line, bar, scatter, and mixed geometry plots. Plot annotations, labels and axis are highly adjustable. Interactive web based charts can be embedded in R Markdown documents or Shiny web applications.
C443 See a Forest for the Trees
Getting insight into a forest of classification trees by calculating similarities between the trees and subsequently clustering them. Each cluster is represented by its most central cluster member. Sies, A. & Van Mechelen, I. (paper submitted for publication).
CA3variants Three-Way Correspondence Analysis Variants
Provides three variants of three-way correspondence analysis (CA): three-way symmetrical CA, three-way non-symmetrical CA, and three-way ordered symmetrical CA.
CADStat Provides a GUI to Several Statistical Methods
Using JGR, provides a GUI to several statistical methods – scatterplot, boxplot, linear regression, generalized linear regression, quantile regression, conditional probability calculations, and regression trees.
caesar Encrypts and Decrypts Strings
Encrypts and decrypts strings using either the Caesar cipher or a pseudorandom number generation (using set.seed()) method.
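To illustrate the idea, the shift itself fits in a few lines of base R (a standalone sketch of the cipher, not the package’s own interface):

    # Rotate letters by a fixed offset; chartr() maps each letter to its shifted image
    caesar_shift <- function(x, shift = 3) {
      idx  <- (seq_len(26) + shift - 1) %% 26 + 1
      from <- paste(c(letters, LETTERS), collapse = "")
      to   <- paste(c(letters[idx], LETTERS[idx]), collapse = "")
      chartr(from, to, x)
    }
    caesar_shift("attack at dawn")  # "dwwdfn dw gdzq"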
CAinterprTools Graphical Aid in Correspondence Analysis Interpretation and Significance Testing
Plots a range of information relevant to the interpretation of Correspondence Analysis results. It provides facilities to plot the contribution of row and column categories to the principal dimensions, the quality of the display of points on selected dimensions, the correlation of row and column categories with selected dimensions, etc. It also allows assessing which dimension(s) are important for interpreting the data structure, by means of different statistics and tests. The package also offers the facility to plot the permuted distribution of the table’s total inertia, as well as of the inertia accounted for by pairs of selected dimensions. Different facilities are also provided that aim to produce interpretation-oriented scatterplots. Reference: <doi:10.1016/j.softx.2015.07.001>.
CAISEr Comparison of Algorithms with Iterative Sample Size Estimation
Functions for performing experimental comparisons of algorithms using adequate sample sizes for power and accuracy.
calACS Count All Common Subsequences
Count all common subsequences between two string sequences, with items separated by the same delimiter. The first string input is a length-one vector; the second string input can be a vector or list containing multiple strings. Algorithm from Wang, H., All common subsequences (2007), IJCAI International Joint Conference on Artificial Intelligence, pp. 635-640.
Calculator.LR.FNs Calculator for LR Fuzzy Numbers
The arithmetic operations of scalar multiplication, addition, subtraction, multiplication and division of LR fuzzy numbers (which are based on Zadeh’s extension principle) have a complicated form for use in fuzzy statistics, fuzzy mathematics, machine learning, fuzzy data analysis, etc. The Calculator for LR Fuzzy Numbers package, i.e. the Calculator.LR.FNs package, helps applied users obtain a simple closed form for some complicated operators on LR fuzzy numbers, and also lets the user easily draw the membership function of the obtained result.
calcWOI Calculates the Wavelet-Based Organization Index
Calculates the original wavelet-based organization index, the modified wavelet-based organization index and the local wavelet-based organization index of an arbitrary 2D array using Wavelet Transform of Eckley et al (2010) (<doi:10.1111/j.1467-9876.2009.00721.x>) and Eckley and Nason (2011) (<doi:10.18637/jss.v043.i03>).
calendar Create, Read, Write, and Work with ‘iCalendar’ Files, Calendars and Scheduling Data
Provides functions to create, read, write, and work with ‘iCalendar’ files (which typically have ‘.ics’ or ‘.ical’ extensions), and the scheduling data, calendars and timelines of the people, organisations and other entities that they represent. ‘iCalendar’ is an open standard for exchanging calendar and scheduling information between users and computers, described at <https://…/>.
CALF Coarse Approximation Linear Function
Contains a greedy algorithm for fitting a coarse approximation linear function.
CalibrateSSB Weighting and Estimation for Panel Data with Non-Response
Function to calculate weights and estimates for panel data with non-response.
CalibratR Mapping ML Scores to Calibrated Predictions
Transforms your uncalibrated Machine Learning scores to well-calibrated prediction estimates that can be interpreted as probability estimates. The implemented BBQ (Bayes Binning in Quantiles) model is taken from Naeini (2015, ISBN:0-262-51129-0).
CaliCo Code Calibration in a Bayesian Framework
Calibrates any computational code within a Bayesian framework. Given a new data set, the prediction creates a prevision set that takes the newly calibrated parameters into account. A choice between several models is also available. The methods are described in Carmassi et al. (2018) <arXiv:1801.01810>.
callr Call R from R
It is sometimes useful to perform a computation in a separate R process, without affecting the current R process at all. This package does exactly that.
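A minimal example: r() runs a function in a fresh R session and returns its value:

    library(callr)
    # The function is evaluated in a separate R process; nothing in the
    # current session (options, loaded packages, RNG state) is touched.
    r(function(a, b) a + b, args = list(a = 1, b = 2))
    #> [1] 3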
calpassapi R Interface to Access CalPASS API
Implements methods for querying data from CalPASS using its API. CalPASS Plus. MMAP API V1. <https://…/index.html>.
CAM Causal Additive Model (CAM)
The code takes an n x p data matrix and fits a Causal Additive Model (CAM) for estimating the causal structure of the underlying process. The output is a p x p adjacency matrix (a one in entry (i,j) indicates an edge from i to j). Details of the algorithm can be found in: P. Bühlmann, J. Peters, J. Ernest: “CAM: Causal Additive Models, high-dimensional order search and penalized regression”, Annals of Statistics 42:2526-2556, 2014.
canvasXpress Visualization Package for CanvasXpress in R
Enables creation of visualizations using the CanvasXpress framework in R. CanvasXpress is a standalone JavaScript library for reproducible research with complete tracking of data and end-user modifications stored in a single PNG image that can be played back. See <http://canvasxpress.org> for more information.
cap Covariate Assisted Principal (CAP) Regression for Covariance Matrix Outcomes
Performs Covariate Assisted Principal (CAP) Regression for covariance matrix outcomes. The method identifies the optimal projection direction which maximizes the log-likelihood function of the log-linear heteroscedastic regression model in the projection space. See Zhao et al. (2018), Covariate Assisted Principal Regression for Covariance Matrix Outcomes, <doi:10.1101/425033> for details.
capitalR Capital Budgeting Analysis, Annuity Loan Calculations and Amortization Schedules
Provides Capital Budgeting Analysis functionality and the essential Annuity loan functions. Also computes Loan Amortization Schedules including schedules with irregular payments.
capn Capital Asset Pricing for Nature
Implements approximation methods for natural capital asset prices suggested by Fenichel and Abbott (2014) <doi:10.1086/676034> in Journal of the Associations of Environmental and Resource Economists (JAERE), Fenichel et al. (2016) <doi:10.1073/pnas.1513779113> in Proceedings of the National Academy of Sciences (PNAS), and Yun et al. (2017) in PNAS (accepted), and their extensions: creating Chebyshev polynomial nodes and grids, calculating basis of Chebyshev polynomials, approximation and their simulations for: V-approximation (single and multiple stocks, PNAS), P-approximation (single stock, PNAS), and Pdot-approximation (single stock, JAERE). Development of this package was generously supported by the Knobloch Family Foundation.
caRamel Automatic Calibration by Evolutionary Multi Objective Algorithm
Multi-objective optimizer initially developed for the calibration of hydrological models. The algorithm is a hybrid of the MEAS algorithm (Efstratiadis and Koutsoyiannis (2005) <doi:10.13140/RG.2.2.32963.81446>), using the directional search method based on simplexes of the objective space, and the epsilon-NSGA-II algorithm, with archiving management of the parameter vectors classified by epsilon-dominance (Reed and Devireddy <doi:10.1142/9789812567796_0004>).
carData Companion to Applied Regression Data Sets
Datasets to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage (forthcoming).
careless Procedures for Computing Indices of Careless Responding
When taking online surveys, participants sometimes respond to items without regard to their content. These types of responses, referred to as careless or insufficient effort responding, constitute significant problems for data quality, leading to distortions in data analysis and hypothesis testing, such as spurious correlations. The ‘R’ package ‘careless’ provides solutions designed to detect such careless / insufficient effort responses by allowing easy calculation of indices proposed in the literature. It currently supports the calculation of longstring, even-odd consistency, psychometric synonyms/antonyms, Mahalanobis distance, and intra-individual response variability (also termed inter-item standard deviation). For a review of these methods, see Curran (2016) <doi:10.1016/j.jesp.2015.07.006>.
caret Classification and Regression Training
Misc functions for training and plotting classification and regression models.
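A typical call tunes a model with resampling in one step, e.g.:

    library(caret)
    set.seed(1)
    # 10-fold cross-validated k-nearest-neighbours on the built-in iris data
    fit <- train(Species ~ ., data = iris, method = "knn",
                 trControl = trainControl(method = "cv", number = 10))
    predict(fit, newdata = head(iris))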
caretEnsemble Ensembles of Caret Models
Functions for creating ensembles of caret models: caretList, caretEnsemble, and caretStack. caretList is a convenience function for fitting multiple caret::train models to the same dataset. caretEnsemble will make a linear combination of these models using greedy forward selection, and caretStack will make linear or non-linear combinations of these models, using a caret::train model as a meta-model.
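A sketch of the workflow using the three functions named above, here on the built-in mtcars data:

    library(caret)
    library(caretEnsemble)
    set.seed(1)
    # Fit several caret models to the same data, then blend them greedily
    models <- caretList(mpg ~ ., data = mtcars,
                        trControl = trainControl(method = "cv", number = 5),
                        methodList = c("lm", "rpart"))
    ens <- caretEnsemble(models)   # greedy linear combination of the models
    summary(ens)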
carfima Continuous-Time Fractionally Integrated ARMA Process for Irregularly Spaced Long-Memory Time Series Data
We provide a toolbox to fit a continuous-time fractionally integrated ARMA process (CARFIMA) on univariate and irregularly spaced time series data via frequentist or Bayesian machinery. A general-order CARFIMA(p, H, q) model for p>q is specified in Tsai and Chan (2005)<doi:10.1111/j.1467-9868.2005.00522.x> and it involves p+q+2 unknown model parameters, i.e., p AR parameters, q MA parameters, Hurst parameter H, and process uncertainty (standard deviation) sigma. The package produces their maximum likelihood estimates and asymptotic uncertainties using a global optimizer called the differential evolution algorithm. It also produces their posterior distributions via Metropolis within a Gibbs sampler equipped with adaptive Markov chain Monte Carlo for posterior sampling. These fitting procedures, however, may produce numerical errors if p>2. The toolbox also contains a function to simulate discrete time series data from CARFIMA(p, H, q) process given the model parameters and observation times.
carpenter Build Common Tables of Summary Statistics for Reports
Mainly used to build tables that are commonly presented for bio-medical/health research, such as basic characteristic tables or descriptive statistics.
carrier Isolate Functions for Remote Execution
Sending functions to remote processes can be wasteful of resources because they carry their environments with them. With the carrier package, it is easy to create functions that are isolated from their environment. These isolated functions, also called crates, print at the console with their total size and can be easily tested locally before being sent to a remote.
CARRoT Predicting Categorical and Continuous Outcomes Using Rule of Ten
Predicts categorical or continuous outcomes while concentrating on four key points: Cross-validation, Accuracy, Regression and the Rule of Ten (CARRoT). It performs cross-validation a specified number of times by partitioning the input into training and test sets and fitting linear/multinomial/binary regression models to the training set. All regression models satisfying a rule of ten events per variable are fitted, and the ones with the best predictive power are given as output. Best predictive power is understood as the highest accuracy in the case of binary/multinomial outcomes, and the smallest absolute and relative errors in the case of continuous outcomes. For the binary case there is also an option of finding a regression model which gives the highest AUROC (Area Under the Receiver Operating Characteristic curve) value. A parallel toolbox option is also available. Methods are described in Peduzzi et al. (1996) <doi:10.1016/S0895-4356(96)00236-3> and Rhemtulla et al. (2012) <doi:10.1037/a0029315>.
CARS Covariate Assisted Ranking and Screening for Large-Scale Two-Sample Inference
It implements the CARS procedure, which is a two-sample multiple testing procedure that utilizes an additional auxiliary variable to capture the sparsity information, hence improving power. The CARS procedure is shown to be asymptotically valid and optimal for FDR control. For more information, please see the website <http://…/CARS.html> and the accompanying paper.
carSurv Correlation-Adjusted Regression Survival (CARS) Scores
Contains functions to estimate the Correlation-Adjusted Regression Survival (CARS) Scores. The method is described in Welchowski, T. and Zuber, V. and Schmid, M., (2018), Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection, <arXiv:1802.08178>.
cartograflow Filtering Matrix for Flow Mapping
Functions to prepare and filter an origin-destination matrix for thematic flow mapping purposes. This follows Bahoken, Françoise (2016), Mapping flow matrices: a contribution, PhD in Geography – Territorial Sciences. See Bahoken (2017) <doi:10.4000/netcom.2565>.
cartogram Create Cartograms with R
Construct a continuous area cartogram by a rubber sheet distortion algorithm.
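A minimal sketch, assuming an ‘sf’ polygon layer regions with a numeric column pop (the function name reflects recent versions of the package):

    library(cartogram)
    library(sf)
    # Distort polygon areas so they become proportional to the weight variable
    carto <- cartogram_cont(regions, weight = "pop")
    plot(st_geometry(carto))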
Cartographer Interactive Maps for Data Exploration
Cartographer provides interactive maps in R Markdown documents or at the R console. These maps are suitable for data exploration. This package is an R wrapper around Elijah Meeks’s d3-carto-map and d3.js, using htmlwidgets for R.
cartography Thematic Cartography
Create and integrate maps in your R workflow. This package allows various cartographic representations: proportional symbols, choropleth, typology, flows, discontinuities… It also proposes some additional useful features: cartographic palettes, layout (scale, north arrow, title…), labels, legends, access to cartographic APIs…
cartools Tools for Understanding Highway Performance
Analytical tools are designed to help people understand the complex relationships associated with freeway performance and traffic breakdown. Emphasis is placed on: (1) Traffic noise or volatility; (2) Driver behavior and safety; and (3) Stochastic modeling, models that explain breakdown and performance.
carx Censored Autoregressive Model with Exogenous Covariates
A censored time series class is designed. An estimation procedure is implemented to estimate the Censored AutoRegressive time series with eXogenous covariates (CARX), assuming normality of the innovations. Some other functions that might be useful are also included.
casebase Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression
Implements the case-base sampling approach of Hanley and Miettinen (2009) <DOI:10.2202/1557-4679.1125>, Saarela and Arjas (2015) <DOI:10.1111/sjos.12125>, and Saarela (2015) <DOI:10.1007/s10985-015-9352-x>, for fitting flexible hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. From the fitted hazard function, cumulative incidence, risk functions of time, treatment and profile can be derived. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots.
CAST ‘caret’ Applications for Spatial-Temporal Models
Supporting functionality to run ‘caret’ with spatial or spatial-temporal data. ‘caret’ is a frequently used package for model training and prediction using machine learning. This package includes functions to improve spatial-temporal modelling tasks using ‘caret’. It prepares data for Leave-Location-Out and Leave-Time-Out cross-validation, which are target-oriented validation strategies for spatial-temporal models. To decrease overfitting and improve model performance, the package implements a forward feature selection that selects suitable predictor variables in view of their contribution to the target-oriented performance.
catch Covariate-Adjusted Tensor Classification in High-Dimensions
Performs classification and variable selection on high-dimensional tensors (multi-dimensional arrays) after adjusting for additional covariates (scalar or vectors) as CATCH model in Pan, Mai and Zhang (2018) <arXiv:1805.04421>. The low-dimensional covariates and the high-dimensional tensors are jointly modeled to predict a categorical outcome in a multi-class discriminant analysis setting. The Covariate-Adjusted Tensor Classification in High-dimensions (CATCH) model is fitted in two steps: (1) adjust for the covariates within each class; and (2) penalized estimation with the adjusted tensor using a cyclic block coordinate descent algorithm. The package can provide a solution path for tuning parameter in the penalized estimation step. Special case of the CATCH model includes linear discriminant analysis model and matrix (or tensor) discriminant analysis without covariates.
catcont Test for and Identify Categorical or Continuous Values
Methods and utilities for classifying vectors as categorical or continuous.
catdap Categorical Data Analysis Program Package
Categorical data analysis program package.
cate High Dimensional Factor Analysis and Confounder Adjusted Testing and Estimation
Provides several methods for factor analysis in high dimension (both n,p >> 1) and methods to adjust for possible confounders in multiple hypothesis testing.
CatEncoders Encoders for Categorical Variables
Contains some commonly used categorical variable encoders, such as ‘LabelEncoder’ and ‘OneHotEncoder’. Inspired by the encoders implemented in python ‘sklearn.preprocessing’ package (see <http://…/preprocessing.html> ).
CATkit Chronomics Analysis Toolkit (CAT): Analyze Periodicity
Performs analysis of sinusoidal rhythms in time series data: actogram, smoothing, autocorrelation, crosscorrelation, several flavors of cosinor.
CatPredi Optimal Categorisation of Continuous Variables in Prediction Models
Allows the user to categorise a continuous predictor variable in a logistic or a Cox proportional hazards regression setting, by maximising the discriminative ability of the model. I Barrio, I Arostegui, MX Rodriguez-Alvarez, JM Quintana (2015) <doi:10.1177/0962280215601873>. I Barrio, MX Rodriguez-Alvarez, L Meira-Machado, C Esteban, I Arostegui (2017) <https://…/41.1.3.barrio-etal.pdf>.
catSurv Computerized Adaptive Testing for Survey Research
Provides methods of computerized adaptive testing for survey researchers. Includes functionality for data fit with classic item response methods, including the latent trait model, Birnbaum’s three-parameter model, the graded response model, and the generalized partial credit model. Additionally, includes several ability parameter estimation and item selection routines. During item selection, all calculations are done in compiled C++ code.
CATT The Cochran-Armitage Trend Test
The Cochran-Armitage trend test can be applied to a two by k contingency table. The test statistic (Z) and p-value are reported. By default the weights (0,1,2) are used, so a linear trend in the frequencies is tested.
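As a worked illustration, base R’s prop.trend.test() carries out the same test (it reports the chi-squared statistic, the square of Z); the 2 x 3 table below is made up:

    # Cases and totals per ordered exposure level (illustrative numbers)
    events <- c(10, 15, 25)
    totals <- c(100, 100, 100)
    # Chi-squared test for a linear trend in proportions with scores 0, 1, 2
    prop.trend.test(events, totals, score = c(0, 1, 2))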
CATTexact Computation of the p-Value for the Exact Conditional Cochran-Armitage Trend Test
Provides functions for computing the one-sided p-values of the Cochran-Armitage trend test statistic for the asymptotic and the exact conditional test. The computation of the p-value for the exact test is performed using an algorithm following an idea by Mehta, et al. (1992) <doi:10.2307/1390598>.
CausalFX Methods for Estimating Causal Effects from Observational Data
Estimate causal effects of one variable on another, currently for binary data only. Methods include instrumental variable bounds, adjustment by a given covariate set, adjustment by an induced covariate set using a variation of the PC algorithm, and an effect bounding method (the Witness Protection Program) based on covariate adjustment with observable independence constraints.
CausalImpact An R package for causal inference in time series
This R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty using a structural Bayesian time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred. As with all approaches to causal inference on non-experimental data, valid conclusions require strong assumptions. The CausalImpact package, in particular, assumes that the outcome time series can be explained in terms of a set of control time series that were themselves not affected by the intervention. Furthermore, the relation between treated series and control series is assumed to be stable during the post-intervention period. Understanding and checking these assumptions for any given application is critical for obtaining valid conclusions.
causalMGM Causal Learning of Mixed Graphical Models
Allows users to learn undirected and directed (causal) graphs over mixed data types (i.e., continuous and discrete variables). To learn a directed graph over mixed data, it first calculates the undirected graph (Sedgewick et al, 2016) and then it uses local search strategies to prune-and-orient this graph (Sedgewick et al, 2017). AJ Sedgewick, I Shi, RM Donovan, PV Benos (2016) <doi:10.1186/s12859-016-1039-0>. AJ Sedgewick, JD Ramsey, P Spirtes, C Glymour, PV Benos (2017) <arXiv:1704.02621>.
causalpie An R Package for easily creating and visualizing sufficient-component cause models
causalpie is an R package for creating tidy sufficient-component causal models. Create and analyze sufficient causes and plot them easily in ggplot2.
causalweight Causal Inference Based on Weighting Estimators
Various estimation methods for causal inference based on inverse probability weighting. Specifically, the package includes methods for estimating average treatment effects as well as direct and indirect effects in causal mediation analysis. The models refer to the studies of Frölich (2007) <doi:10.1016/j.jeconom.2006.06.004>, Huber (2012) <doi:10.3102/1076998611411917>, Huber (2014) <doi:10.1080/07474938.2013.806197>, Huber (2014) <doi:10.1002/jae.2341>, and Frölich and Huber (2017) <doi:10.1111/rssb.12232>.
cbanalysis Coffee Break Descriptive Analysis
Contains functions which subset the input data frame based on the variable types and return a list of data frames.
cbar Contextual Bayesian Anomaly Detection in R
Detect contextual anomalies in time-series data with Bayesian data analysis. It focuses on determining a normal range of target value, and provides simple-to-use functions to abstract the outcome.
CBCgrps Compare Baseline Characteristics Between Groups
Compare baseline characteristics between two groups. The variables being compared can be factor and numeric variables. The function will automatically judge the type and distribution of the variables, and make statistical description and bivariate analysis.
CBDA Compressive Big Data Analytics
Classification performed on Big Data. It uses concepts from compressive sensing, and implements ensemble predictor (i.e., ‘SuperLearner’) and knockoff filtering as the main machine learning and feature mining engines.
cbinom Continuous Analog of a Binomial Distribution
Implementation of the d/p/q/r family of functions for a continuous analog to the standard discrete binomial with continuous size parameter and continuous support with x in [0, size + 1], following Ilienko (2013) <arXiv:1303.5990>.
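Assuming the package follows the usual d/p/q/r naming convention referred to in the description (dcbinom(), pcbinom(), qcbinom(), rcbinom() — treat these exact names as an assumption), usage might look like:

    library(cbinom)
    # Density of the continuous binomial on its support [0, size + 1]
    x <- seq(0, 11, length.out = 101)
    plot(x, dcbinom(x, size = 10, prob = 0.4), type = "l")
    rcbinom(5, size = 10, prob = 0.4)  # five random draws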
cbird Clustering of Multivariate Binary Data with Dimension Reduction via L1-Regularized Likelihood Maximization
Implements clustering of multivariate binary data with dimension reduction (CLUSBIRD), as proposed by Yamamoto and Hayashi (2015) <doi:10.1016/j.patcog.2015.05.026>.
CBPS Covariate Balancing Propensity Score
Implements the covariate balancing propensity score (CBPS) proposed by Imai and Ratkovic (2014) <DOI:10.1111/rssb.12027>. The propensity score is estimated such that it maximizes the resulting covariate balance as well as the prediction of treatment assignment. The method, therefore, avoids an iteration between model fitting and balance checking. The package also implements several extensions of the CBPS beyond the cross-sectional, binary treatment setting. The current version implements the CBPS for longitudinal settings so that it can be used in conjunction with marginal structural models from Imai and Ratkovic (2015) <DOI:10.1080/01621459.2014.956872>, treatments with three- and four- valued treatment variables, continuous-valued treatments from Fong, Hazlett, and Imai (2015) <http://…/CBGPS.pdf>, and the situation with multiple distinct binary treatments administered simultaneously. In the future it will be extended to other settings including the generalization of experimental and instrumental variable estimates. Recently we have added the optimal CBPS which chooses the optimal balancing function and results in doubly robust and efficient estimator for the treatment effect as well as high dimensional CBPS when a large number of covariates exist.
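A minimal sketch: the propensity score model is specified like a glm, with a treatment-on-covariates formula. The LaLonde example data and its column names are assumptions here:

    library(CBPS)
    data(LaLonde)
    # Scores are estimated so as to balance the covariates across treatment arms
    fit <- CBPS(treat ~ age + educ + re74 + re75, data = LaLonde)
    summary(fit)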
cbsem Simulation, Estimation and Segmentation of Composite Based Structural Equation Models
In composite based structural equation models, the composites are linear combinations of their indicators. Structural models consisting of two blocks are considered. The indicators of the exogenous composites are denoted by X, the indicators of the endogenous ones by Y. Reflective relations are given by arrows pointing from a composite to its indicators; their values are called loadings. In a reflective-reflective scenario all indicators have loadings. In the formative-reflective scenario, arrows point to indicators only from the endogenous composites. There are no loadings at all in the formative-formative scenario. The covariance matrices are computed for these three scenarios and can be used to simulate these models. The models can also be estimated, and a segmentation procedure is included as well.
cccp Cone Constrained Convex Problems
Routines for solving convex optimization problems with cone constraints by means of interior-point methods. The implemented algorithms are partially ported from CVXOPT, a Python module for convex optimization (see http://cvxopt.org for more information).
ccdrAlgorithm CCDr Algorithm for Learning Sparse Gaussian Bayesian Networks
Implementation of the CCDr (Concave penalized Coordinate Descent with reparametrization) structure learning algorithm as described in Aragam and Zhou (2015) <http://…/aragam15a.html>. This is a fast, score-based method for learning Bayesian networks that uses sparse regularization and block-cyclic coordinate descent.
ccfa Continuous Counterfactual Analysis
Contains methods for computing counterfactuals with a continuous treatment variable as in Callaway and Huang (2017) <https://ssrn.com/abstract=3078187>. In particular, the package can be used to calculate the expected value, the variance, the interquantile range, the fraction of observations below or above a particular cutoff, or other user-supplied functions of an outcome of interest conditional on a continuous treatment. The package can also be used for computing these same functionals after adjusting for differences in covariates at different values of the treatment. Further, one can use the package to conduct uniform inference for each parameter of interest across all values of the treatment, uniformly test whether adjusting for covariates makes a difference at any value of the treatment, and test whether a parameter of interest is different from its average value at any value of the treatment.
CCMnet Simulate Congruence Class Model for Networks
Tools to simulate networks based on Congruence Class models.
ccrs Correct and Cluster Response Style Biased Data
Functions for correcting and clustering response-style-biased preference data (CCRS). The main functions are correct.RS() for correcting response styles and ccrs() for simultaneous correction and content-based clustering. The procedure begins by making rank-ordered boundary data from the given preference matrix, using the function create.ccrsdata(). Then, in correct.RS(), the response style is corrected as follows: the rank-ordered boundary data are smoothed by I-spline functions, and the given preference data are transformed by the smoothed functions. The resulting data matrix, considered bias-corrected data, can be used with any data analysis method. If one wants to cluster respondents based on their indicated preferences (content-based clustering), ccrs() can be applied to the given (response-style-biased) preference data; it simultaneously corrects for response styles and clusters respondents based on the contents. The correction result can be checked with the plot.crs() function.
cdata Wrappers for ‘tidyr::gather()’ and ‘tidyr::spread()’
Supplies deliberately verbose wrappers for ‘tidyr::gather()’ and ‘tidyr::spread()’, and an explanatory vignette. Useful for training and for enforcing preconditions.
cdcsis Conditional Distance Correlation and Its Related Feature Screening Method
Gives conditional distance correlation and performs the conditional distance correlation sure independence screening procedure for ultrahigh dimensional data. The conditional distance correlation is a novel conditional dependence measurement of two random variables given a third variable. The conditional distance correlation sure independence screening is used for screening variables in ultrahigh dimensional setting.
cdfquantreg Quantile Regression for Random Variables on the Unit Interval
Employs a two-parameter family of distributions for modelling random variables on the (0, 1) interval by applying the cumulative distribution function (cdf) of one parent distribution to the quantile function of another.
cdparcoord Top Frequency-Based Parallel Coordinates
Parallel coordinate plotting with resolutions for large data sets and missing values.
CDVineCopulaConditional Sampling from Conditional C- and D-Vine Copulas
Provides tools for sampling from a conditional copula density decomposed via Pair-Copula Constructions as C- or D- vine. Here, the vines which can be used for such sampling are those which sample as first the conditioning variables (when following the sampling algorithms shown in Aas et al. (2009) <DOI:10.1016/j.insmatheco.2007.02.001>). The used sampling algorithm is presented and discussed in Bevacqua et al. (2017) <DOI:10.5194/hess-2016-652>, and it is a modified version of that from Aas et al. (2009) <DOI:10.1016/j.insmatheco.2007.02.001>. A function is available to select the best vine (based on information criteria) among those which allow for such conditional sampling. The package includes a function to compare scatterplot matrices and pair-dependencies of two multivariate datasets.
CEAmarkov Cost-Effectiveness Analysis using Markov Models
Provides an accurate, fast and easy way to perform cost-effectiveness analyses. This package can be used to validate results generated using different methods and can help create a standard for cost-effectiveness analyses that makes it easier to compare results from different studies.
CEC Cross-Entropy Clustering
Cross-Entropy Clustering (CEC) divides the data into Gaussian type clusters. It performs automatic reduction of unnecessary clusters, while at the same time allowing the simultaneous use of various types of Gaussian mixture models.
ceg Chain Event Graph
Create and learn Chain Event Graph (CEG) models using a Bayesian framework. It provides a hierarchical agglomerative algorithm to search the CEG model space. The package also includes several facilities for visualisation of the objects associated with a CEG. The CEG class can represent a range of relational data types, and supports arbitrary vertex, edge and graph attributes. A Chain Event Graph is a tree-based graphical model that provides a powerful graphical interface through which domain experts can easily translate a process into sequences of observed events using plain language. CEGs have been a useful class of graphical model, especially for capturing context-specific conditional independences. References: Collazo R, Gorgen C, Smith J. Chain Event Graph. CRC Press, ISBN 9781498729604, 2018 (forthcoming); and Barclay LM, Collazo RA, Smith JQ, Thwaites PA, Nicholson AE. The Dynamic Chain Event Graph. Electronic Journal of Statistics, 9 (2) 2130-2169 <doi:10.1214/15-EJS1068>.
cellWise Analyzing Data with Cellwise Outliers
Tools for detecting cellwise outliers and robust methods to analyze data which may contain them.
cems Conditional Expectation Manifolds
Conditional expectation manifolds are an approach to compute principal curves and surfaces.
cenGAM Censored Regression with Smooth Terms
Implementation of Tobit type I and type II families for censored regression using the ‘mgcv’ package, based on methods detailed in Wood et al. (2016) <doi:10.1080/01621459.2016.1180986>.
censorcopula Estimate Parameter of Bivariate Copula
Implements an interval censoring method to break ties when fitting a bivariate copula to data with ties.
CensSpatial Censored Spatial Models
Fits linear regression models for censored spatial data. Provides different estimation methods, such as the SAEM (Stochastic Approximation of Expectation Maximization) algorithm and a seminaive method that uses Kriging prediction to estimate the response at censored locations and predict new values at unknown locations. It also offers graphical tools for assessing the fitted model.
centiserve Find Graph Centrality Indices
Calculates centrality indices additional to the ‘igraph’ package centrality functions.
centralplot Show the Strength of Relationships Between Centre and Peripheral Items
The degree of correlation between centre and peripheral items is shown by the length of the line between them. You can define the length yourself by inputting the ‘distance’ parameter. For example, you can input (1 – Pearson’s correlation coefficient) as ‘distance’, so that the stronger the correlation between a centre and peripheral item, the nearer they will be in the plot. Also, if you do a hypothesis test whose null hypothesis is that the centre and peripheral items are the same, you can input -log(P) as the distance. To sum up, the stronger the correlation between centre and peripheral items, the smaller the ‘distance’ parameter should be. Due to its high degree of freedom, it can be applied to many different circumstances.
cents Censored Time Series
Fits censored time series models.
CEoptim Cross-Entropy R Package for Optimization
Optimization solver based on the Cross-Entropy method.
CepLDA Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability
Performs cepstral-based discriminant analysis of groups of time series when there is variability in the power spectra of time series within the same group, as described in R.T. Krafty (2016) ‘Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability’, Journal of Time Series Analysis.
cetcolor CET Perceptually Uniform Colour Maps
Collection of perceptually uniform colour maps made by Peter Kovesi (2015) ‘Good Colour Maps: How to Design Them’ <arXiv:1509.03700> at the Centre for Exploration Targeting (CET).
ceterisParibus Ceteris Paribus Plots (What-If Plots) for a Single Observation
Ceteris Paribus Plots (What-If Plots) are designed to present model responses around a single point in feature space, for example around a single prediction for an interesting observation. The plots are model-agnostic: they work for any predictive machine learning model and allow for model comparisons. Ceteris Paribus Plots supplement the Break Down Plots from the ‘breakDown’ package.
cfa Configural Frequency Analysis (CFA)
Analysis of configuration frequencies for simple and repeated measures, multiple-samples CFA, hierarchical CFA, bootstrap CFA, functional CFA, Kieser-Victor CFA, and Lindner’s test using a conventional and an accelerated algorithm.
CFC Cause-Specific Framework for Competing-Risk Analysis
Functions for combining survival curves of competing risks to produce cumulative incidence and event-free probability functions, and for summarizing and plotting the results. Survival curves can be either time-denominated or probability-denominated. Point estimates as well as Bayesian, sample-based representations of survival can utilize this framework.
cfma Causal Functional Mediation Analysis
Performs causal functional mediation analysis (CFMA) for functional treatment, functional mediator, and functional outcome. This package includes two functional mediation model types: (1) a concurrent mediation model and (2) a historical influence mediation model. See Zhao et al. (2018), Functional Mediation Analysis with an Application to Functional Magnetic Resonance Imaging Data, <arXiv:1805.06923> for details.
CGE Computing General Equilibrium
Developing general equilibrium models, computing general equilibrium and simulating economic dynamics with structural dynamic models in LI (2019, ISBN: 9787521804225) ‘General Equilibrium and Structural Dynamics: Perspectives of New Structural Economics. Beijing: Economic Science Press’.
cghRA Array CGH Data Analysis and Visualization
Provides functions to import data from Agilent CGH arrays and process them according to the cghRA workflow. Implements several algorithms such as WACA, STEPS and cnvScore and an interactive graphical interface.
cglasso L1-Penalized Censored Gaussian Graphical Models
The l1-penalized censored Gaussian graphical model (cglasso) is an extension of the graphical lasso estimator developed to handle datasets with censored observations. An EM-like algorithm is implemented to estimate the parameters of the censored Gaussian graphical models.
CGP Composite Gaussian process models
Fit composite Gaussian process (CGP) models as described in Ba and Joseph (2012) ‘Composite Gaussian Process Models for Emulating Expensive Functions’, Annals of Applied Statistics. The CGP model is capable of approximating complex surfaces that are not second-order stationary. Important functions in this package are CGP, print.CGP, summary.CGP, predict.CGP and plotCGP.
CGPfunctions Powell Miscellaneous Functions for Teaching and Learning Statistics
Miscellaneous functions useful for teaching statistics as well as actually practicing the art. They typically are not “new” methods but rather wrappers around either base R or other packages. Currently contains: ‘Plot2WayANOVA’ which as the name implies conducts a 2 way ANOVA and plots the results using ‘ggplot2’. ‘neweta’ which is a helper function that appends the results of a Type II eta squared calculation onto a classic ANOVA table. Mode which finds the modal value in a vector of data. ‘SeeDist’ which wraps around ‘ggplot2’ to provide visualizations of univariate data.
cgwtools Miscellaneous Tools
A set of tools the author has found useful for performing quick observations or evaluations of data, including a variety of ways to list objects by size, class, etc. Several other tools mimic Unix shell commands, including ‘head’, ‘tail’, ‘pushd’, and ‘popd’. The functions ‘seqle’ and ‘reverse.seqle’ mimic the base ‘rle’ but can search for linear sequences. The function ‘splatnd’ allows the user to generate zero-argument commands without the need for ‘makeActiveBinding’.
chandwich Chandler-Bate Sandwich Loglikelihood Adjustment
Performs adjustments of a user-supplied independence loglikelihood function using a robust sandwich estimator of the parameter covariance matrix, based on the methodology in Chandler and Bate (2007) <doi:10.1093/biomet/asm015>. This can be used for cluster correlated data when interest lies in the parameters of the marginal distributions or for performing inferences that are robust to certain types of model misspecification. Functions for profiling the adjusted loglikelihoods are also provided, as are functions for calculating and plotting confidence intervals, for single model parameters, and confidence regions, for pairs of model parameters.
changepoint An R package for changepoint analysis
Implements various mainstream and specialised changepoint methods for finding single and multiple changepoints within data. Many popular non-parametric and frequentist methods are included. The cpt.mean, cpt.var, cpt.meanvar functions should be your first point of call.
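For example, a single mean shift is recovered with cpt.mean():

    library(changepoint)
    set.seed(1)
    # 200 observations with a mean shift after the first 100
    x <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))
    fit <- cpt.mean(x, method = "PELT")
    cpts(fit)   # estimated changepoint location(s)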
changepoint.np Methods for Nonparametric Changepoint Detection
Implements the multiple changepoint algorithm PELT with a nonparametric cost function based on the empirical distribution of the data. The cpt.np() function should be your first point of call. This package is an extension to the ‘changepoint’ package, which uses parametric changepoint methods. For further information on the methods see the documentation for ‘changepoint’.
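Usage mirrors the parametric functions in ‘changepoint’, e.g.:

    library(changepoint.np)
    set.seed(2)
    # A change in distribution (here, rate) rather than a simple mean shift
    x <- c(rexp(100, rate = 1), rexp(100, rate = 5))
    fit <- cpt.np(x)
    cpts(fit)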
changepointsHD Change-Point Estimation for Expensive and High-Dimensional Models
Implements the methods developed in L. Bybee and Y. Atchade (2017) <arXiv:1707.04306>. Contains a series of methods for estimating change-points given user-specified black-box models. The methods include binary segmentation for multiple change-point estimation. For estimating each individual change-point the package includes simulated annealing, brute force, and, for Gaussian graphical models, an application-specific rank-one update implementation. Additionally, code for estimating Gaussian graphical models is included. The goal of this package is to allow for the efficient estimation of change-points in complicated models with high-dimensional data.
changepointsVar Change-Points Detections for Changes in Variance
Detection of change-points for variance of heteroscedastic Gaussian variables with piecewise constant variance function. Adelfio, G. (2012), Change-point detection for variance piecewise constant models, Communications in Statistics, Simulation and Computation, 41:4, 437-448, <doi:10.1080/03610918.2011.592248>.
ChangepointTesting Change Point Estimation for Clustered Signals
A multiple testing procedure for clustered alternative hypotheses. It is assumed that the p-values under the null hypotheses follow U(0,1) and that the distributions of p-values from the alternative hypotheses are stochastically smaller than U(0,1). By aggregating information, this method is more sensitive to detecting signals of low magnitude than standard methods. Additionally, sporadic small p-values appearing within a null hypothesis sequence are avoided by averaging over the neighboring p-values.
changer Change R Package Name
Changing the name of an existing R package is an annoying but common task, especially in the early stages of package development. This package (mostly) automates the task.
ChannelAttributionApp Shiny Web Application for the Multichannel Attribution Problem
Shiny Web Application for the Multichannel Attribution Problem. It is basically a user-friendly graphical interface for running and comparing all the attribution models in package ‘ChannelAttribution’. For customizations or interest in other statistical methodologies for web data analysis please contact <davide.altomare@gmail.com>.
Chaos01 0-1 Test for Chaos
Computes and plots the results of the 0-1 test for chaos proposed by Gottwald and Melbourne (2004) <DOI:10.1137/080718851>. The algorithm is available in parallel for independent values of the parameter c.
CharFun Numerical Computation of the Cumulative Distribution Function and Probability Density Function from the Characteristic Function
The Characteristic Functions Toolbox (CharFun) consists of a set of algorithms for evaluating selected characteristic functions and algorithms for numerical inversion of the (combined and/or compound) characteristic functions, used to evaluate the probability density function (PDF) and the cumulative distribution function (CDF).
charlatan Make Fake Data
Make fake data, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers (‘DOIs’), jobs, phone numbers, ‘DNA’ sequences, doubles and integers from distributions and within a range.
chartql Simplified Language for Plots and Charts
Provides a very simple syntax for the user to generate custom plot(s) without having to remember complicated ‘ggplot2’ syntax. The ‘chartql’ package uses ‘ggplot2’ and manages all the syntax complexities internally. As an example, to generate a bar chart of company sales faceted by product category further faceted by season of the year, we simply write: ‘CHART bar X category, season Y sales’.
checkarg Check the Basic Validity of a (Function) Argument
Utility functions that allow checking the basic validity of a function argument or any other value, including generating an error and assigning a default in a single line of code. The main purpose of the package is to provide simple and easily readable argument checking to improve code robustness.
checkLuhn Checks if a Number is Valid Using the Luhn Algorithm
Confirms if the number is Luhn compliant. Can check if credit card, IMEI number or any other Luhn based number is correct. For more info see: <https://…/Luhn_algorithm>.
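The check itself is short enough to sketch in base R (a standalone illustration of the algorithm, not the package’s interface):

    # Double every second digit from the right, subtract 9 from results > 9,
    # and require the digit sum to be a multiple of 10
    luhn_valid <- function(number) {
      digits  <- rev(as.integer(strsplit(gsub("\\D", "", number), "")[[1]]))
      doubled <- digits * ifelse(seq_along(digits) %% 2 == 0, 2, 1)
      doubled <- ifelse(doubled > 9, doubled - 9, doubled)
      sum(doubled) %% 10 == 0
    }
    luhn_valid("4539 1488 0343 6467")  # TRUE (a common Luhn test number)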
checkpoint Install Packages from Snapshots on the Checkpoint Server for Reproducibility
The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint allows you to install packages as they existed on CRAN on a specific snapshot date as if you had a CRAN time machine. To achieve reproducibility, the checkpoint() function installs the packages required or called by your project and scripts to a local library exactly as they existed at the specified point in time. Only those packages are available to your project, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint’s checkpoint() can ensure the reproducibility of your scripts or projects at any time. To create the snapshot archives, once a day (at midnight UTC) we refresh the Austria CRAN mirror, on the “Managed R Archived Network” server (http://mran.revolutionanalytics.com ). Immediately after completion of the rsync mirror process, we take a snapshot, thus creating the archive. Snapshot archives exist starting from 2014-09-17.
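In practice, a script pins its dependencies with a single call near the top:

    library(checkpoint)
    # Install and use packages exactly as they existed on CRAN on this date
    checkpoint("2018-01-01")
    library(ggplot2)  # resolved against the 2018-01-01 snapshot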
checkr Check Object Classes, Values, Names and Dimensions
Checks the classes, values, names and dimensions of scalar, vectors, lists and data frames. Issues an informative error (or warning) if checks fail. Otherwise it returns the original object allowing it to be used in pipes.
cheese Tools for Intuitive and Flexible Statistical Analysis Workflows
Contains flexible and intuitive functions to assist in carrying out tasks in a statistical analysis and to get from the raw data to presentation-ready results. A user-friendly interface is used in specialized functions that are aimed at common tasks such as building a univariate descriptive table for variables in a dataset. These high-level functions are built on a collection of low(er)-level functions that may be useful for aspects of a custom statistical analysis workflow or for general programming use.
CHFF Closest History Flow Field Forecasting for Bivariate Time Series
The software matches the current history to the closest history in a time series to build a forecast.
chi2x3way Chi-Squared and Tau Index Partitions for Three-Way Contingency Tables
Provides two index partitions for three-way contingency tables: partition of the association measure chi-squared and of the predictability index tau under several representative hypotheses about the expected frequencies (hypothesized probabilities).
chicane Capture Hi-C Analysis Engine
Toolkit for processing and calling interactions in capture Hi-C data. Converts BAM files into counts of reads linking restriction fragments, and identifies pairs of fragments that interact more than expected by chance. Significant interactions are identified by comparing the observed read count to the expected background rate from a count regression model.
chinese.misc Miscellaneous Tools for Chinese Text Mining and More
Efforts are made to make Chinese text mining easier, faster, and robust to errors. Document term matrix can be generated by only one line of code; detecting encoding, segmenting and removing stop words are done automatically. Some convenient tools are also supplied.
ChIPtest Nonparametric Methods for Identifying Differential Enrichment Regions with ChIP-Seq Data
Nonparametric tests to identify differential enrichment regions for two-condition or time-course ChIP-seq data. Includes: a data preprocessing function, estimation of a small constant used in hypothesis testing, a kernel-based two-sample nonparametric test, and two assumption-free two-sample nonparametric tests.
CHMM Coupled Hidden Markov Models
An exact and a variational inference for coupled Hidden Markov Models applied to the joint detection of copy number variations.
chngpt Estimation and Hypothesis Testing for Threshold Regression
Threshold regression models are also called two-phase regression, broken-stick regression, split-point regression, structural change models, and regression kink models. Methods for both continuous and discontinuous threshold models are included, but the support for the former is much greater. This package is described in Fong, Huang, Gilbert and Permar (2017) chngpt: threshold regression model estimation and inference, BMC Bioinformatics, in press, <DOI:10.1186/s12859-017-1863-x>.
cholera Amend, Augment and Aid Analysis of John Snow’s Cholera Data
Amends errors, augments data and aids analysis of John Snow’s map of the 1854 London cholera outbreak. The original data come from Rusty Dodson and Waldo Tobler’s 1992 digitization of Snow’s map. Those data, <http://…/snow.html>, are no longer available. However, they are preserved in the ‘HistData’ package, <https://…/package=HistData>.
chopthin The Chopthin Resampler
Resampling is a standard step in particle filtering and in sequential Monte Carlo. This package implements the chopthin resampler, which keeps a bound on the ratio between the largest and the smallest weights after resampling.
ChoR Chordalysis R Package
Learning the structure of graphical models from datasets with thousands of variables. More information about the research papers detailing the theory behind Chordalysis is available at <http://…/Research> (KDD 2016, SDM 2015, ICDM 2014, ICDM 2013). The R package development site is <https://…/Monash-ChoR>.
choroplethr Simplify the Creation of Choropleth Maps in R
Choropleths are thematic maps where geographic regions, such as states, are colored according to some metric, such as the number of people who live in that state. This package simplifies this process by 1. Providing ready-made functions for creating choropleths of common maps. 2. Providing data and API connections to interesting data sources for making choropleths. 3. Providing a framework for creating choropleths from arbitrary shapefiles. Please see the vignettes for more details.
chunked Chunkwise Text-File Processing for ‘dplyr’
Text data can be processed chunkwise using ‘dplyr’ commands. These are recorded and executed per data chunk, so large files can be processed with limited memory using the ‘LaF’ package.
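A sketch of the intended pattern (file and column names are made up; the ‘dplyr’ verbs are recorded and replayed chunk by chunk):

    library(chunked)
    library(dplyr)
    read_chunkwise("big_file.csv", chunk_size = 5000) %>%
      filter(amount > 0) %>%
      select(id, amount) %>%
      write_chunkwise("filtered.csv")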
chunkR Read Tables in Chunks
Read external data tables in chunks using a C++ backend.
CIEE Estimating and Testing Direct Effects in Directed Acyclic Graphs using Estimating Equations
In many studies across different disciplines, detailed measures of the variables of interest are available. If assumptions can be made regarding the direction of effects between the assessed variables, this has to be considered in the analysis. The functions in this package implement the novel approach CIEE (causal inference using estimating equations; Konigorski et al., 2017, Genetic Epidemiology, in press) for estimating and testing the direct effect of an exposure variable on a primary outcome, while adjusting for indirect effects of the exposure on the primary outcome through a secondary intermediate outcome and potential factors influencing the secondary outcome. The underlying directed acyclic graph (DAG) of this considered model is described in the vignette. CIEE can be applied to studies in many different fields, and it is implemented here for the analysis of a continuous primary outcome and a time-to-event primary outcome subject to censoring. CIEE uses estimating equations to obtain estimates of the direct effect and robust sandwich standard error estimates. Then, a large-sample Wald-type test statistic is computed for testing the absence of the direct effect. Additionally, standard multiple regression, regression of residuals, and the structural equation modeling approach are implemented for comparison.
cinterpolate Interpolation From C
Simple interpolation methods designed to be used from C code. Supports constant, linear and spline interpolation. An R wrapper is included but this package is primarily designed to be used from C code using ‘LinkingTo’. The spline calculations are classical cubic interpolation, e.g., Forsythe, Malcolm and Moler (1977) <ISBN: 9780131653320>.
CIplot Functions to Plot Confidence Interval
Plots confidence intervals from the objects of statistical tests such as t.test(), var.test(), cor.test(), prop.test() and fisher.test() (‘htest’ class), the Tukey test [TukeyHSD()], the Dunnett test [glht() in the ‘multcomp’ package], logistic regression [glm()], and the Tukey or Games-Howell test [posthocTGH() in the ‘userfriendlyscience’ package]. Users are able to set the styles of lines and points. This package also contains a function to calculate odds ratios and their confidence intervals from the results of logistic regression.
circglmbayes Bayesian Analysis of a Circular GLM
Perform a Bayesian analysis of a circular outcome General Linear Model (GLM), which allows regressing a circular outcome on linear and categorical predictors. Posterior samples are obtained by means of an MCMC algorithm written in ‘C++’ through ‘Rcpp’. Estimation and credible intervals are provided, as well as hypothesis testing through Bayes Factors. See Mulder and Klugkist (2017) <doi:10.1016/j.jmp.2017.07.001>.
CircOutlier Detecting of Outliers in Circular Regression
Detects outliers in circular-circular regression models and estimates the model parameters.
circumplex Analysis and Visualization of Circular Data
Tools for analyzing and visualizing circular data, including a generalization of the bootstrapped structural summary method from Zimmermann & Wright (2017) <doi:10.1177/1073191115621795> and functions for creating publication-ready tables and figures from the results. Future versions will include tools for circular fit and reliability analyses, as well as greatly enhanced visualization methods.
cIRT Choice Item Response Theory
Jointly models the accuracy of cognitive responses and item choices within a Bayesian hierarchical framework, as described by Culpepper and Balamuta (2015) <doi:10.1007/s11336-015-9484-7>. In addition, the package contains the datasets used in the analysis of the paper.
Cite An RStudio Addin to Insert BibTeX Citations in R Markdown Documents
Contains an RStudio addin to insert BibTeX citations in R Markdown documents with a minimal user interface.
ciTools Confidence or Prediction Intervals, Quantiles, and Probabilities for Statistical Models
Functions to append confidence intervals, prediction intervals, and other quantities of interest to data frames. All appended quantities are for the response variable, after conditioning on the model and covariates. This package has a data frame first syntax that allows for easy piping. Currently supported models include (log-) linear, (log-) linear mixed, and generalized linear models.
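For example, intervals arrive as extra columns on the data frame, which keeps the workflow pipe-friendly:

    library(ciTools)
    fit <- lm(dist ~ speed, data = cars)
    # Confidence intervals for the mean response, then prediction intervals
    out <- add_ci(cars, fit, names = c("lcb", "ucb"))
    out <- add_pi(out, fit, names = c("lpb", "upb"))
    head(out)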
citr RStudio Add-in to Insert Markdown Citations
Functions and an RStudio add-in to search a BibTeX-file to create and insert formatted Markdown citations into the current document.
ciuupi Confidence Intervals Utilizing Uncertain Prior Information
Computes a confidence interval for a specified linear combination of the regression parameters in a linear regression model with iid normal errors with known variance when there is uncertain prior information that a distinct specified linear combination of the regression parameters takes a given value. This confidence interval, found by numerical constrained optimization, has the required minimum coverage and utilizes this uncertain prior information through desirable expected length properties. This confidence interval has the following three practical applications. Firstly, if the error variance has been accurately estimated from previous data then it may be treated as being effectively known. Secondly, for sufficiently large (dimension of the response vector) minus (dimension of regression parameter vector), greater than or equal to 30 (say), if we replace the assumed known value of the error variance by its usual estimator in the formula for the confidence interval then the resulting interval has, to a very good approximation, the same coverage probability and expected length properties as when the error variance is known. Thirdly, some more complicated models can be approximated by the linear regression model with error variance known when certain unknown parameters are replaced by estimates. This confidence interval is described in Kabaila, P. and Mainzer, R. (2017) <arXiv:1708.09543>, and is a member of the family of confidence intervals proposed by Kabaila, P. and Giri, K. (2009) <doi:10.1016/j.jspi.2009.03.018>.
CKLRT Composite Kernel Machine Regression Based on Likelihood Ratio Test
Composite Kernel Machine Regression based on Likelihood Ratio Test (CKLRT): in this package, we develop a kernel machine regression framework to model the overall genetic effect of a SNP-set, considering the possible gene-environment (GE) interaction. Specifically, we use a composite kernel to specify the overall genetic effect via a nonparametric function and we model additional covariates parametrically within the regression framework. The composite kernel is constructed as a weighted average of two kernels, one corresponding to the genetic main effect and one corresponding to the GE interaction effect. We propose a likelihood ratio test (LRT) and a restricted likelihood ratio test (RLRT) for statistical significance. We derive a Monte Carlo approach for the finite sample distributions of LRT and RLRT statistics. (N. Zhao, H. Zhang, J. Clark, A. Maity, M. Wu. Composite Kernel Machine Regression based on Likelihood Ratio Test with Application for Combined Genetic and Gene-environment Interaction Effect (Submitted).)
CLA Critical Line Algorithm in Pure R
Implements ‘Markowitz’ Critical Line Algorithm (‘CLA’) for classical mean-variance portfolio optimization. Care has been taken for correctness in light of previous buggy implementations.
clam Classical Age-Depth Modelling of Cores from Deposits
Performs ‘classical’ age-depth modelling of dated sediment deposits – prior to applying more sophisticated techniques such as Bayesian age-depth modelling. Any radiocarbon dated depths are calibrated. Age-depth models are constructed by sampling repeatedly from the dated levels, each time drawing age-depth curves. Model types include linear interpolation, linear or polynomial regression, and a range of splines. See Blaauw (2010). <doi:10.1016/j.quageo.2010.01.002>.
clampSeg Idealisation of Patch Clamp Recordings
Allows for idealisation of patch clamp recordings by implementing the non-parametric JUmp Local dEconvolution Segmentation filter JULES.
clarifai Access to Clarifai API
Get descriptions of images from the Clarifai API. For more information, see http://clarifai.com. Clarifai uses a large deep learning cloud to come up with descriptive labels of the things in an image. It also reports how confident it is about each label.
classifierplots Generates a Visualization of Classifier Performance as a Grid of Diagnostic Plots
Generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call. Includes ROC curves, prediction density, accuracy, precision, recall and calibration plots, all using ggplot2 for easy modification. Debug your binary classifiers faster and easier!
classiFunc Classification of Functional Data
Efficient implementation of a k-nearest neighbor estimator and a kernel estimator for functional data classification.
classyfireR R Interface to the ClassyFire RESTful API
Access to the ClassyFire RESTful API <http://classyfire.wishartlab.com>. Retrieve existing entity classifications and submit new entities for classification.
cld2 Google’s Compact Language Detector 2
Bindings to Google’s C++ library Compact Language Detector 2 (see <https://…/cld2#readme> for more information). Probabilistically detects over 80 languages in UTF-8 text (plain text or HTML). For mixed-language input it returns the top three languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes).
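A short usage sketch (return values abbreviated; detect_language() returns a language code, or NA when it is unsure):
    library(cld2)
    detect_language("To be or not to be, that is the question.")  # "en"
    detect_language_mixed("Hello world. Bonjour tout le monde.")  # top languages with proportions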
cld3 Google’s Compact Language Detector 3
Google’s Compact Language Detector 3 is a neural network model for language identification and the successor of ‘cld2’ (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from ‘cld2’. See <https://…/cld3#readme> for more information.
cleancall C Resource Cleanup via Exit Handlers
Wrapper of .Call() that runs exit handlers to clean up C resources. Helps manage C (non-R) resources while using the R API.
cleanEHR The Critical Care Clinical Data Processing Tools
A toolset to deal with the Critical Care Health Informatics Collaborative dataset. It is created to address various data reliability and accessibility problems of electronic healthcare records (EHR). It provides a unique platform which enables data manipulation, transformation, reduction, anonymisation, cleaning and validation.
cleanerR How to Handle your Missing Data
How to deal with missing data? Based on the concept of almost functional dependencies, a method is proposed to fill in missing data, and to help you see which data are missing. The user specifies a measure of error and how many combinations to test the dependencies against: the closer that number is to the length of the dataset, the more precise the result, but the higher it is, the longer the process takes. If the program cannot predict with the accuracy determined by the user, it does not fill in the data; the user can then choose to increase the allowed error or handle the data another way.
cleanNLP A Tidy Data Model for Natural Language Processing
Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford’s CoreNLP library. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part of speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.
cleanr Helps You to Code Cleaner
Check your R code for some of the most common layout flaws. Many tried to teach us how to write code less dreadful, be it implicitly as B. W. Kernighan and D. M. Ritchie (1988) <ISBN:0-13-110362-8> in ‘The C Programming Language’ did, be it explicitly as R.C. Martin (2008) <ISBN:0-13-235088-2> in ‘Clean Code: A Handbook of Agile Software Craftsmanship’ did. So we should check our code for files too long or wide, functions with too many lines, too wide lines, too many arguments or too many levels of nesting. Note: This is not a static code analyzer like pylint or the like. Check out https://…/lintr instead.
clespr Composite Likelihood Estimation for Spatial Data
A composite likelihood approach is implemented for estimating statistical models for spatial ordinal and proportional data, based on Feng et al. (2014) <doi:10.1002/env.2306>. Parameter estimates are obtained by maximizing composite log-likelihood functions using the limited-memory BFGS optimization algorithm with bounding constraints, while standard errors are obtained by estimating the Godambe information matrix.
clhs Conditioned Latin Hypercube Sampling
Conditioned Latin hypercube sampling, as published by Minasny and McBratney (2006) <DOI:10.1016/j.cageo.2005.12.009>. This method stratifies sampling in the presence of ancillary data. An extension of this method, which associates a cost with each individual and takes it into account during the optimisation process, is also provided (Roudier et al., 2012, <DOI:10.1201/b12728>).
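A minimal sketch, assuming the default behaviour of returning the indices of the sampled rows (the ancillary data here is simulated):
    library(clhs)
    anc <- data.frame(elevation = runif(1000, 0, 500), ph = rnorm(1000, 6, 0.5))
    idx <- clhs(anc, size = 50)  # conditioned Latin hypercube sample of 50 rows
    sampled <- anc[idx, ]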
cli Helpers for Developing Command Line Interfaces
A suite of tools designed to build attractive command line interfaces (‘CLIs’). Includes tools for drawing rules, boxes, trees, and ‘Unicode’ symbols with ‘ASCII’ alternatives.
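For example, a sketch of the drawing helpers:
    library(cli)
    print(rule(left = "Results"))                  # horizontal rule with a label
    print(boxx("All tests passed", padding = 1))   # boxed message
    cat(symbol$tick, "done\n")                     # Unicode symbol with ASCII fallback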
cliapp Create Rich Command Line Applications
Create rich command line applications, with colors, headings, lists, alerts, progress bars, etc. It uses CSS for custom themes.
clickR Fix Data and Create Report Tables from Different Objects
Fixes data errors in numerical, factor and date variables and produces report tables from models and summaries.
clikcorr Censoring Data and Likelihood-Based Correlation Estimation
A profile likelihood based method of estimation and inference on the correlation coefficient of bivariate data with different types of censoring and missingness.
climbeR Calculate Average Minimal Depth of a Maximal Subtree for ‘ranger’ Package Forests
Calculates first- and second-order average minimal depth of a maximal subtree for a forest object produced by the R ‘ranger’ package. This variable importance metric is implemented as described in Ishwaran et al. (‘High-Dimensional Variable Selection for Survival Data’, March 2010, <doi:10.1198/jasa.2009.tm08622>).
clinDR Simulation and Analysis Tools for Clinical Dose Response Modeling
Bayesian and ML Emax model fitting, graphics and simulation for clinical dose response. The summary data from the dose response meta-analyses in Thomas, Sweeney, and Somayaji (2014) <doi:10.1080/19466315.2014.924876> and Thomas and Roy (2016) <doi:10.1080/19466315.2016.1256229> are included in the package. The prior distributions for the Bayesian analyses default to the posterior predictive distributions derived from these references.
ClinicalTrialSummary Summary Measures for Clinical Trials with Survival Outcomes
Provides estimates of several summary measures for clinical trials, including the average hazard ratio, the weighted average hazard ratio, the restricted superiority probability ratio, the restricted mean survival difference and the ratio of restricted mean times lost, based on the short-term and long-term hazard ratio model (Yang, 2005 <doi:10.1093/biomet/92.1.1>) which accommodates various non-proportional hazards scenarios. The inference procedures and the asymptotic results for the summary measures are discussed in Yang (2017, pre-print).
ClinReport Statistical Reporting in Clinical Trials
Enables the easy creation of formatted statistical tables in ‘Microsoft Word’ documents according to ‘clinical standards’. It can also be used outside the scope of clinical trials, for any statistical reporting in ‘Word’. Descriptive tables for quantitative statistics (mean, median, max, etc.) and/or qualitative statistics (frequencies and percentages) are available, and formatted tables of Least Square Means of Linear Models, Linear Mixed Models and Generalized Linear Mixed Models coming from the emmeans() function are also available. The package works with the ‘officer’ and ‘flextable’ packages to export the outputs into ‘Microsoft Word’ documents.
clipr Read and Write from the System Clipboard
Simple utility functions to read from and write to the system clipboards of Windows, OS X, and Linux.
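For example (note that a clipboard must be available; headless Linux systems typically need xclip or xsel installed):
    library(clipr)
    write_clip(mtcars[1:3, 1:2])  # copy a small table to the system clipboard
    txt <- read_clip()            # read clipboard contents as a character vector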
clisymbols Unicode Symbols at the R Prompt
A small subset of Unicode symbols that are useful when building command line applications. They fall back to alternatives on terminals that do not support Unicode. Many symbols were taken from the ‘figures’ ‘npm’ package (see https://…/figures ).
CLME Constrained Inference for Linear Mixed Effects Models
Constrained inference for linear mixed effects models using residual bootstrap methodology.
clogitboost Boosting Conditional Logit Model
A set of functions to fit a boosting conditional logit model.
clogitL1 Fitting Exact Conditional Logistic Regression with Lasso and Elastic Net Penalties
Tools for the fitting and cross validation of exact conditional logistic regression models with lasso and elastic net penalties. Uses cyclic coordinate descent and warm starts to compute the entire path efficiently.
clogitLasso Lasso Estimation of Conditional Logistic Regression Models
Fit a sequence of conditional logistic regression models with lasso, for small to large sized samples.
clordr Composite Likelihood Inference for Spatial Ordinal Data with Replications
Composite likelihood parameter estimates and the asymptotic covariance matrix are calculated for spatial ordinal data with replications, where the spatial ordinal response depends on covariates and the error structure combines a spatial exponential covariance within subjects with independent and identically distributed measurement error. Parametric bootstrapping is used to estimate the asymptotic standard errors and covariance matrix.
cloudml Interface to the Google Cloud Machine Learning Platform
Interface to the Google Cloud Machine Learning Platform <https://…/ml-engine>, which provides cloud tools for training machine learning models.
clr Curve Linear Regression via Dimension Reduction
A new methodology for linear regression with both curve response and curve regressors, which is described in Cho, Goude, Brossat and Yao (2013) <doi:10.1080/01621459.2012.722900> and (2015) <doi:10.1007/978-3-319-18732-7_3>. The key idea behind this methodology is dimension reduction based on a singular value decomposition in a Hilbert space, which reduces the curve regression problem to several scalar linear regression problems.
clubSandwich Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections
Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models. Several adjustments are incorporated to improve small-sample performance. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm(), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and rma.mv() (from ‘metafor’).
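A small sketch of the estimator-plus-test workflow (clustering mtcars by cylinder count purely for illustration):
    library(clubSandwich)
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    vc <- vcovCR(fit, cluster = mtcars$cyl, type = "CR2")  # bias-reduced sandwich estimator
    coef_test(fit, vcov = vc, test = "Satterthwaite")      # small-sample t-tests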
ClueR CLUster Evaluation (CLUE)
CLUE is an R package for identifying the optimal number of clusters in a given time-course dataset clustered by the cmeans or kmeans algorithm.
CluMix Clustering and Visualization of Mixed-Type Data
Provides utilities for clustering subjects and variables of mixed data types. Similarities between subjects are measured by Gower’s general similarity coefficient with an extension of Podani for ordinal variables. Similarities between variables are assessed by combination of appropriate measures of association for different pairs of data types. Alternatively, variables can also be clustered by the ‘ClustOfVar’ approach. The main feature of the package is the generation of a mixed-data heatmap. For visualizing similarities between either subjects or variables, a heatmap of the corresponding distance matrix can be drawn. Associations between variables can be explored by a ‘confounderPlot’, which allows visual detection of possible confounding, collinear, or surrogate factors for some variables of primary interest. Distance matrices and dendrograms for subjects and variables can be derived and used for further visualizations and applications.
clusrank Wilcoxon Rank Sum Test for Clustered Data
Non-parametric tests (Wilcoxon rank sum test and Wilcoxon signed rank test) for clustered data.
clust.bin.pair Statistical Methods for Analyzing Clustered Matched Pair Data
Tests, utilities, and case studies for analyzing significance in clustered binary matched-pair data. The central function clust.bin.pair uses one of several tests to calculate a Chi-square statistic. Implemented are the tests Eliasziw, Obuchowski, Durkalski, and Yang with McNemar included for comparison. The utility functions nested.to.contingency and paired.to.contingency convert data between various useful formats. Thyroids and psychiatry are the canonical datasets from Obuchowski and Petryshen respectively.
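A hedged sketch of the central function; the per-cluster counts ak..dk below are made up, and the method string follows the tests listed above (exact spelling per the package documentation):
    library(clust.bin.pair)
    ak <- c(4, 3, 5); bk <- c(1, 2, 0)  # made-up per-cluster 2x2 cell counts
    ck <- c(0, 1, 1); dk <- c(2, 2, 3)
    clust.bin.pair(ak, bk, ck, dk, method = "obuchowski")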
clustDRM Clustering Dose-Response Curves and Fitting Appropriate Models to Them
Functions to identify the pattern of a dose-response curve. Then fit a set of appropriate models to it according to the identified pattern, followed by model averaging to estimate the effective dose.
clustEff Clusters of Effect Curves in Quantile Regression Models
Clustering method to cluster both curve effects, through quantile regression coefficient modeling, and curves in functional data analysis. Sottile G. and Adelfio G. (2017) <https://…/IWSM_2017_V2.pdf>.
cluster Cluster Analysis Extended Rousseeuw et al
Cluster analysis methods, much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990).
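For instance, combining a few of its classic functions:
    library(cluster)
    d <- daisy(iris[, 1:4])    # dissimilarity matrix (Gower for mixed types)
    fit <- pam(d, k = 3)       # partitioning around medoids
    plot(silhouette(fit))      # silhouette diagnostic of the partition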
Cluster.OBeu Cluster Analysis ‘OpenBudgets’
Estimate and return the needed parameters for visualisations designed for ‘OpenBudgets’ <http://…/> data. Calculate cluster analysis measures in budget data of municipalities across Europe, according to the ‘OpenBudgets’ data model. It involves a set of techniques and algorithms used to find and divide the data into groups of similar observations. It can also be used more generally to extract visualisation parameters, convert them to ‘JSON’ format, and use them as input in a different graphical interface.
ClusterBootstrap Analyze Clustered Data with Generalized Linear Models using the Cluster Bootstrap
Provides functionality for the analysis of clustered data using the cluster bootstrap.
clusterCrit Clustering Indices
Compute clustering validation indices.
clusteredinterference Causal Effects from Observational Studies with Clustered Interference
Estimating causal effects from observational studies assuming clustered (or partial) interference. These inverse probability-weighted estimators target new estimands arising from population-level treatment policies. The estimands and estimators are introduced in Barkley et al. (2017) <arXiv:1711.04834>.
clustering.sc.dp Optimal Distance-Based Clustering for Multidimensional Data with Sequential Constraint
A dynamic programming algorithm for optimal clustering multidimensional data with sequential constraint. The algorithm minimizes the sum of squares of within-cluster distances. The sequential constraint allows only subsequent items of the input data to form a cluster. The sequential constraint is typically required in clustering data streams or items with time stamps such as video frames, GPS signals of a vehicle, movement data of a person, e-pen data, etc. The algorithm represents an extension of Ckmeans.1d.dp to multiple dimensional spaces. Similarly to the one-dimensional case, the algorithm guarantees optimality and repeatability of clustering. Method clustering.sc.dp can find the optimal clustering if the number of clusters is known. Otherwise, methods findwithinss.sc.dp and backtracking.sc.dp can be used.
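A minimal sketch with known k (the name of the returned membership component is an assumption based on common clustering conventions):
    library(clustering.sc.dp)
    x <- matrix(c(1.0, 1.1, 5.0, 5.2, 9.0, 9.1), ncol = 2, byrow = TRUE)
    res <- clustering.sc.dp(x, k = 3)  # optimal clustering under the sequential constraint
    res$cluster                        # cluster membership of consecutive rows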
clustermq Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM)
Provides the Q() function to send arbitrary function calls to workers on HPC schedulers without relying on network-mounted storage. Allows using remote schedulers via SSH.
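For example (the multicore scheduler option is set here so the sketch runs locally without a real HPC scheduler):
    library(clustermq)
    options(clustermq.scheduler = "multicore")  # local test mode
    Q(function(x) x^2, x = 1:10, n_jobs = 2)    # evaluate the calls on 2 workers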
ClusterR Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans and K-Medoids Clustering
Gaussian mixture models, k-means, mini-batch-kmeans and k-medoids clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of ‘RcppArmadillo’ to speed up the computationally intensive parts of the functions.
ClusterRankTest Rank Tests for Clustered Data
Nonparametric rank based tests (rank-sum tests and signed-rank tests) for clustered data, especially useful for clusters having informative cluster size and intra-cluster group size.
ClusterStability Assessment of Stability of Individual Object or Clusters in Partitioning Solutions
Allows one to assess the stability of individual objects, clusters and whole clustering solutions based on repeated runs of the K-means and K-medoids partitioning algorithms.
clustertend Check the Clustering Tendency
Calculate some statistics aiming to help analyze the clustering tendency of given data. In the first version, Hopkins’ statistic is implemented.
clustMixType k-Prototypes Clustering for Mixed Variable-Type Data
Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z.Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304, <DOI:10.1023/A:1009769707641>.
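A short sketch on simulated mixed-type data:
    library(clustMixType)
    df <- data.frame(num = c(rnorm(50), rnorm(50, 4)),
                     cat = factor(sample(c("a", "b"), 100, TRUE)))
    kp <- kproto(df, k = 2)  # k-prototypes on numeric + factor columns
    table(kp$cluster)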
ClustMMDD Variable Selection in Clustering by Mixture Models for Discrete Data
An implementation of a variable selection procedure in clustering by mixture of multinomial models for discrete data. Genotype data are examples of such data, with two unordered observations (alleles) at each locus for diploid individuals. The two-fold problem is seen as a model selection problem where competing models are characterized by the number of clusters K and the subset S of clustering variables. Competing models are compared by penalized maximum likelihood criteria. We considered asymptotic criteria such as the Akaike and Bayesian Information criteria, and a family of penalized criteria with a data-driven calibrated penalty function.
clustRcompaR Easy Interface for Clustering a Set of Documents and Exploring Group- Based Patterns
Provides an interface to perform cluster analysis on a corpus of text. Interfaces to Quanteda to assemble text corpora easily. Deviationalizes text vectors prior to clustering using the technique described by Sherin (Sherin, B. [2013]. A computational study of commonsense science: An exploration in the automated analysis of clinical interview data. Journal of the Learning Sciences, 22(4), 600-638. Chicago. http://…/10508406.2013.836654 ). Uses cosine similarity as the distance metric for a two-stage clustering process, involving Ward’s algorithm hierarchical agglomerative clustering and k-means clustering. Selects the optimal number of clusters to maximize ‘variance explained’ by clusters, adjusted by the number of clusters. Provides plotted output of clustering results as well as printed output. Assesses ‘model fit’ of the clustering solution to a set of preexisting groups in the dataset.
clustree Visualise Clusterings at Different Resolutions
Deciding what resolution to use can be a difficult question when approaching a clustering analysis. One way to approach this problem is to look at how samples move as the number of clusters increases. This package allows you to produce clustering trees, a visualisation for interrogating clusterings as resolution increases.
clustringr Cluster Strings by Edit-Distance
Returns an edit-distance based clusterization of an input vector of strings. Each cluster will contain a set of strings with small mutual edit-distance (e.g., Levenshtein, optimum-sequence-alignment, Damerau-Levenshtein), as computed by stringdist::stringdist(). The set of all mutual edit-distances is then used by graph algorithms (from package ‘igraph’) to single out subsets of high connectivity.
CLUSTShiny Interactive Document for Working with Cluster Analysis
An interactive document on the topic of cluster analysis using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
ClustVarLV Clustering of Variables Around Latent Variables
The clustering of variables is a strategy for deciphering the underlying structure of a data set. Adopting an exploratory data analysis point of view, the Clustering of Variables around Latent Variables (CLV) approach was proposed by Vigneau and Qannari (2003). Based on a family of optimization criteria, the CLV approach is adaptable to many situations. In particular, constraints may be introduced in order to take account of additional information about the observations and/or the variables. The package provides the set of functions developed so far within this framework. Considering successively different types of situations, the underlying CLV criteria are detailed and the various functions of the package are illustrated using real case studies.
cmaesr Covariance Matrix Adaptation Evolution Strategy
Pure R implementation of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) with optional restarts (IPOP-CMA-ES).
CMatching Matching Algorithms for Causal Inference with Clustered Data
Provides functions to perform matching algorithms for causal inference with clustered data, as described in B. Arpino and M. Cannas (2016) <doi:10.1002/sim.6880>. Pure within-cluster and preferential within-cluster matching are implemented. Both algorithms provide causal estimates with cluster-adjusted estimates of standard errors.
cmce Computer Model Calibration for Deterministic and Stochastic Simulators
Implements the Bayesian calibration model described in Pratola and Chkrebtii (2018) <DOI:10.5705/ss.202016.0403> for stochastic and deterministic simulators. Additive and multiplicative discrepancy models are currently supported. See <http://…/software> for more information and examples.
cmenet Bi-Level Selection of Conditional Main Effects
Provides functions for implementing cmenet – a bi-level variable selection method for conditional main effects (see Mak and Wu (2018) <doi:10.1080/01621459.2018.1448828>). CMEs are reparametrized interaction effects which capture the conditional impact of a factor at a fixed level of another factor. Compared to traditional two-factor interactions, CMEs quantify more interpretable interaction effects in many problems of interest (e.g., genomics, molecular engineering, personalized medicine). The current implementation performs variable selection on only binary CMEs, but we are working on an extension for the continuous setting. This work was supported by USARO grant W911NF-14-1-0024.
cmfilter Coordinate-Wise Mediation Filter
Functions to discover, plot, and select multiple mediators from an x -> M -> y linear system. This exploratory mediation analysis is performed using the Coordinate-wise Mediation Filter as introduced by Van Kesteren and Oberski (2019) <arXiv:1810.06334>.
CMLS Constrained Multivariate Least Squares
Solves multivariate least squares (MLS) problems subject to constraints on the coefficients, e.g., non-negativity, orthogonality, equality, inequality, monotonicity, unimodality, smoothness, etc. Includes flexible functions for solving MLS problems subject to user-specified equality and/or inequality constraints, as well as a wrapper function that implements 24 common constraint options. Also does k-fold or generalized cross-validation to tune constraint options for MLS problems. See ten Berge (1993, ISBN:9789066950832) for an overview of MLS problems, and see Goldfarb and Idnani (1983) <doi:10.1007/BF02591962> for a discussion of the underlying quadratic programming algorithm.
CMplot Circle Manhattan Plot
To visualize the results of a Genome-Wide Association Study, the Manhattan plot was born; however, drawing an elaborate one takes much time. This package provides a function named ‘CMplot’ that easily solves the problem: by inputting GWAS results and adjusting certain parameters, users obtain the desired Manhattan plot. A circular Manhattan plot, which demonstrates multiple traits in one circle plot, is also put forward; such a compact figure can spare the length of a paper.
cmprskQR Analysis of Competing Risks Using Quantile Regressions
Estimation, testing and regression modeling of subdistribution functions in competing risks using quantile regressions, as described in Peng and Fine (2009) <DOI:10.1198/jasa.2009.tm08228>.
cna A Package for Coincidence Analysis (CNA)
Provides functions for performing Coincidence Analysis (CNA).
cnbdistr Conditional Negative Binomial Distribution
Provides R functions for working with the Conditional Negative Binomial distribution.
CNLTreg Complex-Valued Wavelet Lifting for Signal Denoising
Implementations of recent complex-valued wavelet shrinkage procedures for smoothing irregularly sampled signals.
CNVScope A Versatile Toolkit for Copy Number Variation Relationship Data Analysis and Visualization
Provides the ability to create interaction maps, discover CNV map domains (edges), gene annotate interactions, and create interactive visualizations of these CNV interaction maps.
coalitions Coalition Probabilities in Multi-Party Democracies
An implementation of a MCMC method to calculate probabilities for a coalition majority based on survey results, see Bender and Bauer (2018) <doi:10.21105/joss.00606>.
cobalt Covariate Balance Tables and Plots
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with ‘MatchIt’, ‘twang’, ‘Matching’, and ‘CBPS’ for assessing balance on the output of their preprocessing functions. Users can also specify their data not generated through the above packages.
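One entry point is the formula interface on the package's bundled lalonde data; in practice bal.tab() is usually applied to the output of ‘MatchIt’ and the other packages named above (a hedged sketch):
    library(cobalt)
    data("lalonde", package = "cobalt")
    bal.tab(treat ~ age + educ + married + re74, data = lalonde)  # unadjusted balance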
cobiclust Biclustering via Latent Block Model Adapted to Overdispersed Count Data
Implementation of a probabilistic method for biclustering adapted to overdispersed count data. It is a Gamma-Poisson Latent Block Model. It also implements two selection criteria in order to select the number of biclusters.
cocor Comparing Correlations
Statistical tests for the comparison between two correlations based on either independent or dependent groups. Dependent correlations can either be overlapping or nonoverlapping. A web interface is available on the website http://comparingcorrelations.org. A plugin for the R GUI and IDE RKWard is included. Please install RKWard from https://rkward.kde.org to use this feature. The respective R package ‘rkward’ cannot be installed directly from a repository, as it is a part of RKWard.
cocoreg Extracts Shared Variation in Collections of Datasets Using Regression Models
The cocoreg algorithm extracts shared variation from a collection of datasets using regression models.
coda.base A Basic Set of Functions for Compositional Data Analysis
A minimal set of functions to perform compositional data analysis using the log-ratio approach introduced by John Aitchison in 1982. The main functions have been implemented in C++ for better performance.
cOde Automated C Code Generation for Use with the ‘deSolve’ and ‘bvpSolve’ Packages
Generates all necessary C functions allowing the user to work with the compiled-code interface of ode() and bvptwp(). The implementation supports “forcings” and “events”. The package also provides functions to symbolically compute Jacobians, sensitivity equations and adjoint sensitivities being the basis for sensitivity analysis.
codebook Automatic Codebooks from Survey Metadata Encoded in Attributes
Easily automate the following tasks to describe data frames: computing reliabilities (internal consistencies, retest, multilevel) for psychological scales, summarise the distributions of scales and items graphically and using descriptive statistics, combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on ‘rmarkdown’ partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.).
CodeDepends Analysis of R Code for Reproducible Research and Code Comprehension
Tools for analyzing R expressions or blocks of code and determining the dependencies between them. It focuses on R scripts, but can be used on the bodies of functions. There are many facilities, including the ability to summarize or get a high-level view of code, determine dependencies between variables, and suggest code improvements.
codemetar Generate ‘CodeMeta’ Metadata for R Packages
The ‘Codemeta’ Project defines a ‘JSON-LD’ format for describing software metadata, as detailed at <https://codemeta.github.io>. This package provides utilities to generate, parse, and modify ‘codemeta.json’ files automatically for R packages, as well as tools and examples for working with ‘codemeta.json’ ‘JSON-LD’ more generally.
codified Produce Standard/Formalized Demographics Tables
Augment clinical data with metadata to create output used in conventional publications and reports.
CoDiNA Co-Expression Differential Network Analysis
Categorize links from multiple networks into 3 categories: common links (alpha), specific links (gamma), and different links (beta). Also categorizes the links into sub-categories and groups. The package includes a visualization tool for the networks. More information about the methodology can be found at: Gysi et al., 2018 <arXiv:1802.00828>.
codingMatrices Alternative Factor Coding Matrices for Linear Model Formulae
A collection of coding functions as alternatives to the standard functions in the stats package, which have names starting with ‘contr.’. Their main advantage is that they provide a consistent method for defining marginal effects in multi-way factorial models. In a simple one-way ANOVA model the intercept term is always the simple average of the class means.
codyn Community Dynamics Metrics
A toolbox of ecological community dynamics metrics that are explicitly temporal. Functions fall into two categories: temporal diversity indices and community stability metrics. The diversity indices are temporal analogs to traditional diversity indices such as richness and rank-abundance curves. Specifically, functions are provided to calculate species turnover, mean rank shifts, and lags in community similarity between time points. The community stability metrics calculate overall stability and patterns of species covariance and synchrony over time.
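For instance, species turnover on a tiny made-up community table:
    library(codyn)
    df <- data.frame(year = c(2001, 2001, 2002, 2002, 2003, 2003),
                     species = c("sp1", "sp2", "sp2", "sp3", "sp1", "sp3"),
                     abundance = c(1, 4, 3, 2, 2, 5))
    turnover(df, time.var = "year", species.var = "species",
             abundance.var = "abundance")  # total turnover between time points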
cofeatureR Generate Cofeature Matrices
Generate cofeature (feature by sample) matrices. The package utilizes ggplot2::geom_tile to generate the matrix, allowing for easy additions from the base matrix.
CoFRA Complete Functional Regulation Analysis
Calculates complete functional regulation analysis and visualizes the results in a single heatmap. The provided example data is for biological data, but the methodology can be used for large data sets to compare quantitative entities that can be grouped. For example, a store might divide entities into cloth, food, car products, etc. and want to see how sales change in the groups after some event. The theoretical background for the calculations is provided in New insights into functional regulation in MS-based drug profiling, Ana Sofia Carvalho, Henrik Molina & Rune Matthiesen, Scientific Reports, <doi:10.1038/srep18826>.
coga Convolution of Gamma Distributions
Convolution of gamma distributions in R. The convolution of gamma distributions is the distribution of a sum of independent gamma random variables, which may all have different parameters. This package can calculate the density and distribution function and perform simulation.
cogmapr Cognitive Mapping Tools Based on Coding of Textual Sources
Functions for building cognitive maps based on qualitative data. Inputs are textual sources (articles, transcriptions of qualitative interviews of agents, …). These sources have been coded using relations and are linked to (i) a table describing the variables (or concepts) used for the coding and (ii) a table describing the sources (typology of agents, …). Main outputs are Individual Cognitive Maps (ICM), Social Cognitive Maps (all sources or a group of sources) and a list of quotes linked to relations. This package is linked to the work done during the PhD of Frederic M. Vanwindekens (CRA-W / UCL), defended on 13 May 2014 at the University of Louvain, in collaboration with the Walloon Agricultural Research Centre (project MIMOSA, MOERMAN fund).
coindeskr Access ‘CoinDesk’ Bitcoin Price Index API
Extract real-time Bitcoin price details by accessing the ‘CoinDesk’ Bitcoin Price Index API <https://…/>.
cointmonitoR Consistent Monitoring of Stationarity and Cointegrating Relationships
We propose a consistent monitoring procedure to detect a structural change from a cointegrating relationship to a spurious relationship. The procedure is based on residuals from modified least squares estimation, using either Fully Modified, Dynamic or Integrated Modified OLS. It is inspired by Chu et al. (1996) <DOI:10.2307/2171955> in that it is based on parameter estimation on a pre-break ‘calibration’ period only, rather than being based on sequential estimation over the full sample. See the discussion paper <DOI:10.2139/ssrn.2624657> for further information. This package provides the monitoring procedures for both the cointegration and the stationarity case (while the latter is just a special case of the former one) as well as printing and plotting methods for a clear presentation of the results.
cointReg Parameter Estimation and Inference in a Cointegrating Regression
Cointegration methods are widely used in empirical macroeconomics and empirical finance. It is well known that in a cointegrating regression the ordinary least squares (OLS) estimator of the parameters is super-consistent, i.e. converges at rate equal to the sample size T. When the regressors are endogenous, the limiting distribution of the OLS estimator is contaminated by so-called second order bias terms, see e.g. Phillips and Hansen (1990) <DOI:10.2307/2297545>. The presence of these bias terms renders inference difficult. Consequently, several modifications to OLS that lead to zero mean Gaussian mixture limiting distributions have been proposed, which in turn make standard asymptotic inference feasible. These methods include the fully modified OLS (FM-OLS) approach of Phillips and Hansen (1990) <DOI:10.2307/2297545>, the dynamic OLS (D-OLS) approach of Phillips and Loretan (1991) <DOI:10.2307/2298004>, Saikkonen (1991) <DOI:10.1017/S0266466600004217> and Stock and Watson (1993) <DOI:10.2307/2951763> and the new estimation approach called integrated modified OLS (IM-OLS) of Vogelsang and Wagner (2014) <DOI:10.1016/j.jeconom.2013.10.015>. The latter is based on an augmented partial sum (integration) transformation of the regression model. IM-OLS is similar in spirit to the FM- and D-OLS approaches, with the key difference that it does not require estimation of long run variance matrices and avoids the need to choose tuning parameters (kernels, bandwidths, lags). However, inference does require that a long run variance be scaled out. This package provides functions for the parameter estimation and inference with all three modified OLS approaches. That includes the automatic bandwidth selection approaches of Andrews (1991) <DOI:10.2307/2938229> and of Newey and West (1994) <DOI:10.2307/2297912> as well as the calculation of the long run variance.
colf Constrained Optimization on Linear Function
Performs least squares constrained optimization on a linear objective function. It contains a number of algorithms to choose from and offers a formula syntax similar to lm().
CollapsABEL Generalized CDH (GCDH) Analysis
Implements a generalized version of the CDH test <DOI:10.1371/journal.pone.0028145> for detecting compound heterozygosity on a genome-wide level, due to usage of generalized linear models it allows flexible analysis of binary and continuous traits with covariates.
CollapseLevels Collapses Levels, Computes Information Value and WoE
Provides functions to collapse levels of an attribute based on response rates. It also provides functions to compute and display information value, and weight of evidence (WoE) for the attributes, and to convert numeric variables to categorical ones by binning. These functions only work for binary classification problems.
collapsibleTree Interactive Collapsible Tree Diagrams using ‘D3.js’
Interactive Reingold-Tilford tree diagrams created using ‘D3.js’, where every node can be expanded and collapsed by clicking on it. Tooltips and color gradients can be mapped to nodes using a numeric column in the source data frame. See ‘collapsibleTree’ website for more information and examples.
collectArgs Quickly and Neatly Collect Arguments from One Environment to Pass to Another
We often want to take all (or most) of the objects in one environment (such as the parameter values of a function) and pass them to another. This might mean calling a second function, or iterating over a list and calling the same function on each element. These functions wrap often-repeated code. Current stable version (committed on October 14, 2017).
collections High Performance Container Data Types
Provides high performance container data types such as Queue, Stack, Deque, Dict and OrderedDict. Benchmarks <https://…/benchmark.html> have shown that these containers are asymptotically more efficient than those offered by other packages.
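A quick sketch of the container APIs:
    library(collections)
    q <- queue()
    q$push(1); q$push(2)
    q$pop()            # 1 (first in, first out)
    d <- dict()
    d$set("alpha", 1)
    d$get("alpha")     # 1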
collector Quantified Risk Assessment Data Collection
An open source process for collecting quantified data inputs from subject matter experts. Intended for feeding into an OpenFAIR analysis <https://…/C13K> using a tool such as ‘evaluator’ <https://evaluator.tidyrisk.org>.
collidr Check for Namespace Collisions with Other Packages and Functions on CRAN
Check for namespace collisions between a string input (your function or package name) and a quarter of a million packages and functions on CRAN.
collpcm Collapsed Latent Position Cluster Model for Social Networks
Markov chain Monte Carlo based inference routines for collapsed latent position cluster models for social networks, including searches over the model space (number of clusters in the latent position cluster model). The label switching algorithm used is that of Nobile and Fearnside (2007) <doi:10.1007/s11222-006-9014-7>, which relies on the algorithm of Carpaneto and Toth (1980) <doi:10.1145/355873.355883>.
collUtils Auxiliary Package for Package ‘CollapsABEL’
Provides some low level functions for processing PLINK input and output files.
coloredICA Implementation of Colored Independent Component Analysis and Spatial Colored Independent Component Analysis
It implements colored Independent Component Analysis (Lee et al., 2011) and spatial colored Independent Component Analysis (Shen et al., 2014). They are two algorithms to perform ICA when sources are assumed to be temporal or spatial stochastic processes, respectively.
colorednoise Simulate Temporally Autocorrelated Population Time Series
Temporally autocorrelated populations are correlated in their vital rates (growth, death, etc.) from year to year. It is very common for populations, whether they be bacteria, plants, or humans, to be temporally autocorrelated. This poses a challenge for stochastic population modeling, because a temporally correlated population will behave differently from an uncorrelated one. This package provides tools for simulating populations with white noise (no temporal autocorrelation), red noise (positive temporal autocorrelation), and blue noise (negative temporal autocorrelation). The algebraic formulation for autocorrelated noise comes from Ruokolainen et al. (2009) <doi:10.1016/j.tree.2009.04.009>. The simulations are based on an assumption of an asexually reproducing population, but it can also be used to simulate females of a sexually reproducing species.
colorfindr Extract Colors from Windows BMP, JPEG, PNG, TIFF, and SVG Format Images
Extracts colors from various image types, returns customized reports and plots treemaps of image compositions. Selected colors and color ranges can be excluded from the analysis.
ColorPalette Color Palettes Generator
Different methods to generate a color palette based on a specified base color and a number of colors that should be created.
colorpatch Optimized Rendering of Fold Changes and Confidence Values
Shows color patches for encoding fold changes (e.g. log ratios) together with confidence values within a single diagram. This is especially useful for rendering gene expression data as well as other types of differential experiments. In addition to different rendering methods (ggplot extensions), functionality for perceptually optimizing color palettes is provided. Furthermore, the package provides extension methods of the colorspace color class in order to simplify working with palettes (among others, length, as.list, and append are supported).
colorplaner A ggplot2 Extension to Visualize Two Variables per Color Aesthetic Through Color Space Projections
A ggplot2 extension to visualize two variables through one color aesthetic via mapping to a color space projection. With this technique for 2-D color mapping, one can create a dichotomous choropleth in R as well as other visualizations with bivariate color scales. Includes two new scales and a new guide for ggplot2.
colorscience Color Science Methods and Data
Methods and data for color science – color conversions by observer, illuminant and gamma. Color matching functions and chromaticity diagrams. Color indices, color differences and spectral data conversion/analysis.
colorspace Color Space Manipulation
Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB. Qualitative, sequential, and diverging color palettes based on HCL colors are provided.
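For example, converting between spaces and generating an HCL palette:
    library(colorspace)
    rainbow_hcl(5)                   # qualitative palette based on HCL colors
    coords(as(RGB(1, 0, 0), "HSV"))  # map red from RGB into HSV coordinates
    hex(HLS(0, 0.5, 1))              # hex string from HLS coordinates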
colorSpec Color Calculations with Emphasis on Spectral Data
Calculate with spectral properties of light sources, materials, cameras, eyes, and scanners. Build complex systems from simpler parts using a spectral product algebra. For light sources, compute CCT and CRI. For object colors, compute optimal colors and Logvinenko coordinates. Work with the standard CIE illuminants and color matching functions, and read spectra from text files, including CGATS files. Sample text files, and 4 vignettes are included.
colourpicker A Colour Picker Widget for Shiny Apps, RStudio, R-markdown, and ‘htmlwidgets’
A colour picker that can be used as an input in Shiny apps or R-markdown documents. A colour picker RStudio addin is provided to let you select colours for use in your R code. The colour picker is also available as an ‘htmlwidgets’ widget.
colr Functions to Select and Rename Data
Powerful functions to select and rename columns in dataframes, lists and numeric types by ‘Perl’ regular expression. Regular expressions (‘regex’) are a very powerful grammar for matching strings, such as column names.
Combine Game-Theoretic Probability Combination
Suite of R functions for combination of probabilities using a game-theoretic method.
combiter Combinatorics Iterators
Provides iterators for combinations, permutations, and subsets, which allow one to go through all elements without creating a huge set of all possible values.
cometExactTest Exact Test from the Combinations of Mutually Exclusive Alterations (CoMEt) Algorithm
An algorithm for identifying combinations of mutually exclusive alterations in cancer genomes. CoMEt represents the mutations in a set M of k genes with a 2^k dimensional contingency table, and then computes the tail probability of observing T(M) exclusive alterations using an exact statistical test.
commonmark Bindings to the ‘CommonMark’ Reference Implementation
The ‘CommonMark’ spec is a rationalized version of Markdown syntax. This package converts markdown text to various formats including a parse tree in XML format.
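For example:
    library(commonmark)
    markdown_html("Hello *world*")          # "<p>Hello <em>world</em></p>\n"
    markdown_xml("- item one\n- item two")  # parse tree in XML format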
commonsMath JAR Files of the Apache Commons Mathematics Library
Java JAR files for the Apache Commons Mathematics Library for use by users and other packages.
COMMUNAL Robust Selection of Cluster Number K
Facilitates optimal clustering of a data set. Provides a framework to run a wide range of clustering algorithms to determine the optimal number (k) of clusters in the data. Then analyzes the cluster assignments from each clustering algorithm to identify samples that repeatedly classify to the same group. We call these ‘core clusters’, providing a basis for later class discovery.
comorbidity Computing Comorbidity Scores
Computing comorbidity scores such as the weighted Charlson score (Charlson, 1987 <doi:10.1016/0021-9681(87)90171-8>) and the Elixhauser comorbidity score (Elixhauser, 1998 <doi:10.1097/00005650-199801000-00004>) using ICD-10 codes (Quan, 2005 <doi:10.1097/01.mlr.0000182534.19832.83>).
CompareCausalNetworks Interface to Diverse Estimation Methods of Causal Networks
Unified interface for the estimation of causal networks, including the methods ‘backShift’ (from package ‘backShift’), ‘bivariateANM’ (bivariate additive noise model), ‘bivariateCAM’ (bivariate causal additive model), ‘CAM’ (causal additive model) (from package ‘CAM’), ‘hiddenICP’ (invariant causal prediction with hidden variables), ‘ICP’ (invariant causal prediction) (from package ‘InvariantCausalPrediction’), ‘GES’ (greedy equivalence search), ‘GIES’ (greedy interventional equivalence search), ‘LINGAM’, ‘PC’ (PC Algorithm), ‘RFCI’ (really fast causal inference) (all from package ‘pcalg’) and regression.
compareDF Do a Git Style Diff of the Rows Between Two Dataframes with Similar Structure
Compares two dataframes which have the same column structure to show the rows that have changed. Also gives a git-style diff format to quickly see what has changed, in addition to summary statistics.
compareGroups Descriptive Analysis by Groups
Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats (plain text, HTML, LaTeX, PDF, Word or Excel). Create figures to quickly visualise the distribution of your data (boxplots, barplots, normality-plots, etc.). Display statistics (mean, median, frequencies, incidences, etc.). Perform the appropriate tests (t-test, analysis of variance, Kruskal-Wallis, Fisher, log-rank, …) depending on the nature of the described variable (normal, non-normal or qualitative). Summarize genetic (Single Nucleotide Polymorphism) data, displaying allele frequencies and performing Hardy-Weinberg equilibrium tests among other typical statistics and tests for this kind of data.
comparer Compare Output and Run Time
Makes comparisons quickly for different functions or code blocks performing the same task with the function mbc(). Can be used to compare model fits to the same data or see which function runs faster.
compboost C++ Implementation of Component-Wise Boosting
Implementation of component-wise boosting written in C++ to obtain high runtime performance and full memory control. The main idea is to provide a modular class system which can be extended without editing the source code. Therefore, it is possible to use R functions as well as C++ functions for custom base-learners, losses, logging mechanisms or stopping criteria.
CompDist Multisection Composite Distributions
Computes the density function, cumulative distribution function, quantile function and random numbers for a multisection composite distribution specified by the user. Also fits the user-specified distribution to a given data set. More details of the package can be found in the following paper submitted to the R Journal: Wiegand, M. and Nadarajah, S. (2017) CompDist: Multisection composite distributions.
comperes Manage Competition Results
Tools for storing and managing competition results. Competition is understood as a set of games in which players gain some abstract scores. There are two ways of storing results: in long (one row per game-player) and wide (one row per game with a fixed number of players) formats. This package provides functions for creating them and converting between them. There are also functions for computing summaries and Head-to-Head values for players. They leverage the grammar of data manipulation from ‘dplyr’.
compete Analyzing Social Hierarchies
Organizing and Analyzing Social Dominance Hierarchy Data.
CompetingRisk The Semi-Parametric Cumulative Incidence Function
Computing the point estimator and pointwise confidence interval of the cumulative incidence function from the cause-specific hazards model.
Compind Composite indicators functions
The Compind package contains several functions to enhance approaches to Composite Indicators (http://…/detail.asp?ID=6278 , https://composite-indicators.jrc.ec.europa.eu ) methods, focusing in particular on the normalisation and weighting-aggregation steps.
compLasso Implements the Component Lasso Method Functions
Implements the Component Lasso method for linear regression using the sample covariance matrix connected-components structure, as described in ‘A Component Lasso’ by Hussami and Tibshirani (2013).
complexity Calculate the Proportion of Permutations in Line with an Informative Hypothesis
Allows for the easy computation of complexity: the proportion of the parameter space in line with the hypothesis by chance.
Compositional Compositional Data Analysis
A collection of R functions for compositional data analysis.
compositions Compositional Data Analysis
The package provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by Aitchison and Pawlowsky-Glahn.
CompR Paired Comparison Data Analysis
Different tools for describing and analysing paired comparison data are presented. The main methods are estimation of product scores according to the Bradley-Terry-Luce model. A segmentation of the individuals can be conducted on the basis of a mixture distribution approach. The number of classes can be tested by the use of Monte Carlo simulations. This package also deals with multi-criteria paired comparison data.
Conake Continuous Associated Kernel Estimation
Continuous smoothing of probability density function on a compact or semi-infinite support is performed using four continuous associated kernels: extended beta, gamma, lognormal and reciprocal inverse Gaussian. The cross-validation technique is also implemented for bandwidth selection.
concatenate Human-Friendly Text from Unknown Strings
Simple functions for joining strings. Construct human-friendly messages whose elements aren’t known in advance, like in stop, warning, or message, from clean code.
concaveman A Very Fast 2D Concave Hull Algorithm
The concaveman function ports the ‘concaveman’ (<https://…/concaveman> ) library from ‘mapbox’. It computes the concave polygon(s) for one or several sets of points.
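A minimal sketch on a random point matrix:
    library(concaveman)
    pts <- cbind(runif(100), runif(100))
    hull <- concaveman(pts, concavity = 2)  # vertices of the concave polygon
    plot(pts); lines(hull)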
conclust Pairwise Constraints Clustering
There are 3 main functions in this package: ckmeans(), lcvqe() and mpckm(). They take an unlabeled dataset and two lists of must-link and cannot-link constraints as input and produce a clustering as output.
concordance Product Concordance
A set of utilities for matching products in different classification codes used in international trade research. It supports concordance between HS (Combined), ISIC Rev. 2,3, and SITC1,2,3,4 product classification codes, as well as BEC, NAICS, and SIC classifications. It also provides code nomenclature / descriptions look-up, Rauch classification look-up (via concordance to SITC2) and trade elasticity look-up (via concordance to SITC2/3 or HS3.ss).
condformat Conditional Formatting in Data Frames
Apply and visualize conditional formatting to data frames in R. It presents a data frame as an HTML table with cells CSS-formatted according to criteria defined by rules, using a syntax similar to ‘ggplot2’. The table is printed either by opening a web browser or within the ‘RStudio’ viewer if available. The conditional formatting rules allow highlighting cells matching a condition or adding a gradient background to a given column based on its values.
CondIndTests Nonlinear Conditional Independence Tests
Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011, <arXiv:1202.3775>), Residual Prediction test (based on Shah and Buehlmann, <arXiv:1511.03334>), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al., <arXiv:1706.08576>).
condir Computation of P Values and Bayes Factors for Conditioning Data
Set of functions for the easy analyses of conditioning data.
conditions Standardized Conditions for R
Implements specialized conditions, i.e., typed errors, warnings and messages. Offers a set of standardized conditions (value error, deprecated warning, io message, …) in the fashion of Python’s built-in exceptions.
conditionz Control How Many Times Conditions are Thrown
Provides ability to control how many times in function calls conditions are thrown (shown to the user). Includes control of warnings and messages.
condSURV Estimation of the Conditional Survival Function for Ordered Multivariate Failure Time Data
Implements some newly developed methods for the estimation of the conditional survival function for ordered multivariate failure time data.
condusco Query-Driven Pipeline Execution and Query Templates
Runs a function iteratively over each row of either a dataframe or the results of a query. Use the ‘BigQuery’ and ‘DBI’ wrappers to iteratively pass each row of query results to a function. If a field contains a ‘JSON’ string, it will be converted to an object. This is helpful for queries that return ‘JSON’ strings that represent objects. These fields can then be treated as objects by the pipeline.
condvis Conditional Visualization for Statistical Models
Exploring fitted model structures by interactively taking 2-D and 3-D sections in data space.
conf Plotting Two-Dimensional Confidence Regions
Plots the two-dimensional confidence region for probability distribution (Weibull or inverse Gaussian) parameters corresponding to a user-given dataset and level of significance. The crplot() algorithm plots more points in areas of greater curvature to ensure a smooth appearance throughout the confidence region boundary. An alternative heuristic plots a specified number of points at roughly uniform intervals along its boundary. Both heuristics build upon the radial profile log-likelihood ratio technique for plotting two-dimensional confidence regions given by Jaeger (2016) <doi:10.1080/00031305.2016.1182946>.
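A hedged sketch of crplot(): the function name comes from the description above, while the argument names used here (dataset, alpha, distn) are assumptions about its interface:

  library(conf)

  x <- rweibull(50, shape = 2, scale = 100)  # hypothetical failure times
  crplot(dataset = x, alpha = 0.05, distn = "weibull")  # 95% confidence region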
ConfigParser Package to Parse an INI File, Including Variable Interpolation
Enhances the ‘ini’ package by adding the ability to interpolate variables. The INI configuration file is read into an R6 ConfigParser object (loosely inspired by Python’s ConfigParser module) and the keys can be read, where ‘%(….)s’ instances are interpolated by other included options or outside variables.
configr An Implementation of Parsing and Writing Configuration File (JSON/INI/YAML)
Implements YAML, JSON, and INI parsers for reading and writing configuration files in R. The functionality of this package is similar to that of the ‘config’ package.
configural Multivariate Profile Analysis
R functions for criterion profile analysis, Davison and Davenport (2002) <doi:10.1037/1082-989X.7.4.468> and meta-analytic criterion profile analysis, Wiernik, Wilmot, Davison, and Ones (2019). Sensitivity analyses to aid in interpreting criterion profile analysis results are also included.
confinterpret Descriptive Interpretations of Confidence Intervals
Produces descriptive interpretations of confidence intervals. Includes (extensible) support for various test types, specified as sets of interpretations dependent on where the lower and upper confidence limits sit.
ConfIntVariance Confidence Interval for the Univariate Population Variance without Normality Assumption
Surrounds the usual sample variance of a univariate numeric sample with a confidence interval for the population variance. This has previously been done only under the assumption that the underlying distribution is normal. Under the hood, this package implements the unique least-variance unbiased estimator of the variance of the sample variance, in a formula that is equivalent to estimating the kurtosis and the square of the population variance in an unbiased way and combining them according to the classical formula into an estimator of the variance of the sample variance. Both the sample variance and the estimator of its variance are U-statistics. By the theory of U-statistics, the resulting estimator is unique. See Fuchs, Krautenbacher (2016) <doi:10.1080/15598608.2016.1158675> and the references therein for an overview of unbiased estimation of variances of U-statistics.
conformal Conformal Prediction for Regression and Classification
Implementation of conformal prediction using caret models for classification and regression.
ConfoundedMeta Sensitivity Analyses for Unmeasured Confounding in Meta-Analyses
Conducts sensitivity analyses for unmeasured confounding in random-effects meta-analysis per Mathur & VanderWeele (in preparation). Given output from a random-effects meta-analysis with a relative risk outcome, computes point estimates and inference for: (1) the proportion of studies with true causal effect sizes more extreme than a specified threshold of scientific significance; and (2) the minimum bias factor and confounding strength required to reduce to less than a specified threshold the proportion of studies with true effect sizes of scientifically significant size. Creates plots and tables for visualizing these metrics across a range of bias values.
confSAM Estimates and Bounds for the False Discovery Proportion, by Permutation
For multiple testing. Computes estimates and confidence bounds for the False Discovery Proportion (FDP), the fraction of false positives among all rejected hypotheses. The methods in the package use permutations of the data. Doing so, they take into account the dependence structure in the data.
Conigrave Flexible Tools for Multiple Imputation
Provides a set of tools that can be used across ‘data.frame’ and ‘imputationList’ objects.
connect3 A Tool for Reproducible Research by Converting ‘LaTeX’ Files Generated by R Sweave to Rich Text Format Files
Converts ‘LaTeX’ files (with extension ‘.tex’) generated by R Sweave using package ‘knitr’ to Rich Text Format files (with extension ‘.rtf’). Rich Text Format files can be read and written by most word processors.
conover.test Conover-Iman Test of Multiple Comparisons Using Rank Sums
Computes the Conover-Iman test (1979) for stochastic dominance and reports the results among multiple pairwise comparisons after a Kruskal-Wallis test for stochastic dominance among k groups (Kruskal and Wallis, 1952). The interpretation of stochastic dominance requires an assumption that the CDF of one group does not cross the CDF of the other. conover.test makes k(k-1)/2 multiple pairwise comparisons based on the Conover-Iman t-test statistic of the rank differences. The null hypothesis for each pairwise comparison is that the probability of observing a randomly selected value from the first group that is larger than a randomly selected value from the second group equals one half; this null hypothesis corresponds to that of the Wilcoxon-Mann-Whitney rank-sum test. Like the rank-sum test, if the data can be assumed to be continuous, and the distributions are assumed identical except for a difference in location, the Conover-Iman test may be understood as a test for median difference. conover.test accounts for tied ranks. The Conover-Iman test is strictly valid if and only if the corresponding Kruskal-Wallis null hypothesis is rejected.
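A minimal example; the (x, g) interface mirrors kruskal.test(), and the p-value adjustment argument shown is an assumption:

  library(conover.test)

  x <- c(rnorm(20, 0), rnorm(20, 1), rnorm(20, 2))  # three shifted groups
  g <- rep(c("a", "b", "c"), each = 20)
  conover.test(x, g, method = "bonferroni")  # all pairwise comparisons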
ConSpline Partial Linear Least-Squares Regression using Constrained Splines
Given response y, continuous predictor x, and covariate matrix, the relationship between E(y) and x is estimated with a shape-constrained regression spline. Function outputs fits and various types of inference.
ConsRank Compute the Median Ranking(s) According to the Kemeny’s Axiomatic Approach
Computes the median ranking according to Kemeny’s axiomatic approach. Rankings may or may not contain ties, and can be either complete or incomplete.
constants Reference on Constants, Units and Uncertainty
CODATA internationally recommended values of the fundamental physical constants, provided as symbols for direct use within the R language. Optionally, the values with errors and/or the values with units are also provided if the ‘errors’ and/or the ‘units’ packages are installed. The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee of the International Council for Science which periodically provides the internationally accepted set of values of the fundamental physical constants. This package contains the ‘2014 CODATA’ version, published on 25 June 2015: Mohr, P. J., Newell, D. B. and Taylor, B. N. (2016) <DOI:10.1103/RevModPhys.88.035009>, <DOI:10.1063/1.4954402>.
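A short sketch, assuming the CODATA values are exposed through a syms list of symbols (the accessor and symbol names are assumptions):

  library(constants)

  syms$c0                    # speed of light in vacuum, m/s
  with(syms, h / (me * c0))  # Compton wavelength of the electron, in metres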
constellation Identify Event Sequences Using Time Series Joins
Examine any number of time series data frames to identify instances in which various criteria are met within specified time frames. In clinical medicine, these types of events are often called ‘constellations of signs and symptoms’, because a single condition depends on a series of events occurring within a certain amount of time of each other. This package was written to work with any number of time series data frames and is optimized for speed to work well with data frames with millions of rows.
ContaminatedMixt Model-Based Clustering and Classification with the Multivariate Contaminated Normal Distribution
Fits mixtures of multivariate contaminated normal distributions (with eigen-decomposed scale matrices) via the expectation conditional-maximization algorithm under a clustering or classification paradigm.
ContourFunctions Create Contour Plots from Data or a Function
Provides functions for making contour plots. The contour plot can be created from grid data, a function, or a data set. If non-grid data is given, then a Gaussian process is fit to the data and used to create the contour plot.
controlTest Median Comparison for Two-Sample Right-Censored Survival Data
Nonparametric two-sample procedure for comparing the median survival time.
ConvergenceClubs Finding Convergence Clubs
Functions for clustering regions that form convergence clubs, according to the definition of Phillips and Sul (2009) <doi:10.1002/jae.1080>.
convertGraph Convert Graphical Files Format
Converts graphical file formats (SVG, PNG, JPEG, BMP, GIF, PDF, etc.) to one another. The exceptions are the SVG format, which can only be converted to other formats, and the PDF format, which can only be created from other graphical formats. The main purpose of the package is to provide a solution for converting the SVG format to PNG, which is often needed for exporting graphical files produced by R widgets.
convertr Convert Between Units
Provides conversion functionality between a broad range of scientific, historical, and industrial unit types.
convexjlr Disciplined Convex Programming in R using Convex.jl
Provides a simple high-level wrapper for the Julia package ‘Convex.jl’ (see <https://…/Convex.jl> for more information), which makes it easy to describe and solve convex optimization problems in R. The problems that can be handled include linear programs, second-order cone programs, semidefinite programs, and exponential cone programs.
convey Income Concentration Analysis with Complex Survey Samples
Variance estimation on indicators of income concentration and poverty using linearized or replication-based survey designs. Wrapper around the survey package.
convoSPAT Convolution-Based Nonstationary Spatial Modeling
Fits convolution-based nonstationary Gaussian process models to point-referenced spatial data. The nonstationary covariance function allows the user to specify the underlying correlation structure and which spatial dependence parameters should be allowed to vary over space: the anisotropy, nugget variance, and process variance. The parameters are estimated via maximum likelihood, using a local likelihood approach. Also provided are functions to fit stationary spatial models for comparison, calculate the kriging predictor and standard errors, and create various plots to visualize nonstationarity.
coop Co-Operation: Fast Covariance, Correlation, and Cosine Similarity Operations
Fast implementations of the co-operations: covariance, correlation, and cosine similarity. The implementations are fast and memory-efficient and their use is resolved automatically based on the input data, handled by R’s S3 methods. Full descriptions of the algorithms and benchmarks are available in the package vignettes.
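The co-operations as drop-in replacements for their base R counterparts; the function names below (cosine, covar, pcor) are assumed from the package documentation:

  library(coop)

  x <- matrix(rnorm(200), nrow = 20)
  cosine(x)  # column-wise cosine similarity
  covar(x)   # fast covariance; compare cov(x)
  pcor(x)    # fast Pearson correlation; compare cor(x)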
CoopGame Important Concepts of Cooperative Game Theory
The theory of cooperative games with transferable utility offers useful insights into the way parties can share gains from cooperation and secure sustainable agreements, see e.g. one of the books by Chakravarty, Mitra and Sarkar (2015, ISBN:978-1107058798) or by Driessen (1988, ISBN:978-9027727299) for more details. A comprehensive set of tools for cooperative game theory with transferable utility is provided. Users can create special families of cooperative games, like e.g. bankruptcy games, cost sharing games and weighted voting games. There are functions to check various game properties and to compute five different set-valued solution concepts for cooperative games. A large number of point-valued solution concepts is available reflecting the diverse application areas of cooperative game theory. Some of these point-valued solution concepts can be used to analyze weighted voting games and measure the influence of individual voters within a voting body. There are routines for visualizing both set-valued and point-valued solutions in the case of three or four players.
coopProductGame Cooperative Aspects of Linear Production Programming Problems
Computes cooperative game and allocation rules associated with linear production programming problems.
copCAR Fitting the copCAR Regression Model for Discrete Areal Data
Provides tools for fitting the copCAR regression model for discrete areal data. Three types of estimation are supported: continuous extension, composite marginal likelihood, and distributional transform.
coprimary Sample Size Calculation for Two Primary Time-to-Event Endpoints in Clinical Trials
Computes the required number of patients for two time-to-event endpoints as co-primary endpoints in a phase III clinical trial.
coRanking Co-Ranking Matrix
Calculates the co-ranking matrix to assess the quality of a dimensionality reduction.
Corbi Collection of Rudimentary Bioinformatics Tools
Provides a bundle of basic and fundamental bioinformatics tools, such as network querying and alignment.
cord Community Estimation in G-Models via CORD
Partition data points (variables) into communities/clusters, similar to clustering algorithms, such as k-means and hierarchical clustering. This package implements a clustering algorithm based on a new metric CORD, defined for high dimensional parametric or semi-parametric distributions. Read http://…/1508.01939 for more details.
cordillera Calculation of the OPTICS Cordillera
Functions for calculating the OPTICS Cordillera. The OPTICS Cordillera measures the amount of ‘clusteredness’ in a numeric data matrix within a distance-density based framework for a given minimum number of points comprising a cluster, as described in Rusch, Hornik, Mair (2017) <doi:10.1080/10618600.2017.1349664>. There is an R native version and a version that uses ‘ELKI’, with methods for printing, summarizing, and plotting the result. There also is an interface to the reference implementation of OPTICS in ‘ELKI’.
CORE Cores of Recurrent Events
Given a collection of intervals with integer start and end positions, finds recurrently targeted regions and estimates the significance of the findings. Randomization is implemented by parallel methods, either using local host machines or submitting grid engine jobs.
corehunter Fast and Flexible Core Subset Selection
Interface to the Core Hunter software for core subset selection. Cores can be constructed based on genetic marker data, phenotypic traits, a precomputed distance matrix, or any combination of these. Various measures are included such as Modified Rogers’ distance and Shannon’s diversity index (for genotypes) and Gower’s distance (for phenotypes). Core Hunter can also optimize a weighted combination of multiple measures, to bring the different perspectives closer together.
CORElearn Classification, Regression and Feature Evaluation
This is a suite of machine learning algorithms written in C++ with an R interface. It contains several machine learning model learning techniques in classification and regression, for example classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. It is especially strong in feature evaluation, where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used, for example, to discretize numeric attributes. An additional strength is the OrdEval algorithm and its visualization, used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.
coreSim Core Functionality for Simulating Quantities of Interest from Generalised Linear Models
Core functions for simulating quantities of interest from generalised linear models (GLM). This package will form the backbone of a series of other packages that improve the interpretation of GLM estimates.
corkscrew Preprocessor for Data Modeling
Includes tools for binning categorical variables into a smaller number of categories based on t-tests, converting categorical variables into continuous features using the mean of the response variable for the respective categories, and understanding the relationship between the response variable and predictor variables through data transformations.
corlink Record Linkage, Incorporating Imputation for Missing Agreement Patterns, and Modeling Correlation Patterns Between Fields
A matrix of agreement patterns and counts for record pairs is the input for the procedure. An EM algorithm is used to impute plausible values for missing record pairs. A second EM algorithm, incorporating possible correlations between per-field agreement, is used to estimate posterior probabilities that each pair is a true match – i.e. constitutes the same individual.
CornerstoneR Collection for ‘CornerstoneR’ Interface
Collection of scripts for the interface between ‘Cornerstone’ and ‘R’. ‘Cornerstone’ (<https://…/> ), a software product for engineering analytics, supports an interface to ‘R’. The scripts are designed to support easy usage of this interface.
cornet Elastic Net with Dichotomised Outcomes
Implements lasso and ridge regression for dichotomised outcomes (Rauschenberger et al. 2019). Such outcomes are not naturally but artificially binary. They indicate whether an underlying measurement is greater than a threshold.
CorporaCoCo Corpora Co-Occurrence Comparison
A set of functions used to compare co-occurrence between two corpora.
CoRpower Power Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials
Calculates power for assessment of intermediate biomarker responses as correlates of risk in the active treatment group in clinical efficacy trials, as described in Gilbert, Janes, and Huang, Power/Sample Size Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials (2016, Statistics in Medicine). The methods differ from past approaches by accounting for the level of clinical treatment efficacy overall and in biomarker response subgroups, which enables the correlates of risk results to be interpreted in terms of potential correlates of efficacy/protection. The methods also account for inter-individual variability of the observed biomarker response that is not biologically relevant (e.g., due to technical measurement error of the laboratory assay used to measure the biomarker response), which is important because power to detect a specified correlate of risk effect size is heavily affected by the biomarker’s measurement error. The methods can be used for a general binary clinical endpoint model with a univariate dichotomous, trichotomous, or continuous biomarker response measured in active treatment recipients at a fixed timepoint after randomization, with either case-cohort Bernoulli sampling or case-control without-replacement sampling of the biomarker (a baseline biomarker is handled as a trivial special case). In a specified two-group trial design, the computeN() function can initially be used for calculating additional requisite design parameters pertaining to the target population of active treatment recipients observed to be at risk at the biomarker sampling timepoint. Subsequently, the power calculation employs an inverse probability weighted logistic regression model fitted by the tps() function in the ‘osDesign’ package. Power results as well as the relationship between the correlate of risk effect size and treatment efficacy can be visualized using various plotting functions.
corpus Text Corpus Analysis
Text corpus data analysis, with full support for UTF8-encoded Unicode text. The package provides the ability to seamlessly read and process text from large JSON files without holding all of the data in memory simultaneously.
corpustools Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
corr2D Implementation of 2D Correlation Analysis
Implementation of two-dimensional (2D) correlation analysis based on the Fourier-transformation approach described by Isao Noda (I. Noda (1993) <DOI:10.1366/0003702934067694>). Additionally there are two plot functions for the resulting correlation matrix: The first one creates coloured 2D plots, while the second one generates 3D plots.
correctedAUC Correcting AUC for Measurement Error
Corrects the area under the ROC curve (AUC) for measurement error based on a probit-shift model.
CorrectedFDR Correcting False Discovery Rates
There are many estimators of the false discovery rate. This package computes the Nonlocal False Discovery Rate (NFDR) and three estimators of the local false discovery rate: the Corrected False Discovery Rate (CFDR), the Re-ranked False Discovery Rate (RFDR), and the blended estimator. Bickel, D. R. (2016) <http://…/34277>.
corregp Functions and Methods for Correspondence Regression
A collection of tools for correspondence regression, i.e. the correspondence analysis of the crosstabulation of a categorical variable Y in function of another one X, where X can in turn be made up of the combination of various categorical variables. Consequently, correspondence regression can be used to analyze the effects for a polytomous or multinomial outcome variable.
corrr Correlations in R
A tool for exploring correlations. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualising the matrix in terms of the strength of the correlations.
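A sketch of the routine tasks mentioned above, using the correlate()/focus()/shave() verbs from the package:

  library(corrr)

  cm <- correlate(mtcars)  # correlation data frame; the diagonal is set to NA
  focus(cm, mpg)           # correlations of all other variables against mpg
  shave(cm)                # keep the lower triangle, blank out the rest
  rearrange(cm)            # reorder variables by correlation strength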
CorrToolBox Modeling Correlational Magnitude Transformations in Discretization Contexts
Modeling the correlation transitions under specified distributional assumptions within the realm of discretization in the context of the latency and threshold concepts.
corset Arbitrary Bounding of Series and Time Series Objects
Set of methods to constrain numerical series and time series within arbitrary boundaries.
CorShrink Adaptive Shrinkage of Correlation Vectors and Matrices
Performs adaptive shrinkage of correlation and covariance matrices using a mixture model prior over the Fisher z-transformation of the correlations, Stephens (2016) <doi:10.1093/biostatistics/kxw041> with the method flexible in choosing a separate shrinkage intensity for each cell of the correlation or covariance matrices: it is particularly efficient in handling missing data in the data matrix.
cosa Constrained Optimal Sample Allocation
Implements generalized constrained optimal sample allocation framework for multilevel regression discontinuity studies and multilevel randomized trials with continuous outcomes. Bulus, M. (2017). Design Considerations in Three-level Regression Discontinuity Studies (Doctoral dissertation). University of Missouri, Columbia, MO.
cosinor2 Extended Tools for Cosinor Analysis of Rhythms
Statistical procedures for calculating population-mean cosinor, non-stationary cosinor, estimation of best-fitting period, tests of population rhythm differences and more. See Cornélissen, G. (2014). <doi:10.1186/1742-4682-11-16>.
CoSMoS Complete Stochastic Modelling Solution
A single framework, unifying, extending, and improving a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific ‘parent’ Gaussian process; see Papalexiou (2018) <doi:10.1016/j.advwatres.2018.02.013>.
costsensitive Cost-Sensitive Multi-Class Classification
Reduction-based techniques for cost-sensitive multi-class classification, in which each observation has a different cost for classifying it into one class, and the goal is to predict the class with the minimum expected cost for each new observation. Implements Weighted All-Pairs (Beygelzimer, A., Langford, J., & Zadrozny, B., 2008, <doi:10.1007/978-0-387-79361-0_1>), Weighted One-Vs-Rest (Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B., 2005, <https://…/citation.cfm?id=1102358> ) and Regression One-Vs-Rest. Works with arbitrary classifiers taking observation weights, or with regressors. Also implements cost-proportionate rejection sampling for working with classifiers that don’t accept observation weights.
CosW The CosW Distribution
Density, distribution function, quantile function, random generation and survival function for the Cosine Weibull Distribution as defined by SOUZA, L. New Trigonometric Class of Probabilistic Distributions. 219 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2015 (available at <http://…obabilistic-distributions-602633.html> ) and BRITO, C. C. R. Method Distributions generator and Probability Distributions Classes. 241 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2014 (available upon request).
Counterfactual Estimation and Inference Methods for Counterfactual Analysis
Implements the estimation and inference methods for counterfactual analysis described in Chernozhukov, Fernandez-Val and Melly (2013) <DOI:10.3982/ECTA10582> ‘Inference on Counterfactual Distributions,’ Econometrica, 81(6). The counterfactual distributions considered are the result of changing either the marginal distribution of covariates related to the outcome variable of interest, or the conditional distribution of the outcome given the covariates. They can be applied to estimate quantile treatment effects and wage decompositions.
countHMM Penalized Estimation of Flexible Hidden Markov Models for Time Series of Counts
Provides tools for penalized estimation of flexible hidden Markov models for time series of counts without the need to specify a (parametric) family of distributions. These include functions for model fitting, model checking, and state decoding. For details, see Adam, T., Langrock, R., and Wei, C.H. (2019): Penalized Estimation of Flexible Hidden Markov Models for Time Series of Counts. <arXiv:1901.03275>.
Countr Flexible Univariate and Bivariate Count Process Probability
Flexible univariate and bivariate count models based on the Weibull distribution. The models may include covariates and can be specified with familiar formula syntax.
COUSCOus A Residue-Residue Contact Detecting Method
Contact prediction using shrinked covariance (COUSCOus). COUSCOus is a residue-residue contact detecting method approaching the contact inference using the glassofast implementation of Matyas and Sustik (2012, The University of Texas at Austin UTCS Technical Report 2012:1-3. TR-12-29.) that solves the L_1 regularised Gaussian maximum likelihood estimation of the inverse of a covariance matrix. Prior to the inverse covariance matrix estimation we utilise a covariance matrix shrinkage approach, the empirical Bayes covariance estimator, which has been shown by Haff (1980) <DOI:10.1214/aos/1176345010> to be the best estimator in a Bayesian framework, especially dominating estimators of the form aS, such as the smoothed covariance estimator applied in a related contact inference technique PSICOV.
covafillr Local Polynomial Regression of State Dependent Covariates in State-Space Models
Facilitates local polynomial regression for state dependent covariates in state-space models. The functionality can also be used from ‘C++’ based model builder tools such as ‘Rcpp’/’inline’, ‘TMB’, or ‘JAGS’.
covatest Tests on Properties of Space-Time Covariance Functions
Tests on properties of space-time covariance functions. Tests of symmetry, of separability, and for assessing different forms of non-separability are available. Moreover, tests for some classes of covariance functions, such as product-sum models, Gneiting models and integrated product models, are provided.
covequal Test for Equality of Covariance Matrices
Computes p-values using the largest root test using an approximation to the null distribution by Johnstone (2008) <DOI:10.1214/08-AOS605>.
COveR Clustering with Overlaps
Provides functions for overlapping clustering, fuzzy clustering and interval-valued data manipulation. The package implements the following algorithms: OKM (Overlapping K-means) from Cleuziou, G. (2007) <doi:10.1109/icpr.2008.4761079>; NEOKM (Non-exhaustive overlapping K-means) from Whang, J. J., Dhillon, I. S., and Gleich, D. F. (2015) <doi:10.1137/1.9781611974010.105>; Fuzzy C-means from Bezdek, J. C. (1981) <doi:10.1007/978-1-4757-0450-1>; Fuzzy I-Cmeans from de A.T. De Carvalho, F. (2005) <doi:10.1016/j.patrec.2006.08.014>.
covmat Covariance Matrix Estimation
We implement a collection of techniques for estimating covariance matrices. Covariance matrices can be built using missing data. Stambaugh Estimation and FMMC methods can be used to construct such matrices. Covariance matrices can be built by denoising or shrinking the eigenvalues of a sample covariance matrix. Such techniques work by exploiting the tools in Random Matrix Theory to analyse the distribution of eigenvalues. Covariance matrices can also be built assuming that data has many underlying regimes. Each regime is allowed to follow a Dynamic Conditional Correlation model. Robust covariance matrices can be constructed by multivariate cleaning and smoothing of noisy data.
covr Test Coverage for Packages
Track and report code coverage for your package and (optionally) upload the results to a coverage service like Codecov (http://codecov.io ) or Coveralls (http://coveralls.io ). Code coverage is a measure of the amount of code being exercised by the tests. It is an indirect measure of test quality. This package is compatible with any testing methodology or framework and tracks coverage of both R code and compiled C/C++/Fortran code.
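Typical usage via the package_coverage() entry point:

  library(covr)

  cov <- package_coverage("path/to/your/package")  # run tests, track coverage
  percent_coverage(cov)                            # overall coverage percentage
  # codecov(coverage = cov)                        # optionally upload to Codecov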
CovSelHigh Model-Free Covariate Selection in High Dimensions
Model-free selection of covariates in high dimensions under unconfoundedness for situations where the parameter of interest is an average causal effect. This package is based on model-free backward elimination algorithms proposed in de Luna, Waernbaum and Richardson (2011) <DOI:10.1093/biomet/asr041> and VanderWeele and Shpitser (2011) <DOI:10.1111/j.1541-0420.2011.01619.x>. Confounder selection can be performed via either Markov/Bayesian networks, random forests or LASSO.
covTestR Covariance Matrix Tests
Testing functions for Covariance Matrices. These tests include high-dimension homogeneity of covariance matrix testing described by Schott (2007) <doi:10.1016/j.csda.2007.03.004> and high-dimensional one-sample tests of covariance matrix structure described by Fisher, et al. (2010) <doi:10.1016/j.jmva.2010.07.004>. Covariance matrix tests use C++ to speed performance and allow larger data sets.
CovTools Statistical Tools for Covariance Analysis
Covariance is of universal prevalence across various disciplines within statistics. We provide a rich collection of geometric and inferential tools for convenient analysis of covariance structures, topics including distance measures, mean covariance estimator, covariance hypothesis test for one-sample and two-sample cases, and covariance estimation. For an introduction to covariance in multivariate statistical analysis, see Schervish (1987) <doi:10.1214/ss/1177013111>.
cowbell Performs Segmented Linear Regression on Two Independent Variables
Implements a specific form of segmented linear regression with two independent variables. The visualization of that function looks like a quarter segment of a cowbell, giving the package its name. The package has been specifically constructed for the case where the minimum and maximum values of the dependent and the two independent variables are known a priori, which is usually the case when those values are derived from Likert scales.
cowplot Streamlined Plot Theme and Plot Annotations for ‘ggplot2’
Some helpful extensions and modifications to the ‘ggplot2’ library. In particular, this package makes it easy to combine multiple ‘ggplot2’ plots into one and label them with letters, e.g. A, B, C, etc., as is often required for scientific publications. The package also provides a streamlined and clean theme that is used in the Wilke lab, hence the package name, which stands for Claus O. Wilke’s plot library.
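A small sketch of the combine-and-label workflow with plot_grid():

  library(ggplot2)
  library(cowplot)

  p1 <- ggplot(mtcars, aes(disp, mpg)) + geom_point()
  p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
  plot_grid(p1, p2, labels = c("A", "B"))  # two panels labelled A and B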
coxed Duration-Based Quantities of Interest for the Cox Proportional Hazards Model
Functions for generating, simulating, and visualizing expected durations and marginal changes in duration from the Cox proportional hazards model.
Coxnet Regularized Cox Model
Cox model regularized with net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalty. In addition, it efficiently solves an approximate L0 variable selection based on truncated likelihood function. Moreover, it can also handle the adaptive version of these regularization forms, such as adaptive lasso and net adjusting for signs of linked coefficients. The package uses one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of coefficients.
coxphMIC Sparse Estimation Method for Cox Proportional Hazards Models
Implements the sparse estimation method for Cox proportional hazards models via approximated information criterion (Su et al., 2016, Biometrics). The developed methodology is named MIC, which stands for ‘Minimizing approximated Information Criteria’. A reparameterization step is introduced to enforce sparsity while at the same time keeping the objective function smooth. As a result, MIC is computationally fast with a superior performance in sparse estimation.
CoxPlus Cox Regression (Proportional Hazards Model) with Multiple Causes and Mixed Effects
A high performance package for estimating the proportional hazards model when an event can have more than one cause, including support for random and fixed effects, tied events, and time-varying variables.
coxrt Cox Proportional Hazards Regression for Right Truncated Data
Fits Cox regression based on retrospectively ascertained times-to-event. The method uses Inverse-Probability-Weighting estimating equations.
CP Conditional Power Calculations
Functions for calculating the conditional power for different models in survival time analysis within randomized clinical trials with two different treatments to be compared and survival as an endpoint.
cplots Plots for Circular Data
Provides functions to produce some circular plots for circular data, in a height- or area-proportional manner. They include barplots, smooth density plots, stacked dot plots, histograms, multi-class stacked smooth density plots, and multi-class stacked histograms. The new methodology for general area-proportional circular visualization is described in an article submitted (after revision) to Journal of Computational and Graphical Statistics.
cpm Sequential and Batch Change Detection Using Parametric and Nonparametric Methods
Sequential and batch change detection for univariate data streams, using the change point model framework. Functions are provided to allow the parametric monitoring of sequences of Gaussian, Bernoulli and Exponential random variables, along with functions implementing more general nonparametric methods for monitoring sequences which have an unspecified or unknown distribution.
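A hedged sketch of batch change detection; detectChangePoint() and the cpmType argument follow the package interface, though the result field names are assumptions:

  library(cpm)

  x <- c(rnorm(100, mean = 0), rnorm(100, mean = 2))  # mean shift at t = 100
  res <- detectChangePoint(x, cpmType = "Mann-Whitney")
  res$changeDetected  # TRUE if a change was flagged
  res$detectionTime   # observation at which the detector fired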
CPP Composition of Probabilistic Preferences (CPP)
CPP is a multiple criteria decision method for evaluating alternatives in complex decision-making problems using a probabilistic approach. The CPP was created and expanded by Sant’Anna, Annibal P. (2015) <doi:10.1007/978-3-319-11277-0>.
cpr Control Polygon Reduction
Implementation of the Control Polygon Reduction and Control Net Reduction methods for finding parsimonious B-spline regression models.
CPsurv Nonparametric Change Point Estimation for Survival Data
Nonparametric change point estimation for survival data based on p-values of exact binomial tests.
cpt Classification Permutation Test
Non-parametric test for equality of multivariate distributions. Trains a classifier to classify (multivariate) observations as coming from one of two distributions. If the classifier is able to classify the observations better than would be expected by chance (using permutation inference), then the null hypothesis that the two distributions are equal is rejected.
cptcity ‘cpt-city’ Colour Gradients
Incorporates colour gradients from the ‘cpt-city’ web archive available at <http://…/>.
cpumemlog Monitor CPU and RAM usage of a process (and its children)
cpumemlog.sh is a Bash shell script that monitors CPU and RAM usage of a given process and its children. The main aim for writing this script was to gain insight into the behaviour of a process and to spot bottlenecks without GUI tools; for example, it is very useful for spotting that a computationally intensive process on a remote server died due to hitting the RAM limit or something of that sort. The statistics about CPU, RAM, and so on are gathered from the system utility ps. While the utility top can be used for this interactively, it is tedious to stare at its dynamic output, and quite hard to spot consumption at the peak and follow the trends; another similar utility, time, only gives consumption of resources at the peak. cpumemlogplot.R is a companion R script to cpumemlog.sh used to summarize and plot the gathered data.
cqrReg Quantile, Composite Quantile Regression and Regularized Versions
Estimates quantile regression (QR) and composite quantile regression (CQR), with and without an adaptive lasso penalty, using interior point (IP), majorize-minimize (MM), coordinate descent (CD), and alternating direction method of multipliers (ADMM) algorithms.
cquad Conditional Maximum Likelihood for Quadratic Exponential Models for Binary Panel Data
Estimation, based on conditional maximum likelihood, of the quadratic exponential model proposed by Bartolucci, F. & Nigro, V. (2010, Econometrica) and of a simplified and a modified version of this model. The quadratic exponential model is suitable for the analysis of binary longitudinal data when state dependence (further to the effect of the covariates and a time-fixed individual intercept) has to be taken into account. Therefore, this is an alternative to the dynamic logit model, having the advantage of easily allowing conditional inference in order to eliminate the individual intercepts and then obtain consistent estimates of the parameters of main interest (for the covariates and the lagged response). The simplified version of this model does not distinguish, as the original model does, between the last time occasion and the previous occasions. The modified version formulates the interaction terms in a different way and may be used to easily test for state dependence, as shown in Bartolucci, F., Nigro, V. & Pigini, C. (2013, Econometric Reviews). The package also includes estimation of the dynamic logit model by a pseudo conditional estimator based on the quadratic exponential model, as proposed by Bartolucci, F. & Nigro, V. (2012, Journal of Econometrics).
cr17 Testing Differences Between Competing Risks Models and Their Visualisations
Tool for analyzing competing risks models. The main point of interest is testing differences between groups (as described in R.J Gray (1988) <doi:10.1214/aos/1176350951> and J.P. Fine, R.J Gray (1999) <doi:10.2307/2670170>) and visualizations of survival and cumulative incidence curves.
CramTest Univariate Cramer Test on Two Samples of Data
Performs the univariate two-sample Cramer test to identify differences between two groups. This package provides a faster method for calculating the p-value. For further information, refer to ‘Properties, Advantages and a Faster p-value Calculation of the Cramer test’ by Telford et al. (submitted for review).
crandatapkgs Find Data-Only Packages on CRAN
Provides a data.frame listing of known data-only and data-heavy packages available on CRAN.
crandb Access to the CRAN Database API
The CRAN database provides an API for programmatically accessing all meta-data of CRAN R packages. This API can be used for various purposes; here are three examples I am working on right now:
• Writing a package manager for R. The package manager can use the CRAN DB API to query dependencies, or other meta data.
• Building a search engine for CRAN packages. The DB itself does not provide a search API, but it can be (easily) mirrored in a search engine.
• Creating an RSS feed for the new, updated or archived packages on CRAN.
cranlike Tools for ‘CRAN’-Like Repositories
A set of functions to manage ‘CRAN’-like repositories efficiently.
cranlogs Download Logs from the RStudio CRAN Mirror
API to the database of CRAN package downloads from the RStudio CRAN mirror. The database itself is at http://cranlogs.r-pkg.org , see https://…/cranlogs.app for the raw API.
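For example, the client’s cran_downloads() function queries that database:

  library(cranlogs)

  cran_downloads("ggplot2", from = "2019-01-01", to = "2019-01-07")  # daily counts
  cran_downloads(c("dplyr", "data.table"), when = "last-week")       # shorthand period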
cranly Package Directives and Collaboration Networks in CRAN
Provides core visualisations and summaries for the CRAN package database. The package provides comprehensive methods for cleaning up and organising the information in the CRAN package database, for building package directives networks (depends, imports, suggests, enhances) and collaboration networks, and for computing summaries and producing interactive visualisations from the resulting networks. Network visualisation is through the ‘visNetwork’ <https://…/package=visNetwork> R package. The package also provides functions to coerce the networks to ‘igraph’ <https://…/package=igraph> objects for further analyses and modelling.
CRANsearcher RStudio Addin for Searching Packages in CRAN Database Based on Keywords
One of the strengths of R is its vast package ecosystem. Indeed, R packages extend from visualization to Bayesian inference and from spatial analyses to pharmacokinetics (<https://…/> ). There is probably not an area of quantitative research that isn’t represented by at least one R package. At the time of this writing, there are more than 10,000 active CRAN packages. Because of this massive ecosystem, it is important to have tools to search and learn about packages related to your personal R needs. For this reason, we developed an RStudio addin capable of searching available CRAN packages directly within RStudio.
crblocks Categorical Randomized Block Data Analysis
Implements a statistical test for comparing bar plots or histograms of categorical data derived from a randomized block repeated measures layout.
credentials Tools for Managing SSH and Git Credentials
Setup and retrieve HTTPS and SSH credentials for use with ‘git’ and other services. For HTTPS remotes the package interfaces the ‘git-credential’ utility which ‘git’ uses to store HTTP usernames and passwords. For SSH remotes we provide convenient functions to find or generate appropriate SSH keys. The package both helps the user to setup a local git installation, and also provides a back-end for git/ssh client libraries to authenticate with existing user credentials.
creditmodel Build Binary Classification Models in One Integrated Offering
Provides a toolkit for building predictive models in one integrated offering. Contains infrastructure functionalities such as data exploration and preparation, missing values treatment, outliers treatment, variable derivation, variable selection, dimensionality reduction, grid search for hyperparameters, data mining and visualization, model evaluation, strategy analysis etc. ‘creditmodel’ is designed to make the development of binary classification models (machine learning based models as well as credit scorecard) simpler and faster.
CreditRisk Evaluation of Credit Risk with Structural and Reduced Form Models
Evaluation of default probability of sovereign and corporate entities based on structural or intensity based models and calibration on market Credit Default Swap quotes. Damiano Brigo, Massimo Morini, Andrea Pallavicini (2013): ‘Counterparty Credit Risk, Collateral and Funding. With Pricing Cases for All Asset Classes’.
credsubs Credible Subsets
Functions for constructing simultaneous credible bands and identifying subsets via the ‘credible subsets’ (also called ‘credible subgroups’) method.
crfsuite Conditional Random Fields for Labelling Sequential Data in Natural Language Processing
Wraps the ‘CRFsuite’ library <https://…/crfsuite> allowing users to fit a Conditional Random Field model and to apply it on existing data. The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition or classification of any category you have in mind. Next to training, a small web application is included in the package to allow you to easily construct training data.
crisp Fits a Model that Partitions the Covariate Space into Blocks in a Data- Adaptive Way
Implements convex regression with interpretable sharp partitions (CRISP), which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://…/15-344.pdf>.
crminer Fetch ‘Scholarly’ Full Text from ‘Crossref’
Text mining client for ‘Crossref’ (<https://crossref.org> ). Includes functions for getting links to the full text of articles, fetching full text articles from those links or Digital Object Identifiers (‘DOIs’), and text extraction from ‘PDFs’.
crmPack Object-Oriented Implementation of CRM Designs
Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to setup a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules.
crochet Implementation Helper for [ and [<- Of Custom Matrix-Like Types
Functions to help implement the extraction / subsetting / indexing function [ and replacement function [<- of custom matrix-like types (based on S3, S4, etc.), modeled as closely to the base matrix class as possible (with tests to prove it).
cromwellDashboard A Dashboard to Visualize Scientific Workflows in ‘Cromwell’
A dashboard that supports the usage of ‘cromwell’. ‘Cromwell’ is a scientific workflow engine for command line users. This package utilizes ‘cromwell’ REST APIs and provides these convenient functions: timing diagrams for running workflows, ‘cromwell’ engine status, and a tabular workflow list. For more information about ‘cromwell’, visit <http://cromwell.readthedocs.io>.
cronR Schedule R Scripts and Processes with the ‘cron’ Job Scheduler
Create, edit, and remove ‘cron’ jobs on your unix-alike system. The package provides a set of easy-to-use wrappers to ‘crontab’. It also provides an RStudio add-in to easily launch and schedule your scripts.
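A sketch of the wrapper workflow (the script path and job id are hypothetical):

  library(cronR)

  cmd <- cron_rscript("/home/user/report.R")  # build the Rscript command line
  cron_add(cmd, frequency = "daily", at = "03:00", id = "nightly-report")
  cron_ls()                    # list the scheduled jobs
  # cron_rm("nightly-report")  # remove the job again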
crop Graphics Cropping Tool
A device closing function which is able to crop graphics (e.g., PDF, PNG files) on Unix-like operating systems with the required underlying command-line tools installed.
CrossClustering A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters and Identification of Outliers
Implements a partial clustering algorithm that combines Ward’s minimum variance and Complete Linkage algorithms, providing automatic estimation of a suitable number of clusters and identification of outlier elements.
crossdes Construction of Crossover Designs
Contains functions for the construction of carryover-balanced crossover designs. In addition, it contains functions to check given designs for balance.
Crossover Analysis and Search of Crossover Designs
Package Crossover provides different crossover designs from combinatorial or search algorithms as well as from the literature, and a GUI to access them.
crossrun Joint Distribution of Number of Crossings and Longest Run
Joint distribution of the number of crossings and the longest run in a series of independent Bernoulli trials. The computation uses an iterative procedure in which results are based on those for shorter series. The procedure conditions on the start value and partitions by further conditioning on the position of the first crossing (or none).
crosstalk Inter-Widget Interactivity for HTML Widgets
Provides building blocks for allowing HTML widgets to communicate with each other, with Shiny or without (i.e. static .html files). Currently supports linked brushing and filtering.
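A minimal sketch of linked brushing and filtering; SharedData$new(), bscols() and filter_slider() are the package’s building blocks, and ‘plotly’ stands in for any crosstalk-aware widget:

  library(crosstalk)
  library(plotly)  # assumed installed; any compatible HTML widget works

  shared <- SharedData$new(mtcars)  # wrap the data once, share it across widgets
  bscols(
    plot_ly(shared, x = ~disp, y = ~mpg),           # widget 1: scatter plot
    filter_slider("hp", "Horsepower", shared, ~hp)  # widget 2: linked filter
  )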
CrossValidate Classes and Methods for Cross Validation of ‘Class Prediction’ Algorithms
Defines classes and methods to cross-validate various binary classification algorithms used for ‘class prediction’ problems.
crosswalkr Rename and Encode Data Frames Using External Crosswalk Files
A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in ‘Stata’.
crossword.r Generating Crosswords from Word Lists
Generate crosswords from a list of words.
crov Constrained Regression Model for an Ordinal Response and Ordinal Predictors
Fits a constrained regression model for an ordinal response with ordinal predictors and possibly others, Espinosa and Hennig (2018) <arXiv:1804.08715>. The parameter estimates associated with an ordinal predictor are constrained to be monotonic. If a monotonicity direction (isotonic or antitonic) is not specified for an ordinal predictor by the user, then the monotonicity direction classification procedure establishes it. A monotonicity test is also available to test the null hypothesis of monotonicity over a set of parameters associated with an ordinal predictor.
crqanlp Cross-Recurrence Quantification Analysis for Dynamic Natural Language Processing
Cross-recurrence quantification analysis for word series, from text, known as categorical recurrence analysis. Uses the ‘crqa’ R package by Coco and Dale (2014) <doi:10.3389/fpsyg.2014.00510>. Functions are wrappers to facilitate exploration of the sequential properties of text.
crrp Penalized Variable Selection in Competing Risks Regression
In competing risks regression, the proportional subdistribution hazards (PSH) model is popular for its direct assessment of covariate effects on the cumulative incidence function. This package allows for penalized variable selection for the PSH model. Penalties include LASSO, SCAD, MCP, and their group versions.
crseEventStudy A Robust and Powerful Test of Abnormal Stock Returns in Long-Horizon Event Studies
Based on Dutta et al. (2018) <doi:10.1016/j.jempfin.2018.02.004>, this package provides their standardized test for abnormal returns in long-horizon event studies. The methods used improve on the major weaknesses of size, power, and robustness of long-run statistical tests described in Kothari/Warner (2007) <doi:10.1016/B978-0-444-53265-7.50015-9>. Abnormal returns are weighted by their statistical precision (i.e., standard deviation), resulting in abnormal standardized returns. This procedure efficiently captures the heteroskedasticity problem. Clustering techniques following Cameron et al. (2011) <doi:10.1198/jbes.2010.07136> are adopted for computing cross-sectional correlation robust standard errors. The statistical tests in this package therefore account for potential biases arising from returns’ cross-sectional correlation, autocorrelation, and volatility clustering without power loss.
crskdiag Diagnostics for Fine and Gray Model
Provides the implementation of analytical and graphical approaches for checking the assumptions of the Fine and Gray model.
crsnls Nonlinear Regression Parameters Estimation by ‘CRS4HC’ and ‘CRS4HCe’
Functions for nonlinear regression parameters estimation by algorithms based on Controlled Random Search algorithm. Both functions (crs4hc(), crs4hce()) adapt current search strategy by four heuristics competition. In addition, crs4hce() improves adaptability by adaptive stopping condition.
crtests Classification and Regression Tests
Provides wrapper functions for running classification and regression tests using different machine learning techniques, such as Random Forests and decision trees. The package provides standardized methods for preparing data to suit the algorithm’s needs, training a model, making predictions, and evaluating results. Also, some functions are provided to run multiple instances of a test.
CRTgeeDR Doubly Robust Inverse Probability Weighted Augmented GEE Estimator
Implements a semi-parametric GEE estimator accounting for missing data with Inverse-probability weighting (IPW) and for imbalance in covariates with augmentation (AUG). The estimator IPW-AUG-GEE is Doubly robust (DR).
crul HTTP Client
A simple HTTP client, with tools for making HTTP requests, and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby’s ‘faraday’ gem (<https://…/faraday> ). The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package ‘curl’, an interface to ‘libcurl’ (<https://…/libcurl> ).
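The R6-style client in action:

  library(crul)

  con <- HttpClient$new(url = "https://httpbin.org")  # create a client
  res <- con$get("get", query = list(foo = "bar"))    # perform a GET request
  res$status_code     # e.g. 200
  res$parse("UTF-8")  # response body as text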
crunch Crunch.io Data Tools
The Crunch.io service (http://crunch.io ) provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
crunchy Shiny Apps on Crunch
To facilitate building custom dashboards on the Crunch data platform <http://…/>, the ‘crunchy’ package provides tools for working with ‘shiny’. These tools include utilities to manage authentication and authorization automatically and custom stylesheets to help match the look and feel of the Crunch web application.
CRWRM Changing the Reference Group without Re-Running the Model
Re-calculates the coefficients and the standard deviations when changing the reference group, without re-running the model.
csabounds Bounds on Distributional Treatment Effect Parameters
The joint distribution of potential outcomes is not typically identified under standard identifying assumptions such as selection on observables or even when individuals are randomly assigned to being treated. This package contains methods for obtaining tight bounds on distributional treatment effect parameters when panel data is available and under a Copula Stability Assumption as in Callaway (2017) <https://ssrn.com/abstract=3028251>.
CsChange Testing for Change in C-Statistic
Calculates the confidence interval and p value for the change in C-statistic. The adjusted C-statistic is calculated using the formula ‘Somers’ Dxy rank correlation’/2 + 0.5. The confidence interval is calculated using the bootstrap method, and the p value is calculated using the Z test method. Please refer to the article by Peter Ganz et al. (2016) <doi:10.1001/jama.2016.5951>.
CSeqpat Frequent Contiguous Sequential Pattern Mining of Text
Mines contiguous sequential patterns in text.
CSFA Connectivity Scores with Factor Analysis
Applies factor analysis methodology to microarray data in order to derive connectivity scores between compounds. The package also contains an implementation of the connectivity score algorithm by Zhang and Gant (2008) <doi:10.1186/1471-2105-9-258>.
csn Closed Skew-Normal Distribution
Provides functions for computing the density and the log-likelihood function of closed-skew normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42.
csp Correlates of State Policy Data Set in R
Provides the Correlates of State Policy data set for easy use in R.
csrplus Methods to Test Hypotheses on the Distribution of Spatial Point Processes
Includes two functions to evaluate the hypothesis of complete spatial randomness (csr) in point processes. The function ‘mwin’ calculates quadrat counts to estimate the intensity of a spatial point process through the moving window approach proposed by Bailey and Gatrell (1995). Event counts are computed within a window of a set size over a fine lattice of points within the region of observation. The function ‘pielou’ uses the nearest neighbor test statistic and asymptotic distribution proposed by Pielou (1959) to compare the observed point process to one generated under csr. The value can be compared to that given by the more widely used test proposed by Clark and Evans (1954).
cssTools Cognitive Social Structure Tools
A collection of tools for estimating a network from a random sample of cognitive social structure (CSS) slices. Also contains functions for evaluating a CSS in terms of various error types observed in each slice.
cstab Selection of Number of Clusters via Normalized Clustering Instability
Selection of the number of clusters in cluster analysis using stability methods.
CSTools Assessing Skill of Climate Forecasts on Seasonal-to-Decadal Timescales
Exploits dynamical seasonal forecasts in order to provide information relevant to stakeholders at the seasonal timescale. The package contains process-based methods for forecast calibration, bias correction, statistical and stochastic downscaling, optimal forecast combination and multivariate verification, as well as basic and advanced tools to obtain tailored products. Doblas-Reyes et al. (2005) <doi:10.1111/j.1600-0870.2005.00104.x>. Mishra et al. (2018) <doi:10.1007/s00382-018-4404-z>. Terzago et al. (2018) <doi:10.5194/nhess-18-2825-2018>. Torralba et al. (2017) <doi:10.1175/JAMC-D-16-0204.1>. D’Onofrio et al. (2014) <doi:10.1175/JHM-D-13-096.1>.
csv Read and Write CSV Files with Selected Conventions
Reads and writes CSV with selected conventions. Uses the same generic function for reading and writing to promote consistent formats.
CTAShiny Interactive Application for Working with Contingency Tables
An interactive application for working with contingency tables. The application has a template for solving contingency table problems such as the chi-square test of independence and association plots between two categorical variables. Runtime examples are provided in the package function as well as at <https://…/>.
cthreshER Continuous Threshold Expectile Regression
Estimation and inference methods for the continuous threshold expectile regression. It can fit the continuous threshold expectile regression and test for the existence of a change point, as described in ‘Feipeng Zhang and Qunhua Li (2016). A continuous threshold expectile regression, submitted.’
CTM A Text Mining Toolkit for Chinese Document
The CTM package is designed to solve text mining problems and is specific to Chinese documents.
ctmcd Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data
Functions for estimating Markov generator matrices from discrete-time observations. The implemented approaches comprise diagonal adjustment, weighted adjustment and quasi-optimization of matrix logarithm based candidate solutions, an expectation-maximization algorithm as well as a Gibbs sampler.
ctmle Collaborative Targeted Maximum Likelihood Estimation
Implements the general template for collaborative targeted maximum likelihood estimation. It also provides several commonly used C-TMLE instantiations, such as the vanilla/scalable variable-selection C-TMLE (Ju et al. (2017) <doi:10.1177/0962280217729845>) and the glmnet-C-TMLE algorithm (Ju et al. (2017) <arXiv:1706.10029>).
ctqr Censored and Truncated Quantile Regression
Estimation of quantile regression models for survival data.
ctsem Continuous Time Structural Equation Modelling
An easily accessible continuous (and discrete) time dynamic modelling package for panel and time series data, reliant upon the OpenMx package (http://openmx.psyc.virginia.edu) for computation. Most dynamic modelling approaches to longitudinal data rely on the assumption that time intervals between observations are consistent. When this assumption is adhered to, the data gathering process is necessarily limited to a specific schedule, and when it is broken, the resulting parameter estimates may be biased and reduced in power. Continuous time models are conceptually similar to vector autoregressive models (thus also the latent change models popularised in a structural equation modelling context); however, by explicitly including the length of time between observations, continuous time models are freed from the assumption that measurement intervals are consistent. This allows: data to be gathered irregularly; the elimination of noise and bias due to varying measurement intervals; parsimonious structures for complex dynamics. The application of such a model in this SEM framework allows full-information maximum-likelihood estimates for both N = 1 and N > 1 cases, multiple measured indicators per latent process, and the flexibility to incorporate additional elements, including individual heterogeneity in the latent process and manifest intercepts, and time dependent and independent exogenous covariates. Furthermore, due to the SEM implementation we are able to estimate a random effects model where the impact of time dependent and time independent predictors can be assessed simultaneously, but without the classic problems of random effects models assuming no covariance between unit level effects and predictors.
ctsmr Continuous Time Stochastic Modelling for R
CTSM is a tool for estimating embedded parameters in a continuous time stochastic state space model. CTSM has been developed at DTU Compute (formerly DTU Informatics) over several years. CTSM-R provides a new scripting interface through the statistical language R. Mixing CTSM with R provides easy access to the data handling and plotting tools required in any kind of modelling.
CTTinShiny Shiny Interface for the CTT Package
A Shiny interface developed in close coordination with the CTT package, providing a GUI that guides the user through CTT analyses.
CTTShiny Classical Test Theory via Shiny
Interactive shiny application for running classical test theory (item analysis).
CUB A Class of Mixture Models for Ordinal Data
Estimating and testing models for ordinal data within the family of CUB models and their extensions (where CUB stands for Combination of discrete Uniform and shifted Binomial distributions).
Cubist Rule- and Instance-Based Regression Modeling
Regression modeling using rules with added instance-based corrections.
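A brief sketch of the core interface (mtcars is used purely for illustration):

    library(Cubist)
    # rule-based committee model; predictions can then be adjusted
    # by an instance-based (nearest-neighbour) correction
    mod <- cubist(x = mtcars[, -1], y = mtcars$mpg, committees = 3)
    pred <- predict(mod, newdata = mtcars[, -1], neighbors = 5)
    summary(mod)  # prints the fitted rules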
CuCubes MultiDimensional Feature Selection (MDFS)
Functions for MultiDimensional Feature Selection (MDFS): * calculating multidimensional information gains, * finding interesting tuples for chosen variables, * scoring variables, * finding important variables, * plotting selection results. CuCubes is also known as CUDA Cubes and it is a library that allows fast CUDA-accelerated computation of information gains in binary classification problems. This package wraps CuCubes and provides an alternative CPU version as well as helper functions for building MultiDimensional Feature Selectors.
CUFF Charles’s Utility Function using Formula
Utility functions that provide wrappers for descriptive base functions like correlation, mean and table. The formula interface is used to pass variables to functions. Also provided are operators such as %+% to concatenate strings, plus tools to repeat and manage character vectors for nice display.
cultevo Tools, Measures and Statistical Tests for Cultural Evolution
Provides tools for measuring the compositionality of signalling systems (in particular the information-theoretic measure due to Spike (2016) <http://…/25930> and the Mantel test for distance matrix correlation (after Dietz 1983) <doi:10.1093/sysbio/32.1.21>), functions for computing string and meaning distance matrices as well as an implementation of the Page test for monotonicity of ranks (Page 1963) <doi:10.1080/01621459.1963.10500843> with exact p-values up to k = 22.
curl A Modern and Flexible Web Client for R
The curl() and curl_download() functions provide highly configurable drop-in replacements for base url() and download.file() with better performance, support for encryption (https://, ftps://), ‘gzip’ compression, authentication, and other ‘libcurl’ goodies. The core of the package implements a framework for performing fully customized requests where data can be processed either in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of ‘libcurl’ is recommended; for a more-user-friendly web client see the ‘httr’ package which builds on this package with HTTP specific tools and logic.
The curl package: a modern R interface to libcurl
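As a sketch, fetching a page into memory with a custom handle (the URL is illustrative):

    library(curl)
    h <- new_handle(useragent = "r-curl-example")
    req <- curl_fetch_memory("https://httpbin.org/get", handle = h)
    req$status_code              # HTTP status code
    cat(rawToChar(req$content))  # raw response body decoded to text
    # curl_download(url, destfile) is the drop-in for download.file()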
curlconverter Tools to Transform ‘cURL’ Command-Line Calls to ‘httr’ Requests
Deciphering web/’REST’ ‘API’ and ‘XHR’ calls can be tricky, which is one reason why internet browsers provide ‘Copy as cURL’ functionality within their ‘Developer Tools’ pane(s). These ‘cURL’ command-lines can be difficult to wrangle into an ‘httr’ ‘GET’ or ‘POST’ request, but you can now ‘straighten’ these ‘cURLs’ either from data copied to the system clipboard or by passing in a vector of ‘cURL’ command-lines and getting back a list of parameter elements which can be used to form ‘httr’ requests. You can also make a complete/working/callable ‘httr::VERB’ function right from the tools provided.
curry Partial Function Application with %<%, %-<%
Partial application is the process of reducing the arity of a function by fixing one or more arguments, thus creating a new function lacking the fixed arguments. The curry package provides three different ways of performing partial function application by fixing arguments from either end of the argument list (currying and tail currying) or by fixing multiple named arguments (partial application). This package provides this functionality through the %<%, %-<%, and %><% operators, which allow for a programming style comparable to modern functional languages. Compared to other implementations such as purrr::partial(), the operators in curry compose functions with named arguments, aiding in autocomplete etc.
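A sketch of the intended semantics on a toy function (operator behaviour as described above):

    library(curry)
    add <- function(x, y, z) x + y + z
    add1 <- add %<% 1                      # curry: fix the first argument
    add1(2, 3)                             # 6
    add_z <- add %-<% 3                    # tail curry: fix the last argument
    add_z(1, 2)                            # 6
    add_xz <- add %><% list(x = 1, z = 3)  # partial application by name
    add_xz(2)                              # 6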
curstatCI Confidence Intervals for the Current Status Model
Computes the maximum likelihood estimator, the smoothed maximum likelihood estimator and pointwise bootstrap confidence intervals for the distribution function under current status data. Groeneboom and Hendrickx (2017) <arXiv:1701.07359>.
curvecomp Multiple Curve Comparisons Using Parametric Bootstrap
Performs multiple comparison procedures on curve observations among different treatment groups. The methods are applicable in a variety of situations (such as independent groups with equal or unequal sample sizes, or repeated measures) by using parametric bootstrap. References to these procedures can be found at Konietschke, Gel, and Brunner (2014) <doi:10.1090/conm/622/12431> and Westfall (2011) <doi:10.1080/10543406.2011.607751>.
CustomerScoringMetrics Evaluation Metrics for Customer Scoring Models Depending on Binary Classifiers
Functions for evaluating and visualizing predictive model performance (specifically: binary classifiers) in the field of customer scoring. These metrics include lift, lift index, gain percentage, top-decile lift, F1-score, expected misclassification cost and absolute misclassification cost. See Berry & Linoff (2004, ISBN:0-471-47064-3), Witten and Frank (2005, ISBN:0-12-088407-0) and Blattberg, Kim & Neslin (2008, ISBN:978-0-387-72578-9) for details. Visualization functions are included for lift charts and gain percentage charts. All metrics that require class predictions offer the possibility to dynamically determine cutoff values for transforming real-valued probability predictions into class predictions.
customizedTraining Customized Training for Lasso and Elastic-Net Regularized Generalized Linear Models
Customized training is a simple technique for transductive learning, when the test covariates are known at the time of training. The method identifies a subset of the training set to serve as the training set for each of a few identified subsets in the test set. This package implements customized training for the glmnet() and cv.glmnet() functions.
customLayout Extended Version of Layout Functionality for ‘Base’ and ‘Grid’ Graphics Systems
Create complicated drawing areas for multiple plots by combining much simpler layouts. It is an extended version of the layout() function from the ‘graphics’ package, but it also works with ‘grid’ graphics.
customsteps Customizable Higher-Order Recipe Step Functions
Customizable higher-order recipe step functions for the ‘recipes’ package. These step functions take ‘prep’ and ‘bake’ helper functions as inputs and create specifications of customized recipe steps as output.
cusum CUSUM Charts for Monitoring of Hospital Performance
Provides functions for constructing and evaluating CUSUM charts and RA-CUSUM charts with focus on false signal probability.
CUSUMdesign Compute Decision Interval and Average Run Length for CUSUM Charts
Computation of decision intervals (H) and average run lengths (ARL) for CUSUM charts.
cutpointr Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks
Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation and various plotting functions are included.
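Typical usage, sketched with the ‘suicide’ demo data bundled with the package:

    library(cutpointr)
    cp <- cutpointr(suicide, dsi, suicide,
                    method = maximize_metric, metric = sum_sens_spec)
    summary(cp)  # optimal cutpoint with in-sample performance
    plot(cp)     # distributions, ROC curve and metric by cutpoint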
CutpointsOEHR Optimal Equal-HR Method to Find Two Cutpoints for U-Shaped Relationships in Cox Model
Uses the optimal equal-HR method to determine two optimal cutpoints of a continuous predictor that has a U-shaped relationship with survival outcomes based on a Cox regression model. The optimal equal-HR method estimates two optimal cutpoints that have approximately the same log hazard value based on a Cox regression model and divides individuals into different groups according to their HR values.
cvar Compute Expected Shortfall and Value at Risk for Continuous Distributions
Compute expected shortfall (ES) and Value at Risk (VaR) from a quantile function, distribution function, random number generator or probability density function. ES is also known as Conditional Value at Risk (CVaR). Virtually any continuous distribution can be specified. The functions are vectorized over the arguments. The computations are done directly from the definitions, see e.g. Acerbi and Tasche (2002) <doi:10.1111/1468-0300.00091>.
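For example, both measures for the standard normal distribution can be computed directly from its quantile function (a sketch; argument names follow the package vignette):

    library(cvar)
    VaR(qnorm, p_loss = 0.05, dist.type = "qf")  # 5% value-at-risk
    ES(qnorm, p_loss = 0.05, dist.type = "qf")   # 5% expected shortfall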
cvcrand Efficient Design and Analysis of Cluster Randomized Trials
Constrained randomization by Raab and Butcher (2001) <doi:10.1002/1097-0258(20010215)20:3%3C351::AID-SIM797%3E3.0.CO;2-C> is suitable for cluster randomized trials (CRTs) with a small number of clusters (e.g., 20 or fewer). The procedure of constrained randomization is based on the baseline values of specified cluster-level covariates. The intervention effect on the individual outcome can then be analyzed through the clustered permutation test introduced by Gail et al. (1996) <doi:10.1002/(SICI)1097-0258(19960615)15:11%3C1069::AID-SIM220%3E3.0.CO;2-Q>. Motivated by Li et al. (2016) <doi:10.1002/sim.7410>, the package performs constrained randomization on the baseline values of cluster-level covariates and the clustered permutation test on the individual-level outcome for cluster randomized trials.
cvequality Tests for the Equality of Coefficients of Variation from Multiple Groups
Contains functions for testing for significant differences between multiple coefficients of variation. Includes Feltz and Miller’s (1996) <DOI:10.1002/(SICI)1097-0258(19960330)15:6%3C647::AID-SIM184%3E3.0.CO;2-P> asymptotic test and Krishnamoorthy and Lee’s (2014) <DOI:10.1007/s00180-013-0445-2> modified signed-likelihood ratio test. See the vignette for more, including full details of citations.
cvmgof Cramer-von Mises Goodness-of-Fit Tests
Devoted to Cramer-von Mises goodness-of-fit tests. The package implements three statistical methods based on Cramer-von Mises statistics to estimate and test a regression model.
CVR Canonical Variate Regression
Perform canonical variate regression (CVR) for two sets of covariates and a univariate response, with regularization and weight parameters tuned by cross validation.
cvxbiclustr Convex Biclustering Algorithm
An iterative algorithm for solving a convex formulation of the biclustering problem.
CVXR Disciplined Convex Optimization
An object-oriented modeling language for disciplined convex programming (DCP). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution.
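A minimal sketch, posing least squares as a DCP problem:

    library(CVXR)
    set.seed(1)
    A <- matrix(rnorm(200), nrow = 50)  # 50 x 4 design matrix
    b <- rnorm(50)
    x <- Variable(4)
    prob <- Problem(Minimize(sum_squares(A %*% x - b)))
    result <- solve(prob)
    result$getValue(x)  # agrees with qr.solve(A, b) up to solver tolerance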
cxhull Convex Hull
Computes the convex hull in arbitrary dimension, based on the Qhull library (<http://www.qhull.org>). The package provides a complete description of the convex hull: edges, ridges, facets, adjacencies. Triangulation is optional.
cyclocomp Cyclomatic Complexity of R Code
Cyclomatic complexity is a software metric (measurement) used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program’s source code. It was developed by Thomas J. McCabe, Sr. in 1976.
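A small sketch: each additional branch adds a linearly independent path.

    library(cyclocomp)
    f <- function(x) {
      if (x > 0) "positive" else if (x < 0) "negative" else "zero"
    }
    cyclocomp(f)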
Cyclops Cyclic Coordinate Descent for Logistic, Poisson and Survival Analysis
This model fitting tool incorporates cyclic coordinate descent and majorization-minimization approaches to fit a variety of regression models found in large-scale observational healthcare data. Implementations focus on computational optimization and fine-scale parallelization to yield efficient inference in massive datasets.
cyphr High Level Encryption Wrappers
Encryption wrappers, using low-level support from ‘sodium’ and ‘openssl’. ‘cyphr’ tries to smooth over some pain points when using encryption within applications and data analysis by wrapping around differences in function names and arguments in different encryption providing packages. It also provides high-level wrappers for input/output functions for seamlessly adding encryption to existing analyses.
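A hedged sketch of symmetric encryption with a ‘sodium’ key:

    library(cyphr)
    key <- key_sodium(sodium::keygen())
    secret <- encrypt_string("my sensitive text", key)
    decrypt_string(secret, key)  # recovers the original string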
cytofan Plot Fan Plots for Cytometry Data using ‘ggplot2’
An implementation of Fan plots for cytometry data in ‘ggplot2’. For reference see Britton, E.; Fisher, P. & J. Whitley (1998) The Inflation Report Projections: Understanding the Fan Chart <https://…ojections-understanding-the-fan-chart>.

D

d3heatmap A D3.js-based heatmap htmlwidget for R
This is an R package that implements a heatmap htmlwidget. It has the following features:
• Highlight rows/columns by clicking axis labels
• Click and drag over colormap to zoom in (click on colormap to zoom out)
• Optional clustering and dendrograms, courtesy of base::heatmap
Interactive heat maps
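A one-line sketch that renders an interactive d3heatmap in the viewer:

    library(d3heatmap)
    # column-scaled heatmap with row and column dendrograms
    d3heatmap(mtcars, scale = "column", dendrogram = "both")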
D3M Two Sample Test with Wasserstein Metric
Two sample test based on the Wasserstein metric. This is motivated by the detection of differential DNA-methylation sites based on underlying distributions.
D3partitionR Plotting D3 Hierarchical Plots in R and Shiny
Plotting hierarchical plots in R such as Sunburst, Treemap, Circle Treemap and Partition Chart.
d3plus Seamless ‘D3Plus’ Integration
Provides functions that offer seamless ‘D3Plus’ integration. The examples provided here are taken from the official ‘D3Plus’ website <http://d3plus.org>.
d3r ‘d3.js’ Utilities for R
Helper functions for using ‘d3.js’ in R.
d3Tree Create Interactive Collapsible Trees with the JavaScript ‘D3’ Library
Create and customize interactive collapsible ‘D3’ trees using the ‘D3’ JavaScript library and the ‘htmlwidgets’ package. These trees can be used directly from the R console, from ‘RStudio’, in Shiny apps and R Markdown documents. When in Shiny the tree layout is observed by the server and can be used as a reactive filter of structured data.
DA.MRFA Dimensionality Assessment using Minimum Rank Factor Analysis
Performs Parallel Analysis for assessing the dimensionality of a set of variables using Minimum Rank Factor Analysis (see Timmerman & Lorenzo-Seva (2011) <DOI:10.1037/a0023353> and ten Berge & Kiers (1991) <DOI:10.1007/BF02294464> for more information). The package also includes the option to compute Minimum Rank Factor Analysis by itself, as well as the Greatest Lower Bound (GLB) calculation.
daarem Damped Anderson Acceleration with Epsilon Monotonicity for Accelerating EM-Like Monotone Algorithms
Implements the DAAREM method for accelerating the convergence of slow, monotone sequences from smooth, fixed-point iterations such as the EM algorithm. For further details about the DAAREM method, see Henderson, N.C. and Varadhan, R. (2018) <arXiv:1803.06673>.
DAC Calculating Data Agreement Criterion Scores to Rank Experts Based on Their Beliefs
Calculates Data Agreement Criterion (DAC) scores. This can be done to determine prior-data conflict or to evaluate and compare multiple priors, which can be experts’ predictions. Bousquet (2008) <doi:10.1080/02664760802192981>.
dad Three-Way Data Analysis Through Densities
The three-way data consists of a set of variables measured on several groups of individuals. To each group is associated an estimated probability density function. The package provides functional methods (principal component analysis, multidimensional scaling, discriminant analysis…) for such probability densities.
daff Diff, Patch and Merge for Data.frames
Diff, patch and merge for data frames. Document changes in data sets and use them to apply patches. Changes to data can be made visible by using render_diff(). Daff uses the V8 package to wrap the ‘daff.js’ JavaScript library which is included in the package. Daff exposes a subset of ‘daff.js’ functionality, tailored for usage within R.
dagitty Graphical Analysis of Structural Causal Models
A port of the web-based software “DAGitty” for analyzing structural causal models (also known as directed acyclic graphs or DAGs). The package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.
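A short sketch: declare a DAG, then ask for adjustment sets and testable implications:

    library(dagitty)
    g <- dagitty("dag { x -> y ; x <- z -> y }")
    adjustmentSets(g, exposure = "x", outcome = "y")  # { z }
    impliedConditionalIndependencies(g)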
DALEX Descriptive mAchine Learning EXplanations
Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance, but such black-box models usually lack interpretability. The ‘DALEX’ package contains various explainers that help to understand the link between input variables and model output. The single_variable() explainer extracts the conditional response of a model as a function of a single selected variable. It is a wrapper over the ‘pdp’ and ‘ALEPlot’ packages. The single_prediction() explainer attributes parts of a model prediction to particular variables used in the model. It is a wrapper over the ‘breakDown’ package. The variable_dropout() explainer assesses variable importance based on consecutive permutations. All these explainers can be plotted with the generic plot() function and compared across different models.
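A sketch of the workflow around a simple linear model (function names as in the description above; interfaces may differ across package versions):

    library(DALEX)
    model <- lm(mpg ~ ., data = mtcars)
    expl <- explain(model, data = mtcars[, -1], y = mtcars$mpg)
    sv <- single_variable(expl, variable = "wt")  # conditional response
    plot(sv)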
dalmatian Automating the Fitting of Double Linear Mixed Models in ‘JAGS’
Automates fitting of double GLM in ‘JAGS’. Includes automatic generation of ‘JAGS’ scripts, running ‘JAGS’ via ‘rjags’, and summarizing the resulting output.
dang ‘Dang’ Associated New Goodies
A collection of utility functions.
DAP Discriminant Analysis via Projections
An implementation of Discriminant Analysis via Projections (DAP) method for high-dimensional binary classification in the case of unequal covariance matrices. See Irina Gaynanova and Tianying Wang (2018) <arXiv:1711.04817v2>.
dapr ‘purrr’-Like Apply Functions Over Input Elements
An easy-to-use, dependency-free set of functions for iterating over elements of various input objects. Functions are wrappers around base apply()/lapply()/vapply() functions but designed to have similar functionality to the mapping functions in the ‘purrr’ package <https://…/>. Specifically, function names more explicitly communicate the expected class of the output and functions also allow for the convenient shortcut of ‘~ .x’ instead of the more verbose ‘function(.x) .x’.
DarkDiv Estimating Probabilistic Dark Diversity
Estimation of dark diversity using species co-occurrences. It includes implementations of probabilistic dark diversity based on the Hypergeometric distribution, as well as estimations based on the Beals index, which can be transformed to binary predictions using different thresholds, or transformed into a favorability index. All methods include the possibility of using a calibration dataset that is used to estimate the indication matrix between pairs of species, or to estimate dark diversity directly on a single dataset. See De Caceres and Legendre (2008) <doi:10.1007/s00442-008-1017-y>, Lewis et al. (2016) <doi:10.1111/2041-210X.12443>, Partel et al. (2011) <doi:10.1016/j.tree.2010.12.004>, Real et al. (2017) <doi:10.1093/sysbio/syw072> for further information.
dashboard Interactive Data Visualization with D3.js
The dashboard package allows users to create web pages which display interactive data visualizations working in a standard modern browser. It displays them locally using the Rook server. Neither knowledge of web technologies nor an Internet connection is required. D3.js is a JavaScript library for manipulating documents based on data. D3 helps the dashboard package bring data to life using HTML, SVG and CSS.
dat Tools for Data Manipulation
An implementation of common higher order functions with syntactic sugar for anonymous functions. Also provides a link to ‘dplyr’ for common transformations on data frames, to work around non-standard evaluation by default.
data.table Extension of data.frame
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread). Offers a natural and flexible syntax, for faster development.
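The core DT[i, j, by] syntax in a short sketch:

    library(data.table)
    DT <- as.data.table(mtcars)
    # filter (i), compute (j) and group (by) in one expression
    DT[cyl != 4, .(mean_mpg = mean(mpg), n = .N), by = gear]
    fwrite(DT, tempfile(fileext = ".csv"))  # fast writer; fread() reads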
data.tree Hierarchical Data Structures
Create tree structures from hierarchical data, and use the utility methods to traverse the tree in various orders. Aggregate, print, convert to and from data.frame, and apply functions to your tree data. Useful for decision trees, machine learning, finance, and many other applications.
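Building and printing a small hierarchy (a minimal sketch):

    library(data.tree)
    acme <- Node$new("Acme Inc.")
    it <- acme$AddChild("IT")
    it$AddChild("Software")
    acme$AddChild("Accounting")
    print(acme)            # shows the tree structure
    ToDataFrameTree(acme)  # convert back to a data.frame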
data.world Main Package for Working with ‘data.world’ Data Sets
High-level tools for working with data.world data sets. data.world is a community where you can find interesting data, store and showcase your own data and data projects, and find and collaborate with other members. In addition to exploring, querying and charting data on the data.world site, you can access data via ‘API’ endpoints and integrations. Use this package to access, query and explore data sets, and to integrate data into R projects. Visit <https://data.world>, for additional information.
Data2LD Functional Data Analysis with Linear Differential Equations
Package ‘Data2LD’ was developed to support functional data analysis using the functions in package ‘fda’. The functions in this package are designed for the use of differential equations as modelling objects as described in J. Ramsay and G. Hooker (2017, ISBN 978-1-4939-7188-6) Dynamic Data Analysis, New York: Springer. The package includes data sets and script files for analyzing many of the examples in this book. ‘Matlab’ versions of the code and sample analyses are available by ftp from <http://…/>. There you find a set of .zip files containing the functions and sample analyses, as well as two .txt files giving instructions for installation and some additional information.
DatabaseConnector Connecting to Various Database Platforms
An R ‘DataBase Interface’ (‘DBI’) compatible interface to various database platforms (‘PostgreSQL’, ‘Oracle’, ‘Microsoft SQL Server’, ‘Amazon Redshift’, ‘Microsoft Parallel Database Warehouse’, ‘IBM Netezza’, ‘Apache Impala’, and ‘Google BigQuery’). Also includes support for fetching data as ‘ffdf’ objects. Uses ‘Java Database Connectivity’ (‘JDBC’) to connect to databases.
DatabaseConnectorJars JAR Dependencies for the ‘DatabaseConnector’ Package
Provides external JAR dependencies for the ‘DatabaseConnector’ package.
DatabionicSwarm Swarm Intelligence for Self-Organized Clustering
Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data such as natural clusters characterized by distance and/or density based structures in the data space. The first module is the parameter-free projection method Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence and symmetry considerations. The second module is a parameter-free high-dimensional data visualization technique, which generates projected points on a topographic map with hypsometric colors based on the generalized U-matrix. The third module is the clustering method itself with non-critical parameters. The clustering can be verified by the visualization and vice versa. The term DBS refers to the method as a whole. DBS enables even a non-professional in the field of data mining to apply its algorithms for visualization and/or clustering to data sets with completely different structures drawn from diverse research fields.
datacheckr Data Frame Column Name, Class and Value Checking
The primary function check_data() checks a data frame for column presence, column class and column values. If the user-defined conditions are met, the function returns an invisible copy of the original data frame; otherwise the function throws an informative error.
DataClean Data Cleaning
Includes functions that researchers or practitioners may use to clean raw data, transferring html, xlsx and txt data files into other formats. It can also be used to manipulate text variables, extract numeric variables from text variables and perform other variable cleaning processes. It originated from the author’s project on creative performance in an online education environment. The resulting paper of that study will be published soon.
dataCompareR Compare Two Data Frames and Summarise the Difference
Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn’t intended to replace all.equal() as a way to test for equality.
datadr Divide and Recombine for Large, Complex Data
Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. Comes with a generic MapReduce interface as well. Works with key-value pairs stored in memory, on local disk, or on HDFS, in the latter case using the R and Hadoop Integrated Programming Environment (RHIPE).
DataEntry Make it Easier to Enter Questionnaire Data
This is a GUI application for defining attributes and setting valid values of variables, and then entering questionnaire data in a data.frame.
DataExplorer Data Explorer
Data exploration process for data analysis and model building, so that users can focus on understanding data and extracting insights. The package automatically scans through each variable and does data profiling. Typical graphical techniques will be performed for both discrete and continuous features.
datafsm Estimating Finite State Machine Models from Data
Our method automatically generates models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data, and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
DataLoader Import Multiple File Types
Functions to import multiple files of multiple data file types (‘.xlsx’, ‘.xls’, ‘.csv’, ‘.txt’) from a given directory into R data frames.
dataMaid A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Cleaning Process
Data cleaning is an important first step of any statistical analysis. dataMaid provides an extendable suite of tests for common potential errors in a dataset. It produces a document with a thorough summary of the checks and the results that a human can use to identify possible errors.
dataMeta Create and Append a Data Dictionary for an R Dataset
Designed to create a basic data dictionary and append to the original dataset’s attributes list. The package makes use of a tidy dataset and creates a data frame that will serve as a linker that will aid in building the dictionary. The dictionary is then appended to the list of the original dataset’s attributes. The user will have the option of entering variable and item descriptions by writing code or use alternate functions that will prompt the user to add these.
datapack A Flexible Container to Transport and Manipulate Data and Associated Resources
Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated meta data and ancillary files. Individual data objects have associated system level meta data, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI-ORE standard is described at <https://…/ore>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://…/draft-kunze-bagit-08>.
datapackage.r Data Package ‘Frictionless Data’
Work with ‘Frictionless Data Packages’ (<https://…/>). Allows the user to load and validate any descriptor for a data package profile, create and modify descriptors, and exposes methods for reading and streaming data in the package. When a descriptor is a ‘Tabular Data Package’, it uses the ‘Table Schema’ package (<https://…/package=tableschema.r>) and exposes its functionality for each resource object in the resources field.
DataPackageR Construct Reproducible Analytic Data Sets as R Packages
A framework to help construct R data packages in a reproducible manner. Potentially time-consuming processing of raw data sets into analysis-ready data sets is done in a reproducible manner and decoupled from the usual R CMD build process so that data sets can be processed into R objects in the data package and the data package can then be shared, built, and installed by others without the need to repeat computationally costly data processing. The package maintains data provenance by turning the data processing scripts into package vignettes, as well as enforcing documentation and version checking of included data objects. Data packages can be version controlled on GitHub, and used to share data for manuscripts, collaboration and general reproducibility.
datarium Data Bank for Statistical Analysis and Visualization
Contains data organized by topics: categorical data, regression model, means comparisons, independent and repeated measures ANOVA, mixed ANOVA and ANCOVA.
datarobot DataRobot Predictive Modeling API
For working with the DataRobot predictive modeling platform’s API.
datasauRus Datasets from the Datasaurus Dozen
The Datasaurus Dozen is a set of datasets with the same summary statistics. They retain the same summary statistics despite having radically different distributions. The datasets represent a larger and quirkier object lesson that is typically taught via Anscombe’s Quartet (available in the ‘datasets’ package). Anscombe’s Quartet contains four very different distributions with the same summary statistics and as such highlights the value of visualisation in understanding data, over and above summary statistics. As well as being an engaging variant on the Quartet, the data is generated in a novel way. The simulated annealing process used to derive datasets from the original Datasaurus is detailed in ‘Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing’ <http://…/3025453.3025912>.
datasets.load Interface for Loading Datasets
Visual interface for loading datasets in RStudio from all installed (unloaded) packages.
Datasmith Tools to Complete Euclidean Distance Matrices
Implements several algorithms for Euclidean distance matrix completion, Sensor Network Localization, and sparse Euclidean distance matrix completion using the minimum spanning tree.
datastepr An Implementation of a SAS-Style Data Step
Based on a SAS data step. This allows for row-wise dynamic building of data, iteratively importing slices of existing dataframes, conducting analyses, and exporting to a results frame. This is particularly useful for differential or time-series analyses, which are often not well suited to vector-based operations.
datastructures Implementation of Core Data Structures
Implementation of advanced data structures such as hashmaps, heaps, or queues. Advanced data structures are essential in many computer science and statistics problems, for example graph algorithms or string analysis. The package uses ‘Boost’ and ‘STL’ data types and extends these to R with ‘Rcpp’ modules.
DataVisualizations Visualizations of High-Dimensional Data
Various visualizations of high-dimensional data such as heat map and silhouette plot for grouped data, visualizations of the distribution of distances, the scatter-density plot for two variables, the Shepard density plot and many more are presented here. Additionally, ‘DataVisualizations’ makes it possible to inspect the distribution of each feature of a dataset visually through the combination of four methods. More detailed explanations can be found in the book by Thrun, M.C.: ‘Projection-Based Clustering through Self-Organization and Swarm Intelligence’ (2018) <DOI:10.1007/978-3-658-20540-9>.
DataViz Data Visualisation Using an HTML Page and ‘D3.js’
Gives access to data visualisation methods that are relevant from the statistician’s point of view. Uses ‘D3’s existing data visualisation tools to empower the R language and environment. The throw chart method is a line chart used to illustrate paired data sets (such as before-after, male-female).
datetimeutils Utilities for Dates and Times
Utilities for handling dates and times, such as selecting particular days of the week or month, formatting timestamps as required by RSS feeds, or converting timestamp representations of other software (such as ‘MATLAB’ and ‘Excel’) to R. The package is lightweight (no dependencies, pure R implementations) and relies only on R’s standard classes to represent dates and times (‘Date’ and ‘POSIXt’); it aims to provide efficient implementations, through vectorisation and the use of R’s native numeric representations of timestamps where possible.
datr ‘Dat’ Protocol Interface
Interface with the ‘Dat’ p2p network protocol <https://datproject.org>. Clone archives from the network, share your own files, and install packages from the network.
dawai Discriminant Analysis with Additional Information
In applications it is usual that some additional information is available. This package dawai (an acronym for Discriminant Analysis With Additional Information) performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the populations means. It also computes several estimations of the true error rate.
dbarts Discrete Bayesian Additive Regression Trees Sampler
Fits Bayesian additive regression trees (BART) while allowing the updating of predictors or response so that BART can be incorporated as a conditional model in a Gibbs/MH sampler. Also serves as a drop-in replacement for package ‘BayesTree’.
dbfaker A Tool to Ensure the Validity of Database Writes
A tool to ensure the validity of database writes. It provides a set of utilities to analyze and type check the properties of data frames that are to be written to databases with SQL support.
DBfit A Double Bootstrap Method for Analyzing Linear Models with Autoregressive Errors
Computes the double bootstrap as discussed in McKnight, McKean, and Huitema (2000) <doi:10.1037/1082-989X.5.1.87>. The double bootstrap method provides a better fit for a linear model with autoregressive errors than ARIMA when the sample size is small.
DBHC Sequence Clustering with Discrete-Output HMMs
Provides an implementation of a mixture of hidden Markov models (HMMs) for discrete sequence data in the Discrete Bayesian HMM Clustering (DBHC) algorithm. The DBHC algorithm is an HMM Clustering algorithm that finds a mixture of discrete-output HMMs while using heuristics based on Bayesian Information Criterion (BIC) to search for the optimal number of HMM states and the optimal number of clusters.
dbparser ‘DrugBank’ Database XML Parser
This tool is for parsing the ‘DrugBank’ XML database <http://…/>. The parsed data are then returned in a proper ‘R’ dataframe with the ability to save them in a given database.
dbplyr A ‘dplyr’ Back End for Databases
A ‘dplyr’ back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a ‘DBI’ back end; more advanced features require ‘SQL’ translation to be provided by the package author.
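A sketch against an in-memory SQLite database (assumes ‘RSQLite’ is installed); show_query() reveals the generated SQL:

    library(dplyr)
    library(dbplyr)
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    DBI::dbWriteTable(con, "mtcars", mtcars)
    tbl(con, "mtcars") %>%
      group_by(cyl) %>%
      summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
      show_query()  # lazily translated to SQL, not yet executed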
dbscan Density Based Clustering of Applications with Noise (DBSCAN)
A fast reimplementation of the DBSCAN clustering algorithm using the kd-tree data structure for speedup.
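A quick sketch on the iris measurements; cluster 0 labels noise points:

    library(dbscan)
    x <- as.matrix(iris[, 1:4])
    kNNdistplot(x, k = 5)  # the knee of this curve suggests eps
    cl <- dbscan(x, eps = 0.5, minPts = 5)
    table(cl$cluster)      # 0 = noise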
dbx A Fast, Easy-to-Use Database Interface
Provides select, insert, update, upsert, and delete database operations. Supports ‘PostgreSQL’, ‘MySQL’, ‘SQLite’, and more, and plays nicely with the ‘DBI’ package.
dc3net Inferring Condition-Specific Networks via Differential Network Inference
Performs differential network analysis to infer disease specific gene networks.
DCA Dynamic Correlation Analysis for High Dimensional Data
Finding dominant latent signals that regulate dynamic correlation between many pairs of variables.
DCEM Clustering for Multivariate and Univariate Data Using Expectation Maximization Algorithm
Implements the Expectation Maximisation (EM) algorithm for clustering finite Gaussian mixture models for both multivariate and univariate datasets. The initialization is done by randomly selecting samples from the dataset as the mean of the Gaussian(s). Future versions will improve the parameter initialization and execution on big datasets. The algorithm returns a set of Gaussian parameters: posterior probabilities, means, covariance matrices (multivariate data) / standard deviations (univariate datasets) and priors. Reference: Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016) <doi:10.1007/s41060-017-0062-1>. This work is partially supported by NCI Grant 1R01CA213466-01.
DChaos Chaotic Time Series Analysis
Provides several algorithms for the purpose of detecting chaotic signals inside univariate time series. We focus on methods derived from chaos theory which estimate the complexity of a dataset through exploring the structure of the attractor. We have taken into account the Lyapunov exponents as an ergodic measure. We have implemented the Jacobian method by a fit through neural networks in order to estimate both the largest and the spectrum of Lyapunov exponents. We have considered the full sample and three different methods of subsampling by blocks (non-overlapping, equally spaced and bootstrap) to estimate them. In addition, it is possible to make inference about them and know if the estimated Lyapunov exponents values are or not statistically significant. This library can be used with time series whose time-lapse is fixed or variable. That is, it considers time series whose observations are sampled at fixed or variable time intervals. For a review see David Ruelle and Floris Takens (1971) <doi:10.1007/BF01646553>, Ramazan Gencay and W. Davis Dechert (1992) <doi:10.1016/0167-2789(92)90210-E>, Jean-Pierre Eckmann and David Ruelle (1995) <doi:10.1103/RevModPhys.57.617>, Mototsugu Shintani and Oliver Linton (2004) <doi:10.1016/S0304-4076(03)00205-7>, Jeremy P. Huke and David S. Broomhead (2007) <doi:10.1088/0951-7715/20/9/011>.
DClusterm Model-Based Detection of Disease Clusters
Model-based methods for the detection of disease clusters using GLMs, GLMMs and zero-inflated models.
DCM Data Converter Module
Data Converter Module (DCM) converts datasets between the split and stack formats, in both directions.
dcminfo Information Matrix for Diagnostic Classification Models
A set of asymptotic methods that can be used to directly estimate the expected (Fisher) information matrix by Liu, Tian, and Xin (2016) <doi:10.3102/1076998615621293> in diagnostic classification models or cognitive diagnostic models are provided when marginal maximum likelihood estimation is used. For these methods, both the item and structural model parameters are considered simultaneously. Specifically, the observed information matrix, the empirical cross-product information matrix and the sandwich-type co-variance matrix that can be used to estimate the asymptotic co-variance matrix (or the model parameter standard errors) within the context of diagnostic classification models are provided.
dcmodify Modify Data Using Externally Defined Modification Rules
Data cleaning scripts typically contain a lot of ‘if this change that’ type of statements. Such statements are typically condensed expert knowledge. With this package, such ‘data modifying rules’ are taken out of the code and instead become parameters of the workflow. This allows one to maintain, document, and reason about data modification rules separately from the workflow.
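A hedged sketch of rules kept apart from the workflow (the rules themselves are toy examples):

    library(dcmodify)
    m <- modifier(
      if (height == -1) height <- NA,  # -1 encodes 'missing'
      if (age < 0) age <- abs(age)     # repair sign errors
    )
    dat <- data.frame(height = c(175, -1), age = c(-20, 30))
    modify(dat, m)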
dCovTS Distance Covariance and Correlation for Time Series Analysis
Computing and plotting the distance covariance and correlation function of a univariate or a multivariate time series. Test statistics for testing pairwise independence are also implemented. Some data sets are also included.
dcurver Utility Functions for Davidian Curves
A Davidian curve defines a seminonparametric density, whose flexibility can be tuned by a parameter. Since a special case of a Davidian curve is the standard normal density, Davidian curves can be used for relaxing normality assumption in statistical applications (Zhang & Davidian, 2001) <doi:10.1111/j.0006-341X.2001.00795.x>. This package provides the density function, the gradient of the loglikelihood and a random generator for Davidian curves.
DDM Death Registration Coverage Estimation
A set of three two-census methods to estimate the degree of death registration coverage for a population. Implemented methods include the Generalized Growth Balance method (GGB), the Synthetic Extinct Generation method (SEG), and a hybrid of the two, GGB-SEG. Each method offers automatic estimation, but users may also specify exact parameters or use a graphical interface to guess parameters in the traditional way if desired.
DDoutlier Distance & Density-Based Outlier Detection
Outlier detection in multidimensional domains. Implementation of notable distance and density-based outlier algorithms. Allows users to identify local outliers by comparing observations to their nearest neighbors, reverse nearest neighbors, shared neighbors or natural neighbors. For distance-based approaches, see Knorr, M., & Ng, R. T. (1997) <doi:10.1145/782010.782021>, Angiulli, F., & Pizzuti, C. (2002) <doi:10.1007/3-540-45681-3_2>, Hautamaki, V., & Ismo, K. (2004) <doi:10.1109/ICPR.2004.1334558> and Zhang, K., Hutter, M. & Jin, H. (2009) <doi:10.1007/978-3-642-01307-2_84>. For density-based approaches, see Tang, J., Chen, Z., Fu, A. W. C., & Cheung, D. W. (2002) <doi:10.1007/3-540-47887-6_53>, Jin, W., Tung, A. K. H., Han, J., & Wang, W. (2006) <doi:10.1007/11731139_68>, Schubert, E., Zimek, A. & Kriegel, H-P. (2014) <doi:10.1137/1.9781611973440.63>, Latecki, L., Lazarevic, A. & Prokrajac, D. (2007) <doi:10.1007/978-3-540-73499-4_6>, Papadimitriou, S., Gibbons, P. B., & Faloutsos, C. (2003) <doi:10.1109/ICDE.2003.1260802>, Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000) <doi:10.1145/342009.335388>, Kriegel, H.-P., Kröger, P., Schubert, E., & Zimek, A. (2009) <doi:10.1145/1645953.1646195>, Zhu, Q., Feng, Ji. & Huang, J. (2016) <doi:10.1016/j.patrec.2016.05.007>, Huang, J., Zhu, Q., Yang, L. & Feng, J. (2015) <doi:10.1016/j.knosys.2015.10.014>, Tang, B. & Haibo, He. (2017) <doi:10.1016/j.neucom.2017.02.039> and Gao, J., Hu, W., Zhang, X. & Wu, Ou. (2011) <doi:10.1007/978-3-642-20847-8_23>.
ddpcr Analysis and Visualization of Droplet Digital PCR in R and on the Web
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing duplex ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.
DDPGPSurv DDP-GP Survival Analysis
A nonparametric Bayesian approach to survival analysis. The functions perform inference via MCMC simulations from the posterior distributions for a Dependent Dirichlet Process-Gaussian Process prior. To maximize computational efficiency, some of the computations are performed in ‘Rcpp’.
ddR Distributed Data Structures in R
Provides distributed data structures and simplifies distributed computing in R.
DDRTree Learning Principal Graphs with DDRTree
Project data into a reduced dimensional space and construct a principal graph from the reduced dimension.
ddsPLS Multi-Data-Driven Sparse PLS Robust to Missing Samples
Allows the user to build multi-data-driven sparse PLS models. Multi-block data in high-dimensional settings are particularly well suited to this approach.
deadband Statistical Deadband Algorithms Comparison
Statistical deadband algorithms are based on the Send-On-Delta concept as in Miskowicz (2006, <doi:10.3390/s6010049>). A collection of functions to compare the effectiveness and fidelity of sampled signals using statistical deadband algorithms.
deal Learning Bayesian Networks with Mixed Variables
Bayesian networks with continuous and/or discrete variables can be learned and compared from data. The method is described in Boettcher and Dethlefsen (2003), <doi:10.18637/jss.v008.i20>.
debugme Debug R Packages
Specify debug messages as special string constants, and control debugging of packages via environment variables.
debugr Debug Tool to Watch Objects/Expressions While Running an R Script
Tool to print out the value of R objects/expressions while running an R script. Outputs can be made dependent on user-defined conditions/criteria. Debug messages only appear when a global option for debugging is set. This way, ‘debugr’ code can even remain in the debugged code for later use without any negative effects during normal runtime.
decido Bindings for ‘Mapbox’ Ear Cutting Triangulation Library
Provides constrained triangulation of polygons. Ear cutting (or ear clipping) applies constrained triangulation by successively ‘cutting’ triangles from a polygon defined by path/s. Holes are supported by introducing a bridge segment between polygon paths. This package wraps the ‘header-only’ library ‘earcut.hpp’ <https://…/earcut.hpp.git> which includes a reference to the method used by Held, M. (2001) <doi:10.1007/s00453-001-0028-4>.
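A short sketch triangulating a simple polygon (coordinates as in the package examples):

    library(decido)
    x <- c(0, 0, 0.75, 1, 0.5, 0.8, 0.69)
    y <- c(0, 1, 1, 0.8, 0.7, 0.6, 0)
    idx <- earcut(cbind(x, y))   # vertex indices, one triangle per triple
    plot_ears(cbind(x, y), idx)  # visualize the triangulation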
decision Statistical Decision Analysis
Contains a function called dmur() which accepts four parameters: possible values, probabilities of the values, selling cost and preparation cost. The dmur() function generates various numeric decision parameters like MEMV (maximum (optimum) expected monetary value), best choice, EPPI (expected profit with perfect information), EVPI (expected value of perfect information) and EOL (expected opportunity loss), which facilitate effective decision-making.
DecisionCurve Calculate and Plot Decision Curves
Decision curves are a useful tool to evaluate the population impact of adopting a risk prediction instrument into clinical practice. Given one or more instruments (risk models) that predict the probability of a binary outcome, this package calculates and plots decision curves, which display estimates of the standardized net benefit by the probability threshold used to categorize observations as ‘high risk.’ Curves can be estimated using data from an observational cohort, or from case-control studies when an estimate of the population outcome prevalence is available. Confidence intervals calculated using the bootstrap can be displayed and a wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.
decisionSupport Quantitative Support of Decision Making under Uncertainty
Supporting the quantitative analysis of binary welfare based decision making processes using Monte Carlo simulations. Decision support is given on two levels: (i) The actual decision level is to choose between two alternatives under probabilistic uncertainty. This package calculates the optimal decision based on maximizing expected welfare. (ii) The meta decision level is to allocate resources to reduce the uncertainty in the underlying decision problem, i.e. to increase the current information to improve the actual decision making process. This problem is dealt with using the Value of Information Analysis. The Expected Value of Information for arbitrary prospective estimates can be calculated, as well as the Individual and Clustered Expected Value of Perfect Information. The probabilistic calculations are done via Monte Carlo simulations. This Monte Carlo functionality can be used on its own.
DeclareDesign Declare and Diagnose Research Designs
Researchers can characterize and learn about the properties of research designs before implementation using `DeclareDesign`. Ex ante declaration and diagnosis of designs can help researchers clarify the strengths and limitations of their designs and to improve their properties, and can help readers evaluate a research strategy prior to implementation and without access to results. It can also make it easier for designs to be shared, replicated, and critiqued.
decoder Decode Coded Variables to Plain Text (and Vice Versa)
Main function ‘decode’ is used to decode coded key values to plain text. Function ‘code’ can be used to code plain text to code if there is a 1:1 relation between the two. The concept relies on ‘keyvalue’ objects used for translation. There are several ‘keyvalue’ objects included in the areas of geographical regional codes, administrative health care unit codes, diagnosis codes et cetera, but it is also easy to extend the use by arbitrary code sets.
decomposedPSF Time Series Prediction with PSF and Decomposition Methods (EMD and EEMD)
Predict future values with hybrid methods combining Pattern Sequence based Forecasting (PSF), Autoregressive Integrated Moving Average (ARIMA), Empirical Mode Decomposition (EMD) and Ensemble Empirical Mode Decomposition (EEMD).
deconvolveR Empirical Bayes Estimation Strategies
Empirical Bayes methods for learning prior distributions from data. An unknown prior distribution (g) has yielded (unobservable) parameters, each of which produces a data point from a parametric exponential family (f). The goal is to estimate the unknown prior (‘g-modeling’) by deconvolution and Empirical Bayes methods.
DecorateR Fit and Deploy DECORATE Trees
DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) builds an ensemble of J48 trees by recursively adding artificial samples of the training data (‘Melville, P., & Mooney, R. J. (2005). Creating diversity in ensembles using artificial data. Information Fusion, 6(1), 99-111. <doi:10.1016/j.inffus.2004.04.001>’).
deductive Data Correction and Imputation Using Deductive Methods
Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.
deepboost Deep Boosting Ensemble Modeling
Provides training, evaluation, prediction and hyperparameter optimisation (via grid search and cross-validation) for deep boosting models. Based on Google’s Deep Boosting algorithm and Google’s C++ implementation. Cortes, C., Mohri, M., & Syed, U. (2014) <URL: http://…/icml2014c2_cortesb14>.
deeplearning An Implementation of Deep Neural Network for Regression and Classification
An implementation of deep neural networks with rectified linear units, trained with the stochastic gradient descent method and batch normalization. A combination of these methods has achieved state-of-the-art performance in ImageNet classification by overcoming the gradient saturation problem experienced by many deep architecture neural network models in the past. In addition, batch normalization and dropout are implemented as a means of regularization. The deeplearning package is inspired by the darch package and uses its class DArch.
<a href="deeplr Interface to the ‘DeepL’ Translation API
A wrapper for the ‘DeepL’ API (see <https://…/translator>), a web service that translates texts between different languages. Access to the API is subject to a monthly fee.
deepnet deep learning toolkit in R
Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoders and so on.
deepNN Deep Learning
Implementation of some Deep Learning methods. Includes multilayer perceptron, different activation functions, regularisation strategies, stochastic gradient descent and dropout. Thanks go to the following references for helping to inspire and develop the package: Ian Goodfellow, Yoshua Bengio, Aaron Courville, Francis Bach (2016, ISBN:978-0262035613) Deep Learning. Terrence J. Sejnowski (2018, ISBN:978-0262038034) The Deep Learning Revolution. Grant Sanderson (3blue1brown) <https://…st=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi> Neural Networks YouTube playlist. Michael A. Nielsen <http://…/> Neural Networks and Deep Learning.
default Change the Default Arguments in R Functions
A simple syntax to change the default values for function arguments, whether they are in packages or defined locally.
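A minimal sketch of the intended use (the assignment form follows the package README; untested):
    library(default)
    # make na.rm = TRUE the session-wide default for mean.default
    default(mean.default) <- list(na.rm = TRUE)
    mean(c(1, NA, 3))  # now returns 2 instead of NA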
define Create FDA-Style Data and Program Definitions
Creates a directory of archived files with a descriptive ‘PDF’ document at the root level (i.e. ‘define.pdf’) containing tables of definitions of data items and relative-path hyperlinks to the documented files. Converts file extensions to ‘txt’ per FDA expectations and converts ‘CSV’ files to ‘SAS’ Transport format. Relies on data item descriptors stored as per R package ‘spec’. See ‘package?define’. See also ‘?define’. Requires a compatible installation of ‘pdflatex’, e.g. <https://…/>.
deformula Integration of One-Dimensional Functions with Double Exponential Formulas
Numerical quadrature of functions of one variable over a finite or infinite interval with double exponential formulas.
dejaVu Multiple Imputation for Recurrent Events
Performs reference based multiple imputation of recurrent event data based on a negative binomial regression model, as described by Keene et al (2014) <doi:10.1002/pst.1624>.
DelayedEffect.Design Sample Size and Power Calculations using the APPLE and SEPPLE Methods
Provides sample size and power calculations when the treatment time-lag effect is present and the lag duration is homogeneous across individual subjects. The methods used are described in Xu, Z., Zhen, B., Park, Y., & Zhu, B. (2017) <doi:10.1002/sim.7157>.
DeLorean Estimates Pseudotimes for Single Cell Expression Data
Implements the DeLorean model (Reid & Wernisch (2016) <doi:10.1093/bioinformatics/btw372>) to estimate pseudotimes for single cell expression data. The DeLorean model uses a Gaussian process latent variable model to model uncertainty in the capture time of cross-sectional data.
delt Estimation of Multivariate Densities Using Adaptive Partitions
We implement methods for estimating multivariate densities. We include a discretized kernel estimator, an adaptive histogram (a greedy histogram and a CART-histogram), stagewise minimization, and bootstrap aggregation.
deming Deming, Theil-Sen and Passing-Bablok Regression
Generalized Deming regression, Theil-Sen regression and Passing-Bablok regression functions.
demu Optimal Design Emulators via Point Processes
Implements the Determinantal point process (DPP) based optimal design emulator described in Pratola, Lin and Craigmile (2018) <arXiv:1804.02089> for Gaussian process regression models. See <http://…/software> for more information and examples.
dendextend Extending R’s Dendrogram Functionality
Offers a set of functions for extending dendrogram objects in R, letting you visualize and compare trees of hierarchical clusterings. You can (1) adjust a tree’s graphical parameters – the color, size, type, etc. of its branches, nodes and labels – and (2) visually and statistically compare different dendrograms to one another.
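For example, a small sketch using base R’s built-in USArrests data (the set() interface is as documented in the package vignette):
    library(dendextend)
    dend <- as.dendrogram(hclust(dist(USArrests)))
    dend <- set(dend, "branches_k_color", k = 4)  # color branches by 4 clusters
    plot(dend)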
denoiSeq Differential Expression Analysis Using a Bottom-Up Model
Given count data from two conditions, it determines which transcripts are differentially expressed across the two conditions using Bayesian inference of the parameters of a bottom-up model for PCR amplification. This model is developed in Ndifon Wilfred, Hilah Gal, Eric Shifrut, Rina Aharoni, Nissan Yissachar, Nir Waysbort, Shlomit Reich Zeliger, Ruth Arnon, and Nir Friedman (2012), <http://…/15865.full>, and results in a distribution for the counts that is a superposition of the binomial and negative binomial distribution.
denoiseR Regularized low rank matrix estimation
Methods for regularized low-rank matrix estimation, e.g. for denoising data.
denseFLMM Functional Linear Mixed Models for Densely Sampled Data
Estimation of functional linear mixed models for densely sampled data based on functional principal component analysis.
densityClust Clustering by fast search and find of density peaks
An implementation of the clustering algorithm described by Alex Rodriguez and Alessandro Laio (Science, 2014 vol. 344), along with tools to inspect and visualize the results.
DensParcorr Dens-Based Method for Partial Correlation Estimation in Large Scale Brain Networks
Provides a Dens-based method for estimating functional connectivity in large-scale brain networks using partial correlation.
densratio Density Ratio Estimation
Density ratio estimation. The estimated density ratio function can be used in many applications, such as inlier-based outlier detection and covariate shift adaptation.
DEoptim Global Optimization by Differential Evolution
Implements the differential evolution algorithm for global optimization of a real-valued function of a real-valued parameter vector.
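A minimal sketch minimizing the two-dimensional Rosenbrock function (control settings are illustrative):
    library(DEoptim)
    rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
    out <- DEoptim(rosenbrock, lower = c(-2, -2), upper = c(2, 2),
                   control = DEoptim.control(NP = 40, itermax = 200, trace = FALSE))
    out$optim$bestmem  # should approach c(1, 1)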
Dependency Logo
Plots dependency logos from a set of input sequences.
depmixS4 Dependent Mixture Models – Hidden Markov Models of GLMs and Other Distributions in S4
Fits latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models.
depth.plot Multivariate Analogy of Quantiles
Can be used to obtain spatial depths, spatial ranks and outliers of multivariate random variables, and to visualize DD-plots (a multivariate generalization of QQ-plots).
dequer An R ‘Deque’ Container
Offers a special data structure called a ‘deque’ (pronounced like ‘deck’), which is a list-like structure. However, unlike R’s list structure, data put into a ‘deque’ is not necessarily stored contiguously, making insertions and deletions at the front/end of the structure much faster. The implementation here is new and uses a doubly linked list, and hence does not rely on R’s environments. To avoid unnecessary data copying, most ‘deque’ operations are performed via side-effects.
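A small sketch of the side-effect style (function names per the package documentation; untested):
    library(dequer)
    d <- deque()
    push(d, 1)        # insert at the front
    pushback(d, 2)    # insert at the back
    pop(d)            # remove the front element
    length(d)         # one element remains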
DeRezende.Ferreira Zero Coupon Yield Curve Modelling
Modeling the zero coupon yield curve using the dynamic De Rezende and Ferreira (2011) <doi:10.1002/for.1256> five factor model with variable or fixed decaying parameters. For explanatory purposes, the package also includes various short datasets of interest rates for the BRICS countries.
DES Discrete Event Simulation
Discrete event simulation (DES) involves modeling of systems having discrete, i.e. abrupt, state changes. For instance, when a job arrives to a queue, the queue length abruptly increases by 1. This package is an R implementation of the event-oriented approach to DES; see the tutorial in Matloff (2008) <http://…/DESimIntro.pdf>.
desc Manipulate DESCRIPTION Files
Tools to read, write, create, and manipulate DESCRIPTION files. It is intended for packages that create or manipulate other packages.
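For instance, run inside a package source directory (field values are illustrative):
    library(desc)
    desc_get("Package")          # read a single field
    desc_get_deps()              # dependencies as a data frame
    desc_set(Version = "0.0.2")  # update a field in place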
DescribeDisplay An Interface to the ‘DescribeDisplay’ ‘GGobi’ Plugin
Produces publication-quality graphics from the output of the ‘GGobi’ describe display plugin.
describer Describe Data in R Using Common Descriptive Statistics
Allows users to quickly and easily describe data using common descriptive statistics.
descriptr Descriptive Statistics & Distributions Exploration
Generate descriptive statistics such as measures of location, dispersion, frequency tables, cross tables, group summaries and multiple one/two way tables. Visualize and compute percentiles/probabilities of normal, t, f, chi square and binomial distributions.
desctable Produce Descriptive and Comparative Tables Easily
Easily create descriptive and comparative tables. It makes use of and integrates directly with the tidyverse family of packages, and pipes. Tables are produced as data frames/lists of data frames for easy manipulation after creation, ready to be saved as csv, or piped to DT::datatable() or pander::pander() to integrate into reports.
DescToolsAddIns Some Functions to be Used as Shortcuts in RStudio
RStudio recently added the option to define add-ins and assign shortcuts to them. This package contains add-ins for some of the functions most used in an analyst’s (at least the author’s) daily work (like str(), example(), plot(), head(), view(), Desc()). Most of these functions take the current selection in RStudio’s editor window and send the specific command to the console, executing it instantly. Assigning shortcuts to these add-ins will spare you quite a few keystrokes.
designGLMM Finding Optimal Block Designs for a Generalised Linear Mixed Model
Use simulated annealing to find optimal designs for Poisson regression models with blocks.
DesignLibrary Library of Research Designs
A simple interface to build designs using the package ‘DeclareDesign’. In one line of code, users can specify the parameters of individual designs and diagnose their properties. The designers can also be used to compare the performance of a given design across a range of combinations of parameters, such as effect size, sample size, and assignment probabilities.
deSolve General Solvers for Initial Value Problems of Ordinary Differential Equations (ODE), Partial Differential Equations (PDE), Differential Algebraic Equations (DAE), and Delay Differential Equations (DDE)
Functions that solve initial value problems of a system of first-order ordinary differential equations (ODE), of partial differential equations (PDE), of differential algebraic equations (DAE), and of delay differential equations. The functions provide an interface to the FORTRAN functions lsoda, lsodar, lsode, lsodes of the ODEPACK collection, to the FORTRAN functions dvode and daspk and a C-implementation of solvers of the Runge-Kutta family with fixed or variable time steps. The package contains routines designed for solving ODEs resulting from 1-D, 2-D and 3-D partial differential equations (PDE) that have been converted to ODEs by numerical differencing.
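A minimal sketch solving the logistic growth ODE dN/dt = r N (1 - N/K) (parameter values illustrative):
    library(deSolve)
    logistic <- function(t, state, parms) {
      with(as.list(c(state, parms)), list(r * N * (1 - N / K)))
    }
    out <- ode(y = c(N = 1), times = seq(0, 20, by = 0.5),
               func = logistic, parms = c(r = 0.5, K = 100))
    head(out)  # matrix of time and N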
DESP Estimation of Diagonal Elements of Sparse Precision-Matrices
Several estimators of the diagonal elements of a sparse precision (inverse covariance) matrix from a sample of Gaussian vectors for a given matrix of estimated marginal regression coefficients. To install the ‘gurobi’ package, follow the instructions at http://…/gurobi-optimizer and http://…/r_api_overview.html.
desplot Plotting Field Plans for Agricultural Experiments
A function for plotting maps of agricultural field experiments that are laid out in grids.
detector Detect Data Containing Personally Identifiable Information
Allows users to quickly and easily detect data containing Personally Identifiable Information (PII) through convenience functions.
DetMCD DetMCD Algorithm (Robust and Deterministic Estimation of Location and Scatter)
DetMCD is a new algorithm for robust and deterministic estimation of location and scatter. The benefits of robust and deterministic estimation are explained in Hubert, M., Rousseeuw, P.J. and Verdonck, T. (2012), ‘A deterministic algorithm for robust location and scatter’, Journal of Computational and Graphical Statistics, Volume 21, Number 3, Pages 618-637.
detpack Density Estimation and Random Number Generation with Distribution Element Trees
Density estimation for possibly large data sets and conditional/unconditional random number generation with distribution element trees. For more details on distribution element trees see: Meyer, D.W. (2016) <arXiv:1610.00345> or Meyer, D.W., Statistics and Computing (2017) <doi:10.1007/s11222-017-9751-9> and Meyer, D.W. (2017) <arXiv:1711.04632>.
DetR Suite of Deterministic and Robust Algorithms for Linear Regression
DetLTS, DetMM (and DetS) Algorithms for Deterministic, Robust Linear Regression.
detrendr Detrend Images
Image series affected by bleaching must be corrected by ‘detrending’ prior to the performance of quantitative analysis. ‘detrendr’ is for correctly detrending images. It uses Nolan’s algorithm (Nolan et al., 2017 <doi:10.1093/bioinformatics/btx434>).
devEMF EMF Graphics Output Device
Output graphics to EMF (enhanced metafile).
devFunc Clear and Condense Argument Check for User-Defined Functions
A concise check of the format of one or multiple input arguments (data type, length or value) is provided. Since multiple input arguments can be tested simultaneously, a lengthy list of checks at the beginning of your function can be avoided, thereby enhancing the readability and maintainability of your code.
DEVis A Differential Expression Analysis Toolkit for Visual Analytics and Data Aggregation
Differential expression analysis tools for data aggregation, visualization, exploratory analysis, and project organization.
devtools Tools to Make Developing R Packages Easier
Collection of package development tools.
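A typical iteration loop inside a package source directory looks like:
    library(devtools)
    load_all()   # load (or reload) the package code
    document()   # regenerate man pages via 'roxygen2'
    test()       # run the test suite
    check()      # full R CMD check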
dexter Data Management and Analysis of Tests
A system for the management, assessment, and psychometric analysis of data from educational and psychological tests. Developed at Cito, The Netherlands, with subsidy from the Dutch Ministry of Education, Culture, and Science.
dextergui A Graphic User Interface to Dexter
A graphical user interface for dexter. Offers Classical Test and Item analysis, Item Response analysis and data management for educational and psychological tests.
dexterMST CML Calibration of Multi Stage Tests
Conditional Maximum Likelihood Calibration and data management of multistage tests. Functions for calibration of the Extended Nominal Response and the Interaction models, DIF and profile analysis. See Robert J. Zwitser and Gunter Maris (2015)<doi:10.1007/s11336-013-9369-6>.
dfCompare Compare Two Dataframes and Return Adds, Changes, and Deletes
Compares two dataframes with a common key and returns the delta records. The package will return three dataframes that contain the added, changed, and deleted records.
dfConn Dynamic Functional Connectivity Analysis
An implementation of the multivariate linear process bootstrap (MLPB) method and sliding window technique to assess the dynamic functional connectivity (dFC) estimate by providing its confidence bands, based on Maria Kudela (2017) <doi: 10.1016/j.neuroimage.2017.01.056>. It also integrates features to visualize non-zero coverage for selected a-priori regions of interest estimated by the dynamic functional connectivity model (dFCM), and dFC curves for reward-related a-priori regions of interest reported by activation-based analysis.
dfmeta Meta-Analysis of Phase I Dose-Finding Early Clinical Trials
Meta-analysis approaches for Phase I dose-finding early phase clinical trials, in order to better suit requirements in terms of maximum tolerated dose (MTD) and maximal dose regimen (MDR). This package currently has three different approaches: (a) an approach proposed by Zohar et al., 2011, <doi:10.1002/sim.4121> (denoted as ZKO), (b) the Variance Weighted pooling analysis (called VarWT) and (c) the Random Effects Model Based (REMB) algorithm, where the user can input his/her own model-based approach or use the existing random effect logistic regression model (named glimem) through the ‘dfmeta’ package.
dfphase1 Phase I Control Charts (with Emphasis on Distribution-Free Methods)
Statistical methods for retrospectively detecting changes in location and/or dispersion of univariate and multivariate variables. Data can be individual (one observation at each instant of time) or subgrouped (more than one observation at each instant of time). Control limits are computed, often using a permutation approach, so that a prescribed false alarm probability is guaranteed without making any parametric assumptions on the stable (in-control) distribution.
dga Capture-Recapture Estimation using Bayesian Model Averaging
Performs Bayesian model averaging for capture-recapture. This includes code to stratify records, check the strata for suitable overlap to be used for capture-recapture, and some functions to plot the estimated population size.
dGAselID Genetic Algorithm with Incomplete Dominance for Feature Selection
Feature selection from high dimensional data using a diploid genetic algorithm with Incomplete Dominance for genotype to phenotype mapping and Random Assortment of chromosomes approach to recombination.
dggridR Discrete Global Grids for R
Spatial analyses involving binning require that every bin have the same area, but this is impossible using a rectangular grid laid over the Earth or over any projection of the Earth. Discrete global grids use hexagons, triangles, and diamonds to overcome this issue, overlaying the Earth with equally-sized bins. This package provides utilities for working with discrete global grids, along with utilities to aid in plotting such data.
DGM Dynamic Graphical Models
Dynamic graphical models for multivariate time series data to estimate directed dynamic networks in functional magnetic resonance imaging (fMRI), see Schwab et al. (2017) <doi:10.1101/198887>.
dgo Dynamic Estimation of Group-Level Opinion
Fit dynamic group-level IRT and MRP models from individual or aggregated item response data. This package handles common preprocessing tasks and extends functions for inspecting results, poststratification, and quick iteration over alternative models.
DGVM3D 3D Forest Simulation Visualization Tool
This is a visualization tool for vegetation structure/succession in space and/or time mainly for forest gap models. However, it could also be used to visualize observed forest stands. If used for models, they should contain either individual trees or cohorts (e.g. LPJ-GUESS by Smith et al. (2014) <doi:10.5194/bg-11-2027-2014>). For a list of required and additional data fields see the vignette.
DHARMa Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models
The ‘DHARMa’ package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals from fitted generalized linear mixed models. Currently supported are ‘lme4’, ‘glm’ (except quasi-distributions) and ‘lm’ model classes. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and spatial and temporal autocorrelation.
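A minimal sketch (the model and data ‘m’ and ‘dat’ are hypothetical; function names per the package vignette):
    library(DHARMa)
    m <- glm(y ~ x, family = poisson, data = dat)      # some fitted model
    res <- simulateResiduals(fittedModel = m, n = 250)
    plot(res)            # QQ plot plus residual-vs-predicted panel
    testDispersion(res)  # test for over/underdispersion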
dhglm Double Hierarchical Generalized Linear Models
Implements double hierarchical generalized linear models in which the mean, dispersion parameters for variance of random effects, and residual variance (overdispersion) can be further modeled as random-effect models.
dhh A Heavy-Headed Distribution
The density, cumulative distribution, quantiles, and i.i.d random variables of a heavy-headed distribution. For more information, please see the vignette.
dHSIC Independence Testing via Hilbert Schmidt Independence Criterion
Contains an implementation of the d-variable Hilbert Schmidt independence criterion and several hypothesis tests based on it.
diagis Diagnostic Plot and Multivariate Summary Statistics of Weighted Samples from Importance Sampling
Fast functions for effective sample size, weighted multivariate mean and variance computation, and weight diagnostic plot for generic importance sampling type results.
diagmeta Meta-Analysis of Diagnostic Accuracy Studies with Several Cutpoints
Provides methods by Steinhauser et al. (2016) <DOI:10.1186/s12874-016-0196-1> for meta-analysis of diagnostic accuracy studies with several cutpoints.
diagonals Block Diagonal Extraction or Replacement
Several tools for handling block-matrix diagonals and similar constructs are implemented. Block-diagonal matrices can be extracted or removed using two small functions implemented here. In addition, non-square matrices are supported. Block diagonal matrices occur when two dimensions of a data set are combined along one edge of a matrix. For example, trade-flow data in the ‘decompr’ and ‘gvc’ packages have each country-industry combination occur along both edges of the matrix.
DiagrammeR Create diagrams and flowcharts using R
Create diagrams and flowcharts using R.
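For example, a Graphviz diagram rendered from a DOT description:
    library(DiagrammeR)
    grViz("
      digraph pipeline {
        node [shape = box]
        Raw -> Clean -> Model -> Report
      }
    ")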
DiallelAnalysisR Diallel Analysis with R
Performs Diallel Analysis with R using Griffing’s and Hayman’s approaches. Four different methods (1: Method-I (Parents + F1’s + reciprocals); 2: Method-II (Parents and one set of F1’s); 3: Method-III (One set of F1’s and reciprocals); 4: Method-IV (One set of F1’s only)) and two methods (1: Fixed Effects Model; 2: Random Effects Model) can be applied using Griffing’s approach.
dialrjars Required ‘libphonenumber’ jars for the ‘dialr’ Package
Collects ‘libphonenumber’ jars required for the ‘dialr’ package.
diceR Diverse Cluster Ensemble in R
Performs cluster analysis using an ensemble clustering framework. Results from a diverse set of algorithms are pooled together using methods such as majority voting, K-Modes, LinkCluE, and CSPA. There are options to compare cluster assignments across algorithms using internal and external indices, visualizations such as heatmaps, and significance testing for the existence of clusters.
DiceView Plot Methods for Computer Experiments Design and Surrogate
View 2D/3D sections or contours of computer experiments designs, surrogates or test functions.
dichromat Color Schemes for Dichromats
Collapse red-green or green-blue distinctions to simulate the effects of different types of color-blindness.
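For example, simulating how a palette appears under deuteranopia:
    library(dichromat)
    pal <- c("#FF0000", "#00FF00", "#0000FF")
    dichromat(pal, type = "deutan")  # red-green colour-blind approximation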
DIconvex Finding Patterns of Monotonicity and Convexity in Data
Given an initial set of points, this package minimizes the number of elements to discard from this set such that there exists at least one monotonic and convex mapping within pre-specified upper and lower bounds.
did Treatment Effects with Multiple Periods and Groups
The standard Difference-in-Differences (DID) setup involves two periods and two groups — a treated group and untreated group. Many applications of DID methods involve more than two periods and have individuals that are treated at different points in time. This package contains tools for computing average treatment effect parameters in Difference in Differences models with more than two periods and with variation in treatment timing using the methods developed in Callaway and Sant’Anna (2018) <https://ssrn.com/abstract=3148250>. The main parameters are group-time average treatment effects, which are the average treatment effect for a particular group at a particular time. These can be aggregated into a smaller number of treatment effect parameters, and the package deals with the cases where there is selective treatment timing, dynamic treatment effects, calendar time effects, or combinations of these. There are also functions for testing the Difference in Differences assumption, and plotting group-time average treatment effects.
DidacticBoost A Simple Implementation and Demonstration of Gradient Boosting
A basic, clear implementation of tree-based gradient boosting designed to illustrate the core operation of boosting models. Tuning parameters (such as stochastic subsampling, modified learning rate, or regularization) are not implemented. The only adjustable parameter is the number of training rounds. If you are looking for a high performance boosting implementation with tuning parameters, consider the ‘xgboost’ package.
didrooRFM Compute Recency Frequency Monetary Scores for your Customer Data
This hosts the findRFM function which generates RFM scores on a 1-5 point scale for customer transaction data. The function consumes a data frame with Transaction Number, Customer ID, Date of Purchase (in date format) and Amount of Purchase as the attributes. The function returns a data frame with RFM data for the sales information.
dief Metrics for Continuous Efficiency
An implementation of the metrics dief@t and dief@k to measure the diefficiency (or continuous efficiency) of incremental approaches, see Acosta, M., Vidal, M. E., & Sure-Vetter, Y. (2017) <doi:10.1007/978-3-319-68204-4_1>. The metrics dief@t and dief@k allow for measuring the diefficiency during an elapsed time period t or while k answers are produced, respectively. dief@t and dief@k rely on the computation of the area under the curve of answer traces, thus capturing the answer rate concentration over a time interval.
diezeit R Interface to the ZEIT ONLINE Content API
A wrapper for the ZEIT ONLINE Content API, available at <http://developer.zeit.de>. ‘diezeit’ gives access to articles and corresponding metadata from the ZEIT archive and from ZEIT ONLINE. A personal API key is required for usage.
DIFboost Detection of Differential Item Functioning (DIF) in Rasch Models by Boosting Techniques
Performs detection of Differential Item Functioning using the method DIFboost as proposed in Schauberger and Tutz (2015): Detection of Differential item functioning in Rasch models by boosting techniques, British Journal of Mathematical and Statistical Psychology.
difconet Differential Coexpressed Networks
Estimation of DIFferential COexpressed NETworks using diverse and user-supplied metrics. This package is basically used for three functions related to the estimation of differential coexpression. First, to estimate differential coexpression, where the coexpression is estimated, by default, by Spearman correlation. For this, a metric to compare two correlation distributions is needed. The package includes 6 metrics, some of which need a threshold. A new metric can also be specified as a user function with specific parameters (see difconet.run). The significance is estimated by permutations. Second, to generate datasets with controlled differential correlation data. This is done by either adding noise, or adding specific correlation structure. Third, to show the results of differential correlation analyses. Please see <http://…/difconet> for further information.
Difdtl Difference of Two Precision Matrices Estimation
Difference of two precision matrices is estimated by the d-trace loss with lasso penalty, given two sample classes.
diffdf Dataframe Difference Tool
Functions for comparing two data.frames against each other. The core functionality is to provide a detailed breakdown of any differences between two data.frames as well as providing utility functions to help narrow down the source of problems and differences.
diffee Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure
This is an R implementation of Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure (DIFFEE). The DIFFEE algorithm can be used to quickly estimate the differential network between two related datasets. For instance, it can identify the differential gene network from datasets of case and control. By performing data-driven network inference from two high-dimensional data sets, this tool can help users effectively translate two aggregated data blocks into knowledge of the changes among entities between two Gaussian Graphical Models. Please run demo(diffeeDemo) to learn the basic functions provided by this package. For further details, please read the original paper: Beilun Wang, Arshdeep Sekhon, Yanjun Qi (2018) <arXiv:1710.11223>.
diffeqr Solving Differential Equations (ODEs, SDEs, DDEs, DAEs)
An interface to ‘DifferentialEquations.jl’ from the R programming language. It has unique high performance methods for solving ordinary differential equations (ODE), stochastic differential equations (SDE), delay differential equations (DDE), differential-algebraic equations (DAE), and more. Much of the functionality, including features like adaptive time stepping in SDEs, are unique and allow for multiple orders of magnitude speedup over more common methods. ‘diffeqr’ attaches an R interface onto the package, allowing seamless use of this tooling by R users.
diffobj Diffs for R Objects
Generate a colorized diff of two R objects for an intuitive visualization of their differences.
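For instance:
    library(diffobj)
    diffPrint(letters[1:5], letters[2:6])  # colorized, element-aligned diff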
diffpriv Easy Differential Privacy
An implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006) <doi:10.1007/11681878_14>. Example mechanisms include the Laplace mechanism for releasing numeric aggregates, and the exponential mechanism for releasing set elements. A sensitivity sampler (Rubinstein & Alda, 2017) <arXiv:1706.02562> permits sampling target non-private function sensitivity; combined with the generic mechanisms, it permits turn-key privatization of arbitrary programs.
diffrprojects Projects for Text Version Comparison and Analytics in R
Provides data structures and methods for measuring, coding, and analysing text within text corpora. The package allows for manual as well as computer-aided coding at the character, token and text pair level.
diffrprojectswidget Visualization for ‘diffrprojects’
Interactive visualizations and tabulations for diffrprojects. All presentations are based on the htmlwidgets framework allowing for interactivity via HTML and Javascript, Rstudio viewer integration, RMarkdown integration, as well as Shiny compatibility.
diffusion Forecast the Diffusion of New Products
Various diffusion models to forecast new product growth. Currently the package contains Bass, Gompertz and Gamma/Shifted Gompertz curves. See Meade and Islam (2006) <doi:10.1016/j.ijforecast.2006.01.005>.
diffusionMap Diffusion Map
Implements diffusion map method of data parametrization, including creation and visualization of diffusion map, clustering with diffusion K-means and regression using adaptive regression model.
diffusr Network Diffusion Algorithms
Implementation of network diffusion algorithms such as insulated heat propagation or Markov random walks. Network diffusion algorithms generally spread information in the form of node weights along the edges of a graph to other nodes. These weights can for example be interpreted as temperature, an initial amount of water, the activation of neurons in the brain, or the location of a random surfer in the internet. The information (node weights) is iteratively propagated to other nodes until an equilibrium state is reached or a stop criterion occurs.
difNLR Detection of Dichotomous Differential Item Functioning (DIF) by Non-Linear Regression Function
Detection of differential item functioning among dichotomously scored items with non-linear regression procedure.
difR Collection of methods to detect dichotomous differential item functioning (DIF) in psychometrics
The difR package contains several traditional methods to detect DIF in dichotomously scored items. Both uniform and non-uniform DIF effects can be detected, with methods relying upon item response models or not. Some methods deal with more than one focal group.
digest Create Cryptographic Hash Digests of R Objects
Implementation of a function ‘digest()’ for the creation of hash digests of arbitrary R objects (using the md5, sha-1, sha-256, crc32, xxhash and murmurhash algorithms) permitting easy comparison of R language objects, as well as a function ‘hmac()’ to create hash-based message authentication code. The md5 algorithm by Ron Rivest is specified in RFC 1321, the sha-1 and sha-256 algorithms are specified in FIPS-180-1 and FIPS-180-2, and the crc32 algorithm is described in ftp://ftp.rocksoft.com/cliens/rocksoft/papers/crc_v3.txt. For md5, sha-1, sha-256 and aes, this package uses small standalone implementations that were provided by Christophe Devine. For crc32, code from the zlib library is used. For sha-512, an implementation by Aaron D. Gifford is used. For xxHash, the implementation by Yann Collet is used. For murmurhash, an implementation by Shane Day is used. Please note that this package is not meant to be deployed for cryptographic purposes for which more comprehensive (and widely tested) libraries such as OpenSSL should be used.
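A minimal sketch:
    library(digest)
    digest("hello world", algo = "sha256")  # hash a string
    digest(mtcars)                          # md5 (the default) of any R object
    hmac("secret-key", "message", algo = "sha1")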
digitize Use Data from Published Plots in R
Import data from a digital image; it requires user input for calibration and to locate the data points. The end result is similar to ‘DataThief’ and other programs that ‘digitize’ published plots or graphs.
DIMORA Diffusion Models R Analysis
The implemented methods are: the Bass Standard model, the Bass Generalized model (with rectangular shock, exponential shock, mixed shock and harmonic shock; you can choose to add from 1 to 3 shocks), the Guseo-Guidolin model and the Variable Potential Market model. The Bass model consists of a simple differential equation that describes the process of how new products get adopted in a population; the Generalized Bass model is a generalization of the Bass model in which there is a ‘carrier’ function x(t) that allows the speed of time sliding to change. In some real processes the reachable potential of the resource available in a temporal instant may appear not to be constant over time; for these we use the Variable Potential Market model, in which the Guseo-Guidolin model has a particular specification for the market function.
dimple dimple charts for R
The aim of dimple is to open up the power and flexibility of d3 to analysts. It aims to give a gentle learning curve and minimal code to achieve something productive. It also exposes the d3 objects so you can pick them up and run to create some really cool stuff.
dimRed A Framework for Dimensionality Reduction
A collection of dimensionality reduction techniques from R packages with a common interface for calling the methods.
dineq Decomposition of (Income) Inequality
Decomposition of (income) inequality by population sub groups. For a decomposition on a single variable the mean log deviation can be used (see Mookherjee Shorrocks (1982) <DOI:10.2307/2232673>). For a decomposition on multiple variables a regression based technique can be used (see Fields (2003) <DOI:10.1016/s0147-9121(03)22001-x>). Recentered influence function regression for marginal effects of the (income or wealth) distribution (see Firpo et al. (2009) <DOI:10.3982/ECTA6822>). Some extensions to inequality functions to handle weights and/or missings.
dint A Toolkit for Year-Quarter and Year-Month Dates
S3 classes and methods to create and work with year-quarter and year-month vectors. Basic arithmetic operations (such as adding and subtracting) are supported, as well as formatting and converting to and from standard R Date types.
DirectedClustering Directed Weighted Clustering Coefficient
Allows the computation of clustering coefficients for directed and weighted networks using different approaches, including clustering coefficients that are not present in the ‘igraph’ package. A description of clustering coefficients can be found in ‘Directed clustering in weighted networks: a new perspective’, Clemente, G.P., Grassi, R. (2017), <doi:10.1016/j.chaos.2017.12.007>.
DirectEffects Estimating Controlled Direct Effects for Explaining Causal Findings
A set of functions to estimate the controlled direct effect of treatment fixing a potential mediator to a specific value. Implements the sequential g-estimation estimator described in Vansteelandt (2009) <doi:10.1097/EDE.0b013e3181b6f4c9> and Acharya, Blackwell, and Sen (2016) <doi:10.1017/S0003055416000216>.
Directional Directional Statistics
A collection of R functions for directional data analysis.
directotree Creates an Interactive Tree Structure of a Directory
Represents the content of a directory as an interactive collapsible tree. Offers the possibility to assign a text (e.g., a ‘Readme.txt’) to each folder (represented as a clickable node), so that when the user hovers the pointer over a node, the corresponding text is displayed as a tooltip.
DirectStandardisation Adjusted Means and Proportions by Direct Standardisation
Calculates adjusted means and proportions of a variable, by groups defined by another variable, using direct standardisation, standardised to the structure of the dataset.
dirichletprocess Build Dirichlet Process Objects for Bayesian Modelling
Create Dirichlet process objects that can be used as infinite mixture models in a variety of ways. Some examples include: density estimation, Poisson process intensity inference, hierarchical modelling and clustering. See Teh, Y. W. (2011) <https://…/Teh2010a.pdf>, among many other sources.
dirmcmc Directional Metropolis Hastings Algorithm
Implementation of Directional Metropolis Hastings Algorithm for MCMC.
discfrail Cox Models for Time-to-Event Data with Nonparametric Discrete Group-Specific Frailties
Functions for fitting Cox proportional hazards models for grouped time-to-event data, where the shared group-specific frailties have a discrete nonparametric distribution. The methods implemented in the package are described by Gasperoni, F., Ieva, F., Paganoni, A. M., Jackson, C. H., Sharples, L. (2018) <doi:10.1093/biostatistics/kxy071>. There are also functions for simulating from these models, with a nonparametric or a parametric baseline hazard function.
discord Functions for Discordant Kinship Modeling
Functions for discordant kinship modeling (and other sibling-based quasi-experimental designs). Currently, the package contains data restructuring functions and functions for generating genetically- and environmentally-informed data for kin pairs.
discoveR Exploratory Data Analysis System
Performs an exploratory data analysis through a shiny interface. It includes basic methods such as the mean, median, mode, normality test, among others. It also includes clustering techniques such as Principal Components Analysis, Hierarchical Clustering and the K-means method.
discreteRV Create and Manipulate Discrete Random Variables
Create, manipulate, transform, and simulate from discrete random variables. The syntax is modeled after that which is used in mathematical statistics and probability courses, but with powerful support for more advanced probability calculations. This includes the creation of joint random variables, and the derivation and manipulation of their conditional and marginal distributions.
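A small sketch following the syntax in the package’s R Journal article (untested):
    library(discreteRV)
    X <- RV(outcomes = 1:6, probs = rep(1/6, 6))  # a fair die
    P(X > 4)               # 1/3
    E(X)                   # 3.5
    S <- SofIID(X, n = 2)  # distribution of the sum of two rolls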
DisimForMixed Calculate Dissimilarity Matrix for Dataset with Mixed Attributes
Implements the methods proposed by Ahmad & Dey (2007) <doi:10.1016/j.datak.2007.03.016> for calculating the dissimilarity matrix in the presence of mixed attributes. This package includes functions to discretize quantitative variables, calculate the conditional probability for each pair of attribute values, the distance between every pair of attribute values, and the significance of attributes, and to calculate the dissimilarity between each pair of objects.
disparityfilter Disparity Filter Algorithm of Weighted Network
Disparity filter is a network reduction algorithm to extract the backbone structure of both directed and undirected weighted networks. Disparity filter can reduce the network without destroying the multi-scale nature of the network. The algorithm has been developed by M. Angeles Serrano, Marian Boguna, and Alessandro Vespignani in Extracting the multiscale backbone of complex weighted networks.
dispRity Measuring Disparity
A modular package for measuring disparity from multidimensional matrices. Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several basic statistical tests for disparity analysis.
dissever Spatial Downscaling using the Dissever Algorithm
Spatial downscaling of coarse grid mapping to fine grid mapping using predictive covariates and a model fitted using the ‘caret’ package. The original dissever algorithm was published by Malone et al. (2012) <doi:10.1016/j.cageo.2011.08.021>, and extended by Roudier et al. (2017) <doi:10.1016/j.compag.2017.08.021>.
DiSSMod Fitting Sample Selection Models for Discrete Response Variables
Tools to fit sample selection models for discrete response variables, through a parametric formulation which represents a natural extension of the well-known Heckman selection model. The response variable can be of Bernoulli, Poisson or Negative Binomial type. The sample selection mechanism allows choosing among a Normal, Logistic or Gumbel distribution.
distance.sample.size Calculates Study Size Required for Distance Sampling
Calculates the study size (either number of detections, or proportion of region that should be covered) to achieve a target precision for the estimated abundance. The calculation allows for the penalty due to unknown detection function, and for overdispersion. The user must specify a guess at the true detection function.
distances Tools for Distances and Metrics
Provides tools for constructing, manipulating and using distance metrics.
distcomp Distributed Computations
Distcomp, a new R package available on GitHub from a group of Stanford researchers, has the potential to significantly advance the practice of collaborative computing with large data sets distributed over separate sites that may be unwilling to explicitly share data. The fundamental idea is to be able to rapidly set up a web service based on Shiny and opencpu technology that manages and performs a series of master / slave computations which require sharing only intermediate results. The particular target application for distcomp is any group of medical researchers who would like to fit a statistical model using the data from several data sets, but face daunting difficulties with data aggregation or are constrained by privacy concerns. Distcomp and its methodology, however, ought to be of interest to any organization with data spread across multiple heterogeneous database environments.
distcrete Discrete Distribution Approximations
Creates discretised versions of continuous distribution functions by mapping continuous values to an underlying discrete grid, based on a (uniform) frequency of discretisation, a valid discretisation point, and an integration range. For a review of discretisation methods, see Chakraborty (2015) <doi:10.1186/s40488-015-0028-6>.
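A minimal sketch, assuming the returned object exposes d/p/q/r components as in the package README (untested; parameter values illustrative):
    library(distcrete)
    d <- distcrete("gamma", interval = 1, shape = 3, rate = 0.5)
    d$d(0:5)  # probability mass on the grid points 0..5
    d$r(10)   # 10 random draws from the discretised distribution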
distdichoR Distributional Method for the Dichotomisation of Continuous Outcomes
Contains a range of functions covering the present development of the distributional method for the dichotomisation of continuous outcomes. The method provides estimates with standard error of a comparison of proportions (difference, odds ratio and risk ratio) derived, with similar precision, from a comparison of means. See the URL below or <arXiv:1809.03279> for more information.
distill ‘R Markdown’ Format for Scientific and Technical Writing
Scientific and technical article format for the web. ‘Distill’ articles feature attractive, reader-friendly typography, flexible layout options for visualizations, and full support for footnotes and citations.
disto Unified Interface to Distance, Dissimilarity, Similarity Matrices
Provides a high level API to interface over sources storing distance, dissimilarity, similarity matrices with matrix style extraction, replacement and other utilities. Currently, in-memory dist object backend is supported.
distreg.vis Framework for the Visualization of Distributional Regression Models
Functions for visualizing distributional regression models fitted using the ‘gamlss’ or ‘bamlss’ R package. The core of the package consists of a ‘shiny’ application, where the model results can be interactively explored and visualized.
DISTRIB Four Essential Functions for Statistical Distributions Analysis: A New Functional Approach
A different way of calculating the pdf/pmf, cdf, quantiles and random data, in which the user supplies the name of the distribution as an argument, so the distribution can easily be changed by changing that argument. The core computations of package ‘DISTRIB’ are based on package ‘stats’. Although similar functions were introduced previously in package ‘stats’, package ‘DISTRIB’ has some special applications in certain computational programs.
distrr Estimate and Manage Empirical Distributions
Tools to estimate and manage empirical distributions, which should work with survey data. One of the main features is the possibility to create data cubes of estimated statistics, that include all the combinations of the variables of interest (see for example functions dcc5() and dcc6()).
disttools Distance Object Manipulation Tools
Provides convenient methods for accessing the data in ‘dist’ objects with minimal memory and computational overhead. ‘disttools’ can be used to extract the distance between any pair or combination of points encoded by a ‘dist’ object using only the indices of those points. This is an improvement over existing functionality, which requires either coercing a ‘dist’ object into a matrix or calculating the one dimensional index corresponding to a pair of observations. Coercion to a matrix is undesirable because doing so doubles the amount of memory required for storage. In contrast, there is no inherent downside to the latter solution. However, in part due to several edge cases, correctly and efficiently implementing such a solution can be challenging. ‘disttools’ abstracts away these challenges and provides a simple interface to access the data in a ‘dist’ object using the latter approach.
dixonTest Dixon’s Ratio Test for Outlier Detection
For outlier detection in small and normally distributed samples the ratio test of Dixon (Q-test) can be used. Density, distribution function, quantile function and random generation for Dixon’s ratio statistics are provided as wrapper functions. The core applies McBane’s Fortran functions <doi:10.18637/jss.v016.i03> that use Gaussian quadrature for a numerical solution.
DJL Distance Measure Based Judgment and Learning
Implements various decision support tools related to new product development. Subroutines include productivity evaluation using distance measures, benchmarking, risk analysis, technology adoption models, inverse optimization, etc.
dLagM Time Series Regression Models with Distributed Lag Models
Provides time series regression models with one predictor using finite distributed lag models, polynomial (Almon) distributed lag models, geometric distributed lag models with Koyck transformation, and autoregressive distributed lag models. It also consists of functions for computation of h-step ahead forecasts from these models. See Baltagi (2011) <doi:10.1007/978-3-642-20059-5> for more information.
DLASSO Implementation of Differentiable Lasso Penalty in Linear Models
An implementation of the differentiable lasso (dlasso) using iterative ridge algorithm. This package allows selecting the tuning parameter by AIC, BIC and GCV.
dlib Allow Access to the ‘Dlib’ C++ Library
Interface for ‘Rcpp’ users to ‘dlib’ <http://dlib.net> which is a ‘C++’ toolkit containing machine learning algorithms and computer vision tools. It is used in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments. This package allows R users to use ‘dlib’ through ‘Rcpp’.
dlm Bayesian and Likelihood Analysis of Dynamic Linear Models
Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
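For example, filtering and smoothing the Nile data with a local level model (variance values follow the standard textbook example):
    library(dlm)
    mod  <- dlmModPoly(order = 1, dV = 15100, dW = 1468)
    filt <- dlmFilter(Nile, mod)
    smth <- dlmSmooth(filt)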
dlookr Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnosis provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, the correlation of two variables, and the relationship between a target variable and predictors. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. It also creates automated reports that support these three tasks.
dlsem Distributed-Lag Structural Equation Modelling
Fit distributed-lag structural equation models and perform path analysis at different time lags.
dlstats Download Stats of R Packages
Monthly download stats of ‘CRAN’ and ‘Bioconductor’ packages. Download stats of ‘CRAN’ packages are from the ‘RStudio’ ‘CRAN mirror’, see <http://cranlogs.r-pkg.org>. ‘Bioconductor’ package download stats are at <https://…/>.
dmai Divisia Monetary Aggregates Index
Functions to calculate Divisia monetary aggregates index as given in Barnett, W. A. (1980) (<DOI:10.1016/0304-4076(80)90070-6>).
dml Distance Metric Learning in R
The state-of-the-art algorithms for distance metric learning, including global and local methods such as Relevant Component Analysis, Discriminative Component Analysis, Local Fisher Discriminant Analysis, etc. These distance metric learning methods are widely applied in feature extraction, dimensionality reduction, clustering, classification, information retrieval, and computer vision problems.
dmm Dyadic Mixed Model for Pedigree Data
Dyadic mixed model analysis with multi-trait responses and pedigree-based partitioning of individual variation into a range of environmental and genetic variance components for individual and maternal effects.
dMod Dynamic Modeling and Parameter Estimation in ODE Models
The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives.
DMRnet Delete or Merge Regressors Algorithms for Linear and Logistic Model Selection and High-Dimensional Data
Model selection algorithms for regression and classification, where the predictors can be numerical and categorical and the number of regressors exceeds the number of observations. The selected model consists of a subset of numerical regressors and partitions of levels of factors. Aleksandra Maj-Kańska, Piotr Pokarowski and Agnieszka Prochenka (2015) <doi:10.1214/15-EJS1050>. Piotr Pokarowski and Jan Mielniczuk (2015) <http://…/pokarowski15a.pdf>.
dmutate Mutate Data Frames with Random Variates
Work within the ‘dplyr’ workflow to add random variates to your data frame. Variates can be added at any level of an existing column. Also, bounds can be specified for simulated variates.
dnc Dynamic Network Clustering
Community detection for dynamic networks, i.e., networks measured repeatedly over a sequence of discrete time points, using a latent space approach.
dng Distributions and Gradients
Provides density, distribution function, quantile function and random generation for the split-t distribution, and computes the mean, variance, skewness and kurtosis for the split-t distribution (Li, F, Villani, M. and Kohn, R. (2010) <doi:10.1016/j.jspi.2010.04.031>).
DNLC Differential Network Local Consistency Analysis
Using Local Moran’s I for detection of differential network local consistency.
DNMF Discriminant Non-Negative Matrix Factorization
Discriminant Non-Negative Matrix Factorization aims to extend the Non-negative Matrix Factorization algorithm in order to extract features that enforce not only the spatial locality, but also the separability between classes in a discriminant manner. The algorithm is based on the article: Zafeiriou, Stefanos, et al. ‘Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification.’ Neural Networks, IEEE Transactions on 17.3 (2006): 683-695.
dnr Simulate Dynamic Networks using Exponential Random Graph Models (ERGM) Family
Functions are provided to fit temporal lag models to dynamic networks. The models are built on top of the exponential random graph model (ERGM) framework. There are functions for simulating or forecasting networks for future time points. Stable Multiple Time Step Simulation/Prediction from Lagged Dynamic Network Regression Models. Mallik, Almquist (2017, under review).
docker Wraps Docker Python SDK
Allows accessing ‘Docker’ ‘SDK’ from ‘R’ via the ‘Docker’ ‘Python’ ‘SDK’ using the ‘reticulate’ package. This is a very thin wrapper that tries to do very little and get out of the way. The user is expected to know how to use the ‘reticulate’ package to access ‘Python’ modules, and how the ‘Docker’ ‘Python’ ‘SDK’ works.
dockerfiler Easy Dockerfile Creation
Build a Dockerfile straight from your R session. ‘dockerfiler’ allows you to create step by step a Dockerfile, and provides convenient tools to wrap R code inside this Dockerfile.
docopulae Optimal Designs for Copula Models
A direct approach to optimal designs for copula models based on the Fisher information. Provides flexible functions for building joint PDFs, evaluating the Fisher information and finding Ds-optimal designs. It includes an extensible solution to summation and integration called ‘nint’, functions for transforming, plotting and comparing designs, as well as a set of tools for common low-level tasks.
docstring Provides Docstring Capabilities to R Functions
Provides the ability to display something analogous to Python’s docstrings within R. By allowing the user to document their functions as comments at the beginning of their function without requiring putting the function into a package we allow more users to easily provide documentation for their functions. The documentation can be viewed just like any other help files for functions provided by packages as well.
doctr Easily Check Data Consistency and Quality
A tool that helps you check the consistency and the quality of data. Like a real doctor, it has functions for examining, diagnosing and assessing the progress of its ‘patients’.
document Run ‘roxygen2’ on (Chunks of) Single Code Files
Have you ever been tempted to create ‘roxygen2’-style documentation comments for one of your functions that was not part of one of your packages (yet)? This is exactly what this package is about: running ‘roxygen2’ on (chunks of) a single code file.
documenter Documents Files
It is sometimes necessary to create documentation for all files in a directory. Doing so by hand can be very tedious. This task is made fast and reproducible using the functionality of ‘documenter’. It aggregates all text files in a directory and its subdirectories into a single word document in a semi-automated fashion.
docuSignr Connect to ‘DocuSign’ API
Connect to the ‘DocuSign’ Rest API <https://…/RESTAPIGuide.htm>, which supports embedded signing, and sending of documents.
docxtractr Extract Tables from Microsoft Word Documents with R
docxtractr is an R package for extracting tables from Word documents (docx). Microsoft Word docx files provide an XML structure that is fairly straightforward to navigate, especially when it applies to Word tables. The docxtractr package provides tools to determine table count and table structure, and to extract tables from Microsoft Word docx documents.
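A minimal sketch (‘report.docx’ is a hypothetical file):
    library(docxtractr)
    doc <- read_docx("report.docx")
    docx_tbl_count(doc)                          # how many tables the document holds
    tbl <- docx_extract_tbl(doc, tbl_number = 1) # first table as a data frame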
dodgr Distances on Directed Graphs
Distances on dual-weighted directed graphs using priority-queue shortest paths. Weighted directed graphs have weights from A to B which may differ from those from B to A. Dual-weighted directed graphs have two sets of such weights. A canonical example is a street network to be used for routing in which routes are calculated by weighting distances according to the type of way and mode of transport, yet lengths of routes must be calculated from direct distances.
DODR Detection of Differential Rhythmicity
Detects differences in rhythmic time series using linear least squares and the robust semi-parametric rfit() method. Differences in harmonic fitting can be detected, as well as differences in the scale of the noise distribution.
DoE.MIParray Creation of Arrays by Mixed Integer Programming
‘CRAN’ package ‘DoE.base’ and non-‘CRAN’ packages ‘gurobi’ and ‘Rmosek’ (a newer version than that on ‘CRAN’) are enhanced with functionality for the creation of optimized arrays for experimentation, where optimization is in terms of generalized minimum aberration. It is also possible to optimally extend existing arrays to larger run sizes. Optimization requires the availability of at least one of the commercial products ‘Gurobi’ or ‘Mosek’ (free academic licenses are available for both). For installing ‘Gurobi’ and its R package ‘gurobi’, follow the instructions at <http://…/gurobi-optimizer> and <http://…/r_api_overview.html>. For installing ‘Mosek’ and its R package ‘Rmosek’, follow the instructions at <https://…/> and <http://…/install-interface.html>.
DoEstRare Rare Variant Association Test Based on Position Density Estimation
Rare variant association test integrating variant position information. It aims to identify the presence of clusters of disease-risk variants in specific gene regions. For more details, please read the publication from Persyn et al. (2017) <doi:10.1371/journal.pone.0179364>.
doex One-Way Heteroscedastic ANOVA Tests
Contains several one-way heteroscedastic ANOVA tests such as the Alexander-Govern, Alvandi et al. Generalized F, Approximate F, Box F, Brown-Forsythe, B2, Cochran F, Fiducial Approach, Generalized F, Johansen F, Modified Brown-Forsythe, Modified Welch, One-Stage, One-Stage Range, Parametric Bootstrap, Permutation F, Scott-Smith, Welch and Welch-Aspin tests. These tests are used to test the equality of group means under unequal variances. Furthermore, a modified version of the Generalized F-test is provided to test the equality of non-normal group means under unequal variances. Tukey’s bi-square estimators, one-step Tukey’s bi-square estimators, Andrew’s wave estimators, one-step Andrew’s wave estimators and Huber’s M-estimators are used to modify the Generalized F-test.
doFuture Foreach Parallel Adaptor using the Future API of the ‘future’ Package
Provides a ‘%dopar%’ adaptor such that any type of futures can be used as backends for the ‘foreach’ framework.
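A minimal sketch: register the adaptor, pick any future backend, then use the usual ‘foreach’ syntax:
  library(doFuture)
  registerDoFuture()                  # make %dopar% dispatch to futures
  future::plan(future::multisession)  # e.g. parallel background R sessions
  foreach(i = 1:4, .combine = c) %dopar% sqrt(i)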
domaintools R Interface to the DomainTools API
The following functions are implemented (a usage sketch follows the list):
• domaintools_api_key: Get or set DOMAINTOOLS_API_KEY value
• domaintools_username: Get or set DOMAINTOOLS_API_USERNAME value
• domain_profile: Domain Profile
• hosting_history: Hosting History
• parsed_whois: Parsed Whois
• reverse_ip: Reverse IP
• reverse_ns: Reverse Nameserver
• shared_ips: Shared IPs
• whois: Whois Lookup
• whois_history: Whois History
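A minimal usage sketch built from the functions listed above, assuming credentials are supplied via the named environment variables (the values shown are placeholders):
  Sys.setenv(DOMAINTOOLS_API_USERNAME = "your_username",  # placeholder
             DOMAINTOOLS_API_KEY = "your_api_key")        # placeholder
  library(domaintools)
  whois("domaintools.com")            # Whois lookup
  hosting_history("domaintools.com")  # hosting history for the domain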
doMC Foreach parallel adaptor for the multicore package
Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package.
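A minimal sketch:
  library(doMC)
  registerDoMC(cores = 2)                     # register the multicore backend
  foreach(i = 1:4, .combine = c) %dopar% i^2  # runs in parallel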
doremi Dynamics of Return to Equilibrium During Multiple Inputs
Provides models to fit the dynamics of a regulated system experiencing exogenous inputs. The underlying models use differential equations and linear mixed-effects regressions to estimate the characteristic parameters of the equation (the coefficients) and an estimated signal. The package also provides print, summary, plot and predict functions specific to the model outputs.
dosearch Causal Effect Identification from Multiple Incomplete Data Sources
Identification of causal effects from arbitrary observational and experimental probability distributions via do-calculus and standard probability manipulations using a search-based algorithm. Allows for the presence of mechanisms related to selection bias (Bareinboim, E. and Tian, J. (2015) <http://…/r445.pdf> ), transportability (Bareinboim, E. and Pearl, J. (2014) <http://…/r443.pdf> ) and missing data (Mohan, K. and Pearl, J. and Tian., J. (2013) <http://…/r410.pdf> ).
DOT Render and Export DOT Graphs in R
Renders DOT diagram markup language in R and also provides the possibility to export the graphs in PostScript and SVG (Scalable Vector Graphics) formats. In addition, it supports literate programming packages such as ‘knitr’ and ‘rmarkdown’.
DoTC Distribution of Typicality Coefficients
Calculation of cluster typicality coefficients as being generated by fuzzy k-means clustering.
dotdot Enhanced Assignment Operator to Overwrite or Grow Objects
Use ‘..’ on the right hand side of the ‘:=’ operator as a shorthand for the left hand side, so that ‘var := f(..) + ..’ is equivalent to ‘var <- f(var) + var’. This permits the user to be explicit about growing an object or overwriting it using its previous value, avoids repeating a variable name, and saves keystrokes, time, visual space and cognitive load.
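A short sketch of the operator:
  library(dotdot)
  x <- c(3, 1, 2)
  x := sort(..)    # overwrite: equivalent to x <- sort(x)
  x := c(.., 4)    # grow: equivalent to x <- c(x, 4)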
dotwhisker Dot-and-Whisker Plots of Regression Coefficients from Tidy Data Frames
Quick and easy dot-and-whisker plots of regression models saved in tidy data frames.
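A minimal sketch with a fitted model:
  library(dotwhisker)
  m1 <- lm(mpg ~ wt + cyl + disp, data = mtcars)
  dwplot(m1)  # dot-and-whisker plot of the coefficient estimates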
doubcens Survivor Function Estimation for Doubly Interval-Censored Failure Time Data
Contains the discrete nonparametric survivor function estimation algorithm of De Gruttola and Lagakos for doubly interval-censored failure time data and the discrete nonparametric survivor function estimation algorithm of Sun for doubly interval-censored left-truncated failure time data [Victor De Gruttola & Stephen W. Lagakos (1989) <doi:10.2307/2532030>] [Jianguo Sun (1995) <doi:10.2307/2533008>].
double.truncation Analysis of Doubly-Truncated Data
Likelihood-based inference methods with doubly-truncated data are developed under various models. Parametric models from the special exponential family (SEF) are based on Hu and Emura (2015) <doi:10.1007/s00180-015-0564-z> and Emura, Hu and Konno (2017) <doi:10.1007/s00362-015-0730-y>.
Dowd Functions Ported from ‘MMR2’ Toolbox Offered in Kevin Dowd’s Book Measuring Market Risk
Kevin Dowd’s book Measuring Market Risk is a widely read book in the area of risk measurement by students and practitioners alike. As he claims, ‘MATLAB’ indeed might have been the most suitable language when he originally wrote the functions, but with the growing popularity of R this is no longer entirely the case. As ‘Dowd’s’ code was not intended to be error free and was mainly for reference, some functions in this package have inherited those errors. An attempt will be made in future releases to identify and correct them. ‘Dowd’s’ original code can be downloaded from http://www.kevindowd.org/measuring-market-risk/. It should be noted that ‘Dowd’ offers both ‘MMR2’ and ‘MMR1’ toolboxes. Only ‘MMR2’ was ported to R; ‘MMR2’ is the more recent version of the ‘MMR1’ toolbox and both have mostly similar functions. The toolbox mainly contains different parametric and non-parametric methods for the measurement of market risk as well as backtesting of risk measurement methods.
downsize A Tool to Scale Down Large Workflows for Testing
Toggles the test and production versions of a large workflow.
dparser Port of Dparser Package
A scannerless GLR parser/parser generator. Note that GLR stands for ‘generalized LR’, where L stands for ‘left-to-right’ and R stands for ‘rightmost (derivation)’. For more information see <https://…/GLR_parser>. This parser is based on the Tomita (1987) algorithm (the paper can be found at <http://…/J87-1004.pdf> ). The original dparser package documentation can be found at <http://…/>. This allows you to add mini-languages to R (like RxODE’s ODE mini-language, Wang, Hallow, and James 2015 <DOI:10.1002/psp4.12052>) or to parse other languages like NONMEM to automatically translate them to R code. To use this in your code, add ‘LinkingTo: dparser’ in your DESCRIPTION file, and instead of using ‘#include <dparse.h>’ use ‘#include <dparser.h>’. This also provides an R-based port of the make_dparser <http://…/make_dparser.cat> command called ‘mkdparser’. Additionally, you can parse an arbitrary grammar within R using the ‘dparse’ function.
dpcR Digital PCR Analysis
Analysis, visualisation and simulation of digital polymerase chain reaction (dPCR) (Burdukiewicz et al. (2016) <doi:10.1016/j.bdq.2016.06.004>). Supports data formats of commercial systems (Bio-Rad QX100 and QX200; Fluidigm BioMark) and other systems.
dplyr A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
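A minimal sketch of the grammar:
  library(dplyr)
  mtcars %>%
    filter(hp > 100) %>%             # keep matching rows
    group_by(cyl) %>%                # split by cylinder count
    summarise(mean_mpg = mean(mpg))  # aggregate per group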
dplyr.teradata A ‘Teradata’ Backend for ‘dplyr’
A ‘Teradata’ backend for ‘dplyr’. It makes it possible to operate a ‘Teradata’ database <https://…/> in the same way as manipulating data frames with ‘dplyr’.
dplyrAssist RStudio Addin for Teaching and Learning Data Manipulation Using ‘dplyr’
An RStudio addin for teaching and learning data manipulation using the ‘dplyr’ package. You can learn each step of data manipulation by clicking your mouse, without coding. You can get the resultant data (as a ‘tibble’) and the code for the data manipulation.
dplyrr Utilities for comfortable use of dplyr with databases
dplyr is the most powerful package for data handling in R, and it also has the ability to work with databases (see the vignette). But the functionality for dealing with databases in dplyr is still developing. For now, I’m trying to make dplyr more comfortable to use with databases through some helper functions; for that purpose, I’ve created the dplyrr package.
New package ‘dplyrr’
dplyrXdf dplyr backend for Revolution Analytics xdf files
The dplyr package is a popular toolkit for data transformation and manipulation. Over the last year and a half, dplyr has become a hot topic in the R community, for the way in which it streamlines and simplifies many common data manipulation tasks. Out of the box, dplyr supports data frames, data tables (from the data.table package), and the following SQL databases: MySQL/MariaDB, SQLite, and PostgreSQL. However, a feature of dplyr is that it’s extensible: by writing a specific backend, you can make it work with many other kinds of data sources. For example the development version of the RSQLServer package implements a dplyr backend for Microsoft SQL Server. The dplyrXdf package implements such a backend for the xdf file format, a technology supplied as part of Revolution R Enterprise. All of the data transformation and modelling functions provided with Revolution R Enterprise support xdf files, which allow you to break R’s memory barrier: by storing the data on disk, rather than in memory, they make it possible to work with multi-gigabyte or terabyte-sized datasets. dplyrXdf brings the benefits of dplyr to xdf files, including support for pipeline notation, all major verbs, and the ability to incorporate xdfs into dplyr pipelines.
dpmr Data Package Manager for R
Create, install, and summarise data packages that follow the Open Knowledge Foundation’s Data Package Protocol.
DPP Inference of Parameters of Normal Distributions from a Mixture of Normals
This MCMC method takes a numeric data vector (Y) and assigns the elements of Y to a (potentially infinite) number of normal distributions. The individual normal distributions from a mixture of normals can be inferred. Following the method described in Escobar (1994) <doi:10.2307/2291223>, we use a Dirichlet Process Prior (DPP) to describe stochastically our prior assumptions about the dimensionality of the data.
dprep Data Pre-Processing and Visualization Functions for Classification
Data preprocessing techniques for classification. Functions for normalization, handling of missing values, discretization, outlier detection, feature selection, and data visualization are included.
dprint Print Tabular Data to Graphics Device
Provides a generalized method for printing tabular data within the R environment in order to make the process of presenting high quality tabular output seamless for the user. Output is directed to the R graphics device so that tables can be exported to any file format supported by the graphics device. Utilizes a formula interface to specify the contents of tables often found in manuscripts or business reports. In addition, formula interface provides inline formatting of the numeric cells of a table and renaming column labels.
DPtree Dirichlet-Based Polya Tree
Contains functions to perform copula estimation by the non-parametric Bayesian method, Dirichlet-based Polya Tree. See Ning (2018) <doi:10.1080/00949655.2017.1421194>.
DPWeibull Dirichlet Process Weibull Mixture Model for Survival Data
Use Dirichlet process Weibull mixture model and dependent Dirichlet process Weibull mixture model for survival data with and without competing risks. Dirichlet process Weibull mixture model is used for data without covariates and dependent Dirichlet process model is used for regression data. The package is designed to handle exact/right-censored/interval-censored observations without competing risks and exact/right-censored observations for data with competing risks. Inside each cluster of Dirichlet process, we assume a multiplicative effect of covariates as in Cox model and Fine and Gray model. In addition, we provide a wrapper for DPdensity() function from the R package ‘DPpackage’. This wrapper automatically uses Low Information Omnibus prior and can model one and two dimensional data with Dirichlet mixture of Gaussian distributions.
dqrng Fast Pseudo Random Number Generators
Several fast random number generators are provided as C++ header only libraries: The PCG family by O’Neill (2014 <https://…/hmc-cs-2014-0905.pdf> ) as well as Xoroshiro128+ and Xoshiro256+ by Blackman and Vigna (2018 <arXiv:1805.01407>). In addition fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). These functions are exported to R and as a C++ interface and are enabled for use with the 64 bit version of the Mersenne-Twister by Matsumoto and Nishimura (1998 <doi:10.1145/272991.272995>), the default 64 bit generator from the PCG family as well as Xoroshiro128+ and Xoshiro256+.
dr4pl Dose Response Data Analysis using the 4 Parameter Logistic (4PL) Model
Models the relationship between dose levels and responses in a pharmacological experiment using the 4 Parameter Logistic model. Traditional packages for dose-response modelling such as ‘drc’ and ‘nplr’ often throw errors due to convergence failure, especially when data have outliers or non-logistic shapes. This package provides robust estimation methods that are less affected by outliers, and other initialization methods that work well for data lacking logistic shapes. We provide bounds on the parameters of the 4PL model that prevent parameter estimates from diverging or converging to zero, and base their justification on a statistical principle. These methods are used as remedies to convergence failure problems. Gadagkar, S. R. and Call, G. B. (2015) <doi:10.1016/j.vascn.2014.08.006> Ritz, C. and Baty, F. and Streibig, J. C. and Gerhard, D. (2015) <doi:10.1371/journal.pone.0146021>.
dragonking Statistical Tools to Identify Dragon Kings
Statistical tests and test statistics to identify events in a dataset that are dragon kings (DKs). The statistical methods in this package were reviewed in Wheatley & Sornette (2015) <doi:10.2139/ssrn.2645709>.
dragulaR Drag and Drop Elements in ‘Shiny’ using ‘Dragula Javascript Library’
Move elements between containers in ‘Shiny’ without explicitly using ‘JavaScript’. It can be used to build custom inputs or to change the positions of user interface elements like plots or tables.
drake An R-focused pipeline toolkit for reproducibility and high-performance computing
Data analysis can be slow. A round of scientific computation can take several minutes, hours, or even days to complete. After it finishes, if you update your code or data, your hard-earned results may no longer be valid. How much of that valuable output can you keep, and how much do you need to update? How much runtime must you endure all over again? For projects in R, the drake package can help. It analyzes your workflow, skips steps with up-to-date results, and orchestrates the rest with optional distributed computing. At the end, drake provides evidence that your results match the underlying code and data, which increases your ability to trust your research.
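A minimal sketch of a plan (the target names are illustrative):
  library(drake)
  plan <- drake_plan(
    raw   = mtcars,                    # illustrative targets
    model = lm(mpg ~ wt, data = raw),
    coefs = coef(model)
  )
  make(plan)    # builds targets; re-running skips up-to-date ones
  readd(coefs)  # read a cached result back into the session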
drake Data Frames in R for Make
Efficiently keep your results up to date with your code.
drat Drat R Archive Template
Creation and use of R repositories via two helper functions to insert packages into a repository and to add repository information to the current R session. Two primary types of repositories are supported: gh-pages at GitHub, as well as local repositories on either the same machine or a local network. Drat is a recursive acronym which stands for Drat R Archive Template.
draw Wrapper Functions for Producing Graphics
A set of user-friendly wrapper functions for creating consistent graphics and diagrams with lines, common shapes, text, and page settings. Compatible with and based on the R ‘grid’ package.
DRaWR Discriminative Random Walk with Restart
We present DRaWR, a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types, preserving more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only the relevant properties. We then rerank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork.
DrBats Data Representation: Bayesian Approach That’s Sparse
Feed longitudinal data into a Bayesian Latent Factor Model to obtain a low-rank representation. Parameters are estimated using a Hamiltonian Monte Carlo algorithm with STAN. See G. Weinrott, B. Fontez, N. Hilgert and S. Holmes, ‘Bayesian Latent Factor Model for Functional Data Analysis’, Actes des JdS 2016.
DREGAR Regularized Estimation of Dynamic Linear Regression in the Presence of Autocorrelated Residuals (DREGAR)
A penalized/non-penalized implementation for dynamic regression in the presence of autocorrelated residuals (DREGAR) using iterative penalized/ordinary least squares. It applies Mallows CP, AIC, BIC and GCV to select the tuning parameters.
DrillR R Driver for Apache Drill
Provides an R driver for Apache Drill <https://drill.apache.org>, which can connect to an Apache Drill cluster <https://…/installing-drill-on-the-cluster> or drillbit <https://…/embedded-mode-prerequisites>, retrieve results (as data frames) from SQL queries and check the current configuration status. This link <https://…/docs> contains more information about Apache Drill.
DRIP Discontinuous Regression and Image Processing
This is a collection of functions for discontinuous regression analysis and image processing.
drtmle Doubly-Robust Nonparametric Estimation and Inference
Targeted minimum loss-based estimators of counterfactual means and causal effects that are doubly-robust with respect both to consistency and asymptotic normality (van der Laan (2014), <doi:10.1515/ijb-2012-0038>).
dsa Seasonal Adjustment of Daily Time Series
Seasonal- and calendar adjustment of time series with daily frequency using the DSA approach developed by Ollech, Daniel (2018): Seasonal adjustment of daily time series. Bundesbank Discussion Paper 41/2018.
dsm Density Surface Modelling of Distance Sampling Data
Density surface modelling of line transect data. A Generalized Additive Model-based approach is used to calculate spatially-explicit estimates of animal abundance from distance sampling (also presence/absence and strip transect) data. Several utility functions are provided for model checking, plotting and variance estimation.
dsmodels A Language to Facilitate the Creation and Visualization of Two- Dimensional Dynamical Systems
An expressive language to facilitate the creation and visualization of two-dimensional dynamical systems. The basic elements of the language are a model wrapping around a function(x,y) which outputs a list(x = xprime, y = yprime), and a range. The language supports three types of visual objects: visualizations, features, and backgrounds. Visualizations, including dots and arrows, depict the behavior of the dynamical system over the entire range. Features display user-defined curves and points, and their images under the system. Backgrounds define and color regions of interest, such as areas of convergence and divergence. The language can also automatically guess attractors and regions of convergence and divergence.
dsr Compute Directly Standardized Rates, Ratios and Differences
A set of functions to compute and compare directly standardized rates, rate differences and ratios. A variety of user defined options for analysis (e.g confidence intervals) and formatting are included.
dsrTest Tests and Confidence Intervals on Directly Standardized Rates for Several Methods
Perform a test of a simple null hypothesis about a directly standardized rate and obtain the matching confidence interval using a choice of methods.
DSsim Distance Sampling Simulations
Performs distance sampling simulations. It repeatedly generates instances of a user defined population within a given survey region, generates realisations of a survey design (currently these must be pregenerated using Distance software <http://…/> ) and simulates the detection process. The data are then analysed so that the results can be compared for accuracy and precision across all replications. This will allow users to select survey designs which will give them the best accuracy and precision given their expectations about population distribution. Any uncertainty in population distribution or population parameters can be included by running the different survey designs for a number of different population descriptions. An example simulation can be found in the help file for make.simulation.
dst Using Dempster-Shafer Theory
This package allows you to make basic probability assignments on a set of possibilities (events) and combine these events with Dempster’s rule of combination.
dstat Conditional Sensitivity Analysis for Matched Observational Studies
A d-statistic tests the null hypothesis of no treatment effect in a matched, nonrandomized study of the effects caused by treatments. A d-statistic focuses on subsets of matched pairs that demonstrate insensitivity to unmeasured bias in such an observational study, correcting for double-use of the data by conditional inference. This conditional inference can, in favorable circumstances, substantially increase the power of a sensitivity analysis (Rosenbaum (2010) <doi:10.1007/978-1-4419-1213-8_14>). There are two examples, one concerning unemployment from Lalive et al. (2006) <doi:10.1111/j.1467-937X.2006.00406.x>, the other concerning smoking and periodontal disease from Rosenbaum (2017) <doi:10.1214/17-STS621>.
dSVA Direct Surrogate Variable Analysis
Functions for direct surrogate variable analysis, which can identify hidden factors in high-dimensional biomedical data.
DT R Interface to the jQuery Plug-in DataTables
This package provides a function datatable() to display R data via the DataTables library (N.B. not to be confused with the data.table package).
An R interface to the DataTables library
http://rstudio.github.io/DT
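A minimal sketch:
  DT::datatable(iris)  # render iris as an interactive, searchable table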
dtables Simplifying Descriptive Frequencies and Statistics
Towards automation of descriptive frequencies and statistics tables.
DTDA.ni Doubly Truncated Data Analysis, Non Iterative
Non-iterative estimator for the cumulative distribution of a doubly truncated variable. de Uña-Álvarez J. (2018) <doi:10.1007/978-3-319-73848-2_37>.
dtp Dynamic Panel Threshold Model
Computes the dynamic threshold panel model suggested by Stephanie Kremer, Alexander Bick and Dieter Nautz (2013) <doi:10.1007/s00181-012-0553-9>, in which they extended Hansen’s (1999) <doi:10.1016/S0304-4076(99)00025-1> original static panel threshold estimation and the Caner and Hansen (2004) <doi:10.1017/S0266466604205011> cross-sectional instrumental variable threshold model, where generalized method of moments type estimators are used.
dtplyr Data Table Back-End for ‘dplyr’
This implements the data table back-end for ‘dplyr’ so that you can seamlessly use data table and ‘dplyr’ together.
dtq data.table query
Auditing data transformation can be simply described as gathering metadata about the transformation process. The most basic metadata would be a timestamp, an atomic transformation description, data volume on input, data volume on output, and time elapsed. If you only work with R interactively, you may find it little more than a fancy tool. On the other hand, for automated scheduled R jobs it may be quite helpful to have traceability at a lower grain of processing than just binary success or failure after the script is executed, for example by logging each query against the data. Similar features have been available in ETL tools for decades. I’ve addressed this in my dtq package.
http://…/dtq.html
dtree Decision Trees
Combines various decision tree algorithms, plus both linear regression and ensemble methods into one package. Allows for the use of both continuous and categorical outcomes. An optional feature is to quantify the (in)stability to the decision tree methods, indicating when results can be trusted and when ensemble methods may be preferential.
DTRlearn Learning Algorithms for Dynamic Treatment Regimes
Dynamic treatment regimens (DTRs) are sequential decision rules tailored at each stage by potentially time-varying patient features and intermediate outcomes observed in previous stages. There are three main types of methods (O-learning, Q-learning and P-learning) for learning optimal dynamic treatment regimes with continuous variables. This package provides these state-of-the-art algorithms for learning DTRs.
DTRlearn2 Statistical Learning Methods for Optimizing Dynamic Treatment Regimes
We provide a comprehensive software to estimate general K-stage DTRs from SMARTs with Q-learning and a variety of outcome-weighted learning methods. Penalizations are allowed for variable selection and model regularization. With the outcome-weighted learning scheme, different loss functions – SVM hinge loss, SVM ramp loss, binomial deviance loss, and L2 loss – are adopted to solve the weighted classification problem at each stage; augmentation in the outcomes is allowed to improve efficiency. The estimated DTR can be easily applied to a new sample for individualized treatment recommendations or DTR evaluation.
DTRreg DTR Estimation and Inference via G-Estimation, Dynamic WOLS, and Q-Learning
Dynamic treatment regime estimation and inference via G-estimation, dynamic weighted ordinary least squares (dWOLS) and Q-learning. Inference via bootstrap and (for G-estimation) recursive sandwich estimation.
DTSg A Class for Working with Time Series Based on ‘data.table’ and ‘R6’ with Largely Optional Reference Semantics
Basic time series functionalities such as listing of missing values, application of arbitrary aggregation as well as rolling window functions and automatic detection of periodicity. As it is mainly based on ‘data.table’, it is fast and – in combination with the ‘R6’ package – offers reference semantics. In addition to its native R6 interface, it provides an S3 interface, including an S3 wrapper method generator for those who prefer the latter.
dtwclust Time Series Clustering with Dynamic Time Warping
Time series clustering using different techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Additionally, an implementation of k-Shape clustering is available.
dtwSat Time-Weighted Dynamic Time Warping for Remote Sensing Time Series Analysis
Provides a Time-Weighted Dynamic Time Warping (TWDTW) algorithm to measure similarity between two temporal sequences. This adaptation of the classical Dynamic Time Warping (DTW) algorithm is flexible to compare events that have a strong time dependency, such as phenological stages of cropland systems and tropical forests. This package provides methods for visualization of minimum cost paths, time series alignment, and time intervals classification.
duawranglr Securely Wrangle Dataset According to Data Usage Agreement
Create shareable data sets from raw data files that contain protected elements. Relying on master crosswalk files that list restricted variables, package functions warn users about possible violations of data usage agreement and prevent writing protected elements.
dub Unpacking Assignment for Lists via Pattern Matching
Provides an operator for assigning nested components of a list to names via a concise, Haskell-like pattern matching syntax. This is especially convenient for assigning individual names to the multiple values that a function may return in the form of a list, and for extracting deeply nested list components.
dvir TeX as a layout engine
The package reads DVI files that are produced from TeX files and renders the content using the R package ‘grid’.
dvmisc Faster Computation of Common Statistics and Miscellaneous Functions
Faster versions of base R functions (e.g. mean, standard deviation, covariance, weighted mean), mostly written in C++, along with miscellaneous functions for various purposes (e.g. create histogram with fitted probability density function or probability mass function curve, create body mass index groups, assess linearity assumption in logistic regression).
dwapi A Client for Data.world’s REST API
A set of wrapper functions for data.world’s REST API endpoints.
DWDLargeR Fast Algorithms for Large Scale Generalized Distance Weighted Discrimination
Solving large scale distance weighted discrimination. The main algorithm is a symmetric Gauss-Seidel based alternating direction method of multipliers (ADMM) method. See Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018) <arXiv:1604.05473> for more details.
DWLasso Degree Weighted Lasso
Infers networks with hubs using degree weighted Lasso method.
DWreg Parametric Regression for Discrete Response
Regression for a discrete response, where the conditional distribution is modelled via a discrete Weibull distribution.
dwtools Data Warehouse related functions
Handy wrappers for extraction, loading, denormalization and normalization. Additionally: a data.table Nth key feature, timing and logging, and more.
dyads Dyadic Network Analysis
Includes a function for estimation of the p2 model (van Duijn, Snijders and Zijlstra (2004) <doi:10.1046/j.0039-0402.2003.00258.x>), more specifically, the adaptive random walk algorithm (Zijlstra, van Duijn and Snijders (2009) <doi:10.1348/000711007X255336>).
dydea Detection of Chaotic and Regular Intervals in the Data
Finds regular and chaotic intervals in the data using the 0-1 test for chaos proposed by Gottwald and Melbourne (2004) <DOI:10.1137/080718851>.
dygraphs Interface to Dygraphs Interactive Time Series Charting Library
An R interface to the dygraphs JavaScript charting library (a copy of which is included in the package). Provides rich facilities for charting time-series data in R, including highly configurable series- and axis-display and interactive features like zoom/pan and series/point highlighting.
http://…/dygraphs
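A minimal sketch with a built-in time series:
  library(dygraphs)
  dygraph(ldeaths, main = "Monthly UK lung-disease deaths") %>%
    dyRangeSelector()  # add an interactive range selector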
DYM Did You Mean?
Add a ‘Did You Mean’ feature to the R interactive session. With this package, error messages for misspelled input of variable names or package names suggest what you really wanted to do, in addition to notifying you of the mistake.
dynamac Dynamic Simulation and Testing for Single-Equation ARDL Models
While autoregressive distributed lag models allow for extremely flexible dynamics, interpreting substantive significance of complex lag structures remains difficult. This package is designed to assist users in dynamically simulating and plotting the results of various autoregressive distributed lag models. It also contains post-estimation diagnostics, including a test for cointegration when estimating the error-correction variant of the autoregressive distributed lag model (Pesaran, Shin, and Smith 2001 <doi:10.1002/jae.616>).
DynamicGP Local Gaussian Process Model for Large-Scale Dynamic Computer Experiments
Fits localized GP model for dynamic computer experiments via singular value decomposition of the response matrix Y for large N (the number of observations) using the algorithm proposed by Zhang et al. (2018) <arXiv:1611.09488>. The current version only supports 64-bit architecture.
dynamichazard Dynamic Hazard Models using State Space Models
Contains functions that let you fit dynamic hazard models with binary outcomes using state space models. The methods are originally described in Fahrmeir (1992) <doi:10.1080/01621459.1992.10475232> and Fahrmeir (1994) <doi:10.1093/biomet/81.2.317>. The functions also provide an extension hereof where the Extended Kalman filter is replaced by an Unscented Kalman filter. Models are fitted with a regular coxph()-like formula.
dynamo Fit a Stochastic Dynamical Array Model to Array Data
An implementation of the method proposed in Lund and Hansen (2018) for fitting 3-dimensional dynamical array models. The implementation is based on the glamlasso package, see Lund et al. (2017) <doi:10.1080/10618600.2017.1279548>, for efficient design matrix free lasso regularized estimation in a generalized linear array model. The implementation uses a block relaxation scheme to fit each individual component in the model using functions from the glamlasso package.
DynaRankR Inferring Longitudinal Dominance Hierarchies
Provides functions for inferring longitudinal dominance hierarchies, which describe dominance relationships and their dynamics in a single latent hierarchy over time. Strauss & Holekamp (in press).
dynaTree Dynamic Trees for Learning and Design
Inference by sequential Monte Carlo for dynamic tree regression and classification models with hooks provided for sequential design and optimization, fully online learning with drift, variable selection, and sensitivity analysis of inputs. Illustrative examples from the original dynamic trees paper are facilitated by demos in the package; see demo(package=’dynaTree’).
dyncomp Complexity of Short and Coarse-Grained Time Series
While there are many well-established measures for identifying critical fluctuations and phase transitions, these measures only work with many points of measurement and thus are unreliable when studying short and coarse-grained time series. This package provides a measure for complexity in a time series that does not rely on long time series (Kaiser (2017), <doi:10.17605/OSF.IO/GWTKX>).
dyndimred Dimensionality Reduction Methods in a Common Format
Provides a common interface for applying dimensionality reduction methods, such as Principal Component Analysis (‘PCA’), Independent Component Analysis (‘ICA’), diffusion maps, Locally-Linear Embedding (‘LLE’), t-distributed Stochastic Neighbor Embedding (‘t-SNE’), and Uniform Manifold Approximation and Projection (‘UMAP’). Has built-in support for sparse matrices.
dynetNLAResistance Resisting Neighbor Label Attack in a Dynamic Network
An anonymization algorithm to resist neighbor label attack in a dynamic network.
dynfrail Fitting Dynamic Frailty Models with the EM Algorithm
Fits semiparametric dynamic frailty models according to the methodology of Putter and van Houwelingen (2015) <doi:10.1093/biostatistics/kxv002>. Intermediate models, where the frailty is piecewise constant on prespecified intervals, are also supported. The frailty process is taken to have a specific auto-correlation structure, and the supported distributions include gamma, inverse Gaussian, power variance family (PVF) and positive stable.
dynOmics Fast Fourier Transform to Identify Associations Between Time Course Omics Data
Implements the fast Fourier transform to estimate delays of expression initiation between trajectories to integrate and analyse time course omics data.
dynpanel Dynamic Panel Data Models
Computes the first stage GMM estimate of a dynamic linear model with p lags of the dependent variables.
dynprog Dynamic Programming Domain-Specific Language
A domain-specific language for specifying recursions and translating them into dynamic-programming algorithms. See <https://…/Dynamic_programming> for a description of dynamic programming.
dynr Dynamic Modeling in R
Intensive longitudinal data have become increasingly prevalent in various scientific disciplines. Many such data sets are noisy, multivariate, and multi-subject in nature. The change functions may also be continuous, or continuous but interspersed with periods of discontinuities (i.e., showing regime switches). The package ‘dynr’ (Dynamic Modeling in R) is an R package that implements a set of computationally efficient algorithms for handling a broad class of linear and nonlinear discrete- and continuous-time models with regime-switching properties under the constraint of linear Gaussian measurement functions. The discrete-time models can generally take on the form of a state-space or difference equation model. The continuous-time models are generally expressed as a set of ordinary or stochastic differential equations. All estimation and computations are performed in C, but users are provided with the option to specify the model of interest via a set of simple and easy-to-learn model specification functions in R. Model fitting can be performed using single-subject time series data or multiple-subject longitudinal data.
dynRB Dynamic Range Boxes
Improves on the concept of multivariate range boxes, which is highly susceptible to outliers and does not consider the distribution of the data. The package uses dynamic range boxes to overcome these problems.
dynsbm Dynamic Stochastic Block Models
Dynamic stochastic block model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the nodes groups through time, developed in Matias and Miele (2016) <doi:10.1111/rssb.12200>.
DynTxRegime Methods for Estimating Dynamic Treatment Regimes
A comprehensive toolkit for estimating Dynamic Treatment Regimes. Available methods include Interactive Q-Learning, Q-Learning, and value-search methods based on Augmented Inverse Probability Weighted estimators and Inverse Probability Weighted estimators.
dynutils Common Functions for the Dynverse Packages
Provides common functionality for the dynverse packages. Dynverse was created to support the development, execution, and benchmarking of trajectory inference methods. For more information, see <https://…/dynverse>.
DySeq Functions for Dyadic Sequence Analyses
Small collection of functions for dyadic binary/dichotomous sequence analyses, e.g. transforming sequences into time-to-event data, implementation of Bakeman & Gottman’s (1997) approach of aggregated logit-models, and simulating expected number of low/zero frequencies for state-transition tables. Further functions will be added in future releases. References: Bakeman, R., & Gottman, J. M. (1997) <DOI:10.1017/cbo9780511527685>.
DZEXPM Estimation and Prediction of Skewed Spatial Processes
A collection of functions designed to estimate and predict skewed spatial processes, and a real data set.

E

e1071 Misc Functions of the Department of Statistics (e1071), TU Wien
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
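A minimal sketch of one of the workhorse functions:
  library(e1071)
  fit <- svm(Species ~ ., data = iris)  # support vector machine classifier
  table(predicted = predict(fit, iris), actual = iris$Species)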
EAinference Simulation Based Inference of Lasso Estimator
Estimator augmentation methods for statistical inference on high-dimensional data, as described in Zhou, Q. (2014) <doi:10.1080/01621459.2014.946035> and Zhou, Q. and Min, S. (2017) <doi:10.1214/17-EJS1309>. It provides several simulation-based inference methods: (a) Gaussian and wild multiplier bootstrap for lasso, group lasso, scaled lasso, scaled group lasso and their de-biased estimators, (b) importance sampler for approximating p-values in these methods, (c) Markov chain Monte Carlo lasso sampler with applications in post-selection inference.
EAlasso Simulation Based Inference of Lasso Estimator
Simulation based inference of lasso estimator. It provides several methods to sample lasso estimator: (a) Gaussian and wild multiplier bootstrap for lasso, group lasso, scaled lasso and scaled group lasso, (b) importance sampler for lasso, group lasso, scaled lasso and scaled group lasso, (c) Markov chain Monte Carlo sampler for lasso, (d) post-selection inference for lasso. See Zhou, Q. and Min, S. (2017) <doi:10.1214/17-EJS1309> for details.
eAnalytics Dynamic Web Analytics for the Energy Industry
A ‘Shiny’ web application for energy industry analytics. Take an overview of the industry, measure Key Performance Indicators, identify changes in the industry over time, and discover new relationships in the data.
earlygating Properties of Bayesian Early Gating Designs
Computes the most important properties of four ‘Bayesian’ early gating designs (two single arm and two randomized controlled designs), such as minimum required number of successes in the experimental group to make a GO decision, operating characteristics and average operating characteristics with respect to the sample size. These might aid in deciding what design to use for the early phase trial.
earth Multivariate Adaptive Regression Splines
Build regression models using the techniques in Friedman’s papers ‘Fast MARS’ and ‘Multivariate Adaptive Regression Splines’. (The term ‘MARS’ is trademarked and thus not used in the name of the package.)
earthtones Derive a Color Palette from a Particular Location on Earth
Downloads a satellite image via Google Maps/Earth (these are originally from a variety of aerial photography sources), translates the image into a perceptually uniform color space, runs one of a few different clustering algorithms on the colors in the image searching for a user-supplied number of colors, and returns the resulting color palette.
easyAHP Analytic Hierarchy Process (AHP)
Given the scores from decision makers, the analytic hierarchy process can be conducted easily.
easyalluvial Generate Alluvial Plots with a Single Line of Code
Alluvial plots are similar to Sankey diagrams and visualise categorical data over multiple dimensions as flows (Rosvall M, Bergstrom CT (2010) Mapping Change in Large Networks. PLoS ONE 5(1): e8694 <doi:10.1371/journal.pone.0008694>). Their graphical grammar, however, is a bit more complex than that of regular x/y plots. The ‘ggalluvial’ package does a great job of translating that grammar into ‘ggplot2’ syntax and gives you many options to tweak the appearance of an alluvial plot, yet there still remains a multi-layered complexity that makes it difficult to use ‘ggalluvial’ for explorative data analysis. ‘easyalluvial’ provides a simple interface to this package that allows you to produce a decent alluvial plot from any dataframe, in either long or wide format, with a single line of code, while also handling continuous data. It is meant to allow quick visualisation of entire dataframes, with a focus on different colouring options that can make alluvial plots a great tool for data exploration.
easycsv Load Multiple ‘csv’ and ‘txt’ Tables
Allows users to easily read multiple comma separated tables and create a data frame under the same name. It can read multiple comma separated tables from a local directory, a zip file or a zip file in a remote directory.
easyDes An Easy Way to Descriptive Analysis
Descriptive analysis is essential for publishing medical articles. This package provides an easy way to conduct the descriptive analysis. 1. Both numeric and factor variables can be handled. For numeric variables, normality test will be applied to choose the parametric and nonparametric test. 2. Both two or more groups can be handled. For groups more than two, the post hoc test will be applied, ‘Tukey’ for the numeric variables and ‘FDR’ for the factor variables. 3. ANOVA or Fisher test can be forced to apply.
easyformatr Tools for Building Formats
Builds format strings for both times and numbers.
easyml Easily Build and Evaluate Machine Learning Models
Easily build and evaluate machine learning models on a dataset. Machine learning models supported include penalized linear models, penalized linear models with interactions, random forest, support vector machines, neural networks, and deep neural networks.
EasyMx Easy Model-Builder Functions for OpenMx
Utilities for building certain kinds of common matrices and models in the extended structural equation modeling package, OpenMx.
easyNCDF Tools to Easily Read/Write NetCDF Files into/from Multidimensional R Arrays
Set of wrappers for the ‘ncdf4’ package to simplify and extend its reading/writing capabilities into/from multidimensional R arrays.
easypackages Easy Loading and Installing of Packages
Easily load and install multiple packages from different sources, including CRAN and GitHub. The libraries function allows you to load or attach multiple packages in the same function call. The packages function will load one or more packages, and install any packages that are not installed on your system (after prompting you). Also included is a from_import function that allows you to import specific functions from a package into the global environment.
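A minimal sketch of the two main verbs named above:
  library(easypackages)
  libraries("dplyr", "ggplot2")  # attach several packages at once
  packages("dplyr", "ggplot2")   # install any that are missing, then load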
easypower Sample Size Estimation for Experimental Designs
Power analysis is used in the estimation of sample sizes for experimental designs. Most programs and R packages will only output the highest recommended sample size to the user. Often the user input can be complicated and computing multiple power analyses for different treatment comparisons can be time consuming. This package simplifies the user input and allows the user to view all of the sample size recommendations or just the ones they want to see. The calculations used to calculate the recommended sample sizes are from the ‘pwr’ package.
easyreg Easy Regression
Performs regression analysis for simple designs with quantitative treatments, including mixed models and non-linear models. Plots graphics (equations and data).
easySdcTable Easy Interface to the Statistical Disclosure Control Package ‘sdcTable’
The main function, ProtectTable(), performs table suppression according to a frequency rule with a data set as the only required input. Within this function, protectTable() or protectLinkedTables() in package ‘sdcTable’ is called. Lists of level-hierarchy (parameter ‘dimList’) and other required input to these functions are created automatically.
easySVG An Easy SVG Basic Elements Generator
This SVG elements generator can easily generate SVG elements such as rect, line, circle, ellipse, polygon, polyline, text and group. It can also combine and output SVG elements into an SVG file.
easyVerification Ensemble Forecast Verification for Large Datasets
Set of tools to simplify application of atomic forecast verification metrics for (comparative) verification of ensemble forecasts to large datasets. The forecast metrics are imported from the ‘SpecsVerification’ package, and additional forecast metrics are provided with this package. Alternatively, new user-defined forecast scores can be implemented using the example scores provided and applied using the functionality of this package.
EBASS Sample Size Calculation Method for Cost-Effectiveness Studies Based on Expected Value of Perfect Information
We propose a new sample size calculation method for trial-based cost-effectiveness analyses. Our strategy is based on the value of perfect information that would remain after the completion of the study.
ebmc Ensemble-Based Methods for Class Imbalance Problem
Four ensemble-based methods (SMOTEBoost, RUSBoost, UnderBagging, and SMOTEBagging) for class imbalance problem are implemented for binary classification. Such methods adopt ensemble methods and data re-sampling techniques to improve model performance in presence of class imbalance problem. One special feature offers the possibility to choose multiple supervised learning algorithms to build weak learners within ensemble models. References: Nitesh V. Chawla, Aleksandar Lazarevic, Lawrence O. Hall, and Kevin W. Bowyer (2003) <doi:10.1007/978-3-540-39804-2_12>, Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, and Amri Napolitano (2010) <doi:10.1109/TSMCA.2009.2029559>, R. Barandela, J. S. Sanchez, R. M. Valdovinos (2003) <doi:10.1007/s10044-003-0192-z>, Shuo Wang and Xin Yao (2009) <doi:10.1109/CIDM.2009.4938667>, Yoav Freund and Robert E. Schapire (1997) <doi:10.1006/jcss.1997.1504>.
EBPRS Derive Polygenic Risk Score Based on Empirical Bayes Theory
EB-PRS is a novel method that leverages information for effect sizes across all the markers to improve the prediction accuracy. No parameter tuning is needed in the method, and no external information is needed. This R-package provides the calculation of polygenic risk scores from the given training summary statistics and testing data. We can use EB-PRS to extract main information, estimate Empirical Bayes parameters, derive polygenic risk scores for each individual in testing data, and evaluate the PRS according to AUC and predictive r2.
EBrank Empirical Bayes Ranking
Empirical Bayes ranking applicable to parallel-estimation settings where the estimated parameters are asymptotically unbiased and normal, with known standard errors. A mixture normal prior for each parameter is estimated using Empirical Bayes methods; subsequently, ranks for each parameter are simulated from the resulting joint posterior over all parameters (the marginal posterior densities for each parameter are assumed independent). Finally, experiments are ordered by expected posterior rank, although computations minimizing other plausible rank-loss functions are also given.
ECctmc Simulation from Endpoint-Conditioned Continuous Time Markov Chains
Draw sample paths for endpoint-conditioned continuous time Markov chains via modified rejection sampling or uniformization.
ecd Elliptic Distribution Based on Elliptic Curves
An implementation of the univariate elliptic distribution and elliptic option pricing model. It provides detailed functionality and data sets for the distribution and modelling. Especially, it contains functions for the computation of density, probability, quantile, fitting procedures, option prices, volatility smile. It also comes with sample financial data, and plotting routines.
ecdfHT Empirical CDF for Heavy Tailed Data
Computes and plots a transformed empirical CDF (ecdf) as a diagnostic for heavy tailed data, specifically data with power law decay on the tails. Routines for annotating the plot, comparing data to a model, fitting a nonparametric model, and some multivariate extensions are given.
ECharts2Shiny Embedding Charts Generated with ECharts Library into Shiny Applications
With this package, users can embed interactive charts in their Shiny applications. These charts are generated by the ECharts library developed by Baidu (http://echarts.baidu.com ). The current version supports line charts, bar charts, pie charts and gauges.
echarts4r Create Interactive Graphs with ‘Echarts JavaScript’ Version 4
Easily create interactive charts by leveraging the ‘Echarts Javascript’ library which includes 33 chart types, themes, ‘Shiny’ proxies and animations.
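A minimal sketch of the pipe-based API:
  library(echarts4r)
  mtcars %>%
    e_charts(wt) %>%  # initialise the chart with x = wt
    e_scatter(mpg)    # add a scatter series for mpg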
ecm Build Error Correction Models
Functions for easy building of error correction models (ECM) for time series regression.
ecmwfr Interface to the ‘ECMWF’ Data Web Services
Programmatic interface to the ‘ECMWF’ public dataset web services (<https://…/> ). Allows for easy downloads of climate data directly to your R work space or your computer.
ECoL Complexity Measures for Classification Problems
Provides measures to characterize the complexity of classification problems based on the ambiguity and the separation between the classes and the data sparsity and dimensionality of the datasets. This package provides bug fixes, generalizations and implementations of many state of the art measures. The measures are described in the paper: Tin Ho and Mitra Basu (2002) <doi:10.1109/34.990132>.
ecolottery Coalescent-Based Simulation of Ecological Communities
Coalescent-Based Simulation of Ecological Communities as proposed by Munoz et al. (2017) <doi:10.13140/RG.2.2.31737.26728>. The package includes a tool for estimating parameters of community assembly by using Approximate Bayesian Computation.
EcoMem An R package for quantifying ecological memory
Quantifies ecological memory functions using common environmental time series data (continuous, count, proportional) within a Bayesian hierarchical framework. The package estimates memory functions for continuous and binary (e.g., disturbance chronology) variables, making no a priori assumption on the form of the functions. EcoMem allows users to quantify ecological memory for a wide range of ecosystem processes and responses. The utility of the package for advancing understanding of the memory of ecosystems to environmental drivers is demonstrated using a simulated dataset and a case study assessing the memory of boreal tree growth to insect defoliation.
EconDemand General Analysis of Various Economics Demand Systems
Tools for general properties including price, quantity, elasticity, convexity, marginal revenue and manifold of various economics demand systems including Linear, Translog, CES, LES and CREMR.
econullnetr Null Model Analysis for Ecological Networks
Tools for using null models to analyse ecological networks (e.g. food webs, flower-visitation networks, seed-dispersal networks) and detect resource preferences or non-random interactions among network nodes. Tools are provided to run null models, test for and plot preferences, plot and analyse bipartite networks, and export null model results in a form compatible with other network analysis packages. The underlying null model was developed by Agusti et al. (2003) <doi:10.1046/j.1365-294X.2003.02014.x> and the full application to ecological networks by Vaughan et al. (2017) econullnetr: an R package using null models to analyse the structure of ecological networks and identify resource selection. Methods in Ecology & Evolution, in press.
ECOSolveR Embedded Conic Solver in R
R interface to the Embedded COnic Solver (ECOS) for convex problems. Conic and equality constraints can be specified in addition to mixed integer problems.
ecp Nonparametric Multiple Change-Point Analysis of Multivariate Data
Implements hierarchical procedures to find multiple change-points through the use of U-statistics. The procedures do not make any distributional assumptions other than the existence of certain absolute moments. Both agglomerative and divisive procedures are included. These methods return the set of estimated change-points as well as other summary information.
ecr Evolutionary Computing in R
Provides a powerful framework for evolutionary computing in R. The user can easily construct powerful evolutionary algorithms for tackling both single- and multi-objective problems by plugging in different predefined evolutionary building blocks, e.g., operators for mutation, recombination and selection, with just a few lines of code. Your problem cannot be easily solved with a standard EA which works on real-valued vectors, permutations or binary strings? No problem: ‘ecr’ has been developed with that in mind. Extending the framework with your own operators is also possible. Additionally, there are various comfort functions, like monitoring, logging and more.
ed50 Estimate ED50 and Its Confidence Interval
Functions for five estimation methods for the ED50 (50 percent effective dose) are provided: the Dixon-Mood method (1948) <doi:10.2307/2280071>, Choi’s original turning point method (1990) <doi:10.2307/2531453> and a modified version of it given by us, as well as logistic regression and isotonic regression. In addition, the package supports comparison between two estimation results.
eda4treeR Experimental Design and Analysis for Tree Improvement
Provides data sets and R Codes for Williams, E.R., Matheson, A.C. and Harwood, C.E. (2002). Experimental Design and Analysis for Tree Improvement, CSIRO Publishing.
edarf Exploratory Data Analysis using Random Forests
Functions useful for exploratory data analysis using random forests which can be used to compute multivariate partial dependence, observation, class, and variable-wise marginal and joint permutation importance as well as observation-specific measures of distance (supervised or unsupervised). All of the aforementioned functions are accompanied by ‘ggplot2’ plotting functions.
edci Edge Detection and Clustering in Images
Detection of edge points in images based on the difference of two asymmetric M-kernel estimators. Linear and circular regression clustering based on redescending M-estimators. Detection of linear edges in images.
edeaR Exploratory and Descriptive Event-Based Data Analysis
Functions for exploratory and descriptive analysis of event based data. Can be used to import and export xes-files, the IEEE eXtensible Event Stream standard. Provides methods for describing and selecting process data.
edesign Maximum Entropy Sampling
An implementation of maximum entropy sampling for spatial data is provided. An exact branch-and-bound algorithm as well as greedy and dual greedy heuristics are included.
edfun Creating Empirical Distribution Functions
Easily create empirical distribution functions from data: ‘dfun’, ‘pfun’, ‘qfun’ and ‘rfun’.
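A minimal sketch, assuming edfun() returns the four functions named above as list elements:
    library(edfun)
    x <- rnorm(1000)
    f <- edfun(x)
    f$qfun(0.5)  # empirical median
    f$rfun(10)   # 10 draws from the empirical distribution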
edgeCorr Spatial Edge Correction
Facilitates basic spatial edge correction to point pattern data.
editData ‘RStudio’ Addin for Editing a ‘data.frame’
An ‘RStudio’ addin for editing a ‘data.frame’ or a ‘tibble’. You can delete, add or update a ‘data.frame’ without coding. You can get resultant data as a ‘tibble’ or ‘data.frame’.
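For instance, a one-call sketch (assuming the addin can also be invoked as a regular function):
    library(editData)
    new_df <- editData(mtcars)  # opens the interactive editor; returns the edited data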
editheme Palettes and graphics matching your RStudio editor
The package editheme provides a collection of color palettes designed to match the different themes available in RStudio. It also includes functions to customize ‘base’ and ‘ggplot2’ graph styles in order to harmonize them with the look of your favorite IDE.
EditImputeCont Simultaneous Edit-Imputation for Continuous Microdata
An integrated editing and imputation method for continuous microdata under linear constraints is implemented. It relies on a Bayesian nonparametric hierarchical modeling approach in which the joint distribution of the data is estimated by a flexible joint probability model. The generated edit-imputed data are guaranteed to satisfy all imposed edit rules, whose types include ratio edits, balance edits and range restrictions.
editR An ‘Rmarkdown’ editor with instant preview
editR is a basic ‘Rmarkdown’ editor with instant previewing of your document. It allows you to create and edit ‘Rmarkdown’ documents while instantly previewing the result of your writing and coding. It also allows you to render your ‘Rmarkdown’ file in any format permitted by the ‘rmarkdown’ R package.
eDMA Dynamic Model Averaging with Grid Search
Perform dynamic model averaging with grid search as in Dangl and Halling (2012) <doi:10.1016/j.jfineco.2012.04.003> using parallel computing.
edmcr Tools to Complete Euclidean Distance Matrices
Implements the Euclidean distance matrix completion algorithms of Alfakih, Khandani, and Wolkowicz (1999) <doi:10.1023/A:1008655427845>, Trosset (2000) <doi:10.1023/A:1008722907820>, Fang and O’Leary (2012) <doi:10.1080/10556788.2011.643888>, and Rahman and Oldford (2017) <arXiv:1610.06599>; the Sensor Network Localization algorithm of Krislock and Wolkowicz (2010) <doi:10.1137/090759392>; and the molecular reconstruction algorithm of Alipanahi (2011).
EDMeasure Dependence Measures via Energy Statistics
Implementations of (1) mutual dependence measures and mutual independence tests in Jin, Z., and Matteson, D. S. (2017) <arXiv:1709.0253>; (2) independent component analysis methods based on mutual dependence measures in Jin, Z., and Matteson, D. S. (2017) <arXiv:1709.0253> and Pfister, N., et al. (2018) <doi:10.1111/rssb.12235>; (3) conditional mean dependence measures and conditional mean independence tests in Shao, X., and Zhang, J. (2014) <doi:10.1080/01621459.2014.887012> and Park, T., et al. (2015) <doi:10.1214/15-EJS1047>.
edpclient Empirical Data Platform Client
R client for Empirical Data Platform. More information is at <https://empirical.com>. For support, contact support@empirical.com.
edstan Stan Models for Item Response Theory
Provides convenience functions and pre-programmed Stan models related to item response theory. Its purpose is to make fitting common item response theory models using Stan easy.
eefAnalytics Analysing Education Trials
Provides tools for analysing education trials. Making different methods accessible in a single place is essential for sensitivity analysis of education trials, particularly for assessing the implications of the different methods when analysing simple randomised trials, cluster randomised trials and multisite trials.
eel Extended Empirical Likelihood
Compute the extended empirical log likelihood ratio (Tsao & Wu, 2014) for the mean and parameters defined by estimating equations.
eesim Simulate and Evaluate Time Series for Environmental Epidemiology
Provides functions to create simulated time series of environmental exposures (e.g., temperature, air pollution) and health outcomes for use in power analysis and simulation studies in environmental epidemiology. This package also provides functions to evaluate the results of simulation studies based on these simulated time series. This work was supported by a grant from the National Institute of Environmental Health Sciences (R00ES022631) and a fellowship from the Colorado State University Programs for Research and Scholarly Excellence.
EFAutilities Utility Functions for Exploratory Factor Analysis
A number of utility functions for exploratory factor analysis are included in this package. In particular, it computes standard errors for parameter estimates and factor correlations under a variety of conditions.
effectFusion Bayesian Effect Fusion for Categorical Predictors
Variable selection and Bayesian effect fusion for categorical predictors in linear regression models. Effect fusion aims at the question which categories have a similar effect on the response and therefore can be fused to obtain a sparser representation of the model. Effect fusion and variable selection can be obtained either with a prior that has an interpretation as spike and slab prior on the level effect differences or with a sparse finite mixture prior on the level effects. The regression coefficients are estimated with a flat uninformative prior after model selection or model averaged. For posterior inference, an MCMC sampling scheme is used that involves only Gibbs sampling steps.
EffectLiteR Average and Conditional Effects
Use structural equation modeling to estimate average and conditional effects of a treatment variable on an outcome variable, taking into account multiple continuous and categorical covariates.
effectsizescr Indices for Single-Case Research
Parametric and nonparametric statistics for single-case design. Regarding nonparametric statistics, the index suggested by Parker, Vannest, Davis and Sauber (2011) <doi:10.1016/j.beth.2010.08.006> was included. It combines both nonoverlap and trend to estimate the effect size of a treatment in a single case design.
EffectStars Visualization of Categorical Response Models
The package provides functions to visualize regression models with categorical response. The effects of the covariates are plotted with star plots in order to allow for a visual impression of the fitted model.
EffectTreat Prediction of Therapeutic Success
In personalized medicine, one wants to know, for a given patient and his or her outcome for a predictor (pre-treatment variable), how likely it is that a treatment will be more beneficial than an alternative treatment. This package allows for the quantification of the predictive causal association (i.e., the association between the predictor variable and the individual causal effect of the treatment) and related metrics.
EfficientMaxEigenpair Efficient Initials for Computing the Maximal Eigenpair
An implementation for using efficient initials to compute the maximal eigenpair in R. It provides two algorithms to find the efficient initials under two cases: the tridiagonal matrix case and the general matrix case. It also provides algorithms for the next-to-maximal eigenpair under these two cases.
efflog The Causal Effects for a Causal Loglinear Model
Fits a causal loglinear model and calculates the causal effects, with or without multiplicative interaction, obtaining the natural direct, indirect and total effects. It also calculates the cell effect, which is a new interaction effect.
EFS Tool for Ensemble Feature Selection
Provides a function to check the importance of a feature based on a dependent classification variable. An ensemble of correlation and importance measure tests are used to determine the normed importance value of all features. Combining these methods in one function (building the sum of the importance values) leads to a better tool for selecting the most important features. The selection can also be viewed in a barplot using the barplot_fs() function and evaluated using the provided logistic regression function, logreg_test().
efts High-Level Functions to Read and Write Ensemble Forecast Time Series in netCDF
The binary file format ‘netCDF’ is developed primarily for climate, ocean and meteorological data, and ‘efts’ is a package to read and write Ensemble Forecast Time Series data in ‘netCDF’. ‘netCDF’ has traditionally been used to store time slices of gridded data, rather than complete time series of point data. ‘efts’ facilitates handling data stored in ‘netCDF’ files that follow a convention devised in the domain of ensemble hydrologic forecasting, but possibly applicable in other domains. ‘efts’ uses reference class objects to provide a high-level interface to read and write such data, wrapping lower-level operations performed using ‘ncdf4’.
EGAnet Exploratory Graph Analysis: A Framework for Estimating the Number of Dimensions in Multivariate Data Using Network Psychometrics
An implementation of the Exploratory Graph Analysis (EGA) framework for dimensionality assessment. EGA is part of a new area called network psychometrics that focuses on the estimation of undirected network models in psychological datasets. EGA estimates the number of dimensions or factors using graphical lasso or Triangulated Maximally Filtered Graph (TMFG) and a weighted network community analysis. A bootstrap method for verifying the stability of the estimation is also available. The fit of the structure suggested by EGA can be verified using confirmatory factor analysis and a direct way to convert the EGA structure to a confirmatory factor model is also implemented. Documentation and examples are available. Golino, H. F., & Epskamp, S. (2017) <doi:10.1371/journal.pone.0174035>. Golino, H. F., & Demetriou, A. (2017) <doi:10.1016/j.intell.2017.02.007> Golino, H., Shi, D., Garrido, L. E., Christensen, A. P., Nieto, M. D., Sadana, R., & Thiyagarajan, J. A. (2018) <doi:10.31234/osf.io/gzcre>. Christensen, A. P. & Golino, H.F. (2019) <doi:10.31234/osf.io/9deay>.
egcm Engle-Granger Cointegration Models
An easy-to-use implementation of the Engle-Granger two-step procedure for identifying pairs of cointegrated series. It is geared towards the analysis of pairs of securities. Summary and plot functions are provided, and the package is able to fetch closing prices of securities from Yahoo. A variety of unit root tests are supported, and an improved unit root test is included.
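A brief sketch of the two-series interface (assuming egcm() and its is.cointegrated() helper):
    library(egcm)
    set.seed(1)
    x <- cumsum(rnorm(250))               # simulated price series
    y <- 0.5 * x + rnorm(250, sd = 0.5)   # cointegrated with x by construction
    fit <- egcm(x, y)
    is.cointegrated(fit)                  # TRUE if the residual series passes the unit root test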
egor Import and Analyse Ego-Centered Network Data
Tools for importing, analyzing and visualizing ego-centered network data. Supports several data formats, including the export formats of ‘EgoNet’, ‘EgoWeb 2.0’ and ‘openeddi’. An interactive (shiny) app for the intuitive visualization of ego-centered networks is provided. Also included are procedures for creating and visualizing Clustered Graphs (Lerner 2008 <DOI:10.1109/PACIFICVIS.2008.4475458>).
egoTERGM Estimation of Ego-Temporal Exponential Random Graph Models via Expectation Maximization (EM)
Estimation of ego-temporal exponential random graph models with two-stage estimation including initialization through k-means clustering on temporal exponential random graph model parameters and EM as per Campbell (2018) <doi:10.7910/DVN/TWHEZ9>.
eha Event History Analysis
Sampling of risk sets in Cox regression, selections in the Lexis diagram, bootstrapping. Parametric proportional hazards fitting with left truncation and right censoring for common families of distributions, piecewise constant hazards, and discrete models. AFT regression for left truncated and right censored data.
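For example, a minimal sketch of parametric proportional hazards fitting with left truncation, assuming the phreg() interface and the package’s bundled ‘oldmort’ data:
    library(eha)
    fit <- phreg(Surv(enter, exit, event) ~ sex, data = oldmort, dist = "weibull")
    summary(fit)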
EHRtemporalVariability Delineating Reference Changes in Electronic Health Records over Time
The ‘EHRtemporalVariability’ package contains functions to delineate reference changes over time in Electronic Health Records through the projection and visualization of dissimilarities among data temporal batches. This is done through the estimation of data statistical distributions over time and their projection in non-parametric statistical manifolds, uncovering the patterns of the data’s latent temporal variability. Results can be explored through visual analytics formats such as Data Temporal heatmaps and Information Geometric Temporal (IGT) plots. An additional ‘EHRtemporalVariability’ Shiny app can be used to load and explore the package results, making these functions available to users with no R coding experience.
eiPartialID Ecological Regression with Partial Identification
Estimate district-level bounds for 2×2 ecological inference based on the approach described in the forthcoming article Jiang et al. (2019), ‘Ecological Regression with Partial Identification’, Political Analysis. Interval data regression is used to bound the nonidentified regression parameter in a linear contextual effects model, from which district-level bounds are derived. The approach here can be useful as a baseline of comparison for future work on ecological inference.
eivtools Measurement Error Modeling Tools
This includes functions for analysis with error-prone covariates, including deconvolution, latent regression and errors-in-variables regression. It implements methods by Rabe-Hesketh et al. (2003) <doi:10.1191/1471082x03st056oa>, Lockwood and McCaffrey (2014) <doi:10.3102/1076998613509405>, and Lockwood and McCaffrey (2017) <doi:10.1007/s11336-017-9556-y>, among others.
EKMCMC MCMC Procedures for Estimating Enzyme Kinetics Constants
Functions for estimating the catalytic constant and Michaelis-Menten constant for enzyme kinetics models using a Metropolis-Hastings algorithm within a Gibbs sampler based on the Bayesian framework. Additionally, a function to create plots for assessing goodness-of-fit is included.
elasso Enhanced Least Absolute Shrinkage Operator
Performs enhanced variable selection algorithms based on the least absolute shrinkage operator for regression models.
elasticsearchr A Lightweight Interface for Interacting with Elasticsearch from R
A lightweight R interface to ‘Elasticsearch’ – a NoSQL search-engine and column store database (see <https://…/elasticsearch> for more information). This package implements a simple Domain-Specific Language (DSL) for indexing, deleting, querying, sorting and aggregating data using ‘Elasticsearch’.
elect Estimation of Life Expectancies Using Multi-State Models
Functions to compute state-specific and marginal life expectancies. The computation is based on a fitted continuous-time multi-state model that includes an absorbing death state; see Van den Hout (2017, ISBN:9781466568402). The fitted multi-state model should be estimated using the ‘msm’ package, with age as the time scale.
electoral Allocating Seats Methods and Party System Scores
Highest averages & largest remainders allocating seats methods and several party system scores. Implemented highest averages allocating seats methods are D’Hondt, Webster, Danish, Imperiali, Hill-Huntington, Dean, Modified Sainte-Lague, equal proportions and Adams. Implemented largest remainders allocating seats methods are Hare, Droop, Hagenbach-Bischoff, Imperial, modified Imperial and quotas & remainders. The main advantage of this package is that ties are always reported and never incorrectly allocated. Party system scores provided are competitiveness, concentration, effective number of parties, party nationalization score, party system nationalization score and volatility. References. Gallagher (1991) <doi:10.1016/0261-3794(91)90004-C>. Norris (2004, ISBN:0-521-82977-1). Consejo Nacional Electoral del Ecuador (2014)<http://…/CAPITULO%206%20web.pdf>. Laakso & Taagepera (1979) <http://…/001041407901200101>. Jones & Mainwaring (2003) <https://…/304_0.pdf>. Pedersen (1979) <http://…/Pedersen.htm>.
elevatr Access Elevation Data from Various APIs
Several web services are available that provide access to elevation data. This package provides access to several of those services and returns elevation data either as a SpatialPointsDataFrame from point elevation services or as a raster object from raster elevation services. Currently, the package supports access to the Mapzen Elevation Service <https://…/>, Mapzen Terrain Service <https://…/>, Amazon Web Services Terrain Tiles <https://…/> and the USGS Elevation Point Query Service <http://…/>.
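A hedged sketch of fetching point elevations (argument names assumed from the package’s documented interface at the time):
    library(elevatr)
    pts <- data.frame(x = -105.08, y = 40.59)  # longitude, latitude
    get_elev_point(pts, prj = "+proj=longlat +datum=WGS84", src = "epqs")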
elhmc Sampling from an Empirical Likelihood Bayesian Posterior of Parameters Using Hamiltonian Monte Carlo
A tool to draw samples from an Empirical Likelihood Bayesian posterior of parameters using Hamiltonian Monte Carlo.
ellipsis Tools for Working with …
In S3 generics, it’s useful to take … so that methods can have additional arguments. But this flexibility comes at a cost: misspelled arguments will be silently ignored. The ellipsis package is an experiment that allows a generic to warn if any arguments passed in … are not used.
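A minimal sketch of the intended usage, assuming the package’s check_dots_used() helper:
    library(ellipsis)
    safe_median <- function(x, ...) {
      check_dots_used()  # flags ... arguments that are never used downstream
      median(x, ...)
    }
    safe_median(1:10, na.mr = TRUE)  # misspelled 'na.rm' is now reported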
elmNNRcpp The Extreme Learning Machine Algorithm
Training and predict functions for Single Hidden-layer Feedforward Neural Networks (SLFN) using the Extreme Learning Machine (ELM) algorithm. The ELM algorithm differs from the traditional gradient-based algorithms in its very short training times (it doesn’t need any iterative tuning, which makes learning very fast), and there is no need to set any other parameters like learning rate, momentum, epochs, etc. This is a reimplementation of the ‘elmNN’ package using ‘RcppArmadillo’ after the ‘elmNN’ package was archived. For more information, see ‘Extreme learning machine: Theory and applications’ by Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew (2006), Elsevier B.V, <doi:10.1016/j.neucom.2005.12.126>.
ELMR Extreme Machine Learning (ELM)
Training and prediction functions are provided for the Extreme Learning Machine algorithm (ELM). ELM uses a Single Hidden Layer Feedforward Neural Network (SLFN) with randomly generated weights and no gradient-based backpropagation. The training time is very short, and the online version allows the model to be updated using a small chunk of the training set at each iteration. The only parameters to tune are the hidden layer size and the learning function.
ELMSurv Extreme Learning Machine for Survival Analysis
We use the Buckley-James method to impute the data and extend the emerging Extreme Learning Machine approach to survival analysis. Currently, only right-censored data are supported. For details, see the paper by Hong Wang, Jianxin Wang and Lifeng Zhou (2017) <https://…/elmsurv-revised.pdf>, to appear in Applied Intelligence <https://…/10489>.
elo Elo Ratings
A flexible framework for calculating Elo ratings and resulting rankings of any two-team-per-matchup system (chess, sports leagues, ‘Go’, etc.). This implementation is capable of evaluating a variety of matchups, Elo rating updates, and win probabilities, all based on the basic Elo rating system.
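As a small sketch of the formula interface (using the package’s elo.run() and score() functions):
    library(elo)
    games <- data.frame(team.A = c("a", "b", "a"),
                        team.B = c("b", "c", "c"),
                        wins.A = c(1, 0, 1))
    fit <- elo.run(score(wins.A, 1 - wins.A) ~ team.A + team.B,
                   data = games, k = 20)
    final.elos(fit)  # current ratings after the three matchups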
EloChoice Preference Rating for Visual Stimuli Based on Elo Ratings
Allows calculating global scores for characteristics of visual stimuli. Stimuli are presented as sequence of pairwise comparisons (‘contests’), during each of which a rater expresses preference for one stimulus over the other. The algorithm for calculating global scores is based on Elo rating, which updates individual scores after each single pairwise contest. Elo rating is widely used to rank chess players according to their performance. Its core feature is that dyadic contests with expected outcomes lead to smaller changes of participants’ scores than outcomes that were unexpected. As such, Elo rating is an efficient tool to rate individual stimuli when a large number of such stimuli are paired against each other in the context of experiments where the goal is to rank stimuli according to some characteristic of interest.
EloOptimized Optimized Elo Rating Method for Obtaining Dominance Ranks
Provides an implementation of the maximum likelihood methods for deriving Elo scores as published in Foerster, Franz et al. (2016) <DOI:10.1038/srep35404>.
elpatron Bicycling Data Analysis with R
Functions to facilitate cycling analysis within the R environment.
EMAtools Data Management Tools for Real-Time Monitoring/Ecological Momentary Assessment Data
Provides data management functions common in real-time monitoring (also called: ecological momentary assessment, experience sampling, micro-longitudinal) data, including centering on participant means and merging event-level data into momentary data sets where events need to correspond to the nearest data point in the momentary data. This is VERY early release software, and more features will be added over time.
EMbC Expectation-Maximization Binary Clustering
Unsupervised, multivariate, clustering algorithm yielding a meaningful binary clustering taking into account the uncertainty in the data. A specific constructor for trajectory movement analysis yields behavioural annotation of the tracks based on estimated local measures of velocity and turning angle, eventually with solar position covariate as a daytime indicator.
embed Extra Recipes for Encoding Categorical Predictors
Factor predictors can be converted to one or more numeric representations using simple generalized linear models <arXiv:1611.09477> or nonlinear models <arXiv:1604.06737>. All encoding methods are supervised.
EMCC Evolutionary Monte Carlo (EMC) Methods for Clustering
Evolutionary Monte Carlo methods for clustering, temperature ladder construction and placement. This package implements methods introduced in Goswami, Liu and Wong (2007) <doi:10.1198/106186007X255072>. The paper above introduced probabilistic genetic-algorithm-style crossover moves for clustering. The paper applied the algorithm to several clustering problems including Bernoulli clustering, biological sequence motif clustering, BIC based variable selection, mixture of Normals clustering, and showed that the proposed algorithm performed better both as a sampler and as a stochastic optimizer than the existing tools, namely, Gibbs sampling, “split-merge” Metropolis-Hastings algorithm, K-means clustering, and the MCLUST algorithm (in the package ‘mclust’).
Emcdf Computation and Visualization of Empirical Joint Distribution (Empirical Joint CDF)
Computes and visualizes the empirical joint distribution of multivariate data with optimized algorithms and multi-thread computation. A faster algorithm using dynamic programming computes the whole empirical joint distribution of bivariate data, and optimized algorithms compute empirical joint CDF values for other multivariate data. Visualization is focused on bivariate data; levelplots and wireframes are included.
emdi Estimating and Mapping Disaggregated Indicators
Functions that support estimating, assessing and mapping regional disaggregated indicators. So far, estimation methods comprise the model-based Empirical Best Prediction approach (see ‘Small area estimation of poverty indicators’ by Molina and Rao (2010) <doi:10.1002/cjs.10051>) together with precision estimates. The assessment of the used model is supported by a summary and diagnostic plots. For a suitable presentation of estimates, map plots can be easily created. Furthermore, results can easily be exported to Excel.
emhawkes Exponential Multivariate Hawkes Model
Simulation and fitting of exponential multivariate Hawkes models. This package simulates a multivariate Hawkes model, introduced by Hawkes (1971) <doi:10.1093/biomet/58.1.83>, with an exponential kernel and fits the parameters from the data. Models with constant parameters, as well as complex dependence structures, can also be simulated and estimated. The estimation is based on the maximum likelihood method, introduced by Ozaki (1979) <doi:10.1007/BF02480272>, using the ‘maxLik’ package.
emil Evaluation of Modeling without Information Leakage
A toolbox for designing and evaluating predictive models with resampling methods. The aim of this package is to provide a simple and efficient general framework for working with any type of prediction problem, be it classification, regression or survival analysis, that is easy to extend and adapt to your specific setting. Some commonly used methods for classification, regression and survival analysis are included.
emIRT EM Algorithms for Estimating Item Response Theory Models
Various Expectation-Maximization (EM) algorithms are implemented for item response theory (IRT) models. The current implementation includes IRT models for binary and ordinal responses, along with dynamic and hierarchical IRT models with binary responses. The latter two models are derived and implemented using variational EM.
eMLEloglin Fitting log-Linear Models in Sparse Contingency Tables
Log-linear modeling is a popular method for the analysis of contingency table data. When the table is sparse, the data can fall on the boundary of the convex support, and we say that ‘the MLE does not exist’ in the sense that some parameters cannot be estimated. However, an extended MLE always exists, and a subset of the original parameters will be estimable. The ‘eMLEloglin’ package determines which sampling zeros contribute to the non-existence of the MLE. These problematic zero cells can be removed from the contingency table and the model can then be fit (as far as is possible) using the glm() function.
emmeans Estimated Marginal Means, aka Least-Squares Means
Obtain estimated marginal means (EMMs) for many linear, generalized linear, and mixed models. Compute contrasts or linear functions of EMMs, trends, and comparisons of slopes. Plots and compact letter displays. Least-squares means are discussed, and the term ‘estimated marginal means’ is suggested, in Searle, Speed, and Milliken (1980) Population marginal means in the linear model: An alternative to least squares means, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>.
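For instance, a minimal sketch of a typical workflow with the package’s emmeans() and pairs() functions:
    library(emmeans)
    fit <- lm(breaks ~ wool * tension, data = warpbreaks)
    emm <- emmeans(fit, ~ tension | wool)  # EMMs for tension within each wool type
    pairs(emm)                             # pairwise comparisons of the EMMs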
EMMIXcontrasts2 Contrasts in Mixed Effects for EMMIX Model with Random Effects 2
For forming contrasts in the mixed effects of mixtures of linear mixed models fitted to gene profiles.
EMMIXcskew Fitting Mixtures of CFUST Distributions
Functions to fit finite mixture of multivariate canonical fundamental skew t (FM-CFUST) distributions, random sample generation, 2D and 3D contour plots.
EMMIXmfa Mixture Models with Component-Wise Factor Analyzers
We provide functions to fit finite mixtures of multivariate normal or t-distributions to data with various factor analytic structures adopted for the covariance/scale matrices. The factor analytic structures available include mixtures of factor analyzers and mixtures of common factor analyzers. The latter approach is so termed because the matrix of factor loadings is common to components before the component-specific rotation of the component factors to make them white noise. Note that the component-factor loadings are not common after this rotation. Maximum likelihood estimators of model parameters are obtained via the Expectation-Maximization algorithm. See descriptions of the algorithms used in McLachlan GJ, Peel D (2000) <doi:10.1002/0471721182.ch8> McLachlan GJ, Peel D (2000) <ISBN:1-55860-707-2> McLachlan GJ, Peel D, Bean RW (2003) <doi:10.1016/S0167-9473(02)00183-4> McLachlan GJ, Bean RW, Ben-Tovim Jones L (2007) <doi:10.1016/j.csda.2006.09.015> Baek J, McLachlan GJ, Flack LK (2010) <doi:10.1109/TPAMI.2009.149> Baek J, McLachlan GJ (2011) <doi:10.1093/bioinformatics/btr112> McLachlan GJ, Baek J, Rathnayake SI (2011) <doi:10.1002/9781119995678.ch9>.
EMMLi A Maximum Likelihood Approach to the Analysis of Modularity
Fit models of modularity to morphological landmarks. Perform model selection on results. Fit models with a single within-module correlation or with separate within-module correlations fitted to each module.
emojifont Emoji Fonts for using in R
An implementation of using emoji fonts in both base and ‘ggplot2’ graphics.
empichar Evaluates the Empirical Characteristic Function for Multivariate Samples
Evaluates the empirical characteristic function of univariate and multivariate samples. This package uses ‘RcppArmadillo’ for fast evaluation. It is also possible to export the code to be used in other packages at ‘C++’ level.
empirical Empirical Probability Density Functions and Empirical Cumulative Distribution Functions
Implements empirical probability density functions (continuous functions) and empirical cumulative distribution functions (step functions or continuous). Currently, univariate only.
ems Epimed Solutions Collection for Data Editing, Analysis, and Benchmarking of Health Units
Collection of functions for data analysis and editing. Most of them are related to benchmarking with prediction models.
EMSaov The Analysis of Variance with EMS
The analysis of variance table including the expected mean squares (EMS) for various types of experimental design is provided. When some variables are random effects, or special experimental designs such as nested, repeated-measures, or split-plot designs are used, it is not easy to find the appropriate test, especially the denominator for the F-statistic, which depends on the EMS.
EMSC Extended Multiplicative Signal Correction
Background correction of spectral-like data. Handles variations in scaling, polynomial baselines and interferents. Parameters for corrections are stored for further analysis, and spectra are corrected accordingly.
EMSHS EM Algorithm for Bayesian Shrinkage Approach with Structural Information Incorporated
Fits a Bayesian shrinkage regression model that can incorporate structural information. Changgee Chang, Suprateek Kundu, Qi Long (2018) <doi:10.1111/biom.12882>.
EMSNM EM Algorithm for Sigmoid Normal Model
Provides an EM-algorithm-based method to estimate the parameters of a mixture model, the Sigmoid-Normal Model, where the samples come from several normal distributions (subgroups) whose means are determined by the covariate Z and coefficient alpha while the variances are homogeneous. The subgroup each item belongs to is determined by the covariates X and coefficient eta through a sigmoid link function, an extension of the logistic link. Bootstrap is used to estimate the standard errors of the parameters. When the sample is indeed separable, after removing estimates with abnormal sigma, the estimation of alpha is quite good. The method was used to explore the subgroup structure of HIV patients and can be applied in other domains where a subgroup structure exists.
emstreeR Fast Computing Euclidean Minimum Spanning Trees
Computes a Euclidean Minimum Spanning Tree using the Dual-Tree Boruvka algorithm (March, Ram, Gray, 2010, <doi:10.1145/1835804.1835882>) implemented in ‘mlpack’, the C++ Machine Learning library (Curtin, 2005, <doi:10.21105/joss.00726>). ’emstreeR’ works as a wrapper so that R users can benefit from the fast C++ function for computing a Euclidean Minimum Spanning Tree without touching the C++ code. The package also provides functions and an S3 method for readily plotting Minimum Spanning Trees (MST) using either ‘base’ R, ‘scatterplot3d’ or ‘ggplot2’ style.
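A minimal sketch, assuming the ComputeMST() wrapper and its base-R plot method:
    library(emstreeR)
    set.seed(1)
    d <- data.frame(x = rnorm(50), y = rnorm(50))
    mst <- ComputeMST(d)  # input data augmented with MST edge information
    plot(mst)             # base R plot of the points and MST edges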
emuR Main Package of the EMU Speech Database Management System
Provides the next iteration of the EMU Speech Database Management System (EMU_SDMS) with database management, data extraction, data preparation and data visualization facilities.
EMVS The Expectation-Maximization Approach to Bayesian Variable Selection
An efficient expectation-maximization algorithm for fitting Bayesian spike-and-slab regularization paths for linear regression. Rockova and George (2014) <doi:10.1080/01621459.2013.869223>.
enc Portable Tools for ‘UTF-8’ Character Data
Implements an S3 class for storing ‘UTF-8’ strings, based on regular character vectors. Also contains routines to portably read and write ‘UTF-8’ encoded text files, to convert all strings in an object to ‘UTF-8’, and to create character vectors with various encodings.
encode Represent Ordered Lists and Pairs as Strings
Interconverts between ordered lists and compact string notation. Useful for capturing code lists, and pair-wise codes and decodes, for text storage. Analogous to factor levels and labels. Generics ‘encode’ and ‘decode’ perform interconversion, while ‘codes’ and ‘decodes’ extract components of an encoding. The function ‘encoded’ checks whether something is interpretable as an encoding.
encryptr Easily Encrypt and Decrypt Data Frame or Tibble Columns using RSA Public/Private Keys
It is important to ensure that sensitive data is protected. This straightforward package is aimed at the end-user. Strong RSA encryption using a public/private key pair is used to encrypt data frame or tibble columns. A public key can be shared to allow others to encrypt data to be sent to you. This is particularly aimed at healthcare settings, so patient data can be pseudonymised.
endogenous Classical Simultaneous Equation Models
Likelihood-based approaches to estimate linear regression parameters and treatment effects in the presence of endogeneity. Specifically, this package includes James Heckman’s classical simultaneous equation models: the sample selection model for outcome selection bias and the hybrid model with structural shift for endogenous treatment. For more information, see the seminal paper of Heckman (1978) <DOI:10.3386/w0177> in which the details of these models are provided. This package accommodates repeated measures on subjects with a working independence approach. The hybrid model further accommodates treatment effect modification.
endtoend Transmissions and Receptions in an End to End Network
Computes the expectation of the number of transmissions and receptions considering an End-to-End transport model with limited number of retransmissions per packet. It provides theoretical results and also estimated values based on Monte Carlo simulations.
EnergyOnlineCPM Distribution Free Multivariate Control Chart Based on Energy Test
Provides a function for a distribution-free control chart based on the change point model, for multivariate statistical process control. The main constituent of the chart is the energy test, which focuses on the discrepancy between the empirical characteristic functions of two random vectors. This new control chart stands out in four aspects. First, it is distribution free, requiring no knowledge of the random processes. Second, it can monitor mean and variance simultaneously. Third, it is devised for multivariate time series, which is more practical in real data applications. Fourth, it is designed for online detection (Phase II), which is central for real-time surveillance of streaming data. For more information please refer to O. Okhrin and Y.F. Xu (2017) <https://…/CPM102.pdf>.
enetLTS Robust and Sparse Methods for High Dimensional Linear and Logistic Regression
Fully robust versions of the elastic net estimator are introduced for linear and logistic regression, in particular for high dimensional data, by Kurnaz, Hoffmann and Filzmoser (2017) <DOI:10.1016/j.chemolab.2017.11.017>. The algorithm searches for outlier-free subsets on which the classical elastic net estimators can be applied.
eNetXplorer Quantitative Exploration of Elastic Net Families for Generalized Linear Models
Provides a quantitative toolkit to explore elastic net families and to uncover correlates contributing to prediction under a cross-validation framework. Fits linear, binomial (logistic) and multinomial models. Candia J and Tsang JS (2018) (application note under review).
enpls Ensemble Partial Least Squares (EnPLS) Regression
R package for ensemble partial least squares regression, a unified framework for feature selection, outlier detection, and ensemble learning.
enrichwith Methods to Enrich R Objects with Extra Components
The enrichwith package provides the ‘enrich’ method to enrich list-like R objects with new, relevant components. The current version has methods for enriching objects of class ‘family’, ‘link-glm’ and ‘glm’. The resulting objects preserve their class, so all methods associated to them still apply. The package can also be used to produce customisable source code templates for the structured implementation of methods to compute new components.
EnsembleCV Extensible Package for Cross-Validation-Based Integration of Base Learners
This package extends the base classes and methods of EnsembleBase package for cross-validation-based integration of base learners. Default implementation calculates average of repeated CV errors, and selects the base learner / configuration with minimum average error. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. The package can be extended, e.g. by adding variants of the current implementation.
ensembleEN Ensembling Regularized Linear Models
Functions for computing the ensembles of regularized linear regression estimators defined in Christidis, Lakshmanan, Smucler and Zamar (2017) <arXiv:1712.03561>. The procedure works on top of a given penalized linear regression estimator, the Elastic Net in this implementation, by fitting it to possibly overlapping subsets of features, while at the same time encouraging diversity among the subsets, to reduce the correlations between the predictions that result from each fitted model. The predictions from the models are then aggregated.
EnsemblePCReg Extensible Package for Principal-Component-Regression-based Integration of Base Learners
This package extends the base classes and methods of EnsembleBase package for Principal-Components-Regression-based (PCR) integration of base learners. Default implementation uses cross-validation error to choose the optimal number of PC components for the final predictor. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in EnsembleBase package as well as this package.
EnsemblePenReg Extensible Classes and Methods for Penalized-Regression-based Integration of Base Learners
Extending the base classes and methods of EnsembleBase package for Penalized-Regression-based (Ridge and Lasso) integration of base learners. Default implementation uses cross-validation error to choose the optimal lambda (shrinkage parameter) for the final predictor. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in EnsembleBase package as well as this package.
ensembleR Ensemble Models in R
Functions to use ensembles of several machine learning models specified in the ‘caret’ package.
ensr Elastic Net SearcheR
Elastic net regression models are controlled by two parameters, lambda, a measure of shrinkage, and alpha, a metric defining the model’s location on the spectrum between ridge and lasso regression. glmnet provides tools for selecting lambda via cross validation but no automated methods for selection of alpha. Elastic Net SearcheR automates the simultaneous selection of both lambda and alpha. Developed, in part, with support by NICHD R03 HD094912.
EntropyExplorer Tools for Exploring Differential Shannon Entropy, Differential Coefficient of Variation and Differential Expression
Rows of two matrices are compared for Shannon entropy, coefficient of variation, and expression. P-values can be requested for all metrics.
envestigate Interrogate Environments
R package to interrogate environments. Scary, I know.
EnviroPRA Environmental Probabilistic Risk Assessment Tools
Methods to perform a probabilistic environmental risk assessment from exposure to toxic substances, following e.g. USEPA (1997) <https://…iding-principles-monte-carlo-analysis>.
envnames Track User-Defined Environment Names
Set of functions to keep track of user-defined environment names, which cannot be retrieved with the built-in function environmentName(). The main function in this package for this purpose has a similar name, environment_name(), which returns the name of the environment given as parameter, be it a system, package, user-defined, or function execution environment. The package also provides additional functionality, the most important being: a function (obj_find()) to search for objects, which extends the functionality of exists() by searching recursively within all user-defined environments; a way to get the stack of calling function names (get_fun_calling_chain()) more easily than with the built-in function sys.call(), which requires further non-intuitive parsing of the output; and a function (get_obj_address()) to retrieve the memory address of an object. This package was inspired by an ‘R for developers’ course given by Andrea Spano from Quantide (<http://…/r-for-developers>) and by a post by Gabor Grothendieck at the R-Help forum (<https://…/245646.html>).
epandist Statistical Functions for the Censored and Uncensored Epanechnikov Distribution
Analyzing censored variables usually requires the use of optimization algorithms. This package provides an alternative algebraic approach to the task of determining the expected value of a random censored variable with a known censoring point. Likewise this approach allows for the determination of the censoring point if the expected value is known. These results are derived under the assumption that the variable follows an Epanechnikov kernel distribution with known mean and range prior to censoring. Statistical functions related to the uncensored Epanechnikov distribution are also provided by this package.
EPGLM Gaussian Approximation of Bayesian Binary Regression Models
The main functions compute the expectation propagation approximation of Bayesian probit/logit models with a Gaussian prior. More information can be found in Chopin and Ridgway (2015). More models and priors should follow.
epiflows Predicting Disease Spread from Flow Data
Provides functions and classes designed to handle and visualise epidemiological flows between locations. Also contains a statistical method for predicting disease spread from flow data initially described in Dorigatti et al. (2017) <doi:10.2807/1560-7917.ES.2017.22.28.30572>. This package is part of the RECON (<http://…/> ) toolkit for outbreak analysis.
episcan Scan Pairwise Epistasis
Searching genomic interactions with linear/logistic regression in a high-dimensional dataset is a time-consuming task. This package provides some efficient ways to scan epistasis in genome-wide interaction studies (GWIS). Both case-control status (binary outcome) and quantitative phenotype (continuous outcome) are supported (the main references: 1. Kam-Thong, T., D. Czamara, K. Tsuda, K. Borgwardt, C. M. Lewis, A. Erhardt-Lehmann, B. Hemmer, et al. (2011). <doi:10.1038/ejhg.2010.196>. 2. Kam-Thong, T., B. Pütz, N. Karbalai, B. Müller-Myhsok, and K. Borgwardt. (2011). <doi:10.1093/bioinformatics/btr218>.)
episode Estimation with Penalisation in Systems of Ordinary Differential Equations
A set of statistical tools for inferring unknown parameters in continuous time processes governed by ordinary differential equations (ODE). Moreover, variance reduction and model selection can be obtained through various implemented penalisation schemes. The package offers two estimation procedures: exact estimation via least squares and a faster approximate estimation via inverse collocation methods. All estimators can handle multiple data sets arising from the same ODE system, but subjected to different interventions.
EpistemicGameTheory Constructing an Epistemic Model for the Games with Two Players
Constructing an epistemic model such that, for every player i and for every choice c(i) which is optimal, there is one type that expresses common belief in rationality.
EpiWeek Conversion Between Epidemiological Weeks and Calendar Dates
Users can easily derive the calendar dates from epidemiological weeks, and vice versa.
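A hedged sketch; the function names dateToEpiweek() and epiweekToDate() are assumed from the package’s stated purpose:
    library(EpiWeek)
    dateToEpiweek("2017-01-05")  # epidemiological week containing this date
    epiweekToDate(2017, 1)       # start and end dates of epi week 1 of 2017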
eplusr A Toolkit for Using Whole Building Simulation Program ‘EnergyPlus’
A rich toolkit for using the whole-building simulation program ‘EnergyPlus’, which enables programmatic navigation and modification of ‘EnergyPlus’ models and makes it less painful to do parametric simulations and analysis.
equalCovs Testing the Equality of Two Covariance Matrices
Tests the equality of two covariance matrices, as used in the paper ‘Two sample tests for high dimensional covariance matrices’ by Li and Chen (2012) <arXiv:1206.0917>.
equaltestMI Examine Measurement Invariance via Equivalence Testing and Projection Method
Functions for examining measurement invariance via equivalence testing along with adjusted RMSEA (root mean square error of approximation; Steiger & Lind, 1980) cutoff values. In particular, a projection-based method is implemented to test the equality of latent factor means across groups without assuming the equality of intercepts.
equateMultiple Equating of Multiple Forms
Equating of multiple forms using Item Response Theory (IRT) methods (Battauz M. (2017) <doi:10.1007/s11336-016-9517-x> and Haberman S. J. (2009) <doi:10.1002/j.2333-8504.2009.tb02197.x>).
equivalenceTest Equivalence Test for the Means of Two Normal Distributions
Two methods for performing equivalence tests for the means of two (test and reference) normal distributions are implemented. The null hypothesis of the equivalence test is that the absolute difference between the two means is greater than or equal to the equivalence margin, and the alternative is that the absolute difference is less than the margin. Given that the margin is often difficult to obtain a priori, it is assumed to be a constant multiple of the standard deviation of the reference distribution. The first method assumes a fixed margin which is a constant multiple of the estimated standard deviation of the reference data and whose variability is ignored. The second method takes the margin variability into account. In addition, some tools to summarize and illustrate the data and test results are included to facilitate the evaluation of the data and interpretation of the results.
EQUIVNONINF Testing for Equivalence and Noninferiority
Makes available in R the complete set of programs accompanying S. Wellek’s (2010) monograph ‘Testing Statistical Hypotheses of Equivalence and Noninferiority, Second Edition’ (Chapman & Hall/CRC).
equivUMP Uniformly Most Powerful Invariant Tests of Equivalence
Implementation of uniformly most powerful invariant equivalence tests for one- and two-sample problems (paired and unpaired) as described in Wellek (2010, ISBN:978-1-4398-0818-4). Also one-sided alternatives (non-inferiority and non-superiority tests) are supported. Basically a variant of a t-test with (relaxed) null and alternative hypotheses exchanged.
equSA Estimate a Single or Multiple Graphical Models and Construct Networks
Provides an equivalent measure of partial correlation coefficients for high-dimensional Gaussian Graphical Models to learn and visualize the underlying relationships between variables from single or multiple datasets. You can refer to Liang, F., Song, Q. and Qiu, P. (2015) <doi:10.1080/01621459.2015.1012391> for more detail. Based on this method, the package also provides the method for constructing networks for Next Generation Sequencing Data. Besides, it includes the method for jointly estimating Gaussian Graphical Models of multiple datasets.
ercv Fitting Tails by the Empirical Residual Coefficient of Variation
Provides a simple and trustworthy methodology for the analysis of extreme values and multiple threshold tests for a generalized Pareto distribution, together with an automatic threshold selection algorithm. See del Castillo, J, Daoudi, J and Lockhart, R (2014) <doi:10.1111/sjos.12037>.
ergm.rank Fit, Simulate and Diagnose Exponential-Family Models for Rank-Order Relational Data
A set of extensions for the ‘ergm’ package to fit weighted networks whose edge weights are ranks.
erhcv Equi-Rank Hierarchical Clustering Validation
Assesses the statistical significance of clusters for a given dataset through bootstrapping and hypothesis testing of a given matrix of empirical Spearman’s rho, based on the technique of S. Gaiser et al. (2010) <doi:10.1016/j.jmva.2010.07.008>.
err Customizable Object Sensitive Messages
Messages should provide users with readable information about R objects without flooding their console. ‘cc()’ concatenates vector and data frame values into a grammatically correct string using commas, an ellipsis and conjunction. ‘cn()’ allows the user to define a string which varies based on a count. ‘co()’ combines the two to produce a customizable object aware string. The package further facilitates this process by providing five ‘sprintf’-like types such as ‘%n’ for the length of an object and ‘%o’ for its name as well as wrappers for pasting objects and issuing errors, warnings and messages.
errint Build Error Intervals
Builds and analyzes error intervals for a particular model’s predictions, assuming different distributions for the noise in the data.
errorist Automatically Search Errors or Warnings
Provides environment hooks that obtain errors and warnings which occur during the execution of code to automatically search for solutions.
errorizer Function Errorizer
Provides a function to convert existing R functions into ‘errorized’ versions with added logging and handling functionality when encountering errors or warnings. The errorize function accepts an existing R function as its first argument and returns an R function with the exact same arguments and functionality. However, if an error or warning occurs when running that ‘errorized’ R function, it will save a .Rds file to the current working directory with the relevant objects and information required to immediately recreate the error.
errorlocate Locate Errors with Validation Rules
Errors in data can be located and removed using validation rules from package ‘validate’.
errors Error Propagation for R Vectors
Support for painless automatic error propagation in numerical operations.
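A minimal sketch, assuming the package’s set_errors() constructor:
    library(errors)
    x <- set_errors(c(1.0, 2.0), 0.1)  # values with standard uncertainty 0.1
    y <- set_errors(3.0, 0.2)
    x * y  # uncertainties propagate automatically through arithmetic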
ERSA Exploratory Regression ‘Shiny’ App
Constructs a ‘shiny’ app function with interactive displays for summary and analysis of variance regression tables, and parallel coordinate plots of data and residuals.
es.dif Compute Effect Sizes of the Difference
Computes various effect sizes of the difference, their variance, and confidence interval. This package treats Cohen’s d, Hedges’ d, biased/unbiased c (an effect size between a mean and a constant) and e (an effect size between means without assuming the variance equality).
esaBcv Estimate Number of Latent Factors and Factor Matrix for Factor Analysis
These functions estimate the latent factors of a given matrix, whether or not it is high-dimensional. The method first estimates the number of factors using bi-cross-validation and then estimates the latent factor matrix and the noise variances. For more information about the method, see Art B. Owen and Jingshu Wang’s 2015 arXiv article on factor models (http://…/1503.03515 ).
esaddle Extended Empirical Saddlepoint Density Approximation
Tools for fitting the Extended Empirical Saddlepoint (EES) density.
esc Effect Size Computation for Meta Analysis
Implementation of the web-based ‘Practical Meta-Analysis Effect Size Calculator’ from David B. Wilson in R. Based on the input, the effect size can be returned as standardized mean difference, Hedges’ g, correlation coefficient r or Fisher’s transformation z, odds ratio or log odds effect size.
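For example, a sketch of computing Hedges’ g from group summary statistics (using the esc_mean_sd() calculator):
    library(esc)
    esc_mean_sd(grp1m = 10.3, grp1sd = 2.1, grp1n = 60,
                grp2m = 12.3, grp2sd = 2.5, grp2n = 56,
                es.type = "g")  # standardized mean difference as Hedges' g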
esDesign Adaptive Enrichment Designs with Sample Size Re-Estimation
‘esDesign’ implements adaptive enrichment designs with sample size re-estimation. In detail, three proposed trial designs are provided, including the AED1-SSR (or ES1-SSR), AED2-SSR (or ES2-SSR) and AED3-SSR (or ES3-SSR). In addition, this package also contains several widely used adaptive designs, such as the Marker Sequential Test (MaST) design proposed by Freidlin et al. (2014) <doi:10.1177/1740774513503739>, the adaptive enrichment designs without early stopping (AED or ES), and the sample size re-estimation procedure (SSR) based on the conditional power proposed by Proschan and Hunsberger (1995) <doi:10.2307/2533262>, along with several useful functions. With these, one can calculate the futility and/or efficacy stopping boundaries and the sample size required, calibrate the threshold for the difference between subgroup-specific test statistics, and conduct simulation studies in AED, SSR, AED1-SSR, AED2-SSR and AED3-SSR.
eshrink Shrinkage for Effect Estimation
Computes shrinkage estimators for regression problems. Selects penalty parameter by minimizing bias and variance in the effect estimate, where bias and variance are estimated from the posterior predictive distribution.
ESKNN Ensemble of Subset of K-Nearest Neighbours Classifiers for Classification and Class Membership Probability Estimation
Functions for classification and group membership probability estimation are given. The issue of non-informative features in the data is addressed by utilizing the ensemble method. A few optimal models are selected in the ensemble from an initially large set of base k-nearest neighbours (KNN) models, generated on subsets of features from the training data. A two-stage assessment is applied in the selection of optimal models for the ensemble in the training function. The prediction functions for classification and class membership probability estimation return class outcomes and class membership probability estimates for the test data. The package includes measures of classification error and Brier score, for classification and probability estimation tasks respectively.
esmprep Data Preparation During and After the Use of the Experience Sampling Methodology (ESM)
Support in preparing a raw ESM dataset for statistical analysis. Preparation includes the handling of errors (mostly due to technological reasons) and the generating of new variables that are necessary and/or helpful in meeting the conditions when statistically analyzing ESM data. The functions in ‘esmprep’ are meant to hierarchically lead from bottom, i.e. the raw (separated) ESM dataset(s), to top, i.e. a single ESM dataset ready for statistical analysis. This hierarchy evolved out of my personal experience in working with ESM data.
esquisse Explore and Visualize Your Data Interactively
A ‘shiny’ gadget to create ‘ggplot2’ charts interactively with drag-and-drop to map your variables. You can quickly visualize your data accordingly to their type, export to ‘PNG’ or ‘PowerPoint’, and retrieve the code to reproduce the chart.
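A one-line sketch of launching the gadget (assuming the esquisser() entry point):
    library(esquisse)
    esquisser(mtcars)  # opens the drag-and-drop 'ggplot2' builder on mtcars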
esreg Joint Quantile and Expected Shortfall Regression
Simultaneous modeling of the quantile and the expected shortfall of a response variable given a set of covariates, see Dimitriadis and Bayer (2017) <arXiv:1704.02213>.
essHist The Essential Histogram
Provides an optimal histogram, in the sense of probability density estimation and feature detection, by means of multiscale variational inference. For details see Li, Munk, Sieling and Walther (2016) <arXiv:1612.07216>.
ESTER Efficient Sequential Testing with Evidence Ratios
An implementation of sequential testing that uses evidence ratios computed from the Akaike weights of a set of models. These weights are computed using either the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC), following the recommendations of Burnham & Anderson (2004). Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods and Research, 33(2), 261-304. <doi:10.1177/0049124104268644>.
EstHer Estimation of Heritability in High Dimensional Sparse Linear Mixed Models using Variable Selection
A variable selection method to select active components in sparse linear mixed models in order to estimate the heritability. The selection reduces the size of the data sets, which improves the accuracy of the estimations. The package also provides a confidence interval for the estimated heritability.
estimability Estimability Tools for Linear Models
Provides tools for determining estimability of linear functions of regression coefficients, and alternative epredict methods for lm, glm, and mlm objects that handle non-estimable cases correctly.
EstimateGroupNetwork Perform the Joint Graphical Lasso and Select Tuning Parameters
Can be used to simultaneously estimate networks (Gaussian Graphical Models) in data from different groups or classes via the Joint Graphical Lasso. Tuning parameters are selected via information criteria (AIC / BIC / eBIC) or cross-validation.
estimatr Fast Estimators for Design-Based Inference
Fast procedures for a small set of commonly used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, regression improving the precision of experimental estimates by interacting treatment with centered pre-treatment covariates as introduced by Lin (2013) <doi:10.1214/12-AOAS583>, difference-in-means, and Horvitz-Thompson estimation.
estprod Estimation of Production Functions
Estimation of production functions by the Olley-Pakes and Levinsohn-Petrin methodologies. The package aims to reproduce the results obtained with Stata’s user-written opreg <http://…/article.html?article=st0145> and levpet <http://…/article.html?article=st0060> commands. The first was originally proposed by Olley, G.S. and Pakes, A. (1996) <doi:10.2307/2171831>, and the second by Levinsohn, J. and Petrin, A. (2003) <doi:10.1111/1467-937X.00246>.
EstSimPDMP Estimation and Simulation for PDMPs
Deals with the estimation of the jump rate for piecewise-deterministic Markov processes (PDMPs) from a single observation of the process over a long time. The main functions provide an estimate of this function. The state space may be discrete or continuous. The associated paper has been published in the Scandinavian Journal of Statistics and is given in the references. Other functions provide a method to simulate random variables from their (conditional) hazard rate, and then to simulate PDMPs.
estudy2 An Implementation of Parametric and Nonparametric Event Study
An implementation of the most commonly used event study methodology, including both parametric and nonparametric tests. It covers various aspects of rate-of-return estimation (the core calculation is done in C++), as well as three classical market models for event studies: mean-adjusted returns, market-adjusted returns and single-index market models. There are 6 parametric and 6 nonparametric tests provided, which examine cross-sectional daily abnormal returns (see the documentation of the functions for more information). Parametric tests include tests proposed by Brown and Warner (1980) <DOI:10.1016/0304-405X(80)90002-1>, Brown and Warner (1985) <DOI:10.1016/0304-405X(85)90042-X>, Boehmer et al. (1991) <DOI:10.1016/0304-405X(91)90032-F>, Patell (1976) <DOI:10.2307/2490543>, and Lamb (1995) <DOI:10.2307/253695>. Nonparametric tests covered in estudy2 are tests described in Corrado and Zivney (1992) <DOI:10.2307/2331331>, McConnell and Muscarella (1985) <DOI:10.1016/0304-405X(85)90006-6>, Boehmer et al. (1991) <DOI:10.1016/0304-405X(91)90032-F>, Cowan (1992) <DOI:10.1007/BF00939016>, Corrado (1989) <DOI:10.1016/0304-405X(89)90064-0>, Campbell and Wasley (1993) <DOI:10.1016/0304-405X(93)90025-7>, Savickas (2003) <DOI:10.1111/1475-6803.00052>, and Kolari and Pynnonen (2010) <DOI:10.1093/rfs/hhq072>. Furthermore, tests for the cumulative abnormal returns proposed by Brown and Warner (1985) <DOI:10.1016/0304-405X(85)90042-X> and Lamb (1995) <DOI:10.2307/253695> are included.
esvis Visualization and Estimation of Effect Sizes
A variety of methods are provided to estimate and visualize distributional differences in terms of effect sizes. Particular emphasis is placed on evaluating differences between two or more distributions across the entire scale, rather than at a single point (e.g., differences in means). For example, Probability-Probability (PP) plots display the difference between two or more distributions, matched by their empirical CDFs (see Ho and Reardon, 2012; <doi:10.3102/1076998611411918>), allowing for examinations of where on the scale distributional differences are largest or smallest. The area under the PP curve (AUC) is an effect-size metric, corresponding to the probability that a randomly selected observation from the x-axis distribution will have a higher value than a randomly selected observation from the y-axis distribution. Binned effect size plots are also available, in which the distributions are split into bins (set by the user) and separate effect sizes (Cohen’s d) are produced for each bin – again providing a means to evaluate the consistency (or lack thereof) of the difference between two or more distributions at different points on the scale. Evaluation of empirical CDFs is also provided, with built-in arguments for providing annotations to help evaluate distributional differences at specific points (e.g., semi-transparent shading). All functions take a consistent argument structure. Calculation of specific effect sizes is also possible. The following effect sizes are estimable: (a) Cohen’s d, (b) Hedges’ g, (c) percentage above a cut, (d) transformed (normalized) percentage above a cut, (e) area under the PP curve, and (f) the V statistic (see Ho, 2009; <doi:10.3102/1076998609332755>), which essentially transforms the area under the curve to standard deviation units. By default, effect sizes are calculated for all possible pairwise comparisons, but a reference group (distribution) can be specified.
ether Interaction with the ‘Ethereum’ Blockchain
Interacts with the open-source, public ‘Ethereum’ <https://…/> blockchain, a distributed computing platform that supports smart contracts. This package provides functions which interrogate blocks and transactions in the ‘Ethereum’ blockchain.
etrunct Computes Moments of Univariate Truncated t Distribution
Computes moments of the univariate truncated t distribution. There is only one exported function, e_trunct(); see its documentation for details.
eulerr Area-Proportional Euler Diagrams
If possible, generates exactly area-proportional Euler diagrams, or otherwise approximately proportional diagrams using numerical optimization. An Euler diagram is a generalization of a Venn diagram, relaxing the criterion that all intersections need to be represented.
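A minimal sketch of a two-set diagram, assuming the euler() interface that takes named region sizes:
    library(eulerr)
    # Region sizes: |A|, |B| and the overlap |A&B|
    fit <- euler(c("A" = 10, "B" = 6, "A&B" = 3))
    plot(fit)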
ev.trawl Extreme Value Trawls
Implementation of trawl processes and an extension of such processes into a univariate latent model for extreme values. Inference, simulation and initialization tools are available. See Noven et al. (2018) <DOI:10.21314/JEM.2018.179>, which can also be found on arXiv (<arXiv:1511.08190>).
EvaluationMeasures Collection of Model Evaluation Measure Functions
Provides some of the most important evaluation measures for evaluating a model. Just by giving the real and predicted classes, measures such as accuracy, sensitivity, specificity, ppv, npv, fmeasure and mcc, among others, will be returned.
evaluator Information Security Quantified Risk Assessment Toolkit
An open source information security strategic risk analysis toolkit based on the OpenFAIR taxonomy <https://…/C13K> and risk assessment standard <https://…/C13G>. Empowers an organization to perform a quantifiable, repeatable, and data-driven review of its security program.
EValue Sensitivity Analyses for Unmeasured Confounding in Observational Studies and Meta-Analyses
Conducts sensitivity analyses for unmeasured confounding for either an observational study or a meta-analysis of observational studies. For a single observational study, the package reports E-values, defined as the minimum strength of association on the risk ratio scale that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away a specific treatment-outcome association, conditional on the measured covariates. One can use one of the evalues.XX() functions to compute E-values for the relevant outcome types. Outcome types include risk ratios, odds ratios with common or rare outcomes, hazard ratios with common or rare outcomes, and standardized differences in outcomes. Optionally, one can use the biasPlot() function to plot the bias factor as a function of two sensitivity parameters. (See VanderWeele & Ding, 2017 [<http://…/2643434>] for details.) For a meta-analysis, use the function confounded_meta() to compute point estimates and inference for: (1) the proportion of studies with true causal effect sizes more extreme than a specified threshold of scientific importance; and (2) the minimum bias factor and confounding strength required to reduce to less than a specified threshold the proportion of studies with true effect sizes of scientifically significant size. The functions sens_plot() and sens_table() create plots and tables for visualizing these meta-analysis metrics across a range of bias values, and scrape_meta() helps scrape study-level data from a published forest plot or summary table to obtain the needed estimates when these are not reported. (See Mathur & VanderWeele [<https://…/>] for details.) Most of the analyses available in this package can also be conducted using web-based graphical interfaces (for a single observational study: <https://…/>; for a meta-analysis: <https://…/>).
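A minimal sketch for a single observational study; the description names the evalues.XX() family, and the specific function and argument names below are assumptions:
    # Hedged sketch: argument names are assumptions.
    library(EValue)
    # E-value for an observed risk ratio of 1.8 (95% CI: 1.2 to 2.7)
    evalues.RR(est = 1.8, lo = 1.2, hi = 2.7)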
evclass Evidential Distance-Based Classification
Different evidential distance-based classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule and the evidential neural network.
evclust Evidential Clustering
Various clustering algorithms that produce a credal partition, i.e., a set of Dempster-Shafer mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership uncertainty of the objects. The algorithms are: Evidential c-Means (ECM), Relational Evidential c-Means (RECM), Constrained Evidential c-Means (CECM), EVCLUS and EK-NNclus.
event Event History Procedures and Models
Functions for setting up and analyzing event history data.
eventdataR Event Data Repository
Event dataset repository including both real-life and artificial event logs. They can be used in combination with functionalities provided by the ‘bupaR’ packages ‘edeaR’, ‘processmapR’, etc.
eventstudies Event Study Analysis
A platform for conducting event studies (Fama, Fisher, Jensen, Roll (1969) <doi:10.2307/2525569>) and for methodological research on event studies. The package supports market model, augmented market model, and excess returns methods for data modelling, along with the Wilcoxon test, the classical t-test, and the bootstrap as inference procedures.
evian Evidential Analysis of Genetic Association Data
Evidential regression analysis for dichotomous and quantitative outcome data. The following references describe the methods in this package: Strug, L. J., Hodge, S. E., Chiang, T., Pal, D. K., Corey, P. N., & Rohde, C. (2010) <doi:10.1038/ejhg.2010.47>. Strug, L. J., & Hodge, S. E. (2006) <doi:10.1159/000094709>. Royall, R. (1997) <ISBN:0-412-04411-0>.
evidence Analysis of Scientific Evidence Using Bayesian and Likelihood Methods
Bayesian (and some likelihoodist) functions as alternatives to the hypothesis-testing functions in base R, with a user interface patterned after those of R’s hypothesis-testing functions. See McElreath (2016, ISBN: 978-1-4822-5344-3), Gelman and Hill (2007, ISBN: 0-521-68689-X) (new edition in preparation) and Albert (2009, ISBN: 978-0-387-71384-7) for good introductions to Bayesian analysis, and Pawitan (2002, ISBN: 0-19-850765-8) for the likelihood approach. The functions in the package also make extensive use of graphical displays for data exploration and model comparison.
evidenceFactors Reporting Tools for Sensitivity Analysis of Evidence Factors in Observational Studies
Integrated Sensitivity Analysis of Evidence Factors in Observational Studies.
EvolutionaryGames Important Concepts of Evolutionary Game Theory
A comprehensive set of tools to illustrate the core concepts of evolutionary game theory, such as evolutionary stability or various evolutionary dynamics, for teaching and academic research.
Evomorph Evolutionary Morphometric Simulation
Evolutionary process simulation using geometric morphometric data. Manipulation of landmark data files (TPS), shape plotting and distance plotting functions.
evoper Evolutionary Parameter Estimation for ‘Repast Simphony’ Models
The EvoPER (Evolutionary Parameter Estimation for the ‘Repast Simphony’ agent-based framework) provides optimization-driven parameter estimation methods based on evolutionary computation techniques, which can be more efficient and, in some cases, require fewer model evaluations than alternatives relying on experimental design.
EW Edgeworth Expansion
Edgeworth Expansion calculation.
exampletestr Help for Writing Tests Based on Function Examples
Take the examples written in your documentation of functions and use them to create shells (skeletons which must be manually completed by the user) of test files to be tested with the ‘testthat’ package. Documentation must be done with ‘roxygen2’.
ExcessMass Excess Mass Calculation and Plots
Implementation of a function which calculates the empirical excess mass for a given lambda and a given maximal number of modes (excessm()). Offers powerful plot features to visualize the empirical excess mass (exmplot()). This includes the possibility of drawing several plots (with different maximal numbers of modes / cut-off values) in a single graph.
exif Read EXIF Metadata from JPEGs
Extracts Exchangeable Image File Format (EXIF) metadata, such as camera make and model, ISO speed and the date-time the picture was taken, from JPEG images. Incorporates the ‘easyexif’ (<https://…/easyexif>) library.
exifr EXIF Image Data in R
Reads EXIF data using ExifTool <http://…/> and returns results as a data frame. ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files. ExifTool supports many different metadata formats including EXIF, GPS, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, as well as the maker notes of many digital cameras by Canon, Casio, FLIR, FujiFilm, GE, HP, JVC/Victor, Kodak, Leaf, Minolta/Konica-Minolta, Motorola, Nikon, Nintendo, Olympus/Epson, Panasonic/Leica, Pentax/Asahi, Phase One, Reconyx, Ricoh, Samsung, Sanyo, Sigma/Foveon and Sony.
exiftoolr ExifTool Functionality from R
Reads, writes, and edits EXIF and other file metadata using ExifTool <http://…/>, returning read results as a data frame. ExifTool supports many different metadata formats including EXIF, GPS, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, as well as the maker notes of many digital cameras by Canon, Casio, DJI, FLIR, FujiFilm, GE, GoPro, HP, JVC/Victor, Kodak, Leaf, Minolta/Konica-Minolta, Motorola, Nikon, Nintendo, Olympus/Epson, Panasonic/Leica, Pentax/Asahi, Phase One, Reconyx, Ricoh, Samsung, Sanyo, Sigma/Foveon and Sony.
ExPanDaR Explore Panel Data Interactively
Provides a shiny-based front end (the ‘ExPanD’ app) and a set of functions for exploratory panel data analysis. Run as a web-based app, ‘ExPanD’ enables users to assess the robustness of empirical evidence without providing them access to the underlying data. You can also use the functions of the package to support your exploratory data analysis workflow. Refer to the vignettes of the package for more information on how to use ‘ExPanD’ and/or the functions of this package.
expandFunctions Feature Matrix Builder
Generates feature matrix outputs from R object inputs using a variety of expansion functions. The generated feature matrices have applications as inputs for a variety of machine learning algorithms. The expansion functions are based on coercing the input to a matrix, treating the columns as features and converting individual columns or combinations into blocks of columns. Currently these include expansion of columns by efficient sparse embedding by vectors of lags, quadratic expansion into squares and unique products, powers by vectors of degree, vectors of orthogonal polynomial functions, and block random affine projection transformations (RAPTs). The transformations are magrittr- and cbind-friendly, and can be used in a building block fashion. For instance, taking the cos() of the output of the RAPT transformation generates a stationary kernel expansion via Bochner’s theorem, and this expansion can then be cbind-ed with other features. Additionally, there are utilities for replacing features, removing rows with NAs, creating matrix samples of a given distribution, a simple wrapper for LASSO with CV, a Freeman-Tukey transform, generalizations of the outer function, matrix size-preserving discrete difference by row, plotting, etc.
expands Expanding Ploidy and Allele-Frequency on Nested Subpopulations
Expanding Ploidy and Allele Frequency on Nested Subpopulations (expands) characterizes coexisting subpopulations in a single tumor sample using copy number and allele frequencies derived from exome- or whole genome sequencing input data (<http://…/24177718> ). The model detects coexisting genotypes by leveraging run-specific tradeoffs between depth of coverage and breadth of coverage. This package predicts the number of clonal expansions, the size of the resulting subpopulations in the tumor bulk, the mutations specific to each subpopulation, tumor purity and phylogeny. The main function runExPANdS() provides the complete functionality needed to predict coexisting subpopulations from single nucleotide variations (SNVs) and associated copy numbers. The robustness of subpopulation predictions increases with the number of mutations provided. It is recommended that at least 200 mutations are used as input to obtain stable results. Updates in version 2.1 include: (i) new parameter ploidy in runExPANdS.R allows specification of non-diploid background ploidies (e.g. for near-triploid cell lines); (ii) parallel computing option is available. Further documentation and FAQ available at <http://…/expands>.
ExpDE Modular Differential Evolution for Experimenting with Operators
Modular implementation of the Differential Evolution algorithm for experimenting with different types of operators.
expint Exponential Integral and Incomplete Gamma Function
The exponential integrals E_1(x), E_2(x), E_n(x) and Ei(x), and the incomplete gamma function G(a, x) defined for negative values of its first argument. The package also gives easy access to the underlying C routines through an API; see the package vignette for details. A test package included in sub-directory example_API provides an implementation. C routines derived from the GNU Scientific Library <https://…/>.
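A short sketch of the exported functions; the function names follow the description, with exact signatures assumed:
    library(expint)
    expint_E1(1)       # exponential integral E_1(1), about 0.2194
    expint_Ei(1)       # Ei(1), about 1.8951
    gammainc(-0.5, 1)  # incomplete gamma G(a, x) with a negative first argument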
ExplainPrediction Explanation of Predictions for Classification and Regression Models
Contains methods to generate explanations for individual predictions of classification and regression models. Weighted averages of individual explanations form an explanation of the whole model. The package extends the ‘CORElearn’ package, but other prediction models can also be explained using a wrapper.
explor Interactive Interfaces for Results Exploration
Shiny interfaces and graphical functions for multivariate analysis results exploration.
explore Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, or use an easy-to-remember set of tidy functions. Introduces two main verbs for data analysis: describe() to describe a variable or table, and explore() to graphically explore a variable or table.
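A minimal sketch of the two verbs named above; the exact call forms are assumptions:
    library(explore)
    describe(iris)               # tabular description of all variables
    explore(iris, Sepal.Length)  # graphical exploration of one variable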
exploreR Tools for Quickly Exploring Data
Simplifies some complicated and labor-intensive processes involved in exploring and explaining data. Allows you to quickly and efficiently visualize the interaction between variables, and simplifies the process of discovering covariation in your data. Also includes some convenience features designed to remove as much redundant typing as possible.
expm Matrix Exponential
Computation of the matrix exponential and related quantities.
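A minimal sketch, assuming the package's expm() function:
    library(expm)
    m <- matrix(c(0, 1, -1, 0), 2, 2)  # generator of a plane rotation
    expm(m)                            # exp(m): rotation by one radian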
ExPosition Exploratory Analysis with the Singular Value Decomposition
A variety of descriptive multivariate analyses with the singular value decomposition, such as principal components analysis, correspondence analysis, and multidimensional scaling. See An ExPosition of the Singular Value Decomposition in R (Beaton et al 2014) <doi:10.1016/j.csda.2013.11.006>.
expp Spatial Analysis of Extra-Pair Paternity
Tools and data to accompany Schlicht, L., Valcu, M., & Kempenaers, B. (2015) <doi:10.1111/1365-2656.12293>. Spatial patterns of extra-pair paternity: beyond paternity gains and losses. Journal of Animal Ecology, 84(2), 518-531.
expperm Computing Expectations and Marginal Likelihoods for Permutations
A set of functions for computing expected permutation matrices given a matrix of likelihoods for each individual assignment. It has been written to accompany the forthcoming paper ‘Computing expectations and marginal likelihoods for permutations’. Publication details will be updated as soon as they are finalized.
ExpRep Experiment Repetitions
Allows one to calculate the probabilities of occurrences of an event in a great number of repetitions of a Bernoulli experiment, through the application of the local and the integral theorem of De Moivre-Laplace, and the theorem of Poisson. Gives the possibility to show the results graphically and analytically, and to compare the results obtained by the application of the above theorems with those calculated by the direct application of the binomial formula. It is basically useful for educational purposes.
expSBM An Exponential Stochastic Block Model for Interaction Lengths
Given a continuous-time dynamic network, this package allows one to fit a stochastic blockmodel where nodes belonging to the same group create interactions and non-interactions of similar lengths. This package implements the methodology described by R. Rastelli and M. Fop (2019) <arXiv:1901.09828>.
expss Some Useful Functions from Spreadsheets and ‘SPSS’ Statistics
Implements several popular functions from Excel (‘COUNTIF’, ‘VLOOKUP’, etc.) and ‘SPSS’ Statistics (‘RECODE’, ‘COUNT’, etc.). There are also functions for basic tables with support for value labels and variable labels. The package aims to help users move their data processing from Excel/’SPSS’ to R.
expstudies Calculate Exposures, Assign Records to Intervals
Creates an exposure table with rows for policy-intervals from a table with a unique policy number key and beginning and ending dates for each policy. Methods are included for assigning supplemental data containing dates and policy numbers to the corresponding interval of the created exposure table.
exreport Fast, Reliable and Elegant Reproducible Research
Analysis of experimental results and automatic report generation in both interactive HTML and LaTeX. This package ships with a rich interface for data modeling and built-in functions for the rapid application of statistical tests and generation of common plots and tables with publication-ready quality.
EXRQ Extreme Regression of Quantiles
Estimation for high conditional quantiles based on quantile regression.
ExtDist Extending the Range of Functions for Probability Distributions
A consistent, unified and extensible framework for estimation of parameters for probability distributions, including parameter estimation procedures that allow for weighted samples; the current set of distributions included are: the standard beta, the four-parameter beta, Burr, gamma, Gumbel, Johnson SB and SU, Laplace, logistic, normal, symmetric truncated normal, truncated normal, symmetric-reflected truncated beta, standard symmetric-reflected truncated beta, triangular, uniform, and Weibull distributions; decision criteria and selections based on these decision criteria.
exteriorMatch Constructs the Exterior Match from Two Matched Control Groups
If one treated group is matched to one control reservoir in two different ways to produce two sets of treated-control matched pairs, then the two control groups may be entwined, in the sense that some control individuals are in both control groups. The exterior match is used to compare the two control groups.
extracat Categorical Data Analysis and Visualization
Categorical Data Analysis and Visualization.
ExtremeBounds Extreme Bounds Analysis in R
An implementation of Extreme Bounds Analysis (EBA), a global sensitivity analysis that examines the robustness of determinants in regression models. The package supports both Leamer’s and Sala-i-Martin’s versions of EBA, and allows users to customize all aspects of the analysis.
extremefit Estimation of Extreme Conditional Quantiles and Probabilities
Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.
extremeStat Extreme Value Statistics and Quantile Estimation
Code to fit, plot and compare several (extreme value) distribution functions. Can also compute (truncated) distribution quantile estimates and draw a plot with return periods on a linear scale.
extremogram Estimation of Extreme Value Dependence for Time Series Data
Estimation of the sample univariate, cross and return time extremograms. The package can also add empirical confidence bands to each of the extremogram plots via a permutation procedure, under the assumption that the data are independent. Finally, the stationary bootstrap allows one to construct credible confidence bands for the extremograms.
exuber Econometric Analysis of Explosive Time Series
Testing for and dating periods of explosive dynamics (exuberance) in time series using recursive unit root tests as proposed by Phillips, P. C., Shi, S. and Yu, J. (2015a) <doi:10.1111/iere.12132>. Simulate a variety of periodically-collapsing bubble models. The estimation and simulation utilize the matrix inversion lemma from the recursive least squares algorithm, which results in a significant speed improvement.
ezknitr Avoid the Typical Working Directory Pain When Using ‘knitr’
An extension of ‘knitr’ that adds flexibility in several ways. One common source of frustration with ‘knitr’ is that it assumes the directory where the source file lives should be the working directory, which is often not true. ‘ezknitr’ addresses this problem by giving you complete control over where all the inputs and outputs are, and adds several other convenient features to make rendering markdown/HTML documents easier.
ezpickr Import Various Data File Types as a Rectangular Form Using a File Picker Dialogue Box
Easy data importing for the most frequently used file formats in a ‘tibble’ form.
ezsummary Summarise Data in the Quick and Easy Way
Functions that fill the gap between the outputs of ‘dplyr’ and a print-ready summary table.

F

fabCI FAB Confidence Intervals
Frequentist assisted by Bayes (FAB) confidence interval construction. See ‘Adaptive multigroup confidence intervals with constant coverage’ by Yu and Hoff <https://…/1612.08287>.
fabricatr Imagine Your Data Before You Collect It
Helps you imagine your data before you collect it. Hierarchical data structures and correlated data can be easily simulated, either from random number generators or by resampling from existing data sources. This package is faster with ‘data.table’ and ‘mvnfast’ installed.
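A minimal sketch of simulating a simple data set, assuming the package's fabricate() function:
    library(fabricatr)
    dat <- fabricate(
      N = 100,              # number of rows
      x = rnorm(N),         # a covariate
      y = 2 * x + rnorm(N)  # an outcome correlated with x
    )
    head(dat)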
face Fast Covariance Estimation for Sparse Functional Data
Fast covariance estimation for sparse functional data.
facebook.S4 Access to Facebook API V2 via a Set of S4 Classes
Provides an interface to the Facebook API and builds collections of elements that reflects the graph architecture of Facebook. See <https://…/graph-api> for more information.
facerec An Interface for Face Recognition
Provides an interface to the ‘Kairos’ Face Recognition API <https://…/face-recognition-api>. The API detects faces in images and returns estimates for demographics like gender, ethnicity and age.
factoextra Extract and Visualize the Results of Multivariate Data Analyses
Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including ‘PCA’ (Principal Component Analysis), ‘CA’ (Correspondence Analysis), ‘MCA’ (Multiple Correspondence Analysis), ‘MFA’ (Multiple Factor Analysis) and ‘HMFA’ (Hierarchical Multiple Factor Analysis) functions from different R packages. It also contains functions for simplifying some clustering analysis steps, and provides elegant ‘ggplot2’-based data visualization.
FactoInvestigate Automatic Description of Factorial Analysis
Brings a set of tools to help automatically describe the results of principal component analyses (from ‘FactoMineR’ functions). Detection of existing outliers, identification of the informative components, graphical views and dimension descriptions are performed through dedicated functions. The Investigate() function performs all of these steps in one go, and returns the result as a report document (Word, PDF or HTML).
FactoMineR Multivariate Exploratory Data Analysis and Data Mining
Exploratory data analysis methods such as principal component methods and clustering.
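A minimal PCA sketch using the package's PCA() function; the argument names are assumptions:
    library(FactoMineR)
    res <- PCA(iris[, 1:4], graph = FALSE)  # PCA on the numeric columns
    res$eig                                 # eigenvalues and explained variance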
factorcpt Simultaneous Change-Point and Factor Analysis
Identifies change-points in the common and the idiosyncratic components via factor modelling.
FactoRizationMachines Machine Learning with Higher-Order Factorization Machines
Implementation of three machine learning approaches: Support Vector Machines (SVM) with a linear kernel, second-order Factorization Machines (FM), and higher-order Factorization Machines (HoFM).
factorMerger Hierarchical Algorithm for Post-Hoc Testing
A set of tools to support results of post-hoc testing and enable extraction of the hierarchical structure of factors. Work on this package was financially supported by the ‘NCN Opus grant 2016/21/B/ST6/02176’.
factorstochvol Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models
Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix.
Factoshiny Perform Factorial Analysis from FactoMineR with a Shiny Application
Perform factorial analysis with a menu and draw graphs interactively thanks to FactoMineR and a Shiny application.
faisalconjoint Faisal Conjoint Model: A New Approach to Conjoint Analysis
It is used for systematic analysis of decisions based on attributes and their levels.
fakemake Mock the Unix Make Utility
Use R as a minimal build system. This might come in handy if you are developing R packages and can not use a proper build system. Stay away if you can (use a proper build system).
fakeR Simulates Data from a Data Frame of Different Variable Types
Generates fake data from a dataset of different variable types. The package contains the functions simulate_dataset and simulate_dataset_ts to simulate time-independent and time-dependent data. It randomly samples character and factor variables from contingency tables and numeric and ordered factors from a multivariate normal distribution. It currently supports the simulation of stationary and zero-inflated count time series.
famSKATRC Family Sequence Kernel Association Test for Rare and Common Variants
FamSKAT-RC is a family-based association kernel test for both rare and common variants. This test is general, and several special cases are known as other methods: famSKAT, which only focuses on rare variants in family-based data; SKAT, which focuses on rare variants in population-based data (unrelated individuals); and SKAT-RC, which focuses on both rare and common variants in population-based data. When one applies famSKAT-RC and sets the value of phi to 1, famSKAT-RC becomes famSKAT. When one applies famSKAT-RC and sets the value of phi to 1 and the kinship matrix to the identity matrix, famSKAT-RC becomes SKAT. When one applies famSKAT-RC and sets the kinship matrix (fullkins) to the identity matrix (and phi is not equal to 1), famSKAT-RC becomes SKAT-RC. We also include a small synthetic pedigree to demonstrate the method. For more details see Saad M and Wijsman EM (2014) <doi:10.1002/gepi.21844>.
fancycut A Fancy Version of ‘base::cut’
Provides the function fancycut(), which is like cut() except that you can mix left-open and right-open intervals with point values, intervals that are closed on both ends and intervals that are open on both ends.
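A hedged sketch of mixing point values with half-open intervals; the name = 'interval' calling convention and bucket names are assumptions:
    library(fancycut)
    x <- c(-0.5, 0, 0.5, 1, 1.5, 2)
    # Bucket names and interval specifications below are illustrative assumptions
    fancycut(x, zero = '[0,0]', low = '(0,1)', one = '[1,1]', high = '(1,2]')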
fanovaGraph Building Kriging Models from FANOVA Graphs
Estimation and plotting of a function’s FANOVA graph to identify its interaction structure, and fitting, prediction and simulation of a Kriging model modified by the identified structure. The interactive function plotManipulate() can only be run in the RStudio IDE with RStudio’s package ‘manipulate’ loaded. RStudio is freely available (www.rstudio.org) and includes the package ‘manipulate’. The equivalent function plotTk() relies on CRAN packages only.
fanplot Visualisation of Sequential Probability Distributions Using Fan Charts
Visualise sequential distributions using a range of plotting styles. Sequential distribution data can be input as either simulations or values corresponding to percentiles over time. Plots are added to existing graphic devices using the fan function. Users can choose from four different styles, including fan chart type plots, where a set of coloured polygons, with shadings corresponding to the percentile values, is layered to represent different uncertainty levels.
fansi ANSI Control Sequence Aware String Functions
Counterparts to R string manipulation functions that account for the effects of ANSI text formatting control sequences.
farff A Faster ‘ARFF’ File Reader and Writer
Reads and writes ‘ARFF’ files. ‘ARFF’ (Attribute-Relation File Format) files are like ‘CSV’ files, with a little bit of added meta information in a header and standardized NA values. They are quite often used for machine learning data sets and were introduced for the ‘WEKA’ machine learning ‘Java’ toolbox. See <http://…/ARFF> for further info on ‘ARFF’ and <http://…/> for more info on ‘WEKA’. ‘farff’ gets rid of the ‘Java’ dependency that ‘RWeka’ enforces, and it is at least a faster reader (for bigger files). It uses ‘readr’ as the parser back-end for the data section of the ‘ARFF’ file. Consistency with ‘RWeka’ is tested on ‘Github’ and ‘Travis CI’ with hundreds of ‘ARFF’ files from ‘OpenML’. Note that the ‘OpenML’ package is currently only available from ‘Github’ at: <https://…/openml-r>.
fArma Rmetrics – Modelling ARMA Time Series Processes
Modelling ARMA Time Series Processes.
FarmSelect Factor Adjusted Robust Model Selection
Implements a consistent model selection strategy for high dimensional sparse regression when the covariate dependence can be reduced through factor models. By separating the latent factors from idiosyncratic components, the problem is transformed from model selection with highly correlated covariates to that with weakly correlated variables. It is appropriate for cases where we have many variables compared to the number of samples. Moreover, it implements a robust procedure to estimate distribution parameters wherever possible, hence being suitable for cases when the underlying distribution deviates from Gaussianity. See the paper on the ‘FarmSelect’ method, Fan et al.(2017) <arXiv:1612.08490>, for detailed description of methods and further references.
FarmTest Factor Adjusted Robust Multiple Testing
Performs robust multiple testing for means in the presence of known and unknown latent factors. It implements a robust procedure to estimate distribution parameters using Huber’s loss function and accounts for strong dependence among coordinates via an approximate factor model. This method is particularly suitable for high-dimensional data when there are many variables but only a small number of observations available. Moreover, the method is tailored to cases when the underlying distribution deviates from Gaussianity, which is commonly assumed in the literature. Besides the results of hypothesis testing, the estimated underlying factors and diagnostic plots are also output. Multiple comparison correction is done after estimating the proportion of true null hypotheses using the method in Storey (2015) <https://…/qvalue>. See the paper on the ‘FarmTest’ method, Zhou et al.(2017) <https://goo.gl/68SJpd>, for a detailed description of methods and further references.
FASeg Joint Segmentation of Correlated Time Series
Contains a function designed for the joint segmentation in the mean of several correlated series. The method is described in the paper: X. Collilieux, E. Lebarbier and S. Robin. A factor model approach for the joint segmentation with between-series correlation (2015) <arXiv:1505.05660>.
fasjem A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models
FASJEM (A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models) is a joint estimator that is fast and scalable for learning multiple related sparse Gaussian graphical models. For more details, please see <https://…/2017_JEM_combined.pdf>.
fasta Fast Adaptive Shrinkage/Thresholding Algorithm
A collection of acceleration schemes for proximal gradient methods for estimating penalized regression parameters, described in Goldstein, Studer, and Baraniuk (2016) <arXiv:1411.3406>. Schemes such as the Fast Iterative Shrinkage and Thresholding Algorithm (FISTA) by Beck and Teboulle (2009) <doi:10.1137/080716542> and the adaptive stepsize rule introduced in Wright, Nowak, and Figueiredo (2009) <doi:10.1109/TSP.2009.2016892> are included. You provide the objective function and proximal mappings, and it takes care of issues like stepsize selection, acceleration, and stopping conditions for you.
fastAdaboost A Fast Implementation of Adaboost
Implements Adaboost based on C++ backend code. This is blazingly fast and especially useful for large, in-memory data sets. The package uses decision trees as weak classifiers. Once the classifiers have been trained, they can be used to predict new data. Currently, only binary classification tasks are supported. The package implements the Adaboost.M1 algorithm and the Real Adaboost (SAMME.R) algorithm.
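A minimal binary classification sketch, assuming an adaboost() function with a formula interface and an nIter argument:
    # Hedged sketch: function and argument names are assumptions.
    library(fastAdaboost)
    df <- iris[iris$Species != "setosa", ]
    df$Species <- factor(df$Species)             # reduce to a two-class problem
    fit <- adaboost(Species ~ ., data = df, nIter = 10)
    pred <- predict(fit, newdata = df)
    pred$error                                   # training error rate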
FastBandChol Fast Estimation of a Covariance Matrix by Banding the Cholesky Factor
Fast and numerically stable estimation of a covariance matrix by banding the Cholesky factor using a modified Gram-Schmidt algorithm implemented in RcppArmadillo. See <http://…/~molst029> for details on the algorithm.
fastcmh Significant Interval Discovery with Categorical Covariates
A method which uses the Cochran-Mantel-Haenszel test with significant pattern mining to detect intervals in binary genotype data which are significantly associated with a particular phenotype, while accounting for categorical covariates.
fastcox Lasso and Elastic-Net Penalized Cox’s Regression in High-Dimensional Models using the Cocktail Algorithm
We implement the cocktail algorithm, a good mixture of coordinate descent, the majorization-minimization principle and the strong rule, for computing the solution paths of the elastic net penalized Cox’s proportional hazards model. The package is an implementation of Yang, Y. and Zou, H. (2013) <doi:10.4310/SII.2013.v6.n2.a1>.
fastdigest Fast, Low Memory-Footprint Digests of R Objects
Provides an R interface to Bob Jenkins’ streaming, non-cryptographic ‘SpookyHash’ hash algorithm for use in digest-based comparisons of R objects. ‘fastdigest’ plugs directly into R’s internal serialization machinery, allowing digests of all R objects the serialize() function supports, including reference-style objects via custom hooks. Speed is high and scales linearly by object size; memory usage is constant and negligible.
fastDummies Fast Creation of Dummy (Binary) Columns from Categorical Variables
Creates dummy columns from columns that have categorical variables (character or factor types). This package provides a significant speed increase over creating dummy variables through model.matrix().
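A minimal sketch, assuming the package's dummy_cols() function:
    library(fastDummies)
    df <- data.frame(id = 1:4, colour = c("red", "blue", "red", "green"))
    dummy_cols(df, select_columns = "colour")  # adds one 0/1 column per level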
fasteraster Raster Images Processing and Vector Recognition
If there is a need to recognise edges on a raster image, a bitmap or any kind of matrix, one can find packages that do only 90-degree vectorization. Typically the nature of artefact images is linear, and they can be vectorized in a much more efficient way than by drawing a series of 90-degree lines. The fasteraster package does recognition of lines using only one pass.
fasterElasticNet An Amazing Fast Way to Fit Elastic Net
Fit Elastic Net, Lasso, and Ridge regression and do cross-validation in a fast way. We build the algorithm based on Least Angle Regression by Bradley Efron, Trevor Hastie, Iain Johnstone et al. (2004) (<doi:10.1214/009053604000000067>) and algorithms like Givens rotation and forward/back substitution. In this way, many of the matrices to be computed are retained as triangular matrices, which eventually speeds up the computation. The fitting algorithm for Elastic Net is written in C++ using the Armadillo linear algebra library.
fasterize Fast Polygon to Raster Conversion
Provides a drop-in replacement for rasterize() from the ‘raster’ package that takes ‘sf’-type objects, and is much faster. There is support for the main options provided by the rasterize() function, including setting the field used and background value, and options for aggregating multi-layer rasters. Uses the scan line algorithm attributed to Wylie et al. (1967) <doi:10.1145/1465611.1465619>.
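A minimal sketch of rasterizing a single sf polygon; the raster() template call on an sf object follows the package's documented pattern, with details treated as assumptions:
    library(sf)
    library(raster)
    library(fasterize)
    ring <- cbind(c(0, 10, 10, 0, 0), c(0, 0, 10, 10, 0))          # closed ring
    p <- st_sf(value = 1, geometry = st_sfc(st_polygon(list(ring))))
    r <- raster(p, res = 1)                  # template raster over the sf extent
    out <- fasterize(p, r, field = "value")  # burn polygon values into the raster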
fastGraph Fast Drawing and Shading of Graphs of Statistical Distributions
Provides functionality to produce graphs of probability density functions and cumulative distribution functions with few keystrokes, allows shading under the curve of the probability density function to illustrate concepts such as p-values and critical values, and fits a simple linear regression line on a scatter plot with the equation as the main title.
fastHorseshoe The Elliptical Slice Sampler for Bayesian Horseshoe Regression
The elliptical slice sampler for Bayesian shrinkage linear regression, with priors such as the horseshoe, double-exponential and user-specified priors.
FastKM A Fast Multiple-Kernel Method Based on a Low-Rank Approximation
A computationally efficient and statistically rigorous fast Kernel Machine method for multi-kernel analysis. The approach is based on a low-rank approximation to the nuisance effect kernel matrices. The algorithm is applicable to continuous, binary, and survival traits and is implemented using the existing single-kernel analysis software ‘SKAT’ and ‘coxKM’. ‘coxKM’ can be obtained from http://…/software.html.
FastKNN Fast k-Nearest Neighbors
Compute labels for a test set according to k-Nearest Neighbors classification. This is a fast way to do k-Nearest Neighbors classification because the distance matrix (between the features of the observations) is an input to the function, rather than being calculated in the function itself every time.
fastLink Fast Probabilistic Record Linkage with Missing Data
Implements a Fellegi-Sunter probabilistic record linkage model that allows for missing data and the inclusion of auxiliary information. This includes functionalities to conduct a merge of two datasets under the Fellegi-Sunter model using the Expectation-Maximization algorithm. In addition, tools for preparing, adjusting, and summarizing data merges are included. The package implements methods described in Enamorado, Fifield, and Imai (2017) ‘Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records’, available at <http://…/linkage.html>.
fastLSU Fast Linear Step Up Procedure of Benjamini-Hochberg FDR Method for Huge-Scale Testing Problems
An efficient algorithm to apply the Benjamini-Hochberg linear step-up FDR controlling procedure in huge-scale testing problems (proposed in Vered Madar and Sandra Batista (2016) <DOI:10.1093/bioinformatics/btw029>). Unlike the ‘BH’ method, the package does not require any p-value ordering. In addition, it permits separating p-values arbitrarily into computationally feasible chunks of arbitrary size, and produces the same results as applying the linear step-up BH procedure to the entire set of tests.
fastmaRching Fast Marching Method for Modelling Evolving Boundaries
Fast Marching Method (FMM) first developed by Sethian (1996) <http://…/1591.short>, and further extended by including a second-order approximation, the first-arrival rule, additive weights, and non-homogeneous domains following Silva and Steele (2012) <doi:10.1142/S0219525911003293> and Silva and Steele (2014) <doi:10.1016/j.jas.2014.04.021>.
fastNaiveBayes Extremely Fast Implementation of a Naive Bayes Classifier
This is an extremely fast implementation of a Naive Bayes classifier. The package currently supports a Bernoulli distribution, a Multinomial distribution, and a Gaussian distribution, making it suitable for binary features, frequency counts, and numerical features. Only numerical variables are allowed; however, categorical variables can be transformed into dummies and used with the Bernoulli distribution. The implementation is based on the paper ‘A comparison of event models for Naive Bayes anti-spam e-mail filtering’ written by K.M. Schneider (2003) <doi:10.3115/1067807>. This implementation offers a huge performance gain compared to the ‘e1071’ implementation in R: on a data set of tweets, it was found to be 331 times faster. See the vignette for more details. This performance gain is only realized using a Bernoulli event model. Furthermore, the Multinomial event model is equally fast, and is available in contrast to ‘e1071’.
fastnet Large-Scale Social Network Analysis
We present an implementation of the algorithms required to simulate large-scale social networks and retrieve their most relevant metrics.
fastpseudo Fast Pseudo Observations
Computes pseudo-observations for survival analysis on right-censored data based on restricted mean survival time.
fastqcr Quality Control of Sequencing Data
‘FastQC’ is the most widely used tool for evaluating the quality of high throughput sequencing data. It produces, for each sample, an HTML report and a compressed file containing the raw data. If you have hundreds of samples, you are not going to open up each ‘HTML’ page. You need some way of looking at these data in aggregate. ‘fastqcr’ provides helper functions to easily parse, aggregate and analyze ‘FastQC’ reports for large numbers of samples. It provides a convenient solution for building a ‘Multi-QC’ report, as well as a ‘one-sample’ report with result interpretations.
fastrtext ‘fastText’ Wrapper for Text Classification and Word Representation
‘fastText’ is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It transforms text into continuous vectors that can later be used on many language-related tasks. It works on standard, generic hardware (no ‘GPU’ required). It also includes a model size reduction feature. The original ‘fastText’ source code is available at <https://…/fastText>.
fastTextR An Interface to the ‘fastText’ Library
An interface to the ‘fastText’ library <https://…/fastText>. The package can be used for text classification and to learn word vectors. The install folder contains the ‘PATENTS’ file. An example how to use ‘fastTextR’ can be found in the ‘README’ file.
fasttime Fast Utility Function for Time Parsing and Conversion
Fast functions for timestamp manipulation that avoid system calls and take shortcuts to facilitate operations on very large data.
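A one-line sketch, assuming fastPOSIXct() as the exported parser:
    library(fasttime)
    fastPOSIXct("2015-01-27 12:34:56")  # fast timestamp parsing (GMT assumed)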
fauxpas HTTP Error Helpers
HTTP error helpers. Methods included for general purpose HTTP error handling, as well as individual methods for every HTTP status code, both via status code numbers and via their descriptive names. Supports the ability to adjust behavior to stop, message or warn. Includes the ability to use a custom whisker template to have any configuration of status code, short description, and verbose message. Currently supports integration with ‘crul’, ‘curl’, and ‘httr’.
fbRads Analyzing and Managing Facebook Ads from R
Wrapper functions around the Facebook Marketing ‘API’ to create, read, update and delete custom audiences, images, campaigns, ad sets, ads and related content.
fbroc Fast Algorithms to Bootstrap ROC Curves
Implements a very fast C++ algorithm to quickly bootstrap ROC Curves and derived performance metrics (e.g. AUC). You can also plot the results and calculate confidence intervals. Currently the calculation of 100000 bootstrap replicates for 500 observations takes about one second.
fc Standard Evaluation-Based Multivariate Function Composition
Provides a streamlined, standard evaluation-based approach to multivariate function composition. Allows for chaining commands via a forward-pipe operator, %>%.
fChange Change Point Analysis in Functional Data
Change point estimation and detection methods for functional data are implemented using dimension reduction via functional principal component analysis and a fully-functional (norm-based) method. Detecting and dating structural breaks for both dependent and independent functional samples is illustrated along with some basic functional data generating processes.
fcm Inference of Fuzzy Cognitive Maps (FCMs)
Provides a selection of 6 different inference rules and 4 threshold functions in order to obtain the inference of the FCM (Fuzzy Cognitive Map). Moreover, the ‘fcm’ package returns a data frame of the concepts’ values of each state after the inference procedure. Fuzzy cognitive maps were introduced by Kosko (1986) <doi:10.1002/int.4550010405> providing ideal causal cognition tools for modeling and simulating dynamic systems.
FCMapper Fuzzy Cognitive Mapping
Provides several functions to create and manipulate fuzzy cognitive maps. It is based on FCMapper for Excel, distributed at <http://…/joomla>, developed by Michael Bachhofer and Martin Wildenberg. Maps are input as adjacency matrices. Attributes of the maps and the equilibrium values of the concepts (including with user-defined constrained values) can be calculated. The maps can be graphed with a function that calls ‘igraph’. Multiple maps with shared concepts can be aggregated.
FCNN4R Fast Compressed Neural Networks for R
The FCNN4R package provides an interface to kernel routines from the FCNN C++ library. FCNN is based on a completely new Artificial Neural Network representation that offers unmatched efficiency, modularity, and extensibility. FCNN4R provides standard teaching (backpropagation, Rprop) and pruning algorithms (minimum magnitude, Optimal Brain Surgeon), but it is first and foremost an efficient computational engine. Users can easily implement their algorithms by taking advantage of fast gradient computing routines, as well as network reconstruction functionality (removing weights and redundant neurons).
fcr Functional Concurrent Regression for Sparse Data
Dynamic prediction in functional concurrent regression with an application to child growth. Extends the pffr() function from the ‘refund’ package to handle the scenario where the functional response and concurrently measured functional predictor are irregularly measured. Leroux et al. (2017), Statistics in Medicine, <doi:10.1002/sim.7582>.
fdadensity Functional Data Analysis for Density Functions by Transformation to a Hilbert Space
An implementation of the methodology described in Petersen and Mueller (2016) <doi:10.1214/15-AOS1363> for the functional data analysis of samples of density functions. Densities are first transformed to their corresponding log quantile densities, followed by ordinary Functional Principal Components Analysis (FPCA). Transformation modes of variation yield improved interpretation of the variability in the data as compared to FPCA on the densities themselves. The standard fraction of variance explained (FVE) criterion commonly used for functional data is adapted to the transformation setting, also allowing for an alternative quantification of variability for density data through the Wasserstein metric of optimal transport.
fdANOVA Analysis of Variance for Univariate and Multivariate Functional Data
Performs analysis of variance testing procedures for univariate and multivariate functional data (Cuesta-Albertos and Febrero-Bande (2010) <doi:10.1007/s11749-010-0185-3>, Gorecki and Smaga (2015) <doi:10.1007/s00180-015-0555-0>, Gorecki and Smaga (2017) <doi:10.1080/02664763.2016.1247791>, Zhang et al. (2016) <arXiv:1309.7376v1>).
fdapace Functional Data Analysis and Empirical Dynamics
Provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm or numerical integration. PACE is useful for the analysis of data that have been generated by a sample of underlying (but usually not fully observed) random trajectories. It does not rely on pre-smoothing of trajectories, which is problematic if functional data are sparsely sampled. PACE provides options for functional regression and correlation, for Longitudinal Data Analysis, the analysis of stochastic processes from samples of realized trajectories, and for the analysis of underlying dynamics. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ ‘glue’.
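A minimal FPCA sketch on sparsely sampled trajectories, assuming FPCA(Ly, Lt) with lists of observed values and observation times:
    library(fdapace)
    set.seed(1)
    n  <- 50
    Lt <- lapply(seq_len(n), function(i) sort(runif(5, 0, 10)))       # sparse times
    Ly <- lapply(Lt, function(t) sin(t) + rnorm(length(t), sd = 0.1)) # noisy curves
    res <- FPCA(Ly, Lt)  # PACE-based functional principal components
    plot(res)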
fdaPDE Regression with Partial Differential Regularizations, using the Finite Element Method
An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization.
FDboost Boosting Functional Regression Models
Regression models for functional data, i.e. scalar-on-function, function-on-scalar and function-on-function regression models are fitted using a component-wise gradient boosting algorithm.
fdcov Analysis of Covariance Operators
Provides a variety of tools for the analysis of covariance operators.
fDMA Dynamic Model Averaging and Dynamic Model Selection for Continuous Outcomes
Allows one to estimate Dynamic Model Averaging, Dynamic Model Selection and the Median Probability Model. The original methods (see References) are implemented, as well as selected further modifications of these methods. In particular, the user may choose between recursive moment estimation and an exponentially moving average for variance updating. Inclusion probabilities may be modified using Google Trends. The code is written in a way that minimises the computational burden (which is quite an obstacle for Dynamic Model Averaging if many variables are used). For example, this package allows for parallel computations on Windows machines. Additionally, the user may reduce the set of models according to a certain algorithm. The package is designed in a way that is hoped to be especially useful in economics and finance. (Research funded by the Polish National Science Centre grant under contract number DEC-2015/19/N/HS4/00205.)
FDRsampsize Compute Sample Size that Meets Requirements for Average Power and FDR
Defines a collection of functions to compute average power and sample size for studies that use the false discovery rate as the final measure of statistical significance.
FDRSeg FDR-Control in Multiscale Change-Point Segmentation
Estimate step functions via multiscale inference with controlled false discovery rate (FDR). For details see H. Li, A. Munk and H. Sieling (2016) <doi:10.1214/16-EJS1131>.
FeaLect Scores Features for Feature Selection
For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data, and for each random subset, various linear models are fitted using the lars method. A score is assigned to each feature based on the tendency of LASSO to include that feature in the models. Finally, the average score and the models are returned as the output. The features with relatively low scores are recommended to be ignored because they can lead to overfitting of the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned. These are useful for applying Bolasso, an alternative feature selection method that recommends the intersection of feature subsets.
FeatureHashing Implement Feature Hashing on Model Matrix
Feature hashing, also called the hashing trick, is a method to transform features into a vector. Instead of looking the indices up in an associative array, it applies a hash function to the features and uses their hash values as indices directly. This package implements the feature hashing method proposed in Weinberger et al. (2009) with MurmurHash3 and provides a formula interface in R. See the README.md for more information.
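A minimal sketch of the formula interface (the hashed.model.matrix() call reflects the package's documented entry point; the toy data frame is hypothetical):

    library(FeatureHashing)
    # Toy data: a high-cardinality categorical predictor
    df <- data.frame(user = c("alice", "bob", "carol", "alice"),
                     clicked = c(1, 0, 1, 1))
    # Hash the predictors into a sparse model matrix with 2^8 columns;
    # hash values serve directly as column indices, no lookup table needed
    m <- hashed.model.matrix(~ user, data = df, hash.size = 2^8)
    dim(m)  # one row per observation, 256 hashed columns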
featurizer Some Helper Functions that Help Create Features from Data
A collection of functions that help one build features from external data. Very useful for data scientists in day-to-day work. Many functions create features using parallel computation. Since the nitty-gritty of parallel computation is hidden under the hood, the user need not worry about creating clusters and shutting them down.
FedData Functions to Automate Downloading Geospatial Data Available from Several Federated Data Sources
Functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package allows for retrieval of four datasets: The National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS); The National Hydrography Dataset (USGS); The Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; and the Global Historical Climatology Network (GHCN), coordinated by the National Climatic Data Center at NOAA. Additional data sources are in the works, including global DEM resources (ETOPO1, ETOPO5, ETOPO30, SRTM), global soils (HWSD), tree-ring records (ITRDB), MODIS satellite data products, the National Atlas (US), Natural Earth, PRISM, and WorldClim.
feedeR Read RSS/Atom Feeds from R
Retrieve data from RSS/Atom feeds.
feisr Estimating Fixed Effects Individual Slope Models
Provides the function feis() to estimate fixed effects individual slope (FEIS) models. The FEIS model constitutes a more general version of the often-used fixed effects (FE) panel model, as implemented in the package ‘plm’ by Croissant and Millo (2008) <doi:10.18637/jss.v027.i02>. In FEIS models, data are not only person ‘demeaned’ as in conventional FE models, but ‘detrended’ by the predicted individual slope of each person or group. Estimation is performed by applying least squares lm() to the transformed data. For more details on FEIS models see Bruederl and Ludwig (2015, ISBN:1446252442); Frees (2001) <doi:10.2307/3316008>; Polachek and Kim (1994) <doi:10.1016/0304-4076(94)90075-2>; Wooldridge (2010, ISBN:0262294354). To test the consistency of conventional FE and random effects estimators against heterogeneous slopes, the package also provides the functions feistest() for an artificial regression test and bsfeistest() for a bootstrapped version of the Hausman test.
fence Using Fence Methods for Model Selection
Fence methods are a new class of model selection strategies for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. References: 1. Jiang J., Rao J.S., Gu Z., Nguyen T. (2008), Fence Methods for Mixed Model Selection. The Annals of Statistics, 36(4): 1669-1692. <DOI:10.1214/07-AOS517> <https://…/1216237296>. 2. Jiang J., Nguyen T., Rao J.S. (2009), A Simplified Adaptive Fence Procedure. Statistics and Probability Letters, 79, 625-629. <DOI:10.1016/j.spl.2008.10.014> <https://…A_simplified_adaptive_fence_procedure> 3. Jiang J., Nguyen T., Rao J.S. (2010), Fence Method for Nonparametric Small Area Estimation. Survey Methodology, 36(1), 3-11. <http://…/12-001-x2010001-eng.pdf>. 4. Jiming Jiang, Thuan Nguyen and J. Sunil Rao (2011), Invisible fence methods and the identification of differentially expressed gene sets. Statistics and Its Interface, Volume 4, 403-415. <http://…/SII-2011-0004-0003-a014.pdf>. 5. Thuan Nguyen & Jiming Jiang (2012), Restricted fence method for covariate selection in longitudinal data analysis. Biostatistics, 13(2), 303-314. <DOI:10.1093/biostatistics/kxr046> <https://…ce-method-for-covariate-selection-in>. 6. Thuan Nguyen, Jie Peng, Jiming Jiang (2014), Fence Methods for Backcross Experiments. Statistical Computation and Simulation, 84(3), 644-662. <DOI:10.1080/00949655.2012.721885> <https://…/>. 7. Jiang, J. (2014), The fence methods, in Advances in Statistics, Hindawi Publishing Corp., Cairo. <DOI:10.1155/2014/830821>. 8. Jiming Jiang and Thuan Nguyen (2015), The Fence Methods, World Scientific, Singapore. <https://…/plp>.
FENmlm Fixed Effects Nonlinear Maximum Likelihood Models
Efficient estimation of fixed-effect maximum likelihood models with possibly non-linear right-hand sides.
fergm Estimation and Fit Assessment of Frailty Exponential Random Graph Models
Frailty Exponential Random Graph Models estimated through pseudo likelihood with frailty terms estimated using ‘Stan’, as per Box-Steffensmeier et al. (2017) <doi:10.7910/DVN/K3D1M2>. Goodness of fit for Frailty Exponential Random Graph Models is also available, with easy visualizations for comparison to fitted Exponential Random Graph Models.
ffstream Forgetting Factor Methods for Change Detection in Streaming Data
An implementation of the adaptive forgetting factor scheme described in Bodenham and Adams (2016) <doi:10.1007/s11222-016-9684-8> which adaptively estimates the mean and variance of a stream in order to detect multiple changepoints in streaming data. The implementation is in C++ and uses Rcpp. Additionally, implementations of the fixed forgetting factor scheme from the same paper, as well as the classic CUSUM and EWMA methods, are included.
FFTrees Generate, Visualise, and Compare Fast and Frugal Decision Trees (FFTs)
Fast and Frugal Trees (FFTs) are very simple decision trees for classifying cases (e.g., breast cancer patients) into one of two classes (e.g., cancer vs. no cancer). FFTs can be preferable to more complex algorithms (such as logistic regression) because they are easy to communicate and implement, and are robust against noisy data. This package contains several functions that allow users to input their own data, set model criteria and visualize the best tree(s) for their data.
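A short sketch of the intended workflow, assuming the package's main FFTrees() function and hypothetical train/test data frames with a logical 'diagnosis' column:

    library(FFTrees)
    # Fit fast-and-frugal trees predicting diagnosis from all other columns
    fit <- FFTrees(formula = diagnosis ~ .,
                   data = train_df,       # hypothetical training data
                   data.test = test_df)   # hypothetical test data
    plot(fit)      # visualize the best tree with its accuracy statistics
    summary(fit)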
fgeo Analyze Forest Diversity and Dynamics
To help you access, transform, analyze, and visualize ForestGEO data, we developed a collection of R packages (<https://…/> ). This package, in particular, helps you install and load the entire package collection with a single R command, and provides convenient ways to find relevant documentation. In most cases, you need not worry about the individual packages that make up the collection, as you can access all features via this package. To learn more about ForestGEO visit <http://…/>.
fgeo.plot Plot ForestGEO Data
To help you access, transform, analyze, and visualize ForestGEO data, we developed a collection of R packages (<https://…/> ). This package, in particular, helps you to plot ForestGEO data. To learn more about ForestGEO visit <http://…/>.
Fgmutils Forest Growth Model Utilities
Growth models and forest production studies require the manipulation of existing data and the creation of new data, structured from basic forest inventory data. The purpose of this package is to provide functions to support these activities.
FHDI Fractional Hot Deck and Fully Efficient Fractional Imputation
Imputes general multivariate missing data with fractional hot deck imputation.
fheatmap Draw Heatmaps with Colored Dendrogram
Plots high-quality, elegant heatmaps using ‘ggplot2’ graphics. Important features include coloring of the row/column side tree with respect to a user-defined number of cuts in the cluster, annotations for both columns and rows, the option to supply an annotation palette for the tree and column annotations, and multiple parameters to modify the aesthetics (style, color, font) of text in the plot.
FIAR Functional Integration Analysis in R
Contains Dynamic Causal Models (DCM), Autoregressive Structural Equation Models (ARSEM), and multivariate partial and conditional Granger causality tests for analysing fMRI connectivity data.
fic Focused Information Criteria for Model Comparison
Compares how well different models estimate a quantity of interest (the ‘focus’) so that different models may be preferred for different purposes. Comparisons within any class of models fitted by maximum likelihood are supported, with shortcuts for commonly-used classes such as generalised linear models and parametric survival models. The methods originate from Claeskens and Hjort (2003) <doi:10.1198/016214503000000819> and Claeskens and Hjort (2008, ISBN:9780521852258).
fieldRS Remote Sensing Field Work Tools
In remote sensing, designing a field campaign to collect ground-truth data can be a challenging task. We need to collect representative samples while accounting for issues such as budget constraints and limited accessibility created by e.g. poor infrastructure. As suggested by Olofsson et al. (2014) <doi:10.1016/j.rse.2014.02.015>, this demands the establishment of best-practices to collect ground-truth data that avoid the waste of time and funds. ‘fieldRS’ addresses this issue by helping scientists and practitioners design field campaigns through the identification of priority sampling sites, the extraction of potential sampling plots and the conversion of plots into consistent training and validation samples that can be used in e.g. land cover classification.
fiery A Lightweight and Flexible Web Framework
A very flexible framework for building server-side logic in R. The framework is unopinionated when it comes to how HTTP requests and WebSocket messages are handled, and supports all levels of app complexity, from serving static content to full-blown dynamic web apps. Fiery does not hold your hand as much as e.g. the shiny package does, but instead sets you free to create your web app the way you want.
filelock Portable File Locking
Place an exclusive or shared lock on a file. It uses ‘LockFile’ on Windows and ‘fcntl’ locks on Unix-like systems.
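A minimal sketch of the locking pattern (lock()/unlock() per the package's interface; the lock-file path is arbitrary):

    library(filelock)
    # Try to acquire an exclusive lock, waiting up to 5 seconds (NULL on failure)
    lck <- lock("/tmp/my-analysis.lock", timeout = 5000)
    if (!is.null(lck)) {
      # ... work that must not run in two processes at once ...
      unlock(lck)
    }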
filematrix File-Backed Matrix Class with Convenient Read and Write Access
Interface for working with large matrices stored in files, not in computer memory. Supports multiple data types (double, integer, logical and raw) of different sizes (e.g. 4, 2, or 1 byte integers). Access to parts of the matrix is done by indexing, exactly as with usual R matrices. Supports very large matrices (tested on a 1 terabyte matrix), allowing for more than 2^32 rows or columns. Cross-platform as the package has R code only, no C/C++.
filenamer Easy Management of File Names
Create descriptive file names with ease. New file names are automatically (but optionally) time stamped and placed in date stamped directories. Streamline your analysis pipeline with input and output file names that have informative tags and proper file extensions.
fileplyr Chunk Processing or Split-Apply-Combine on Delimited Files (CSV etc.)
Perform chunk processing or split-apply-combine on data in a delimited file (e.g., CSV) across multiple cores of a single machine with a low memory footprint. These functions are a convenient wrapper over the versatile package ‘datadr’.
filesstrings Handy String and File Manipulation
Handy string and file processing and manipulation tools. Built on top of the functionality of base and ‘stringr’. Good for those who like to do all of their file and string manipulation from within R.
FILEST Fine-Level Structure Simulator
A population genetic simulator, which is able to generate synthetic datasets of single-nucleotide polymorphisms (SNP) for multiple populations. The genetic distances among populations can be set according to the Fixation Index (Fst) as explained in Balding and Nichols (1995) <doi:10.1007/BF01441146>. This tool is able to simulate outlying individuals, and missing SNPs can be specified. For genome-wide association studies (GWAS), disease status can be set at a desired level according to risk ratio.
filling Matrix Completion, Imputation, and Inpainting Methods
Filling in the missing entries of partially observed data is one of the fundamental problems in various disciplines of mathematical science. In many cases, the data of interest have the canonical form of a matrix, so the problem is posed as filling in the entries of a matrix with missing values under preset assumptions and models. We provide a collection of methods from multiple disciplines under Matrix Completion, Imputation, and Inpainting. See Davenport and Romberg (2016) <doi:10.1109/JSTSP.2016.2539100> for an overview of the topic.
finalfit Quickly Create Elegant Regression Results Tables and Plots when Modelling
Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and ‘Word’ using ‘RMarkdown’.
FinAna Financial Analysis and Regression Diagnostic Analysis
Functions for regression analysis and financial modeling, including batch graph generation, beta calculation, and descriptive statistics.
findR Find R Scripts, R Markdown, PDF and Text Files by Content with Pattern Matching
Scans all directories and subdirectories of a path for R scripts, R Markdown, PDF or text files containing a specific pattern. Hits can be copied to a new folder.
findviews A View Generator for Multidimensional Data
A tool to explore wide data sets, by detecting, ranking and plotting groups of statistically dependent columns.
finreportr Financial Data from U.S. Securities and Exchange Commission
Download and display company financial data from the U.S. Securities and Exchange Commission’s EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See <https://…/companysearch.html> for more information.
FinTS Companion to Tsay (2005) Analysis of Financial Time Series
R companion to Tsay (2005) Analysis of Financial Time Series, second edition (Wiley). Includes data sets, functions and script files required to work some of the examples. Version 0.3-x includes R objects for all data files used in the text and script files to recreate most of the analyses in chapters 1-3 and 9 plus parts of chapters 4 and 11.
FiRE Finder of Rare Entities (FiRE)
The algorithm assigns a rareness/outlierness score to every sample in voluminous datasets. The algorithm makes multiple estimations of the proximity between a pair of samples, in low-dimensional spaces. To compute proximity, FiRE uses Sketching, a variant of locality sensitive hashing. For more details: Jindal, A., Gupta, P., Jayadeva and Sengupta, D., 2018. Discovery of rare cells from voluminous single cell expression data. Nature Communications, 9(1), p.4719. <doi:10.1038/s41467-018-07234-6>.
FisherEM The FisherEM Algorithm to Simultaneously Cluster and Visualize High-Dimensional Data
FisherEM is an efficient algorithm for the unsupervised classification of high-dimensional data. FisherEM models and clusters the data in a discriminative and low-dimensional latent subspace. It also provides a low-dimensional representation of the clustered data. A sparse version of the Fisher-EM algorithm is also provided.
fitODBOD Modeling Over Dispersed Binomial Outcome Data Using BMD and ABD
Contains probability mass functions, cumulative mass functions, negative log likelihood value, parameter estimation and modeling data using Binomial Mixture distributions (BMD) (Manoj et al (2013) <doi:10.5539/ijsp.v2n2p24>) and Alternate Binomial distributions (ABD).
fitteR Fit Hundreds of Theoretical Distributions to Empirical Data
Systematic fit of hundreds of theoretical univariate distributions to empirical data via maximum likelihood estimation. Fits are reported and summarized in a data.frame, a csv file or a ‘shiny’ app (the latter with additional features like visual representation of fits). All output formats provide assessment of goodness-of-fit by the following methods: Kolmogorov-Smirnov test, Shapiro-Wilk test, Anderson-Darling test.
fitur Fit Univariate Distributions
Wrapper for computing distribution parameters and then assigning them to distribution function families.
FixedPoint Algorithms for Finding Fixed Point Vectors of Functions
For functions that take and return vectors (or scalars), this package provides 8 algorithms for finding fixed point vectors (vectors for which the inputs and outputs to the function are the same vector). These algorithms include Anderson (1965) acceleration <doi:10.1145/321296.321305>, epsilon extrapolation methods (Wynn 1962 <doi:10.2307/2004051>) and minimal polynomial methods (Cabay and Jackson 1976 <doi:10.1137/0713060>).
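As an illustrative sketch (argument and result names follow the package vignette and should be treated as indicative), the fixed point of cos(x), roughly 0.739, might be found as:

    library(FixedPoint)
    # Iterate x -> cos(x), accelerated with Anderson (1965) acceleration
    out <- FixedPoint(Function = function(x) cos(x),
                      Inputs = 1, Method = "Anderson")
    out$FixedPoint  # approximately 0.739085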
fixedTimeEvents The Distribution of Distances Between Discrete Events in Fixed Time
Distribution functions and test for over-representation of short distances in the Liland distribution. Simulation functions are included for comparison.
FixSeqMTP Fixed Sequence Multiple Testing Procedures
Generalized Fixed Sequence Multiple Testing Procedures (g-FSMTPs) are used to test a sequence of pre-ordered hypotheses. The three proposed Family-wise Error Rate (FWER) controlling g-FSMTPs utilize the numbers of rejections and acceptances; all of these procedures are designed under arbitrary dependence. The two proposed False Discovery Rate (FDR) controlling g-FSMTPs allow up to a given number of acceptances (k >= 1); these procedures are designed for arbitrary dependence and independence. The main functions for each proposed g-FSMTP calculate adjusted p-values and critical values, respectively. For users’ convenience, the output also includes decision rules.
flacco Feature-Based Landscape Analysis of Continuous and Constraint Optimization Problems
Contains tools and features, which can be used for an exploratory landscape analysis of continuous optimization problems. Those are able to quantify rather complex properties, such as the global structure, separability, etc., of continuous optimization problems.
flagr Implementation of Flag Aggregation
Three methods are implemented in R to facilitate the aggregation of flags in official statistics. From the underlying flags, the highest in the hierarchy, the most frequent, or the one with the highest total weight is propagated to the flag(s) for EU or other aggregates. Below are some reference documents on the topic: <https://…/CL_OBS_STATUS_v2_1.docx>, <https://…/CL_CONF_STATUS_1_2_2018.docx>, <http://…/information>, <http://…/33869551.pdf>, <https://…STATUS_implementation_20-10-2014.pdf>.
flare Family of Lasso Regression
The package ‘flare’ provides implementations of a family of Lasso variants, including the Dantzig Selector, LAD Lasso, SQRT Lasso and Lq Lasso, for estimating high-dimensional sparse linear models. We adopt the alternating direction method of multipliers and convert the original optimization problem into a sequential L1-penalized least squares minimization problem, which can be efficiently solved by a linearization algorithm. A multi-stage screening approach is adopted for further acceleration. Besides sparse linear model estimation, we also provide extensions of these Lasso variants to sparse Gaussian graphical model estimation, including TIGER and CLIME, using either an L1 or adaptive penalty. Missing values can be tolerated for the Dantzig selector and CLIME. The computation is memory-optimized using sparse matrix output.
flars Functional LARS
Variable selection algorithm for functional linear regression with scalar response variable and mixed scalar/functional predictors.
flatr Transforms Contingency Tables to Data Frames, and Analyses Them
Contingency Tables are a pain to work with when you want to run regressions. This package takes them, flattens them into a long data frame, so you can more easily analyse them! As well, you can calculate other related statistics. All of this is done so in a ‘tidy’ manner, so it should tie in nicely with ‘tidyverse’ series of packages.
flatxml Tools for Working with XML Files as R Dataframes
On import, the XML information is converted to a dataframe that reflects the hierarchical XML structure. Intuitive functions allow you to navigate within this transparent XML data structure (without any knowledge of ‘XPath’). ‘flatXML’ also provides tools to extract data from the XML into a flat dataframe that can be used to perform statistical operations.
FlexDir Tools to Work with the Flexible Dirichlet Distribution
Provides tools to work with the Flexible Dirichlet distribution. The main features are an E-M algorithm for computing the maximum likelihood estimate of the parameter vector and a function based on conditional bootstrap to estimate its asymptotic variance-covariance matrix. It contains also functions to plot graphs, to generate random observations and to handle compositional data.
FlexGAM Generalized Additive Models with Flexible Response Functions
Standard generalized additive models assume a response function, which induces an assumption on the shape of the distribution of the response. However, mis-specifying the response function results in biased estimates. Therefore, in Spiegel et al. (2017) <doi:10.1007/s11222-017-9799-6> we propose to estimate the response function jointly with the covariate effects. This package provides the underlying functions to estimate these generalized additive models with flexible response functions. The estimation is based on an iterative algorithm. In the outer loop the response function is estimated, while in the inner loop the covariate effects are determined. For the response function a strictly monotone P-spline is used, while the covariate effects are estimated based on a modified Fisher-Scoring algorithm. Overall, the estimation relies on the ‘mgcv’ package.
flexmet Flexible Latent Trait Metrics using the Filtered Monotonic Polynomial Item Response Model
Application of the filtered monotonic polynomial (FMP) item response model to flexibly fit item response models. The package includes tools that allow the item response model to be built on any monotonic transformation of the latent trait metric, as described by Feuerstahler (2016) <http://…/182267>.
FlexParamCurve Tools to Fit Flexible Parametric Curves
Model selection tools and ‘selfStart’ functions to fit parametric curves in ‘nls’, ‘nlsList’ and ‘nlme’ frameworks.
flexPM Flexible Parametric Models for Censored and Truncated Data
Estimation of flexible parametric models for survival data.
flexrsurv Flexible Relative Survival
Perform relative survival analyses using approaches described in Remontet et al. (2007) <DOI:10.1002/sim.2656> and Mahboubi et al. (2011) <DOI:10.1002/sim.4208>. It implements non-linear and non-proportional effects using splines (B-spline and truncated power bases).
flexsurvcure Flexible Parametric Cure Models
Flexible parametric mixture and non-mixture cure models for time-to-event data.
flextable Tabular Reporting API
Create pretty tables for ‘Microsoft Word’, ‘Microsoft PowerPoint’ and ‘HTML’ documents. Functions are provided to let users create tables, modify and format their content. It extends package ‘officer’ that does not contain any feature for customized tabular reporting. Function ‘tabwid’ produces an ‘htmlwidget’ ready to be used in ‘Shiny’ or ‘R Markdown (*.Rmd)’ documents. See the ‘flextable’ website for more information.
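A minimal sketch using the core constructor:

    library(flextable)
    ft <- flextable(head(mtcars[, 1:4]))  # build a table from a data frame
    ft <- autofit(ft)                     # fit column widths to contents
    ft   # renders as HTML in the viewer or an R Markdown document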
flifo Don’t Get Stuck with Stacks in R
Functions to create and manipulate FIFO (First In First Out), LIFO (Last In First Out), and NINO (Not In or Never Out) stacks in R.
FLIM Farewell’s Linear Increments Model
FLIM fits linear models for the observed increments in a longitudinal dataset, and imputes missing values according to the models.
flipscores Robust Testing in GLMs
Provides two robust tests for GLMs, based on sign-flipping score contributions. The tests are often robust against overdispersion, heteroscedasticity and, in some cases, ignored nuisance variables. See Hemerik and Goeman (2017) <doi:10.1007/s11749-017-0571-1>.
float 32-Bit Floats
R comes with a suite of utilities for linear algebra with ‘numeric’ (double precision) vectors/matrices. However, sometimes single precision (or less!) is more than enough for a particular task. This package extends R’s linear algebra facilities to include 32-bit float (single precision) data. Float vectors/matrices have half the precision of their ‘numeric’-type but are generally faster to numerically operate on, for a performance vs accuracy trade-off. The internal representation is an S4 class, which allows us to keep the syntax identical to that of base R’s. Interaction between floats and base types for binary operators is generally possible; in these cases, type promotion always defaults to the higher precision. The package ships with copies of the single precision ‘BLAS’ and ‘LAPACK’, which are automatically built in the event they are not available on the system.
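A short sketch, assuming the fl()/dbl() converters between double and single precision:

    library(float)
    x <- matrix(rnorm(1e6), nrow = 1000)
    s <- fl(x)            # store as 32-bit floats: half the memory of 'numeric'
    cp <- crossprod(s)    # linear algebra dispatches to single-precision routines
    dbl(cp)               # promote back to double when full precision is needed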
flock Process Synchronization Using File Locks
Implements synchronization between R processes (spawned by using the ‘parallel’ package for instance) using file locks. Supports both exclusive and shared locking.
flowr Streamlining Design and Deployment of Complex Workflows
An interface to streamline design of complex workflows and their deployment to a High Performance Computing Cluster.
flows Flow Selection and Analysis
Selections on flow matrices, statistics on selected flows, map and graph visualisations.
FLSSS Mining Rigs for Specialized Subset Sum, Multi-Subset Sum, Multidimensional Subset Sum, Multidimensional Knapsack, Generalized Assignment Problems
Specialized solvers for combinatorial optimization problems in the Subset Sum family. These solvers differ from the mainstream in the options of (i) subset size restriction, (ii) bounds on the subset elements, (iii) mining real-value sets with predefined subset sum errors, and (iv) finding one or more subsets in limited time. A novel algorithm for mining the one-dimensional Subset Sum induces algorithms for the multi-Subset Sum and the multidimensional Subset Sum. The latter is creatively scheduled in a multi-threaded environment, and the framework offers strong applications to the multidimensional Knapsack and the Generalized Assignment problems. Package updates include (a) renewed implementation of the multi-Subset Sum, multidimensional Knapsack and Generalized Assignment solvers; (b) availability of bounding solution space in the multidimensional Subset Sum; (c) fundamental data structure and architectural changes for enhanced cache locality and better chance of SIMD vectorization; (d) an option of mapping real-domain problems to the integer domain with controlled precision loss, and those integers are further zipped non-uniformly in 64-bit buffers. Arithmetic on compressed integers has a novel design with virtually zero speed lag relative to that on normal integers, and the consequent reduction in dimensionality often leads to substantial acceleration. Compilation with aggressive optimization, e.g. g++ ‘-Ofast’, may speed up mining on some platforms. Package documentation (<arXiv:1612.04484v2>) is outdated as of the time of writing.
flyio Read or Write any Format from Anywhere
Perform input, output of files in R from data sources like Google Cloud Storage (‘GCS’) <https://…/>, Amazon Web Services (‘AWS S3’) <https://…/s3> or local drive.
fmbasics Financial Market Building Blocks
Implements basic financial market objects like currencies, currency pairs, interest rates and interest rate indices. You will be able to use benchmark instances of these objects which have been defined using their most common conventions or those defined in the legal documentation of the International Swaps and Derivatives Association (ISDA, <http://www2.isda.org> ).
FMC Factorial Experiments with Minimum Level Changes
Generate cost effective minimally changed run sequences for symmetrical as well as asymmetrical factorial designs.
fmlogcondens Fast Multivariate Log-Concave Density Estimation
A fast solver for the maximum likelihood estimator (MLE) of a multivariate log-concave probability function. Given a sample X, it estimates a non-parametric density function whose logarithm is a concave function. Many well-known parametric densities belong to that class, among them the normal density, the uniform density, the exponential distribution and many more. This package provides functions for the estimation of a log-concave density and a mixture of log-concave densities in multiple dimensions. While being similar to the package LogConcDEAD, fmlogcondens provides much faster run times for large samples (>= 250 points). As a reference see Fabian Rathke, Christoph Schnörr (2015), <doi:10.1515/auom-2015-0053>.
fmriqa Functional MRI Quality Assurance Routines
Methods for performing fMRI quality assurance (QA) measurements of test objects. Heavily based on the fBIRN procedures detailed by Friedman and Glover (2006) <doi:10.1002/jmri.20583>.
fmrs Variable Selection in Finite Mixture of AFT Regression and FMR
Provides parameter estimation as well as variable selection in Finite Mixture of Accelerated Failure Time Regression Models and Finite Mixture of Regression models. It also provides Ridge regression and the Elastic Net.
FMsmsnReg Regression Models with Finite Mixtures of Skew Heavy-Tailed Errors
Fits linear regression models where the random errors follow a finite mixture of skew heavy-tailed distributions.
foghorn Summarizes CRAN Check Results in the Terminal
The CRAN check results in your R terminal.
foieGras Fit Continuous-Time State-Space Models for Filtering Argos Satellite (and Other) Telemetry Data
Fits continuous-time random walk and correlated random walk state-space models to filter Argos satellite location data. Template Model Builder (‘TMB’) is used for fast estimation. The Argos data can be: (older) least squares-based locations; (newer) Kalman filter-based locations with error ellipse information; or a mixture of both. Separate measurement models are used for these two data types. The models estimate two sets of location states corresponding to: 1) each observation, which are (usually) irregularly timed; and 2) user-specified time intervals (regular or irregular). Jonsen I, McMahon CR, Patterson TA, Auger-Methe M, Harcourt R, Hindell MA, Bestley S (2019) Movement responses to environment: fast inference of variation among southern elephant seals with a mixed effects model. Ecology 100:e02566 <doi:10.1002/ecy.2566>.
fold A Self-Describing Dataset Format and Interface
Defines a compact data format that includes metadata. The function fold() creates the format by converting from data.frame, and unfold() converts back. The predictability of the folded format supports reusability of data processing tools, while the presence of embedded metadata improves portability, interpretability, and efficiency.
fontquiver Set of Installed Fonts
Provides a set of fonts with permissive licences. This is useful when you want to avoid system fonts to make sure your outputs are reproducible.
foolbox Function Manipulation Toolbox
Provides functionality for manipulating functions and translating them for metaprogramming purposes.
forcats Tools for Working with Categorical Variables (Factors)
Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, anonymising, and manually recoding).
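A few typical calls as a sketch:

    library(forcats)
    f <- factor(c("b", "b", "a", "c", "c", "c"))
    fct_infreq(f)                     # reorder levels by frequency: c, b, a
    fct_rev(f)                        # reverse the level order
    fct_other(f, keep = c("b", "c"))  # collapse remaining levels into "Other"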
foreach Foreach looping construct for R
Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn’t require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
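A minimal sketch of the idiom, emphasizing the return value:

    library(foreach)
    # Iterate without an explicit loop counter; combine results into a vector
    res <- foreach(i = 1:4, .combine = c) %do% sqrt(i)
    res  # 1.000000 1.414214 1.732051 2.000000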
ForecastComb Forecast Combination Methods
Provides geometric- and regression-based forecast combination methods under a unified user interface for the packages ‘ForecastCombinations’ and ‘GeomComb’. Additionally, updated tools and convenience functions for data pre-processing are available in order to deal with common problems in forecast combination (missingness, collinearity). For method details see Hsiao C, Wan SK (2014). <doi:10.1016/j.jeconom.2013.11.003>, Hansen BE (2007). <doi:10.1111/j.1468-0262.2007.00785.x>, Elliott G, Gargano A, Timmermann A (2013). <doi:10.1016/j.jeconom.2013.04.017>, and Clemen RT (1989). <doi:10.1016/0169-2070(89)90012-5>.
ForecastCombinations Forecast Combinations
Supports the most frequently used methods to combine forecasts, among others: simple average, Ordinary Least Squares, Least Absolute Deviation, Constrained Least Squares, variance-based, best individual model, complete subset regressions and information-theoretic (information criteria based) combination.
forecastHybrid Convenient Functions for Ensemble Time Series Forecasts
Convenient functions for ensemble forecasts in R combining approaches from the ‘forecast’ package. Forecasts generated from auto.arima(), ets(), nnetar(), stlm(), and tbats() can be combined with equal weights or weights based on in-sample errors. Future methods such as cross validation are planned.
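A sketch of typical usage, assuming the package's hybridModel() interface (the model-selection letters follow its documentation):

    library(forecastHybrid)
    # Equal-weight ensemble of auto.arima() ("a") and ets() ("e") fits
    fit <- hybridModel(AirPassengers, models = "ae")
    fc <- forecast(fit, h = 12)  # 12-month-ahead combined forecast
    plot(fc)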
forecastSNSTS Forecasting for Stationary and Non-Stationary Time Series
Methods to compute linear h-step prediction coefficients based on localised and iterated Yule-Walker estimates and empirical mean square prediction errors for the resulting predictors.
forecTheta Forecasting Time Series by Theta Method
Routines for forecasting univariate time series using the Theta Method and the Optimised Theta Method (Fioruci et al., 2015). Contains two cross-validation routines of Tashman (2000).
forega Floating-Point Genetic Algorithms with Statistical Forecast Based Inheritance Operator
The implemented algorithm performs a floating-point genetic algorithm search with a statistical forecasting operator that generates offspring which would probably be generated in future generations. Use of this operator enhances the search capabilities of floating-point genetic algorithms, because offspring that the usual genetic operators would only produce after more generations are rapidly forecast in advance.
forestControl Approximate False Positive Rate Control in Selection Frequency for Random Forest
Approximate false positive rate control in selection frequency for random forests using the methods described by Ender Konukoglu and Melanie Ganz (2015) <arXiv:1410.2838>. Includes methods for calculating the selection frequency threshold at given false positive rates, and for selection-frequency-based feature selection.
forestFloor Visualizes Random Forests with Feature Contributions
Enables the user to form appropriate visualizations of the high-dimensional mapping curvature of random forests.
ForestGapR Tropical Forest Gaps Analysis
Set of tools for detecting and analyzing Airborne Laser Scanning-derived Tropical Forest Canopy Gaps.
forestinventory Design-Based Global and Small-Area Estimations for Multiphase Forest Inventories
Extensive global and small-area estimation procedures for multiphase forest inventories under the design-based Monte-Carlo approach are provided. The implementation includes estimators for simple and cluster sampling published by Daniel Mandallaz in 2007 (<DOI:10.1201/9781584889779>), 2013 (<DOI:10.1139/cjfr-2012-0381>, <DOI:10.1139/cjfr-2013-0181>, <DOI:10.1139/cjfr-2013-0449>, <DOI:10.3929/ethz-a-009990020>) and 2016 (<DOI:10.3929/ethz-a-010579388>). It provides point estimates, their external- and design-based variances as well as confidence intervals. The procedures have also been optimized for the use of remote sensing data as auxiliary information.
forestmodel Forest Plots from Regression Models
Produces forest plots using ‘ggplot2’ from models produced by functions such as stats::lm(), stats::glm() and survival::coxph().
forestplot Advanced Forest Plot Using ‘grid’ Graphics
The plot allows for multiple confidence intervals per row, custom fonts for each text element, custom confidence intervals, text mixed with expressions, and more. The aim is to extend the use of forest plots beyond meta-analyses. This is a more general version of the original ‘rmeta’ package’s forestplot function and relies heavily on the ‘grid’ package.
forestSAS Forest Spatial Structure Analysis Systems
In recent years, there has been considerable interest in a group of neighborhood-based structural parameters that properly express the spatial structure characteristics of tree populations and forest communities and have strong operability for guiding forestry practices. The ‘forestSAS’ package provides more important information and allows us to better understand and analyze the fine-scale spatial structure of tree populations and stand structure.
ForestTools Analysing Remotely Sensed Forest Data
Forest Tools provides functions for analyzing remotely sensed forest data.
foretell Projecting Customer Retention Based on Fader and Hardie Probability Models
Projects customer retention based on Beta Geometric, Beta Discrete Weibull and Latent Class Discrete Weibull models. This package is based on Fader and Hardie (2007) <doi:10.1002/dir.20074> and Fader, Hardie et al. (2018) <doi:10.1016/j.intmar.2018.01.002>.
formattable Formattable Data Structures
Provides functions to create formattable vectors and data frames. Formattable vectors are printed with text formatting, and formattable data frames are printed with multiple types of formatting in markdown to improve the readability of data presented in tabular form rendered as web pages.
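A small sketch of the formattable vector idea:

    library(formattable)
    p <- percent(c(0.231, 0.025))      # prints as 23.10%, 2.50%, stays numeric
    p + 0.1                            # arithmetic still works: 33.10%, 12.50%
    formattable(data.frame(rate = p))  # a data frame that renders formatted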
formulize Add Formula Interfaces to Modelling Functions
Automatically generates wrappers for modelling functions that accept data as a data matrix X and a data vector y, producing a wrapper that allows users to specify input data with a formula and a data frame. In addition to generating formula interfaces, users may also generate wrapper S3 generics.
forward Forward Search
Forward search approach to robust analysis in linear and generalized linear regression models.
forwards Data from Surveys Conducted by Forwards
Anonymized data from surveys conducted by Forwards <http://…/>, the R Foundation task force on women and other under-represented groups. Currently, a single data set of responses to a survey of attendees at useR! 2016 <http://…/>, the R user conference held at Stanford University, Stanford, California, USA, June 27 – June 30 2016.
ForwardSearch Forward Search Using Asymptotic Theory
Forward Search analysis of time series regressions. Implements the asymptotic theory developed in Johansen and Nielsen (2013, 2014).
foto Fourier Transform Textural Ordination
The Fourier Transform Textural Ordination method uses a principal component analysis on radially averaged two-dimensional Fourier spectra to characterize image texture.
fourierin Computes Numeric Fourier Integrals
Computes Fourier integrals of functions of one and two variables using the Fast Fourier transform. The Fourier transforms must be evaluated on a regular grid.
fourPNO Bayesian 4 Parameter Item Response Model
Estimate Lord & Barton’s four parameter IRT model with lower and upper asymptotes using Bayesian formulation described by Culpepper (2015).
fpa Spatio-Temporal Fixation Pattern Analysis
Spatio-temporal Fixation Pattern Analysis (FPA) is a new method of analyzing eye movement data, developed by Mr. Jinlu Cao under the supervision of Prof. Chen Hsuan-Chih at The Chinese University of Hong Kong, and Prof. Wang Suiping at the South China Normal University. The package ‘fpa’ is an R implementation which makes FPA analysis much easier. There are four major functions in the package: ft2fp(), get_pattern(), plot_pattern(), and lineplot(). The function ft2fp() is the core function, which can complete all the preprocessing within moments. The other three functions are supporting functions which visualize the eye fixation patterns.
FPCA2D Two Dimensional Functional Principal Component Analysis
Computes the two-dimensional functional principal component scores for a series of two-dimensional images.
fpCompare Reliable Comparison of Floating Point Numbers
Comparisons of floating point numbers are problematic due to errors associated with the binary representation of decimal numbers. Despite being aware of these problems, people still use numerical methods that fail to account for these and other rounding errors (this pitfall is the first to be highlighted in Circle 1 of Burns (2012, http://…/R_inferno.pdf )). This package provides four new relational operators useful for performing floating point number comparisons with a set tolerance.
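For example (operator names per the package's documentation):

    library(fpCompare)
    0.1 + 0.2 == 0.3      # FALSE: the classic binary-representation pitfall
    (0.1 + 0.2) %==% 0.3  # TRUE: equal within the set tolerance
    0.3 %<=% (0.1 + 0.2)  # TRUE: comparisons likewise use the tolerance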
FPDclustering PD-Clustering and Factor PD-Clustering
Probabilistic distance clustering (PD-clustering) is an iterative, distribution free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant. PD-clustering is a flexible method that can be used with non-spherical clusters, outliers, or noisy data. Factor PD-clustering (FPDC) is a recently proposed factor clustering method that involves a linear transformation of variables and a cluster optimizing the PD-clustering criterion. It allows clustering of high dimensional data sets.
fpeek Check Text Files Content at a Glance
Tools to help with text file importation. It can return the number of lines; print the first and last lines; and convert encoding. Operations are made without reading the entire file before starting, resulting in good performance with large files. This package provides an alternative to a simple use of the ‘head’, ‘tail’, ‘wc’ and ‘iconv’ programs that are not always available on machines where R is installed.
fpest Estimating Finite Population Total
Given the values of sampled units and selection probabilities, the desraj function in the package computes the estimated value of the total as well as the estimated variance.
fpmoutliers Frequent Pattern Mining Outliers
Algorithms for detection of outliers based on frequent pattern mining. Such algorithms follow the paradigm: if an instance contains more frequent patterns, it means that this data instance is unlikely to be an anomaly (He Zengyou, Xu Xiaofei, Huang Zhexue Joshua, Deng Shengchun (2005) <doi:10.2298/CSIS0501103H>). The package implements a list of existing state of the art algorithms as well as other published approaches: FPI, WFPI, FPOF, FPCOF, LFPOF, MFPOF, WCFPOF and WFPOF.
fractional Vulgar Fractions in R
The main function of this package allows numerical vector objects to be displayed with their values in vulgar fractional form. This is convenient if patterns can then be more easily detected. In some cases replacing the components of a numeric vector by a rational approximation can also be expected to remove some component of round-off error. The main functions form a re-implementation of the functions ‘fractions’ and ‘rational’ of the MASS package, but using a radically improved programming strategy.
fragilityindex Fragility Index
Implements the fragility index calculation for dichotomous results as described in Walsh, Srinathan, McAuley, Mrkobrada, Levine, Ribic, Molnar, Dattani, Burke, Guyatt, Thabane, Walter, Pogue and Devereaux (2014) <DOI:10.1016/j.jclinepi.2013.10.019>.
frailtyEM Fitting Frailty Models with the EM Algorithm
Contains functions for fitting shared frailty models with a semi-parametric baseline hazard with the Expectation-Maximization algorithm. Supported data formats include clustered failures with left truncation and recurrent events in gap-time or Andersen-Gill format. Several frailty distributions, such as the gamma, positive stable and the Power Variance Family, are supported.
frailtyHL Frailty Models via Hierarchical Likelihood
Implements the h-likelihood estimation procedures for general frailty models including competing-risk models and joint models.
frailtySurv General Semiparametric Shared Frailty Model
Simulates and fits semiparametric shared frailty models under a wide range of frailty distributions using a consistent and asymptotically-normal estimator. Currently supports: gamma, power variance function, log-normal, and inverse Gaussian frailty models.
franc Detect the Language of Text
With no external dependencies and support for 335 languages (all languages spoken by more than one million speakers). ‘Franc’ is a port of the ‘JavaScript’ project of the same name, see <https://…/franc>.
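A one-line sketch, assuming the package's main franc() function returning ISO 639-3 codes:

    library(franc)
    franc("All human beings are born free and equal in dignity and rights")
    # expected: "eng"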
frbs Fuzzy Rule-Based Systems for Classification and Regression Tasks
An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows the construction of an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, squares methods, etc. Furthermore, in this version we provide a universal framework named ‘frbsPMML’, which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language to provide a standard for describing models produced by data mining and machine learning algorithms. Therefore, we are allowed to export and import an FRBS model to/from ‘frbsPMML’. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.
frbs: Fuzzy Rule-Based Systems for Classification and Regression in R
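A sketch of the learn/predict cycle, assuming the frbs.learn() interface from the package documentation (toy data; 'WM' denotes the Wang-Mendel rule-generation method):

    library(frbs)
    # Toy regression data: last column of the training matrix is the response
    x <- cbind(x1 = runif(50), x2 = runif(50))
    data.train <- cbind(x, y = x[, 1] + x[, 2])
    range.data <- apply(data.train, 2, range)  # min/max row per variable
    mod <- frbs.learn(data.train, range.data, method.type = "WM")
    predict(mod, x[1:5, ])  # predictions from the learned IF-THEN rule base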
freegroup The Free Group
Provides functionality for manipulating elements of the free group (juxtaposition is represented by a plus) including inversion, multiplication by a scalar, group-theoretic power operation, and Tietze forms. The package is fully vectorized.
freetypeharfbuzz Deterministic Computation of Text Box Metrics
Unlike other tools that dynamically link to the ‘Cairo’ stack, ‘freetypeharfbuzz’ is statically linked to specific versions of the ‘FreeType’ and ‘harfbuzz’ libraries (2.9 and 1.7.6 respectively). This ensures deterministic computation of text box extents for situations where reproducible results are crucial (for instance unit tests of graphics).
FRegSigCom Functional Regression using Signal Compression Approach
Signal compression methods for function-on-function (FOF) regression with functional response and functional predictors, including linear models with both scalar and functional predictors for a small number of functional predictors, linear models with functional predictors for a large number of functional predictors, and nonlinear models. Ruiyan Luo and Xin Qi (2017) <doi:10.1080/01621459.2016.1164053>.
freqdist Frequency Distribution
Generates a frequency distribution. The frequency distribution includes raw frequencies, percentages in each category, and cumulative frequencies. The frequency distribution can be stored as a data frame.
freqdom Frequency Domain Analysis for Multivariate Time Series
Methods for the analysis of multivariate time series using frequency domain techniques. Implementations of dynamic principal components analysis (DPCA) and estimators of operators in lagged regression. Examples of usage in a functional data analysis setup.
freqdom.fda Functional Time Series: Dynamic Functional Principal Components
Implementations of functional dynamic principal components analysis. Related graphical tools and frequency domain methods. These methods directly use the multivariate dynamic principal components implementation, following the guidelines from Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Components <doi:10.1111/rssb.12076>.
FreqProf Frequency Profiles Computing and Plotting
Tools for generating an informative type of line graph, the frequency profile, which allows single behaviors, multiple behaviors, or the specific behavioral patterns of individual subjects to be graphed from occurrence/nonoccurrence behavioral data.
frequencies Create Frequency Tables with Counts and Rates
Provides functions to create frequency tables which display both counts and rates.
frequencyConnectedness Spectral Decomposition of Connectedness Measures
Accompanies a paper (Barunik, Krehlik (2017) <doi:10.2139/ssrn.2627599>) dedicated to spectral decomposition of connectedness measures and their interpretation. We implement all the developed estimators as well as the historical counterparts. For more information, see the help or the GitHub page (<https://…/frequencyConnectedness> ).
frequentdirections Implementation of Frequent-Directions Algorithm for Efficient Matrix Sketching
Implements the frequent-directions algorithm for efficient matrix sketching (Edo Liberty (2013) <doi:10.1145/2487575.2487623>).
FRK Fixed Rank Kriging
Fixed Rank Kriging is a tool for spatial/spatio-temporal modelling and prediction with large datasets. The approach, discussed in Cressie and Johannesson (2008), decomposes the field, and hence the covariance function, using a fixed set of n basis functions, where n is typically much smaller than the number of data points (or polygons) m. The method naturally allows for non-stationary, anisotropic covariance functions and the use of observations with varying support (with known error variance). The projected field is a key building block of the Spatial Random Effects (SRE) model, on which this package is based. The package FRK provides helper functions to model, fit, and predict using an SRE with relative ease. Reference: Cressie, N. and Johannesson, G. (2008) <DOI:10.1111/j.1467-9868.2007.00633.x>.
fRLR Fit Repeated Linear Regressions
When fitting a set of linear regressions which share some common variables, we can separate the design matrix and reduce the computation cost. This package aims to fit a set of repeated linear regressions faster. More details can be found in this blog post by Lijun Wang (2017) <https://…/>.
fromo Fast Robust Moments
Fast computation of moments via ‘Rcpp’. Supports computation on vectors and matrices, and Monoidal append of moments.
fs Cross-Platform File System Operations Based on ‘libuv’
A cross-platform interface to file system operations, built on top of the ‘libuv’ C library.
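A short sketch of the path-handling style:

    library(fs)
    tmp <- path(tempdir(), "demo")
    dir_create(tmp)                             # no-op if it already exists
    file_create(path(tmp, c("a.csv", "b.csv"))) # vectorised file creation
    dir_ls(tmp, glob = "*.csv")                 # returns a typed path vector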
FSelectorRcpp ‘Rcpp’ Implementation of ‘FSelector’ Entropy-Based Feature Selection Algorithms with a Sparse Matrix Support
‘Rcpp’ (free of ‘Java’/‘Weka’) implementation of ‘FSelector’ entropy-based feature selection algorithms with sparse matrix support. It is also equipped with a parallel backend.
FSInteract Fast Searches for Interactions
Performs fast detection of interactions in large-scale data using the method of random intersection trees introduced in ‘Shah, R. D. and Meinshausen, N. (2014) Random Intersection Trees’. The algorithm finds potentially high-order interactions in high-dimensional binary two-class classification data, without requiring lower order interactions to be informative. The search is particularly fast when the matrices of predictors are sparse. It can also be used to perform market basket analysis when supplied with a single binary data matrix. Here it will find collections of columns which for many rows contain all 1’s.
fst Lightning Fast Serialization of Data Frames for R
Read and write data frames at high speed. Compress your data with fast and efficient type-optimized algorithms that allow for random access of stored data frames (columns and rows).
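A minimal sketch of the random-access idea:

    library(fst)
    df <- data.frame(x = rnorm(1e6), y = sample(letters, 1e6, replace = TRUE))
    write_fst(df, "df.fst", compress = 50)  # type-optimized compression
    # Read back only one column and a row range, without scanning the file
    part <- read_fst("df.fst", columns = "x", from = 1001, to = 2000)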
fsthet Fst-Heterozygosity Smoothed Quantiles
A program to generate smoothed quantiles for the Fst-heterozygosity distribution. Designed for use with large numbers of loci (e.g., genome-wide SNPs). The best case for analyzing the Fst-heterozygosity distribution is when many populations (>10) have been sampled. See Flanagan & Jones (2017) <doi:10.1093/jhered/esx048>.
FTRLProximal FTRL Proximal Implementation for Elastic Net Regression
Implementation of Follow The Regularized Leader (FTRL) Proximal algorithm used for online training of large scale regression models using a mixture of L1 and L2 regularization.
ftsspec Spectral Density Estimation and Comparison for Functional Time Series
Functions for estimating spectral density operator of functional time series (FTS) and comparing the spectral density operator of two functional time series, in a way that allows detection of differences of the spectral density operator in frequencies and along the curve length.
fugue Sensitivity Analysis Optimized for Matched Sets of Varied Sizes
As in music, a fugue statistic repeats a theme in small variations. Here, the psi-function that defines an m-statistic is slightly altered to maintain the same design sensitivity in matched sets of different sizes. The main functions in the package are sen() and senCI(). For sensitivity analyses for m-statistics, see Rosenbaum (2007) Biometrics 63 456-464 <doi:10.1111/j.1541-0420.2006.00717.x>.
fullfact Full Factorial Breeding Analysis
Package for the analysis of full factorial breeding designs.
fulltext Full Text of ‘Scholarly’ Articles Across Many Data Sources
Provides a single interface to many sources of full text ‘scholarly’ data, including ‘Biomed Central’, Public Library of Science, ‘Pubmed Central’, ‘eLife’, ‘F1000Research’, ‘PeerJ’, ‘Pensoft’, ‘Hindawi’, ‘arXiv’ ‘preprints’, and more. Functionality included for searching for articles, downloading full or partial text, converting to various data formats used in and outside of R.
funchir Convenience Functions by Michael Chirico
A set of functions, some subset of which I use in every .R file I write. Examples are table2(), which adds useful functionalities to base table (sorting, built-in proportion argument, etc.); lyx.xtable(), which converts xtable() output to a format more easily copy-pasted into LyX; pdf2(), which writes a plot to file while also displaying it in the RStudio plot window; and abbr_to_colClass(), which is a much more concise way of feeding many types to a colClass argument in a data reader.
functools Extending Functional Programming in R
Extending functional programming in R by providing support to the usual higher order functional suspects (Map, Reduce, Filter, etc.).
funcy Functional Clustering Algorithms
Unified framework to cluster functional data according to one of seven models. All models are based on the projection of the curves onto a basis. The main function funcit() calls wrapper functions for the existing algorithms, so that input parameters are the same. A list is returned with each entry representing the same or extended output for the corresponding method. Method specific as well as general visualization tools are available.
funData An S4 Class for Functional Data
S4 classes for univariate and multivariate functional data with utility functions.
funFEM Clustering in the Discriminative Functional Subspace
The funFEM algorithm (Bouveyron et al., 2014) allows clustering of functional data by modeling the curves within a common and discriminative functional subspace.
fungible Fungible Coefficients and Monte Carlo Functions
Functions for computing fungible coefficients and Monte Carlo data.
funHDDC Model-Based Clustering in Group-Specific Functional Subspaces
Provides the funHDDC algorithm (Bouveyron & Jacques, 2011), which allows clustering of functional data by modeling each group within a specific functional subspace.
funique A Faster Unique Function
Similar to base’s unique function, only optimized for working with data frames, especially those that contain date-time columns.
fUnitRoots Rmetrics – Modelling Trends and Unit Roots
Provides four addons for analyzing trends and unit roots in financial time series: (i) functions for the density and probability of the augmented Dickey-Fuller Test, (ii) functions for the density and probability of MacKinnon’s unit root test statistics, (iii) reimplementations for the ADF and MacKinnon Test, and (iv) an ‘urca’ Unit Root Test Interface for Pfaff’s unit root test suite.
funLBM Model-Based Co-Clustering of Functional Data
The funLBM algorithm allows the rows and the columns of a data matrix to be clustered simultaneously, where each entry of the matrix is a function or a time series.
funModeling Learning Data Cleaning, Visual Analysis and Model Performance
Tools for learning data cleaning, visual data analysis and model performance assessment (KS, AUC and ROC); the core of the package is its vignette documentation, which explains these topics as a tutorial.
funr Simple Utility Providing Terminal Access to all R Functions
A small utility which wraps Rscript and provides access to all R functions from the shell.
funrar Functional Rarity Indices Computation
Computes functional rarity indices as proposed by Violle et al (in revision). Various indices can be computed using both regional and local information. Functional rarity combines the functional aspect of rarity with the extent aspect of rarity.
FUNTA Functional Tangential Angle Pseudo-Depth
Computes the functional tangential angle pseudo-depth and its robustified version from the paper by Kuhnt and Rehage (2016). See Kuhnt, S.; Rehage, A. (2016): An angle-based multivariate functional pseudo-depth for shape outlier detection, JMVA 146, 325-340, <doi:10.1016/j.jmva.2015.10.016> for details.
funtimes Functions for Time Series Analysis
Includes non-parametric estimators and tests for time series analysis. The functions allow testing for the presence of possibly non-monotonic trends and for synchronism of trends in multiple time series, using modern bootstrap techniques and robust non-parametric difference-based estimators.
furrr Apply Mapping Functions in Parallel using Futures
Implementations of the family of map() functions from ‘purrr’ that can be resolved using any ‘future’-supported backend, e.g. parallel on the local machine or distributed on a compute cluster.
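For illustration, a minimal sketch of a parallel map (the backend choice is illustrative):

    library(future)
    library(furrr)
    plan(multisession)            # any future-supported backend
    future_map_dbl(1:4, ~ .x^2)   # parallel analogue of purrr::map_dbl()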
fuser Fused Lasso for High-Dimensional Regression over Groups
Enables high-dimensional penalized regression across heterogeneous subgroups. Fusion penalties are used to share information about the linear parameters across subgroups. The underlying model is described in detail in Dondelinger and Mukherjee (2017) <arXiv:1611.00953>.
fusionclust Clustering and Feature Screening using L1 Fusion Penalty
Provides the Big Merge Tracker and COSCI algorithms for convex clustering and feature screening using L1 fusion penalty. See Radchenko, P. and Mukherjee, G. (2017) <doi:10.1111/rssb.12226> and T.Banerjee et al. (2017) <doi:10.1016/j.jmva.2017.08.001> for more details.
futility Interim Analysis of Operational Futility in Randomized Trials with Time-to-Event Endpoints and Fixed Follow-Up
Randomized clinical trials commonly follow participants for a time-to-event efficacy endpoint for a fixed period of time. Consequently, at the time when the last enrolled participant completes their follow-up, the number of observed endpoints is a random variable. Assuming data collected through an interim timepoint, simulation-based estimation and inferential procedures in the standard right-censored failure time analysis framework are conducted for the distribution of the number of endpoints–in total as well as by treatment arm–at the end of the follow-up period. The future (i.e., yet unobserved) enrollment, endpoint, and dropout times are generated according to mechanisms specified in the simTrial() function in the ‘seqDesign’ package. A Bayesian model for the endpoint rate is used for generating future data (see the vignette for details).
future A Future API for R
A Future API for R is provided. In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available. Futures are useful constructs in for instance concurrent evaluation, e.g. multicore parallel processing and distributed processing on compute clusters. The purpose of this package is to provide a lightweight interface for using futures in R. Functions ‘future()’ and ‘value()’ exist for creating futures and requesting their values. An infix assignment operator ‘%<=%’ exists for creating futures whose values are accessible by the assigned variables (as promises). This package implements the synchronous ‘lazy’ and ‘eager’ futures, and the asynchronous ‘multicore’ future (not on Windows). Additional types of futures are provided by other packages enhancing this package.
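For illustration, a minimal sketch of the core API; note that current releases of the package spell the future-assignment operator ‘%<-%’ and select a backend via plan():

    library(future)
    plan(multisession)                  # choose an evaluation backend
    f <- future({ mean(rnorm(1e6)) })   # create a future
    value(f)                            # block until resolved, get value
    x %<-% { mean(rnorm(1e6)) }         # future assignment (as a promise)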
future.apply Apply Function to Elements in Parallel using Futures
Implementations of apply(), lapply(), sapply() and friends that can be resolved using any future-supported backend, e.g. parallel on the local machine or distributed on a compute cluster. These future_*apply() functions come with the same pros and cons as the corresponding base-R *apply() functions, but with the additional feature of being processed via the future framework.
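A minimal sketch (the backend choice is illustrative):

    library(future.apply)
    plan(multisession)                         # parallel on the local machine
    res <- future_sapply(1:8, function(i) sqrt(i))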
future.BatchJobs A Future for BatchJobs
Simple parallel and distributed processing using futures that utilize the ‘BatchJobs’ framework, e.g. ‘fit %<-% { glm.fit(x, y) }’. This package implements the Future API of the ‘future’ package.
future.batchtools A Future API for Parallel and Distributed Processing using ‘batchtools’
Implements the Future API on top of the ‘batchtools’ package. This allows you to process futures, as defined by the ‘future’ package, in parallel out of the box, not only on your local machine or ad-hoc cluster of machines, but also via high-performance compute (‘HPC’) job schedulers such as ‘LSF’, ‘OpenLava’, ‘Slurm’, ‘SGE’, and ‘TORQUE’ / ‘PBS’, e.g. ‘y <- future_lapply(files, FUN = process)’.
future.callr A Future API for Parallel Processing using ‘callr’
Implementation of the Future API on top of the ‘callr’ package. This allows you to process futures, as defined by the ‘future’ package, in parallel out of the box, on your local (Linux, macOS, Windows, …) machine. Contrary to backends relying on the ‘parallel’ package (e.g. ‘future::multisession’), the ‘callr’ backend provided here can run more than 125 parallel R processes.
fuzzr Fuzz-Test R Functions
Test function arguments with a wide array of inputs, and produce reports summarizing messages, warnings, errors, and returned values.
Fuzzy.p.value Computing Fuzzy p-Value
The main goal of this package is drawing the membership function of the fuzzy p-value, which is defined as a fuzzy set on the unit interval, for three problems: (1) testing crisp hypotheses based on fuzzy data, (2) testing fuzzy hypotheses based on crisp data, and (3) testing fuzzy hypotheses based on fuzzy data. In all cases, the fuzziness of the data and/or the fuzziness of the boundary of the null fuzzy hypothesis is transported via the p-value function and produces a fuzzy p-value. If the p-value is fuzzy, it is more appropriate to consider a fuzzy significance level for the problem. Therefore, the comparison of the fuzzy p-value and the fuzzy significance level is evaluated by a fuzzy ranking method in this package.
FuzzyAHP (Fuzzy) AHP Calculation
Calculation of AHP (Analytic Hierarchy Process – <http://…/Analytic_hierarchy_process> ) with classic and fuzzy weights based on Saaty’s pairwise comparison method for determination of weights.
fuzzyforest Fuzzy Forests
Fuzzy forests, a new algorithm based on random forests, is designed to reduce the bias seen in random forest feature selection caused by the presence of correlated features. Fuzzy forests uses recursive feature elimination random forests to select features from separate blocks of correlated features where the correlation within each block of features is high and the correlation between blocks of features is low. One final random forest is fit using the surviving features. This package fits random forests using the ‘randomForest’ package and allows for easy use of ‘WGCNA’ to split features into distinct blocks.
fuzzyjoin Join Tables Together on Inexact Matching
Join tables together based not on whether columns match exactly, but whether they are similar by some comparison. Implementations include string distance and regular expression matching.
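A minimal sketch of a string-distance join on invented data:

    library(fuzzyjoin)
    a <- data.frame(name = c("Jon Smith", "Anna Jones"))
    b <- data.frame(name = c("John Smith", "Ana Jones"), id = 1:2)
    # match rows whose names are within edit distance 2 of each other
    stringdist_inner_join(a, b, by = "name", max_dist = 2)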
FuzzyLP Fuzzy Linear Programming
Methods to solve Fuzzy Linear Programming Problems with fuzzy constraints (by Verdegay, Zimmermann, Werner, Tanaka), fuzzy costs (multiobjective, interval arithmetic, stratified piecewise reduction, defuzzification-based), and fuzzy technological matrix.
FuzzyMCDM Multi-Criteria Decision Making Methods for Fuzzy Data
Implementation of several MCDM methods for fuzzy data (triangular fuzzy numbers) for decision making problems. The methods that are implemented in this package are Fuzzy TOPSIS (with two normalization procedures), Fuzzy VIKOR, Fuzzy Multi-MOORA and Fuzzy WASPAS. In addition, function MetaRanking() calculates a new ranking from the sum of the rankings calculated, as well as an aggregated ranking.
FuzzyNumbers.Ext.2 Apply Two Fuzzy Numbers on a Monotone Function
The membership function of f(x,y), where f(.,.) is assumed to be monotone and x and y are two fuzzy numbers, can easily be drawn with package ‘FuzzyNumbers.Ext.2’. This is done with function f2apply(), an extension of function fapply() from package ‘FuzzyNumbers’ to two-variable monotone functions.
FuzzyR Fuzzy Logic Toolkit for R
Design and simulate fuzzy logic systems using Type 1 Fuzzy Logic. The toolkit includes a graphical user interface (GUI) and an adaptive neuro-fuzzy inference system (ANFIS). It is a continuation of the previous package (‘FuzzyToolkitUoN’). Produced by the Intelligent Modelling & Analysis Group, University of Nottingham.
fuzzyreg Fuzzy Linear Regression
Estimators for fuzzy linear regression. The functions estimate parameters of fuzzy linear regression models with crisp or fuzzy independent variables (triangular fuzzy numbers are supported). Implements multiple methods for parameter estimation and algebraic operations with triangular fuzzy numbers. Includes functions for summarising, printing and plotting the model fit, and calculates predictions from the model. References: Diamond (1988) <doi:10.1016/0020-0255(88)90047-3>; Hung & Yang (2006) <doi:10.1016/j.fss.2006.08.004>; Lee & Tanaka (1999) <doi:10.15807/jorsj.42.98>; Nasrabadi, Nasrabadi & Nasrabady (2005) <doi:10.1016/j.amc.2004.02.008>; Tanaka, Hayashi & Watada (1989) <doi:10.1016/0377-2217(89)90431-1>.
FuzzyStatTra Statistical Methods for Trapezoidal Fuzzy Numbers
The aim of the package is to provide some basic functions for doing statistics with trapezoidal fuzzy numbers. In particular, the package contains several functions for simulating trapezoidal fuzzy numbers, as well as for calculating some central tendency measures (mean and two types of median), some scale measures (variance, ADD, MDD, Sn, Qn, Tn and some M-estimators) and one diversity index and one inequality index. Moreover, functions for calculating the 1-norm distance, the mid/spr distance and the (phi,theta)-wabl/ldev/rdev distance between fuzzy numbers are included, and a function to calculate the value phi-wabl given a sample of trapezoidal fuzzy numbers.
fuzzywuzzyR Fuzzy String Matching
Fuzzy string matching implementation of the ‘fuzzywuzzy’ <https://…/fuzzywuzzy> ‘python’ package. It uses the Levenshtein Distance <https://…/Levenshtein_distance> to calculate the differences between sequences.
fxtract Feature Extraction from Grouped Data
An R6 implementation for calculating features from grouped data. The output will be one row for each group. This functionality is often needed in the feature extraction process of machine learning problems. Very large datasets are supported, since data is only read into RAM when needed. Calculation can be done in parallel and the process can be monitored. Global error handling is supported. Results are available in one final dataframe.

G

g3viz Visualize Genomic Mutation Data Using an Interactive Lollipop Diagram
R interface for ‘g3viz’ JavaScript library. Using an interactive lollipop diagram to visualize genomic mutation data in a web browser.
GAabbreviate Abbreviating Questionnaires (or Other Measures) Using Genetic Algorithms
GAabbreviate uses Genetic Algorithms as an optimization tool to create abbreviated forms of lengthy questionnaires (or other measures) that maximally capture the variance in the original data of the long form of the measure.
GADAG A Genetic Algorithm for Learning Directed Acyclic Graphs
Sparse large Directed Acyclic Graphs learning with a combination of a convex program and a tailored genetic algorithm (see Champion et al. (2017) <https://…/document> ).
gafit Genetic Algorithm for Curve Fitting
A group of sample points are evaluated against a user-defined expression; the sample points are lists of parameters with values that may be substituted into that expression. The genetic algorithm attempts to make the result of the expression as low as possible (usually this would be the sum of squared residuals).
gains Gains Table Package
This package constructs gains tables and lift charts for prediction algorithms. Gains tables and lift charts are commonly used in direct marketing applications.
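A hedged sketch on invented data (argument order as recalled; consult the package documentation):

    library(gains)
    set.seed(1)
    actual <- rbinom(1000, 1, 0.2)   # invented binary outcome
    score  <- runif(1000)            # invented model score
    gains(actual, score, groups = 10)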
galgo Genetic Algorithms for Multivariate Statistical Models from Large-Scale Functional Genomics Data
Build multivariate predictive models from large datasets having a far larger number of features than samples, such as functional genomics datasets. Trevino and Falciani (2006) <doi:10.1093/bioinformatics/btl074>.
gama Genetic Approach to Maximize Clustering Criterion
An evolutionary approach to performing hard partitional clustering. The algorithm uses genetic operators guided by information about the quality of individual partitions. The method looks for the best barycenters/centroids configuration (encoded as real values) to maximize or minimize one of the given clustering validation criteria: Silhouette, Dunn Index, C-Index or Calinski-Harabasz Index. Like many other clustering algorithms, ‘gama’ asks for k, a fixed a priori number of partitions. If the user does not know the best value for k, the algorithm estimates it by using one of two user-specified options: minimum or broad. The first method uses an approximation of the second derivative of a set of points to automatically detect the maximum curvature (the ‘elbow’) in the within-cluster sum of squares error (WCSSE) graph. The second method estimates the best k value through majority voting of 24 indices. One of the major advantages of ‘gama’ is that it introduces a bias to detect partitions that satisfy a particular criterion. References: Scrucca, L. (2013) <doi:10.18637/jss.v053.i04>; Charrad, Malika et al. (2014) <doi:10.18637/jss.v061.i06>; Tsagris M, Papadakis M. (2018) <doi:10.7287/peerj.preprints.26605v1>; Kaufman, L., & Rousseeuw, P. (1990, ISBN:0-471-73578-7).
gamCopula Generalized Additive Models for Bivariate Conditional Dependence Structures and Vine Copulas
Implementation of various inference and simulation tools to apply generalized additive models to bivariate dependence structures and non-simplified vine copulas.
GAMens Applies GAMbag, GAMrsm and GAMens Ensemble Classifiers for Binary Classification
Ensemble classifiers based upon generalized additive models for binary classification (De Bock et al. (2010) <DOI:10.1016/j.csda.2009.12.013>). The ensembles implement Bagging (Breiman (1996) <DOI:10.1023/A:1018054314350>), the Random Subspace Method (Ho (1998) <DOI:10.1109/34.709601>), or both, and use Hastie and Tibshirani’s (1990) generalized additive models (GAMs) as base classifiers. Once an ensemble classifier has been trained, it can be used for predictions on new data. A function for cross validation is also included.
GameTheory Cooperative Game Theory
Implementation of a common set of punctual solutions for Cooperative Game Theory.
GameTheoryAllocation Tools for Calculating Allocations in Game Theory
Many situations can be modeled as game theoretic situations. Some procedures are included in this package to calculate the most important allocation rules in Game Theory: the Shapley value, the Owen value or the nucleolus, among others. First, the value of each coalition of the involved agents must be defined as an argument via the characteristic function.
gamlss.inf Fitting Mixed (Inflated and Adjusted) Distributions
This is an add-on package to ‘gamlss’. The purpose of this package is to allow users to fit GAMLSS (Generalised Additive Models for Location Scale and Shape) models when the response variable is defined either in the intervals [0,1), (0,1] and [0,1] (inflated at zero and/or one distributions), or in the positive real line including zero (zero-adjusted distributions). The mass points at zero and/or one are treated as extra parameters with the possibility to include a linear predictor for both. The package also allows transformed or truncated distributions from the GAMLSS family to be used for the continuous part of the distribution. Standard methods and GAMLSS diagnostics can be used with the resulting fitted object.
gamlss.spatial Spatial Terms in Generalized Additive Models for Location Scale and Shape Models
Allows fitting Gaussian Markov Random Fields within the Generalized Additive Models for Location Scale and Shape (GAMLSS) algorithms.
gamlssbssn Bimodal Skew Symmetric Normal Distribution
Density, distribution function, quantile function and random generation for the bimodal skew symmetric normal distribution of Hassan and El-Bassiouni (2016) <doi:10.1080/03610926.2014.882950>.
gamm4.test Comparing Nonlinear Curves and Surface Estimations by Semiparametric Methods
To compare nonlinear curves and surface estimations between groups using semiparametric methods for cross-sectional and longitudinal dataset.
gammSlice Generalized additive mixed model analysis via slice sampling
Uses a slice sampling-based Markov chain Monte Carlo to conduct Bayesian fitting and inference for generalized additive mixed models (GAMM). Generalized linear mixed models and generalized additive models are also handled as special cases of GAMM.
gamreg Robust and Sparse Regression via Gamma-Divergence
Robust regression via gamma-divergence with L1, elastic net and ridge.
gamRR Calculate the RR for the GAM
Calculates the relative risk (RR) for generalized additive models.
gamsel Fit Regularization Path for Generalized Additive Models
Using overlap grouped lasso penalties, gamsel selects whether a term in a gam is nonzero, linear, or a non-linear spline (up to a specified max df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, both for gaussian and binomial families.
GAparsimony Searching Parsimony Models with Genetic Algorithms
Methodology that combines feature selection, model tuning, and parsimonious model selection with Genetic Algorithms (GA), proposed in Martinez-de-Pison (2015) <DOI:10.1016/j.asoc.2015.06.012>. To this objective, a novel GA selection procedure is introduced based on separate cost and complexity evaluations.
GAR Authorize and Request Google Analytics Data
The functions included are used to obtain initial authentication with Google Analytics as well as simple and organized data retrieval from the API. Allows for retrieval from multiple profiles at once.
GAS Generalized Autoregressive Score Models
Simulate, Estimate and Forecast using univariate and multivariate GAS models.
gaselect Genetic Algorithm (GA) for Variable Selection from High-Dimensional Data
Provides a genetic algorithm for finding variable subsets in high dimensional data with high prediction performance. The genetic algorithm can use ordinary least squares (OLS) regression models or partial least squares (PLS) regression models to evaluate the prediction power of variable subsets. By supporting different cross-validation schemes, the user can fine-tune the tradeoff between speed and quality of the solution.
gatepoints Easily Gate or Select Points on a Scatter Plot
Allows user to choose/gate a region on the plot and returns points within it.
GBJ Generalized Berk-Jones Statistic for Set-Based Inference
Offers the Generalized Berk-Jones (GBJ) test for set-based inference in genetic association studies. The GBJ is designed as an alternative to tests such as Berk-Jones (BJ), Higher Criticism (HC), Generalized Higher Criticism (GHC), Minimum p-value (minP), and Sequence Kernel Association Test (SKAT). All of these other methods (except for SKAT) are also implemented in this package, and we additionally provide an omnibus test (OMNI) which integrates information from each of the tests. The GBJ has been shown to outperform other tests in genetic association studies when signals are correlated and moderately sparse. For more details, please ask for a preprint of our manuscript or see the vignette for a quickstart guide.
gbm Generalized Boosted Regression Models
An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart).
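A minimal sketch on invented data:

    library(gbm)
    set.seed(1)
    d <- data.frame(y = rbinom(200, 1, 0.5),
                    x1 = rnorm(200), x2 = rnorm(200))
    # boosted logistic regression with shallow trees
    fit <- gbm(y ~ x1 + x2, data = d, distribution = "bernoulli",
               n.trees = 100, interaction.depth = 2)
    head(predict(fit, d, n.trees = 100, type = "response"))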
gbp A Bin Packing Problem Solver
Basic infrastructure and several algorithms for 1d-4d bin packing problem. This package provides a set of c-level classes and solvers for 1d-4d bin packing problem, and an r-level solver for 4d bin packing problem, which is a wrapper over the c-level 4d bin packing problem solver. The 4d bin packing problem solver aims to solve bin packing problem, a.k.a container loading problem, with an additional constraint on weight. Given a set of rectangular-shaped items, and a set of rectangular-shaped bins with weight limit, the solver looks for an orthogonal packing solution that minimizes the number of bins and maximizes volume utilization. Each rectangular-shaped item i = 1, .. , n is characterized by length l_i, depth d_i, height h_i, and weight w_i, and each rectangular-shaped bin j = 1, .. , m is specified similarly by length l_j, depth d_j, height h_j, and weight limit w_j. The item can be rotated into any orthogonal direction, and no further restrictions are imposed.
gbts Hyperparameter Search for Gradient Boosted Trees
An implementation of hyperparameter optimization for Gradient Boosted Trees on binary classification and regression problems. The current version provides two optimization methods: active learning and random search.
GCalignR Simple Peak Alignment for Gas-Chromatography Data
Aligns chromatography peaks with a three-step algorithm: (1) linear transformation of retention times to maximise shared peaks among samples, (2) alignment of peaks within a certain error interval, and (3) merging of rows that are likely to represent the same substance (i.e. no sample shows peaks in both rows and the rows have similar retention time means). The method was first described in Stoffel et al. (2015) <doi:10.1073/pnas.1506076112>.
gcerisk Generalized Competing Event Model
Generalized competing event model based on the Cox PH model and the Fine-Gray model. This function is designed to develop optimized risk-stratification methods for competing risks data, such as described in: 1. Carmona R, Gulaya S, Murphy JD, Rose BS, Wu J, Noticewala S, McHale MT, Yashar CM, Vaida F, and Mell LK. (2014) Validated competing event model for the stage I-II endometrial cancer population. Int J Radiat Oncol Biol Phys. 89:888-98. <DOI:10.1016/j.ijrobp.2014.03.047>. 2. Carmona R, Zakeri K, Green G, Hwang L, Gulaya S, Xu B, Verma R, Williamson CW, Triplett DP, Rose BS, Shen H, Vaida F, Murphy JD, and Mell LK. (2016) Improved method to stratify elderly cancer patients at risk for competing events. J Clin Oncol, in press. <DOI:10.1200/JCO.2015.65.0739>.
gcForest Deep Forest Model
R application programming interface (API) for Deep Forest, which is based on Zhou and Feng (2017), Deep Forest: Towards an Alternative to Deep Neural Networks (<arXiv:1702.08835v2>), or Zhou and Feng (2017), Deep Forest (<arXiv:1702.08835>), and for the Python module ‘gcForest’ (<https://…/gcForest> ).
gcite Google Citation Parser
Scrapes Google Citation pages and creates data frames of citations over time.
gcKrig Analyze and Interpolate Geostatistical Count Data using Gaussian Copula
Provides a variety of functions to analyze and model geostatistical count data with Gaussian copulas, including 1) data simulation and visualization; 2) correlation structure assessment (here also known as the NORTA); 3) calculate multivariate normal rectangle probabilities; 4) likelihood inference and parallel prediction at unsampled locations.
GDAtools A toolbox for the analysis of categorical data in social sciences, and especially Geometric Data Analysis
This package contains functions for ‘specific’ MCA (Multiple Correspondence Analysis), ‘class specific’ MCA, computing and plotting structuring factors and concentration ellipses, ‘standardized’ MCA, inductive tests and other tools for Geometric Data Analysis. It also provides functions for the translation of logit model coefficients into percentages (forthcoming), weighted contingency tables and an association measure, i.e. Percentages of Maximum Deviation from Independence (PEM).
gDefrag Graph-Based Landscape De-Fragmentation
Provides a set of tools to help the de-fragmentation process. It works by prioritizing the different sections of linear infrastructures (e.g. roads, power-lines) to increase the available amount of a given resource.
gdm Functions for Generalized Dissimilarity Modeling
A toolkit with functions to fit, plot, and summarize Generalized Dissimilarity Models.
gdns Tools to Work with Google DNS Over HTTPS API
To address the problem of insecurity of UDP-based DNS requests, Google Public DNS offers DNS resolution over an encrypted HTTPS connection. DNS-over-HTTPS greatly enhances privacy and security between a client and a recursive resolver, and complements DNSSEC to provide end-to-end authenticated DNS lookups. Functions are provided both for querying individual requests, which return detailed responses, and for bulk requests. Support for reverse lookups is also provided. See <https://…/dns-over-https> for more information.
gdpc Generalized Dynamic Principal Components
Functions to compute the Generalized Dynamic Principal Components introduced in Peña and Yohai (2016) <DOI:10.1080/01621459.2015.1072542>.
gds Descriptive Statistics of Grouped Data
Contains a function called gds() which accepts three input parameters: the lower limits, upper limits and frequencies of the corresponding classes. The gds() function calculates and returns the values of mean (‘gmean’), median (‘gmedian’), mode (‘gmode’), variance (‘gvar’), standard deviation (‘gstdev’), coefficient of variance (‘gcv’), quartiles (‘gq1’, ‘gq2’, ‘gq3’), inter-quartile range (‘gIQR’), skewness (‘g1’), and kurtosis (‘g2’), which facilitate effective data analysis. Skewness and kurtosis are calculated using moments.
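A hedged sketch, passing the three parameters positionally in the order described above (the class limits and frequencies are invented):

    library(gds)
    # classes 0-10, 10-20, 20-30 with frequencies 5, 12, 8
    gds(c(0, 10, 20), c(10, 20, 30), c(5, 12, 8))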
gdtools Utilities for Graphical Rendering
Useful tools for writing vector graphics devices.
gear Geostatistical Analysis in R
Implements common geostatistical methods in a clean, straightforward, efficient manner. A quasi reboot of the SpatialTools R package.
GeDS Geometrically Designed Spline Regression
Geometrically Designed Spline (‘GeDS’) Regression is a non-parametric geometrically motivated method for fitting variable knots spline predictor models in one or two independent variables, in the context of generalized (non-)linear models. ‘GeDS’ estimates the number and position of the knots and the order of the spline, assuming the response variable has a distribution from the exponential family. A description of the method can be found in Kaishev et al. (2016) <doi:10.1007/s00180-015-0621-7> and Dimitrova et al. (2017) <https://…/18460>.
gee4 Generalised Estimating Equations (GEE/WGEE) using ‘Armadillo’ and S4
Fit joint mean-covariance models for longitudinal data within the framework of (weighted) generalised estimating equations (GEE/WGEE). The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Armadillo’ C++ library for numerical linear algebra and ‘RcppArmadillo’ glue.
geecure Marginal Proportional Hazards Mixture Cure Models with Generalized Estimating Equations
Features the marginal parametric and semi-parametric proportional hazards mixture cure models for analyzing clustered survival data with a possible cure fraction. A reference is Yi Niu and Yingwei Peng (2014) <doi:10.1016/j.jmva.2013.09.003>.
GEEmediate Mediation Analysis for Generalized Linear Models Using the Difference Method
Causal mediation analysis for a single exposure/treatment and a single mediator, both allowed to be either continuous or binary. The package implements the difference method and provides point and interval estimates, as well as testing, for the natural direct and indirect effects and the mediation proportion.
geex An API for M-Estimation
Provides a general, flexible framework for estimating parameters and empirical sandwich variance estimator from a set of unbiased estimating equations (i.e., M-estimation in the vein of Stefanski & Boos (2002) <doi:10.1198/000313002753631330>). Also provides an API to compute finite-sample variance corrections.
gelnet Generalized Elastic Nets
The package implements several extensions of the elastic net regularization scheme. These extensions include individual feature penalties for the L1 term and feature-feature penalties for the L2 term.
gemmR General Monotone Model
An R-language implementation of the General Monotone Model proposed by Michael Dougherty and Rick Thomas. It is a procedure for estimating weights for a set of independent predictors that minimize the rank-order inversions between the model predictions and some outcome.
gems Generalized Multistate Simulation Model
Simulate and analyze multistate models with general hazard functions. gems provides functionality for the preparation of hazard functions and parameters, simulation from a general multistate model and predicting future events. The multistate model is not required to be a Markov model and may take the history of previous events into account. In the basic version, it allows simulation from transition-specific hazard functions whose parameters are multivariate normally distributed.
genBart Generate ‘BART’ File
A set of functions to generate and format results from statistical analyses of a wide range of high throughput experiments that can then be uploaded into the ‘BART’ (Bio-Statistical Analysis Reporting Tool) ‘shiny’ app <https://…/BART>. The app provides users with tools to visualize and efficiently sift through large amounts of data and results.
gencve General Cross Validation Engine
Engines for cross-validation of many types of regression and class prediction models are provided. These engines include built-in support for ‘glmnet’, ‘lars’, ‘plus’, ‘MASS’, ‘rpart’, ‘C50’ and ‘randomforest’. It is easy for the user to add other regression or classification algorithms. The ‘parallel’ package is used to improve speed. Several data generation algorithms for problems in regression and classification are provided.
genderizeR Gender Prediction Based on First Names
Utilizes the genderize.io API to predict gender from first names extracted from a text vector. The accuracy of prediction can be controlled by two parameters: the count of a first name in the database and the probability of prediction.
genderNames Client for the Genderize API That Determines the Gender of Names
API client for genderize.io which will tell you the gender of the name you input. Use the first name of the person you are interested in to find their gender.
gendist Generated Probability Distribution Models
Computes the probability density function (pdf), cumulative distribution function (cdf), quantile function (qf) and generates random values (rg) for the following general models: mixture models, composite models, folded models, skewed symmetric models and arc tan models.
GENEAclassify Segmentation and Classification of Accelerometer Data
Segmentation and classification procedures for data from the ‘Activinsights GENEActiv’ <https://…/> accelerometer that provides the user with a model to guess behaviour from test data where behaviour is missing. Includes a step counting algorithm, a function to create segmented data with custom features and a function to use recursive partitioning provided in the function rpart() of the ‘rpart’ package to create classification models.
GENEAsphere Visualisation of Raw or Segmented Accelerometer Data
Creates visualisations in two and three dimensions of simulated data based on detected segments or raw accelerometer data.
GeneralizedUmatrix Credible Visualization for Two-Dimensional Projections of Data
Projections from a high-dimensional data space onto a two-dimensional plane are used to detect structures, such as clusters, in multivariate data. The generalized Umatrix is able to visualize errors of these two-dimensional scatter plots by using a 3D topographic map.
GeneralOaxaca Blinder-Oaxaca Decomposition for Generalized Linear Model
Perform the Blinder-Oaxaca decomposition for generalized linear models with bootstrapped standard errors. Both the twofold and threefold decompositions are given, as well as the generalized linear model output for each group.
GeneralTree General Tree Data Structure
A general tree data structure implementation in R.
generator Generate Data Containing Fake Personally Identifiable Information
Allows users to quickly and easily generate fake data containing Personally Identifiable Information (PII) through convenience functions.
generics Common S3 Generics not Provided by Base R Methods Related to Model Fitting
In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.
GenEst Generalized Mortality Estimator
Command-line and ‘shiny’ GUI implementation of the GenEst models for estimating bird and bat mortality at wind and solar power facilities, following Dalthorp, et al. (2018) <https://…/tm7a2.pdf>.
GeNetIt Spatial Graph-Theoretic Genetic Gravity Modelling
Implementation of spatial graph-theoretic genetic gravity models. The model framework is applicable for other types of spatial flow questions. Includes functions for constructing spatial graphs, sampling and summarizing associated raster variables and building unconstrained and singly constrained gravity models.
GenForImp The Forward Imputation: A Sequential Distance-Based Approach for Imputing Missing Data
Two methods based on the Forward Imputation approach are implemented for the imputation of quantitative missing data. One method alternates Nearest Neighbour Imputation and Principal Component Analysis (function ‘ForImp.PCA’), the other uses Nearest Neighbour Imputation.
genie A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm
A new hierarchical clustering linkage criterion: the Genie algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini index) of the cluster sizes does not increase drastically above a given threshold. Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed.
genio Genetics Input/Output Functions
Implements readers and writers for file formats associated with genetics data. Reading and writing plink BED/BIM/FAM formats is fully supported, including lightning-fast BED reader and writer implementations. Other functions are ‘readr’ wrappers that are more constrained, user-friendly, and efficient for these particular applications; handles plink and eigenstrat tables (FAM, BIM, IND, and SNP files). There are also ‘make’ functions for FAM and BIM tables with default values to go with simulated genotype data.
genlogis Generalized Logistic Distribution
Provides basic distribution functions for a generalized logistic distribution proposed by Rathie and Swamee (2006). It also has an interactive ‘RStudio’ plot for dynamically guessing initial values, to ease the included optimization and simulation.
genpathmox Generalized PATHMOX Algorithm for PLS-PM, LS and LAD Regression
genpathmox provides a solution for handling segmentation variables in complex statistical methodology. It contains an extended version of the PATHMOX algorithm in the context of partial least squares path modeling (Sanchez, 2009), including the F-block test (to detect the latent endogenous equations responsible for the difference), the F-coefficient test (to detect the path coefficients responsible for the difference) and the invariance test (to compare the sub-models’ latent variables). Furthermore, the package contains a generalized version of the PATHMOX algorithm that extends to different methodologies: linear regression and least absolute deviation regression models.
gensphere Generalized Spherical Distributions
Define and compute with generalized spherical distributions – multivariate probability laws that are specified by a star shaped contour (directional behavior) and a radial component.
gensvm A Generalized Multiclass Support Vector Machine
The GenSVM classifier is a generalized multiclass support vector machine (SVM). This classifier aims to find decision boundaries that separate the classes with as wide a margin as possible. In GenSVM, the loss function is very flexible in the way that misclassifications are penalized. This allows the user to tune the classifier to the dataset at hand and potentially obtain higher classification accuracy than alternative multiclass SVMs. Moreover, this flexibility means that GenSVM has a number of other multiclass SVMs as special cases. One of the other advantages of GenSVM is that it is trained in the primal space, allowing the use of warm starts during optimization. This means that for common tasks such as cross validation or repeated model fitting, GenSVM can be trained very quickly. Based on: G.J.J. van den Burg and P.J.F. Groenen (2018) <http://…/14-526.html>.
geoaxe Split ‘Geospatial’ Objects into Pieces
Split ‘geospatial’ objects into pieces. Includes support for some spatial object inputs, ‘Well-Known Text’, and ‘GeoJSON’.
geodetector Stratified Heterogeneity Measure, Dominant Driving Force Detection, Interaction Relationship Investigation
Spatial stratified heterogeneity (SSH) refers to data in which observations within strata are more similar than those between strata; a model with global parameters would be confounded if the input data exhibit SSH. Note that the ‘spatial’ here can be either geospatial or space in the mathematical sense. Geographical detector is a novel tool to investigate SSH: (1) measure and find SSH of a variable Y; (2) test the power of a determinant X of a dependent variable Y according to the consistency between their spatial distributions; and (3) investigate the interaction between two explanatory variables X1 and X2 with respect to a dependent variable Y (Wang et al 2014 <doi:10.1080/13658810802443457>, Wang, Zhang, and Fu 2016 <doi:10.1016/j.ecolind.2016.02.052>).
geodist Fast, Dependency-Free Geodesic Distance Calculations
Dependency-free, ultra fast calculation of geodesic distances. Includes the reference nanometre-accuracy geodesic distances of Karney (2013) <doi:10.1007/s00190-012-0578-z>, as used by the ‘sf’ package, as well as Haversine and Vincenty distances. Default distance measure is the ‘Mapbox cheap ruler’ which is generally more accurate than Haversine or Vincenty for distances out to a few hundred kilometres, and is considerably faster. The main function accepts one or two inputs in almost any generic rectangular form, and returns either matrices of pairwise distances, or vectors of sequential distances.
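A minimal sketch comparing two of the distance measures on invented coordinates:

    library(geodist)
    pts <- data.frame(lon = c(0, 1), lat = c(51, 52))
    geodist(pts, measure = "cheap")      # 'Mapbox cheap ruler' (default)
    geodist(pts, measure = "geodesic")   # Karney's nanometre-accuracy method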
geofacet ‘ggplot2’ Faceting Utilities for Geographical Data
Provides geofaceting functionality for ‘ggplot2’. Geofaceting arranges a sequence of plots of data for different geographical entities into a grid that preserves some of the geographical orientation.
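A minimal sketch using the state_unemp example data bundled with the package:

    library(ggplot2)
    library(geofacet)
    ggplot(state_unemp, aes(year, rate)) +
      geom_line() +
      facet_geo(~ state, grid = "us_state_grid2")  # grid mimics US geography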
geofd Spatial Prediction for Function Value Data
Kriging based methods are used for predicting functional data (curves) with spatial dependence.
geoGAM Select Sparse Geoadditive Models for Spatial Prediction
A model building procedure to select a sparse geoadditive model from a large number of covariates. Continuous, binary and ordered categorical responses are supported. The model building is based on component wise gradient boosting with linear effects and smoothing splines. The resulting covariate set after gradient boosting is further reduced through cross validated backward selection and aggregation of factor levels. The package provides a model based bootstrap method to simulate prediction intervals for point predictions. A test data set of a soil mapping case study is provided.
geogrid Turn Geospatial Polygons into Regular or Hexagonal Grids
Turn irregular polygons (such as geographical regions) into regular or hexagonal grids. This package enables the generation of regular (square) and hexagonal grids through the package ‘sp’ and then assigns the content of the existing polygons to the new grid using the Hungarian algorithm, Kuhn (1955) (<doi:10.1007/978-3-540-68279-0_2>). This prevents the need for manual generation of hexagonal grids or regular grids that are supposed to reflect existing geography.
geohash Tools for Geohash Creation and Manipulation
Provides tools to encode lat/long pairs into geohashes, decode those geohashes, and identify their neighbours.
geohashTools Tools for Working with Geohashes
Tools for working with Gustavo Niemeyer’s geohash coordinate system, ported to R from Hiroaki Kawai’s ‘Python’ implementation and embellished to sit naturally in the R ecosystem.
geojson Classes for ‘GeoJSON’
Classes for ‘GeoJSON’ to make working with ‘GeoJSON’ easier.
geojsonio Convert Data from and to ‘geoJSON’ or ‘topoJSON’
Convert data to ‘geoJSON’ or ‘topoJSON’ from various R classes, including vectors, lists, data frames, shape files, and spatial classes. ‘geojsonio’ does not aim to replace packages like ‘sp’, ‘rgdal’, ‘rgeos’, but rather aims to be a high level client to simplify conversions of data from and to ‘geoJSON’ and ‘topoJSON’.
geojsonlint Tools for Validating ‘GeoJSON’
Tools for linting ‘GeoJSON’. Includes tools for interacting with the online tool <http://geojsonlint.com>, the ‘Javascript’ library ‘geojsonhint’ (<https://…/geojsonhint> ), and validating against a GeoJSON schema via the ‘Javascript’ library (<https://…/is-my-json-valid> ). Some tools work locally while others require an internet connection.
geojsonR A GeoJson Processing Toolkit
Includes functions for processing GeoJson objects <https://…/GeoJSON> relying on ‘RFC 7946’ <https://…/rfc7946.pdf>. The geojson encoding is based on ‘json11’, a tiny JSON library for ‘C++11’ <https://…/json11>. Furthermore, the source code is exported in R through the ‘Rcpp’ and ‘RcppArmadillo’ packages.
geojsonsf GeoJSON to Simple Feature Converter
Converts GeoJSON to simple feature objects.
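A minimal round-trip sketch:

    library(geojsonsf)
    gj <- '{"type":"Point","coordinates":[144.06,-37.36]}'
    x <- geojson_sf(gj)   # GeoJSON -> sf object
    sf_geojson(x)         # sf object -> GeoJSON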
GeomComb (Geometric) Forecast Combination Methods
Provides eigenvector-based (geometric) forecast combination methods; also includes simple approaches (simple average, median, trimmed and winsorized mean, inverse rank method) and regression-based combination. Tools for data pre-processing are available in order to deal with common problems in forecast combination (missingness, collinearity).
geomerge Geospatial Data Integration
Geospatial data integration framework that merges raster, spatial polygon, and (dynamic) spatial points data into a spatial (panel) data frame at any geographical resolution.
geometa Tools for Reading and Writing ISO/OGC Geographic Metadata
Provides facilities to handle reading and writing of geographic metadata defined with OGC/ISO 19115 and 19139 (XML) standards.
GeoMongo Geospatial Queries Using ‘PyMongo’
Utilizes methods of the ‘PyMongo’ ‘Python’ library to initialize, insert and query ‘GeoJson’ data (see <https://…/#> for more information on ‘PyMongo’). Furthermore, it allows the user to validate ‘GeoJson’ objects and to use the console for ‘MongoDB’ (bulk) commands. The ‘reticulate’ package provides the ‘R’ interface to ‘Python’ modules, classes and functions.
geomorph Geometric Morphometric Analyses of 2D/3D Landmark Data
Geomorph allows users to read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.
geonames Interface to http://www.geonames.org web service
Code for querying the web service at http://www.geonames.org
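A hedged sketch (a free geonames.org account is required, and the username and query shown here are illustrative):

    library(geonames)
    options(geonamesUsername = "my_user")   # register at geonames.org first
    GNsearch(name = "Vienna", maxRows = 1)  # query the search endpoint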
geonetwork Geographic Networks
Provides classes and methods for handling networks or graphs whose nodes are geographical (i.e. locations in the globe). The functionality includes the creation of objects of class geonetwork as a graph with node coordinates, the computation of network measures, the support of spatial operations (projection to different Coordinate Reference Systems, handling of bounding boxes, etc.) and the plotting of the geonetwork object combined with supplementary cartography for spatial representation.
geoops ‘GeoJSON’ Topology Calculations and Operations
Tools for doing calculations and manipulations on ‘GeoJSON’, a ‘geospatial’ data interchange format (<https://…/rfc7946> ). ‘GeoJSON’ is also valid ‘JSON’.
geoparser Interface to the Geoparser.io API for Identifying and Disambiguating Places Mentioned in Text
A wrapper for the Geoparser.io API version 0.4.0 (see <https://…/> ), which is a web service that identifies places mentioned in text, disambiguates those places, and returns detailed data about the places found in the text. Basic, limited API access is free with paid plans to accommodate larger workloads.
geosample Construction of Geostatistical Sampling Designs
Functions for constructing sampling designs, including spatially random, inhibitory (simple or with close pairs), both discrete and continuous, and adaptive designs. For details on the methods, see the following references: Chipeta et al. (2016) <doi:10.1016/j.spasta.2015.12.004> and Chipeta et al. (2016) <doi:10.1002/env.2425>.
geosapi GeoServer REST API R Interface
Provides an R interface to the GeoServer REST API, allowing the user to upload and publish data in a GeoServer web-application and expose data to OGC Web-Services. The package currently supports all CRUD (Create, Read, Update, Delete) operations on GeoServer workspaces, namespaces, datastores (stores of vector data), featuretypes, layers, styles, as well as vector data upload operations. For more information about the GeoServer REST API, see <http://…/>.
geosptdb Spatio-Temporal Inverse Distance Weighting and Radial Basis Functions with Distance-Based Regression
Spatio-temporal Inverse Distance Weighting (IDW) and radial basis functions; optimization, prediction, summary statistics from leave-one-out cross-validation, fitting of a distance-based linear regression model, and generation of the principal coordinates of a new individual from Gower’s distance.
geotoolsR Tools to Improve the Use of Geostatistic
Provides some tools to help researchers work with geostatistics. Initially, it presents a collection of functions that allow researchers to deal with spatial data using bootstrap procedures. There are five methods available and two ways to display them: bootstrap confidence interval, which provides a two-sided bootstrap confidence interval; and bootstrap plot, a graphic with the original variogram and each of the B bootstrap variograms.
geoviz Elevation and GPS Data Visualisation
Simpler processing of digital elevation model and GPS trace data for use with the ‘rayshader’ package.
geozoning Zoning Methods for Spatial Data
A zoning method and a numerical criterion for zoning quality are available in this package. The zoning method is based on a numerical criterion that evaluates the zoning quality. This criterion quantifies simultaneously how zones are heterogeneous on the whole map and how neighbouring zones are similar. This approach allows comparison between maps either with different zones or different labels, which is of importance for zone delineation algorithms aiming at maximizing inter-zone variability. An optimisation procedure provides the user with the best zonings thanks to contour delineation for a given map.
GERGM Estimation and Fit Diagnostics for Generalized Exponential Random Graph Models
Estimation and diagnosis of the convergence of Generalized Exponential Random Graph Models (GERGM) via Gibbs sampling or Metropolis Hastings with exponential down weighting.
gesca Generalized Structured Component Analysis (GSCA)
Fit a variety of component-based structural equation models.
gestate Generalised Survival Trial Assessment Tool Environment
Provides tools to assist planning and monitoring of time-to-event trials under complicated censoring assumptions and/or non-proportional hazards. There are three main components: The first is analytic calculation of predicted time-to-event trial properties, providing estimates of expected hazard ratio, event numbers and power under different analysis methods. The second is simulation, allowing calculation of these same properties. Finally, it also provides parametric event prediction using blinded trial data, including creation of confidence intervals. Methods are based upon numerical integration and a flexible object-orientated structure for defining event, censoring and recruitment curves.
gethr Access to Ethereum-Based Blockchains Through Geth Nodes
Full access to the Geth command line interface for running full Ethereum nodes. With gethr it is possible to carry out different tasks such as mining ether, transferring funds, creating contracts, exploring block history, etc. The package also provides access to all the available APIs: those officially exposed by Ethereum blockchains (eth, shh, web3, net) and some provided directly by Geth (admin, debug, miner, personal, txpool). For more details on Ethereum, access the project website <https://…/>. For more details on the Geth client, access the project website <https://…/>.
getmstatistic Quantifying Systematic Heterogeneity in Meta-Analysis
Quantifying systematic heterogeneity in meta-analysis using R. The M statistic aggregates heterogeneity information across multiple variants to identify systematic heterogeneity patterns and their direction of effect in meta-analysis. Its primary use is to identify outlier studies, which either show ‘null’ effects or consistently show stronger or weaker genetic effects than average across the panel of variants examined in a GWAS meta-analysis. In contrast to conventional heterogeneity metrics (Q-statistic, I-squared and tau-squared), which measure random heterogeneity at individual variants, M measures systematic (non-random) heterogeneity across multiple independently associated variants. Systematic heterogeneity can arise in a meta-analysis due to differences in the study characteristics of participating studies. Some of the differences may include: ancestry, allele frequencies, phenotype definition, age-of-disease onset, family history, gender, linkage disequilibrium and quality control thresholds. See <https://…/> for statistical theory, documentation and examples.
getPass Masked User Input
A micro-package for reading ‘passwords’, i.e. reading user input with masking, so that the input is not displayed as it is typed. Currently we have support for ‘RStudio’, the command line (every OS), and any platform where ‘tcltk’ is present.
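A minimal sketch:

    library(getPass)
    pw <- getPass(msg = "API token: ")  # input is masked as it is typed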
getProxy Get Free Proxy IP and Port
Allows getting the address and port of a free proxy server from one of two services, <http://…/> or <https://…/>, making it easy to redirect your Internet connection through a proxy server.
gets General-to-Specific (GETS) Modelling and Indicator Saturation Methods
Automated multi-path General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting structural breaks in the mean. The mean can be specified as an autoregressive model with covariates (an ‘AR-X’ model), and the variance can be specified as a log-variance model with covariates (a ‘log-ARCH-X’ model). The four main functions of the package are arx, getsm, getsv and isat. The first function, arx, estimates an AR-X model with log-ARCH-X errors. The second function, getsm, undertakes GETS model selection of the mean specification of an arx object. The third function, getsv, undertakes GETS model selection of the log-variance specification of an arx object. The fourth function, isat, undertakes GETS model selection of an indicator saturated mean specification.
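A minimal sketch on simulated data, using three of the main functions named above:

    library(gets)
    set.seed(1)
    y <- arima.sim(list(ar = 0.4), n = 100)
    mod <- arx(y, ar = 1:2)   # AR-X model of the mean
    getsm(mod)                # GETS selection of the mean specification
    isat(y)                   # indicator saturation for structural breaks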
gettz Get the Timezone Information
A function to retrieve the system timezone on Unix systems which has been found to find an answer when ‘Sys.timezone()’ has failed. It is based on an answer by Duane McCully posted on ‘StackOverflow’, and adapted to be callable from R.
gexp Generator of Experiments
Generates experiments – simulating structured or experimental data as: completely randomized design, randomized block design, latin square design, factorial and split-plot experiments (Ferreira, 2008, ISBN:8587692526; Naes et al., 2007 <doi:10.1002/qre.841>; Rencher et al., 2007, ISBN:9780471754985; Montgomery, 2001, ISBN:0471316490).
GFA Group Factor Analysis
Factor analysis implementation for multiple data sources, i.e., for groups of variables. The whole data analysis pipeline is provided, including functions and recommendations for data normalization and model definition, as well as missing value prediction and model visualization. The model group factor analysis (GFA) is inferred with Gibbs sampling.
GFD Tests for General Factorial Designs
Implemented are the Wald-type statistic, a permuted version thereof as well as the ANOVA-type statistic for general factorial designs, even with non-normal error terms and/or heteroscedastic variances, for crossed designs with an arbitrary number of factors and nested designs with up to three factors.
GFGM.copula Generalized Farlie-Gumbel-Morgenstern Copula
Compute bivariate dependence measures and perform bivariate competing risks analysis under the generalized Farlie-Gumbel-Morgenstern (FGM) copula. See Shih and Emura (2016) <doi:10.1007/s00362-016-0865-5> and Shih and Emura (2017, in re-submission) for details.
gfmR Implements Group Fused Multinomial Regression
Software to implement methodology to perform automatic response category combinations in multinomial logistic regression. There are functions for both cross validation and AIC for model selection. The method provides regression coefficient estimates that may be useful for better understanding the true probability distribution of multinomial logistic regression when category probabilities are similar. These methods are not recommended for a large number of predictor variables.
ggallin Grab Bag of ‘ggplot2’ Functions
Extra geoms and scales for ‘ggplot2’, including geom_cloud(), a Normal density cloud replacement for errorbars; transforms ssqrt_trans and pseudolog10_trans, which are loglike but appropriate for negative data; interp_trans() and warp_trans() which provide scale transforms based on interpolation; and an infix compose operator for scale transforms.
ggalluvial Alluvial Diagrams in ‘ggplot2’
Alluvial diagrams encompass a variety of charts that use x-splines (alluvia and flows), sometimes augmented with stacked bars (lodes or strata), to visualize incidence structures derived from several data types, including repeated categorical measures, evolving classifications, and multi-dimensional categorical data. This package contains stat and geom layers that interpret multiple data formats compatible with this framework while hewing to the principles of tidy data and the grammar of graphics.
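A minimal sketch using the built-in UCBAdmissions data:

    library(ggplot2)
    library(ggalluvial)
    df <- as.data.frame(UCBAdmissions)
    ggplot(df, aes(axis1 = Gender, axis2 = Dept, y = Freq)) +
      geom_alluvium(aes(fill = Admit)) +   # the flows
      geom_stratum()                       # the stacked bars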
ggalt Extra Coordinate Systems, Geoms and Statistical Transformations for ‘ggplot2’
A compendium of ‘geoms’, ‘coords’ and ‘stats’ for ‘ggplot2’, including splines, 1d and 2d densities, univariate average shifted histograms and a new map coordinate system based on the ‘PROJ.4’-library.
gganimate A Grammar of Animated Graphics
The grammar of graphics as implemented in the ‘ggplot2’ package has been successful in providing a powerful API for creating static visualisation. In order to extend the API for animated graphics this package provides a completely new set of grammar, fully compatible with ‘ggplot2’ for specifying transitions and animations in a flexible and extensible way.
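A minimal sketch animating over a grouping variable (rendering requires a renderer such as ‘gifski’):

    library(ggplot2)
    library(gganimate)
    p <- ggplot(mtcars, aes(mpg, hp)) +
      geom_point() +
      transition_states(cyl, transition_length = 2, state_length = 1)
    animate(p)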
ggasym Asymmetric Matrix Plotting in ‘ggplot2’
Plots a symmetric matrix with three different fill aesthetics for the top-left and bottom-right triangles and along the diagonal. It operates within the Grammar of Graphics paradigm implemented in ‘ggplot2’.
ggbeeswarm Categorical Scatter (Violin Point) Plots
Provides two methods of plotting categorical scatter plots such that the arrangement of points within a category reflects the density of data at that region, and avoids over-plotting.
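A minimal sketch of the two arrangement methods (illustrative data):
    library(ggplot2)
    library(ggbeeswarm)
    # Quasirandom jitter reflects local density; beeswarm packs points without overlap
    ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom()
    ggplot(iris, aes(Species, Sepal.Length)) + geom_beeswarm()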
ggconf Simpler Appearance Modification of ‘ggplot2’
A flexible interface for ggplot2::theme(), potentially saving 50% of your typing.
ggcorrplot Visualization of a Correlation Matrix using ‘ggplot2’
The ‘ggcorrplot’ package can be used to easily visualize a correlation matrix using ‘ggplot2’. It provides a solution for reordering the correlation matrix and displays the significance level on the plot. It also includes a function for computing a matrix of correlation p-values.
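A minimal sketch of the typical workflow (data and options illustrative):
    library(ggcorrplot)
    corr <- round(cor(mtcars), 1)   # correlation matrix
    p.mat <- cor_pmat(mtcars)       # matrix of correlation p-values
    # Reorder by hierarchical clustering and mark insignificant coefficients
    ggcorrplot(corr, hc.order = TRUE, p.mat = p.mat)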
ggdag Analyze and Create Elegant Directed Acyclic Graphs
Tidy, analyze, and plot directed acyclic graphs (DAGs). ‘ggdag’ is built on top of ‘dagitty’, an R package that uses the ‘DAGitty’ web tool (<http://dagitty.net> ) for creating and analyzing DAGs. ‘ggdag’ makes it easy to tidy and plot ‘dagitty’ objects using ‘ggplot2’ and ‘ggraph’, as well as common analytic and graphical functions, such as determining adjustment sets and node relationships.
ggdemetra ‘ggplot2’ Extension for Seasonal and Trading Day Adjustment with ‘RJDemetra’
Provides ‘ggplot2’ functions to return the results of seasonal and trading day adjustment made by ‘RJDemetra’. ‘RJDemetra’ is an ‘R’ interface around ‘JDemetra+’ (<https://…/jdemetra-app> ), the seasonal adjustment software officially recommended to the members of the European Statistical System and the European System of Central Banks.
ggdistribute A ‘ggplot2’ Extension for Plotting Unimodal Distributions
The ‘ggdistribute’ package is an extension for plotting posterior or other types of unimodal distributions that require overlaying information about a distribution’s intervals. It makes use of the ‘ggproto’ system to extend ‘ggplot2’, providing additional ‘geoms’, ‘stats’, and ‘positions.’ The extensions integrate with existing ‘ggplot2’ layer elements.
ggdmc Dynamic Model of Choice with Parallel Computation, and C++ Capabilities
A fast engine for computing hierarchical Bayesian models implemented in the Dynamic Model of Choice.
GGEBiplots GGE Biplots with ‘ggplot2’
Genotype plus genotype-by-environment (GGE) biplots rendered using ‘ggplot2’. Provides a command line interface to all of the functionality contained within ‘GGEBiplotGUI’.
ggedit Interactive ‘ggplot2’ Layer and Theme Aesthetic Editor
Interactively edit ‘ggplot2’ layer and theme aesthetics definitions.
ggenealogy Visualization Tools for Genealogical Data
Methods for searching through genealogical data and displaying the results. Plotting algorithms assist with data exploration and publication-quality image generation. Uses the Grammar of Graphics.
ggetho Visualisation of High-Throughput Behavioural (i.e. Ethomics) Data
Extension of ‘ggplot2’ providing layers, scales and preprocessing functions useful to represent behavioural variables that are recorded over multiple animals and days.
ggExtra Collection of Functions and Layers to Enhance ggplot2
Collection of functions and layers to enhance ggplot2.
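For example, ggMarginal() adds marginal distribution plots to an existing scatterplot (a minimal sketch; data illustrative):
    library(ggplot2)
    library(ggExtra)
    p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
    ggMarginal(p, type = "histogram")  # also "density" or "boxplot"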
ggfan Summarise a Distribution Through Coloured Intervals
Implements the functionality of the ‘fanplot’ package as ‘geoms’ for ‘ggplot2’. Designed for summarising MCMC samples from a posterior distribution, where a visualisation is desired for several values of a continuous covariate. Increasing posterior intervals of the sampled quantity are mapped to a continuous colour scale.
ggfittext Fit Text Inside a Box in ‘ggplot2’
Provides ‘ggplot2’ geoms to fit text into a box by growing, shrinking or wrapping the text.
ggfocus Focus on Specific Factor Levels in your ggplot()
A ‘ggplot2’ extension that provides tools for automatically focusing specific factor levels.
ggforce Accelerating ‘ggplot2’
The aim of ‘ggplot2’ is to aid in visual data investigations. This focus has led to a lack of facilities for composing specialised plots. ‘ggforce’ aims to be a collection of mainly new stats and geoms that fills this gap. All additional functionality is intended to come through the official extension system, so using ‘ggforce’ should be a stable experience.
ggformula Formula Interface to the Grammar of Graphics
Provides a formula interface to ‘ggplot2’ graphics.
ggfortify Data Visualization Tools for Statistical Analysis Results
Unified plotting tools for commonly used statistical methods, such as GLMs, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots in a unified style using ‘ggplot2’.
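A minimal sketch of the unified autoplot() interface (models and data illustrative):
    library(ggfortify)
    # One generic, many methods: PCA results and lm diagnostics, both as ggplots
    autoplot(prcomp(iris[, 1:4]), data = iris, colour = 'Species')
    autoplot(lm(mpg ~ wt, data = mtcars))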
ggghost Capture the Spirit of Your ‘ggplot2’ Calls
Creates a reproducible ‘ggplot2’ object by storing the data and calls.
gghighlight Highlight Lines and Points in ‘ggplot2’
Make it easier to explore data with highlights.
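A minimal sketch, assuming the gghighlight() predicate interface and illustrative simulated data:
    library(ggplot2)
    library(gghighlight)
    set.seed(1)
    d <- data.frame(x = rep(1:20, 5), id = rep(letters[1:5], each = 20))
    d$y <- ave(rnorm(100), d$id, FUN = cumsum)  # five random walks
    # Draw all series, then highlight only those whose maximum exceeds 2
    ggplot(d, aes(x, y, colour = id)) +
      geom_line() +
      gghighlight(max(y) > 2)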
ggimage Use Image in ‘ggplot2’
Supports aesthetic mapping of image files to be visualized in the ‘ggplot2’ graphics system, e.g., plotting image files as points in a scatterplot.
ggiraph Make ‘ggplot2’ Graphics Interactive Using ‘htmlwidgets’
Create interactive ‘ggplot2’ graphics that are usable in the ‘RStudio’ viewer pane, in ‘R Markdown’ documents and in ‘Shiny’ apps.
ggiraphExtra Make Interactive ‘ggplot2’. Extension to ‘ggplot2’ and ‘ggiraph’
Collection of functions to enhance ‘ggplot2’ and ‘ggiraph’. Provides functions for exploratory plots. All plots can be rendered as ‘static’ plots or as ‘interactive’ plots using ‘ggiraph’.
ggjoy Joyplots in ‘ggplot2’
Joyplots provide a convenient way of visualizing changes in distributions over time or space. This package enables the creation of such plots in ‘ggplot2’.
gglasso Group Lasso Penalized Learning Using a Unified BMD Algorithm
A unified algorithm, blockwise-majorization-descent (BMD), for efficiently computing the solution paths of the group-lasso penalized least squares, logistic regression, Huberized SVM and squared SVM. The package is an implementation of Yang, Y. and Zou, H. (2015) DOI: <doi:10.1007/s11222-014-9498-5>.
gglogo Geom for Logo Sequence Plots
Visualize sequences in (modified) logo plots. The design choices used by these logo plots allow sequencing data to be more easily analyzed. Because it is integrated into the ‘ggplot2’ geom framework, these logo plots support native features such as faceting.
ggloop Create ‘ggplot2’ Plots in a Loop
Pass a data frame and mapping aesthetics to ggloop() in order to create a list of ‘ggplot2’ plots. The way x-y and dots are paired together is controlled by the remapping arguments. Geoms, themes, facets, and other features can be added with the special %L+% (L-plus) operator.
gglorenz Plotting Lorenz Curve with the Blessing of ‘ggplot2’
Provides statistical transformations for plotting empirical ordinary Lorenz curve (Lorenz 1905) <doi:10.2307/2276207> and generalized Lorenz curve (Shorrocks 1983) <doi:10.2307/2554117>.
ggm Functions for graphical Markov models
Functions and datasets for maximum likelihood fitting of some classes of graphical Markov models.
ggmap Spatial Visualization with Google Maps and OpenStreetMap
Easily visualize spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps with ggplot2.
GGMM Mixture Gaussian Graphical Models
The Gaussian graphical model is a widely used tool for learning gene regulatory networks with high-dimensional gene expression data. For many real problems, the data are heterogeneous: they may contain subgroups or come from different sources. This package provides a Gaussian Graphical Mixture Model (GGMM) for such heterogeneous data. See Jia, B. and Liang, F. (2018) <arXiv:1805.02547> for details.
ggmosaic Mosaic Plots in the ‘ggplot2’ Framework
Mosaic plots in the ‘ggplot2’ framework. Mosaic plot functionality is provided in a single ‘ggplot2’ layer by calling the geom ‘mosaic’.
GGMridge Gaussian Graphical Models Using Ridge Penalty Followed by Thresholding and Reestimation
Estimation of partial correlation matrix using ridge penalty followed by thresholding and reestimation. Under the multivariate Gaussian assumption, the matrix constitutes a Gaussian graphical model (GGM).
ggmuller Create Muller Plots of Evolutionary Dynamics
Create plots that combine a phylogeny and frequency dynamics. Phylogenetic input can be a generic adjacency matrix or a tree of class ‘phylo’. Inspired by similar plots in publications of the labs of RE Lenski and JE Barrick. Named for HJ Muller (who popularised such plots) and H Wickham (whose code this package exploits).
ggnetwork Geometries to Plot Networks with ‘ggplot2’
Geometries to plot network objects with ‘ggplot2’.
ggnewscale Multiple Fill and Color Scales in ‘ggplot2’
Use multiple fill and color scales in ‘ggplot2’.
ggnormalviolin A ‘ggplot2’ Extension to Make Normal Violin Plots
Uses ‘ggplot2’ to create normally distributed violin plots with specified means and standard deviations. This function can be useful in showing hypothetically normal distributions and confidence intervals.
ggpage Creates Page Layout Visualizations
Facilitates the creation of page layout visualizations in which words are represented as rectangles with sizes relating to the length of the words. The text is then divided into lines and pages, giving an easy overview of even quite large texts.
ggparliament Parliament Plots
Simple parliament plots using ‘ggplot2’. Visualize election results as points in the architectural layout of the legislative chamber.
ggperiodic Easy Plotting of Periodic Data with ‘ggplot2’
Implements methods to plot periodic data in any arbitrary range on the fly.
ggplot2 An Implementation of the Grammar of Graphics
An implementation of the grammar of graphics in R. It combines the advantages of both base and lattice graphics: conditioning and shared axes are handled automatically, and you can still build up a plot step by step from multiple data sources. It also implements a sophisticated multidimensional conditioning system and a consistent interface to map data to aesthetic attributes. See http://ggplot2.org for more information, documentation and examples.
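The core pattern, as a minimal sketch: map data to aesthetics, add geom layers, and facet by a conditioning variable.
    library(ggplot2)
    ggplot(mpg, aes(displ, hwy, colour = class)) +
      geom_point() +        # one layer; more can be added step by step
      facet_wrap(~ drv)     # condition on drive train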
ggplotAssist ‘RStudio’ Addin for Teaching and Learning ‘ggplot2’
An ‘RStudio’ addin for teaching and learning plot construction with the ‘ggplot2’ package. You can learn each step of building a plot by pointing and clicking, without writing code, and retrieve the resulting code for the plot.
ggplotify Convert Plot to ‘grob’ or ‘ggplot’ Object
Converts a plot function call (supplied as an expression or formula) to a ‘grob’ or ‘ggplot’ object compatible with the ‘grid’ and ‘ggplot2’ ecosystems. With this package one can, for example, use ‘cowplot’ to align plots produced by ‘base’ graphics, ‘grid’, ‘lattice’, ‘vcd’, etc., by converting them to ‘ggplot’ objects.
ggpmisc Miscellaneous Extensions to ‘ggplot2’
Implements extensions to ‘ggplot2’ respecting the grammar of graphics paradigm. Provides new stats to locate and tag peaks and valleys in 2D plots, a stat to add a label by group with the equation of a polynomial fitted with lm(), or R^2 or adjusted R^2 values for any model fitted with function lm(). Provides a function for flexibly converting time series to data frames suitable for plotting with ggplot(). In addition provides two stats useful for diagnosing what data are passed to compute_group() and compute_panel() functions.
ggPMX ‘ggplot2’ Based Tool to Facilitate Diagnostic Plots for NLME Models
At Novartis, we aimed at standardizing the set of diagnostic plots used for modeling activities in order to reduce the overall effort required for generating such plots. For this, we developed a guidance that proposes an adequate set of diagnostics and a toolbox, called ‘ggPMX’ to execute them. ‘ggPMX’ is a toolbox that can generate all diagnostic plots at a quality sufficient for publication and submissions using few lines of code.
ggpol Visualizing Social Science Data with ‘ggplot2’
A ‘ggplot2’ extension for implementing parliament charts and several other useful visualizations.
ggpolypath Polygons with Holes for the Grammar of Graphics
Tools for working with polygons with holes in ‘ggplot2’, with a new ‘geom’ for drawing a ‘polypath’ applying the ‘evenodd’ or ‘winding’ rules.
ggpubr ‘ggplot2’ Based Publication Ready Plots
‘ggplot2’ is an excellent and flexible package for elegant data visualization in R. However, the default plots require some formatting before we can send them for publication. Furthermore, customizing a ‘ggplot’ involves opaque syntax, which raises the level of difficulty for researchers with no advanced R programming skills. ‘ggpubr’ provides some easy-to-use functions for creating and customizing ‘ggplot2’-based publication ready plots.
ggpval Annotate Statistical Tests for ‘ggplot2’
Automatically performs desired statistical tests (e.g. wilcox.test(), t.test()) to compare groups, and adds the test p-values to the plot with an annotation bar. Group differences are frequently visualized with boxplots, violin plots, etc., and statistical test results often need to be annotated on these plots. This package provides a convenient function that works on ‘ggplot2’ objects, performs the desired statistical test between the groups of interest, and annotates the test results on the plot.
ggQC Quality Control Charts for the Grammar of Graphics Plotting System
Plot single and faceted type quality control charts within the grammar of graphics plotting framework.
ggQQunif Compare Big Datasets to the Uniform Distribution
A quantile-quantile plot can be used to compare a sample of p-values to the uniform distribution. But when the dataset is big (i.e. > 1e4 p-values), plotting the quantile-quantile plot can be slow. geom_QQ uses all the data to calculate the quantiles, but thins it out in a way that focuses on points near zero before plotting to speed up plotting and decrease file size, when vector graphics are stored.
ggquickeda Quickly Explore Your Data Using ‘ggplot2’ and Summary Tables
Quickly and easily perform exploratory data analysis by uploading your data as a ‘csv’ file. Start generating insights using ‘ggplot2’ plots and tables with descriptive stats, all using an easy-to-use point and click ‘Shiny’ interface.
ggquiver Quiver Plots for ‘ggplot2’
An extension of ‘ggplot2’ to provide quiver plots to visualise vector fields. This functionality is implemented using a geom to produce a new graphical layer, which allows aesthetic options. This layer can be overlaid on a map to improve visualisation of mapped data.
ggraptR Allows Interactive Visualization of Data Through a Web Browser GUI
Intended for both technical and non-technical users to create interactive data visualizations through a web browser GUI without writing any code.
ggrepel Repulsive Text and Label Geoms for ‘ggplot2’
Provides text and label geoms for ‘ggplot2’ that help to avoid overlapping text labels. Labels repel away from each other and away from the data points.
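A minimal sketch (data illustrative):
    library(ggplot2)
    library(ggrepel)
    mtcars$name <- rownames(mtcars)
    # geom_text_repel() nudges labels away from each other and from the points
    ggplot(mtcars, aes(wt, mpg, label = name)) +
      geom_point() +
      geom_text_repel()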
ggridges Ridgeline Plots in ‘ggplot2’
Ridgeline plots provide a convenient way of visualizing changes in distributions over time or space. This package enables the creation of such plots in ‘ggplot2’.
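A minimal sketch (data illustrative):
    library(ggplot2)
    library(ggridges)
    # One density ridge per category, sharing a common x axis
    ggplot(diamonds, aes(x = price, y = cut)) +
      geom_density_ridges()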
ggsci Scientific Journal and Sci-Fi Themed Color Palettes for ‘ggplot2’
A collection of ‘ggplot2’ color palettes inspired by scientific journals and science fiction TV shows.
ggseas Seasonal Adjustment on the Fly Extension for ggplot2
Convenience functions that let you easily do seasonal adjustment on the fly within the ‘ggplot2’ framework. Depends on the ‘seasonal’ package for access to X13-SEATS-ARIMA.
ggseqlogo A ‘ggplot2’ Extension for Drawing Publication-Ready Sequence Logos
The extensive range of functions provided by this package makes it possible to draw highly versatile sequence logos. Features include, but are not limited to, modifying the colour schemes and fonts used to draw the logo, generating multiple logo plots, and aiding visualisation with annotations. Sequence logos can easily be combined with other ‘ggplot2’ plots.
ggsignif Significance Bars for ‘ggplot2’
Enrich your ggplots with group-wise comparisons. This package provides an easy way to indicate if two groups are significantly different. Commonly this is shown by a bar on top connecting the groups of interest, which itself is annotated with the level of significance (NS, *, **, ***). The package provides a single layer (geom_signif) that takes the groups for comparison and the test (t.test(), wilcox.test(), etc.) as arguments and adds the annotation to the plot.
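A minimal sketch of the single-layer interface (data and comparison illustrative):
    library(ggplot2)
    library(ggsignif)
    ggplot(iris, aes(Species, Sepal.Length)) +
      geom_boxplot() +
      # Annotate a Wilcoxon test between two groups, mapped to NS/*/**/***
      geom_signif(comparisons = list(c("setosa", "versicolor")),
                  test = "wilcox.test", map_signif_level = TRUE)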
ggsolvencyii A ‘ggplot2’-Plot of Composition of Solvency II SCR: SF and IM
An implementation of ‘ggplot2’-methods to present the composition of Solvency II Solvency Capital Requirement (SCR) as a series of concentric circle-parts. Solvency II (Solvency 2) is European insurance legislation, coming in force by the delegated acts of October 10, 2014. <https://…/?uri=OJ%3AL%3A2015%3A012%3ATOC>. Additional files, defining the structure of the Standard Formula (SF) method of the SCR-calculation are provided. The structure files can be adopted for localization or for insurance companies who use Internal Models (IM). Options are available for combining smaller components, horizontal and vertical scaling, rotation, and plotting only some circle-parts. With outlines and connectors several SCR-compositions can be compared, for example in ORSA-scenarios (Own Risk and Solvency Assessment).
ggsom New Data Visualisations for SOMs Cluster
Contains parallel coordinate and attribute mapping visualisations for cluster data.
ggspatial Spatial Data Framework for ggplot2
Spatial data plus the power of the ggplot2 framework means easier mapping when input data are already in the form of Spatial* objects.
ggspectra Extensions to ‘ggplot2’ for Radiation Spectra
Additional annotations, stats and scales for plotting ‘light’ spectra with ‘ggplot2’, together with specializations of ggplot() and plot() methods for spectral data stored in objects of the classes defined in package ‘photobiology’ and a plot() method for objects of class ‘waveband’, also defined in package ‘photobiology’.
ggstance Horizontal ‘ggplot2’ Components
A ‘ggplot2’ extension that provides flipped components: horizontal versions of ‘Stats’ and ‘Geoms’, and vertical versions of ‘Positions’.
ggstatsplot ‘ggplot2’ Based Plots with Statistical Details
Extension of ‘ggplot2’, ‘ggstatsplot’ creates graphics with details from statistical tests (parametric, non-parametric, or robust) included in the plots themselves. It is targeted primarily at the behavioral sciences community, providing one-line code to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms) or categorical (pie charts) data.
ggtern An Extension to ‘ggplot2’, for the Creation of Ternary Diagrams
Extends the functionality of ggplot2, providing the capability to plot ternary diagrams for (subset of) the ggplot2 geometries. Additionally, ggtern has implemented several NEW geometries which are unavailable to the standard ggplot2 release. For further examples and documentation, please proceed to the ggtern website.
ggThemeAssist Add-in to Customize ‘ggplot2’ Themes
Rstudio add-in that delivers a graphical interface for editing ‘ggplot2’ theme elements.
ggTimeSeries Time Series Visualisations Using the Grammar of Graphics
Provides additional display mediums for time series visualisations, such as calendar heat map, steamgraph, marimekko, etc.
ggtree A phylogenetic tree viewer for different types of tree annotations
ggtree extends the ggplot2 plotting system, which implements the grammar of graphics. ggtree is designed for visualizing phylogenetic trees and different types of associated annotation data.
ggvis Interactive Grammar of Graphics
An implementation of an interactive grammar of graphics, taking the best parts of ggplot2, combining them with shiny’s reactive framework and drawing web graphics using vega.
ggvoronoi Voronoi Diagrams and Heatmaps with ‘ggplot2’
Easy creation and manipulation of Voronoi diagrams using ‘deldir’ with visualization in ‘ggplot2’.
ggwordcloud A Wordcloud Geom for ‘ggplot2’
Provides a wordcloud text geom for ‘ggplot2’. Texts are placed so that they do not overlap, as in ‘ggrepel’. The algorithm used is a variation on the one of ‘wordcloud2.js’.
ghit Lightweight GitHub Package Installer
A lightweight, vectorized drop-in replacement for ‘devtools::install_github()’ that uses native git and R methods to clone and install a package from GitHub.
GHS Graphical Horseshoe MCMC Sampler Using Data Augmented Block Gibbs Sampler
Draw posterior samples to estimate the precision matrix for multivariate Gaussian data. The posterior mean of the samples is the graphical horseshoe estimate by Li, Bhadra and Craig (2017) <arXiv:1707.06661>. The function uses matrix decomposition and variable change from the Bayesian graphical lasso by Wang (2012) <doi:10.1214/12-BA729>, and the variable augmentation for sampling under the horseshoe prior by Makalic and Schmidt (2016) <arXiv:1508.03884>. The structure of the graphical horseshoe function was inspired by the Bayesian graphical lasso function using blocked sampling, authored by Wang (2012) <doi:10.1214/12-BA729>.
gibble Component Geometry Decomposition
Translation and restructuring operations for planar shapes and other hierarchical types require a data model with a record of the underlying relationships between elements. The gibble() function creates a geometry map, a simple record of the underlying structure in path-based hierarchical types. There are methods for the planar shape types in the ‘sf’ package.
Gifi Multivariate Analysis with Optimal Scaling
Implements categorical principal component analysis (‘PRINCALS’) and multiple correspondence analysis (‘HOMALS’). It replaces the ‘homals’ package.
gifski Highest Quality GIF Encoder
Multi-threaded GIF encoder written in Rust: <https://gif.ski/>. Converts images to GIF animations using pngquant’s efficient cross-frame palettes and temporal dithering with thousands of colors per frame.
gim Generalized Integration Model
Implements the generalized integration model, which integrates individual-level data and summary statistics under a generalized linear model framework. It supports continuous and binary outcomes to be modeled by the linear and logistic regression models.
gimme Group Iterative Multiple Model Estimation
Automated identification and estimation of group- and individual-level relations in time series data from within a structural equation modeling framework.
gimmeTools Supplemental Tools for the ‘gimme’ R Package
Supplemental tools for the ‘gimme’ R package. It contains an interactive graphical user interface, allowing for the flexible specification of a variety of both basic and advanced options. It will expand to include a variety of tools for navigating output.
GiniWegNeg Computing the Gini Coefficient for Weighted and Negative Attributes
Computation of the Gini coefficient in the presence of weighted and/or negative attributes. Two different approaches are considered in order to fulfill, in the case of negative attributes, the normalization principle, that is, a value of the Gini coefficient bounded in the closed range [0,1]. The first approach is based on the proposal by Chen, Tsaur and Rhai (1982) and Berebbi and Silber (1985), while the second approach is based on a recent proposal by Raffinetti, Siletti and Vernizzi (2015). The plot of the curve of maximum inequality, defined in the contribution of Raffinetti, Siletti and Vernizzi (2015), is provided.
GiRaF Gibbs Random Fields Analysis
Allows calculation on, and sampling from, Gibbs Random Fields, more precisely the general homogeneous Potts model. The primary tool is the exact computation of the intractable normalising constant for small rectangular lattices. Besides the latter function, it contains methods that give exact samples from the likelihood for small enough rectangular lattices, or approximate samples from the likelihood using MCMC samplers for larger lattices.
gistr Work with GitHub Gists
Work with GitHub gists from R (e.g., http://…/GitHub#Gist , https://…/about-gists ). A gist is simply one or more files with code/text/images/etc. gistr allows the user to create new gists, update gists with new files, rename files, delete files, get and delete gists, star and un-star gists, fork gists, open a gist in your default browser, get embed code for a gist, list gist commits, and get rate limit information when authenticated. Some requests require authentication and some do not. Gists website: https://gist.github.com .
git2r Provides Access to Git Repositories
Interface to the libgit2 library, which is a pure C implementation of the Git core methods. Provides access to Git repositories to extract data and run some basic git commands.
gitgadget Rstudio Addin for Version Control and Assignment Management using Git
An Rstudio addin for version control that allows users to clone repos, create and delete branches, and sync forks on GitHub, GitLab, etc. Furthermore, the addin uses the GitLab API to allow instructors to create forks and merge requests for all students/teams with one click of a button.
givitiR The GiViTI Calibration Test and Belt
Functions to assess the calibration of logistic regression models with the GiViTI (Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva, Italian Group for the Evaluation of the Interventions in Intensive Care Units – see <http://…/> ) approach. The approach consists of a graphical tool, namely the GiViTI calibration belt, and an associated statistical test. These tools can be used both to evaluate the internal calibration (i.e. the goodness of fit) and to assess the validity of an externally developed model.
gjam Generalized Joint Attribute Modeling
Analyzes joint attribute data (e.g., species abundance) that are combinations of continuous and discrete data with Gibbs sampling.
gk g-and-k and g-and-h Distribution Functions
Functions for the g-and-k and generalised g-and-h distributions.
GK2011 Gaines and Kuklinski (2011) Estimators for Hybrid Experiments
Implementations of the treatment effect estimators for hybrid (self-selection) experiments, as developed by Brian J. Gaines and James H. Kuklinski, (2011), ‘Experimental Estimation of Heterogeneous Treatment Effects Related to Self-Selection,’ American Journal of Political Science 55(3): 724-736.
gkmSVM Gapped-Kmer Support Vector Machine
Imports the ‘gkmSVM’ v2.0 functionalities into R <http://…/>. It also uses the ‘kernlab’ library (a separate R package by different authors) for various SVM algorithms.
glamlasso Lasso Penalization in Large Scale Generalized Linear Array Models
Efficient design matrix free procedure for Lasso regularized estimation in large scale 3-dimensional generalized linear array models. The Gaussian model with identity link, the Binomial model with logit link, the Poisson model with log link and the Gamma model with log link are currently implemented.
glancedata Generate tables and plots to get summaries of data
Generates data frames with summaries for all variables, as well as plots for numerical variables. Several functions from the ‘tidyverse’ and ‘GGally’ packages are used.
glassoFast Fast Graphical LASSO
A fast and improved implementation of the graphical LASSO.
glcm Calculate Textures from Grey-Level Co-Occurrence Matrices (GLCMs)
Enables calculation of image textures (Haralick 1973) <doi:10.1109/TSMC.1973.4309314> from grey-level co-occurrence matrices (GLCMs). Supports processing images that cannot fit in memory.
GLDEX Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods
The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using the maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method and L moment matching are also provided. Diagnostics on goodness of fit can be done via qqplots, KS-resample tests and comparing the mean, variance, skewness and kurtosis of the data with the fitted distribution.
GLDreg Fit GLD Regression Model and GLD Quantile Regression Model to Empirical Data
Owing to the rich shapes of GLDs, GLD standard/quantile regression is a competitive flexible model compared to standard/quantile regression. The proposed method has some major advantages: 1) it provides a reference line which is very robust to outliers with the attractive property of zero mean residuals and 2) it gives a unified, elegant quantile regression model from the reference line with smooth regression coefficients across different quantiles. The goodness of fit of the proposed model can be assessed via QQ plots and the Kolmogorov-Smirnov test, to ensure the appropriateness of the statistical inference under consideration. Statistical distributions of coefficients of the GLD regression line are obtained using simulation, and interval estimates are obtained directly from simulated data.
gldrm Generalized Linear Density Ratio Models
Fits a generalized linear density ratio model (GLDRM). A GLDRM is a semiparametric generalized linear model. In contrast to a GLM, which assumes a particular exponential family distribution, the GLDRM uses a semiparametric likelihood to estimate the reference distribution. The reference distribution may be any discrete, continuous, or mixed exponential family distribution. The model parameters, which include both the regression coefficients and the cdf of the unspecified reference distribution, are estimated by maximizing a semiparametric likelihood. Regression coefficients are estimated with no loss of efficiency, i.e. the asymptotic variance is the same as if the true exponential family distribution were known.
GLIDE Global and Individual Tests for Direct Effects
Functions evaluate global and individual tests for direct effects in Mendelian randomization studies.
gllvm Generalized Linear Latent Variable Models
Generalized linear latent variable model (gllvm) for analyzing multivariate data. Estimation is performed using either Laplace approximation or variational approximation method implemented via TMB (Kristensen et al., (2016), <doi:10.18637/jss.v070.i05>). Details for gllvm, see Hui et al. (2015) <doi:10.1111/2041-210X.12236> and (2017) <doi:10.1080/10618600.2016.1164708> and Niku et al. (2017) <doi:10.1007/s13253-017-0304-7>.
glm.ddR Distributed ‘glm’ for Big Data using ‘ddR’ API
Distributed training and prediction of generalized linear models using ‘ddR’ (Distributed Data Structures) API in the ‘ddR’ package.
glm.deploy ‘C’ and ‘Java’ Source Code Generator for Fitted Glm Objects
Provides two functions that generate source code implementing the predict function of fitted glm objects. In this version, code can be generated for either ‘C’ or ‘Java’. The idea is to provide a tool for the easy and fast deployment of glm predictive models into production. The source code generated by this package implements two functions/methods. One of these functions implements the equivalent of predict(type=’response’), while the second implements predict(type=’link’). Source code is written to disk as a .c or .java file in the specified path. In the case of ‘C’, an .h file is also generated.
glm.predict Predicted Values and Discrete Changes for GLM
Functions to calculate predicted values and the difference between the two cases with confidence interval for glm, glm.nb, polr and multinom.
glmaag Adaptive LASSO and Network Regularized Generalized Linear Models
Efficient procedures for adaptive LASSO and network-regularized generalized linear models for Gaussian, logistic, and Cox models. Provides a network estimation procedure (a combination of methods proposed by Ucar, et al. (2007) <doi:10.1093/bioinformatics/btm423> and Meinshausen and Buhlmann (2006) <doi:10.1214/009053606000000281>), cross validation, and the stability selection methods proposed by Meinshausen and Buhlmann (2010) <doi:10.1111/j.1467-9868.2010.00740.x> and Liu, Roeder and Wasserman (2010) <arXiv:1006.3316>. An interactive R app is available.
GLMaSPU An Adaptive Test on High Dimensional Parameters in Generalized Linear Models
Several tests for high dimensional generalized linear models have been proposed recently. In this package, we implemented a new test called adaptive sum of powered score (aSPU) for high dimensional generalized linear models, which is often more powerful than the existing methods in a wide range of scenarios. We also implemented permutation-based versions of several existing methods for research purposes. We recommend users apply the aSPU test to their real testing problems. You can learn more about the tests implemented in the package via the following papers: 1. Pan, W., Kim, J., Zhang, Y., Shen, X. and Wei, P. (2014) <DOI:10.1534/genetics.114.165035> A powerful and adaptive association test for rare variants, Genetics, 197(4). 2. Guo, B., and Chen, S. X. (2016) <DOI:10.1111/rssb.12152>. Tests for high dimensional generalized linear models. Journal of the Royal Statistical Society: Series B. 3. Goeman, J. J., Van Houwelingen, H. C., and Finos, L. (2011) <DOI:10.1093/biomet/asr016>. Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control. Biometrika, 98(2).
glmbb All Hierarchical Models for Generalized Linear Model
Finds all hierarchical models of a specified generalized linear model with an information criterion (AIC, BIC, or AICc) within a specified cutoff of the minimum value. Uses a branch-and-bound algorithm so that not all models need to be fitted.
glmBfp Bayesian Fractional Polynomials for GLMs
Implements the Bayesian paradigm for fractional polynomials in generalized linear models. See package ‘bfp’ for the treatment of normal models.
glmdisc Discretization and Grouping for Logistic Regression
A Stochastic-Expectation-Maximization (SEM) algorithm (Celeux et al. (1995) <https://…/inria-00074164> ) associated with a Gibbs sampler which purpose is to learn a constrained representation for logistic regression that is called quantization (Ehrhardt et al. (2019) <arXiv:1903.08920>). Continuous features are discretized and categorical features’ values are grouped to produce a better logistic regression model. Pairwise interactions between quantized features are dynamically added to the model through a Metropolis-Hastings algorithm (Hastings, W. K. (1970) <doi:10.1093/biomet/57.1.97>).
glmertree Generalized Linear Mixed Model Trees
Recursive partitioning based on (generalized) linear mixed models (GLMMs) combining lmer()/glmer() from lme4 and lmtree()/glmtree() from partykit.
glmgraph Graph-Constrained Regularization for Sparse Generalized Linear Models
Uses a sparse regression model to achieve variable selection while accounting for graph constraints among coefficients. Different linear combinations of a sparsity penalty (L1) and a smoothness penalty (MCP) are used, which induce both sparsity of the solution and a certain smoothness of the linear coefficients.
glmm Generalized Linear Mixed Models via Monte Carlo Likelihood Approximation
Approximates the likelihood of a generalized linear mixed model using Monte Carlo likelihood approximation. Then maximizes the likelihood approximation to return maximum likelihood estimates, observed Fisher information, and other model information.
glmmEP Generalized Linear Mixed Model Analysis via Expectation Propagation
Approximate frequentist inference for generalized linear mixed model analysis with expectation propagation used to circumvent the need for multivariate integration. In this version, the random effects can be any reasonable dimension. However, only probit mixed models with one level of nesting are supported. The methodology is described in Hall, Johnstone, Ormerod, Wand and Yu (2018) <arXiv:1805.08423v1>.
glmmfields Generalized Linear Mixed Models with Robust Random Fields for Spatiotemporal Modeling
Implements Bayesian spatial and spatiotemporal models that optionally allow for extreme spatial deviations through time. ‘glmmfields’ uses a predictive process approach with random fields implemented through a multivariate-t distribution instead of the usual multivariate normal. Sampling is conducted with ‘Stan’. References: Anderson and Ward (2018) <doi:10.1002/ecy.2403>.
GLMMRR GLM for Binary Randomized Response Data
GLM for Binary Randomized Response Data. Includes Cauchit, Log-log, Logistic, and Probit link functions for Bernoulli distributed RR data.
glmmsr Fit a Generalized Linear Mixed Model
Conduct inference about generalized linear mixed models, with a choice about which method to use to approximate the likelihood. In addition to the Laplace and adaptive Gaussian quadrature approximations, which are borrowed from ‘lme4’, the likelihood may be approximated by the sequential reduction approximation, or an importance sampling approximation. These methods provide an accurate approximation to the likelihood in some situations where it is not possible to use adaptive Gaussian quadrature.
glmmTMB Generalized Linear Mixed Models using Template Model Builder
Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via ‘TMB’ (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.
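A minimal sketch of a zero-inflated mixed model, using the Salamanders data shipped with the package:
    library(glmmTMB)
    # Poisson counts with a random intercept per site;
    # zero-inflation probability modelled as a function of mining status
    fit <- glmmTMB(count ~ mined + (1 | site),
                   ziformula = ~ mined,
                   family = poisson,
                   data = Salamanders)
    summary(fit)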
glmnet Lasso and Elastic-Net Regularized Generalized Linear Models
Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model. Two recent additions are the multiple-response Gaussian, and the grouped multinomial. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the paper linked to via the URL below.
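A minimal sketch of the usual fit/cross-validate/predict cycle (data illustrative):
    library(glmnet)
    x <- as.matrix(mtcars[, -1])
    y <- mtcars$mpg
    cvfit <- cv.glmnet(x, y, alpha = 1)    # lasso path, lambda by cross-validation
    coef(cvfit, s = "lambda.min")          # coefficients at the best lambda
    predict(cvfit, newx = x[1:3, ], s = "lambda.min")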
glmnetUtils Utilities for ‘Glmnet’
Provides a formula interface for the ‘glmnet’ package for elasticnet regression, a method for cross-validating the alpha parameter, and other quality-of-life tools.
GLMsData Generalized Linear Model Data Sets
Data sets from the book Generalized Linear Models with Examples in R by Dunn and Smyth.
glmvsd Variable Selection Deviation Measures and Instability Tests for High-Dimensional Generalized Linear Models
Variable selection deviation (VSD) measures and instability tests for high-dimensional model selection methods such as LASSO, SCAD and MCP, etc., to decide whether the sparse patterns identified by those methods are reliable.
globals Identify Global Objects in R Expressions
Identifies global (‘unknown’) objects in R expressions by code inspection using various strategies, e.g. conservative or liberal. The objective of this package is to make it as simple as possible to identify global objects for the purpose of exporting them in distributed compute environments.
globe Plot 2D and 3D Views of the Earth, Including Major Coastline
Basic functions for plotting 2D and 3D views of a sphere, by default the Earth with its major coastline, and additional lines and points.
glogis Fitting and Testing Generalized Logistic Distributions
Tools for the generalized logistic distribution (Type I, also known as skew-logistic distribution), encompassing basic distribution functions (p, q, d, r, score), maximum likelihood estimation, and structural change methods.
glrt Generalized Logrank Tests for Interval-censored Failure Time Data
Functions to conduct four generalized logrank tests and a score test under a proportional hazards model.
glue Interpreted String Literals
An implementation of interpreted string literals, inspired by Python’s Literal String Interpolation <https://…/> and Docstrings <https://…/> and Julia’s Triple-Quoted String Literals <https://…/#triple-quoted-string-literals>.
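A minimal sketch: expressions inside braces are evaluated and interpolated.
    library(glue)
    name <- "Fred"
    glue("My name is {name}; pi is roughly {round(pi, 3)}.")
    # glue_data() evaluates the template in the scope of a data frame
    glue_data(mtcars[1:3, ], "A car with {cyl} cylinders makes {hp} hp.")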
gma Granger Mediation Analysis
Performs Granger mediation analysis (GMA) for time series. This package includes a single level GMA model and a two-level GMA model, for time series with hierarchically nested structure. The single level GMA model for the time series of a single participant performs the causal mediation analysis which integrates the structural equation modeling and the Granger causality frameworks. A vector autoregressive model of order p is employed to account for the spatiotemporal dependencies in the data. Meanwhile, the model introduces the unmeasured confounding effect through a nonzero correlation parameter. Under the two-level model, by leveraging the variabilities across participants, the parameters are identifiable and consistently estimated based on a full conditional likelihood or a two-stage method. See Zhao, Y., & Luo, X. (2017), Granger Mediation Analysis of Multiple Time Series with an Application to fMRI, <arXiv:1709.05328> for details.
gmapsdistance Distance and Travel Time Between Two Points from Google Maps
Get distance and travel time between two points from Google Maps. Four possible modes of transportation (bicycling, walking, driving and public transportation).
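A minimal sketch (requires internet access; current versions of the Google API also require a key, set with set.api.key()):
    library(gmapsdistance)
    res <- gmapsdistance(origin = "Washington+DC",
                         destination = "New+York+City+NY",
                         mode = "driving")
    res$Time      # travel time in seconds
    res$Distance  # distance in meters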
gmat Simulation of Graphically Constrained Matrices
Implementation of the simulation method for Gaussian graphical models described in Córdoba et al. (2018) <arXiv:1807.03090>. The package also provides an alternative method based on diagonally dominant matrices.
GMDH Predicting and Forecasting Time Series via GMDH-Type Neural Network Algorithms
The group method of data handling (GMDH) type neural network algorithm is a heuristic self-organization method for modelling complex systems. In this package, GMDH-type neural network algorithms are applied to predict and forecast a univariate time series.
GMDH2 Binary Classification via GMDH-Type Neural Network Algorithm
Performs binary classification via Group Method of Data Handling (GMDH) – type neural network algorithm. Also, it produces a well-formatted table of descriptives for a binary response. Moreover, it returns confusion matrix and related statistics and scatter plot with classification labels of binary classes to assess the prediction performance. All ‘GMDH2’ functions are designed for a binary response. See Dag O. and Yozgatligil C. (2016, ISSN:2073-4859) and Kondo T. and Ueno J. (2016, ISSN:1349-4198) for the details of GMDH algorithms.
Gmedian Geometric Median, k-Median Clustering and Robust Median PCA
Fast algorithms based on averaged stochastic gradient for robust estimation with large samples (with data whose dimension is larger than 2). Estimation of the geometric median, robust k-Gmedian clustering, and robust PCA based on the Gmedian covariation matrix.
gmediation Mediation Analysis for Multiple and Multi-Stage Mediators
The current version of this package conducts mediation path analysis for multiple mediators in two stages.
gmeta Meta-Analysis via a Unified Framework of Confidence Distribution
An implementation of an all-in-one function for a wide range of meta-analysis problems. It contains a single function gmeta() that unifies all standard meta-analysis methods and also several newly developed ones under a framework of combining confidence distributions (CDs). Specifically, the package can perform classical p-value combination methods (such as methods of Fisher, Stouffer, Tippett, etc.), fit meta-analysis fixed-effect and random-effects models, and synthesizes 2×2 tables. Furthermore, it can perform robust meta-analysis, which provides protection against model-misspecifications, and limits the impact of any unknown outlying studies. In addition, the package implements two exact meta-analysis methods from synthesizing 2×2 tables with rare events (e.g., zero total event). A plot function to visualize individual and combined CDs through extended forest plots is also available.
gmfd Inference and Clustering of Functional Data
Some methods for the inference and clustering of univariate and multivariate functional data, using a generalization of Mahalanobis distance, along with some functions useful for the analysis of functional data. For further details, see Martino A., Ghiglietti, A., Ieva, F. and Paganoni A. M. (2017) <arXiv:1708.00386>.
gmnl Multinomial Logit Models with Random Parameters
An implementation of maximum simulated likelihood method for the estimation of multinomial logit models with random coefficients. Specifically, it allows estimating models with continuous heterogeneity such as the mixed multinomial logit and the generalized multinomial logit. It also allows estimating models with discrete heterogeneity such as the latent class and the mixed-mixed multinomial logit model.
gMOIP 2D Plots of Linear or Integer Programming Models
Makes 2D plots of the polyhedron of an LP or IP problem, including integer points and iso-profit curves. Can also make a plot of a bi-objective criterion space.
GMSE Generalised Management Strategy Evaluation Simulator
Integrates game theory and ecological theory to construct social-ecological models that simulate the management of populations and stakeholder actions. These models build off of a previously developed management strategy evaluation (MSE) framework to simulate all aspects of management: population dynamics, manager observation of populations, manager decision making, and stakeholder responses to management decisions. The newly developed game-theoretic management strategy evaluation (GMSE) framework uses genetic algorithms to mimic the decision-making process of managers and stakeholders under conditions of change, uncertainty, and conflict.
gmum.r GMUM Machine Learning Group Package
Direct R interface to Support Vector Machine libraries (‘LIBSVM’ and ‘SVMLight’) and efficient C++ implementations of Growing Neural Gas and models developed by ‘GMUM’ group (Cross Entropy Clustering and 2eSVM).
gmvarkit Estimate Gaussian Mixture Vector Autoregressive Model
Maximum likelihood estimation of Gaussian Mixture Vector Autoregressive (GMVAR) model, quantile residual tests, graphical diagnostics, forecasting and simulations. Applying general linear constraints to the autoregressive parameters is supported. Leena Kalliovirta, Mika Meitz, Pentti Saikkonen (2016) <doi:10.1016/j.jeconom.2016.02.012>.
gmwm Generalized Method of Wavelet Moments
Generalized Method of Wavelet Moments (GMWM) is an estimation technique for the parameters of time series models. It uses the wavelet variance in a moment matching approach that makes it particularly suitable for the estimation of certain state-space models. Furthermore, there exists a robust implementation of GMWM, which allows the robust estimation of some state-space models and ARIMA models. Lastly, the package provides the ability to quickly generate time series data, perform different wavelet decompositions, and visualizations.
gnFit Goodness of Fit Test for Continuous Distribution Functions
Computes the test statistic and p-value of the Cramer-von Mises and Anderson-Darling tests for some continuous distribution functions, as proposed by Chen and Balakrishnan (1995) <http://…/index.html?item=11407>. In addition to the classic distribution functions, the Goodness of Fit (GoF) test can be applied to data following the extreme value distribution, without requiring the formula of the distribution/density functions.
gnlm Generalized Nonlinear Regression Models
A variety of functions to fit linear and nonlinear regression with a large selection of distributions.
gnorm Generalized Normal/Exponential Power Distribution
Functions for obtaining generalized normal/exponential power distribution probabilities, quantiles, densities and random deviates. The generalized normal/exponential power distribution was introduced by Subbotin (1923) and rediscovered by Nadarajah (2005). The parametrization given by Nadarajah (2005) <doi:10.1080/02664760500079464> is used.
gofastr Fast DocumentTermMatrix and TermDocumentMatrix Creation
Harness the power of ‘quanteda’, ‘data.table’ & ‘stringi’ to quickly generate ‘tm’ DocumentTermMatrix and TermDocumentMatrix data structures.
gofCopula Goodness-of-Fit Tests for Copulae
Several GoF tests for Copulae are provided. A new hybrid test is implemented which supports all of the individual tests. Estimation methods for the margins are provided. All the tests support parameter estimation and predefined values. The parameters are estimated by pseudo maximum likelihood but if it fails the estimation switches automatically to inversion of Kendall’s tau.
GofKmt Khmaladze Martingale Transformation Goodness-of-Fit Test
Consider a goodness-of-fit (GOF) problem of testing whether a random sample comes from a one-sample location-scale model where the location and scale parameters are unknown. It is well known that the Khmaladze martingale transformation method provides an asymptotically distribution free test for this GOF problem. This package contains one function, KhmaladzeTrans(), which in this version provides the test statistic and critical value of the GOF test for normal, Cauchy, and logistic distributions.
goftte Goodness-of-Fit for Time-to-Event Data
Extension of the ‘gof’ package to survival models.
gomms GLM-Based Ordination Method
A zero-inflated quasi-Poisson factor model to display similarity between samples visually in a low (2 or 3) dimensional space.
GoodmanKruskal Association Analysis for Categorical Variables
Association analysis between categorical variables using the Goodman and Kruskal tau measure. This asymmetric association measure allows the detection of asymmetric relations between categorical variables (e.g., one variable obtained by re-grouping another).
googleAnalyticsR Google Analytics API into R
R library for interacting with the Google Analytics Reporting API v3 and v4.
googleAuthR Easy Authentication with Google OAuth2 APIs
Create R functions that interact with OAuth2 Google APIs easily, with auto-refresh and Shiny compatibility.
googleCloudStorageR R Interface with Google Cloud Storage
Interact with Google Cloud Storage API in R. Part of the ‘cloudyr’ project.
googleComputeEngineR R Interface with Google Compute Engine
Interact with the Google Compute Engine API in R. Lets you create, start and stop instances in the Google Cloud. Support for preconfigured instances, with templates for common R needs.
googledrive An Interface to Google Drive
Manage Google Drive files from R.
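A minimal sketch (the first call triggers interactive OAuth; the uploaded file name is hypothetical):
    library(googledrive)
    drive_find(n_max = 10)                       # list up to 10 files on Drive
    drive_upload("report.csv", name = "report")  # upload a local file (hypothetical path)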
googleformr Collect Data Programmatically by POST Methods to Google Forms
GET and POST data to Google Forms; more secure than having to expose Google Sheets in order to POST data.
GoogleKnowledgeGraphR Retrieve Information from ‘Google Knowledge Graph’ API
Allows you to retrieve information from the ‘Google Knowledge Graph’ API <https://…/knowledge.html> and process it in R in various forms. The ‘Knowledge Graph Search’ API lets you find entities in the ‘Google Knowledge Graph’. The API uses standard ‘schema.org’ types and is compliant with the ‘JSON-LD’ specification.
googleLanguageR Call Google’s ‘Natural Language’ API, ‘Cloud Translation’ API and ‘Cloud Speech’ API
Call ‘Google Cloud’ machine learning APIs for text and speech tasks. Call the ‘Cloud Translation’ API <https://…/> for detection and translation of text, the ‘Natural Language’ API <https://…/> to analyse text for sentiment, entities or syntax or the ‘Cloud Speech’ API <https://…/> to transcribe sound files to text.
googlenlp An Interface to Google’s Cloud Natural Language API
Interact with Google’s Cloud Natural Language API <https://…/> (v1) via R. The API has four main features, all of which are available through this R package: syntax analysis and part-of-speech tagging, entity analysis, sentiment analysis, and language identification.
googlePolylines Encoding Coordinates into ‘Google’ Polylines
Encodes simple feature (‘sf’) objects and coordinates using the ‘Google’ polyline encoding algorithm (<https://…/polylinealgorithm> ).
googlePrintr Connect to ‘Google Cloud Print’ API
Allows printing documents from R through ‘Google Cloud Print’ API. See <https://…/overview> for more information about ‘Google Cloud Print’.
googlePublicData Working with Google Public Data Explorer DSPL Metadata Files
Provides a collection of functions designed for working with ‘Google Public Data Explorer’. Automatically builds up the corresponding DSPL (XML) metadata files and CSV files; compressing all the files and leaving them ready to be published on the ‘Public Data Explorer’.
googlesheets Google Spreadsheets R API
Access and manage Google spreadsheets from R with googlesheets. Features:
• Access a spreadsheet by its title, key or URL.
• Extract data or edit data.
• Create | delete | rename | copy | upload | download spreadsheets and worksheets.
googleVis R Interface to Google Charts
R interface to Google Charts API, allowing users to create interactive charts based on data frames. Charts are displayed locally via the R HTTP help server. A modern browser with Internet connection is required and for some charts a Flash player. The data remains local and is not uploaded to Google.
googleway Retrieve Routes from Google Directions API and Decode Encoded Polylines
Retrieves routes and decodes polylines generated from Google’s directions API (https://…/directions ).
GORCure Fit Generalized Odds Rate Mixture Cure Model with Interval Censored Data
The Generalized Odds Rate Mixture Cure (GORMC) model is a flexible model for fitting survival data with a cure fraction, including the Proportional Hazards Mixture Cure (PHMC) model and the Proportional Odds Mixture Cure model as special cases. This package fits the GORMC model with interval censored data.
Goslate Goslate Interface
An interface to the Python package Goslate (Version 1.5.0). Goslate provides an API to Google’s free online language translation service by querying the Google translation website. See <https://…/> for more information about the Python package.
gower Gower’s Distance
Compute Gower’s distance (or similarity) coefficient between records. Compute the top-n matches between records. Core algorithms are executed in parallel on systems supporting openMP.
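A minimal sketch of the two core functions (data illustrative):
    library(gower)
    # Pairwise Gower distance between corresponding records of two data frames
    gower_dist(iris[1:5, ], iris[6:10, ])
    # For each record in the first set, the indices of the 3 closest matches
    gower_topn(iris[1:5, ], iris, n = 3)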
GPareto Gaussian Processes for Pareto Front Estimation and Optimization
Gaussian process regression models, a.k.a. kriging models, are applied to global multiobjective optimization of black-box functions. Multiobjective Expected Improvement and Stepwise Uncertainty Reduction sequential infill criteria are available. A quantification of uncertainty on Pareto fronts is provided using conditional simulations.
GPB Generalized Poisson Binomial Distribution
Functions for the Generalized Poisson Binomial distribution, providing the cdf, pmf, quantile function, and random number generation.
GPCMlasso Differential Item Functioning in Generalized Partial Credit Models
Provides a function to detect Differential Item Functioning (DIF) in Generalized Partial Credit Models (GPCM) and special cases of the GPCM. A joint model is set up where DIF is explicitly parametrized and penalized likelihood estimation is used for parameter selection. The big advantage of the method called GPCMlasso is that several variables can be treated simultaneously and that both continuous and categorical variables can be used to detect DIF.
GPfit Gaussian Processes Modeling
A computationally stable approach to fitting a Gaussian Process (GP) model to a deterministic simulator. GP models are commonly used statistical metamodels for emulating expensive computer simulators. Fitting a GP model can be numerically unstable if any pair of design points in the input space are close together. Ranjan, Haynes, and Karsten (2011) proposed a computationally stable approach for fitting GP models to deterministic computer simulators; they used a genetic algorithm based approach that is robust but computationally intensive for maximizing the likelihood. This package implements a slightly modified version of the model proposed by Ranjan et al. (2011). A novel parameterization of the spatial correlation function and a clustering based multi-start gradient based optimization algorithm yield robust optimization that is typically faster than the genetic algorithm based approach. Two examples with R code illustrate the usage of the main functions in GPfit. Several test functions are used for performance comparison with the popular R package mlegp. GPfit has also been used for a real application, i.e., for emulating the tidal kinetic energy model for the Bay of Fundy, Nova Scotia, Canada. GPfit is free software, distributed under the General Public License and available from the Comprehensive R Archive Network.
gpg GNU Privacy Guard for R
Bindings to GnuPG for working with OpenPGP (RFC4880) cryptographic methods. Includes utilities for public key encryption, creating and verifying digital signatures, and managing your local keyring. Note that some functionality depends on the version of GnuPG that is installed on the system. In particular GnuPG 2.1 mandates the use of ‘gpg-agent’ for entering passphrases, which only works if R runs in a terminal session.
GPGame Solving Complex Game Problems using Gaussian Processes
Sequential strategies for finding game equilibria are proposed in a black-box setting (expensive pay-off evaluations, no derivatives). The algorithm handles noiseless or noisy evaluations. Two acquisition functions are available. Graphical outputs can be generated automatically.
GpGp Fast Gaussian Process Computation Using Vecchia’s Approximation
Functions for reordering input locations, finding ordered nearest neighbors (with help from ‘FNN’ package), grouping operations, approximate likelihood evaluations, profile likelihoods, Gaussian process predictions, and conditional simulations. Covariance functions for spatial and spatial-temporal data on Euclidean domains and spheres are provided. The original approximation is due to Vecchia (1988) <http://…/2345768>, and the reordering and grouping methods are from Guinness (2018) <doi:10.1080/00401706.2018.1437476>.
gpHist Gaussian Process with Histogram Intersection Kernel
Provides an implementation of a Gaussian process regression with a histogram intersection kernel (HIK) and utilizes approximations to speed up learning and prediction. In contrast to a squared exponential kernel, an HIK provides advantages such as linear memory and learning time requirements. However, the HIK only provides a piecewise-linear approximation of the function. Furthermore, the number of estimated eigenvalues is reduced. The eigenvalues and vectors are required for the approximation of the log-likelihood function as well as the approximation of the predicted variance of new samples. This package provides approximations for a single eigenvalue as well as for multiple eigenvalues. Further information on the variance and log-likelihood approximations, as well as on the Gaussian process with HIK, can be found in the paper by Rodner et al. (2016) <doi:10.1007/s11263-016-0929-y>.
gphmm Generalized Pair Hidden Markov Chain Model for Sequence Alignment
Implementation of a generalized pair hidden Markov chain model (GPHMM) that can be used to compute the probability of alignment between two sequences of nucleotides (e.g., a reference sequence and a noisy sequenced read). The model can be trained on a dataset where the noisy sequenced reads are known to have been sequenced from known reference sequences. If no training sets are available, default parameters can be used.
GPM Gaussian Process Modeling of Multi-Response Datasets
Provides a general and efficient tool for fitting a response surface to datasets via Gaussian processes. The dataset can have multiple responses. The package is based on the work of Bostanabad, R., Kearney, T., Tao, S., Apley, D. W. & Chen, W. Leveraging the nugget parameter for efficient Gaussian process modeling (2017) <doi:10.1002/nme.5751>.
GPPFourier Calculate Gross Primary Production (GPP) from O2 Time Series
Implementation of the Fourier method to estimate aquatic gross primary production from high frequency oxygen data, described in Cox et al. (2015) <doi:10.1002/lom3.10046> and Cox et al. (2017) <doi:10.5194/bg-2017-81>.
gppm Gaussian Process Panel Modeling
Provides an implementation of Gaussian process panel modeling (GPPM). GPPM is described in Karch (2016; <DOI:10.18452/17641>) and Karch, Brandmaier & Voelkle (2018; <DOI:10.17605/OSF.IO/KVW5Y>). Essentially, GPPM is Gaussian process based modeling of longitudinal panel data. ‘gppm’ also supports regular Gaussian process regression (with a focus on flexible model specification), and multi-task learning.
GPrank Gaussian Process Ranking of Multiple Time Series
Implements a Gaussian process (GP)-based ranking method which can be used to rank multiple time series according to their temporal activity levels. An example is the case when expression levels of all genes are measured over a time course and the main concern is to identify the most active genes, i.e. genes which show significant non-random variation in their expression levels. This is achieved by computing Bayes factors for each time series by comparing the marginal likelihoods under time-dependent and time-independent GP models. Additional variance information from pre-processing of the observations is incorporated into the GP models, which makes the ranking more robust against model overfitting. The package supports exporting the results to ‘tigreBrowser’ for visualisation, filtering or ranking.
GPRMortality Gaussian Process Regression for Mortality Rates
A Bayesian statistical model for estimating child (under-five age group) and adult (15-60 age group) mortality. The main challenge is how to combine and integrate these different time series and how to produce unified estimates of mortality rates during a specified time span. GPR is a Bayesian statistical model for estimating child and adult mortality rates whose data likelihood comprises mortality rates from different data sources such as death registration systems, censuses, or surveys. There are also various hyper-parameters, used as priors, for the completeness of the death registration system and for the mean, covariance functions and variances. The package produces estimates and uncertainty intervals (95% or any desired percentiles) that account for sampling and non-sampling errors due to variation in data sources. The GP model utilizes Bayesian inference to update predicted mortality rates as a posterior in Bayes rule by combining data and a prior probability distribution over parameters in the mean, covariance function, and the regression model. This package uses Markov Chain Monte Carlo (MCMC) to sample from the posterior probability distribution via the ‘rstan’ package in R. Details are given in Wang H, Dwyer-Lindgren L, Lofgren KT, et al. (2012) <doi:10.1016/S0140-6736(12)61719-X>, Wang H, Liddell CA, Coates MM, et al. (2014) <doi:10.1016/S0140-6736(14)60497-9> and Mohammadi, Parsaeian, Mehdipour et al. (2017) <doi:10.1016/S2214-109X(17)30105-5>.
GPSCDF Generalized Propensity Score Cumulative Distribution Function
Implements the generalized propensity score cumulative distribution function proposed by Greene (2017) <https://…/>. A single scalar balancing score is calculated for any generalized propensity score vector with three or more treatments. This balancing score is used for propensity score matching and stratification in outcome analyses when analyzing either ordinal or multinomial treatments.
gpuR GPU Functions for R Objects
Provides GPU enabled functions for R objects in a simple and approachable manner. New gpu* and vcl* classes have been provided to wrap typical R objects (e.g. vector, matrix), in both host and device spaces, to mirror typical R syntax without the need to know OpenCL.
Grace Graph-Constrained Estimation and Hypothesis Testing
Use the graph-constrained estimation (Grace) procedure to estimate graph-guided linear regression coefficients, and use the Grace and GraceR tests to perform graph-guided hypothesis tests on the association between the response and the predictors.
gradDescent Gradient Descent for Regression Tasks
An implementation of various learning algorithms based on gradient descent for dealing with regression tasks. The variants of the gradient descent algorithm are: Mini-Batch Gradient Descent (MBGD), which uses the training data partially to reduce the computation load; Stochastic Gradient Descent (SGD), which uses randomly selected data points in learning to reduce the computation load drastically; Stochastic Average Gradient (SAG), an SGD-based algorithm that averages previous stochastic gradients; Momentum Gradient Descent (MGD), an optimization to speed up gradient descent learning; Accelerated Gradient Descent (AGD), an optimization to accelerate gradient descent learning; Adagrad, a gradient-descent-based algorithm that accumulates previous costs to perform adaptive learning; Adadelta, a gradient-descent-based algorithm that uses a Hessian approximation to perform adaptive learning; RMSprop, a gradient-descent-based algorithm that combines the adaptive-learning abilities of Adagrad and Adadelta; and Adam, a gradient-descent-based algorithm that uses mean and variance moments to perform adaptive learning.
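To illustrate the underlying idea rather than the package’s own interface, a minimal sketch of plain batch gradient descent for least-squares regression (all names here are hypothetical):
    # Plain batch gradient descent for linear regression (illustrative sketch).
    gd_lm <- function(X, y, lr = 0.01, iters = 1000) {
      X <- cbind(1, X)                            # prepend an intercept column
      w <- rep(0, ncol(X))                        # start from zero weights
      for (i in seq_len(iters)) {
        grad <- t(X) %*% (X %*% w - y) / nrow(X)  # gradient of mean squared error / 2
        w <- w - lr * grad                        # take a descent step
      }
      drop(w)
    }
    set.seed(1)
    x <- rnorm(100); y <- 2 + 3 * x + rnorm(100)
    gd_lm(matrix(x), y)  # should be close to c(2, 3)
The stochastic and mini-batch variants listed above differ mainly in how many rows enter the gradient computation at each step.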
gradientPickerD3 Interactive Color Gradient Picker Using ‘htmlwidgets’ and the Modified JS Script ‘jquery-gradient-picker’
Widget for an interactive selection and modification of a color gradient. ‘gradientPickerD3’ allows addition, removal and replacement of color ticks. A list of numeric values is automatically translated into the corresponding tick positions within the numeric range. The app returns a data.frame containing tick values, colors and the positions in percent (0.0 to 1.0) for each color tick in the gradient. The original JS ‘jquery-gradient-picker’ was implemented by Matt Crinklaw-Vogt (nick: tantaman) <https://…/>. Widget and JS modifications were done by CD. Peikert.
grainchanger Moving-Window and Direct Data Aggregation
Data aggregation via moving window or direct methods. Aggregate a fine-resolution raster to a grid. The moving window method smooths the surface using a specified function within a moving window of a specified size and shape prior to aggregation. The direct method simply aggregates to the grid using the specified function.
GRANBase Creating Continuously Integrated Package Repositories from Manifests
Repository based tools for department and analysis level reproducibility. ‘GRANBase’ allows creation of custom branched, continuous integration-ready R repositories, including incremental testing of only packages which have changed versions since the last repository build.
GRANCore Classes and Methods for ‘GRANBase’
Provides the classes and methods for GRANRepository objects that are used within the ‘GRAN’ build framework for R packages. This is primarily used by the ‘GRANBase’ package and repositories that are created by it.
grapes Make Binary Operators
Turn arbitrary functions into binary operators.
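In base R, any two-argument function can be bound to a %op% name, which is the mechanism ‘grapes’ automates; a hand-rolled equivalent (not the package’s API):
    `%then%` <- function(x, f) f(x)     # a simple pipe-like binary operator
    c(4, 9, 16) %then% sqrt %then% sum  # 2 + 3 + 4 = 9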
gRapHD Efficient Selection of Undirected Graphical Models for High-Dimensional Datasets
Performs efficient selection of high-dimensional undirected graphical models as described in Abreu, Edwards and Labouriau (2010) <doi:10.18637/jss.v037.i01>. Provides tools for selecting trees, forests and decomposable models minimizing information criteria such as AIC or BIC, and for displaying the independence graphs of the models. It also has some useful tools for analysing graphical structures. It supports the use of discrete, continuous, or both types of variables.
GraphFactor Network Topology of Intravariable Clusters with Intervariable Links
A network implementation of fuzzy sets: build network objects from multivariate flat files. For more information on fuzzy sets, refer to Zadeh, L.A. (1965) <DOI:10.1016/S0019-9958(65)90241-X>.
graphframes Interface for ‘GraphFrames’
A ‘sparklyr’ <https://…/> extension that provides an R interface for ‘GraphFrames’ <https://…/>. ‘GraphFrames’ is a package for ‘Apache Spark’ that provides a DataFrame-based API for working with graphs. Functionality includes motif finding and common graph algorithms, such as PageRank and Breadth-first search.
graphicalVAR Graphical VAR for Experience Sampling Data
Estimates within and between time point interactions in experience sampling data, using the Graphical VAR model in combination with LASSO and EBIC.
graphkernels Graph Kernels
A fast C++ implementation of various graph kernels.
GraphKit Estimating Structural Invariants of Graphical Models
Efficient methods for constructing confidence intervals of monotone graph invariants, as well as testing for monotone graph properties. Many packages are available to estimate precision matrices; this package serves as a tool to extract structural properties from their induced graphs. By iteratively bootstrapping on only the relevant edge set, we are able to obtain the optimal interval size.
graphlayouts Additional Layout Algorithms for Network Visualizations
Several new layout algorithms to visualize networks are provided which are not part of ‘igraph’. Most are based on the concept of stress majorization by Gansner et al. (2004) <doi:10.1007/978-3-540-31843-9_25>. Some more specific algorithms make it possible to emphasize hidden group structures in networks or to focus on specific nodes.
graphon A Collection of Graphon Estimation Methods
Provides a not-so-comprehensive list of methods for estimating a graphon, a symmetric measurable function, from a single observed network or multiple observed networks. For a detailed introduction to graphons and popular estimation techniques, see the paper by Orbanz, P. and Roy, D.M. (2014) <doi:10.1109/TPAMI.2014.2334607>. It also contains several auxiliary functions for generating sample networks using various network models and graphons.
GraphPCA Graphical Tools of Histogram PCA
Histogram principal component analysis is a generalization of PCA to histogram-valued data, which are well suited to representing complex and big data (histograms are used as variables). The functions implemented provide numerical and graphical tools for this extension of PCA. Sun Makosso Kallyth (2016) <doi:10.1002/sam.11270>. Sun Makosso Kallyth and Edwin Diday (2012) <doi:10.1007/s11634-012-0108-0>.
graphql A GraphQL Query Parser
Bindings to the ‘libgraphqlparser’ C++ library. Currently parses GraphQL and exports the AST in JSON format.
graphscan Cluster Detection with Hypothesis Free Scan Statistic
A multiple scan statistic with variable window for one-dimensional data, and a scan statistic based on connected components in 2D or 3D.
graphTweets Visualise Twitter Interactions
Allows building an edge table from a data frame of tweets, and also provides a function to build vertices (meta-data).
gratia Graceful ‘ggplot’-Based Graphics and Other Functions for GAMs Fitted Using ‘mgcv’
Graceful ‘ggplot’-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the ‘mgcv’ package. Provides a reimplementation of the plot() method for GAMs that ‘mgcv’ provides, as well as ‘tidyverse’ compatible representations of estimated smooths.
gravity A Compilation of Different Estimation Methods for Gravity Models
One can use gravity models to explain bilateral flows related to the sizes of bilateral partners, a measure of distance between them and other influences on interaction costs. The underlying idea is rather simple. The greater the masses of two bodies and the smaller the distance between them, the stronger they attract each other. This concept is applied to several research topics such as trade, migration or foreign direct investment. Even though the basic idea of gravity models is rather simple, they can become very complex when it comes to the choice of models or estimation methods. The package gravity aims to provide R users with the functions necessary to execute the most common estimation methods for gravity models, especially for cross-sectional data. It contains the functions Ordinary Least Squares (OLS), Fixed Effects, Double Demeaning (DDM), Bonus vetus OLS with simple averages (BVU) and with GDP-weights (BVW), Structural Iterated Least Squares (SILS), Tetrads as well as Poisson Pseudo Maximum Likelihood (PPML). By considering the descriptions of the estimation methods, users can see which method and data may be suited for a certain research question. In order to illustrate the estimation methods, this package includes a dataset called Gravity (see the description of the dataset for more information). On the Gravity Cookbook website (<https://…/>) Keith Head and Thierry Mayer provide Stata code for the most common estimation methods for gravity models when using cross-sectional data. In order to get comparable results in R, the methods presented in the package gravity are designed to be consistent with this Stata code when choosing the option of robust variance estimation. However, compared to the Stata code available, the functions presented in this package provide users with more flexibility regarding the type of estimation (robust or not robust), the number and type of independent variables as well as the possible data. The functions all estimate gravity models, but they differ in whether they estimate them in their multiplicative or additive form, their requirements with respect to the data, their handling of Multilateral Resistance terms as well as their possibilities concerning the inclusion of unilateral independent variables. Therefore, they normally lead to different estimation results. We refer the user to the Gravity Cookbook website (<https://…/>) for more information on gravity models in general. Head, K. and Mayer, T. (2014) <DOI:10.1016/B978-0-444-54314-1.00003-3> provide a comprehensive and accessible overview of the theoretical and empirical development of the gravity literature as well as the use of gravity models and the various estimation methods, especially their merits and potential problems regarding applicability as well as different gravity datasets. All functions were tested to work on cross-sectional data and are consistent with the Stata code mentioned above. No tests were performed for use with panel data, so it is up to the user to ensure that the functions can be applied to panel data. For a comprehensive overview of gravity models for panel data see Egger, P., & Pfaffermayr, M. (2003) <DOI:10.1007/s001810200146>, Gomez-Herrera, E. (2013) <DOI:10.1007/s00181-012-0576-2> and Head, K., Mayer, T., & Ries, J. (2010) <DOI:10.1016/j.jinteco.2010.01.002> as well as the references therein (see also the references included in the descriptions of the different functions).
Depending on the panel dataset and the variables – specifically the type of fixed effects – included in the model, it may easily occur that the model is not computable. Also, note that by including bilateral fixed effects such as country-pair effects, the coefficients of time-invariant observables such as distance can no longer be estimated. Depending on the specific model, the code of the respective function may have to be changed in order to exclude the distance variable from the estimation. At the very least, the user should take special care with respect to the meaning of the estimated coefficients and variances as well as the decision about which effects to include in the estimation. As, to our knowledge at the moment, there is no explicit literature covering the estimation of a gravity equation by Double Demeaning, Structural Iterated Least Squares or Bonus Vetus OLS using panel data, we do not recommend applying these methods in this case. Contributions, extensions and error corrections are very welcome. Please do not hesitate to contact us.
gRc Inference in Graphical Gaussian Models with Edge and Vertex Symmetries
Estimation, model selection and other aspects of statistical inference in Graphical Gaussian models with edge and vertex symmetries (Graphical Gaussian models with colours). Documentation about ‘gRc’ is provided in the paper by Hojsgaard and Lauritzen (2007, <doi:10.18637/jss.v023.i06>) and the paper by Hojsgaard and Lauritzen (2008, <doi:10.1111/j.1467-9868.2008.00666.x>).
GRCdata Parameter Inference and Optimal Designs for Grouped and/or Right-Censored Count Data
We implement two main functions. The first function uses a given grouped and/or right-censored grouping scheme and empirical data to infer parameters, and implements chi-square goodness-of-fit tests. The second function searches for the global optimal grouping scheme of grouped and/or right-censored count responses in surveys.
grec Classification of Spatial Patterns from Environmental Data Through GRadient RECognition
Provides algorithms for detection of spatial patterns from oceanographic data using image processing methods based on Gradient Recognition.
GreedyEPL Greedy Expected Posterior Loss
Summarises a collection of partitions into a single optimal partition. The objective function is the expected posterior loss, and the minimisation is performed through a greedy algorithm described in Rastelli, R. and Friel, N. (2016) ‘Optimal Bayesian estimators for latent variable cluster models’ <arXiv:1607.02325>.
GreedyExperimentalDesign Greedy Experimental Design Construction
Computes experimental designs for a two-arm experiment with covariates by greedily optimizing a balance objective function. This optimization provides lower variance for the treatment effect estimator (and higher power) while preserving a design that is close to complete randomization. We return all iterations of the designs for use in a permutation test. Additional functionality includes using branch and bound optimization (via Gurobi) and exhaustive enumeration.
GreedyExperimentalDesignJARs GreedyExperimentalDesign JARs
These are GreedyExperimentalDesign Java dependency libraries. Note: this package has no functionality of its own and should not be installed as a standalone package without GreedyExperimentalDesign.
GreedySBTM Greedy Stochastic Block Transition Models
Performs clustering on the nodes of an undirected binary dynamic network, by maximising the exact integrated complete likelihood. The greedy algorithm used is described in Rastelli, R. (2017) ‘Exact integrated completed likelihood maximisation in a stochastic block transition model for dynamic networks’ <arXiv:1710.03551>.
Greg Regression Helper Functions
Methods for manipulating regression models and for describing these in a style adapted for medical journals. Contains functions for generating an HTML table with crude and adjusted estimates, plotting hazard ratios, plotting model estimates and confidence intervals using forest plots, and extending this to comparing multiple models in a single forest plot. In addition to the descriptive methods, there are add-ons for the robust covariance matrix provided by the sandwich package, a function for adding non-linearities to a model, and a wrapper around the Epi package’s Lexis functions for time-splitting a dataset when modeling non-proportional hazards in Cox regressions.
gremlin Mixed-Effects REML Incorporating Generalized Inverses
Fit linear mixed-effects models using restricted (or residual) maximum likelihood (REML) and with generalized inverse matrices to specify covariance structures for random effects. In particular, the package is suited to fit quantitative genetic mixed models, often referred to as ‘animal models’ (Kruuk 2004 <DOI:10.1098/rstb.2003.1437>). Implements the average information algorithm as the main tool to maximize the restricted likelihood, with other algorithms also available (Meyer 1997, Genet Sel Evol 29:97; Meyer and Smith 1998, Genet Sel Evol 28:23).
gren Adaptive Group-Regularized Logistic Elastic Net Regression
Allows the user to incorporate multiple sources of co-data (e.g., previously obtained p-values, published gene lists, and annotation) in the estimation of a logistic regression model to enhance predictive performance and feature selection, as described in Münch, Peeters, van der Vaart, and van de Wiel (2018) <arXiv:1805.00389>.
greta Simple and Scalable Statistical Modelling in R
Write statistical models in R and fit them by MCMC on CPUs and GPUs, using Google TensorFlow.
greybox Toolbox for Model Selection and Combinations for the Forecasting Purposes
Implements model selection and combinations via information criteria based on the values of partial correlations. This allows, for example, solving ‘fat regression’ problems, where the number of variables is much larger than the number of observations. This is driven by the research on information criteria, which is well discussed in Burnham & Anderson (2002) <doi:10.1007/b97636>, and currently developed further by Ivan Svetunkov and Yves Sagaert (working paper in progress). Models developed in the package are tailored specifically for forecasting purposes. As a result, there are several methods that allow producing forecasts from these models and visualising them.
greyzoneSurv Fit a Grey-Zone Model with Survival Data
Allows one to classify patients into low, intermediate, and high risk groups for disease progression based on a continuous marker that is associated with progression-free survival. It uses a latent class model to link the marker and survival outcome and produces two cutoffs for the marker to divide patients into three groups. See the References section for more details.
grf Generalized Random Forests (Beta)
A pluggable package for forest-based statistical estimation and inference. GRF currently provides methods for non-parametric least-squares regression, quantile regression, and treatment effect estimation (optionally using instrumental variables). This package is currently in beta, and we expect to make continual improvements to its performance and usability.
gridBezier Bezier Curves in ‘grid’
Functions for rendering Bezier curves (Pomax, 2018) <https://…/> in ‘grid’. There is support for both quadratic and cubic Bezier curves. There are also functions for calculating points on curves, tangents to curves, and normals to curves.
gridGeometry Polygon Geometry in ‘grid’
Functions for performing polygon geometry with ‘grid’ grobs. This allows complex shapes to be defined by combining simpler shapes.
gridGraphics Redraw Base Graphics Using grid Graphics
Functions to convert a page of plots drawn with the graphics package into identical output drawn with the grid package. The result looks like the original graphics-based plot, but consists of grid grobs and viewports that can then be manipulated with grid functions (e.g., edit grobs and revisit viewports).
http://…/murrell.pdf
gridsample Tools for Grid-Based Survey Sampling Design
Multi-stage cluster household surveys are commonly performed by governments and programs to monitor population demographic, social, economic, and health outcomes. In these surveys, communities are sampled in a first stage of sampling from within subpopulations of interest (or strata), households are sampled in a second stage of sampling, and sometimes individuals are listed and further sampled within households. The communities defined in the first stage of sampling are called Primary Sampling Units (PSUs), while the households are Secondary Sampling Units (SSUs). Census data are typically used to select PSUs within strata. If census data are outdated, inaccurate, or not available at fine enough scale, however, gridded population data can be used instead. This tool selects PSUs within user-defined strata using gridded population data, given desired numbers of sampled households within each PSU. The population densities used to create PSUs are drawn from rasters such as the population data from the WorldPop Project (http://www.worldpop.org.uk). PSUs are defined within a stratum using a serpentine sampling method, and can be set to have a certain ratio of urban and rural PSUs, or to be evenly distributed across a coarse, user-defined grid.
gridsampler A Simulation Tool to Determine the Required Sample Size for Repertory Grid Studies
Simulation tool to facilitate determination of required sample size to achieve category saturation for studies using multiple repertory grids in conjunction with content analysis.
grImport2 Importing ‘SVG’ Graphics
Functions for importing external vector images and drawing them as part of ‘R’ plots. This package is different from the ‘grImport’ package because, where that package imports ‘PostScript’ format images, this package imports ‘SVG’ format images. Furthermore, this package imports a specific subset of ‘SVG’, so external images must be preprocessed using a package like ‘rsvg’ to produce ‘SVG’ that this package can import. ‘SVG’ features that are not supported by ‘R’ graphics, e.g., gradient fills, can be imported and then exported via the ‘gridSVG’ package.
gromovlab Gromov-Hausdorff Type Distances for Labeled Metric Spaces
Computing Gromov-Hausdorff type l^p distances for labeled metric spaces. These distances were introduced in V. Liebscher, ‘Gromov meets Phylogenetics – new Animals for the Zoo of Metrics on Tree Space’ (preprint, arXiv:1504.05795) for phylogenetic trees, but may apply to many more situations.
GroupComparisons Paired/Unpaired Parametric/Non-Parametric Group Comparisons
Receives two vectors, computes the appropriate function for group comparison (e.g., t-test, Mann-Whitney; equality of variances), and reports the findings (mean/median, standard deviation, test statistic, p-value, effect size) in APA format (Fay, M.P., & Proschan, M.A. (2010) <DOI:10.1214/09-SS051>).
groupdata2 Creating Groups from Data
Subsetting methods for balanced cross-validation, time series windowing, and general grouping and splitting of data.
groupedstats Grouped Statistical Analyses in a Tidy Way
Collection of functions to run statistical tests across all levels of multiple grouping variables.
groupedSurv Efficient Estimation of Grouped Survival Models Using the Exact Likelihood Function
The core of this ‘Rcpp’-based package is a set of functions to compute the efficient score statistics for grouped survival models. The functions are designed to analyze grouped time-to-event data with the optional inclusion of either baseline covariates or family structure of related individuals (e.g., trios). Functions for estimating the baseline hazards, frailty variance, nuisance parameters, and fixed effects are also provided. The functions encompass two processes for discrete-time shared frailty model data with random effects: (1) evaluation of the multiple variable integration to compute the exact proportional-hazards-model-based likelihood and (2) estimation of the desired parameters using maximum likelihood. For data without family structure, only the latter step is performed. The integration is evaluated by the ‘Cuhre’ algorithm from the ‘Cuba’ library (Hahn, T. (2005). Cuba-a library for multidimensional numerical integration, Comput. Phys. Commun. 168, 78-95 <doi:10.1016/j.cpc.2005.01.010>), and the source files of the ‘Cuhre’ function are included in this package. The maximization process is carried out using Brent’s algorithm, with the ‘C++’ code file from John Burkardt and John Denker (Brent, R., Algorithms for Minimization without Derivatives, Dover, 2002, ISBN 0-486-41998-3).
groupICA Independent Component Analysis for Grouped Data
Contains an implementation of an independent component analysis (ICA) for grouped data. The main function groupICA() performs a blind source separation by maximizing independence across sources, and allows adjusting for confounding that varies across user-specified groups. Additionally, the package contains the function uwedge() which can be used to approximately jointly diagonalize a list of matrices. For more details see the project website <https://…/>.
groupRemMap Regularized Multivariate Regression for Identifying Master Predictors Using the GroupRemMap Penalty
An implementation of the GroupRemMap penalty for fitting regularized multivariate response regression models under the high-dimension-low-sample-size setting. When the predictors naturally fall into groups, the GroupRemMap penalty encourages the procedure to select groups of predictors while controlling the overall sparsity of the final model.
groupsubsetselection Group Subset Selection
Provides group subset selection for linear regression models. Given a response variable and explanatory variables organised in groups, group subset selection selects a small number of groups that explain the response variable linearly using least squares.
GroupTest Multiple Testing Procedure for Grouped Hypotheses
Contains functions for a two-stage multiple testing procedure for grouped hypotheses, aiming at controlling both the total posterior false discovery rate and the within-group false discovery rate.
grove Wavelet Functional ANOVA Through Markov Groves
Functional denoising and functional ANOVA through wavelet-domain Markov groves. For more details see: Ma L. and Soriano J. (2016) Efficient functional ANOVA through wavelet-domain Markov groves. <arXiv:1602.03990v2 [stat.ME]>.
GrowingSOM Growing Self-Organizing Maps
A growing self-organizing map (GrowingSOM, GSOM) is a growing variant of the popular self-organizing map (SOM). A growing self-organizing map is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional representation of the input space of the training samples, called a map.
growth Multivariate Normal and Elliptically-Contoured Repeated Measurements Models
Functions for fitting various normal theory (growth curve) and elliptically-contoured repeated measurements models with ARMA and random effects dependence.
growthPheno Plotting, Smoothing and Growth Trait Extraction for Longitudinal Data
Assists in producing longitudinal or profile plots of measured traits. These allow checks to be made for anomalous data and growth patterns in the data to be explored. Smoothing of growth trends for individual plants using smoothing splines is available for removing transient effects. There are tools for diagnosing the adequacy of trait smoothing, either using this package or other packages, such as those that fit nonlinear growth models. A range of per-unit (pot, plant, plot) growth traits can be extracted from longitudinal data, including single time-point smoothed trait values and their growth rates, interval growth rates and other growth statistics, such as maximum growth. The package is particularly suited to preparing data from high-throughput phenotyping facilities, such as imaging data from a Lemna-Tec Scananalyzer (see <http://…/scanalyzer-3d> for more information). The package ‘growthPheno’ can also be installed from <http://…/>.
growthrates Estimate Growth Rates from Experimental Data
A collection of methods to determine growth rates from experimental data, in particular from batch experiments and plate reader trials.
grplassocat Standardization for Group Lasso Models with Categorical Predictors
Implements the simple and computationally efficient standardization scheme for group lasso models with categorical predictors described in Detmer, Cebral, Slawski (2019) <arXiv:1805.06915>.
grpregOverlap Penalized Regression Models with Overlapping Grouped Covariates
Fit the regularization path of linear, logistic or Poisson models with overlapping grouped covariates based on the latent group lasso approach. Latent group MCP/SCAD as well as bi-level selection methods, namely the group exponential lasso and the composite MCP are also available. This package serves as an extension of R package ‘grpreg’ (by Dr. Patrick Breheny <patrick-breheny@uiowa.edu>) for grouped variable selection involving overlaps between groups.
grpSLOPE Group Sorted L1 Penalized Estimation
Group SLOPE is a penalized linear regression method that is used for adaptive selection of groups of significant predictors in a high-dimensional linear model. The Group SLOPE method can control the (group) false discovery rate at a user-specified level (i.e., control the expected proportion of irrelevant groups among all selected groups of predictors).
grpss Group Screening and Selection
Contains the tools to screen grouped variables, and select screened grouped variables afterwards. The main function grpss() can perform the grouped variables screening as well as selection for ultra-high dimensional data with group structure. The screening step is primarily used to reduce the dimensions of data so that the selection procedure can easily handle the moderate or low dimensions instead of ultra-high dimensions.
GrpString Patterns and Statistical Differences Between Two Groups of Strings
Methods include converting series of event names to strings, discovering common patterns in a group of strings, discovering ‘unique’ patterns when comparing two groups of strings as well as the number and starting position of each ‘unique’ pattern in each string, finding the transition information, and statistically comparing the difference between two groups of strings.
grr Alternate Implementations of Base R Functions
Alternate implementations of some base R functions, including sort, order, and match. Functions are faster and/or have been otherwise augmented.
GRS.test GRS Test for Portfolio Efficiency and Its Statistical Power Analysis
Computational resources for the test proposed by Gibbons, Ross, and Shanken (1989) <DOI:10.2307/1913625>.
GSCAD Implementing GSCAD Method for Image Denoising and Inpainting
Implements the method proposed in ‘Simultaneous Sparse Dictionary Learning and Pruning’ (Qu and Wang (2016) <arXiv:1605.07870>). The idea is to conduct a linear decomposition of a signal using a few atoms of a learned, and usually overcomplete, dictionary instead of a pre-defined basis. A proper size for the to-be-learned dictionary is determined at the same time during the procedure. Applications include image denoising and image inpainting.
gscounts Group Sequential Designs with Negative Binomial Outcomes
Design and analysis of group sequential designs with negative binomial outcomes, as described by T Muetze, E Glimm, H Schmidli, T Friede (2017) <arXiv:1707.04612>.
GSED Group Sequential Enrichment Design
Provides a function to apply the ‘Group sequential enrichment design incorporating subgroup selection’ (GSED) method proposed by Magnusson and Turnbull (2013) <doi:10.1002/sim.5738>.
gSEM Semi-Supervised Generalized Structural Equation Modelling
Conducts a semi-gSEM statistical analysis (semi-supervised generalized structural equation modeling) on a data frame of coincident observations of multiple continuous variables, via two functions sgSEMp1() and sgSEMp2(), representing fittings based on two statistical principles. Principle 1 determines the univariate relationships in the spirit of the Markovian process. The relationship between each pair of system elements, including predictors and the system level response, is determined with the Markovian property that assumes the value of the current predictor is sufficient in relating to the next level variable, i.e., the relationship is independent of the specific value of the preceding-level variable to the current predictor, given the current value. Principle 2 resembles the multiple regression principle in the way multiple predictors are considered simultaneously. Specifically, the first-level predictors of the system-level variable, such as time and unit-level variables, act on the system-level variable collectively through an additive model. This collective additive model can be found with a generalized stepwise variable selection (using the step() function in R, which performs variable selection on the basis of AIC) and this proceeds iteratively.
gsheet Download Google Sheets Using Just the URL
Simple package to download Google Sheets using just the sharing link. Spreadsheets can be downloaded as a data frame, or as plain text to parse manually. Google Sheets is the new name for Google Docs Spreadsheets.
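A minimal sketch (the sharing URL is a placeholder):
    library(gsheet)
    url <- 'https://docs.google.com/spreadsheets/d/...'  # your sharing link
    df  <- gsheet2tbl(url)   # download as a data frame
    txt <- gsheet2text(url)  # or as plain text, to parse manually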
GSparO Group Sparse Optimization
Approaches a group sparse solution of an underdetermined linear system. It implements the proximal gradient algorithm to solve a lower regularization model of group sparse learning. For details, please refer to the paper ‘Y. Hu, C. Li, K. Meng, J. Qin and X. Yang. Group sparse optimization via l_{p,q} regularization. Journal of Machine Learning Research, to appear, 2017’.
gsrsb Group Sequential Refined Secondary Boundary
A gate-keeping procedure to test a primary and a secondary endpoint in a group sequential design with multiple interim looks. Refined secondary boundaries are calculated for the gate-keeping test; the choices include both the standard boundaries and boundaries using error spending functions. Version 1.0.0 was released on April 12, 2017. See Tamhane et al. (2017+) ‘A gatekeeping procedure to test a primary and a secondary endpoint in a group sequential design with multiple interim looks’, Biometrics, to appear.
gStream Graph-Based Sequential Change-Point Detection for Streaming Data
Uses an approach based on k-nearest neighbor information to sequentially detect change-points. Offers analytic approximations for false discovery control given user-specified average run length. Can be applied to any type of data (high-dimensional, non-Euclidean, etc.) as long as a reasonable similarity measure is available.
gt4ireval Generalizability Theory for Information Retrieval Evaluation
Provides tools to measure the reliability of an Information Retrieval test collection. It allows users to estimate reliability using Generalizability Theory and map those estimates onto well-known indicators such as Kendall tau correlation or sensitivity.
gtable Arrange grobs in tables
Tools to make it easier to work with ‘tables’ of grobs.
gTests Graph-Based Two-Sample Tests
Three graph-based tests are provided for testing whether two samples are from the same distribution.
gtheory Apply Generalizability Theory with R
Estimates variance components, generalizability coefficients, universe scores, and standard errors when observed scores contain variation from one or more measurement facets (e.g., items and raters).
gtop Game-Theoretically OPtimal (GTOP) Reconciliation Method
In hierarchical time series (HTS) forecasting, the hierarchical relation between multiple time series is exploited to make better forecasts. This hierarchical relation implies one or more aggregate consistency constraints that the series are known to satisfy. Many existing approaches, such as bottom-up or top-down forecasting, therefore attempt to achieve this goal in a way that guarantees that the forecasts will also be aggregate consistent. This package provides an implementation of the Game-Theoretically OPtimal (GTOP) reconciliation method proposed in van Erven and Cugliari (2015), which is guaranteed to only improve any given set of forecasts. This opens up new possibilities for constructing the forecasts. For example, it is not necessary to assume that bottom-level forecasts are unbiased, and aggregate forecasts may be constructed by regressing both on bottom-level forecasts and on other covariates that may only be available at the aggregate level.
gtrendsR R Functions to Perform and Display Google Trends Queries
An interface for retrieving and displaying the information returned online by Google Trends is provided. Trends (number of hits) over time as well as geographic representations of the results can be displayed.
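A minimal sketch (keyword, region, and time window are illustrative; an internet connection is required):
    library(gtrendsR)
    res <- gtrends("data science", geo = "US", time = "today 12-m")
    head(res$interest_over_time)  # hits over time
    plot(res)                     # built-in plot method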
gtsf General Transit Simple Features
Get simple features from general transit feed data. For example, make a simple features data frame of the route lines, or route stops, that vehicles traverse as part of their schedule.
gtsummary Presentation-Ready Data Summary Tables
Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with many types of statistics presented in any pattern. Regression models are summarized including the reference row for categorical independent variables. Function defaults follow reporting guidelines outlined in Assel, Sjoberg, et al. (2019) <doi:10.1016/j.eururo.2018.12.014>.
guess Adjust Estimates of Learning for Guessing
Adjusts estimates of learning for guessing. The package provides a standard guessing correction and a latent class model that leverages informative pre-post transitions. For details of the latent class model, see <http://…/guess.pdf>.
GUIgems Graphical User Interface for Generalized Multistate Simulation Model
A graphical user interface for the R package ‘gems’. Apart from the functionality of the ‘gems’ package in the graphical user interface, GUIgems allows adding states to a defined model, merging states for the analysis, and plotting progression paths between states based on the simulated cohort. There is also a module in GUIgems which allows comparing costs and QALYs between different cohorts.
GUIProfiler Graphical User Interface for Rprof()
Graphical User Interface for Rprof()
gustave A User-Oriented Statistical Toolkit for Analytical Variance Estimation
Provides a toolkit for analytical variance estimation in survey sampling. Apart from the implementation of standard variance estimators, its main feature is to help the sampling expert produce easy-to-use variance estimation ‘wrappers’, where systematic operations (linearization, domain estimation) are handled in a consistent and transparent way for the end user.
GVARX Perform Stationary Global Vector Autoregression Estimation and Inference
Perform the estimation and inference of stationary Global Vector Autoregression model (GVAR) of Pesaran, Schuermann and Weiner (2004) <DOI:10.1198/073500104000000019> and Dees, di Mauro, Pesaran and Smith (2007) <DOI:10.1002/jae.932>.
gvcm.cat Regularized Categorical Effects/Categorical Effect Modifiers/Continuous/Smooth Effects in GLMs
Generalized structured regression models with regularized categorical effects, categorical effect modifiers, continuous effects and smooth effects.
gwdegree A Shiny App to Aid Interpretation of Geometrically-Weighted Degree Estimates in Exponential Random Graph Models
This is a Shiny application intended to provide better understanding of how geometrically-weighted degree terms function in exponential random graph models of networks.
gwer Geographically Weighted Elliptical Regression
Computes an elliptical regression model or a geographically weighted regression model with elliptical errors using Fisher’s score algorithm. Provides diagnostic measures, residuals and analysis of variance. Cysneiros, F. J. A., Paula, G. A., and Galea, M. (2007) <doi:10.1016/j.spl.2007.01.012>.
gWidgets2RGtk2 Implementation of gWidgets2 for the RGtk2 Package
Implements the ‘gWidgets2’ API for ‘RGtk2’.
gWidgets2tcltk Toolkit Implementation of gWidgets2 for tcltk
Port of the ‘gWidgets2’ API for the ‘tcltk’ package.
GWLelast Geographically Weighted Logistic Elastic Net Regression
Fit a geographically weighted logistic elastic net regression.
gWQS Generalized Weighted Quantile Sum Regression
Fits Weighted Quantile Sum (WQS) regressions for continuous or binomial outcomes.
gym Provides Access to the OpenAI Gym API
OpenAI Gym is an open-source Python toolkit for developing and comparing reinforcement learning algorithms. This is a wrapper for the OpenAI Gym API, and enables access to an ever-growing variety of environments. For more details on OpenAI Gym, please see here: <https://…/gym>. For more details on the OpenAI Gym API specification, please see here: <https://…/gym-http-api>.

H

h2o4gpu Interface to ‘H2O4GPU’
Interface to ‘H2O4GPU’ <https://…/h2o4gpu>, a collection of ‘GPU’ solvers for machine learning algorithms.
hablar Convert Data Types and Get Non-Astonishing Results
Simple tools for converting columns to new data types. Intuitive summary functions.
halfcircle Plot Halfcircle Diagram
Flow data are of growing interest in diverse fields including trade, migration, knowledge diffusion, disease spread, and transportation. The package provides the halfcircle diagram, an effective visual aid for studying flow patterns. The flow between two nodes placed on the center line of a circle is represented using a half circle drawn from the origin to the destination in a clockwise direction. By changing the order of nodes, the halfcircle diagram enables users to examine the complex relationship between bidirectional flow and potential determinants. Furthermore, the halfmeancenter function, which calculates (un)weighted mean centers of half circles, makes comparisons easier.
handlr Convert Among Citation Formats
Converts among many citation formats, including ‘BibTeX’, ‘Citeproc’, ‘Codemeta’, ‘RDF XML’, ‘RIS’, and ‘Schema.org’. A low level ‘R6’ class is provided, as well as stand-alone functions for each citation format for both read and write.
handyplots Handy Plots
Several handy plots for quickly looking at the relationship between two numeric vectors of equal length. Quickly visualize scatter plots, residual plots, qq-plots, box plots, confidence intervals, and prediction intervals.
haploReconstruct Reconstruction of Haplotype-Blocks from Time Series Data
Reconstruction of founder haplotype blocks from time series data.
harmonicmeanp Harmonic Mean p-Values and Model Averaging by Mean Maximum Likelihood
The harmonic mean p-value (HMP) test simply and instantly combines p-values and corrects for multiple testing while controlling the family-wise error rate. It is more powerful than common alternatives, including the Bonferroni and Simes procedures, more stringent than controlling the false discovery rate, and robust to positive correlations between tests and unequal weights. It is a multi-level test in the sense that a superset of one or more significant tests is almost certain to be significant, and conversely, when the superset is non-significant, the constituent tests are almost certain to be non-significant. It is based on MAMML (model averaging by mean maximum likelihood), a frequentist analogue to Bayesian model averaging, and is theoretically grounded in the generalized central limit theorem.
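The statistic itself is simply the (weighted) harmonic mean of the p-values; a from-scratch sketch with equal weights (the package’s own functions additionally supply the asymptotically exact significance calculation):
    p <- c(0.03, 0.20, 0.45, 0.007)  # invented p-values
    hmp <- length(p) / sum(1 / p)    # unweighted harmonic mean p-value
    hmp                              # about 0.022 here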
HarmonicRegression Harmonic Regression to One or more Time Series
Fits the first harmonics in a Fourier expansion to one or more time series. Trend elimination can be performed. Computed values include estimates of amplitudes and phases, as well as confidence intervals and p-values for the null hypothesis of Gaussian noise.
HARtools Read HTTP Archive (‘HAR’) Data
The goal of ‘HARtools’ is to provide a simple set of functions to read/parse, write and visualise HTTP Archive (‘HAR’) files in R.
Harvest.Tree Harvest the Classification Tree
Applies the harvest classification tree algorithm, a modified version of the classic classification tree. It was first used in the drug discovery field, but it also performs well on other kinds of data, especially when the active region is unrelated. To learn more about the harvest classification algorithm, see http://…/220.pdf.
hashids Generate Short Unique YouTube-Like IDs (Hashes) from Integers
An R port of the hashids library. hashids generates YouTube-like hashes from integers or vectors of integers. Hashes generated from integers are relatively short, unique and non-sequential. hashids can be used to generate unique ids for URLs and hide database row numbers from the user. By default hashids will avoid generating common English cursewords by preventing certain letters being next to each other. hashids are not one-way: it is easy to encode an integer to a hashid and decode a hashid back into an integer.
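A minimal sketch, assuming the port exposes the usual hashid_settings()/encode()/decode() interface:
    library(hashids)
    h  <- hashid_settings(salt = 'my salt')  # the salt makes hashes app-specific
    id <- encode(1234L, h)                   # integer -> short hash string
    decode(id, h)                            # hash string -> 1234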
hashmap The Faster Hash Map
Provides a hash table class for fast key-value storage of atomic vector types. Internally, hashmap makes extensive use of Rcpp, boost::variant, and boost::unordered_map to achieve high performance, type-safety, and versatility, while maintaining compliance with the C++98 standard.
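A minimal sketch:
    library(hashmap)
    H <- hashmap(c("apple", "pear"), c(1.0, 2.5))
    H[["pear"]]         # fast key lookup: 2.5
    H[["plum"]] <- 3.0  # insert a new key-value pair
    H$has_key("plum")   # TRUE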
haven Import SPSS, Stata and SAS Files
Import foreign statistical formats into R via the embedded ReadStat C library (https://…/ReadStat). Package includes preliminary support for writing Stata and SPSS formats.
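Typical usage (the file paths are placeholders):
    library(haven)
    spss  <- read_sav("survey.sav")      # SPSS
    stata <- read_dta("panel.dta")       # Stata
    sas   <- read_sas("trial.sas7bdat")  # SAS
    write_dta(spss, "survey.dta")        # preliminary write support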
hBayesDM Hierarchical Bayesian Modeling of Decision-Making Tasks
Fit an array of decision-making tasks with computational models in a hierarchical Bayesian framework. Can perform hierarchical Bayesian analysis of various computational models with a single line of code.
hbdct Hierarchical Bayesian Design for Clinical Trials
Implements a hierarchical Bayesian design for randomized clinical trials.
hctrial Using Historical Controls for Designing Phase II Clinical Trials
Provides functions for designing phase II clinical trials adjusting for the heterogeneity of the population using known subgroups or historical controls.
hdbinseg Change-Point Analysis of High-Dimensional Time Series via Binary Segmentation
Binary segmentation methods for detecting and estimating multiple change-points in the mean or second-order structure of high-dimensional time series as described in Cho and Fryzlewicz (2014) <doi:10.1111/rssb.12079> and Cho (2016) <doi:10.1214/16-EJS1155>.
HDCI High Dimensional Confidence Interval Based on Lasso and Bootstrap
Fits regression models on high-dimensional data to estimate coefficients and uses the bootstrap to obtain confidence intervals. Choices for regression models are Lasso, Lasso+OLS, Lasso partial ridge, and Lasso+OLS partial ridge.
HDclust Clustering High Dimensional Data with Hidden Markov Model on Variable Blocks
Clustering of high dimensional data with Hidden Markov Model on Variable Blocks (HMM-VB) fitted via Baum-Welch algorithm. Clustering is performed by the Modal Baum-Welch algorithm (MBW), which finds modes of the density function. Lin Lin and Jia Li (2017) <http://…/16-342.html>.
HDcpDetect Detect Change Points in Means of High Dimensional Data
Objective: Implement new methods for detecting change points in high-dimensional time series data. These new methods can be applied to non-Gaussian data, account for spatial and temporal dependence, and detect a wide variety of change-point configurations, including changes near the boundary and changes in close proximity. Additionally, this package helps address the ‘small n, large p’ problem, which occurs in many research contexts. This problem arises when a dataset contains changes that are visually evident but do not rise to the level of statistical significance due to the small number of observations and large number of parameters. The problem is overcome by treating the dimensions as a whole and scaling the test statistic only by its standard deviation, rather than scaling each dimension individually. Due to the computational complexity of the functions, the package runs best on datasets with a relatively large number of attributes but no more than a few hundred observations.
hdf5r Interface to the ‘HDF5’ Binary Data Format
‘HDF5’ is a data model, library and file format for storing and managing large amounts of data. This package provides a nearly feature-complete, object-oriented wrapper for the ‘HDF5’ API <https://…/RM_H5Front.html> using R6 classes. Additionally, functionality is added so that ‘HDF5’ objects behave very similarly to their corresponding R counterparts.
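A minimal sketch of the R6-style interface (file and object names are illustrative):
    library(hdf5r)
    f <- H5File$new("example.h5", mode = "w")  # create or overwrite a file
    g <- f$create_group("grp")                 # groups act like directories
    g[["mat"]] <- matrix(1:6, nrow = 2)        # datasets assign like list elements
    g[["mat"]][, ]                             # read back with matrix-style subsetting
    f$close_all()                              # release the file handle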
HDGLM Tests for High Dimensional Generalized Linear Models
Test the significance of coefficients in high dimensional generalized linear models.
HDInterval Highest (Posterior) Density Intervals
A generic function and a set of methods to calculate highest density intervals for a variety of classes of objects which can specify a probability density distribution, including MCMC output, fitted density objects, and functions.
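A minimal sketch: for a skewed sample, the highest density interval is shifted and narrower relative to the equal-tailed interval.
    library(HDInterval)
    x <- rgamma(1e5, shape = 2.5, rate = 2)
    hdi(x, credMass = 0.95)       # highest (posterior) density interval
    quantile(x, c(0.025, 0.975))  # equal-tailed interval, for comparison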
hdm High-Dimensional Metrics
Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/structural parameters are provided which appear in high-dimensional approximately sparse models. It includes functions for fitting heteroscedasticity-robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty.
hdme High-Dimensional Regression with Measurement Error
Penalized regression for generalized linear models for measurement error problems (a.k.a. errors-in-variables). The package contains a version of the lasso (L1-penalization) which corrects for measurement error (Sorensen et al. (2015) <doi:10.5705/ss.2013.180>). It also contains an implementation of the Generalized Matrix Uncertainty Selector, which is a version of the (Generalized) Dantzig Selector for the case of measurement error (Sorensen et al. (2018) <doi:10.1080/10618600.2018.1425626>).
HDMT A Multiple Testing Procedure for High-Dimensional Mediation Hypotheses
A multiple-testing procedure for high-dimensional mediation hypotheses. Mediation analysis is of rising interest in epidemiology and clinical trials. Among existing methods for mediation analyses, the popular joint significance (JS) test yields an overly conservative type I error rate and therefore low power. In the R package ‘HDMT’ we implement a multiple-testing procedure that accurately controls the family-wise error rate (FWER) and the false discovery rate (FDR) when using JS for testing high-dimensional mediation hypotheses. The core of our procedure is based on estimating the proportions of three component null hypotheses and deriving the corresponding mixture distribution of null p-values. Results of the data examples include better-behaved quantile-quantile plots and improved detection of novel mediation relationships on the role of DNA methylation in genetic regulation of gene expression. With increasing interest in mediation by molecular intermediaries such as gene expression and epigenetic markers, the proposed method addresses an unmet methodological challenge.
hdnom Nomograms for High-Dimensional Cox Models
Build nomograms for high-dimensional Cox models, with support for model validation and calibration.
HDoutliers Leland Wilkinson’s Algorithm for Detecting Multidimensional Outliers
An implementation of an algorithm for outlier detection that can handle a) data with a mix of categorical and continuous variables, b) many columns of data, c) many rows of data, d) outliers that mask other outliers, and e) both unidimensional and multidimensional datasets. Unlike ad hoc methods found in many machine learning papers, HDoutliers is based on a distributional model that uses probabilities to determine outliers.
hdpca Principal Component Analysis in High-Dimensional Data
In high-dimensional settings: Estimate the number of distant spikes based on the Generalized Spiked Population (GSP) model. Estimate the population eigenvalues, angles between the sample and population eigenvectors, correlations between the sample and population PC scores, and the asymptotic shrinkage factors. Adjust the shrinkage bias in the predicted PC scores.
hds Hazard Discrimination Summary
Functions for calculating the hazard discrimination summary and its standard errors, as described in Liang and Heagerty (2016) <doi:10.1111/biom.12628>.
healthcareai Tools for Healthcare Machine Learning
A machine learning toolbox tailored to healthcare data. Aids in data cleaning, model development, hyperparameter tuning, and model deployment in a production SQL environment. Algorithms currently supported are Lasso, Random Forest, and Linear Mixed Model.
heatmaply Interactive Heat Maps Using ‘plotly’
Create interactive heatmaps that are usable from the R console, in the ‘RStudio’ viewer pane, in ‘R Markdown’ documents, and in ‘Shiny’ apps. Hover the mouse pointer over a cell to show details or drag a rectangle to zoom. A heatmap is a popular graphical method for visualizing high-dimensional data, in which a table of numbers is encoded as a grid of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by dendrograms. Heatmaps are used in many fields for visualizing observations, correlations, missing values patterns, and more. Interactive heatmaps allow the inspection of specific values by hovering the mouse over a cell, as well as zooming into a region of the heatmap by dragging a rectangle around the relevant area. This work is based on ‘ggplot2’ and the ‘plotly.js’ engine. It produces heatmaps similar to ‘d3heatmap’, with the advantages of speed (‘plotly.js’ can handle larger matrices) and the ability to zoom from the dendrogram panes.
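For example, an interactive correlation heatmap with three coloured clusters in each dendrogram:
    library(heatmaply)
    heatmaply(cor(mtcars), k_row = 3, k_col = 3)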
hedgehog Property-Based Testing
Hedgehog will eat all your bugs. ‘Hedgehog’ is a property-based testing package in the spirit of ‘QuickCheck’. With ‘Hedgehog’, one can test properties of programs against randomly generated input, providing far superior test coverage compared to unit testing. One of the key benefits of ‘Hedgehog’ is integrated shrinking of counterexamples, which allows one to quickly find the cause of bugs by presenting salient examples when incorrect behaviour occurs.
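A small property test in the package's documented style, assuming the gen.c() vector generator and gen.element() sampler:
    library(hedgehog)
    library(testthat)
    test_that("double reversal is the identity",
      forall(gen.c(gen.element(1:100)), function(xs)
        expect_equal(rev(rev(xs)), xs)))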
heemod Markov Models for Health Economic Evaluations
An implementation of the modelling and reporting features described in reference textbook and guidelines (Briggs, Andrew, et al. Decision Modelling for Health Economic Evaluation. Oxford Univ. Press, 2011; Siebert, U. et al. State-Transition Modeling. Medical Decision Making 32, 690-700 (2012).): deterministic and probabilistic sensitivity analysis, heterogeneity analysis, time dependency on state-time and model-time (semi-Markov and non-homogeneous Markov models), etc.
hellno Providing ‘stringsAsFactors=FALSE’ Variants of ‘data.frame()’ and ‘as.data.frame()’
Base R’s default setting of ‘stringsAsFactors’ within ‘data.frame()’ and ‘as.data.frame()’ is supposedly the most complained-about piece of code in the R infrastructure. The ‘hellno’ package provides an explicit solution without changing R itself or having to mess around with options. It solves the problem by providing alternative ‘data.frame()’ and ‘as.data.frame()’ functions that are in fact simple wrappers around base R’s ‘data.frame()’ and ‘as.data.frame()’ with the ‘stringsAsFactors’ option set to ‘HELLNO’ (which in turn equals FALSE) by default.
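A minimal illustration of the intended effect (column name illustrative):
    library(hellno)
    df <- data.frame(x = c("a", "b", "c"))  # hellno's wrapper, not base R's
    class(df$x)                             # "character", not "factor"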
HEMDAG Hierarchical Ensemble Methods for Directed Acyclic Graphs
An implementation of Hierarchical Ensemble Methods for DAGs: ‘HTD-DAG’ (Hierarchical Top Down) and ‘TPR-DAG’ (True Path Rule). ‘HEMDAG’ can be used to enhance the predictions of virtually any flat learning method, by taking into account the hierarchical nature of the classes of a bio-ontology. ‘HEMDAG’ is specifically designed for exploiting the hierarchical relationships of DAG-structured taxonomies, such as the Human Phenotype Ontology (HPO) or the Gene Ontology (GO), but it can also be safely applied to tree-structured taxonomies (such as FunCat), since trees are DAGs. ‘HEMDAG’ scales nicely both in terms of the complexity of the taxonomy and in the cardinality of the examples. (Marco Notaro, Max Schubach, Peter N. Robinson and Giorgio Valentini, Prediction of Human Phenotype Ontology terms by means of Hierarchical Ensemble methods, BMC Bioinformatics 2017).
here A Simpler Way to Find Your Files
Constructs paths to your project’s files. The ‘here()’ function uses reasonable heuristics to find your project’s files, based on the current working directory at the time the package is loaded. Use it as a drop-in replacement for ‘file.path()’; it will always locate files relative to your project root.
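For example (the data file name is illustrative):
    library(here)
    here()                                 # the detected project root
    read.csv(here("data", "survey.csv"))   # builds "<root>/data/survey.csv"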
hesim Health-Economic Simulation Modeling and Decision Analysis
Functionality for developing and analyzing the output of health-economic simulation models. Contains random sampling functions for conducting probabilistic sensitivity analyses (Claxton et al. 2005) <doi:10.1002/hec.985> and individual patient simulations (Brennan et al. 2006) <doi:10.1002/hec.1148>. Individualized cost-effectiveness analysis (Basu and Meltzer 2007, Ioannidis and Garber 2011) <doi:10.1177/0272989X06297393>, <doi:10.1371/journal.pmed.1001058> can be performed on simulation output and used to summarize a probabilistic sensitivity analysis at the subgroup or individual level. Core functions are written in C++ to facilitate computationally intensive modeling.
hetGP Heteroskedastic Gaussian Process Modeling and Design under Replication
Performs Gaussian process regression with heteroskedastic noise following Binois, M., Gramacy, R., Ludkovski, M. (2016) <arXiv:1611.05902>. The input dependent noise is modeled as another Gaussian process. Replicated observations are encouraged as they yield computational savings. Sequential design procedures based on the integrated mean square prediction error and lookahead heuristics are provided, and notably fast update functions when adding new observations.
hetmeta Heterogeneity Measures in Meta-Analysis
Assess the presence of statistical heterogeneity and quantify its impact in the context of meta-analysis. It includes test for heterogeneity as well as other statistical measures (R_b, I^2, R_I).
heuristica Heuristics Including Take the Best and Unit-Weight Linear
Implements various heuristics like Take The Best and unit-weight linear, which do two-alternative choice: which of two objects will have a higher criterion? Also offers functions to assess performance, e.g. percent correct across all row pairs in a data set and finding row pairs where models disagree. New models can be added by implementing a fit and a predict function; see the vignette.
heuristicsmineR Discovery of Process Models with the Heuristics Miner
Provides the heuristics miner algorithm for process discovery as proposed by Weijters et al. (2011) <doi:10.1109/CIDM.2011.5949453>. The algorithm builds a causal net from an event log created with the ‘bupaR’ package. Event logs are a set of ordered sequences of events for which ‘bupaR’ provides the S3 class eventlog(). The discovered causal nets can be visualised as ‘htmlwidgets’ and it is possible to annotate them with the occurrence frequency or processing and waiting time of process activities.
hextri Hexbin Plots with Triangles
Display hexagonally binned scatterplots for multi-class data, using coloured triangles to show class proportions.
hglm Hierarchical Generalized Linear Models
Implemented here are procedures for fitting hierarchical generalized linear models (HGLM). It can be used for linear mixed models and generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects. Fixed effects can also be fitted in the dispersion part of the mean model. As statistical models, HGLMs were initially developed by Lee and Nelder (1996) <https://…/2346105?seq=1>. We provide an implementation (Ronnegard, Alam and Shen 2010) <https://…RJournal_2010-2_Roennegaard~et~al.pdf> following Lee, Nelder and Pawitan (2006) <ISBN: 9781420011340> with algorithms extended for spatial modeling (Alam, Ronnegard and Shen 2015) <https://…/RJ-2015-017.pdf>.
hgm Holonomic Gradient Method and Gradient Descent
The holonomic gradient method (HGM, hgm) gives a way to evaluate normalization constants of unnormalized probability distributions by utilizing holonomic systems of differential or difference equations. The holonomic gradient descent (HGD, hgd) gives a method to find maximal likelihood estimates by utilizing the HGM.
HGSL Heterogeneous Group Square-Root Lasso
Estimation of high-dimensional multi-response regression with heterogeneous noises under Heterogeneous group square-root Lasso penalty. For details see: Ren, Z., Kang, Y., Fan, Y. and Lv, J. (2018)<arXiv:1606.03803>.
hgutils Collection of Utility Functions
A handy collection of utility functions designed to aid in package development, plotting and scientific research. Package development functionalities include, among others, tools for cross-referencing package imports with the description file, analysis of redundant package imports, editing of the description file and the creation of package badges for GitHub. Other functionalities include automatic package installation and loading, plotting points without overlap, creating nice breaks for plots, overview tables and many more handy utility functions.
HHG Heller-Heller-Gorfine Tests of Independence and Equality of Distributions
Heller-Heller-Gorfine (‘HHG’) tests are a set of powerful statistical tests of multivariate k-sample homogeneity and independence. For the univariate case, the package also offers implementations of the ‘MinP DDP’ and ‘MinP ADP’ tests, which are consistent against all continuous alternatives yet are distribution-free, and are thus much faster to apply.
hhi Calculate and Visualize the Herfindahl-Hirschman Index
Based on the aggregated shares retained by individual firms or actors within a market or space, the Herfindahl-Hirschman Index (HHI) measures the level of concentration in the market or space. It is often used as a measure of competition, where 0 equals perfect competition amongst firms or actors and 10,000 equals perfect monopoly. This package allows for intuitive and straightforward computation of the HHI: the user passes the data frame first, followed by the quoted name of the vector (or variable) holding the market shares. The package also includes a plot function for quick visual display of an HHI time series across any measure of time (year, quarter, month, etc.). Suggested citation of the HHI: Rhoades, Stephen A. (1993, ‘The herfindahl-hirschman index.’ Federal Reserve Bulletin 79: 188).
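A quick sketch; with shares in percent, these four hypothetical firms give an HHI of 3000:
    library(hhi)
    shares <- data.frame(firm = LETTERS[1:4], ms = c(40, 30, 20, 10))
    hhi(shares, "ms")   # 40^2 + 30^2 + 20^2 + 10^2 = 3000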
HiDimDA High Dimensional Discriminant Analysis
Performs linear discriminant analysis in high dimensional problems based on reliable covariance estimators for problems with (many) more variables than observations. Includes routines for classifier training, prediction, cross-validation and variable selection.
hierarchicalSets Set Data Visualization Using Hierarchies
Pure set data visualization approaches are often limited in scalability due to the combinatorial explosion of distinct set families as the number of sets under investigation increases. hierarchicalSets applies a set-centric hierarchical clustering of the sets under investigation and uses this hierarchy as a basis for a range of scalable visual representations. hierarchicalSets is especially well suited for collections of sets that describe comparable entities, as it relies on the sets having a meaningful relational structure.
hierband Convex Banding of the Covariance Matrix
Implementation of the convex banding procedure (using a hierarchical group lasso penalty) for covariance estimation that is introduced in Bien, Bunea, Xiao (2015) Convex Banding of the Covariance Matrix. Accepted for publication in JASA.
hiertest Convex Hierarchical Testing of Interactions
Implementation of the convex hierarchical testing (CHT) procedure introduced in Bien, Simon, and Tibshirani (2015) Convex Hierarchical Testing of Interactions. Annals of Applied Statistics. Vol. 9, No. 1, 27-42.
highcharter A Wrapper for the ‘Highcharts’ Library
A wrapper for the ‘Highcharts’ library including shortcut functions to plot R objects. ‘Highcharts’ <http://…/> is a charting library offering numerous chart types with a simple configuration syntax.
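For example, a scatter chart built directly from a data frame:
    library(highcharter)
    hchart(mtcars, "scatter", hcaes(x = wt, y = mpg, group = cyl))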
HighDimOut Outlier Detection Algorithms for High-Dimensional Data
Three high-dimensional outlier detection algorithms and an outlier unification scheme are implemented in this package. The angle-based outlier detection (ABOD) algorithm is based on the work of Kriegel, Schubert, and Zimek [2008]. The subspace outlier detection (SOD) algorithm is based on the work of Kriegel, Kroger, Schubert, and Zimek [2009]. The feature bagging-based outlier detection (FBOD) algorithm is based on the work of Lazarevic and Kumar [2005]. The outlier unification scheme is based on the work of Kriegel, Kroger, Schubert, and Zimek [2011].
highlightHTML Highlight HTML Text and Tables
A tool to highlight specific cells in an HTML table or more generally text from an HTML document. This may be helpful for those using markdown to create reproducible documents. In addition, the ability to compile directly from R markdown files is also possible using the ‘knitr’ package.
highmean Two-Sample Tests for High-Dimensional Mean Vectors
Provides various tests for comparing high-dimensional mean vectors in two groups.
higrad Statistical Inference for Online Learning and Stochastic Approximation via HiGrad
Implements the Hierarchical Incremental GRAdient Descent (HiGrad) algorithm, a first-order algorithm for finding the minimizer of a function in online learning just like stochastic gradient descent (SGD). In addition, this method attaches a confidence interval to assess the uncertainty of its predictions. See Su and Zhu (2018) <arXiv:1802.04876> for details.
hillR Diversity Through Hill Numbers
Calculate taxonomic, functional and phylogenetic diversity measures through Hill Numbers proposed by Chao, Chiu and Jost (2014) <doi:10.1146/annurev-ecolsys-120213-091540>.
HIMA High-Dimensional Mediation Analysis
Estimates and tests high-dimensional mediation effects based on sure independence screening and minimax concave penalty techniques. A joint significance test is used for the mediation effect. Haixiang Zhang, Yinan Zheng, Zhou Zhang, Tao Gao, Brian Joyce, Grace Yoon, Wei Zhang, Joel Schwartz, Allan Just, Elena Colicino, Pantel Vokonas, Lihui Zhao, Jinchi Lv, Andrea Baccarelli, Lifang Hou, Lei Liu (2016) <doi:10.1093/bioinformatics/btw351>.
hipread Read Hierarchical Fixed Width Files
Read hierarchical fixed width files like those commonly used by many census data providers. Also allows for reading of data in chunks, and reading ‘gzipped’ files without storing the full file in memory.
hIRT Hierarchical Item Response Theory Models
Implementation of a class of hierarchical item response theory (IRT) models where both the mean and the variance of latent preferences (ability parameters) can depend on observed covariates. The current implementation includes both the two-parameter latent trait model and the graded response model for ordinal data. Both are fitted via the Expectation-Maximization (EM) algorithm. Asymptotic standard errors are derived from the observed information matrix.
hisse Hidden State Speciation and Extinction
Sets up and executes a HiSSE model (Hidden State Speciation and Extinction) on a phylogeny and character sets to test for hidden shifts in trait dependent rates of diversification. Beaulieu and O’Meara (2016) <doi:10.1093/sysbio/syw022>.
HistDAWass Histogram-Valued Data Analysis
In the framework of Symbolic Data Analysis, a relatively new approach to the statistical analysis of multi-valued data, we consider histogram-valued data, i.e., data described by univariate histograms. The methods and the basic statistics for histogram-valued data are mainly based on the L2 Wasserstein metric between distributions, i.e., a Euclidean metric between quantile functions. The package contains unsupervised classification techniques, least square regression and tools for histogram-valued data and for histogram time series.
histmdl A Most Informative Histogram-Like Model
Using the MDL principle, it is possible to estimate parameters for a histogram-like model. The package contains the implementation of such an estimation method.
HistogramTools Utility Functions for R Histograms
Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.
histry Enhanced Command History Tracking for R Sessions and Dynamic Documents
Automatically tracks and makes programmatically available code evaluation history in R sessions and dynamic documents.
hit Hierarchical Inference Testing
Hierarchical inference testing (HIT) for linear models with correlated covariates applicable to high-dimensional settings.
hkclustering Ensemble Clustering using K Means and Hierarchical Clustering
Implements an ensemble algorithm for clustering combining a k-means and a hierarchical clustering approach.
hkevp Spatial Extreme Value Analysis with the Hierarchical Model of Reich and Shaby (2012)
Simulation and fitting procedures for a particular hierarchical max-stable model: the HKEVP of Reich and Shaby (2012) <DOI:10.1214/12-AOAS591>. Spatial prediction and marginal distribution extrapolation are also available, which allows a risk estimation at an ungauged site.
HKprocess Hurst-Kolmogorov Process
Methods to make inference about the Hurst-Kolmogorov and the AR(1) process.
HLSM Hierarchical latent space network model (HLSM)
Hierarchical latent space network model for ensembles of networks.
HMB Hierarchical Model-Based Estimation Approach
For estimation of a variable of interest using two sources of auxiliary information available in a nested structure. For reference see Saarela et al. (2016)<doi:10.1007/s13595-016-0590-1> and Saarela et al. (2018) <doi:10.3390/rs10111832>.
hmi Hierarchical Multiple Imputation
Runs single-level and multilevel imputation models. The user just has to pass the data to the main function and, optionally, their analysis model. The package then translates this analysis model into commands to impute the data accordingly with functions from ‘mice’, ‘MCMCglmm’ or routines built for this package.
HMM Hidden Markov Models
Easy-to-use library to set up, apply and make inference with discrete-time and discrete-space hidden Markov models.
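A minimal sketch: define a two-state model and decode the most likely state path (states, symbols and probabilities are illustrative):
    library(HMM)
    hmm <- initHMM(States = c("Rain", "Dry"),
                   Symbols = c("umbrella", "none"),
                   startProbs = c(0.5, 0.5),
                   transProbs = matrix(c(0.7, 0.3,
                                         0.3, 0.7), 2, byrow = TRUE),
                   emissionProbs = matrix(c(0.9, 0.1,
                                            0.2, 0.8), 2, byrow = TRUE))
    viterbi(hmm, c("umbrella", "umbrella", "none"))   # most likely hidden states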
hmmm hierarchical multinomial marginal models
Functions for specifying and fitting marginal models for contingency tables proposed by Bergsma and Rudas (2002) here called hierarchical multinomial marginal models (hmmm) and their extensions presented by Bartolucci et al. (2007); multinomial Poisson homogeneous (mph) models and homogeneous linear predictor (hlp) models for contingency tables proposed by Lang (2004) and (2005); hidden Markov models where the distribution of the observed variables is described by a marginal model. Inequality constraints on the parameters are allowed and can be tested.
HMMpa Analysing Accelerometer Data Using Hidden Markov Models
Analysing time-series accelerometer data to quantify length and intensity of physical activity using hidden Markov models. It also contains the traditional cut-off point method. Witowski V, Foraita R, Pitsiladis Y, Pigeot I, Wirsik N (2014)<doi:10.1371/journal.pone.0114089>.
hmstimer hms Based Timer
Tracks elapsed clock time using a hms() scalar (inherits from difftime() with seconds as the unit).
HMVD Group Association Test using a Hidden Markov Model
Performs an association test between a group of variables and the outcome.
hNMF Hierarchical Non-Negative Matrix Factorization
Hierarchical non-negative matrix factorization for tumor segmentation based on multi-parametric MRI data.
hoa Higher Order Likelihood Inference
Performs likelihood-based inference for a wide range of regression models. Provides higher-order approximations for inference based on extensions of saddlepoint type arguments as discussed in the book Applied Asymptotics: Case Studies in Small-Sample Statistics by Brazzale, Davison, and Reid (2007).
hoardr Manage Cached Files
Suite of tools for managing cached files, targeting use in other R packages. Uses ‘rappdirs’ for cross-platform paths. Provides utilities to manage cache directories, including targeting files by path or by key; cached directories can be compressed and uncompressed easily to save disk space.
hogsvdR Higher-Order Generalized Singular Value Decomposition
Implementation of higher-order generalized singular value decomposition (HO GSVD). Based on Ponnapalli, Saunders, et al. (2011) <doi:10.1371/journal.pone.0028072>.
holodeck A Tidy Interface for Simulating Multivariate Data
Provides pipe-friendly (%>%) functions to create simulated multivariate data sets with groups of variables with different degrees of variance, covariance, and effect size.
Homeric Doughnut Plots
A simple implementation of doughnut plots – pie charts with a blank center. The package is named after Homer Simpson – arguably the best-known lover of doughnuts.
hommel Methods for Closed Testing with Simes Inequality, in Particular Hommel’s Method
Provides methods for closed testing using Simes local tests. In particular, calculates adjusted p-values for Hommel’s multiple testing method, and provides lower confidence bounds for true discovery proportions. A robust but more conservative variant of the closed testing procedure that does not require the assumption of Simes inequality is also implemented.
hopit Hierarchical Ordered Probit Models with Application to Reporting Heterogeneity
Self-reported health, happiness, attitudes, and other statuses or perceptions are often the subject of biases that may come from different sources. For example, the evaluation of own health may depend on previous medical diagnoses, functional status, and symptoms and signs of illness, as well as life-style behaviors including contextual social, gender, age-specific, linguistic and other cultural factors (Jylha 2009 <doi:10.1016/j.socscimed.2009.05.013>; Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). This package offers versatile functions for analyzing different self-reported ordinal variables and helping to estimate their biases. Specifically, the package provides the function to fit a generalized ordered probit model that regresses original self-reported status measures on two sets of independent variables (King et al. 2004 <doi:10.1017/S0003055403000881>; Jurges 2007 <doi:10.1002/hec.1134>; Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). In contrast to standard ordered probit models, generalized ordered probit models relax the assumption that individuals use a common scale when rating their own statuses, and thus allow for distinguishing between the status (e.g., health) and reporting differences based on other individual characteristics. In other words, the model accounts for heterogeneity in reporting behaviors. The first set of variables (e.g., health variables) included in the regression are individual statuses and characteristics that are directly related to the self-reported variable. In case of self-reported health, these could be chronic conditions, mobility level, difficulties with daily activities, performance on grip strength tests, anthropometric measures, and lifestyle behaviors. The second set of independent variables (threshold variables) is used to model cut-points between adjacent self-reported response categories as functions of individual characteristics, such as gender, age group, education, and country (Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). The model helps adjust for these specific socio-demographic and cultural differences in how the continuous latent health is projected onto the ordinal self-rated measure. The fitted model can be used to calculate an individual latent status variable that serves as a proxy of the true status. In case of self-reported health, the predicted latent health variable can be standardized to a health index, which varies from 0 representing the (model-based) worst health state to 1 representing the (model-based) best health in the sample. The standardized latent coefficients (disability weights for the case of self-rated health) provide information about the individual impact of the specific latent (e.g., health) variables on the latent (e.g., health) construct. For example, they indicate the extent to which the latent health index is reduced by the presence of Parkinson’s disease, poor mobility, and other specific health measures (Jurges 2007 <doi:10.1002/hec.1134>; Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). The latent index can in turn be used to reclassify the categorical status measure that has been adjusted for inter-individual differences in reporting behavior. Two methods for doing so are available, one which uses model estimated cut-points, and a second which reclassifies responses according to the percentiles of the original categorical response distribution (Jurges 2007 <doi:10.1002/hec.1134>; Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>).
horizon Horizon Search Algorithm
Calculates horizon elevation angle and sky view factor from a digital terrain model.
hornpa Horn’s (1965) Test to Determine the Number of Components/Factors
A stand-alone function that generates a user-specified number of random datasets and computes eigenvalues from them (i.e., implements Horn’s parallel analysis). Users then compare the resulting eigenvalues (the mean or a specified percentile) from the random datasets (i.e., eigenvalues resulting from noise) to the eigenvalues generated from the user’s data. Can be used for both principal components analysis (PCA) and common/exploratory factor analysis (EFA). The output table shows how large eigenvalues can be as a result of merely using randomly generated datasets. If the i-th eigenvalue from the actual data is larger than the chosen percentile of the i-th eigenvalue from the random data, empirical support is provided to retain that factor/component. Horn, J. (1965). A rationale and test for the number of factors in factor analysis.
horserule Flexible Non-Linear Regression with the HorseRule Algorithm
Implementation of the HorseRule model, a flexible tree-based Bayesian regression method for linear and nonlinear regression and classification, described in Nalenz & Villani (2017) <arXiv:1702.05008>.
horseshoe Implementation of the Horseshoe Prior
Contains functions for applying the horseshoe prior to high-dimensional linear regression, yielding the posterior mean and credible intervals, amongst other things. The key parameter tau can be equipped with a prior or estimated via maximum marginal likelihood estimation (MMLE). The main function, horseshoe, is for linear regression. In addition, there are functions specifically for the sparse normal means problem, allowing for faster computation of, for example, the posterior mean and posterior variance. Finally, there is a function available to perform variable selection, using either a form of thresholding or credible intervals.
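A hedged sketch on simulated sparse data, assuming the method.tau and method.sigma arguments and the BetaHat element of the returned fit:
    library(horseshoe)
    n <- 100; p <- 300
    X <- matrix(rnorm(n * p), n, p)
    y <- X[, 1:5] %*% rep(3, 5) + rnorm(n)
    fit <- horseshoe(y, X, method.tau = "halfCauchy", method.sigma = "Jeffreys")
    head(fit$BetaHat)   # posterior means of the regression coefficients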
hot.deck Multiple Hot-deck Imputation
Performs multiple hot-deck imputation of categorical and continuous variables in a data frame.
HotDeckImputation Hot Deck Imputation Methods for Missing Data
This package provides hot deck imputation methods to resolve missing data.
hotspot Software Hotspot Analysis
Contains data for software hotspot analysis, along with a function performing the analysis itself.
hqreg Regularization Paths for Huber Loss Regression and Quantile Regression Penalized by Lasso or Elastic-Net
Efficient algorithms for fitting entire regularization paths for Huber loss regression and quantile regression penalized by lasso or elastic-net.
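A brief sketch with heavy-tailed noise; cv.hqreg() is assumed to return a cross-validated lambda.min:
    library(hqreg)
    X <- matrix(rnorm(100 * 20), 100, 20)
    y <- X[, 1] - 2 * X[, 2] + rt(100, df = 3)  # heavy-tailed errors
    fit <- hqreg(X, y, method = "huber")        # full regularization path
    cv  <- cv.hqreg(X, y, method = "huber")
    cv$lambda.min                               # lambda minimizing CV error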
hR An HR Analytics Toolkit
Manipulate and visualize people data in meaningful and common ways. This package is meant for Human Resources analytics (often referred to as talent analytics or people analytics).
hrbrthemes Additional Themes, Theme Components and Utilities for ‘ggplot2’
A compilation of extra ‘ggplot2’ themes, scales and utilities, including a spell-check function for plot label fields and an overall emphasis on typography. A copy of the ‘Google’ font ‘Roboto Condensed’ <https://…/> is also included to support one of the typography-oriented themes.
hrIPW Hazard Ratio Estimation using Cox Model Weighted by the Estimated Propensity Score
Estimates the log hazard ratio associated with a binary exposure using a Cox PH model weighted by the propensity score. The propensity model is estimated using a simple logistic regression. Variance estimation takes into account the propensity score estimation step with the method proposed by Hajage et al. (2018) <doi:10.1002/bimj.201700330>. Both the average treatment effect on the overall (ATE) and on the treated (ATT) population can be estimated. For the ATE estimation, both unstabilized and stabilized weights can be used.
HRW Datasets, Functions and Scripts for Semiparametric Regression Supporting Harezlak, Ruppert & Wand (2018)
The book ‘Semiparametric Regression with R’ by J. Harezlak, D. Ruppert & M.P. Wand (2018, Springer; ISBN: 978-1-4939-8851-8) makes use of datasets and scripts to explain semiparametric regression concepts. Each of the book’s scripts are contained in this package as well as datasets that are not within other R packages. Functions that aid semiparametric regression analysis are also included.
HS Homogenous Segmentation for Spatial Lines Data
Methods of homogenous segmentation for spatial lines data, such as pavement performance indicators and traffic volumes. A moving coefficient of variation method is available for homogenous segmentation.
HSAR Hierarchical Spatial Autoregressive Model (HSAR)
A library of the Hierarchical Spatial Autoregressive Model (HSAR), based on a Bayesian Markov Chain Monte Carlo (MCMC) algorithm.
hsdar Manage, Analyse and Simulate Hyperspectral Data
Transformation of reflectance spectra, calculation of vegetation indices and red edge parameters, spectral resampling for hyperspectral remote sensing, simulation of reflectance and transmittance using the leaf reflectance model PROSPECT and the canopy reflectance model PROSAIL.
HSDiC Homogeneity and Sparsity Detection Incorporating Prior Constraint Information
We explore sparsity and homogeneity of regression coefficients incorporating prior constraint information. A general pairwise fusion approach is proposed to deal with sparsity and homogeneity detection when combining prior convex constraints. We develop a modified alternating direction method of multipliers (ADMM) algorithm to obtain the estimators.
htdp Horizontal Time Dependent Positioning
Provides bindings to the National Geodetic Survey (NGS) Horizontal Time Dependent Positioning (HTDP) utility, v3.2.5, written by Richard Snay, Chris Pearson, and Jarir Saleh of NGS. HTDP is a utility that allows users to transform positional coordinates across time and between spatial reference frames. See <https://…/Htdp.shtml> for more information.
htm2txt Convert Html into Text
Strips the tags from an HTML document and extracts the text.
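For example, assuming the main function shares the package name:
    library(htm2txt)
    htm2txt("<p>Plain <b>text</b>, please.</p>")   # "Plain text, please."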
htmltab Assemble Data Frames from HTML Tables
htmltab is a package for extracting structured information from HTML tables. It is similar to readHTMLTable() of the XML package but provides two major advantages. First, the package automatically expands row and column spans in the header and body cells. Second, users are given more control over the identification of header and body rows which will end up in the R table. Additionally, the function preprocesses table code, removes unneeded parts and so helps to alleviate the need for tedious post-processing.
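Typical usage (the URL is illustrative):
    library(htmltab)
    url <- "https://en.wikipedia.org/wiki/World_population"
    tab <- htmltab(doc = url, which = 1)   # first table, spans expanded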
htmlTable Advanced Tables for Markdown/HTML
A package for creating tables with state of the art layout containing row spanners, column spanners, table spanners, zebra striping, and more. While allowing advanced layout the underlying CSS structure is simple in order to maximize compatibility with MS Word/LibreOffice. The package also contains a few text formatting functions that help outputting text compatible with HTML/LaTeX.
htmltidy Clean Up or Pretty Print Gnarly HTML and XHTML
HTML documents can be beautiful and pristine. They can also be wretched, evil, malformed demon-spawn. Now, you can tidy up that HTML and XHTML before processing it with your favorite angle-bracket crunching tools, going beyond the limited tidying that ‘libxml2’ affords in the ‘XML’ and ‘xml2’ packages and taming even the ugliest HTML code generated by the likes of Google Docs and Microsoft Word. It’s also possible to use the functions provided to format or ‘pretty print’ HTML content as it is being tidied.
htmlwidgets HTML Widgets for R
A framework for creating HTML widgets that render in various contexts including the R console, R Markdown documents, and Shiny web applications.
htree Historical Tree Ensembles for Longitudinal Data
Historical regression trees are an extension of standard trees, producing a non-parametric estimate of how the response depends on all of its prior realizations as well as that of any time-varying predictor variables. The method applies equally to regularly as well as irregularly sampled data. The package implements random forest and boosting ensembles based on historical regression trees, suitable for longitudinal data.
hts Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped time series.
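A compact sketch: build a two-level hierarchy over four simulated bottom-level series and produce reconciled forecasts:
    library(hts)
    bottom <- ts(matrix(rnorm(80, mean = 10), ncol = 4), frequency = 4)
    y <- hts(bottom, nodes = list(2, c(2, 2)))   # 2 groups, 2 series each
    fc <- forecast(y, h = 8, method = "comb")    # optimal-combination reconciliation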
httpcache Query Cache for HTTP Clients
In order to improve performance for HTTP API clients, ‘httpcache’ provides simple tools for caching and invalidating cache. It includes the HTTP verb functions GET, PUT, PATCH, POST, and DELETE, which are drop-in replacements for those in the ‘httr’ package. These functions are cache-aware and provide default settings for cache invalidation suitable for RESTful APIs; the package also enables custom cache-management strategies. Finally, ‘httpcache’ includes a basic logging framework to facilitate the measurement of HTTP request time and cache performance.
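For example (the URL is illustrative):
    library(httpcache)
    r1 <- GET("https://httpbin.org/get")   # network request, response cached
    r2 <- GET("https://httpbin.org/get")   # identical request served from cache
    clearCache()                           # drop all cached responses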
httpcode ‘HTTP’ Status Code Helper
Find and explain the meaning of ‘HTTP’ status codes. Functions are included for searching for codes by full or partial number or by message, and for getting appropriate dog and cat images for many status codes.
httping ‘Ping’ ‘URLs’ to Time ‘Requests’
A suite of functions to ping ‘URLs’ and to time ‘HTTP’ ‘requests’. Designed to work with ‘httr’.
httptest A Test Environment for HTTP Requests
Testing code and packages that communicate with remote servers can be painful. Dealing with authentication, bootstrapping server state, cleaning up objects that may get created during the test run, network flakiness, and other complications can make testing seem too costly to bother with. But it doesn’t need to be that hard. This package enables one to test all of the logic on the R side of the API in your package without requiring access to the remote service. Importantly, it provides three test contexts that mock the network connection in different ways, and it offers additional expectations to assert that HTTP requests were–or were not–made. Using these tools, one can test that code is making the intended requests and that it handles the expected responses correctly, all without depending on a connection to a remote API.
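A small sketch of the ‘no network allowed’ context (the API URL is illustrative):
    library(httptest)
    without_internet({
      expect_GET(httr::GET("https://api.example.com/users"),
                 "https://api.example.com/users")   # asserts the request; no network needed
    })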
humaniformat A Parser for Human Names
Human names are complicated and nonstandard things. Humaniformat, which is based on Anthony Ettinger’s ‘humanparser’ project (https://…/humanparser), provides functions for parsing human names, making a best-guess attempt to distinguish sub-components such as prefixes, suffixes, middle names and salutations.
humanize Create Values for Human Consumption
An almost direct port of the ‘python’ ‘humanize’ package <https://…/humanize>. This package contains utilities to convert values into human readable forms.
humanleague Synthetic Population Generator
Generates high-entropy integer synthetic populations from marginal and (optionally) seed data using quasirandom sampling, in arbitrary dimensionality (Smith, Lovelace and Birkin (2017) <doi:10.18564/jasss.3550>). The package also provides an implementation of the Iterative Proportional Fitting (IPF) algorithm (Zaloznik (2011) <doi:10.13140/2.1.2480.9923>).
hunspell Hunspell Spell Checker
A spell checker and morphological analyzer library designed for languages with rich morphology and complex word compounding or character encoding.
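For example:
    library(hunspell)
    bad <- hunspell("Spelngi is hrad")[[1]]  # misspelled tokens in the text
    hunspell_suggest(bad)                    # candidate corrections for each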
hurdlr Zero-Inflated and Hurdle Modelling Using Bayesian Inference
When considering count data, it is often the case that many more zero counts are observed than would be expected under a given distribution. It is well established that such data can be reliably modelled using zero-inflated or hurdle distributions, both of which may be applied using the functions in this package. Bayesian analysis methods are used to best model problematic count data that cannot be fit to any typical distribution. The package functions are flexible and versatile, can be applied to varying count distributions and to parameter estimation with or without explanatory variable information, and allow for multiple hurdles, since it is also not uncommon for count data to have an abundance of large-number observations which would be considered outliers of the typical distribution. In lieu of throwing out data or misspecifying the typical distribution, these extreme observations can be assigned to a second, extreme distribution. With the functions of this package, such a two-hurdle model may be easily specified in order to best manage data that is both zero-inflated and over-dispersed.
hutilscpp Miscellaneous Functions in C++
Provides utility functions that are simple and frequently used, but may require higher performance than what can be obtained from base R. Incidentally provides support for ‘reverse geocoding’, such as matching a point with its nearest neighbour in another array. Used as a complement to package ‘hutils’ by sacrificing compilation or installation time for higher running speeds. The name is a portmanteau of the author and ‘Rcpp’.
huxtable Simply Create LaTeX and HTML Tables
Creates HTML and LaTeX tables. Provides similar functionality to ‘xtable’, but does more, with a simpler interface. Allows export to Microsoft Word or PowerPoint using the ‘officer’ package. Includes a ‘huxreg’ function for creation of regression tables, and ‘quick_*’ one-liner commands to print data as Word, PDF or HTML.
hwwntest Tests of White Noise using Wavelets
Provides methods to test whether a time series is consistent with white noise.
HybridFS A Hybrid Filter-Wrapper Feature Selection Method
A hybrid method of feature selection which combines both filter and wrapper methods. The first level involves feature reduction based on some of the important filter methods, while the second level involves feature subset selection as in a wrapper method. Experimental results show that this hybrid feature selection algorithm simplifies the feature selection process effectively and obtains higher classification accuracy, reduced processing time and improved data handling capacity compared with other feature selection algorithms.
hybridModels Stochastic Hybrid Models in Dynamic Networks
Simulates stochastic hybrid models for transmission of infectious diseases in dynamic networks.
hydra Hyperbolic Embedding
Calculate an optimal embedding of a set of data points into low-dimensional hyperbolic space. This uses the strain-minimizing hyperbolic embedding of Keller-Ressel and Nargang (2019), see <arXiv:1903.08977>.
hyper.fit Generic N-Dimensional Hyperplane Fitting with Heteroscedastic Covariant Errors and Intrinsic Scatter
Includes two main high level codes for hyperplane fitting (hyper.fit) and visualising (hyper.plot2d / hyper.plot3d). In simple terms this allows the user to produce robust 1D linear fits for 2D x vs y type data, and robust 2D plane fits to 3D x vs y vs z type data. This hyperplane fitting works generically for any N-1 hyperplane model being fit to a N dimension dataset. All fits include intrinsic scatter in the generative model orthogonal to the hyperplane.
hyper2 The Hyperdirichlet Distribution, Mark 2
A suite of routines for the hyperdirichlet distribution; supersedes the hyperdirichlet package for most purposes.
hypercube Organizing Data in a Hypercube
Provides methods for organizing data in a hypercube (i.e. a multi-dimensional cube). Cubes are generated from molten data frames. Each cube can be manipulated with five operations: rotation (changeDimensionOrder()), dicing and slicing (add.selection(), remove.selection()), drilling down (add.aggregation()), and rolling up (remove.aggregation()).
hypergate Machine Learning of Hyperrectangular Gating Strategies for High-Dimensional Cytometry
Given a high-dimensional dataset that typically represents a cytometry dataset, and a subset of the datapoints, this algorithm outputs a hyperrectangle so that datapoints within the hyperrectangle best correspond to the specified subset. In essence, this allows the conversion of clustering algorithms’ outputs to gating strategies. For more details see Etienne Becht, Yannick Simoni, Elaine Coustan-Smith, Maximilien Evrard, Yang Cheng, Lai Guan Ng, Dario Campana and Evan Newell (2018) <doi:10.1101/278796>.
hypersampleplan Attribute Sampling Plan with Exact Hypergeometric Probabilities using Chebyshev Polynomials
Implements an algorithm for efficient and exact calculation of hypergeometric and binomial probabilities using Chebyshev polynomials, whereas other algorithms use an approximation when N is large. A useful application considered in this package is the construction of attribute sampling plans, an important field of statistical quality control. The quantile and the confidence limit for the attribute sampling plan are also implemented. The hypergeometric distribution can be represented in terms of Chebyshev polynomials; this representation is particularly useful in the calculation of exact values of hypergeometric variables.
hyperSMURF Hyper-Ensemble Smote Undersampled Random Forests
Machine learning supervised method to learn rare genomic features in imbalanced genetic data sets. This method can be also applied to classify or rank examples characterized by a high imbalance between the minority and majority class. hyperSMURF adopts a hyper-ensemble (ensemble of ensembles) approach, undersampling of the majority class and oversampling of the minority class to learn highly imbalanced data.
hyphenatr Tools to Hyphenate Strings Using the ‘Hunspell’ Hyphenation Library
Identifying hyphenation points in strings can be useful for both text processing and display functions. The ‘Hunspell’ hyphenation library <https://…/hyphen> provides tools to perform hyphenation using custom language rule dictionaries. Many hyphenation rules dictionaries are included. Words can be hyphenated directly or split into hyphenated component strings for further processing.
hypoparsr Multi-Hypothesis CSV Parser
A Multi-Hypothesis CSV Parser. Stresses your computer, not you.
HyRiM Multicriteria Risk Management using Zero-Sum Games with Vector-Valued Payoffs that are Probability Distributions
Construction and analysis of multivalued zero-sum matrix games over the abstract space of probability distributions, which describe the losses in each scenario of defense vs. attack action. The distributions can be compiled directly from expert opinions or other empirical data (insofar available). The package implements the methods put forth in the EU project HyRiM (Hybrid Risk Management for Utility Networks), FP7 EU Project Number 608090.
HYRISK Hybrid Methods for Addressing Uncertainty in RISK Assessments
Methods for addressing uncertainty in risk assessments using hybrid representations of uncertainty (probability distributions, fuzzy numbers, intervals, probability distributions with imprecise parameters). The uncertainty propagation procedure combines random sampling using Monte Carlo method with fuzzy interval analysis of Baudrit et al. (2007) <doi:10.1109/TFUZZ.2006.876720>. The sensitivity analysis is based on the pinching method of Ferson and Tucker (2006) <doi:10.1016/j.ress.2005.11.052>.

I

IAbin Plotting N-T Plane for Decision on Performing an Interim Analysis
In randomized-controlled trials, interim analyses are often planned for possible early trial termination to claim superiority or futility of a new therapy. Blinded data also carry information about the potential treatment difference between the groups. We developed a blinded data monitoring tool that enables investigators to predict whether an unblinded interim analysis would support early termination of the trial. Investigators may skip some of the planned interim analyses if an early termination is unlikely. This tool provides reference information about N, the sample size at interim analysis, and T, the total number of responders at interim analysis, for the decision on performing an interim analysis.
iadf Analysis of Intra Annual Density Fluctuations
Calculate false ring proportions from data frames of intra annual density fluctuations.
iaQCA The Irvine Robustness Assessment for Qualitative Comparative Analysis
Test the robustness of the QCA method, or a user’s QCA solutions, to randomness. iaQCA is also packaged with the irQCA function, which provides recommendations for improving QCA solutions to reach typical significance levels.
IATScore Scoring Algorithm for the Implicit Association Test (IAT)
This minimalist package is designed to quickly score raw data outputted from an Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) <doi:10.1037/0022-3514.74.6.1464>. IAT scores are calculated as specified by Greenwald, Nosek, and Banaji (2003) <doi:10.1037/0022-3514.85.2.197>. Outputted values can be interpreted as effect sizes. The input function consists of three arguments. First, indicate the name of the dataset to be analyzed. This is the only required input. Second, indicate the number of trials in your entire IAT (the default is set to 220, which is typical for most IATs). Last, indicate whether congruent trials (e.g., flowers and pleasant) or incongruent trials (e.g., guns and pleasant) were presented first for this participant (the default is set to congruent). The script will tell you how long it took to run the code, the effect size for the participant, and whether that participant should be excluded based on the criteria outlined by Greenwald et al. (2003). Data files should consist of six columns organized in order as follows: Block (0-6), trial (0-19 for training blocks, 0-39 for test blocks), category (dependent on your IAT), the type of item within that category (dependent on your IAT), a dummy variable indicating whether the participant was correct or incorrect on that trial (0=correct, 1=incorrect), and the participant’s reaction time (in milliseconds). Three sample datasets are included in this package (labeled ‘IAT’, ‘TooFastIAT’, and ‘BriefIAT’) to practice with.
IATscores Implicit Association Test Scores Using Robust Statistics
Compute several variations of the Implicit Association Test (IAT) scores, including the D scores (Greenwald, Nosek, Banaji, 2003) and the new scores that were developed using robust statistics (Richetin, Costantini, Perugini, and Schonbrodt, 2015).
IBCF.MTME Item Based Collaborative Filtering for Multi-Trait and Multi-Environment Data
Implements the item-based collaborative filtering (IBCF) method for continuous phenotypes in the context of plant breeding, where data are collected for various traits studied in various environments, as proposed by Montesinos-López et al. (2017) <doi:10.1534/g3.117.300309>.
ibmcraftr Toolkits to Develop Individual-Based Models in Infectious Disease
Provides a generic set of tools for initializing a synthetic population with each individual in a specific disease state, and for making transitions between those disease states according to the rates calculated at each timestep. Additional functions will follow for changing attributes on demographics, health beliefs and movement.
iBreakDown Model Agnostic Instance Level Variable Attributions
Model agnostic tool for decomposition of predictions from black boxes. Supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models. It is an extension of the ‘breakDown’ package (Staniak and Biecek 2018) <doi:10.32614/RJ-2018-072>, with new and faster strategies for orderings. It supports interactions in explanations and has interactive visuals (implemented with the ‘D3.js’ library). The methodology behind it is described in the ‘iBreakDown’ article (Gosiewska and Biecek 2019) <arXiv:1903.11420>. This package is a part of the ‘DrWhy.AI’ universe (Biecek 2018) <arXiv:1806.08915>.
ibs Integral of B-Spline Functions
Calculate B-spline basis functions with a given set of knots and order, or a B-spline function with a given set of knots and order and set of de Boor points (coefficients), or the integral of a B-spline function.
iBST Improper Bagging Survival Tree
Fit a bagging survival tree on a mixture of population (susceptible and nonsusceptible) using either a pseudo R2 criterion or an adjusted Logrank criterion. The predictor is evaluated using the Out Of Bag Integrated Brier Score (IBS) and several scores of importance are computed for variable selection. The thresholds values for variable selection are computed using a nonparametric permutation test.
ica Independent Component Analysis
Independent Component Analysis (ICA) using various algorithms: FastICA, Information-Maximization (Infomax), and Joint Approximate Diagonalization of Eigenmatrices (JADE).
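A minimal FastICA sketch on two simulated mixed signals, assuming the S component of the returned fit holds the source estimates:
    library(ica)
    S <- cbind(sin(seq(0, 20, length.out = 500)), runif(500))  # latent sources
    X <- S %*% matrix(c(1, 2, 2, 1), 2, 2)                     # observed mixtures
    fit <- icafast(X, nc = 2)   # recover 2 independent components
    str(fit$S)                  # estimated source signals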
ICAFF Imperialist Competitive Algorithm
The Imperialist Competitive Algorithm (ICA) <http://…/Imperialist_competitive_algorithm> is a computational method used to solve optimization problems of various types; it is a mathematical model and computer simulation of human social evolution. The package provides the minimum value of the cost function and the best values of the optimization variables via the Imperialist Competitive Algorithm. Users can easily define their own objective function depending on the problem at hand. This version has been successfully applied to solve optimization problems for continuous functions.
ical iCalendar’ Parsing
A simple wrapper around the ‘ical.js’ library executing ‘Javascript’ code via ‘V8’ (the ‘Javascript’ engine driving the ‘Chrome’ browser and ‘Node.js’ and accessible via the ‘V8’ R package). This package enables users to parse ‘iCalendar’ files (‘.ics’, ‘.ifb’, ‘.iCal’, ‘.iFBf’) into lists and ‘data.frames’ to ultimately do statistics on events, meetings, schedules, birthdays, and the like.
ICAOD Imperialist Competitive Algorithm for Optimal Designs
Finding locally D-optimal, minimax D-optimal, standardized maximin D-optimal, optim-on-the-average and multiple objective optimal designs for nonlinear models. Different Fisher information matrices can also be set by user. There are also useful functions for verifying the optimality of the designs with respect to different criteria by equivalence theorem. ICA is a meta-heuristic evolutionary algorithm inspired from the socio-political process of humans. See Masoudi et al. (2016) <doi:10.1016/j.csda.2016.06.014>.
icarus Calibrates and Reweights Units in Samples
Provides user-friendly tools for calibration in survey sampling. The package is production-oriented, and its interface is inspired by the popular SAS macro ‘Calmar’, so that ‘Calmar’ users can quickly get used to ‘icarus’. In addition to calibration (with linear, raking and logit methods), ‘icarus’ features functions for calibration on tight bounds and penalized calibration.
ICBayes Bayesian Semiparametric Models for Interval-Censored Data
Contains functions to fit Bayesian semiparametric regression survival models (proportional hazards model, proportional odds model, and probit model) to interval-censored time-to-event data.
ICC.Sample.Size Calculation of Sample Size and Power for ICC
Provides functions to calculate the requisite sample size for studies where the ICC is the primary outcome. Can also be used for calculation of power. In both cases it allows the user to test the impact of changing input variables by calculating the outcome for several different values of input variables. Based on the work of Zou: Zou, G. Y. (2012). Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. Statistics in Medicine, 31(29), 3972-3981.
ICCbin Facilitates Clustered Binary Data Generation, and Estimation of Intracluster Correlation Coefficient (ICC) for Binary Data
Assists in generating binary clustered data, estimates the intracluster correlation coefficient (ICC) for binary responses using 14 different methods, and provides 4 different types of confidence intervals.
ICcforest An Ensemble Method for Interval-Censored Survival Data
Implements the conditional inference forest approach to modeling interval-censored survival data. It also provides functions to tune the parameters and evaluate the model fit. See Yao et al. (2019) <arXiv:1901.04599>.
icdGLM EM by the Method of Weights for Incomplete Categorical Data in Generalized Linear Models
Provides an estimator for generalized linear models with incomplete data for discrete covariates. The estimation is based on the EM algorithm by the method of weights by Ibrahim (1990) <DOI:10.2307/2290013>.
ICGOR Fit Generalized Odds Rate Hazards Model with Interval Censored Data
The Generalized Odds Rate Hazards (GORH) model is a flexible model for fitting survival data, including the Proportional Hazards (PH) model and the Proportional Odds (PO) model as special cases. This package fits the GORH model with interval-censored data.
iClick A Button-Based GUI for Financial and Economic Data Analysis
A GUI designed to support the analysis of financial-economic time series data.
icmm Empirical Bayes Variable Selection via ICM/M Algorithm
Carries out empirical Bayes variable selection via the ICM/M algorithm. The basic problem is to fit a high-dimensional regression in which most coefficients are assumed to be zero. This package allows incorporating the Ising prior to capture the structure of predictors in the modeling process. The current version can handle normal, binary logistic, and Cox’s regression (Pungpapong et al. (2015) <doi:10.1214/15-EJS1034>, Pungpapong et al. (2017) <arXiv:1707.08298>).
icosa Global Triangular and Penta-Hexagonal Grids Based on Tessellated Icosahedra
Employs triangular tessellation to refine icosahedra defined in 3D space. The procedures can be set to provide a grid with a custom resolution. Both the primary triangular and their inverted penta-hexagonal grids are available for implementation. Additional functions are provided to position points (latitude-longitude data) on the grids, to allow 2D and 3D plotting, and to use raster data and shapefiles.
icr Compute Krippendorff’s Alpha
Provides functions to compute and plot Krippendorff’s inter-coder reliability coefficient alpha and bootstrapped uncertainty estimates (Krippendorff 2004, ISBN:0761915443).
ICRanks Simultaneous Confidence Intervals for Ranks
Algorithms to construct confidence intervals for the ranks of centers mu_1,…,mu_n based on an independent Gaussian sample using multiple testing techniques.
icRSF A Modified Random Survival Forest Algorithm
Implements a modification to the Random Survival Forests algorithm for obtaining variable importance in high-dimensional datasets. The proposed algorithm is appropriate for settings in which a silent event is observed through sequentially administered, error-prone self-reports or laboratory-based diagnostic tests; it incorporates a formal likelihood framework that accommodates such data. The original Random Survival Forests algorithm is modified by the introduction of a new splitting criterion based on a likelihood ratio test statistic.
ICS Tools for Exploring Multivariate Data via ICS/ICA
Implementation of Tyler, Critchley, Duembgen, Oja, Sirkia and Eriksson’s method of two different scatter matrices to obtain an invariant coordinate system or independent components, depending on the underlying assumptions.
ICSOutlier Outlier Detection Using Invariant Coordinate Selection
Multivariate outlier detection is performed using invariant coordinates where the package offers different methods to choose the appropriate components.
ICSShiny ICS via a Shiny Application
Performs Invariant Coordinate Selection (ICS) (Tyler, Critchley, Duembgen and Oja (2009) <doi:10.1111/j.1467-9868.2009.00706.x>) and especially ICS outlier identification (Archimbaud, Nordhausen, Ruiz-Gazen (2016) <arXiv:1612.06118>) using a shiny app.
ICtest Estimating and Testing the Number of Interesting Components in Linear Dimension Reduction
Provides tests and estimates for the number of interesting components (ICs) in linear dimension reduction methods such as principal components analysis (PCA), independent components analysis (ICA) and supervised linear dimension reduction.
ICV Indirect Cross-Validation (ICV) for Kernel Density Estimation
Functions for computing the global and local Gaussian density estimates based on the ICV bandwidth. See the article of Savchuk, O.Y., Hart, J.D., Sheather, S.J. (2010). Indirect cross-validation for density estimation. Journal of the American Statistical Association, 105(489), 415-423 <doi:10.1198/jasa.2010.tm08532>.
idbr R Interface to the US Census Bureau International Data Base API
Use R to make requests to the US Census Bureau’s International Data Base API. Results are returned as R data frames. For more information about the IDB API, visit http://…/international-database.html.
IDE Integro-Difference Equation Spatio-Temporal Models
The Integro-Difference Equation model is a linear, dynamical model used to model phenomena that evolve in space and in time; see, for example, Cressie and Wikle (2011, ISBN:978-0-471-69274-4) or Dewar et al. (2009) <doi:10.1109/TSP.2008.2005091>. At the heart of the model is the kernel, which dictates how the process evolves from one time point to the next. Both process and parameter reduction are used to facilitate computation, and spatially-varying kernels are allowed. Data used to estimate the parameters are assumed to be readings of the process corrupted by Gaussian measurement error. Parameters are fitted by maximum likelihood, and estimation is carried out using an evolution algorithm.
idealstan Bayesian IRT Ideal Point Models with ‘Stan’
Offers item-response theory (IRT) ideal-point scaling/dimension reduction methods that incorporate additional response categories and missing/censored values, including absences and abstentions, for roll call voting data (or any other kind of binary or ordinal item-response theory data). Full and approximate Bayesian inference is done via the ‘Stan’ engine (www.mc-stan.org).
ideamdb Easy Manipulation of IDEAM’s Climatological Data
Time series plain text conversion and data visualization. It allows transforming IDEAM (Instituto de Hidrologia, Meteorologia y Estudios Ambientales) daily series from plain text to CSV files or data frames in R. Additionally, exploratory graphs can be obtained from the time series. IDEAM's data is freely delivered under formal request through the official web page <http://…/solicitud-de-informacion>.
idefix Efficient Designs for Discrete Choice Experiments
Generates efficient designs for discrete choice experiments based on the multinomial logit model, and individually adapted designs for the mixed multinomial logit model. Crabbe M, Akinc D and Vandebroek M (2014) <doi:10.1016/j.trb.2013.11.008>.
idendr0 Interactive Dendrograms
Interactive dendrogram that enables the user to select and color clusters, to zoom and pan the dendrogram, and to visualize the clustered data not only in a built-in heat map, but also in GGobi interactive plots and user-supplied plots. This is a backport of Qt-based ‘idendro’ (https://…/idendro ) to base R graphics and Tcl/Tk GUI.
ider Various Methods for Estimating Intrinsic Dimension
An implementation of various methods for estimating the intrinsic dimension of a vector-valued dataset or distance matrix. Most of the implemented methods are based on different notions of fractal dimension, such as the capacity dimension, the box-counting dimension, and the information dimension.
IDetect Isolate-Detect Methodology for Multiple Change-Point Detection
Provides efficient implementation of the Isolate-Detect methodology for the consistent estimation of the number and location of multiple change-points in one-dimensional data sequences from the ‘deterministic + noise’ model. For details on the Isolate-Detect methodology, please see Anastasiou and Fryzlewicz (2018) <https://…6a0866c574654163b8255e272bc0001b.pdf>. Currently implemented scenarios are: piecewise-constant signal with Gaussian noise, piecewise-constant signal with heavy-tailed noise, continuous piecewise-linear signal with Gaussian noise, continuous piecewise-linear signal with heavy-tailed noise.
IDF Estimation and Plotting of IDF Curves
Intensity-duration-frequency (IDF) curves are a widely used analysis tool in hydrology to assess extreme values of precipitation [e.g. Mailhot et al., 2007, <doi:10.1016/j.jhydrol.2007.09.019>]. The package 'IDF' provides a function to read precipitation data from German weather service (DWD) 'webwerdis' <http://…/webwerdis.html> files and Berlin station data from 'Stadtmessnetz' <http://…/index.html> files; IDF parameters can additionally be estimated from a given data.frame containing a precipitation time series. The data is aggregated to given duration levels, and yearly intensity maxima are calculated either for the whole year or for given months. From these intensity maxima, IDF parameters are estimated on the basis of a duration-dependent generalised extreme value distribution [Koutsoyannis et al., 1998, <doi:10.1016/S0022-1694(98)00097-3>]. IDF curves based on these estimated parameters can be plotted.
idm Incremental Decomposition Methods
Incremental Principal Component Analysis and Multiple Correspondence Analysis using incremental eigenvalue decomposition methods.
IDmining Intrinsic Dimension for Data Mining
Contains techniques for mining large high-dimensional data sets by using the concept of Intrinsic Dimension (ID). Here the ID is not necessarily an integer; it is extended to fractal dimensions. The Morisita estimator is used for the ID estimation, but other tools are included as well.
idmTPreg Regression Model for Progressive Illness Death Data
Modeling of regression effects for transition probabilities in a progressive illness-death model.
ids Generate Random Identifiers
Generate random or human readable and pronounceable identifiers.
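For illustration, a minimal sketch of both flavours of identifier, assuming the random_id() and adjective_animal() generators exported by 'ids':

    library(ids)
    # Cryptographically random hex identifiers
    random_id(n = 3, bytes = 8)
    # Human-readable, pronounceable identifiers
    adjective_animal(n = 3)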
iECAT Integrating External Controls into Association Test
Functions for single-variant and region-based tests with external control samples. These methods use external study samples as control samples while adjusting for possible batch effects.
ifaTools Toolkit for Item Factor Analysis with OpenMx
Tools, tutorials, and demos of Item Factor Analysis using OpenMx.
IFSPlot Draw Fractals using Iterated Function Systems and C++
With this package you can draw fractals using iterated function systems.
IGG Inverse Gamma-Gamma
Implements Bayesian linear regression, normal means estimation, and variable selection using the inverse gamma-gamma prior, as introduced by Bai and Ghosh (2018) <arXiv:1710.04369>.
IGM.MEA IGM MEA Analysis
Software tools for the characterization of neuronal networks as recorded on multi-electrode arrays.
IGP Interchangeable Gaussian Process Models
Creates a Gaussian process model using the specified package. Makes it easy to try different packages with the same code; only the package argument needs to be changed. It is essentially a wrapper for the other Gaussian process software packages.
igraph Network Analysis and Visualization
Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
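A short example of the typical workflow (random graph generation, centrality measures, and plotting):

    library(igraph)
    # Erdos-Renyi random graph with 100 vertices and edge probability 0.05
    g <- sample_gnp(100, p = 0.05)
    # Degree and betweenness centrality for the first few vertices
    head(degree(g))
    head(betweenness(g))
    # Visualize with a force-directed layout
    plot(g, vertex.size = 4, vertex.label = NA, layout = layout_with_fr)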
igraphinshiny Use ‘shiny’ to Demo ‘igraph’
Using ‘shiny’ to demo ‘igraph’ package makes learning graph theory easy and fun.
iheatmapr Interactive, Complex Heatmaps
Make complex, interactive heatmaps. ‘iheatmapr’ includes a modular system for iteratively building up complex heatmaps, as well as the iheatmap() function for making relatively standard heatmaps.
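A minimal sketch using the iheatmap() function named above; the scaling step is only to make the columns comparable for display:

    library(iheatmapr)
    # Interactive heatmap of a column-scaled numeric matrix
    mat <- scale(as.matrix(mtcars))
    iheatmap(mat)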
IHSEP Inhomogeneous Self-Exciting Process
Simulate an inhomogeneous self-exciting process (IHSEP), or Hawkes process, with a given (time-varying) baseline intensity and an excitation function. Calculate the likelihood of an IHSEP with given baseline intensity and excitation functions for an (increasing) sequence of event times. Calculate the point process residuals (integral transforms of the original event times). Calculate the mean intensity process.
iilasso Independently Interpretable Lasso
Efficient algorithms for fitting linear / logistic regression model with Independently Interpretable Lasso. Takada, M., Suzuki, T., & Fujisawa, H. (2018). Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables. AISTATS. <http://…/takada18a.pdf>.
iJRF Integrative Joint Random Forest
Integrative framework for the simultaneous estimation of interactions from different classes of data.
ijtiff TIFF I/O for ‘ImageJ’ Users
Correctly import TIFF files that were saved from 'ImageJ' and write TIFF files that can be correctly read by 'ImageJ' <https://…/>. Full support for TIFF files with floating point (real-numbered) pixels. Also supports text image I/O.
ILS Interlaboratory Study
Performs interlaboratory studies (ILS) to detect laboratories that provide non-consistent results when compared to the others. It permits working simultaneously with various testing materials, from both standard univariate and functional data analysis (FDA) perspectives. The univariate approach, based on ASTM E691-08, consists of estimating Mandel's h and k statistics to identify laboratories that provide significantly different results, and testing for the presence of outliers by Cochran and Grubbs tests; analysis of variance (ANOVA) techniques (F and Tukey tests) are provided to test differences in means corresponding to different laboratories per material. Taking into account the functional nature of data retrieved in analytical chemistry, applied physics and engineering (spectra, thermograms, etc.), the ILS package also provides an FDA approach for finding the distribution of Mandel's k and h statistics by smoothed bootstrap resampling.
imager R package for image processing
Imager is an image/video processing package for R, based on CImg, a C++ library by David Tschumperlé. CImg provides an easy-to-use and consistent API for image processing, which imager largely replicates. CImg supports images in up to four dimensions, which makes it suitable for applications like video processing/hyperspectral imaging/MRI.
imagerExtra Extra Image Processing Library Based on ‘imager’
Provides several advanced functions for image processing based on the 'imager' package.
IMaGES Independent Multiple-Sample Greedy Equivalence Search Implementation
Functions for the implementation of Independent Multiple-sample Greedy Equivalence Search (IMaGES), a causal inference algorithm for creating aggregate graphs and structural equation modeling data for one or more datasets. This package is useful for time series data with specific regions of interest. This implementation is inspired by the paper ‘Six problems for causal inference from fMRI’ by Ramsey, Hanson, Hanson, Halchenko, Poldrack, and Glymour (2010) <DOI:10.1016/j.neuroimage.2009.08.065>. The IMaGES algorithm uses a modified BIC score to compute goodness of fit of edge additions, subtractions, and turns across all datasets and returns a representative graph, along with structural equation modeling data for the global graph and individual datasets, means, and standard errors. Functions for plotting the resulting graph(s) are provided. This package is built upon the ‘pcalg’ package.
imageviewer Simple ‘htmlwidgets’ Image Viewer with WebGL Brightness/Contrast
Displays 2D matrix data in an interactive, zoomable gray-scale image viewer, providing tools for manual data inspection. The viewer window shows cursor guiding lines and the corresponding data slices for both axes at the current cursor position. A toolbar allows adjusting the display brightness/contrast through WebGL filters and performing basic high-pass/low-pass filtering.
imagine Imaging Engine, Tools for Application of Image Filters to Data Matrices
Provides fast application of image filters to data matrices, using R and C++ algorithms.
imbalance Preprocessing Algorithms for Imbalanced Datasets
Algorithms to treat imbalanced datasets. Imbalanced datasets usually damage the performance of the classifiers. Thus, it is important to treat data before applying a classifier algorithm. This package includes recent preprocessing algorithms in the literature.
IMIFA Fitting, Diagnostics, and Plotting Functions for Infinite Mixtures of Infinite Factor Analysers and Related Models
Provides flexible Gibbs sampler functions for fitting Infinite Mixtures of Infinite Factor Analysers and related models, introduced by Murphy et al. (2017) <https://…/1701.07010>, which conducts Bayesian nonparametric model-based clustering with factor analytic covariance structures without recourse to model selection criteria to choose the number of clusters or cluster-specific latent factors. Model-specific diagnostic tools are also provided, as well as many options for plotting results and conducting posterior inference on parameters of interest.
iml Interpretable Machine Learning
Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: Feature importance described by Fisher et al. (2018) <arXiv:1801.01489>, partial dependence plots described by Friedman (2001) <http://…/2699986>, individual conditional expectation (‘ice’) plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (variant of ‘lime’) described by Ribeiro et. al (2016) <arXiv:1602.04938>, the Shapley Value described by Strumbelj et. al (2014) <doi:10.1007/s10115-013-0679-x> and tree surrogate models.
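A minimal sketch of the model-agnostic workflow, assuming the R6 classes Predictor and FeatureImp ('randomForest' and 'MASS' are used here only to supply a fitted model and data):

    library(iml)
    library(randomForest)
    # Fit any model; iml wraps it in a model-agnostic Predictor object
    rf <- randomForest(medv ~ ., data = MASS::Boston)
    X <- MASS::Boston[, -which(names(MASS::Boston) == "medv")]
    predictor <- Predictor$new(rf, data = X, y = MASS::Boston$medv)
    # Permutation feature importance (Fisher et al. 2018)
    imp <- FeatureImp$new(predictor, loss = "mae")
    plot(imp)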
IMmailgun Send Emails using ‘Mailgun’
Send emails using the 'Mailgun' API. To use this package you will need an account from <https://www.mailgun.com>.
IMP Interactive Model Performance Evaluation
Contains functions for evaluating and comparing the performance of binary classification models. Functions can be called either statically or interactively (as Shiny apps).
IMPACT The Impact of Items
Implements a multivariate analysis of the impact of items to identify bias in the questionnaire validation of Likert-type scale variables. The items require a null value (a category with no tendency). Offers the frequency, importance and impact of the items.
impimp Imprecise Imputation for Statistical Matching
Imputing blockwise missing data by imprecise imputation, featuring a domain-based, variable-wise, and case-wise strategy. Furthermore, the estimation of lower and upper bounds for unconditional and conditional probabilities based on the obtained imprecise data is implemented. Additionally, two utility functions are supplied: one to check whether variables in a data set contain set-valued observations, and another to merge two already imprecisely imputed data sets.
implyr R Interface for Apache Impala
SQL’ back-end to ‘dplyr’ for Apache Impala (incubating), the massively parallel processing query engine for Apache ‘Hadoop’. Impala enables low-latency ‘SQL’ queries on data stored in the ‘Hadoop’ Distributed File System ‘(HDFS)’, Apache ‘HBase’, Apache ‘Kudu’, and Amazon Simple Storage Service ‘(S3)’. See <https://impala.apache.org> for more information about Impala.
imPois Imprecise Inferential Framework for Poisson Sampling Models
A collection of tools for conducting inference in an imprecise inferential framework. The Poisson sampling model and zero-truncated Poisson sampling models are mainly studied as part of a larger project, Imprecise Probability Estimates for Generalized Linear Models. The imprecise probability theory introduced by Peter Walley in 1991 is the basis of this inference.
import An Import Mechanism for R
This is an alternative mechanism for importing objects from packages. The syntax allows for importing multiple objects from a package with a single command in an expressive way. The import package bridges some of the gap between using library (or require) and direct (single-object) imports. Furthermore the imported objects are not placed in the current environment.
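For example, import::from() brings in named objects without attaching the whole package:

    # Make three dplyr verbs available by name without library(dplyr)
    import::from(dplyr, select, filter, arrange)
    filter(mtcars, cyl == 6)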
importar Enables Importing/Loading of Packages or Functions While Creating an Alias for Them
Enables ‘Python’-like importing/loading of packages or functions with aliasing to prevent namespace conflicts.
ImportExport Import and Export Data
Import and export data from the most common statistical formats by using R functions that minimize the loss of data information, giving special attention to date variables and labelled ones.
imputeMissings Impute Missing Values in a Predictive Context
Compute missing values on a training data set and impute them on a new data set. Current available options are median/mode and random forest.
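A minimal sketch of the train/apply split described above, assuming the compute()/impute() interface:

    library(imputeMissings)
    train <- iris[1:100, ]
    test  <- iris[101:150, ]
    test$Sepal.Length[c(3, 7)] <- NA
    # Learn medians/modes on the training set...
    vals <- compute(train, method = "median/mode")
    # ...then impute the new data with those values
    imputed <- impute(test, object = vals)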
imputeMulti Imputation Methods for Multivariate Multinomial Data
Implements imputation methods using EM and Data Augmentation for multinomial data following the work of Schafer 1997 <ISBN: 978-0-412-04061-0>.
ImputeRobust Robust Multiple Imputation with Generalized Additive Models for Location Scale and Shape
Provides new imputation methods for the ‘mice’ package based on generalized additive models for location, scale, and shape (GAMLSS) as described in de Jong, van Buuren and Spiess <doi:10.1080/03610918.2014.911894>.
imputeTestbench Test Bench for Missing Data Imputing Models/Methods Comparison
Provides a test bench for comparing missing-data imputation models/methods with respect to RMSE, MAE or MAPE. New proposed methods can be added to the test bench and compared with the other available methods; the function 'append_method()' allows adding multiple methods to those already in the test bench.
imputeTS Time Series Missing Value Imputation
Imputation (replacement) of missing values in univariate time series.
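A minimal sketch, assuming the na_interpolation() naming used by recent versions of the package:

    library(imputeTS)
    x <- c(1, 2, NA, 4, 5, NA, 7)
    # Replace the missing values by linear interpolation
    na_interpolation(x, option = "linear")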
imputeYn Imputing the Last Largest Censored Observation(s) Under Weighted Least Squares
The method yields less biased and more efficient estimates for accelerated failure time (AFT) models.
IMTest Information Matrix Test for Generalized Partial Credit Models
Implementation of the information matrix test for generalized partial credit models.
IMWatson Chat with Watson’s Assistant API
Chat with a chatbot created with the ‘IBM Watson Assistant’ <https://…/>.
inaparc Initialization Algorithms for Partitioning Cluster Analysis
Partitioning clustering algorithms divide data sets into k subsets or partitions, the so-called clusters. They require initialization procedures for starting to partition the data sets; initialization of cluster prototypes is one such procedure for most of the partitioning algorithms. Cluster prototypes are the data elements, i.e. centroids or medoids, representing the clusters in a data set. In order to initialize cluster prototypes, the package 'inaparc' contains implementations of several linear and loglinear time-complexity methods, in addition to some novel techniques. Initialization of fuzzy membership degree matrices is another important task for starting the probabilistic and possibilistic partitioning algorithms; a number of functions based on traditional and novel initialization techniques for these matrices are also available in the package.
inca Integer Calibration
Specific functions are provided for rounding real weights to integers and performing an integer programming algorithm for calibration problems. They are useful for census-weights adjustments, or for performing linear regression with integer parameters.
IncDTW Incremental Calculation of Dynamic Time Warping
Implements incremental calculation of the DTW (Dynamic Time Warping) distance of two vectors, which is specifically useful for live data streams. Further, the calculation of the global cost matrix is implemented in C++ for speed, and the Sakoe-Chiba band is also implemented. The DTW calculation is less feature-rich than that of dtw(), but much faster. For details about DTW see the original paper 'Dynamic programming algorithm optimization for spoken word recognition' by Sakoe and Chiba (1978) <DOI:10.1109/TASSP.1978.1163055>.
incgraph Incremental Graphlet Counting for Network Optimisation
An efficient and incremental approach for calculating the differences in orbit counts when performing single edge modifications in a network. Calculating the differences in orbit counts is much more efficient than recalculating all orbit counts from scratch for each time point.
IndepTest Nonparametric Independence Tests Based on Entropy Estimation
Implementations of the weighted Kozachenko-Leonenko entropy estimator and independence tests based on this estimator, (Kozachenko and Leonenko (1987) <http://…/ppi797> ). Also includes a goodness-of-fit test for a linear model which is an independence test between covariates and errors.
IndexConstruction Index Construction for Time Series Data
Derivation of indexes for benchmarking purposes. The methodology of the CRyptocurrency IndeX (CRIX) family with flexible number of constituents is implemented. Also functions for market capitalization and volume weighted indexes with fixed number of constituents are available. The methodology behind the functions provided gets introduced in Trimborn and Haerdle (2018) <doi:10.1016/j.jempfin.2018.08.004>.
IndexNumR Index Number Calculation
Computes bilateral and multilateral index numbers. It has support for several standard bilateral indices as well as the GEKS multilateral index number methods (see Ivancic, Diewert and Fox (2011) <doi:10.1016/j.jeconom.2010.09.003>). It also supports updating of GEKS indexes using several splicing methods.
indirect Elicitation of Independent Conditional Means Priors for Generalised Linear Models
Functions are provided to facilitate prior elicitation for Bayesian generalised linear models using independent conditional means priors. The package supports the elicitation of multivariate normal priors for generalised linear models. The approach can be applied to indirect elicitation for a generalised linear model that is linear in the parameters. The package is designed such that the facilitator executes functions within the R console during the elicitation session to provide graphical and numerical feedback at each design point. Various methodologies for eliciting fractiles (equivalently, percentiles or quantiles) are supported, including versions of the approach of Hosack et al. (2017) <doi:10.1016/j.ress.2017.06.011>. For example, experts may be asked to provide central credible intervals that correspond to a certain probability. Or experts may be allowed to vary the probability allocated to the central credible interval for each design point. Additionally, a median may or may not be elicited.
IndTestPP Tests of Independence Between Point Processes in Time
Several parametric and non-parametric tests and measures to check independence between two or more (homogeneous or nonhomogeneous) point processes in time are provided. Tools for simulating point processes in one dimension with different types of dependence are also implemented.
infer Tidy Statistical Inference
The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.
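The grammar composes with pipes; a sketch of a permutation test for a difference in means:

    library(infer)
    mtcars$am <- factor(mtcars$am)
    # Observed difference in mean mpg between transmission types
    obs <- mtcars %>%
      specify(mpg ~ am) %>%
      calculate(stat = "diff in means", order = c("1", "0"))
    # Null distribution under independence via permutation
    null_dist <- mtcars %>%
      specify(mpg ~ am) %>%
      hypothesize(null = "independence") %>%
      generate(reps = 1000, type = "permute") %>%
      calculate(stat = "diff in means", order = c("1", "0"))
    get_p_value(null_dist, obs, direction = "two-sided")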
inferr Inferential Statistics
A select set of parametric and non-parametric statistical tests. 'inferr' builds upon the solid set of statistical tests provided in the 'stats' package by including additional data types as inputs, and by expanding and restructuring the test results. The tests included are t tests, variance tests, proportion tests, chi-square tests, Levene's test, McNemar's test, Cochran's Q test and the runs test.
infix Basic Infix Binary Operators
Contains a number of infix binary operators that may be useful in day to day practices.
Inflation Core Inflation
Provides four different core inflation functions: the well-known trimmed means, exclusion and double weighting methods, alongside the new Triple Filter method introduced in Ferreira et al. (2016) <https://goo.gl/UYLhcj>.
influence.ME Tools for Detecting Influential Data in Mixed Effects Models
influence.ME provides a collection of tools for detecting influential cases in generalized mixed effects models estimated with lme4. The basic rationale behind identifying influential data is that when single units are iteratively omitted from the data, models based on the remaining data should not produce substantially different estimates. To standardize the assessment of how influential a (single group of) observation(s) is, several measures of influence are common practice, such as DFBETAS and Cook's distance. In addition, we provide a measure of percentage change of the fixed point estimates and a simple procedure to detect changing levels of significance.
influenceR Software Tools to Quantify Structural Importance of Nodes in a Network
Provides functionality to compute various node centrality measures on networks. Included are functions to compute betweenness centrality (by utilizing Madduri and Bader’s SNAP library), implementations of Burt’s constraint and effective network size (ENS) metrics, Borgatti’s algorithm to identify key players, and Valente’s bridging metric. On Unix systems, the betweenness, Key Players, and bridging implementations are parallelized with OpenMP, which may run faster on systems which have OpenMP configured.
InformationValue Performance Analysis and Companion Functions for Binary Classification Problems
Provides companion functions for analysing the performance of classification models: plotting the 'ROC' curve in 'ggplot2', and calculating 'AUROC', 'IV' and 'WOE' for binary classification.
InformativeCensoring Multiple Imputation for Informative Censoring
Multiple Imputation for Informative Censoring. This package implements two methods. Gamma Imputation from Jackson et al. (2014) <DOI:10.1002/sim.6274> and Risk Score Imputation from Hsu et al. (2009) <DOI:10.1002/sim.3480>.
InfoTrad Calculates the Probability of Informed Trading (PIN)
Estimates the probability of informed trading (PIN) initially introduced by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x>. The contribution of the package is that it uses the likelihood factorizations of Easley et al. (2010) <doi:10.1017/S0022109010000074> (EHO factorization) and Lin and Ke (2011) <doi:10.1016/j.finmar.2011.03.001> (LK factorization). Moreover, the package offers different estimation algorithms: specifically, the grid-search algorithm proposed by Yan and Zhang (2012) <doi:10.1016/j.jbankfin.2011.08.003> and the hierarchical agglomerative clustering approach proposed by Gan et al. (2015) <doi:10.1080/14697688.2015.1023336>.
infuser A Very Basic Templating Engine
Replace parameters in strings and/or text files with specified values.
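A minimal sketch, assuming the infuse() function and its default {{placeholder}} syntax:

    library(infuser)
    template <- "SELECT * FROM users WHERE id = {{id}} AND status = '{{status}}'"
    # Substitute the named parameters into the template string
    infuse(template, id = 42, status = "active")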
Infusion Inference Using Simulation
Implements functions for simulation-based inference. In particular, implements functions to perform likelihood inference from data summaries whose distributions are simulated.
ingredients Effects and Importances of Model Ingredients
Collection of tools for assessment of feature importance and feature effects. Key functions are: feature_importance() for assessment of global level feature importance, ceteris_paribus() for calculation of the what-if plots, partial_dependency() for partial dependency plots, conditional_dependency() for conditional dependency plots, accumulated_dependency() for accumulated local effects plots, aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles, theme_drwhy() with a ‘ggplot2’ skin for all plots, generic print() and plot() for better usability of selected explainers. The package ‘ingredients’ is a part of the ‘DrWhy.AI’ universe (Biecek 2018) <arXiv:1806.08915>.
ini Read and Write ‘.ini’ Files
Parse simple ‘.ini’ configuration files to an structured list. Users can manipulate this resulting list with lapply() functions. This same structured list can be used to write back to file after modifications.
injectoR R Dependency Injection
R dependency injection framework. Dependency injection allows a program design to follow the dependency inversion principle. The user delegates to external code (the injector) the responsibility of providing its dependencies. This separates the responsibilities of use and construction.
INLA Integrated Nested Laplace Approximation
The R package INLA solves models using integrated nested Laplace approximation (INLA), an approach to statistical inference for latent Gaussian Markov random field (GMRF) models. In short, a latent GMRF model is a hierarchical model where, at the first stage, we find a distributional assumption for the observables y, usually assumed to be conditionally independent given some latent parameters and, possibly, some additional parameters.
inlabru Spatial Inference using Integrated Nested Laplace Approximation
Facilitates spatial modeling using integrated nested Laplace approximation via the INLA package (<http://www.r-inla.org> ). Additionally, implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. See Yuan Yuan, Fabian E. Bachl, Finn Lindgren, David L. Borchers, Janine B. Illian, Stephen T. Buckland, Havard Rue, Tim Gerrodette (2017), <arXiv:1604.06013>.
INLAutils Utility Functions for ‘INLA’
A number of utility functions for ‘INLA’ <http://www.r-inla.org>. Additional diagnostic plots and support for ‘ggplot2’. Step wise regression with ‘INLA’. Species distribution models and other helper functions.
inline Functions to Inline C, C++, Fortran Function Calls from R
Functionality to dynamically define R functions and S4 methods with inlined C, C++ or Fortran code supporting .C and .Call calling conventions.
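For example, cfunction() compiles a small C snippet on the fly using the .Call convention:

    library(inline)
    code <- "
      SEXP out = PROTECT(allocVector(REALSXP, 1));
      REAL(out)[0] = asReal(x) * 2;
      UNPROTECT(1);
      return out;
    "
    # Compile and link on the fly; the result is an ordinary R function
    double_it <- cfunction(signature(x = "numeric"), code)
    double_it(21)  # 42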
inpdfr Analyse Text Documents Using Ecological Tools
A set of functions and a graphical user interface to analyse and compare texts, using classical text mining functions, as well as those from theoretical ecology.
inplace In-place Operators for R
Provides in-place operators for R that are equivalent to '+=', '-=', '*=' and '/=' in C++. These can be applied to integer or double vectors and matrices. In-place sweep operations are also available.
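The package's own operator names are not reproduced here; as a hand-rolled illustration of the semantics only, an in-place '+=' can be written as:

    # Illustration only, not the 'inplace' API: modify x in the caller's frame
    `%+=%` <- function(x, value) {
      eval.parent(substitute(x <- x + value))
    }
    v <- 1:5
    v %+=% 10L
    v  # 11 12 13 14 15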
insect Informatic Sequence Classification Trees
Provides a bioinformatics pipeline for DNA meta-barcoding analysis, including functions for sequence parsing, demultiplexing, quality filtering and probabilistic taxon assignment with informatic sequence classification trees. See Wilkinson et al (2018) <doi:10.7287/peerj.preprints.26812v1>.
insight Easy Access to Model Information for Various Model Objects
A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model, or data from response variables. 'insight' mainly revolves around two types of functions: functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects for which functions to access this information are otherwise missing.
InspectChangepoint High-Dimensional Changepoint Estimation via Sparse Projection
Provides a data-driven projection-based method for estimating changepoints in high-dimensional time series. Multiple changepoints are estimated using a (wild) binary segmentation scheme.
inspectdf Inspection, Comparison and Visualisation of Data Frames
A collection of utilities for columnwise summary, comparison and visualisation of data frames. Functions report missingness, categorical levels, numeric distribution, correlation, column types and memory usage.
inspectr Perform Basic Checks of Dataframes
Check one column or multiple columns of a dataframe using the preset basic checks or your own functions. Enables checks without knowledge of lapply() or sapply().
install.load Check, Install and Load CRAN & USGS GRAN Packages
Checks the local R library(ies) to see if the required package(s) is/are installed or not. If the package(s) is/are not installed, then the package(s) will be installed along with the required dependency(ies). This function pulls source or binary packages from the Revolution Analytics CRAN mirror and/or the USGS GRAN Repository. Lastly, the chosen package(s) is/are loaded.
installr Using R to install stuff (such as: R, Rtools, RStudio, git, and more!)
R is great for installing software. Through the 'installr' package you can automate the updating of R (on Windows, using updateR()) and install new software. Software installation is initiated through a GUI (just run installr()), or through functions such as install.Rtools(), install.pandoc(), install.git(), and many more. The updateR() command performs the following: finding the latest R version, downloading it, running the installer, deleting the installation file, and copying and updating old packages to the new R installation.
instaR Access to Instagram API via R
Provides an interface to the Instagram API <https://…/>, which allows R users to download public pictures filtered by hashtag, popularity, user or location, and to access public users’ profile data.
insurancerating Analytic Insurance Rating Techniques
Methods for insurance rating. It provides a data driven strategy for the construction of insurance tariff classes. This strategy is based on the work by Antonio and Valdez (2012) <doi:10.1007/s10182-011-0152-7>. The package also adds functionality showing additional lines for the reference categories in the levels of the coefficients in the output of a generalized linear regression analysis. In addition it implements a procedure determining the level of a factor with the largest exposure, and thereafter changing the base level of the factor to this level.
intccr Semiparametric Regression on Cumulative Incidence Function with Interval-Censored Competing Risks Data
The function ciregic() fits semiparametric competing risks regression models with interval-censored data as described in Bakoyannis, Yu, and Yiannoutsos (2017) <doi:10.1002/sim.7350>.
intcensROC Fast Spline Function Based Constrained Maximum Likelihood Estimator for AUC Estimation of Interval Censored Survival Data
The kernel of this ‘Rcpp’ based package is an efficient implementation of the generalized gradient projection method for spline function based constrained maximum likelihood estimator for interval censored survival data (Wu, Yuan; Zhang, Ying. Partially monotone tensor spline estimation of the joint distribution function with bivariate current status data. Ann. Statist. 40, 2012, 1609-1636 <doi:10.1214/12-AOS1016>). The key function computes the density function of the joint distribution of event time and the marker and returns the receiver operating characteristic (ROC) curve for the interval censored survival data as well as area under the curve (AUC).
IntClust Integrated Data Analysis via Clustering
The IntClust package includes several integrative data methods in which information on objects from different data sources can be combined. As a single data source is limited in its point of view, this provides more insight and the opportunity to investigate how the variables are interconnected. Clustering techniques are applied to the combined information. For now, only agglomerative hierarchical clustering is implemented. Further, differential gene expression and pathway analysis can be conducted on the clusters. Plotting functions are available to visualize and compare the results of the different methods.
integr An Implementation of Interaction Graphs of Aleks Jakulin
Generates a ‘Graphviz’ graph of the most significant 3-way interaction gains (i.e. conditional information gains) based on a provided discrete data frame. Various output formats are supported (‘Graphviz’, SVG, PNG, PDF, PS). For references, see the webpage of Aleks Jakulin <http://…/>.
IntegrateBs Integration for B-Spline
Integrated B-spline function.
IntegratedMRF Integrated Prediction using Univariate and Multivariate Random Forests
An implementation of a framework for drug sensitivity prediction from various genetic characterizations using ensemble approaches. Random Forests or Multivariate Random Forest predictive models can be generated from each genetic characterization and then combined using a least squares regression approach. IntegratedMRF also provides options for different error estimation approaches (leave-one-out, bootstrap, re-substitution and 0.632 bootstrap), along with generation of prediction confidence intervals using a jackknife-after-bootstrap approach.
intensity.analysis Intensity of Change for Comparing Categorical Maps from Sequential Intervals
Calculate metrics of change intensity for category, transition and interval levels in categorical maps from sequential intervals. For more information please consult: Aldwaik,Safaa Zakaria and Robert Gilmore Pontius Jr. (2012). ‘Intensity analysis to unify measurements of size and stationarity of land changes by interval, category, and transition’. Landscape and Urban Planning. 106, 103-114. <doi:10.1016/j.landurbplan.2012.02.010>.
interactions Comprehensive, User-Friendly Toolkit for Probing Interactions
A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the ‘jtools’ package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of ‘simple slopes’ and Johnson-Neyman intervals. These capabilities are implemented for generalized linear models in addition to the standard linear regression context.
intercure Cure Rate Estimators for Interval Censored Data
Implementations of semiparametric cure rate estimators for interval-censored data in R. The algorithms are based on the promotion time and frailty models, all for interval censoring. For the frailty model, there is also an implementation for clustered data.
interep Interaction Analysis of Repeated Measure Data
Extensive penalized variable selection methods have been developed in the past two decades for analyzing high-dimensional omics data, such as gene expressions, single nucleotide polymorphisms (SNPs), copy number variations (CNVs) and others. However, lipidomics data have rarely been investigated using high-dimensional variable selection methods. This package incorporates our recently developed penalization procedures to conduct interaction analysis for high-dimensional lipidomics data with repeated measurements. A new upgraded version will be released in the near future. The development of this software package and the associated statistical methods has been partially supported by an Innovative Research Award from the Johnson Cancer Research Center, Kansas State University.
interflex Multiplicative Interaction Models Diagnostics and Visualization
Performs diagnostic tests of multiplicative interaction models and plots non-linear marginal effects of a treatment on an outcome across different values of a moderator.
interim Scheduling Interim Analyses in Clinical Trials
Allows the simulation of both the recruitment and treatment phase of a clinical trial. Based on these simulations, the timing of interim analyses can be assessed.
interlineaR Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software
Interlinearized glossed texts (IGT) are used in descriptive linguistics for representing a morphological analysis of a text through a morpheme-by-morpheme gloss. 'interlineaR' provides a set of functions that target several popular formats of IGT ('SIL Toolbox', 'EMELD XML') and turn an IGT into a set of data frames following a relational model (the tables represent the different linguistic units: texts, sentences, words, morphemes). The same pieces of software ('SIL FLEX', 'SIL Toolbox') typically produce dictionaries of the morphemes used in the glosses; 'interlineaR' also provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model, in order to represent the dictionary entries, the sense(s) attached to the entries, the example(s) attached to senses, etc.
internetarchive An API Client for the Internet Archive
Search the Internet Archive, retrieve metadata, and download files.
interp Interpolation Methods
Bivariate data interpolation on regular and irregular grids, either linear or using splines.
interplot Plot the Effects of Variables in Interaction Terms
Plots the conditional coefficients (‘marginal effects’) of variables included in multiplicative interaction terms.
intrinsicDimension Intrinsic Dimension Estimation
A variety of methods for estimating the intrinsic dimension of data sets (i.e. the manifold or Hausdorff dimension of the support of the distribution that generated the data), as reviewed in Johnsson, K. (2016, ISBN:978-91-7623-921-6) and Johnsson, K., Soneson, C. and Fontes, M. (2015) <doi:10.1109/TPAMI.2014.2343220>. Furthermore, to evaluate the performance of these estimators, functions for generating data sets with given intrinsic dimensions are provided.
intrval Relational Operators for Intervals
Evaluating if values of vectors are within different open/closed intervals (`x %[]% c(a, b)`), or if two closed intervals overlap (`c(a1, b1) %[o]% c(a2, b2)`). Operators for negation and directional relations also implemented.
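The operators read naturally in code; for example:

    library(intrval)
    # Which values fall inside the closed interval [2, 8]?
    c(1, 5, 10) %[]% c(2, 8)   # FALSE TRUE FALSE
    # Do the closed intervals [2, 8] and [6, 12] overlap?
    c(2, 8) %[o]% c(6, 12)     # TRUE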
intRvals Analysis of Time-Ordered Event Data with Missed Observations
Calculates event rates and compares means and variances of groups of interval data corrected for missed arrival observations.
intubate Interface to Popular R Functions for Data Science Pipelines
Interface to popular R functions with formulas, such as ‘lm’, so they can be included painlessly in data science pipelines implemented by ‘magrittr’ with the operator %>%.
inum Interval and Enum-Type Representation of Vectors
Enum-type representation of vectors and representation of intervals, including a method of coercing variables in data frames.
InvariantCausalPrediction Invariant Causal Prediction
Confidence intervals for causal prediction in a regression setting. An experimental version is also available for classification.
InvasionCorrection Invasion Correction
The correction is achieved under the assumption that the non-migrating cells of the assay approximately form a quadratic flow profile due to frictional effects; compare the law of Hagen-Poiseuille for flow in a tube. The script fits a conical plane to the xyz-coordinates of the cells. It outputs the number of migrated cells and the new corrected coordinates.
inventorize Inventory Analytics and Cost Calculations
Facilitates inventory analysis calculations. The package draws heavily on my studies; it includes calculations of inventory metrics, profit calculations and ABC analysis calculations. The first version covers only the normal and Poisson distributions; other distributions may follow in later versions. The functions are referenced from: 1. Harris, Ford W. (1913). 'How many parts to make at once'. Factory, The Magazine of Management, pp. 135-136, 152. 2. Nahmias, S. Production and Operations Analysis. McGraw-Hill International Edition <isbn: 0-07-2231265-3>, Chapter 4. 3. Silver, E.A., Pyke, D.F., Peterson, R. Inventory Management and Production Planning and Scheduling <isbn: 978-0471119470>. 4. Ballou, R.H. Business Logistics Management <isbn: 978-0130661845>, Chapter 9. 5. MIT Micromasters Program.
Inventorymodel Inventory Models
Determination of the optimal policy in inventory problems from a game-theoretic perspective.
invgamma The Inverse Gamma Distribution
Standard distribution functions for the inverse gamma distribution, wrapping those for the gamma distribution that are already provided in the stats package.
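The functions follow the d/p/q/r convention of the stats package:

    library(invgamma)
    # Random draws, density, CDF and quantile for shape 3, rate 2
    x <- rinvgamma(1000, shape = 3, rate = 2)
    dinvgamma(1, shape = 3, rate = 2)
    pinvgamma(1, shape = 3, rate = 2)
    qinvgamma(0.5, shape = 3, rate = 2)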
invLT Inversion of Laplace-Transformed Functions
Provides two functions for the numerical inversion of Laplace-Transformed functions, returning the value of the standard (time) domain function at a specified value. The first algorithm is the first optimum contour algorithm described by Evans and Chung (2000). The second algorithm uses the Bromwich contour as per the definition of the inverse Laplace Transform. The latter is unstable for numerical inversion and mainly included for comparison or interest. There are also some additional functions provided for utility, including plotting and some simple Laplace Transform examples, for which there are known analytical solutions. Polar-cartesian conversion functions are included in this package and are used by the inversion functions. Evans & Chung, 2000: Laplace transform inversions using optimal contours in the complex plane; International Journal of Computer Mathematics v73 pp531-543.
IOHanalyzer Data Analysis Part of ‘IOHprofiler’
The data analysis module for the Iterative Optimization Heuristics Profiler (‘IOHprofiler’). This module provides statistical analysis methods for the benchmark data generated by optimization heuristics, which can be visualized through a web-based interface. The benchmark data is usually generated by the experimentation module, called ‘IOHexperimenter’. ‘IOHanalyzer’ also supports the widely used ‘COCO’ (Comparing Continuous Optimisers) data format for benchmarking.
ionicons Ionicons’ Icon Pack
Provides icons from the ‘Ionicons’ icon pack (<http://…/> ). Functions are provided to get icons as png files or as raw matrices. This is useful when you want to embed raster icons in a report or a graphic.
iotools I/O Tools for Streaming
Basic I/O tools for streaming.
ipapi Geolocate IPv4/6 addresses and/or domain names using ip-api.com’s API
ipapi is a package to geolocate IPv4/6 addresses and/or domain names using ip-api.com's API. The following function is implemented: geolocate() – look up a vector of IPv4/6 addresses and/or domain names and return a data.table of results.
ipc Tools for Message Passing Between Processes
Provides tools for passing messages between R processes. Shiny Examples are provided showing how to perform useful tasks such as: updating reactive values from within a future, progress bars for long running async tasks, and interrupting async tasks based on user input.
ipcwswitch Inverse Probability of Censoring Weights to Deal with Treatment Switch in Randomized Clinical Trials
Contains functions for formatting clinical trials data and implementing inverse probability of censoring weights to handle treatment switches when estimating causal treatment effect in randomized clinical trials.
IPEC Root Mean Square Curvature Calculation
Calculates the RMS intrinsic and parameter-effects curvatures of a nonlinear regression model.
ipflasso Integrative Lasso with Penalty Factors
The core of the package is cvr2.ipflasso(), an extension of glmnet to be used when the (large) set of available predictors is partitioned into several modalities which potentially differ with respect to their information content in terms of prediction. For example, in biomedical applications patient outcome such as survival time or response to therapy may have to be predicted based on, say, mRNA data, miRNA data, methylation data, CNV data, clinical data, etc. The clinical predictors are on average often much more important for outcome prediction than the mRNA data. The ipflasso method takes this problem into account by using different penalty parameters for predictors from different modalities. The ratio between the different penalty parameters can be chosen by cross-validation.
IPMRF Intervention in Prediction Measure (IPM) for Random Forests
Computes IPM for assessing variable importance for random forests. See details at I. Epifanio (2017) <DOI:10.1186/s12859-017-1650-8>.
ipptoolbox IPP Toolbox
Uncertainty quantification and propagation in the framework of Dempster-Shafer theory and imprecise probabilities. This toolbox offers easy-to-use methods for applying imprecise probabilities to uncertainty modelling and simulation. The package comprises the basic functionality needed, with usability similar to standard probabilistic analysis: fit imprecise probability distributions from data, define imprecise probabilities based on distribution functions, combine them with various aggregation rules (e.g. Dempster's rule), plotting tools, propagate them through arbitrary functions/simulations via Monte Carlo, perform sensitivity analyses with imprecise distributions, and example models for a quick start.
ipred Improved Predictors
Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
iprior Linear Regression using I-Priors
In a linear regression setting, priors can be assigned to the regression function using a vector space framework, and the posterior estimate of the regression function obtained. I-priors are a class of such priors based on the principle of maximum entropy. While the main interest of I-prior modelling is prediction, inference is also possible, e.g. via log-likelihood ratio tests. This package supports both formula and non-formula based input. Estimation is mainly done via an EM algorithm, but there is flexibility in using any R optimiser to maximise the log-likelihood computed by the kernel loader function.
IPSUR Introduction to Probability and Statistics Using R
An introductory probability and statistics textbook, alongside other supplementary materials. The book is released under the GNU Free Documentation License.
IPtoCountry Convert IP Addresses to Country Names or Full Location with Geoplotting
Tools for identifying the origins of IP addresses. Includes functions for converting IP addresses to country names, location details (region, city, zip, latitude, longitude), IP codes, and binary values, as well as a function for plotting IP locations on a world map. This product includes IP2Location LITE data available from <http://www.ip2location.com> and is available under the Creative Commons Attribution-ShareAlike 4.0 International license (CC-BY-SA 4.0).
iptools Manipulate, Validate and Resolve IP Addresses
A toolkit for manipulating, validating and testing IP addresses and ranges, along with datasets relating to IP addresses. While it primarily has support for the IPv4 address space, more extensive IPv6 support is intended.
ipw Estimate Inverse Probability Weights
Functions to estimate the probability to receive the observed treatment, based on individual characteristics. The inverse of these probabilities can be used as weights when estimating causal effects from observational data via marginal structural models.
IPWboxplot Adapted Boxplot to Missing Observations
Boxplots adapted to the presence of missing observations, where drop-out probabilities can be given by the practitioner or modelled using auxiliary covariates. The paper of Zhang, Z., Chen, Z., Troendle, J. F. and Zhang, J. (2012) <doi:10.1111/j.1541-0420.2011.01712.x> proposes estimators of marginal quantiles based on the inverse probability weighting method.
ipwErrorY Inverse Probability Weighting Estimation of Average Treatment Effect with Outcome Misclassification
An implementation of the correction methods proposed by Shu and Yi (2017) <doi:10.1177/0962280217743777> for the inverse probability weighting (IPW) estimation of average treatment effect (ATE) with misclassified outcomes. A logistic regression model is assumed for the treatment model in all implemented correction methods, and for the outcome model in the implemented doubly robust correction method.
IRATER An R Interface for the Instantaneous RATEs (IRATE) Model
An R interface to set up, run and read IRATE model runs for assessing band recovery (conventional tagging) data (i.e. age-dependent or age-independent fishing and natural mortality rates).
ircor Correlation Coefficients for Information Retrieval
Provides implementation of various correlation coefficients of common use in Information Retrieval. In particular, it includes Kendall (1970, isbn:0852641990) tau coefficient as well as tau_a and tau_b for the treatment of ties. It also includes Yilmaz et al. (2008) <doi:10.1145/1390334.1390435> tauAP correlation coefficient, and versions tauAP_a and tauAP_b developed by Urbano and Marrero (2017) <doi:10.1145/3121050.3121106> to cope with ties.
IRdisplay Jupyter’ Display Machinery
An interface to the rich display capabilities of ‘Jupyter’ front-ends (e.g. ‘Jupyter Notebook’) <https://jupyter.org>. Designed to be used from a running ‘IRkernel’ session <https://irkernel.github.io>.
Irescale Calculate and Scale Moran’s I
Provides a scaling method to obtain a standardized Moran's I measure. Moran's I is a measure of the spatial autocorrelation of a data set; it gives a measure of the similarity between data and their surroundings. Researchers calculate Moran's I to express the spatial autocorrelation of their data. The range of this value should be [-1,1], but this does not happen in practice. This package scales the Moran's I value and maps it into the theoretical range [-1,1]. Once the Moran's I value is rescaled, it facilitates comparison between projects: for instance, a researcher can calculate Moran's I in a city in China, with a sample size of n1 and an area of interest a1, while another researcher runs a similar experiment in a city in Mexico with a different sample size n2 and area of interest a2. Due to the differences between the conditions, it is not possible to compare the two Moran's I values in a straightforward way. In this version of the package, the spatial autocorrelation Moran's I is calculated as proposed in Chen (2009) <arXiv:1606.03658>.
iRF iterative Random Forests
Iteratively grows feature weighted random forests and finds high-order feature interactions in a stable fashion.
IRkernel Native R Kernel for the ‘Jupyter Notebook’
The R kernel for the ‘Jupyter’ environment executes R code which the front-end (‘Jupyter Notebook’ or other front-ends) submits to the kernel via the network.
IROmiss Imputation Regularized Optimization Algorithm
Missing data are frequently encountered in high-dimensional data analysis, but they are usually difficult to deal with using standard algorithms, such as the EM algorithm and its variants. This package provides a general algorithm, the so-called Imputation Regularized Optimization (IRO) algorithm, for high-dimensional missing data problems. Refer to Liang, F., Jia, B., Xue, J., Li, Q. and Luo, Y. (2018) at <https://…/ica10.pdf> for details. The publication 'An Imputation Regularized Optimization Algorithm for High-Dimensional Missing Data Problems and Beyond' will appear in the Journal of the Royal Statistical Society, Series B.
IrregLong Analysis of Longitudinal Data with Irregular Observation Times
Analysis of longitudinal data for which the times of observation are random variables that are potentially associated with the outcome process. The package includes inverse-intensity weighting methods (Lin H, Scharfstein DO, Rosenheck RA (2004) <doi:10.1111/j.1467-9868.2004.b5543.x>) and multiple outputation (Pullenayegum EM (2016) <doi:10.1002/sim.6829>).
irregulAR1 Functions for Irregularly Sampled AR(1) Processes
Simulation and density evaluation of irregularly sampled stationary AR(1) processes with Gaussian errors using the algorithms described in Allévius (2018) <arXiv:1801.03791>.
irrNA Coefficients of Interrater Reliability – Generalized for Randomly Incomplete Datasets
Provides coefficients of interrater reliability, that are generalized to cope with randomly incomplete (i.e. unbalanced) datasets without any imputation of missing values or any (row-wise or column-wise) omissions of actually available data. Applied to complete (balanced) datasets, these generalizations yield the same results as the common procedures, namely the Intraclass Correlation according to McGraw & Wong (1996) <doi:10.1037/1082-989X.1.1.30> and the Coefficient of Concordance according to Kendall & Babington Smith (1939) <doi:10.1214/aoms/1177732186>.
irtDemo Item Response Theory Demo Collection
Includes a collection of shiny applications to demonstrate or to explore fundamental item response theory (IRT) concepts such as estimation, scoring, and multidimensional IRT models.
IRTpp Estimating IRT Parameters using the IRT Methodology
An implementation of the IRT paradigm for scoring different instruments that measure latent traits (a.k.a. abilities) and for estimating item parameters under a variety of models. The package is highly optimized using Rcpp, with the remainder carefully written in R; it aims to extend IRT to applications that require faster and more robust estimation procedures. See the IRTpp documentation and GitHub site for more information and examples.
irtreliability Item Response Theory Reliability
Estimation of reliability coefficients for ability estimates and sum scores from item response theory models as defined in Cheng, Y., Yuan, K.-H. and Liu, C. (2012) <doi:10.1177/0013164411407315> and Kim, S. and Feldt, L. S. (2010) <doi:10.1007/s12564-009-9062-8>. The package supports the 3-PL and generalized partial credit models and includes estimates of the standard errors of the reliability coefficient estimators, derived in Andersson, B. and Xin, T. (2018) <doi:10.1177/0013164417713570>.
IRTShiny Item Response Theory via Shiny
Interactive shiny application for running Item Response Theory analysis.
ISCO08ConveRsions Converts ISCO-08 to Job Prestige Scores, ISCO-88 and Job Name
Implementation of functions to assign corresponding common job prestige scores (SIOPS, ISEI), the official job or group title and the ISCO-88 code to given ISCO-08 codes. ISCO-08 is the latest version of the International Standard Classification of Occupations which is used to organise information on labour and jobs.
islasso The Induced Smoothed Lasso
An implementation of the induced smoothing idea that focuses on hypothesis testing in lasso regularization models (IS-lasso). Linear, logistic, Poisson and gamma regressions with several link functions are already implemented. The algorithm uses the steps as described in the original paper. See: The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression. Cilluffo, G., Sottile, G., La Grutta, S. and Muggeo, V. (2019) <doi:10.1177/0962280219842890>.
ISM Interpretive Structural Modelling (ISM)
ISM was developed by Warfield in 1974. ISM is a process for organizing distinct or related elements into a simplified and structured format. Hence, ISM is a methodology that identifies the interrelationships among the various elements considered and yields a hierarchical and multilevel structure. To use this package, the user needs to provide a matrix (VAXO) converted into 0’s and 1’s. Warfield, J.N. (1974) <doi:10.1109/TSMC.1974.5408524>; Warfield, J.N. (1974, E-ISSN:2168-2909).
isni Index of Sensitivity to Nonignorability
The current version provides functions to compute, print and summarize the Index of Sensitivity to Nonignorability (ISNI) in the generalized linear model for independent data, and in the marginal multivariate Gaussian model and the linear mixed model for longitudinal/clustered data. It allows for arbitrary patterns of missingness in the regression outcomes caused by dropout and/or intermittent missingness. One can compute the sensitivity index without estimating any nonignorable models or positing a specific magnitude of nonignorability. Thus ISNI provides a simple quantitative assessment of how robust the standard estimates assuming missing at random are with respect to the assumption of ignorability. For more details, see Troxel, Ma and Heitjan (2004), Xie and Heitjan (2004) <doi:10.1191/1740774504cn005oa>, Ma, Troxel and Heitjan (2005) <doi:10.1002/sim.2107>, Xie (2008) <doi:10.1002/sim.3117>, Xie (2012) <doi:10.1016/j.csda.2010.11.021> and Xie and Qian (2012) <doi:10.1002/jae.1157>.
isnullptr Check if an ‘externalptr’ is a Null Pointer
Check if an ‘externalptr’ is a null pointer. R currently does not have a native function for that purpose. This package contains a C function that returns TRUE in case of a null pointer.
isoband Generate Isolines and Isobands from Regularly Spaced Elevation Grids
A fast C++ implementation to generate contour lines (isolines) and contour polygons (isobands) from regularly spaced grids containing elevation data.
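A minimal sketch of how such a grid is contoured (the 4x4 elevation matrix here is made up for illustration; isolines() and isobands() are the package’s two core functions):

    library(isoband)
    # Toy elevation grid; x gives column positions, y gives row positions
    m <- matrix(c(0, 0, 0, 0,
                  0, 1, 2, 0,
                  0, 1, 2, 0,
                  0, 0, 0, 0), nrow = 4, byrow = TRUE)
    isolines(x = 1:4, y = 4:1, z = m, levels = 0.5)      # contour lines
    isobands(x = 1:4, y = 4:1, z = m,                    # contour polygons
             levels_low = 0.5, levels_high = 1.5)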
isoph Isotonic Proportional Hazards Model
Nonparametric estimation of an isotonic covariate effect for Cox’s partial likelihood.
ISR3 Iterative Sequential Regression
Performs multivariate normal imputation through iterative sequential regression. Conditional dependency structure between imputed variables can be specified a priori to accelerate imputation.
issueReporter Create Reports from GitHub Issues
Generates a report from a GitHub issue thread, using R Markdown.
itan Item Analysis for Multiple Choice Tests
Functions for analyzing multiple choice items. These analyses include the conversion of student responses into binary data (correct/incorrect), the computation of the number of correct responses and the grade for each subject, the calculation of item difficulty and discrimination, the computation of the frequency and point-biserial correlation for each distractor, and the graphical analysis of each item.
itcSegment Individual Tree Crowns Segmentation
Two methods for Individual Tree Crowns (ITCs) delineation on remote sensing data: one is based on LiDAR data in x,y,z format and one on imagery data in raster format.
ITEMAN Classical Item Analysis
Runs classical item analysis for multiple-choice test items and polytomous items (e.g., rating scales).
itemanalysis Classical Test Theory Item Analysis
Runs classical item analysis for multiple-choice test items and polytomous items (e.g., rating scales). The statistics reported in this package can be found in any measurement textbook such as Crocker and Algina (2006, ISBN:9780495395911).
ITGM Individual Tree Growth Modeling
Individual tree models are instruments that support decision making in forest management. This package provides functions for working with data for such models, along with other support functions and extensions related to them.
iTOP Inferring the Topology of Omics Data
Infers a topology of relationships between different datasets, such as multi-omics and phenotypic data recorded on the same samples. We based this methodology on the RV coefficient (Robert & Escoufier, 1976, <doi:10.2307/2347233>), a measure of matrix correlation, which we have extended for partial matrix correlations and binary data (Aben et al., 2018, in preparation).
ITRSelect Variable Selection for Optimal Individualized Dynamic Treatment Regime
Sequential advantage selection (SAS; Fan, Lu and Song, 2016, <arXiv:1405.5239>) and penalized A-learning (PAL; Shi et al., 2018) methods are implemented for selecting important variables involved in the optimal individualized (dynamic) treatment regime in both single-stage and multi-stage studies.
itsadug Interpreting Time Series and Autocorrelated Data Using GAMMs
GAMM (Generalized Additive Mixed Modeling; Lin & Zhang, 1999), as implemented in the R package mgcv (Wood, S.N., 2006; 2011), is a nonlinear regression analysis which is particularly useful for time course data such as EEG, pupil dilation, gaze data (eye tracking), and articulography recordings, but also for behavioral data such as reaction times and response data. As time course measures are sensitive to autocorrelation problems, GAMM implements methods to reduce them. This package includes functions for the evaluation of GAMM models (e.g., model comparisons, determining regions of significance, inspection of the autocorrelational structure in residuals) and for interpreting GAMMs (e.g., visualization of complex interactions, and contrasts).
iva Instrumental Variable Analysis in Case-Control Association Studies
Mendelian randomization (MR) analysis is a special case of instrumental variable analysis with genetic instruments. It is used to estimate the unconfounded causal effect of an exposure. This package implements the estimating and testing methods of Zhang et al. (2019) for MR analysis in case-control studies. It (1) estimates the causal effect of a quantitative exposure by the quasi empirical likelihood approach; (2) uses a Lagrange multiplier test for testing the presence of a causal effect; (3) provides a test for the presence of a confounder.
ivmodel Statistical Inference and Diagnostics for Instrumental Variables Model
Contains functions for carrying out instrumental variable estimation of causal effects, including power analysis, sensitivity analysis, and diagnostics.
ivmte Instrumental Variables: Extrapolation by Marginal Treatment Effects
The marginal treatment effect was introduced by Heckman and Vytlacil (2005) <doi:10.1111/j.1468-0262.2005.00594.x> to provide a choice-theoretic interpretation to instrumental variables models that maintain the monotonicity condition of Imbens and Angrist (1994) <doi:10.2307/2951620>. This interpretation can be used to extrapolate from the compliers to estimate treatment effects for other subpopulations. This package provides a flexible set of methods for conducting this extrapolation. It allows for parametric or nonparametric sieve estimation, and allows the user to maintain shape restrictions such as monotonicity. The package operates in the general framework developed by Mogstad, Santos and Torgovitsky (2018) <doi:10.3982/ECTA15463>, and accommodates either point identification or partial identification (bounds). In the partially identified case, bounds are computed using linear programming. Support for three linear programming solvers is provided. Gurobi and the Gurobi R API can be obtained from <http://…/index>. CPLEX can be obtained from <https://…/cplex-optimizer>. CPLEX R APIs ‘Rcplex’ and ‘cplexAPI’ are available from CRAN. The lp_solve library is freely available from <http://…/>, and is included when installing either of its R APIs, ‘lpSolve’ or ‘lpSolveAPI’, which are available from CRAN.
ivpanel Instrumental Panel Data Models
Fit the instrumental panel data models: the fixed effects, random effects and between models.
ivregEX Create Independent Evidence in IV Analyses and Do Sensitivity Analysis in Regression and IV Analysis
Allows you to create an evidence factor (EX analysis) in an instrumental variables regression model. Additionally, performs Sensitivity analysis for OLS analysis, 2SLS analysis and EX analysis with interpretable plotting and printing features.
ivtools Instrumental Variables
Contains tools for instrumental variables estimation. Currently, only non-parametric bounds and two-stage least squares are implemented, but more methods will be added in the future. Balke, A. and Pearl, J. (1997) <doi:10.2307/2965583>, Vansteelandt S., Bowden J., Babanezhad M., Goetghebeur E. (2011) <doi:10.1214/11-STS360>.
ivx Robust Econometric Inference
Drawing statistical inference on the coefficients of a short- or long-horizon predictive regression with persistent regressors, using the IVX method of Magdalinos and Phillips (2009) <doi:10.1017/S0266466608090154> and Kostakis, Magdalinos and Stamatogiannis (2015) <doi:10.1093/rfs/hhu139>.

J

jaccard Test Similarity Between Binary Data using Jaccard/Tanimoto Coefficients
Calculate statistical significance of Jaccard/Tanimoto similarity coefficients for binary data.
jackknifeKME Jackknife Estimates of Kaplan-Meier Estimators or Integrals
Computing the original and modified jackknife estimates of Kaplan-Meier estimators.
JacobiEigen Classical Jacobi Eigensolution Algorithm
Implements the classical Jacobi (1846) algorithm for the eigenvalues and eigenvectors of a real symmetric matrix, both in pure R and in C++ using Rcpp. Mainly as a programming example for teaching purposes.
jacpop Jaccard Index for Population Structure Identification
Uses the Jaccard similarity index to account for population structure in sequencing studies. This method was specifically designed to detect population stratification based on rare variants, hence it will be especially useful in rare variant analysis.
jagsUI A Wrapper Around rjags to Streamline JAGS Analyses
This package provides a set of wrappers around rjags functions to run Bayesian analyses in JAGS (specifically, via libjags). A single function call can control adaptive, burn-in, and sampling MCMC phases, with MCMC chains run in sequence or in parallel. Posterior distributions are automatically summarized (with the ability to exclude some monitored nodes if desired) and functions are available to generate figures based on the posteriors (e.g., predictive check plots, traceplots). Function inputs, argument syntax, and output format are nearly identical to the R2WinBUGS/R2OpenBUGS packages to allow easy switching between MCMC samplers.
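A hedged sketch of the single-call workflow described above (the toy normal-mean model and all MCMC settings are invented for illustration):

    library(jagsUI)
    # Write a toy JAGS model estimating a normal mean to a temporary file
    mod <- "model {
      for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
      mu ~ dnorm(0, 0.001)
      tau ~ dgamma(0.1, 0.1)
    }"
    mf <- tempfile(fileext = ".txt"); writeLines(mod, mf)
    # One call controls the adaptive, burn-in, and sampling phases
    out <- jags(data = list(y = rnorm(50, 5, 2), N = 50), inits = NULL,
                parameters.to.save = "mu", model.file = mf,
                n.chains = 3, n.adapt = 100, n.iter = 2000, n.burnin = 1000)
    print(out)   # posterior summaries of the monitored nodes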
janitor Simple Tools for Examining and Cleaning Dirty Data
The main janitor functions can: perfectly format data.frame column names; isolate duplicate records; and provide quick one- and two-variable tabulations (i.e., frequency tables and crosstabs). Other janitor functions nicely format the results of these tabulations. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the ‘tidyverse’ and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness. Advanced R users can already do everything covered here, but with janitor they can do it faster and save their thinking for the fun stuff.
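A brief sketch of the cleaning-and-tabulating workflow (the toy data frame and its column names are invented):

    library(janitor)
    df <- data.frame(`First Name` = c("Ana", "Bo", "Bo"),
                     `SCORE!` = c(90, 85, 85), check.names = FALSE)
    df <- clean_names(df)    # column names become first_name, score
    get_dupes(df)            # isolate the duplicated record
    tabyl(df, first_name)    # one-variable frequency table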
jaod Directory of Open Access Journals Client
Client for the Directory of Open Access Journals (‘DOAJ’) (<https://doaj.org/> ). ‘API’ documentation at <https://…/docs>. Methods included for working with all ‘DOAJ’ ‘API’ routes: fetch article information by identifier, search for articles, fetch journal information by identifier, and search for journals.
jarbes Just a Rather Bayesian Evidence Synthesis
Provides a new class of Bayesian meta-analysis models that we call ‘The Hierarchical Meta-Regression’ (HMR). The aim of HMR is to incorporate the data collection process into the meta-analysis, which results in a model for the internal and external validity bias. In this way, it is possible to combine studies of different types. For example, we can combine the results of randomized controlled trials (RCTs) with the results of observational studies (OS). The statistical methods and their applications are described in Verde, Ohmann, Morbach and Icks (2016) <doi:10.1002/sim.6809> and in Verde (2017) <doi:10.5772/intechopen.70231>.
jcp Joint Change Point Detection
Procedures for the joint detection of changes in both expectation and variance in univariate sequences. Performs a statistical test of the null hypothesis of the absence of change points; in case of rejection, it runs an algorithm for change point detection. Reference: Bivariate change point detection (2019+), Michael Messer.
Jdmbs Monte Carlo Option Pricing Algorithm for Jump Diffusion Model with Correlation Companies
The Black-Scholes model [Black (1973) <doi:10.1086/260062>] is important for calculating option premiums in the stock market, and a variety of improved models have been studied. This package provides functions to compute option prices under both the normal and a new jump diffusion model [Kou (2002) <doi:10.1287/mnsc.48.8.1086.166>] by the Monte Carlo method. It can be used for computational finance.
jdx ‘Java’ Data Exchange for ‘R’ and ‘rJava’
Simplifies and extends data exchange between ‘R’ and ‘Java’.
jeek A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models
Provides a fast and scalable joint estimator for integrating additional knowledge in learning multiple related sparse Gaussian Graphical Models (JEEK). The JEEK algorithm can quickly estimate multiple related precision matrices at large scale. For instance, it can identify multiple gene networks from multi-context gene expression datasets. By performing data-driven network inference from high-dimensional and heterogeneous data sets, this tool can help users effectively translate aggregated data into knowledge that takes the form of graphs among entities. Please run demo(jeekDemo) to learn the basic functions provided by this package. For further details, please read the original paper: Beilun Wang, Arshdeep Sekhon, Yanjun Qi ‘A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models’ (ICML 2018) <arXiv:1806.00548>.
jetpack A Friendly Package Manager
Manage project dependencies from your DESCRIPTION file. Create a reproducible virtual environment with minimal additional files in your project. Provides tools to add, remove, and update dependencies as well as install existing dependencies with a single function.
JFE A Menu-Driven GUI for Analyzing and Modelling Data of Just Finance and Econometrics
The Just Finance and Econometrics package (‘JFE’) provides a ‘tcltk’ based interface for global asset selection and portfolio optimization. ‘JFE’ aims to provide a simple GUI that allows a user to quickly load data from a .RData (.rda) file, explore the data and evaluate financial models. Invoked as JFE(), ‘JFE’ exports a number of utility functions for visualizing asset prices (e.g. technical charting) and returns, selecting assets by performance index (based on the package ‘PerformanceAnalytics’) and backtesting specific portfolio profiles (based on the package ‘fPortfolio’).
JGEE Joint Generalized Estimating Equation Solver
Fits two different joint generalized estimating equation models to multivariate longitudinal data.
jipApprox Approximate Inclusion Probabilities for Survey Sampling
Approximate joint-inclusion probabilities in Unequal Probability Sampling, or compute Monte Carlo approximations of the first and second-order inclusion probabilities of a general sampling design as in Fattorini (2006) <doi:10.1093/biomet/93.2.269>.
jlctree Joint Latent Class Trees for Joint Modeling of Time-to-Event and Longitudinal Data
Implements the tree-based approach to joint modeling of time-to-event and longitudinal data. This approach looks for a tree-based partitioning such that within each estimated latent class defined by a terminal node, the time-to-event and longitudinal responses display a lack of association. See Zhang and Simonoff (2018) <arXiv:1812.01774>.
jmcm Joint Mean-Covariance Models using ‘Armadillo’ and S4
Fit joint mean-covariance models for longitudinal data. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Armadillo’ C++ library for numerical linear algebra and ‘RcppArmadillo’ glue.
JMcmprsk Joint Models for Longitudinal Measurements and Competing Risks Failure Time Data
Parametric models for the joint modeling of longitudinal (continuous or ordinal) data and time-to-event data with competing risks. For detailed information, see Robert Elashoff, Gang Li and Ning Li (2016, ISBN:9781439807828); Robert M. Elashoff, Gang Li and Ning Li (2008) <doi:10.1111/j.1541-0420.2007.00952.x>; Ning Li, Robert Elashoff, Gang Li and Jeffrey Saver (2010) <doi:10.1002/sim.3798>.
jmdem Fitting Joint Mean and Dispersion Effects Models
Joint mean and dispersion effects models fit the mean and dispersion parameters of a response variable by two separate linear models, the mean and dispersion submodels, simultaneously. It also allows the users to choose either the deviance or the Pearson residuals as the response variable of the dispersion submodel. Furthermore, the package provides the possibility to nest the submodels in one another, if one of the parameters has significant explanatory power on the other. Wu & Li (2016) <doi:10.1016/j.csda.2016.04.015>.
jmdl Joint Mean-Correlation Regression Approach for Discrete Longitudinal Data
Fit joint mean-correlation models for discrete longitudinal data (Tang CY, Zhang W, Leng C, 2017 <doi:10.5705/ss.202016.0435>).
jmotif Tools for Time Series Analysis Based on Symbolic Aggregate Discretization
A set of tools based on time series symbolic discretization and vector space model that aids in time series characteristic pattern discovery and facilitates interpretable time series classification.
jmuOutlier Permutation Tests for Nonparametric Statistics
Performs a permutation test on the difference between two location parameters, a permutation correlation test, a permutation F-test, the Siegel-Tukey test, and a root mean deviance test. Also performs some graphing techniques, such as for confidence intervals, vector addition, and Fourier analysis; and includes functions related to the Laplace (double exponential) and triangular distributions. Performs power calculations for the binomial test.
jmv The ‘jamovi’ Analyses
‘jamovi’ is a rich graphical statistics program providing many common statistical tests such as t-tests, ANOVAs, correlation matrices, proportion tests, contingency tables, etc. (see <https://www.jamovi.org> for more information). This package makes all of the basic ‘jamovi’ analyses available to the R user.
jmvconnect Connect to the ‘jamovi’ Statistical Spreadsheet
Methods to access data sets from the ‘jamovi’ statistical spreadsheet (see <https://www.jamovi.org> for more information) from R.
jmvcore Dependencies for the ‘jamovi’ Framework
‘jamovi’ is a framework for creating rich interactive statistical analyses (see <https://www.jamovi.org> for more information). This package represents the core libraries which ‘jamovi’ analyses written in R depend upon.
jocre Joint Confidence Regions
Computing and plotting joint confidence regions and intervals. Regions include classical ellipsoids, minimum-volume or minimum-length regions, and an empirical Bayes region. Intervals include the TOST procedure with ordinary or expanded intervals and a fixed-sequence procedure. Such regions and intervals are useful e.g., for the assessment of multi-parameter (bio-)equivalence. Joint confidence regions for the mean and variance of a normal distribution are available as well.
joineRmeta Joint Modelling for Meta-Analytic (Multi-Study) Data
Fits joint models of the type proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extends to the multi-study, meta-analytic case. Functions for meta-analysis of a single longitudinal and a single time-to-event outcome from multiple studies using joint models. Options to produce plots for multi study joint data, to pool joint model fits from ‘JM’ and ‘joineR’ packages in a two stage meta-analysis, and to model multi-study joint data in a one stage meta-analysis.
joineRML Joint Modelling of Multivariate Longitudinal Data and Time-to-Event Outcomes
Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project is funded by the Medical Research Council (Grant number MR/M013227/1).
joint.Cox Penalized Likelihood Estimation under the Joint Cox Models Between TTP and OS for Meta-Analysis
Perform regression analyses under the joint Cox proportional hazards model between TTP and OS for meta-analysis. The method is applicable for meta-analysis combining several studies or for cluster survival data.
JointAI Joint Analysis and Imputation of Incomplete Data
Provides joint analysis and imputation of linear regression models, generalized linear regression models or linear mixed models with incomplete (covariate) data in the Bayesian framework. The package performs some preprocessing of the data and creates a ‘JAGS’ model, which will then automatically be passed to ‘JAGS’ <http://mcmc-jags.sourceforge.net> with the help of the package ‘rjags’. It also provides summary and plotting functions for the output.
jointMeanCov Joint Mean and Covariance Estimation for Matrix-Variate Data
Jointly estimates two-group means and covariances for matrix-variate data and calculates test statistics. This package implements the algorithms defined in Hornstein, Fan, Shedden, and Zhou (2018) <doi:10.1080/01621459.2018.1429275>.
jointseg Joint Segmentation of Multivariate (Copy Number) Signals
Methods for fast segmentation of multivariate signals into piecewise constant profiles and for generating realistic copy-number profiles. A typical application is the joint segmentation of total DNA copy numbers and allelic ratios obtained from Single Nucleotide Polymorphism (SNP) microarrays in cancer studies. The methods are described in Pierre-Jean, Rigaill and Neuvial (2015) <doi:10.1093/bib/bbu026>.
joinXL Perform Joins or Minus Queries on ‘Excel’ Files
Performs joins and minus queries on ‘Excel’ files. fulljoinXL() merges all rows of two ‘Excel’ files based upon a common column in the files. innerjoinXL() merges all rows from the base file and the join file when the join condition is met. leftjoinXL() merges all rows from the base file, and all rows from the join file if the join condition is met. rightjoinXL() merges all rows from the join file, and all rows from the base file if the join condition is met. minusXL() performs two operations, source-minus-target and target-minus-source; if the files are identical, all output files will be empty. Choose two ‘Excel’ files via a dialog box, and then follow the prompts at the console to choose a base or source file and the columns to merge or minus on.
jose Javascript Object Signing and Encryption
A collection of specifications to securely transfer claims such as authorization information between parties. A JSON Web Token (JWT) contains claims used by systems to apply access control rules to its resources. One potential use case of the JWT is authentication and authorization for a system that exposes resources through OAuth 2.0.
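A minimal sketch of the JWT round trip described above (the claims and secret are toy values):

    library(jose)
    claim <- jwt_claim(user = "demo", exp = Sys.time() + 3600)  # toy claims
    secret <- charToRaw("not-a-real-secret")
    token <- jwt_encode_hmac(claim, secret = secret)  # sign with HMAC
    jwt_decode_hmac(token, secret = secret)           # verify, recover claims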
JOUSBoost Implements Under/Oversampling for Probability Estimation
Implements under/oversampling for probability estimation. To be used with machine learning methods such as adaBoost, random forests, etc.
JPEN Covariance and Inverse Covariance Matrix Estimation Using Joint Penalty
A Joint PENalty Estimation of Covariance and Inverse Covariance Matrices.
JQL Jump Q-Learning for Individualized Interval-Valued Dose Rule
We provide tools to estimate the individualized interval-valued dose rule (I2DR) that maximizes the expected beneficial clinical outcome for each individual and returns an optimal interval-valued dose, using the jump Q-learning method. Jump Q-learning directly models the conditional mean of the response given the dose level and the baseline covariates via jump penalized least squares regression within the framework of Q-learning. We develop a search algorithm based on dynamic programming to find the optimal I2DR with time complexity O(n^2) and space complexity O(n). The output includes the best partition of the entire dosage range of interest, the regression coefficients of each partition, and the value function under the estimated I2DR, as well as a Wald-type confidence interval for the value function constructed through the bootstrap.
jqr Client for ‘jq’, a JSON Processor
Client for ‘jq’, a JSON processor (<http://…/> ), written in C. ‘jq’ allows the following with JSON data: index into, parse, do calculations, cut up and filter, change key names and values, perform conditionals and comparisons, and more.
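A short sketch of the operations listed above (toy JSON input):

    library(jqr)
    json <- '{"name": "jq", "downloads": [10, 20, 30]}'
    jq(json, ".name")             # index into the object
    jq(json, ".downloads | add")  # do calculations
    jq(json, "{pkg: .name}")      # change key names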
jrc Exchange Commands Between R and ‘JavaScript’
An ‘httpuv’ based bridge between R and ‘JavaScript’. Provides an easy way to exchange commands and data between a web page and a currently running R session.
JRF Joint Random Forest (JRF) for the Simultaneous Estimation of Multiple Related Networks
Simultaneous estimation of multiple related networks.
jrich Jack-Knife Support for Evolutionary Distinctiveness Indices I and W
These functions calculate the taxonomic measures presented in Miranda-Esquivel (2016). The package introduces jack-knife resampling in evolutionary distinctiveness prioritization analysis, as a way to evaluate the support for the ranking in area prioritization and the persistence of a given area in a conservation analysis. The algorithm is described in Miranda-Esquivel, D. (2016) <doi:10.1007/978-3-319-22461-9_11>.
JRM Joint Regression Modelling
Routines for fitting various joint regression models, with several types of covariate effects, in the presence of associated error equations, endogeneity, non-random sample selection or partial observability.
jrvFinance Basic Finance; NPV/IRR/Annuities/Bond-Pricing; Black Scholes
Implements the basic financial analysis functions similar to (but not identical to) what is available in most spreadsheet software. This includes finding the IRR and NPV of regularly spaced cash flows and annuities. Bond pricing and YTM calculations are included. In addition, Black Scholes option pricing and Greeks are also provided.
js Tools for Working with JavaScript in R
A set of utility functions for working with JavaScript in R. It currently includes functions to validate, reformat, optimize and analyze JavaScript code.
jskm Kaplan-Meier Plot with ‘ggplot2’
The function ‘jskm()’ creates publication-quality Kaplan-Meier plots with at-risk tables below. ‘svyjskm()’ provides plots for the weighted Kaplan-Meier estimator.
jsonld JSON for Linking Data
JSON-LD is a light-weight syntax for expressing linked data. It is primarily intended for web-based programming environments, interoperable web services and for storing linked data in JSON-based databases. This package provides bindings to the JavaScript library for converting, expanding and compacting JSON-LD documents.
jsonlite A Robust, High Performance JSON Parser and Generator for R
A fast JSON parser and generator optimized for statistical data and the web. Started out as a fork of RJSONIO, but has been completely rewritten in recent versions. The package offers flexible, robust, high performance tools for working with JSON in R and is particularly powerful for building pipelines and interacting with web APIs. The implementation is based on the mapping described in the vignette of the package (Ooms, 2014). In addition to drop-in replacements for toJSON and fromJSON, jsonlite contains functions to stream, validate, and prettify JSON data. The unit tests included with the package verify that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.
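For example, the drop-in toJSON()/fromJSON() pair mentioned above round-trips a data frame, and validate() checks well-formedness:

    library(jsonlite)
    json <- toJSON(head(iris, 2), pretty = TRUE)  # data frame -> JSON
    fromJSON(json)                                # JSON -> data frame
    validate(json)                                # TRUE for well-formed JSON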
jsonstat Interface to ‘JSON-stat’
Interface to ‘JSON-stat’ <https://…/>, a simple lightweight ‘JSON’ format for data dissemination.
jsr223 A ‘Java’ Platform Integration for ‘R’ with Programming Languages ‘Groovy’, ‘JavaScript’, ‘JRuby’ (‘Ruby’), ‘Jython’ (‘Python’), and ‘Kotlin’
Provides a high-level integration for the ‘Java’ platform that makes ‘Java’ objects easy to use from within ‘R’; provides a unified interface to integrate ‘R’ with several programming languages; and features extensive data exchange between ‘R’ and ‘Java’. The ‘jsr223’-supported programming languages include ‘Groovy’, ‘JavaScript’, ‘JRuby’ (‘Ruby’), ‘Jython’ (‘Python’), and ‘Kotlin’. Any of these languages can use and extend ‘Java’ classes in natural syntax. Furthermore, solutions developed in any of the ‘jsr223’-supported languages are also accessible to ‘R’ developers. The ‘jsr223’ package also features callbacks, script compiling, and string interpolation. In all, ‘jsr223’ significantly extends the computing capabilities of the ‘R’ software environment.
jstable Create Tables from Different Types of Regression
Create regression tables from generalized linear model (GLM), generalized estimating equation (GEE), generalized linear mixed-effects model (GLMM), Cox proportional hazards model, survey-weighted generalized linear model (svyglm) and survey-weighted Cox model results for publication.
jstor Read Data from JSTOR/DfR
Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.
jsTree Create Interactive Trees with the ‘jQuery’ ‘jsTree’ Plugin
Create and customize interactive trees using the ‘jQuery’ ‘jsTree’ <https://…/> plugin library and the ‘htmlwidgets’ package. These trees can be used directly from the R console, from ‘RStudio’, in Shiny apps and R Markdown documents.
jtools Analyzing and Presenting Social Scientific Data
This is a collection of tools that the author (Jacob) has written for the purpose of more efficiently understanding and sharing the results of (primarily) regression analyses. There are a number of functions focused specifically on the interpretation and presentation of interactions in linear models. Just about everything supports models from the survey package.
jubilee Forecast Long-Term Growth of the U.S. Stock Market
A long-term forecast model called ‘Jubilee-Tectonic model’ is implemented to forecast future returns of the U.S. stock market, Treasury yield, and gold price. The five-factor model can forecast the 10-year and 20-year future equity returns with high R-squared above 80 percent. It is based on linear growth and mean reversion characteristics in the U.S. stock market. In addition, this model enhances the CAPE model by introducing the hypothesis that there are fault lines in the historical CAPE, which can be calibrated and corrected through statistical learning.
jug Create a Simple Web API for your R Functions
A set of convenience functions to build simple APIs.
JuliaCall Seamless Integration Between R and ‘Julia’
Provides an R interface to ‘Julia’, which is a high-level, high-performance dynamic programming language for numerical computing, see <https://…/> for more information. It provides a high-level interface as well as a low-level interface. Using the high level interface, you could call any ‘Julia’ function just like any R function with automatic type conversion. Using the low level interface, you could deal with C-level SEXP directly while enjoying the convenience of using a high-level programming language like ‘Julia’.
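A minimal sketch of the high-level interface (assumes a working ‘Julia’ installation on the system):

    library(JuliaCall)
    julia_setup()                  # locate and initialize Julia
    julia_eval("sqrt(2.0)")        # evaluate Julia code; result returned to R
    julia_call("sum", c(1, 2, 3))  # call a Julia function on R arguments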
JumpTest Financial Jump Detection
A fast simulation on stochastic volatility model, with jump tests, p-values pooling, and FDR adjustments.
junr Access Open Data Through the Junar API
The Junar API is a commercial platform to organize and publish data <http://junar.com>. It has been used in a number of national and local government Open Data initiatives in Latin America and the USA. This package is a wrapper to make it easier to access data made public through the Junar API.
jvcoords Principal Component Analysis (PCA) and Whitening
Provides functions to standardize and whiten data, and to perform Principal Component Analysis (PCA). The main advantage of this package over alternatives like prcomp() is that jvcoords makes it easy to convert (additional) data between the original and the transformed coordinates. The package also provides a class coords, which can represent affine coordinate transformations. This class forms the basis of the transformations provided by the package, but can also be used independently. The implementation has been optimized to be of comparable speed (and sometimes even faster) than existing alternatives.
jvmr Integration of R, Java, and Scala
Cross-platform, self-contained, and bi-directional interface between R and Scala, Java, and other JVM-based languages.
jvnVaR Value at Risk
Many methods to compute, predict and back-test VaR. For more detail, see the report ‘Value at Risk’ <researchgate.net>.
JWileymisc Miscellaneous Utilities and Functions
A collection of miscellaneous tools and functions, such as tools to generate descriptive statistics tables, format output, visualize relations among variables or check distributions.
jwutil Utilities for Data Manipulation, Disk Caching, Testing
This is a set of simple utilities for various data manipulation and caching tasks. The goal is to use base tools well, without bringing in any dependencies. The main areas of interest are data frame manipulation, such as converting factors into multiple binary indicator columns, and disk caching of data frames (which is optionally done by date range). There are testing functions that provide ‘testthat’ expectations to permute arguments to function calls. There are also functions and data to test extreme numbers, dates, and bad input of various kinds, which should allow testing of failure and corner cases. The test suite has many examples of usage.

K

kableExtra Construct Complex Table with ‘Kable’ and Pipe Syntax
A collection of functions to help build complex HTML or ‘LaTeX’ tables using ‘kable()’ from ‘knitr’ and the piping syntax from ‘magrittr’. Function ‘kable()’ is a light weight table generator coming from ‘knitr’. This package simplifies the way to manipulate the HTML or ‘LaTeX’ codes generated by ‘kable()’ and allows users to construct complex tables and customize styles using a readable syntax.
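A small sketch of the piping syntax (the styling choices are arbitrary examples):

    library(knitr)
    library(kableExtra)
    kable(head(mtcars[, 1:4]), format = "html") %>%
      kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width = FALSE) %>%
      add_header_above(c(" " = 1, "Engine" = 2, "Other" = 2))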
kader Kernel Adaptive Density Estimation and Regression
Implementation of various kernel adaptive methods in nonparametric curve estimation like density estimation as introduced in Stute and Srihera (2011) <doi:10.1016/j.spl.2011.01.013> and Eichner and Stute (2013) <doi:10.1016/j.jspi.2012.03.011> for pointwise estimation, and like regression as described in Eichner and Stute (2012) <doi:10.1080/10485252.2012.760737>.
kamila Methods for Clustering Mixed-Type Data
Implements methods for clustering mixed-type data, specifically combinations of continuous and nominal data. Special attention is paid to the often-overlooked problem of equitably balancing the contribution of the continuous and categorical variables. This package implements KAMILA clustering, a novel method for clustering mixed-type data in the spirit of k-means clustering. It does not require dummy coding of variables, and is efficient enough to scale to rather large data sets. Also implemented is Modha-Spangler clustering, which uses a brute-force strategy to maximize the cluster separation simultaneously in the continuous and categorical variables.
kantorovich Kantorovich Distance Between Probability Measures
Computes the Kantorovich distance between two probability measures on a finite set.
kaos Encoding of Sequences Based on Frequency Matrix Chaos Game Representation
Encodes sequences using the chaos game representation. Löchel et al. (2019) <doi:10.1101/575324>.
kaphom Test the Homogeneity of Kappa Statistics
Tests the homogeneity of intraclass kappa statistics obtained from independent studies or a stratified study with binary results. It is desired to compare the kappa statistics obtained in multi-center studies or in a single stratified study to give a common or summary kappa using all available information. If the homogeneity test of these kappa statistics is not rejected, then it is possible to make inferences over a single kappa statistic that summarizes all the studies. Jun-mo Nam (2003) <doi:10.1111/j.0006-341X.2003.00118.x>; Jun-mo Nam (2005) <doi:10.1002/sim.2321>; Mousumi Banerjee, Michelle Capozzoli, Laura McSweeney, Debajyoti Sinha (1999) <doi:10.2307/3315487>; Allan Donner, Michael Eliasziw, Neil Klar (1996) <doi:10.2307/2533154>.
kappalab Non-additive measure and integral manipulation functions
Kappalab, which stands for ‘laboratory for capacities’, is an S4 tool box for capacity (or non-additive measure, fuzzy measure) and integral manipulation on a finite setting. It contains routines for handling various types of set functions such as games or capacities. It can be used to compute several non-additive integrals: the Choquet integral, the Sugeno integral, and the symmetric and asymmetric Choquet integrals. An analysis of capacities in terms of decision behavior can be performed through the computation of various indices such as the Shapley value, the interaction index, the orness degree, etc. The well-known Möbius transform, as well as other equivalent representations of set functions can also be computed. Kappalab further contains seven capacity identification routines: three least squares based approaches, a method based on linear programming, a maximum entropy like method based on variance minimization, a minimum distance approach and an unsupervised approach grounded on parametric entropies. The functions contained in Kappalab can for instance be used in the framework of multicriteria decision making or cooperative game theory.
kazaam Tools for Tall Distributed Matrices
Many data science problems reduce to operations on very tall, skinny matrices. However, sometimes these matrices can be so tall that they are difficult to work with, or do not even fit into main memory. One strategy to deal with such objects is to distribute their rows across several processors. To this end, we offer an ‘S4’ class for tall, skinny, distributed matrices, called the ‘shaq’. We also provide many useful numerical methods and statistics operations for operating on these distributed objects. The naming is a bit ‘tongue-in-cheek’, with the class a play on the fact that ‘Shaquille O'Neal’ (‘Shaq’) is very tall, and he starred in the film ‘Kazaam’.
kcpRS Kernel Change Point Detection on the Running Statistics
The running statistic of interest is first extracted using a time window which is slid across the time series, and in each window the value of the running statistic is computed. KCP (Kernel Change Point) detection, proposed by Arlot et al. (2012) <arXiv:1202.3878>, is then implemented to flag change points in the running statistics (Cabrieto et al., 2018, <doi:10.1016/j.ins.2018.03.010>). Change points are located by minimizing a variance criterion based on the pairwise similarities between running statistics, which are computed via the Gaussian kernel. KCP can locate change points for a given number k of change points. To determine the optimal k, the KCP permutation test is first carried out by comparing the variance of the running statistics extracted from the original data to that of permuted data. If this test is significant, there is sufficient evidence for at least one change point in the data. Model selection is then used to determine the optimal k>0.
kde1d Univariate Kernel Density Estimation
Provides an efficient implementation of univariate local polynomial kernel density estimators that can handle bounded and discrete data. See Geenens (2014) <arXiv:1303.4121>, Geenens and Wang (2018) <arXiv:1602.04862>, Nagler (2018a) <arXiv:1704.07457>, Nagler (2018b) <arXiv:1705.05431>.
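A brief sketch of handling bounded data (the gamma sample is invented for illustration):

    library(kde1d)
    x <- rgamma(200, shape = 2)   # data bounded below by zero
    fit <- kde1d(x, xmin = 0)     # boundary-aware density estimate
    dkde1d(1, fit)                # estimated density at x = 1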
kdecopula Kernel Smoothing for Bivariate Copula Densities
Provides fast implementations of kernel smoothing techniques for bivariate copula densities, in particular density estimation and resampling.
kdensity Kernel Density Estimation with Parametric Starts and Asymmetric Kernels
Handles univariate non-parametric density estimation with parametric starts and asymmetric kernels in a simple and flexible way. Kernel density estimation with parametric starts involves fitting a parametric density to the data before making a correction with kernel density estimation, see Hjort & Glad (1995) <doi:10.1214/aos/1176324627>. Asymmetric kernels make kernel density estimation more efficient on bounded intervals such as (0, 1) and the positive half-line. Supported asymmetric kernels are the gamma kernel of Chen (2000) <doi:10.1023/A:1004165218295>, the beta kernel of Chen (1999) <doi:10.1016/S0167-9473(99)00010-9>, and the copula kernel of Jones & Henderson (2007) <doi:10.1093/biomet/asm068>. User-supplied kernels, parametric starts, and bandwidths are supported.
kdevine Multivariate Kernel Density Estimation with Vine Copulas
Implements a vine copula based kernel density estimator. The estimator does not suffer from the curse of dimensionality and is therefore well suited for high-dimensional applications.
kdist K-Distribution and Weibull Paper
Density, distribution function, quantile function and random generation for the K-distribution. A plotting function that plots data on Weibull paper, and another function to draw additional lines. See results from the package in T. Lamont-Smith (2018), submitted to J. R. Stat. Soc.
kdtools Tools for Working with Multidimensional Data
Provides various tools for working with multidimensional data in R and C++, including extremely fast nearest-neighbor and range queries without the overhead of linked tree nodes.
KDViz Knowledge Domain Visualization
Knowledge domain visualization using ‘mpa’ co-words method as the word clustering method and network graphs with ‘D3.js’ library as visualization tool.
keep Arrays with Better Control over Dimension Dropping
Provides arrays with flexible control over dimension dropping when subscripting.
Kendall Kendall rank correlation and Mann-Kendall trend test
Computes the Kendall rank correlation and Mann-Kendall trend test. See documentation for use of block bootstrap when there is autocorrelation.
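For instance (toy vectors):

    library(Kendall)
    x <- c(2, 4, 3, 8, 7, 9)
    y <- c(1, 3, 5, 6, 8, 7)
    Kendall(x, y)    # Kendall rank correlation between two vectors
    MannKendall(y)   # Mann-Kendall trend test for a series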
kendallRandomWalks Simulate and Visualize Kendall Random Walks and Related Distributions
Kendall random walks are continuous-space Markov chains generated by the Kendall generalized convolution. This package provides tools for simulating these random walks and studying distributions related to them. For more information about Kendall random walks see Jasiulis-Gołdyn (2014) <arXiv:1412.0220>.
KENDL Kernel-Smoothed Nonparametric Methods for Environmental Exposure Data Subject to Detection Limits
Calculate the kernel-smoothed nonparametric estimator for the exposure distribution in presence of detection limits.
keras R Interface to ‘Keras’
Interface to ‘Keras’, a high-level neural networks API which runs on top of ‘TensorFlow’. ‘Keras’ was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both ‘CPU’ and ‘GPU’ devices.
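A minimal sketch of defining and compiling a small network (the layer sizes and input dimension are arbitrary):

    library(keras)
    model <- keras_model_sequential() %>%
      layer_dense(units = 16, activation = "relu", input_shape = c(10)) %>%
      layer_dense(units = 1, activation = "sigmoid")
    model %>% compile(optimizer = "adam", loss = "binary_crossentropy",
                      metrics = "accuracy")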
kerasformula A High-Level R Interface for Neural Nets
Adds a high-level interface for ‘keras’ neural nets. kms() fits a neural net and accepts R formulas to aid data munging and hyperparameter selection. kms() can optionally accept a compiled keras_sequential_model() from ‘keras’. kms() accepts a number of parameters (like loss and optimizer) and splits the data into sparse test and training matrices. kms() returns a single object with predictions, a confusion matrix, and function call details.
kerasR R Interface to the Keras Deep Learning Library
Provides a consistent interface to the ‘Keras’ Deep Learning Library directly from within R. ‘Keras’ (see <https://keras.io/> for more information) provides specifications for describing dense neural networks, convolution neural networks (CNN) and recurrent neural networks (RNN) running on top of either ‘TensorFlow’ (<https://…/> ) or ‘Theano’ (<http://…/> ). Type conversions between Python and R are automatically handled correctly, even when the default choices would otherwise lead to errors. Includes complete R documentation and many working examples.
KERE Expectile Regression in Reproducing Kernel Hilbert Space
An efficient algorithm inspired by majorization-minimization principle for solving the entire solution path of a flexible nonparametric expectile regression estimator constructed in a reproducing kernel Hilbert space.
kernDeepStackNet Kernel Deep Stacking Networks
Contains functions for estimation and model selection of kernel deep stacking networks. The model selection includes direct optimization or model based alternatives with arbitrary loss functions.
kerndwd Distance Weighted Discrimination (DWD) and Kernel Methods
Distance Weighted Discrimination (DWD) and Kernel Methods
kernelboot Smoothed Bootstrap and Random Generation from Kernel Densities
Smoothed bootstrap and functions for random generation from univariate and multivariate kernel densities. It does not estimate kernel densities.
KernelKnn Kernel k Nearest Neighbors
Extends the simple k-nearest neighbors algorithm by incorporating numerous kernel functions and a variety of distance metrics. The package takes advantage of ‘RcppArmadillo’ to speed up computationally intensive functions.
kernhaz Kernel Estimation of Hazard Function in Survival Analysis
Produces kernel estimates of the unconditional and conditional hazard function for right-censored data, including methods of bandwidth selection.
kernlab Kernel-based Machine Learning Lab
Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods kernlab includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
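For example, a Support Vector Machine with a radial basis kernel (the settings are illustrative defaults):

    library(kernlab)
    fit <- ksvm(Species ~ ., data = iris, kernel = "rbfdot")
    predict(fit, head(iris))   # predicted classes for the first rows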
kernscr Kernel Machine Score Test for Semi-Competing Risks
Kernel Machine Score Test for Pathway Analysis in the Presence of Semi-Competing Risks.
kexpmv Matrix Exponential using Krylov Subspace Routines
Implements functions from ‘EXPOKIT’ (<https://…/> ) to calculate matrix exponentials, Sidje RB, (1998) <doi:10.1145/285861.285868>. Includes functions for small dense matrices along with functions for large sparse matrices. The functions for large sparse matrices implement Krylov subspace methods which help minimise the computational complexity for matrix exponentials. ‘Kexpmv’ can be utilised to calculate both the matrix exponential in isolation along with the product of the matrix exponential and a vector.
KeyboardSimulator Simulate Keyboard Press and Mouse Click
Control your keyboard and mouse with R code, simulate key press and mouse click.
keyholder Store Data About Rows
Tools for keeping track of information, named ‘keys’, about rows of data frame like objects. This is done by creating special attribute ‘keys’ which is updated after every change in rows (subsetting, ordering, etc.). This package is designed to work tightly with ‘dplyr’ package.
keyplayer Locating Key Players in Social Networks
Provides group centrality measures and identifies the most central group of players in a network.
keyring Access the System Credential Store from R
Platform independent ‘API’ to access the operating system’s credential store. Currently supports: ‘Keychain’ on ‘macOS’, Credential Store on ‘Windows’, the Secret Service ‘API’ on ‘Linux’, and a simple, platform independent store implemented with environment variables. Additional storage back-ends can be added easily.
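A short sketch of the credential round trip (the service name and credentials are toy values):

    library(keyring)
    key_set_with_value("my-api", username = "demo", password = "s3cret")
    key_get("my-api", username = "demo")     # returns "s3cret"
    key_delete("my-api", username = "demo")  # remove the entry again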
keyringr Decrypt Passwords from Gnome Keyring and Windows Data Protection API
Currently this package decrypts passwords stored in the Gnome Keyring using secret tool, and strings encrypted with the Windows Data Protection API. OSX Keychain coming soon.
kfda Kernel Fisher Discriminant Analysis
Kernel Fisher Discriminant Analysis (KFDA) is performed using Kernel Principal Component Analysis (KPCA) and Fisher Discriminant Analysis (FDA). Some similar packages exist: ‘lfda’ performs Local Fisher Discriminant Analysis (LFDA) among other functions, but requires the label information of the data in its function arguments, which makes it difficult to test; and the ‘ks’ package is limited in the dimensions it supports, which makes proper analysis difficult. This package is a simple and practical package for KFDA based on the paper of Yang, J., Jin, Z., Yang, J. Y., Zhang, D., and Frangi, A. F. (2004) <doi:10.1016/j.patcog.2003.10.015>.
kfigr Integrated Code Chunk Anchoring and Referencing for R Markdown Documents
A streamlined cross-referencing system for R Markdown documents generated with ‘knitr’. R Markdown is an authoring format for generating dynamic content from R. ‘kfigr’ provides a hook for anchoring code chunks and a function to cross-reference document elements generated from said chunks, e.g. figures and tables.
kinn An Implementation of ‘kinn’ Algorithm, a Graph Based Regression Model
A graph-based regression model for flat, unstructured datasets. Each line in the input data set is treated as a node, from which an edge to another line (node) can be formed. In the training process, a model is created which contains a sparse graph adjacency matrix. This model is then used for prediction: it takes a predictor and the model as inputs, and outputs a prediction that is an average of the most similar node and its neighbours in the model graph.
klaR Classification and visualization
Miscellaneous functions for classification and visualization developed at the Fakultaet Statistik, Technische Universitaet Dortmund.
kmc Kaplan-Meier Estimator with Constraints for Right Censored Data — a Recursive Computational Algorithm
Given constraints for right-censored data, we use a recursive computational algorithm to calculate the ‘constrained’ Kaplan-Meier estimator. The constraint is assumed given in linear estimating equations or mean functions. We also illustrate how this leads to the empirical likelihood ratio test with right-censored data and the accelerated failure time model with given coefficients. The EM algorithm from the ‘emplik’ package is used to get the initial value. The properties and performance of the EM algorithm are discussed in Mai Zhou and Yifan Yang (2015) <doi:10.1007/s00180-015-0567-9> and Mai Zhou and Yifan Yang (2017) <doi:10.1002/wics.1400>. More applications can be found in Mai Zhou (2015) <doi:10.1201/b18598>.
kmcudaR ‘Yinyang’ K-Means and K-NN using NVIDIA CUDA
The k-means implementation is based on ‘Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup’. While it introduces some overhead and many conditional clauses which are bad for CUDA, it still shows a 1.6-2x speedup against the Lloyd algorithm. K-nearest neighbors employ the same triangle inequality idea and require precalculated centroids and cluster assignments, similar to the flattened ball tree.
kmeans.ddR Distributed k-Means for Big Data using ‘ddR’ API
Distributed k-means clustering algorithm written using ‘ddR’ (Distributed Data Structures) API in the ‘ddR’ package.
kmed Distance-Based k-Medoids
A simple and fast distance-based k-medoids clustering algorithm from Park and Jun (2009) <doi:10.1016/j.eswa.2008.01.039>. Calculate distances for mixed variable data such as Gower (1971) <doi:10.2307/2528823>, Wishart (2003) <doi:10.1007/978-3-642-55721-7_23>, Podani (1999) <doi:10.2307/1224438>, Huang (1997) <http://….1.94.9984&rep=rep1&type=pdf>, and Harikumar and PV (2015) <doi:10.1016/j.procs.2015.10.077>. Cluster validation applies bootstrap procedure producing a heatmap with a flexible reordering matrix algorithm such as complete, ward, or average linkages.
kmer Fast K-Mer Counting and Clustering for Biological Sequence Analysis
Contains tools for rapidly computing distance matrices and clustering large sequence datasets using fast alignment-free k-mer counting and recursive k-means partitioning. See Vinga and Almeida (2003) <doi:10.1093/bioinformatics/btg005> for a review of k-mer counting methods and applications for biological sequence analysis.
kmeRs K-Mers Similarity Score Matrix
Contains tools to calculate similarity score matrix for DNA k-mers. The pairwise similarity score is calculated using PAM or BLOSUM substitution matrix. The results are evaluated by similarity score calculated by Needleman-Wunsch (1970) <doi:10.1016/0022-2836(70)90057-4> global or Smith-Waterman (1981) <doi:10.1016/0022-2836(81)90087-5> local alignment. Higher similarity score indicates more similar sequences for BLOSUM and less similar sequences for PAM matrix; 30, 40, 70, 120, 250 and 62, 45, 50, 62, 80, 100 matrix versions are available for PAM and BLOSUM, respectively.
kml3d K-Means for Joint Longitudinal Data
An implementation of k-means specifically designed to cluster joint trajectories (longitudinal data on several variable-trajectories). Like ‘kml’, it provides facilities to deal with missing values, computes several quality criteria (Calinski and Harabasz, Ray and Turi, Davies and Bouldin, BIC, …) and proposes a graphical interface for choosing the ‘best’ number of clusters. In addition, the 3D graph representing the mean joint-trajectories of each cluster can be exported through LaTeX as a dynamic rotating 3D PDF graph.
kmlShape K-Means for Longitudinal Data using Shape-Respecting Distance
K-means for longitudinal data using shape-respecting distance and shape-respecting means.
kmodR K-Means with Simultaneous Outlier Detection
An implementation of the ‘k-means–’ algorithm proposed by Chawla and Gionis (2013) in their paper ‘k-means–: a unified approach to clustering and outlier detection’, SIAM International Conference on Data Mining (SDM13), using the ‘ordering’ described by Howe (2013) in the thesis ‘Clustering and anomaly detection in tropical cyclones’. Useful for creating (potentially) tighter clusters than standard k-means while simultaneously finding outliers inexpensively in multidimensional space.
KnapsackSampling Generate Feasible Samples of a Knapsack Problem
The sampl.mcmc() function creates samples from the feasible region of a knapsack problem with both equality and inequality constraints.
knitLatex ‘Knitr’ Helpers – Mostly Tables
Provides several helper functions for working with ‘knitr’ and ‘LaTeX’. It includes ‘xTab’ for creating traditional ‘LaTeX’ tables, ‘lTab’ for generating ‘longtable’ environments, and ‘sTab’ for generating a ‘supertabular’ environment. Additionally, this package contains a knitr_setup() function which fixes a well-known bug in ‘knitr’ that distorts the results=‘asis’ command when used in conjunction with user-defined commands, and a ‘com’ command (<<com=TRUE>>=) which renders the output from ‘knitr’ as a ‘LaTeX’ command.
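A minimal sketch of the table helpers inside a ‘knitr’ chunk (this assumes, as the description indicates, that each helper accepts a data-frame-like object and prints LaTeX source, so the chunk should use results=‘asis’):

  library(knitLatex)
  # inside a chunk with results = 'asis'
  xTab(head(mtcars))  # traditional LaTeX table
  lTab(head(mtcars))  # longtable environment
  sTab(head(mtcars))  # supertabular environment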
knitr A General-Purpose Package for Dynamic Report Generation in R
Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.
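For example, compiling a literate document is a single call (the file name is illustrative):

  library(knitr)
  knit("report.Rmd")  # executes all chunks and writes report.md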
knitrBootstrap A framework to create bootstrap styled HTML reports from knitr Rmarkdown
A framework to create Bootstrap-styled HTML reports from ‘knitr’ R Markdown documents.
knitrProgressBar Provides Progress Bars in ‘knitr’
Provides a progress bar similar to ‘dplyr’ that can write progress out to a variety of locations, including stdout(), stderr(), or a file connection from file(). Useful when using ‘knitr’ or ‘rmarkdown’ and you still want to see the progress of calculations in the terminal.
knncat Nearest-neighbor Classification with Categorical Variables
Scale categorical variables in such a way as to make NN classification as accurate as possible. The code also handles continuous variables and prior probabilities, and does intelligent variable selection and estimation of both error rates and the right number of NN’s.
knnp Time Series Prediction using K-Nearest Neighbors Algorithm (Parallel)
Two main functionalities are provided. One of them is predicting values with k-nearest neighbors algorithm and the other is optimizing the parameters k and d of the algorithm. These are carried out in parallel using multiple threads.
Knoema Interface to the Knoema API
‘Knoema’ has the largest collection of public data and statistics on the Internet, featuring about 2.5 billion time series from thousands of sources. Using the ‘Knoema’ API, users can access the data in the ‘Knoema’ repository and use rich R calculations in order to analyze the data. Because data in ‘Knoema’ is time series data, the ‘Knoema’ function offers data in a number of formats usable in R, such as ‘ts’, ‘xts’ or ‘zoo’. For more information about the ‘Knoema’ API go to <https://…/docs>.
knor Non-Uniform Memory Access (‘NUMA’) Optimized, Parallel K-Means
The k-means ‘NUMA’ Optimized Routine library or ‘knor’ is a highly optimized and fast library for computing k-means in parallel with accelerations for Non-Uniform Memory Access (‘NUMA’) architectures.
knotR Knot Diagrams using Bezier Curves
Makes nice pictures of knots using Bezier curves and numerical optimization.
KnowGRRF Knowledge-Based Guided Regularized Random Forest
Random Forest (RF) and Regularized Random Forest can be used for feature selection. Moreover, in Guided Regularized Random Forest, statistics-based weights are used to guide the regularization of the random forest, which is further used for feature selection. This package can integrate prior information from multiple domains (statistics-based and knowledge domain) to guide the regularization of the random forest and feature selection. For more details, see the reference: Guan X., Liu L. (2018) <doi:10.1007/978-3-319-78759-6_1>.
KODAMA Knowledge discovery by accuracy maximization
KODAMA (KnOwledge Discovery by Accuracy MAximization) is an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data.
kokudosuuchi R Interface to Kokudo Suuchi API
Provides an interface to the Kokudo Suuchi API, the GIS data service of the Japanese government. See <http://…/index.html> for more information.
komadown R Markdown Templates for the ‘KOMA-Script’ Classes
R Markdown templates based on the ‘KOMA-Script’ classes for LaTeX, additionally offering cross-referencing via the ‘bookdown’ package.
komaletter Simple yet Flexible Letters via the ‘KOMA-Script LaTeX Bundle’
An R Markdown template for writing beautiful yet versatile letters, using the ‘KOMA-Script’ letter class ‘scrlttr2’ and an adaptation of the ‘pandoc-letter’ template. ‘scrlttr2’ provides layouts for many different window envelope types and the possibility to define your own.
konfound Quantify the Robustness of Causal Inferences
Statistical methods that quantify the conditions necessary to alter inferences, also known as sensitivity analysis, are becoming increasingly important to a variety of quantitative sciences. A series of recent works, including Frank (2000) <doi:10.1177/0049124100029002001> and Frank et al. (2013) <doi:10.3102/0162373713493129> extend previous sensitivity analyses by considering the characteristics of omitted variables or unobserved cases that would change an inference if such variables or cases were observed. These analyses generate statements such as ‘an omitted variable would have to be correlated at xx with the predictor of interest (e.g., treatment) and outcome to invalidate an inference of a treatment effect’. Or ‘one would have to replace pp percent of the observed data with null hypothesis cases to invalidate the inference’. We implement these recent developments of sensitivity analysis and provide modules to calculate these two robustness indices and generate such statements in R. In particular, the functions konfound(), pkonfound() and mkonfound() allow users to calculate the robustness of inferences for a user’s own model, a single published study and multiple studies respectively.
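A sketch for a single published study (the numeric values are purely illustrative; this assumes pkonfound() takes the effect estimate, its standard error, the sample size, and the number of covariates, per the package documentation):

  library(konfound)
  # How robust is an estimated effect of 2 (SE 0.4) from n = 100 with 3 covariates?
  pkonfound(est_eff = 2, std_err = 0.4, n_obs = 100, n_covariates = 3)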
koRpus.lang.en Language Support for ‘koRpus’ Package: English
Adds support for the English language to the ‘koRpus’ package. To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please consider subscribing to the koRpus-dev mailing list (<http://korpusml.reaktanz.de> ).
KoulMde Koul’s Minimum Distance Estimation in Linear Regression and Autoregression Model
Consider a linear regression model and an autoregressive model of order p, where the errors in the linear regression model and the innovations in the autoregressive model are independent and symmetrically distributed. Hira L. Koul proposed a nonparametric minimum distance estimation method that minimizes an L2-type distance between certain weighted residual empirical distribution functions. He also proposed a simpler version of the loss function, using symmetry of the integrating measure in the distance. This package contains two functions, KoulLrMde() and KoulArMde(), which provide minimum distance estimators for the linear regression model and the autoregressive model, respectively, both based on Koul’s method. These two functions take much less computation time than those based on parametric minimum distance estimation methods.
kpcalg Kernel PC Algorithm for Causal Structure Detection
Kernel PC (kPC) algorithm for causal structure learning and causal inference using graphical models. kPC is a version of PC algorithm that uses kernel based independence criteria in order to be able to deal with non-linear relationships and non-Gaussian noise.
kpeaks Determination of K Using Peak Counts of Features for Clustering
The input argument k, the number of clusters, is needed to start all partitioning clustering algorithms. In unsupervised learning applications, an optimal value of this argument is widely determined by using internal validity indexes. Since these indexes suggest a k value computed on the clustering results after several runs of a clustering algorithm, they are computationally expensive. On the contrary, ‘kpeaks’ makes it possible to estimate k before running any clustering algorithm. It is based on a simple novel technique using the descriptive statistics of peak counts of the features in a data set.
kpmt Known Population Median Test
Functions that implement the known population median test.
kpodclustr Method for Clustering Partially Observed Data
The kpodclustr package implements the k-POD method for clustering partially observed data.
KraljicMatrix A Quantified Implementation of the Kraljic Matrix
Implements a quantified approach to the Kraljic Matrix (Kraljic, 1983, <https://…chasing-must-become-supply-management>) for strategically analyzing a firm’s purchasing portfolio. It applies multi-objective decision analysis to measure purchasing characteristics and uses this information to place products and services within the Kraljic Matrix.
KRIG Spatial Statistic with Kriging
Implements different methods for spatial statistics, particularly focused on Kriging-based models. Several models are implemented: simple, ordinary and universal forms of Kriging, co-Kriging and regression Kriging. Also includes multivariate sensitivity analysis under an approximation designed over reproducing kernel Hilbert spaces, and computation of Sobol indices under this framework.
krige Geospatial Kriging with Metropolis Sampling
Estimates kriging models for geographical point-referenced data. Method is described in Monogan and Gill (2016) <doi:10.1017/psrm.2015.5>.
KRIS Keen and Reliable Interface Subroutines for Bioinformatic Analysis
Provides useful functions for bioinformatic analysis, such as calculating linear principal components from numeric data and single-nucleotide polymorphism (SNP) datasets, calculating the fixation index (Fst) using the Hudson method, creating scatter plots in 3 views, handling the PLINK binary file format, detecting rough structures and outliers using unsupervised clustering, and performing fast matrix multiplication for big data.
KScorrect Lilliefors-Corrected Kolmogorov-Smirnov Goodness-of-Fit Tests
Implements the Lilliefors-corrected Kolmogorov-Smirnov test for use in goodness-of-fit tests, suitable when population parameters are unknown and must be estimated by sample statistics. P-values are estimated by simulation. Can be used with a variety of continuous distributions, including normal, lognormal, univariate mixtures of normals, uniform, loguniform, exponential, gamma, and Weibull distributions. Functions to generate random numbers and calculate density, distribution, and quantile functions are provided for use with the log uniform and mixture distributions.
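A sketch of a corrected test of normality with parameters estimated from the sample (this assumes the user-facing function is LcKS(), with the null distribution passed by the name of its ‘p’ function):

  library(KScorrect)
  x <- rlnorm(100)        # clearly non-normal data
  LcKS(x, cdf = "pnorm")  # simulated p-value should be small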
KSgeneral Computing P-Values of the K-S Test for (Dis)Continuous Null Distribution
Computes a p-value of the one-sample two-sided (or one-sided, as a special case) Kolmogorov-Smirnov (KS) statistic, for any fixed critical level, and an arbitrary, possibly large sample size for a pre-specified purely discrete, mixed or continuous cumulative distribution function (cdf) under the null hypothesis. If a data sample is supplied, ‘KSgeneral’ computes the p-value corresponding to the value of the KS test statistic computed based on the user provided data sample. The package ‘KSgeneral’ implements a novel, accurate and efficient method named Exact-KS-FFT, expressing the p-value as a double-boundary non-crossing probability for a homogeneous Poisson process, which is then efficiently computed using Fast Fourier Transform (FFT). The package can also be used to compute and plot the complementary cdf of the KS statistic which is known to depend on the hypothesized distribution when the latter is discontinuous (i.e. purely discrete or mixed).
ksNN K* Nearest Neighbors Algorithm
Prediction with k* nearest neighbor algorithm based on a publication by Anava and Levy (2016) <arXiv:1701.07266>.
kstIO Knowledge Space Theory Input/Output
Knowledge space theory by Doignon and Falmagne (1999) <doi:10.1007/978-3-642-58625-5> is a set- and order-theoretical framework which proposes mathematical formalisms to operationalize knowledge structures in a particular domain. The ‘kstIO’ package provides basic functionalities to read and write KST data from/to files to be used together with the ‘kst’, ‘pks’ or ‘DAKS’ packages.
kstMatrix Basic Functions in Knowledge Space Theory Using Matrix Representation
Knowledge space theory by Doignon and Falmagne (1999) <doi:10.1007/978-3-642-58625-5> is a set- and order-theoretical framework, which proposes mathematical formalisms to operationalize knowledge structures in a particular domain. The ‘kstMatrix’ package provides basic functionalities to generate, handle, and manipulate knowledge structures and knowledge spaces. As opposed to the ‘kst’ package, ‘kstMatrix’ uses matrix representations for knowledge structures. Furthermore, ‘kstMatrix’ contains several knowledge spaces developed by the research group around Cornelia Dowling through querying experts.
KTensorGraphs Co-Tucker3 Analysis of Two Sequences of Matrices
Provides a function called COTUCKER3() (Co-Inertia Analysis + Tucker3 method), which performs a Co-Tucker3 analysis of two sequences of matrices, as well as other functions: PCA() (Principal Component Analysis) and BGA() (Between-Groups Analysis), which analyze one matrix; COIA() (Co-Inertia Analysis), which analyzes two matrices; PTA() (Partial Triadic Analysis) and TUCKER3(), which analyze a sequence of matrices; and BGCOIA() (Between-Groups Co-Inertia Analysis), STATICO() (STATIS method + Co-Inertia Analysis), and COSTATIS() (Co-Inertia Analysis + STATIS method), which also analyze two sequences of matrices.
kuiper.2samp Two-Sample Kuiper Test
This function performs the two-sample Kuiper test to assess whether two continuous, one-dimensional probability distributions differ. References used for this method are (1) Kuiper, N. H. (1960) <doi:10.1016/S1385-7258(60)50006-0> and (2) Paltani, S. (2004) <doi:10.1051/0004-6361:20034220>.
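The package exports a single function bearing its name; a minimal sketch (this assumes the two samples are passed as the first two arguments):

  library(kuiper.2samp)
  x <- rnorm(500)
  y <- runif(500)
  kuiper.2samp(x, y)  # Kuiper statistic and p-value for the two samples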
kvh Read/Write Files in Key-Value-Hierarchy Format
KVH is a lightweight format that can be read and written both by humans and machines. It can be useful in situations where XML or similar formats seem to be overkill. We provide the ability to parse KVH files in R quite quickly due to the use of ‘Rcpp’.
kzfs Multi-Scale Motions Separation with Kolmogorov-Zurbenko Periodogram Signals
Separation of wave motions in different scales and directions based on Kolmogorov-Zurbenko 2D periodograms and directional periodograms.

L

l0ara Sparse Generalized Linear Model with L0 Approximation for Feature Selection
An efficient procedure for feature selection in generalized linear models with an L0 penalty, including linear, logistic, Poisson, gamma, and inverse Gaussian regression. Adaptive ridge algorithms are used to fit the models.
L0Learn Fast Algorithms for Best Subset Selection
Highly optimized toolkit for (approximately) solving L0-regularized learning problems. The algorithms are based on coordinate descent and local combinatorial search. For more details, see the paper by Hazimeh and Mazumder (2018) <arXiv:1803.01454>.
L1pack Routines for L1 Estimation
L1 estimation for linear regression and random number generation for the multivariate Laplace distribution.
labeling Axis Labeling
Provides a range of axis labeling algorithms.
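For instance, the extended() algorithm (Talbot, Lin and Hanrahan’s extension of Wilkinson’s labeling) returns roughly m nicely rounded tick positions spanning a data range:

  library(labeling)
  extended(dmin = 3.4, dmax = 97.2, m = 5)  # about five 'nice' axis ticks covering [3.4, 97.2]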
labelled Manipulating Labelled Data
Work with labelled data imported from ‘SPSS’ or ‘Stata’ with ‘haven’ or ‘foreign’.
labelrank Predicting Rankings of Labels
An implementation of distance-based ranking algorithms to predict rankings of labels. Two common algorithms are included: the naive Bayes and the nearest neighbor algorithms.
labelVector Label Attributes for Atomic Vectors
Labels are a common construct in statistical software providing a human readable description of a variable. While variable names are succinct, quick to type, and follow a language’s naming conventions, labels may be more illustrative and may use plain text and spaces. R does not provide native support for labels. Some packages, however, have made this feature available. Most notably, the ‘Hmisc’ package provides labelling methods for a number of different objects. Due to design decisions, these methods are not all exported, and so are unavailable for use in package development. The ‘labelVector’ package supports labels for atomic vectors in a light-weight design that is suitable for use in other packages.
LabourMarketAreas Identification, Tuning, Visualisation and Analysis of Labour Market Areas
Produces Labour Market Areas from commuting flows and provides tools for automatic tuning based on spatial contiguity. It also allows for analyses and visualisation of the new functional geography.
lacm Latent Autoregressive Count Models
Perform pairwise likelihood inference in latent autoregressive count models. See Pedeli and Varin (2018) <arXiv:1805.10865> for details.
LadR Routines for Fit, Inference and Diagnostics in LAD Models
LAD (Least Absolute Deviations) estimation for linear regression, confidence intervals, tests of hypotheses, methods for outlier detection, measures of leverage, methods of diagnostics for LAD regression, and special diagnostic graphs. The algorithms are based on Dielman (2005) <doi:10.1080/0094965042000223680>, Elian et al. (2000) <doi:10.1080/03610920008832518> and Dodge (1997) <doi:10.1006/jmva.1997.1666>. This package also has two datasets, ‘houses’ and ‘pollution’, respectively from Narula and Wellington (1977) <doi:10.2307/1268628> and Santos et al. (2016) <doi:10.1371/journal.pone.0163225>.
lagged Classes and Methods for Lagged Objects
Provides classes and methods for lagged objects.
lagsarlmtree Spatial Lag Model Trees
Model-based linear model trees adjusting for spatial correlation using a simultaneous autoregressive spatial lag.
LagSequential Lag-Sequential Categorical Data Analysis
Lag-sequential analysis is a method of assessing patterns (what tends to follow what?) in sequences of codes. The codes are typically for discrete behaviors or states. The functions in this package read a stream of codes, or a frequency transition matrix, and produce a variety of lag sequential statistics, including transitional frequencies, expected transitional frequencies, transitional probabilities, z values, adjusted residuals, Yule’s Q values, likelihood ratio tests of stationarity across time and homogeneity across groups or segments, transformed kappas for unidirectional dependence, bidirectional dependence, parallel and nonparallel dominance, and significance levels based on both parametric and randomization tests. The methods are described in Bakeman & Quera (2011) <doi:10.1017/CBO9781139017343>, O’Connor (1999) <doi:10.3758/BF03200753>, Wampold & Margolin (1982) <doi:10.1037/0033-2909.92.3.755>, and Wampold (1995, ISBN:0-89391-919-5).
LAM Some Latent Variable Models
Contains some procedures for latent variable modelling with a particular focus on multilevel data. The LAM package contains mean and covariance structure modelling for multivariate normally distributed data (‘mlnormal’), a general Metropolis-Hastings algorithm (‘amh’) and penalized maximum likelihood estimation (‘pmle’).
lambda.r Functional programming in R
Provides a syntax for writing functional programs in R. Lambda.R has a clean syntax for defining multi-part functions with optional guard statements. Simple pattern matching is also supported. Types can be easily defined and instantiated using the same functional notation. Type checking is integrated and optional, giving the programmer complete flexibility over their application or package.
lamme Log-Analytic Methods for Multiplicative Effects
Log-analytic methods intended for testing multiplicative effects.
lamW Lambert-W Function
Implements both real-valued branches of the Lambert-W function, also known as the product logarithm, without the need for installing the entire GSL.
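A quick check of the two real branches (recall W(x) solves W e^W = x):

  library(lamW)
  lambertW0(exp(1))  # 1, since 1 * exp(1) = e
  lambertW0(0)       # 0
  lambertWm1(-0.1)   # the second real branch, defined on [-1/e, 0)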
LANDD Liquid Association for Network Dynamics Detection
Using Liquid Association for Network Dynamics Detection.
landest Landmark Estimation of Survival and Treatment Effect
Provides functions to estimate survival and a treatment effect using a landmark estimation approach.
landscapemetrics Landscape Metrics for Categorical Map Patterns
Calculates landscape metrics for categorical landscape patterns in a tidy workflow. ‘landscapemetrics’ reimplements the most common metrics from ‘FRAGSTATS’ (<https://…/fragstats.html> ) and new ones from the current literature on landscape metrics. This package supports ‘raster’ spatial objects and takes RasterLayer, RasterStacks, RasterBricks or lists of RasterLayer from the ‘raster’ package as input arguments. It further provides utility functions to visualize patches, select metrics and building blocks to develop new metrics.
landscapeR Categorical Landscape Simulation Facility
This set of functions allows the simulation of categorical maps from scratch in a geographically referenced landscape, or the manipulation of existing ones. The basic algorithm currently implemented uses a simple agent-style/cellular-automata growth model, with no rules (apart from areas of exclusion), and therefore expands more or less circularly, picking cells at the edges randomly.
landscapetools Landscape Utility Toolbox
Provides utility functions to complete tasks involved in most landscape analyses. It includes functions to coerce raster data to the common tibble format and vice versa, it helps with flexible reclassification tasks of raster data, and it provides a function to merge multiple rasters. Furthermore, ‘landscapetools’ helps landscape scientists visualize their data by providing optional themes and utility functions to plot single landscapes, rasterstacks, -bricks and lists of rasters.
Langevin Langevin Analysis in One and Two Dimensions
Estimate drift and diffusion functions from time series and generate synthetic time series from given drift and diffusion coefficients.
languagelayeR Access the ‘languagelayer’ API
Improve your text analysis with languagelayer <https://languagelayer.com>, a powerful language detection API.
languageserver Language Server Protocol
An implementation of the Language Server Protocol for R. The Language Server Protocol is used by an editor client to integrate features like auto-completion. See <https://…/language-server-protocol> for details.
LaplacesDemon Complete Environment for Bayesian Inference
Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview). The README describes the history of the package development process.
LARF Instrumental Variable Estimation of Causal Effects through Local Average Response Functions
LARF is an R package that provides instrumental variable estimation of treatment effects when both the endogenous treatment and its instrument (i.e., the treatment inducement) are binary. The method (Abadie 2003) involves two steps. First, pseudo-weights are constructed from the probability of receiving the treatment inducement. By default LARF estimates the probability by a probit regression. It also provides semiparametric power series estimation of the probability and allows users to employ other external methods to estimate the probability. Second, the pseudo-weights are used to estimate the local average response function conditional on treatment and covariates. LARF provides both least squares and maximum likelihood estimates of the conditional treatment effects.
largeList Serialization Interface for Large List Objects
Functions to write or append an R list to a file, and to read or remove elements from it without restoring the whole list.
largeVis High-Quality Visualizations of Large, High-Dimensional Datasets
Implements the largeVis algorithm (see Tang, et al. (2016) <https://…/1602.00370> ) for visualizing very large high-dimensional datasets; very fast search for approximate nearest neighbors; outlier detection; optimized implementation of the HDBSCAN clustering algorithm; plotting functions for visualizing the above.
lars Least Angle Regression, Lasso and Forward Stagewise
Efficient procedures for fitting an entire lasso sequence with the cost of a single least squares fit. Least angle regression and infinitesimal forward stagewise regression are related to the lasso, as described in Efron, Hastie, Johnstone and Tibshirani (2004), ‘Least Angle Regression’, Annals of Statistics 32(2).
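A minimal fit of the full lasso path on the diabetes data shipped with the package:

  library(lars)
  data(diabetes)
  fit <- lars(diabetes$x, diabetes$y, type = "lasso")
  plot(fit)  # coefficient paths along the entire regularization sequence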
LassoBacktracking Modelling Interactions in High-Dimensional Data with Backtracking
Implementation of the algorithm introduced in ‘Shah, R. D. (2016) Modelling interactions in high-dimensional data with Backtracking, JMLR, to appear’. Data with thousands of predictors can be handled. The algorithm performs sequential Lasso (Tibshirani, 1996) fits on design matrices containing increasing sets of candidate interactions. Previous fits are used to greatly speed up subsequent fits so the algorithm is very efficient.
lassopv Nonparametric P-Value Estimation for Predictors in Lasso
The purpose of this package is to estimate p-values for predictors x against target variable y in lasso regression, using the regularization strength when each predictor enters the active set of regularization path for the first time as the statistic. This is based on the assumption that predictors that (first) become active earlier tend to be more significant. Null distribution for each predictor is computed analytically under approximation, which aims at efficiency and accuracy for small p-values.
LassoSIR Sparsed Sliced Inverse Regression via Lasso
Estimate the sufficient dimension reduction space using sparsed sliced inverse regression via Lasso (Lasso-SIR) introduced in Lin, Zhao, and Liu (2017) <arXiv:1611.06655>. The Lasso-SIR is consistent and achieves the optimal convergence rate under certain sparsity conditions for the multiple index models.
lasvmR A Simple Wrapper for the LASVM Solver
This is a simple wrapper for the LASVM Solver (see http://…/lasvm ). LASVM is basically an online variant of the SMO solver.
LatentREGpp Item Response Theory Implemented in R and Cpp
Provides a C++ implementation of Multidimensional Item Response Theory (MIRT), capable of performing parameter and trait estimation. It also provides a list of options to perform an optimal analysis and obtain useful information about the resulting model. Acknowledgment: this work was supported by Colciencias Research Grant 0039-2013 and the SICS Research Group, Universidad Nacional de Colombia.
later Utilities for Delaying Function Execution
Executes arbitrary R or C functions some time after the current time, after the R execution stack has emptied.
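For example, scheduling a callback (it fires only once R’s call stack is empty, e.g. when the console is idle):

  library(later)
  later(function() message("fired after ~2 seconds"), delay = 2)
  # returns immediately; the message appears when the event loop runs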
latex2exp Use LaTeX Expressions in Plots
Parses and converts LaTeX math formulas to R’s plotmath expressions, used to enter mathematical formulas and symbols to be rendered as text, axis labels, etc. throughout R’s plotting system.
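For example, LaTeX strings can be dropped into base graphics labels:

  library(latex2exp)
  plot(1:10, (1:10)^2,
       main = TeX("$y = x^2$"),
       xlab = TeX("$x$"),
       ylab = TeX("$x^2$"))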
latexpdf Convert Tables to PDF
Converts table-like objects to stand-alone PDF. Can be used to embed tables and arbitrary content in PDF documents. Provides a low-level R interface for creating ‘LaTeX’ code, e.g. command() and a high-level interface for creating PDF documents, e.g. as.pdf.data.frame(). Extensive customization is available via mid-level functions, e.g. as.tabular(). See also ‘package?latexpdf’. Adapted from ‘metrumrg’ <http://…/?group_id=1215>. Requires a compatible installation of ‘pdflatex’, e.g. <https://…/>.
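Given the as.pdf.data.frame() method mentioned above, the high-level interface can be sketched through S3 dispatch (this assumes the default output naming suffices; see ?latexpdf for the available options):

  library(latexpdf)
  as.pdf(head(mtcars))  # typesets the data frame as a stand-alone PDF via 'pdflatex'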
latte Interface to ‘LattE’ and ‘4ti2’
Back-end connections to ‘LattE’ (<https://…/~latte> ) for counting lattice points and integration inside convex polytopes and ‘4ti2’ (<http://…/> ) for algebraic, geometric, and combinatorial problems on linear spaces and front-end tools facilitating their use in the ‘R’ ecosystem.
latticeExtra Extra Graphical Utilities Based on Lattice
Extra graphical utilities based on lattice.
lavaan Latent Variable Analysis
Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.
http://…/charles-sems.html
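The classic confirmatory factor analysis example, using the Holzinger-Swineford data bundled with the package:

  library(lavaan)
  model <- ' visual  =~ x1 + x2 + x3
             textual =~ x4 + x5 + x6
             speed   =~ x7 + x8 + x9 '
  fit <- cfa(model, data = HolzingerSwineford1939)
  summary(fit, fit.measures = TRUE)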
lavaan.shiny Latent Variable Analysis with Shiny
Interactive shiny application for working with different kinds of latent variable analysis, using the ‘lavaan’ package. Graphical output for models is provided and different estimators are supported.
lavaanPlot Path Diagrams for Lavaan Models via DiagrammeR
Plots path diagrams from models in lavaan using the plotting functionality from the DiagrammeR package. DiagrammeR provides nice path diagrams via Graphviz, and these functions make it easy to generate these diagrams from a lavaan path model without having to write the DOT language graph specification.
lavaSearch2 Tools for Model Specification in the Latent Variable Framework
Tools for model specification in the latent variable framework (add-on to the ‘lava’ package). The package contains three main functionalities: Wald tests/F-tests with improved control of the type 1 error in small samples, adjustment for multiple comparisons when searching for local dependencies, and adjustment for multiple comparisons when doing inference for multiple latent variable models.
Lavash Lava Estimation for the Sum of Sparse and Dense Signals
The lava estimation is a new technique to recover signals that are the sum of sparse and dense signals. The post-lava method corrects the shrinkage bias of lava. For more information on lava estimation, see Chernozhukov, Hansen, and Liao (2017) <doi:10.1214/16-AOS1434>.
lawn R Client for ‘Turfjs’ for ‘Geospatial’ Analysis
R client for ‘Turfjs’ (http://turfjs.org ) for ‘geospatial’ analysis. The package revolves around using ‘GeoJSON’ data. Functions are included for creating ‘GeoJSON’ data objects, measuring aspects of ‘GeoJSON’, and combining, transforming, and creating random ‘GeoJSON’ data objects.
lazyrmd Render R Markdown Outputs Lazily
An R Markdown html document format that provides the ability to lazily load plot outputs as the user scrolls over them. This is useful for large R Markdown documents with many plots, as it allows for a fast initial page load and defers loading of individual graphics to the time that the user navigates near them.
lazysql Lazy SQL Programming
Helper functions to build SQL statements for dbGetQuery or dbSendQuery under program control. They are intended to increase speed of coding and to reduce coding errors. Arguments are carefully checked, in particular SQL identifiers such as names of tables or columns. More patterns will be added as required.
lazyWeave LaTeX Wrappers for R Users
Provides the functionality to write LaTeX code from within R without having to learn LaTeX. Functionality also exists to create HTML and Markdown code. While the functionality still exists to write complete documents with lazyWeave, it is generally easier to do so with markdown and knitr. lazyWeave’s main strength now is the ability to design custom and complex tables for reporting results.
lba Latent Budget Analysis for Compositional Data
Latent budget analysis is a method for the analysis of a two-way contingency table with an explanatory variable and a response variable. It is specially designed for compositional data.
lbfgsb3c Limited Memory BFGS Minimizer with Bounds on Parameters with optim() ‘C’ Interface
Interfaces to the Nocedal et al. L-BFGS-B.3.0 (2011 <doi:10.1145/2049662.2049669>) limited-memory BFGS minimizer with bounds on parameters. This is a fork of ‘lbfgsb3’. It registers an ‘R’-compatible ‘C’ interface to L-BFGS-B.3.0 that uses the same function types and optimization as the optim() function (see Writing ‘R’ Extensions and the source for details). This package also adds more stopping criteria and allows adjusting more tolerances.
lbreg Log-Binomial Regression with Constrained Optimization
Maximum likelihood estimation of log-binomial regression with special functionality when the MLE is on (or close to) the boundary of the parameter space.
LBSPR Length-Based Spawning Potential Ratio
Simulate expected equilibrium length composition, YPR, and SPR using the LBSPR model. Fit the LBSPR model to length data to estimate selectivity, relative fishing mortality, and spawning potential ratio for data-limited fisheries.
lcc Longitudinal Concordance Correlation
Estimates the longitudinal concordance correlation to assess the longitudinal agreement profile. The estimation approach implemented is a variance components approach based on a polynomial mixed-effects regression model, as proposed by Oliveira, Hinde and Zocchi (2018) <doi:10.1007/s13253-018-0321-1>. In addition, non-parametric confidence intervals are implemented using the percentile method or normal approximation based on Fisher’s Z-transformation.
lclGWAS Efficient Estimation of Discrete-Time Multivariate Frailty Model Using Exact Likelihood Function for Grouped Survival Data
The core of this ‘Rcpp’ based package is several functions to compute the baseline hazard, effect parameter, and frailty variance for the discrete-time shared frailty model with random effects. The core functions include two processes: (1) evaluate the multiple variable integration to compute the exact proportional hazards model based likelihood and (2) estimate desired parameters using maximum likelihood estimation. The integration is evaluated by the ‘Cuhre’ function from the ‘Cuba’ library (Hahn, T., Cuba – a library for multidimensional numerical integration, Comput. Phys. Commun. 168, 2005, 78-95 <doi:10.1016/j.cpc.2005.01.010>), and the source files of the ‘Cuhre’ function are included in this package. The maximization process is carried out using Brent’s algorithm, and the ‘C++’ code file is from John Burkardt and John Denker (Brent, R., Algorithms for Minimization without Derivatives, Dover, 2002, ISBN 0-486-41998-3).
LCMCR Bayesian Nonparametric Latent-Class Capture-Recapture
Bayesian population size estimation using nonparametric latent-class models.
lconnect Simple Tools to Compute Landscape Connectivity Metrics
Provides functions to upload vectorial data and derive landscape connectivity metrics in habitat or matrix systems. Additionally, includes an approach to assess individual patch contribution to the overall landscape connectivity, enabling the prioritization of habitat patches. The computation of landscape connectivity and patch importance are very useful in Landscape Ecology research. The metrics available are: number of components, number of links, size of the largest component, mean size of components, class coincidence probability, landscape coincidence probability, characteristic path length, expected cluster size, area-weighted flux and integral index of connectivity. References: Pascual-Hortal, L., and Saura, S. (2006) <doi:10.1007/s10980-006-0013-z>; Urban, D., and Keitt, T. (2001) <doi:10.2307/2679983>; Laita, A., Kotiaho, J., Monkkonen, M. (2011) <doi:10.1007/s10980-011-9620-4>.
lcopula Liouville Copulas
A collection of functions allowing random number generation and estimation of Liouville copulas.
lcpm Ordinal Outcomes: Generalized Linear Models with the Log Link
An implementation of the Log Cumulative Probability Model (LCPM) and Proportional Probability Model (PPM) for which the Maximum Likelihood Estimates are determined using constrained optimization. This implementation accounts for the implicit constraints on the parameter space. Other features such as standard errors, z tests and p-values use standard methods adapted from the results based on constrained optimization.
lcyanalysis Stock Data Analysis Functions
Analysis of upward and downward trends in stock data; the stock technical-analysis indicator functions cover trend lines, reversal patterns and market trends.
ldamatch Multivariate Condition Matching by Backwards Elimination Using Linear Discriminant Analysis
Performs group matching by backward elimination using linear discriminant analysis.
ldat Large Data Sets
Tools for working with vectors and data sets that are too large to keep in memory. Extends the basic functionality provided in the ‘lvec’ package. Provides basic statistical functionality for ‘lvec’ objects, such as arithmetic operations and calculating means and sums. Also implements ‘data.frame’-like objects that store their data in ‘lvec’ objects.
ldatuning Tuning of the LDA Models Parameters
In this first version, only metrics for estimating the best-fitting number of topics are implemented.
LDAvis Interactive Visualization of Topic Models
Tools to create an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, it computes various summary statistics as input to an interactive visualization built with D3.js that is accessed via a browser. The goal is to help users interpret the topics in their LDA topic model.
ldbod Local Density-Based Outlier Detection
Flexible procedures to compute local density-based outlier scores for ranking outliers. Both exact and approximate nearest neighbor search can be implemented, while also accommodating multiple k values and four different local density-based methods. It allows for referencing a random subsample of input data or a user specified reference data set to compute outlier scores against, so both unsupervised and semi-supervised outlier detection can be implemented.
ldhmm Hidden Markov Model for Return Time-Series Based on Lambda Distribution
Hidden Markov Model (HMM) based on symmetric lambda distribution framework is implemented for the study of return time-series in the financial market. Major features in the S&P500 index, such as regime identification, volatility clustering, and anti-correlation between return and volatility, can be extracted from HMM cleanly. Univariate symmetric lambda distribution is essentially a location-scale family of power-exponential distribution. Such distribution is suitable for describing highly leptokurtic time series obtained from the financial market. It provides a theoretically solid foundation to explore such data where the normal distribution is not adequate. The HMM implementation follows closely the book: ‘Hidden Markov Models for Time Series’, by Zucchini, MacDonald, Langrock (2016).
LDPD Probability of Default Calibration
Implementation of the most popular approaches to PD (probability of default) calibration: the Quasi Moment Matching algorithm (D. Tasche), the algorithm proposed by M. van der Burgt, and K. Pluto and D. Tasche’s most prudent estimation methodology.
LDRTools Tools for Linear Dimension Reduction
Linear dimension reduction and the corresponding subspaces can be uniquely defined using orthogonal projection matrices. This package provides tools to compute distances between such subspaces and to compute the average subspace.
ldstatsHD Linear Dependence Statistics for High-Dimensional Data
Statistical methods related to the estimation and testing of multiple correlation, partial correlation and regression coefficient matrices when data is high-dimensional.
leabRa The Artificial Neural Networks Algorithm Leabra
The algorithm Leabra (local error driven and associative biologically realistic algorithm) allows for the construction of artificial neural networks that are biologically realistic and balance supervised and unsupervised learning within a single framework. This package is based on the ‘MATLAB’ version by Sergio Verduzco-Flores, which in turn was based on the description of the algorithm by Randall O’Reilly (1996) <ftp://grey.colorado.edu/pub/oreilly/thesis/oreilly_thesis.all.pdf>. For more general (not ‘R’ specific) information on the algorithm Leabra see <https://…/Leabra>.
leaderCluster Leader Clustering Algorithm
The leader clustering algorithm provides a means for clustering a set of data points. Unlike many other clustering algorithms it does not require the user to specify the number of clusters, but instead requires the approximate radius of a cluster as its primary tuning parameter. The package provides a fast implementation of this algorithm in n-dimensions using Lp-distances (with special cases for p=1,2, and infinity) as well as for spatial data using the Haversine formula, which takes latitude/longitude pairs as inputs and clusters based on great circle distances.
leafem ‘leaflet’ Extensions for ‘mapview’
Provides extensions for package ‘leaflet’, many of which are used by package ‘mapview’. Focus is on functionality readily available in Geographic Information Systems such as ‘Quantum GIS’. Includes functions to display coordinates of mouse pointer position, query image values via mouse pointer and zoom-to-layer buttons. Additionally, provides a feature type agnostic function to add points, lines, polygons to a map.
leaflet An R Interface to Leaflet Maps
Leaflet is an open-source JavaScript library for interactive maps. This R package makes it easy to create Leaflet maps from R.
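A minimal interactive map (the coordinates are illustrative):

  library(leaflet)
  leaflet() %>%
    addTiles() %>%  # default OpenStreetMap basemap
    addMarkers(lng = 174.768, lat = -36.852, popup = "Auckland, NZ")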
leaflet.esri ‘ESRI’ Bindings for the ‘leaflet’ Package
An add-on package to the ‘leaflet’ package, which provides bindings for ‘ESRI’ services. This package allows a user to add ‘ESRI’ provided services such as ‘MapService’, ‘ImageMapService’, ‘TiledMapService’ etc. to a ‘leaflet’ map.
leaflet.extras Extra Functionality for ‘leaflet’ Package
The ‘leaflet’ JavaScript library provides many plugins some of which are available in the core ‘leaflet’ package, but there are many more. It is not possible to support them all in the core ‘leaflet’ package. This package serves as an add-on to the ‘leaflet’ package by providing extra functionality via ‘leaflet’ plugins.
leaflet.minicharts Minicharts for Interactive Maps
Add and modify small charts on an interactive map created with package ‘leaflet’. These charts can be used to represent multiple variables on a single map at the same time.
leaflet.opacity Opacity Controls for Leaflet Maps
Extends the ‘leaflet’ R package with the ‘Leaflet.OpacityControls’ JavaScript plugin. Adds controls to the leaflet map for adjusting the opacity of a layer.
leafletCN An R Gallery for China and Other Geojson Choropleth Map in Leaflet
An R gallery for choropleth maps of China and other regions in ‘leaflet’, based on geojson data. Contains the geojson data for provinces and cities in China.
leafpm Leaflet Map Plugin for Drawing and Editing
A collection of tools for interactive manipulation of (spatial) data layers on leaflet web maps. Tools include editing of existing layers, creation of new layers through drawing of shapes (points, lines, polygons), deletion of shapes as well as cutting holes into existing shapes. Provides control over options to e.g. prevent self-intersection of polygons and lines or to enable/disable snapping to align shapes.
leafpop Include Tables, Images and Graphs in Leaflet Pop-Ups
Creates ‘HTML’ strings to embed tables, images or graphs in pop-ups of interactive maps created with packages like ‘leaflet’ or ‘mapview’. Handles local images located on the file system or via remote URL. Handles graphs created with ‘lattice’ or ‘ggplot2’ as well as interactive plots created with ‘htmlwidgets’.
leafSTAR Silhouette to Area Ratio of Tilted Surfaces
Implementation of trigonometric functions to calculate the exposure of flat, tilted surfaces, such as leaves and slopes, to direct solar radiation. It implements the equations in A.G. Escribano-Rocafort, A. Ventre-Lespiaucq, C. Granado-Yela, et al. (2014) <doi:10.1111/2041-210X.12141> in a few user-friendly R functions. All functions handle data obtained with ‘Ahmes’ 1.0 for Android, as well as more traditional data sources (compass, protractor, inclinometer). The main function (star()) calculates the potential exposure of flat, tilted surfaces to direct solar radiation (silhouette to area ratio, STAR). It is equivalent to the ratio of the leaf projected area to total leaf area, but instead of using area data it uses spatial position angles, such as pitch, roll and course, and information on the geographical coordinates, hour, and date. The package includes additional functions to recalculate STAR with custom settings of location and time, to calculate the tilt angle of a surface, and the minimum angle between two non-orthogonal planes.
leafsync Small Multiples for Leaflet Web Maps
Create small multiples of several leaflet web maps with (optional) synchronised panning and zooming control. When syncing is enabled all maps respond to mouse actions on one map. This allows side-by-side comparisons of different attributes of the same geometries. Syncing can be adjusted so that any combination of maps can be synchronised.
leanpubr ‘Leanpub’ API Interface
Provides access to the ‘Leanpub’ API <https://…/api> for gathering information about publications and submissions to the ‘Leanpub’ platform.
LeArEst Border and Area Estimation of Data Measured with Additive Error
Provides methods for estimating borders of uniform distribution on the interval (one-dimensional) and on the elliptical domain (two-dimensional) under measurement errors. For one-dimensional case, it also estimates the length of underlying uniform domain and tests the hypothesized length against two-sided or one-sided alternatives. For two-dimensional case, it estimates the area of underlying uniform domain. It works with numerical inputs as well as with pictures in JPG format.
learningCurve An Implementation of Crawford’s and Wright’s Learning Curve Production Functions
An implementation of Crawford’s and Wright’s learning curve production functions. It provides unit and cumulative block estimates for time (or cost) of units along with an aggregate learning curve. It also provides delta and error functions and some basic learning curve plotting functions.
LearningRlab Statistical Learning Functions
Aids in learning statistical functions, incorporating the results of the calculations done with each function and how they are obtained, that is, which equations and variables are used. Detailed explanations and interactive exercises are also included for all these equations and their related variables. These characteristics allow the package user to improve their learning of statistics basics through use of the package.
learNN Examples of Neural Networks
Implementations of several basic neural network concepts in R, based on posts on <http://qua.st>.
learnr Interactive Tutorials for R
Create interactive tutorials using R Markdown. Use a combination of narrative, figures, videos, exercises, and quizzes to create self-paced tutorials for learning about R and R packages.
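A tutorial is an R Markdown document rendered with the package’s output format; a minimal YAML header looks like:

  ---
  title: "My first tutorial"
  output: learnr::tutorial
  runtime: shiny_prerendered
  ---

Exercise chunks are then ordinary R chunks marked with the exercise=TRUE chunk option.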
learnstats An Interactive Environment for Learning Statistics
Allows students with limited programming experience to use R as an interactive educational environment for statistical concepts, ranging from p-values to confidence intervals to stability in time series.
ledger Utilities for Importing Data from Plaintext Accounting Files
Utilities for querying plain text accounting files from ‘Ledger’, ‘HLedger’, and ‘Beancount’.
legocolors Official Lego Color Palettes
Provides a dataset containing several color naming conventions established by multiple sources, along with associated color metadata. The package also provides related helper functions for mapping among the different Lego color naming conventions and between Lego colors, hex colors, and ‘R’ color names. The functions include nearest color matching based on Euclidean distance in RGB space. Naming conventions for color mapping include those from ‘BrickLink’ (<https://www.bricklink.com> ), ‘The Lego Group’ (<https://www.lego.com> ), ‘LDraw’ (<https://…/> ), and ‘Peeron’ (<http://…/> ).
lemon Freshing Up your ‘ggplot2’ Plots
Functions for working with legends and axis lines of ‘ggplot2’, facets that repeat axis lines on all panels, and some ‘knitr’ extensions.
lenses Elegant Data Manipulation with Lenses
Provides tools for creating and using lenses to simplify data manipulation. Lenses are composable getter/setter pairs for working with data in a purely functional way. Inspired by the ‘Haskell’ library ‘lens’ (Kmett, 2012) <https://…/lens>. For a fairly comprehensive (and highly technical) history of lenses please see the ‘lens’ wiki <https://…/History-of-Lenses>.
lest Vectorised Nested if-else Statements Similar to CASE WHEN in ‘SQL’
Functions for vectorised conditional recoding of variables. case_when() enables you to vectorise multiple if and else statements (like ‘CASE WHEN’ in ‘SQL’). if_else() is a stricter and more predictable version of ifelse() in ‘base’ that preserves attributes. These functions are forked from ‘dplyr’ with all package dependencies removed and behave identically to the originals.
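Since the semantics match ‘dplyr’, a vectorised recode reads the same way:

  library(lest)
  x <- c(1, 8, NA)
  case_when(
    is.na(x) ~ "missing",
    x < 5    ~ "low",
    TRUE     ~ "high"
  )
  # "low" "high" "missing"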
lettercase Utilities for Formatting Strings with Consistent Capitalization, Word Breaks and White Space
Utilities for formatting strings and character vectors for capitalization, word breaks and white space. Supported formats are: snake_case, spine-case, camelCase, PascalCase, Title Case, UPPERCASE, lowercase, Sentence case, or combinations thereof. ‘lettercase’ strives to provide a simple, consistent, intuitive and high-performing interface.
lexicon Lexicons
A collection of lexical hash tables, dictionaries, and word lists.
lexRankr Extractive Summarization of Text with the LexRank Algorithm
An R implementation of the LexRank algorithm described by G. Erkan and D. R. Radev (2004) <DOI:10.1613/jair.1523>.
lfda Local Fisher Discriminant Analysis
Functions for performing and visualizing Local Fisher Discriminant Analysis (LFDA) and Kernel Local Fisher Discriminant Analysis (KLFDA).
LFDR.MLE Estimation of the Local False Discovery Rates by Type II Maximum Likelihood Estimation
Suite of R functions for the estimation of the local false discovery rate (LFDR) using Type II maximum likelihood estimation (MLE).
LFDREmpiricalBayes Estimating Local False Discovery Rates Using Empirical Bayes Methods
New empirical Bayes methods aiming at analyzing the association of single nucleotide polymorphisms (SNPs) to some particular disease are implemented in this package. The package uses local false discovery rate (LFDR) estimates of SNPs within a sample population defined as a ‘reference class’ and discovers if SNPs are associated with the corresponding disease. Although SNPs are used throughout this document, other biological data such as protein data and other gene data can be used. Karimnezhad, Ali and Bickel, D. R. (2016) <http://…/34889>.
lfl Linguistic Fuzzy Logic
Various algorithms related to linguistic fuzzy logic: mining for linguistic fuzzy association rules, performing perception-based logical deduction (PbLD), and forecasting time-series using fuzzy rule-based ensemble (FRBE).
lg Locally Gaussian Distributions: Estimation and Methods
An implementation of locally Gaussian distributions. It provides methods for implementing the locally Gaussian density estimator (LGDE) by Otneim and Tjøstheim (2017a) <doi:10.1007/s11222-016-9706-6>, as well as the corresponding estimator for conditional density functions by Otneim and Tjøstheim (2017b) <doi:10.1007/s11222-017-9732-z>.
lgr A Fully Featured Logging Framework
A flexible, feature-rich yet light-weight logging framework based on ‘R6’ classes. It supports hierarchical loggers, custom log levels, arbitrary data fields in log events, logging to plaintext, ‘JSON’, memory buffers, and databases, as well as email and push notifications. For a full list of features with examples please refer to the package vignette.
libcoin Linear Test Statistics for Permutation Inference
Basic infrastructure for linear test statistics and permutation inference in the framework of Strasser and Weber (1999) <http://…/>. This package must not be used by end-users. CRAN package ‘coin’ implements all user interfaces and is ready to be used by anyone.
LiblineaR.ACF Linear Classification with Online Adaptation of Coordinate Frequencies
Solving the linear SVM problem with coordinate descent is very efficient and is implemented in one of the most often used packages, ‘LIBLINEAR’ (available at http://…/liblinear ). It has been shown that the uniform selection of coordinates can be accelerated by using an online adaptation of coordinate frequencies (ACF). This package implements ACF and is based on ‘LIBLINEAR’ as well as the ‘LiblineaR’ package (<https://…/package=LiblineaR> ). It currently supports L2-regularized L1-loss as well as L2-loss linear SVM. Similar to ‘LIBLINEAR’ multi-class classification (one-vs-the rest, and Crammer & Singer method) and cross validation for model selection is supported. The training of the models based on ACF is much faster than standard ‘LIBLINEAR’ on many problems.
Libra Linearized Bregman Algorithms for Generalized Linear Models
Efficient procedures for fitting the lasso regularization path for linear, logistic and multinomial regression. The package uses the Linearized Bregman Algorithm to solve the regularization path through iterations.
librarian Install, Update, Load Packages from CRAN and ‘GitHub’ in One Step
Automatically install, update, and load ‘CRAN’ and ‘GitHub’ packages in a single function call. By accepting bare unquoted names for packages, it’s easy to add or remove packages from the list.
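A sketch of the one-step workflow (this assumes shelf() is the package’s main verb, as its documentation describes):

  library(librarian)
  shelf(dplyr, ggplot2)  # installs any missing packages from CRAN, then attaches them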
librarysnapshot Library Snapshot for Packages and Dependencies in Use by Current Session
Generate a local library copy with relevant packages. All packages currently found within the search path – except base packages – will be copied to the directory provided and can be used later on with the .libPaths() function.
libstableR Fast and Accurate Evaluation, Random Number Generation and Parameter Estimation of Skew Stable Distributions
Tools for fast and accurate evaluation of skew stable distributions (CDF, PDF and quantile functions), random number generation and parameter estimation.
lift Compute the Top Decile Lift and Plot the Lift Curve
Compute the top decile lift and plot the lift curve. Cumulative lift curves are also supported.
liftLRD Wavelet Lifting Estimators of the Hurst Exponent for Regularly and Irregularly Sampled Time Series
Implementations of Hurst exponent estimators based on the relationship between wavelet lifting scales and wavelet energy.
liftr Dockerize R Markdown Documents
Dockerize R Markdown documents with support for Rabix (Portable Bioinformatics Pipelines).
likelihoodAsy Functions for Likelihood Asymptotics
Functions for computing the r and r* statistics for inference on an arbitrary scalar function of model parameters, plus some code for the (modified) profile likelihood.
likelihoodExplore Likelihood Exploration
Provides likelihood functions as defined by Fisher (1922) <doi:10.1098/rsta.1922.0009> and a function that creates likelihood functions from density functions. The functions are meant to aid in education of likelihood based methods.
likert Functions to analyze and visualize likert type items
Functions to analyze and visualize Likert-type items.
lilikoi Metabolomics Personalized Pathway Analysis Tool
Computes the pathway deregulation score for a given set of metabolites, selects the pathways with the highest mutual information and then uses them to build a classifier. F. Alakwaa, S. Huang, and L. Garmire (2018) <doi:10.1101/283408>.
LilRhino For Implementation of Feed Reduction, Learning Examples and Code Management
This package provides code management functions, a Monty Hall simulator, and an implementation of my own variable reduction technique called Feed Reduction <http://…/Redditbot_Paper.pdf>. The Feed Reduction technique is not yet published, but is merely a tool for implementing a series of binary neural networks meant for reducing data into N dimensions, where N is the number of possible values of the response variable.
lime Local Interpretable Model-Agnostic Explanations
When building complex models, it is often difficult to explain why the model should be trusted. While global measures such as accuracy are useful, they cannot be used for explaining why a model made a specific prediction. ‘lime’ (a port of the ‘lime’ ‘Python’ package) is a method for explaining the outcome of black box models by fitting a local model around the point in question and perturbations of this point. The approach is described in more detail in the article by Ribeiro et al. (2016) <arXiv:1602.04938>.
lin.eval Perform Polynomial Evaluation of Linearity
Evaluates whether the relationship between two vectors is linear or nonlinear. Performs a test to determine how well a linear model fits the data compared to higher order polynomial models. Jhang et al. (2004) <doi:10.1043/1543-2165(2004)128%3C44:EOLITC%3E2.0.CO;2>.
lindia Automated Linear Regression Diagnostic
Provides a set of streamlined functions that allow easy generation of linear regression diagnostic plots necessary for checking linear model assumptions. This package is meant for quick plotting of linear regression diagnostics while preserving the merits of ‘The Grammar of Graphics’ as implemented in ‘ggplot2’. See the ‘ggplot2’ website for more information regarding its specific graphics capabilities.
LindleyPowerSeries Lindley Power Series Distribution
Computes the probability density function, the cumulative distribution function, the hazard rate function, the quantile function and random generation for Lindley Power Series distributions, see Nadarajah and Si (2018) <doi:10.1007/s13171-018-0150-x>.
LindleyR The Lindley Distribution and Its Modifications
Implements the probability density function, quantile function, cumulative distribution function, random number generation and the hazard rate function for the continuous one-parameter Lindley distribution as well as for 15 of its modifications. It is also possible to draw censored random samples, with a desired censoring rate, when the event times follow any continuous lifetime distribution supported by R.
linear.tools Manipulate Formulas and Evaluate Marginal Effects
Provides tools to manipulate formulas, such as getting x, y or contrasts from the model/formula, and functions to evaluate and check the marginal effects of a linear model.
linearQ Linear Algorithm for Simulating Quantiles in Multiscale Change-Point Segmentation Problem
A linear-time algorithm to simulate quantiles of multiscale statistics under the null hypothesis for multiscale change-point segmentation. The reference is in preparation.
LinearRegressionMDE Minimum Distance Estimation in Linear Regression Model
Considers the linear regression model Y = Xb + error, where the distribution function of the errors is unknown but the errors are independent and symmetrically distributed. The package contains a function named LRMDE() which takes Y and X as input and returns the minimum distance estimator of the parameter b in the model.
linemap Line Maps
Create maps made of lines. The package contains two functions: linemap() and getgrid(). linemap() displays a map made of lines using a data frame of gridded data. getgrid() transforms a set of polygons (sf objects) into a suitable data frame for linemap().
linERR Linear Excess Relative Risk Model
Fits a linear excess relative risk model by maximum likelihood, possibly including several variables and allowing for lagged exposures.
lingtypology Linguistic Typology and Mapping
Provides R with the Glottolog database <http://glottolog.org> and additional capabilities for linguistic cartography. The Glottolog database contains a catalogue of the world’s languages. This package helps researchers to make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project <http://…/>, which facilitates uniform access to the data across publications. A tutorial for this package is available on the GitHub wiki <https://…/>.
link2GI Linking GIS, Remote Sensing and Other Command Line Tools
Functions to simplify the linking of open source GIS and remote sensing related command line interfaces.
LinkageMapView Plot Linkage Group Maps with Quantitative Trait Loci
Produces high resolution, publication ready linkage maps and quantitative trait loci maps. Input can be output from ‘R/qtl’, simple text or comma delimited files. Output is currently a portable document file.
LinkedGASP Linked Emulator of a Coupled System of Simulators
Prototypes for construction of a Gaussian Stochastic Process emulator (GASP) of a computer model. This is done within the objective Bayesian implementation of the GASP. The package allows for construction of a linked GASP of the composite computer model. Computational implementation follows the mathematical exposition given in publication: Ksenia N. Kyzyurova, James O. Berger, Robert L. Wolpert. Coupling computer models through linking their statistical emulators. SIAM/ASA Journal on Uncertainty Quantification, 6(3): 1151-1171, (2018).<DOI:10.1137/17M1157702>.
linkprediction Link Prediction Methods
Implementations of most of the existing proximity-based methods of link prediction in graphs. Among the 20 implemented methods are e.g.: Adamic L. and Adar E. (2003) <doi:10.1016/S0378-8733(03)00009-1>, Leicht E., Holme P., Newman M. (2006) <doi:10.1103/PhysRevE.73.026120>, Zhou T. and Zhang Y (2009) <doi:10.1140/epjb/e2009-00335-8>, and Fouss F., Pirotte A., Renders J., and Saerens M. (2007) <doi:10.1109/TKDE.2007.46>.
linkspotter Bivariate Correlations Calculation and Visualization
Compute and visualize using the ‘visNetwork’ package all the bivariate correlations of a dataframe. Several and different types of correlation coefficients (Pearson’s r, Spearman’s rho, Kendall’s tau, distance correlation, maximal information coefficient and equal-freq discretization-based maximal normalized mutual information) are used according to the variable couple type (quantitative vs categorical, quantitative vs quantitative, categorical vs categorical).
linl ‘linl’ is not ‘Letter’
A ‘LaTeX’ Letter class for ‘rmarkdown’, using the ‘pandoc-letter’ template adapted for use with ‘markdown’.
lintools Manipulation of Linear Systems of (in)Equalities
Variable elimination (Gaussian elimination, Fourier-Motzkin elimination), Moore-Penrose pseudoinverse, reduction to reduced row echelon form, value substitution, projecting a vector on the convex polytope described by a system of (in)equations, removing spurious columns and rows and collapse implied equalities.
liqueueR Implements Queue, PriorityQueue and Stack Classes
Provides three classes: Queue, PriorityQueue and Stack. Queue is just a ‘plain vanilla’ FIFO queue; PriorityQueue orders items according to priority. Stack implements LIFO.
liquidSVM A Fast and Versatile SVM Package
Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, inclusion of a variety of different classification and regression scenarios, and full flexibility for experts.
listdtr List-Based Rules for Dynamic Treatment Regimes
Construction of list-based rules, i.e. a list of if-then clauses, to estimate the optimal dynamic treatment regime.
listenv Environments Behaving (Almost) as Lists
List environments are environments that can be indexed similarly to lists, e.g. x <- listenv(); x[[2]] <- "b"; names(x)[2] <- "B"; print(x$B).
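The same one-liner expanded into a runnable sketch:

    library(listenv)
    x <- listenv()
    x[[2]] <- "b"        # grows on assignment, like a list
    names(x)[2] <- "B"
    print(x$B)           # "b"
    as.list(x)           # convert back to a plain list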
listless Convert Lists to Tidy Data Frames
A lightweight utility for converting lists to tidy data frames.
listviewer R htmlwidget to view lists
A package of R htmlwidgets to interactively view and, optionally, modify lists. As of now, listviewer provides just one interface, to jsoneditor; it is designed, though, to support multiple interfaces.
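A one-line sketch of the jsoneditor interface:

    # opens an interactive, collapsible tree view of the nested list
    listviewer::jsonedit(list(a = 1, b = list(c = letters[1:3])))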
listWithDefaults List with Defaults
Provides a function that, as an alternative to base::list, allows default values to be inherited from another list.
liteq Lightweight Portable Message Queue Using ‘SQLite’
Temporary and permanent message queues for R. Built on top of ‘SQLite’ databases. ‘SQLite’ provides locking, and makes it possible to detect crashed consumers. Crashed jobs can be automatically marked as ‘failed’, or put in the queue again, potentially a limited number of times.
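A minimal producer/consumer sketch using the package's queue verbs (ensure_queue(), publish(), try_consume(), ack()):

    library(liteq)
    q <- ensure_queue("jobs", db = tempfile())  # queue in a SQLite file
    publish(q, title = "job-1", message = "payload")
    msg <- try_consume(q)   # NULL if the queue is empty
    # ... process the message, then acknowledge it so it is removed
    ack(msg)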
littler R at the Command-Line via ‘r’
A scripting and command-line front-end is provided by ‘r’ (aka ‘littler’) as a lightweight binary wrapper around the GNU R language and environment for statistical computing and graphics. While R can be used in batch mode, the r binary adds full support for both ‘shebang’-style scripting (i.e. using a hash-mark-exclamation-path expression as the first line in scripts) as well as command-line use in standard Unix pipelines. In other words, r provides the R language without the environment.
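A minimal shebang-style script; littler exposes trailing command-line arguments as the variable argv:

    #!/usr/bin/env r
    # save as hello.r, make it executable, then run: ./hello.r world
    cat("Hello,", argv, "\n")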
liureg Liu Regression with Liu Biasing Parameters and Statistics
Estimation and testing of linear Liu regression coefficients, with different Liu-related measures such as MSE, R-squared, etc.
live Local Interpretable (Model-Agnostic) Visual Explanations
Interpretability of complex machine learning models is a growing concern. This package helps to understand the key factors that drive the decisions made by a complicated predictive model (a so-called black box model). This is achieved through local approximations that are based either on an additive regression-like model or on a CART-like model that allows for higher-order interactions. The methodology is based on Tulio Ribeiro, Singh, Guestrin (2016) <doi:10.1145/2939672.2939778>.
livechatR R Wrapper for LiveChat REST API
Provides a wrapper around LiveChat’s API. The R functions allow one to extract chat sessions, the raw text of chats between agents and customers, and events.
ljr Logistic Joinpoint Regression
Fits and tests logistic joinpoint models.
llbayesireg The L-Logistic Bayesian Regression
R functions and data sets for the work Paz, R.F., Balakrishnan, N and Bazán, J.L. (2018). L-logistic regression models: Prior sensitivity analysis, robustness to outliers and applications. Brazilian Journal of Probability and Statistics, <https://…/BJPS397.pdf>.
LLM Logit Leaf Model Classifier for Binary Classification
Fits the Logit Leaf Model, makes predictions and visualizes the output. (De Caigny et al., (2018) <DOI:10.1016/j.ejor.2018.02.009>).
lmem.qtler Linear Mixed Effects Models for QTL Mapping for Multienvironment and Multitrait Analysis
Performs QTL mapping analysis for balanced populations and for multi-environment and multi-trait analysis using mixed models. Balanced-population, single-trait, single-environment QTL mapping is performed through marker regression (Haley and Knott (1992) <DOI:10.1038/hdy.1992.131>; Martinez and Curnow (1992) <DOI:10.1007/BF00222330>), while multi-environment and multi-trait QTL mapping is performed through linear mixed models. These functions can use any of the following populations: double haploid, F2, recombinant inbred lines, back-cross, and 4-way crosses. Performs a Single Marker Analysis, a Single Interval Mapping, or a Composite Interval Mapping analysis, and then constructs a final model with all of the relevant QTL.
lmenssp Linear Mixed Effects Models with Non-stationary Stochastic Processes
Fit, filter and smooth mixed models with non-stationary processes.
lmeresampler Bootstrap Methods for Nested Linear Mixed-Effects Models
Bootstrap routines for nested linear mixed effects models fit using either ‘lme4’ or ‘nlme’. The provided ‘bootstrap()’ function implements the parametric, semi-parametric (i.e., CGR), residual, cases, and random effect block (REB) bootstrap procedures.
lmerTest Tests in Linear Mixed Effects Models
Provides different kinds of tests for linear mixed effects models as implemented in the ‘lme4’ package. The tests comprise type I – III F tests for fixed effects and LR tests for random effects. The package also provides the calculation of population means for fixed factors, with confidence intervals and corresponding plots. Finally, backward elimination of non-significant effects is implemented.
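A short sketch on the sleepstudy data shipped with ‘lme4’:

    library(lmerTest)   # masks lme4::lmer() so fitted models carry the tests
    m <- lmer(Reaction ~ Days + (Days | Subject), data = lme4::sleepstudy)
    anova(m)            # F tests for fixed effects (Satterthwaite df)
    step(m)             # backward elimination of non-significant effects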
lmeVarComp Testing for a Subset of Variance Components in Linear Mixed Models
Test zero variance components in linear mixed models and test additivity in nonparametric regression using the restricted likelihood ratio test and the generalized F-test. Details can be found at Zhang et al (2016) <doi:10.1002/cjs.11295>.
LMfilteR Filter Methods for Parameter Estimation in Linear Regression Models
Presents a method based on filtering algorithms to estimate the parameters of linear regressions, i.e. the coefficients and the variance of the error term. The proposed algorithm makes use of particle filters, following the resampling methods of Ristic, B., Arulampalam, S., Gordon, N. (2004, ISBN: 158053631X).
lmmen Linear Mixed Model Elastic Net
Fits (Gaussian) linear mixed-effects models for high-dimensional data (n<<p) using the linear mixed model elastic-net penalty.
lmmpar Parallel Linear Mixed Model
Embarrassingly Parallel Linear Mixed Model calculations spread across local cores which repeat until convergence.
lmomPi (Precipitation) Frequency Analysis and Variability with L-Moments from ‘lmom’
An extension of the ‘lmom’ R package: the ‘pel’, ‘cdf’ and ‘qua’ function families are lumped and called from one function per family, in order to create robust automatic tools to fit data with different probability distributions and then to estimate probability values and return periods. The implemented functions are able to manage time series with constant and/or missing values without stopping execution with error messages. The package also contains tools to calculate several indices based on variability (e.g. ‘SPI’, the Standardized Precipitation Index, see <https://…/standardized-precipitation-index-spi> and <http://…/>) for multiple time series or spatio-temporal gridded values.
lmPerm Permutation Tests for Linear Models
Linear model functions using permutation tests.
lmreg Data and Functions Used in Linear Models and Regression with R: An Integrated Approach
Data files and a few functions used in the book ‘Linear Models and Regression with R: An Integrated Approach’ by Debasis Sengupta and Sreenivas Rao Jammalamadaka (2019).
lmridge Linear Ridge Regression with Ridge Penalty and Ridge Statistics
Linear ridge regression coefficient’s estimation and testing with different ridge related measures such as MSE, R-squared etc.
lmSubsets Exact Variable-Subset Selection in Linear Regression
Exact and approximation algorithms for variable-subset selection in ordinary linear regression models.
lmvar Linear Regression with Non-Constant Variances
Runs a linear regression in which both the expected value and the variance can vary per observation. The expected values mu follows the standard linear model mu = X_mu * beta_mu. The standard deviation sigma follows the model log(sigma) = X_sigma * beta_sigma. The package comes with two vignettes: ‘Intro’ gives an introduction, ‘Math’ gives mathematical details.
lmviz A Package to Visualize Linear Models Features and Play with Them
Contains three shiny applications. Two are meant to explore linear model inference feature through simulation. The third is a game to learn interpreting diagnostic plots.
LN0SCIs Simultaneous CIs for Ratios of Means of Log-Normal Populations with Zeros
Constructs simultaneous confidence intervals for ratios of means of log-normal populations with zeros. It also has a Python module that does the same thing and can be applied to multiple comparisons of parameters of any k mixture distributions. Four methods are provided: the method based on a generalized pivotal quantity with order statistics and the quantity based on Wilson by Li et al. (2009) <doi:10.1016/j.spl.2009.03.004> (GPQW), the method based on a generalized pivotal quantity with order statistics and the quantity based on Hannig (2009) <doi:10.1093/biomet/asp050> (GPQH), and two methods based on two-step MOVER intervals by Amany H, Abdel K (2015) <doi:10.1080/03610918.2013.767911>: Fiducial generalized pivotal two-step MOVER intervals based on the Wilson quantity (FMW) and on Hannig’s quantity (FMWH). All of these approaches can be found in our submitted paper.
LncMod Predicting Modulator and Functional/Survival Analysis
Predicts modulators regulating the ability of effectors to regulate their targets and produces modulator-effector-target triplets, followed by GO term functional enrichment and survival analysis. This is mainly applied to long non-coding RNAs (lncRNAs) as candidate modulators regulating the ability of transcription factors (TFs) to regulate their corresponding targets.
LNIRT LogNormal Response Time Item Response Theory Models
Allows the simultaneous analysis of responses and response times in an Item Response Theory (IRT) modelling framework. Parameter estimation is done with a MCMC algorithm. LNIRT replaces the package CIRT, which was written by Rinke Klein Entink. For reference, see the paper by Fox, Klein Entink and Van der Linden (2007), ‘Modeling of Responses and Response Times with the Package cirt’, Journal of Statistical Software, <doi:10.18637/jss.v020.i07>.
loa Various Options and Add-ins for Lattice
This package, Lattice Options and Add-ins (or loa), contains various plots and functions that make use of the lattice/trellis plotting framework. The plots (which include loaPlot, GoogleMap and trianglePlot) use panelPal, a function that extends lattice and hexbin package methods to automate plot subscripting and panel-to-panel and panel-to key synchronization/management. See ?loa for further details.
loadr Cleaner Workspaces with Shared Variable Environments
Provides intuitive functions for loading objects into environments, encouraging less cluttered workspaces and sharing variables with large or reusable data across users and sessions. The user provides named variables which are loaded into the variable environment for later retrieval.
lobstr Visualize R Data Structures with Trees
A set of tools for inspecting and understanding R data structures inspired by str(). Includes ast() for visualizing abstract syntax trees, ref() for showing shared references, cst() for showing call stack trees, and obj_size() for computing object sizes.
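Short examples of the tools named above:

    library(lobstr)
    ast(f(x, "y", 1))   # draw the abstract syntax tree of a call
    x <- 1:1e6
    y <- x
    ref(x, y)           # both names point at the same object
    obj_size(x, y)      # shared memory is counted only once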
LocalControl Local Control: An R Package for Generating High Quality Comparative Effectiveness Evidence
Implements novel nonparametric approaches to address biases and confounding when comparing treatments or exposures in observational studies of outcomes. While designed and appropriate for use in studies involving medicine and the life sciences, the package can be used in other situations involving outcomes with multiple confounders. The package implements a family of methods for nonparametric bias correction when comparing treatments in cross-sectional, case-control, and survival analysis settings, including competing risks with censoring. The approach extends to bias-corrected personalized predictions of treatment outcome differences, and analysis of heterogeneity of treatment effect-sizes across patient subgroups.
localICE Local Individual Conditional Expectation
Local Individual Conditional Expectation is an extension of Individual Conditional Expectation (ICE) and provides three-dimensional local explanations for particular data instances. The three dimensions are two features on the horizontal and vertical axes, as well as the target, which is represented by different colors. The approach is applicable to classification and regression problems to explain interactions of two features towards the target. The plot for discrete targets looks similar to plots of clustering algorithms like k-means, where different clusters represent different predictions. Reference for the ICE approach: Alex Goldstein, Adam Kapelner, Justin Bleich, Emil Pitkin (2013) <arXiv:1309.6392>.
localIV Estimation of Marginal Treatment Effects using Local Instrumental Variables
In the generalized Roy model, the marginal treatment effect (MTE) can be used as a building block for constructing conventional causal parameters such as the average treatment effect (ATE) and the average treatment effect on the treated (ATT) (Heckman, Urzua, and Vytlacil 2006 <doi:10.1162/rest.88.3.389>). Given a treatment selection model and an outcome model, the function mte() estimates the MTE via local instrumental variables (or via a normal selection model) and also the projection of MTE onto the 2-dimensional space of the propensity score and a latent variable representing unobserved resistance to treatment (Zhou and Xie 2018 <https://…/zhou-xie_mte2.pdf> ). The object returned by mte() can be used to estimate conventional parameters such as ATE and ATT (via average()) or marginal policy-relevant treatment effects (via mprte()).
localModel LIME-Based Explanations with Interpretable Inputs Based on Ceteris Paribus Profiles
Local explanations of machine learning models describe how features contributed to a single prediction. This package implements an explanation method based on LIME (Local Interpretable Model-agnostic Explanations; see Tulio Ribeiro, Singh, Guestrin (2016) <doi:10.1145/2939672.2939778>) in which interpretable inputs are created based on the local rather than global behaviour of each original feature.
locfdr Computes Local False Discovery Rates
Computation of local false discovery rates.
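A sketch on simulated z-values (mostly null, with a non-null tail); the simulated mixture is only illustrative:

    library(locfdr)
    z <- c(rnorm(9500), rnorm(500, mean = 3))
    fit <- locfdr(z)    # estimates the null proportion and local fdr
    head(fit$fdr)       # local false discovery rate per observation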
LocFDRPois Functions for Performing Local FDR Estimation when Null and Alternative are Poisson
The main idea of the Local FDR algorithm is to estimate both proportion of null observations and the ratio of null and alternative densities. In the case that there are many null observations, this can be done reliably, through maximum likelihood or generalized linear models. This package implements this in the case that the null and alternative densities are Poisson.
loder Dependency-Free Access to PNG Image Files
Read and write access to PNG image files using the LodePNG library. The package has no external dependencies.
loe Local Ordinal Embedding
Local Ordinal Embedding (LOE) is a graph embedding method for unweighted graphs.
LOGAN Log File Analysis in International Large-Scale Assessments
Enables users to handle dataset cleaning for conducting specific analyses with the log files from two international educational assessments: the Programme for International Student Assessment (PISA, <http://…/>) and the Programme for the International Assessment of Adult Competencies (PIAAC, <http://…/>). An illustration of the analyses can be found in the LOGAN Shiny app (<https://…/>) in your browser.
logger A Lightweight, Modern and Flexible Logging Utility
Inspired by the ‘futile.logger’ R package and the ‘logging’ Python module, this utility provides a flexible and extensible way of formatting and delivering log messages with low overhead.
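A short sketch of the basic API; messages support glue-style interpolation:

    library(logger)
    log_threshold(DEBUG)   # show DEBUG-level messages and above
    log_info("Processing {nrow(mtcars)} rows")
    log_debug("this now prints too")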
loggit Effortless Exception Logging
A very simple and easy-to-use set of suspiciously-familiar functions. ‘loggit’ provides a set of wrappings for base R’s message(), warning(), and stop() functions that maintain identical functionality, but also log the handler message to a ‘JSON’ log file. While mostly automatic, powerful custom logging is available via these handlers’ logging function, loggit(), which is also exported for use. No change in existing code is necessary to use this package.
loggle Local Group Graphical Lasso Estimation
Provides a set of methods that learn time-varying graphical models based on data measured over a temporal grid. The underlying statistical model is motivated by the needs to describe and understand evolving interacting relationships among a set of random variables in many real applications, for instance the study of how stocks interact with each other and how such interactions change over time. The time-varying graphical models are estimated under the assumption that the graph topology changes gradually over time. For more details on estimating time-varying graphical models, please refer to: Yang, J. & Peng, J. (2018) <arXiv:1804.03811>.
logiBin Binning Variables to Use in Logistic Regression
Fast binning of multiple variables using parallel processing. A summary of all the variables binned is generated, providing the information value, entropy, an indicator of whether the variable follows a monotonic trend or not, etc. It supports rebinning of variables to force a monotonic trend as well as manual binning based on pre-specified cuts. The cut points of the bins are based on conditional inference trees as implemented in the partykit package. The conditional inference framework is described by Hothorn T, Hornik K, Zeileis A (2006) <doi:10.1198/106186006X133933>.
logihist Combined Graphs for Logistic Regression
Provides histograms, boxplots and dotplots as alternatives to scatterplots of data when plotting fitted logistic regressions.
logistic4p Logistic Regression with Misclassification in Dependent Variables
Error in a binary dependent variable, also known as misclassification, has not drawn much attention in psychology. Ignoring misclassification in logistic regression can result in misleading parameter estimates and statistical inference. This package conducts logistic regression analysis with misspecification in outcome variables.
logisticPCA Binary Dimensionality Reduction
Dimensionality reduction techniques for binary data including logistic PCA.
logisticRR Adjusted Relative Risk from Logistic Regression
Adjusted odds ratios conditional on potential confounders can be directly obtained from logistic regression. However, these adjusted odds ratios have been widely misinterpreted as relative risks. As relative risk is often of interest in public health, we provide simple code to return adjusted relative risks from a logistic regression model under potential confounders.
logKDE Computing Log-Transformed Kernel Density Estimates for Positive Data
Computes log-transformed kernel density estimates for positive data using a variety of kernels. It follows the methods described in Jones, Nguyen and McLachlan (2018) <arXiv:1804.08365>.
lognorm Functions for the Lognormal Distribution
The lognormal distribution (Limpert et al. (2001) <doi:10.1641/0006-3568(2001)051[0341:lndats]2.0.co;2>) can characterize uncertainty that is bounded by zero. This package provides estimation of distribution parameters, computation of moments and other basic statistics, and an approximation of the distribution of the sum of several correlated lognormally distributed variables (Lo 2013 <doi:10.12988/ams.2013.39511>).
logNormReg log Normal Linear Regression
Functions to fit simple linear regression models with log normal errors and identity link (taking the responses on the original scale). See Muggeo (2018) <doi:10.13140/RG.2.2.18118.16965>.
logOfGamma Natural Logarithms of the Gamma Function for Large Values
Uses approximations to compute the natural logarithm of the Gamma function for large values.
lolog Latent Order Logistic Graph Models
Estimation of Latent Order Logistic (LOLOG) Models for Networks. LOLOGs are a flexible and fully general class of statistical graph models. This package provides functions for performing MOM, GMM and variational inference. Visual diagnostics and goodness of fit metrics are provided. See Fellows (2018) <arXiv:1804.04583> for a detailed description of the methods.
longclust Model-Based Clustering and Classification for Longitudinal Data
Clustering or classification of longitudinal data based on a mixture of multivariate t or Gaussian distributions with a Cholesky-decomposed covariance structure.
longpower Sample Size Calculations for Longitudinal Data
Compute power and sample size for linear models of longitudinal data. Supported models include mixed-effects models and models fit by generalized least squares and generalized estimating equations. Relevant formulas are derived by Liu and Liang (1997) <DOI:10.2307/2533554>, Diggle et al (2002) <ISBN:9780199676750>, and Lu, Luo, and Chen (2008) <DOI:10.2202/1557-4679.1098>.
longROC Time-Dependent Prognostic Accuracy with Multiply Evaluated Bio Markers or Scores
Time-dependent Receiver Operating Characteristic curves, Area Under the Curve, and Net Reclassification Indexes for repeated measures. It is based on methods in Barbati and Farcomeni (2017) <doi:10.1007/s10260-017-0410-2>.
longRPart2 Recursive Partitioning of Longitudinal Data
Performs recursive partitioning of linear and nonlinear mixed effects models, specifically for longitudinal data. The package is an extension of the original ‘longRPart’ package by Stewart and Abdolell (2013) <https://…/package=longRPart>.
longurl Expand Short URLs Using the ‘LongURL’ API
Interface to the ‘LongURL’ API to identify known URL shortener services and expand vectors of short URLs with optional error checking and URL validation. See <http://…/> for more information about ‘LongURL’.
loo Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models
We efficiently approximate leave-one-out cross-validation (LOO) using very good importance sampling (VGIS), a new procedure for regularizing importance weights. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors, and for the comparison of predictive errors between two models. We also compute the widely applicable information criterion (WAIC).
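A minimal sketch; log_lik stands in for the draws-by-observations matrix of pointwise log-likelihood values that a fitted Bayesian model would supply:

    library(loo)
    # 1000 posterior draws x 50 observations (simulated placeholder)
    log_lik <- matrix(rnorm(1000 * 50, mean = -1), nrow = 1000)
    fit <- loo(log_lik)   # elpd_loo, p_loo, looic with standard errors
    print(fit)
    waic(log_lik)         # widely applicable information criterion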
lookupTable Look-Up Tables using S4
Fits look-up tables by filling entries with the mean or median values of observations that fall in partitions of the feature space. Partitions can be determined by the user via the input argument feature.boundaries, and the dimensions of the feature space can be any combination of continuous and categorical features provided by the data set. A predict function directly fetches the corresponding entry value, and a default value is defined as the mean or median of all available observations. The table and other components are represented using the S4 class lookupTable.
loon Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
loopr Uses an Archive to Amend Previous Stages of a Pipe using Current Output
Remedies a common problem in piping: not having access to intermediate outputs of the pipe. Within a ‘loop’, a piping intermediate is stored in a stack archive, data is processed, and then both the stored intermediate and the current output are reintegrated using an ‘ending’ function. Two special ending functions are provided: amend and insert. However, any ending function can be specified, including merge functions, join functions, setNames(), etc. This framework allows the following workflow: focus on a particular aspect or section of a dataset, conduct specific operations, and then reintegrate changes into the whole.
loose.rock Set of Functions to Use in Survival Analysis and in Data Science
Collection of functions to improve work-flow in survival analysis and data science. The package features include: the generation of balanced datasets, live retrieval of protein coding genes from two public databases, generation of random matrices based on a covariance matrix, and a cache function to store function results. This work was supported by two grants from the Portuguese Foundation for Science and Technology, and by the EU Commission under the SOUND project.
lori Low-Rank Interaction Contingency Tables
Estimation and visualization of low-rank interaction contingency tables. Low-rank interaction is a two-fold extension of the log-bilinear model. First, instead of a fixed rank constraint, the log-likelihood is penalized by the nuclear norm of the interaction matrix. Second, available row and column covariates can be incorporated in the model. The main function can be applied to contingency tables with missing values. G. Robin, J. Josse, E. Moulines, S. Sardy (2017) <arXiv:1703.02296>.
LotkasLaw Runs Lotka’s Law which is One of the Special Applications of Zipf’s Law
Running Lotka’s Law following Pao (1985) <doi:10.1016/0306-4573(85)90055-X>. The Law is based around the proof that the number of authors making n contributions is about 1/n^a of those making one contribution.
lotri A Simple Way to Specify Symmetric, Block Diagonal Matrices
Provides a simple mechanism to specify symmetric block diagonal matrices (often used for covariance matrices). This is based on the domain-specific language implemented in ‘nlmixr’, but expanded to create matrices in R generally instead of specifying parts of matrices to estimate.
lowmemtkmeans Low Memory Use Trimmed K-Means
Performs the trimmed k-means clustering algorithm with lower memory use. It also provides a number of utility functions such as BIC calculations.
LowWAFOMNX Low WAFOM Niederreiter-Xing Sequence
An R implementation of the low Walsh Figure of Merit (WAFOM) sequence based on the Niederreiter-Xing sequence.
lpbrim LP-BRIM Bipartite Modularity
Optimization of bipartite modularity using LP-BRIM (Label propagation followed by Bipartite Recursively Induced Modularity).
lpdensity Local Polynomial Density Estimation and Inference
Without imposing stringent distributional assumptions or shape restrictions, nonparametric density estimation has been popular in economics and other social sciences for counterfactual analysis, program evaluation, and policy recommendations. This package implements a novel density estimator based on local polynomial regression, documented in Cattaneo, Jansson and Ma (2017): lpdensity() to construct local polynomial based density (and derivatives) estimator; lpbwdensity() to perform data-driven bandwidth selection; and lpdensity.plot() for density plot with robust confidence interval.
LPGraph Nonparametric Smoothing of Laplacian Graph Spectra
A nonparametric method to approximate Laplacian graph spectra of a network with ordered vertices. This provides a computationally efficient algorithm for obtaining an accurate and smooth estimate of the graph Laplacian basis. The approximation results can then be used for tasks like change point detection, k-sample testing, and so on. The primary reference is Mukhopadhyay, S. and Wang, K. (2018, Technical Report).
lpirfs Local Projections Impulse Response Functions
Contains functions to estimate linear and nonlinear impulse responses based on local projections by Jordà (2005) <doi:10.1257/0002828053828518>. Nonlinear impulse responses are estimated for two regimes based on a transition function as used in Auerbach and Gorodnichenko (2012) <doi:10.1257/pol.4.2.1>.
lplyr ‘dplyr’ Verbs for Lists and Other Verbs for Data Frames
Provides ‘dplyr’ verbs for lists and other useful verbs for manipulation of data frames. In particular, it includes a mutate_which() function that mutates columns for a specific subset of rows defined by a condition, and fuse(), which is a more flexible version of the ‘tidyr’ unite() function.
LPower Calculates Power, Sample Size, or Detectable Effect for Longitudinal Analyses
Computes power, sample size, or the detectable difference for a repeated measures model with attrition. It requires the variance-covariance matrix of the observations but can compute this matrix for several common random effects models. See Diggle, P, Liang, KY and Zeger, SL (1994, ISBN:9780198522843).
LPR Lasso and Partial Ridge
Contains a function called ‘LPR’ to estimate coefficients using Lasso and Partial Ridge method and to calculate confidence intervals through bootstrap.
LPWC Lag Penalized Weighted Correlation for Time Series Clustering
Computes a time series distance measure for clustering based on weighted correlation and introduction of lags. The lags capture delayed responses in a time series dataset. The timepoints must be specified. T. Chandereng, A. Gitter (2018) <doi:10.1101/292615>.
lqr Robust Linear Quantile Regression
Fits a robust linear quantile regression model using a new family of zero-quantile distributions for the error term. This family of distributions includes skewed versions of the Normal, Student’s t, Laplace, Slash and Contaminated Normal distributions. It provides estimates and full inference, as well as envelope plots for assessing the fit and confidence bands when several quantiles are provided simultaneously.
LRcontrast Dose Response Signal Detection under Model Uncertainty
Provides functions for calculating test statistics, simulating quantiles and simulating p-values of likelihood ratio contrast tests in regression models with a lack of identifiability.
lrequire Sources an R “Module” with Caching & Encapsulation, Returning Exported Vars
In the fashion of ‘node.js’ <https://…/>, requires a file, sourcing into the current environment only the variables explicitly specified in the module.exports or exports list variable. If the file was already sourced, the result of the earlier sourcing is returned to the caller.
lrgs Linear Regression by Gibbs Sampling
Implements a Gibbs sampler to do linear regression with multiple covariates, multiple responses, Gaussian measurement errors on covariates and responses, Gaussian intrinsic scatter, and a covariate prior distribution which is given by either a Gaussian mixture of specified size or a Dirichlet process with a Gaussian base distribution.
lsasim Simulate Large Scale Assessment Data
Provides functions to simulate data from large-scale educational assessments, including background questionnaire data and cognitive item responses that adhere to a multiple-matrix sampled design.
lsbclust Least-Squares Bilinear Clustering for Three-Way Data
Functions for performing least-squares bilinear clustering of three-way data. The method uses the bilinear decomposition (or biadditive model) to model two-way matrix slices while clustering over the third way. Up to four different types of clusters are included, one for each term of the bilinear decomposition. In this way, matrices are clustered simultaneously on (a subset of) their overall means, row margins, column margins and row-column interactions. The orthogonality of the bilinear model results in separability of the joint clustering problem into four separate ones. Three of these subproblems are specific k-means problems, while a special algorithm is implemented for the interactions. Plotting methods are provided, including biplots for the low-rank approximations of the interactions.
lsbs Bandwidth Selection for Level Sets and HDR Estimation
Bandwidth selection for kernel density estimators of 2-d level sets and highest density regions. It applies a plug-in strategy to estimate the asymptotic risk function and minimizes it to get the optimal bandwidth matrix. See Doss and Weng (2018) <arXiv:1806.00731> for more detail.
LSDsensitivity Sensitivity Analysis Tools for LSD
Tools for sensitivity analysis of LSD simulation models. Reads object-oriented data produced by LSD simulation models and performs screening and global sensitivity analysis (Sobol decomposition method, Saltelli et al. (2008) ISBN:9780470725177). A Kriging or polynomial meta-model (Kleijnen (2009) <doi:10.1016/j.ejor.2007.10.013>) is estimated using the simulation data to provide the data required by the Sobol decomposition. LSD (Laboratory for Simulation Development) is free software developed by Marco Valente (documentation and downloads available at <http://labsimdev.org> ).
lsei Solving Least Squares Problems under Equality/Inequality Constraints
It contains functions that solve least squares linear regression problems under linear equality/inequality constraints. It is developed based on the ‘Fortran’ program of Lawson and Hanson (1974, 1995), which is public domain and available at http://…/lawson-hanson.
lsl Latent Structure Learning
Conduct latent structure learning methods, particularly structural equation modeling via penalized likelihood.
lslx Semi-Confirmatory Structural Equation Modeling via Penalized Likelihood
Fits semi-confirmatory structural equation modeling (SEM) via penalized likelihood (PL) with lasso or minimax concave penalty (MCP) developed by Huang, Chen, and Weng (2017) <doi:10.1007/s11336-017-9566-9>.
lsm Estimation of the log Likelihood of the Saturated Model
When the values of the outcome variable Y are either 0 or 1, the function lsm() calculates the estimation of the log likelihood in the saturated model. This model is characterized by Llinas (2006, ISSN:2389-8976) in section 2.3 through the assumptions 1 and 2. The function LogLik() works (almost perfectly) when the number of independent variables K is high, but for small K it calculates wrong values in some cases. For this reason, when Y is dichotomous and the data are grouped in J populations, it is recommended to use the function lsm() because it works very well for all K.
lspartition Nonparametric Estimation and Inference Procedures using Partitioning-Based Least Squares Regression
Tools for statistical analysis using partitioning-based least squares regression as described in Cattaneo, Farrell and Feng (2018) <arXiv:1804.04916>. lsprobust() for nonparametric point estimation of regression functions and derivatives thereof, and for robust bias-corrected (pointwise and uniform) inference procedures. lspkselect() for a data-driven procedure for selecting the IMSE-optimal number of knots. lsprobust.plot() for regression plots with robust confidence intervals and confidence bands. lsplincom() for estimation and inference for linear combinations of regression functions from different groups.
lspline Linear Splines with Convenient Parametrisations
Linear splines with convenient parametrisations such that (1) coefficients are slopes of consecutive segments or (2) coefficients are slope changes at consecutive knots. Knots can be set manually or at break points of equal-frequency or equal-width intervals covering the range of ‘x’. The implementation follows Greene (2003), chapter 7.2.5.
lsplsGlm Classification using LS-PLS for Logistic Regression
Fit logistic regression models using LS-PLS approaches to analyse both clinical and genomic data. (C. Bazzoli and S. Lambert-Lacroix. (2017) Classification using LS-PLS with logistic regression based on both clinical and gene expression variables <https://…/hal-01405101> ).
ltable Easy to Make (Lazy) Tables
Constructs tables of counts and proportions out of data sets. It has a simplified syntax appealing to novices and even to advanced users under time pressure. It is particularly suitable for exploratory data analysis or for presentations that single out the most appropriate pieces of tabulated information. Another important feature is the ability to insert tables into Excel and Word documents.
ltm Latent Trait Models under IRT
Analysis of multivariate dichotomous and polytomous data using latent trait models under the Item Response Theory approach. It includes the Rasch, the Two-Parameter Logistic, the Birnbaum’s Three-Parameter, the Graded Response, and the Generalized Partial Credit Models.
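A sketch using the LSAT item-response data bundled with the package:

    library(ltm)
    rasch(LSAT)       # Rasch model
    ltm(LSAT ~ z1)    # two-parameter logistic model with one latent trait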
ltmix Left-Truncated Mixtures of Gamma, Weibull, and Lognormal Distributions
Mixture modelling of one-dimensional data using combinations of left-truncated Gamma, Weibull, and Lognormal distributions. Blostein, Martin & Miljkovic, Tatjana (2019) <doi:10.1016/j.insmatheco.2018.12.001>.
ltmle Longitudinal Targeted Maximum Likelihood Estimation
Targeted Maximum Likelihood Estimation (TMLE) of treatment/censoring specific mean outcome or marginal structural model for point-treatment and longitudinal data.
ltsk Local Time Space Kriging
Implements local spatial and local spatiotemporal Kriging based on local spatial and local spatiotemporal variograms, respectively. The method is documented in Kumar et al (2013) <https://…/jes201352>.
ltxsparklines Lightweight Sparklines for a LaTeX Document
Sparklines are small plots (about one line of text high), made popular by Edward Tufte. This package is the interface from R to the LaTeX package sparklines by Andreas Loeffer and Dan Luecking (http://…/sparklines). It can work with Sweave, knitr or other engines that produce TeX. The package can be used to plot vectors, matrices, data frames, and time series (in ts or zoo format).
lubridate Make dealing with dates a little easier
Lubridate makes it easier to work with dates and times by providing functions to identify and parse date-time data, extract and modify components of a date-time (years, months, days, hours, minutes, and seconds), perform accurate math on date-times, and handle time zones and Daylight Saving Time. Lubridate has a consistent, memorable syntax that makes working with dates fun instead of frustrating.
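A few representative calls:

    library(lubridate)
    x <- ymd_hms("2019-07-04 10:30:00", tz = "America/New_York")
    month(x)                # extract a component
    wday(x, label = TRUE)   # day of week as an ordered factor
    x + months(1)           # calendar-aware arithmetic
    with_tz(x, "UTC")       # same instant in another time zone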
lucid Lucid Printing of Floating Point Numbers
Print vectors (and data frames) of floating point numbers using a non-scientific format optimized for human readers. Vectors of numbers are rounded using significant digits, aligned at the decimal point, and all zeros trailing the decimal point are dropped.
lucr Currency Formatting and Conversion
Reformat currency-based data as numeric values (or numeric values as currency-based data) and convert between currencies.
ludic Linkage Using Diagnosis Codes
Probabilistic record linkage without direct identifiers using only diagnosis codes.
lumberjack Track Changes in Data the Tidy Way
A function composition (‘pipe’) operator and extensible framework that allows for easy logging of changes in data.
luzlogr Lightweight Logging for R Scripts
Provides flexible but lightweight logging facilities for R scripts. Supports priority levels for logs and messages, flagging messages, capturing script output, switching logs, and logging to files or connections.
lvec Out of Memory Vectors
Core functionality for working with vectors (numeric, integer, logical and character) that are too large to keep in memory. The vectors are kept (partially) on disk using memory mapping. This package contains the basic functionality for working with these memory mapped vectors (e.g. creating, indexing, ordering and sorting) and provides C++ headers which can be used by other packages to extend the functionality provided in this package.
lvmcomp Stochastic EM Algorithms for Latent Variable Models with a High-Dimensional Latent Space
Provides stochastic EM algorithms for latent variable models with a high-dimensional latent space. So far, we provide functions for confirmatory item factor analysis based on the multidimensional two-parameter logistic (M2PL) model and the generalized multidimensional partial credit model. These functions scale well for problems with many latent traits (e.g., thirty or even more) and are virtually tuning-free. The computation is facilitated by the multiprocessing ‘OpenMP’ API. For more information, please refer to: Zhang, S., Chen, Y., & Liu, Y. (2018). An Improved Stochastic EM Algorithm for Large-scale Full-information Item Factor Analysis. British Journal of Mathematical and Statistical Psychology. <doi:10.1111/bmsp.12153>.
lvnet Latent Variable Network Modeling
Estimate, fit and compare Structural Equation Models (SEM) and network models (Gaussian Graphical Models; GGM) using OpenMx. Allows for two possible generalizations to include GGMs in SEM: GGMs can be used between latent variables (latent network modeling; LNM) or between residuals (residual network modeling; RNM).
lvplot Letter Value ‘Boxplots’
Implements the letter value ‘boxplot’, which extends the standard ‘boxplot’ to deal with both larger and smaller numbers of data points by dynamically selecting the appropriate number of letter values to display.
lwgeom Bindings to Selected ‘liblwgeom’ Functions for Simple Features
Access to some of the functions found in ‘liblwgeom’, the geometry library used by ‘PostGIS’.
LZeroSpikeInference Exact Spike Train Inference via L0 Optimization
An implementation of algorithms described in Jewell and Witten (2017) <arXiv:1703.08644>.

M

m2b Movement to Behaviour Inference using Random Forest
Prediction of behaviour from movement characteristics using observations and random forests, for the analysis of movement data in ecology. From movement information (speed, bearing, …) the model predicts the observed behaviour (movement, foraging, …) using a random forest. The model can then extrapolate behavioural information to movement data without direct observation of behaviours. The specificity of this method relies on the derivation of multiple predictor variables from the movement data over a range of temporal windows. This procedure allows capturing as much information as possible on the changes and variations of movement, and ensures that the random forest algorithm is used to its best capacity. The method is very generic and applicable to any data set providing movement data together with observations of behaviour.
M2SMF Multi-Modal Similarity Matrix Factorization for Integrative Multi-Omics Data Analysis
A new method to implement clustering from multi-modality data of certain samples. The function M2SMF() jointly factorizes multiple similarity matrices into a shared sub-matrix and several modality-private sub-matrices, which are further used for clustering. Along with this method, we also provide a function to calculate the similarity matrix and a function to evaluate the best cluster number from the original data.
mable Maximum Approximate Bernstein Likelihood Estimation
Fit raw or grouped continuous data from a population with a smooth density on the unit interval by an approximate Bernstein polynomial model, which is a mixture of certain beta distributions, and find the maximum approximate Bernstein likelihood estimator of the unknown coefficients. Consequently, maximum likelihood estimates of the unknown density, distribution functions, and more can be obtained. If the support of the density is not the unit interval, then a transformation can be applied. This is an implementation of the methods proposed by the author of this package, published in the Journal of Nonparametric Statistics: Guan (2016) <doi:10.1080/10485252.2016.1163349> and Guan (2017) <doi:10.1080/10485252.2017.1374384>.
maboost Binary and Multiclass Boosting Algorithms
Performs binary and multiclass boosting in maximum-margin, sparse, smooth and normal settings as described in “A Boosting Framework on Grounds of Online Learning” by T. Naghibi and B. Pfister, (2014). For further information regarding the algorithms, please refer to http://…/1409.7202
macc Mediation Analysis of Causality under Confounding
Performs causal mediation analysis under confounding or correlated errors. This package includes a single level mediation model, a two-level mediation model, and a three-level mediation model for data with hierarchical structures. Under the two/three-level mediation model, the correlation parameter is identifiable and is estimated based on a hierarchical-likelihood, a marginal-likelihood or a two-stage method. See reference for details (Zhao, Y., & Luo, X. (2014). Estimating Mediation Effects under Correlated Errors with an Application to fMRI. arXiv preprint arXiv:1410.7217. <https://…/1410.7217> ).
machina Machina Time Series Generation and Backtesting
Connects to <https://machi.na/> and allows the creation of time series and the running of backtests on a selected portfolio if requested.
MachineShop Machine Learning Models and Tools
Meta-package for statistical and machine learning with a common interface for model fitting, prediction, performance assessment, and presentation of results. Supports predictive modeling of numerical, categorical, and censored time-to-event outcomes and resample (bootstrap and cross-validation) estimation of model performance.
machQA QA Machina Indicators
Performs QA on Machina (see <https://machi.na/> for more information) algebraic indicators ‘sma’ (simple moving average), ‘wavg’ (weighted average),’xavg’ (exponential moving average), ‘hma’ (Hull moving average), ‘adma’ (adaptive moving average), ‘tsi’ (true strength index), ‘rsi’ (relative strength index), ‘gauss’ (Gaussian elimination), ‘momo’ (momentum), ‘t3’ (triple exponential moving average), ‘macd’ (moving average convergence divergence).
madness Automatic Differentiation of Multivariate Operations
An object that supports automatic differentiation of matrix- and multidimensional-valued functions with respect to multidimensional independent variables. Automatic differentiation is via ‘forward accumulation’.
madr Model Averaged Double Robust Estimation
Estimates average treatment effects using model averaged double robust (MA-DR) estimation. The MA-DR estimator is defined as a weighted average of double robust estimators, where each double robust estimator corresponds to a specific choice of the outcome model and the propensity score model. The MA-DR estimator extends the desirable double robustness property by achieving consistency under the much weaker assumption that either the true propensity score model or the true outcome model lies within a specified, possibly large, class of models.
madrat May All Data be Reproducible and Transparent (MADRaT) *
Provides a framework which should improve reproducibility and transparency in data processing. It provides functionality such as automatic metadata creation and management, rudimentary quality management, data caching, work-flow management and data aggregation. * The title is a wish, not a promise. By no means do we expect this package to deliver everything that is needed to achieve full reproducibility and transparency, but we believe that it supports efforts in this direction.
mafs Multiple Automatic Forecast Selection
Fits several forecast models available from the forecast package and selects the best one according to an error metric. Its main function is select_forecast().
magclass Data Class and Tools for Handling Spatial-Temporal Data
Data class for increased interoperability when working with spatial-temporal data, together with corresponding functions and methods (conversions, basic calculations and basic data manipulation). The class distinguishes between spatial, temporal and other dimensions to facilitate the development and interoperability of tools built for it. Additional features are name-based addressing of data and internal consistency checks (e.g. checking for the right data order in calculations).
magicfor Magic Functions to Obtain Results from for Loops
Magic functions to obtain results from for loops.
magick Advanced Image-Processing in R
Bindings to ImageMagick: the most comprehensive open-source image processing library available. Supports many common formats (png, jpeg, tiff, pdf, etc) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio images are automatically previewed when printed to the console, resulting in an interactive editing environment.
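A small sketch using ImageMagick's built-in sample image:

    library(magick)
    img <- image_read("logo:")       # built-in ImageMagick demo image
    img <- image_scale(img, "200")   # resize to 200 pixels wide
    img <- image_rotate(img, 45)
    image_write(img, "logo-rotated.png")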
magickGUI GUI Tools for Interactive Image Processing with ‘magick’
Enables interactive use of the functions of the ‘magick’ package.
MagneticMap Magnetic Laplacian Matrix and Magnetic Eigenmap Visualization
Constructs the normalized magnetic Laplacian Matrix of a square matrix, returns the eigenvectors and visualization of magnetic eigenmap.
magree Implements the O’Connell-Dobson-Schouten Estimators of Agreement for Multiple Observers
Implements an interface to the legacy Fortran code from O’Connell and Dobson (1984) <DOI:10.2307/2531148>. Implements Fortran 77 code for the methods developed by Schouten (1982) <DOI:10.1111/j.1467-9574.1982.tb00774.x>. Includes estimates of average agreement for each observer and average agreement for each subject.
makedummies Create Dummy Variables from Categorical Data
Create dummy variables from categorical data. This package can convert categorical data (factor and ordered) into dummy variables and handle multiple columns simultaneously. An option lets the user select whether a dummy variable for the base group is included (for principal component analysis/factor analysis) or excluded (for regression analysis).
MakefileR Create ‘Makefiles’ Using R
A user-friendly interface for the construction of ‘Makefiles’.
makeFlow Visualizing Sequential Classifications
A user-friendly tool for visualizing categorical or group movement.
makeParallel Transform Serial R Code into Parallel R Code
Writing parallel R code can be difficult, particularly for code that is not ’embarrassingly parallel’. This experimental package automates the transformation of serial R code into more efficient parallel versions. It identifies task parallelism by statically analyzing entire scripts to detect dependencies between statements. It implements an extensible system for scheduling and generating new code. It includes a reference implementation of the ‘List Scheduling’ approach to the general task scheduling problem of scheduling statements on multiple processors.
manet Multiple Allocation Model for Actor-Event Networks
Mixture model with overlapping clusters for binary actor-event data. Parameters are estimated in a Bayesian framework. Model and inference are described in Ranciati, Vinciotti, Wit (2017) Modelling actor-event network data via a mixture model under overlapping clusters. Submitted.
Mangrove Risk Prediction on Trees
Methods for performing genetic risk prediction from genotype data. You can use it to perform risk prediction for individuals, or for families with missing data.
ManifoldOptim An R Interface to the ROPTLIB Library for Riemannian Manifold Optimization
An R interface to the ‘ROPTLIB’ optimization library (see <http://…/~whuang2> for more information). Optimize real-valued functions over manifolds such as Stiefel, Grassmann, and Symmetric Positive Definite matrices.
manipulate Interactive Plots for RStudio
Interactive plotting functions for use within RStudio. The manipulate function accepts a plotting expression and a set of controls (e.g. slider, picker, checkbox, or button) which are used to dynamically change values within the expression. When a value is changed using its corresponding control the expression is automatically re-executed and the plot is redrawn.
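A minimal sketch (must be run inside RStudio, which ships the ‘manipulate’ package):
    library(manipulate)
    manipulate(
      plot(cars, xlim = c(0, x.max)),   # re-executed whenever a control changes
      x.max = slider(15, 30)            # slider control bound to x.max
    )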
manipulateWidget Add Even More Interactivity to Interactive Charts
Like package ‘manipulate’ does for static graphics, this package helps to easily add controls like sliders, pickers, checkboxes, etc. that can be used to modify the input data or the parameters of an interactive chart created with htmlwidgets.
MANOVA.RM Analysis of Multivariate Data and Repeated Measures Designs
Implemented are various tests for semi-parametric repeated measures and general MANOVA designs that assume neither multivariate normality nor covariance homogeneity, i.e., the procedures are applicable to a wide range of general multivariate factorial designs.
manymodelr Build and Tune Several Models
Frequently one needs a convenient way to build and tune several models in one go. The goal is to provide a number of convenience functions useful in machine learning applications. It provides the ability to build, tune and obtain predictions from several models in one function. The models are built using ‘caret’ functions with easier-to-read syntax. Kuhn (2014) <arXiv:1405.6974v14>. Kuhn (2008) <doi:10.18637/jss.v028.i05>. Chambers, J.M. (1992) <doi:10.1371/journal.pone.0053143>. Wilkinson, G.N. and Rogers, C.E. (1973) <doi:10.2307/2346786>.
mapdeck Interactive Maps Using ‘Mapbox GL JS’ and ‘Deck.gl’
Provides a mechanism to plot an interactive map using ‘Mapbox GL’ (<https://…/> ), a javascript library for interactive maps, and ‘Deck.gl’ (<http://…/> ), a javascript library which uses ‘WebGL’ for visualising large data sets.
mapedit Interactive Editing of Spatial Data in R
Suite of interactive functions and helpers for selecting and editing geospatial data.
mapReasy Producing Administrative Boundary Map with Additional Features Embedded
Produces administrative boundary maps, visualizes and compares different factors on a map, tracks latitude and longitude, and draws bubble plots. The package provides handy functions to produce different administrative maps easily, including colorful visualizations of different regions of interest and sub-divisional administrative maps at different levels. This can be used to map disease patterns across regions (disease mapping), with color intensity coherent with the magnitude of prevalence. Many surveys collect information on the location of each sample; it is often of interest to take a quick look at the spread of the collected sample, check whether any observation falls outside the survey area, and identify such observations. The package provides a dedicated function to perform these tasks easily. In addition, features for ad-lib comparison of different factors across regions are included, as is visual presentation of two different variables on a single map using a two-way bubble plot. Simple bar charts and pie charts can also be drawn on a map to compare several factors. This package will be helpful to researchers, both statisticians and non-statisticians, for plotting different indicators by geographic location; such maps are used in research areas including public health, economics, environment and journalism. Functions are also provided for mapping two indicators at a time (for example, shading on the map gives the information on one indicator, while a bar/pie/bubble chart gives the information on another). Users only need to select the indicator values and the region-specific shapefile for the country and run the functions to obtain their graphs quickly. The distinguishing features of the functions are that they are easy to understand for new R users looking for ready-made functions to produce administrative maps with different features, and easy to use for those unfamiliar with the file formats of spatial or geographic location data. Functions in this package adopt, compile and implement functions from well-known packages for handling spatial data to offer user-friendly functionality, so users do not need additional knowledge of spatial statistics or geographic location data. All examples in this package use a shapefile of Bangladesh downloaded from <http://www.gadm.org>. To obtain a shapefile, visit <http://www.gadm.org>, select Download, then choose the country and ‘shapefile’ from the country and file-format dropdown menus. After downloading the compressed shapefile for a country, unzip it and keep the files in a known directory or the working directory. The shapefile of the corresponding country is required to produce all types of maps available in this package.
mapsapi ‘sf’-Compatible Interface to ‘Google Maps’ APIs
Interface to the ‘Google Maps’ APIs: (1) routing directions based on the ‘Directions API’, returned as ‘sf’ objects, either as single feature per alternative route, or a single feature per segment per alternative route, (2) travel distance or time matrices based on the ‘Distance Matrix API’.
mapview Interactive Viewing of Spatial Objects in R
Methods to view spatial objects interactively.
mar1s Multiplicative AR(1) with Seasonal Processes
Multiplicative AR(1) with seasonal (MAR(1)S) is a stochastic process model built on top of AR(1). The package provides the following procedures for MAR(1)S processes: fit, compose, decompose, advanced simulation and prediction.
march Markov Chains
Computation of various Markovian models for categorical data including homogeneous Markov chains of any order, MTD models, Hidden Markov models, and Double Chain Markov Models.
marcher Migration and Range Change Estimation in R
A set of tools for likelihood-based estimation, model selection and testing of two- and three-range shift and migration models for animal movement data, as described in Gurarie et al. (2017) <doi:10.1111/1365-2656.12674>. Given movement data (X, Y and Time), including irregularly sampled data, the functions estimate the time, duration and location of one or two range shifts, as well as the ranging area and auto-correlation structure of the movement. Tests assess, for example, whether the shift was ‘significant’, and whether a two-shift migration was a true return migration.
MargCond Joint Marginal-Conditional Model
Fits joint marginal conditional models for multivariate longitudinal data, as in Proudfoot, Faig, Natarajan, and Xu (2018) <doi:10.1002/sim.7552>. Development of this package was supported by the UCSD Altman Translational Research Institute, NIH grant UL1TR001442. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
margins Marginal Effects for Model Objects
An R port of Stata’s ‘margins’ command, which can be used to calculate marginal (or partial) effects from model objects.
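A minimal sketch of computing average marginal effects from a fitted GLM:
    library(margins)
    fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)
    summary(margins(fit))   # average marginal effects of hp and wt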
marima Multivariate ARIMA Analysis
Multivariate time series estimation using Spliid’s algorithm.
MarketMatching Market Matching and Causal Impact Inference
For a given test market find the best control markets using time series matching and analyze the impact of an intervention. The intervention could be a marketing event or some other local business tactic that is being tested. The workflow implemented in MarketMatching utilizes dynamic time warping (the ‘dtw’ package) to do the matching and the ‘CausalImpact’ package to analyze the causal impact. In fact, this package can be considered a ‘workflow wrapper’ for those two packages.
marl Multivariate Analysis Based on Relative Likelihoods
Functions provided allow data simulation; construction of weighted relative likelihood functions; clustering and principal component analysis based on weighted relative likelihood functions.
mason Build Data Structures for Common Statistical Analysis
Use a consistent syntax to create data structures of common statistical techniques that can be continued in a pipe chain. Design the analysis, add settings and variables, construct the results, and polish the final structure. Rinse and repeat for any number of statistical techniques.
MASS Support Functions and Datasets for Venables and Ripley’s MASS
Functions and datasets to support Venables and Ripley, ‘Modern Applied Statistics with S’ (4th edition, 2002).
Massign Simple Matrix Construction
Constructing matrices for quick prototyping can be a nuisance, requiring the user to think about how to fill the matrix with values using the matrix() function. The %<-% operator solves that issue by allowing the user to construct matrices using code that shows the actual matrices.
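A minimal sketch of the string-based syntax, as shown in the package documentation (treat the exact formatting rules as an assumption): rows go on separate lines of the string, with elements separated by commas:
    library(Massign)
    M %<-% "1,  0.2, -0.3
            0.2,   1,  0.4
           -0.3, 0.4,    1"
    M   # a 3 x 3 numeric matrix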
MATA Model-Averaged Tail Area Wald (MATA-Wald) Confidence Interval
Calculates Model-Averaged Tail Area Wald (MATA-Wald) confidence intervals, which are constructed using single-model estimators and model weights.
matahari Spy on Your R Session
Conveniently log everything you type into the R console. Logs are stored as tidy data frames which can then be analyzed using ‘tidyverse’ style tools.
MatchingFrontier Computation of the Balance – Sample Size Frontier in Matching Methods for Causal Inference
Returns the subset of the data with the minimum imbalance for every possible subset size (N – 1, N – 2, …), down to the data set with the minimum possible imbalance. Also includes tools for the estimation of causal effects for each subset size, functions for visualization and data export, and functions for calculating model dependence as proposed by Athey and Imbens.
matchingMarkets Structural Estimators and Algorithms for the Analysis of Stable Matchings
Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. Also contains R code for matching algorithms such as the deferred-acceptance algorithm for college admissions, the top-trading-cycles algorithm for house allocation and a partitioning linear program for the roommates problem.
matchingR Gale-Shapley Algorithm in R and C++
Computes the Gale-Shapley Algorithm efficiently using Rcpp. Provides algorithms to compute the stable matching for the marriage problem and for the college-admissions problem, i.e. the matching of students to colleges.
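A minimal sketch of solving a marriage market (the random utility matrices are purely illustrative; galeShapley.marriageMarket() is the documented entry point):
    library(matchingR)
    set.seed(1)
    uM <- matrix(runif(16), nrow = 4, ncol = 4)   # proposers' utilities
    uW <- matrix(runif(16), nrow = 4, ncol = 4)   # reviewers' utilities
    res <- galeShapley.marriageMarket(uM, uW)
    res$proposals                                 # the stable matching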
MatchItSE Calculates SE for Matched Samples from ‘MatchIt’
Contains various methods for Standard Error estimation for ‘MatchIt’ objects.
MatchLinReg Combining Matching and Linear Regression for Causal Inference
Core functions as well as diagnostic and calibration tools for combining matching and linear regression for causal inference in observational studies.
matchMulti Optimal Multilevel Matching using a Network Algorithm
Performs multilevel matches for data with cluster-level treatments and individual-level outcomes using a network optimization algorithm. Functions for checking balance at the cluster and individual levels are also provided, as are methods for permutation-inference-based outcome analysis.
matconv A Code Converter from the Matlab/Octave Language to R
Transferring a code base from Matlab to R is often a repetitive and inefficient use of time. This package provides a translator for Matlab/Octave code into R code. It makes some syntax changes, but most of the heavy lifting is in the function changes, since the languages are so similar. Options for different data structures and the functions that can be changed are given. The Matlab code should mostly adhere to the standard style guide, but some effort has been made to accommodate different numbers of spaces and other small syntax issues. The conversion will not make the code more R friendly, and the result may not even run afterwards. However, the rudimentary syntax, base function and data structure conversion is done quickly, so that the maintainer can focus on changes to the design structure.
matdist Matrix Variate Distributions
Provides tools for computing densities and generating random samples from matrix variate distributions, including matrix normal, Wishart, matrix Student-t, matrix Dirichlet and matrix beta distributions. For a complete exposition, see Gupta and Nagar (1999) <ISBN:978-1584880462>.
mateable Tools to Assess Mating Potential in Space and Time
Provides tools to simulate, manage, visualize, and analyze spatially and temporally explicit datasets of mating potential. Implements methods to calculate synchrony, proximity, and compatibility.
mathgraph Directed and Undirected Graphs
Simple tools for constructing and manipulating objects of class mathgraph from the book ‘S Poetry’, available at <http://…/spoetry.html>.
mathpix Support for the ‘Mathpix’ API (Image to ‘LaTeX’)
Given an image of a formula (typeset or handwritten), this package provides calls to the ‘Mathpix’ service to produce the ‘LaTeX’ code which should generate that image, and pastes it into a document (e.g. an ‘rmarkdown’ document). See <https://…/> for full details. ‘Mathpix’ is an external service and use of the API is subject to their terms and conditions.
matlabr An Interface for MATLAB using System Calls
Allows users to call MATLAB via the ‘system’ command. Users can submit lines of code or MATLAB m-files. This is in contrast to ‘R.matlab’, which creates a MATLAB server.
matlib Matrix Functions for Teaching and Learning Linear Algebra and Multivariate Statistics
A collection of matrix functions for teaching and learning matrix linear algebra as used in multivariate statistical methods. These functions are mainly for tutorial purposes in learning matrix algebra ideas using R. In some cases, functions are provided for concepts available elsewhere in R, but where the function call or name is not obvious. In other cases, functions are provided to show or demonstrate an algorithm.
Matrix Sparse and Dense Matrix Classes and Methods
Classes and methods for dense and sparse matrices and operations on them using ‘LAPACK’ and ‘SuiteSparse’.
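A minimal sketch of building a sparse matrix and operating on it:
    library(Matrix)
    m <- Matrix(c(0, 0, 2, 0, 3, 0), nrow = 3, sparse = TRUE)
    m             # stored in a compressed sparse format
    crossprod(m)  # t(m) %*% m, dispatched to sparse methods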
Matrix.utils Data Frame Operations on Sparse and Dense Matrix Objects
Implements cast, aggregate, and merge/join for Matrix and matrix-like objects.
MatrixCorrelation Matrix Correlation Coefficients
Computation and visualization of matrix correlation coefficients. The main method is the Similarity of Matrices Index, while various related measures like r1, r2, r3, r4, Yanai’s GCD, RV, RV2 and adjusted RV are included for comparison.
matrixLaplacian Normalized Laplacian Matrix and Laplacian Map
Constructs the normalized Laplacian matrix of a square matrix, returns the eigenvectors (singular vectors), and visualizes the normalized Laplacian map.
MatrixLDA Penalized Matrix-Normal Linear Discriminant Analysis
Fits the penalized matrix-normal model to be used for linear discriminant analysis with matrix-valued predictors.
matrixNormal The Matrix Normal Distribution
Computes densities, probabilities, and random deviates of the Matrix Normal distribution (Iranmanesh et al. (2010) <doi:10.7508/ijmsi.2010.02.004>). Also includes simple but useful matrix functions. See the R package help file for more information.
matrixsampling Simulations of Matrix Variate Distributions
Provides samplers for various matrix variate distributions: Wishart, inverse-Wishart, normal, t, inverted-t, Beta type I and Beta type II. Allows simulation of the noncentral Wishart distribution without the integer restriction on the degrees of freedom.
matrixStats Methods that Apply to Rows and Columns of a Matrix
Methods operating on rows and columns of matrices, e.g. col / rowMedians(), col / rowRanks(), and col / rowSds(). There are also some vector-based methods, e.g. binMeans(), madDiff() and weightedMedians(). All methods have been optimized for speed and memory usage.
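A minimal sketch of the optimized row/column operations:
    library(matrixStats)
    x <- matrix(rnorm(1e4), nrow = 100)
    rowMedians(x)   # considerably faster than apply(x, 1, median)
    colSds(x)       # per-column standard deviations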
matrixTests Fast Statistical Hypothesis Tests on Rows and Columns of Matrices
Functions to perform fast statistical hypothesis tests on rows/columns of matrices. The main goals are: 1) speed via vectorization, 2) output that is detailed and easy to use, 3) compatibility with tests implemented in R (like those available in the ‘stats’ package).
matsbyname An Implementation of Matrix Mathematics
An implementation of matrix mathematics wherein operations are performed ‘by name.’
MatTransMix Clustering with Matrix Gaussian and Matrix Transformation Mixture Models
Provides matrix Gaussian mixture models, matrix transformation mixture models and their model-based clustering results. The parsimonious models of the mean matrices and variance covariance matrices are implemented with a total of 196 variations.
mau Decision Models with Multi Attribute Utility Theory
Build and test decision models based on Multi-Attribute Utility Theory (MAUT). Automatic evaluation of utilities at any level of the decision tree, and weight simulations for sensitivity analysis.
MAVE Methods for Dimension Reduction
Functions for dimension reduction using MAVE (Minimum Average Variance Estimation), OPG (Outer Product of Gradients) and KSIR (a kernel version of sliced inverse regression). Methods for selecting the best dimension are also included.
MAVIS Meta Analysis via Shiny
Interactive shiny application for running a meta-analysis, provides support for both random effects and fixed effects models with the ‘metafor’ package. Additional support is included for calculating effect sizes plus support for single case designs, graphical output, and detecting publication bias.
maxadjAUC Maximizing the Adjusted AUC
Fits a linear combination of predictors by maximizing a smooth approximation to the estimated covariate-adjusted area under the receiver operating characteristic curve (AUC) for a discrete covariate. (Meisner, A, Parikh, CR, and Kerr, KF (2017) <http://…/>.)
MaxentVariableSelection Selecting the Best Set of Relevant Environmental Variables along with the Optimal Regularization Multiplier for Maxent Niche Modeling
Complex niche models show low performance in identifying the most important range-limiting environmental variables and in transferring habitat suitability to novel environmental conditions (Warren and Seifert, 2011 <DOI:10.1890/10-1171.1>; Warren et al., 2014 <DOI:10.1111/ddi.12160>). This package helps to identify the most important set of uncorrelated variables and to fine-tune Maxent’s regularization multiplier. In combination, this allows the user to constrain complexity and increase performance of Maxent niche models, assessed by information criteria such as AICc (Akaike, 1974 <DOI:10.1109/TAC.1974.1100705>) and by the area under the receiver operating characteristic curve (AUC) (Fielding and Bell, 1997 <DOI:10.1017/S0376892997000088>). Users of this package should be familiar with Maxent niche modelling.
maximin Sequential Space-Filling Design under the Criterion of Maximin Distance
Builds up sequential space-filling design under the criterion of maximin distance. Both discrete and continuous versions are provided.
maxmatching Maximum Matching for General Weighted Graph
Computes the maximum matching for an unweighted graph and for an (un)weighted bipartite graph efficiently.
MaxMC Maximized Monte Carlo
An implementation of the Monte Carlo techniques described in detail by Dufour (2006) <doi:10.1016/j.jeconom.2005.06.007> and Dufour and Khalaf (2007) <doi:10.1002/9780470996249.ch24>. The two main features available are the Monte Carlo method with tie-breaker, mc(), for discrete statistics, and the Maximized Monte Carlo, mmc(), for statistics with nuisance parameters.
MaxSkew Orthogonal Data Projections with Maximal Skewness
Finds orthogonal data projections with maximal skewness. The first data projection in the output is the most skewed among all linear data projections. The second data projection in the output is the most skewed among all data projections orthogonal to the first one, and so on.
maxTPR Maximizing the TPR for a Specified FPR
Estimates a linear combination of predictors by maximizing a smooth approximation to the estimated true positive rate (TPR; sensitivity) while constraining a smooth approximation to the estimated false positive rate (FPR; 1-specificity) at a user-specified level.
MazamaCoreUtils Utility Functions for Production R Code
A suite of utility functions providing functionality commonly needed for production level projects such as logging, error handling, and cache management.
mazeGen Elithorn Maze Generator
A maze generator that creates the Elithorn Maze (HTML file) and the functions to calculate the associated maze parameters (i.e. Difficulty and Ability).
mazeinda Monotonic Association on Zero-Inflated Data
Methods for calculating and testing the significance of pairwise monotonic association for zero-inflated data, based on the work of Pimentel (2009) <doi:10.4135/9781412985291.n2>. Computation of association of vectors from one or multiple sets can be performed in parallel thanks to the packages ‘foreach’ and ‘doMC’.
mbclusterwise Clusterwise Multiblock Analyses
Perform clusterwise multiblock analyses (clusterwise multiblock Partial Least Squares, clusterwise multiblock Redundancy Analysis, or a regularized method between the two latter ones) associated with an F-fold cross-validation procedure to select the optimal number of clusters and dimensions.
mbgraphic Measure Based Graphic Selection
Measure based exploratory data analysis. Some of the functions call interactive apps programmed with the package shiny to provide flexible selection options.
MBHdesign Spatial Designs for Ecological and Environmental Surveys
Provides spatially balanced designs from a set of (contiguous) potential sampling locations in a study region. Accommodates, without detrimental effects on spatial balance, sites that the researcher wishes to include in the survey for reasons other than the current randomisation (legacy sites).
mbir Magnitude-Based Inferences
Provides practitioners and researchers with a wholesale approach for deriving magnitude-based inferences from raw data. A major goal of ‘mbir’ is to programmatically detect appropriate statistical tests to run in lieu of relying on practitioners to determine correct stepwise procedures independently.
mboost Model-Based Boosting
Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.
mbrglm Median Bias Reduction in Binomial-Response GLMs
Fit generalized linear models with binomial responses using a median modified score approach (Kenne Pagui et al., 2016, <https://…/1604.04768> ) to median bias reduction. This method respects equivariance under reparameterizations for each parameter component and also solves the infinite estimates problem (data separation).
MBSGS Multivariate Bayesian Sparse Group Selection with Spike and Slab
An implementation of a Bayesian sparse group model using spike and slab priors in a regression context. It is designed for regression with a multivariate response variable, but also provides an implementation for univariate response.
MBSP Multivariate Bayesian Model with Shrinkage Priors
Implements a sparse Bayesian multivariate linear regression model using shrinkage priors from the three parameter beta normal family. The method is described in Bai and Ghosh (2018) <arXiv:1711.07635>.
MBTAr Access Data from the Massachusetts Bay Transit Authority (MBTA) Web API
Access to the MBTA API for R. Creates an easy-to-use bundle of functions to work with all the built-in calls to the MBTA API. Allows users to download realtime tracking data in dataframe format that is manipulable in standard R analytics functions.
mBvs Multivariate Bayesian Variable Selection Method Exploiting Dependence among Outcomes
Bayesian variable selection methods for data with continuous multivariate responses and multiple covariates.
mc.heterogeneity A Monte Carlo Based Heterogeneity Test for Meta-Analysis
Implements a Monte Carlo based heterogeneity test for standardized mean differences (d), Fisher-transformed Pearson’s correlations (r), and natural-logarithm-transformed odds ratios (OR) in meta-analysis studies. Depending on the presence of moderators, this Monte Carlo based test can be implemented in the random- or mixed-effects model. This package uses the rma() function from the R package ‘metafor’ to obtain parameter estimates and likelihoods, so installation of the ‘metafor’ package is required.
MCAvariants Multiple Correspondence Analysis Variants
Provides two variants of multiple correspondence analysis (ca): multiple ca and ordered multiple ca via orthogonal polynomials of Emerson.
mcBFtest Monte Carlo Based Tests for the Behrens Fisher Problem as an Alternative to Welch’s t-Approximation
Monte Carlo based tests for the Behrens-Fisher problem enhance statistical power and perform better than Welch’s t-approximation; see Ullah et al. (2019).
mcca Multi-Category Classification Accuracy
Contains six robust diagnostic accuracy methods to evaluate three- or four-category classifiers: Hypervolume Under Manifold (HUM), described in Jialiang Li (2008) <doi:10.1093/biostatistics/kxm050> and Jialiang Li (2014) <doi:10.3109/1354750X.2013.868516>; Correct Classification Percentage (CCP), Integrated Discrimination Improvement (IDI), Net Reclassification Improvement (NRI) and R-Squared Value (RSQ), described in Jialiang Li (2013) <doi:10.1093/biostatistics/kxs047>; and Polytomous Discrimination Index (PDI), described in Van Calster B (2012) <doi:10.1007/s10654-012-9733-3> and Jialiang Li (2017) <doi:10.1177/0962280217692830>.
mccmeiv Estimating Parameters for a Matched Case Control Design with a Mismeasured Exposure using Instrumental Variables
Applying the methodology from Manuel, Wang, and Sinha (2018), estimates for the parameters for matched case control data with a mismeasured exposure are calculated through the use of user supplied instrumental variables.
mccr The Matthews Correlation Coefficient
The Matthews correlation coefficient (MCC) score is calculated (Matthews BW (1975) <DOI:10.1016/0005-2795(75)90109-9>).
MCDM Multi-Criteria Decision Making Methods
An R implementation of Four Multi-Criteria Decision Making (MCDM) Methods: Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), ‘VIseKriterijumska Optimizacija I Kompromisno Resenje’ (VIKOR), both Multi-Objective Optimization by Ratio Analysis and Full Multiplicative Form (Multi-MOORA) and Weighted Aggregated Sum Product ASsessment (WASPAS). In addition, this package provides a MetaRanking function which combines the output of the previous methods.
mcemGLM Maximum Likelihood Estimation for Generalized Linear Mixed Models
Maximum likelihood estimation for generalized linear mixed models via Monte Carlo EM.
mcen Multivariate Cluster Elastic Net
Fits the Multivariate Cluster Elastic Net (MCEN) presented in Price & Sherwood (2018) <arXiv:1707.03530>. The MCEN model simultaneously estimates regression coefficients and a clustering of the responses for a multivariate response model. Currently accommodates the Gaussian and binomial likelihood.
mcgfa Mixtures of Contaminated Gaussian Factor Analyzers
Performs clustering and classification using the Mixtures of Contaminated Gaussian Factor Analyzers model. Allows for automatic detection of outliers and noise.
MCI2 Market Area Models for Retail and Service Locations
Market area models are used to analyze and predict store choices and market areas for retail and service locations. This package is a more user-friendly wrapper of the functions in the package ‘MCI’ (Wieland 2017), providing market area analysis using the Huff Model or the Multiplicative Competitive Interaction (MCI) Model. ‘MCI2’ also provides a function for creating transport cost matrices.
MCL Markov Cluster Algorithm
Contains the Markov cluster algorithm (MCL) for identifying clusters in networks and graphs. The algorithm simulates random walks on a graph via its (n x n) adjacency matrix, alternating an expansion step and an inflation step until an equilibrium state is reached.
mclcar Estimating Conditional Auto-Regression (CAR) Models using Monte Carlo Likelihood Methods
The likelihood of direct CAR models and Binomial and Poisson GLM with latent CAR variables are approximated by the Monte Carlo likelihood. The Maximum Monte Carlo likelihood estimator is found either by an iterative procedure of directly maximising the Monte Carlo approximation or by a response surface design method.
mclust Normal Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation
Normal Mixture Modelling fitted via EM algorithm for Model-Based Clustering, Classification, and Density Estimation, including Bayesian regularization.
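A minimal sketch of model-based clustering with BIC-driven model selection:
    library(mclust)
    fit <- Mclust(iris[, 1:4])           # fits a family of Gaussian mixtures
    summary(fit)                         # chosen model and number of components
    plot(fit, what = "classification")   # cluster assignments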
mclustcomp Measures for Comparing Clusters
Given a set of data points, a clustering is defined as a disjoint partition where each pair of sets in a partition has no overlapping elements. This package provides a collection of methods that play a role somewhat similar to distance or metric that measures similarity of two clusterings – or partitions. For a more detailed description, see Meila, M. (2005) <doi:10.1145/1102351.1102424>.
MCMC.qpcr Bayesian Analysis of qRT-PCR Data
Quantitative RT-PCR data are analyzed using generalized linear mixed models based on lognormal-Poisson error distribution, fitted using MCMC. Control genes are not required but can be incorporated as Bayesian priors or, when template abundances correlate with conditions, as trackers of global effects (common to all genes). The package also implements a lognormal model for higher-abundance data and a ‘classic’ model involving multi-gene normalization on a by-sample basis. Several plotting functions are included to extract and visualize results. The detailed tutorial is available here: <http://bit.ly/1Nwo4CB>.
mcmcabn Flexible Implementation of a Structural MCMC Sampler for DAGs
Flexible implementation of a structural MCMC sampler for Directed Acyclic Graphs (DAGs). It supports the new edge reversal move from Grzegorczyk and Husmeier (2008) <doi:10.1007/s10994-008-5057-7> and the Markov blanket resampling from Su and Borsuk (2016) <http://…/su16a.html>. It supports three priors: a prior controlling for structure complexity from Koivisto and Sood (2004) <http://…/citation.cfm?id=1005332.1005352>, an uninformative prior and a user-defined prior. The three main problems that can be addressed by this R package are selecting the most probable structure based on a cache of pre-computed scores, controlling for overfitting, and sampling the landscape of high-scoring structures. It allows the user to quantify the marginal impact of relationships of interest by marginalising out over structures or nuisance dependencies. Structural MCMC seems a very elegant and natural way to estimate the true marginal impact, so one can determine whether its magnitude is big enough to be considered a worthwhile intervention.
MCMCprecision Precision of Discrete Parameters in Transdimensional MCMC
Estimates the precision of transdimensional Markov chain Monte Carlo (MCMC) output, which is often used for Bayesian analysis of models with different dimensionality (e.g., model selection). Transdimensional MCMC (e.g., reversible jump MCMC) relies on sampling a discrete model-indicator variable to estimate the posterior model probabilities. If only few switches occur between the models, precision may be low and assessment based on the assumption of independent samples misleading. Based on the observed transition matrix of the indicator variable, the method of Heck, Overstall, Gronau, & Wagenmakers (2017) <https://…/1703.10364> draws posterior samples of the stationary distribution to (a) assess the uncertainty in the estimated posterior model probabilities and (b) estimate the effective sample size of the MCMC output.
MCMCtreeR Prepare MCMCtree Analyses and Plot Bayesian Divergence Time Analyses Estimates on Trees
Provides functions to prepare time priors for ‘MCMCtree’ analyses in the ‘PAML’ software from Yang (2007) <doi:10.1093/molbev/msm088> and plot time-scaled phylogenies from any Bayesian divergence time analysis. Most time-calibrated node prior distributions require user-specified parameters. The package provides functions to refine these parameters, so that the resulting prior distributions accurately reflect confidence in known, usually fossil, time information. These functions also enable users to visualise distributions and write ‘MCMCtree’-ready input files. Additionally, the package supplies flexible functions to visualise age uncertainty on a plotted tree, either with node bars, with branch widths proportional to the age uncertainty, or by plotting the full posterior distributions on nodes. The package also gives options to include the geological timescale and other details alongside the plotted phylogeny. All plotting functions are applicable to output from any Bayesian software, not just ‘MCMCtree’.
MCMCvis Tools to Visualize, Manipulate, and Summarize MCMC Output
Performs key functions for MCMC analysis using minimal code – visualizes, manipulates, and summarizes MCMC output. Functions support simple and straightforward subsetting of model parameters within the calls, and produce presentable and ‘publication-ready’ output. MCMC output may be derived from Bayesian model output fit with JAGS, Stan, or other MCMC samplers.
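A minimal sketch (the object ‘fit’ is hypothetical MCMC output from JAGS, Stan, or a similar sampler):
    library(MCMCvis)
    MCMCsummary(fit, params = "beta")              # summarize the 'beta' parameters
    MCMCtrace(fit, params = "beta", pdf = FALSE)   # trace and density plots
    MCMCplot(fit, params = "beta")                 # caterpillar plot of estimates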
mcMST A Toolbox for the Multi-Criteria Minimum Spanning Tree Problem
Algorithms to approximate the Pareto-front of multi-criteria minimum spanning tree problems. Additionally, a modular toolbox for the generation of multi-objective benchmark graph problems is included.
mcompanion Objects and Methods for Multi-Companion Matrices
Provides a class for multi-companion matrices with methods for arithmetic and factorization. A method for generation of multi-companion matrices with prespecified spectral properties is provided, as well as some utilities for periodically correlated and multivariate time series models. See Boshnakov (2002) <doi:10.1016/j.laa.2007.02.010> and Boshnakov & Iqelan (2009) <doi:10.1111/j.1467-9892.2009.00617.x>.
mcPAFit Estimating Node Fitness from a Single Network Snapshot by Markov Chain Monte Carlo
A Markov chain Monte Carlo method is provided to estimate fitness from a single network snapshot. Conventional methods require the complete information about the appearance order of nodes and edges in the network. This package performs inference on the timeline space, and so does not require such information.
mcparallelDo A Simplified Interface for Running Commands on Parallel Processes
Provides a function that wraps mcparallel() and mccollect() from ‘parallel’ with temporary variables and a task handler. Wrapped in this way the results of an mcparallel() call can be returned to the R session when the fork is complete without explicitly issuing a specific mccollect() to retrieve the value. Outside of top-level tasks, multiple mcparallel() jobs can be retrieved with a single call to mcparallelDoCheck().
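A minimal sketch (requires a fork-capable OS, so not Windows; the target variable name is illustrative):
    library(mcparallelDo)
    mcparallelDo({ Sys.sleep(5); rnorm(10) }, targetValue = "draws")
    # ... keep working; once the fork completes, `draws` appears in the
    # global environment without an explicit mccollect() call.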
MCS Model Confidence Set Procedure
Perform the model confidence set procedure of Hansen et al. (2011).
mcStats Visualize Results of Statistical Hypothesis Tests
Provides functionality to produce graphs of sampling distributions of test statistics from a variety of common statistical tests. With only a few keystrokes, the user can conduct a hypothesis test and visualize the test statistic and corresponding p-value through the shading of its sampling distribution. Initially created for statistics at Middlebury College.
mctest Multicollinearity Diagnostic Measures
Overall and Individual Multicollinearity Diagnostic measures.
MCTM Markov Chains Transition Matrices
Transition matrices (probabilities or counts) estimation for discrete Markov Chains of order n (1 <= n <= 5).
md Selecting Bandwidth for Kernel Density Estimator with Minimum Distance Method
Selects bandwidth for the kernel density estimator with minimum distance method as proposed by Devroye and Lugosi (1996). The minimum distance method directly selects the optimal kernel density estimator from countably infinite kernel density estimators and indirectly selects the optimal bandwidth. This package selects the optimal bandwidth from finite kernel density estimators.
md.log Produces Markdown Log File with a Built-in Function Call
Produces a clean and neat Markdown log file, and provides an argument to include the function call inside the Markdown log.
mDAG Inferring Causal Network from Mixed Observational Data Using a Directed Acyclic Graph
Learning a mixed directed acyclic graph based on both continuous and categorical data.
mdendro Variable-Group Methods for Agglomerative Hierarchical Clustering
A collection of methods for agglomerative hierarchical clustering strategies on a matrix of distances, implemented using the variable-group approach introduced in Fernandez and Gomez (2008) <doi:10.1007/s00357-008-9004-x>. Descriptive measures to analyze the resulting hierarchical trees are also provided. In addition to the usual clustering methods, two parameterized methods are provided to explore an infinite family of hierarchical clustering strategies. When there are ties in proximity values, the hierarchical trees obtained are unique and independent of the order of the elements in the input matrix.
mdhglm Multivariate Double Hierarchical Generalized Linear Models
Allows different models for multivariate response variables where each response is assumed to follow double hierarchical generalized linear models. In double hierarchical generalized linear models, the mean, dispersion parameters for variance of random effects, and residual variance (overdispersion) can be further modeled as random-effect models.
MDimNormn Multi-Dimensional MA Normalization for Plate Effect
Normalize data to minimize the difference between sample plates (batch effects). For given data in a matrix and grouping variable (or plate), the function ‘normn_MA’ normalizes the data on MA coordinates. More details are in the citation. The primary method is ‘Multi-MA’. Other fitting functions on MA coordinates can also be employed e.g. loess.
mdir.logrank Multiple-Direction Logrank Test
Implemented is the multiple-direction logrank test for two-sample right-censored data. In addition to the statistic, two p-values are calculated: the first is based on a chi-squared approximation and the second on a permutation approach. Ditzhaus, M. and Friedrich, S. (2018) <arXiv:1807.05504>.
mdmb Model Based Treatment of Missing Data
Contains model-based treatment of missing data for regression models with missing values in covariates or the dependent variable using maximum likelihood or Bayesian estimation. Multiple imputation can also be conducted.
MDMR Multivariate Distance Matrix Regression
Allows a user to conduct multivariate distance matrix regression using analytic p-values and compute measures of effect size.
mdpeer Graph-Constrained Regression with Enhanced Regularization Parameters Selection
Performs graph-constrained regularization in which regularization parameters are selected with the use of a known fact of equivalence between penalized regression and Linear Mixed Model solutions. Provides implementations of three different regression methods where graph-constraints among coefficients are accounted for. ‘crPEER’ (Partially Empirical Eigenvectors for Regression with Constant Ridge, Constant Ridge PEER) utilizes an additional Ridge term to handle the non-invertibility of a graph Laplacian matrix. ‘vrPEER’ (Variable Reduction PEER) performs a variable-reduction procedure to handle the non-invertibility of a graph Laplacian matrix. Finally, ‘RidgePEER’ employs a penalty term that is a linear combination of graph-originated and Ridge-originated penalty terms, whose two regularization parameters are ML estimators from the corresponding Linear Mixed Model solution. Notably, in ‘RidgePEER’ the graph-originated penalty term allows imposing similarity between coefficients based on the given graph information, whereas the additional Ridge-originated penalty term facilitates parameter estimation: it reduces computational issues arising from singularity in the graph-originated penalty matrix and yields plausible results when graph information is not informative or when it is unclear whether the connectivities represented by the graph reflect similarities among the corresponding coefficients.
mds Medical Devices Surveillance
A set of core functions for handling medical device event data in the context of post-market surveillance, pharmacovigilance, signal detection and trending, and regulatory reporting. Primary inputs are data on events by device and data on exposures by device. Outputs include: standardized device-event and exposure datasets, defined analyses, and time series.
mdscore Improved Score Tests for Generalized Linear Models
A set of functions to obtain modified score tests for generalized linear models.
mdsOpt Searching for Optimal MDS Procedure for Metric Data
Searching for Optimal MDS procedure for metric data.
MDSPCAShiny Interactive Document for Working with Multidimensional Scaling and Principal Component Analysis
An interactive document on the topic of multidimensional scaling and principal component analysis using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
mdsr Complement to ‘Modern Data Science with R’
A complement to *Modern Data Science with R* (ISBN: 978-1498724487, publisher URL: <https://…/9781498724487> ). This package contains all of the data and code necessary to complete exercises and reproduce examples from the text. It also facilitates connections to the SQL database server used in the book.
mdsstat Statistical Trending for Medical Devices Surveillance
A collection of common statistical algorithms used in active surveillance of medical device events. Context includes post-market surveillance, pharmacovigilance, signal detection and trending, and regulatory reporting. Primary inputs are device-event time series. Outputs include trending results with the ability to run multiple algorithms at once. This package works well with the ‘mds’ package, but does not require it.
mdw Maximum Diversity Weighting
Dimension-reduction methods aim at defining a score that maximizes signal diversity. Two approaches, namely maximum entropy weights and maximum variance weights, are provided.
meanr Basic Sentiment Analysis Scorer
A popular technique in text analysis today is sentiment analysis, or trying to determine the overall emotional attitude of a piece of text (positive or negative). We provide a new, basic implementation of a common method for computing sentiment, whereby words are scored as positive or negative according to a ‘dictionary’, and then an average of those scores for the document is produced. The package uses the ‘Hu’ and ‘Liu’ sentiment dictionary for assigning sentiment.
MeanShift Clustering via the Mean Shift Algorithm
Clustering using the mean shift algorithm (multi-core processing is supported) or its blurring version. See <https://…/> for more information.
meanShiftR A Computationally Efficient Mean Shift Implementation
Performs mean shift classification using linear and k-d tree based nearest neighbor implementations for the Gaussian kernel.
measurements Tools for Units of Measurement
Collection of tools to make working with physical measurements easier. Convert between metric and imperial units, or calculate a dimension’s unknown value from other dimensions’ measurements.
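A minimal sketch of unit conversion (unit names as documented for conv_unit()):
    library(measurements)
    conv_unit(100, "km", "mi")     # kilometres to miles
    conv_unit(2.54, "cm", "inch")  # centimetres to inches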
MEclustnet Fits the Mixture of Experts Latent Position Cluster Model to Network Data
Fits the mixture of experts latent position cluster model to network data to cluster nodes into subgroups, while incorporating covariate information, in a mixture of experts model setting.
MED Mediation by Tilted Balancing
Nonparametric estimation and inference for natural direct and indirect effects by Chan, Imai, Yam and Zhang (2016) <arXiv:1601.03501>.
mediation Causal Mediation Analysis
We implement parametric and non-parametric mediation analysis. This package performs the methods and suggestions in Imai, Keele and Yamamoto (2010), Imai, Keele and Tingley (2010), Imai, Tingley and Yamamoto (2013), Imai and Yamamoto (2013) and Yamamoto (2013). In addition to the estimation of causal mediation effects, the software also allows researchers to conduct sensitivity analysis for certain parametric models.
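A minimal sketch of the two-model workflow (the data frame ‘df’ and its variable names are hypothetical):
    library(mediation)
    med.fit <- lm(med ~ treat, data = df)        # mediator model
    out.fit <- lm(y ~ treat + med, data = df)    # outcome model
    res <- mediate(med.fit, out.fit, treat = "treat", mediator = "med",
                   sims = 500)
    summary(res)   # ACME, ADE and total effect with uncertainty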
medmod Simple Mediation and Moderation Analysis
This toolbox allows you to do simple mediation and moderation analysis. It is also available as a module for ‘jamovi’ (see <https://www.jamovi.org> for more information). ‘Medmod’ is based on the ‘lavaan’ package by Yves Rosseel. You can find an in-depth tutorial on the ‘lavaan’ model syntax used for this package at <http://…/index.html>.
MedSurvey Mediation Analysis for Complex Surveys
A computer tool to conduct mediation analysis for complex surveys using multi-stage sampling. Specifically, it implements the mediation analysis method using balanced repeated replications proposed by Mai, Ha, and Soulakova (2019) <DOI:10.1080/10705511.2018.1559065>. The development of ‘MedSurvey’ was sponsored by American Lebanese Syrian Associated Charities (ALSAC). However, the contents of ‘MedSurvey’ do not necessarily represent the policy of the ALSAC.
meetupapi Access ‘Meetup’ API
Allows management of ‘Meetup’ groups via the <h…/>. Provided is a set of functions that enable fetching information on joined meetups, attendance, and members. This package requires the use of an API key.
MEFM Electricity Forecasting
Shu Fan and I have developed a model for electricity demand forecasting that is now widely used in Australia for long-term forecasting of peak electricity demand. It has become known as the ‘Monash Electricity Forecasting Model’. We have decided to release an R package that implements our model so that other people can easily use it. The package is called ‘MEFM’ and is available on github. We will probably also put it on CRAN eventually. The model was first described in Hyndman and Fan (2010). We are continually improving it, and the latest version is described in the model documentation, which will be updated from time to time. The package is being released under a GPL licence, so anyone can use it. All we ask is that our work is properly cited. Naturally, we are not able to provide free technical support, although we welcome bug reports. We are available to undertake paid consulting work in electricity forecasting.
MEGENA Multiscale Clustering of Geometrical Network
Co-expression network analysis by adopting a network embedding technique.
mekko Variable Width Bar Charts: Bar Mekko
Create variable width bar charts i.e. ‘bar mekko’ charts to include important quantitative context. Closely related to mosaic, spine (or spinogram), matrix, submarine, olympic, Mondrian or product plots and tree maps.
meltt Matching Event Data by Location, Time and Type
Framework for merging and disambiguating event data based on spatiotemporal co-occurrence and secondary event characteristics. It can account for intrinsic ‘fuzziness’ in the coding of events, varying event taxonomies and different geo-precision codes.
melviewr View and Classify MELODIC Output for ICA+FIX
Provides a graphical interface that allows the user to easily view and classify output from ‘MELODIC’, a part of the ‘FSL’ neuroimaging analysis software suite that performs independent component analysis (ICA; see <https://…/> for more information). The user categorizes a component as signal or noise based on its spatial and temporal characteristics and can then save a text file of these classifications in the format required by ‘ICA+FIX’, an automatic noise removal tool (<https://…/FIX> ).
memery Internet Memes for Data Analysts
Generates internet memes that optionally include a superimposed inset plot and other atypical features, combining the visual impact of an attention-grabbing meme with graphic results of data analysis. The package differs from related packages that focus on imitating and reproducing standard memes. Some packages do this by interfacing with online meme generators whereas others achieve this natively. This package takes the latter approach. It does not interface with online meme generators or require any authentication with external websites. It reads images directly from local files or via URL and meme generation is done by the package. While this is similar to the ‘meme’ package available on CRAN, it differs in that the focus is on allowing for non-standard meme layouts and hybrids of memes mixed with graphs. While this package can be used to make basic memes like an online meme generator would produce, it caters primarily to hybrid graph-meme plots where the meme presentation can be seen as a backdrop highlighting foreground graphs of data analysis results. The package also provides support for an arbitrary number of meme text labels with arbitrary size, position and other attributes rather than restricting to the standard top and/or bottom text placement. This is useful for proper aesthetic interleaving of plots of data between meme image backgrounds and overlain text labels. The package offers a selection of templates for graph placement and appearance with respect to the underlying meme. Graph templates also permit additional template-specific customization.
memnet Network Tools for Memory Research
Efficient implementations of network science tools to facilitate research into human (semantic) memory. In its current version, the package contains several methods to infer networks from verbal fluency data, various network growth models, diverse (switcher-) random walk processes, and tools to analyze and visualize networks. To deliver maximum performance the majority of the code is written in C++. For an application see: Wulff, D. U., Hills, T., & Mata, R. (2018) <doi:10.31234/osf.io/s73dp>.
memo In-Memory Caching for Repeated Computations
A simple in-memory, LRU cache that can be wrapped around any function to memoize it. The cache can be keyed on a hash of the input data (using ‘digest’) or on pointer equivalence.
memoise Memoisation of Functions
Cache the results of a function so that when you call it again with the same arguments it returns the pre-computed value.
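A minimal sketch:
    library(memoise)
    slow_square <- function(x) { Sys.sleep(1); x^2 }
    fast_square <- memoise(slow_square)
    fast_square(4)   # slow: computes and caches
    fast_square(4)   # instant: returns the cached value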
memor An ‘rmarkdown’ Template that Can be Highly Customized
An ‘rmarkdown’ template that supports company logos, contact info, watermarks and more. Currently restricted to ‘LaTeX’/‘Markdown’; a similar ‘HTML’ theme will be added in the future.
MenuCollection Collection of Configurable GTK+ Menus
Set of configurable menus built with GTK+ to provide a graphical interface to new functions.
MEPDF Multivariate Empirical Density Function
Based on the input data, an n-dimensional cube with sub-cells of user-specified side length is created. The number of sample points which fall in each sub-cube is counted, and with the cell volume and overall sample size an empirical probability can be computed. A number of cubes of higher resolution can be superimposed. The basic method stems from J.L. Bentley in ‘Multidimensional Divide and Conquer’. J.L. Bentley (1980) <doi:10.1145/358841.358850>.
mephas Medical and Pharmaceutical Statistics Shiny Application
A shiny application that facilitates the analysis of medical, pharmaceutical and related data.
merDeriv Case-Wise and Cluster-Wise Derivatives for Mixed Effects Models
Computes analytic case-wise and cluster-wise derivatives for mixed effects models with respect to the fixed effects parameters, random effect (co)variances, and residual variance.
merlin Mixed Effects Regression for Linear, Non-Linear and User-Defined Models
Fits linear, non-linear, and user-defined mixed effects regression models following the framework developed by Crowther (2017) <arXiv:1710.02223>. ‘merlin’ can fit multivariate outcome models of any type, each of which could be repeatedly measured (longitudinal), with any number of levels, and with any number of random effects at each level. Standard distributions/models available include the Bernoulli, Gaussian, Poisson, beta and negative-binomial, and time-to-event/survival models including the exponential, Gompertz, Royston-Parmar, Weibull and general hazard models. ‘merlin’ provides a flexible predictor syntax, allowing the user to define variables, random effects, spline and fractional polynomial functions, functions of other outcome models, and any interaction between each of them. Non-linear and time-dependent effects are seamlessly incorporated into the predictor. ‘merlin’ allows multivariate normal random effects, which are integrated out using Gaussian quadrature or Monte-Carlo integration. Note, ‘merlin’ is based on the ‘Stata’ package of the same name, described in Crowther (2018) <arXiv:1806.01615>.
merror Accuracy and Precision of Measurements
N>=3 methods are used to measure each of n items. The data are used to estimate simultaneously systematic error (bias) and random error (imprecision). Observed measurements for each method or device are assumed to be linear functions of the unknown true values and the errors are assumed normally distributed. Maximum likelihood estimation is used for the imprecision standard deviation estimates. Pairwise calibration curves and plots can be easily generated.
merTools Tools for Analyzing Mixed Effect Regression Models
Provides methods for extracting results from merMod objects in the lme4 package. Allows construction of prediction intervals efficiently from large scale LMM and GLMM models.
meshsimp Simplification of Surface Triangular Meshes with Associated Distributed Data
Iterative simplification strategy for surface triangular meshes (2.5D meshes) with associated data. Each iteration corresponds to an edge collapse where the selection of the edge to contract is driven by a cost functional that depends both on the geometry of the mesh and on the distribution of the data locations over the mesh. The library can handle both zero and higher genus surfaces. The package has been designed to be fully compatible with the R package ‘fdaPDE’, which implements regression models with partial differential regularizations, making use of the Finite Element Method. In the future, the functionalities provided by the current package may be directly integrated into ‘fdaPDE’.
messaging Conveniently Issue Messages, Warnings, and Errors
Provides tools for creating and issuing nicely-formatted text within R diagnostic messages and those messages given during warnings and errors. The formatting of the messages can be customized using templating features. Issues with singular and plural forms can be handled through specialized syntax.
meta4diag Meta-Analysis for Diagnostic Test Studies
Bayesian inference analysis for bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximation with INLA. A purpose built graphic user interface is available. The installation of R package INLA is compulsory for successful usage. The INLA package can be obtained from <http://www.r-inla.org>. We recommend the testing version, which can be downloaded by running: source(‘http://…/givemeINLA-testing.R’ ).
MetaAnalyser An Interactive Visualisation of Meta-Analysis as a Physical Weighing Machine
An interactive application to visualise meta-analysis data as a physical weighing machine. The interface is based on the Shiny web application framework, though can be run locally and with the user’s own data.
metaBLUE BLUE for Combining Location and Scale Information in a Meta-Analysis
The sample mean and standard deviation are two commonly used statistics in meta-analyses, but some trials use other summary statistics such as the median and quartiles to report the results. Therefore, researchers need to transform that information back to the sample mean and standard deviation. This package implements the sample mean estimators of Luo et al. (2016) <arXiv:1505.05687>, the sample standard deviation estimators of Wan et al. (2014) <arXiv:1407.8038>, and the best linear unbiased estimators (BLUEs) of location and scale parameters of Yang et al. (2018, submitted) based on sample-quantile-derived summaries in a meta-analysis.
metaBMA Bayesian Model Averaging for Random and Fixed Effects Meta-Analysis
Computes the posterior model probabilities for four meta-analysis models (null model vs. alternative model assuming either fixed- or random-effects, respectively). These posterior probabilities are used to estimate the overall mean effect size as the weighted average of the mean effect size estimates of the random- and fixed-effect model as proposed by Gronau, Van Erp, Heck, Cesario, Jonas, & Wagenmakers (2017, <doi:10.1080/23743603.2017.1326760>). The user can define a wide range of noninformative or informative priors for the mean effect size and the heterogeneity coefficient. Funding for this research was provided by the Berkeley Initiative for Transparency in the Social Sciences, a program of the Center for Effective Global Action (CEGA), with support from the Laura and John Arnold Foundation.
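A minimal sketch, assuming the documented meta_bma() interface with effect sizes y and standard errors SE; the data values are invented for illustration and default priors are used.

```r
library(metaBMA)

# Toy effect sizes and standard errors (illustrative values only)
yi  <- c(0.21, 0.35, 0.10, 0.42)
sei <- c(0.10, 0.12, 0.09, 0.15)

# Model-averaged mean effect across fixed- and random-effects models
res <- meta_bma(y = yi, SE = sei)
res
```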
metacart Meta-CART: A Flexible Approach to Identify Moderators in Meta-Analysis
Fits meta-CART by integrating classification and regression trees (CART) into meta-analysis. Meta-CART is a flexible approach to identify interaction effects between moderators in meta-analysis. The methods are described in Dusseldorp et al. (2014) <doi:10.1037/hea0000018> and Li et al. (2017) <doi:10.1111/bmsp.12088>.
metacoder Tools for Parsing, Manipulating, and Graphing Hierarchical Data
A set of tools for parsing, manipulating, and graphing data classified by a hierarchy (e.g. a taxonomy).
MetaComp EDGE Taxonomy Assignments Visualization
Implements routines for metagenome sample taxonomy assignments collection, aggregation, and visualization. Accepts the EDGE-formatted output from GOTTCHA/GOTTCHA2, BWA, Kraken, and MetaPhlAn. Produces SVG and PDF heatmap-like plots comparing taxa abundances across projects.
MetaCycle Evaluate Periodicity in Large Scale Data
Provides two functions, meta2d and meta3d, for detecting rhythmic signals from time-series datasets. For analyzing time-series datasets without individual information, ‘meta2d’ is suggested; it can incorporate multiple methods from ARSER, JTK_CYCLE and Lomb-Scargle to detect rhythms of interest. For analyzing time-series datasets with individual information, ‘meta3d’ is suggested; it applies any one of these three methods to the time-series data individual by individual and returns integrated values based on the analysis results of each individual.
metaDigitise Extract and Summarise Data from Published Figures
High-throughput, flexible and reproducible extraction of data from figures in primary research papers. metaDigitise() can extract data and / or automatically calculate summary statistics for users from box plots, bar plots (e.g., mean and errors), scatter plots and histograms.
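A minimal sketch of the documented entry point; metaDigitise() runs interactively over a directory of figure files (the path is a placeholder, and the summary argument is assumed from the package documentation).

```r
library(metaDigitise)

# Interactively digitise every figure in a folder and return summary statistics
extracted <- metaDigitise(dir = "~/figures/", summary = TRUE)
head(extracted)
```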
metaforest Exploring Heterogeneity in Meta-Analysis using Random Forests
A requirement of classic meta-analysis is that the studies being aggregated are conceptually similar, and ideally, close replications. However, in many fields, there is substantial heterogeneity between studies on the same topic. Similar research questions are studied in different laboratories, using different methods, instruments, and samples. Classic meta-analysis lacks the power to assess more than a handful of univariate moderators, or to investigate interactions between moderators and non-linear effects. MetaForest, by contrast, has substantial power to explore heterogeneity in meta-analysis. It can identify important moderators from a larger set of potential candidates, even with as few as 20 studies (Van Lissa, in preparation). This is an appealing quality, because many meta-analyses have small sample sizes. Moreover, MetaForest yields a measure of variable importance which can be used to identify important moderators, and offers partial prediction plots to explore the shape of the marginal relationship between moderators and effect size.
metafuse Fused Lasso Approach in Regression Coefficient Clustering
For each covariate, cluster its coefficient effects across different data sets during data integration. Supports Gaussian, binomial and Poisson regression models.
metagear Comprehensive Research Synthesis Tools for Systematic Reviews and Meta-Analysis
Functionalities for facilitating systematic reviews, data extractions, and meta-analyses. It includes a GUI (graphical user interface) to help screen the abstracts and titles of bibliographic data; tools to assign screening effort across multiple collaborators/reviewers and to assess inter-reviewer reliability; tools to help automate the download and retrieval of journal PDF articles from online databases; automated data extractions from scatter-plots; PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagrams; simple imputation tools to fill gaps in incomplete or missing study parameters; generation of random effects sizes for Hedges’ d, log response ratio, odds ratio, and correlation coefficients for Monte Carlo experiments; covariance equations for modelling dependencies among multiple effect sizes (e.g., effect sizes with a common control); and finally summaries that replicate analyses and outputs from widely used but no longer updated meta-analysis software. Funding for this package was supported by National Science Foundation (NSF) grants DBI-1262545 and DEB-1451031.
metaheur Metaheuristic Optimization Framework for Preprocessing Combinations
Automation of preprocessing often requires computationally costly preprocessing combinations. This package helps to find near-best combinations faster. Supported sub-heuristics include random and grid restarts, a taboo list, a decreasing probability of accepting inferior solutions, and comparison against the location of the previously best solution candidate. The package is intended to be used with package ‘preprocomb’ and takes its ‘GridClass’ object as input.
metaheuristicOpt Metaheuristic for Optimization
An implementation of metaheuristic algorithms for continuous optimization. Currently, the package contains the implementations of the following algorithms: particle swarm optimization (Kennedy and Eberhart, 1995), ant lion optimizer (Mirjalili, 2015 <doi:10.1016/j.advengsoft.2015.01.010>), grey wolf optimizer (Mirjalili et al., 2014 <doi:10.1016/j.advengsoft.2013.12.007>), dragonfly algorithm (Mirjalili, 2015 <doi:10.1007/s00521-015-1920-1>), firefly algorithm (Yang, 2009 <doi:10.1007/978-3-642-04944-6_14>), genetic algorithm (Holland, 1992, ISBN:978-0262581110), grasshopper optimisation algorithm (Saremi et al., 2017 <doi:10.1016/j.advengsoft.2017.01.004>), harmony search algorithm (Mahdavi et al., 2007 <doi:10.1016/j.amc.2006.11.033>), moth flame optimizer (Mirjalili, 2015 <doi:10.1016/j.knosys.2015.07.006>), sine cosine algorithm (Mirjalili, 2016 <doi:10.1016/j.knosys.2015.12.022>) and whale optimization algorithm (Mirjalili and Lewis, 2016 <doi:10.1016/j.advengsoft.2016.01.008>).
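A minimal sketch of the package’s metaOpt() entry point, minimising a simple sphere function with particle swarm optimization; the argument and result names below are assumptions based on the package documentation rather than verified here.

```r
library(metaheuristicOpt)

sphere <- function(x) sum(x^2)                       # objective to minimise
rangeVar <- matrix(c(-10, 10), nrow = 2, ncol = 2)   # lower/upper bound per variable

result <- metaOpt(sphere, optimType = "MIN", algorithm = "PSO",
                  numVar = 2, rangeVar = rangeVar)
result$optimumValue                                  # assumed name of the best value
```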
metamer Creates Data with Identical Statistics
Creates data with identical statistics (metamers) using an iterative algorithm proposed by Matejka & Fitzmaurice (2017) <DOI:10.1145/3025453.3025912>.
MetaPath Perform the Meta-Analysis for Pathway Enrichment Analysis (MAPE)
Performs the Meta-analysis for Pathway Enrichment (MAPE) methods introduced by Shen and Tseng (2010). It includes functions to automatically perform MAPE_G (integrating multiple studies at the gene level), MAPE_P (integrating multiple studies at the pathway level) and MAPE_I (a hybrid method integrating the MAPE_G and MAPE_P methods). In the simulation and real data analyses in the paper, MAPE_G and MAPE_P have complementary advantages and detection power depending on the data structure. In general, the integrative MAPE_I method is recommended. In the case that MAPE_G (or MAPE_P) detects almost no pathways, the integrative MAPE_I does not improve performance and MAPE_P (or MAPE_G) should be used. Reference: Shen, Kui, and George C Tseng. Meta-analysis for pathway enrichment analysis when combining multiple microarray studies. Bioinformatics 26(10) (2010): 1316-1323, <doi:10.1093/bioinformatics/btq148>. http://…/20410053.
metaplot Formalized Plots for Self-Describing Data
Creates fully-annotated plots with minimum guidance. Since the data is self-describing, less effort is needed for creating the plot. Generally expects data of class folded (see fold package). If attributes GUIDE and LABEL are present, they will be used to create formal axis labels. Several aesthetics are supported, such as reference lines, unity lines, smooths, and log transformations.
metaplotr Creates CrossHairs Plots for Meta-Analyses
Creates crosshairs plots to summarize and analyse meta-analysis results. In due time this package will contain code to create other kinds of meta-analysis graphs.
metaplus Robust Meta-Analysis and Meta-Regression
Performs meta-analysis and meta-regression using standard and robust methods with confidence intervals from the profile likelihood. Robust methods are based on alternative distributions for the random effect, either the t-distribution (Lee and Thompson, 2008 <doi:10.1002/sim.2897> or Baker and Jackson, 2008 <doi:10.1007/s10729-007-9041-8>) or mixtures of normals (Beath, 2014 <doi:10.1002/jrsm.1114>).
metaRMST Meta-Analysis of RMSTD
R implementation of a multivariate meta-analysis of randomized controlled trials (RCT) with the difference in restricted mean survival times (RMSTD). Use this package with individual patient level data from an RCT for a time-to-event outcome to determine combined effect estimates according to 4 methods: 1) a univariate meta-analysis using observed treatment effects, 2) a univariate meta-analysis using effects predicted by fitted Royston-Parmar flexible parametric models, 3) multivariate meta-analysis with analytically derived covariance, 4) multivariate meta-analysis with bootstrap derived covariance. This package computes all combined effects and provides an RMSTD curve with combined effect estimates and their confidence intervals.
metaSEM Meta-Analysis using Structural Equation Modeling
A collection of functions for conducting meta-analysis using a structural equation modeling (SEM) approach via the ‘OpenMx’ package. It also implements the two-stage SEM approach to conduct meta-analytic structural equation modeling on correlation and covariance matrices.
MetaSKAT Meta Analysis for SNP-Set (Sequence) Kernel Association Test
Functions for the meta-analysis burden test, SKAT and SKAT-O by Lee et al. (2013) <doi:10.1016/j.ajhg.2013.05.010>. These methods use summary-level score statistics to carry out gene-based meta-analysis for rare variants.
MetaStan Bayesian Meta-Analysis via ‘Stan’
Performs Bayesian meta-analysis using ‘Stan’. Includes binomial-normal hierarchical models and the option to use weakly informative priors for the heterogeneity parameter and the treatment effect parameter, as described in Guenhan, Roever, and Friede (2018) <arXiv:1809.04407>.
MetaSubtract Subtracting Summary Statistics of One or more Cohorts from Meta-GWAS Results
If results from a meta-GWAS are used for validation in one of the cohorts that was included in the meta-analysis, this will yield biased (i.e. too optimistic) results. The validation cohort needs to be independent from the meta-GWAS results. MetaSubtract will subtract the results of the respective cohort from the meta-GWAS results analytically, without having to redo the meta-GWAS analysis, using the leave-one-out methodology. It can handle different meta-analysis methods and takes into account whether single or double genomic control correction was applied to the original meta-analysis. It can be used for a whole GWAS, but also for a limited set of SNPs or other genetic markers.
MetaUtility Utility Functions for Conducting and Reporting Meta-Analysis
Contains functions to estimate the proportion of effects stronger than a threshold of scientific importance (per Mathur & VanderWeele, 2018 <https://…/sim.8057>), to make various effect size conversions, and to compute and format inference in a meta-analysis.
metavcov Variance-Covariance Matrix for Multivariate Meta-Analysis
Compute variance-covariance matrix for multivariate meta-analysis. Effect sizes include correlation (r), mean difference (MD), standardized mean difference (SMD), log odds ratio (logOR), log risk ratio (logRR), and risk difference (RD).
metaviz Rainforest Plots for Meta-Analysis
Creates rainforest plots (proposed by Schild & Voracek, 2015 <doi:10.1002/jrsm.1125>), a variant and enhancement of the classic forest plot for meta-analysis. In the near future, the ‘metaviz’ package will be extended with further plotting options, both established and novel, for visualizing meta-analytic data.
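A minimal sketch, assuming viz_forest() accepts a data frame whose first two columns are effect sizes and standard errors (values invented for illustration):

```r
library(metaviz)

dat <- data.frame(es = c(0.12, 0.30, 0.25, 0.05),
                  se = c(0.08, 0.11, 0.10, 0.09))

# Rainforest variant of the classic forest plot
viz_forest(x = dat, variant = "rain")
```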
metawho Meta-Analytical Implementation to Identify Who Benefits Most from Treatments
A tool for implementing the so-called ‘deft’ approach (see Fisher, David J., et al. (2017) <DOI:10.1136/bmj.j573>) and model visualization.
MEtest A Homogeneity Test under the Presence of Measurement Errors
Provides a function me.test() to test equality of distributions when observations are subject to measurement errors.
MethodCompare Bias and Precision Plots to Compare Two Measurements with Possibly Heteroscedastic Measurement Errors
Implementation of the methodology from the paper titled ‘Effective plots to assess bias and precision in method comparison studies’ published in Statistical Methods in Medical Research, P. Taffe (2016) <doi:10.1177/0962280216666667>.
metricsgraphics An htmlwidget interface to the MetricsGraphics.js D3 chart library
metricsgraphics is an ‘htmlwidget’ interface to the MetricsGraphics.js D3 chart library. The current htmlwidget wrapper is minimally functional: it does not support metricsgraphics histograms and provides only nascent support for metricsgraphics’ best feature, time-series charts.
metricTester Test Metric and Null Model Statistical Performance
Explore the behavior and performance of phylogenetic metrics and null models.
mexhaz Mixed Effect Excess Hazard Models
Fit flexible (excess) hazard regression models with a random effect defined at the cluster level.
MFAg Multiple Factor Analysis (MFA)
Performs the Multiple Factor Analysis method for quantitative, categorical, frequency and mixed data; in addition to generating many graphics, it also provides other useful functions.
mfbvar Mixed-Frequency Bayesian VAR Models
Estimation of mixed-frequency Bayesian vector autoregressive (VAR) models with Minnesota or steady-state priors. The package implements a state space-based VAR model that handles mixed frequencies of the data. The model is estimated using Markov Chain Monte Carlo to numerically approximate the posterior distribution, where the prior can be either the Minnesota prior, as used by Schorfheide and Song (2015) <doi:10.1080/07350015.2014.954707>, or the steady-state prior, as advocated by Ankargren, Unosson and Yang (2018) <http://…/FULLTEXT01.pdf>.
MFDFA MultiFractal Detrended Fluctuation Analysis
Applies the MultiFractal Detrended Fluctuation Analysis (MFDFA) to time series. The MFDFA() function proposed in this package was used in Laib et al. (<doi:10.1016/j.chaos.2018.02.024> and <doi:10.1063/1.5022737>). See references for more information.
mfe Meta-Feature Extractor
Extracts meta-features from datasets to support the design of recommendation systems based on Meta-Learning. The meta-features, also called characterization measures, are able to characterize the complexity of datasets and to provide estimates of algorithm performance. The package contains not only the standard characterization measures, but also more recent characterization measures. By making available a large set of meta-feature extraction functions, this package allows a comprehensive data characterization, a deep data exploration and a large number of Meta-Learning based data analyses. These concepts are described in the book: Brazdil, P., Giraud-Carrier, C., Soares, C., Vilalta, R. (2009) <doi:10.1007/978-3-540-73263-1>.
mfGARCH Mixed-Frequency GARCH Models
Estimating GARCH-MIDAS (MIxed-DAta-Sampling) models (Engle, Ghysels, Sohn, 2013, <doi:10.1162/REST_a_00300>) and related statistical inference, accompanying the paper ‘Two are better than one: volatility forecasting using multiplicative component GARCH models’ by Conrad, Kleen (2018, Working Paper). The GARCH-MIDAS model decomposes the conditional variance of (daily) stock returns into a short- and long-term component, where the latter may depend on an exogenous covariate sampled at a lower frequency.
mFilter Miscellaneous Time Series Filters
The mFilter package implements several time series filters useful for smoothing and extracting trend and cyclical components of a time series. The routines are commonly used in economics and finance, but they should also be of interest in other areas. Currently, Christiano-Fitzgerald, Baxter-King, Hodrick-Prescott, Butterworth, and trigonometric regression filters are included in the package.
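For example, the Hodrick-Prescott filter applied to the quarterly unemployment series shipped with the package:

```r
library(mFilter)

data(unemp)                         # quarterly US unemployment, included in mFilter
hp <- hpfilter(unemp, freq = 1600)  # standard smoothing parameter for quarterly data
plot(hp)                            # trend and cyclical components
```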
MFPCA Multivariate Functional Principal Component Analysis for Data Observed on Different Dimensional Domains
Calculate a multivariate functional principal component analysis for data observed on different dimensional domains. The estimation algorithm relies on univariate basis expansions for each element of the multivariate functional data. Multivariate and univariate functional data objects are represented by S4 classes for this type of data implemented in the package ‘funData’.
MFT The Multiple Filter Test for Change Point Detection
Provides statistical tests and algorithms for the detection of change points in time series and point processes – particularly for changes in the mean in time series and for changes in the rate and in the variance in point processes. References – Michael Messer, Marietta Kirchner, Julia Schiemann, Jochen Roeper, Ralph Neininger and Gaby Schneider (2014) <doi:10.1214/14-AOAS782>, Stefan Albert, Michael Messer, Julia Schiemann, Jochen Roeper, Gaby Schneider (2017) <doi:10.1111/jtsa.12254>, Michael Messer, Kaue M. Costa, Jochen Roeper and Gaby Schneider (2017) <doi:10.1007/s10827-016-0635-3>.
mfx Marginal Effects, Odds Ratios and Incidence Rate Ratios for GLMs
Estimates probit, logit, Poisson, negative binomial, and beta regression models, returning their marginal effects, odds ratios, or incidence rate ratios as an output. Greene (2008, pp. 780-7) provides a textbook introduction to this topic.
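For example, a logit model on simulated data, reported as marginal effects rather than raw coefficients:

```r
library(mfx)

set.seed(1)
df <- data.frame(x1 = rnorm(200), x2 = rbinom(200, 1, 0.4))
df$y <- rbinom(200, 1, plogis(0.5 * df$x1 - 0.8 * df$x2))

logitmfx(y ~ x1 + x2, data = df)  # marginal effects of x1 and x2 on P(y = 1)
```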
mgarchBEKK Simulating, Estimating and Diagnosing MGARCH (BEKK and mGJR) Processes
Procedures to simulate, estimate and diagnose MGARCH processes of BEKK and multivariate GJR (bivariate asymmetric GARCH model) specification.
mgc Multiscale Graph Correlation
Multiscale Graph Correlation (MGC) is a framework developed by Shen et al. (2017) <arXiv:1609.05148> that extends global correlation procedures to be multiscale; consequently, MGC tests typically require far fewer samples than existing methods for a wide variety of dependence structures and dimensionalities, while maintaining computational efficiency. Moreover, MGC provides a simple and elegant multiscale characterization of the potentially complex latent geometry underlying the relationship.
mgcViz Visualisations for Generalized Additive Models
Extension of the ‘mgcv’ package, providing visual tools for Generalized Additive Models (GAMs) that exploit the additive structure of GAMs, scale to large data sets and can be used in conjunction with a wide range of response distributions. The focus is on providing visual methods for better understanding the model output and for aiding model checking and development beyond simple exponential family regression. The graphical framework is based on the layering system provided by ‘ggplot2’.
MGGM Structural Pursuit Over Multiple Undirected Graphs
Implements algorithms to recover multiple networks by pursuing both sparseness and clustering.
MGL Module Graphical Lasso
An aggressive dimensionality reduction and network estimation technique for a high-dimensional Gaussian graphical model (GGM). Please refer to: Efficient Dimensionality Reduction for High-Dimensional Network Estimation, Safiye Celik, Benjamin A. Logsdon, Su-In Lee, Proceedings of The 31st International Conference on Machine Learning, 2014, p. 1953-1961.
MGLM Multivariate Response Generalized Linear Models
Provides functions that (1) fit multivariate discrete distributions, (2) generate random numbers from multivariate discrete distributions, and (3) run regression and penalized regression on the multivariate categorical response data. Implemented models include: multinomial logit model, Dirichlet multinomial model, generalized Dirichlet multinomial model, and negative multinomial model. Making the best of the minorization-maximization (MM) algorithm and Newton-Raphson method, we derive and implement stable and efficient algorithms to find the maximum likelihood estimates. On a multi-core machine, multi-threading is supported.
mglmn Model Averaging for Multivariate GLM with Null Models
Tools for univariate and multivariate generalized linear models with model averaging and null model technique.
mgm Estimating Mixed Graphical Models
Functions to estimate and sample from a mixed graphical model.
mgsub Safe, Multiple, Simultaneous String Substitution
Designed to enable simultaneous substitution in strings in a safe fashion. Safe means it does not rely on placeholders (which can cause errors in same length matches).
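The canonical case is swapping two patterns, where sequential gsub() calls would clobber the first substitution:

```r
library(mgsub)

# Both substitutions happen simultaneously, so "hey" and "ho" swap cleanly
mgsub("hey, ho", pattern = c("hey", "ho"), replacement = c("ho", "hey"))
#> [1] "ho, hey"
```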
mgwrsar GWR and MGWR with Spatial Autocorrelation
Functions for computing (Mixed) Geographically Weighted Regression with spatial autocorrelation, Geniaux and Martinetti (2017) <doi:10.1016/j.regsciurbeco.2017.04.001>.
mhde Minimum Hellinger Distance Test for Normality
Implementation of a goodness-of-fit test for normality using the Minimum Hellinger Distance.
mhtboot Multiple Hypothesis Test Based on Distribution of p Values
A framework for multiple hypothesis testing based on the distribution of p values. It is well known that p values follow different distributions under the null and the alternative; this package provides functions to detect that change, and uses the change in the distribution of p values as a way to detect the true signals in the data.
MHTcop Tests Controlling the FDR / FWER under Certain Copula Models
Implements tests controlling the false discovery rate (FDR) / family-wise error rate (FWER) for some copula models.
MHTdiscrete Multiple Hypotheses Testing for Discrete Data
A comprehensive tool for almost all existing multiple testing methods for discrete data. The package also provides some novel multiple testing procedures controlling FWER/FDR for discrete data. In order to make decisions conveniently, given discrete p-values and their domains, the [method].p.adjust function returns adjusted p-values, which can be compared with the nominal significance level alpha.
MHTmult Multiple Hypotheses Testing for Multiple Families/Groups Structure
A comprehensive tool for almost all existing multiple testing methods for multiple families. The package summarizes the existing multiple testing procedures (MTPs) for multiple families, such as the double FDR, the group Benjamini-Hochberg (GBH) procedure and the average-FDR controlling procedure. The package also provides some novel multiple testing procedures based on the selective inference idea.
mhurdle Multiple Hurdle Tobit Models
Estimation of models with zero left-censored variables. Null values may be caused by a selection process (Cragg (1971) <doi:10.2307/1909582>), insufficient resources (Tobin (1958) <doi:10.2307/1907382>) or infrequency of purchase (Deaton and Irish (1984) <doi:10.1016/0047-2727(84)90067-7>).
MIAmaxent Maxent Distribution Model Selection
Tools for training, selecting, and evaluating maximum entropy (Maxent) distribution models. This package provides tools for user-controlled transformation of explanatory variables, selection of variables by nested model comparison, and flexible model evaluation and projection. It is based on the strict maximum likelihood interpretation of maximum entropy modelling.
mice Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
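The canonical impute/analyse/pool workflow, using the nhanes data shipped with the package:

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)  # 5 completed data sets
fit <- with(imp, lm(chl ~ bmi + age))                      # analyse each one
pool(fit)                                                  # pool via Rubin's rules
```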
micEconIndex Price and Quantity Indices
Tools for calculating Laspeyres, Paasche, and Fisher price and quantity indices.
miceExt Extension Package to ‘mice’
Extends and builds on the ‘mice’ package by adding a functionality to perform multivariate predictive mean matching on imputed data as well as new functionalities to perform predictive mean matching on factor variables.
miceFast Fast Imputations Using ‘Rcpp’ and ‘Armadillo’
Fast imputations under the object-oriented programming paradigm. Quantitative models with a closed-form solution are used, so the package relies on linear algebra operations. The biggest improvement in time performance can be achieved for calculations where a grouping variable has to be used.
micemd Multiple Imputation by Chained Equations with Multilevel Data
Addons for the ‘mice’ package to perform multiple imputation using chained equations with two-level data. Includes imputation methods specifically handling sporadically and systematically missing values. Imputation of continuous, binary or count variables is available. Following the recommendations of Audigier, V. et al (2017), the choice of the imputation method for each variable can be facilitated by a default choice tuned according to the structure of the incomplete dataset. Allows parallel calculation for ‘mice’.
miceMNAR Missing not at Random Imputation Models for Multiple Imputation by Chained Equation
Provides imputation models and functions for binary or continuous Missing Not At Random (MNAR) outcomes through the use of the ‘mice’ package. The mice.impute.hecknorm() function provides an imputation model for continuous outcomes based on Heckman’s model, also known as the sample selection model, as described in Galimard et al (2016) <doi:10.1002/sim.6902>. The mice.impute.heckprob() function provides an imputation model for binary outcomes based on the bivariate probit model.
micompr Multivariate Independent Comparison of Observations
A procedure for comparing multivariate samples associated with different groups. It uses principal component analysis to convert multivariate observations into a set of linearly uncorrelated statistical measures, which are then compared using a number of statistical methods. The procedure is independent of the distributional properties of samples and automatically selects features that best explain their differences, avoiding manual selection of specific points or summary statistics. It is appropriate for comparing samples of time series, images, spectrometric measures or similar multivariate observations.
miCoPTCM Promotion Time Cure Model with Mis-Measured Covariates
Fits semiparametric promotion time cure models, optionally taking into account measurement error in the covariates (using a corrected score approach or the SIMEX algorithm), with a backfitting approach to maximize the likelihood.
microbats An Implementation of Bat Algorithm in R
A nature-inspired metaheuristic algorithm based on the echolocation behavior of microbats that uses frequency tuning to optimize problems in both continuous and discrete dimensions. This R package makes it easy to implement the standard bat algorithm on any user-supplied function. The algorithm was first developed by Xin-She Yang in 2010 (<DOI:10.1007/978-3-642-12538-6_6>, <DOI:10.1109/CINTI.2014.7028669>).
microdemic ‘Microsoft Academic’ API Client
The ‘Microsoft Academic Knowledge’ API provides programmatic access to scholarly articles in the ‘Microsoft Academic Graph’ (<https://…/>). Includes methods matching all ‘Microsoft Academic’ API routes, including search, graph search, text similarity, and interpreting natural language query strings.
MicroMacroMultilevel Micro-Macro Multilevel Modeling
Most multilevel methodologies can only model macro-micro multilevel situations in an unbiased way, wherein group-level predictors (e.g., city temperature) are used to predict an individual-level outcome variable (e.g., citizen personality). In contrast, this R package enables researchers to model micro-macro situations, wherein individual-level (micro) predictors (and other group-level predictors) are used to predict a group-level (macro) outcome variable in an unbiased way.
micromap Linked Micromap Plots
This group of functions simplifies the creation of linked micromap plots.
microplot Use R Graphics Files as Microplots (Sparklines) in Tables in LaTeX or HTML Files, and Output Data.frames to Org-Mode
Prepare lists of R graphics files to be used as microplots (sparklines) in tables in either LaTeX or HTML files. For LaTeX use the Hmisc::latex() function or xtable::xtable() with Sweave, knitr, rmarkdown, or Emacs org-mode to construct latex tabular environments which include the graphs. For HTML files use either Emacs org-mode or the htmlTable::htmlTable() function to construct an HTML file containing tables which include the graphs. Examples are shown with lattice graphics, base graphics, and ggplot2 graphics. Examples for LaTeX include Sweave (both LaTeX-style and Noweb-style), knitr, emacs org-mode, and rmarkdown input files and their pdf output files. Examples for HTML include org-mode and Rmd input files and their webarchive HTML output files. In addition, the as.orgtable function can display a data.frame in an org-mode document.
microsynth Synthetic Control Methods with Micro- And Meso-Level Data
A generalization of the ‘Synth’ package that is designed for data at a more granular level (e.g., micro-level). Provides functions to construct weights (including propensity score-type weights) and run analyses for synthetic control methods with micro- and meso-level data; see Robbins, Saunders, and Kilmer (2017) <doi:10.1080/01621459.2016.1213634>.
middlechild Tools to Intercept, Validate and Consume Web/Network Traffic
The ‘mitmproxy’ (<https://mitmproxy.org>) project provides tools to intercept, modify and/or introspect network traffic. Methods are provided to download, install, configure and launch ‘mitmproxy’, plus introspect and validate network captures. Special tools are provided enabling testing of R packages that make API calls.
MIDN Nearly Exact Sample Size Calculation for Exact Powerful Nonrandomized Tests for Differences Between Binomial Proportions
Implementation of the mid-n algorithms presented in Wellek S (2015) <DOI:10.1111/stan.12063> Statistica Neerlandica 69, 358-373 for exact sample size calculation for superiority trials with binary outcome.
midrangeMCP Multiples Comparisons Procedures Based on Studentized Midrange and Range Distributions
Applies tests of multiple comparisons based on the studentized midrange and range distributions. The tests are: the Tukey midrange test, the Student-Newman-Keuls midrange test, the Scott-Knott midrange test and the Scott-Knott range test.
miic Multivariate Information Inductive Causation
We report an information-theoretic method which learns a large class of causal or non-causal graphical models from purely observational data, while including the effects of unobserved latent variables, commonly found in many datasets. Starting from a complete graph, the method iteratively removes dispensable edges, by uncovering significant information contributions from indirect paths, and assesses edge-specific confidences from randomization of available data. The remaining edges are then oriented based on the signature of causality in observational data. This approach can be applied to a wide range of datasets and provides new biological insights on regulatory networks from single cell expression data, genomic alterations during tumor development and co-evolving residues in protein structures. For more information, refer to: Verny et al. Plos Comput Biol. (2017) <doi:10.1371/journal.pcbi.1005662>.
MIIVsem Two Stage Least Squares with Model Implied Instrumental Search
Functions for estimating structural equation models using a model-implied instrumental variable (MIIV) search and two stage least squares (2SLS) estimator (MIIV-2SLS).
milonga Multiple Imputation for Multivariate Binary Outcome
Multiple imputation for a multivariate binary outcome using a Gibbs sampler on all potential profiles.
milr Multiple-Instance Logistic Regression with LASSO Penalty
The multiple instance data set consists of many independent subjects (called bags) and each subject is composed of several components (called instances). The outcomes of such a data set are binary or multinomial, and we can only observe the subject-level outcomes. For example, in manufacturing processes, a subject is labeled as ‘defective’ if at least one of its own components is defective, and otherwise is labeled as ‘non-defective’. The milr package focuses on the predictive model for the multiple instance data set with binary outcomes and performs the maximum likelihood estimation with the Expectation-Maximization algorithm under the framework of logistic regression. Moreover, the LASSO penalty is attached to the likelihood function for simultaneous parameter estimation and variable selection.
mime Map Filenames to MIME Types
Guesses the MIME type from a filename extension using the data derived from /etc/mime.types in UNIX-type systems.
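For example:

```r
library(mime)

guess_type(c("report.html", "data.csv", "figure.png"))
#> [1] "text/html" "text/csv"  "image/png"
```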
mimi Main Effects and Interactions in Mixed and Incomplete Data
Estimation of main effects and interactions in mixed data sets with missing values. Numeric, binary and count variables are supported. Main effects and interactions are modelled using an exponential family parametric model. Particular examples include the log-linear model for count data and the linear model for numeric data. Estimation is done through a convex program where main effects are assumed sparse and the interactions low-rank. Genevieve Robin, Olga Klopp, Julie Josse, Eric Moulines, Robert Tibshirani (2018) <arXiv:1806.09734>.
mindr Convert Files Between Markdown or Rmarkdown Files and Mindmaps
Convert Markdown (‘.md’) or Rmarkdown (‘.Rmd’) files into FreeMind mindmap (‘.mm’) files, and vice versa. FreeMind mindmap (‘.mm’) files can be opened by or imported to common mindmap software such as ‘FreeMind’ (<http://…/Main_Page> ) and ‘XMind’ (<http://www.xmind.net> ).
mined Minimum Energy Designs
This is a method (MinED) for mining probability distributions using deterministic sampling, proposed by Joseph, Wang, Gu, Lv, and Tuo (2018). The MinED samples can be used for approximating the target distribution. They can be generated from a density function that is known only up to a proportionality constant, and thus the method might find applications in Bayesian computation. Moreover, the MinED samples are generated with far fewer evaluations of the density function than random sampling-based methods such as MCMC; the method will therefore be especially useful when the unnormalized posterior is expensive or time-consuming to evaluate.
minimalRSD Minimally Changed CCD and BBD
Generate central composite designs (CCD) with full as well as fractional factorial points (half replicate) and Box-Behnken designs (BBD) with minimally changed run sequence.
minimap Create Tile Grid Maps
Create tile grid maps, which are like choropleth maps except each region is represented with equal visual space.
minimaxdesign Minimax and Minimax Projection Designs
The ‘minimaxdesign’ package provides two main functions, mMcPSO() and miniMaxPro(), which generate minimax designs and minimax projection designs using clustering and particle swarm optimization (PSO) techniques. These designs can be used in a variety of settings, e.g., as space-filling designs for computer experiments or sensor allocation designs. A detailed description of the two designs and the employed algorithms can be found in Mak and Joseph (2016).
minimist Parse Argument Options
A binding to the minimist JavaScript library. This module implements the guts of optimist’s argument parser without all the fanciful decoration.
Minirand Minimization Randomization
Randomization schedules are generated in the schemes with k (k>=2) treatment groups and any allocation ratios by minimization algorithms.
miniUI Shiny UI Widgets for Small Screens
Provides UI widget and layout functions for writing Shiny apps that work well on small screens.
mipfp Multidimensional Iterative Proportional Fitting and Alternative Models
An implementation of the iterative proportional fitting (IPFP), maximum likelihood, minimum chi-square and weighted least squares procedures for updating an N-dimensional array with respect to given target marginal distributions (which, in turn, can be multi-dimensional). The package also provides an application of the IPFP to simulate multivariate Bernoulli distributions.
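A minimal IPFP example: adjust a uniform 2 x 2 seed so that its row and column margins match given targets.

```r
library(mipfp)

seed    <- matrix(1, nrow = 2, ncol = 2)  # initial array
tgt.row <- c(60, 40)                      # target margins for dimension 1
tgt.col <- c(30, 70)                      # target margins for dimension 2

res <- Ipfp(seed, target.list = list(1, 2),
            target.data = list(tgt.row, tgt.col))
res$x.hat                                 # updated array matching both margins
```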
MIRL Multiple Imputation Random Lasso for Variable Selection with Missing Entries
Implements a variable selection and prediction method for high-dimensional data with missing entries following the paper Liu et al. (2016) <doi:10.1214/15-AOAS899>. It deals with missingness by multiple imputation and produces a selection probability for each variable following stability selection. The user can further choose a threshold for the selection probability to select a final set of variables. The threshold can be picked by cross validation or the user can define a practical threshold for selection probability. If you find this work useful for your application, please cite the method paper.
mirtjml Joint Maximum Likelihood Estimation for High-Dimensional Item Factor Analysis
Provides constrained joint maximum likelihood estimation algorithms for item factor analysis (IFA) based on multidimensional item response theory models. So far, we provide functions for exploratory and confirmatory IFA based on the multidimensional two parameter logistic (M2PL) model for binary response data. Comparing with traditional estimation methods for IFA, the methods implemented in this package scale better to data with large numbers of respondents, items, and latent factors. The computation is facilitated by multiprocessing ‘OpenMP’ API. For more information, please refer to: 1. Chen, Y., Li, X., & Zhang, S. (2018). Joint Maximum Likelihood Estimation for High-Dimensional Exploratory Item Factor Analysis. Psychometrika, 1-23. <doi:10.1007/s11336-018-9646-5>; 2. Chen, Y., Li, X., & Zhang, S. (2017). Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications. arXiv preprint <arXiv:1712.08966>.
miscF Miscellaneous Functions
Various functions for random number generation, density estimation, classification, curve fitting, and spatial data analysis.
misclassGLM Computation of Generalized Linear Models with Misclassified Covariates Using Side Information
Estimates models that extend the standard GLM to take misclassification into account. The models require side information from a secondary data set on the misclassification process, i.e. some sort of misclassification probabilities conditional on some common covariates. A detailed description of the algorithm can be found in Dlugosz, Mammen and Wilke (2015) <http://…/PU70410>.
miscor Miscellaneous Functions for the Correlation Coefficient
Statistical test for the product-moment correlation coefficient based on H0: rho = rho0 including sample size computation, statistical test for comparing the product-moment correlation coefficient in independent and dependent samples, partial and semipartial correlation, sequential triangular test for the product-moment correlation coefficient, and simulation of bivariate normal and non-normal distribution with a specified correlation.
mise Clears the Workspace (Mise en Place)
Clears the workspace. Useful for the beginnings of R scripts, to avoid potential problems with accidentally using information from variables or functions from previous script evaluations, too many figure windows open at the same time, packages that you don’t need any more, or a cluttered console. Uses code from various StackOverflow users. See help(mise) for pointers to the relevant StackOverflow pages.
mispr Multiple Imputation with Sequential Penalized Regression
Generates multivariate imputations using sequential regression with L2 penalty. For more details see Zahid and Heumann (2018) <doi:10.1177/0962280218755574>.
MiSPU Microbiome Based Sum of Powered Score (MiSPU) Tests
There is an increasing interest in investigating how the compositions of microbial communities are associated with human health and disease. In this package, we present a novel global testing method, called aMiSPU, that is highly adaptive and thus highly powered across various scenarios, alleviating the issue of choosing a phylogenetic distance. Our simulations and real data analyses demonstrated that the aMiSPU test is often more powerful than several competing methods while correctly controlling type I error rates.
misreport Statistical Analysis of Misreporting on Sensitive Survey Questions
Enables investigation of the predictors of misreporting on sensitive survey questions through a multivariate list experiment regression method. The method permits researchers to model whether a survey respondent’s answer to the sensitive item in a list experiment is different from his or her answer to an analogous direct question.
missCompare Intuitive Missing Data Imputation Framework
Offers a convenient pipeline to test and compare various missing data imputation algorithms on simulated and real data. The central assumption behind missCompare is that structurally different datasets (e.g. larger datasets with a large number of correlated variables vs. smaller datasets with non correlated variables) will benefit differently from different missing data imputation algorithms. missCompare takes measurements of your dataset and sets up a sandbox to try a curated list of standard and sophisticated missing data imputation algorithms and compares them assuming custom missingness patterns. missCompare will also impute your real-life dataset for you after the selection of the best performing algorithm in the simulations. The package also provides various post-imputation diagnostics and visualizations to help you assess imputation performance.
missRanger Fast Imputation of Missing Values
Alternative implementation of the beautiful ‘MissForest’ algorithm used to impute mixed-type data sets by chaining tree ensembles, introduced by Stekhoven, D.J. and Buehlmann, P. (2012) <doi:10.1093/bioinformatics/btr597>. Under the hood, it uses the lightning fast random jungle package ‘ranger’. Between the iterative model fitting, we offer the option of using predictive mean matching. This firstly avoids imputation with values not already present in the original data (like a value of 0.3334 in a 0-1 coded variable). Secondly, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This allows, e.g., doing multiple imputation when repeating the call to missRanger().
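A minimal sketch using the generateNA() helper shipped with the package (its argument names are taken from the package documentation) to create missing values before imputing them:

```r
library(missRanger)

set.seed(3)
ir  <- generateNA(iris, p = 0.1)                  # sprinkle in 10% missing values
imp <- missRanger(ir, pmm.k = 3, num.trees = 100) # impute with PMM on ranger forests
head(imp)
```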
mistr Mixture and Composite Distributions
A flexible computational framework for mixture distributions, with a focus on composite models.
mitml Tools for Multiple Imputation in Multilevel Modeling
Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the ‘pan’ package, and several functions for visualization, data management and the analysis of multiply imputed data sets.
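A minimal sketch of the workflow on a toy two-level data set (the model formula and settings are illustrative):

```r
library(mitml)

# Toy two-level data with missing outcome values (illustrative only)
set.seed(42)
dat <- data.frame(group = rep(1:10, each = 20), x = rnorm(200))
dat$y <- 0.5 * dat$x + rnorm(200)
dat$y[sample(200, 40)] <- NA

# Impute under a random-intercept model via the 'pan' interface
imp <- panImpute(dat, formula = y ~ 1 + x + (1 | group),
                 n.burn = 1000, n.iter = 100, m = 5)

# Analyse each completed data set and pool the estimates
fits <- with(mitmlComplete(imp, "all"), lm(y ~ x))
testEstimates(fits)
```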
miWQS Multiple Imputation using Weighted Quantile Sum Analysis
Consider a set/mixture of continuous, correlated, and censored components/chemicals that are reasonable to combine in an index and share a common outcome. These components are also interval-censored between zero and upper thresholds, or detection limits, that may differ among the components. The `miWQS` package applies the multiple imputation (MI) procedure to the weighted quantile sum regression (WQS) methodology for continuous, binary, or count outcomes. In summary, MI consists of three stages: (1) imputation, (2) analysis, and (3) pooling. First, the missing values are imputed by bootstrapping (Lubin et al. (2004) <doi:10.1289/ehp.7199>), Bayesian imputation, or placing values below the detection limits in the first quantile (BDLQ1) (Ward et al. (2014) <doi:10.1289/ehp.1307602>). Second, the estimate.wqs() function implements WQS regression whether the components are complete, imputed, or missing (Carrico et al. (2014) <doi:10.1007/s13253-014-0180-3>). If the data are missing, BDLQ1 is automatically implemented. Lastly, the pool.mi() function calculates the pooled statistics according to Rubin’s rules (Rubin 1987).
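A sketch only: estimate.wqs() is the documented entry point named above, but the argument names used here (y, X, n.quantiles, family) are assumptions that should be checked against the package manual, and the data are invented.

```r
library(miWQS)

# Hypothetical inputs: X is a matrix of chemical concentrations, y a binary outcome
set.seed(10)
X <- matrix(abs(rnorm(100 * 3)), ncol = 3,
            dimnames = list(NULL, paste0("chem", 1:3)))
y <- rbinom(100, 1, 0.4)

# Argument names assumed, not verified
fit <- estimate.wqs(y = y, X = X, n.quantiles = 4, family = "binomial")
fit
```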
MixAll Clustering using Mixture Models
Algorithms and methods for estimating parametric mixture models with missing data.
mixdir Cluster High Dimensional Categorical Datasets
Scalable Bayesian clustering of categorical datasets. The package implements a hierarchical Dirichlet (Process) mixture of multinomial distributions. It is thus a probabilistic latent class model (LCM) and can be used to reduce the dimensionality of hierarchical data and cluster individuals into latent classes. It can automatically infer an appropriate number of latent classes or find k classes, as defined by the user. The model is based on a paper by Dunson and Xing (2009) <doi:10.1198/jasa.2009.tm08439>, but implements a scalable variational inference algorithm so that it is applicable to large datasets.
MixedDataImpute Missing Data Imputation for Continuous and Categorical Data using Nonparametric Bayesian Joint Models
Missing data imputation for continuous and categorical data, using nonparametric Bayesian joint models (specifically the hierarchically coupled mixture model with local dependence described in Murray and Reiter (2015); see citation(‘MixedDataImpute’) or <http://…/1410.0438>). See ‘?hcmm_impute’ for example usage.
mixedMem Tools for Discrete Multivariate Mixed Membership Models
Fits mixed membership models with discrete multivariate data (with or without repeated measures) following the general framework of Erosheva 2004. This package uses a Variational EM approach by approximating the posterior distribution of latent memberships and selecting hyperparameters through a pseudo-MLE procedure. Currently supported data types are Bernoulli, multinomial and rank (Plackett-Luce).
MixedPoisson Mixed Poisson Models
Estimation of the parameters in mixed Poisson models.
mixedsde Estimation Methods for Stochastic Differential Mixed Effects Models
Inference for stochastic differential models, Ornstein-Uhlenbeck or Cox-Ingersoll-Ross, with one or two random effects in the drift function.
mixEMM A Mixed-Effects Model for Analyzing Cluster-Level Non-Ignorable Missing Data
Contains functions for estimating a mixed-effects model for clustered data (or batch-processed data) with cluster-level (or batch-level) missing values in the outcome, i.e., the outcomes of some clusters are either all observed or missing altogether. The model is developed for analyzing incomplete data from labeling-based quantitative proteomics experiments but is not limited to this type of data. We used an expectation conditional maximization (ECM) algorithm for model estimation. The cluster-level missingness may depend on the average value of the outcome in the cluster (missing not at random).
mixer Random Graph Clustering
Estimates the parameters, the clusters, as well as the number of clusters of a (binary) stochastic block model (J.-J Daudin, F. Picard, S. Robin (2008) <doi:10.1007/s11222-007-9046-7>).
MIXFIM Evaluation of the FIM in NLMEMs using MCMC
Evaluation and optimization of the Fisher Information Matrix in Nonlinear Mixed Effect Models using Markov chain Monte Carlo for continuous and discrete data.
mixggm Mixtures of Gaussian Graphical Models
Mixtures of Gaussian graphical models for model-based clustering with sparse covariance and concentration matrices. See Fop, Murphy, and Scrucca (2018) <doi:10.1007/s11222-018-9838-y>.
mixKernel Omics Data Integration Using Kernel Methods
Kernel-based methods are powerful methods for integrating heterogeneous types of data. mixKernel aims at providing methods to combine kernels for unsupervised exploratory analysis. Different solutions are provided to compute a meta-kernel, either in a consensus way or in a way that best preserves the original topology of the data. mixKernel also integrates kernel PCA to visualize similarities between samples in a non-linear space and from the multiple-source point of view. Functions to assess and display important variables are also provided in the package.
mixlink Mixture Link Regression
The Mixture Link model <arXiv:1612.03302> is a proposed extension to generalized linear models, where the outcome distribution is a finite mixture of J > 1 densities. This package supports Mixture Link computations for Poisson and Binomial outcomes. This includes the distribution functions, numerical maximum likelihood estimation, Bayesian analysis, and quantile residuals to assess model fit.
mixor Mixed-Effects Ordinal Regression Analysis
Provides the function ‘mixord’ for fitting mixed-effects ordinal and binary response models, and associated methods for printing, summarizing, extracting estimated coefficients and the variance-covariance matrix, and estimating contrasts for the fitted models.
mixpack Tools to Work with Mixture Components
A collection of tools implemented to facilitate the analysis of the components of finite mixture distributions. The package has some functions to generate random samples coming from a finite mixture. The package provides a C++ implementation for the construction of a hierarchy over the components of a given finite mixture.
mixR Finite Mixture Modeling for Raw and Binned Data
Performs maximum likelihood estimation for finite mixture models for families including Normal, Weibull, Gamma and Lognormal by using EM algorithm, together with Newton-Raphson algorithm or bisection method when necessary. It also conducts mixture model selection by using information criteria or bootstrap likelihood ratio test. The data used for mixture model fitting can be raw data or binned data. The model fitting process is accelerated by using R package ‘Rcpp’.
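A minimal sketch, assuming the mixfit() interface with a component count and family (data simulated for illustration):

```r
library(mixR)

set.seed(7)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1))

fit <- mixfit(x, ncomp = 2, family = "normal")  # two-component normal mixture via EM
plot(fit)                                       # fitted densities over a histogram
```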
mixRasch Mixture Rasch Models with JMLE
Estimates Rasch models and mixture Rasch models, including the dichotomous Rasch model, the rating scale model, and the partial credit model.
MixRF A Random-Forest-Based Approach for Imputing Clustered Incomplete Data
It offers random-forest-based functions to impute clustered incomplete data. The package is tailored for but not limited to imputing multitissue expression data, in which a gene’s expression is measured on the collected tissues of an individual but missing on the uncollected tissues.
MixSAL Mixtures of Multivariate Shifted Asymmetric Laplace (SAL) Distributions
The current version of the ‘MixSAL’ package allows users to generate data from a multivariate SAL distribution or a mixture of multivariate SAL distributions, evaluate the probability density function of a multivariate SAL distribution or a mixture of multivariate SAL distributions, and fit a mixture of multivariate SAL distributions using the Expectation-Maximization (EM) algorithm (see Franczak et. al, 2014, <doi:10.1109/TPAMI.2013.216>, for details).
MixSIAR Bayesian Mixing Models in R
Creates and runs Bayesian mixing models to analyze biotracer data (i.e. stable isotopes, fatty acids), which estimate the proportions of source (prey) contributions to a mixture (consumer). ‘MixSIAR’ is not one model, but a framework that allows a user to create a mixing model based on their data structure and research questions, via options for fixed/random effects, source data types, priors, and error terms. ‘MixSIAR’ incorporates several years of advances since ‘MixSIR’ and ‘SIAR’, and includes both GUI (graphical user interface) and script versions.
mixsqp Sequential Quadratic Programming for Fast Maximum-Likelihood Estimation of Mixture Proportions
Provides optimization algorithms based on sequential quadratic programming (SQP) for maximum likelihood estimation of the mixture proportions in a finite mixture model where the component densities are known. The algorithms are expected to obtain solutions that are at least as accurate as the state-of-the-art MOSEK interior-point solver (called by function ‘KWDual’ in the ‘REBayes’ package), and they are expected to arrive at solutions more quickly in large data sets. The algorithms are described in Y. Kim, P. Carbonetto, M. Stephens & M. Anitescu (2018) <arXiv:1806.01412>.
MixtureRegLTIC Fit Mixture Regression Models for Left-Truncated and Interval-Censored Data
Fit mixture regression models with nonsusceptibility/cure for left-truncated and interval-censored (LTIC) data (see Chen et al. (2013) <doi:10.1002/sim.5845>). This package also provides the nonparametric maximum likelihood estimator (NPMLE) for the survival/event curves with LTIC data.
mize Unconstrained Numerical Optimization Algorithms
Optimization algorithms implemented in R, including conjugate gradient (CG), Broyden-Fletcher-Goldfarb-Shanno (BFGS) and the limited memory BFGS (L-BFGS) methods. Most internal parameters can be set through the call interface. The solvers hold up quite well for higher-dimensional problems.
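For example, minimising the Rosenbrock function with L-BFGS; mize() takes the start point and a list holding the function and its gradient:

```r
library(mize)

rosenbrock <- list(
  fn = function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2,
  gr = function(p) c(-2 * (1 - p[1]) - 400 * p[1] * (p[2] - p[1]^2),
                     200 * (p[2] - p[1]^2))
)

res <- mize(c(-1.2, 1), rosenbrock, method = "L-BFGS")
res$par  # close to the optimum c(1, 1)
```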
mknapsack Multiple Knapsack Problem Solver
Solves the multiple knapsack optimisation problem. Given a set of items, each with a volume and a value, it will allocate them to knapsacks of a given size so that the value of the top N knapsacks is as large as possible.
ML.MSBD Maximum Likelihood Inference on Multi-State Trees
Inference of a multi-state birth-death model from a phylogeny, comprising a number of states N, birth and death rates for each state, and on which edges each state appears. Inference is done using a hybrid approach: states are progressively added in a greedy approach, and for a fixed number of states N the best model is selected via maximum likelihood. Reference: J. Barido-Sottani and T. Stadler (2017) <doi:10.1101/215491>.
mlapi Abstract Classes for Building ‘scikit-learn’ Like API
Provides ‘R6’ abstract classes for building machine learning models with a ‘scikit-learn’-like API. <http://…/> is a popular module for the ‘Python’ programming language whose design has become a de facto industry standard for machine learning tasks.
MlBayesOpt Hyper Parameter Tuning for Machine Learning, Using Bayesian Optimization
Hyper-parameter tuning using Bayesian optimization (Shahriari et al. <doi:10.1109/JPROC.2015.2494218>) for support vector machines, random forests, and extreme gradient boosting (Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>). Unlike existing packages (e.g. ‘mlr’, ‘rBayesianOptimization’, or ‘xgboost’), there is no need to adapt your code to a particular package or machine learning method. You just prepare a data frame with feature vectors and a label column of any class (‘character’, ‘factor’, ‘integer’). Moreover, to write an optimization function, you only have to specify the data and the column name of the label to classify.
MLCIRTwithin Latent Class Item Response Theory Models Under ‘Within-Item Multi-Dimensionality’
Framework for the Item Response Theory analysis of dichotomous and ordinal polytomous outcomes under the assumption of within-item multi-dimensionality and discreteness of the latent traits. The fitting algorithms allow for missing responses and for different item parameterizations and are based on the Expectation-Maximization paradigm. Individual covariates affecting the class weights may be included in the new version.
MLCM Maximum Likelihood Conjoint Measurement
Conjoint measurement is a psychophysical procedure in which stimulus pairs are presented that vary along 2 or more dimensions and the observer is required to compare the stimuli along one of them. This package contains functions to estimate the contribution of the n scales to the judgment by a maximum likelihood method under several hypotheses of how the perceptual dimensions interact. Reference: Knoblauch & Maloney (2012) ‘Modeling Psychophysical Data in R’. <doi:10.1007/978-1-4614-4475-6>.
MLDAShiny Interactive Document for Working with Discriminant Analysis
An interactive document on the topic of linear discriminant analysis using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
mldr Exploratory Data Analysis and Manipulation of Multi-Label Data Sets
Exploratory data analysis and manipulation functions for multi-label data sets along with interactive Shiny application to ease their use.
mle.tools Expected/Observed Fisher Information and Bias-Corrected Maximum Likelihood Estimate(s)
Calculates the expected/observed Fisher information and the bias-corrected maximum likelihood estimate(s) via the Cox-Snell methodology.
mleap Interface to ‘MLeap’
A ‘sparklyr’ <https://spark.rstudio.com> extension that provides an interface to ‘MLeap’ <https://…/mleap>, an open source library that enables exporting and serving of ‘Apache Spark’ pipelines.
mlergm Multilevel Exponential-Family Random Graph Models
Estimates exponential-family random graph models for multilevel network data, assuming the multilevel structure is observed. The scope, at present, covers multilevel models where the set of nodes is nested within known blocks. The estimation method uses Monte-Carlo maximum likelihood estimation (MCMLE) methods to estimate a variety of canonical or curved exponential family models for binary random graphs. MCMLE methods for curved exponential-family random graph models can be found in Hunter and Handcock (2006) <DOI: 10.1198/106186006X133069>. The package supports parallel computing, and provides methods for assessing goodness-of-fit of models and visualization of networks.
mlf Machine Learning Foundations
Offers a gentle introduction to machine learning concepts for practitioners with a statistical pedigree: decomposition of model error (bias-variance trade-off), nonlinear correlations, information theory and functional permutation/bootstrap simulations. Székely GJ, Rizzo ML, Bakirov NK. (2007). <doi:10.1214/009053607000000505>. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. (2011). <doi:10.1126/science.1205438>.
mlflow Interface to ‘MLflow’
R interface to ‘MLflow’, an open source platform for the complete machine learning life cycle; see <https://…/>. This package supports installing ‘MLflow’, tracking experiments, creating and running projects, and saving and serving models.
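A hedged sketch of experiment tracking with the R client (function names follow the ‘mlflow’ R API; treat as indicative):
    library(mlflow)
    mlflow_start_run()
    mlflow_log_param("alpha", 0.5)    # record a hyperparameter
    mlflow_log_metric("rmse", 0.27)   # record a performance metric
    mlflow_end_run()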
MLID Multilevel Index of Dissimilarity
Tools and functions to fit a multilevel index of dissimilarity.
mljar R API for MLJAR
Provides an R API wrapper for ‘mljar.com’, a web service allowing for on-line training of machine learning models (see <https://mljar.com> for more information).
mlma Multilevel Mediation Analysis
Used for mediation analysis with generalized multilevel models.
mlmc Multi-Level Monte Carlo
An implementation of Multi-level Monte Carlo for R. This package builds on the original ‘Matlab’ and C++ implementations by Mike Giles to provide a full MLMC driver and example level samplers. Built-in multi-core parallel sampling of levels is provided.
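A hedged sketch using the bundled ‘opre_l’ example level sampler (argument names and order per the package documentation; treat as indicative):
    library(mlmc)
    # Accuracy eps = 0.01, levels 2..6, 1000 initial samples per level;
    # 'option = 1' is passed through to the opre_l sampler.
    res <- mlmc(Lmin = 2, Lmax = 6, N0 = 1000, eps = 0.01,
                mlmc_l = opre_l, option = 1)
    res$P  # the multilevel estimate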
MLmetrics Machine Learning Evaluation Metrics
A collection of evaluation metrics, including loss, score and utility functions, that measure regression and classification performance.
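Typical calls look like the following (argument order is y_pred, then y_true):
    library(MLmetrics)
    y_true <- c(1, 0, 1, 1, 0)
    y_prob <- c(0.9, 0.2, 0.7, 0.6, 0.4)
    AUC(y_pred = y_prob, y_true = y_true)      # area under the ROC curve
    LogLoss(y_pred = y_prob, y_true = y_true)  # cross-entropy loss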
mlmm Multilevel Model for Multivariate Responses with Missing Values
Conducts Bayesian inference regression for responses with multilevel explanatory variables and missing values (Zeng ISL (2017) <doi:10.1101/153049>). Functions are provided, utilizing ‘Stan’ (software implementing posterior sampling with Hamiltonian Monte Carlo and its No-U-Turn variant), to implement the posterior sampling of regression coefficients from the multilevel regression models. The package has two main functions, one for not-missing-at-random missing responses and one for left-censored responses with not-missing-at-random missingness. The purpose is to provide a format similar to other R regression functions while using ‘Stan’ models.
mlmm.gwas Pipeline for GWAS Using MLMM
Pipeline for Genome-Wide Association Study using the Multi-Locus Mixed Model from Segura V, Vilhjálmsson BJ et al. (2012) <doi:10.1038/ng.2314>. The pipeline includes detection of associated SNPs with MLMM, model selection by lowest eBIC and marker selection by p-value threshold, estimation of the effects of the SNPs in the selected model, and graphical functions.
MLPUGS Multi-Label Prediction Using Gibbs Sampling (and Classifier Chains)
An implementation of classifier chains (CCs) for multi-label prediction. Users can employ an external package (e.g. ‘randomForest’, ‘C50’), or supply their own. The package can train a single set of CCs or train an ensemble of CCs, in parallel if running in a multi-core environment. New observations are classified using a Gibbs sampler since each unobserved label is conditioned on the others. The package includes methods for evaluating the predictions for accuracy and aggregating across iterations and models to produce binary or probabilistic classifications.
mlrCPO Composable Preprocessing Operators and Pipelines for Machine Learning
Toolset that enriches ‘mlr’ with a diverse set of preprocessing operators. Composable Preprocessing Operators (‘CPO’s) are first-class R objects that can be applied to data.frames and ‘mlr’ ‘Task’s to modify data, can be attached to ‘mlr’ ‘Learner’s to add preprocessing to machine learning algorithms, and can be composed to form preprocessing pipelines.
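A hedged sketch of CPO composition (operator and constructor names as documented by the package; treat as indicative):
    library(mlr)
    library(mlrCPO)
    cpo <- cpoScale() %>>% cpoPca()             # compose two preprocessing operators
    task <- iris.task %>>% cpo                  # apply the pipeline to a Task
    lrn <- cpo %>>% makeLearner("classif.lda")  # or attach it to a Learner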
MLRShiny Interactive Application for Working with Multiple Linear Regression
An interactive application for working with the multiple linear regression technique. The application has a template for solving problems on multiple linear regression. Runtime examples are provided in the package function as well as at <https://…/>.
mlt Most Likely Transformations
Likelihood-based estimation of conditional transformation models via the most likely transformation approach.
mlt.docreg Most Likely Transformations: Documentation and Regression Tests
Additional documentation, a package vignette and regression tests for package mlt.
mltools Machine Learning Tools
A collection of machine learning helper functions, particularly assisting in the Exploratory Data Analysis phase. Makes heavy use of the ‘data.table’ package for optimal speed and memory efficiency. Highlights include a versatile bin_data() function, sparsify() for converting a data.table to sparse matrix format with one-hot encoding, fast evaluation metrics, and empirical_cdf() for calculating empirical Multivariate Cumulative Distribution Functions.
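Indicative usage of the highlighted helpers (argument details may differ slightly):
    library(data.table)
    library(mltools)
    dt <- data.table(x = rnorm(100), grp = factor(sample(c("a", "b"), 100, TRUE)))
    bin_data(dt$x, bins = 5)                    # discretize a continuous variable
    sparsify(dt)                                # sparse matrix with one-hot encoding
    empirical_cdf(dt$x, ubounds = c(-1, 0, 1))  # empirical CDF at given bounds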
mlVAR Multi-Level Vector Autoregression
Computes estimates of the multivariate vector autoregression model, as used (but not limited) to analyze experience sampling method data in clinical psychology. The model can be extended through treatment effects, covariates and pre- and post-assessment effects.
mlvocab Vocabulary and Corpus Preprocessing for Natural Language Pipelines
Utilities for preprocessing of text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, term-doc, doc-term, term co-occurrence matrices etc. All functions allow for full or partial hashing of the terms in the vocabulary.
MLZ Mean Length-Based Estimators of Mortality using TMB
Estimation functions and diagnostic tools for mean length-based total mortality estimators based on Gedamke and Hoenig (2006) <doi:10.1577/T05-153.1>.
mma Multiple Mediation Analysis
Used for general multiple mediation analysis. The analysis method is described in Yu et al. (2014).
mmabig Multiple Mediation Analysis for Big Data Sets
Used for general multiple mediation analysis with big data sets.
MMAC Data for Mathematical Modeling and Applied Calculus
Contains the data sets for the textbook ‘Mathematical Modeling and Applied Calculus’ by Joel Kilty and Alex M. McAllister. The book will be published by Oxford University Press in 2018 with ISBN-13: 978-019882472.
mmapcharr Memory-Map Character Files
Uses memory-mapping to enable the random access of elements of a text file of characters separated by characters as if it were a simple R(cpp) matrix.
mmc Multivariate Measurement Error Correction
Provides routines for multivariate measurement error correction. Includes procedures for linear, logistic and Cox regression models. Bootstrapped standard errors and confidence intervals can be obtained for corrected estimates.
mme Multinomial Mixed Effects Models
Fit Gaussian Multinomial mixed-effects models for small area estimation: Model 1, with one random effect in each category of the response variable (Lopez-Vizcaino,E. et al., 2013) <doi:10.1177/1471082X13478873>; Model 2, introducing independent time effect; Model 3, introducing correlated time effect. mme calculates direct and parametric bootstrap MSE estimators (Lopez-Vizcaino,E et al., 2014) <doi:10.1111/rssa.12085>.
MMLR Fitting Markov-Modulated Linear Regression Models
A set of tools for fitting Markov-modulated linear regression, where responses Y(t) are time-additive and the model operates in an external environment described as a continuous-time Markov chain with finite state space. The model was proposed by Alexander Andronov (2012) <arXiv:1901.09600v1>, and the parameter estimation algorithm is based on eigenvalue and eigenvector decomposition. The package also provides a set of data simulation tools for Markov-modulated linear regression (for academic/research purposes). Research project No. 1.1.1.2/VIAA/1/16/075.
mmmgee Simultaneous Inference for Multiple Linear Contrasts in GEE Models
Provides global hypothesis tests, multiple testing procedures and simultaneous confidence intervals for multiple linear contrasts of regression coefficients in a single generalized estimating equation (GEE) model or across multiple GEE models. GEE models are fit by a modified version of the ‘geeM’ package.
mMPA Implementation of Marker-Assisted Mini-Pooling with Algorithm
Determines the number of quantitative assays needed for a sample of data using pooled testing methods, which include mini-pooling (MP), MP with algorithm (MPA), and marker-assisted MPA (mMPA). To estimate the number of assays needed, the package also provides a tool to conduct Monte Carlo (MC) simulation of the different orders in which the sample could be collected to form pools. Using MC avoids the dependence of the estimated number of assays on any specific ordering of the samples to form pools.
mmpf Monte-Carlo Methods for Prediction Functions
Marginalizes prediction functions using Monte-Carlo integration and computes permutation importance.
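A hedged sketch of the package's marginalPrediction() interface (argument names per its documentation; treat as indicative):
    library(mmpf)
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    # Marginalize the prediction function over the other predictors, on a
    # 10-point grid for 'wt', sampling 20 rows for the Monte-Carlo integration.
    marginalPrediction(data = mtcars, vars = "wt", n = c(10, 20), model = fit)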
mmpp Various Similarity and Distance Metrics for Marked Point Processes
Compute similarities and distances between marked point processes.
mmppr Markov Modulated Poisson Process for Unsupervised Event Detection in Time Series of Counts
Time-series of count data occur in many different contexts. A Markov-modulated Poisson process provides a framework for detecting anomalous events using an unsupervised learning approach.
MMPPsampler Efficient Gibbs-Sampler for Markov-Modulated-Poisson-Processes
Efficient implementation of the Gibbs sampler by Fearnhead and Sherlock (2006) <DOI:10.1111/j.1467-9868.2006.00566.x> for the Markov-modulated Poisson process, using ‘C++’ via the ‘Rcpp’ interface. Fearnhead and Sherlock proposed an exact Gibbs sampler for performing Bayesian inference on Markov-modulated Poisson processes. This package is an efficient implementation of their proposal for binned data. Furthermore, the package contains an efficient implementation of the hierarchical MMPP framework proposed by Clausen, Adams, and Briers (2017) <https://…/Master_thesis_Henry.pdf>, which is tailored towards inference on network flow arrival data and extends Fearnhead and Sherlock's Gibbs sampler. Both frameworks benefit greatly from routines that are optimised for this specific problem in order to remain scalable and efficient for large amounts of input data. These optimised routines include matrix exponentiation and multiplication, and endpoint-conditioned Markov process sampling. Both implementations require an input vector that contains the binned observations, the length of a binning interval, the number of states of the hidden Markov process, and loose prior hyperparameters. In return, the user receives the desired number of sample trajectories of the hidden Markov process as well as the likelihood of each trajectory.
mmsample Multivariate Matched Sampling
Subset a control group to match an intervention group on a set of features using multivariate matching and propensity score calipers. Based on methods in Rosenbaum and Rubin (1985).
mmtfa Model-Based Clustering and Classification with Mixtures of Modified t Factor Analyzers
Fits a family of mixtures of multivariate t-distributions under a continuous t-distributed latent variable structure for the purpose of clustering or classification. The alternating expectation-conditional maximization algorithm is used for parameter estimation.
mmtsne Multiple Maps t-SNE
An implementation of multiple maps t-distributed stochastic neighbor embedding (t-SNE). Multiple maps t-SNE is a method for projecting high-dimensional data into several low-dimensional maps such that non-metric space properties are better preserved than they would be by a single map. Multiple maps t-SNE with only one map is equivalent to standard t-SNE. When projecting onto more than one map, multiple maps t-SNE estimates a set of latent weights that allow each point to contribute to one or more maps depending on similarity relationships in the original data. This implementation is a port of the original ‘Matlab’ library by Laurens van der Maaten. See Van der Maaten and Hinton (2012) <doi:10.1007/s10994-011-5273-4>. This material is based upon work supported by the United States Air Force and Defense Advanced Research Project Agency (DARPA) under Contract No. FA8750-17-C-0020. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and Defense Advanced Research Projects Agency. Distribution Statement A: Approved for Public Release; Distribution Unlimited.
MMWRweek Convert Dates to MMWR Day, Week, and Year
The first day of any MMWR week is Sunday. MMWR week numbering is sequential beginning with 1 and incrementing with each week to a maximum of 52 or 53. MMWR week #1 of an MMWR year is the first week of the year that has at least four days in the calendar year. This package provides functionality to convert Dates to MMWR day, week, and year and the reverse.
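An indicative round trip between calendar dates and MMWR values:
    library(MMWRweek)
    MMWRweek(as.Date("2015-01-01"))  # MMWR year, week and day for a date
    MMWRweek2Date(MMWRyear = 2015, MMWRweek = 1, MMWRday = 5)  # and back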
mnlfa Moderated Nonlinear Factor Analysis
Conducts moderated nonlinear factor analysis (e.g., Curran et al., 2014, <doi:10.1080/00273171.2014.889594>). Regularization methods are implemented for assessing non-invariant items. Currently, the package includes dichotomous items and unidimensional item response models. Extensions will be included in future package versions.
MNLR Interactive Shiny Presentation for Working with Multinomial Logistic Regression
An interactive presentation on the topic of multinomial logistic regression. It is helpful to those who want to learn multinomial logistic regression quickly and gain hands-on experience. The presentation has a template for solving problems on multinomial logistic regression. Runtime examples are provided in the package function as well as at <https://…/MultinomPresentation>.
mnreadR MNREAD Parameters Estimation
Allows the analysis of MNREAD data. The MNREAD Acuity Charts are continuous-text reading acuity charts for normal and low vision. Provides the necessary functions to automatically estimate the four MNREAD parameters: Maximum Reading Speed, Critical Print Size, Reading Acuity and Reading Accessibility Index (Calabrese et al (2016) <doi:10.1001/jamaophthalmol.2015.6097>).
MNS Mixed Neighbourhood Selection
An implementation of the mixed neighbourhood selection (MNS) algorithm. The MNS algorithm can be used to estimate multiple related precision matrices. In particular, the motivation behind this work was driven by the need to understand functional connectivity networks across multiple subjects. This package also contains an implementation of a novel algorithm through which to simulate multiple related precision matrices which exhibit properties frequently reported in neuroimaging analysis.
mobForest Model Based Random Forest Analysis
Functions implementing the random forest method for model-based recursive partitioning. The mob() function, developed by Zeileis et al. (2008) in the ‘party’ package, is modified to construct model-based decision trees based on random forest methodology. The main input function mobforest.analysis() takes all input parameters to construct trees and compute out-of-bag errors, predictions, and the overall accuracy of the forest. The algorithm performs parallel computation using cluster functions from the ‘parallel’ package.
MobileTrigger Run Reports, Models, and Scripts from a Mobile Device
A framework for interacting with R modules such as Reports, Models, and Scripts from a mobile device. The framework allows you to list available modules and select a module of interest using a basic e-mail interface. After selecting a specific module, you can either run it as is or provide input via the e-mail interface. After parsing your request, R will send the results back to your mobile device.
moc General Nonlinear Multivariate Finite Mixtures
Fits and visualizes user-defined finite mixture models for multivariate observations using maximum likelihood. (McLachlan, G., Peel, D. (2000) Finite Mixture Models. Wiley-Interscience.)
moc.gapbk Multi-Objective Clustering Algorithm Guided by a-Priori Biological Knowledge
Implements the Multi-Objective Clustering Algorithm Guided by a-Priori Biological Knowledge (MOC-GaPBK), which was proposed by Parraga-Alava, J. et al. (2018) <doi:10.1186/s13040-018-0178-4>.
mockery Mocking Library for R
The two main functionalities of this package are creating mock objects (functions) and selectively intercepting calls to a given function that originate in some other function. It can be used with any testing framework available for R. Mock objects can be injected with either this package’s own stub() function or a similar with_mock() facility present in the testthat package.
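A hedged sketch of stubbing a dependency inside a testthat test (stub() and mock() per the package interface; treat as indicative):
    library(testthat)
    library(mockery)
    fetch_data <- function() stop("no network in tests")
    count_rows <- function() nrow(fetch_data())
    test_that("count_rows uses fetched data", {
      # intercept calls to fetch_data() made from within count_rows()
      stub(count_rows, "fetch_data", mock(data.frame(x = 1:3)))
      expect_equal(count_rows(), 3)
    })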
mockr Mocking in R
Provides a means to mock a package function, i.e., temporarily substitute it for testing. Designed as a drop-in replacement for ‘testthat::with_mock()’, which may break in R 3.4.0 and later.
Modalclust Hierarchical Modal Clustering
Performs Modal Clustering (MAC) including Hierarchical Modal Clustering (HMAC) along with their parallel implementation (PHMAC) over several processors. These model-based non-parametric clustering techniques can extract clusters in very high dimensions with arbitrary density shapes. By default clustering is performed over several resolutions and the results are summarised as a hierarchical tree. Associated plot functions are also provided. There is a package vignette that provides many examples. This version adheres to the CRAN policy of not spawning more than two child processes by default.
modcmfitr Fit a Modified Connor-Mosimann Distribution to Elicited Quantiles in Multinomial Problems
Fits a modified version of the Connor-Mosimann distribution (Connor & Mosimann (1969) <doi:10.2307/2283728>), a Connor-Mosimann distribution or a Dirichlet distribution (e.g. Gelman, Carlin, Stern & Rubin, Chapter 3.5 (2004, <ISBN:1-58488-388-X>)) to elicited quantiles of a multinomial distribution. Code is also provided to sample from the distributions, generating inputs suitable for a probabilistic sensitivity analysis / Monte Carlo simulation in a decision model.
model4you Stratified and Personalised Models Based on Model-Based Trees and Forests
Model-based trees for subgroup analyses in clinical trials and model-based forests for the estimation and prediction of personalised treatment effects (personalised models). Currently partitioning of linear models, lm(), generalised linear models, glm(), and Weibull models, survreg(), is supported. Advanced plotting functionality is supported for the trees and a test for parameter heterogeneity is provided for the personalised models. For details on model-based trees for subgroup analyses see Seibold, Zeileis and Hothorn (2016) <doi:10.1515/ijb-2015-0032>; for details on model-based forests for estimation of individual treatment effects see Seibold, Zeileis and Hothorn (2017) <doi:10.1177/0962280217693034>.
Modelcharts Classification Model Charts
Provides two functions for producing gain charts and lift charts for any classification model.
modelDown Generate a Website with HTML Summaries for Predictive Models
modelDown generates a website with HTML summaries for predictive models. It uses ‘DALEX’ explainers to compute and plot summaries of how given models behave. We can see how exactly scores for predictions were calculated (Prediction BreakDown), how much each variable contributes to predictions (Variable Response), which variables are the most important for a given model (Variable Importance) and how well our models behave (Model Performance).
Modeler Classes and Methods for Training and Using Binary Prediction Models
Defines classes and methods to learn models and use them to predict binary outcomes. These are generic tools, but we also include specific examples for many common classifiers.
modelgrid A Framework for Creating, Managing and Training Multiple Caret Models
A minimalistic but flexible framework that facilitates the creation, management and training of multiple ‘caret’ models. A model grid consists of two components: (1) a set of settings that is shared by all models by default, and (2) specifications that apply only to the individual models. When the model grid is trained, model and training specifications are first consolidated from the shared and the model specific settings into complete ‘caret’ model configurations. These models are then trained with the ‘train’ function from the ‘caret’ package.
modellingTools A Collection of Useful Custom Tools for D&A and Modelling
Useful functions for performing common data analysis tasks that I could not find in other packages. For example, flexible functions for discretizing (‘binning’) continuous variables are included, since this is a common technique used in industry that is not well appreciated in academia. See the vignette for a more detailed high-level explanation, or the documentation for full details.
ModelMetrics Rapid Calculation of Model Metrics
Collection of metrics for evaluating models written in C++ using ‘Rcpp’.
modelObj A Model Object Framework for Regression Analysis
A utility library to facilitate the generalization of statistical methods built on a regression framework. Package developers can use ‘modelObj’ methods to initiate a regression analysis without concern for the details of the regression model and the method to be used to obtain parameter estimates. The specifics of the regression step are left to the user to define when calling the function. The user of a function developed within the ‘modelObj’ framework creates as input a ‘modelObj’ that contains the model and the R methods to be used to obtain parameter estimates and to obtain predictions. In this way, a user can easily go from linear to non-linear models within the same package.
modelplotr Plots to Evaluate the Business Performance of Predictive Models
Plots to assess the quality of predictive models from a business perspective. Using these plots, it can be shown how implementation of the model will impact business targets like response on a campaign or return on investment. Different scopes can be selected: compare models, compare datasets or compare target class values, and various plot customization and highlighting options are available.
modelr Modelling Functions that Work with the Pipe
Functions for modelling that help you seamlessly integrate modelling into a pipeline of data manipulation and visualisation.
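Indicative usage: augment a data frame with model predictions and residuals as new columns:
    library(modelr)
    mod <- lm(mpg ~ wt, data = mtcars)
    out <- add_residuals(add_predictions(mtcars, mod), mod)
    head(out[, c("mpg", "wt", "pred", "resid")])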
modelwordcloud Model Word Clouds
Makes a word cloud of text, sized by the frequency of the word, and colored either by user-specified colors or colored by the strength of the coefficient of that text derived from a regression model.
moderndive Accompaniment Package to ModernDive: An Introduction to Statistical and Data Sciences via R
An accompaniment R package to ModernDive: An Introduction to Statistical and Data Sciences via R available at <http://…/>, in particular wrapper functions targeted at novices to easily generate tidy linear regression output.
modes Find the Modes and Assess the Modality of Complex and Mixture Distributions, Especially with Big Datasets
Designed with a dual purpose of accurately estimating the mode (or modes) as well as characterizing the modality of data. The specific application area includes complex or mixture distributions, particularly in a big data environment. The heterogeneous nature of (big) data may require deep introspective statistical and machine learning techniques, but these statistical tools often fail when applied without first understanding the data. In small datasets this often isn't a big issue, but in large-scale data analysis or big data, thoroughly inspecting each dimension typically yields an O(n^(n-1)) problem. As such, dealing with big data requires an alternative toolkit. This package not only identifies the mode or modes for various data types, it also provides a programmatic way of understanding the modality (i.e. unimodal, bimodal, etc.) of a dataset (whether it's big data or not). See <http://…/modes_package> for examples and discussion.
modesto Modeling and Analysis of Stochastic Systems
Computes important quantities for stochastic systems that are observed continuously, such as the transition matrix, transition distribution and occupancy matrix. The methods are described in, for example, Ross S. (2014), Introduction to Probability Models, Eleventh Edition, Academic Press.
modeval Evaluation of Classification Model Options
Designed to assist novice to intermediate analysts in choosing an optimal classification model, particularly for working with relatively small data sets. It provides cross-validated results comparing several different models at once using a consistent set of performance metrics, so users can home in on the most promising approach rather than attempting single model fittings one at a time. The package predefines the 12 most common classification models, although users are free to select from the 200+ other options available in the ‘caret’ package.
modi Multivariate Outlier Detection and Imputation for Incomplete Survey Data
Algorithms for multivariate outlier detection when missing values occur. Algorithms are based on Mahalanobis distance or data depth. Imputation is based on the multivariate normal model or uses nearest neighbour donors. The algorithms take sample designs, in particular weighting, into account. The methods are described in Bill and Hulliger (2016) <doi:10.17713/ajs.v45i1.86>.
modifiedmk Modified Mann Kendall Trend Tests with Variance Correction Approach
The power of the non-parametric Mann-Kendall test is highly influenced by serially correlated data. To address this issue, the original time series is modified by removing any trend component existing in the data and calculating the effective sample size. Hamed, K. H., & Ramachandra Rao, A. (1998). A modified Mann-Kendall trend test for autocorrelated data. Journal of Hydrology, 204(1-4), 182-196. <doi:10.1016/S0022-1694(97)00125-X>. Yue, S., & Wang, C. Y. (2004). The Mann-Kendall test modified by effective sample size to detect trend in serially correlated hydrological series. Water Resources Management, 18(3), 201-218. <doi:10.1023/B:WARM.0000043140.61082.60>.
MODIS Acquisition and Processing of MODIS Products
Download and processing functionalities for the Moderate Resolution Imaging Spectroradiometer (MODIS). The package provides automated access to the global online data archives LP DAAC (<https://…/> ), LAADS (<https://…/> ) and NSIDC (<https://…/> ) as well as processing capabilities such as file conversion, mosaicking, subsetting and time series filtering.
modmarg Calculating Marginal Effects and Levels with Errors
Calculate predicted levels and marginal effects from ‘glm’ objects, using the delta method to calculate standard errors. This is an R-based version of the ‘margins’ command from Stata.
modMax Community Structure Detection via Modularity Maximization
The algorithms implemented here are used to detect the community structure of a network. These algorithms follow different approaches, but are all based on the concept of modularity maximization.
modopt.matlab ‘MatLab’-Style Modeling of Optimization Problems
‘MatLab’-Style Modeling of Optimization Problems with ‘R’. This package provides a set of convenience functions to transform a ‘MatLab’-style optimization modeling structure to its ‘ROI’ equivalent.
modQR Multiple-Output Directional Quantile Regression
Contains basic tools for performing multiple-output quantile regression and computing regression quantile contours by means of directional regression quantiles. In the location case, one can thus obtain halfspace depth contours in two to six dimensions.
modTurPoint Estimate ED50 Based on Modified Turning Point Method
The turning point method was proposed by Choi (1990) <doi:10.2307/2531453> to estimate the 50 percent effective dose (ED50) in studies of drug sensitivity. The method's advantage is that it can provide robust ED50 estimation. This package implements a modified version of Choi's turning point method.
modules Self Contained Units of Source Code
Provides modules as an organizational unit for source code. Modules enforce more rigour when defining dependencies and have a local search path. They can be used as a sub-unit within packages or in scripts.
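An indicative sketch of defining and calling into a module (module() and import() per the package interface):
    library(modules)
    m <- module({
      import("stats")                 # declare the dependency explicitly
      zscore <- function(x) (x - mean(x)) / sd(x)
    })
    m$zscore(1:10)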
MoEClust Parsimonious Model-Based Clustering with Covariates
Clustering via parsimonious Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2017) <arXiv:1711.05632>. This package fits finite Gaussian mixture models with gating and expert network covariates using parsimonious covariance parameterisations from ‘mclust’ via the EM algorithm. Visualisation of the results of such models using generalised pairs plots is also facilitated.
moezipfR Marshall-Olkin Extended Zipf
Statistical utilities for the analysis of data by means of the Marshall-Olkin Extended Zipf distribution. By plotting the probabilities in log-log scale, this two parameter distribution allows a concave as well as a convex behavior of the function at the beginning of the distribution, maintaining the linearity in the tail. The model contains the Zipf model as a particular case.
mogavs Multiobjective Genetic Algorithm for Variable Selection in Regression
Functions for exploring the best subsets in regression with a genetic algorithm. The package is much faster than methods relying on complete enumeration, and is suitable for datasets with a large number of variables.
moko Multi-Objective Kriging Optimization
Multi-Objective optimization based on the Kriging metamodel. Important functions: mkm, VMPF, MEGO and HEGO.
MoLE Modeling Language Evolution
Model for simulating language evolution in terms of cultural evolution (Smith & Kirby (2008) <DOI:10.1098/rstb.2008.0145>; Deacon 1997). The focus is on the emergence of argument-marking systems (Dowty (1991) <DOI:10.1353/lan.1991.0021>, Van Valin 1999, Dryer 2002, Lestrade 2015a), i.e. noun marking (Aristar (1997) <DOI:10.1075/sl.21.2.04ari>, Lestrade (2010) <DOI:10.7282/T3ZG6R4S>), person indexing (Ariel 1999, Dahl (2000) <DOI:10.1075/fol.7.1.03dah>, Bhat 2004), and word order (Dryer 2013), but extensions are foreseen. Agents start out with a protolanguage (a language without grammar; Bickerton (1981) <DOI:10.17169/langsci.b91.109>, Jackendoff 2002, Arbib (2015) <DOI:10.1002/9781118346136.ch27>) and interact through language games (Steels 1997). Over time, grammatical constructions emerge that may or may not become obligatory (for which the tolerance principle is assumed; Yang 2016). Throughout the simulation, uniformitarianism of principles is assumed (Hopper (1987) <DOI:10.3765/bls.v13i0.1834>, Givon (1995) <DOI:10.1075/z.74>, Croft (2000), Saffran (2001) <DOI:10.1111/1467-8721.01243>, Heine & Kuteva 2007), in which maximal psychological validity is aimed at (Grice (1975) <DOI:10.1057/9780230005853_5>, Levelt 1989, Gaerdenfors 2000) and language representation is usage based (Tomasello 2003, Bybee 2010). In Lestrade (2015b) <DOI:10.15496/publikation-8640>, Lestrade (2015c) <DOI:10.1075/avt.32.08les>, and Lestrade (2016) <DOI:10.17617/2.2248195>, which reported on the results of preliminary versions, this package was announced as WDWTW (for who does what to whom), but for reasons of pronunciation and generalization the title was changed.
MOLHD Multiple Objective Latin Hypercube Design
Generate the optimal maximin distance, minimax distance (only for low dimensions), and maximum projection designs within the class of Latin hypercube designs efficiently for computer experiments. Generate Pareto front optimal designs for each pair of the three criteria and for all three criteria within the class of Latin hypercube designs efficiently. Provide criterion computing functions. References for this package can be found in Morris, M. D. and Mitchell, T. J. (1995) <doi:10.1016/0378-3758(94)00035-T>, Lu Lu, Christine M. Anderson-Cook and Timothy J. Robinson (2011) <doi:10.1198/Tech.2011.10087>, and Joseph, V. R., Gul, E., and Ba, S. (2015) <doi:10.1093/biomet/asv002>.
momentchi2 Moment-Matching Methods for Weighted Sums of Chi-Squared Random Variables
A collection of moment-matching methods for computing the cumulative distribution function of a positively-weighted sum of chi-squared random variables. Methods include the Satterthwaite-Welch method, Hall-Buckley-Eagleson method, Wood’s F method, and the Lindsay-Pilla-Basak method.
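Indicative calls, computing P(Q <= x) for a positively-weighted sum Q of chi-squared(1) variables:
    library(momentchi2)
    w <- c(1.5, 1.5, 0.5, 0.5)    # weights of the chi-squared variables
    hbe(coeff = w, x = 10.203)    # Hall-Buckley-Eagleson
    sw(coeff = w, x = 10.203)     # Satterthwaite-Welch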
momentuHMM Maximum Likelihood Analysis of Animal Movement Behavior Using Multivariate Hidden Markov Models
Extended tools for analyzing telemetry data using (multivariate) hidden Markov models. These include processing of tracking data, fitting HMMs to location and auxiliary biotelemetry or environmental data, multiple imputation for incorporating location measurement error and missing data, visualization of data and fitted model, decoding of the state process…
Mondrian A Simple Graphical Representation of the Relative Occurrence and Co-Occurrence of Events
The unique function of this package allows representing in a single graph the relative occurrence and co-occurrence of events measured in a sample. As examples, the package was applied to describe the occurrence and co-occurrence of different species of bacterial or viral symbionts infecting arthropods at the individual level. The graphic allows one to determine the prevalence of each symbiont and the patterns of multiple infections (i.e. how different symbionts share or not the same individual hosts). We named the package after the famous painter, as the graphical output recalls Mondrian’s paintings.
MonetDBLite In-Process Version of MonetDB for R
An in-process version of MonetDB, a relational database focused on analytical tasks. Similar to SQLite, the database runs entirely inside the R shell, with the main difference that queries complete much faster thanks to MonetDB’s columnar architecture.
monkeylearn Accesses the Monkeylearn API for Text Classifiers and Extractors
Allows using some services of Monkeylearn <http://…/> which is a Machine Learning platform on the cloud for text analysis (classification and extraction).
MonoInc Monotonic Increasing
Various imputation methods are utilized in this package, where one can flag and impute non-monotonic data that is outside of a prespecified range.
monoreg Bayesian Monotonic Regression Using a Marked Point Process Construction
An extended version of the nonparametric Bayesian monotonic regression procedure described in Saarela & Arjas (2011) <DOI:10.1111/j.1467-9469.2010.00716.x>, allowing for multiple additive monotonic components in the linear predictor, and time-to-event outcomes through case-base sampling. The extension and its applications, including estimation of absolute risks, are described in Saarela & Arjas (2015) <DOI:10.1111/sjos.12125>.
monotonicity Test for Monotonicity in Expected Asset Returns, Sorted by Portfolios
Test for monotonicity in financial variables sorted by portfolios. It is conventional practice in empirical research to form portfolios of assets ranked by a certain sort variable. A t-test is then used to consider the mean return spread between the portfolios with the highest and lowest values of the sort variable. Yet comparing only the average returns on the top and bottom portfolios does not provide a sufficient way to test for a monotonic relation between expected returns and the sort variable. This package provides nonparametric tests for the full set of monotonic patterns by Patton, A. and Timmermann, A. (2010) <doi:10.1016/0304-4076(89)90094-8> and compares the proposed results with extant alternatives such as t-tests, Bonferroni bounds, and multivariate inequality tests through empirical applications and simulations.
Monte.Carlo.se Monte Carlo Standard Errors
Computes Monte Carlo standard errors for summaries of Monte Carlo output. Summaries and their standard errors are based on columns of Monte Carlo simulation output. Dennis D. Boos and Jason A. Osborne (2015) <doi:10.1111/insr.12087>.
MonteCarlo Automatic Parallelized Monte Carlo Simulations
Simplifies Monte Carlo simulation studies by automatically setting up loops to run over parameter grids and parallelising the Monte Carlo repetitions. It also generates LaTeX tables.
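A hedged sketch (MonteCarlo() and MakeTable() per the package interface; treat as indicative): rejection rate of a t-test over a small parameter grid.
    library(MonteCarlo)
    ttest_sim <- function(n, loc) {
      p <- t.test(rnorm(n, mean = loc))$p.value
      list(reject = as.integer(p < 0.05))  # must return a named list
    }
    res <- MonteCarlo(func = ttest_sim, nrep = 500,
                      param_list = list(n = c(20, 50), loc = c(0, 0.5)))
    MakeTable(output = res, rows = "n", cols = "loc")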
morpheus Estimate Parameters of Mixtures of Logistic Regressions
Estimates the parameters of mixtures of logistic regressions with spectral methods. The main methods take d-dimensional inputs and a vector of binary outputs, and return parameters according to the GLM mixture model (generalized linear model). For more details see chapter 3 in the PhD thesis of Mor-Absa Loum: <http://…/s156435>, available here <https://…/these.compressed-2.pdf>.
MortalityGaps The Double-Gap Life Expectancy Forecasting Model
Life expectancy is highly correlated over time among countries and between males and females. These associations can be used to improve forecasts. Here we have implemented a method for forecasting female life expectancy based on analysis of the gap between female life expectancy in a country compared with the record level of female life expectancy in the world. Second, to forecast male life expectancy, the gap between male life expectancy and female life expectancy in a country is analysed. We named this method the Double-Gap model. For a detailed description of the method see Pascariu et al. (2017). <doi:10.1016/j.insmatheco.2017.09.011>.
MortalityTables A Framework for Various Types of Mortality / Life Tables
Classes to implement and plot cohort life tables for actuarial calculations. In particular, birth-year-dependent mortality tables using a yearly trend to extrapolate from a base year are implemented, as well as period life tables, cohort life tables using an age shift, and merged life tables.
mosaicCalc Function-Based Numerical and Symbolic Differentiation and Antidifferentiation
Part of the Project MOSAIC (<http://…/> ) suite that provides utility functions for doing calculus (differentiation and integration) in R. The main differentiation and antidifferentiation operators are described using formulas and return functions rather than numerical values. Numerical values can be obtained by evaluating these functions.
mosaicCore Common Utilities for Other MOSAIC-Family Packages
Common utilities used in other MOSAIC-family packages are collected here.
mosaicModel An Interface to Statistical Modeling Independent of Model Architecture
Provides functions for evaluating, displaying, and interpreting statistical models. The goal is to abstract the operations on models from the particular architecture of the model. For instance, calculating effect sizes rather than looking at coefficients. The package includes interfaces to both regression and classification architectures, including lm(), glm(), rlm() in ‘MASS’, random forests and recursive partitioning, k-nearest neighbors, linear and quadratic discriminant analysis, and models produced by the ‘caret’ package’s train(). It’s straightforward to add in other model architectures.
MOST Multiphase Optimization Strategy
Provides functions similar to the ‘SAS’ macros previously provided to accompany Collins, Dziak, and Li (2009) <DOI:10.1037/a0015826> and Dziak, Nahum-Shani, and Collins (2012) <DOI:10.1037/a0026972>, papers which outline practical benefits and challenges of factorial and fractional factorial experiments for scientists interested in developing biological and/or behavioral interventions, especially in the context of the multiphase optimization strategy (see Collins, Kugler & Gwadz 2016) <DOI:10.1007/s10461-015-1145-4>. The package currently contains three functions. First, RelativeCosts1() draws a graph of the relative cost of complete and reduced factorial designs versus other alternatives. Second, RandomAssignmentGenerator() returns a dataframe which contains a list of random numbers that can be used to conveniently assign participants to conditions in an experiment with many conditions. Third, FactorialPowerPlan() estimates the power, detectable effect size, or required sample size of a factorial or fractional factorial experiment, for main effects or interactions, given several possible choices of effect size metric, and allowing pretests and clustering.
mosum Moving Sum Based Procedures for Changes in the Mean
Implementations of MOSUM-based statistical procedures and algorithms for detecting multiple changes in the mean. This comprises the MOSUM procedure for estimating multiple mean changes from Eichinger and Kirch (2018) <doi:10.3150/16-BEJ887> and the multiscale algorithmic extensions from Cho and Kirch (2018+).
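A hedged sketch of the main mosum() call with bandwidth G (treat details as indicative):
    library(mosum)
    x <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))  # one mean change
    m <- mosum(x, G = 40)
    m$cpts  # estimated change-point locations (near 100 here)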
MoTBFs Learning Hybrid Bayesian Networks using Mixtures of Truncated Basis Functions
Learning, manipulation and evaluation of mixtures of truncated basis functions (MoTBFs), which include mixtures of polynomials (MOPs) and mixtures of truncated exponentials (MTEs). MoTBFs are a flexible framework for modelling hybrid Bayesian networks. The package provides functionality for learning univariate, multivariate and conditional densities, with the possibility of incorporating prior knowledge. Structural learning of hybrid Bayesian networks is also provided. A set of useful tools is provided, including plotting, printing and likelihood evaluation. This package makes use of S3 objects, with two new classes called ‘motbf’ and ‘jointmotbf’.
MotilityLab Quantitative Analysis of Motion
Statistics to quantify tracks of moving things (x-y-z-t data), such as cells, bacteria or animals. Available measures include mean square displacement, confinement ratio, autocorrelation, straightness, turning angle, and fractal dimension.
motmot.2.0 Models of Trait Macroevolution on Trees
Functions for fitting models of trait evolution on phylogenies for continuous traits. The majority of functions are described in Thomas and Freckleton (2011) <doi:10.1111/j.2041-210X.2011.00132.x> and include functions that allow tests of variation in the rates of trait evolution.
motoRneuron Analyzing Paired Neuron Discharge Times for Time-Domain Synchronization
The temporal relationship between motor neurons can offer explanations for neural strategies. We combined functions to reduce neuron action potential discharge data and analyze it for short-term, time-domain synchronization. Moreover, motoRneuron combines most available methods for determining cross-correlation histogram peaks and most available indices for calculating synchronization into simple functions. See Nordstrom, Fuglevand, and Enoka (1992) <doi:10.1113/jphysiol.1992.sp019244> for a more thorough introduction.
mountainplot Mountain Plots, Folded Empirical Cumulative Distribution Plots
Lattice functions for drawing folded empirical cumulative distribution plots, or mountain plots. A mountain plot is similar to an empirical CDF plot, except that the curve increases from 0 to 0.5, then decreases from 0.5 to 1 using an inverted scale at the right side. See: Monti (1995), Folded empirical distribution function curves-mountain plots. The American Statistician, 49, 342-345.
mousetrap Process and Analyze Mouse-Tracking Data
Mouse-tracking, the analysis of mouse movements in computerized experiments, is a method that is becoming increasingly popular in the cognitive sciences. The mousetrap package offers functions for importing, preprocessing, analyzing, aggregating, and visualizing mouse-tracking data.
movecost Calculation of Accumulated Cost Surface and Least-Cost Paths Related to Human Movement Across the Landscape
Provides the facility to calculate accumulated cost surface and least-cost paths using a number of human-movement-related cost functions that can be selected by the user. It just requires a Digital Terrain model, a start location and (optionally) destination locations.
moveVis Movement Data Visualization
Tools to visualize movement data of any kind, e.g. by creating path animations from GPS point data.
mpbart Multinomial Probit Bayesian Additive Regression Trees
Fits Multinomial Probit Bayesian Additive Regression Trees.
MPCI Multivariate Process Capability Indices (MPCI)
Performs the following multivariate process capability indices: Shahriari et al. (1995) multivariate capability vector, Taam et al. (1993) multivariate capability index (MCpm), the Pan and Lee (2010) proposal (NMCpm), and the following based on Principal Component Analysis (PCA): Wang and Chen (1998), Xekalaki and Perakis (2002) and Wang (2005). Two datasets are included.
mpcmp Mean-Parametrized Conway-Maxwell Poisson (COM-Poisson) Regression
A collection of functions for estimation, testing and diagnostic checking for the mean-parametrized Conway-Maxwell Poisson (COM-Poisson) regression model of Huang (2017) <doi:10.1177/1471082X17697749>.
mpe Multiple Primary Endpoints
Functions for calculating sample size and power for clinical trials with multiple (co-)primary endpoints.
MPkn Calculations of One Discrete Model in Several Time Steps
A matrix discrete model having the form ‘M[i+1] = (I + Q)*M[i]’. The values of ‘M[i]’ are calculated only for pre-selected values of ‘i’. The calculation method is presented in the vignette ‘Fundament’ (‘Base’) and may be the package author's own idea. A weakness is that the method gives information only at selected steps of the process; it mainly refers to cases with matrices that are not Markov chains. If ‘Q’ is a Markov transition matrix, then MUPkL() may be used to calculate the steady-state distribution ‘p’ for ‘p = Q*p’. Non-integer matrix powers (matrix.powerni()) give the same results as mpower() from package ‘matlib’. References: ‘Markov chains’, (<https://…arkov_chain#Expected_number_of_visits> ). Donald R. Burleson, Ph.D. (2005), ‘ON NON-INTEGER POWERS OF A SQUARE MATRIX’, (<http://…/Eigenvalues.htm> ).
mplot Graphical Model Stability and Variable Selection Procedures
Model stability and variable importance plots [Mueller and Welsh (2010, <doi:10.1111/j.1751-5823.2010.00108.x>); Murray, Heritier and Mueller (2013, <doi:10.1002/sim.5855>)] as well as the adaptive fence [Jiang et al. (2008, <doi:10.1214/07-AOS517>); Jiang et al. (2009, <doi:10.1016/j.spl.2008.10.014>)] for linear and generalised linear models.
mpr Multi-Parameter Regression (MPR)
Package for fitting Multi-Parameter Regression (MPR) models to right-censored survival data. These are flexible parametric regression models which extend standard models, for example, proportional hazards.
MPS Estimating Through the Maximum Product Spacing Approach
Developed for computing the probability density function, computing the cumulative distribution function, computing the quantile function, random generation, and estimating the parameters of 24 G-family of statistical distributions via the maximum product spacing approach introduced in <https://…/2345411>. The set of families contains: beta G distribution, beta exponential G distribution, beta extended G distribution, exponentiated G distribution, exponentiated exponential Poisson G distribution, exponentiated generalized G distribution, exponentiated Kumaraswamy G distribution, gamma type I G distribution, gamma type II G distribution, gamma uniform G distribution, gamma-X generated of log-logistic family of G distribution, gamma-X family of modified beta exponential G distribution, geometric exponential Poisson G distribution, generalized beta G distribution, generalized transmuted G distribution, Kumaraswamy G distribution, log gamma type I G distribution, log gamma type II G distribution, Marshall Olkin G distribution, Marshall Olkin Kumaraswamy G distribution, modified beta G distribution, odd log-logistic G distribution, truncated-exponential skew-symmetric G distribution, and Weibull G distribution.
Mqrcm M-Quantile Regression Coefficients Modeling
Parametric modeling of M-quantile regression coefficient functions.
mr.raps Two Sample Mendelian Randomization using Robust Adjusted Profile Score
Mendelian randomization is a method of identifying and estimating a confounded causal effect using genetic instrumental variables. This package implements methods for two-sample Mendelian randomization with summary statistics by using the Robust Adjusted Profile Score (RAPS). References: Qingyuan Zhao, Jingshu Wang, Jack Bowden, Dylan S. Small. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. <arXiv:1801.09652>.
mra Mark-Recapture Analysis
Accomplishes mark-recapture analysis with covariates. Models available include the Cormack-Jolly-Seber open population (Cormack (1972) <doi:10.2307/2556151>; Jolly (1965) <doi:10.2307/2333826>; Seber (1965) <doi:10.2307/2333827>) and Huggins’ (1989) <doi:10.2307/2336377> closed population. Link functions include logit, sine, and hazard. Model selection, model averaging, plot, and simulation routines are included. Open population size is estimated by the Horvitz-Thompson (1959) <doi:10.2307/2280784> estimator.
mrbsizeR Scale Space Multiresolution Analysis of Random Signals
A method for the multiresolution analysis of spatial fields and images to capture scale-dependent features. mrbsizeR is based on scale space smoothing and uses differences of smooths at neighbouring scales for finding features on different scales. To infer which of the captured features are credible, Bayesian analysis is used. The scale space multiresolution analysis has three steps: (1) Bayesian signal reconstruction. (2) Using differences of smooths, scale-dependent features of the reconstructed signal can be found. (3) Posterior credibility analysis of the differences of smooths created. The method has first been proposed by Holmstrom, Pasanen, Furrer, Sain (2011) <DOI:10.1016/j.csda.2011.04.011>.
MRFA Fitting and Predicting Large-Scale Nonlinear Regression Problems using Multi-Resolution Functional ANOVA (MRFA) Approach
Performs the MRFA approach proposed by Sung et al. (unpublished) to fit and predict nonlinear regression problems, particularly for large-scale and high-dimensional problems. The application includes deterministic or stochastic computer experiments, spatial datasets, and so on.
mrfDepth Depth Measures in Multivariate, Regression and Functional Settings
Tools to compute depth measures and implementations of related tasks such as outlier detection, data exploration and classification of multivariate, regression and functional data.
MRHawkes Multivariate Renewal Hawkes Process
Simulate a (bivariate) multivariate renewal Hawkes (MRHawkes) self-exciting process, with given immigrant hazard rate functions and offspring density function. Calculate the likelihood of a MRHawkes process with given hazard rate functions and offspring density function for an (increasing) sequence of event times. Calculate the Rosenblatt residuals of the event times. Predict future event times based on observed event times up to a given time. For details see Stindl and Chen (2018) <doi:10.1016/j.csda.2018.01.021>.
mri Modified Rand Index (1 and 2.1 and 2.2) and Modified Adjusted Rand Index (1 and 2.1 and 2.2)
Provides three modified Rand indices and three modified adjusted Rand indices for comparing two partitions, which are usually obtained on two different sets of units, where one set is a subset of the other. Splitting and merging of clusters have different effects on the values of the indices.
mRMRe Parallelized Minimum Redundancy, Maximum Relevance (mRMR) Ensemble Feature Selection
Computes mutual information matrices from continuous, categorical and survival variables, as well as feature selection with minimum redundancy, maximum relevance (mRMR) and a new ensemble mRMR technique (De Jay N et al. (2013) <doi:10.1093/bioinformatics/btt383>).
mro Multiple Correlation
Computes multiple correlation coefficient when the data matrix is given and tests its significance.
MRS Multi-Resolution Scanning for Cross-Sample Differences
An implementation of the MRS algorithm for comparison across distributions. The model is based on a nonparametric process taking the form of a Markov model that transitions between a ‘null’ and an ‘alternative’ state on a multi-resolution partition tree of the sample space. MRS effectively detects and characterizes a variety of underlying differences. These differences can be visualized using several plotting functions.
MRSP Multinomial Response Models with Structured Penalties
Fits regularized multinomial response models using penalized loglikelihood methods with structured penalty terms.
MRTSampleSize A Sample Size Calculator for Micro-Randomized Trials
Provides a sample size calculator for micro-randomized trials (MRTs) based on methodology developed in Sample Size Calculations for Micro-randomized Trials in mHealth by Liao et al. (2016) <DOI:10.1002/sim.6847>.
msaenet Multi-Step Adaptive Elastic-Net
Multi-step adaptive elastic-net (MSAENet) algorithm for feature selection in high-dimensional regressions.
msaR Multiple Sequence Alignment for R Shiny
Visualises multiple sequence alignments dynamically within the Shiny web application framework.
mschart Chart Generation for ‘Microsoft Word’ and ‘Microsoft PowerPoint’ Documents
Create native charts for ‘Microsoft PowerPoint’ and ‘Microsoft Word’ documents. These can then be edited and annotated. Functions are provided to let users create charts, modify and format their content. The chart’s underlying data is automatically saved within the ‘Word’ document or ‘PowerPoint’ presentation. It extends package ‘officer’ that does not contain any feature for ‘Microsoft’ native charts production.
MSCMT Multivariate Synthetic Control Method Using Time Series
Multivariate Synthetic Control Method Using Time Series. Two generalizations of the synthetic control method (which already has an implementation in package ‘Synth’) are implemented: first, ‘MSCMT’ allows for using multiple outcome variables; second, time series can be supplied as economic predictors. Much effort has been taken to make the implementation as stable as possible (including edge cases) without losing computational efficiency.
mscstexta4r R Client for the Microsoft Cognitive Services Text Analytics REST API
R Client for the Microsoft Cognitive Services Text Analytics REST API, including Sentiment Analysis, Topic Detection, Language Detection, and Key Phrase Extraction. An account MUST be registered at the Microsoft Cognitive Services website <https://…/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
mscstts R Client for the Microsoft Cognitive Services ‘Text-to-Speech’ REST API
R Client for the Microsoft Cognitive Services ‘Text-to-Speech’ REST API, including voice synthesis. A valid account must be registered at the Microsoft Cognitive Services website <https://…/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
mscsweblm4r R Client for the Microsoft Cognitive Services Web Language Model REST API
R Client for the Microsoft Cognitive Services Web Language Model REST API, including Break Into Words, Calculate Conditional Probability, Calculate Joint Probability, Generate Next Words, and List Available Models. A valid account MUST be registered at the Microsoft Cognitive Services website <https://…/cognitive-services> in order to obtain a (free) API key. Without an API key, this package will not work properly.
MsdeParEst Parametric Estimation in Mixed-Effects Stochastic Differential Equations
Parametric estimation in stochastic differential equations with random effects in the drift, in the diffusion, or in both. Approximate maximum likelihood methods are used. M. Delattre, V. Genon-Catalot and A. Samson (2012) <doi:10.1111/j.1467-9469.2012.00813.x>; M. Delattre, V. Genon-Catalot and A. Samson (2015) <doi:10.1051/ps/2015006>; M. Delattre, V. Genon-Catalot and A. Samson (2016) <doi:10.1016/j.jspi.2015.12.003>.
MSEtool Management Strategy Evaluation Toolkit
Simulation tools for management strategy evaluation are provided for the ‘DLMtool’ operating model to inform data-rich fisheries. ‘MSEtool’ provides complementary assessment models of varying complexity with standardized reporting, diagnostic tools for evaluating assessment models within closed-loop simulation, and helper functions for building more complex operating models and management procedures.
MSGARCH Markov-Switching GARCH Models
The MSGARCH package offers methods to fit (by maximum likelihood or Bayesian estimation), simulate, and forecast various Markov-switching GARCH processes.
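A minimal sketch (CreateSpec() and FitML() follow the MSGARCH 2.x interface; the bundled SMI return series is assumed to be available):

    library(MSGARCH)

    # Two-regime GARCH specification with normal innovations
    spec <- CreateSpec(variance.spec     = list(model = "sGARCH"),
                       distribution.spec = list(distribution = "norm"),
                       switch.spec       = list(K = 2))

    # Fit by maximum likelihood to a daily return series, then forecast
    data("SMI", package = "MSGARCH")
    fit <- FitML(spec = spec, data = SMI)
    predict(fit, nahead = 5)  # 5-step-ahead volatility forecast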
msgl Multinomial Sparse Group Lasso
Multinomial logistic regression with sparse group lasso penalty. Simultaneous feature selection and parameter estimation for classification. Suitable for high-dimensional multiclass classification with many classes. The algorithm computes the sparse group lasso penalized maximum likelihood estimate. Use of parallel computing for cross validation and subsampling is supported through the ‘foreach’ and ‘doParallel’ packages. The development version is on GitHub; please report package issues there.
MSGLasso Multivariate Sparse Group Lasso for the Multivariate Multiple Linear Regression with an Arbitrary Group Structure
For fitting multivariate response and multiple predictor linear regressions with an arbitrary group structure assigned on the regression coefficient matrix, using the multivariate sparse group lasso and the mixed coordinate descent algorithm.
msgpack A Compact, High Speed Data Format
A fast C-based encoder and streaming decoder for the ‘messagepack’ data format. ‘Messagepack’ is similar in structure to ‘JSON’ but uses a more compact binary encoding. Based on the CWPack C library.
msgtools Tools for Developing Diagnostic Messages
A number of utilities for developing and maintaining error, warning, and other messages in R packages, including checking for consistency across messages, spell-checking messages, and building message translations into various languages for purposes of localization.
msmtools Building Augmented Data to Run Multi-State Models with ‘msm’ Package
A fast and general method for restructuring classical longitudinal data into an augmented format that facilitates modelling the data in a multi-state framework using the ‘msm’ package.
MSMwRA Multivariate Statistical Methods with R Applications
Data sets from the book ‘Multivariate Statistical Methods with R Applications’ by H. Bulut (2018). The book will be published in Turkish; its original title is ‘R Uygulamalari ile Cok Degiskenli Istatistiksel Yontemler’.
MSPRT Modified Sequential Probability Ratio Test (MSPRT)
A modified SPRT (MSPRT) can be designed and implemented with the help of this package. In an MSPRT design, the maximum sample size of an experiment is fixed prior to the start of the experiment; the alternative hypothesis used to define the rejection region of the test is derived from the size of the test (Type I error), the maximum available sample size (N), and the targeted Type II error (equal to one minus the power), all of which are prespecified. Given these values, the MSPRT is defined in a manner very similar to Wald’s initial proposal. This test can reduce the average sample size required to perform statistical hypothesis tests at the specified levels of significance and power. The package facilitates one-sample Z tests, t tests and tests of binomial success probabilities. User guidance for this software package is provided here and in the supplemental information.
mssm Multivariate State Space Models
Provides methods to perform parameter estimation and analysis of multivariate outcomes observed through time that depend on a latent state variable. All methods scale well in the dimension of the observed outcomes at each time point. The package contains an implementation of a Laplace approximation, particle filters as suggested by Lin, Zhang, Cheng, & Chen (2005) <doi:10.1198/016214505000000349>, and the gradient and observed information matrix approximation suggested by Poyiadjis, Doucet, & Singh (2011) <doi:10.1093/biomet/asq062>.
mssqlR MSSQL Querying using R
Can be used to query data from Microsoft SQL Server (MSSQL; see <http://…/> for more information). Based on the concepts of Entity Framework, the package allows querying data from an MSSQL database.
mstknnclust MST-kNN Clustering Algorithm
Implements the MST-kNN clustering algorithm which was proposed by Inostroza-Ponta, M. (2008) <https://…28729389?selectedversion=NBD44634158>.
mstR Procedures to Generate Patterns under Multistage Testing
Generation of response patterns under the dichotomous and polytomous computerized multistage testing (MST) framework. It provides various IRT- and score-based methods to select the next module and estimate ability levels.
mstrio Interface for ‘MicroStrategy’ REST APIs
Interface for creating data sets and extracting data through the ‘MicroStrategy’ REST APIs. Access the demo API at <https://…/index.html>.
msu Multivariate Symmetric Uncertainty and Other Measurements
Estimators for multivariate symmetrical uncertainty based on the work of Gustavo Sosa et al. (2016) <arXiv:1709.08730>, total correlation, information gain and symmetrical uncertainty of categorical variables.
MTA Multiscalar Territorial Analysis
Builds multiscalar territorial analyses based on various contexts.
MTDrh Mass Transportation Distance Rank Histogram
The Mass Transportation Distance rank histogram was developed to assess the reliability of scenarios with equal or different probabilities of occurrence <doi:10.1002/we.1872>.
mtk Mexico ToolKit library (MTK)
MTK (Mexico ToolKit) is a generic platform for the sensitivity and uncertainty analysis of complex models. It provides functions and facilities for experimental design, model simulation, sensitivity and uncertainty analysis, methods integration and data reporting, etc.
MtreeRing A Shiny Application for Automatic Measurements of Tree-Ring Widths on Digital Images
Use morphological image processing and edge detection algorithms to automatically identify tree-ring boundaries on digital images. Tree-ring boundaries are determined by the changes in light reflectance from latewood to earlywood. Two geometric models are provided to calibrate the errors resulting from inclined rings. The package provides a Shiny-based application, allowing R beginners to easily analyze tree-ring images and export ring-width series in standard file formats.
MTSYS Methods in Mahalanobis-Taguchi (MT) System
Mahalanobis-Taguchi (MT) system is a collection of multivariate analysis methods developed for the field of quality engineering. The MT system consists of two families depending on their purpose. One is a family of Mahalanobis-Taguchi (MT) methods (in the broad sense) for diagnosis (see Woodall, W. H., Koudelik, R., Tsui, K. L., Kim, S. B., Stoumbos, Z. G., and Carvounis, C. P. (2003) <doi:10.1198/004017002188618626>) and the other is a family of Taguchi (T) methods for forecasting (see Kawada, H., and Nagata, Y. (2015) <doi:10.17929/tqs.1.12>). The package contains three basic methods for the family of MT methods and one basic method for the family of T methods. The MT method (in the narrow sense), the Mahalanobis-Taguchi Adjoint (MTA) method, and the Recognition-Taguchi (RT) method are for the family of MT methods, and the two-sided Taguchi (T1) method is for the family of T methods. In addition, the Ta and Tb methods, which are improved versions of the T1 method, are included.
MuChPoint Multiple Change Point
Nonparametric approach to estimate the location of block boundaries (change-points) of non-overlapping blocks in a random symmetric matrix which consists of random variables whose distribution changes from block to block. BRAULT Vincent, OUADAH Sarah, SANSONNET Laure and LEVY-LEDUC Celine (2017) <doi:10.1016/j.jmva.2017.12.005>.
mudata2 Interchange Tools for Multi-Parameter Spatiotemporal Data
Formatting and structuring multi-parameter spatiotemporal data is often a time-consuming task. This package offers functions and data structures designed to easily organize and visualize these data for applications in geology, paleolimnology, dendrochronology, and paleoclimate.
mudfold A Nonparametric Model for Unfolding Scale Analysis
A nonparametric item response theory model suitable for the analysis of proximity data.
mueRelativeRisk Relative Risk Based on the Ratio of Median Unbiased Estimates
Implements an estimator for relative risk based on the median unbiased estimator. The relative risk estimator is well defined and performs satisfactorily for a wide range of data configurations. The details of the method are available in Carter et al (2010) <doi:10.1111/j.1467-9876.2010.00711.x>.
muHVT Constructing Hierarchical Voronoi Tessellations and Overlay Heatmap for Data Analysis
Constructing hierarchical Voronoi tessellations for a given data set and overlay heatmap for variables at various levels of the tessellations for in-depth data analysis. See <https://…/Voronoi_diagram> for more information. Credits to Mu Sigma for their continuous support throughout the development of the package.
muir Exploring Data with Tree Data Structures
A simple tool allowing users to easily and dynamically explore or document a data set using a tree structure.
MullerPlot Generates Muller Plot from Population/Abundance/Frequency Dynamics Data
Generates Muller plots from parental/genealogy/phylogeny information and population/abundance/frequency dynamics data. Muller plots combine information about the succession of different OTUs (genotypes, phenotypes, species, …) with information about the dynamics of their abundances (populations or frequencies) over time. They are powerful and fascinating tools to visualize evolutionary dynamics. They may also be employed in the study of diversity and its dynamics, i.e. how diversity emerges and how it changes over time. They are called Muller plots in honor of Hermann Joseph Muller, who used them to explain his idea of Muller’s ratchet (Muller, 1932, American Naturalist). A big difference between Muller plots and normal box plots of abundances is that a Muller plot depicts not only the relative abundances but also the succession of OTUs based on their genealogy/phylogeny/parental relation. In a Muller plot, the horizontal axis is time/generations and the vertical axis represents the relative abundances of OTUs at the corresponding times/generations. Different OTUs are usually shown as polygons with different colors, and each OTU originates somewhere in the middle of its parent’s area in order to illustrate succession in the evolutionary process. To generate a Muller plot one needs the genealogy/phylogeny/parental relation of OTUs and their abundances over time. The MullerPlot package has the tools to generate Muller plots which clearly depict the origin of successors of OTUs.
mulset Multiset Intersection Generator
Computes efficient data distributions from highly inconsistent datasets with many missing values using multi-set intersections. Based upon hash functions, ‘mulset’ can quickly identify intersections from very large matrices of input vectors across columns and rows and thus provides a scalable solution for dealing with missing values. Tomic et al. (2019) <doi:10.1101/545186>.
multDM Multivariate Version of the Diebold-Mariano Test
Performs the multivariate version of the Diebold-Mariano test for equal predictive ability in multiple forecast comparisons. Main reference: Mariano, R.S., Preve, D. (2012) <doi:10.1016/j.jeconom.2012.01.014>.
multdyn Multiregression Dynamic Models
The Multiregression Dynamic Model (MDM) is a multivariate graphical model for multidimensional time series that allows the estimation of time-varying effective connectivity.
multfisher Optimal Exact Tests for Multiple Binary Endpoints
Calculates exact hypothesis tests to compare a treatment and a reference group with respect to multiple binary endpoints. The tested null hypothesis is an identical multidimensional distribution of successes and failures in both groups. The alternative hypothesis is a larger success proportion in the treatment group in at least one endpoint. The tests are based on the multivariate permutation distribution of subjects between the two groups. For this permutation distribution, rejection regions are calculated that satisfy one of different possible optimization criteria. In particular, regions with maximal exhaustion of the nominal significance level, maximal power under a specified alternative or maximal number of elements can be found. Optimization is achieved by a branch-and-bound algorithm. By application of the closed testing principle, the global hypothesis tests are extended to multiple testing procedures.
multiApply Apply Functions to Multiple Multidimensional Arguments
The base apply function and its variants, as well as the related functions in the ‘plyr’ package, typically apply user-defined functions to a single argument (or a list of vectorized arguments in the case of mapply). The ‘multiApply’ package extends this paradigm to functions taking a list of multiple unidimensional or multidimensional arguments (or combinations thereof) as input, which can have different numbers of dimensions as well as different dimension lengths.
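A small sketch of the idea (assuming the Apply() interface with named array dimensions, as in the package examples):

    library(multiApply)

    # A 10 x 10 spatial grid observed at 50 time steps
    x <- array(rnorm(10 * 10 * 50), dim = c(lat = 10, lon = 10, time = 50))

    # Apply mean() over the 'time' dimension of every grid cell;
    # the remaining lat/lon dimensions are preserved in the output
    out <- Apply(data = list(x), target_dims = "time", fun = mean)
    dim(out$output1)  # lat 10, lon 10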
multiCA Multinomial Cochran-Armitage Trend Test
Implements a generalization of the Cochran-Armitage trend test to multinomial data. In addition to an overall test, multiple-testing-adjusted p-values for trend in individual outcomes and power calculations are available.
multicastR A Companion to the Multi-CAST Collection
Provides a basic interface for accessing annotation data from the Multi-CAST collection, a database of spoken natural language texts edited by Geoffrey Haig and Stefan Schnell. The collection draws from a diverse set of languages and has been annotated across multiple levels. Annotation data is downloaded on request from the servers of the Language Archive Cologne. See the Multi-CAST website <https://…/> for more information and a list of related publications.
multichull A Generic Convex-Hull-Based Model Selection Method
Given a set of models for which a measure of model (mis)fit and model complexity is provided, CHull(), developed by Ceulemans and Kiers (2006) <doi:10.1348/000711005X64817>, determines the models that are located on the boundary of the convex hull and selects an optimal model by means of the scree test values.
multicmp Flexible Modeling of Multivariate Count Data via the Multivariate Conway-Maxwell-Poisson Distribution
A toolkit containing statistical analysis models motivated by multivariate forms of the Conway-Maxwell-Poisson (COM-Poisson) distribution for flexible modeling of multivariate count data, especially in the presence of data dispersion. Currently the package only supports bivariate data, via the bivariate COM-Poisson distribution described in Sellers et al. (2016) <doi:10.1016/j.jmva.2016.04.007>. Future development will extend the package to higher-dimensional data.
multicolor Add Multiple Colors to your Console Output
Add multiple colors to text that is printed to the console.
multifluo Dealing with Several Images of a Same Object Constituted of Different Zones
Deals with several images of the same object, composed of different zones. Each image constitutes a variable for a given pixel. The user can interactively select different zones of an image. Then, multivariate analysis (PCA) can be run in order to characterize the different selected zones according to the different images. Hotelling (Hotelling, 1931, <doi:10.1214/aoms/1177732979>) and Srivastava (Srivastava, 2009, <doi:10.1016/j.jmva.2006.11.002>) tests can be run to detect multivariate differences between the zones.
multifwf Read Fixed Width Format Files Containing Lines of Different Type
Reads a table of fixed-width formatted data whose lines are of different types into a separate data.frame for each type.
MultiGHQuad Multidimensional Gauss-Hermite Quadrature
Uses a transformed, rotated and optionally adapted n-dimensional grid of quadrature points to calculate the numerical integral of n multivariate normally distributed parameters.
multigraph Plot and Manipulate Multigraphs
Functions to plot and manipulate multigraphs and bipartite graphs with different layout options.
MultiJoin Enables Efficient Joining of Data File on Common Fields using the Unix Utility Join
Wrapper around the Unix join facility which is more efficient than the built-in R routine merge(). The package enables the joining of multiple files on disk at once. The files can be compressed and various filters can be deployed before joining. Compiles only under Unix.
multilaterals Transitive Index Numbers for Cross-Sections and Panel Data
Computing transitive (and non-transitive) index numbers (Coelli et al., 2005 <doi:10.1007/b136381>) for cross-sections and panel data. For the calculation of transitive indexes, the EKS (Coelli et al., 2005 <doi:10.1007/b136381>; Rao et al., 2002 <doi:10.1007/978-1-4615-0851-9_4>) and Minimum spanning tree (Hill, 2004 <doi:10.1257/0002828043052178>) methods are implemented. Traditional fixed-base and chained indexes, and their growth rates, can also be derived using the Paasche, Laspeyres, Fisher and Tornqvist formulas.
multilevelMatching Propensity Score Matching and Subclassification in Observational Studies with Multi-Level Treatments
Implements methods to estimate causal effects from observational studies when there are 2+ distinct levels of treatment (i.e., ‘multilevel treatment’) using matching estimators, as introduced in Yang et al. (2016) <doi:10.1111/biom.12505>. Matching on covariates, and matching or stratification on modeled propensity scores, are available. These methods require matching on only a scalar function of generalized propensity scores.
multilevelPSA Multilevel Propensity Score Analysis
Conducts and visualizes propensity score analysis for multilevel, or clustered, data. Bryer & Pruzek (2011) <doi:10.1080/00273171.2011.636693>.
multimark Capture-Mark-Recapture Analysis using Multiple Non-Invasive Marks
Capture-mark-recapture analysis with multiple non-invasive marks. The models implemented in ‘multimark’ combine encounter history data arising from two different non-invasive ‘marks’, such as images of left-sided and right-sided pelage patterns of bilaterally asymmetrical species, to estimate abundance and related demographic parameters while accounting for imperfect detection. Bayesian models are specified using simple formulae and fitted using Markov chain Monte Carlo.
multimode Mode Testing and Exploring
Different examples and methods for testing (including different proposals described in Ameijeiras-Alonso et al., 2016 <arXiv:1609.05188>) and exploring (including the mode tree, mode forest and SiZer) the number of modes using nonparametric techniques.
multinet Analysis and Mining of Multilayer Social Networks
Functions for the creation/generation and analysis of multilayer social networks.
multinets Multilevel Networks Analysis
Analyze multilevel networks as described in Lazega et al (2008) <doi:10.1016/j.socnet.2008.02.001> and in Lazega and Snijders (2016, ISBN:978-3-319-24520-1). The package was developed essentially as an extension to ‘igraph’.
multinomineq Bayesian Inference for Multinomial Models with Inequality Constraints
Implements Gibbs sampling and Bayes factors for multinomial models with linear inequality constraints on the vector of probability parameters. As special cases, the model class includes models that predict a linear order of binomial probabilities (e.g., p[1] < p[2] < p[3] < .50) and mixture models assuming that the parameter vector p must be inside the convex hull of a finite number of predicted patterns (i.e., vertices). A formal definition of inequality-constrained multinomial models and the implemented computational methods is provided in: Heck, D.W., & Davis-Stober, C.P. (2019). Multinomial models with linear inequality constraints: Overview and improvements of computational methods for Bayesian inference. Journal of Mathematical Psychology, 91, 70-87. <doi:10.1016/j.jmp.2019.03.004>. Inequality-constrained multinomial models have applications in the area of judgment and decision making to fit and test random utility models (Regenwetter, M., Dana, J., & Davis-Stober, C.P. (2011). Transitivity of preferences. Psychological Review, 118, 42-56, <doi:10.1037/a0021150>) or to perform outcome-based strategy classification to select the decision strategy that provides the best account for a vector of observed choice frequencies (Heck, D.W., Hilbig, B.E., & Moshagen, M. (2017). From information processing to decisions: Formalizing and comparing probabilistic choice models. Cognitive Psychology, 96, 26-40. <doi:10.1016/j.cogpsych.2017.05.003>).
MultipleBubbles Test and Detection of Explosive Behaviors for Time Series
Provides the Augmented Dickey-Fuller test and its variations to check the existence of bubbles (explosive behavior) for time series, based on the article by Peter C. B. Phillips, Shuping Shi and Jun Yu (2015a) <doi:10.1111/iere.12131>. Some functions may take a while depending on the size of the data used, or the number of Monte Carlo replications applied.
MultiplierDEA Multiplier Data Envelopment Analysis and Cross Efficiency
Functions are provided for calculating efficiency using multiplier DEA (Data Envelopment Analysis): Measuring the efficiency of decision making units (Charnes et al., 1978 <doi:10.1016/0377-2217(78)90138-8>) and cross efficiency using single and two-phase approach. In addition, it includes some datasets for calculating efficiency and cross efficiency.
multiplyr Data Manipulation with Parallelism and Shared Memory Matrices
Provides a new form of data frame backed by shared memory matrices and a way to manipulate them. Upon creation these data frames are shared across multiple local nodes to allow for simple parallel processing.
multiRDPG Multiple Random Dot Product Graphs
Fits the Multiple Random Dot Product Graph Model and performs a test for whether two networks come from the same distribution. Both methods are proposed in Nielsen, A.M., Witten, D., (2018) ‘The Multiple Random Dot Product Graph Model’, arXiv preprint <arXiv:1811.12172> (Submitted to Journal of Computational and Graphical Statistics).
multirich Calculate Multivariate Richness via UTC and sUTC
Functions to calculate Unique Trait Combinations (UTC) and scaled Unique Trait Combinations (sUTC) as measures of multivariate richness. The package can also calculate beta-diversity for trait richness and can partition this into nestedness-related and turnover components. The code will also calculate several measures of overlap.
MultiRNG Multivariate Pseudo-Random Number Generation
Pseudo-random number generation for 11 multivariate distributions: Normal, t, Uniform, Bernoulli, Hypergeometric, Beta (Dirichlet), Multinomial, Dirichlet-Multinomial, Laplace, Wishart, and Inverted Wishart.
MultiRobust Multiply Robust Methods for Missing Data Problems
Multiply robust estimation for population mean (Han and Wang 2013) <doi:10.1093/biomet/ass087>, regression analysis (Han 2014) <doi:10.1080/01621459.2014.880058> (Han 2016) <doi:10.1111/sjos.12177> and quantile regression (Han et al. 2019) <doi:10.1111/rssb.12309>.
multiROC Calculating and Visualizing ROC Curves Across Multi-Class Classifications
Tools to solve real-world problems with multiple classes by computing the areas under the ROC curve via micro-averaging and macro-averaging. The methodology is described in V. Van Asch (2013) <https://…/microaverage.pdf> and Pedregosa et al. (2011) <http://…/plot_roc.html>.
MultiRR Bias, Precision, and Power for Multi-Level Random Regressions
Calculates bias, precision, and power for multi-level random regressions. Random regressions are types of hierarchical models in which data are structured in groups and (regression) coefficients can vary by groups. Tools to estimate model performance are designed mostly for scenarios where (regression) coefficients vary at just one level. ‘MultiRR’ provides simulation and analytical tools (based on ‘lme4’) to study model performance for random regressions that vary at more than one level (multi-level random regressions), allowing researchers to determine optimal sampling designs.
multiselect Selecting Combinations of Predictors by Leveraging Multiple AUCs for an Ordered Multilevel Outcome
Uses multiple AUCs to select a combination of predictors when the outcome has multiple (ordered) levels and the focus is discriminating one particular level from the others. This method is most naturally applied to settings where the outcome has three levels. (Meisner, A, Parikh, CR, and Kerr, KF (2017) <http://…/>.)
MultisiteMediation Causal Mediation Analysis in Multisite Trials
We implement multisite causal mediation analysis using the methods proposed by Qin and Hong (in press). It enables causal mediation analysis in multisite trials, in which individuals are assigned to a treatment or a control group at each site. It allows for estimation and hypothesis testing for not only the population average but also the between-site variance of direct and indirect effects. This strategy conveniently relaxes the assumption of no treatment-by-mediator interaction while greatly simplifying the outcome model specification without invoking strong distributional assumptions.
MultiSkew Measures, Tests and Removes Multivariate Skewness
Computes the third multivariate cumulant of either the raw, centered or standardized data. Computes the main measures of multivariate skewness, together with their bootstrap distributions. Finally, computes the least skewed linear projections of the data.
multisom Clustering a Dataset using Multi-SOM Algorithm
Implements two versions of the Multi-SOM algorithm, namely stochastic Multi-SOM and batch Multi-SOM. The package also determines the best number of clusters and offers the user the best clustering scheme from the different results.
multispatialCCM Multispatial Convergent Cross Mapping
The multispatial convergent cross mapping algorithm can be used as a test for causal associations between pairs of processes represented by time series. This is a combination of convergent cross mapping (CCM), described in Sugihara et al., 2012, Science, 338, 496-500, and dew-drop regression, described in Hsieh et al., 2008, American Naturalist, 171, 71-80. The algorithm allows CCM to be implemented on data that are not from a single long time series. Instead, data can come from many short time series, which are stitched together using bootstrapping.
multistate Fitting Multistate Models
Medical researchers are often interested in investigating the relationship between explicative variables and multiple times-to-event. Time-inhomogeneous Markov models consist of modelling the probabilities of transitions according to chronological time (time since the baseline of the study). Semi-Markov (SM) models consist of modelling the probabilities of transitions according to the time spent in states. This package proposes functions implementing such 3-state and 4-state multivariable and multistate models. The user can introduce multiple covariates to estimate conditional (subject-specific) effects. Possible confounding factors can also be adjusted for by using Inverse Probability Weighting (IPW). When a state is patient death, the user can take into account the mortality of the general population (relative survival approach). Finally, in the particular situation of one initial transient state and two competing and absorbing states, this package allows for estimating mixture models.
multistateutils Utility Functions for Parametric Multi-State Models
Provides functions for working with multi-state modelling, such as efficient simulation routines for estimating transition probabilities and length of stay. It is designed as an extension to multi-state modelling capabilities provided with the ‘flexsurv’ package (see Jackson (2016) <doi:10.18637/jss.v070.i08>).
multivariance Measuring Multivariate Dependence Using Distance Multivariance
Distance multivariance is a measure of dependence which can be used to detect and quantify dependence structures. The necessary functions are implemented in this package, and examples are given. For the theoretical background we refer to the forthcoming papers: B. Böttcher, M. Keller-Ressel, R.L. Schilling (2017), Detecting independence of random vectors I + II, preprints.
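A short sketch (assuming the multivariance() and independence.test() functions exported by the package):

    library(multivariance)

    set.seed(1)
    # Three mutually independent variables, one per column
    x <- matrix(rnorm(300), ncol = 3)

    multivariance(x)      # distance multivariance of the columns
    independence.test(x)  # test of independence of all columns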
MultivariateRandomForest Multivariate Random Forest for Linearly Related Output Features
In standard random forests, prediction is done for a single output feature, and linear relations between output features are not considered. This package performs multivariate random forest prediction that exploits the linear relations among the output features.
MultiVarMI Multiple Imputation for Multivariate Data
Fully parametric Bayesian multiple imputation framework for massive multivariate data of different variable types as seen in Demirtas, H. (2017) <doi:10.1007/978-981-10-3307-0_8>.
MultiVarSel Variable Selection in the Multivariate Linear Model
Provides a novel variable selection approach in the multivariate framework of the general linear model, taking into account the dependence that may exist between the columns of the observations matrix. For further details we refer the reader to the paper Perrot-Dockes et al. (2017), <arXiv:1704.00076>.
multiviewtest Hypothesis Test for Dependent Clusterings of Two Data Views
Implements a hypothesis test of whether clusterings of two data views are independent from Gao, L.L., Bien, J., and Witten, D. (2019) Are Clusterings of Multiple Data Views Independent? Biostatistics <DOI:10.1093/biostatistics/kxz001>.
multiwave Estimation of Multivariate Long-Memory Models Parameters
Computation of an estimation of the long-memory parameters and the long-run covariance matrix using a multivariate model (Lobato, 1999; Shimotsu 2007). Two semi-parametric methods are implemented: a Fourier based approach (Shimotsu 2007) and a wavelet based approach (Achard and Gannaz 2014).
multiway Component Models for Multi-Way Data
Fits multi-way component models via alternating least squares algorithms with optional constraints (orthogonality and non-negativity). Fit models include Individual Differences Scaling, Parallel Factor Analysis (1 and 2), Simultaneous Component Analysis, and Tucker Factor Analysis.
MuMIn Multi-Model Inference
Model selection and model averaging based on information criteria (AICc and the like).
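A typical workflow with the package’s core functions, dredge() and model.avg():

    library(MuMIn)

    # dredge() requires models to fail on NA rather than drop rows silently
    options(na.action = "na.fail")

    full <- lm(mpg ~ cyl + wt + hp, data = mtcars)

    # Fit and rank all submodels by AICc
    ms <- dredge(full)

    # Average coefficients over models within 2 AICc units of the best one
    model.avg(ms, subset = delta < 2)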
mumm Multiplicative Mixed Models using Template Model Builder
Fit multiplicative mixed models using maximum likelihood estimation via the Template Model Builder (TMB), Kristensen K, Nielsen A, Berg CW, Skaug H, Bell BM (2016) <doi:10.18637/jss.v070.i05>. One version of the multiplicative mixed model is applied in Piepho (1999) <doi:10.1111/j.0006-341X.1999.01120.x>. The package provides functions for calculating confidence intervals for the model parameters and for performing likelihood ratio tests.
munsell Munsell colour system
Provides functions for exploring and using the Munsell colour system.
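For example, mnsl() converts a ‘hue value/chroma’ specification to a hex colour, and helper functions step around the colour space:

    library(munsell)

    mnsl("5PB 5/10")        # hex code for hue 5PB, value 5, chroma 10
    lighter("5PB 5/10")     # one value step lighter
    complement("5PB 5/10")  # opposite side of the hue circle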
muRL Mailmerge using R, LaTeX, and the Web
Provides mailmerge methods for reading spreadsheets of addresses and other relevant information to create standardized but customizable letters. Provides a method for mapping US ZIP codes, including those of letter recipients. Provides a method for parsing and processing html code from online job postings of the American Political Science Association.
murphydiagram Murphy Diagrams for Forecast Comparisons
Data and code for the paper by Ehm, Gneiting, Jordan and Krueger (‘Of Quantiles and Expectiles: Consistent Scoring Functions, Choquet Representations, and Forecast Rankings’, 2015).
muRty Murty’s Algorithm for k-Best Assignments
Calculates k-best solutions and costs for an assignment problem following the method outlined in Murty (1968) <doi:10.1287/opre.16.3.682>.
musica Multiscale Climate Model Assessment
Provides functions allowing for (1) easy aggregation of multivariate time series into custom time scales, (2) comparison of statistical summaries between different data sets at multiple time scales (e.g. observed and bias-corrected data), (3) comparison of relations between variables and/or different data sets at multiple time scales (e.g. correlation of precipitation and temperature in control and scenario simulation) and (4) transformation of time series at custom time scales.
mut Pairwise Likelihood Ratios
The main function LR2 calculates likelihood ratios for non-inbred relationships, accounting for mutation, silent alleles and theta correction. Egeland, Pinto and Amorim (2017) <DOI:10.1016/j.fsigen.2017.04.018>.
MvBinary Modelling Multivariate Binary Data with Blocks of Specific One-Factor Distribution
Variables are grouped into independent blocks. Each variable is described by two continuous parameters (its marginal probability and the strength of its dependency with the other block variables), and one binary parameter (positive or negative dependency). Model selection consists of estimating the partition of the variables into blocks. It is carried out by maximization of the BIC criterion, either with a deterministic (faster) algorithm or with a stochastic (more time-consuming but optimal) algorithm. Tool functions facilitate the model interpretation.
mvcluster Multi-View Clustering
Implementation of multi-view bi-clustering algorithms. When a sample is characterized by two or more sets of input features, it creates multiple data matrices for the same set of examples, each corresponding to a view. For instance, individuals who are diagnosed with a disorder can be described by their clinical symptoms (one view) and their genomic markers (another view). Rows of a data matrix correspond to examples and columns correspond to features. A multi-view bi-clustering algorithm groups examples (rows) consistently across the views and simultaneously identifies the subset of features (columns) in each view that are associated with the row groups. This mvcluster package includes three such methods. (1) MVSVDL1: multi-view bi-clustering based on singular value decomposition where the left singular vectors are used to identify row clusters and the right singular vectors are used to identify features (columns) for each row cluster. Each singular vector is regularized by the L1 vector norm. (2) MVLRRL0: multi-view bi-clustering based on sparse low rank representation (i.e., matrix approximation) where the decomposed components are regularized by the so-called L0 vector norm (which is not really a vector norm). (3) MVLRRL1: multi-view bi-clustering based on sparse low rank representation (i.e., matrix approximation) where the decomposed components are regularized by the L1 vector norm.
mvdalab Multivariate Data Analysis Laboratory
Implementation of latent variable methods and multivariate modeling tools. The focus is on exploratory analyses using dimensionality reduction methods and classical multivariate statistical tools.
mvgraphnorm Multivariate Gaussian Graphical Model Analysis
Generates a constrained covariance matrix for a given graph in order to draw samples from a Gaussian graphical model, using different algorithms for the analysis of complex network structure. Three algorithms are used: (1) Kim, K. I. et al. (2008) <doi:10.1186/1471-2105-9-114>; (2) IPF, Speed, T. et al. (1986) <doi:10.1214/aos/1176349846>; (3) HTF, Hastie, T. et al. (2009) <isbn:9780387848570>.
MVLM Multivariate Linear Model with Analytic p-Values
Allows a user to conduct multivariate multiple regression using analytic p-values rather than approximations based on Wilks’ Lambda, Pillai’s trace, etc.
mvLSW Multivariate Locally Stationary Wavelet Analysis
Tools for analysing multivariate time series with wavelets. This includes: simulation of a multivariate locally stationary wavelet (mvLSW) process from a multivariate evolutionary wavelet spectrum (mvEWS); estimation of the mvEWS, local coherence and local partial coherence; and estimation of the asymptotic variance for mvEWS elements. See Park, Eckley and Ombao (2014) <doi:10.1109/TSP.2014.2343937> for details.
mvmesh Multivariate Meshes and Histograms in Arbitrary Dimensions
Define, manipulate and plot meshes on simplices, spheres, balls, and rectangles for use in multivariate statistics. Directional and other multivariate histograms are provided.
mvMISE A General Framework of Multivariate Mixed-Effects Selection Models
Offers a general framework of multivariate mixed-effects models for the joint analysis of multiple correlated outcomes with clustered data structures and potential missingness proposed by Wang et al. (2018) <doi:10.1093/biostatistics/kxy022>. The missingness of outcome values may depend on the values themselves (missing not at random and non-ignorable), or may depend on only the covariates (missing at random and ignorable), or both. This package provides functions for two models: 1) mvMISE_b() allows correlated outcome-specific random intercepts with a factor-analytic structure, and 2) mvMISE_e() allows the correlated outcome-specific error terms with a graphical lasso penalty on the error precision matrix. Both functions are motivated by the multivariate data analysis on data with clustered structures from labelling-based quantitative proteomic studies. These models and functions can also be applied to univariate and multivariate analyses of clustered data with balanced or unbalanced design and no missingness.
mvMonitoring Multi-State Adaptive Dynamic Principal Component Analysis for Multivariate Process Monitoring
Use multi-state splitting to apply Adaptive-Dynamic PCA (ADPCA) to data generated from a continuous-time multivariate industrial or natural process. Employ PCA-based dimension reduction to extract linear combinations of relevant features, reducing computational burdens. For a description of ADPCA, see <doi:10.1007/s00477-016-1246-2>, the 2016 paper from Kazor et al. The multi-state application of ADPCA is from a manuscript under current revision entitled ‘Multi-State Multivariate Statistical Process Control’ by Odom, Newhart, Cath, and Hering, and is expected to appear in Q1 of 2018.
MVN Multivariate Normality Tests
Assessing the assumption of multivariate normality is required by many parametric multivariate statistical methods, such as discriminant analysis, principal component analysis and MANOVA. The MVN package contains three widely used multivariate normality tests (Mardia’s, Henze-Zirkler’s and Royston’s), graphical approaches (chi-square Q-Q plot, perspective plot and contour plot) and two outlier detection methods based on Mahalanobis distance. A web-tool version of the package is also available at <http://…/MVN>.
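A short example (using the mvn() interface of recent package versions; older releases exposed one function per test):

    library(MVN)

    # Mardia's test on the four setosa measurements
    res <- mvn(data = iris[iris$Species == "setosa", 1:4], mvnTest = "mardia")
    res$multivariateNormality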
MVNBayesian Bayesian Analysis Framework for MVN (Mixture) Distribution
Tools of Bayesian analysis framework using the method suggested by Berger (1985) <doi:10.1007/978-1-4757-4286-2> for multivariate normal (MVN) distribution and multivariate normal mixture (MixMVN) distribution: a) calculating Bayesian posteriori of (Mix)MVN distribution; b) generating random vectors of (Mix)MVN distribution; c) Markov chain Monte Carlo (MCMC) for (Mix)MVN distribution.
mvnTest Goodness of Fit Tests for Multivariate Normality
Routines for assessing multivariate normality. Implements three Wald-type chi-squared tests; non-parametric Anderson-Darling and Cramer-von Mises tests; and the Doornik-Hansen, Royston and Henze-Zirkler tests.
mvord Multivariate Ordinal Regression Models
A flexible framework for fitting multivariate ordinal regression models with composite likelihood methods.
mvp Fast Symbolic Multivariate Polynomials
Fast manipulation of symbolic multivariate polynomials using the ‘Map’ class of the Standard Template Library. The package uses print and coercion methods from the ‘mpoly’ package (Kahle 2013, ‘Multivariate polynomials in R’. The R Journal, 5(1):162), but offers speed improvements. It is comparable in speed to the ‘spray’ package for sparse arrays, but retains the symbolic benefits of ‘mpoly’.
mvPot Multivariate Peaks-over-Threshold Modelling for Spatial Extreme Events
Tools for high-dimensional peaks-over-threshold inference and simulation of spatial processes such as the Brown–Resnick model.
mvQuad Methods for Multivariate Quadrature
Provides methods to construct multivariate grids, which can be used for multivariate quadrature. These grids can be based on different quadrature rules, such as Newton-Cotes formulas (trapezoidal rule, Simpson’s rule, …) or Gauss quadrature (Gauss-Hermite, Gauss-Legendre, …). For the construction of the multidimensional grid, the product rule or the combination technique can be applied.
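A compact example (createNIGrid() builds the grid and quadrature() applies it; Gauss-Legendre grids are initially defined on the unit cube):

    library(mvQuad)

    # 2-d Gauss-Legendre grid of accuracy level 5 on [0,1]^2
    grid <- createNIGrid(dim = 2, type = "GLe", level = 5)

    # Integrate f(x, y) = x * y over the unit square (exact value: 0.25)
    f <- function(x) x[, 1] * x[, 2]
    quadrature(f, grid)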
mvst Bayesian Inference for the Multivariate Skew-t Model
Estimates the multivariate skew-t and nested models, as described in the articles Liseo, B., Parisi, A. (2013). Bayesian inference for the multivariate skew-normal model: a population Monte Carlo approach. Comput. Statist. Data Anal. <doi:10.1016/j.csda.2013.02.007> and in Parisi, A., Liseo, B. Objective Bayesian analysis for the multivariate skew-t model (to appear).
MVT Estimation and Testing for the Multivariate t-Distribution
Routines to perform estimation and inference under the multivariate t-distribution. Currently, the following methodologies are implemented: multivariate mean and covariance estimation, hypothesis testing about the mean, equicorrelation and homogeneity of variances, the Wilson-Hilferty transformation, QQ-plots with envelopes and random variate generation. Some auxiliary functions are also provided.
mvtboost Tree Boosting for Multivariate Outcomes
Fits a multivariate model of decision trees for multiple, continuous outcome variables. A model for each outcome variable is fit separately, selecting predictors that explain covariance in multiple outcomes. The package is built on top of ‘gbm’.
MWRidge Two Stage Moving-Window Ridge Method for Prediction and Estimation
A two stage moving-window Ridge method for coefficients estimation and model prediction. In the first stage, moving-window penalty and L1 penalty are applied. In the second stage, ridge regression is applied.
mwshiny ‘Shiny’ for Multiple Windows
A simple function, mwsApp(), that runs a ‘shiny’ app spanning multiple, connected windows. This uses all standard ‘shiny’ conventions, and depends only on the ‘shiny’ package.
mxnet Deep Learning for R
The MXNet R package brings flexible and efficient GPU computing and state-of-the-art deep learning to R.
• It enables you to write seamless tensor/matrix computation with multiple GPUs in R.
• It also enables you to construct and customize state-of-the-art deep learning models in R, and apply them to tasks such as image classification and data science challenges.
My.stepwise Stepwise Variable Selection Procedures for Regression Analysis
The stepwise variable selection procedure (with iterations between the ‘forward’ and ‘backward’ steps) can be used to obtain the best candidate final regression model in regression analysis. All the relevant covariates are put on the ‘variable list’ to be selected. The significance levels for entry (SLE) and for stay (SLS) are usually set to 0.15 (or larger) to be conservative. Then, with the aid of substantive knowledge, the best candidate final regression model is identified manually by dropping the covariates with p-value > 0.05 one at a time until all regression coefficients are significantly different from 0 at the chosen alpha level of 0.05.

N

n1qn1 Port of the ‘Scilab’ ‘n1qn1’ Module for Unconstrained BFGS Optimization
Provides ‘Scilab’ ‘n1qn1’, a Quasi-Newton BFGS (‘qn’) optimizer without constraints. This takes more memory than traditional L-BFGS. The routine is useful since it allows prespecification of a Hessian; if the Hessian is close enough to the truth, it can speed up the optimization. The algorithm is described in the ‘Scilab’ optimization documentation located at <http://…/optimization_in_scilab.pdf>.
na.tools Comprehensive Library for Working with Missing (NA) Values in Vectors
This comprehensive toolkit provides a consistent and extensible framework for working with missing values in vectors. (The companion package ‘tidyimpute’ provides similar functionality for list-like and table-like structures.) Functions exist for detection, removal, replacement, imputation, recollection, etc. of ‘NA’ values.
NACHO NanoString Quality Control Dashboard
NanoString nCounter data come from gene expression assays that require neither enzymes nor amplification protocols and instead work with fluorescent barcodes (Geiss et al. (2008) <doi:10.1038/nbt1385>). Each barcode is assigned a messenger-RNA/micro-RNA (mRNA/miRNA) which, after bonding with its target, can be counted. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA. ‘NACHO’ (NAnoString quality Control dasHbOard) analyses exported NanoString nCounter data and helps the user perform quality control. ‘NACHO’ does this by visualising quality-control metrics, expression of control genes, principal components and sample-specific size factors in an interactive web application.
naivebayes High Performance Implementation of the Naive Bayes Algorithm
A high-performance implementation of the Naive Bayes algorithm.
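For instance:

    library(naivebayes)

    # Fit a naive Bayes classifier and inspect its predictions
    fit <- naive_bayes(Species ~ ., data = iris)
    head(predict(fit, newdata = iris))                 # class labels
    head(predict(fit, newdata = iris, type = "prob"))  # posterior probabilities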
naivereg Nonparametric Additive Instrumental Variable Estimator: A Group Shrinkage Estimation Perspective
In empirical studies, instrumental variable (IV) regression is the signature method for solving the endogeneity problem. If we enforce the exogeneity condition of the IV, we are likely to end up with a large set of IVs without knowing which ones are good. This package uses adaptive group lasso and B-spline methods to select the nonparametric components of the IV function, with the linear function being a special case. The package incorporates the two-stage least squares (2SLS), generalized method of moments (GMM) and generalized empirical likelihood (GEL) estimators post instrument selection. It is a nonparametric version of ‘ivregress’ in ‘Stata’, with IV selection and high-dimensional features. The package is based on the paper ‘Nonparametric Additive Instrumental Variable Estimator: A Group Shrinkage Estimation Perspective’ (2017), published online in the Journal of Business & Economic Statistics <doi:10.1080/07350015.2016.1180991>.
NAM Nested Association Mapping Analysis
Designed for association studies in nested association mapping (NAM) panels, also handling biparental and random panels. It includes functions for genome-wide associations mapping of multiple populations, marker quality control, solving mixed models and finding variance components through REML and Gibbs sampling.
namedCapture Named Capture Regular Expressions
User-friendly wrappers for named capture regular expressions.
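A small sketch (assuming the str_match_named() helper and PCRE-style named groups, as in the package examples):

    library(namedCapture)

    subject <- c("chr10:213,054,000-213,055,000", "chrM:111,000-222,000")
    pattern <- paste0("(?P<chrom>chr[^:]+)", ":",
                      "(?P<chromStart>[0-9,]+)", "-",
                      "(?P<chromEnd>[0-9,]+)")

    # One row per subject, one column per named group
    str_match_named(subject, pattern)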
namer Names Your ‘R Markdown’ Chunks
It names the ‘R Markdown’ chunks of files based on the filename.
naniar Data Structures, Summaries, and Visualisations for Missing Data
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. ‘naniar’ provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of ‘ggplot2’ and tidy data.
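For example:

    library(naniar)

    vis_miss(airquality)         # heatmap of missing vs observed cells
    gg_miss_var(airquality)      # missing counts per variable, as a ggplot
    miss_var_summary(airquality) # the same information as a tibble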
nanotime Nanosecond-Resolution Time for R
Full 64-bit resolution date and time support with resolution up to nanosecond granularity is provided, with easy transition to and from the standard ‘POSIXct’ type.
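A brief illustration (the ISO timestamp format follows the package examples):

    library(nanotime)

    # Parse an ISO timestamp with nanosecond precision
    x <- nanotime("2018-01-01T12:00:00.000000001+00:00")

    x + 1                 # arithmetic is in integer nanoseconds
    nanotime(Sys.time())  # from POSIXct
    as.POSIXct(x)         # back to POSIXct (sub-microsecond detail is lost)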
naptime A Robust Flexible Sys.sleep() Replacement
Provides a near drop-in replacement for base::Sys.sleep() that allows more types of input to produce delays in the execution of code and silences/prevents typical sources of error.
narray R Package for Handling Arrays in a Consistent Manner
Provides functions to query and manipulate arrays of arbitrary dimensions.
natural Estimating the Error Variance in a High-Dimensional Linear Model
Implementation of the two error variance estimation methods in high-dimensional linear models of Yu, Bien (2017) <arXiv:1712.02412>.
nauf Regression with NA Values in Unordered Factors
Fits regressions where unordered factors can be set to NA in subsets of the data where they are not applicable or otherwise not contrastive by using sum contrasts and setting NA values to zero.
NB.MClust Negative Binomial Model-Based Clustering
Model-based clustering of high-dimensional non-negative data that follow a Generalized Negative Binomial distribution. All functions in this package apply to either continuous or integer data. Correlation between variables is allowed, while samples are assumed to be independent.
nbc4va Bayes Classifier for Verbal Autopsy Data
An implementation of the Naive Bayes Classifier (NBC) algorithm used for Verbal Autopsy (VA) built on code from Miasnikof et al (2015) <DOI:10.1186/s12916-015-0521-2>.
nbconvertR Vignette Engine Wrapping IPython Notebooks
Calls the ‘Jupyter’/’IPython’ script ‘nbconvert’ to create vignettes from notebooks. Those notebooks (‘.ipynb’ files) are files containing rich text, code, and its output. Code cells can be edited and evaluated interactively. See <http://…/notebook.html> for more information.
NCA Necessary Condition Analysis
Performs a Necessary Condition Analysis (NCA). (Dul, J. 2015. Necessary Condition Analysis (NCA). ‘Logic and Methodology of ‘Necessary but not Sufficient’ causality.’ Organizational Research Methods) NCA identifies necessary (but not sufficient) conditions in datasets. Instead of drawing a regression line ‘through the middle of the data’ in an xy-plot, NCA draws the ceiling line. The ceiling line y = f(x) separates the area with observations from the area without observations. (Nearly) all observations are below the ceiling line: y <= f(x). The empty zone is in the upper left hand corner of the xy-plot (with the convention that the x-axis is ‘horizontal’ and the y-axis is ‘vertical’ and that values increase ‘upwards’ and ‘to the right’). The ceiling line is a (piecewise) linear non-decreasing line: a linear step function or a straight line. It indicates which level of x (e.g. an effort or input) is necessary but not sufficient for a (desired) level of y (e.g. good performance or output).
ncappc NCA Calculations and Population Model Diagnosis
A flexible tool that can perform (i) traditional non-compartmental analysis (NCA) and (ii) Simulation-based posterior predictive checks for population pharmacokinetic (PK) and/or pharmacodynamic (PKPD) models using NCA metrics.
ncdump Extract Metadata from ‘NetCDF’ Files as Data Frames
Tools for handling ‘NetCDF’ metadata in data frames. The metadata is provided as relations in tabular form, to avoid having to scan printed header output or to navigate nested lists of raw metadata.
ncodeR Techniques for Automated Classifiers
A set of techniques that can be used to develop, validate, and implement automated classifiers. A powerful tool for transforming raw data into meaningful information, ‘ncodeR’ (Shaffer, D. W. (2017) Quantitative Ethnography. ISBN: 0578191687) is designed specifically for working with big data: large document collections, logfiles, and other text data.
nCopula Hierarchical Archimedean Copulas Constructed with Multivariate Compound Distributions
Construct and manipulate hierarchical Archimedean copulas with multivariate compound distributions. The model used is the one of Cossette et al. (2017) <doi:10.1016/j.insmatheco.2017.06.001>.
ncpen Nonconvex Penalized Estimation for Generalized Linear Models
An efficient unified algorithm for estimating the nonconvex penalized linear, logistic and Poisson regression models. The unified algorithm is implemented based on the convex concave procedure and the algorithm can be applied to most of the existing nonconvex penalties. The algorithm also supports convex penalty: least absolute shrinkage and selection operator (LASSO). Supported nonconvex penalties include smoothly clipped absolute deviation (SCAD), minimax concave penalty (MCP), truncated LASSO penalty (TLP), clipped LASSO (CLASSO), sparse ridge (SRIDGE), modified bridge (MBRIDGE) and modified log (MLOG). For a data set with many variables (high-dimensional data), the algorithm selects relevant variables producing a parsimonious regression model. Kwon, S., Lee, S. and Kim, Y. (2015) <doi:10.1016/j.csda.2015.07.001>, Lee, S., Kwon, S. and Kim, Y. (2016) <doi:10.1016/j.csda.2015.08.019>. (This project is funded by Julian Virtue Professorship from Center for Applied Research at Graziadio School of Business and Management at Pepperdine University.)
NCSampling Nearest Centroid (NC) Sampling
Provides functionality for performing Nearest Centroid (NC) Sampling. The NC sampling procedure was developed for forestry applications and selects plots for ground measurement so as to maximize the efficiency of imputation estimates. It uses multiple auxiliary variables and multivariate clustering to search for an optimal sample. Further details are given in Melville G. & Stone C. (2016) <doi:10.1080/00049158.2016.1218265>.
ndjson Wicked-Fast Streaming ‘JSON’ (‘ndjson’) Reader
Streaming ‘JSON’ (‘ndjson’) has one ‘JSON’ record per-line and many modern ‘ndjson’ files contain large numbers of records. These constructs may not be columnar in nature, but it’s often useful to read in these files and ‘flatten’ the structure out to work in an R data.frame-like context. Functions are provided that make it possible to read in plain ‘ndjson’ files or compressed (‘gz’) ‘ndjson’ files and either validate the format of the records or create ‘flat’ data.table (‘tbl_dt’) structures from them.
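A minimal sketch of the flattening behaviour (assuming the stream_in() reader):

    library(ndjson)

    # One JSON record per line; nested fields become dotted column names
    writeLines(c('{"a":1,"b":{"c":"x"}}',
                 '{"a":2,"b":{"c":"y"}}'), "toy.ndjson")

    stream_in("toy.ndjson")  # data.table with columns a and b.c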
ndl Naive Discriminative Learning
Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations.
NDP Interactive Presentation for Working with Normal Distribution
An interactive presentation on the topic of the normal distribution using the ‘rmarkdown’ and ‘shiny’ packages. It is helpful to those who want to learn the normal distribution quickly and get hands-on experience. The presentation has a template for solving problems on the normal distribution. Runtime examples are provided in the package function as well as at <https://…/>.
nearfar Near-Far Matching
Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable.
neat Network Enrichment Analysis Test (NEAT)
Includes functions and examples to compute NEAT, a network-based test for gene enrichment analysis.
neatmaps Heatmaps for Multiple Network Data
Simplify the exploratory data analysis process for multiple network data sets with the help of hierarchical clustering and heatmaps. Contains the tools necessary to convert the raw data of multiple networks into a single dynamic report that summarizes many of the relationships of their graph, node and structural characteristics.
needmining A Simple Needmining Implementation
Showcasing needmining (the semi-automatic extraction of customer needs from social media data) with Twitter data. It uses the handling of the Twitter API provided by the package ‘rtweet’ and the textmining algorithms provided by the packages ‘RTextTools’ and ‘tm’. Niklas Kuehl (2016) <doi:10.1007/978-3-319-32689-4_14> wrote an introduction to the topic of needmining.
needs Attaches and Installs Packages
A simple function for easier package loading and auto-installation.
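Usage is a single call (a sketch; on first use the package asks whether it may load itself automatically):

    library(needs)

    # Attaches both packages, installing any that are missing first
    needs(dplyr, ggplot2)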
NegBinBetaBinreg Negative Binomial and Beta Binomial Bayesian Regression Models
Fits Negative Binomial regression models, with modeling of the mean and shape or of the mean and variance, and Beta Binomial regression models with modeling of the mean and dispersion.
neo4r A ‘Neo4J’ Driver
A modern and flexible ‘Neo4J’ driver, allowing you to query data on a ‘Neo4J’ server and handle the results in R. It is modern in the sense that it provides a driver that can be easily integrated in a data analysis workflow, especially by providing an API that works smoothly with other data analysis and graph packages. It is flexible in the way it returns the results, trying to stay as close as possible to the way ‘Neo4J’ returns data. That way, you have control over how you compute the results. At the same time, the result is not too complex, so that the ‘heavy lifting’ of data wrangling is not left to the user.
neonUtilities Utilities for Working with NEON Data
NEON data packages can be accessed through the NEON Data Portal <http://data.neonscience.org> or through the NEON Data API (see <http://…/data-api> for documentation). Data delivered from the Data Portal are provided as monthly zip files packaged within a parent zip file, while individual files can be accessed from the API. This package provides tools that aid in discovering, downloading, and reformatting data prior to use in analyses. This includes downloading data via the API, merging data tables by type, and converting formats. For more information, see the readme file at <https://…/NEON-utilities>.
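A minimal sketch of the stacking step, assuming the stackByTable() helper and a hypothetical portal download on disk:

    library(neonUtilities)

    # merge the monthly zips inside a portal download into one table per type
    stackByTable("NEON_par.zip")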
NestedCategBayesImpute Modeling and Generating Synthetic Versions of Nested Categorical Data in the Presence of Impossible Combinations
This tool set provides a set of functions to fit the nested Dirichlet process mixture of products of multinomial distributions (NDPMPM) model for nested categorical household data in the presence of impossible combinations. It has direct applications in generating synthetic nested household data.
netchain Inferring Causal Effects on Collective Outcomes under Interference
In networks, treatments may spill over from the treated individual to his or her social contacts, and outcomes may be contagious over time. Under this setting, causal inference on the collective outcome observed over the whole network is often of interest. We use chain graph models approximating the projection of the full longitudinal data onto the observed data to identify the causal effect of the intervention on the whole outcome. Justification of such approximation is demonstrated in Ogburn et al. (2018) <arXiv:1812.04990>.
netcoh Statistical Modeling with Network Cohesion
Model fitting procedures for regression with network cohesion effects, when a network connecting sample individuals is available in a regression problem. In the future, other commonly used statistical models will be added, such as the Gaussian graphical model.
netCoin Interactive Networks with R
Create interactive networks of coincidences. It combines R's data-analysis power for studying coincidences with the visualization libraries of JavaScript in one package.
netdep Testing for Network Dependence
When network dependence is present, that is when social relations can engender dependence in the outcome of interest, treating such observations as independent results in invalid, anti-conservative statistical inference. We propose a test of independence among observations sampled from a single network <arXiv:1710.03296>.
netdiffuseR Network Analysis for Diffusion of Innovations
Empirical statistical analysis, visualization and simulation of network models of the diffusion of innovations. The package implements algorithms for calculating network diffusion statistics such as transmission rate, hazard rates, exposure models, network threshold levels, infectiousness (contagion), and susceptibility. The package is inspired by work published in Valente, et al., (2015) <DOI:10.1016/j.socscimed.2015.10.001>; Valente (1995) <ISBN:9781881303213>, Myers (2000) <DOI:10.1086/303110>, Iyengar and others (2011) <DOI:10.1287/mksc.1100.0566>, Burt (1987) <DOI:10.1086/228667>; among others.
netgen Network Generator for Combinatorial Graph Problems
Methods for the generation of a wide range of network geographies, e.g., grid networks or clustered networks. Useful for the generation of benchmarking instances for the investigation of, e.g., Vehicle-Routing-Problems or Travelling Salesperson Problems.
netjack Tools for Working with Samples of Networks
Tools for managing large sets of network data and performing whole network analysis. This package is focused on the network based statistic jackknife method, and implements a framework that can be extended to other network manipulations and analyses.
NetOrigin Origin Estimation for Propagation Processes on Complex Networks
Performs network-based source estimation. Different approaches are available: effective distance median, recursive backtracking, and centrality-based source estimation. Additionally, we provide public transportation network data as well as methods for data preparation, source estimation performance analysis and visualization.
netrankr Analyzing Partial Rankings in Networks
Implements methods for centrality related analyses of networks. While the package includes the possibility to build more than 20 indices, its main focus lies on index-free assessment of centrality via partial rankings obtained by neighborhood-inclusion or positional dominance. These partial rankings can be analyzed with different methods, including probabilistic methods like computing expected node ranks and relative rank probabilities (how likely is it that a node is more central than another?). The methodology is described in depth in the vignettes and in Schoch (2018) <doi:10.1016/j.socnet.2017.12.003>.
netregR Regression of Network Responses
Regress network responses (both directed and undirected) onto covariates of interest that may be actor-, relation-, or network-valued. In addition, compute principled variance estimates of the coefficients assuming that the errors are jointly exchangeable. Missing data is accommodated. Additionally implements building and inversion of covariance matrices under joint exchangeability, and generates random covariance matrices of this class. For more detail on methods, see Marrs (2017) <arXiv:1701.05530>.
NetRep Permutation Testing Network Module Preservation Across Datasets
Functions for assessing the replication/preservation of a network module’s topology across datasets through permutation testing.
nettools A Network Comparison Framework
A collection of network inference methods for co-expression networks, quantitative network distances and a novel framework for network stability analysis.
NetWeaver Graphic Presentation of Complex Genomic and Network Data Analysis
Implements various simple function utilities and flexible pipelines to generate circular images for visualizing complex genomic and network data analysis features.
networkABC Network Reverse Engineering with Approximate Bayesian Computation
We developed an inference tool based on approximate Bayesian computation to decipher network data and assess the strength of the inferred links between the network's actors. It is a new multi-level approximate Bayesian computation (ABC) approach. At the first level, the method captures the global properties of the network, such as scale-freeness and clustering coefficients, whereas the second level is targeted at capturing local properties, including the probability of each pair of genes being linked. Up to now, ABC algorithms have been scarcely used in this setting and, due to the computational overhead, their application was limited to a small number of genes. By contrast, our algorithm was designed to cope with that issue and has a low computational cost. It can be used, for instance, to elucidate gene regulatory networks, an important step towards understanding normal cell physiology and complex pathological phenotypes. Reverse engineering consists of using gene expressions over time or over different experimental conditions to discover the structure of the gene network in a targeted cellular process. The fact that gene expression data are usually noisy, highly correlated, and high-dimensional explains the need for specific statistical methods to reverse engineer the underlying network.
NetworkChange Bayesian Package for Network Changepoint Analysis
Network changepoint analysis for undirected network data. The package implements a hidden Markov multilinear tensor regression model (Park and Sohn, 2017, <http://…/NetworkChange.pdf> ). Functions for break number detection using the approximate marginal likelihood and WAIC are also provided.
NetworkComparisonTest Statistical Comparison of Two Networks Based on Three Invariance Measures
This permutation based hypothesis test, suited for Gaussian and binary data, assesses the difference between two networks based on several invariance measures (network structure invariance, global strength invariance, edge invariance). Network structures are estimated with l1-regularized partial correlations (Gaussian data) or with l1-regularized logistic regression (eLasso, binary data). Suited for comparison of independent and dependent samples (currently, only for one group measured twice).
networkD3 Tools for Creating D3 JavaScript Network Graphs from R
Creates D3 JavaScript network, tree, dendrogram, and Sankey graphs from R.
NetworkDistance Distance Measures for Networks
Networks are a prevalent form of data structure in many fields. As an object of analysis, many distance or metric measures have been proposed to define the similarity between two networks. We provide a number of distance measures for networks. See Jurman et al (2011) <doi:10.3233/978-1-60750-692-8-227> for an overview of the spectral class of inter-graph distance measures.
NetworkInference Inferring Latent Diffusion Networks
This is an R implementation of the netinf algorithm (Gomez Rodriguez, Leskovec, and Krause, 2010)<doi:10.1145/1835804.1835933>. Given a set of events that spread between a set of nodes the algorithm infers the most likely stable diffusion network that is underlying the diffusion process.
networkR Network Analysis and Visualization
Collection of functions for fast manipulation, handling, and analysis of large-scale networks based on family and social data. Functions are utility functions used to manipulate data in three ‘formats’: sparse adjacency matrices, pedigree trio family data, and pedigree family data. When possible, the functions should be able to handle millions of data points quickly for use in combination with data from large public national registers and databases. Kenneth Lange (2003, ISBN:978-8181281135).
NetworkRiskMeasures Risk Measures for (Financial) Networks
Implements some risk measures for (financial) networks, such as DebtRank, Impact Susceptibility, Impact Diffusion and Impact Fluidity.
NetworkToolbox Network Filtering Methods and Measures
Implements numerous network filtering methods (TMFG; Massara, Di Matteo, & Aste (2016) <doi:10.1093/comnet/cnw015>, MaST; Chu & Liu (1965) and Edmonds (1967) <doi:10.6028/jres.071B.032>, ECO; Fallani, Latora, & Chavez (2017) <doi:10.1371/journal.pcbi.1005305>, and ECO+MaST; Fallani, Latora, & Chavez (2017) <doi:10.1371/journal.pcbi.1005305>), and several network measures (centrality, characteristic path length, clustering coefficient, and edge replication; Rubinov and Sporns (2010) <doi:10.1016/j.neuroimage.2009.10.003>).
networktools Assorted Tools for Identifying Important Nodes in Networks (Impact, Expected Influence)
Includes assorted tools for network analysis. Specifically, includes functions for calculating impact statistics, which aim to identify how each node impacts the overall network structure (global strength impact, network structure impact, edge impact), and for calculating and visualizing expected influence.
networktree Recursive Partitioning of Network Models
Methods to create tree models with correlation-based network models (multivariate normal distributions).
neural Neural Networks
RBF and MLP neural networks with a graphical user interface.
neuralnet Training of Neural Networks
Training of neural networks using backpropagation, resilient backpropagation with (Riedmiller, 1994) or without weight backtracking (Riedmiller and Braun, 1993) or the modified globally convergent version by Anastasiadis et al. (2005). The package allows flexible settings through custom-choice of error and activation function. Furthermore, the calculation of generalized weights (Intrator O & Intrator N, 1993) is implemented.
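An illustrative fit on the built-in infert data (binary outcome), close to the package's documented example:

    library(neuralnet)

    nn <- neuralnet(case ~ age + parity + induced + spontaneous,
                    data = infert, hidden = 2,
                    err.fct = "ce", linear.output = FALSE)
    plot(nn)                    # network topology with fitted weights
    head(nn$net.result[[1]])    # fitted probabilities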
NeuralNetTools Visualization and Analysis Tools for Neural Networks
Visualization and analysis tools to aid in the interpretation of neural network models. Functions are available for plotting, quantifying variable importance, conducting a sensitivity analysis, and obtaining a simple list of model weights.
neutralitytestr Test for a Neutral Evolutionary Model in Cancer Sequencing Data
Takes frequencies of mutations, as reported by high-throughput sequencing data from cancer, and fits a theoretical neutral model of tumour evolution. The package outputs summary statistics and contains code for plotting the data and model fits. See Williams et al 2016 <doi:10.1038/ng.3489> and Williams et al 2017 <doi:10.1101/096305> for further details of the method.
neverhpfilter A Better Alternative to the Hodrick-Prescott Filter
In the working paper titled ‘Why You Should Never Use the Hodrick-Prescott Filter’, James D. Hamilton proposes an interesting new alternative to economic time series filtering. The neverhpfilter package provides functions for implementing his solution. Hamilton (2017) <doi:10.3386/w23429>.
newsmap Semi-Supervised Model for Geographical Document Classification
Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, Spanish, Japanese and Russian.
newTestSurvRec Statistical Tests to Compare Curves with Recurrent Events
Implements routines to compare survival curves with recurrent events, including estimation of the survival curves. The first is a model for recurrent events, whether the data are correlated or uncorrelated; it was proposed by Wang and Chang (1999) <doi:10.2307/2669690>. In the independent case, the survival function can be estimated by the generalization of the product-limit model of Pena (2001) <doi:10.1198/016214501753381922>.
nFCA Numerical Formal Concept Analysis for Systematic Clustering
Numerical Formal Concept Analysis (nFCA) is a modern unsupervised learning tool for analyzing general numerical data. Given input data, this R package outputs two nFCA graphs: an H-graph and an I-graph that reveal systematic, hierarchical clustering and the inherent structure of the data.
NFP Network Fingerprint Framework in R
An implementation of the network fingerprint framework. This method works by making systematic comparisons to a set of well-studied ‘basic networks’, measuring both functional and topological similarity. A biological network can be characterized as a spectrum-like vector consisting of similarities to the basic networks. It shows great potential in biological network study.
ngram An n-gram Babbler
This package offers utilities for creating, displaying, and ‘babbling’ n-grams. The babbler is a simple Markov process.
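A short sketch of the build-then-babble workflow on a toy string:

    library(ngram)

    ng <- ngram("a b a c a b a d", n = 2)   # bigram model
    get.phrasetable(ng)                     # n-gram frequencies
    babble(ng, genlen = 12, seed = 1)       # generate 12 words of 'babble'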
ngramrr A Simple General Purpose N-Gram Tokenizer
A simple n-gram (contiguous sequences of n items from a given sequence of text) tokenizer to be used with the ‘tm’ package with no ‘rJava’/’RWeka’ dependency.
ngspatial Fitting the centered autologistic and sparse spatial generalized linear mixed models for areal data
ngspatial provides tools for analyzing spatial data, especially non-Gaussian areal data. The current version supports the sparse spatial generalized linear mixed model of Hughes and Haran (2013) and the centered autologistic model of Caragea and Kaiser (2009).
ngstk Next-Generation Sequencing (NGS) Data Analysis Toolkit
Facilitates the analysis of NGS data, such as visualization, conversion of data formats for web service input, and other purposes.
NHMSAR Non-Homogeneous Markov Switching Autoregressive Models
Calibration, simulation, validation of (non-)homogeneous Markov switching autoregressive models with Gaussian or von Mises innovations. Penalization methods are implemented for Markov Switching Vector Autoregressive Models of order 1 only.
nhstplot Plot Null Hypothesis Significance Tests
Illustrate graphically the most common Null Hypothesis Significance Testing procedures. More specifically, this package provides functions to plot Chi-Squared, F, t (one- and two-tailed) and z (one- and two-tailed) tests, by plotting the probability density under the null hypothesis as a function of the different test statistic values. Although highly flexible (color theme, fonts, etc.), only the minimal number of arguments (observed test statistic, degrees of freedom) are necessary for a clear and useful graph to be plotted, with the observed test statistic and the p value, as well as their corresponding value labels. The axes are automatically scaled to present the relevant part and the overall shape of the probability density function. This package is especially intended for education purposes, as it provides a helpful support to help explain the Null Hypothesis Significance Testing process, its use and/or shortcomings.
nilde Nonnegative Integer Solutions of Linear Diophantine Equations with Applications
Routines for enumerating all existing nonnegative integer solutions of a linear Diophantine equation. The package provides routines for solving 0-1, bounded and unbounded knapsack problems; 0-1, bounded and unbounded subset sum problems; and a problem of additive partitioning of natural numbers.
nima Nima Hejazi’s Miscellaneous R Code
Miscellaneous R functions developed over the course of statistical research. These include utilities that supplement the existing idiosyncrasies of R; extend plotting functionality and aesthetics; provide alternative presentations of matrix decompositions; extend types of random variables supported for simulation; extend access to command line tools and system information, making work on remote systems easier.
nimble An R package for programming with BUGS models and compiling parts of R.
NIMBLE is a system for building and sharing analysis methods for statistical models, especially for hierarchical models and computationally-intensive methods. NIMBLE is built in R but compiles your models and algorithms using C++ for speed. It includes three components:
1. A system for using models written in the BUGS language as programmable objects in R.
2. An initial library of algorithms for BUGS models, including basic MCMC, which can be used directly or can be customized from R before being compiled and run.
3. A language embedded in R for programming algorithms for BUGS models, both of which are compiled through C++ code and loaded into R.
NIMBLE can also be used without BUGS models as a way to compile simple R-like code into C++, which is then compiled and loaded into R with an interface function or object.
nipals Principal Components Analysis using NIPALS with Gram-Schmidt Orthogonalization
Principal Components Analysis of a matrix using Non-linear Iterative Partial Least Squares with Gram-Schmidt orthogonalization of the scores and loadings. Optimized for speed. See Andrecut (2009) <doi:10.1089/cmb.2008.0221>.
NIRStat Novel Statistical Methods for Studying Near-Infrared Spectroscopy (NIRS) Time Series Data
Provides transfusion-related differential tests on Near-infrared spectroscopy (NIRS) time series with detection limits, comprising two test statistics: Mean Area Under the Curve (MAUC) and the slope statistic. The package applies a penalized spline method within an imputation setting. Testing is conducted by a nested permutation approach within imputation. Refer to Guo et al (2018) <arXiv:1801.08153> for further details.
NITPicker Finds the Best Subset of Points to Sample
Given a few examples of experiments over a time (or spatial) course, ‘NITPicker’ selects a subset of points to sample in follow-up experiments that would (i) best distinguish between the experimental conditions and the control condition, (ii) best distinguish between two models of how the experimental condition might differ from the control, or (iii) a combination of the two. Ezer and Keir (2018) <doi:10.1101/301796>.
NlcOptim Solve Nonlinear Optimization with Nonlinear Constraints
Optimization for nonlinear objective and constraint functions. Linear or nonlinear equality and inequality constraints are allowed. It accepts the input parameters as a constrained matrix.
nlcv Nested Loop Cross Validation
Nested loop cross validation for classification purposes for misclassification error rate estimation. The package supports several methodologies for feature selection: random forest, Student t-test, limma, and provides an interface to the following classification methods in the ‘MLInterfaces’ package: linear, quadratic discriminant analyses, random forest, bagging, prediction analysis for microarray, generalized linear model, support vector machine (svm and ksvm). Visualizations to assess the quality of the classifier are included: plot of the ranks of the features, scores plot for a specific classification algorithm and number of features, misclassification rate for the different number of features and classification algorithms tested and ROC plot. For further details about the methodology, please check: Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann (2004) <doi:10.2202/1544-6115.1078>.
NlinTS Non Linear Time Series Analysis
The main functionalities of this package concern time series forecasting and causality detection. In particular, it provides a neural-network Vector Auto-Regressive model, the classical Granger causality test of C.W.J. Granger (1980) <doi:10.1016/0165-1889(80)90069-X>, and a non-linear version of it.
nlirms Non-Life Insurance Rate-Making System
Design of non-life insurance rate-making system with a frequency and a severity component based on the a posteriori criteria. The rate-making system is a general form of bonus-malus system introduced by Lemaire (1995), <doi:10.1007/978-94-011-0631-3> and Frangos and Vrontos (2001), <doi:10.2143/AST.31.1.991>.
nlme Linear and Nonlinear Mixed Effects Models
Fit and compare Gaussian linear and nonlinear mixed-effects models.
nlmixr Nonlinear Mixed Effects Models in Population Pharmacokinetics and Pharmacodynamics
Fit and compare nonlinear mixed-effects models in differential equations with flexible dosing information commonly seen in pharmacokinetics and pharmacodynamics (Almquist, Leander, and Jirstrand 2015 <doi:10.1007/s10928-015-9409-1>). Differential equation solving is by compiled C code provided in the ‘RxODE’ package (Wang, Hallow, and James 2015 <doi:10.1002/psp4.12052>).
NLMR Simulating Neutral Landscape Models
Provides neutral landscape models (Gardner et al. 1987 <doi:10.1007/BF02275262>, With 1997 <doi:10.1046/j.1523-1739.1997.96210.x>) that can easily be integrated into existing landscape analyses. Neutral landscape models range from ‘hard’ neutral models (only random functions) to ‘soft’ ones (with parameters) and generate landscape patterns that are not grounded in ecological reasoning. Thus, these patterns can be used as null models in landscape ecology. ‘NLMR’ combines a large number of algorithms from published software (Saura & Martínez 2000 <doi:10.1023/A:1008107902848>, Etherington et al. 2015 <doi:10.1111/2041-210X.12308>) for simulating neutral landscapes and includes utility functions to classify and combine the landscapes. The simulation results are obtained in a geospatial data format (raster* objects from the ‘raster’ package) and can, therefore, be used in any sort of raster data operation that is performed with standard observation data.
nlnet Nonlinear Network Reconstruction and Clustering Based on DCOL (Distance Based on Conditional Ordered List)
It includes three methods: K-profiles clustering, non-linear network reconstruction, and non-linear hierarchical clustering.
NLP Natural Language Processing Infrastructure
Basic classes and methods for Natural Language Processing.
nlr Nonlinear Regression Modelling using Robust Methods
The Non-Linear Robust package handles the problem of outliers in nonlinear regression using robust statistics, and covers classic methods in nonlinear regression as well. It has facilities to fit models in the case of autocorrelated and heterogeneous-variance errors, and it includes tools for detecting outliers in nonlinear regression. (Riazoshams H, Midi H, and Ghilagaber G (2018, ISBN:978-1-118-73806-1). Robust Nonlinear Regression, with Application using R, John Wiley and Sons.)
nlrr Non-Linear Relative Risk Estimation and Plotting
Estimate the non-linear odds ratio and plot it against a continuous exposure.
nlrx Setup, Run and Analyze ‘NetLogo’ Model Simulations from ‘R’ via ‘XML’
The purpose of this package is to provide tools to set up, run and analyze ‘NetLogo’ (<https://…/> ) model simulations in ‘R’. ‘nlrx’ experiments use a structure similar to ‘NetLogo’ Behavior Space experiments. However, ‘nlrx’ offers more flexibility and additional tools for running and analyzing complex simulation designs and sensitivity analyses. The user defines all information that is needed in an intuitive framework, using class objects. Experiments are submitted from ‘R’ to ‘NetLogo’ via ‘XML’ files that are dynamically written, based on specifications defined by the user. By nesting model calls in future environments, large simulation designs with many runs can be executed in parallel. This also enables simulating ‘NetLogo’ experiments on remote HPC machines. In order to use this package, ‘Java’ and ‘NetLogo’ (>= 5.3.1) need to be available on the executing system.
nls.multstart Robust Non-Linear Regression using AIC Scores
Non-linear least squares regression with the Levenberg-Marquardt algorithm using multiple starting values for increasing the chance that the minimum found is the global minimum.
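A sketch of a multi-start fit on simulated exponential-decay data; argument names follow the nls_multstart() interface as documented:

    library(nls.multstart)

    df <- data.frame(x = 1:20)
    df$y <- 5 * exp(-0.3 * df$x) + rnorm(20, sd = 0.1)

    fit <- nls_multstart(y ~ a * exp(-b * x), data = df, iter = 250,
                         start_lower = c(a = 0, b = 0),
                         start_upper = c(a = 10, b = 2),
                         supp_errors = "Y")  # suppress errors from bad starts
    summary(fit)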
nlsem Fitting Structural Equation Mixture Models
Estimation of structural equation models with nonlinear effects and underlying nonnormal distributions.
nlshelper Convenient Functions for Non-Linear Regression
A few utilities for summarizing, testing, and plotting non-linear regression models fit with nls(), nlsList() or nlme().
nlshrink Non-Linear Shrinkage Estimation of Population Eigenvalues and Covariance Matrices
Non-linear shrinkage estimation of population eigenvalues and covariance matrices, based on publications by Ledoit and Wolf (2004, 2015, 2016).
nlsr Functions for Nonlinear Least Squares Solutions
Provides tools for working with nonlinear least squares problems. It is intended to eventually supersede the nls() function in the R distribution. For example, nls() specifically does NOT deal with small or zero residual problems. Its Gauss-Newton method frequently stops with ‘singular gradient’ messages.
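A minimal sketch using nlxb(), the package's replacement for nls(); the toy data are hypothetical:

    library(nlsr)

    d <- data.frame(x = 1:12)
    d$y <- 3 * exp(-0.5 * d$x) + rnorm(12, sd = 0.05)

    # nlxb() uses analytic Jacobians, which helps on small-residual problems
    fit <- nlxb(y ~ a * exp(-b * x), data = d, start = c(a = 1, b = 0.1))
    fit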
nlsrk Runge-Kutta Solver for Function nls()
Performs univariate or multivariate computation of a single ODE or of a set of ODEs (ordinary differential equations).
nlstimedist Non-Linear Model Fitting of Time Distribution of Biological Phenomena
Fit biologically meaningful distribution functions to time-sequence data (phenology), estimate parameters to draw the cumulative distribution function and probability density function and calculate standard statistical moments and percentiles.
nlstools Tools for Nonlinear Regression Analysis
Several tools for assessing the quality of fit of a Gaussian nonlinear model are provided.
nmadb Network Meta-Analysis Database API
Set of functions for accessing the database of network meta-analyses described in Petropoulou M, et al. Bibliographic study showed improving statistical methodology of network meta-analyses published between 1999 and 2015 <doi:10.1016/j.jclinepi.2016.11.002>. The database is hosted in a REDCap database at the Institute of Social and Preventive Medicine (ISPM) at the University of Bern.
nmaINLA Network Meta-Analysis using Integrated Nested Laplace Approximations
Performs network meta-analysis using integrated nested Laplace approximations (‘INLA’). Includes methods to assess the heterogeneity and inconsistency in the network. Contains more than ten different network meta-analysis data sets. Installation of the R package ‘INLA’ is compulsory for successful usage. The ‘INLA’ package can be obtained from <http://www.r-inla.org>. We recommend the testing version.
nmathresh Thresholds and Invariant Intervals for Network Meta-Analysis
Calculation and presentation of decision-invariant bias adjustment thresholds and intervals for Network Meta-Analysis, as described by Phillippo et al. (2017) <doi:10.1111/rssa.12341>. These describe the smallest changes to the data that would result in a change of decision.
NMF Algorithms and Framework for Nonnegative Matrix Factorization (NMF)
Provides a framework to perform Non-negative Matrix Factorization (NMF). The package implements a set of already published algorithms and seeding methods, and provides a framework to test, develop and plug new/custom algorithms. Most of the built-in algorithms have been optimized in C++, and the main interface function provides an easy way of performing parallel computations on multicore machines.
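A small illustrative factorization of a toy nonnegative matrix:

    library(NMF)

    set.seed(1)
    V <- matrix(runif(200), nrow = 20)   # 20 x 10 nonnegative matrix
    res <- nmf(V, rank = 3)              # default 'brunet' algorithm
    W <- basis(res)                      # 20 x 3 basis matrix
    H <- coef(res)                       # 3 x 10 coefficient matrix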
nmfem NMF-EM Algorithm
Provides a version of the Expectation-Maximization algorithm for mix-models, reducing the numbers of parameters to estimate using Non-negative Matrix Factorization methods. For more explanations, see pre-print of Carel and Alquier (2017) <arXiv:1709.03346>.
nmfgpu4R Non-Negative Matrix Factorization (NMF) using CUDA
Wrapper package for the nmfgpu library, which implements several Non-negative Matrix Factorization (NMF) algorithms for CUDA platforms. By using the acceleration of GPGPU computing, the NMF can be used for real-world problems inside the R environment. All CUDA devices starting with Kepler architecture are supported by the library.
NMI Normalized Mutual Information of Community Structure in Network
Calculates the normalized mutual information (NMI) of two community structures in network analysis.
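For intuition, the quantity itself can be computed in a few lines of base R; normalizations differ across the literature, and this sketch uses the square root of the two entropies:

    nmi <- function(a, b) {
      joint <- table(a, b) / length(a)          # joint label distribution
      pa <- rowSums(joint); pb <- colSums(joint)
      mi <- sum(joint * log(joint / outer(pa, pb)), na.rm = TRUE)
      ha <- -sum(pa * log(pa)); hb <- -sum(pb * log(pb))
      mi / sqrt(ha * hb)
    }
    nmi(c(1, 1, 2, 2), c(1, 1, 2, 1))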
Nmisc Miscellaneous Functions Used at ‘Numeract LLC’
Contains functions useful for debugging, set operations on vectors, and ‘UTC’ date and time functionality. It adds a few vector manipulation verbs to ‘purrr’ and ‘dplyr’ packages. It can also generate an R file to install and update packages to simplify deployment into production. The functions were developed at the data science firm ‘Numeract LLC’ and are used in several packages and projects.
nmslibR Non Metric Space (Approximate) Library
A Non-Metric Space Library (‘NMSLIB’ <https://…/nmslib> ) wrapper, which according to the authors ‘is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The goal of the ‘NMSLIB’ <https://…/nmslib> Library is to create an effective and comprehensive toolkit for searching in generic non-metric spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or non-metric spaces. Hence, the main focus is on approximate methods’. The wrapper also includes Approximate Kernel k-Nearest-Neighbor functions based on the ‘NMSLIB’ <https://…/nmslib> ‘Python’ Library.
nna Nearest-Neighbor Analysis
Performs spatial pattern analysis using a T-square sampling procedure. This method is based on two measures, ‘x’ and ‘y’: ‘x’ is the distance from a random point to the nearest individual, and ‘y’ is the distance from that individual to its nearest neighbor. This methodology is commonly used in phytosociology and marine benthos ecology to analyze species’ distributions (random, uniform or clumped patterns). Ludwig & Reynolds (1988, ISBN:0471832359).
nnet Feed-Forward Neural Networks and Multinomial Log-Linear Models
Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.
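Both facilities in a few lines on the iris data:

    library(nnet)

    set.seed(1)
    net <- nnet(Species ~ ., data = iris, size = 2, trace = FALSE)
    table(predict(net, iris, type = "class"), iris$Species)

    mfit <- multinom(Species ~ ., data = iris, trace = FALSE)  # multinomial logit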
nnetpredint Prediction Intervals of Multi-Layer Neural Networks
Computes prediction intervals of neural network models (e.g. backpropagation) at a given confidence level. It can take the output from models trained by other packages like ‘nnet’, ‘neuralnet’, ‘RSNNS’, etc.
nnfor Time Series Forecasting with Neural Networks
Contains functions to facilitate automatic time series modelling with neural networks. Allows fully automatic, semi-manual or fully manual specification of networks. For details of the specification methodology see: (i) Crone and Kourentzes (2010) <doi:10.1016/j.neucom.2010.01.017>; and (ii) Kourentzes et al. (2014) <doi:10.1016/j.eswa.2013.12.011>.
nngeo k-Nearest Neighbor Join for Spatial Data
K-nearest neighbor search for projected and non-projected ‘sf’ spatial layers. Nearest neighbor search uses (1) function nn2() from package ‘RANN’ for projected point data, or (2) function st_distance() from package ‘sf’ for other types of spatial data.
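A sketch of the join, assuming the package's st_nn() verb; towns and cities stand for hypothetical ‘sf’ point layers:

    library(sf)
    library(nngeo)

    idx <- st_nn(towns, cities, k = 3)    # indices of the 3 nearest cities
    near1 <- st_join(towns, cities, join = st_nn, k = 1)  # nearest-match join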
nnlasso Non-Negative Lasso and Elastic Net Penalized Generalized Linear Models
Estimates coefficients of lasso-penalized linear regression and generalized linear models subject to non-negativity constraints on the parameters, using a multiplicative iterative algorithm. The entire regularization path for a sequence of lambda values can be obtained. Functions are available for plotting the regularization path, cross-validation, and estimating coefficients at a given lambda value. There is also provision for obtaining standard errors of coefficient estimates.
NNLM Fast and Versatile Non-Negative Matrix Factorization
This is a package for Non-Negative Linear Models (NNLM). It implements fast sequential coordinate descent algorithms for non-negative linear regression and non-negative matrix factorization (NMF). It supports mean square error and Kullback-Leibler divergence loss. Many other features are also implemented, including missing value imputation, domain knowledge integration, designable W and H matrices and multiple forms of regularizations.
NNMIS Nearest Neighbor Based Multiple Imputation for Survival Data with Missing Covariates
Imputation for both missing covariates and censored observations (optional) for survival data with missing covariates by the nearest neighbor based multiple imputation algorithm as described in Hsu et al. (2006) <doi:10.1002/sim.2452>, Long et al. (2012) <doi:10.5705/ss.2010.069>, Hsu et al. (2014) <doi:10.1080/10543406.2014.888444>, and Hsu and Yu (2017) <arXiv:1710.04721>. Note that the current version can only impute for a situation with one missing covariate.
NNS Nonlinear Nonparametric Statistics
Nonlinear nonparametric statistics using partial moments.
nnTensor Non-Negative Tensor Decomposition
Some functions for performing non-negative matrix factorization, non-negative CANDECOMP/PARAFAC (CP) decomposition, non-negative Tucker decomposition, and generating toy model data. See Andrzej Cichocki et al (2009) <doi:10.1002/9780470747278> and the reference section of the GitHub README.md <https://…/nnTensor>, for details of the methods.
nodbi ‘NoSQL’ Database Connector
Simplified document database manipulation and analysis, with support for many ‘NoSQL’ databases: document databases (‘Elasticsearch’, ‘CouchDB’, ‘MongoDB’), ‘key-value’ databases (‘Redis’), and other ‘NoSQL’ types (‘etcd’).
nofrills Low-Cost Anonymous Functions
Provides a compact variation of the usual syntax of function declaration, in order to support Tidyverse-style quasiquotation of a function’s arguments and body.
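A sketch of the abbreviated syntax; the body follows the final ‘~’ and supports unquote-splicing:

    library(nofrills)

    add <- fn(x, y ~ x + y)    # equivalent to function(x, y) x + y
    add(2, 3)                  # 5

    z <- 10
    addz <- fn(x ~ x + !!z)    # quasiquotation: the body captures 10, not z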
NoiseFiltersR Label Noise Filters for Data Preprocessing in Classification
An extensive implementation of state-of-the-art and classical algorithms to preprocess label noise in classification problems.
nomclust Hierarchical Nominal Clustering Package
Package for hierarchical clustering of objects characterized by nominal variables.
nomogramEx Extract Equations from a Nomogram
A nomogram cannot be applied easily, because it is difficult to calculate the points or even the survival probability. This package provides the function nomogramEx() to extract the polynomial equations that calculate the points for each variable, and the survival probability corresponding to the total points.
noncompliance Causal Inference in the Presence of Treatment Noncompliance Under the Binary Instrumental Variable Model
A finite-population significance test of the ‘sharp’ causal null hypothesis that treatment exposure X has no effect on final outcome Y, within the principal stratum of Compliers. A generalized likelihood ratio test statistic is used, and the resulting p-value is exact. Currently, it is assumed that there are only Compliers and Never Takers in the population.
noncomplyR Bayesian Analysis of Randomized Experiments with Non-Compliance
Functions for Bayesian analysis of data from randomized experiments with non-compliance. The functions are based on the models described in Imbens and Rubin (1997) <doi:10.1214/aos/1034276631>. Currently only two types of outcome models are supported: binary outcomes and normally distributed outcomes. Models can be fit with and without the exclusion restriction and/or the strong access monotonicity assumption. Models are fit using the data augmentation algorithm as described in Tanner and Wong (1987) <doi:10.2307/2289457>.
nonet Weighted Average Ensemble without Training Labels
Provides ensemble capabilities for supervised and unsupervised learning model predictions without using training labels. It decides the relative weights of the different models' predictions by using the best model's predictions as the response variable and the remaining models' predictions as features. The user decides which model is best, so the package gives users the freedom to ensemble models based on their own design solutions.
nonlinearICP Invariant Causal Prediction for Nonlinear Models
Performs ‘nonlinear Invariant Causal Prediction’ to estimate the causal parents of a given target variable from data collected in different experimental or environmental conditions, extending ‘Invariant Causal Prediction’ from Peters, Buehlmann and Meinshausen (2016), <arXiv:1501.01332>, to nonlinear settings. For more details, see C. Heinze-Deml, J. Peters and N. Meinshausen: ‘Invariant Causal Prediction for Nonlinear Models’, <arXiv:1706.08576>.
nonmem2R Loading NONMEM Output Files and Simulate with Parameter Uncertainty
Loads NONMEM (NONlinear Mixed-Effect Modeling, <http://…/> ) output files and simulates with parameter uncertainty.
nonmemica Create and Evaluate NONMEM Models in a Project Context
Systematically creates and modifies NONMEM(R) control streams. Harvests NONMEM output, builds run logs, creates derivative data, generates diagnostics. NONMEM (ICON Development Solutions <http://…/> ) is software for nonlinear mixed effects modeling. See ‘package?nonmemica’.
nonneg.cg Non-Negative Conjugate-Gradient Minimizer
Minimize a differentiable function subject to all the variables being non-negative (i.e. >= 0), using a Conjugate-Gradient algorithm based on a modified Polak-Ribiere-Polyak formula as described in (Li, Can, 2013, <https://…/> ).
nonpar A Collection of Nonparametric Hypothesis Tests
Contains the following 5 nonparametric hypothesis tests: the Sign Test, the 2-Sample Median Test, Miller’s Jackknife Procedure, Cochran’s Q Test, and the Stuart-Maxwell Test.
nopaco Non-Parametric Concordance Coefficient: A Non-Parametric Concordance Test
A non-parametric test for multi-observer concordance and differences between concordances in (un)balanced data.
norm2 Analysis of Incomplete Multivariate Data under a Normal Model
Functions for parameter estimation, Bayesian posterior simulation and multiple imputation from incomplete multivariate data under a normal model.
NORMA Builds General Noise SVRs
Builds general noise SVR models using Naive Online R Minimization Algorithm, NORMA, an optimization method based on classical stochastic gradient descent suitable for computing SVR models in an online setting.
NormalBetaPrime Normal Beta Prime Prior
Implements Bayesian linear regression, variable selection, normal means estimation, and multiple hypothesis testing using the normal-beta prime prior, as introduced by Bai and Ghosh (2018) <arXiv:1807.02421> and Bai and Ghosh (2018) <arXiv:1807.06539>. Normal means estimation and multiple testing for the Dirichlet-Laplace <doi:10.1080/01621459.2014.960967> and horseshoe+ priors <doi:10.1214/16-BA1028> are also available in this package.
normalr Normalisation of Multiple Variables in Large-Scale Datasets
The robustness of many of the statistical techniques, such as factor analysis, applied in the social sciences rests upon the assumption of item-level normality. However, when dealing with real data, these assumptions are often not met. The Box-Cox transformation (Box & Cox, 1964) <http://…/2984418> provides an optimal transformation for non-normal variables. Yet, for large datasets of continuous variables, its application in current software programs is cumbersome with analysts having to take several steps to normalise each variable. We present an R package normalr that enables researchers to make convenient optimal transformations of multiple variables in datasets. This R package enables users to quickly and accurately: (1) anchor all of their variables at 1.00, (2) select the desired precision with which the optimal lambda is estimated, (3) apply each unique exponent to its variable, (4) rescale resultant values to within their original X1 and X(n) ranges, and (5) provide original and transformed estimates of skewness, kurtosis, and other inferential assessments of normality.
NORTARA Generation of Multivariate Data with Arbitrary Marginals
An implementation of a specific method for generating n-dimensional random vectors with given marginal distributions and correlation matrix. The method uses the NORTA (NORmal To Anything) approach, which generates a standard normal random vector and then transforms it into a random vector with specified marginal distributions, and the RA (Retrospective Approximation) algorithm, which is a generic stochastic root-finding algorithm. The marginals can be continuous or discrete. See the package vignette for more details.
nortestARMA Neyman Smooth Tests of Normality for the Errors of ARMA Models
Tests the goodness-of-fit to the Normal distribution for the errors of an ARMA model.
nos Compute Node Overlap and Segregation in Ecological Networks
Calculate NOS (node overlap and segregation) and the associated metrics described in Strona and Veech (2015) <DOI:10.1111/2041-210X.12395> and Strona et al. (2017; In Press, DOI to be provided in subsequent package version). The functions provided in the package enable assessment of structural patterns ranging from complete node segregation to perfect nestedness in a variety of network types. In addition, they provide a measure of network modularity.
NostalgiR Advanced Text-Based Plots
Provides functions to produce advanced ASCII graphics, directly in the terminal window. This package utilizes the txtplot() function from the ‘txtplot’ package to produce text-based histograms, empirical cumulative distribution function plots, scatterplots with fitted and regression lines, quantile plots, density plots, image plots, and contour plots.
not Narrowest-Over-Threshold Change-Point Detection
Provides an efficient implementation of the Narrowest-Over-Threshold methodology for detecting an unknown number of change-points occurring at unknown locations in one-dimensional data following a ‘deterministic signal + noise’ model. Currently implemented scenarios are: piecewise-constant signal, piecewise-constant signal with heavy-tailed noise, piecewise-linear signal, piecewise-quadratic signal, and piecewise-constant signal with piecewise-constant noise variance.
noteMD Print Text from ‘Shiny’ User Interface (Support Markdown Syntax) to Pdf or ‘Word’ Report
When building a ‘shiny’ app to generate reports (pdf or ‘word’), a comment box can be inserted on the front-end side for users to write down their notes; this package then documents those notes in the reports.
notifier Cross Platform Desktop Notifications
Send desktop notifications from R, on macOS, Windows and Linux.
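The whole API is essentially one call; a sketch with arbitrary strings:

    library(notifier)

    notify(title = "R", msg = "Long-running job finished.")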
novelist NOVEL Integration of the Sample and Thresholded (NOVELIST) Correlation and Covariance Estimators
Estimates large correlation and covariance matrices and their inverses using integration of the sample and thresholded correlation and covariance estimators.
nowcasting Nowcasting Analysis and Create Real-Time Data Basis
Methods and tools for forecasting the current state (nowcast) of Brazilian economic time series. The package allows one to extract information in real time, creating a real-time data base; estimate relationships between macroeconomic variables via estimation of dynamic factors; forecast time series for previous reference periods; forecast time series for the current reference period (nowcasting); and recreate a data base simulating the information available in the past, for evaluating the accuracy of forecasting models. The econometric framework we follow is proposed in Giannone et al. (2008) <doi:10.1016/j.jmoneco.2008.05.010>.
nparMD Nonparametric Analysis of Multivariate Data in Factorial Designs
Analysis of multivariate data with two-way completely randomized factorial design. The analysis is based on fully nonparametric, rank-based methods and uses test statistics based on the Dempster’s ANOVA, Wilk’s Lambda, Lawley-Hotelling and Bartlett-Nanda-Pillai criteria. The multivariate response is allowed to be ordinal, quantitative, binary or a mixture of the different variable types. The package offers two functions performing the analysis, one for small and the other for large sample sizes. The underlying methodology is largely described in Bathke and Harrar (2016) <doi:10.1007/978-3-319-39065-9_7> and in Munzel and Brunner (2000) <doi:10.1016/S0378-3758(99)00212-8>.
nparsurv Nonparametric Tests for Main Effects, Simple Effects and Interaction Effect in a Factorial Design with Censored Data
Nonparametric Tests for Main Effects, Simple Effects and Interaction Effect with Censored Data and Two Factorial Influencing Variables.
NPBayesImpute Non-Parametric Bayesian Multiple Imputation for Categorical Data
These routines create multiple imputations of missing at random categorical data, with or without structural zeros. Imputations are based on Dirichlet process mixtures of multinomial distributions, which is a non-parametric Bayesian modeling approach that allows for flexible joint modeling.
npcopTest Non Parametric Test for Detecting Changes in the Copula
A non parametric test for change points detection in the dependence between the components of multivariate data, with or without (multiple) changes in the marginal distributions.
npcure Nonparametric Estimation in Mixture Cure Models
Performs nonparametric estimation in mixture cure models, and significance tests for the cure probability. For details, see López-Cheda et al. (2017a) <doi:10.1016/j.csda.2016.08.002> and López-Cheda et al. (2017b) <doi:10.1007/s11749-016-0515-1>.
NPflow Bayesian Nonparametrics for Automatic Gating of Flow-Cytometry Data
Dirichlet process mixture of multivariate normal, skew-normal or skew t-distributions modeling oriented towards flow-cytometry data preprocessing applications.
npmlda Nonparametric Models for Longitudinal Data
Supports the book: Wu CO and Tian X (2018). Nonparametric Models for Longitudinal Data. Chapman & Hall/CRC (to appear); and provides model fitting using global and local smoothing methods for conditional-mean and conditional-distribution based models with longitudinal data.
npmr Nuclear Penalized Multinomial Regression
Fit multinomial logistic regression with a penalty on the nuclear norm of the estimated regression coefficient matrix, using proximal gradient descent.
npphen Vegetation Phenological Cycle and Anomaly Detection using Remote Sensing Data
Calculates phenological cycle and anomalies using a non-parametric approach applied to time series of vegetation indices derived from remote sensing data or field measurements. The package implements basic and high-level functions for manipulating vector data (numerical series) and raster data (satellite derived products). Processing of very large raster files is supported.
npregfast Nonparametric Estimation of Regression Models with Factor-by-Curve Interactions
A method for obtaining nonparametric estimates of regression models, with or without factor-by-curve interactions, using local polynomial kernel smoothers. Additionally, a parametric model (allometric model) can be estimated.
nprobust Robust Data-Driven Statistical Inference for Local Polynomial Regression and Kernel Density Estimation
Tools for data-driven analytical statistical inference for Local Polynomial Regression estimators and Kernel Density Estimation.
nproc Neyman-Pearson Receiver Operator Curve
Given a sample of class 0 and class 1 and a classification method, the package generates the corresponding Neyman-Pearson classifier with a pre-specified type-I error control and Neyman-Pearson Receiver Operator Curve.
npROCRegression Kernel-Based Nonparametric ROC Regression Modelling
Implements several nonparametric regression approaches for the inclusion of covariate information on the receiver operating characteristic (ROC) framework.
nprotreg Nonparametric Rotations for Sphere-Sphere Regression
Fits sphere-sphere regression models by estimating locally weighted rotations. Simulation of sphere-sphere data according to non-rigid rotation models. Provides methods for bias reduction applying iterative procedures within a Newton-Raphson learning scheme. Cross-validation is exploited to select smoothing parameters. See Marco Di Marzio, Agnese Panzera & Charles C. Taylor (2018) <doi:10.1080/01621459.2017.1421542>.
npsf Nonparametric and Stochastic Efficiency and Productivity Analysis
Provides a variety of tools for nonparametric and parametric efficiency measurement.
npsr Validate Instrumental Variables using NPS
An R implementation of the Necessary and Probably Sufficient (NPS) test for finding valid instrumental variables, as suggested by Amit Sharma (2016, Working Paper) <http://…sary_probably_sufficient_iv_test.pdf>. The NPS test compares the likelihood that a given set of observational data on the three variables Z, X and Y is generated by a valid instrumental variable model (Z -> X -> Y) to the likelihood that the data is generated by an invalid IV model.
npsurv Non-Parametric Survival Analysis
Contains functions for non-parametric survival analysis of exact and interval-censored observations.
nptest Nonparametric Tests
Robust permutation tests for location, correlation, and regression problems, as described in Helwig (2019) <doi:10.1002/wics.1457>. Univariate and multivariate tests are supported. For each problem, exact tests and Monte Carlo approximations are available. Parallel computing is implemented via the ‘parallel’ package.
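A sketch of a one-sample permutation location test, assuming the np.loc.test() naming used by the package (np.cor.test() and np.reg.test() are analogous):

    library(nptest)

    set.seed(1)
    x <- rnorm(25, mean = 0.4)
    np.loc.test(x, mu = 0)   # Monte Carlo permutation test of location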
NRejections Metrics for Multiple Testing with Correlated Outcomes
Implements methods in Mathur and VanderWeele (in preparation) to characterize global evidence strength across W correlated ordinary least squares (OLS) hypothesis tests. Specifically, uses resampling to estimate a null interval for the total number of rejections in, for example, 95% of samples generated with no associations (the global null), the excess hits (the difference between the observed number of rejections and the upper limit of the null interval), and a test of the global null based on the number of rejections.
nricens NRI for Risk Prediction Models with Time to Event and Binary Response Data
Calculating the net reclassification improvement (NRI) for risk prediction models with time to event and binary data.
nse Numerical Standard Errors Computation in R
Offers multiple ways to calculate numerical standard error (NSE) of univariate time series.
nseval A Clean API for Lazy and Non-Standard Evaluation
Facilities to capture, inspect, manipulate, and create lazy values (promises), ‘…’ lists, and active calls.
nsgp Non-Stationary Gaussian Process Regression
A Gaussian process regression using a Gaussian kernel for both one-sample and two-sample cases. Includes non-stationary Gaussian kernel (exponential decay function) and several likelihood ratio tests for differential testing along target points.
nspmix Nonparametric and Semiparametric Mixture Estimation
Contains functions for maximum likelihood estimation of nonparametric and semiparametric mixture models.
nsROC Non-Standard ROC Curve Analysis
Tools for estimating Receiver Operating Characteristic (ROC) curves, building confidence bands, comparing several curves both for dependent and independent data, estimating the cumulative-dynamic ROC curve in presence of censored data, and performing meta-analysis studies, among others.
NSUM Network Scale Up Method
A Bayesian framework for population group size estimation using the Network Scale Up Method (NSUM). Size estimates are based on a random degree model and include options to adjust for barrier and transmission effects.
NTS Nonlinear Time Series Analysis
Simulation, estimation, prediction procedure, and model identification methods for nonlinear time series analysis, including threshold autoregressive models, Markov-switching models, convolutional functional autoregressive models, nonlinearity tests, Kalman filters and various sequential Monte Carlo methods. More examples and details about this package can be found in the book ‘Nonlinear Time Series Analysis’ by Ruey S. Tsay and Rong Chen, Wiley, 2018 (ISBN: 978-1-119-26407-1).
Numero Statistical Framework to Define Subgroups in Complex Datasets
High-dimensional datasets that do not exhibit a clear intrinsic clustered structure pose a challenge to conventional clustering algorithms. For this reason, we developed an unsupervised framework that helps scientists to better subgroup their datasets based on visual cues [Makinen V-P et al. (2011) J Proteome Res 11:1782-1790, <doi:10.1021/pr201036j>]. The framework includes the necessary functions to import large data files, to construct a self-organizing map of the data, to evaluate the statistical significance of the observed data patterns, and to visualize the results in scalable vector graphics.
numform Tools to Format Numbers for Publication
Format numbers for publication; includes the removal of leading zeros, standardization of number of digits, addition of affixes, and a p-value formatter. These tools combine the functionality of several ‘base’ functions such as paste(), format(), and sprintf() into specific use case functions that are named in a way that is consistent with usage, making their names easy to remember and easy to deploy.
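A sketch using two helpers from the package's f_ naming scheme (assumed here): f_num() fixes the digits and drops the leading zero, f_pval() formats p-values:

    library(numform)

    f_num(c(0.30331, 0.0014), digits = 2)   # e.g. ".30" (leading zero dropped)
    f_pval(0.00021)                          # publication-style p-value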
numGen Number Series Generator
A number series generator that creates number series items based on cognitive models.
numKM Create a Kaplan-Meier Plot with Numbers at Risk
Adds a table of numbers at risk below a Kaplan-Meier plot.
nutriNetwork Structure Learning with Copula Graphical Model
Statistical tool for learning the structure of direct associations among variables for continuous data, discrete data and mixed discrete-continuous data. The package is based on the copula graphical model in Behrouzi and Wit (2017) <doi:10.1111/rssc.12287>.
nVennR Create n-Dimensional, Quasi-Proportional Venn Diagrams
Provides an interface for the nVenn algorithm (Perez-Silva et al. 2018) <DOI:10.1093/bioinformatics/bty109>. This algorithm works for any number of sets, and usually yields pleasing and informative Venn diagrams with proportionality information. However, representing more than six sets takes a long time and is hard to interpret, unless many of the regions are empty. If you cannot make sense of the result, you may want to consider ‘UpSetR’ <https://…/README.html>.
nvmix Multivariate Normal Variance Mixtures (Including Student’s t Distribution for Non-Integer Degrees of Freedom)
Functions for working with multivariate normal variance mixture distributions including evaluating their distribution functions, densities and random number generation.
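A sketch of the density and random-generation interface for the multivariate t obtained as an inverse-gamma mixture; argument names follow our reading of the package:

    library(nvmix)

    P <- diag(3)   # scale matrix
    x <- rnvmix(5, qmix = "inverse.gamma", df = 3.5, scale = P)  # random draws
    dnvmix(x, qmix = "inverse.gamma", df = 3.5, scale = P)       # densities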
nzilbb.labbcat Accessing Data Stored in ‘LaBB-CAT’ Instances
‘LaBB-CAT’ is a web-based language corpus management system developed by the New Zealand Institute of Language, Brain and Behaviour (NZILBB) – see <https://labbcat.canterbury.ac.nz>. This package defines functions for accessing corpus data in a ‘LaBB-CAT’ instance. For more information about ‘LaBB-CAT’, see Robert Fromont and Jennifer Hay (2008) <doi:10.3366/E1749503208000142> or Robert Fromont (2017) <doi:10.1016/j.csl.2017.01.004>.

O

oaColors OpenAnalytics Colors Package
Provides carefully chosen color palettes as used, among others, at OpenAnalytics <http://www.openanalytics.eu>.
oak Trees Creation and Manipulation
Functions and classes to create and manipulate trees and nodes.
oapackage Orthogonal Array Package
Interface to D-optimal design generation code of the Orthogonal Array package. Can generate D-optimal designs with specified number of runs and factors. The optimality of the designs is defined in terms of a user specified optimization function based on the D-efficiency and Ds-efficiency.
oaPlots OpenAnalytics Plots Package
Offers a suite of functions for enhancing R plots.
oaqc Computation of the Orbit-Aware Quad Census
Implements the efficient algorithm by Ortmann and Brandes (2017) <doi:10.1007/s41109-017-0027-2> to compute the orbit-aware frequency distribution of induced and non-induced quads, i.e. subgraphs of size four. Given an edge matrix, data frame, or a graph object (e.g., ‘igraph’), the orbit-aware counts are computed with respect to each of the edges and nodes.
oaxaca Blinder-Oaxaca Decomposition
An implementation of the Blinder-Oaxaca decomposition for linear regression models.
objectremover ‘RStudio’ Addin for Removing Objects from the Global Environment Based on Patterns and Object Type
An ‘RStudio’ addin to assist with removing objects from the global environment. Features include removing objects according to name patterns and object type. During the course of an analysis, temporary objects are often created and this tool assists with removing them quickly. This can be useful when memory management within ‘R’ is important.
obliqueRSF Oblique Random Forests for Right-Censored Time-to-Event Data
Oblique random survival forests incorporate linear combinations of input variables into random survival forests (Ishwaran, 2008 <DOI:10.1214/08-AOAS169>). Regularized Cox proportional hazard models (Simon, 2016 <DOI:10.18637/jss.v039.i05>) are used to identify optimal linear combinations of input variables.
OBMbpkg Estimate the Population Size for the Mb Capture-Recapture Model
Applies an objective Bayesian method to the Mb capture-recapture model to estimate the population size N. The Mb model is a class of capture-recapture methods used to account for variations in capture probability due to animal behavior. Under the Mb formulation, the initial capture of an animal may affect the probability of subsequent captures due to their becoming ‘trap happy’ or ‘trap shy.’
OBRE Optimal B-Robust Estimator Tools
An implementation for computing Optimal B-Robust Estimators (OBRE) of two-parameter distributions. The procedure consists of a set of equations that are evaluated alternately until the solution is reached. Some tools for analyzing the estimates are included; the most relevant is the computation of the OBRE covariance matrix using a closed formula.
Observation Collect and Process Physical Activity Direct Observation Data
Two-part system for first collecting then managing direct observation data, as described by Hibbing PR, Ellingson LD, Dixon PM, & Welk GJ (2018) <doi:10.1249/MSS.0000000000001486>.
observer Observe and Check your Data
Checks that a given dataset passes user-specified rules. The main functions are observe_if() and inspect().
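A hypothetical usage sketch based on the two function names above; the exact signatures are assumptions here, so consult the package manual:

    library(observer)
    df <- data.frame(x = c(1, -2, 3), y = c(10, 20, NA))
    df <- observe_if(df, x > 0)   # record which rows violate the rule (assumed signature)
    inspect(df)                   # review the recorded observations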
OBsMD Objective Bayesian Model Discrimination in Follow-Up Designs
Implements the objective Bayesian methodology proposed by Consonni and Deldossi in order to choose the optimal follow-up experiment that best discriminates between competing models.
oc Optimal Classification Roll Call Analysis Software
Estimates optimal classification (Poole 2000) <doi:10.1093/oxfordjournals.pan.a029814> scores from roll call votes supplied through a ‘rollcall’ object from package ‘pscl’.
OCA Optimal Capital Allocations
Computes optimal capital allocations based on some standard principles such as Haircut, Overbeck type II and the Covariance Allocation Principle. It also provides some shortcuts for obtaining the Value at Risk and the Expected Shortfall, using both the normal and the Student’s t distribution, see Urbina and Guillén (2014)<doi:10.1016/j.eswa.2014.05.017> and Urbina (2013)<http://…/19443>.
oceanis Cartography for Statistical Analysis
Creating maps for statistical analysis such as proportional circles, choropleth, typology and flows. Some functions use ‘shiny’ or ‘leaflet’ technologies for dynamism and interactivity. The main features are: creating maps in a web environment where the parameters are modifiable on the fly (‘shiny’ and ‘leaflet’ technology); creating interactive maps with zoom and pop-ups (‘leaflet’ technology); and creating frozen maps with the possibility to add labels.
ockc Order Constrained Solutions in k-Means Clustering
Extends ‘flexclust’ with an R implementation of order constrained solutions in k-means clustering (Steinley and Hubert, 2008, <doi:10.1007/s11336-008-9058-z>).
ocomposition Regression for Rank-Indexed Compositional Data
Regression model where the response variable is a rank-indexed compositional vector (non-negative values that sum up to one and are ordered from the largest to the smallest). Parameters are estimated in the Bayesian framework using MCMC methods.
ocp Bayesian Online Changepoint Detection
Implements the Bayesian online changepoint detection method by Adams and MacKay (2007) <arXiv:0710.3742> for univariate or multivariate data. Gaussian and Poisson probability models are implemented. Provides post-processing functions with alternative ways to extract changepoints.
OData R Helper for OData Web Services
Helper methods for accessing data from web services based on the OData protocol. It provides several helper methods to access the service metadata and the data from datasets, and to download some file resources (only CSV is supported for now). For more information about OData go to http://…/.
odbc Connect to ODBC Compatible Databases (using the DBI Interface)
A DBI-compatible interface to ODBC databases.
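A minimal sketch of the DBI workflow; the DSN name "MyDatabase" and table 'mytable' are placeholders:

    library(DBI)
    con <- dbConnect(odbc::odbc(), dsn = "MyDatabase")  # connect via a configured DSN
    dbListTables(con)                                   # list available tables
    res <- dbGetQuery(con, "SELECT * FROM mytable")     # run a query, get a data frame
    dbDisconnect(con)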
odds.n.ends Odds Ratios, Contingency Table, and Model Significance from a Generalized Linear Model Object
Computes odds ratios and 95% confidence intervals from a generalized linear model object. It also computes model significance with the chi-squared statistic and p-value and it computes model fit using a contingency table to determine the percent of observations for which the model correctly predicts the value of the outcome. Calculates model sensitivity and specificity.
oddsratio Odds Ratio Calculation for GAM & GLM
Simplified odds ratio calculation of GAM(M)s & GLM(M)s. Provides structured output (data frame) of all predictors and their corresponding odds ratios for further analyses. It helps to avoid false references of predictors and increments by specifying these parameters in a list instead of using ‘exp(coef(model))’ (standard approach of odds ratio calculation for GLMs) which just returns a plain numeric output. For GAM(M)s, odds ratio calculation is highly simplified with this package since it takes care of the multiple ‘predict()’ calls of the chosen predictor while holding other predictors constant. Also, this package allows odds ratio calculation of percentage steps across the whole predictor distribution range for GAM(M)s.
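For contrast, here is the plain base-R approach mentioned above, which returns unlabeled numeric output:

    m <- glm(am ~ hp + wt, data = mtcars, family = binomial)
    exp(coef(m))                           # plain numeric vector, easy to misread
    exp(cbind(OR = coef(m), confint(m)))   # odds ratios with 95% confidence limits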
ODEnetwork Network of Differential Equations
Simulates a network of ordinary differential equations of order two. The package provides an easy interface to construct networks. In addition, you are able to define different external triggers to manipulate the trajectory. The method is described by Surmann, Ligges, and Weihs (2014) <doi:10.1109/ENERGYCON.2014.6850482>.
odk Convert ‘ODK’ or ‘XLSForm’ to ‘SPSS’ Data Frame
After developing an ‘ODK’ <https://…/> form, the form can be linked to ‘Google Sheets’ <https://…/> and data collected through ‘Android’ <https://…/> devices; the collected data are uploaded to a ‘Google Sheet’. The odk2spss() function helps convert the ‘ODK’ form into an ‘SPSS’ <https://…/> frame. It can also add downloaded ‘Google Sheets’ data, or read data from ‘Google Sheets’ by using the ‘ODK’ form’s ‘submission_url’.
odkr ‘Open Data Kit’ (‘ODK’) R API
Utility functions for working with datasets gathered using ‘Open Data Kit’ (‘ODK’) <https://…/>. These include an API to interface with ‘ODK Briefcase’ (a ‘Java’ application for fetching and pushing ‘ODK’ forms and their contents) that allows pulling of data from either a remote ‘ODK Aggregate Server’ or a local ‘ODK’ folder; a rename function to give more human-readable variable names for ‘ODK’ datasets; a merge function to create a single data frame from a nested ‘ODK’ dataset; and an expand function to disaggregate multiple-choice answers that have been collapsed into a single code by ‘ODK’.
odpc One-Sided Dynamic Principal Components
Functions to compute the one-sided dynamic principal components (‘odpc’) introduced in Smucler, Peña and Yohai (2017) <arXiv:1708.04705>. ‘odpc’ is a novel dimension reduction technique for multivariate time series, that is useful for forecasting. These dynamic principal components are defined as the linear combinations of the present and past values of the series that minimize the reconstruction mean squared error.
odr Optimal Design and Statistical Power of Cost-Efficient Multilevel Randomized Trials
Calculate the optimal sample allocation that minimizes the variance of the treatment effect in a multilevel randomized trial under a fixed budget and cost structure, and perform power analyses with and without accommodating costs and budget. The reference for the proposed methods is: Shen, Z., & Kelcey, B. (under review). Optimal design of cluster randomized trials under condition- and unit-specific cost structures. 2018 American Educational Research Association (AERA) annual conference.
oec The Observatory of Economic Complexity
Use The Observatory of Economic Complexity’s API from the R console to obtain international trade data and create spreadsheets (csv format) and D3Plus visualizations.
officer Manipulation of Microsoft Word and PowerPoint Documents
Manipulate ‘Microsoft Word’ and ‘Microsoft PowerPoint’ documents from R. The package focuses on tabular and graphical reporting from R. A set of functions lets you add and remove images, tables and paragraphs of text in new or existing documents. When working with ‘PowerPoint’ presentations, slides can be added or removed; shapes inside slides can also be added or removed. When working with ‘Word’ documents, a cursor can be used to help insert or delete content at a specific location in the document. The package does not require any Microsoft product to be installed in order to write Microsoft files.
OGI Objective General Index
Consider a data matrix of n individuals with p variates. The objective general index (OGI) is a general index that combines the p variates into a univariate index in order to rank the n individuals. The OGI is always positively correlated with each of the variates. More details can be found in Sei (2016) <doi:10.1016/j.jmva.2016.02.005>.
oglmx Estimation of Ordered Generalized Linear Models
Ordered models such as ordered probit and ordered logit presume that the error variance is constant across observations. In the case that this assumption does not hold estimates of marginal effects are typically biased. This package allows for generalization of ordered probit and ordered logit models by allowing the user to specify a model for the variance. Furthermore, the package includes functions to calculate the marginal effects.
Ohit OGA+HDIC+Trim and High-Dimensional Linear Regression Models
Ing and Lai (2011) <doi:10.5705/ss.2010.081> proposed a high-dimensional model selection procedure that comprises three steps: orthogonal greedy algorithm (OGA), high-dimensional information criterion (HDIC), and Trim. The first two steps, OGA and HDIC, are used to sequentially select input variables and determine stopping rules, respectively. The third step, Trim, is used to delete irrelevant variables remaining in the second step. This package aims at fitting a high-dimensional linear regression model via OGA+HDIC+Trim.
OHPL Ordered Homogeneity Pursuit Lasso for Group Variable Selection
Ordered homogeneity pursuit lasso (OHPL) algorithm for group variable selection proposed in Lin et al. (2017) <DOI:10.1016/j.chemolab.2017.07.004>. The OHPL method takes the homogeneity structure in high-dimensional data into account and enjoys the grouping effect to select groups of important variables automatically. This feature makes it particularly useful for high-dimensional datasets with strongly correlated variables, such as spectroscopic data.
ohtadstats Tomoka Ohta D Statistics
Calculates Tomoka Ohta’s partitioning of linkage disequilibrium, deemed D-statistics, for pairs of loci. See Beissinger et al. (2016) <doi:10.1038/hdy.2015.81>.
ojUtils A Collection of Utility Functions
This is a collection of utility functions. Currently, it offers alternatives to the base ifelse() and combn() functions, implemented with ‘Rcpp’ for a significant speedup compared to the base versions.
olctools Open Location Code Handling in R
‘Open Location Codes’ (http://openlocationcode.com ) are a Google-created standard for identifying geographic locations. olctools provides utilities for validating, encoding and decoding entries that follow this standard.
olsrr Tools for Teaching and Learning OLS Regression
Tools for teaching and learning ordinary least squares regression. Includes comprehensive regression output, heteroskedasticity tests, collinearity diagnostics, residual diagnostics, measures of influence, model fit assessment and variable selection procedures.
OmicsPLS Perform Two-Way Orthogonal Partial Least Squares
Performs the O2PLS data integration method for two datasets yielding joint and data-specific parts for each dataset. The algorithm automatically switches to a memory-efficient approach to fit O2PLS to high dimensional data. It provides a rigorous and a faster alternative cross-validation method to select the number of components, as well as functions to report proportions of explained variation and to construct plots of your results. See Trygg and Wold (2003) <doi:10.1002/cem.775> and el Bouhaddani et al (2016) <doi:10.1186/s12859-015-0854-z>.
ompr Model and Solve Mixed Integer Linear Programs
Model mixed integer linear programs in an algebraic way directly in R. The model is solver-independent and thus offers the possibility to solve a model with different solvers. It currently only supports linear constraints and objective functions. See the ‘ompr’ website <https://…/ompr> for more information, documentation and examples.
ompr.roi A Solver for ‘ompr’ that Uses the R Optimization Infrastructure (‘ROI’)
A solver for ‘ompr’ based on the R Optimization Infrastructure (‘ROI’). The package makes all solvers in ‘ROI’ available to solve ‘ompr’ models. Please see the ‘ompr’ website <https://…/ompr> and package docs for more information and examples on how to use it.
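A minimal sketch of the algebraic modelling style, using ‘ompr.roi’ from the previous entry and assuming the ‘ROI.plugin.glpk’ solver plugin is installed:

    library(ompr)
    library(ompr.roi)
    library(ROI.plugin.glpk)
    library(magrittr)

    result <- MIPModel() %>%
      add_variable(x, type = "integer", lb = 0) %>%     # integer decision variable
      add_variable(y, type = "continuous", lb = 0) %>%  # continuous decision variable
      set_objective(x + 2 * y, "max") %>%
      add_constraint(x + y <= 10) %>%
      solve_model(with_ROI(solver = "glpk"))

    get_solution(result, x)   # extract the optimal value of x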
omu A Metabolomics Analysis Tool for Intuitive Figures and Convenient Metadata Collection
Facilitates the creation of intuitive figures to describe metabolomics data by utilizing Kyoto Encyclopedia of Genes and Genomes (KEGG) hierarchy data, and gathers functional orthology and gene data using the package ‘KEGGREST’ to access the ‘KEGG’ API.
onehot Fast Onehot Encoding for Data.frames
Quickly create numeric matrices for machine learning algorithms that require them. It converts factor columns into onehot vectors.
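A sketch under the assumption that the package follows the usual encoder/predict pattern; the constructor name mirrors the package name and is not verified here:

    library(onehot)
    df  <- data.frame(f = factor(c("a", "b", "a")), x = 1:3)
    enc <- onehot(df)        # learn the factor levels (assumed constructor)
    m   <- predict(enc, df)  # numeric matrix with one indicator column per level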
onlineCPD Detect Changepoints in Multivariate Time Series
Detects multiple changepoints in uni- or multivariate time series data. The algorithm is based on Bayesian methods and detects changes on-line; i.e. the model updates with every observation rather than relying on retrospective segmentation. However, the user may choose to use the algorithm off- or on-line.
onlinePCA Online Principal Component Analysis
Online PCA for multivariate and functional data using perturbation, incremental, and stochastic gradient methods.
onlineVAR Online Fitting of Time-Adaptive Lasso Vector Auto Regression
Functions for recursive online fitting of time-adaptive lasso vector auto regression. A recursive coordinate descent algorithm is used to estimate sparse vector auto regressive models and exponential forgetting is applied to allow model changes. Details can be found in Jakob W. Messner and Pierre Pinson (2018). ‘Online adaptive LASSO estimation in Vector Auto Regressive models for wind power forecasting in high dimension’. International Journal of Forecasting, in press. Preprint: <http://…/MessnerPinson18.pdf>.
onls Orthogonal Nonlinear Least-Squares Regression
Orthogonal Nonlinear Least-Squares Regression using Levenberg-Marquardt minimization.
onnx R Interface to ‘ONNX’
R Interface to ‘ONNX’ – Open Neural Network Exchange <https://onnx.ai/>. ‘ONNX’ provides an open source format for machine learning models. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types.
ontologyIndex Functions for Reading Ontologies into R
Functions for reading ontologies into R as lists and handling sets of ontological terms.
ontologyPlot Functions for Visualising Sets of Ontological Terms
Functions for visualising sets of ontological terms using the graphviz layout system.
ontologySimilarity Functions for Calculating Ontological Similarities
Functions for calculating semantic similarities between ontological terms or sets of ontological terms based on term information content and assessing statistical significance of similarity in the context of a collection of sets of ontological terms.
OOBCurve Out of Bag Learning Curve
Provides a function to calculate the out-of-bag learning curve for random forests for any measure that is available in the ‘mlr’ package. Supported random forest packages are ‘randomForest’ and ‘ranger’, as well as models of these packages trained with the train() function of ‘mlr’.
oompaBase Class Unions, Matrix Operations, and Color Schemes for OOMPA
Provides the class unions that must be preloaded in order for the basic tools in the OOMPA (Object-Oriented Microarray and Proteomics Analysis) project to be defined and loaded. It also includes vectorized operations for row-by-row means, variances, and t-tests. Finally, it provides new color schemes. Details on the packages in the OOMPA project can be found at <http://…/>.
oompaData Data to Illustrate OOMPA Algorithms
This is a data-only package to provide example data for other packages that are part of the ‘Object-Oriented Microarray and Proteomics Analysis’ suite of packages. These are described in more detail at the package URL.
OOR Optimistic Optimization in R
Implementation of optimistic optimization methods for global optimization of deterministic or stochastic functions. The algorithms feature guarantees of the convergence to a global optimum. They require minimal assumptions on the (only local) smoothness, where the smoothness parameter does not need to be known. They are expected to be useful for the most difficult functions when we have no information on smoothness and the gradients are unknown or do not exist. Due to the weak assumptions, however, they can be mostly effective only in small dimensions, for example, for hyperparameter tuning.
openadds Client to Access ‘Openaddresses.io’ Data
‘Openaddresses’ (<http://…/> ) client. Search, fetch data, and combine ‘datasets’. Outputs are easy to visualize with base plots, ‘ggplot2’, or ‘leaflet’.
opencage Interface to the OpenCage API
Tool for accessing the OpenCage API, which provides forward geocoding (from placename to longitude and latitude) and reverse geocoding (from longitude and latitude to placename).
openCR Open Population Capture-Recapture
Functions for the analysis of capture-recapture data from animal populations subject to turnover. The models extend Schwarz and Arnason (1996) <DOI:10.2307/2533048> and Borchers and Efford (2008) <DOI:10.1111/j.1541-0420.2007.00927.x>, and may be non-spatial or spatial. The parameterisation of recruitment is flexible (options include population growth rate and per capita recruitment). Spatially explicit analyses may assume home-range centres are fixed or allow dispersal between sampling sessions.
opencv Bindings to ‘OpenCV’ Computer Vision Library
Experimenting with computer vision and machine learning in R. This package exposes some of the available ‘OpenCV’ vision algorithms, such as edge, body or face detection. These can either be applied to analyze static images, or to filter live video footage from a camera device.
openEBGM EBGM Scores for Mining Large Contingency Tables
An implementation of DuMouchel’s (1999) <doi:10.1080/00031305.1999.10474456> Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and quantile scores from the posterior distribution using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the ‘PhViD’ package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency.
OpenImageR An Image Processing Toolkit
Incorporates functions for image preprocessing, filtering and image recognition. The package takes advantage of ‘RcppArmadillo’ to speed up computationally intensive functions. The histogram of oriented gradients descriptor is a modification of the ‘findHOGFeatures’ function of the ‘SimpleCV’ computer vision platform and the average_hash(), dhash() and phash() functions are based on the ‘ImageHash’ python library.
OpenMx Advanced Structural Equation Modeling
OpenMx is free and open source software for use with R that allows estimation of a wide variety of advanced multivariate statistical models. OpenMx consists of a library of functions and optimizers that allow you to quickly and flexibly define an SEM model and estimate parameters given observed data.
openNLP Apache OpenNLP Tools Interface
An interface to the Apache OpenNLP tools (version 1.5.3). The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text written in Java. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. See http://opennlp.apache.org for more information.
OpenRepGrid Tools to Analyze Repertory Grid Data
Analyze repertory grids, a qualitative-quantitative data collection technique devised by George A. Kelly in the 1950s. Today, grids are used across various domains ranging from clinical psychology to marketing. The package contains functions to quantitatively analyze and visualize repertory grid data (see e.g. Bell, 2005, <doi:10.1002/0470013370.ch9>; Fransella, Bell, & Bannister, 2004, ISBN: 978-0-470-09080-0).
opensensmapr Client for the Data API of openSenseMap.org
Download environmental measurements and sensor station metadata from the API of open data sensor web platform <https://opensensemap.org> for analysis in R. This platform provides real time data of more than 1500 low-cost sensor stations for PM10, PM2.5, temperature, humidity, UV-A intensity and more phenomena. The package aims to be compatible with ‘sf’ and the ‘Tidyverse’, and provides several helper functions for data exploration and transformation.
openSTARS An Open Source Implementation of the ‘ArcGIS’ Toolbox ‘STARS’
An open source implementation of the ‘STARS’ toolbox (Peterson & Ver Hoef, 2014, <doi:10.18637/jss.v056.i02>) using ‘R’ and ‘GRASS GIS’. It prepares the *.ssn object needed for the ‘SSN’ package. A Digital Elevation Model (DEM) is used to derive stream networks (in contrast to ‘STARS’ that can clean an existing stream network).
openVA Automated Method for Verbal Autopsy
Implements multiple existing open-source algorithms for coding cause of death from verbal autopsies. It also provides tools for data manipulation tasks commonly used in Verbal Autopsy analysis and implements easy graphical visualization of individual and population level statistics.
openxlsx Read, Write and Edit XLSX Files
Simplifies the creation of .xlsx files by providing a high level interface to writing, styling and editing worksheets. Through the use of Rcpp, read/write times are comparable to the xlsx and XLConnect packages with the added benefit of removing the dependency on Java.
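A minimal sketch of the high-level interface (file name is illustrative):

    library(openxlsx)
    write.xlsx(mtcars, file = "mtcars.xlsx")  # write a data frame to a new workbook
    dat <- read.xlsx("mtcars.xlsx")           # read it back as a data frame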
opera Online Prediction by Expert Aggregation
Miscellaneous methods to form online predictions, for regression-oriented time series, by combining a finite set of forecasts provided by the user.
Opportunistic Transmissions and Receptions in an End to End Network
Computes the expectation of the number of broadcasts, transmissions and receptions considering an Opportunistic transport model. It provides theoretical results and also estimated values based on Monte Carlo simulations.
oppr Optimal Project Prioritization
A decision support tool for prioritizing conservation projects. Prioritizations can be developed by maximizing expected feature richness, expected phylogenetic diversity, the number of features that meet persistence targets, or identifying a set of projects that meet persistence targets for minimal cost. Constraints (e.g. lock in specific actions) and feature weights can also be specified to further customize prioritizations. After defining a project prioritization problem, solutions can be obtained using exact algorithms, heuristic algorithms, or random processes. In particular, it is recommended to install the ‘Gurobi’ optimizer (available from <https://www.gurobi.com> ) because it can identify optimal solutions very quickly. Finally, methods are provided for comparing different prioritizations and evaluating their benefits.
optband ‘surv’ Object Confidence Bands Optimized by Area
Given a certain coverage level, obtains simultaneous confidence bands for the survival and cumulative hazard functions such that the area between is minimized. Produces an approximate solution based on local time arguments.
optCluster Determine Optimal Clustering Algorithm and Number of Clusters
Cluster analysis using statistical and biological validation measures for both transformed and count data.
optDesignSlopeInt Optimal Designs for Estimating the Slope Divided by the Intercept
Software which helps practitioners optimally design experiments that measure the slope divided by the intercept (originally invented for HSGC measurements) and provides confidence intervals. Other useful tools are also included.
optifunset Set Options if Unset
Contains a single function, options.ifunset(…), which allows the user to set a global option ONLY if it is not already set. By this token, package maintainers can use this function in preference to the standard options(…) function, making provision for THEIR end user to place options(…) directives within their ‘.Rprofile’ file, which will not be overridden at the point when a package is loaded.
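A sketch using the single function named above; the option name ‘mypkg.verbose’ is purely illustrative:

    library(optifunset)
    options(mypkg.verbose = FALSE)          # e.g. set by the end user in .Rprofile
    options.ifunset(mypkg.verbose = TRUE)   # ignored, since the option is already set
    getOption("mypkg.verbose")              # still FALSE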
optigrab Command-Line Parsing for an R World
Parse options from the command-line using a simple, clean syntax. It requires little or no specification and supports short and long options, GNU-, Java- or Microsoft- style syntaxes, verb commands and more.
optim.functions Standard Benchmark Optimization Functions
A set of standard benchmark optimization functions for R and a common interface to sample them.
OptimalDesign Optimal Design
Algorithms for D-, A- and IV-optimal designs of experiments. Some of the functions in this package require the ‘gurobi’ software and its accompanying R package. For their installation, please follow the instructions at <www.gurobi.com> and <http://…/documentation>, respectively.
OptimalTiming Optimal Timing Identification
Identify the optimal timing for new treatment initiation during multiple-state disease transition, including multistate model fitting, simulation of mean residual lifetime for a given transition state, and estimation of confidence intervals. The methodology builds on de Wreede, L., Fiocco, M., & Putter, H. (2011) <doi:10.18637/jss.v038.i07>.
OptimaRegion Confidence Regions for Optima
Computes confidence regions on the location of response surface optima.
OptimClassifier Create the Best Train for Classification Models
Pattern searching and binary classification in economic and financial data is a large field of research, and in a large part of such data the target variable is binary. Many methodologies are in use nowadays; this package collects the most popular ones and compares different configuration options for Linear Models (LM), Generalized Linear Models (GLM), Linear Mixed Models (LMM), Discriminant Analysis (DA), Classification And Regression Trees (CART), Neural Networks (NN) and Support Vector Machines (SVM).
optimization Multi-Purpose Optimization
Flexible multi-purpose optimizer with numerous input specifications. It allows a very detailed parameterization and is therefore useful for specific and complex loss functions, like functions with discrete parameter space. Visualization tools for validation and analysis of the convergence are also included.
optimParallel Parallel Versions of the Gradient-Based optim() Methods
Provides parallel versions of the gradient-based optim() methods. The main function of the package is optimParallel(), which has the same usage and output as optim(). Using optimParallel() can significantly reduce the optimization time.
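A minimal sketch, assuming a two-core local cluster; optimParallel() mirrors optim()'s interface:

    library(optimParallel)
    cl <- parallel::makeCluster(2)   # assumption: two local cores available
    parallel::setDefaultCluster(cl)
    negll <- function(par, x) -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
    x <- rnorm(500, mean = 5, sd = 2)
    optimParallel(par = c(1, 1), fn = negll, x = x, lower = c(-Inf, 1e-4))
    parallel::stopCluster(cl)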
optimr A Replacement and Extension of the ‘optim’ Function
Provides a replacement and extension of the optim() function to unify and streamline optimization capabilities in R for smooth, possibly box-constrained functions of several or many parameters. This version has a reduced set of methods and is intended to be on CRAN.
optimStrat Choosing the Sample Strategy
Assists in the choice of the sampling strategy to implement in a survey. It compares five strategies, taking into account the information available in an auxiliary variable and two superpopulation models, called the working and true models.
optimus Model Based Diagnostics for Multivariate Cluster Analysis
Assessment and diagnostics for comparing competing clustering solutions, using predictive models. The main intended use is for comparing clustering/classification solutions of ecological data (e.g. presence/absence, counts, ordinal scores) to 1) find an optimal partitioning solution, 2) identify characteristic species and 3) refine a classification by merging clusters that increase predictive performance. However, in a more general sense, this package can do the above for any set of clustering solutions for i observations of j variables.
optional Optional Types and Pattern Matching
Introduces some() and none, as well as match_with() from functional languages.
optionstrat Utilizes the Black-Scholes Option Pricing Model to Perform Strategic Option Analysis and Plot Option Strategies
Utilizes the Black-Scholes-Merton option pricing model to calculate key option analytics and graphical analysis of various option strategies. Provides functions to calculate the option premium and option greeks of European-style options.
optiSolve Linear, Quadratic, and Rational Optimization
Solver for linear, quadratic, and rational programs with linear, quadratic, and rational constraints. A unified interface to different R packages is provided. Optimization problems are transformed into equivalent formulations and solved by the respective package. For example, quadratic programming problems with linear, quadratic and rational constraints can be solved by augmented Lagrangian minimization using package ‘alabama’, or by sequential quadratic programming using solver ‘slsqp’. Alternatively, they can be reformulated as optimization problems with second order cone constraints and solved with package ‘cccp’, or transformed into semidefinite programming problems and solved using solver ‘csdp’.
optmatch Functions for Optimal Matching
Functions for optimal matching, including full matching.
optrdd Optimized Regression Discontinuity Designs
Optimized inference in regression discontinuity designs, as proposed by Imbens and Wager (2017) <arXiv:1705.01677>.
OptSig Optimal Level of Significance for Regression and Other Statistical Tests
Calculates the optimal level of significance based on a decision-theoretic approach. The optimal level is chosen so that the expected loss from hypothesis testing is minimized. A range of statistical tests are covered, including the test for the population mean, population proportion, and a linear restriction in a multiple regression model. The details are covered in Kim, Jae H. and Choi, In, Choosing the Level of Significance: A Decision-Theoretic Approach (December 18, 2017), available at SSRN: <https://ssrn.com/abstract=2652773> or <doi:10.2139/ssrn.2652773>. See also Kim and Ji (2015) <doi:10.1016/j.jempfin.2015.08.006>.
optweight Targeted Stable Balancing Weights Using Optimization
Use optimization to estimate weights that balance covariates for binary, multinomial, continuous, and longitudinal treatments in the spirit of Zubizarreta (2015) <doi:10.1080/01621459.2015.1023805>. The degree of balance can be specified for each covariate.
opusminer OPUS Miner Algorithm for Filtered Top-k Association Discovery
Provides a simple R interface to the OPUS Miner algorithm (implemented in C++) for finding the top-k productive, non-redundant itemsets from transaction data. The OPUS Miner algorithm uses the OPUS search algorithm to efficiently discover the key associations in transaction data, in the form of self-sufficient itemsets, using either leverage or lift. See <http://…/> for more information in relation to the OPUS Miner algorithm.
OpVaR Statistical Methods for Modeling Operational Risk
Functions for modeling operational (value-at-)risk. The implementation comprises functions for modeling loss frequencies and loss severities with plain, mixed (Frigessi et al. (2012) <doi:10.1023/A:1024072610684>) or spliced distributions using Maximum Likelihood estimation and Bayesian approaches (Ergashev et al. (2013) <doi:10.21314/JOP.2013.131>). In particular, the parametrization of tail distributions includes fitting of Tukey-type distributions (Kuo and Headrick (2014) <doi:10.1155/2014/645823>). Furthermore, the package contains the modeling of bivariate dependencies between loss severities and frequencies, Monte Carlo simulation for total loss estimation as well as a closed-form approximation based on Degen (2010) <doi:10.21314/JOP.2010.084> to determine the value-at-risk.
ordDisp Separating Location and Dispersion in Ordinal Regression Models
Estimate location-shift models or rating-scale models accounting for response styles (RSRS) for the regression analysis of ordinal responses.
orderedLasso Ordered Lasso and Time-lag Sparse Regression
Ordered lasso and time-lag sparse regression. The ordered lasso fits a linear model and imposes an order constraint on the coefficients: it writes the coefficients as positive and negative parts, and requires both parts to be non-increasing and non-negative. The time-lag lasso generalizes the ordered lasso to a general data matrix with multiple predictors. For more details, see Suo, X., Tibshirani, R., (2014) ‘An Ordered Lasso and Sparse Time-lagged Regression’.
orders Sampling from Order Statistics of New Families of Distributions
Set of tools to generate samples of order statistics from new families of distributions. The main references for this package are: Gentle, J. (2009), Computational Statistics, Springer-Verlag and Nadarajah, S. and Rocha, R. (2016), Newdistns: An R Package for New Families of Distributions, Journal of Statistical Software. The families of distributions are: Marshall Olkin G distributions, exponentiated G distributions, beta G distributions, gamma G distributions, Kumaraswamy G distributions, generalized beta G distributions, beta extended G distributions, gamma G distributions, gamma uniform G distributions, beta exponential G distributions, Weibull G distributions, log gamma G I distributions, log gamma G II distributions, exponentiated generalized G distributions, exponentiated Kumaraswamy G distributions, geometric exponential Poisson G distributions, truncated-exponential skew-symmetric G distributions, modified beta G distributions, and exponentiated exponential Poisson G distributions.
orderstats Efficiently Generates Random Order Statistic Variables
All the methods in this package generate a vector of uniform order statistics using a beta distribution and use an inverse cumulative distribution function for some distribution to give a vector of random order statistic variables for some distribution. This is much more efficient than using a loop since it is directly sampling from the order statistic distribution.
ordinalClust Ordinal Data Clustering, Co-Clustering and Classification
Ordinal data classification, clustering and co-clustering using model-based approach with the Bos distribution for ordinal data (Christophe Biernacki and Julien Jacques (2016) <doi:10.1007/s11222-015-9585-2>).
ordinalCont Ordinal Regression Analysis for Continuous Scales
A regression framework for response variables which are continuous self-rating scales such as the Visual Analog Scale (VAS) used in pain assessment, or the Linear Analog Self-Assessment (LASA) scales in quality of life studies. These scales measure subjects’ perception of an intangible quantity, and cannot be handled as ratio variables because of their inherent nonlinearity. We treat them as ordinal variables, measured on a continuous scale. A function (the g function, currently the generalized logistic function) connects the scale with an underlying continuous latent variable. The link function is the inverse of the CDF of the assumed underlying distribution of the latent variable. Currently the logit link, which corresponds to a standard logistic distribution, is implemented.
ordinalForest Ordinal Forests: Prediction and Class Width Inference with Ordinal Target Variables
Ordinal forests (OF) are a method for ordinal regression with high-dimensional and low-dimensional data that is able to predict the values of the ordinal target variable for new observations and at the same time estimate the relative widths of the classes of the ordinal target variable. Using a (permutation-based) variable importance measure, it is moreover possible to rank the importances of the covariates. OF will be presented in an upcoming technical report by Hornung et al. The main functions of the package are: ordfor() (construction of OF), predict.ordfor() (prediction of the target variable values of new observations), and plot.ordfor() (visualization of the estimated relative widths of the classes of the ordinal target variable).
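A sketch using the main functions named above; argument names are assumptions, and default settings may be slow on toy data:

    library(ordinalForest)
    train <- data.frame(y  = factor(sample(1:3, 100, TRUE), ordered = TRUE),
                        x1 = rnorm(100), x2 = rnorm(100))
    fit  <- ordfor(depvar = "y", data = train)   # construct the ordinal forest
    pred <- predict(fit, newdata = train)        # predicted classes for new data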
ordinalgmifs Ordinal Regression for High-Dimensional Data
Provides a function for fitting cumulative link, adjacent category, forward and backward continuation ratio, and stereotype ordinal response models when the number of parameters exceeds the sample size, using the generalized monotone incremental forward stagewise method.
ordinalLBM Co-Clustering of Ordinal Data via Latent Continuous Random Variables
Implements functions for simulation and estimation of the ordinal latent block model (OLBM), as described in Corneli, Bouveyron and Latouche (2019).
ordinalNet Penalized Ordinal Regression
Fits ordinal regression models with elastic net penalty. Supported models include cumulative logit, probit, cauchit, and complementary log-log. The algorithm uses Fisher Scoring with Coordinate Descent updates.
ordinalRR Analysis of Repeatability and Reproducibility Studies with Ordinal Measurements
Implements Bayesian data analyses of balanced repeatability and reproducibility studies with ordinal measurements. Model fitting is based on MCMC posterior sampling with ‘rjags’. Function ordinalRR() directly carries out the model fitting, and this function has the flexibility to allow the user to specify key aspects of the model, e.g., fixed versus random effects. Functions for preprocessing data and for the numerical and graphical display of a fitted model are also provided. There are also functions for displaying the model at fixed (user-specified) parameters and for simulating a hypothetical data set at a fixed (user-specified) set of parameters for a random-effects rater population. For additional technical details, refer to Culp, Ryan, Chen, and Hamada (2018) and cite this Technometrics paper when referencing any aspect of this work. The demo of this package reproduces results from the Technometrics paper.
origami Generalized Framework for Cross-Validation
Provides a general framework for the application of cross-validation schemes to particular functions. By allowing arbitrary lists of results, origami accommodates a range of cross-validation applications.
orthoDr An Orthogonality Constrained Optimization Approach for Semi-Parametric Dimension Reduction Problems
Utilize an orthogonality constrained optimization algorithm of Wen & Yin (2013) <DOI:10.1007/s10107-012-0584-1> to solve a variety of dimension reduction problems in the semiparametric framework, such as Ma & Zhu (2013) <DOI:10.1214/12-AOS1072>, and Sun, Zhu, Wang & Zeng (2017) <arXiv:1704.05046>. It also serves as a general purpose optimization solver for problems with orthogonality constraints.
osc Orthodromic Spatial Clustering
Allows distance based spatial clustering of georeferenced data by implementing the City Clustering Algorithm – CCA. Multiple versions allow clustering for matrix, raster and single coordinates on a plain (euclidean distance) or on a sphere (great-circle or orthodromic distance).
OscillatorGenerator Generation of Customizable, Discretized Time Series of Oscillating Species
The supplied code allows for the generation of discrete time series of oscillating species. General shapes can be selected by means of individual functions, which are widely customizable by means of function arguments. All code was developed in the Biological Information Processing Group at the BioQuant Center at Heidelberg University, Germany.
OSCV One-Sided Cross-Validation
Functions for implementing different versions of the OSCV method in the kernel regression and density estimation frameworks. The package mainly supports the following articles: (1) Savchuk, O.Y., Hart, J.D. (2017). Fully robust one-sided cross-validation for regression functions. Computational Statistics, <doi:10.1007/s00180-017-0713-7> and (2) Savchuk, O.Y. (2017). One-sided cross-validation for nonsmooth density functions, <arXiv:1703.05157>.
OSDR Finds an Optimal System of Distinct Representatives
Provides routines for finding an Optimal System of Distinct Representatives (OSDR), as defined by D. Gale (1968) <doi:10.1016/S0021-9800(68)80039-0>.
oshka Recursive Quoted Language Expansion
Expands quoted language by recursively replacing any symbol that points to quoted language with the language it points to. The recursive process continues until only symbols that point to non-language objects remain. The resulting quoted language can then be evaluated normally. This differs from the traditional ‘quote’/’eval’ pattern because it resolves intermediate language objects that would interfere with evaluation.
osi Open Source Initiative API Connector
A connector to the API maintained by the Open Source Initiative <https://…/>, which provides machine-readable metadata about a variety of open source software licenses.
osmdata Import OpenStreetMap Data as Simple Features or Spatial Objects
Download and import of ‘OpenStreetMap’ (‘OSM’) data as ‘sf’ or ‘sp’ objects. ‘OSM’ data are extracted from the ‘Overpass’ web server and processed with very fast ‘C++’ routines for return to ‘R’.
osmplotr Customisable Images of OpenStreetMap Data
Produces customisable images of OpenStreetMap data. Extracts OpenStreetMap data for specified key-value pairs (e.g. key=’building’) using the overpass API. Different OSM objects can be plotted in different colours using the function add_osm_objects(). The function group_osm_objects() enables customised highlighting of selected regions using different graphical schemes designed to contrast with surrounding backgrounds.
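A sketch combining the function named above with helpers assumed from the same package (get_bbox(), extract_osm_objects(), osm_basemap(), print_osm_map()); internet access is required and the coordinates are illustrative:

    library(osmplotr)
    bbox <- get_bbox(c(-0.13, 51.51, -0.11, 51.52))             # small area of London
    dat  <- extract_osm_objects(key = "building", bbox = bbox)  # fetch via overpass
    map  <- osm_basemap(bbox = bbox, bg = "gray20")
    map  <- add_osm_objects(map, dat, col = "gray40")
    print_osm_map(map)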
OSMscale Add a Scale Bar to ‘OpenStreetMap’ Plots
Functionality to handle and project lat-long coordinates, easily download background maps and add a correct scale bar to ‘OpenStreetMap’ plots in any map projection.
osqp Quadratic Programming Solver using the ‘OSQP’ Library
Provides bindings to the ‘OSQP’ solver. The ‘OSQP’ solver is a numerical optimization package for solving convex quadratic programs written in ‘C’ and based on the alternating direction method of multipliers, ‘ADMM’. B. Stellato, G. Banjac, P. Goulart, A. Bemporad, S. Boyd (2018) <arXiv:1711.08013>.
osrm Interface Between R and the OpenStreetMap-Based Routing Service OSRM
An interface between R and the OSRM API. OSRM is a routing service based on OpenStreetMap data. See <http://…/> for more information. A public API exists, but one can also run one’s own instance. This package allows computing distances (travel time and kilometric distance) between points, as well as travel time matrices.
OSTSC Over Sampling for Time Series Classification
The OSTSC package is a powerful oversampling approach for classifying univariate, but multinomial, time series data in R. The package documentation provides a brief overview of the oversampling methodology and a tutorial: three test cases let the user quickly validate the functionality of the package, and two medium-size imbalanced time series datasets demonstrate the performance impact of OSTSC. Each example applies a TensorFlow implementation of a Long Short-Term Memory (LSTM) classifier, a type of Recurrent Neural Network (RNN) classifier, to imbalanced time series, and compares classifier performance with and without oversampling. Finally, larger versions of these two datasets are evaluated to demonstrate the scalability of the package. The examples show that the OSTSC package improves the performance of RNN classifiers applied to highly imbalanced time series data; in particular, OSTSC is observed to increase the AUC of an LSTM from 0.543 to 0.784 on a high-frequency trading dataset consisting of 30,000 time series observations.
OTclust Mean Partition, Uncertainty Assessment, Cluster Validation and Visualization Selection for Cluster Analysis
Provides the mean partition for ensemble clustering by optimal transport alignment (OTA), uncertainty measures for both partition-wise and cluster-wise assessment, and multiple visualization functions to show uncertainty, for instance membership heat maps and plots of covering point sets. A partition refers to an overall clustering result.
OTE Optimal Trees Ensembles for Regression, Classification and Class Membership Probability Estimation
Functions for creating ensembles of optimal trees for regression, classification and class membership probability estimation are given. A few trees are selected from an initial set of trees grown by random forest for the ensemble on the basis of their individual and collective performance. Trees are assessed on out-of-bag data and on an independent training data set for individual and collective performance respectively. The prediction functions return estimates of the test responses and their class membership probabilities. Unexplained variations, error rates, confusion matrix, Brier scores, etc. are also returned for the test data.
otinference Inference for Optimal Transport
Sample from the limiting distributions of empirical Wasserstein distances under the null hypothesis and under the alternative. Perform a two-sample test on multivariate data using these limiting distributions and binning.
otrimle Robust Model-Based Clustering
Performs robust cluster analysis allowing for outliers and noise that cannot be fitted by any cluster. The data are modelled by a mixture of Gaussian distributions and a noise component, which is an improper uniform distribution covering the whole Euclidean space. Parameters are estimated by (pseudo) maximum likelihood. This is fitted by an EM-type algorithm. See Coretto and Hennig (2015) <https://…/1406.0808>, and Coretto and Hennig (2016) <https://…/1309.6895>.
OTRselect Variable Selection for Optimal Treatment Decision
A penalized regression framework that can simultaneously estimate the optimal treatment strategy and identify important variables. Appropriate for either censored or uncensored continuous response.
otsad Online Time Series Anomaly Detectors
Implements a set of online fault detectors for time-series, called: PEWMA see M. Carter et al. (2012) <doi:10.1109/SSP.2012.6319708>, SD-EWMA and TSSD-EWMA see H. Raza et al. (2015) <doi:10.1016/j.patcog.2014.07.028>, KNN-CAD see E. Burnaev et al. (2016) <arXiv:1608.04585>, KNN-LDCD see V. Ishimtsev et al. (2017) <arXiv:1706.03412> and CAD-OSE see M. Smirnov (2018) <https://…/CAD>. The first three algorithms belong to prediction-based techniques and the last three belong to window-based techniques. In addition, the SD-EWMA and PEWMA algorithms are algorithms designed to work in stationary environments, while the other four are algorithms designed to work in non-stationary environments.
otvPlots Over Time Variable Plots
Enables automated visualization of variable distribution and changes over time for predictive model building. Computes summary statistics aggregated by time for large datasets, and creates plots for variable level monitoring.
outcomerate AAPOR Survey Outcome Rates
Standardized survey outcome rate functions, including the response rate, contact rate, cooperation rate, and refusal rate. These outcome rates allow survey researchers to measure the quality of survey data using definitions published by the American Association of Public Opinion Research (AAPOR). For details on these standards, see AAPOR (2016) <https://…/Standard-Definitions-(1 ).aspx>.
OutlierDetection Outlier Detection
Detects outliers using different methods, namely model based outlier detection (Barnett, V. 1978 <https://…/2347159> ), distance based outlier detection (Hautamaki, V., Karkkainen, I., and Franti, P. 2004 <http://…/papers.html> ), dispersion based outlier detection (Jin, W., Tung, A., and Han, J. 2001 <https://…/0-387-25465-X_7> ), depth based outlier detection (Johnson, T., Kwok, I., and Ng, R.T. 1998 <http://…/kdd98-038.php> ) and density based outlier detection (Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. 1996 <https://…/citation.cfm?id=3001507> ). Provides labelling of observations as outliers and a measure of outlyingness for each outlier. For univariate and bivariate data, visualization is also provided.
outliers Tests for outliers
A collection of some tests commonly used for identifying outliers.
OutliersO3 Draws Overview of Outliers (O3) Plot
Potential outliers are identified for all combinations of a dataset’s variables. The available methods are HDoutliers() from the package ‘HDoutliers’, FastPCS() from the package ‘FastPCS’, mvBACON() from ‘robustX’, adjOutlyingness() from ‘robustbase’, and DetectDeviatingCells() from ‘cellWise’.
OutrankingTools Functions for Solving Multiple-criteria Decision-making Problems
Functions to process ‘outranking’ ELECTRE methods existing in the literature. See, e.g., http://…/ELECTRE for more about the outranking approach and the foundations of ELECTRE methods.
outreg Regression Table for Publication
Create regression tables for publication. Currently supports ‘lm’, ‘glm’, ‘survreg’, and ‘ivreg’ outputs.
overlapping Estimation of Overlapping in Empirical Distributions
Functions for estimating the overlapping area of two or more empirical distributions.
overlapptest Test Overlapping of Polygons Against Random Rotation
Tests the observed overlapping polygon area in a collection of polygons against a null model of random rotation, as explained in De la Cruz et al. (2017) <doi:10.13140/RG.2.2.12825.72801>.
overture Tools for Writing MCMC
Simplifies MCMC setup by automatically looping through sampling functions and saving the results. Reduces the memory footprint of running MCMC and saves samples to disk as the chain runs. Allows samples from the chain to be analyzed while the MCMC is still running. Provides functions for commonly performed operations such as calculating Metropolis acceptance ratios and creating adaptive Metropolis samplers. References: Roberts and Rosenthal (2009) <doi:10.1198/jcgs.2009.06134>.
oviz Visualisation Methods in Freshwater Sciences
(well, for now you can just plot a Maucha diagram!)
ows4R Interface to OGC Web-Services (OWS)
Provides an interface to web services defined as standards by the Open Geospatial Consortium (OGC), including Web Feature Service (WFS) for vector data, Catalogue Service (CSW) for ISO/OGC metadata and associated standards such as the common web-service specification (OWS) and OGC Filter Encoding. The long-term purpose is to add support for additional OGC service standards such as Web Coverage Service (WCS) and Web Processing Service (WPS).

P

PabonLasso Pabon Lasso Graphs and Comparing Situations of a Unit in Two Different Times
Pabon Lasso is a graphical method for monitoring the efficiency of different wards of a hospital, or of different hospitals. The Pabon Lasso graph is divided into four parts, created by drawing the averages of the bed turnover rate (BTR) and the bed occupancy rate (BOR): the lower-left part is Zone I, the upper-left is Zone II, the upper-right is Zone III, and the remaining lower-right part is Zone IV.
PAC Partition-Assisted Clustering
Implements Partition-Assisted Clustering, which utilizes a collection of partition based nonparametric density estimation techniques to improve the robustness and accuracy of downstream clustering. The package also provides functions for effectively visualizing the clustering results. It is particularly useful for finding and visualizing subpopulations in single-cell data analysis.
PACBO Clustering Online Datasets
A function for clustering online datasets. The number of cells is data-driven and need not be chosen in advance by the user. The method is introduced and fully described in Le Li, Benjamin Guedj and Sebastien Loustau (2016), ‘PAC-Bayesian Online Clustering’ (arXiv preprint: <https://…/1602.00522> ).
packagedocs Build Website of Package Documentation
Build a package documentation and function reference site and use it as the package vignette.
packagefinder Comfortable Search for R Packages on CRAN
Tool to search for R packages on CRAN, based on their title, short and long descriptions. ‘packagefinder’ allows searching for multiple keywords at once and combining the keywords with logical operators (AND/OR).
packageRank Computation and Visualization of Package Download Counts and Percentiles
Compute and visualize the cross-sectional and longitudinal number and rank percentile of package downloads from RStudio’s CRAN mirror.
packagetrackr Track R Package Downloads from RStudio’s CRAN Mirror
Gets and caches R package download log files from RStudio’s CRAN mirror for analyzing package usage.
packcircles R Package for Random Circle Packing
This package provides a simple algorithm to arrange circles of arbitrary radii within a rectangle such that there is no overlap between circles. The algorithm is adapted from an example written in Processing by Sean McCullough (which no longer seems to be available online). It involves iterative pair-repulsion, in which overlapping circles move away from each other. The distance moved by each circle is proportional to the radius of the other to approximate inertia (very loosely), so that when a small circle is overlapped by a large circle, the small circle moves furthest. This process is repeated iteratively until no more movement takes place (acceptable layout) or a maximum number of iterations is reached (layout failure). To avoid edge effects, the bounding rectangle is treated as a toroid. Each circle’s centre is constrained to lie within the rectangle but its edges are allowed to extend outside.
http://…/circle-packing-in-r-again.html
packrat A Dependency Management System for Projects and their R Package Dependencies
Manage the R packages your project depends on in an isolated, portable, and reproducible way.
PACLasso Penalized and Constrained Lasso Optimization
An implementation of both the equality and inequality constrained lasso functions for the algorithm described in ‘Penalized and Constrained Optimization’ by James, Paulson, and Rusmevichientong (Journal of the American Statistical Association, 2019; see <http://…/PAC.pdf> for a full-text version of the paper). The algorithm here is designed to allow users to define linear constraints (either equality or inequality constraints) and use a penalized regression approach to solve the constrained problem. The functions here are used specifically for constraints with the lasso formulation, but the method described in the PaC paper can be used for a variety of scenarios. In addition to the simple examples included here with the corresponding functions, complete code to entirely reproduce the results of the paper is available online through the Journal of the American Statistical Association.
pacman Package Management Tool
Tools to more conveniently perform tasks associated with add-on packages. pacman conveniently wraps library- and package-related functions and names them in an intuitive and consistent fashion. It seeks to combine functionality from lower-level functions, which can speed up workflow.
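For instance, the package's p_load() helper installs any missing packages and attaches them in one call (package names here are illustrative):

    library(pacman)
    p_load(ggplot2, data.table)  # install if missing, then attach, in one call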
pacotest Testing for Partial Copulas and the Simplifying Assumption in Vine Copulas
Routines for two different test types, the Equal Correlation (ECORR) test and the Vectorial Independence (VI) test are provided. The tests can be applied to check whether a conditional copula coincides with its partial copula. Functions to test whether a regular vine copula satisfies the so-called simplifying assumption or to test a single copula within a regular vine copula to be a (j-1)-th order partial copula are available. The ECORR test comes with a decision tree approach to allow testing in high-dimensional settings.
Pade Padé Approximant Coefficients
Given a vector of Taylor series coefficients of sufficient length as input, the function returns the numerator and denominator coefficients for the Padé approximant of appropriate order.
padr Quickly Get Datetime Data Ready for Analysis
Transforms datetime data into a format ready for analysis. It offers two functionalities; aggregating data to a higher level interval (thicken) and imputing records where observations were absent (pad). It also offers a few functions that assist with filling missing values after padding.
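A minimal sketch of the thicken/pad workflow, assuming a hypothetical data frame sales with a timestamp column ts and a numeric column amount:

    library(padr)
    library(dplyr)
    sales %>%
      thicken("day") %>%                 # add a day-level datetime column (ts_day)
      group_by(ts_day) %>%
      summarise(amount = sum(amount)) %>%
      pad() %>%                          # insert rows for days with no observations
      fill_by_value(amount, value = 0)   # fill the padded gaps with zeros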
pagedown Paginate the HTML Output of R Markdown with CSS for Print
Use the paged media properties in CSS and the JavaScript library ‘paged.js’ to split the content of an HTML document into discrete pages. Each page can have its own page size, page numbers, margin boxes, running headers, and so on. Applications of this package include books, letters, reports, papers, business cards, resumes, and posters.
pagenum Put Page Numbers on Graphics
A simple way to add page numbers to base/ggplot/lattice graphics.
paintmap Plotting Paintmaps
Plots matrices of colours as grids of coloured squares – aka heatmaps, guaranteeing legible row and column names, without transformation of values, without re-ordering rows or columns, and without dendrograms.
pairsD3 D3 Scatterplot Matrices
Creates an interactive scatterplot matrix using the D3 JavaScript library. See http://d3js.org for more information on D3.
PairwiseD Pairing Up Units and Vectors in Panel Data Setting
Pairs observations according to a chosen formula and facilitates bilateral analysis of panel data. Pairing is possible for observations, as well as for vectors of observations ordered with respect to time.
pak Another Approach to Package Installation
The goal of ‘pak’ is to make package installation faster and more reliable. In particular, it performs all HTTP operations in parallel, so metadata resolution and package downloads are fast. Metadata and package files are cached on the local disk as well. ‘pak’ has a dependency solver, so it finds version conflicts before performing the installation. This version of ‘pak’ supports CRAN, ‘Bioconductor’ and ‘GitHub’ packages as well.
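A sketch of basic usage; pkg_install() resolves the full dependency tree before downloading anything:

    # CRAN package
    pak::pkg_install("ggplot2")
    # GitHub package, using the "user/repo" shorthand
    pak::pkg_install("r-lib/pak")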
palasso Paired Lasso Regression
Implements sparse regression with paired covariates (Rauschenberger et al. 2018).
paletteer Comprehensive Collection of Color Palettes
The choices of color palettes in R can be quite overwhelming, with palettes spread over many packages with many different APIs. This package aims to collect all color palettes across the R ecosystem under the same package with a streamlined API.
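A brief sketch, assuming paletteer’s string-based API in which palettes are addressed as "package::palette":

    library(paletteer)
    paletteer_d("wesanderson::Zissou1")      # a discrete palette
    paletteer_c("viridis::viridis", n = 10)  # a continuous palette, 10 colours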
palm Fitting Point Process Models via the Palm Likelihood
Functions for the fitting of point process models using the Palm likelihood. First proposed by Tanaka, Ogata, and Stoyan (2008) <DOI:10.1002/bimj.200610339>, maximisation of the Palm likelihood can provide computationally efficient parameter estimation in situations where the full likelihood is intractable. This package is chiefly focused on Neyman-Scott point processes, but can also fit void processes. The development of this package was motivated by the analysis of capture-recapture surveys on which individuals cannot be identified—the data from which can conceptually be seen as a clustered point process. As such, some of the functions in this package are specifically for the estimation of cetacean density from two-camera aerial surveys.
palmtree Partially Additive (Generalized) Linear Model Trees
This is an implementation of model-based trees with global model parameters (PALM trees). The PALM tree algorithm is an extension to the MOB algorithm (implemented in the ‘partykit’ package), where some parameters are fixed across all groups. Details about the method can be found in Seibold, Hothorn, Zeileis (2016) <arXiv:1612.07498>. The package offers coef(), logLik(), plot(), and predict() functions for PALM trees.
palr Colour Palettes for Data
Colour palettes for data, based on some well known public data sets.
pals Color Palettes, Colormaps, and Tools to Evaluate Them
A comprehensive collection of color palettes, colormaps, and tools to evaluate them.
pamctdp Principal Axes Methods for Contingency Tables with Partition Structures on Rows and Columns
Correspondence analysis of contingency tables with simple and double structures: superimposed representations, intra-block correspondence analysis (IBCA), and weighted intra-block correspondence analysis (WIBCA).
PAmeasures Prediction and Accuracy Measures for Nonlinear Models and for Right-Censored Time-to-Event Data
We propose a pair of summary measures for the predictive power of a prediction function based on a regression model. The regression model can be linear or nonlinear, parametric, semi-parametric, or nonparametric, and correctly specified or mis-specified. The first measure, R-squared, is an extension of the classical R-squared statistic for a linear model, quantifying the prediction function’s ability to capture the variability of the response. The second measure, L-squared, quantifies the prediction function’s bias for predicting the mean regression function. When used together, they give a complete summary of the predictive power of a prediction function. Please refer to Gang Li and Xiaoyan Wang (2016) <arXiv:1611.03063> for more details.
pammtools Piece-Wise Exponential Additive Mixed Modeling Tools
Functions that facilitate fitting piece-wise exponential (additive mixed) models (Bender and Scheipl (2018) <doi: 10.1177/1471082X17748083>). This includes restructuring the data to the needed format and various convenience functions, e.g., for plotting the results.
pAnalysis Benchmarking and Rescaling R2 using Noise Percentile Analysis
Provides the tools needed to benchmark the R2 value corresponding to a certain acceptable noise level, while also providing a rescaling function based on that noise level that yields a new value of R2, referred to as R2k, which is independent of both the number of degrees of freedom and the noise distribution function.
pandocfilters Pandoc Filters for R
The document converter ‘pandoc’ <http://…/> is widely used in the R community. One feature of ‘pandoc’ is that it can produce and consume JSON-formatted abstract syntax trees (AST). This makes it possible to transform a given source document into a JSON-formatted AST, alter it by so-called filters, and pass the altered JSON-formatted AST back to ‘pandoc’. This package provides functions for writing such filters in native R code. Although this package is inspired by the Python package ‘pandocfilters’ <https://…/>, it provides additional convenience functions which make it simple to use ‘pandocfilters’ as a report generator. Since ‘pandocfilters’ inherits most of its functionality from ‘pandoc’, it can create documents in many formats (for more information see <http://…/> ) but is also bound to the same limitations as ‘pandoc’.
panelr Regression Models and Utilities for Repeated Measures and Panel Data
Provides an object type and associated tools for storing and wrangling panel data. Implements several methods for creating regression models that take advantage of the unique aspects of panel data. Among other capabilities, automates the ‘within-between’ (also known as ‘between-within’ and ‘hybrid’) panel regression specification that combines the desirable aspects of both fixed effects and random effects econometric models and fits them as multilevel models (Allison, 2009 <doi:10.4135/9781412993869.d33>; Bell & Jones, 2015 <doi:10.1017/psrm.2014.7>). These models can also be estimated via generalized estimating equations (GEE; McNeish, 2019 <doi:10.1080/00273171.2019.1602504>) and Bayesian estimation is (optionally) supported via ‘Stan’. Supports estimation of asymmetric effects models via first differences (Allison, 2019 <doi:10.1177/2378023119826441>) as well as a generalized linear model extension thereof using GEE.
panelvar Panel Vector Autoregression
We extend two general methods of moment estimators to panel vector autoregression models (PVAR) with p lags of endogenous variables, predetermined and strictly exogenous variables. This general PVAR model contains the first difference GMM estimator by Holtz-Eakin et al. (1988) <doi:10.2307/1913103>, Arellano and Bond (1991) <doi:10.2307/2297968> and the system GMM estimator by Blundell and Bond (1998) <doi:10.1016/S0304-4076(98)00009-8>. We also provide specification tests (Hansen overidentification test, lag selection criterion and stability test of the PVAR polynomial) and classical structural analysis for PVAR models such as orthogonal and generalized impulse response functions, bootstrapped confidence intervals for impulse response analysis and forecast error variance decompositions.
panelView Visualizing Panel Data with Dichotomous Treatments
Visualizes panel data with dichotomous treatments. ‘panelView’ has two main functionalities: (1) it visualizes the treatment and missing-value statuses of each observation in a panel/time-series-cross-sectional (TSCS) dataset; and (2) it plots the outcome variable (either continuous or discrete) in a time-series fashion.
PANICr PANIC Tests of Nonstationarity
This package contains a methodology that makes use of the factor structure of large dimensional panels to understand the nature of nonstationarity inherent in data. This is referred to as PANIC – Panel Analysis of Nonstationarity in Idiosyncratic and Common Components. PANIC (2004) includes valid pooling methods that allow panel unit root tests to be constructed. PANIC (2004) can detect whether the nonstationarity in a series is pervasive, or variable specific, or both. PANIC (2010) includes the Panel Modified Sargan-Bhargava test, three models for a Moon and Perron style test, and a bias correction for the idiosyncratic unit root test of PANIC (2004). The PANIC model approximates the number of factors based on Bai and Ng (2002).
papeR A Toolbox for Writing ‘knitr’, ‘Sweave’ or Other ‘LaTeX’-Based Papers and Reports
A toolbox for writing ‘knitr’, ‘Sweave’ or other ‘LaTeX’-based reports and to prettify the output of various estimated models.
parallelDist Parallel Distance Matrix Computation using Multiple Threads
A fast parallelized alternative to R’s native ‘dist’ function to calculate distance matrices for continuous, binary, and multi-dimensional input matrices with support for a broad variety of distance functions from the ‘stats’, ‘proxy’ and ‘dtw’ R packages. For ease of use, the ‘parDist’ function extends the signature of the ‘dist’ function and uses the same parameter naming conventions as distance methods of existing R packages. The package is mainly implemented in C++ and leverages the ‘RcppParallel’ package to parallelize the distance computations with the help of the ‘TinyThread’ library. Furthermore, the ‘Armadillo’ linear algebra library is used for optimized matrix operations during distance calculations. The curiously recurring template pattern (CRTP) technique is applied to avoid virtual functions, which improves the Dynamic Time Warping calculations while keeping the implementation flexible enough to support different step patterns and normalization methods.
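A minimal example of parDist() as a drop-in alternative to stats::dist():

    library(parallelDist)
    x <- matrix(rnorm(2000), nrow = 100)    # 100 observations, 20 variables
    d <- parDist(x, method = "euclidean")   # computed on multiple threads
    as.matrix(d)[1:3, 1:3]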
parallelML A Parallel-Voting Algorithm for many Classifiers
By sampling your data, running the provided classifier on these samples in parallel on your own machine and letting your models vote on a prediction, we return much faster predictions than the regular machine learning algorithm and possibly even more accurate predictions.
parallelsugar R package to provide mclapply() syntax for Windows machines
An R package to provide mclapply() syntax for Windows machines.
parallelSVM A Parallel-Voting Version of the Support-Vector-Machine Algorithm
By sampling your data, running the Support-Vector-Machine algorithm on these samples in parallel on your own machine and letting your models vote on a prediction, we return much faster predictions than the regular Support-Vector-Machine and possibly even more accurate predictions.
ParallelTree Parallel Tree
Provides two functions: Group_function() and Parallel_Tree(). Group_function() applies a given function (e.g., mean()) to input variable(s) by group across levels, with additional data management options. Parallel_Tree() uses ‘ggplot2’ to create a parallel coordinate plot (technically a facsimile of parallel coordinate plots in a Cartesian coordinate system). Used in combination, these functions can create parallel tree plots, a variant of parallel coordinate plots that is useful for visualizing multilevel data.
paramGUI A Shiny GUI for some Parameter Estimation Examples
Allows specification and fitting of some parameter estimation examples inspired by time-resolved spectroscopy via a Shiny GUI.
paramhetero Numeric and Visual Comparisons of Heterogeneity in Parametric Models
Performs statistical tests to compare coefficients and residual variance across multiple models. Also provides graphical methods for assessing heterogeneity in coefficients and residuals. Currently supports linear and generalized linear models, as well as their multi-level and complex survey-weighted variations from the ‘lme4’ and ‘survey’ packages, respectively. Reference: Li (2015) <https://…/>.
params Simplify Parameters
An interface to simplify organizing parameters used in a package, using external configuration files. This attempts to provide a cleaner alternative to options().
paramtest Run a Function Iteratively While Varying Parameters
Run simulations or other functions while easily varying parameters from one iteration to the next. Some common use cases would be grid search for machine learning algorithms, running sets of simulations (e.g., estimating statistical power for complex models), or bootstrapping under various conditions. See the ‘paramtest’ documentation for more information and examples.
ParBayesianOptimization Parallel Bayesian Optimization of Hyperparameters
Fast, flexible framework for implementing Bayesian optimization of model hyperparameters according to the methods described in Snoek et al. <arXiv:1206.2944>. The package allows the user to run scoring functions in parallel, save intermediary results, and tweak other aspects of the process to fully utilize the computing resources available to the user.
parcoords Htmlwidget’ for ‘d3.js’ Parallel Coordinates Chart
Create interactive parallel coordinates charts with this ‘htmlwidget’ wrapper for the ‘d3.js’ ‘parallel-coordinates’ library <https://…/parcoords-es>.
parcor Regularized estimation of partial correlation matrices
The package estimates the matrix of partial correlations based on different regularized regression methods: lasso, adaptive lasso, PLS, and ridge regression. In addition, the package provides model selection for lasso, adaptive lasso and ridge regression based on cross-validation.
parfm Parametric Frailty Models
Fits Parametric Frailty Models by maximum marginal likelihood. Possible baseline hazards: Weibull, inverse Weibull, exponential, Gompertz, lognormal and loglogistic. Possible Frailty distributions: gamma, inverse Gaussian, positive stable and lognormal.
parglm Parallel GLM
Provides a parallel estimation method for generalized linear models without compiling with a multithreaded LAPACK or BLAS.
parmsurvfit Parametric Survival Functions
Executes parametric survival analysis techniques similar to those in ‘Minitab’. Fits right censored data to a given parametric distribution, produces summary statistics of the fitted distribution, and plots parametric survival, hazard, and cumulative hazard plots. Produces Anderson-Darling test statistic and probability plots to assess goodness of fit of right censored data to a distribution.
parSim Parallel Simulation Studies
Perform flexible simulation studies using one or multiple computer cores. The package is set up to be usable on high-performance clusters in addition to being run locally, see examples on <https://…/parSim>.
parsnip A Common API to Modeling and Analysis Functions
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. ‘R’, ‘Spark’, ‘Stan’, etc).
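A short sketch of the engine-agnostic interface; swapping the engine string changes the computational backend without changing the rest of the code:

    library(parsnip)
    spec <- set_engine(linear_reg(), "lm")      # or e.g. "glmnet", "stan"
    fitted <- fit(spec, mpg ~ wt + hp, data = mtcars)
    predict(fitted, new_data = head(mtcars))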
PartCensReg Partially Censored Regression Models Based on Heavy-Tailed Distributions
Estimates the parameters of a partially censored regression model via maximum penalized likelihood through an iterative EM-type algorithm. The model must belong to the semi-parametric family, including a parametric and a nonparametric component. The error term considered belongs to the scale-mixture of normal (SMN) family of distributions, which includes well-known heavy-tailed distributions such as the Student’s t distribution, among others. To examine the performance of the fitted model, case-deletion and local influence techniques are provided to show its robustness against outlying and influential observations. This work is based on Ferreira, C. S., & Paula, G. A. (2017) <doi:10.1080/02664763.2016.1267124> but considers the SMN family.
partDSA Partitioning Using Deletion, Substitution, and Addition Moves
A novel tool for generating a piecewise constant estimation list of increasingly complex predictors based on an intensive and comprehensive search over the entire covariate space.
partialAR Partial Autoregression
Fits time series models which consist of a sum of a permanent and a transient component.
Partiallyoverlapping Partially Overlapping Samples t-Tests
The ‘partially overlapping samples t-tests’, for the comparison of means for two samples which include both paired observations and independent observations. [See Derrick, B., Russ, B., Toher, D. & White P (2017). Test statistics for the comparison of means for two samples which include both paired observations and independent observations. Journal of Modern Applied Statistical Methods, 16(1)].
particles A Graph Based Particle Simulator Based on D3-Force
Simulating particle movement in 2D space has many applications. The ‘particles’ package implements a particle simulator based on the ideas behind the ‘d3-force’ ‘JavaScript’ library. ‘particles’ implements all forces defined in ‘d3-force’ as well as others, such as vector fields, traps, and attractors.
partition Agglomerative Partitioning Framework for Dimension Reduction
A fast and flexible framework for agglomerative partitioning. ‘partition’ uses an approach called Direct-Measure-Reduce to create new variables that maintain the user-specified minimum level of information. Each reduced variable is also interpretable: the original variables map to one and only one variable in the reduced data set. ‘partition’ is flexible, as well: how variables are selected to reduce, how information loss is measured, and the way data is reduced can all be customized.
partitionComparison Implements Measures for the Comparison of Two Partitions
Provides several measures ((dis)similarity, distance/metric, correlation, entropy) for comparing two partitions of the same set of objects. The different measures can be assigned to three different classes: Pair comparison (containing the famous Jaccard and Rand indices), set based, and information theory based. Many of the implemented measures can be found in Albatineh AN, Niewiadomska-Bugaj M and Mihalko D (2006) <doi:10.1007/s00357-006-0017-z> and Meila M (2007) <doi:10.1016/j.jmva.2006.11.013>. Partitions are represented by vectors of class labels which allow a straightforward integration with existing clustering algorithms (e.g. kmeans()). The package is mostly based on the S4 object system.
partools Tools for the ‘Parallel’ Package
Miscellaneous utilities for the ‘parallel’ package, cluster-oriented section; ‘Snowdoop’ alternative to MapReduce; file splitting and distributed operations such as sort and aggregate. ‘Software Alchemy’ method for parallelizing most statistical methods, presented in N. Matloff, Parallel Computation for Data Science, Chapman and Hall, 2015. Includes a debugging aid.
party A Laboratory for Recursive Partytioning
A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available.
passport Travel Smoothly Between Country Name and Code Formats
Smooths the process of working with country names and codes via powerful parsing, standardization, and conversion utilities arranged in a simple, consistent API. Country name formats are drawn from multiple sources, including the Unicode Common Locale Data Repository (CLDR, <http://…/> ), which provides common-sense standardized names in hundreds of languages.
password Create Random Passwords
Create random passwords of letters, numbers and punctuation.
passwordrandom Access the PasswordRandom.com API in R
passwordrandom is an R package that interfaces with the PasswordRandom.com API – http://…/api
pasta Noodlyfied Pasting of Strings
Intuitive and readable infix functions to paste strings together.
PATHChange A Tool for Identification of Differentially Expressed Pathways using Multi-Statistic Comparison
An R tool suited to Affymetrix microarray data that combines three different statistical tests (Bootstrap, Fisher exact and Wilcoxon signed rank) to evaluate genetic pathway alterations.
pathClass Classification using biological pathways as prior knowledge
pathClass is a collection of classification methods that use information about feature connectivity in a biological network as an additional source of information. This additional knowledge is incorporated into the classification a priori. Several authors have shown that this approach significantly increases the classification performance.
pathfindR Pathway Enrichment Analysis Utilizing Active Subnetworks
Pathway enrichment analysis enables researchers to uncover mechanisms underlying a phenotype. pathfindR is a tool for pathway enrichment analysis utilizing active subnetworks. It identifies active subnetworks in a protein-protein interaction network using a user-provided list of genes, and performs pathway enrichment analyses on the identified subnetworks. pathfindR also offers functionality to cluster enriched pathways and identify representative pathways. The method is described in detail in Ulgen E, Ozisik O, Sezerman OU. 2018. pathfindR: An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks. bioRxiv. <doi:10.1101/272450>.
pathlibr OO Path Manipulation in R
An OO interface for path manipulation, emulating Python’s ‘pathlib’.
pathmapping Compute Deviation and Correspondence Between Spatial Paths
Functions to compute and display the area-based deviation between spatial paths and to compute a mapping based on minimizing area and distance-based cost. For details, see: Mueller, S. T., Perelman, B. S., & Veinott, E. S. (2016) <DOI:10.3758/s13428-015-0562-7>.
pathological Path Manipulation Utilities
Utilities for paths, files and directories.
PathSelectMP Backwards Variable Selection for Paths using M Plus
Primarily for use with datasets containing only categorical variables, although continuous variables may be included as independent variables in paths. Using M Plus, backward variable selection is performed on all Total, Total Indirect, and then Direct effects until none of these effects have p-values greater than the specified target p-value. If there are missing values in the data, imputations are performed using the ‘mice’ package. Selection is then performed with the imputed data sets, and results are averaged.
patternize Quantification of Color Pattern Variation
Quantification of variation in organismal color patterns as obtained from image data. Patternize defines homology between pattern positions across images either through fixed landmarks or image registration. Pattern identification is performed by categorizing the distribution of colors using either an RGB threshold or unsupervised image segmentation.
patternplot Versatile Pie Chart using Patterns, Colors, and Images
Creates aesthetically pleasing and informative pie charts. It can plot pie charts either in black and white or in colors, with or without filled patterns. On the one hand, black and white pie charts filled with patterns are useful for publications, especially when an increasing number of journals only accept black and white figures or charge a significant amount for a color figure. On the other hand, colorful pie charts with or without patterns are useful for print design, online publishing, or poster and ‘PowerPoint’ presentations. ‘patternplot’ allows the flexibility of a variety of combinations of patterns and colors to choose from. It also has the ability to fill in the slices with any external images in ‘png’ and ‘jpeg’ formats. In summary, ‘patternplot’ allows the users to be as creative as they can while creating pie charts!
pawls Penalized Adaptive Weighted Least Squares Regression
Efficient algorithms for fitting weighted least squares regression with \eqn{L_{1}}{L1} regularization on both the coefficients and weight vectors, performing simultaneous variable selection and outlier detection efficiently.
paws Amazon Web Services Software Development Kit
Interface to Amazon Web Services <https://aws.amazon.com>, including storage, database, and compute services, such as ‘Simple Storage Service’ (‘S3’), ‘DynamoDB’ ‘NoSQL’ database, and ‘Lambda’ functions-as-a-service.
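A brief sketch of the client-object style; the bucket name is hypothetical, and credentials are assumed to come from the standard AWS environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION):

    library(paws)
    svc <- s3()                      # create an S3 client object
    svc$list_buckets()               # each AWS operation is a method
    svc$put_object(Bucket = "my-bucket", Key = "hello.txt",
                   Body = charToRaw("hello, world"))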
paws.application.integration Amazon Web Services Application Integration APIs
Interface to Amazon Web Services application integration APIs, including ‘Simple Queue Service’ (‘SQS’) message queue, ‘Simple Notification Service’ (‘SNS’) publish/subscribe messaging, and more <https://…/>.
paws.common Paws Low-Level Amazon Web Services API
Functions for making low-level API requests to Amazon Web Services <https://aws.amazon.com>. The functions handle building, signing, and sending requests, and receiving responses. They are designed to help build higher-level interfaces to individual services, such as Simple Storage Service (S3).
paws.compute Amazon Web Services Compute APIs
Interface to Amazon Web Services’ compute APIs, including ‘Elastic Compute Cloud’ (‘EC2’), ‘Lambda’ functions-as-a-service, containers, batch processing, and more <https://…/>.
paws.customer.engagement Amazon Web Services Customer Engagement APIs
Interface to Amazon Web Services customer engagement APIs, including ‘Simple Email Service’, ‘Connect’ contact center service, and more <https://…/>.
paws.machine.learning Amazon Web Services Machine Learning APIs
Interface to Amazon Web Services machine learning APIs, including ‘SageMaker’ managed machine learning service, natural language processing, speech recognition, translation, and more <https://…/>.
paws.management Amazon Web Services Management & Governance APIs
Interface to Amazon Web Services management and governance APIs, including ‘CloudWatch’ application and infrastructure monitoring, ‘Auto Scaling’ for automatically scaling resources, and more <https://…/>.
paws.networking Amazon Web Services Networking & Content Delivery APIs
Interface to Amazon Web Services networking and content delivery APIs, including ‘Route 53’ Domain Name System service, ‘CloudFront’ content delivery, load balancing, and more <https://…/>.
paws.security.identity Amazon Web Services Security, Identity, & Compliance APIs
Interface to Amazon Web Services security, identity, and compliance APIs, including the ‘Identity & Access Management’ (‘IAM’) service for managing access to services and resources, and more <https://…/>.
pbapply Adding Progress Bar to ‘*apply’ Functions
A lightweight package that adds a progress bar to vectorized R functions (‘*apply’). The implementation can easily be added to functions where showing the progress is useful for the user (e.g. bootstrap).
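For illustration, a minimal sketch; pblapply() is a drop-in replacement for lapply():

    library(pbapply)
    res <- pblapply(1:50, function(i) {
      Sys.sleep(0.02)          # stand-in for a slow bootstrap iteration
      mean(rnorm(1000))
    })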
pbdRPC Programming with Big Data — Remote Procedure Call
A very light yet secure implementation of remote procedure calls, with a unified interface via ssh (OpenSSH) or plink/plink.exe (PuTTY).
PBIBD Efficiencies of PBIB Designs
Assists in calculating the efficiencies of any type of Partially Balanced Incomplete Block Design with two, three, four or five associate classes. This helps researchers calculate the efficiencies of their PBIB designs quickly and easily.
pbmcapply Tracking the Progress of Mc*pply with Progress Bar
A lightweight package that helps you track and visualize the progress of parallel versions of vectorized R functions (mc*apply).
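A minimal sketch; pbmclapply() mirrors parallel::mclapply() but draws a progress bar (forking, and hence mc.cores > 1, is unavailable on Windows):

    library(pbmcapply)
    res <- pbmclapply(1:50, function(i) mean(rnorm(1e4)), mc.cores = 2)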
pbv Probabilities for Bivariate Normal Distribution
Computes probabilities of the bivariate normal distribution in a vectorized R function (Drezner & Wesolowsky, 1990, <doi:10.1080/00949659008811236>).
PCA4TS Segmenting Multiple Time Series by Contemporaneous Linear Transformation
Seeks a contemporaneous linear transformation for a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially.
pcaBootPlot Create 2D Principal Component Plots with Bootstrapping
Draws a 2D principal component plot using the first 2 principal components from the original and bootstrapped data to give some sense of variability.
pcadapt Principal Component Analysis for Outlier Detection
Methods to detect genetic markers involved in biological adaptation. ‘pcadapt’ provides statistical tools for outlier detection based on Principal Component Analysis.
PCADSC Tools for Principal Component Analysis-Based Data Structure Comparisons
A suite of non-parametric, visual tools for assessing differences in data structures for two datasets that contain different observations of the same variables. These tools are all based on Principal Component Analysis (PCA) and thus effectively address differences in the structures of the covariance matrices of the two datasets. The PCADSC tools consist of easy-to-use, intuitive plots that each focus on different aspects of the PCA decompositions. The cumulative eigenvalue (CE) plot describes differences in the variance components (eigenvalues) of the deconstructed covariance matrices. The angle plot presents the information loss when moving from the PCA decomposition of one dataset to the PCA decomposition of the other. The chroma plot describes the loading patterns of the two datasets, thereby presenting the relative weighting and importance of the variables from the original dataset.
pcalg Methods for Graphical Models and Causal Inference
Functions for causal structure learning and causal inference using graphical models. The main algorithms for causal structure learning are PC (for observational data without hidden variables), FCI and RFCI (for observational data with hidden variables), and GIES (for a mix of data from observational studies (i.e. observational data) and data from experiments involving interventions (i.e. interventional data) without hidden variables). For causal inference the IDA algorithm, the Generalized Backdoor Criterion (GBC) and the Generalized Adjustment Criterion (GAC) are implemented.
pCalibrate Bayesian Calibrations of P-Values
Implements transformations of P-values to the smallest possible Bayes factor within the specified class of alternative hypotheses, as described in Held & Ott (2017, On p-values and Bayes factors, Annual Review of Statistics and Its Application, 5, to appear). Covers several common testing scenarios such as z-tests, t-tests, likelihood ratio tests and the F-test of overall significance in the linear model.
pcaPA Parallel Analysis for Ordinal and Numeric Data using Polychoric and Pearson Correlations with S3 Classes
A set of functions to perform parallel analysis for principal components analysis intended mainly for large data sets. It performs a parallel analysis of continuous, ordered (including dichotomous/binary as a special case) or mixed type of data associated with a principal components analysis. Polychoric correlations among ordered variables, Pearson correlations among continuous variables and polyserial correlations between mixed type variables (one ordered and one continuous) are used. Whenever the use of polyserial or polychoric correlations yields a non-positive-definite correlation matrix, the resulting matrix is transformed into the nearest positive definite matrix. This is continued work based on a previous version developed at the Colombian Institute for the Evaluation of Education (ICFES).
PCDimension Finding the Number of Significant Principal Components
Implements methods to automate the Auer-Gervini graphical Bayesian approach for determining the number of significant principal components. Automation uses clustering, change points, or simple statistical models to distinguish ‘long’ from ‘short’ steps in a graph showing the posterior number of components as a function of a prior parameter.
pcdpca Dynamic Principal Components for Periodically Correlated Functional Time Series
Extends multivariate dynamic principal components to periodically correlated multivariate time series.
pcensmix Model Fitting to Progressively Censored Mixture Data
Functions for generating progressively Type-II censored data in a mixture structure and fitting models using a constrained EM algorithm. It can also create a progressive Type-II censored version of a given real dataset to be considered for model fitting.
pcev Principal Component of Explained Variance
Principal component of explained variance (PCEV) is a statistical tool for the analysis of a multivariate response vector. It is a dimension-reduction technique, similar to principal component analysis (PCA), which seeks to maximize the proportion of variance (in the response vector) explained by a set of covariates.
pcgen Reconstruction of Causal Networks for Data with Random Genetic Effects
Implements the pcgen algorithm, which is a modified version of the standard pc-algorithm, with specific conditional independence tests and modified orientation rules.
pch Piecewise Constant Hazards Models for Censored and Truncated Data
Using piecewise constant hazards models is a very flexible approach for the analysis of survival data. The time line is divided into sub-intervals; for each interval, a different hazard is estimated using Poisson regression.
pcLasso Principal Components Lasso
A method for fitting the entire regularization path of the principal components lasso for linear and logistic regression models. The algorithm uses cyclic coordinate descent in a path-wise fashion. See Tay, K., Friedman, J., and Tibshirani, R. (2018), ‘Principal component-guided sparse regression’ <arXiv:1810.04651>, for more information on the algorithm.
PCMRS Model Response Styles in Partial Credit Models
Implementation of PCMRS (Partial Credit Model with Response Styles) as proposed by Tutz, Schauberger and Berger (2016) <https://…/>. PCMRS is an extension of the regular partial credit model. PCMRS allows for an additional person parameter that characterizes the response style of the person. By taking the response style into account, the estimates of the item parameters are less biased than in partial credit models.
PCovR Principal Covariates Regression
Analyzing regression data with many and/or highly collinear predictor variables, by simultaneously reducing the predictor variables to a limited number of components and regressing the criterion variables on these components. Several rotation options are provided in this package, as well as model selection options.
PCovR: An R Package for Principal Covariates Regression
pcrcoal Implementing the Coalescent Approach to PCR Simulation Developed by Weiss and Von Haeseler (NAR, 1997)
Implementing the Coalescent Approach to PCR Simulation.
pct Propensity to Cycle Tool
Functions and example data to teach and increase the reproducibility of the methods and code underlying the Propensity to Cycle Tool (PCT), a research project and web application hosted at <https://…/>. For an academic paper on the methods, see Lovelace et al (2017) <doi:10.5198/jtlu.2016.862>.
pdc Permutation Distribution Clustering
Permutation Distribution Clustering is a clustering method for time series. Dissimilarity of time series is formalized as the divergence between their permutation distributions. The permutation distribution was proposed as a measure of the complexity of a time series.
PDFEstimator Nonparametric Probability Density Estimator
A nonparametric density estimator based on the maximum-entropy method. Accurately predicts a probability density function (PDF) for random data using a novel iterative scoring function to determine the best fit without overfitting to the sample.
pdfsearch Search Tools for PDF Files
Includes functions for keyword search of pdf files. There is also a wrapper that includes searching of all files within a single directory.
pdftables Programmatic Conversion of PDF Tables
Allows the user to convert PDF tables to formats more amenable to analysis (‘.csv’, ‘.xml’, or ‘.xlsx’) by wrapping the PDFTables API. In order to use the package, the user needs to sign up for an API account on the PDFTables website (<https://…/pdf-to-excel-api> ). The package works by taking a PDF file as input, uploading it to PDFTables, and returning a file with the extracted data.
pdftools Extract Text and Data from PDF Documents
Utilities based on libpoppler for extracting text, fonts, attachements and metadata from a pdf file. Also implements rendering of PDF to bitmaps on supported platforms.
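A brief sketch; the file name is hypothetical:

    library(pdftools)
    txt  <- pdf_text("report.pdf")   # one character string per page
    info <- pdf_info("report.pdf")   # metadata: number of pages, dates, ...
    cat(substr(txt[1], 1, 200))      # peek at the first page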
PDM Photogrammetric Distances Measurer
Measures real distances in pictures. With the PDM() function, you can choose a ‘*.jpg’ file, give the length in mm of the graphical scale, mark the start and end points of that scale, name the measure, and mark the start and end points of the measure itself. Afterwards, the function asks the user for a new measure.
PDN Personalized Disease Network
Builds patient-level networks for prediction of medical outcomes and draws the network clusters. This package is based on the paper ‘Personalized disease networks for understanding and predicting cardiovascular diseases and other complex processes’ (see Cabrera et al. <http://…/A14957> ).
pdp Partial Dependence Functions
A general framework for creating partial dependence plots from various types of machine learning models in R.
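A minimal sketch using a plain linear model, assuming pdp’s partial() accepts it (the package supports many model classes):

    library(pdp)
    fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
    pd  <- partial(fit, pred.var = "wt", train = mtcars)  # partial dependence of wt
    plotPartial(pd)                                       # lattice-based display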
pdSpecEst Positive-Definite Wavelet-Based Multivariate Spectral Analysis
Implementation of wavelet-based multivariate spectral density estimation and clustering methods in the Riemannian manifold of Hermitian and positive-definite matrices.
Peacock.test Two and Three Dimensional Kolmogorov-Smirnov Two-Sample Tests
The original definition of the two and three dimensional Kolmogorov-Smirnov two-sample test statistics given by Peacock (1983) is implemented. Two R-functions: peacock2 and peacock3, are provided to compute the test statistics in two and three dimensional spaces, respectively. Note the Peacock test is different from the Fasano and Franceschini test (1987). The latter is a variant of the Peacock test.
peakPantheR Peak Picking and Annotation of High Resolution Experiments
An automated pipeline for the detection, integration and reporting of predefined features across a large number of mass spectrometry data files.
peakRAM Monitor the Total and Peak RAM Used by an Expression or Function
When working with big datasets, RAM conservation is critically important. However, it is not always enough to just monitor the size of the objects created. So-called ‘copy-on-modify’ behavior, characteristic of R, means that some expressions or functions may require an unexpectedly large amount of RAM overhead. For example, replacing a single value in a matrix duplicates that matrix in the backend, making this task require twice as much RAM as that used by the matrix itself. This package makes it easy to monitor the total and peak RAM used so that developers can quickly identify and eliminate RAM hungry code.
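For illustration, a minimal sketch; each expression gets one row in the returned data frame, with elapsed time, total RAM, and peak RAM used:

    library(peakRAM)
    peakRAM(
      mean(rnorm(1e7)),
      cumsum(rnorm(1e7))
    )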
PeakSegDisk Disk-Based Implementation of PeakSegFPOP
Disk-based implementation of Functional Pruning Optimal Partitioning with up-down constraints <arXiv:1810.00117> for single-sample peak calling (independently for each sample and genomic problem), can handle huge data sets (10^7 or more).
PeakSegDP Dynamic Programming Algorithm for Peak Detection in ChIP-Seq Data
A quadratic time dynamic programming algorithm can be used to compute an approximate solution to the problem of finding the most likely changepoints with respect to the Poisson likelihood, subject to a constraint on the number of segments, and the changes which must alternate: up, down, up, down, etc. For more info read <http://…/hocking15.html> ‘PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data’ by TD Hocking et al, proceedings of ICML2015.
PeakSegOptimal Optimal Segmentation Subject to Up-Down Constraints
Computes optimal changepoint models using the Poisson likelihood for non-negative count data, subject to the PeakSeg constraint: the first change must be up, second change down, third change up, etc. For more info about the models and algorithms, read ‘A log-linear time algorithm for constrained changepoint detection’ <arXiv:1703.03352> by TD Hocking et al.
pedquant Public Economic Data and Quantitative Analysis
Provides an interface to access public economic and financial data for economic research and quantitative analysis. Data sources include NBS, FRED, Yahoo Finance, 163 Finance, Sina Finance, and others.
PeerPerformance Luck-Corrected Peer Performance Analysis in R
Provides functions to perform the peer performance analysis of funds’ returns as described in Ardia and Boudt (2016) <doi:10.2139/ssrn.2000901>.
PEGroupTesting Population Proportion Estimation using Group Testing
The population proportion can be estimated from group testing data by different methods. Four functions, p.mle(), p.gart(), p.burrow() and p.order(), implement four estimation methods: the maximum likelihood estimate, Gart’s estimate, Burrow’s estimate, and the order statistic estimate.
penalizedSVM Feature Selection SVM using Penalty Functions
Provides feature selection SVM using penalty functions. The smoothly clipped absolute deviation (SCAD), ‘L1-norm’, ‘Elastic Net’ (‘L1-norm’ and ‘L2-norm’) and ‘Elastic SCAD’ (SCAD and ‘L2-norm’) penalties are available. The tuning parameters can be found using either a fixed grid or an interval search.
penaltyLearning Penalty Learning
Implementations of algorithms from Learning Sparse Penalties for Change-point Detection using Max Margin Interval Regression, by Hocking, Rigaill, Vert, Bach <http://…/hocking13.html> published in proceedings of ICML2013.
pencopulaCond Estimating Non-Simplified Vine Copulas Using Penalized Splines
Estimating Non-Simplified Vine Copulas Using Penalized Splines.
PenCoxFrail Regularization in Cox Frailty Models
A regularization approach for Cox Frailty Models by penalization methods is provided.
pendensity Density Estimation with a Penalized Mixture Approach
Estimation of univariate (conditional) densities using penalized B-splines with automatic selection of optimal smoothing parameter.
penRvine Flexible R-Vines Estimation Using Bivariate Penalized Splines
Offers routines for estimating densities and copula distribution of R-vines using penalized splines.
pense Penalized Elastic Net S/MM-Estimator of Regression
Robust penalized elastic net S and MM estimator for linear regression. The method is described in detail in Cohen Freue, G. V., Kepplinger, D., Salibian-Barrera, M., and Smucler, E. (2017) <https://…/PENSE_manuscript.pdf>.
pequod Moderated Regression Package
Moderated regression with mean and residual centering and simple slopes analysis.
Perc Using Percolation and Conductance to Find Information Flow Certainty in a Direct Network
Finds the certainty of dominance interactions, with indirect interactions being considered.
perccal Implementing Double Bootstrap Linear Regression Confidence Intervals Using the ‘perc-cal’ Method
Contains functions which allow the user to compute confidence intervals quickly using the double bootstrap-based percentile calibrated (‘perc-cal’) method for linear regression coefficients. ‘perccal_interval()’ is the primary user-facing function within this package.
perccalc Estimate Percentiles from an Ordered Categorical Variable
An implementation of two functions that estimate values for percentiles from an ordered categorical variable as described by Reardon (2011, isbn:978-0-87154-372-1). One function estimates percentile differences from two percentiles while the other returns the values for every percentile from 1 to 100.
performance Assessment of Regression Models Performance
Utilities for computing measures to assess model quality, which are not directly provided by R’s ‘base’ or ‘stats’ packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models.
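A brief sketch on a count model, using only the documented model_performance() and check_overdispersion() helpers:

    library(performance)
    m <- glm(cyl ~ wt + hp, data = mtcars, family = poisson)
    model_performance(m)       # AIC, R2, RMSE, ... in one table
    check_overdispersion(m)    # test for overdispersion in count models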
peRiodiCS Functions for Generating Periodic Curves
Functions for generating variants of curves: restricted cubic splines, periodic restricted cubic splines, and periodic cubic splines. Periodic splines can be used to model data that has a periodic nature or seasonality.
periscope Enterprise Streamlined ‘Shiny’ Application Framework
An enterprise-targeted, scalable and UI-standardized ‘shiny’ framework, including a variety of developer convenience functions, with the goal of streamlining robust application development while helping to create a consistent user experience regardless of application or developer.
PerMallows Permutations and Mallows Distributions
Includes functions to work with the Mallows and Generalized Mallows Models. The considered distances are Kendall’s-tau, Cayley, Hamming and Ulam and it includes functions for making inference, sampling and learning such distributions, some of which are novel in the literature. As a by-product, PerMallows also includes operations for permutations, paying special attention to those related with the Kendall’s-tau, Cayley, Ulam and Hamming distances. It is also possible to generate random permutations at a given distance, or with a given number of inversions, or cycles, or fixed points, or even with a given length of the longest increasing subsequence (LIS).
permDep Permutation Tests for General Dependent Truncation
Implementations of a permutation approach to hypothesis testing for quasi-independence of truncation time and failure time. The implemented approaches are powerful against non-monotone alternatives and thereby offer protection against erroneous assumptions of quasi-independence. The proposed tests use either a conditional or an unconditional method to evaluate the permutation p-value. The conditional method was first developed in Tsai (1980) <doi:10.2307/2336059> and Efron and Petrosian (1992) <doi:10.1086/171931>. The unconditional method provides a valid approximation to the conditional method, yet is computationally simpler and does not hold the size of each risk set fixed. Users also have the option to carry out the proposed permutation tests in a parallel computing fashion.
permGS Permutational Group Sequential Test for Time-to-Event Data
Permutational group-sequential tests for time-to-event data based on the log-rank test statistic. Supports exact permutation test when the censoring distributions are equal in the treatment and the control group and approximate imputation-permutation methods when the censoring distributions are different.
permuco Permutation Tests for Regression, (Repeated Measures) ANOVA/ANCOVA and Comparison of Signals
Functions to compute p-values based on permutation tests. Regression, ANOVA and ANCOVA, omnibus F-tests, marginal unilateral and bilateral t-tests are available. Several methods to handle nuisance variables are implemented (Kherad-Pajouh, S., & Renaud, O. (2010) <doi:10.1016/j.csda.2010.02.015> ; Kherad-Pajouh, S., & Renaud, O. (2014) <doi:10.1007/s00362-014-0617-3> ; Winkler, A. M., Ridgway, G. R., Webster, M. A., Smith, S. M., & Nichols, T. E. (2014) <doi:10.1016/j.neuroimage.2014.01.060>). An extension for the comparison of signals issued from experimental conditions (e.g. EEG/ERP signals) is provided. Several corrections for multiple testing are possible, including the cluster-mass statistic (Maris, E., & Oostenveld, R. (2007) <doi:10.1016/j.jneumeth.2007.03.024>) and the threshold-free cluster enhancement (Smith, S. M., & Nichols, T. E. (2009) <doi:10.1016/j.neuroimage.2008.03.061>).
permutations Permutations of a Finite Set
Manipulates invertible functions from a finite set to itself. Can transform from word form to cycle form and back.
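A minimal sketch of converting between the two representations, assuming the package’s rperm() random-permutation generator:

    library(permutations)
    p <- rperm(1, r = 9)     # one random permutation of 1..9, in word form
    as.cycle(p)              # the same permutation in cycle form
    as.word(as.cycle(p))     # and back again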
permutes Permutation Tests for Time Series Data
Helps you determine the analysis window to use when analyzing densely-sampled time-series data, such as EEG data, using permutation testing (Maris & Oostenveld 2007) <doi:10.1016/j.jneumeth.2007.03.024>. These permutation tests can help identify the time points where the significance of an effect begins and ends, and the results can be plotted in various types of heatmaps for reporting.
PerseusR Perseus R Interop
Enables the interoperability between the Perseus platform for omics data analysis (Tyanova et al. 2016) <doi:10.1038/nmeth.3901> and R. It provides the foundation for developing and running Perseus plugins implemented in R by providing all required input and output handling, including data and parameter parsing as described in Rudolph and Cox 2018 <doi:10.1101/447268>.
personograph Pictographic Representation of Treatment Effects
Visualizes treatment effects using person icons, similar to Cates (NNT) charts.
perturbR Random Perturbation of Count Matrices
The perturbR() function incrementally perturbs network edges and compares the resulting community detection solutions from the rewired networks with the solution found for the original network. These comparisons aid in understanding the stability of the original solution. The package requires symmetric, weighted (specifically, count) matrices/networks.
petitr Relative Growth Rate
Calculates the relative growth rate (RGR) of a series of individuals by building a life table and solving the Lotka-Birch equation. (See Birch, L. C. 1948. The intrinsic rate of natural increase of an insect population. – Journal of Animal Ecology 17: 15-26) <doi:10.2307/1605>.
petrinetR Building, Visualizing, Exporting and Replaying Petri Nets
Functions for the construction of Petri Nets. Petri Nets can be replayed by firing enabled transitions. Silent transitions will be hidden by the execution handler. Also includes functionalities for the visualization of Petri Nets and export of Petri Nets to PNML (Petri Net Markup Language) files.
pfa Estimates False Discovery Proportion Under Arbitrary Covariance Dependence
Estimate the false discovery proportion (FDP) by Principal Factor Approximation method with general known and unknown covariance dependence.
pgbart Bayesian Additive Regression Trees Using Particle Gibbs Sampler and Gibbs/Metropolis-Hastings Sampler
The Particle Gibbs sampler and Gibbs/Metropolis-Hastings sampler were implemented to fit the Bayesian additive regression tree model. Construction of the model (training) and prediction for a new data set (testing) can be separated. Reference: Lakshminarayanan B, Roy D, Teh YW. Particle Gibbs for Bayesian additive regression trees. Artificial Intelligence and Statistics, 2015: 553-561, <http://…/lakshminarayanan15.pdf>.
pgdraw Generate Random Samples from the Polya-Gamma Distribution
Generates random samples from the Polya-Gamma distribution using an implementation of the algorithm described in J. Windle’s PhD thesis (2013) <https://…/WINDLE-DISSERTATION-2013.pdf>. The underlying implementation is in C.
PGEE Penalized Generalized Estimating Equations in High-Dimension
Fits penalized generalized estimating equations to longitudinal data with high-dimensional covariates.
pgee.mixed Penalized Generalized Estimating Equations for Bivariate Mixed Outcomes
Perform simultaneous estimation and variable selection for correlated bivariate mixed outcomes (one continuous outcome and one binary outcome per cluster) using penalized generalized estimating equations. In addition, clustered Gaussian and binary outcomes can also be modeled. The SCAD, MCP, and LASSO penalties are supported. Cross-validation can be performed to find the optimal regularization parameter(s).
PGICA Parallel Group ICA Algorithm
A Group ICA algorithm that can run in parallel on an SGE platform or on multi-core PCs.
pGMGM Estimating Multiple Gaussian Graphical Models (GGM) in Penalized Gaussian Mixture Models (GMM)
This is an R and C code implementation of the New-SP and New-JGL method of Gao et al. (2016) <DOI:10.1214/16-EJS1135> to perform model-based clustering and multiple graph estimation.
pgmm Parsimonious Gaussian Mixture Models
Carries out model-based clustering or classification using parsimonious Gaussian mixture models. McNicholas and Murphy (2008) <doi:10.1007/s11222-008-9056-0>, McNicholas (2010) <doi:10.1016/j.jspi.2009.11.006>, McNicholas and Murphy (2010) <doi:10.1093/bioinformatics/btq498>.
pgnorm The p-Generalized Normal Distribution
Evaluation of the pdf and the cdf of the univariate, noncentral, p-generalized normal distribution. Sampling from the univariate, noncentral, p-generalized normal distribution using either the p-generalized polar method, the p-generalized rejecting polar method, the Monty Python method, the Ziggurat method or the method of Nardon and Pianca. The package also includes routines for the simulation of the bivariate, p-generalized uniform distribution and the simulation of the corresponding angular distribution.
pGPx Pseudo-Realizations for Gaussian Process Excursions
Computes pseudo-realizations from the posterior distribution of a Gaussian Process (GP) with the method described in Azzimonti et al. (2016) <doi:10.1137/141000749>. The realizations are obtained from simulations of the field at few well chosen points that minimize the expected distance in measure between the true excursion set of the field and the approximate one. Also implements an R interface for (the main function of) Distance Transform of sampled Functions (<http://…/index.html> ).
pgraph Build Dependency Graphs using Projection
Implements a general framework for creating graphs using projection. Both lasso and sparse additive model projections are implemented. Both Pearson correlation and distance covariance can be used to generate the graph.
pgsc Computes Powell’s Generalized Synthetic Control Estimator
Computes the generalized synthetic control estimator described in Powell (2017) <doi:10.7249/WR1142>. Provides both point estimates and hypothesis testing.
ph2bye Phase II Clinical Trial Design Using Bayesian Methods
Calculate the Bayesian posterior/predictive probability and determine the sample size and stopping boundaries for single-arm Phase II design.
ph2hetero Adaptive Designs for Two-Stage Phase II Studies
Implementation of the designs of Jones (2007) <doi:10.1016/j.cct.2007.02.008>, Tournoux-Facon (2011) <doi:10.1002/sim.4148> and Parashar (2016) <doi:10.1002/pst.1742>.
ph2mult Phase II Clinical Trial Design for Multinomial Endpoints
Provides multinomial design methods under the intersection-union test (IUT) and union-intersection test (UIT) schemes for Phase II trials. The design types include: Minimax (minimize the maximum sample size), Optimal (minimize the expected sample size), Admissible (minimize the Bayesian risk) and Maxpower (maximize the exact power level).
phase1PRMD Personalized Repeated Measurement Design for Phase I Clinical Trials
Implements a Bayesian phase I repeated measurement design that accounts for multidimensional toxicity endpoints and longitudinal efficacy measures from multiple treatment cycles. The package provides flags to fit a variety of model-based phase I designs, including 1-stage models with or without individualized dose modification, 3-stage models with or without individualized dose modification, etc. Functions are provided to recommend dosage selection based on the data collected in the available patient cohorts and to simulate trial characteristics given design parameters. Yin, Jun, et al. (2017) <doi:10.1002/sim.7134>.
PHEindicatormethods Common Public Health Statistics and their Confidence Intervals
Functions to calculate commonly used public health statistics and their confidence intervals using methods approved for use in the production of Public Health England indicators such as those presented via Fingertips (<http://…/> ). It provides functions for the generation of proportions, crude rates, means, directly standardised rates, indirectly standardised rates, standardised mortality ratios, slope and relative index of inequality and life expectancy. Statistical methods are referenced in the following publications. Breslow NE, Day NE (1987) <doi:10.1002/sim.4780080614>. Dobson et al (1991) <doi:10.1002/sim.4780100317>. Armitage P, Berry G (2002) <doi:10.1002/9780470773666>. Wilson EB. (1927) <doi:10.1080/01621459.1927.10502953>. Altman DG et al (2000, ISBN: 978-0-727-91375-3). Chiang CL. (1968, ISBN: 978-0-882-75200-6). Newell C. (1994, ISBN: 978-0-898-62451-9). Eayres DP, Williams ES (2004) <doi:10.1136/jech.2003.009654>. Silcocks PBS et al (2001) <doi:10.1136/jech.55.1.38>. Low and Low (2004) <doi:10.1093/pubmed/fdh175>.
phenocamr Facilitates ‘PhenoCam’ Data Access and Time Series Post-Processing
Programmatic interface to the ‘PhenoCam’ web services (<http://phenocam.sr.unh.edu>). Allows for easy downloading of ‘PhenoCam’ data directly to your R workspace or your computer and provides post-processing routines for consistent and easy time-series outlier detection, smoothing and estimation of phenological transition dates. Methods for this package are described in detail in Hufkens et al. (2018) <doi:10.1111/2041-210X.12970>.
phenoCDM Continuous Development Models for Incremental Time-Series Analysis
Using the Bayesian state-space approach, we developed a continuous development model to quantify dynamic incremental changes in the response variable. While the model was originally developed for daily changes in forest green-up, it can be used to predict any similar process. The CDM can capture both the timing and the rate of nonlinear processes. Unlike static methods, which aggregate variations into a single metric, our dynamic model tracks the changing impacts over time. The CDM accommodates nonlinear responses to variation in predictors, which change throughout development.
phiDelta Tool for Phi Delta Analysis of Features
Analysis of features by phi delta diagrams. In particular, provides functions for reading data and calculating phi and delta, as well as functionality to plot them. Moreover, it is possible to do further analysis on the data by generating rankings. For more information on phi delta diagrams, see Giuliano Armano (2015) <doi:10.1016/j.ins.2015.07.028>.
philentropy Similarity and Distance Quantification Between Probability Functions
Computes 46 optimized distance and similarity measures for comparing probability functions. These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a base framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.
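For instance, a Jensen-Shannon distance between two discrete probability vectors can be computed with the package’s distance() function (a minimal sketch; the vectors are illustrative):
    library(philentropy)
    P <- 1:10 / sum(1:10)     # a discrete probability vector
    Q <- 20:29 / sum(20:29)   # a second probability vector
    distance(rbind(P, Q), method = "jensen-shannon")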
phmm Proportional Hazards Mixed-Effects Model (PHMM)
Fits a proportional hazards model incorporating random effects, using an EM algorithm with Markov chain Monte Carlo at the E-step. Vaida and Xu (2000) <doi:10.1002/1097-0258(20001230)19:24%3C3309::AID-SIM825%3E3.0.CO;2-9>.
photobiologyFilters Spectral Transmittance Data for Filters
A data-only package with spectral ‘transmittance’ data for frequently used filters and similar materials: plastic sheets and films, optical glass, ordinary glass and some labware.
phrasemachine Simple Phrase Extraction
Simple noun phrase extraction using part-of-speech information. Takes a collection of un-processed documents as input and returns a set of noun phrases associated with those documents.
phybreak Analysis of Outbreaks with Sequence Data
Implementation of the outbreak analysis method described by Klinkenberg et al (2016) <doi:10.1101/069195>. Simulate outbreaks, analyse datasets by creating samples from the posterior distribution with a Markov chain Monte Carlo sampler, and summarize the output.
phylogram Dendrograms for Evolutionary Analysis
Contains functions for importing and exporting ‘dendrogram’ objects in parenthetic text format, and several functions for command-line tree manipulation. With an emphasis on speed and computational efficiency, the package also includes a suite of tools for rapidly computing distance matrices and building large trees using fast alignment-free ‘k-mer’ counting and divisive clustering techniques.
picasso Pathwise Calibrated Sparse Shooting Algorithm
Implements a new family of efficient algorithms, called PathwIse CalibrAted Sparse Shooting AlgOrithm, for a variety of sparse learning problems, including Sparse Linear Regression, Sparse Logistic Regression, Sparse Column Inverse Operator and Sparse Multivariate Regression. Different types of active set identification schemes are implemented, such as cyclic search, greedy search, stochastic search and proximal gradient search. In addition, the package provides a choice between convex (L1 norm) and non-convex (MCP and SCAD) regularizations. Moreover, group regularizations, such as group Lasso, group MCP and group SCAD, are also implemented for Sparse Linear Regression, Sparse Logistic Regression and Sparse Multivariate Regression.
PieceExpIntensity Bayesian Model to Find Changepoints Based on Rates and Count Data
This function fits a reversible jump Bayesian piecewise exponential model that also includes the intensity of each event considered along with the rate of events.
piecewiseSEM Piecewise Structural Equation Modeling
Implements piecewise structural equation models.
pifpaf Potential Impact Fraction and Population Attributable Fraction for Cross-Sectional Data
Uses a generalized method to estimate the Potential Impact Fraction (PIF) and the Population Attributable Fraction (PAF) from cross-sectional data. It creates point-estimates, confidence intervals, and estimates of variance. In addition it generates plots for conducting sensitivity analysis. The estimation method corresponds to Zepeda-Tello, Camacho-García-Formentí, et al. 2017. ‘Nonparametric Methods to Estimate the Potential Impact Fraction from Cross-sectional Data’. Unpublished manuscript. This package was developed under funding by Bloomberg Philanthropies.
Pijavski Global Univariate Minimization
Global univariate minimization of Lipschitz functions is performed using the Pijavski method, published in Pijavski (1972) <DOI:10.1016/0041-5553(72)90115-2>.
pillar Coloured Formatting for Columns
Provides a ‘pillar’ generic designed for formatting columns of data using the full range of colours provided by modern terminals.
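A minimal sketch of the generic (colouring depends on terminal support):
    pillar::pillar(c(1.234, 100, NA))   # format a vector as a coloured column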
pim Fit Probabilistic Index Models
Fit a probabilistic index model as described in Thas et al <doi:10.1111/j.1467-9868.2011.01020.x>. The interface to the modeling function has changed in this new version. The old version is still available at R-Forge. You can install the old package using install.packages(‘pimold’, repos = ‘http://R-Forge.R-project.org’).
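A minimal sketch of a fit on a built-in dataset (the formula is illustrative):
    library(pim)
    fit <- pim(Sepal.Length ~ Species, data = iris)
    summary(fit)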
pimeta Prediction Intervals for Random-Effects Meta-Analysis
An implementation of prediction intervals for random-effects meta-analysis: Higgins et al. (2009) <doi:10.1111/j.1467-985X.2008.00552.x>, Partlett and Riley (2017) <doi:10.1002/sim.7140>, and Nagashima et al. (2018) <arXiv:1804.01054>.
pinbasic Fast and Stable Estimation of the Probability of Informed Trading (PIN)
Utilities for fast and stable estimation of the probability of informed trading (PIN) in the model introduced by Easley et al. (2002) <DOI:10.1111/1540-6261.00493> are implemented. Since the basic model developed by Easley et al. (1996) <DOI:10.1111/j.1540-6261.1996.tb04074.x> is nested in the former due to equating the intensity of uninformed buys and sells, functionalities can also be applied to this simpler model structure, if needed.
pingers Identify, Ping, and Log Internet Provider Connection Data
Assists with troubleshooting internet connection issues and isolating packet loss on your network. It does this by retrieving the top trace-route destinations your internet provider uses and recursively pinging each server in series while capturing the results and writing them to a log file. Each iteration it queries the destinations again before shuffling the sequence of destinations, to ensure the analysis is unbiased and consistent across each trace route.
pinp ‘pinp’ is not ‘PNAS’
A ‘PNAS’-alike style for ‘rmarkdown’, derived from the ‘Proceedings of the National Academy of Sciences of the United States of America’ (PNAS, see <https://www.pnas.org> ) LaTeX style, and adapted for use with ‘markdown’ and ‘pandoc’.
PINSPlus Perturbation Clustering for Data Integration and Disease Subtyping
Clustering algorithm for determining the number of clusters and the location of each sample in the clusters. Perturbation clustering for data INtegration and disease Subtyping (PINS – Nguyen T, Tagett R, Diaz D, Draghici S (2017) <doi:10.1101/gr.215129.116>) is a novel approach for integration of data and classification of diseases into various subtypes. PINS supports both single and multiple data types. ‘PINSPlus’ is designed to be user-friendly and primarily makes use of two main functions. ‘PINSPlus’ speeds up the PINS algorithm by supporting parallel processing and using efficient stopping criteria.
pinyin Convert Chinese Characters into Pinyin
Convert Chinese characters into Pinyin, the official romanization system for Standard Chinese in mainland China, Malaysia, Singapore, and Taiwan (see <https://…/Pinyin> for details).
pipefittr Convert Nested Functions to Pipes
Takes nested function calls and converts them to a more readable form using pipes from the ‘magrittr’ package.
pipeliner Machine Learning Pipelines for R
A framework for defining ‘pipelines’ of functions for applying data transformations, model estimation and inverse-transformations, resulting in predicted value generation (or model-scoring) functions that automatically apply the entire pipeline of functions required to go from input to predicted output.
pirate Generated Effect Modifier
An implementation of the generated effect modifier (GEM) method. This method constructs composite variables by linearly combining pre-treatment scalar patient characteristics to create optimal treatment effect modifiers in linear models. The optimal linear combination is called a GEM. Treatment is assumed to have been assigned at random. For reference, see E Petkova, T Tarpey, Z Su, and RT Ogden. Generated effect modifiers (GEMs) in randomized clinical trials. Biostatistics (First published online: July 27, 2016, <doi:10.1093/biostatistics/kxw035>).
piratings Calculate Pi Ratings for Teams Competing in Sport Matches
Calculate and optimize dynamic performance ratings of association football teams competing in matches, in accordance with the method used in the research paper ‘Determining the level of ability of football teams by dynamic ratings based on the relative discrepancies in scores between adversaries’, by Constantinou and Fenton (2013) <doi:10.1515/jqas-2012-0036>. This dynamic rating system has proven to provide superior results for predicting association football outcomes.
piton Parsing Expression Grammars in Rcpp
A wrapper around the ‘Parsing Expression Grammar Template Library’, a C++11 library for generating Parsing Expression Grammars, that makes it accessible within Rcpp. With this, developers can implement their own grammars and easily expose them in R packages.
pivmet Pivotal Methods for Bayesian Relabelling and k-Means Clustering
Collection of pivotal algorithms for: relabelling the MCMC chains in order to undo the label switching problem in Bayesian mixture models, as proposed in Egidi, Pappadà, Pauli and Torelli (2018a)<doi:10.1007/s11222-017-9774-2>; initializing the centers of the classical k-means algorithm in order to obtain a better clustering solution. For further details see Egidi, Pappadà, Pauli and Torelli (2018b)<ISBN:9788891910233>.
pivot ‘SQL’ PIVOT and UNPIVOT
Extends the ‘tidyverse’ packages ‘dbplyr’ and ‘tidyr’ functionality with pivot(), i.e. spread(), and unpivot(), i.e. gather(), for reshaping remote tables. Currently only ‘Microsoft SQL Server’ is supported.
pivotaltrackR A Client for the ‘Pivotal Tracker’ API
‘Pivotal Tracker’ <https://www.pivotaltracker.com> is a project management software-as-a-service that provides a REST API. This package provides an R interface to that API, allowing you to query it and work with its responses.
pivottabler Create Pivot Tables in R
Create regular pivot tables with just a few lines of R. More complex pivot tables can also be created, e.g. pivot tables with irregular layouts, multiple calculations and/or derived calculations based on multiple data frames.
pixels Tools for Working with Image Pixels
Provides tools to show and draw image pixels using ‘HTML’ widgets and ‘Shiny’ applications. It can be used to visualize the ‘MNIST’ dataset for handwritten digit recognition or to create new image recognition datasets.
pixiedust Tables so Beautifully Fine-Tuned You Will Believe It’s Magic
The introduction of the broom package has made converting model objects into data frames as simple as a single function. While the broom package focuses on providing tidy data frames that can be used in advanced analysis, it deliberately stops short of providing functionality for reporting models in publication-ready tables. pixiedust provides this functionality with a programming interface intended to be similar to ggplot2’s system of layers with fine tuned control over each cell of the table. Options for output include printing to the console and to the common markdown formats (markdown, HTML, and LaTeX). With a little pixiedust (and happy thoughts) tables can really fly.
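A minimal sketch of the layered workflow (sprinkle arguments as documented; the model is illustrative):
    library(pixiedust)
    fit <- lm(mpg ~ wt + qsec, data = mtcars)
    x <- dust(fit)                                                  # tidy the model into a table
    x <- sprinkle(x, cols = c("estimate", "std.error"), round = 3)  # fine-tune selected cells
    x <- sprinkle_print_method(x, "console")
    x                                                               # render the table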
pkgbuild Find Tools Needed to Build R Packages
Provides functions used to build R packages. Locates compilers needed to build R packages on various platforms and ensures the PATH is configured appropriately so R can use them.
pkgcache Cache ‘CRAN’-Like Metadata and R Packages
Metadata and package cache for CRAN-like repositories. This is a utility package to be used by package management tools that want to take advantage of caching.
pkgconfig Private Configuration for ‘R’ Packages
Set configuration options on a per-package basis. Options set by a given package only apply to that package; other packages are unaffected.
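A minimal sketch (the key name is hypothetical); set_config() is normally called from within the package whose option is being set:
    library(pkgconfig)
    set_config("mypkg::verbose" = TRUE)             # visible only to the calling package
    get_config("mypkg::verbose", fallback = FALSE)  # others see the fallback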
pkgcopier Copy Local R Packages to Another Environment
‘Copy’ local R package information to a temporary cloud space and ‘paste’ your favorite R packages to a new environment.
pkggraph A Consistent and Intuitive Platform to Explore the Dependencies of Packages on the Comprehensive R Archive Network Like Repositories
Interactively explore various dependencies of one or more packages (on the Comprehensive R Archive Network like repositories) and perform analysis using the tidy philosophy. Most of the functions return a ‘tibble’ object (an enhancement of ‘dataframe’) which can be used for further analysis. The package offers functions to produce ‘network’ and ‘igraph’ dependency graphs. The ‘plot’ method produces a static plot based on ‘ggnetwork’ and the ‘plotd3’ function produces an interactive D3 plot based on ‘networkD3’.
pkgkitten Create simple packages which pass R CMD check
The base R function package.skeleton() is very useful for creating new packages for R. It is also very upsetting, as it has been producing the same files which upset R CMD check in the exact same way. And as something terrible happens each time R CMD check barks, this package offers a wrapper function kitten() which leaves behind an adorable little package that does not upset R CMD check.
pkgload Simulate Package Installation and Attach
Simulates the process of installing a package and then attaching it. This is a key part of the ‘devtools’ package as it allows you to rapidly iterate while developing a package.
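A minimal sketch (the path is hypothetical):
    pkgload::load_all("~/dev/mypackage")   # simulate install + attach of a source package
    pkgload::unload("mypackage")           # undo, as if the package were detached and unloaded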
pkgmaker Package development utilities
This package provides low-level utilities for package development. It currently provides managers for multiple package-specific options and registries, plus vignette, unit test and bibtex related utilities. It serves as a base package for packages like NMF, RcppOctave and doRNG, and as an incubator for other general-purpose utilities that will eventually be packaged separately. It is still under heavy development, and changes in the interface(s) are more than likely to happen.
pkgnet Get Network Representation of an R Package
Tools from the domain of graph theory can be used to quantify the complexity and vulnerability to failure of a software package. That is the guiding philosophy of this package. ‘pkgnet’ provides tools to analyze the dependencies between functions in an R package and between its imported packages.
pkgsearch Search CRAN R Packages
Search CRAN R packages. Uses the ‘METACRAN’ search server, see <https://r-pkg.org>.
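A minimal sketch:
    pkgsearch::pkg_search("permutation test")   # ranked package hits from the METACRAN server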
pkgverse Build a Meta-Package Universe
Build your own universe of packages similar to the ‘tidyverse’ package <https://…/> with this meta-package creator. Create a package-verse, or meta package, by supplying a custom name for the collection of packages and a vector of desired package names to include, and optionally supply a destination directory, an indicator of whether to keep the created package directory, and/or a vector of verbs to implement via the ‘usethis’ <http://…/> package.
pkmon Least-Squares Estimator under k-Monotony Constraint for Discrete Functions
We implement two least-squares estimators under k-monotony constraint using a method based on the Support Reduction Algorithm from Groeneboom et al (2008) <DOI:10.1111/j.1467-9469.2007.00588.x>. The first one is a projection estimator on the set of k-monotone discrete functions. The second one is a projection on the set of k-monotone discrete probabilities. This package provides functions to generate samples from the spline basis from Lefevre and Loisel (2013) <DOI:10.1239/jap/1378401239>, and from mixtures of splines.
pksensi Global Sensitivity Analysis in Pharmacokinetic Modeling
Applies the global sensitivity analysis workflow to investigate parameter uncertainty and sensitivity in pharmacokinetic (PK) models, especially physiologically-based pharmacokinetic (PBPK) models with multivariate outputs. The package also provides functions to check the sensitivity measures and their convergence across model parameters.
plac A Pairwise Likelihood Augmented Cox Estimator for Left-Truncated Data
A semi-parametric estimation method for the Cox model with left-truncated data using augmented information from the marginal of truncation times.
placement Tools for Accessing the Google Maps API
The main functions in this package are drive_time (used for calculating distances between physical addresses or coordinates) and geocode_url (used for estimating the lat/long coordinates of a physical address). Optionally, it generates the cryptographic signatures necessary for making API calls with a Google for Work/Premium account within the geocoding process. These accounts have larger quota limits than the ‘standard_api’ and, thus, this package may be useful for individuals seeking to submit large batch jobs within R to the Google Maps API. Placement also provides methods for accessing the standard API using a (free) Google API key (see: <https://…/get-api-key#get-an-api-key> ).
PlackettLuce Plackett-Luce Models for Rankings
Functions to prepare rankings data and fit the Plackett-Luce model jointly attributed to Plackett (1975) <doi:10.2307/2346567> and Luce (1959, ISBN:0486441369). The standard Plackett-Luce model is generalized to accommodate ties of any order in the ranking. Partial rankings, in which only a subset of items are ranked in each ranking, are also accommodated in the implementation. Disconnected/weakly connected networks implied by the rankings are handled by adding pseudo-rankings with a hypothetical item. Methods are provided to estimate standard errors or quasi-standard errors for inference as well as to fit Plackett-Luce trees. See the package website or vignette for full details.
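A minimal sketch following the rankings-matrix workflow from the package vignette (entries are ranks, 0 marks an unranked item; the data are illustrative):
    library(PlackettLuce)
    R <- matrix(c(1, 2, 0, 0,
                  4, 1, 2, 3,
                  2, 1, 1, 0), nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("apple", "banana", "orange", "pear")))
    mod <- PlackettLuce(as.rankings(R))
    coef(mod)   # worth (log-ability) estimates, plus tie parameters if present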
plainview Plot Raster Images Interactively on a Plain HTML Canvas
Provides methods for plotting potentially large (raster) images interactively on a plain HTML canvas. In contrast to package ‘mapview’ data are plotted without background map, but data can be projected to any spatial coordinate reference system. Supports plotting of classes ‘RasterLayer’, ‘RasterStack’, ‘RasterBrick’ (from package ‘raster’) as well as ‘png’ files located on disk. Interactivity includes zooming, panning, and mouse location information. In case of multi-layer ‘RasterStacks’ or ‘RasterBricks’, RGB image plots are created (similar to ‘raster::plotRGB’ – but interactive).
Planesmuestra Functions for Calculating Dodge Romig, MIL STD 105E and MIL STD 414 Acceptance Sampling Plan
Calculates an acceptance sampling plan (sample size and acceptance number) based on MIL STD 105E, Dodge Romig and MIL STD 414 tables and procedures. The arguments for each function are related to lot size, inspection level and quality level. The specific plan operating curve (OC) is calculated by the binomial distribution.
planor Generation of Regular Factorial Designs
Automatic generation of regular factorial designs, including fractional designs, orthogonal block designs, row-column designs and split-plots. Kobilinsky, Monod and Bailey (2017) <doi:10.1016/j.csda.2016.09.003>.
plaqr Partially Linear Additive Quantile Regression
Estimation, prediction, thresholding, and plotting for partially linear additive quantile regression. Intuitive functions for fitting and plotting partially linear additive quantile regression models. Uses and works with functions from the ‘quantreg’ package.
Plasmode ‘Plasmode’ Simulation
Creates realistic simulated datasets for causal inference based on a user-supplied example study, Franklin JM, Schneeweiss S, Polinski JM, and Rassen JA (2014) <doi:10.1016/j.csda.2013.10.018>. It samples units from the data with replacement, and then simulates the exposure, the outcome, or both, based on the observed covariate values in the real data.
plfm Probabilistic Feature Analysis of Two-Way Two-Mode Frequencies
The package can be used to estimate probabilistic latent feature models with a disjunctive or a conjunctive mapping rule for two-way two-mode frequency data.
plfMA A GUI to View, Design and Export Various Graphs of Data
Provides a graphical user interface for viewing and designing various types of graphs of the data. The graphs can be saved in different image formats.
pliable The Pliable Lasso Test
Fits a pliable lasso model. For details see Tibshirani and Friedman (2018) <arXiv:1712.00484>.
plink IRT Separate Calibration Linking Methods
Item response theory based methods are used to compute linking constants and conduct chain linking of unidimensional or multidimensional tests for multiple groups under a common item design. The unidimensional methods include the Mean/Mean, Mean/Sigma, Haebara, and Stocking-Lord methods for dichotomous (1PL, 2PL and 3PL) and/or polytomous (graded response, partial credit/generalized partial credit, nominal, and multiple-choice model) items. The multidimensional methods include the least squares method and extensions of the Haebara and Stocking-Lord method using single or multiple dilation parameters for multidimensional extensions of all the unidimensional dichotomous and polytomous item response models. The package also includes functions for importing item and/or ability parameters from common IRT software, conducting IRT true score and observed score equating, and plotting item response curves/surfaces, vector plots, information plots, and comparison plots for examining parameter drift.
PLMIX Bayesian Analysis of Finite Mixtures of Plackett-Luce Models for Partial Rankings/Orderings
Fit finite mixtures of Plackett-Luce models for partial top rankings/orderings within the Bayesian framework. It provides MAP point estimates via EM algorithm and posterior MCMC simulations via Gibbs Sampling. It also fits MLE as a special case of the noninformative Bayesian analysis with vague priors.
PLmixed Estimate (Generalized) Linear Mixed Models with Factor Structures
Utilizes the ‘lme4’ package and the optim() function from ‘stats’ to estimate (generalized) linear mixed models (GLMM) with factor structures using a profile likelihood approach, as outlined in Jeon and Rabe-Hesketh (2012) <doi:10.3102/1076998611417628>. Factor analysis and item response models can be extended to allow for an arbitrary number of nested and crossed random effects, making it useful for multilevel and cross-classified models.
PLNmodels Poisson Lognormal Models
The Poisson-lognormal model and variants can be used for a variety of multivariate problems when count data are at play, including principal component analysis for count data (Chiquet, Mariadassou and Robin, 2018 <doi:10.1214/18-AOAS1177>), discriminant analysis and network inference (Chiquet, Mariadassou and Robin, 2018 <arXiv:1806.03120>). Implements variational algorithms to fit such models, accompanied by a set of functions for visualization and diagnostics.
plogr The ‘plog’ C++ Logging Library
A simple header-only logging library for C++. Add ‘LinkingTo: plogr’ to ‘DESCRIPTION’, and ‘#include <plogr.h>’ in your C++ modules to use it.
plot.matrix Visualizes a Matrix as Heatmap
Visualizes a matrix object plainly as a heatmap. It provides a single S3 plot method for matrices.
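A minimal sketch; attaching the package registers the method:
    library(plot.matrix)                 # adds a plot() method for class 'matrix'
    x <- matrix(runif(35), nrow = 5, ncol = 7)
    plot(x)                              # heatmap with a colour key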
plot3logit Ternary Plots for Trinomial Regression Models
An implementation of the ternary plot for interpreting regression coefficients of trinomial regression models, as proposed in Santi, Dickson and Espa (2018) <doi:10.1080/00031305.2018.1442368>. Ternary plots are drawn using either ‘ggtern’ package (based on ‘ggplot2’) or ‘Ternary’ package (based on standard graphics).
PlotContour Plot Contour Line
This function plots a contour line with a user-defined probability and tightness of fit.
plotdap Easily Visualize Data from ‘ERDDAP’ Servers via the ‘rerddap’ Package
Easily visualize and animate ‘tabledap’ and ‘griddap’ objects obtained via the ‘rerddap’ package in a simple one-line command, using either base graphics or ‘ggplot2’ graphics. ‘plotdap’ handles extracting and reshaping the data, map projections and continental outlines. Optionally the data can be animated through time using the ‘gganimate’ package.
plotfunctions Various Functions to Facilitate Visualization of Data and Analysis
When analyzing data, plots are a helpful tool for visualizing data and interpreting statistical models. This package provides a set of simple tools for building plots incrementally, starting with an empty plot region, and adding bars, data points, regression lines, error bars, gradient legends, density distributions in the margins, and even pictures. The package builds further on R graphics by simply combining functions and settings in order to reduce the amount of code to produce for the user. As a result, the package does not use formula input or special syntax, but can be used in combination with default R plot functions. Note: Most of the functions were part of the package ‘itsadug’, which is now split in two packages: 1. the package ‘itsadug’, which contains the core functions for visualizing and evaluating nonlinear regression models, and 2. the package ‘plotfunctions’, which contains more general plot functions.
plotGMM Custom Function to Plot Components from a Gaussian Mixture Model
Custom function to be used in plotting components from a Gaussian mixture model. Usage most often will be specifying the ‘fun’ argument within ‘stat_function’ in a ggplot2 object.
plothelper New Plots Based on ‘ggplot2’ and Functions to Create Regular Shapes
An extension to ‘ggplot2’ with miscellaneous functions. It contains two groups of functions: Functions in the first group draw ‘ggplot2’ – based plots: gg_shading_bar() draws barplot with shading colors in each bar. geom_rect_cm(), geom_circle_cm() and geom_ellipse_cm() draw rectangles, circles and ellipses with centimeter as their unit. Thus their sizes do not change when the coordinate system or the aspect ratio changes. Functions in the second group generate coordinates for regular shapes and make linear transformations.
plotKML Visualization of Spatial and Spatio-temporal Objects in Google Earth
Writes sp-class, spacetime-class, raster-class and similar spatial and spatio-temporal objects to KML following some basic cartographic rules.
plotluck ‘ggplot2’ Version of ‘I’m Feeling Lucky!’
Examines the characteristics of a data frame and a formula to automatically choose the most suitable type of plot out of the following supported options: scatter, violin, box, bar, density, hexagon bin, spine plot, and heat map. The aim of the package is to let the user focus on what to plot, rather than on the ‘how’ during exploratory data analysis. It also automates handling of observation weights, logarithmic axis scaling, reordering of factor levels, and overlaying smoothing curves and median lines. Plots are drawn using ‘ggplot2’.
plotly Create Interactive Web Graphics via Plotly’s JavaScript Graphing Library
Easily translate ggplot2 graphs to an interactive web-based version and/or create custom web-based visualizations directly from R. Once uploaded to a plotly account, plotly graphs (and the data behind them) can be viewed and modified in a web browser.
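Two common entry points, sketched minimally:
    library(plotly)
    plot_ly(z = ~volcano, type = "surface")   # build a figure directly
    p <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point()
    ggplotly(p)                               # translate an existing ggplot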
plotlyGeoAssets Render ‘Plotly’ Maps without an Internet Connection
Includes ‘JavaScript’ files that allow ‘plotly’ maps to render without an internet connection.
plotMElm Plot Marginal Effects from Linear Models
Plot marginal effects for interactions estimated from linear models.
PlotPrjNetworks Useful Networking Tools for Project Management
Useful set of tools for plotting network diagrams in any kind of project.
plotrr Making Visual Exploratory Data Analysis Easier
Functions for making visual exploratory data analysis easier.
plotwidgets Spider Plots, ROC Curves, Pie Charts and More for Use in Other Plots
Small self-contained plots for use in larger plots or to delegate plotting in other functions. Also contains a number of alternative color palettes and HSL color space based tools to modify colors or palettes.
plsdof Degrees of Freedom and Statistical Inference for Partial Least Squares Regression
The plsdof package provides Degrees of Freedom estimates for Partial Least Squares (PLS) Regression. Model selection for PLS is based on various information criteria (aic, bic, gmdl) or on cross-validation. Estimates for the mean and covariance of the PLS regression coefficients are available. They allow the construction of approximate confidence intervals and the application of test procedures. Further, cross-validation procedures for Ridge Regression and Principal Components Regression are available.
plspm.formula Formula Based PLS Path Modeling Utilities
The main objective is to make PLS Path Modeling with R easier using the package ‘plspm’. It automatically computes the inner matrix and the outer list that the ‘plspm’ function needs, simply by specifying the model using formulas.
plspolychaos Sensitivity Indexes from Polynomial Chaos Expansions and PLS
Computation of sensitivity indexes by using a method based on a truncated Polynomial Chaos Expansion of the response and regression PLS, for computer models with correlated continuous inputs, whatever the input distribution. The truncated Polynomial Chaos Expansion is built from the multivariate Legendre orthogonal polynomials. The number of runs (rows) can be smaller than the number of monomials. It is possible to select only the most significant monomials. Of course, this package can also be used if the inputs are independent. Note that, when they are independent and uniformly distributed, the package ‘polychaosbasics’ is more appropriate.
plsr Pleasure – Partial Least Squares Analysis with Permutation Testing
Provides partial least squares analysis for the analysis of the relation between two high-dimensional data sets. Includes permutation testing and bootstrapping for resulting latent variables (following McIntosh & Lobaugh (2004) <doi:10.1016/j.neuroimage.2004.07.020>) and several visualization functions.
plsRbeta Partial Least Squares Regression for Beta Regression Models
Provides Partial least squares Regression for (weighted) beta regression models and k-fold cross-validation of such models using various criteria. It allows for missing data in the explanatory variables. Bootstrap confidence intervals constructions are also available.
plsRcox Partial Least Squares Regression for Cox Models and Related Techniques
Provides Partial least squares Regression and various regular, sparse or kernel, techniques for fitting Cox models in high dimensional settings.
plsRglm Partial Least Squares Regression for Generalized Linear Models
Provides (weighted) Partial least squares Regression for generalized linear models and repeated k-fold cross-validation of such models using various criteria. It allows for missing data in the explanatory variables. Bootstrap confidence intervals constructions are also available.
plsVarSel Variable Selection in Partial Least Squares
Interfaces and methods for variable selection in Partial Least Squares. The methods include filter methods, wrapper methods and embedded methods.
pltesim Simulate Probabilistic Long-Term Effects in Models with Temporal Dependence
Simulate probabilistic long-term effects in models with temporal dependence.
plumber An API Generator for R
Gives the ability to automatically generate and serve an HTTP API from R functions using the annotations in the R documentation around your functions.
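A minimal sketch: roxygen-style annotations in a file (here hypothetically named ‘plumber.R’) define the endpoints, and plumb() serves them:
    # --- plumber.R ---
    #* Echo back a message
    #* @param msg The message to echo
    #* @get /echo
    function(msg = "") {
      list(msg = paste0("The message is: '", msg, "'"))
    }

    # --- in an R session ---
    # pr <- plumber::plumb("plumber.R")
    # pr$run(port = 8000)   # serve the API locally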
pluscode Encoder for Google ‘Pluscodes’
Retrieves a ‘pluscode’ by inputting latitude and longitude. Includes additional functions to retrieve neighbouring ‘pluscodes’.
plyr Tools for splitting, applying and combining data
plyr is a set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of plyr has been generously supported by BD (Becton Dickinson).
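For example, the split-apply-combine pattern in one call (a minimal sketch):
    library(plyr)
    # split mtcars by cylinder count, summarise each piece, recombine
    ddply(mtcars, "cyl", summarise, n = length(mpg), mean_mpg = mean(mpg))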
pm4py Interface to the ‘PM4py’ Process Mining Library
Interface to ‘PM4py’ <http://pm4py.org>, a process mining library in ‘Python’. This package uses the ‘reticulate’ package to act as a bridge between ‘PM4Py’ and the ‘R’ package ‘bupaR’. It provides several process discovery algorithms, evaluation measures, and alignments.
pmap Process Map Visualization
A set of functions to produce process map visualizations for process analysis. It can generate a process map from event logs recorded in the process, with the ability to prune nodes and/or edges to reduce the complexity of the result.
PMCMRplus Calculate Pairwise Multiple Comparisons of Mean Rank Sums Extended
For one-way layout experiments the one-way ANOVA can be performed as an omnibus test. All-pairs multiple comparisons tests (Tukey-Kramer test, Scheffe test, LSD-test) and many-to-one tests (Dunnett test) for normally distributed residuals and equal within-group variances are available. Furthermore, all-pairs tests (Games-Howell test, Tamhane’s T2 test, Dunnett T3 test, Ury-Wiggins-Hochberg test) and many-to-one tests (Tamhane-Dunnett test) for normally distributed residuals and heterogeneous variances are provided. Van der Waerden’s normal scores test for omnibus, all-pairs and many-to-one tests is provided for non-normally distributed residuals and homogeneous variances. The Kruskal-Wallis, BWS and Anderson-Darling omnibus tests and all-pairs tests (Nemenyi test, Dunn test, Conover test, Dwass-Steele-Critchlow-Fligner test) as well as many-to-one tests (Nemenyi test, Dunn test, U-test) are given for the analysis of variance by ranks. Non-parametric trend tests (Jonckheere test, Cuzick test, Johnson-Mehrotra test, Spearman test) are included. In addition, a Friedman test for one-way ANOVA with repeated measures on ranks (CRBD) and the Skillings-Mack test for unbalanced CRBD are provided, with consequent all-pairs tests (Nemenyi test, Siegel test, Miller test, Conover test, Exact test) and many-to-one tests (Nemenyi test, Demsar test, Exact test). A trend can be tested with Page’s test. Durbin’s test for a two-way balanced incomplete block design (BIBD) is given in this package, as well as Gore’s test for CRBD with multiple observations per cell. Outlier tests, Mandel’s k- and h-statistics, as well as functions for Type I error and power analysis and generic summary, print and plot methods, are provided.
pmd Paired Mass Distance Analysis for GC/LC-MS Based Non-Targeted Analysis
Paired mass distance (PMD) analysis proposed in Yu, Olkowicz and Pawliszyn (2018) <doi:10.1016/j.aca.2018.10.062> for gas/liquid chromatography-mass spectrometry (GC/LC-MS) based non-targeted analysis. PMD analysis includes the GlobalStd algorithm and structure/reaction directed analysis. The GlobalStd algorithm can find independent peaks in m/z-retention time profiles based on retention time hierarchical cluster analysis and frequency analysis of paired mass distances within retention time groups. Structure directed analysis can be used to find potential relationships among those independent peaks in different retention time groups based on the frequency of paired mass distances. A GUI for PMD analysis is also included as a ‘shiny’ application.
pmhtutorial Minimal Working Examples for Particle Metropolis-Hastings
Routines for state estimation in a linear Gaussian state space model and a simple stochastic volatility model using particle filtering. Parameter inference is also carried out in these models using the particle Metropolis-Hastings algorithm, which includes the particle filter to provide an unbiased estimator of the likelihood. This package is a collection of minimal working examples of these algorithms and is only meant for educational use and as a starting point for learning them on your own.
pMineR Processes Mining in Medicine
Allows the user to build and train simple Process Mining (PM) models (at the moment only the AlphaAlgorithm and the First Order Markov Model). The aim is to support PM specifically for the clinical domain, from both administrative and clinical data.
pmmlTransformations Transforms Input Data from a PMML Perspective
Allows for data to be transformed before using it to construct models. Builds structures to allow functions in the PMML package to output transformation details in addition to the model in the resulting PMML file.
pmpp Posterior Mean Panel Predictor
Dynamic panel modelling framework based on an empirical-Bayes approach. Contains tools for computing point forecasts and bootstrapping prediction intervals. Reference: Liu et al. (2016) <doi:10.2139/ssrn.2889000>.
pmsampsize Calculates the Minimum Sample Size Required for Developing a Multivariable Prediction Model
Computes the minimum sample size required for the development of a new multivariable prediction model using the criteria proposed by Riley et al. (2018) <doi: 10.1002/sim.7992>. pmsampsize can be used to calculate the minimum sample size for the development of models with continuous, binary or survival (time-to-event) outcomes. Riley et al. (2018) <doi: 10.1002/sim.7992> lay out a series of criteria the sample size should meet. These aim to minimise the overfitting and to ensure precise estimation of key parameters in the prediction model.
pmultinom One-Sided Multinomial Probabilities
Implements multinomial CDF (P(N1<=n1, …, Nk<=nk)) and tail probabilities (P(N1>n1, …, Nk>nk)), as well as probabilities with both constraints (P(l1<N1<=u1, …, lk<Nk<=uk)). Uses a method suggested by Bruce Levin (1981) <doi:10.1214/aos/1176345593>.
PMwR Portfolio Management with R
Functions and examples for ‘Portfolio Management with R’: backtesting investment and trading strategies, computing profit/loss and returns, analysing trades, handling lists of transactions, reporting, and more.
pnea Parametric Network Enrichment Analysis
Collection of functions for Parametric Network Enrichment Analysis.
PoA Finds the Price of Anarchy for Routing Games
Computes the optimal flow, Nash flow and the Price of Anarchy for any routing game defined within the game theoretical framework. The input is a routing game in the form of its cost and flow functions. This is then transformed into an optimisation problem, allowing both Nash and optimal flows to be solved by nonlinear optimisation. See <https://…/Congestion_game> and Knight and Harper (2013) <doi:10.1016/j.ejor.2013.04.003> for more information.
POCRE Penalized Orthogonal-Components Regression
Penalized orthogonal-components regression (POCRE) is a supervised dimension reduction method for high-dimensional data. It sequentially constructs orthogonal components (with selected features) which are maximally correlated to the response residuals. POCRE can also construct common components for multiple responses and thus build up latent-variable models.
POD Probability of Detection for Qualitative PCR Methods
This tool computes the probability of detection (POD) curve and the limit of detection (LOD), i.e. the number of copies of the target DNA sequence required to ensure a 95 % probability of detection (LOD95). Other quantiles of the LOD can be specified. This is a reimplementation of the mathematical-statistical modelling of the validation of qualitative polymerase chain reaction (PCR) methods within a single laboratory as provided by the commercial tool ‘PROLab’ <http://…/>. The modelling itself has been described by Uhlig et al. (2015) <doi:10.1007/s00769-015-1112-9>.
POET Principal Orthogonal ComplEment Thresholding (POET) Method
Estimate large covariance matrices in approximate factor models by thresholding principal orthogonal complements.
pogit Bayesian Variable Selection for a Poisson-Logistic Model
Bayesian variable selection for regression models of under-reported count data as well as for (overdispersed) Poisson, negative binomial and binomial logit regression models using spike and slab priors.
pointblank Validation of Local and Remote Data Tables
Validate data in local data frames, local ‘tibble’ objects, in ‘CSV’ and ‘TSV’ files, and in database tables (‘PostgreSQL’ and ‘MySQL’). Validation pipelines can be made using easily-readable, consecutive validation steps and such pipelines allow for switching of the data table context. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions (e.g., sending email notifications).
pointdensityP Point density for geospatial data
pointdensity returns a density count and the temporal average for every point in the original list. The dataframe returned includes four columns: lat, lon, count, and date_avg. The “lat” column is the original latitude data; the “lon” column is the original longitude data; the “count” is the density count of the number of points within a radius of radius*grid_size (the neighborhood); and the date_avg column includes the average date of each point in the neighborhood.
PointFore Interpretation of Point Forecasts as State-Dependent Quantiles and Expectiles
Estimate specification models for the state-dependent level of an optimal quantile/expectile forecast. Wald Tests and the test of overidentifying restrictions are implemented. Plotting of the estimated specification model is possible. The package contains two data sets with forecasts and realizations: the daily accumulated precipitation at London, UK from the high-resolution model of the European Centre for Medium-Range Weather Forecasts (ECMWF, <https://…/> ) and GDP growth Greenbook data by the US Federal Reserve. See Schmidt, Katzfuss and Gneiting (2015) <arXiv:1506.01917> for more details on the identification and estimation of a directive behind a point forecast.
poisbinom A Faster Implementation of the Poisson-Binomial Distribution
Provides the probability, distribution, and quantile functions and random number generator for the Poisson-Binomial distribution. This package relies on FFTW to implement the discrete Fourier transform, so that it is much faster than the existing implementation of the same algorithm in R.
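A minimal sketch with the usual d/p/q/r naming (the success-probability argument is assumed to be pp, per the package documentation):
    library(poisbinom)
    pp <- c(0.1, 0.4, 0.8, 0.3)   # per-trial success probabilities
    dpoisbinom(2, pp)             # P(exactly 2 successes)
    ppoisbinom(2, pp)             # P(at most 2 successes)
    rpoisbinom(5, pp)             # five random draws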
PoisBinOrdNonNor Generation of up to Four Different Types of Variables
Generation of a chosen number of count, binary, ordinal, and continuous (via Fleishman polynomials) random variables, with specified correlations and marginal properties.
poisFErobust Poisson Fixed Effects Robust
Computation of robust standard errors of Poisson fixed effects models, following Wooldridge (1999).
poisson Simulating Homogenous & Non-Homogenous Poisson Processes
Contains functions and classes for simulating, plotting and analysing homogenous and non-homogenous Poisson processes.
poissonMT Robust M-Estimators Based on Transformations for Poisson Model
R functions for the computation of Least Square based on transformation (L2T) and robust M-estimators based on transformations (MT-estimators) for Poisson regression models.
PoissonPCA Poisson-Noise Corrected PCA
For a multivariate dataset with independent Poisson measurement error, calculates principal components of transformed latent Poisson means. T. Kenney, T. Huang, H. Gu (2019) <arXiv:1904.11745>.
politeness Detecting Politeness Features in Text
Detecting markers of politeness in English natural language. This package allows researchers to easily visualize and quantify politeness between groups of documents. This package combines prior research on the linguistic markers of politeness (Brown & Levinson, 1987 <http://…/1987-97641-000>; Danescu-Niculescu-Mizil et al., 2013 <arXiv:1306.6078>; Voigt et al., 2017 <doi:10.1073/pnas.1702413114>). We thank the Spencer Foundation, the Hewlett Foundation, and Harvard’s Institute for Quantitative Social Science for support.
politicaldata Tools for Acquiring and Analyzing Political Data
Provides useful functions to obtain commonly-used data in political analysis and political science, including from sources such as the Comparative Agendas Project <https://www.comparativeagendas.net>, which provides data on politics and policy from 20+ countries, the MIT Election and Data Science Lab <https://www.electionlab.mit.edu>, and FiveThirtyEight <https://www.FiveThirtyEight.com>.
polmineR Toolset for Corpus Analysis
Tools for corpus analysis using the CWB as an efficient backend. The package offers basic functionality to flexibly create subcorpora and to carry out basic statistical operations. Beyond that, it is intended to serve as an interface to packages implementing advanced statistical procedures.
polychaosbasics Sensitivity Indexes Calculated from Polynomial Chaos Expansions
Computation of sensitivity indexes by using a method based on a truncated Polynomial Chaos Expansions of the response. The necessary condition of the method is: the inputs must be uniformly and independently sampled. Since the inputs are uniformly distributed, the truncated Polynomial Chaos Expansion is built from the multivariate Legendre orthogonal polynomials.
Polychrome Qualitative Palettes with Many Colors
Tools for creating, viewing, and assessing qualitative palettes with many (20-30 or more) colors.
polyglot Learn Foreign Language Vocabulary
Use the R console as an interactive learning environment to memorize any two-column dataset.
polylabelr Find the Pole of Inaccessibility (Visual Center) of a Polygon
A wrapper around the C++ library ‘polylabel’ from ‘Mapbox’, providing an efficient routine for finding the approximate pole of inaccessibility of a polygon, which usually serves as an excellent candidate for labeling of a polygon.
polypoly Helper Functions for Orthogonal Polynomials
Tools for reshaping, plotting, and manipulating matrices of orthogonal polynomials.
polyreg Polynomial Regression
Automate formation and evaluation of polynomial regression models. Provides support for cross-validating categorical variables. The motivation for this package is described in ‘Polynomial Regression As an Alternative to Neural Nets’ by Xi Cheng, Bohdan Khomtchouk, Norman Matloff, and Pete Mohanty (<arXiv:1806.06850>).
PolyTrend Trend Classification Algorithm
This algorithm can classify trends as linear, quadratic, cubic, concealed or no-trend. ‘Concealed’ trends are those that possess quadratic or cubic forms, but whose net change between the start and end of the period has not been significant. The ‘no-trend’ category includes simple linear trends whose slope coefficient is below significance.
pomdp Solver for Partially Observable Markov Decision Processes (POMDP)
Provides an interface to pomdp-solve, a solver for Partially Observable Markov Decision Processes (POMDP). The package enables the user to simply define all components of a POMDP model and solve the problem using several methods. The package also contains functions to analyze and visualize the POMDP solutions (e.g., the optimal policy).
pomp Statistical Inference for Partially Observed Markov Processes
Tools for working with partially observed Markov processes (POMPs, AKA stochastic dynamical systems, state-space models). ‘pomp’ provides facilities for implementing POMP models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. It is also a platform for the implementation of new inference methods.
pompom Person-Oriented Method and Perturbation on the Model
An implementation of a hybrid method of person-oriented method and perturbation on the model. Pompom is the initials of the two methods. The hybrid method will provide a multivariate intraindividual variability metric (iRAM). The person-oriented method used in this package refers to uSEM (unified structural equation modeling, see Kim et al., 2007, Gates et al., 2010 and Gates et al., 2012 for details). Perturbation on the model was conducted according to impulse response analysis introduced in Lutkepohl (2007). Kim, J., Zhu, W., Chang, L., Bentler, P. M., & Ernst, T. (2007) <doi:10.1002/hbm.20259>. Gates, K. M., Molenaar, P. C. M., Hillary, F. G., Ram, N., & Rovine, M. J. (2010) <doi:10.1016/j.neuroimage.2009.12.117>. Gates, K. M., & Molenaar, P. C. M. (2012) <doi:10.1016/j.neuroimage.2012.06.026>. Lutkepohl, H. (2007, ISBN:3540262393).
pool Object Pooling
Enables the creation of object pools, which make it less computationally expensive to fetch a new object. Currently the only supported pooled objects are ‘DBI’ connections.
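A minimal sketch with a local ‘SQLite’ database (the driver choice is illustrative):
    library(pool)
    pool <- dbPool(RSQLite::SQLite(), dbname = ":memory:")
    DBI::dbGetQuery(pool, "SELECT 1 AS x")   # a connection is checked out and returned
    poolClose(pool)                          # release all connections when done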
PooledMeanGroup Pooled Mean Group Estimation of Dynamic Heterogenous Panels
Calculates the pooled mean group (PMG) estimator for dynamic panel data models, as described by Pesaran, Shin and Smith (1999) <doi:10.1080/01621459.1999.10474156>.
poolfstat Computing F-Statistics from Pool-Seq Data
Functions for the computation of F-statistics from Pool-Seq data in population genomics studies. The package also includes several utilities to manipulate Pool-Seq data stored in standard formats (‘vcf’ and ‘sync’ files as obtained from the popular software ‘VarScan’ and ‘PoPoolation’, respectively) and to perform conversion to alternative formats (as used in the ‘BayPass’ and ‘SelEstim’ software).
pooling Fit Poolwise Regression Models
Functions for calculating power and fitting regression models in studies where a biomarker is measured in ‘pooled’ samples rather than for each individual. Approaches for handling measurement error follow the framework of Schisterman et al. (2010) <doi:10.1002/sim.3823>.
pop A Flexible Syntax for Population Dynamic Modelling
Population dynamic models underpin a range of analyses and applications in ecology and epidemiology. The various approaches for analysing population dynamics models (MPMs, IPMs, ODEs, POMPs, PVA) each require the model to be defined in a different way. This makes it difficult to combine different modelling approaches and data types to solve a given problem. ‘pop’ aims to provide a flexible and easy-to-use common interface for constructing population dynamic models, enabling them to be fitted and analysed in lots of different ways.
POPdemog Plot Population Demographic History
Plot demographic graphs for single/multiple populations from coalescent simulation program input. Currently, this package can support the ‘ms’, ‘msHot’, ‘MaCS’, ‘msprime’, ‘SCRM’, and ‘Cosi2’ simulation programs. It does not check the simulation program input for correctness, but assumes the simulation program input has been validated by the simulation program. More features will be added to this package in the future, please check the ‘GitHub’ page for the latest updates: <https://…/POPdemog>.
popkin Estimate Kinship and FST under Arbitrary Population Structure
Provides functions to estimate the kinship matrix of individuals from a large set of biallelic SNPs, and extract inbreeding coefficients and the generalized FST (Wright’s fixation index). Method described in Ochoa and Storey (2016) <doi:10.1101/083923>.
portfolio.optimization Contemporary Portfolio Optimization
Simplify your portfolio optimization process by applying a contemporary modeling way to model and solve your portfolio problems. While most approaches and packages are rather complicated this one tries to simplify things and is agnostic regarding risk measures as well as optimization solvers. Some of the methods implemented are described by Konno and Yamazaki (1991) <doi:10.1287/mnsc.37.5.519>, Rockafellar and Uryasev (2001) <doi:10.21314/JOR.2000.038> and Markowitz (1952) <doi:10.1111/j.1540-6261.1952.tb01525.x>.
PortfolioAnalytics Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios
Portfolio optimization and analysis routines and graphics.
PortfolioEffectEstim High Frequency Price Estimators by PortfolioEffect
R interface to PortfolioEffect Quant service for estimating high frequency price variance, quarticity, microstructure noise variance, and other metrics in both aggregate and rolling window flavors. Constructed estimators could use client-side market data or access HF intraday price history for all major US Equities. See https://www.portfolioeffect.com for more information on the PortfolioEffect high frequency portfolio analytics platform.
PortfolioOptim Small/Large Sample Portfolio Optimization
Two functions for financial portfolio optimization by linear programming are provided. One function implements Benders decomposition algorithm and can be used for very large data sets. The other, applicable for moderate sample sizes, finds optimal portfolio which has the smallest distance to a given benchmark portfolio.
portsort Factor-Based Portfolio Sorts
Designed to aid both academic researchers and asset managers in conducting factor based portfolio sorts. Provides functionality to sort assets into portfolios for up to three factors via a conditional or unconditional sorting procedure.
PoSI Valid Post-Selection Inference for Linear LS Regression
In linear LS regression, calculate for a given design matrix the multiplier K of coefficient standard errors such that the confidence intervals [b – K*SE(b), b + K*SE(b)] have a guaranteed coverage probability for all coefficient estimates b in any submodels after performing arbitrary model selection.
postGIStools Tools for Interacting with ‘PostgreSQL’ / ‘PostGIS’ Databases
Functions to convert geometry and ‘hstore’ data types from ‘PostgreSQL’ into standard R objects, as well as to simplify the import of R data frames (including spatial data frames) into ‘PostgreSQL’.
postlightmercury Parses Web Pages using Postlight Mercury
This is a wrapper for the Mercury Parser API. The Mercury Parser is a single API endpoint that takes a URL and gives you back the content reliably and easily. With just one API request, Mercury takes any web article and returns only the relevant content — headline, author, body text, relevant images and more — free from any clutter. It’s reliable, easy-to-use and free. See the webpage here: <https://…/>.
POT Generalized Pareto Distribution and Peaks Over Threshold
Some functions useful to perform a Peak Over Threshold analysis in univariate and bivariate cases. A user’s guide is available.
PottsUtils Utility Functions of the Potts Models
There are three sets of functions. The first produces basic properties of a graph and generates samples from multinomial distributions to facilitate the simulation functions (they may be used for other purposes as well). The second provides various simulation functions for a Potts model in Potts, R. B. (1952) <doi:10.1017/S0305004100027419>. The third currently includes only one function which computes the normalizing constant of a Potts model based on simulation results.
powerbydesign Power Estimates for ANOVA Designs
Functions for bootstrapping the power of ANOVA designs based on estimated means and standard deviations of the conditions. Please refer to the documentation of the boot.power.anova() function for further details.
powerCompRisk Power Analysis Tool for Joint Testing Hazards with Competing Risks Data
A power analysis tool for jointly testing the cause-1 cause-specific hazard and the any-cause hazard with competing risks data.
powerCRT Power Analysis for Cluster Randomized Trials
Statistical power analysis tools for designing cluster randomized trials (CRTs). Includes functions to calculate minimum detectable effect size (MDES), minimum required sample size (MRSS), power (1 – type II error), functions to estimate optimal sample sizes (OSS) under various constraints, and to visualize duo or trio relationships between MDES, MRSS, power, and OSS.
powerEQTL Power and Sample Size Calculation for eQTL Analysis
Power and sample size calculation for eQTL analysis based on ANOVA or simple linear regression. It can also calculate power/sample size for testing the association of a SNP to a continuous type phenotype.
powerlmm Power Calculations for Longitudinal Multilevel Models
Calculate power for two- and three-level multilevel longitudinal studies with missing data. Both the third-level factor (e.g. therapists, schools, or physicians), and the second-level factor (e.g. subjects), can be assigned random slopes. Studies with partially nested designs, unequal cluster sizes, unequal allocation to treatment arms, and different dropout patterns per treatment are supported. For all designs power can be calculated both analytically and via simulations. The analytical calculations extend the method described in Galbraith et al. (2002) <doi:10.1016/S0197-2456(02)00205-2> to three-level models. Additionally, the simulation tools provide flexible ways to investigate bias, type I errors and the consequences of model misspecification.
PowerNormal Power Normal Distribution
Miscellaneous functions for a descriptive analysis and initial inference for the power parameter of the Power Normal (PN) distribution. This collection will be extended to more distributions in the power family and to the three-parameter model.
powerplus Exponentiation Operations
Computation of matrix and scalar exponentiation.
PowerUpR Power Analysis Tools for Individual/Cluster Randomized Trials
Statistical power analysis tools for designing individual or cluster randomized trials. Includes functions to calculate statistical power (1 – type II error), minimum detectable effect size (MDES), minimum required sample size (MRSS), functions to solve constrained optimal sample allocation (COSA) problems, and to visualize duo or trio relationships between statistical power, MDES, MRSS, and COSA.
ppcc Probability Plot Correlation Coefficient Test
Calculates the Probability Plot Correlation Coefficient (PPCC) between a continuous variable X and a specified distribution. The corresponding composite hypothesis test can be carried out to test whether the sample X is an element of either the Normal, log-Normal, Exponential, Uniform, Cauchy, Logistic, Generalized Logistic, Gumbel (GEVI), Weibull, Generalized Extreme Value, Pearson III (Gamma 2), Mielke’s Kappa, Rayleigh or Generalized Logistic Distribution. The PPCC test is performed with a fast Monte-Carlo simulation.
PPCI Projection Pursuit for Cluster Identification
Implements recently developed projection pursuit algorithms for finding optimal linear cluster separators. The clustering algorithms use optimal hyperplane separators based on minimum density, Pavlidis et al. (2016) <https://…/15-307.pdf>; minimum normalised cut, Hofmeyr (2017) <doi:10.1109/TPAMI.2016.2609929>; and maximum variance ratio clusterability, Hofmeyr and Pavlidis (2015) <doi:10.1109/SSCI.2015.116>.
ppclust Probabilistic and Possibilistic Cluster Analysis
Partitioning clustering divides the objects in a data set into non-overlapping subsets or clusters by using the prototype-based probabilistic and possibilistic clustering algorithms. This package covers a set of functions for Fuzzy C-Means (Bezdek, 1974) <doi:10.1080/01969727308546047>, Possibilistic C-Means (Krishnapuram & Keller, 1993) <doi:10.1109/91.227387>, Possibilistic Fuzzy C-Means (Pal et al, 2005) <doi:10.1109/TFUZZ.2004.840099>, Possibilistic Clustering Algorithm (Yang et al, 2006) <doi:10.1016/j.patcog.2005.07.005>, Possibilistic C-Means with Repulsion (Wachs et al, 2006) <doi:10.1007/3-540-31662-0_6> and the other variants of hard and soft clustering algorithms. The cluster prototypes and membership matrices required by these partitioning algorithms are initialized with different initialization techniques that are available in the package ‘inaparc’. In addition to the Euclidean distance, a set of commonly used distance metrics is available for use with some of the algorithms in the package.
PPforest Projection Pursuit Classification Forest
Implements projection pursuit forest algorithm for supervised classification.
ppgmmga Projection Pursuit Based on Gaussian Mixtures and Evolutionary Algorithms
Projection Pursuit (PP) algorithm for dimension reduction based on Gaussian Mixture Models (GMMs) for density estimation using Genetic Algorithms (GAs) to maximise an approximated negentropy index.
ppitables Lookup Tables to Generate Poverty Likelihoods and Rates using the Poverty Probability Index (PPI)
The Poverty Probability Index (PPI) is a poverty measurement tool for organizations and businesses with a mission to serve the poor. The PPI is statistically-sound, yet simple to use: the answers to 10 questions about a household’s characteristics and asset ownership are scored to compute the likelihood that the household is living below the poverty line – or above by only a narrow margin. This package contains country-specific lookup data tables used as reference to determine the poverty likelihood of a household based on their score from the country-specific PPI questionnaire. These lookup tables have been extracted from documentation of the PPI found at <https://www.povertyindex.org> and managed by Innovations for Poverty Action <https://www.poverty-action.org>.
ppls Penalized Partial Least Squares
Contains linear and nonlinear regression methods based on Partial Least Squares and Penalization Techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be conducted via jackknifing.
ppmlasso Point Process Models with LASSO Penalties
Toolkit for fitting point process models with sequences of LASSO penalties (“regularisation paths”). Regularisation paths of Poisson point process models or area-interaction models can be fitted with LASSO, adaptive LASSO or elastic net penalties. A number of criteria are available to judge the bias-variance tradeoff.
PPRL Privacy Preserving Record Linkage
A toolbox for deterministic, probabilistic and privacy-preserving record linkage techniques. Combines the functionality of the ‘Merge ToolBox’ (<http://record-linkage.de> ) with current privacy-preserving techniques.
ppsbm Clustering in Longitudinal Networks
Stochastic block model used for dynamic graphs represented by Poisson processes. To model recurrent interaction events in continuous time, an extension of the stochastic block model is proposed where every individual belongs to a latent group and interactions between two individuals follow a conditional inhomogeneous Poisson process with intensity driven by the individuals’ latent groups. The model is shown to be identifiable and its estimation is based on a semiparametric variational expectation-maximization algorithm. Two versions of the method are developed, using either a nonparametric histogram approach (with an adaptive choice of the partition size) or kernel intensity estimators. The number of latent groups can be selected by an integrated classification likelihood criterion. Y. Baraud and L. Birgé (2009). <doi:10.1007/s00440-007-0126-6>. C. Biernacki, G. Celeux and G. Govaert (2000). <doi:10.1109/34.865189>. M. Corneli, P. Latouche and F. Rossi (2016). <doi:10.1016/j.neucom.2016.02.031>. J.-J. Daudin, F. Picard and S. Robin (2008). <doi:10.1007/s11222-007-9046-7>. A. P. Dempster, N. M. Laird and D. B. Rubin (1977). <http://…/2984875>. G. Grégoire (1993). <http://…/4616289>. L. Hubert and P. Arabie (1985). <doi:10.1007/BF01908075>. M. Jordan, Z. Ghahramani, T. Jaakkola and L. Saul (1999). <doi:10.1023/A:1007665907178>. C. Matias, T. Rebafka and F. Villers (2018). <doi:10.1093/biomet/asy016>. C. Matias and S. Robin (2014). <doi:10.1051/proc/201447004>. H. Ramlau-Hansen (1983). <doi:10.1214/aos/1176346152>. P. Reynaud-Bouret (2006). <doi:10.3150/bj/1155735930>.
PPtreeViz Projection Pursuit Classification Tree Visualization
Tools for exploring projection pursuit classification trees using the LDA, Lr or PDA projection pursuit indices.
prais Prais-Winsten Estimation Procedure for AR(1) Serial Correlation
The Prais-Winsten estimation procedure takes into account serial correlation of type AR(1) in a linear model. The procedure is an iterative method that recursively estimates the beta coefficients and the error autocorrelation of the specified model until convergence of rho, i.e. the AR(1) coefficient, is attained. All estimates are obtained by OLS.
praznik Collection of Information-Based Feature Selection Filters
A collection of feature selection filters performing greedy optimisation of mutual information-based usefulness criteria, inspired by the overview by Brown, Pocock, Zhao and Lujan (2012) <http://…/brown12a.html>. Implements, among others, the minimum redundancy maximal relevancy (‘mRMR’) method by Peng, Long and Ding (2005) <doi:10.1109/TPAMI.2005.159>; the joint mutual information (‘JMI’) method by Yang and Moody (1999) <http://…-new-algorithms-for-nongaussian-data>; the double input symmetrical relevance (‘DISR’) method by Meyer and Bontempi (2006) <doi:10.1007/11732242_9> as well as the joint mutual information maximisation (‘JMIM’) method by Bennasar, Hicks and Setchi (2015) <doi:10.1016/j.eswa.2015.07.007>.
prcbench Testing Workbench for Precision-Recall Curves
A testing workbench for evaluating Precision-Recall curves under various conditions.
prclust Penalized Regression-Based Clustering Method
Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit. This package provides two algorithms for fitting the penalized regression-based clustering (PRclust). One algorithm is based on a quadratic penalty and the difference convex method. The other is based on the difference convex method and ADMM, called DC-ADMM, which is more efficient. Generalized cross validation is provided to select the tuning parameters. The Rand index, adjusted Rand index and Jaccard index are provided to estimate the agreement between estimated cluster memberships and the truth.
prcr Person-Centered Analysis
While person-centered analysis is increasingly common in psychology, education, and related fields, carrying it out is challenging. The prcr package provides an easy-to-use yet adaptable set of tools to conduct person-centered analysis.
pre Prediction Rule Ensembles
Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with several adjustments and improvements. The main function pre() derives a prediction rule ensemble. Functions coef() and importance() can be used to inspect the generated ensemble. Function predict() generates predicted values. Functions singleplot() and pairplot() depict the dependence of the output on specified predictor variables. Function cvpre() performs full cross-validation of a prediction rule ensemble to calculate the expected prediction error. Functions interact() and bsnullinteract() can be used to assess interaction effects of predictor variables.
PreciseSums Accurate Floating Point Sums and Products
Most of the time, floating point arithmetic does approximately the right thing. When summing or multiplying numbers that differ greatly in magnitude, however, floating point arithmetic may be inaccurate. This package implements the Kahan (1965) sum <doi:10.1145/363707.363723>, the Neumaier (1974) sum <doi:10.1002/zamm.19740540106>, the pairwise sum (adapted from ‘NumPy’; see Castaldo (2008) <doi:10.1137/070679946> for a discussion of accuracy), and the arbitrary precision sum (adapted from fsum in ‘Python’; Shewchuk (1997) <http://…/robustr.pdf>). In addition, products are changed to long double precision for accuracy, or changed into a log-sum for accuracy.
precrec Calculate Accurate Precision-Recall and ROC Curves
Accurate calculations and visualization of Precision-Recall and ROC curves.
predict3d Draw Three Dimensional Predict Plot Using Package ‘rgl’
Draws two-dimensional and three-dimensional plots for multiple regression models using the packages ‘ggplot2’ and ‘rgl’. Supports linear models (lm), generalized linear models (glm) and local polynomial regression fits (loess).
prediction Tidy, Type-Safe ‘prediction()’ Methods
A one-function package containing ‘prediction()’, a type-safe alternative to ‘predict()’ that always returns a data frame.
predictionInterval Prediction Interval Functions for Assessing Replication Study Results
A common problem faced by journal reviewers and authors is the question of whether the results of a replication study are consistent with the original published study. One solution to this problem is to examine the effect size from the original study and generate the range of effect sizes that could reasonably be obtained (due to random sampling) in a replication attempt (i.e., calculate a prediction interval). This package has functions that calculate the prediction interval for the correlation (i.e., r), standardized mean difference (i.e., d-value), and mean.
PredictionR Prediction for Future Data from any Continuous Distribution
Functions to get prediction intervals and prediction points of future observations from any continuous distribution.
predictoR Predictive Data Analysis System
Perform a supervised data analysis on a database through a ‘shiny’ graphical interface. It includes methods such as K-Nearest Neighbors, Decision Trees, ADA Boosting, Extreme Gradient Boosting, Random Forest, Neural Networks, Deep Learning, Support Vector Machines and Bayesian Methods.
PredictTestbench Test Bench for Comparison of Data Prediction Models
Provides a test bench for the comparison of prediction models. This package is inspired by the ‘imputeTestbench’ package <https://…/package=imputeTestbench>. It compares prediction models with respect to RMSE, MAE or MAPE. It allows users to add new proposed methods to the test bench and to compare them with other methods. The function ‘prediction_append()’ allows multiple methods to be added to the existing methods available in the test bench.
predkmeans Covariate Adaptive Clustering
Implements the predictive k-means method for clustering observations, using a mixture of experts model to allow covariates to influence cluster centers. Motivated by air pollution epidemiology settings, where cluster membership needs to be predicted across space. Includes functions for predicting cluster membership using spatial splines and principal component analysis (PCA) scores using either multinomial logistic regression or support vector machines (SVMs). For method details see Keller et al. (2017) <doi:10.1214/16-AOAS992>.
predtoolsTS Time Series Prediction
Makes time series prediction easier by automating the process with four main functions: prep(), modl(), pred() and postp(). Features different preprocessing methods to homogenize variance and to remove trend and seasonality. Also makes it possible to bring together different predictive models for comparison. Features ARIMA and Data Mining Regression models (using caret).
prefeR R Package for Pairwise Preference Elicitation
Allows users to derive multi-objective weights from pairwise comparisons, which research shows is more repeatable, transparent, and intuitive than other techniques. These weights can be used to rank existing alternatives or to define a multi-objective utility function for optimization.
preference 2-Stage Clinical Trial Design and Analysis
Design and analyze two-stage randomized trials with a continuous outcome measure. The package contains functions to compute the required sample size needed to detect a given preference, treatment, and selection effect; alternatively, the package contains functions that can report the study power given a fixed sample size. Finally, analysis functions are provided to test each effect using either summary data (i.e. means, variances) or raw study data.
PreKnitPostHTMLRender Pre-Knitting Processing and Post HTML-Rendering Processing
Dynamize headers or R code within ‘Rmd’ files to prevent proliferation of ‘Rmd’ files for similar reports. Add in external HTML document within ‘rmarkdown’ rendered HTML doc.
prepdat Preparing Experimental Data for Statistical Analysis
Prepares data collected in an experimental design for statistical analysis (e.g., analysis of variance; ANOVA) by taking the individual data files and preparing one table that contains several possibilities for dependent variables. Most suitable when measuring reaction-times and/or accuracy, or any other variable in an interval or ratio scale. Functions included: file_merge(), read_data() and prep(). The file_merge() function vertically merges individual data files (in a long format), in which each line is a single trial within the experiment, into one single dataset. The read_data() function reads a file in a txt or csv format that contains a single dataset in a long format table and creates a data frame from it. The prep() function aggregates the single dataset according to any combination of between and within grouping variables (i.e., between-subjects and within-subjects independent variables, respectively), and returns a data frame with a number of dependent measures for further analysis for each experimental cell according to the combination of provided grouping variables. Dependent measures for each experimental cell include, among others, means before and after rejecting all values according to a flexible standard deviation criterion(s), number of rejected values according to the flexible standard deviation criterion(s), proportions of rejected values according to the flexible standard deviation criterion(s), number of values before rejection, means after rejecting values according to procedures described in Van Selst & Jolicoeur (1994) (suitable when measuring reaction-times), standard deviations, medians, means according to any percentile (e.g., 0.05, 0.25, 0.75, 0.95) and harmonic means. The data frame prep() returns can also be exported as a txt file to be used for statistical analysis in other statistical programs.
prepplot Prepare Figure Region for Base Graphics
A figure region is prepared, creating a plot region with suitable background color, grid lines or shadings, and providing axes and labeling if not suppressed. Subsequently, information carrying graphics elements can be added (points, lines, barplot with add=TRUE and so forth).
PreProcess Basic Functions for Pre-Processing Microarrays
Provides classes to pre-process microarray gene expression data as part of the OOMPA collection of packages described at <http://…/>.
preprocomb Tools for Preprocessing Combinations
Preprocessing is often the most time-consuming phase in knowledge discovery, and preprocessing transformations can be interdependent in unexpected ways. This package helps to make preprocessing faster and more effective. It provides an S4 framework for creating and testing preprocessing combinations for classification, clustering and outlier detection. The framework supports user-defined and domain-specific preprocessors and preprocessing phases. Default preprocessors can be used for low variance removal, missing value imputation, scaling, outlier removal, noise smoothing, feature selection and class imbalance correction.
preprosim Lightweight Data Quality Simulation for Classification
Data quality simulation can be used to check the robustness of data analysis findings and learn about the impact of data quality contaminations on classification. This package helps to add contaminations (noise, missing values, outliers, low variance, irrelevant features, class swap (inconsistency), class imbalance and decrease in data volume) to data and then evaluate the simulated data sets for classification accuracy. As a lightweight solution simulation runs can be set up with no or minimal up-front effort.
preproviz Tools for Visualization of Interdependent Data Quality Issues
Data quality issues such as missing values and outliers are often interdependent, which makes preprocessing time-consuming and leads to suboptimal performance in knowledge discovery tasks. This package supports preprocessing decision making by visualizing interdependent data quality issues through feature construction. The user can define their own application-domain-specific constructed features that express the quality of a data point, such as the number of missing values in the point, or use nine default features. The outcome can be explored with plot methods and the feature-constructed data acquired with get methods.
PREPShiny Interactive Document for Preprocessing the Dataset
An interactive document for preprocessing the dataset using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
preputils Utilities for Preparation of Data Analysis
Miscellaneous small utilities are provided to mitigate issues with messy, inconsistent or high dimensional data and to help with preprocessing and preparing analyses.
PResiduals Probability-Scale Residuals and Residual Correlations
Computes probability-scale residuals and residual correlations for continuous, ordinal, binary, count, and time-to-event data.
preText Diagnostics to Assess the Effects of Text Preprocessing Decisions
Functions to assess the effects of different text preprocessing decisions on the inferences drawn from the resulting document-term matrices they generate.
prettyB Pretty Base Graphics
Modifications to standard base graphics. By masking base function calls, standard plots are able to have themes.
prettycode Pretty Print R Code in the Terminal
Replace the standard print method for functions with one that performs syntax highlighting, using ANSI colors, if the terminal supports them.
prettymapr Scale Bar, North Arrow, and Pretty Margins in R
Automates the process of creating a scale bar and north arrow in any package that uses base graphics to plot in R. Bounding box tools help find and manipulate extents. Finally, there is a function to automate the process of setting margins, plotting the map, scalebar, and north arrow, and resetting graphic parameters upon completion.
pRF Permutation Significance for Random Forests
Estimate False Discovery Rates (FDRs) for importance metrics from random forest runs.
priceR Regular Expressions for Prices and Currencies
Functions to aid in the analysis of price and currency data by expediting data preprocessing. This includes extraction of relevant data (e.g. from text fields), conversion into numeric class, cleaning, and standardisation as appropriate.
pricesensitivitymeter Van Westendorp Price Sensitivity Meter Analysis
An implementation of the van Westendorp Price Sensitivity Meter in R, which is a survey-based approach to analyze consumer price preferences and sensitivity (van Westendorp 1976, isbn:9789283100386).
prim Patient Rule Induction Method (PRIM)
PRIM for bump hunting in high-dimensional data.
primefactr Use Prime Factorization for Computations
Use Prime Factorization for simplifying computations, for instance for ratios of large factorials.
PRIMME Eigenvalues and Singular Values and Vectors from Large Matrices
R interface to PRIMME, a C library for computing a few eigenvalues and their corresponding eigenvectors of a real symmetric or complex Hermitian matrix. It can also compute singular values and vectors of a square or rectangular matrix. It can find largest, smallest, or interior singular/eigenvalues and can use preconditioning to accelerate convergence.
PRIMsrc PRIM Survival Regression Classification
Performs a unified treatment of Bump Hunting by Patient Rule Induction Method (PRIM) in Survival, Regression and Classification settings (SRC). The current version is a development release that only implements the case of a survival response. New features will be added as soon as they become available.
prinsimp Finding and plotting simple basis vectors for multivariate data
Provides capabilities beyond principal components analysis to focus on finding structure in low variability subspaces. Constructs and plots simple basis vectors for pre-defined and user-defined measures of simplicity.
printr Automatically Print R Objects to Appropriate Formats According to the ‘knitr’ Output Format
Extends the S3 generic function knit_print() in ‘knitr’ to automatically print some objects using an appropriate format such as Markdown or LaTeX. For example, data frames are automatically printed as tables, and the help() pages can also be rendered in ‘knitr’ documents.
PriorGen Generates Prior Distributions for Proportions
Translates beliefs into prior information in the form of Beta and Gamma distributions. It can be mainly used for the generation of priors on the prevalence of disease and the sensitivity/specificity of diagnostic tests.
prioritizr Systematic Conservation Prioritization in R
Solve systematic reserve design problems using integer programming techniques. To solve problems most efficiently users can install optional packages not available on CRAN: the ‘gurobi’ optimizer (available from <http://…/>) and the conservation prioritization package ‘marxan’ (available from <https://…/marxan>).
prioritizrdata Conservation Planning Data Sets
Conservation planning data sets and tutorials for learning how to use the ‘prioritizr’ package <https://…/package=prioritizr>.
prioritylasso Analyzing Multiple Omics Data with an Offset Approach
Fits successive Lasso models for several blocks of (omics) data with different priorities and takes the predicted values as an offset for the next block.
PRISMA Protocol Inspection and State Machine Analysis
The PRISMA package is capable of loading and processing huge text corpora processed with the sally toolbox (http://…/sally ). sally acts as a very fast preprocessor which splits the text files into tokens or n-grams. These output files can then be read with the PRISMA package, which applies testing-based token selection and has replicate-aware, highly tuned non-negative matrix factorization and principal component analysis implementations which allow the processing of very big data sets even on desktop machines.
PRISMAstatement Plot Flow Charts According to the ‘PRISMA’ Statement
Plot a PRISMA <http://…/> flow chart describing the identification, screening, eligibility and inclusion of studies in systematic reviews. PRISMA is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. PRISMA focuses on the reporting of reviews evaluating randomized trials, but can also be used as a basis for reporting systematic reviews of other types of research, particularly evaluations of interventions.
prithulib Perform Random Experiments
Enables the user to perform the following:
1. Roll ‘n’ number of die/dice (roll()).
2. Toss ‘n’ number of coin(s) (toss()).
3. Play the game of Rock, Paper, Scissors.
4. Choose ‘n’ number of card(s) from a pack of 52 playing cards (Joker optional).
prnsamplr Permanent Random Number Sampling
Survey sampling using permanent random numbers (PRNs). A solution to the problem of unknown overlap between survey samples, which leads to low precision in estimates when the survey is repeated or combined with other surveys. The PRN solution is to supply the U(0, 1) random numbers to the sampling procedure, instead of having the sampling procedure generate them. In Lindblom (2014) <doi:10.2478/jos-2014-0047>, and therein cited articles, it is shown how this is carried out and how it improves the estimates. This package supports two common fixed-size sampling procedures (simple random sampling and probability-proportional-to-size sampling) and includes a function for transforming the PRNs in order to control the sample overlap.
prob Elementary Probability on Finite Sample Spaces
A framework for performing elementary probability calculations on finite sample spaces, which may be represented by data frames or lists. Functionality includes setting up sample spaces, counting tools, defining probability spaces, performing set algebra, calculating probability and conditional probability, tools for simulation and checking the law of large numbers, adding random variables, and finding marginal distributions. Characteristic functions for all base R distributions are included.
probably Tools for Post-Processing Class Probability Estimates
Models can be improved by post-processing class probabilities, by: recalibration, conversion to hard probabilities, assessment of equivocal zones, and other activities. ‘probably’ contains tools for conducting these operations.
probFDA Probabilistic Fisher Discriminant Analysis
Probabilistic Fisher discriminant analysis (pFDA) is a probabilistic version of the popular and powerful Fisher linear discriminant analysis for dimensionality reduction and classification.
probhat Generalized Kernel Smoothing
Computes nonparametric probability distributions (probability density functions, cumulative distribution functions and quantile functions) using kernel smoothing. Supports univariate, multivariate and conditional distributions, and weighted data (possibly useful mixed with fuzzy clustering or frequency data). Also, supports empirical continuous cumulative distribution functions and their inverses, and random number generation.
ProbitSpatial Probit with Spatial Dependence, SAR and SEM Models
Binomial Spatial Probit models for big data.
probout Unsupervised Multivariate Outlier Probabilities for Large Datasets
Estimates unsupervised outlier probabilities for multivariate numeric data with many observations from a nonparametric outlier statistic.
PROBShiny Interactive Document for Working with Basic Probability
An interactive document on the topic of basic probability using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
ProbYX Inference for the Stress-Strength Model R = P(Y<X)
Confidence intervals and point estimation for R under various parametric model assumptions; likelihood inference based on classical first-order approximations and higher-order asymptotic procedures.
processanimateR Process Map Token Replay Animation
Token replay animation for process maps created with ‘processmapR’ by using SVG animations (‘SMIL’) and the ‘htmlwidget’ package.
processcheckR Rule-Based Conformance Checking of Business Process Event Data
Check compliance of event data from (business) processes with respect to specified rules. Rules supported are of three types: frequency (activities that should (not) happen x number of times), order (succession between activities) and exclusiveness (exclusive choice between activities).
processcontrol Statistical Process Control Charts
Generates a time series chart for individual values with mean and +/- 3 standard deviation lines, and the corresponding mR chart with the upper control limit. Also executes the 8 Shewhart stability run tests and displays the violations.
processmapR Construct Process Maps Using Event Data
Visualization of process maps based on event logs, in the form of directed graphs. Part of the ‘bupaR’ framework.
processmonitR Building Process Monitoring Dashboards
Functions for constructing dashboards for business process monitoring, building on the event log objects class from package ‘bupaR’. Allows the user to assemble custom shiny dashboards based on process data.
processR Implementation of the ‘PROCESS’ Macro
Perform moderation, mediation, moderated mediation and moderated moderation analyses. Inspired by the famous ‘PROCESS’ macro for ‘SPSS’ and ‘SAS’ created by Andrew Hayes.
processx Execute and Control System Processes
Portable tools to run system processes in the background. It can check if a background process is running; wait on a background process to finish; get the exit status of finished processes; kill background processes and their children; restart processes. It can read the standard output and error of the processes, using non-blocking connections. ‘processx’ can poll a process for standard output or error, with a timeout. It can also poll several processes at once.
prodigenr Research Project Directory Generator
Create a project directory structure, along with typical files for that project. This allows projects to be quickly and easily created, as well as for them to be standardized. Designed specifically with scientists in mind (mainly bio-medical researchers, but likely applies to other fields).
productivity Indices of Productivity Using Data Envelopment Analysis (DEA)
Various transitive measures of productivity and profitability, in levels and changes, are computed. In addition to the classic Malmquist productivity index, the ‘productivity’ package also contains the multiplicatively complete and transitive Färe-Primont and Lowe indices. These indices are also decomposed into different components providing insightful information on the sources of productivity and profitability improvements. In the case of the Malmquist productivity index, the technological change index is further decomposed into bias technological change components. For the transitive Färe-Primont and Lowe measures, it is possible to rule out technological change. The package also allows the user to prohibit negative technological change. All the estimations are based on nonparametric Data Envelopment Analysis (DEA) and several assumptions regarding returns to scale are available (i.e. CRS, VRS, NIRS, NDRS). The package allows parallel computing by default, depending on the user’s computer configuration.
productplots Product Plots for R
Framework for visualising tables of counts, proportions and probabilities. The framework is called product plots, alluding to the computation of area as a product of height and width, and the statistical concept of generating a joint distribution from the product of conditional and marginal distributions. The framework, with extensions, is sufficient to encompass over 20 visualisations previously described in fields of statistical graphics and ‘infovis’, including bar charts, mosaic plots, ‘treemaps’, equal area plots and fluctuation diagrams.
prof.tree An Alternative Display of Profiling Data as a Tree Structure
An alternative data structure for the profiling information generated by Rprof().
profExtrema Compute and Visualize Profile Extrema Functions
Computes profile extrema functions for arbitrary functions. If the function is expensive-to-evaluate it computes profile extrema by emulating the function with a Gaussian process (using package ‘DiceKriging’). In this case uncertainty quantification on the profile extrema can also be computed. The different plotting functions for profile extrema give the user a tool to better locate excursion sets.
profile Read, Manipulate, and Write Profiler Data
Defines a data structure for profiler data, and methods to read and write from the ‘Rprof’ and ‘pprof’ file formats.
profilr Quickly Profile Data in R
Allows users to quickly and reliably profile data in R using convenience functions. The profiled data returns as a data.frame and provides a wealth of common and uncommon summary statistics.
ProFit Fit Projected 2D Profiles to Galaxy Images
Get data / Define model / ??? / ProFit! ProFit is a Bayesian galaxy fitting tool that uses a fast C++ image generation library and a flexible interface to a large number of likelihood samplers.
profmem Simple Memory Profiling for R
A simple and light-weight API for memory profiling of R expressions. The profiling is built on top of R’s built-in memory profiler (‘utils::Rprofmem()’), which records every memory allocation done by R (also native code).
ProFound Photometry Tools
Core package containing all the tools for simple and advanced source extraction. This is used to create inputs for ‘ProFit’, or for source detection, extraction and photometry in its own right.
profvis Interactive Visualizations for Profiling R Code
Interactive visualizations for profiling R code.
progenyClust Finding the Optimal Cluster Number Using Progeny Clustering
Implementing the Progeny Clustering algorithm, the progenyClust package assesses clustering stability and identifies the optimal cluster number for a given data matrix. It uses kmeans clustering by default, but can be customized to work with other clustering algorithms and different parameter settings. The package includes one main function progenyClust(), plot and summary methods for the progenyClust object, and an example dataset for testing.
progress Terminal Progress Bars
Terminal progress bars. They are configurable, may include percentage, elapsed time, and/or the estimated completion time. They work in the command line, in Emacs, R Studio, Windows Rgui and Mac OSX R.app. The package also provides a C++ API, that works with or without Rcpp.
projections Project Future Case Incidence
Provides functions and graphics for projecting daily incidence based on past incidence, and estimates of the serial interval and reproduction number. Projections are based on a branching process using a Poisson-distributed number of new cases per day, similar to the model used for estimating R0 in ‘EpiEstim’ or in ‘earlyR’, and described by Nouvellet et al. (2017) <doi:10.1016/j.epidem.2017.02.012>.
ProjectManagement Management of Deterministic and Stochastic Projects
Management problems of deterministic and stochastic projects. It obtains the duration of a project and the appropriate slack for each activity in a deterministic context. In addition, it obtains a schedule of activities’ time (Castro, Gómez & Tejada (2007) <doi:10.1016/j.orl.2007.01.003>). When the project is done and the actual duration of each activity is known, it can determine how long the project has been delayed and make a fair distribution of the delay among the activities (Bergantiños, Valencia-Toledo & Vidal-Puga (2018) <doi:10.1016/j.dam.2017.08.012>). In a stochastic context it can estimate the average duration of the project and plot a histogram of this duration. As in the deterministic case, it can make a distribution of the delay generated by observing the project already carried out.
projector Project Dense Vector Representations of Texts on a 2D Plane
Display dense vector representations of texts on a 2D plane to better understand embeddings by observing the neighbors of a selected text. It also includes an interactive application to dynamically change the pivot text.
projects A Project Infrastructure for Researchers
Provides a project infrastructure with a focus on manuscript creation. Creates a project folder with a single command, containing subdirectories for specific components, templates for manuscripts, and so on.
projmanr Project Management Tools
Calculates the critical path for a series of tasks, creates Gantt charts and generates network diagrams in order to provide similar functionality to the basic tools offered by ‘MS Project’.
projpred Projection Predictive Feature Selection
Performs projection predictive feature selection for generalized linear models (see, e.g., Piironen and Vehtari, 2017, <doi:10.1007/s11222-016-9649-y>). The package is compatible with the ‘rstanarm’ package, but other reference models can also be used. See the package vignette for more information and examples.
PROMETHEE Preference Ranking Organization METHod for Enrichment of Evaluations
Functions which can be used to support the Multicriteria Decision Analysis (MCDA) process involving multiple criteria, by PROMETHEE (Preference Ranking Organization METHod for Enrichment of Evaluations).
promises Abstractions for Promise-Based Asynchronous Programming
Provides fundamental abstractions for doing asynchronous programming in R using promises. Asynchronous programming is useful for allowing a single R process to orchestrate multiple tasks in the background while also attending to something else. Semantics are similar to ‘JavaScript’ promises, but with a syntax that is idiomatic R.
ProNet Biological Network Construction, Visualization and Analyses
High-throughput experiments are now widely used in biological research, which improves both the quality and quantity of omics data. Network-based presentation of these data has become a popular way of data analysis. This package mainly provides functions for biological network construction, visualization and analyses. Networks can be constructed either from experimental data or from a set of proteins and an integrated PPI database. Based on them, users can perform traditional visualization, along with subcellular-localization-based visualization for Homo sapiens and Arabidopsis thaliana. Furthermore, analyses including topological statistics, functional module clustering and GO profiling can also be achieved.
prop.comb.RR Analyzing Combination of Proportions and Relative Risk
The prop.comb.RR package analyzes the combination of proportions and relative risk.
properties Parse Java Properties Files for R Service Bus Applications
The properties package allows parsing of Java properties files in the context of R Service Bus applications.
prophet Automatic Forecasting Procedure
Implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.
proportion Inference on Single Binomial Proportion and Bayesian Computations
Abundant statistical literature has revealed the importance of constructing and evaluating various methods for constructing confidence intervals (CI) for a single binomial proportion (p). We comprehensively provide procedures in the frequentist (approximate with or without adding pseudo counts or continuity correction, or exact) and Bayesian cultures. Evaluation procedures for CIs warrant active computational attention, and the required summaries pertaining to these criteria (coverage probability, expected length, p-confidence, p-bias, and error) are implemented.
propr Calculating Proportionality Between Vectors of Compositional Data
The bioinformatic evaluation of gene co-expression often begins with correlation-based analyses. However, this approach lacks statistical validity when applied to relative data, including those biological count data produced by microarray assays or high-throughput RNA-sequencing. As an alternative, Lovell et al propose a proportionality metric, phi, derived from compositional data analysis, a branch of math dealing specifically with relative data. In a subsequent publication, Erb and Nicodemus expounded these efforts by elaborating on another proportionality metric, rho. This package introduces a programmatic framework for the calculation of feature dependence using proportionality and other compositional data methods discussed in the cited publications.
proto Prototype object-based programming
An object oriented system using object-based, also called prototype-based, rather than class-based object oriented ideas.
protoclust Hierarchical Clustering with Prototypes
Performs minimax linkage hierarchical clustering. Every cluster has an associated prototype element that represents that cluster as described in Bien, J., and Tibshirani, R. (2011), “Hierarchical Clustering with Prototypes via Minimax Linkage,” accepted for publication in The Journal of the American Statistical Association, DOI: 10.1198/jasa.2011.tm10183.
protolite Fast and Simple Object Serialization to Protocol Buffers
High-performance C++ re-implementation of the object serialization functionality from the ‘RProtoBuf’ package. Permissively licensed, fully compatible, no bloat.
prototest Inference on Prototypes from Clusters of Features
Procedures for testing for group-wide signal in clusters of variables. Tests can be performed for single groups in isolation (univariate) or multiple groups together (multivariate). Specific tests include the exact and approximate (un)selective likelihood ratio tests described in Reid et al (2015), and the selective F test and marginal screening prototype test of Reid and Tibshirani (2015). The user may pre-specify columns to be included in prototype formation, or allow the function to select them itself; a mixture of these two is also possible. Any variable selection is accounted for using the selective inference framework. Options are provided for non-sampling and hit-and-run null reference distributions.
provParseR Pulls Information from Prov.Json Files
R functions to access provenance information collected by ‘rdt’ or ‘rdtLite’. The information is stored inside a ‘ProvInfo’ object and can be accessed through a collection of functions that will return the requested data. The exact format of the JSON created by ‘rdt’ and ‘rdtLite’ is described in <https://…/ExtendedProvJson>.
provSummarizeR Summarizes Provenance Related to Inputs and Outputs of a Script or Console Commands
Reads the provenance collected by the ‘rdt’ or ‘rdtLite’ packages, or other tools providing compatible PROV JSON output created by the execution of a script, and provides a human-readable summary identifying the input and output files, the script used (if any), errors and warnings produced, and the environment in which it was executed. It can also optionally package all the files into a zip file. The exact format of the JSON created by ‘rdt’ and ‘rdtLite’ is described in <https://…/ExtendedProvJson>. More information about ‘rdtLite’ and associated tools is available at <https://…/> and Barbara Lerner, Emery Boose, and Luis Perez (2018), Using Introspection to Collect Provenance in R, Informatics, <doi: 10.3390/informatics5010012>.
provViz Provenance Visualizer
Displays provenance graphically for provenance collected by the ‘rdt’ or ‘rdtLite’ packages, or other tools providing compatible PROV JSON output. The exact format of the JSON created by ‘rdt’ and ‘rdtLite’ is described in <https://…/ExtendedProvJson>. More information about rdtLite and associated tools is available at <https://…/> and Barbara Lerner, Emery Boose, and Luis Perez (2018), Using Introspection to Collect Provenance in R, Informatics, <doi: 10.3390/informatics5010012>.
proxy Distance and Similarity Measures
Provides an extensible framework for the efficient calculation of auto- and cross-proximities, along with implementations of the most popular ones.
proxyC Computes Proximity in Large Sparse Matrices
Computes proximity between rows or columns of large matrices efficiently in C++. Functions are optimized for large sparse matrices using the Armadillo and Intel TBB libraries. Among several built-in similarity/distance measures, computation of correlation, cosine similarity and Euclidean distance is particularly fast.
prrd Parallel Runs of Reverse Depends
Reverse depends for a given package are queued such that multiple workers can run the tests in parallel.
PRROC Precision-Recall and ROC Curves for Weighted and Unweighted Data
Computes the areas under the precision-recall (PR) and ROC curve for weighted (e.g., soft-labeled) and unweighted data. In contrast to other implementations, the interpolation between points of the PR curve is done by a non-linear piecewise function. In addition to the areas under the curves, the curves themselves can also be computed and plotted by a specific S3-method.
PRSim Stochastic Simulation of Streamflow Time Series using Phase Randomization
Provides a simulation framework to simulate streamflow time series with similar main characteristics as observed data. These characteristics include the distribution of daily streamflow values and their temporal correlation as expressed by short- and long-range dependence. The approach is based on the randomization of the phases of the Fourier transform. We further use the flexible four-parameter Kappa distribution, which allows for the extrapolation to yet unobserved low and high flows.
ps List, Query, Manipulate System Processes
List, query and manipulate all system processes, on ‘Windows’, ‘Linux’ and ‘macOS’.
PSAboot Bootstrapping for Propensity Score Analysis
Bootstrapping for propensity score analysis and matching.
psda Polygonal Symbolic Data Analysis
An implementation of symbolic polygonal data analysis. The package provides the estimation of the main descriptive statistical measures, e.g., mean, covariance, variance, correlation and coefficient of variation, as well as the transformation of the data into polygons. An empirical probability distribution function based on the polygonal histogram, and regression models, are also presented.
pseudorank Pseudo-Ranks
Efficient calculation of pseudo-ranks. In case of equal sample sizes, pseudo-ranks and mid-ranks are equal. When used for inference mid-ranks may lead to paradoxical results. Pseudo-ranks are in general not affected by such a problem. For details, see Brunner, E., Bathke A. C. and Konietschke, F: Rank- and Pseudo-Rank Procedures in Factorial Designs – Using R and SAS, Springer Verlag, to appear.
PSF Algorithm for Pattern Sequence Based Forecasting
Various functions in Pattern Sequence Based Forecasting (PSF) take time series data as input and assist in forecasting future values with the help of the PSF algorithm. This algorithm forecasts the behavior of a time series based on the similarity of pattern sequences. Initially, clustering is done with the labeling of samples from the database. The labels associated with the samples are then used to forecast the future behavior of the time series data. Further technical details and references regarding PSF are discussed in the vignette.
psfmi Prediction Model Selection and Performance Evaluation in Multiple Imputed Datasets
Provides functions to apply pooling or backward selection for logistic or Cox regression prediction models in multiple imputed datasets. Backward selection can be done from the pooled model using Rubin’s Rules (RR), the total covariance matrix (D1 method), pooling chi-square values (D2 method), pooling likelihood ratio statistics (D3) or pooling the median p-values. The model can contain continuous, dichotomous, categorical predictors and interaction terms between all type of these predictors. Continuous predictors can also be introduced as restricted cubic spline coefficients. It is also possible to force (spline) predictors or interaction terms in the model during predictor selection. The package also contains functions to generate apparent model performance measures over imputed datasets as ROC/AUC, R-squares, fit test values and calibration plots. A wrapper function over Frank Harrell’s validate function is used for that. Bootstrap internal validation is performed in each imputed dataset and results are pooled. Backward selection as part of internal validation is optional and recommended. Also a function to externally validate logistic prediction models in multiple imputed datasets is available. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.
PSGExpress Portfolio Safeguard: Optimization, Statistics and Risk Management
Solves optimization, advanced statistics, and risk management problems. Popular nonlinear functions in financial, statistical, and logistics applications are pre-coded (e.g., Standard Deviation, Entropy, Expected Shortfall (ES), Value-at-Risk (VaR), Conditional Value-at-Risk (CVaR), Probability of Exceedance (POE), Buffered Probability of Exceedance (bPOE), Partial Moment (PM), Drawdown, Mean-Squared Error, see, the list <http://…/index.html?function.htm> ). ‘PSGExpress’ is the ‘Portfolio Safeguard (PSG)’ freeware version with the number of variables in functions less or equal to 10, see <http://www.aorda.com>.
psgp Projected Spatial Gaussian Process Methods
Implements projected sparse Gaussian process Kriging (Ingram ‘et. al.’, 2008, <doi:10.1007/s00477-007-0163-9>) as an additional method for the ‘intamap’ package.
psica Decision Tree Analysis for Probabilistic Subgroup Identification with Multiple Treatments
In the situation when multiple alternative treatments or interventions are available, different population groups may respond differently to different treatments. This package implements a method that discovers the population subgroups in which a certain treatment has a better effect than the other alternative treatments. This is done by first estimating the treatment effect for a given treatment and its uncertainty by computing random forests; the resulting model is summarized by a decision tree in which the probability that the given treatment is best for a given subgroup is shown in the corresponding terminal node of the tree.
PsiHat Several Local False Discovery Rate Estimators
Suite of R functions for the estimation of local false discovery rate (LFDR) using several methods.
PSIMEX SIMEX Algorithm on Pedigree Structures
Generalization of the SIMEX algorithm from Cook & Stefanski (1994) <doi:10.2307/2290994> for the calculation of inbreeding depression or heritability on pedigree structures affected by missing or misassigned paternities. It simulates errors and tracks the behavior of the estimate as a function of the error proportion. It extrapolates back a true value corresponding to the null error rate.
PSM Non-Linear Mixed-Effects Modelling using Stochastic Differential Equations
Functions for fitting linear and non-linear mixed-effects models using stochastic differential equations (SDEs). The provided pipeline relies on the coupling of the FOCE algorithm and Kalman filtering as outlined by Klim et al (2009) <doi:10.1016/j.cmpb.2009.02.001> and has been validated against the proprietary software NONMEM (Tornøe et al, 2005, <doi:10.1007/s11095-005-5269-5>). Further functions are provided for finding smoothed estimates of model states and for simulation. The package allows for any multivariate non-linear time-variant model to be specified, and it also handles multidimensional input, covariates, missing observations, and specification of dosage regimen.
PSPManalysis Analysis of Physiologically Structured Population Models
Performs demographic, bifurcation and evolutionary analysis of physiologically structured population models, which is a class of models that consistently translates continuous-time models of individual life history to the population level. A model of individual life history has to be implemented specifying the individual-level functions that determine the life history, such as development and mortality rates and fecundity. M.A. Kirkilionis, O. Diekmann, B. Lisser, M. Nool, B. Sommeijer & A.M. de Roos (2001) <doi:10.1142/S0218202501001264>. O.Diekmann, M.Gyllenberg & J.A.J.Metz (2003) <doi:10.1016/S0040-5809(02)00058-8>. A.M. de Roos (2008) <doi:10.1111/j.1461-0248.2007.01121.x>.
pssm Piecewise Exponential Model for Time to Progression and Time from Progression to Death
Estimates parameters of a piecewise exponential model for time to progression and time from progression to death with interval censoring of the time to progression and covariates for each distribution using proportional hazards.
pssmooth Flexible and Efficient Evaluation of Principal Surrogates/Treatment Effect Modifiers
Implements estimation and testing procedures for evaluating an intermediate biomarker response as a principal surrogate of a clinical response to treatment (i.e., principal stratification effect modification analysis), as described in Juraska M, Huang Y, and Gilbert PB, Inference on treatment effect modification by biomarker response in a three-phase sampling design (under review). The methods avoid the restrictive ‘placebo structural risk’ modeling assumption common to past methods and further improve robustness by the use of nonparametric kernel smoothing for biomarker density estimation. A randomized controlled two-group clinical efficacy trial is assumed with an ordered categorical or continuous univariate biomarker response measured at a fixed timepoint post-randomization and with a univariate baseline surrogate measure allowed to be observed in only a subset of trial participants with an observed biomarker response (see the flexible three-phase sampling design in the paper for details). Bootstrap-based procedures are available for pointwise and simultaneous confidence intervals and testing of four relevant hypotheses. Summary and plotting functions are provided for estimation results.
Pstat Assessing Pst Statistics
The primary purpose of this package is to calculate Pst values, which assess differentiation among populations from a set of quantitative traits. The Pst value is an index that measures the level of phenotypic differentiation among populations (Leinonen et al., 2006). The bootstrap method provides confidence intervals and distribution histograms of Pst. Variation of Pst as a function of the parameter c/h^2 is studied as well. Finally, the package proposes different transformations, especially to eliminate any variation resulting from allometric growth (calculation of residuals from linear regressions, Reist standardizations or Aitchison transformation).
pstest Specification Tests for Parametric Propensity Score Models
The propensity score is one of the most widely used tools in studying the causal effect of a treatment, intervention, or policy. Given that the propensity score is usually unknown, it has to be estimated, implying that the reliability of many treatment effect estimators depends on the correct specification of the (parametric) propensity score. This package provides data-driven nonparametric diagnostic tools for detecting propensity score misspecification.
PSTR Panel Smooth Transition Regression Modelling
Provides Panel Smooth Transition Regression (PSTR) modelling. The modelling procedure consists of three stages: specification, estimation and evaluation. The package offers tools for conducting model specification tests, estimating PSTR models, and evaluating the fitted models. The tests implemented in the package allow for cluster dependency and are heteroskedasticity-consistent. Wild bootstrap and wild cluster bootstrap tests are also implemented. Parallel computation is available as an option in some functions, especially the bootstrap tests, which makes the package well suited to running on many cores on high-performance computing servers.
PSW Propensity Score Weighting Methods for Dichotomous Treatments
Provides propensity score weighting methods to control for confounding in causal inference with dichotomous treatments and continuous/binary outcomes. It includes the following functional modules: (1) visualization of the propensity score distribution in both treatment groups with mirror histogram, (2) covariate balance diagnosis, (3) propensity score model specification test, (4) weighted estimation of treatment effect, and (5) doubly robust estimation of treatment effect. The weighting methods include the inverse probability weight (IPW) for estimating the average treatment effect (ATE), the IPW for average treatment effect of the treated (ATT), the IPW for the average treatment effect of the controls (ATC), the matching weight (MW), the overlap weight (OVERLAP), and the trapezoidal weight (TRAPEZOIDAL). Sandwich variance estimation is provided to adjust for the sampling variability of the estimated propensity score. These methods are discussed by Hirano et al (2003) <DOI:10.1111/1468-0262.00442>, Li and Greene (2013) <DOI:10.1515/ijb-2012-0030>, and Li et al (2016) <DOI:10.1080/01621459.2016.1260466>.
psych Procedures for Psychological, Psychometric, and Personality Research
A general purpose toolbox for personality, psychometrics and experimental psychology. Functions are primarily for scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within- and between-group statistics, including correlations and factor analysis. Functions for simulating particular item and test structures are included. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometrics as well as publications in personality research. For more information, see the personality-project.org/r webpage.
psychReport Reproducible Reports in Psychology
Helper functions for producing reports in Psychology (reproducible research). Provides the required formatted strings (APA style) for use in ‘Sweave’/‘knitr’ LaTeX integration within *.Rnw files.
PsyControl CUSUM Person Fit Statistics
Person fit statistics based on Quality Control measures are provided for questionnaires and tests given a specified IRT model. Statistics based on Cumulative Sum (CUSUM) charts are provided. Options are given for banks with polytomous and dichotomous data.
psymonitor Real Time Monitoring of Asset Markets: Bubbles and Crisis
Apply the popular real-time monitoring strategy proposed by Phillips, Shi and Yu (2015a,b; PSY) <doi:10.1111/iere.12132>, <doi:10.1111/iere.12131>, along with a new bootstrap procedure designed to mitigate the potential impact of heteroskedasticity and to effect family-wise size control in recursive testing algorithms (Phillips and Shi, forthcoming).
pterrace Persistence Terrace for Topological Data Analysis
Plot the summary graphic called the persistence terrace for topological inference. It also provides an auxiliary tool, the terrace area plot, for determining the number of significant topological features. Moon, C., Giansiracusa, N., and Lazar, N. A. (2018) <doi:10.1080/10618600.2017.1422432>.
ptest Periodicity Tests in Short Time Series
Implements p-value computations using an approximation to the cumulative distribution function for a variety of tests for periodicity. These tests include harmonic regression tests with normal and double exponential errors as well as modifications of Fisher’s g test. An accompanying vignette illustrates the application of these tests.
ptmixed Poisson-Tweedie Generalized Linear Mixed Model
Fits the Poisson-Tweedie generalized linear mixed model. The likelihood approximation is based on the adaptive Gauss-Hermite quadrature rule.
ptsuite Tail Index Estimation for Power Law Distributions
Estimation methods for the shape parameter of Pareto distributed data: maximum likelihood (Newman, 2005)<doi:10.1016/j.cities.2012.03.001>, Hill’s estimator (Hill, 1975)<doi:10.1214/aos/1176343247>, least squares (Zaher et al., 2014)<doi:10.9734/BJMCS/2014/10890>, method of moments (Rytgaard, 1990)<doi:10.2143/AST.20.2.2005443>, percentiles (Bhatti et al., 2018)<doi:10.1371/journal.pone.0196456>, and weighted least squares (Nair et al., 2019). It also provides a heuristic method (Hubert et al., 2013)<doi:10.1016/j.csda.2012.07.011> and a goodness-of-fit test (Gulati and Shapiro, 2008)<doi:10.1007/978-0-8176-4619-6> for testing for Pareto data, as well as a method for generating Pareto distributed data.
ptwikiwords Words Used in Portuguese Wikipedia
Contains a dataset of words used in 15,000 randomly extracted pages from the Portuguese Wikipedia (<https://…/> ).
ptycho Bayesian Variable Selection with Hierarchical Priors
Bayesian variable selection for linear regression models using hierarchical priors. There is a prior that combines information across responses and one that combines information across covariates, as well as a standard spike-and-slab prior for comparison. An MCMC sampler draws from the marginal posterior distribution of the 0-1 variables indicating whether each covariate belongs to the model for each response.
pubchunks Fetch Sections of XML Scholarly Articles
Get chunks of XML scholarly articles without having to know how to work with XML. Custom mappers for each publisher and for each article section pull out the information you want. Works with outputs from package ‘fulltext’, ‘xml2’ package documents, and file paths to XML documents.
Publish Format Output of Various Routines in a Suitable Way for Reports and Publication
A collection of convenience functions that transform the results of some basic statistical analyses into a table format nearly ready for publication. This includes descriptive tables, tables of logistic regression and Cox regression results, as well as forest plots.
PUlasso High-Dimensional Variable Selection with Presence-Only Data
Efficient algorithm for solving the PU (Positive and Unlabelled) problem in low- or high-dimensional settings with a lasso or group lasso penalty. The algorithm uses Maximization-Minorization and (block) coordinate descent. Sparse calculation and parallel computing via ‘OpenMP’ are supported for computational speed-up. See Hyebin Song, Garvesh Raskutti (2017) <arXiv:1711.08129>.
pullword R Interface to Pullword Service
R interface to the pullword service for Natural Language Processing in Chinese. It enables users to extract valuable words from text by deep learning models. For more details please visit the official site (in Chinese): http://pullword.com .
pulsar Parallel Utilities for Lambda Selection along a Regularization Path
Model selection for penalized graphical models using the Stability Approach to Regularization Selection (‘StARS’), with options for speed-ups including Bounded StARS (B-StARS), batch computing, and other stability metrics (e.g., graphlet stability G-StARS).
pulver Parallel Ultra-Rapid p-Value Computation for Linear Regression Interaction Terms
Computes p-values for the interaction term in a very large number of linear regression models.
purge Purge Training Data from Models
Enables the removal of training data from fitted R models while retaining predict functionality. The purged models are more portable as their memory footprints do not scale with the training sample size.
purging Simple Method for Purging Mediation Effects among Independent Variables
Simple method of purging independent variables of mediating effects. First, regress the direct variable on the indirect variable. Then, use the stored residuals as the new purged (direct) variable in the updated specification. This purging process allows for use of a new direct variable uncorrelated with the indirect variable. Please cite the method and/or package using Waggoner, Philip D. (2018) <doi:10.1177/1532673X18759644>.
purrr Functional Programming Tools
Make your pure functions purr with the ‘purrr’ package. This package completes R’s functional programming tools with missing features present in other programming languages.
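As a brief illustration of the functional style the package provides (a minimal sketch; map() and map_dbl() are core ‘purrr’ functions):

    library(purrr)
    # map() applies a function to each element and returns a list
    map(1:3, function(x) x^2)
    # Typed variants return atomic vectors, e.g. map_dbl() for doubles
    map_dbl(1:3, function(x) x^2)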
purrrlyr Tools at the Intersection of ‘purrr’ and ‘dplyr’
Some functions at the intersection of ‘dplyr’ and ‘purrr’ that formerly lived in ‘purrr’.
pushbar Create Sliders for ‘Shiny’
Create panels that slide in from the left, right, top and bottom and may include any HTML or ‘Shiny’ input or output.
pvaluefunctions Creates and Plots P-Value Functions, S-Value Functions, Confidence Distributions and Confidence Densities
Contains functions to compute and plot confidence distributions, confidence densities, p-value functions and s-value (surprisal) functions for several commonly used estimates. Instead of just calculating one p-value and one confidence interval, p-value functions display p-values and confidence intervals for many levels, thereby allowing the user to gauge the compatibility of several parameter values with the data. These methods are discussed by Poole C. (1987) <doi:10.2105/AJPH.77.2.195>; Schweder T, Hjort NL. (2002) <doi:10.1111/1467-9469.00285>; Bender R, Berg G, Zeeb H. (2005) <doi:10.1002/bimj.200410104>; Singh K, Xie M, Strawderman WE. (2007) <doi:10.1214/074921707000000102>; Rothman KJ, Greenland S, Lash TL. (2008, ISBN:9781451190052); Amrhein V, Trafimow D, Greenland S. (2019) <doi:10.1080/00031305.2018.1543137>; and Greenland S. (2019) <doi:10.1080/00031305.2018.1529625>.
pvar Calculation and Application of p-Variation
Calculates the p-variation of finite sample data.
pvrank Rank Correlations
Computes rank correlations and their p-values with various options for tied ranks.
PWD Time Series Regression Using the Power Weighted Densities (PWD) Approach
Contains functions which allow the user to perform time series regression quickly using the Power Weighted Densities (PWD) approach. alphahat_LR_one_Rcpp() is the main workhorse function within this package.
pweight P-Value Weighting
This R package contains open source implementations of several p-value weighting methods, including Spjotvoll, exponential and Bayes weights. These are methods for improving power in multiple testing via the use of prior information.
pwr2 Power and Sample Size Analysis for One-way and Two-way ANOVA Models
User-friendly functions for power and sample size analysis in one-way and two-way ANOVA settings; they take either effect size, or delta and sigma, as arguments. In addition, a function for plotting power curves is available for power comparison, so results can be easily visualized by statisticians and clinical researchers.
pwrAB Power Analysis for AB Testing
Power analysis for AB testing. The calculations are based on the Welch’s unequal variances t-test, which is generally preferred over the Student’s t-test when sample sizes and variances of the two groups are unequal, which is frequently the case in AB testing. In such situations, the Student’s t-test will give biased results due to using the pooled standard deviation, unlike the Welch’s t-test.
pwrFDR FDR Power
Calculates power and sample size in multiple testing situations using the Benjamini-Hochberg (BH) false discovery rate (FDR) procedure. The package computes power and sample size in one of two ways, using either the average power or the lambda-power. See Izmirlian, G. (2018) <arXiv:1801.03989>.
pwrRasch Statistical Power Simulation for Testing the Rasch Model
Statistical power simulation for testing the Rasch Model based on a three-way analysis of variance design with mixed classification.
pycno Pycnophylactic Interpolation
Given a SpatialPolygonsDataFrame and a set of populations for each polygon, compute a population density estimate based on Tobler’s pycnophylactic interpolation algorithm. The result is a SpatialGridDataFrame.
pyinit Pena-Yohai Initial Estimator for Robust S-Regression
Deterministic Pena-Yohai initial estimator for robust S estimators of regression. The procedure is described in detail in Pena, D., & Yohai, V. (1999) <doi:10.2307/2670164>.
pysd2r API to ‘Python’ Library ‘pysd’
Using the R package ‘reticulate’, this package creates an interface to the ‘pysd’ toolset. The package provides an R interface to a number of ‘pysd’ functions, and can read files in ‘Vensim’ ‘mdl’ format, and ‘xmile’ format. The resulting simulations are returned as a ‘tibble’, and from that the results can be processed using ‘dplyr’ and ‘ggplot2’. The package has been tested using ‘python3’.
pystr Python String Methods in R
String operations the Python way – a package for those of us who miss Python’s string methods while we’re working in R.
PythonInR Use Python from Within R
Interact with Python from within R.

Q

qap Heuristics for the Quadratic Assignment Problem (QAP)
Implements heuristics for the Quadratic Assignment Problem (QAP). Currently only a simulated annealing heuristic is available.
QBAsyDist Asymmetric Distributions and Quantile Estimation
Provides the local polynomial maximum likelihood estimates for the location and scale functions as well as the semiparametric quantile estimates in the generalized quantile-based asymmetric distributional setting. These functions are useful for any member of the generalized quantile-based asymmetric family of distributions.
qboxplot Quantile-Based Boxplot
Produce quantile-based box-and-whisker plot(s).
QCA Qualitative Comparative Analysis
Core functions to perform Qualitative Comparative Analysis.
QCAfalsePositive Tests for Type I Error in Qualitative Comparative Analysis (QCA)
Implements tests for Type I error in Qualitative Comparative Analysis (QCA) that take into account the multiple hypothesis tests inherent in the procedure. Tests can be carried out on three variants of QCA: crisp-set QCA (csQCA), multi-value QCA (mvQCA) and fuzzy-set QCA (fsQCA). For fsQCA, the fsQCApermTest() command implements a permutation test that provides 95% confidence intervals for the number of counterexamples and degree of consistency, respectively. The distributions of permuted values can be plotted against the observed values. For csQCA and mvQCA, simple binomial tests are implemented in csQCAbinTest() and mvQCAbinTest(), respectively.
QCAGUI Modern Functions for Qualitative Comparative Analysis
An extensive set of functions to perform Qualitative Comparative Analysis: crisp sets (‘csQCA’), temporal (‘tQCA’), multi-value (‘mvQCA’) and fuzzy sets (‘fsQCA’), using a graphical user interface (GUI). QCA is a methodology that bridges the qualitative and quantitative divide in social science research. It uses a Boolean algorithm that results in a minimal causal combination which explains a given phenomenon.
QCApro Professional Functionality for Performing and Evaluating Qualitative Comparative Analysis
The ‘QCApro’ package provides professional functionality for performing configurational comparative research with Qualitative Comparative Analysis (QCA), including crisp-set, multi-value, and fuzzy-set QCA. It also offers advanced tools for sensitivity diagnostics and methodological evaluations of QCA.
QCAtools Helper functions for QCA in R
Helper functions for Qualitative Comparative Analysis: evaluate and plot Boolean formulae on fuzzy set score data, apply Boolean operations, and compute consistency and coverage measures.
qCBA Quantitative Classification by Association Rules
A CBA postprocessing algorithm that creates smaller models for datasets containing quantitative (numerical) attributes. The article describing QCBA is Tomas Kliegr (2017) <arXiv:1711.10166>.
qccrs Quality Control Charts under Repetitive Sampling
Functions to calculate Average Sample Numbers (ASN), Average Run Length (ARL1) and value of k, k1 and k2 for quality control charts under repetitive sampling as given in Aslam et al. (2014) (<DOI:10.7232/iems.2014.13.1.101>).
qclust Robust Estimation of Gaussian Mixture Models
Robust estimation of Gaussian mixture models fitted by modified EM algorithm, robust clustering and classification.
QCSIS Sure Independence Screening via Quantile Correlation and Composite Quantile Correlation
Quantile correlation-sure independence screening (QC-SIS) and composite quantile correlation-sure independence screening (CQC-SIS) for ultrahigh-dimensional data.
qcv Quantifying Construct Validity
Primarily, the ‘qcv’ package computes key indices related to the Quantifying Construct Validity procedure (QCV; Westen & Rosenthal, 2003 <doi:10.1037/0022-3514.84.3.608>; see also Furr & Heuckeroth, in press). The qcv() function is the heart of the ‘qcv’ package, but additional functions in the package provide useful ancillary information related to the QCV procedure.
qdap Bridging the Gap Between Qualitative Data and Quantitative Analysis
qdap automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text. This affords the user a more efficient and targeted analysis. qdap is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.
qdapRegex Regular Expression Removal, Extraction, and Replacement Tools
A collection of regular expression tools associated with the qdap package that may be useful outside of the context of discourse analysis. Tools include removal/extraction/replacement of abbreviations, dates, dollar amounts, email addresses, hash tags, numbers, percentages, person tags, phone numbers, times, and zip codes.
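A minimal sketch of the removal/extraction idiom (assuming the rm_url() function and its extract argument):

    library(qdapRegex)
    x <- "See http://example.com for details."
    rm_url(x)                  # strip the URL from the text
    rm_url(x, extract = TRUE)  # extract the URL instead of removing it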
qdapTools Tools for the qdap Package
A collection of tools associated with the qdap package that may be useful outside of the context of text analysis.
QDComparison Modern Nonparametric Tools for Two-Sample Quantile and Distribution Comparisons
Allows practitioners to determine (i) if two univariate distributions (which can be continuous, discrete, or even mixed) are equal, (ii) how two distributions differ (shape differences, e.g., location, scale, etc.), and (iii) where two distributions differ (at which quantiles), all using nonparametric LP statistics. The primary reference is Jungreis, D. and Mukhopadhyay, S. (2018, Technical Report).
QFRM Pricing of Vanilla and Exotic Option Contracts
Option pricing (financial derivatives) techniques mainly following the textbook ‘Options, Futures and Other Derivatives’ (9th ed.) by John C. Hull, 2014, Prentice Hall. Implementations are via the binomial tree option pricing model (BOPM), the Black-Scholes model, Monte Carlo simulations, etc. This package is a result of the Quantitative Financial Risk Management course (STAT 449 and STAT 649) at Rice University, Houston, TX, USA, taught by Oleg Melnikov, statistics PhD student, as of Spring 2015.
qgam Smooth Additive Quantile Regression Models
Smooth additive quantile regression models, fitted using the methods of Fasiolo et al. (2017) <arXiv:1707.03307>. Differently from ‘quantreg’, the smoothing parameters are estimated automatically by marginal loss minimization, while the regression coefficients are estimated using either PIRLS or Newton algorithm. The learning rate is determined so that the Bayesian credible intervals of the estimated effects have approximately the correct coverage. The main function is qgam() which is similar to gam() in ‘mgcv’, but fits non-parametric quantile regression models.
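A minimal usage sketch built on the qgam() interface described above (the qu argument selects the quantile; data here are simulated):

    library(qgam)
    set.seed(1)
    df <- data.frame(x = runif(200))
    df$y <- sin(2 * pi * df$x) + rnorm(200, sd = 0.3)
    # Fit a smooth additive model for the conditional median (qu = 0.5)
    fit <- qgam(y ~ s(x), data = df, qu = 0.5)
    plot(fit)  # inherits mgcv's plotting of smooth terms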
qGaussian The q-Gaussian Distribution
Density, distribution function, quantile function and random generation for the q-Gaussian distribution with parameters mu and sig.
qgcomp Quantile G-Computation
G-computation for a set of time-fixed exposures with quantile-based basis functions, possibly under linearity and homogeneity assumptions. This approach estimates a regression line corresponding to the expected change in the outcome (on the link basis) given a simultaneous increase in the quantile-based category for all exposures. Reference: Alexander P. Keil, Jessie P. Buckley, Katie M. O’Brien, Kelly K. Ferguson, Shanshan Zhao, Alexandra J. White (2019), A quantile-based g-computation approach to addressing the effects of exposure mixtures; <arXiv:1902.04200> [stat.ME].
QGglmm Estimate Quantitative Genetics Parameters from Generalised Linear Mixed Models
Computes various quantitative genetics parameters from Generalised Linear Mixed Model (GLMM) estimates. In particular, it yields the observed phenotypic mean, phenotypic variance and additive genetic variance.
qha Qualitative Harmonic Analysis
Multivariate description of the state changes of a qualitative variable by Correspondence Analysis and Clustering. See: Deville, J.C., & Saporta, G. (1983). Correspondence analysis, with an extension towards nominal time series. Journal of econometrics, 22(1-2), 169-189. Corrales, M.L., & Pardo, C.E. (2015) <doi:10.15332/s2027-3355.2015.0001.01>. Analisis de datos longitudinales cualitativos con analisis de correspondencias y clasificacion. Comunicaciones en Estadistica, 8(1), 11-32.
QICD Estimate the Coefficients for Non-Convex Penalized Quantile Regression Model by using QICD Algorithm
Implements the extremely fast ‘QICD’ algorithm (Iterative Coordinate Descent) for high-dimensional nonconvex penalized quantile regression. The algorithm combines the coordinate descent algorithm in the inner iteration with the majorization-minimization step in the outer step. Each inner univariate minimization problem only requires computing a one-dimensional weighted median, which ensures fast computation. Tuning parameter selection is based on two different methods: cross-validation and BIC for the quantile regression model. Details are described in Peng, B. and Wang, L. (2015) <DOI:10.1080/10618600.2014.913516>.
qicharts2 Quality Improvement Charts
Functions for making run charts, Shewhart control charts and Pareto charts for continuous quality improvement. Included control charts are: I, MR, Xbar, S, T, C, U, U’, P, P’, and G charts. Non-random variation in the form of minor to moderate persistent shifts in data over time is identified by the Anhoej rules for unusually long runs and unusually few crossings [Anhoej, Olesen (2014) <doi:10.1371/journal.pone.0113825>]. Non-random variation in the form of larger, possibly transient, shifts is identified by Shewhart’s 3-sigma rule [Mohammed, Worthington, Woodall (2008) <doi:10.1136/qshc.2004.012047>].
qkerntool Q-Kernel-Based and Conditionally Negative Definite Kernel-Based Machine Learning Tools
Nonlinear machine learning tools for classification, clustering and dimensionality reduction. It integrates 12 q-kernel functions and 14 conditionally negative definite kernel functions, and includes the q-kernel and conditionally negative definite kernel versions of density-based spatial clustering of applications with noise, spectral clustering, generalized discriminant analysis, principal component analysis, multidimensional scaling, locally linear embedding, Sammon’s mapping and t-distributed stochastic neighbor embedding.
qle Simulation-Based Quasi-Likelihood Estimation
A simulation-based quasi-likelihood method (Baaske, M. (2014) <doi:10.5566/ias.v33.p107-119>) for parameter estimation of parametric statistical models for which closed-form representations of distributional characteristics are unavailable and can only be obtained by computationally intensive simulations of the model.
QLearning Reinforcement Learning using the Q Learning Algorithm
Implements Q-Learning, a model-free form of reinforcement learning, described in work by Strehl, Li, Wiewiora, Langford & Littman (2006) <doi:10.1145/1143844.1143955>.
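For orientation, the tabular Q-learning update rule itself is short; the sketch below illustrates the generic algorithm only and is not tied to this package's API:

    # Generic tabular Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    q_update <- function(Q, s, a, r, s_next, alpha = 0.1, gamma = 0.9) {
      target <- r + gamma * max(Q[s_next, ])
      Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
      Q
    }
    Q <- matrix(0, nrow = 3, ncol = 2)  # 3 states, 2 actions
    Q <- q_update(Q, s = 1, a = 2, r = 1, s_next = 3)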
qoma.smuggler Transport Data and Commands Across the ‘FAME’ / ‘R’ Border
Transport data and commands across the ‘FAME’ <https://…/support.html> / ‘R’ border. A set of utilities for: reading ‘FAME’ databases into ‘R’; writing ‘R’ data into ‘FAME’ databases; executing ‘FAME’ commands in the ‘R’ environment; and executing ‘R’ commands from the ‘FAME’ environment.
QPBoot Model Validation using Quantile Spectral Analysis and Parametric Bootstrap
Provides functionality for model validation by computing a parametric bootstrap and comparing the Quantile Spectral Densities.
qpdf Split, Combine and Compress PDF Files
Content-preserving transformations of PDF files such as split, combine, and compress. This package interfaces directly to the ‘qpdf’ C++ API and does not require any command line utilities. Note that ‘qpdf’ does not read actual content from PDF files: to extract text and data you need the ‘pdftools’ package.
qqid Generation and Support of QQIDs – A Human-Compatible Representation of 128-bit Numbers
The string ‘bird.carp.7TsBWtwqtKAeCTNk8f’ is a ‘QQID’: a representation of a 128-bit number, constructed from two ‘cues’ of short, common, English words, and Base64 encoded characters. The primary intended use of QQIDs is as random unique identifiers, e.g. database keys like the ‘UUIDs’ defined in the RFC 4122 Internet standard. QQIDs can be identically interconverted with UUIDs, IPv6 addresses, MD5 hashes etc., and are suitable for a host of applications in which identifiers are read by humans. They are compact, can safely be transmitted in binary and text form, can be used as components of URLs, and it can be established at a glance whether two QQIDs are different or potentially identical. The qqid package contains functions to retrieve true, quantum-random QQIDs, to generate pseudo-random QQIDs, to validate them, and to interconvert them with other 128-bit number representations.
QQperm Permutation Based QQ Plot and Inflation Factor Estimation
Provides users the necessary utility functions to generate permutation-based QQ plots and to estimate the inflation factor based on the empirical null distribution. While it has general utility, it is particularly helpful when the skewness of Fisher’s Exact test in sparse-data situations with imbalanced case-control sample sizes makes reliance on the uniform chi-square expected distribution inappropriate.
qqplotr Quantile-Quantile Plot Extensions for ‘ggplot2’
Extensions of ‘ggplot2’ Q-Q plot functionalities.
qqtest Quantile Quantile Plots Self Calibrating For Visual Testing
Provides the function qqtest which incorporates uncertainty in its qqplot display(s) so that the user might have a better sense of the evidence against the specified distributional hypothesis. qqtest draws a quantile quantile plot for visually assessing whether the data come from a test distribution that has been defined in one of many ways. The vertical axis plots the data quantiles, the horizontal those of a test distribution. The default behaviour generates 1000 samples from the test distribution and overlays the plot with pointwise interval estimates for the ordered quantiles from the test distribution. A small number of independently generated exemplar quantile plots are also overlaid. Both the interval estimates and the exemplars provide different comparative information to assess the evidence provided by the qqplot for or against the hypothesis that the data come from the test distribution (default is normal or gaussian). Finally, a visual test of significance (a lineup plot) can also be displayed to test the null hypothesis that the data come from the test distribution.
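A minimal sketch using the qqtest() function named above (the default test distribution is normal):

    library(qqtest)
    set.seed(7)
    # Pointwise envelopes and exemplar curves convey how much
    # variability a genuinely normal sample of this size can show
    qqtest(rnorm(100))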
qqvases Animated Normal Quantile-Quantile Plots
Presents an explanatory animation of normal quantile-quantile plots based on a water-filling analogy. The animation presents a normal QQ plot as the parametric plot of the water levels in vases defined by two distributions. The distributions decorate the axes in the normal QQ plot and are optionally shown as vases adjacent to the plot. The package draws QQ plots for several distributions, either as samples or continuous functions.
QRAGadget A ‘Shiny’ Gadget for Interactive ‘QRA’ Visualizations
Upload raster data and easily create interactive quantitative risk analysis ‘QRA’ visualizations. Select from numerous color palettes, base-maps, and different configurations.
qrage Tools that Create D3 JavaScript Force Directed Graph from R
Tools that create a D3 JavaScript force-directed graph from R. D3 JavaScript was created by Michael Bostock. See http://d3js.org and, more specifically for force-directed graphs, https://…/Force-Layout.
qrandom True Random Numbers using the ANU Quantum Random Numbers Server
The ANU Quantum Random Number Generator provided by the Australian National University generates true random numbers in real-time by measuring the quantum fluctuations of the vacuum. This package offers an interface using their API. The electromagnetic field of the vacuum exhibits random fluctuations in phase and amplitude at all frequencies. By carefully measuring these fluctuations, one is able to generate ultra-high bandwidth random numbers. The quantum Random Number Generator is based on the papers by Symul et al., (2011) <doi:10.1063/1.3597793> and Haw, et al. (2015) <doi:10.1103/PhysRevApplied.3.054004>. The package offers functions to retrieve a sequence of random integers or hexadecimals and true random samples from a normal or uniform distribution.
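A hedged sketch (the qrandom() entry point and its n argument are assumptions about the package interface; an internet connection to the ANU server is required):

    library(qrandom)
    # Fetch five true random numbers from the ANU QRNG service
    qrandom(n = 5)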
QRank A Novel Quantile Regression Approach for eQTL discovery
A Quantile Rank-score based test for the identification of expression quantitative trait loci.
qrcm Quantile Regression Coefficients Modeling
Parametric modeling of quantile regression coefficient functions.
qrcmNL Nonlinear Quantile Regression Coefficients Modeling
Nonlinear parametric modeling of quantile regression coefficient functions. Frumento P and Bottai M (2016) <doi:10.1111/biom.12410>.
qrcmNP Nonlinear and Penalized Quantile Regression Coefficients Modeling
Nonlinear and Penalized parametric modeling of quantile regression coefficient functions. Frumento P and Bottai M (2016) <doi:10.1111/biom.12410>.
qrcode QRcode Generator for R
Create QR codes in R.
QRegVCM Quantile Regression in Varying-Coefficient Models
Quantile regression in varying-coefficient models (VCM) using one particular nonparametric technique called P-splines. The functions can be applied on three types of VCM; (1) Homoscedastic VCM, (2) Simple heteroscedastic VCM, and (3) General heteroscedastic VCM.
qrencoder Make QR codes in R via libqrencode
Uses ‘libqrencode’ to make QR codes in R.
QRFCCA Quadratically Regularized Functional Canonical Correlation Analysis
Conduct quadratically regularized functional canonical correlation analysis. The details of the method are explained in Nan Lin, Yun Zhu, Ruzhong Fan and Momiao Xiong (2017) <DOI:10.1371/journal.pcbi.1005788>.
qrjoint Joint Estimation in Linear Quantile Regression
Joint estimation of quantile specific intercept and slope parameters in a linear regression setting.
qrLMM Quantile Regression for Linear Mixed-Effects Models
Quantile regression (QR) for Linear Mixed-Effects Models via the asymmetric Laplace distribution (ALD). It uses the Stochastic Approximation of the EM (SAEM) algorithm for deriving exact maximum likelihood estimates and full inference results for the fixed-effects and variance components. It also provides graphical summaries for assessing the algorithm convergence and fitting results.
qrmix Quantile Regression Mixture Models
Implements the robust algorithm for fitting finite mixture models based on quantile regression proposed by Emir et al., 2017 (unpublished).
qrng (Randomized) Quasi-Random Number Generators
Functionality for generating (randomized) quasi-random numbers in high dimensions.
qrsvm SVM Quantile Regression with the Pinball Loss
Quantile Regression (QR) using Support Vector Machines under the pinball loss. Estimation is based on ‘Nonparametric Quantile Regression’ by I. Takeuchi, Q.V. Le, T. Sears and A.J. Smola (2004). The implementation relies on the ‘quadprog’ package, kernel functions from the ‘kernlab’ package, and nearPD from the ‘Matrix’ package to find the nearest positive definite kernel matrix. The package estimates quantiles individually; an implementation of non-crossing constraints is forthcoming. The function multqrsvm() supports a parallel backend for faster fitting.
qs Quick Serialization of R Objects
Provides functions for quickly writing and reading any R object to and from disk. This package makes use of the ‘zstd’ library for compression and decompression. ‘zstd’ is created by Yann Collet and owned by Facebook, Inc.
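A minimal sketch of the round trip (qsave() and qread() are the package's core functions):

    library(qs)
    x <- list(a = 1:10, b = letters)
    qsave(x, "x.qs")   # fast compressed serialization to disk
    y <- qread("x.qs")
    identical(x, y)    # TRUE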
qsort Scoring Q-Sort Data
Computes scores from Q-sort data, using criteria sorts and derived scales from subsets of items. The ‘qsort’ package includes descriptions and scoring procedures for four different Q-sets: Attachment Q-set (version 3.0) (Waters, 1995, <doi:10.1111/j.1540-5834.1995.tb00214.x>); California Child Q-set (Block and Block, 1969, <doi:10.1037/0012-1649.21.3.508>); Maternal Behaviour Q-set (version 3.1) (Pederson et al., 1999, <https://…/viewcontent.cgi article=1000&context=psychologypub>); Preschool Q-set (Baumrind, 1968 revised by Wanda Bronson, <doi:10.1111/j.1540-5834.1995.tb00214.x>).
qsub Running Commands Remotely on ‘Gridengine’ Clusters
Run lapply() calls in parallel by submitting them to ‘gridengine’ clusters using the ‘qsub’ command.
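A hedged sketch, assuming the create_qsub_config()/qsub_lapply() interface and a reachable gridengine cluster (the host and paths below are placeholders):

    library(qsub)
    config <- create_qsub_config(
      remote = "user@cluster.example.org",   # hypothetical host
      local_tmp_path = "/tmp/r2gridengine",
      remote_tmp_path = "/scratch/r2gridengine"
    )
    # Each element of X becomes a job submitted via the qsub command
    qsub_lapply(X = 1:10, FUN = function(i) i^2, qsub_config = config)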
qte Quantile Treatment Effects
Provides several methods for computing the Quantile Treatment Effect (QTE) and Quantile Treatment Effect on the Treated (QTET). The main cases covered are:
(i) Treatment is randomly assigned,
(ii) Treatment is as good as randomly assigned after conditioning on some covariates (also called conditional independence or selection on observables),
(iii) Identification is based on a Difference in Differences assumption (several varieties are available in the package).
QuACN Quantitative Analysis of Complex Networks
Offers a set of topological network measures to analyze complex networks structurally.
quadprogXT Quadratic Programming with Absolute Value Constraints
Extends the quadprog package to solve quadratic programs with absolute value constraints and absolute values in the objective function.
qualityTools Statistical Methods for Quality Science
Contains methods associated with the Define, Measure, Analyze, Improve and Control (i.e. DMAIC) cycle of the Six Sigma Quality Management methodology. It covers distribution fitting, normal and non-normal process capability indices, techniques for Measurement Systems Analysis, especially gage capability indices and Gage Repeatability and Reproducibility (Gage R&R) studies, factorial and fractional factorial designs, as well as response surface methods including the use of desirability functions. Improvement via Six Sigma is a project-based strategy that covers five phases: Define (Pareto chart); Measure (probability and quantile-quantile plots, process capability indices for various distributions, and Gage R&R); Analyze (Pareto chart, multi-vari chart, dot plot); Improve (full and fractional factorial, response surface and mixture designs, the desirability approach for simultaneous optimization of more than one response variable, and normal, Pareto and Lenth plots of effects as well as interaction plots); Control (quality control charts, which can be found in the ‘qcc’ package). The focus is on teaching the statistical methodology used in the Quality Sciences.
qualmap Opinionated Approach for Digitizing Semi-Structured Qualitative GIS Data
Provides a set of functions for taking qualitative GIS data, hand drawn on a map, and converting it to a simple features object. These tools are focused on data that are drawn on a map that contains some type of polygon features. For each area identified on the map, the id numbers of these polygons can be entered as vectors and transformed using qualmap.
qualpalr Automatic Generation of Qualitative Color Palettes
Automatic generation of distinct qualitative color palettes using optimization routines on the delta-E CIEDE2000 color difference algorithm.
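A minimal sketch, assuming the qualpal() interface (n colors from a named colorspace preset):

    library(qualpalr)
    pal <- qualpal(n = 5, colorspace = "pretty")
    pal$hex  # hex codes, ready for use as a plotting palette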
qualtRics Download ‘Qualtrics’ Survey Data
Provides functions to import survey results directly into R using the ‘Qualtrics’ API. ‘Qualtrics’ <https://…/> is an online survey and data collection software platform. See <https://…/> for more information about the ‘Qualtrics’ API. This package is community-maintained and is not officially supported by ‘Qualtrics’.
qualvar Implements Indices of Qualitative Variation Proposed by Wilcox (1973)
Implements indices of qualitative variation proposed by Wilcox (1973).
QUALYPSO Partitioning Uncertainty Components of an Incomplete Ensemble of Climate Projections
These functions use data augmentation and Bayesian techniques for the assessment of single-member and incomplete ensembles of climate projections. It provides unbiased estimates of climate change responses of all simulation chains and of all uncertainty variables. It additionally propagates uncertainty due to missing information in the estimates. – Evin, G., B. Hingray, J. Blanchet, N. Eckert, S. Morin, and D. Verfaillie. (2019) <doi:10.1175/JCLI-D-18-0606.1>.
quantable Streamline Descriptive Analysis of Quantitative Data Matrices
Methods which streamline the descriptive analysis of quantitative matrices.
quanteda Quantitative Analysis of Textual Data
A fast, flexible toolset for the management, processing, and quantitative analysis of textual data in R.
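A minimal sketch of the tokenize-then-tabulate workflow (tokens(), dfm() and topfeatures() are core ‘quanteda’ functions):

    library(quanteda)
    txts <- c(d1 = "R makes text analysis easy.",
              d2 = "Text analysis in R is fast.")
    dfmat <- dfm(tokens(txts))  # document-feature matrix
    topfeatures(dfmat)          # most frequent features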
QuantifQuantile Estimation of Conditional Quantiles using Optimal Quantization
Estimation of conditional quantiles using optimal quantization. Construction of an optimal grid of N quantizers, estimation of conditional quantiles and data driven selection of the size N of the grid. Graphical illustrations for the selection of N and of resulting estimated curves or surfaces when the dimension of the covariate is one or two.
quantileDA Quantile Classifier
Code for centroid, median and quantile classifiers.
quantities Quantity Calculus for R Vectors
Integration of the ‘units’ and ‘errors’ packages for a complete quantity calculus system for R vectors, matrices and arrays, with automatic propagation, conversion, derivation and simplification of magnitudes and uncertainties.
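A hedged sketch, assuming the set_quantities() constructor (units and uncertainties propagate through arithmetic):

    library(quantities)
    v <- set_quantities(1:3, m/s, 0.1)   # values with unit and error
    dur <- set_quantities(2, s, 0.05)
    v * dur  # unit becomes m; the uncertainty is propagated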
quantmod Quantitative Financial Modelling Framework
Specify, build, trade, and analyse quantitative financial trading strategies.
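A minimal sketch of the core workflow (getSymbols() and chartSeries() are central ‘quantmod’ functions; downloading requires internet access):

    library(quantmod)
    getSymbols("AAPL", src = "yahoo")            # fetch daily OHLC data
    chartSeries(AAPL, subset = "last 6 months")  # quick financial chart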
quantreg.nonpar Nonparametric Series Quantile Regression
Implements the nonparametric quantile regression method developed by Belloni, Chernozhukov, and Fernandez-Val (2011) to partially linear quantile models. Provides point estimates of the conditional quantile function and its derivatives based on series approximations to the nonparametric part of the model. Provides pointwise and uniform confidence intervals using analytic and resampling methods.
quantregRanger Quantile Regression Forests for ‘ranger’
An implementation of quantile regression forests for the fast random forest package ‘ranger’.
QuantTools Enhanced Quantitative Trading Modelling
Download and organize historical market data from multiple sources like Yahoo (<http://finance.yahoo.com> ), Google (<https://…/finance> ), Finam (<http://…/> ) and IQFeed (<http://…/index.cfm?symbolguide=lookup> ). Code your trading algorithms in modern C++11 with a powerful event-driven tick processing API, including trading costs and exchange communication latency, and transform detailed data seamlessly into R. In just a few lines of code you will be able to visualize every step of your trading model, from tick data to multi-dimensional heat maps.
Quartet Comparison of Phylogenetic Trees Using Quartet and Bipartition Measures
Calculates the number of four-taxon subtrees consistent with a pair of cladograms, calculating the symmetric quartet distance of Bandelt & Dress (1986), Reconstructing the shape of a tree from observed dissimilarity data, Advances in Applied Mathematics, 7, 309-343 <doi:10.1016/0196-8858(86)90038-2>, and using the tqDist algorithm of Sand et al. (2014), tqDist: a library for computing the quartet and triplet distances between binary or general trees, Bioinformatics, 30, 2079-2080 <doi:10.1093/bioinformatics/btu157> for pairs of bifurcating trees.
QuClu Quantile-Based Clustering Algorithms
Various quantile-based clustering algorithms: algorithm CU (Common theta and Unscaled variables), algorithm CS (Common theta and Scaled variables through lambda_j), algorithm VU (Variable-wise theta_j and Unscaled variables) and algorithm VW (Variable-wise theta_j and Scaled variables through lambda_j). Hennig, Viroli, Anderlucci (2018) <arXiv:1806.10403v1>.
queuecomputer Computationally Efficient Queue Simulation
Implementation of a computationally efficient method for simulating queues with arbitrary arrival and service times.
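A minimal sketch, assuming the queue_step() interface (arrival and service times in, departure times and summaries out):

    library(queuecomputer)
    set.seed(1)
    arrivals <- cumsum(rexp(100, rate = 1))  # Poisson arrival times
    service  <- rexp(100, rate = 1.2)        # service requirements
    out <- queue_step(arrivals, service = service, servers = 2)
    summary(out)  # departures, waiting times, utilisation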
quhomology Calculation of Homology of Quandles, Racks, Biquandles and Biracks
This calculates the Quandle, Rack and Degenerate Homology groups of Racks and Biracks (as well as Quandles and Biquandles). In addition, a test is provided to ascertain if a given set with one or two given functions is indeed a biquandle or not.
quickblock Quick Threshold Blocking
Provides functions for assigning treatments in randomized experiments using near-optimal threshold blocking. The package is made with large data sets in mind and derives blocks more than an order of magnitude quicker than other methods.
quickmapr Quickly Map and Explore Spatial Data
While analyzing geospatial data, easy visualization is often needed that allows for quick plotting and simple interactivity. Additionally, visualizing geospatial data in projected coordinates is also desirable. The ‘quickmapr’ package provides a simple method to visualize ‘sp’ and ‘raster’ objects, allows for basic zooming, panning, identifying, and labeling of spatial objects, and does not require that the data be in geographic coordinates.
quickmatch Quick Generalized Full Matching
Provides functions for constructing near-optimal generalized full matching. Generalized full matching is an extension of the original full matching method to situations with more intricate study designs. The package is made with large data sets in mind and derives matches more than an order of magnitude quicker than other methods.
quickPlot A System of Plotting Optimized for Speed and Modularity
A high-level plotting system, built using ‘grid’ graphics, that is optimized for speed and modularity. This has great utility for quick visualizations when testing code, with the key benefit that visualizations are updated independently of one another.
quickReg Build Regression Models Quickly and Display the Results Using ‘ggplot2’
A set of functions to extract results from regression models and plot effect sizes using ‘ggplot2’ seamlessly. While ‘broom’ is useful for converting statistical analysis objects into tidy data frames, ‘coefplot’ is adept at showing multivariate regression results. Given a specific outcome, this package can build regression models automatically, extract results into a data frame and provide a quicker way to summarize models’ statistical findings using ‘ggplot2’.
quickregression Quick Linear Regression
Helps to perform linear regression analysis by reducing manual effort. Reduces the independent variables based on specified p-value and Variance Inflation Factor (VIF) level.
quokar Quantile Regression Outlier Diagnostics with K Left Out Analysis
Diagnostic methods for quantile regression models for detecting influential observations: robust distance methods for general quantile regression models; generalized Cook’s distance and Q-function distance methods for quantile regression models using the asymmetric Laplace distribution. A reference for this method is Luis E. Benites, Víctor H. Lachos, Filidor E. Vilca (2015) <arXiv:1509.05099v1>; mean posterior probability and Kullback-Leibler divergence methods for the Bayes quantile regression model. A reference for this method is Bruno Santos, Heleno Bolfarine (2016) <arXiv:1601.07344v1>.
Quor Quantile Ordering
Functions to compute confidence statistics for ordering population quantiles.
quotedargs A Way of Writing Functions that Quote their Arguments
A facility for writing functions that quote their arguments, may sometimes evaluate them in the environment where they were quoted, and may pass them as quoted to other functions.
qut Quantile Universal Threshold
Selection of a threshold parameter for GLM-lasso to obtain a sparse model with a good compromise between high true positive rate and low false discovery rate.
QVM Questionnaires Validation Module
Implements a multivariate analysis interface for questionnaire validation of Likert-type scale variables.
qwraps2 Quick Wraps 2
A collection of (wrapper) functions the creator found useful for quickly placing data summaries and formatted regression results into .Rnw or .Rmd files. Functions for generating commonly used graphics, such as receiver operating curves or Bland-Altman plots, are also provided by ‘qwraps2’. ‘qwraps2’ is an updated version of the package ‘qwraps’. The original version ‘qwraps’ was never submitted to CRAN but can be found at https://…/qwraps . The implementation and limited scope of the functions within ‘qwraps2’ (https://…/qwraps2 ) is fundamentally different from ‘qwraps’.

R

r.blip Bayesian Network Learning Improved Project
Allows the user to learn Bayesian networks from datasets containing thousands of variables. It focuses on score-based learning, mainly the ‘BIC’ and the ‘BDeu’ score functions. It provides state-of-the-art algorithms for the following tasks: (1) parent set identification – Mauro Scanagatta (2015) <http://…networks-with-thousands-of-variables>; (2) general structure optimization – Mauro Scanagatta (2018) <doi:10.1007/s10994-018-5701-9>, Mauro Scanagatta (2018) <http://…/scanagatta17a.html>; (3) bounded treewidth structure optimization – Mauro Scanagatta (2016) <http://…networks-with-thousands-of-variables>; (4) structure learning on incomplete data sets – Mauro Scanagatta (2018) <doi:10.1016/j.ijar.2018.02.004>. Distributed under the LGPL-3 by IDSIA.
r.jive Perform JIVE Decompositions for Multi-Source Data
Performs the JIVE decompositions on a list of data sets when the data share a dimension, returning low-rank matrices that capture the joint and individual structure of the data. It provides two methods of rank selection when the rank is unknown, a permutation test and a BIC selection algorithm. Also included in the package are three plotting functions for visualizing the variance attributed to each data source: a bar plot that shows the percentages of the variability attributable to joint and individual structure, a heatmap that shows the structure of the variability, and principal component plots.
R.matlab Read and Write MAT Files and Call MATLAB from Within R
Methods readMat() and writeMat() for reading and writing MAT files. For user with MATLAB v6 or newer installed (either locally or on a remote host), the package also provides methods for controlling MATLAB (trademark) via R and sending and retrieving data between R and MATLAB.
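A minimal round-trip sketch using the readMat()/writeMat() methods named above:

    library(R.matlab)
    x <- matrix(1:6, nrow = 2)
    writeMat("x.mat", x = x)  # write an R matrix to a MAT file
    dat <- readMat("x.mat")   # read it back as a named list
    dat$x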
R.rsp Dynamic generation of scientific reports
R.rsp is an R package that implements a compiler for the RSP markup language. RSP can be used to embed dynamic R code in any text-based source document to be compiled into a final document, e.g. RSP-embedded LaTeX into PDF, RSP-embedded Markdown into HTML, RSP-embedded HTML into HTML and so on. The package provides a set of vignette engines making it straightforward to use RSP in vignettes and there are also other vignette engines to, for instance, include static PDF vignettes. Starting with R.rsp v0.20.0 (on CRAN), a vignette engine for including plain LaTeX-based vignettes is also available. The R.rsp package installs out-of-the-box on all common operating systems, including Linux, OS X and Windows.
R.temis Integrated Text Mining Solution
An integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, lexical summary, terms co-occurrences and documents similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from ‘Dow Jones Factiva’, ‘LexisNexis’, ‘Europresse’ and ‘Alceste’ files.
r2d3 Interface to ‘D3’ Visualizations
Suite of tools for using ‘D3’, a library for producing dynamic, interactive data visualizations. Supports translating objects into ‘D3’ friendly data structures, rendering ‘D3’ scripts, publishing ‘D3’ visualizations, incorporating ‘D3’ in R Markdown, creating interactive ‘D3’ applications with Shiny, and distributing ‘D3’ based ‘htmlwidgets’ in R packages.
R2DT Translation of Base R-Like Functions for ‘data.table’ Objects
Some heavily used base R functions are reconstructed to also be compliant to data.table objects. Also, some general helper functions that could be of interest for working with data.table objects are included.
r2glmm Computes R Squared for Mixed (Multilevel) Models
The model R squared and semi-partial R squared for the linear and generalized linear mixed model (LMM and GLMM) are computed with confidence limits. The R squared measure from Edwards et al. (2008) <DOI:10.1002/sim.3429> is extended to the GLMM using penalized quasi-likelihood (PQL) estimation (see Jaeger et al. 2016 <DOI:10.1080/02664763.2016.1193725>). Three methods of computation are provided. First, the Kenward-Roger approach; due to some inconsistency between the ‘pbkrtest’ package and the ‘glmmPQL’ function, the Kenward-Roger approach in the ‘r2glmm’ package is limited to the LMM. Second, the method introduced by Nakagawa and Schielzeth (2013) <DOI:10.1111/j.2041-210x.2012.00261.x> and later extended by Johnson (2014) <DOI:10.1111/2041-210X.12225>; the ‘r2glmm’ package only computes marginal R squared for the LMM and does not generalize the statistic to the GLMM, but confidence limits and semi-partial R squared for fixed effects are useful additions. Lastly, an approach using standardized generalized variance (SGV), which can be used for covariance model selection. Package installation instructions can be found in the readme file.
R2GUESS Wrapper Functions for GUESS
Wrapper functions for GUESS, a GPU-accelerated sparse Bayesian variable selection method for linear regression based analysis of multivariate, correlated outcomes.
R2MLwiN Running ‘MLwiN’ from Within R
An R command interface to the ‘MLwiN’ multilevel modelling software package.
r2pmml Convert R Models to PMML
R wrapper for the JPMML-R library <https://…/jpmml-r>, which converts R models to Predictive Model Markup Language (PMML).
R2ucare Goodness-of-Fit Tests for Capture-Recapture Models
Performs goodness-of-fit tests for capture-recapture models. Also contains several functions to process capture-recapture data.
R3port Report Functions to Create HTML and PDF Files
Create and combine HTML and PDF reports from within R. Possibility to design tables and listings for reporting and also include R plots.
r4lineups Statistical Inference on Lineup Fairness
Since the early 1970s eyewitness testimony researchers have recognised the importance of estimating properties such as lineup bias (is the lineup biased against the suspect, leading to a rate of choosing higher than one would expect by chance?), and lineup size (how many reasonable choices are in fact available to the witness? A lineup is supposed to consist of a suspect and a number of additional members, or foils, whom a poor-quality witness might mistake for the perpetrator). Lineup measures are descriptive, in the first instance, but since the earliest articles in the literature researchers have recognised the importance of reasoning inferentially about them. This package contains functions to compute various properties of laboratory or police lineups, and is intended for use by researchers in forensic psychology and/or eyewitness testimony research. Among others, the r4lineups package includes functions for calculating lineup proportion, functional size, various estimates of effective size, diagnosticity ratio, homogeneity of the diagnosticity ratio, ROC curves for confidence x accuracy data, and the degree of similarity of faces in a lineup.
R6 Classes with reference semantics
The R6 package allows the creation of classes with reference semantics, similar to R’s built-in reference classes. Compared to reference classes, R6 classes are simpler and lighter-weight, and they are not built on S4 classes so they do not require the methods package. These classes allow public and private members, and they support inheritance, even when the classes are defined in different packages.
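A minimal sketch of an R6 class with reference semantics (methods mutate the object in place; returning self invisibly enables chaining):

    library(R6)
    Counter <- R6Class("Counter",
      public = list(
        count = 0,
        add = function(n = 1) {
          self$count <- self$count + n
          invisible(self)  # return self to allow method chaining
        }
      )
    )
    cnt <- Counter$new()
    cnt$add()$add(2)
    cnt$count  # 3: modified in place, no copy was made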
R62S3 Automatic S3 Method Generation from R6
After defining an R6 class, R62S3 is used to automatically generate S3 generics and methods for dispatch. Also allows piping for R6 objects.
R6DS R6 Reference Class Based Data Structures
Provides reference classes implementing some useful data structures. The package implements these data structures by using the reference class R6. Therefore, the classes of the data structures are also reference classes which means that their instances are passed by reference. The implemented data structures include stack, queue, double-ended queue, doubly linked list, set, dictionary and binary search tree. See for example <https://…/Data_structure> for more information about the data structures.
r6extended Extension for ‘R6’ Base Class
Useful methods and data fields to extend the bare bones ‘R6’ class provided by the ‘R6’ package – ls-method, hashes, warning- and message-method, general get-method and a debug-method that assigns self and private to the global environment.
R6Frame R6 Wrapper for Data Frames
Provides a R6 ‘frame’ around data which allows one to create more complex objects/operations based on the underlying data.
RADanalysis Normalization and Study of Rank Abundance Distributions
Provides tools for normalization of rank abundance distributions (RAD) to a desired number of ranks using the MaxRank Normalization method. RADs are commonly used in biology/ecology and are mathematically equivalent to complementary cumulative distribution functions (CCDFs), which are used in physics, linguistics and sociology, and more generally in data science.
radarBoxplot Implementation of the Radar-Boxplot
A wrapper around fmsb::radarchart() creating the radar-boxplot, a plot created by the author during his doctorate in forest resources. The radar-boxplot is a visualization feature suited for multivariate classification/clustering. It provides an intuitive deep understanding of the data.
radarchart Radar Chart from Chart.js
Create interactive radar charts using the ‘Chart.js’ JavaScript library and the ‘htmlwidgets’ package. Chart.js (<http://www.chartjs.org>) is a lightweight library that supports several types of simple chart using the HTML5 canvas element. This package provides an R interface specifically to the radar chart, sometimes called a spider chart, for visualising multivariate data.
radiant Business Analytics using R and Shiny
A platform-independent browser-based interface for business analytics in R, based on the Shiny package.
http://…/introducing-radiant.html
radiant.basics Basics Menu for Radiant: Business Analytics using R and Shiny
The Radiant Basics menu includes interfaces for probability calculation, central limit theorem simulation, comparing means and proportions, goodness-of-fit testing, cross-tabs, and correlation. The application extends the functionality in radiant.data.
radiant.data Business Analytics using R and Shiny
A platform-independent browser-based interface for business analytics in R, based on the Shiny package.
radiant.design Design Menu for Radiant: Business Analytics using R and Shiny
The Radiant design menu includes interfaces for design of experiments, sampling, and sample size calculation. The application extends the functionality in radiant.data.
radiant.model Model Menu for Radiant: Business Analytics using R and Shiny
The Radiant Model menu includes interfaces for linear and logistic regression, Neural Networks, model evaluation, decision analysis, and simulation. The application extends the functionality in radiant.data.
radiant.multivariate Multivariate Menu for Radiant: Business Analytics using R and Shiny
The Radiant Multivariate menu includes interfaces for perceptual mapping, factor analysis, cluster analysis, and conjoint analysis. The application extends the functionality in radiant.data.
radiomics ‘Radiomic’ Image Processing Toolbox
Functions to extract first and second order statistics from images.
radjust Replicability Adjusted p-Values for Two Independent Studies with Multiple Endpoints
Calculates adjusted p-values for the null hypothesis of no replicability across studies for two study designs: (i) a primary and follow-up study, where the features in the follow-up study are selected from the primary study, as described in Bogomolov and Heller (2013) <doi:10.1080/01621459.2013.829002> and Heller, Bogomolov and Benjamini (2014) <doi:10.1073/pnas.1314814111>; (ii) two independent studies, where the features for replicability are first selected in each study separately, as described in Bogomolov and Heller (2018) <doi:10.1093/biomet/asy029>. The latter design is the one encountered in a typical meta-analysis of two studies, but the inference is for replicability rather than for identifying the features that are non-null in at least one study.
Radviz Project Multidimensional Data in 2D Space
An implementation of the radviz projection in R. It enables the visualization of multidimensional data while maintaining the relation to the original dimensions. This package provides functions to create and plot radviz projections, and a number of summary plots that enable comparison and analysis. See Ankerst et al. (1996) <http://…/summary?doi=10.1.1.68.1811> for the original implementation, and Di Caro et al. (2010) <DOI:10.1007/978-3-642-13672-6_13> for the original method for dimensional anchor arrangement.
rafalib Convenience Functions for Routine Data Exploration
A series of shortcuts for routine tasks originally developed by Rafael A. Irizarry to facilitate data exploration.
rafalib package
ragt2ridges Ridge Estimation of the Vector Auto-Regressive (VAR) Processes
Ridge maximum likelihood estimation of vector auto-regressive processes and supporting functions for their exploitation.
rainbow Rainbow Plots, Bagplots and Boxplots for Functional Data
Functions and data sets for functional data display and outlier detection.
rainfarmr Stochastic Precipitation Downscaling with the RainFARM Method
An implementation of the RainFARM (Rainfall Filtered Autoregressive Model) stochastic precipitation downscaling method (Rebora et al. (2006) <doi:10.1175/JHM517.1>). Adapted for climate downscaling according to D’Onofrio et al. (2018) <doi:10.1175/JHM-D-13-096.1> and for complex topography as in Terzago et al. (2018) <doi:10.5194/nhess-18-2825-2018>. The RainFARM method is based on the extrapolation to small scales of the Fourier spectrum of a large-scale precipitation field, using a fixed logarithmic slope and random phases at small scales, followed by a nonlinear transformation of the resulting linearly correlated stochastic field. RainFARM can generate ensembles of spatially downscaled precipitation fields which conserve precipitation at large scales and whose statistical properties are consistent with the small-scale statistics of observed precipitation, based only on knowledge of the large-scale precipitation field.
rakeR Easy Spatial Microsimulation (Raking) in R
Functions for performing spatial microsimulation (‘raking’) in R.
Ramble Combinatory Parser Implementation
Parser generator for R using combinatory parsers. It is inspired by combinatory parsers developed in Haskell.
rAmCharts JavaScript Charts API Tool
API for using the AmChart library. Based on the ‘htmlwidgets’ package, it provides a global architecture to generate ‘JavaScript’ source code for charts. Most classes in the library have an equivalent in R as S4 classes; for those classes, not all properties have been referenced but they can easily be added in the constructors. Complex properties (e.g. ‘JavaScript’ objects) can be passed as named lists. See the examples at <h…/dataknowledge.github.io/rAmCharts/> and <http://…/> for more information about the library.
ramchoice Estimation and Inference in Random Attention Models
It is widely documented in psychology, economics and other disciplines that socio-economic agents do not pay full attention to all available choices, rendering standard revealed preference theory invalid. This package implements the estimation and inference procedures documented in Cattaneo, Ma, Masatlioglu and Suleymanov (2017) <http://…-Masatlioglu-Suleymanov_2017_RAM.pdf>, which utilize standard choice data to partially identify and estimate a decision maker’s preferences. For inference, different simulation-based critical values are provided.
RAMClustR RAMClust: A Novel Feature Clustering Method Enables Spectral-Matching Based Annotation for Metabolomics Data
A feature clustering algorithm for non-targeted mass spectrometric metabolomics data. The method is compatible with GC-MS and LC-MS data, including LC-MS data acquired using indiscriminant MS/MS (MSe, MSall, etc.).
ramcmc Robust Adaptive Metropolis Algorithm
Function for adapting the shape of the random walk Metropolis proposal as specified by robust adaptive Metropolis algorithm by Vihola (2012) <DOI:10.1007/s11222-011-9269-5>. Package also includes fast functions for rank-one Cholesky update and downdate. These functions can be used directly from R or the corresponding C++ header files can be easily linked to other R packages.
ramidst An Interface to the AMIDST Toolbox for Data Stream Processing
Offers a link to some of the functionality of the AMIDST toolbox <http://www.amidsttoolbox.com> for handling data streams. More precisely, the package provides inference and concept drift detection using hybrid Bayesian networks.
ramify Additional Matrix Functionality
Additional matrix functionality for R. Includes a wrapper to the base matrix function that extends its functionality by allowing matrices to be created from character strings and lists. A number of convenience functions have also been added for users more familiar with other scientific languages like Julia, MATLAB/Octave, or Python.
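For example, the MATLAB-style string constructor (the function name mat() is taken from the package documentation; treat the sketch as illustrative):

  library(ramify)
  A <- mat("1, 2, 3; 4, 5, 6")   # rows separated by semicolons
  A                               # a 2 x 3 numeric matrix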
RAMP Regularized Generalized Linear Models with Interaction Effects
This package provides an efficient procedure for fitting the entire solution path for high-dimensional regularized generalized linear models with interaction effects under the strong heredity constraint.
ramsvm Reinforced Angle-Based Multicategory Support Vector Machines
Provides a solution path for Reinforced Angle-based Multicategory Support Vector Machines, with linear learning, polynomial learning, and Gaussian kernel learning.
randcorr Generate a Random p x p Correlation Matrix
Implements the algorithm by Pourahmadi and Wang (2015) <doi:10.1016/j.spl.2015.06.015> for generating a random p x p correlation matrix. Briefly, the idea is to represent the correlation matrix using Cholesky factorization and p(p-1)/2 hyperspherical coordinates (i.e., angles), sample the angles from a particular distribution and then convert to the standard correlation matrix form. The angles are sampled from a distribution with pdf proportional to sin^k(theta) (0 < theta < pi, k >= 1) using the efficient sampling algorithm described in Enes Makalic and Daniel F. Schmidt (2018) <arXiv:1809.05212>.
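The construction is simple enough to sketch in base R. This illustrates the hyperspherical idea only; the exponent schedule below is an assumption, and the package’s own randcorr() should be preferred in practice:

  rsin_k <- function(k) {            # rejection sampler for pdf proportional to sin^k
    repeat {
      theta <- runif(1, 0, pi)
      if (runif(1) < sin(theta)^k) return(theta)
    }
  }
  rand_corr_sketch <- function(p) {
    B <- diag(p)                     # lower-triangular factor, built row by row
    for (i in 2:p) {
      s <- 1                         # running product of sines keeps each row unit-norm
      for (j in seq_len(i - 1)) {
        theta   <- rsin_k(p - j)     # assumed exponent choice; see the paper
        B[i, j] <- s * cos(theta)
        s       <- s * sin(theta)
      }
      B[i, i] <- s
    }
    B %*% t(B)                       # symmetric, with unit diagonal by construction
  }
  round(rand_corr_sketch(4), 2)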
randgeo Generate Random ‘WKT’ or ‘GeoJSON’
Generate random positions (latitude/longitude), Well-known text (WKT) points or polygons, or ‘GeoJSON’ points or polygons.
randnet Random Network Model Selection and Parameter Tuning
Model selection and parameter tuning procedures for a class of random network models. The model selection can be done by a general cross-validation framework called ECV from Li et al. (2016) <arXiv:1612.04717>. Several other model-based and task-specific methods are also included, such as NCV from Chen and Lei (2016) <arXiv:1411.1715>, the likelihood ratio method from Wang and Bickel (2015) <arXiv:1502.02069>, and spectral methods from Le and Levina (2015) <arXiv:1507.00827>. Many network analysis methods are also implemented, such as regularized spectral clustering (Amini et al. 2013 <doi:10.1214/13-AOS1138>) and its degree-corrected version, and graphon neighborhood smoothing (Zhang et al. 2015 <arXiv:1509.08588>).
random True Random Numbers using random.org
The true random number service provided by the random.org website created by Mads Haahr samples atmospheric noise via radio tuned to an unused broadcasting frequency together with a skew correction algorithm due to John von Neumann. More background is available in the included vignette based on an essay by Mads Haahr. In its current form, the package offers functions to retrieve random integers, randomized sequences and random strings.
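For example, drawing true random integers (requires network access; argument names as documented in the package):

  library(random)
  x <- randomNumbers(n = 10, min = 1, max = 100, col = 1)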
randomcoloR Generate Attractive Random Colors
Simple methods to generate attractive random colors. The random colors are from a wrapper of ‘randomColor.js’ <https://…/randomColor>. In addition, it also generates optimally distinct colors based on k-means (inspired by ‘IWantHue’ <https://…/iwanthue> ).
RandomFieldsUtils Utilities for the Simulation and Analysis of Random Fields
Various utilities are provided that might be used in spatial statistics and elsewhere. It delivers a method for solving linear equations that checks the sparsity of the matrix before any algorithm is used. Furthermore, it includes the Struve functions.
randomForest Breiman and Cutler’s random forests for classification and regression
Classification and regression based on a forest of trees using random inputs.
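A typical call:

  library(randomForest)
  set.seed(42)
  fit <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)
  print(fit)                          # OOB error estimate and confusion matrix
  importance(fit)                     # variable importance measures
  predict(fit, newdata = iris[1:3, ])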
randomForest.ddR Distributed ‘randomForest’ for Big Data using ‘ddR’ API
Distributed training and prediction of random forest models based upon ‘randomForest’ package.
randomizeR Randomization for Clinical Trials
This tool enables the user to choose a randomization procedure based on sound scientific criteria. It comprises the generation of randomization sequences as well as the assessment of randomization procedures based on carefully selected criteria. Furthermore, ‘randomizeR’ provides a function for the comparison of randomization procedures.
randomsearch Random Search for Expensive Functions
Simple Random Search function for the ‘smoof’ and ‘ParamHelpers’ ecosystem with termination criteria and parallelization.
RandPro Random Projection
Performs random projection using the Johnson-Lindenstrauss (JL) lemma (see William B. Johnson and Joram Lindenstrauss (1984) <doi:10.1090/conm/026/737400>). Random projection is a technique in which data in a high-dimensional space are projected into a low-dimensional space using the JL transform. The original high-dimensional data matrix is multiplied by a low-dimensional random matrix, which results in a reduced matrix. The random matrix can be generated as a Gaussian matrix or a sparse matrix.
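The core idea is easy to sketch in base R (this illustrates the technique itself, not RandPro’s own interface):

  set.seed(1)
  n <- 100; d <- 1000; k <- 50
  X <- matrix(rnorm(n * d), n, d)                   # n points in d dimensions
  P <- matrix(rnorm(d * k, sd = 1 / sqrt(k)), d, k) # Gaussian projection matrix
  X_low <- X %*% P                                  # reduced n x k matrix
  # Pairwise distances are approximately preserved (the JL lemma):
  dist(X[1:4, ]); dist(X_low[1:4, ])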
randquotes Get Random Quotes from Quotes on Design API
Connects to the site <http://…/> that uses the ‘WordPress’ JSON REST API to provide a way for you to grab quotes.
randstr Generate Random Strings
Generate random strings with a specified symbol set and a specified distribution of string lengths.
rangeModelMetadata Provides Templates for Metadata Files Associated with Species Range Models
Range Modeling Metadata Standards (RMMS) address three challenges: they (i) are designed for convenience to encourage use, (ii) accommodate a wide variety of applications, and (iii) are extensible to allow the community of range modelers to steer it as needed. RMMS are based on a data dictionary that specifies a hierarchical structure to catalog different aspects of the range modeling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to provide their own values. Associated manuscript in review.
ranger A Fast Implementation of Random Forests
A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class ‘gwaa.data’ (R package GenABEL) can be directly analyzed.
ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R – implementation
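Typical usage mirrors randomForest:

  library(ranger)
  fit <- ranger(Species ~ ., data = iris, num.trees = 500)
  fit$prediction.error                 # out-of-bag prediction error
  predict(fit, data = iris[1:3, ])$predictions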
RanglaPunjab Displays Palette of 5 Colors
Displays palettes of 5 colors based on photos depicting the unique and vibrant culture of Punjab in Northern India. Since Punjab translates to ‘Land of 5 Rivers’, there are 5 colors per palette. If users need more than 5 colors, they can merge 2 palettes to create their own color combination. Users can view the colors of 1 or 2 palettes together, and can also list all the palette choices. And last but not least, users can see the photo that inspired a particular palette.
rankdist Distance Based Ranking Models
Implements distance based probability models for ranking data. The supported distance metrics include Kendall distance and Weighted Kendall distance. Mixture models are also supported.
rankFD Rank-Based Tests for General Factorial Designs
The rankFD() function calculates the Wald-type statistic (WTS) and the ANOVA-type statistic (ATS) for nonparametric factorial designs, e.g., for count, ordinal or score data in a crossed design with an arbitrary number of factors.
RankingProject The Ranking Project: Visualizations for Comparing Populations
Functions to generate plots and tables for comparing independently-sampled populations. Companion package to ‘A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals’ by Wright, Klein, and Wieczorek (2017, in press).
RANKS Ranking of Nodes with Kernelized Score Functions
Implementation of kernelized score functions and other semi-supervised learning algorithms for node label ranking in biomolecular networks. RANKS can be easily applied to a large set of different relevant problems in computational biology, ranging from automatic protein function prediction, to gene disease prioritization and drug repositioning, and more in general to any bioinformatics problem that can be formalized as a node label ranking problem in a graph. The modular nature of the implementation allows users to experiment with different score functions and kernels, to easily compare the results with baseline network-based methods such as label propagation and random walk algorithms, and to enlarge the algorithmic scheme by adding novel user-defined score functions and kernels.
RANN.L1 Fast Nearest Neighbour Search (Wraps ANN Library) Using L1 Metric
Finds the k nearest neighbours for every point in a given dataset in O(N log N) time using Arya and Mount’s ANN library (v1.1.3). There is support for approximate as well as exact searches, fixed radius searches and ‘bd’ as well as ‘kd’ trees. The distance is computed using the L1 (Manhattan, taxicab) metric. Please see package ‘RANN’ for the same functionality using the L2 (Euclidean) metric.
rapiclient Dynamic OpenAPI/Swagger Client
Access services specified in OpenAPI (formerly Swagger) format. It is not a code generator. Client is generated dynamically as a list of R functions.
RApiDatetime R API Datetime
Access to the C-level R date and datetime code is provided for C-level API use by other packages via registration of native functions. Client packages simply include a single header ‘RApiDatetime.h’ provided by this package, and also ‘import’ it. The R Core group is the original author of the code made available with slight modifications by this package.
rapidjsonr ‘Rapidjson’ C++ Header Files
Provides JSON parsing capability through the ‘Rapidjson’ ‘C++’ header-only library.
rapidraker Rapid Automatic Keyword Extraction (RAKE) Algorithm
A ‘Java’ implementation of the RAKE algorithm (Rose, S., Engel, D., Cramer, N. and Cowley, W. (2010) <doi:10.1002/9780470689646.ch1>), which can be used to extract keywords from documents without any training data.
rapidxmlr ‘Rapidxml’ C++ Header Files
Provides XML parsing capability through the ‘Rapidxml’ ‘C++’ header-only library.
rapier Turn your R code into a web API.
rapier allows you to create a REST API by merely decorating your existing R source code with special comments.
Convert R Code to a Web API
rare Linear Model with Tree-Based Lasso Regularization for Rare Features
Implementation of an alternating direction method of multipliers algorithm for fitting a linear model with tree-based lasso regularization, which is proposed in Algorithm 1 of Yan and Bien (2018) <arXiv:1803.06675>. The package allows efficient model fitting on the entire 2-dimensional regularization path for large datasets. The complete set of functions also makes the entire process of tuning regularization parameters and visualizing results hassle-free.
rarhsmm Regularized Autoregressive Hidden Semi Markov Model
Fit Gaussian hidden Markov (or semi-Markov) models with/without autoregressive coefficients and with/without regularization. The fitting algorithm for the hidden Markov model is illustrated by Rabiner (1989) <doi:10.1109/5.18626>. The shrinkage estimation on the covariance matrices is based on the graphical lasso method of Friedman (2007) <doi:10.1093/biostatistics/kxm045>. The shrinkage estimation on the autoregressive coefficients uses the elastic net shrinkage detailed in Zou (2005) <doi:10.1111/j.1467-9868.2005.00503.x>.
rasciidoc Create Reports Using R and ‘asciidoc’
Inspired by Karl Broman’s reader on using ‘knitr’ with ‘asciidoc’ (<http://…/asciidoc.html>), this is merely a wrapper to ‘knitr’ and ‘asciidoc’.
rasterImage An Improved Wrapper of Image()
This is a wrapper function for image(), which makes reasonable raster plots with nice axes and other useful features.
rasterize Rasterize Graphical Output
Provides R functions to selectively rasterize components of ‘grid’ output.
rasterKernelEstimates Kernel Based Estimates on in-Memory Raster Images
Performs kernel based estimates on in-memory raster images from the raster package. These kernel estimates include local means, variances, modes, and quantiles. All results are in the form of raster images, preserving original resolution and projection attributes.
rasterList A Raster Where Cells are Generic Objects
An S4 class has been created such that complex operations can be executed on each cell of a raster map. A raster of objects contains the traditional raster map with the addition of a list of generic objects: one object for each raster cell. This allows complex map algebra to be written in a few lines of R code. Two environmental applications are presented: frequency analysis of a raster map of precipitation, and creation of a raster map of soil water retention curves.
ratelimitr Rate Limiting for R
Limits the rate at which one or more functions can be called.
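A short sketch, assuming the package’s limit_rate()/rate() interface:

  library(ratelimitr)
  f <- function() Sys.time()
  f_limited <- limit_rate(f, rate(n = 10, period = 60))  # at most 10 calls per minute
  f_limited()    # calls beyond the limit wait until allowed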
ratematrix Bayesian Estimation of the Evolutionary Rate Matrix
Estimates the evolutionary rate matrix (R) using Markov chain Monte Carlo (MCMC) as described in Caetano and Harmon (2017) <doi:10.1111/2041-210X.12826>. The package has functions to run MCMC chains, plot results, evaluate convergence, and summarize posterior distributions.
ratesci Confidence Intervals for Comparisons of Binomial or Poisson Rates
Computes confidence intervals for the rate (or risk) difference (‘RD’) or ratio (‘RR’) for independent binomial proportions or Poisson rates, or for odds ratio (‘OR’, binomial only). Also confidence intervals for a single binomial or Poisson rate. Includes asymptotic score methods including skewness corrections, which have been developed from Miettinen & Nurminen (1985) <doi:10.1002/sim.4780040211> and Gart & Nam (1988) <doi:10.2307/2531848>. Also includes MOVER methods (Method Of Variance Estimates Recovery), derived from the Newcombe method but using equal-tailed Jeffreys intervals. Also methods for stratified calculations (e.g. meta-analysis), either assuming fixed effects or incorporating stratum heterogeneity.
RATest Randomization Tests
A collection of randomization tests, data sets and examples. The current version focuses on the description and implementation of a permutation test for testing the continuity assumption of the baseline covariates in the sharp regression discontinuity design (RDD) as in Canay and Kamat (2017) <https://goo.gl/UZFqt7>. More specifically, it allows the user to select a set of covariates and test the aforementioned hypothesis using a permutation test based on the Cramér-von Mises test statistic. Graphical inspection of the empirical CDF and histograms for the variables of interest is also supported in the package.
RatingScaleReduction Rating Scale Reduction Procedure
Implements a new procedure for reducing the number of items in a rating scale.
RationalExp Rationalizing Rational Expectations. Tests and Deviations
We implement a test of the rational expectations hypothesis based on the marginal distributions of realizations and subjective beliefs from D’Haultfoeuille, Gaillac, and Maurel (2018) <doi:10.3386/w25274>. This test can be used in cases where realizations and subjective beliefs are observed in two different datasets that cannot be matched, or when they are observed in the same dataset. The package also computes the estimator of the minimal deviations from rational expectations that can be rationalized by the data.
rattle.data Rattle Datasets
Contains the datasets used as default examples by the rattle package. The datasets themselves can be used independently of the rattle package to illustrate analytics, data mining, and data science tasks.
rayshader Create and Visualize Hillshaded Maps from Elevation Matrices
Uses a combination of raytracing, spherical texture mapping, lambertian reflectance, and ambient occlusion to produce hillshades of elevation matrices. Includes water detection and layering functions, programmable color palette generation, several built-in textures, and 2D and 3D plotting options.
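A quick sketch using the built-in volcano elevation matrix (function and texture names per the package documentation; treat as illustrative):

  library(rayshader)
  hillshade <- sphere_shade(volcano, texture = "desert")  # spherical texture mapping
  hillshade <- add_shadow(hillshade, ray_shade(volcano))  # raytraced shadows
  plot_map(hillshade)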
rBayesianOptimization Bayesian Optimization of Hyperparameters
A pure R implementation of Bayesian global optimization with Gaussian processes.
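A toy example, assuming the documented convention that the objective returns list(Score, Pred):

  library(rBayesianOptimization)
  target <- function(x) list(Score = -(x - 2)^2, Pred = 0)
  res <- BayesianOptimization(target,
                              bounds = list(x = c(-5, 5)),
                              init_points = 5, n_iter = 10,
                              verbose = FALSE)
  res$Best_Par    # should be near x = 2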
RBesT R Bayesian Evidence Synthesis Tools
Tool-set to support Bayesian evidence synthesis. This includes meta-analysis, (robust) prior derivation from historical data, operating characteristics and analysis (1 and 2 sample cases).
Rbgs Reading and Background Subtraction in Videos
Methods that allow video reading and loading in R. Also provides nine different methods for background subtraction.
rbi R Interface to LibBi
Provides a complete R interface to LibBi, a library for Bayesian inference (see http://libbi.org for more information). This includes functions for manipulating LibBi models, for reading and writing LibBi input/output files, for converting LibBi output to provide traces for use with the coda package, and for running LibBi from R.
rbi.helpers ‘RBi’ Helper Functions
Contains a collection of helper functions to use with ‘RBi’, the R interface to ‘LibBi’, described in Murray et al. (2015) <doi:10.18637/jss.v067.i10>. It contains functions to adapt the proposal distribution and number of particles in particle Markov-Chain Monte Carlo, as well as calculating the Deviance Information Criterion (DIC) and converting between times in ‘LibBi’ results and R time/dates.
rbin Tools for Binning Data
Manually bin data using weight of evidence and information value. Includes other binning methods such as equal length, quantile and winsorized. Options for combining levels of categorical data are also available. Dummy variables can be generated based on the bins created using any of the available binning methods. References: Siddiqi, N. (2006) <doi:10.1002/9781119201731.biblio>.
rbit Binary Indexed Tree
A simple implementation of a binary indexed tree in R. The BinaryIndexedTree class supports construction of a binary indexed tree from a vector, updating a value in the vector, and querying the sum over an interval of the vector.
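The underlying data structure is easy to illustrate in plain R (a functional sketch of a Fenwick tree, not the package’s class interface):

  bit_update <- function(tree, i, delta) {  # add delta at position i
    while (i <= length(tree)) {
      tree[i] <- tree[i] + delta
      i <- i + bitwAnd(i, -i)               # jump to the next responsible node
    }
    tree
  }
  bit_query <- function(tree, i) {          # prefix sum over positions 1..i
    s <- 0
    while (i > 0) {
      s <- s + tree[i]
      i <- i - bitwAnd(i, -i)
    }
    s
  }
  tree <- integer(8)
  for (v in 1:8) tree <- bit_update(tree, v, v)
  bit_query(tree, 4)    # 1 + 2 + 3 + 4 = 10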
Rblpapi R Access to Bloomberg API
Rblpapi provides R with API access to data and calculations from Bloomberg Finance L.P.
GitHub
rblt Bio-Logging Toolbox
An R-shiny application to plot datalogger time series at microsecond precision, such as acceleration, temperature, pressure and light intensity from CATS, AXY-TREK and WACU bio-loggers. It is possible to link behavioral labels extracted from the ‘BORIS’ software <http://www.boris.unito.it> or manually written in a csv file. CATS bio-loggers are manufactured by <http://www.cats.is>, AXY-TREK by <http://www.technosmart.eu> and WACU by <http://…/-MIBE-.html>.
RBMRB BMRB Data Access and Visualization
The Biological Magnetic Resonance Data Bank (BMRB, <http://…/>) collects, annotates, archives, and disseminates (worldwide in the public domain) the important spectral and quantitative data derived from NMR (Nuclear Magnetic Resonance) spectroscopic investigations of biological macromolecules and metabolites. This package provides an interface to the BMRB database for easy data access and includes a minimal set of data visualization functions. Users are encouraged to make their own data visualizations using BMRB data.
rbokeh R interface to Bokeh
R interface to Bokeh.
http://…parison-of-ggplot2-rbokeh-plotting-idioms
Rborist Extensible, Parallelizable Implementation of the Random Forest Algorithm
Scalable decision tree training and prediction.
RBPcurve The Residual-based Predictiveness Curve
The package provides a visual tool, the RBP curve, to assess the performance of prediction models.
rbraries Interface to the ‘Libraries.io’ API
Interface to the ‘Libraries.io’ API (<https://…/api> ). ‘Libraries.io’ indexes data from 36 different package managers for programming languages.
rbtc Bitcoin API
Implementation of the RPC-JSON API for Bitcoin and utility functions for address creation and content analysis of the blockchain.
rbtt Alternative Bootstrap-Based t-Test Aiming to Reduce Type-I Error for Non-Negative, Zero-Inflated Data
Tu & Zhou (1999) <doi:10.1002/(SICI)1097-0258(19991030)18:20%3C2749::AID-SIM195%3E3.0.CO;2-C> showed that comparing the means of populations whose data-generating distributions are non-negative with excess zero observations is a problem of great importance in the analysis of medical cost data. In the same study, Tu & Zhou discuss that it can be difficult to control type-I error rates of general-purpose statistical tests for comparing the means of these particular data sets. This package allows users to perform a modified bootstrap-based t-test that aims to better control type-I error rates in these situations.
rbvs Ranking-Based Variable Selection
Implements the Ranking-Based Variable Selection algorithm for variable selection in high-dimensional data.
RCA Relational Class Analysis
Relational Class Analysis (RCA) is a method for detecting heterogeneity in attitudinal data (as described in Goldberg A., 2011, Am. J. Soc, 116(5)).
RCALI Calculation of the Integrated Flow of Particles Between Polygons
Calculate the flow of particles between polygons by two integration methods: integration by a cubature method and integration on a grid of points. Annie Bouvier, Kien Kieu, Kasia Adamczyk and Herve Monod (2009) <doi:10.1016/j.envsoft.2008.11.006>.
rcane Different Numeric Optimizations to Estimate Parameter Coefficients
Different numeric optimizations are used to estimate the coefficients of models such as linear regression and neural networks. This package covers parameter estimation in linear regression using several such methods: batch gradient descent, stochastic gradient descent, mini-batch gradient descent and coordinate descent. References: Kiwiel, Krzysztof C. (2001) <doi:10.1007/PL00011414>; Yu Nesterov (2004) <ISBN:1-4020-7553-7>; Ferguson, Thomas S. (1982) <doi:10.1080/01621459.1982.10477894>; Zeiler, Matthew D. (2012) <arXiv:1212.5701>; Wright, Stephen J. (2015) <arXiv:1502.04759>.
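For intuition, batch gradient descent for simple linear regression can be sketched in a few lines of base R (illustrative only; the learning rate and iteration count are arbitrary choices, not the package’s defaults):

  X <- cbind(1, mtcars$wt)                # design matrix: intercept + one predictor
  y <- mtcars$mpg
  beta <- c(0, 0); alpha <- 0.01          # coefficients and learning rate
  for (iter in 1:10000) {
    grad <- t(X) %*% (X %*% beta - y) / length(y)  # gradient of mean squared error
    beta <- beta - alpha * grad
  }
  beta                                     # approaches coef(lm(mpg ~ wt, mtcars))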
rcartocolor ‘CARTOColors’ Palettes
Provides color schemes for maps and other graphics designed by ‘CARTO’ as described at <https://…/>. It includes four types of palettes: aggregation, diverging, qualitative, and quantitative.
rCBA CBA Classifier for R
Provides implementations of rule pruning algorithms based on ‘Classification Based on Associations’ (CBA). It can be used for building classification models from association rules. Rules are pruned in the order of precedence given by the sort criteria and a default rule is added. CBA was originally proposed by Liu, B., Hsu, W. and Ma, Y. (1998), Integrating Classification and Association Rule Mining, Proceedings KDD-98, New York, 27-31 August, AAAI, pp. 80-86.
rcc Parametric Bootstrapping to Control Rank Conditional Coverage
Functions to implement the parametric and non-parametric bootstrap confidence interval methods described in Morrison and Simon (2017) <arXiv:1702.06986>.
Rcereal C++ Header Files of ‘cereal’
To facilitate using ‘cereal’ with Rcpp. ‘cereal’ is a header-only C++11 serialization library. ‘cereal’ takes arbitrary data types and reversibly turns them into different representations, such as compact binary encodings, XML, or JSON. ‘cereal’ was designed to be fast, light-weight, and easy to extend – it has no external dependencies and can be easily bundled with other code or used standalone.
rchallenge A Simple Datascience Challenge System
A simple datascience challenge system using R Markdown and Dropbox. It requires no network configuration, does not depend on external platforms like e.g. Kaggle and can be easily installed on a personal computer.
rCharts Interactive JS Charts from R http://rcharts.io
rCharts is an R package to create, customize and publish interactive JavaScript visualizations from R using a familiar lattice-style plotting interface.
rchie A Parser for ‘ArchieML’
Parses the ‘ArchieML’ format from the New York Times <http://archieml.org>. Also provides utilities for retrieving Google Drive documents for parsing.
RchivalTag Analyzing Archival Tagging Data
A set of functions to generate, access and analyze standard data products from archival tagging data.
RChronoModel Post-Processing of the Markov Chain Simulated by ChronoModel
Provides a list of functions for the statistical analysis and post-processing of the Markov chains simulated by ChronoModel (see <http://www.chronomodel.fr> for more information). ChronoModel is user-friendly software for constructing chronological models in a Bayesian framework. Its output is a Markov chain sampled from the posterior distribution of the dates composing the chronology.
rcitoid Client for ‘Citoid’
Client for ‘Citoid’ (<https://…/Citoid> ), an API for getting citations for various scholarly work identifiers found on ‘Wikipedia’.
Rclean A Tool for Writing Cleaner, more Transparent Code
To help create clearer, more concise code, this toolbox helps coders isolate the essential parts of a script that produce a chosen result, such as an object, tables and figures written to disk, and even warnings and errors. This work was funded by US National Science Foundation grant SSI-1450277 for applications of End-to-End Data Provenance.
RClickhouse A ‘DBI’ Interface to the ‘Yandex Clickhouse’ Database Providing Basic ‘dplyr’ Support
‘Yandex Clickhouse’ (<https://…/>) is a high-performance relational column-store database to enable big data exploration and ‘analytics’ scaling to petabytes of data. Methods are provided that enable working with ‘Yandex Clickhouse’ databases via ‘DBI’ methods and using ‘dplyr’/‘dbplyr’ idioms.
rclipboard Shiny/R Wrapper for ‘clipboard.js’
Leverages the functionality of ‘clipboard.js’, a JavaScript library for HTML5-based copy-to-clipboard from web pages (see <https://clipboardjs.com> for more information), and provides a reactive copy-to-clipboard UI button component, called ‘rclipButton’, for ‘shiny’ R applications.
rClr R package for accessing .NET
An R package to inter-operate with arbitrary .NET code (Microsoft .NET and Mono).
https://rclr.codeplex.com
rcmdcheck Run ‘R CMD check’ from ‘R’ and Capture Results
Run ‘R CMD check’ from ‘R’ programmatically, and capture the results of the individual checks.
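A minimal sketch (the path is a placeholder):

  library(rcmdcheck)
  res <- rcmdcheck("path/to/package", args = "--no-manual")
  res$errors; res$warnings; res$notes     # the captured individual check results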
RcmdrPlugin.aRnova R Commander Plug-in for Repeated-Measures ANOVA
R Commander plug-in for repeated-measures and mixed-design (‘split-plot’) ANOVA. It adds a new menu entry for repeated measures that can handle up to three within-subject factors and, optionally, one or several between-subject factors. It also provides supplementary options to the oneWayAnova() and multiWayAnova() functions, such as the choice of ANOVA type, display of effect sizes and post hoc analysis for multiWayAnova().
RcmdrPlugin.BiclustGUI ‘Rcmdr’ Plug-in GUI for Biclustering
A plug-in for R Commander (‘Rcmdr’). The package is a Graphical User Interface (GUI) in which several biclustering methods can be executed, followed by diagnostics and plots of the results. Further, the GUI also has the possibility to connect the methods to more general diagnostic packages for biclustering. Biclustering methods from ‘biclust’, ‘fabia’, ‘s4vd’, ‘iBBiG’, ‘isa2’, ‘rqubic’ and ‘BicARE’ are implemented. Additionally, ‘superbiclust’ and ‘BcDiag’ are also implemented to allow further investigation of results. The GUI also provides a couple of extra utilities to export, save, search through and plot the results. ‘RcmdrPlugin.BiclustGUI’ also provides a very specific framework for biclustering in which new methods, diagnostics and plots can be added. Scripts were prepared so that R-package developers can freely design their own dialogs in the GUI, which can then be added by the maintainer of ‘RcmdrPlugin.BiclustGUI’. These scripts do not require any knowledge of ‘tcltk’ and ‘Rcmdr’ and are easy to fill in.
RcmdrPlugin.Export Export R Output to LaTeX or HTML
Export Rcmdr output to LaTeX or HTML code. The plug-in was originally intended to facilitate exporting Rcmdr output to formats other than ASCII text and to provide R novices with an easy-to-use, easy-to-access reference on exporting R objects to formats suited for printed output. The package documentation contains several pointers on creating reports, either by using conventional word processors or LaTeX/LyX.
RcmdrPlugin.FuzzyClust R Commander Plug-in for Fuzzy Clustering Methods (Fuzzy C-Means and Gustafson Kessel)
The R Commander plug-in for fuzzy clustering methods. This plug-in provides a graphical user interface for 2 fuzzy clustering methods (Fuzzy C-Means/FCM and Gustafson-Kessel-Babuska). For cluster validation, the plug-in uses the Xie-Beni index, MPC index, and CE index. For statistical testing (testing for significant differences between groupings/clusterings), it uses MANOVA with Pillai’s trace statistic. To stabilize the results, the package provides a soft voting cluster ensemble function. Visualization of the results is provided via a plugin that must be loaded in Rcmdr.
RcmdrPlugin.GWRM R Commander Plug-in for Fitting Generalized Waring Regression Models
Provides an Rcmdr plug-in based on the ‘GWRM’ package.
RcmdrPlugin.KMggplot2 An Rcmdr Plug-in for Kaplan-Meier Plots and Other Plots by Using the ggplot2 Package
A GUI front-end for ggplot2 allows Kaplan-Meier plot, histogram, Q-Q plot, box plot, errorbar plot, scatter plot, line chart, pie chart, bar chart, contour plot, and distribution plot.
RcmdrPlugin.OptimClassifier Create the Best Train for Classification Models
An R Commander ‘plug-in’ providing an interface to OptimClassifier functions.
RcmdrPlugin.PcaRobust R Commander Plug-in for Robust Principal Component Analysis
The R Commander plug-in for robust principal component analysis. It provides a graphical user interface for principal component analysis (PCA) using the Hubert algorithm.
RcmdrPlugin.RiskDemo R Commander Plug-in for Risk Demonstration
R Commander plug-in to demonstrate various actuarial and financial risks. It includes valuation of bonds and stocks, portfolio optimization, classical ruin theory and demography.
RcmdrPlugin.sutteForecastR ‘Rcmdr’ Plugin for Alpha-Sutte Indicator ‘sutteForecastR’
The ‘sutteForecastR’ package implements the Alpha-Sutte indicator. To make ‘sutteForecastR’ user friendly, we developed an ‘Rcmdr’ plug-in based on the Alpha-Sutte indicator function.
RcmdrPlugin.TeachStat RCommander Plugin for Teaching Statistical Methods
RCommander plugin for teaching statistical methods. It adds a new menu that makes it easier to teach the main concepts of the main statistical methods.
Rcoclust Co-Clustering with Document-Term Matrix
Several co-clustering algorithms are implemented for sparse binary and contingency matrices. They aim at a simultaneous clustering of the rows and the columns via an objective function.
RColorBrewer ColorBrewer Palettes
Provides color schemes for maps (and other graphics) designed by Cynthia Brewer as described at http://colorbrewer2.org
rcorpora A Collection of Small Text Corpora of Interesting Data
A collection of small text corpora of interesting data. It contains all data sets from https://…/corpora . Some examples: names of animals: birds, dinosaurs, dogs; foods: beer categories, pizza toppings; geography: English towns, rivers, oceans; humans: authors, US presidents, occupations; science: elements, planets; words: adjectives, verbs, proverbs, US president quotes.
Rcpp Seamless R and C++ Integration
The ‘Rcpp’ package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents which facilitates both writing of new code as well as easier integration of third-party libraries. Documentation about ‘Rcpp’ is provided by several vignettes included in this package, via the ‘Rcpp Gallery’ site at <http://gallery.rcpp.org>, the paper by Eddelbuettel and Francois (2011, JSS), and the book by Eddelbuettel (2013, Springer); see ‘citation(‘Rcpp’)’ for details on these last two.
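The quickest way in is cppFunction(), which compiles and binds a C++ function in one call:

  library(Rcpp)
  cppFunction('
    double sumC(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); ++i) total += x[i];
      return total;
    }
  ')
  sumC(c(1, 2, 3))    # 6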
RcppAlgos Tools for Combinatorics and Computational Mathematics
Provides optimized functions implemented in C++ with ‘Rcpp’. There is a generalized combinations function that is highly efficient (in both speed and memory). There are optional constraint arguments that, when employed, generate all combinations of a vector meeting specific criteria (e.g. finding all combinations such that the sum is less than a bound). Additionally, there are various sieving functions that quickly generate essential components for problems common in computational mathematics (e.g. the number of coprime elements, divisors, prime factorizations, and complete factorizations for many numbers, as well as generating primes in a range).
RcppAnnoy Rcpp Bindings for Annoy, a Library for Approximate Nearest Neighbors
Annoy is a small C++ library for Approximate Nearest Neighbors written for efficient memory usage as well as an ability to load from / save to disk. This package provides an R interface by relying on the Rcpp and BH packages, exposing the same interface as the original Python wrapper to Annoy. See <https://…/annoy> for more on Annoy. Annoy is released under Version 2.0 of the Apache License. Also included is a small Windows port of mmap which is released under the MIT license.
RcppAPT Rcpp Interface to the APT Package Manager
Debian and its derivatives like Ubuntu utilize a powerful package managing backend/frontend combination in APT (A Packaging Tool). Accessible at the command line via the front-ends ‘apt’, ‘apt-get’, ‘apt-cache’, … as well as numerous GUI variants, it is implemented using the library ‘libapt-pkg’. This small package provides R with access to this library via Rcpp.
RcppArrayFire R and ArrayFire library via Rcpp.
RcppBlaze ‘Rcpp’ Integration for the ‘Blaze’ High-Performance C++ Math Library
‘Blaze’ is an open-source, high-performance C++ math library for dense and sparse arithmetic. With its state-of-the-art Smart Expression Template implementation, Blaze combines the elegance and ease of use of a domain-specific language with HPC-grade performance, making it one of the most intuitive and fastest C++ math libraries available. The Blaze library offers: high performance through the integration of BLAS libraries and manually tuned HPC math kernels; vectorization by SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2, AVX-512, FMA, and SVML; parallel execution by OpenMP, C++11 threads and Boost threads (Boost threads are disabled in RcppBlaze); the intuitive and easy-to-use API of a domain-specific language; unified arithmetic with dense and sparse vectors and matrices; thoroughly tested matrix and vector arithmetic; and completely portable, high-quality C++ source code. The RcppBlaze package includes the header files from the Blaze library, disabling some functionality related to linking against the thread and system libraries, which makes RcppBlaze a header-only library. Therefore, users do not need to install Blaze or its dependency Boost. Blaze is licensed under the New (Revised) BSD license, while RcppBlaze (the Rcpp bindings/bridge to Blaze) is licensed under the GNU GPL version 2 or later, as is the rest of Rcpp. Note that since the latest version of Blaze (3.0) commits to C++14, which is not used by most R users, we use version 2.6 of Blaze, which is C++98-compatible, to support the most compilers and systems.
RcppCCTZ ‘Rcpp’ Bindings for the ‘CCTZ’ Library
‘Rcpp’ access to the ‘CCTZ’ timezone library is provided. ‘CCTZ’ is a C++ library for translating between absolute and civil times using the rules of a time zone. The ‘CCTZ’ source code, released under the Apache 2.0 License, is included in this package. See <https://…/cctz> for more details.
RcppCNPy Read-Write Support for NumPy Files via Rcpp
The cnpy library written by Carl Rogers provides read and write facilities for files created with (or for) the NumPy extension for Python. Vectors and matrices of numeric types can be read or written to and from files as well as compressed files. Support for integer files is available if the package has been built with -std=c++11 which is the default starting with release 0.2.3 following the release of R 3.1.0.
RcppCWB ‘Rcpp’ Bindings for the ‘Corpus Workbench’ (‘CWB’)
‘Rcpp’ bindings for the C code of the ‘Corpus Workbench’ (‘CWB’), an indexing and query engine to efficiently analyze large corpora (<http://cwb.sourceforge.net>). ‘RcppCWB’ is licensed under the GNU GPL-3, in line with the GPL-3 license of the ‘CWB’ (<https://…/GPL-3>). The ‘CWB’ relies on ‘pcre’ (BSD license, see <https://…/licence.txt>) and ‘GLib’ (LGPL license, see <https://…/lgpl-3.0.en.html>). See the file LICENSE.note for further information.
RcppDE Global Optimization by Differential Evolution in C++
An efficient C++ based implementation of the ‘DEoptim’ function which performs global optimization by differential evolution. Its creation was motivated by trying to see if the old adage ‘easier, shorter, faster: pick any two’ could in fact be extended to achieving all three goals while moving the code from plain old C to modern C++. The initial version did in fact do so, but a good part of the gain was due to an implicit code review which eliminated a few inefficiencies; these have since been corrected in ‘DEoptim’ as well.
RcppDL Deep Learning Methods via Rcpp
This package is based on the C++ code from Yusuke Sugomori, which implements basic machine learning methods with many layers (deep learning), including dA (Denoising Autoencoder), SdA (Stacked Denoising Autoencoder), RBM (Restricted Boltzmann machine) and DBN (Deep Belief Nets).
RcppEigen ‘Rcpp’ Integration for the ‘Eigen’ Templated Linear Algebra Library
R and ‘Eigen’ integration using ‘Rcpp’. ‘Eigen’ is a C++ template library for linear algebra: matrices, vectors, numerical solvers and related algorithms. It supports dense and sparse matrices on integer, floating point and complex numbers, decompositions of such matrices, and solutions of linear systems. Its performance on many algorithms is comparable with some of the best implementations based on ‘Lapack’ and level-3 ‘BLAS’. The ‘RcppEigen’ package includes the header files from the ‘Eigen’ C++ template library (currently version 3.2.5). Thus users do not need to install ‘Eigen’ itself in order to use ‘RcppEigen’. Since version 3.1.1, ‘Eigen’ is licensed under the Mozilla Public License (version 2); earlier versions were licensed under the GNU LGPL version 3 or later. ‘RcppEigen’ (the ‘Rcpp’ bindings/bridge to ‘Eigen’) is licensed under the GNU GPL version 2 or later, as is the rest of ‘Rcpp’.
RcppEigenAD Compiles ‘C++’ Code using ‘Rcpp’, ‘Eigen’ and ‘CppAD’ to Produce First and Second Order Partial Derivatives
Compiles ‘C++’ code using ‘Rcpp’, ‘Eigen’ and ‘CppAD’ to produce first and second order partial derivatives. Also provides an implementation of Faà di Bruno’s formula to combine the partial derivatives of composed functions (see Hardy, M. (2006) <arXiv:math/0601149v1>).
RcppEnsmallen Header-Only C++ Mathematical Optimization Library for ‘Armadillo’
‘Ensmallen’ is a templated C++ mathematical optimization library (by the ‘MLPACK’ team) that provides a simple set of abstractions for writing an objective function to optimize. Provided within are various standard and cutting-edge optimizers that include full-batch gradient descent techniques, small-batch techniques, gradient-free optimizers, and constrained optimization. The ‘RcppEnsmallen’ package includes the header files from the ‘Ensmallen’ library and pairs the appropriate header files from ‘armadillo’ through the ‘RcppArmadillo’ package. Therefore, users do not need to install ‘Ensmallen’ nor ‘Armadillo’ to use ‘RcppEnsmallen’. Note that ‘Ensmallen’ is licensed under 3-Clause BSD, ‘Armadillo’ starting from 7.800.0 is licensed under Apache License 2, ‘RcppArmadillo’ (the ‘Rcpp’ bindings/bridge to ‘Armadillo’) is licensed under the GNU GPL version 2 or later. Thus, ‘RcppEnsmallen’ is also licensed under similar terms. Note that ‘Ensmallen’ requires a compiler that supports ‘C++11’ and ‘Armadillo’ 6.500 or later.
RcppExamples Examples using ‘Rcpp’ to Interface R and C++
Examples for seamless R and C++ integration. The ‘Rcpp’ package contains a C++ library that facilitates the integration of R and C++ in various ways. This package provides some usage examples. Note that the documentation in this package currently does not cover all the features in the package. It is not even close. On the other hand, the site <http://gallery.rcpp.org> regroups a large number of examples for ‘Rcpp’.
RcppFaddeeva ‘Rcpp’ Bindings for the ‘Faddeeva’ Package
Access to a family of Gauss error functions for arbitrary complex arguments is provided via the ‘Faddeeva’ package by Steven G. Johnson (see http://…/Faddeeva_Package for more information).
RcppGetconf ‘Rcpp’ Interface for Querying System Configuration Variables
The ‘getconf’ command-line tool provided by ‘libc’ allows querying of a large number of system variables. This package provides similar functionality.
RcppGreedySetCover Greedy Set Cover
A fast implementation of the greedy algorithm for the set cover problem using ‘Rcpp’.
RcppGSL Rcpp Integration for GNU GSL Vectors and Matrices
Rcpp integration for GNU GSL vectors and matrices. The GNU Scientific Library (GSL) is a collection of numerical routines for scientific computing. It is particularly useful for C and C++ programs as it provides a standard C interface to a wide range of mathematical routines such as special functions, permutations, combinations, fast Fourier transforms, eigensystems, random numbers, quadrature, random distributions, quasi-random sequences, Monte Carlo integration, N-tuples, differential equations, simulated annealing, numerical differentiation, interpolation, series acceleration, Chebyshev approximations, root-finding, discrete Hankel transforms, physical constants, basis splines and wavelets. There are over 1000 functions in total with an extensive test suite. The RcppGSL package provides an easy-to-use interface between GSL data structures and R using concepts from Rcpp, which is itself a package that eases the interfaces between R and C++. This package also serves as a prime example of how to build a package that uses Rcpp to connect to another third-party library. The autoconf script, inline plugin and example package can all be used as a stanza to write a similar package against another library.
RcppHMM Rcpp Hidden Markov Model
Collection of functions to evaluate sequences, decode hidden states and estimate parameters from a single or multiple sequences of a discrete time Hidden Markov Model. The observed values can be modeled by a multinomial distribution for categorical emissions, a mixture of Gaussians for continuous data and also a mixture of Poissons for discrete values. It includes functions for random initialization, simulation, backward or forward sequence evaluation, Viterbi or forward-backward decoding and parameter estimation using an Expectation-Maximization approach.
RcppHNSW ‘Rcpp’ Bindings for ‘hnswlib’, a Library for Approximate Nearest Neighbors
‘Hnswlib’ is a C++ library for Approximate Nearest Neighbors. This package provides a minimal R interface by relying on the ‘Rcpp’ package. See <https://…/hnswlib> for more on ‘hnswlib’. ‘hnswlib’ is released under Version 2.0 of the Apache License.
RcppMeCab ‘rcpp’ Wrapper for ‘mecab’ Library
An R package based on ‘Rcpp’ for ‘MeCab’: Yet Another Part-of-Speech and Morphological Analyzer. The purpose of this package is to provide a seamless developing and analyzing environment for CJK texts. The package utilizes parallel programming to provide the highly efficient text-preprocessing function ‘posParallel()’. For installation, please refer to the README.md file.
RcppMgsPack MsgPack Headers for R
This package provides R with MessagePack header files. MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON, but it is faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves. MessagePack is used by Redis and many other projects. To use this package, simply add it to the LinkingTo: field in the DESCRIPTION file of your R package, and the R package infrastructure tools will then know how to set include flags correctly on all architectures supported by R.
RcppMLPACK Rcpp Integration for MLPACK Library
MLPACK is an intuitive, fast, scalable C++ machine learning library, meant to be a machine learning analog to LAPACK. It aims to implement a wide array of machine learning methods and to function as a Swiss army knife for machine learning researchers. MLPACK is from <http://…/>; its sources are included in the package.
RcppMovStat Fast Moving Statistics Calculation
Provides several efficient functions to calculate common moving (or rolling, running) statistics for both evenly and unevenly spaced time series: moving average, moving median, moving maximum (minimum), and so on. Built on ‘C++’, these functions are apparently more efficient than those written in a traditional ‘R’ way and also faster than others using package ‘Rcpp’.
RcppMsgPack ‘MsgPack’ C++ Header Files
‘MessagePack’ is an efficient binary serialization format. It lets you exchange data among multiple languages like ‘JSON’. But it is faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves. This package provides headers from the ‘msgpack-c’ implementation for C and C++(11) for use by R, particularly ‘Rcpp’. The included ‘msgpack-c’ headers are licensed under the Boost Software License (Version 1.0); the code added by this package as well as the R integration are licensed under the GPL (>= 2). See the files ‘COPYRIGHTS’ and ‘AUTHORS’ for a full list of copyright holders and contributors to ‘msgpack-c’.
RcppNT2 Bindings to the Numerical Template Toolbox (NT2)
RcppNT2 is an R package that provides bindings to the Numerical Template Toolbox (NT2). It provides a framework for implementing highly optimizable algorithms, taking advantage of SIMD instructions when possible, and falling back to scalar operations when not.
RcppNumerical ‘Rcpp’ Integration for Numerical Computing Libraries
A collection of open source libraries for numerical computing (numerical integration, optimization, etc.) and their integration with ‘Rcpp’.
RcppQuantuccia R Bindings to the ‘Quantuccia’ Header-Only Essentials of ‘QuantLib’
‘QuantLib’ bindings are provided for R using ‘Rcpp’ and the header-only ‘Quantuccia’ variant (put together by Peter Caspers) offering an essential subset of ‘QuantLib’. See the included file ‘AUTHORS’ for a full list of contributors to both ‘QuantLib’ and ‘Quantuccia’.
RcppRedis ‘Rcpp’ Bindings for ‘Redis’ using the ‘hiredis’ Library
Connection to the ‘Redis’ key/value store using the C-language client library ‘hiredis’.
RcppRoll Fast rolling functions through Rcpp and RcppArmadillo
RcppRoll supplies fast functions for ‘roll’ing over vectors and matrices, e.g. rolling means, medians and variances. It also provides the utility functions ‘rollit’ and ‘rollit_raw’ as an interface for generating your own C++ backed rolling functions.
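For example (function names per the package documentation):

  library(RcppRoll)
  x <- c(1, 2, 3, 4, 5)
  roll_mean(x, n = 3)     # 2 3 4
  roll_meanr(x, n = 3)    # right-aligned variant, padded with NA: NA NA 2 3 4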
RcppShark R Interface to the Shark Machine Learning Library
An R interface to the C++/Boost Shark machine learning library.
RcppStreams Rcpp Integration of the Streamulus DSEL for Stream Processing
The Streamulus (template, header-only) library by Irit Katriel (at https://…/streamulus ) provides a very powerful yet convenient framework for stream processing. This package connects Streamulus to R by providing both the header files and all examples.
RcppThread R-Friendly Threading in C++
Provides a C++11-style thread class and thread pool that can safely be interrupted from R.
RcppXPtrUtils XPtr Add-Ons for ‘Rcpp’
Provides the means to compile user-supplied C++ functions with ‘Rcpp’ and retrieve an ‘XPtr’ that can be passed to other C++ components.
RcppZiggurat ‘Rcpp’ Integration of Different ‘Ziggurat’ Normal RNG Implementations
The Ziggurat generator for normally distributed random numbers, originally proposed by Marsaglia and Tsang (JSS, 2000), has been improved upon a few times starting with Leong et al. (JSS, 2005). This package provides an aggregation in order to compare different implementations. The goal is to provide a ‘faster but good enough’ alternative for use with R and C++ code. The package is still in an early state. Unless you know what you are doing, sticking with the generators provided by R may be a good idea as these have been extremely diligently tested.
rcqp Interface to the Corpus Query Protocol
Implements Corpus Query Protocol functions based on the CWB software. It relies on CWB (GPL v2), PCRE (BSD licence) and glib2 (LGPL).
Rcrawler Web Crawler and Scraper
Performs parallel web crawling and web scraping. It is designed to crawl, parse and store web pages to produce data that can be directly used for analysis application. For details see Khalil and Fakir (2017) <DOI:10.1016/j.softx.2017.04.004>.
rcreds Securely Use Credentials Within R Scripts
Tools to write a list of credentials to an encrypted file and later read them from that file into R. The goal is to provide a useful alternative to including usernames/passwords as part of a script or storing them in the clear in a separate text file. Additional tools are provided specifically for connecting to a database.
Rcriticor Critical Periods
Pierre’s correlogram. Searches for critical periods in the past by integrating a time series over a given window.
rCRM Regularized Continual Reassessment Method
Fit a 2-parameter continual reassessment method (CRM) model (O’Quigley and Shen (1996), <doi: 10.2307/2532905>) regularized with L2 norm (Friedman et al. (2010), <doi: 10.18637/jss.v033.i01>) adjusted by the distance with the target dose limiting toxicity (DLT) rate.
RCRnorm An Integrated Regression Model for Normalizing ‘NanoString nCounter’ Data
NanoString nCounter’ is a medium-throughput platform that measures gene or microRNA expression levels. Here is a publication that introduces this platform: Malkov (2009) <doi:10.1186/1756-0500-2-80>. Here is the webpage of ‘NanoString nCounter’ where you can find detailed information about this platform <https://…/ncounter-technology>. It has important clinical applications, such as diagnosis and prognosis of cancer. Implements an integrated random-coefficient hierarchical regression model to normalize data from the ‘NanoString nCounter’ platform so that noise from various sources can be removed.
rcrtan Criterion-Referenced Test Analysis
Contains methods for criterion-referenced test analyses as described by Brown & Hudson (2002) in Criterion-referenced Language Testing (ISBN: 9780521000833). This includes cut-score item discrimination analyses and measures of dependability.
rcrypt Symmetric File Encryption Using GPG
Provides easy symmetric file encryption using GPG with cryptographically strong defaults. Only symmetric encryption is supported. GPG is pre-installed with most Linux distributions. Windows users will need to install Gpg4win (http://www.gpg4win.org ). OS X users will need to install GPGTools (https://gpgtools.org ).
rcss Convex Switching Systems
The numerical treatment of optimal switching problems in a finite time setting when the state evolves as a controlled Markov chain consisting of an uncontrolled continuous component following linear dynamics and a controlled Markov chain taking values in a finite set. The reward functions are assumed to be convex and Lipschitz continuous in the continuous state. The action set is finite.
Rcssplot R plots styled with css
The Rcssplot package brings cascading style sheets to the R graphical environment. It provides a means to separate the aesthetics from data crunching in plots and charts.
rcure Robust Cure Models for Survival Analysis
Implements robust cure models for survival analysis by incorporating a weakly informative prior in the logistic part of cure models. Estimates prognostic accuracy, i.e. AUC, k-index and c-index, with bootstrap confidence intervals for cure models.
RCurl General network (HTTP/FTP/…) client interface for R
The package allows one to compose general HTTP requests and provides convenient functions to fetch URIs, get & post forms, etc. and process the results returned by the Web server. This provides a great deal of control over the HTTP/FTP/… connection and the form of the request while providing a higher-level interface than is available just using R socket connections. Additionally, the underlying implementation is robust and extensive, supporting FTP/FTPS/TFTP (uploads and downloads), SSL/HTTPS, telnet, dict, ldap, and also supports cookies, redirects, authentication, etc.
rcv Ranked Choice Voting
A collection of ranked choice voting data and functions to manipulate that data, run elections with it, and visualize the results. It can bring in raw data, transform it into a ballot you can read, and return election results for an RCV contest.
Rd2md Markdown Reference Manuals
The native R functionality only allows PDF export of reference manuals. This package extends that by converting the package documentation files into markdown files and combining them into a markdown version of the package reference manual.
rda Shrunken Centroids Regularized Discriminant Analysis
Shrunken Centroids Regularized Discriminant Analysis for classification purposes in high-dimensional data.
rddensity Manipulation Testing Based on Density Discontinuity
The density discontinuity test (a.k.a. manipulation test) is commonly employed in regression discontinuity designs and other treatment effect settings to detect whether there is evidence suggesting perfect self-selection (manipulation) around a cutoff where a treatment/policy assignment changes. This package provides tools for conducting the aforementioned statistical test: rddensity() constructs a local polynomial based density discontinuity test given a prespecified cutoff, and rdbwdensity() performs bandwidth selection.
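A short sketch under simulated data, assuming a known cutoff at zero:

    library(rddensity)
    x <- rnorm(1000)                    # running variable (simulated)
    summary(rddensity(X = x, c = 0))    # manipulation test at the cutoff
    summary(rdbwdensity(X = x, c = 0))  # bandwidth selection at the cutoff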
rddtools Toolbox for Regression Discontinuity Design (‘RDD’)
Set of functions for Regression Discontinuity Design (‘RDD’), for data visualisation, estimation and testing.
rde Reproducible Data Embedding
Allows caching of raw data directly in R code. This allows R scripts and R Notebooks to be shared and re-run on a machine without access to the original data. Cached data is encoded into an ASCII string that can be pasted into R code. When the code is run, the data is automatically loaded from the cached version if the original data file is unavailable. Works best for small datasets (a few hundred observations).
rDEA Robust Data Envelopment Analysis (DEA) for R
Data Envelopment Analysis for R, estimating robust DEA scores without and with environmental variables and doing returns-to-scale tests.
rdflib Tools to Manipulate and Query Semantic Data
The Resource Description Framework, or ‘RDF’ is a widely used data representation model that forms the cornerstone of the Semantic Web. ‘RDF’ represents data as a graph rather than the familiar data table or rectangle of relational databases. The ‘rdflib’ package provides a friendly and concise user interface for performing common tasks on ‘RDF’ data, such as reading, writing and converting between the various serializations of ‘RDF’ data, including ‘rdfxml’, ‘turtle’, ‘nquads’, ‘ntriples’, and ‘json-ld’; creating new ‘RDF’ graphs, and performing graph queries using ‘SPARQL’. This package wraps the low level ‘redland’ R package which provides direct bindings to the ‘redland’ C library. Additionally, the package supports the newer and more developer friendly ‘JSON-LD’ format through the ‘jsonld’ package. The package interface takes inspiration from the Python ‘rdflib’ library.
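A minimal sketch of the read-and-query workflow (‘example.ttl’ is a hypothetical file):

    library(rdflib)
    g <- rdf_parse("example.ttl", format = "turtle")
    rdf_query(g, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")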
rdfp An Implementation of the ‘DoubleClick for Publishers’ API
An implementation of Google’s ‘DoubleClick for Publishers’ (DFP) API <https://…/start>. This package is automatically compiled from the API WSDLs (Web Service Description Language) files to dictate how the API is structured. Theoretically, all API actions are possible using this package; however, care must be taken to format the inputs correctly and parse the outputs correctly as well. Please see Google’s DFP API reference and this package’s website <https://…/> for more information, documentation, and examples.
RDFTensor Different Tensor Factorization (Decomposition) Techniques for RDF Tensors (Three-Mode-Tensors)
Different Tensor Factorization techniques suitable for RDF Tensors. RDF Tensors are three-mode-tensors, binary tensors and usually very sparse. Currently implemented methods are ‘RESCAL’ Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel (2012) <doi:10.1145/2187836.2187874>, ‘NMU’ Daniel D. Lee and H. Sebastian Seung (1999) <doi:10.1038/44565>, ‘ALS’, Alternating Least Squares ‘parCube’ Papalexakis, Evangelos, C. Faloutsos, and N. Sidiropoulos (2012) <doi:10.1007/978-3-642-33460-3_39>, ‘CP_APR’ C. Chi and T. G. Kolda (2012) <doi:10.1137/110859063>. The code is mostly converted from MATLAB and Python implementations of these methods. The package also contains functions to get Boolean (Binary) transformations of the real-number decompositions. These methods also apply to general tensors, so with few modifications they can be applied to other types of tensors.
rdhs API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://…/index.html> ), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associate metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
rdi Repertoire Dissimilarity Index
Methods for calculation and visualization of the Repertoire Dissimilarity Index. Citation: Bolen and Rubelt, et al (2017) <doi:10.1186/s12859-017-1556-5>.
Rdice A Collection of Functions to Experiment Dice Rolls
A collection of functions to simulate dice rolls and the like. In particular, experiments and exercises can be performed looking at combinations and permutations of values in dice rolls and coin flips, together with the corresponding frequencies of occurrences. When applying each function, the user has to input the number of times (rolls, flips) to toss the dice. Needless to say, the more the tosses, the more the frequencies approximate the actual probabilities. Moreover, the package provides functions to generate non-transitive sets of dice (like Efron’s) and to check whether a given set of dice is non-transitive with given probability.
Rdimtools Dimension Reduction and Estimation Methods
We provide a rich collection of linear and nonlinear dimension reduction techniques implemented using ‘RcppArmadillo’. The question of what to use as the target dimension is addressed by the intrinsic dimension estimation methods that are also included. For more details on dimensionality techniques, see the paper by Ma and Zhu (2013) <doi:10.1111/j.1751-5823.2012.00182.x> if you are interested in the statistical approach, or Engel, Huttenberger, and Hamann (2012) <doi:10.4230/OASIcs.VLUDS.2011.135> for a broader cross-disciplinary overview.
rdist Calculate Pairwise Distances
A common framework for calculating distance matrices.
rdiversity Measurement and Partitioning of Similarity-Sensitive Biodiversity
Provides a framework for the measurement and partitioning of the (similarity-sensitive) biodiversity of a metacommunity and its constituent subcommunities. Richard Reeve, et al. (2015) <arXiv:1404.6520v3>.
rdmulti Analysis of RD Designs with Multiple Cutoffs or Scores
The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The ‘rdmulti’ package provides tools to analyze RD designs with multiple cutoffs or scores: rdmc() estimates pooled and cutoff specific effects for multi-cutoff designs, rdmcplot() draws RD plots for multi-cutoff designs and rdms() estimates effects in cumulative cutoffs or multi-score designs. See Cattaneo, Titiunik and Vazquez-Bare (2018) <https://…Titiunik-VazquezBare_2018_rdmulti.pdf> for further methodological details.
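A hedged sketch on simulated data (argument names are assumptions based on the description; C holds each unit's cutoff):

    library(rdmulti)
    X <- runif(1000)                                   # running variable
    C <- sample(c(0.3, 0.7), 1000, replace = TRUE)     # cutoff per unit
    Y <- X + (X >= C) + rnorm(1000, sd = 0.1)          # outcome
    rdmc(Y, X, C)      # pooled and cutoff-specific effects
    rdmcplot(Y, X, C)  # RD plots by cutoff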
rdoc Colourised R Documentation
Extends tools::Rd2txt() by adding customisable text and colour formatting to R documentation contents. If used from a terminal, output will be displayed via file.show(); otherwise contents will be printed in sections. Also provides stand-in replacements for ?() and help().
RDocumentation Integrate R with ‘RDocumentation.org’
Wraps around the default help functionality in R. Instead of plain documentation files, documentation will now show up as it does on ‘RDocumentation.org’, a platform that shows R documentation from CRAN, GitHub and Bioconductor, together with informative stats to assess the package quality and possibilities to discuss packages.
rdomains Get the Category of Content Hosted by a Domain
Get the category of content hosted by a domain. Use Shallalist <http://…/>, Virustotal (which provides access to lots of services) <https://…/>, McAfee <https://…/>, Alexa <https://…/>, DMOZ <http://…/>, or validated machine learning classifiers based on Shallalist data to learn about the kind of content hosted by a domain.
rDotNet Low-Level Interface to the ‘.NET’ Virtual Machine Along the Lines of the R C/Call API
Low-level interface to the ‘.NET’ virtual machine along the lines of the R C .call interface. Can create ‘.NET’ objects, call methods, get or set properties, call static functions, etc.
rdoxygen Create Doxygen Documentation for Source Code
Create doxygen documentation for source code in R packages. Includes an RStudio addin that allows triggering the doxygenize process.
rdpower Power Calculations for RD Designs
The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The ‘rdpower’ package provides tools to perform power and sample size calculations in RD designs: rdpower() calculates the power of an RD design and rdsampsi() calculates the required sample size to achieve a desired power. See Cattaneo, Titiunik and Vazquez-Bare (2018) <https://…o-Titiunik-VazquezBare_2018_Stata.pdf> for further methodological details.
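A hedged sketch (the two-column data argument and argument names below are assumptions based on the description):

    library(rdpower)
    Y <- rnorm(500); R <- rnorm(500)         # outcome and running variable
    rdpower(data = cbind(Y, R), tau = 0.3)   # power for a hypothesized effect
    rdsampsi(data = cbind(Y, R), tau = 0.3)  # sample size for a target power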
RQDA R package for Qualitative Data Analysis
RQDA is an R package for Qualitative Data Analysis, a free (as in freedom) qualitative analysis software application (BSD license). It works on Windows, Linux/FreeBSD and Mac OSX platforms. RQDA is an easy to use tool to assist in the analysis of textual data. At the moment it only supports plain text formatted data. All the information is stored in a SQLite database via the RSQLite package. The GUI is based on RGtk2, with the aid of gWidgetsRGtk2. It includes a number of standard Computer-Aided Qualitative Data Analysis features. In addition it seamlessly integrates with R, which means that a) statistical analysis on the coding is possible, and b) functions for data manipulation and analysis can be easily extended by writing R functions. To some extent, RQDA and R make an integrated platform for both quantitative and qualitative data analysis.
Rdrools A Rules Engine for R Based on ‘Drools’
An R interface for using the popular Java based Drools, which is a Business Rule Management System (See <https://www.drools.org> for more information). This package allows you to run a set of rules written in DRL format on the data using the Drools engine.
Rdroolsjars Rdrools JARs
External jars required for package ‘Rdrools’.
rdrop2 Programmatic Interface to the Dropbox API
Provides full programmatic access to the Dropbox file hosting platform (dropbox.com), including support for all standard file operations.
RDS Respondent-Driven Sampling
Provides functionality for carrying out estimation with data collected using Respondent-Driven Sampling. This includes Heckathorn’s RDS-I and RDS-II estimators as well as Gile’s Sequential Sampling estimator. The package is part of the ‘RDS Analyst’ suite of packages for the analysis of respondent-driven sampling data.
Rdsdp R Interface to DSDP Semidefinite Programming Library
R interface to the DSDP semidefinite programming library. Installs version 5.8 of DSDP from the DSDP website. An existing installation of DSDP may be used by passing the proper configure arguments to the installation command.
RDStreeboot RDS Tree Bootstrap Method
A tree bootstrap method for estimating uncertainty in respondent-driven samples (RDS). Quantiles are estimated by multilevel resampling in a way that preserves the dependencies of the RDS process and accounts for its high variability.
rdtLite Provenance Collector
Defines functions that can be used to collect provenance as an R script executes or during a console session. The output is a text file in PROV-JSON format.
Rdtq Density Tracking by Quadrature
Implementation of density tracking by quadrature (DTQ) algorithms for stochastic differential equations (SDEs). DTQ algorithms numerically compute the density function of the solution of an SDE with user-specified drift and diffusion functions. The calculation does not require generation of sample paths, but instead proceeds in a deterministic fashion by repeatedly applying quadrature to the Chapman-Kolmogorov equation associated with a discrete-time approximation of the SDE. The DTQ algorithm is provably convergent. For several practical problems of interest, we have found the DTQ algorithm to be fast, accurate, and easy to use.
Rduino A Microcontroller Interface
Functions for connecting to and interfacing with an ‘Arduino’ or similar device. Functionality includes uploading of sketches, setting and reading digital and analog pins, and rudimentary servo control. This project is not affiliated with the ‘Arduino’ company, <https://…/>.
re2r RE2 Regular Expression
RE2 <https://…/re2> is a primarily deterministic finite automaton based regular expression engine from Google that is very fast at matching large amounts of text.
reactlog Reactivity Visualizer for ‘shiny’
Building interactive web applications with R is incredibly easy with ‘shiny’. Behind the scenes, ‘shiny’ builds a reactive graph that can quickly become intertwined and difficult to debug. ‘reactlog’ (Schloerke 2019) <doi:10.5281/zenodo.2591517> provides a visual insight into that black box of ‘shiny’ reactivity by constructing a directed dependency graph of the application’s reactive state at any time point in a reactive recording.
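A short sketch of the recording workflow (‘myapp/’ is a placeholder app directory):

    options(shiny.reactlog = TRUE)  # enable recording before launching the app
    shiny::runApp("myapp/")         # interact with the app, then stop it
    shiny::reactlogShow()           # open the recorded reactive graph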
readability Calculate Readability Scores
Calculate readability scores by grouping variables. Readability is an approximation of the ease with which a reader parses and comprehends a written text. These scores use text attributes such as syllable counts, number of words, and number of characters to calculate an approximate grade level reading ease for the text. The readability scores that are calculated include: Flesch Kincaid, Gunning Fog Index, Coleman Liau, SMOG, and Automated Readability Index.
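A hedged sketch (argument names are assumptions based on the package description):

    library(readability)
    txt <- c("The cat sat.", "Quantitative ethnography examines discourse.")
    readability(txt, grouping.var = c("a", "b"))  # scores per group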
readbulk Read and Combine Multiple Data Files
Combine multiple data files from a common directory. The data files will be read into R and bound together, creating a single large data.frame. A general function is provided along with a specific function for data that was collected using the open-source experiment builder ‘OpenSesame’ <http://…/>.
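A minimal sketch (‘raw_data’ is a hypothetical directory of csv files):

    library(readbulk)
    d <- read_bulk(directory = "raw_data", extension = ".csv")
    # rows from all files are stacked into one data.frame,
    # with a column recording each row's source file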
readit Effortlessly Read Any Rectangular Data
Providing just one primary function, ‘readit’ uses a set of reasonable heuristics to apply the appropriate reader function to the given file path. As long as the data file has an extension, and the data is (or can be coerced to be) rectangular, readit() can probably read it.
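A minimal sketch (‘data.csv’ is a hypothetical file):

    library(readit)
    d <- readit("data.csv")  # picks the reader from the file extension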
readmnist Read MNIST Dataset
You can use the function Read.mnist() to read data and arrange them properly from the MNIST dataset (the open handwritten digit database <http://…/> ). With this package, you can conveniently get all of the necessary information and then immediately start to check whether your machine learning algorithm works well. It can automatically recognize the type of dataset and returns the information in a corresponding structure.
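A hedged sketch (the file name follows the MNIST convention; the path is an assumption):

    library(readmnist)
    tr <- Read.mnist("train-images-idx3-ubyte")
    # the returned object bundles the images with metadata such as
    # image counts and pixel dimensions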
readOffice Read Text Out of Modern Office Files
Reads in text from ‘unstructured’ modern Microsoft Office files (XML based files) such as Word and PowerPoint. This does not read in structured data (from Excel or Access) as there are many other great packages that do so already.
readr Read Tabular Data
Read flat/tabular text files from disk.
readroper Simply Read ASCII Single and Multicard Polling Datasets
A convenient way to read fixed-width ASCII polling datasets from providers like the Roper Center <https://ropercenter.cornell.edu>.
readsdmx Read SDMX-XML Data
Read Statistical Data and Metadata Exchange (SDMX) XML data. This is the main transmission format used in official statistics. Data can be imported from local SDMX-ML files or an SDMX web-service and will be read ‘as is’ into a dataframe object. The ‘RapidXML’ C++ library <http://rapidxml.sourceforge.net> is used to parse the XML data.
readtext Import and Handling for Plain and Formatted Text Files
Functions for importing and handling text files and formatted text files with additional meta-data, including ‘.csv’, ‘.tab’, ‘.json’, ‘.xml’, ‘.pdf’, ‘.doc’, ‘.docx’, ‘.xls’, ‘.xlsx’, and others.
readxl Read excel files (.xls and .xlsx) into R
The readxl package makes it easy to get data out of Excel and into R. Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies, so it’s easy to install and use on all operating systems. It is designed to work with tabular data stored in a single sheet. Readxl supports both the legacy .xls format and the modern xml-based .xlsx format. .xls support is made possible with the libxls C library, which abstracts away many of the complexities of the underlying binary format. To parse .xlsx, we use the RapidXML C++ library.
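A minimal sketch (‘data.xlsx’ is a hypothetical workbook):

    library(readxl)
    d <- read_excel("data.xlsx", sheet = 1)  # the same call works for .xls files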
RealVAMS Multivariate VAM Fitting
The RealVAMs package fits a multivariate value-added model (VAM) (see Broatch and Lohr 2012) with normally distributed test scores and a binary outcome indicator. This material is based upon work supported by the National Science Foundation under grants DRL-1336027 and DRL-1336265.
Rearrangement Monotonize Point and Interval Functional Estimates by Rearrangement
The rearrangement operator (Hardy, Littlewood, and Polya 1952) for univariate, bivariate, and trivariate point estimates of monotonic functions. The package additionally provides a function that creates simultaneous confidence intervals for univariate functions and applies the rearrangement operator to these confidence intervals.
REAT Regional Economic Analysis Tools
Collection of analysis methods used in regional and urban economics and (quantitative) economic geography, e.g. measures of inequality, regional disparities and regional specialization.
rebus Build Regular Expressions in a Human Readable Way
Build regular expressions piece by piece using human readable code.
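A short sketch of building a pattern piece by piece:

    library(rebus)
    rx <- START %R% "ID-" %R% one_or_more(DGT) %R% END
    grepl(rx, c("ID-123", "id456"))  # TRUE FALSE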
rebus.base Core Functionality for the ‘rebus’ Package
Build regular expressions piece by piece using human readable code. This package contains core functionality, and is primarily intended to be used by package developers.
rebus.datetimes Date and Time Extensions for the ‘rebus’ Package
Build regular expressions piece by piece using human readable code. This package contains date and time functionality, and is primarily intended to be used by package developers.
rebus.numbers Numeric Extensions for the ‘rebus’ Package
Build regular expressions piece by piece using human readable code. This package contains number-related functionality, and is primarily intended to be used by package developers.
rebus.unicode Unicode Extensions for the ‘rebus’ Package
Build regular expressions piece by piece using human readable code. This package contains Unicode functionality, and is primarily intended to be used by package developers.
RECA Relevant Component Analysis for Supervised Distance Metric Learning
Relevant Component Analysis (RCA) tries to find a linear transformation of the feature space such that the effect of irrelevant variability is reduced in the transformed space.
recipes Preprocessing Tools to Create Design Matrices
An extensible framework to create and preprocess design matrices. Recipes consist of one or more data manipulation and analysis ‘steps’. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting design matrices can then be used as inputs into statistical or machine learning models.
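A minimal sketch of the estimate-then-apply workflow (argument spellings may differ across package versions):

    library(recipes)
    rec <- recipe(mpg ~ ., data = mtcars) %>%
      step_center(all_predictors()) %>%
      step_scale(all_predictors())
    prepped <- prep(rec, training = mtcars)  # estimate step parameters
    bake(prepped, new_data = mtcars)         # apply them to (new) data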
reclin Record Linkage Toolkit
Functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, em-algorithm for estimating m- and u-probabilities, forcing one-to-one matching. Can also be used for pre- and post-processing for machine learning methods for record linkage.
recmap Compute the Rectangular Statistical Cartogram
Provides an interface and a C++ implementation of the RecMap MP2 construction heuristic (see ‘citation(‘recmap’)’ for details). This algorithm draws maps according to a given statistical value (e.g. election results, population or epidemiological data). The basic idea of the RecMap algorithm is that each map region (e.g. different countries) is represented by a rectangle. The area of each rectangle represents the statistical value given as input. Documentation about ‘RecMap’ is provided by a vignette included in this package and a ‘RecMap gallery’ site at <http://…/gallery>.
recoder A Simple and Flexible Recoder
Simple, easy to use, and flexible functionality for recoding variables. It allows for simple piecewise definition of transformations.
recombinator Recombinate Nested Lists to Dataframes
Turns nested lists into data.frames in an orderly manner.
Recon Computational Tools for Economics
Implements solutions to canonical models of Economics such as Monopoly Profit Maximization, Cournot’s Duopoly, Solow (1956, <doi:10.2307/1884513>) growth model and Mankiw, Romer and Weil (1992, <doi:10.2307/2118477>) growth model.
reconstructr Session Reconstruction and Analysis
Functions to aid in reconstructing sessions and efficiently calculating an array of metrics from the resulting data, including bounce rate, time-on-page, and session length. Although primarily designed for web data and analytics, its approach is plausibly applicable to other domains.
recorder Toolkit to Validate New Data for a Predictive Model
A lightweight toolkit to validate new observations when computing their predictions with a predictive model. The validation process consists of two steps: (1) record relevant statistics and meta data of the variables in the original training data for the predictive model and (2) use these data to run a set of basic validation tests on the new set of observations.
RecordLinkage Record Linkage in R
Provides functions for linking and de-duplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain.
recordr R Provenance Tracking
Provides methods to record data provenance about R script executions. Provenance data includes files that were read and written by the script, along with information about the execution, such as start time and end time, the R modules loaded during the execution, and other information describing the execution environment.
recosystem Recommender System using Matrix Factorization
R wrapper of the ‘libmf’ library (http://…/libmf ) for recommender system using matrix factorization. It is typically used to approximate an incomplete matrix using the product of two matrices in a latent space. Other common names for this task include ‘collaborative filtering’, ‘matrix completion’, ‘matrix recovery’, etc.
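A hedged sketch for recent package versions (‘train.txt’ and ‘test.txt’ are hypothetical files of user/item/rating triplets):

    library(recosystem)
    r <- Reco()
    r$train(data_file("train.txt"))
    r$predict(data_file("test.txt"), out_memory())  # predictions as an R vector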
recurse Computes Revisitation Metrics for Trajectory Data
Computes revisitation metrics for trajectory data, such as the number of revisitations for each location as well as the time spent for that visit and the time since the previous visit. Also includes functions to plot data.
reda Recurrent Event Data Analysis
Functions that fit gamma frailty model with spline or piecewise constant baseline rate function for recurrent event data, compute and plot parametric mean cumulative function (MCF) from a fitted model as well as nonparametric sample MCF (Nelson-Aalen estimator) are provided. Most functions are S4 methods that produce S4 class objects.
RedditExtractoR Reddit Data Extraction Toolkit
Reddit is an online bulletin board and a social networking website where registered users can submit and discuss content. This package uses the Reddit API to extract Reddit data. The retrieved data has a flat structure, i.e. the relationship between comments is not preserved. This may be addressed in the next update of this package. Note that due to API limitations, the number of comments that can be extracted is limited to 500 per thread. The package consists of 3 functions: one for extracting relevant URLs, one for extracting features out of given URLs, and one that does both together.
reddPrec Reconstruction of Daily Data – Precipitation
Applies quality control to daily precipitation datasets, reconstructs the original series by estimating precipitation in missing values, creates new series in a specified pair of coordinates and creates grids.
REddyProc Post Processing of (Half-)Hourly Eddy-Covariance Measurements
Standard and extensible Eddy-Covariance data post-processing includes uStar-filtering, gap-filling, and flux-partitioning. The Eddy-Covariance (EC) micrometeorological technique quantifies continuous exchange fluxes of gases, energy, and momentum between an ecosystem and the atmosphere. It is important for understanding ecosystem dynamics and upscaling exchange fluxes. (Aubinet et al. (2012) <doi:10.1007/978-94-007-2351-1>). This package inputs pre-processed (half-)hourly data and supports further processing. First, a quality-check and filtering is performed based on the relationship between measured flux and friction velocity (uStar) to discard biased data (Papale et al. (2006) <doi:10.5194/bg-3-571-2006>). Second, gaps in the data are filled based on information from environmental conditions (Reichstein et al. (2005) <doi:10.1111/j.1365-2486.2005.001002.x>). Third, the net flux of carbon dioxide is partitioned into its gross fluxes in and out of the ecosystem by night-time based and day-time based approaches (Lasslop et al. (2010) <doi:10.1111/j.1365-2486.2009.02041.x>).
redland RDF Library Bindings in R
Provides methods to parse, query and serialize information stored in the Resource Description Framework (RDF). RDF is described at <http://…/rdf-primer>. This package supports RDF by implementing an R interface to the Redland RDF C library, described at <http://…/index.html>. In brief, RDF provides a structured graph consisting of Statements composed of Subject, Predicate, and Object Nodes.
rEDM Applications of Empirical Dynamic Modeling from Time Series
Contains C++ compiled objects that use time delay embedding to perform state-space reconstruction and nonlinear forecasting and an R interface to those objects using ‘Rcpp’. It supports both the simplex projection method from Sugihara & May (1990) <DOI:10.1038/344734a0> and the S-map algorithm in Sugihara (1994) <DOI:10.1098/rsta.1994.0106>. In addition, this package implements convergent cross mapping as described by Sugihara et al. (2012) <DOI:10.1126/science.1227079>.
Redmonder Microsoft(r)-Inspired Color Palettes
Provide color schemes for maps (and other graphics) based on the color palettes of several Microsoft(r) products. Forked from ‘RColorBrewer’ v1.1-2.
redR REgularization by Denoising (RED)
Regularization by Denoising uses a denoising engine to solve many ill-posed image reconstruction inverse problems. This is an R implementation of the algorithm developed by Romano et al. (2016) <arXiv:1611.02862>. Currently, only the gradient descent optimization framework is implemented, and only the median filter is implemented as a denoiser engine. However, (almost) any denoiser engine can be plugged in. Three reconstruction tasks are currently available: denoise, deblur and super-resolution. Again, any other task can be easily plugged into the main function ‘RED’.
redshiftTools Amazon Redshift Tools
Efficiently upload data into an Amazon Redshift database using the approach recommended by Amazon <https://…/>.
ref.ICAR Objective Bayes Intrinsic Conditional Autoregressive Model for Areal Data
Implements an objective Bayes intrinsic conditional autoregressive prior. This model provides an objective Bayesian approach for modeling spatially correlated areal data using an intrinsic conditional autoregressive prior on a vector of spatial random effects.
refinr Cluster and Merge Similar Values Within a Character Vector
These functions take a character vector as input, identify and cluster similar values, and then merge clusters together so their values become identical. The functions are an implementation of the key collision and ngram fingerprint algorithms from the open source tool Open Refine <http://…/>. More info on key collision and ngram fingerprint can be found here <https://…/Clustering-In-Depth>.
refnr Refining Data Table Using a Set of Formulas
A tool for refining data frames with formulas.
refund.shiny Interactive Plotting for Functional Data Analyses
Interactive plotting for functional data analyses.
regclass Tools for an Introductory Class in Regression and Modeling
Contains basic tools for visualizing, interpreting, and building regression models. It has been designed for use with the book Introduction to Regression and Modeling with R by Adam Petrie, Cognella Publishers.
regexPipes Wrappers Around ‘base::grep()’ for Use with Pipes
Provides wrappers around base::grep() where the first argument is standardized to take the data object. This makes it less of a pain to use regular expressions with ‘magrittr’ or other pipe operators.
regexr Readable Regular Expressions
An R framework for constructing human readable regular expressions. It aims to provide tools that enable the user to write regular expressions in a way that is similar to the ways R code is written. The tools allow the user to (1) write in smaller, modular, named, regular expression chunks, (2) write top to bottom, rather than a single string (3) comment individual chunks, (4) indent expressions to represent regular expression groups, and (5) test the validity of the concatenated expression and the modular chunks.
regexSelect Regular Expressions in ‘shiny’ Select Lists
shiny’ extension that adds regular expression filtering capabilities to the choice vector of the select list.
reghelper Helper Functions for Regression Analysis
A set of functions used to automate commonly used methods in regression analysis. This includes plotting interactions, calculating simple slopes, calculating standardized coefficients, etc. See the reghelper documentation for more information, documentation, and examples.
regnet Network-Based Regularization for Generalized Linear Models
Network-based regularization has achieved success in variable selection for high-dimensional biological data, due to its ability to incorporate the correlations among genomic features. This package provides procedures for fitting network-based regularization, minimax concave penalty (MCP) and lasso penalty for generalized linear models. This first version, regnet 0.1.0, focuses on binary outcomes. Functions for continuous, survival outcomes and other regularization methods will be included in the forthcoming upgraded version.
regplot Enhanced Regression Nomogram Plot
A function to plot a regression nomogram of coxph, lm and glm regression objects. Covariate distributions are superimposed on nomogram scales and the plot is animated to allow on the fly changes to distribution representation and to enable outcome calculation.
RegressionFactory Expander Functions for Generating Full Gradient and Hessian from Single- and Multi-Slot Base Distributions
The expander functions rely on the mathematics developed for the Hessian-definiteness invariance theorem for linear projection transformations of variables, described in authors’ paper, to generate the full, high-dimensional gradient and Hessian from the lower-dimensional derivative objects. This greatly relieves the computational burden of generating the regression-function derivatives, which in turn can be fed into any optimization routine that utilizes such derivatives. The theorem guarantees that Hessian definiteness is preserved, meaning that reasoning about this property can be performed in the low-dimensional space of the base distribution. This is often a much easier task than its equivalent in the full, high-dimensional space. Definiteness of Hessian can be useful in selecting optimization/sampling algorithms such as Newton-Raphson optimization or its sampling equivalent, the Stochastic Newton Sampler. Finally, in addition to being a computational tool, the regression expansion framework is of conceptual value by offering new opportunities to generate novel regression problems.
regrrr Toolkit for Compiling, (Post-Hoc) Testing, and Plotting Regression Results
Compiling regression results into a publishable format, conducting post-hoc hypothesis testing, and plotting moderating effects (the effect of X on Y becomes stronger/weaker as Z increases).
RegSDC Information Preserving Regression-Based Tools for Statistical Disclosure Control
Implementation of the methods described in the paper with the above title: Langsrud. (2019) <doi:10.1007/s11222-018-9848-9>. Open view-only version at <https://rdcu.be/bfeWQ>. The package can be used to generate synthetic or hybrid continuous microdata, and the relationship to the original data can be controlled in several ways.
regsel Variable Selection and Regression
Functions for fitting linear and generalized linear models with variable selection. The functions can automatically do Stepwise Regression, Lasso or Elastic Net as variable selection methods. Lasso and Elastic net are improved and handle factors better (they can either include or exclude all factor levels).
regsem Performs Regularization on Structural Equation Models
Uses both ridge and lasso penalties (and extensions) to penalize specific parameters in structural equation models. The package offers additional cost functions, cross validation, and other extensions beyond traditional SEM.
regspec Non-Parametric Bayesian Spectrum Estimation for Multirate Data
Computes linear Bayesian spectral estimates from multirate data for second-order stationary time series. Provides credible intervals and methods for plotting various spectral estimates.
regtools Various Tools for Linear, Nonlinear and Nonparametric Regression
Various tools for linear, nonlinear and nonparametric regression.
rehydratoR Downloads Tweets from a List of Tweet IDs
Facilitates replication of Twitter-based research by handling common programming tasks needed when downloading tweets. Specifically, it ensures a user does not exceed Twitter’s rate limits, and it saves tweets in moderately sized files. While a user could perform these tasks in their own code, doing so may be beyond the capabilities of many users.
REIDS Random Effects for the Identification of Differential Splicing
Contains the REIDS model presented in Van Moerbeke et al (2017) <doi:10.1186/s12859-017-1687-8> for the detection of alternative splicing. The method is extended by incorporating junction information for the assessment of alternative splicing. The vignette introduces the model and shows an example work flow.
reinforcedPred Reinforced Risk Prediction with Budget Constraint
Traditional risk prediction only utilizes baseline factors known to be associated with the disease. Given that longitudinal information are routinely measured and documented for patients, it is worthwhile to make full use of these data. The available longitudinal biomarker data will likely improve prediction. However, repeated biomarker collection could be costly and inconvenient, and risk prediction for patients at a later time could delay necessary medical decisions. Thus, there is a trade-off between high quality prediction and cost. This package implements a cost-effective statistical procedure that recursively incorporates comprehensive longitudinal information into the risk prediction model, taking into account the cost of delaying the decision to a follow-up time when more information is available. The statistical methods are described in the following paper: Pan, Y., Laber, E., Smith, M., Zhao, Y. (2018). Reinforced risk prediction with budget constraint: application to electronic health records data. Manuscript submitted for publication.
reinforcelearn Reinforcement Learning
Implements reinforcement learning environments and algorithms as described in Sutton & Barto (1998, ISBN:0262193981). The Q-Learning algorithm can be used with different types of function approximation (tabular and neural network), eligibility traces (Singh & Sutton (1996) <doi:10.1007/BF00114726>) and experience replay (Mnih et al. (2013) <arXiv:1312.5602>).
ReinforcementLearning Model-Free Reinforcement Learning
Performs model-free reinforcement learning in R. This implementation enables the learning of an optimal policy based on sample sequences consisting of states, actions and rewards. In addition, it supplies multiple predefined reinforcement learning algorithms, such as experience replay.
reinstallr Search and Install Missing Packages
Searches R files for packages that are not installed and runs install.packages() to install them.
reinsureR Reinsurance Treaties Application
Application of reinsurance treaties to claims portfolios. The package creates a class Claims whose objective is to store claims and premiums, on which different treaties can be applied. A statistical analysis can then be applied to measure the impact of reinsurance, producing a table or graphical output. This package can be used for estimating the impact of reinsurance on several portfolios or for pricing treaties through statistical analysis. Documentation for the implemented methods can be found in ‘Reinsurance: Actuarial and Statistical Aspects’ by Hansjörg Albrecher, Jan Beirlant, Jozef L. Teugels (2017, ISBN: 978-0-470-77268-3) and ‘REINSURANCE: A Basic Guide to Facultative and Treaty Reinsurance’ by Munich Re (2010) <https://…/reinsurance_basic_guide.pdf>.
rel Reliability Coefficients
Derives point estimates with confidence intervals for Bennett et al.’s S, Cohen’s kappa and weighted kappa, Gwet’s AC1 and AC2, Krippendorff’s alpha, and Scott’s pi.
relabeLoadings Relabel Loadings from MCMC Output for Confirmatory Factor Analysis
In confirmatory factor analysis (CFA), structural constraints typically ensure that the model is identified up to all possible reflections, i.e., column sign changes of the matrix of loadings. Such reflection invariance is problematic for Bayesian CFA when the reflection modes are not well separated in the posterior distribution. Imposing rotational constraints — fixing some loadings to be zero or positive in order to pick a factor solution that corresponds to one reflection mode — may not provide a satisfactory solution for Bayesian CFA. The function ‘relabel’ uses the relabeling algorithm of Erosheva and Curtis to correct for sign invariance in MCMC draws from CFA models. The MCMC draws should come from Bayesian CFA models that are fit without rotational constraints.
relatable Functions for Mapping Key-Value Pairs, Many-to-Many, One-to-Many, and Many-to-One Relations
Functions to safely map from a vector of keys to a vector of values, determine properties of a given relation, or ensure a relation conforms to a given type, such as many-to-many, one-to-many, injective, surjective, or bijective. Permits default return values for use similar to a vectorised switch statement, as well as safely handling large vectors, NAs, and duplicate mappings.
relevent Relational Event Models
Tools to fit relational event models.
RelimpPCR Relative Importance PCA Regression
Performs Principal Components Analysis (also known as PCA) dimensionality reduction in the context of a linear regression. In most cases, PCA dimensionality reduction is performed independently of the response variable for a regression. This captures the majority of the variance of the model’s predictors, but may not actually be the optimal dimensionality reduction solution for a regression against the response variable. An alternative method, optimized for a regression against the response variable, is to use both PCA and a relative importance measure. This package applies PCA to a given data frame of predictors, and then calculates the relative importance of each PCA factor against the response variable. It outputs ordered factors that are optimized for model fit. By performing dimensionality reduction with this method, an individual can achieve the same r-squared value as performing just PCA, but with fewer PCA factors. References: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani (2013) <http://…/>.
relMix Relationship Inference Based on Mixtures
Makes relationship inference involving mixtures with unknown profiles and unknown number of contributors.
rem Relational Event Models (REM)
Calculate endogenous network effects in event sequences and fit relational event models (REM): Using network event sequences (where each tie between a sender and a target in a network is time-stamped), REMs can measure how networks form and evolve over time. Endogenous patterns such as popularity effects, inertia, similarities, cycles or triads can be calculated and analyzed over time.
rematch Match Regular Expressions with a Nicer ‘API’
A small wrapper on ‘regexpr’ to extract the matches and captured groups from the match of a regular expression to a character vector.
rematch2 Tidy Output from Regular Expression Matching
Wrappers on ‘regexpr’ and ‘gregexpr’ to return the match results in tidy data frames.
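A short sketch of the tidy output (the date pattern is illustrative):

    library(rematch2)
    re_match(
      c("2019-04-01", "oops"),
      "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
    )
    # one row per input, one column per named capture group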
remedy RStudio’ Addins to Simplify ‘Markdown’ Writing
An ‘RStudio’ addin providing shortcuts for writing in ‘Markdown’. This package provides a series of functions that allow the user to be more efficient when using ‘Markdown’. For example, you can select a word, and put it in bold or in italics, or change the alignment of elements inside your Rmd. The idea is to map all the functionalities from ‘remedy’ onto keyboard shortcuts, so that it provides an interface close to what you can find in any other text editor.
remindR Insert and Extract ‘Reminders’ from Function Comments
Insert/extract text ‘reminders’ into/from function source code comments or as the ‘comment’ attribute of any object. The former can be handy in development as reminders of e.g. argument requirements, expected objects in the calling environment, required options settings, etc. The latter can be used to provide information of the object and as simple manual ‘tooltips’ for users, among other things.
remMap Regularized Multivariate Regression for Identifying Master Predictors
remMap is developed for fitting multivariate response regression models under the high-dimension-low-sample-size setting.
remote Empirical Orthogonal Teleconnections in R
remote’ is short for “R(-based) EMpirical Orthogonal TEleconnections”. It implements a collection of functions to facilitate empirical orthogonal teleconnection analysis. Empirical Orthogonal Teleconnections (EOTs) denote a regression based approach to decompose spatio-temporal fields into a set of independent orthogonal patterns. They are quite similar to Empirical Orthogonal Functions (EOFs) with EOTs producing less abstract results. In contrast to EOFs, which are orthogonal in both space and time, EOT analysis produces patterns that are orthogonal in either space or time.
remoter Remote R: Control a Remote R Session from a Local One
A set of utilities for controlling a remote R session from a local one. Simply set up a server (see package vignette for more details) and connect to it from your local R session, including ‘RStudio’. Network communication is handled by the ‘ZeroMQ’ library by way of the ‘pbdZMQ’ package. The client/server framework is a custom ‘REPL’.
remotes R Package Installation from Remote Repositories, Including ‘GitHub’
Download and install R packages stored in ‘GitHub’, ‘BitBucket’, or plain ‘subversion’ or ‘git’ repositories. This package is a lightweight replacement of the ‘install_*’ functions in ‘devtools’. Indeed most of the code was copied over from ‘devtools’.
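A minimal sketch (‘user/repo’ and the git URL are placeholders):

    remotes::install_github("user/repo")               # GitHub
    remotes::install_bitbucket("user/repo")            # BitBucket
    remotes::install_git("https://example.com/x.git")  # plain git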
rENA Epistemic Network Analysis
ENA (Shaffer, D. W. (2017) Quantitative Ethnography. ISBN: 0578191687) is a method used to identify meaningful and quantifiable patterns in discourse or reasoning. ENA moves beyond the traditional frequency-based assessments by examining the structure of the co-occurrence, or connections in coded data. Moreover, compared to other methodological approaches, ENA has the novelty of (1) modeling whole networks of connections and (2) affording both quantitative and qualitative comparisons between different network models. Shaffer, D.W., Collier, W., & Ruis, A.R. (2016) <doi:10.18608/jla.2016.33.3>.
renamer Standardising Function Names in R
Tired of the disparate naming systems in R? Then this is the package for you.
Renvlp Computing Envelope Estimators
Provides a general routine, envMU(), which allows estimation of the M envelope of span(U) given root n consistent estimators of M and U. The routine envMU() does not presume a model. This package implements response envelopes (env()), partial response envelopes (penv()), envelopes in the predictor space (xenv()), heteroscedastic envelopes (henv()), simultaneous envelopes (stenv()), scaled response envelopes (senv()), scaled envelopes in the predictor space (sxenv()), groupwise envelopes (genv()), weighted envelopes (weighted.env(), weighted.penv() and weighted.xenv()), envelopes in logistic regression (logit.env()), and envelopes in Poisson regression (pois.env()). For each of these model-based routines the package provides inference tools including bootstrap, cross validation, estimation and prediction; hypothesis testing on coefficients is included except for weighted envelopes. Tools for selection of dimension include AIC, BIC and likelihood ratio testing. Background is available at Cook, R. D., Forzani, L. and Su, Z. (2016) <doi:10.1016/j.jmva.2016.05.006>. Optimization is based on a clockwise coordinate descent algorithm.
repeated Non-Normal Repeated Measurements Models
Various functions to fit models for non-normal repeated measurements.
repec Access RePEc Data Through API
Utilities for accessing RePEc (Research Papers in Economics) through a RESTful API. You can request a code and get detailed information at the following page: <https://…/api.html>.
REPLesentR Presentations in the REPL
Create presentations and display them inside the R ‘REPL’ (Read-Eval-Print loop), aka the R console. Presentations can be written in ‘RMarkdown’ or any other text format. A set of convenient navigation options as well as code evaluation during a presentation is provided. It is great for tech talks with live coding examples and tutorials. While this is not a replacement for standard presentation formats, its old-school looks might just be what sets it apart. This project has been inspired by the ‘REPLesent’ project for presentations in the ‘Scala’ ‘REPL’.
Replicate Statistical Metrics for Multisite Replication Studies
For a multisite replication project, computes metrics and confidence intervals representing: (1) the probability that the original study would observe an estimated effect size as extreme or more extreme than it actually did, if in fact the original study is statistically consistent with the replications; (2) the probability of a true effect of scientifically meaningful size in the same direction as the original study’s estimate; and (3) the probability of a true effect of meaningful size in the direction opposite the original study’s estimate. Additionally computes older metrics used in replication projects (namely expected agreement in ‘statistical significance’ between an original study and replication studies, as well as prediction intervals for the replication estimates). See Mathur and VanderWeele (in preparation) <https://…/> for details.
Replication Test Replications by Means of the Prior Predictive p-Value
Allows for the computation of a prior predictive p-value to test replication of relevant features of original studies. Relevant features are captured in informative hypotheses. The package also allows for the computation of power. The statistical underpinnings are described in Zondervan-Zwijnenburg (2019) <doi:10.31234/osf.io/uvh5s>.
replyr Fluid Use of ‘dplyr’
Methods to get a grip on working with remote ‘tbl’ sources (‘SQL’ databases, ‘Spark’) through ‘dplyr’. Adds convenience functions to make such tasks more like working with an in-memory ‘data.frame’. Results do depend on which ‘dplyr’ data service you use.
repo A Resource Manager for R Objects
This is an R data manager meant to avoid manual storage/retrieval of R data to/from the file system. It builds one (or more) centralized repository where R objects are stored with annotations, tags, dependency notes, provenance traces. It also provides navigation tools to easily locate, load and edit previously stored resources.
RepoGenerator Generates a Project and Repo for Easy Initialization of a Workshop
Generates a project and repo for easy initialization of a GitHub repo for R workshops. The repo includes a README with instructions to ensure that all users have the needed packages, an ‘RStudio’ project with the right directories and the proper data. The repo can then be used for hosting code taught during the workshop.
reportReg An Easy Way to Report Regression Analysis
Provides an easy way to report the results of regression analysis, including: 1. Proportional hazards regression model from function ‘coxph’ of package ‘survival’; 2. Ordered logistic regression from function ‘polr’ of package ‘MASS’; 3. Binary logistic regression from function ‘glm’ of package ‘stats’; 4. Linear regression from function ‘lm’ of packages ‘stats’.
reportROC An Easy Way to Report ROC Analysis
Provides an easy way to report the results of ROC analysis, including: 1. an ROC curve. 2. the value of Cutoff, SEN (sensitivity), SPE (specificity), AUC (Area Under Curve), AUC.SE (the standard error of AUC), PLR (positive likelihood ratio), NLR (negative likelihood ratio), PPV (positive predictive value), NPV (negative predictive value).
repr Serializable Representations
String and binary representations of objects for several formats / mime types.
represtools Reproducible Research Tools
Reproducible research tools automate the creation of an analysis directory structure and work flow. There are R markdown skeletons which encapsulate typical analytic work flow steps. Functions will create appropriate modules which may pass data from one step to another.
reprex Prepare Reproducible Example Code for Sharing
Convenience wrapper that uses the ‘rmarkdown’ package to render small snippets of code to target formats that include both code and output. The goal is to encourage the sharing of small, reproducible, and runnable examples on code-oriented websites, such as <http://stackoverflow.com> and <https://github.com>, or in email. ‘reprex’ also extracts clean, runnable R code from various common formats, such as copy/paste from an R session.
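A short sketch; the rendered snippet (code plus output) lands on the clipboard, ready to paste:

    reprex::reprex({
      x <- c(1, 2, 3)
      mean(x)
    })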
reproducer Reproduce Statistical Analyses and Meta-Analyses
The reproducer R package includes data analysis functions and data sets (e.g., related to software defect prediction) to streamline reproducible research in software engineering.
reproj Coordinate System Transformations for Map Data
Transform coordinates via ‘PROJ’ using the library directly, by wrapping the ‘proj4’ package. The ‘reproj’ function handles the need for radian units for either source or target and allows removing an explicit source definition in methods that extend the generic. The ‘PROJ’ library is available at <https://…/>.
repurrrsive Examples of Recursive Lists and Nested or Split Data Frames
Recursive lists in the form of R objects, ‘JSON’, and ‘XML’, for use in teaching and examples. Examples include color palettes, Game of Thrones characters, ‘GitHub’ users and repositories, and entities from the Star Wars universe. Data from the ‘gapminder’ package is also included, as a simple data frame and in nested and split forms.
reqres Powerful Classes for HTTP Requests and Responses
In order to facilitate parsing of http requests and creating appropriate responses, this package provides two classes to handle a lot of the housekeeping involved in working with http exchanges. The infrastructure builds upon the ‘rook’ specification and is thus well suited to be combined with ‘httpuv’ based web servers.
request High Level ‘HTTP’ Client
High level and easy ‘HTTP’ client for ‘R’. Provides functions for building ‘HTTP’ queries, including query parameters, body requests, headers, authentication, and more.
requireR R Source Code Modularizer
Modularizes source code. Keeps the global environment clean and makes interdependencies explicit. Inspired by ‘RequireJS’ <http://…/>.
REREFACT Reordering and/or Reflecting Factors for Simulation Studies with Exploratory Factor Analysis
Executes a post-rotation algorithm that REorders and/or REflects FACTors (REREFACT) for each replication of a simulation study with exploratory factor analysis.
reReg Recurrent Event Regression
A collection of regression models for recurrent event process and failure time.
rerf Randomer Forest
Randomer Forest (RerF) is an algorithm developed by Tomita (2016) <arXiv:1506.03410v2> which is similar to Random Forest – Random Combination (Forest-RC) developed by Breiman (2001) <doi:10.1023/A:1010933404324>. Random Forests create axis-parallel, or orthogonal, trees. That is, the feature space is recursively split along directions parallel to the axes of the feature space. Thus, in cases in which the classes seem inseparable along any single dimension, Random Forests may be suboptimal. To address this, Breiman also proposed and characterized Forest-RC, which uses linear combinations of coordinates, rather than individual coordinates, to split along. This package, ‘rerf’, implements RerF, which is similar to Forest-RC. The difference between the two algorithms is where the random linear combinations occur: Forest-RC combines features at the per-tree level whereas RerF takes linear combinations of coordinates at every node in the tree.
rERR Excess Relative Risk Models
Fits a linear excess relative risk model by maximum likelihood, possibly including several variables and allowing for lagged exposures. Allows time-dependent covariates.
resautonet Autoencoder-based Residual Deep Network with Keras Support
An R implementation of the autoencoder-based residual deep network described in the paper at (https://…/1812.11262 ).
reshape2 Flexibly Reshape Data: A Reboot of the Reshape Package
Flexibly restructure and aggregate data using just two functions: melt and dcast (or acast).
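A minimal sketch of the two-function workflow on a built-in dataset:

    library(reshape2)
    # Wide to long: keep Month/Day as identifiers, stack the measurements
    aql <- melt(airquality, id.vars = c("Month", "Day"))
    # Long back to wide, aggregating each measurement by month
    dcast(aql, Month ~ variable, fun.aggregate = mean, na.rm = TRUE)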
REST RcmdrPlugin Easy Script Templates
Contains easy scripts which can be used to quickly create GUI windows for ‘Rcmdr’ Plugins. No knowledge about Tcl/Tk is required to make use of these scripts (These scripts are a generalisation of the template scripts in the ‘RcmdrPlugin.BiclustGUI’ package).
restfulr R Interface to RESTful Web Services
Models a RESTful service as if it were a nested R list.
restlos Robust Estimation of Location and Scatter
The restlos package provides algorithms for robust estimation of location and scatter based on minimum spanning trees (pMST), self-organizing maps (Flood Algorithm), and Delaunay triangulations (RDELA). The functions are also suitable for outlier detection.
restorepoint Debugging in R with restore points
The package restorepoint allows debugging of R functions via restore points instead of break points. When called inside a function, a restore point stores all local variables. These can later be restored for debugging purposes by simply copying and pasting the body of the function from the source code editor into the R console. The package vignette briefly illustrates the use of restore points and compares their advantages and drawbacks against the traditional method of setting break points via browser(). Restore points are particularly convenient when using an IDE like RStudio that allows selected code from a script to be run quickly in the R console.
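A minimal sketch of the intended workflow (assuming default options):

    library(restorepoint)
    f <- function(x) {
      restore.point("f")  # inside a function: stores all local variables under "f"
      x * 2
    }
    f(21)
    # Later, at the console: the same call outside a function restores the
    # stored locals (here x) into the global environment for debugging
    restore.point("f")
    x  # 21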
restrictedMVN Multivariate Normal Restricted by Affine Constraints
A fast Gibbs sampler for multivariate normal with affine constraints.
restriktor Restricted Statistical Estimation and Inference for Linear Models
Allows for easy-to-use testing of linear equality and inequality restrictions on parameters and effects in linear, robust linear and generalized linear statistical models.
reticulate R Interface to Python
R interface to Python modules, classes, and functions. When calling into Python R data types are automatically converted to their equivalent Python types. When values are returned from Python to R they are converted back to R types. Compatible with all versions of Python >= 2.7.
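A minimal sketch using a module from the Python standard library:

    library(reticulate)
    os <- import("os")   # import a Python module as an R object
    os$getcwd()          # call a Python function; the result converts to an R string
    py_eval("1 + 1")     # evaluate a Python expression; returns 2 as an R numeric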
retrodesign Tools for Type S (Sign) and Type M (Magnitude) Errors
Provides tools for working with Type S (Sign) and Type M (Magnitude) errors, as proposed in Gelman and Tuerlinckx (2000) <doi:10.1007/s001800000040> and Gelman & Carlin (2014) <doi:10.1177/1745691614551642>. In addition to simply calculating the probability of Type S/M error, the package includes functions for calculating these errors across a variety of effect sizes for comparison, and recommended sample size given ‘tolerances’ for Type S/M errors. To improve the speed of these calculations, closed-form solutions for the probability of a Type S/M error from Lu, Qiu, and Deng (2018) <doi:10.1111/bmsp.12132> are implemented. As of 1.0.0, this includes support only for simple research designs. See the package vignette for a fuller exposition of how Type S/M errors arise in research, and how to analyze them using the type of design analysis proposed in the above papers.
reval Repeated Function Evaluation for Sensitivity Analysis
Simplified scenario testing and sensitivity analysis with R via a generalized function for one-factor-at-a-time (OFAT) sensitivity analysis, evaluation of parameter sets and (sampled) parameter permutations. Options for formatting output and parallel processing are also provided.
revdbayes Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis
Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://…/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://…/package=evdbayes>, which uses Markov Chain Monte Carlo (MCMC) methods for posterior simulation. See the ‘revdbayes’ website for more information, documentation and examples.
revealedPrefs Revealed Preferences and Microeconomic Rationality
Computation of (direct and indirect) revealed preferences, fast non-parametric tests of rationality axioms (WARP, SARP, GARP), simulation of axiom-consistent data, and detection of axiom-consistent subpopulations. Rationality tests follow Varian (1982) <doi:10.2307/1912771>, axiom-consistent subpopulations follow Crawford and Pendakur (2012) <doi:10.1111/j.1468-0297.2012.02545.x>.
revealjs R Markdown Format for ‘reveal.js’ Presentations
R Markdown format for ‘reveal.js’ presentations, a framework for easily creating beautiful presentations using HTML.
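A minimal sketch of rendering an existing R Markdown file (the file name ‘slides.Rmd’ is hypothetical) to a reveal.js deck from the console, instead of setting the format in the YAML header:

    # Produces slides.html, a self-contained reveal.js presentation
    rmarkdown::render("slides.Rmd",
                      output_format = revealjs::revealjs_presentation())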
reverseR Linear Regression Stability to Significance Reversal
Tests linear regressions for significance reversal through leave-one(multiple)-out and shifting/addition of response values. The paradigm of the package is loosely based on the somewhat forgotten ‘dfstat’ criterion (Belsley, Kuh & Welsch 1980 <doi:10.1002/0471725153.ch2>), which tests influential values in linear models from their effect on statistical inference, i.e. changes in p-value.
revtools Tools to Support Evidence Synthesis
Researchers commonly need to summarize scientific information, a process known as ‘evidence synthesis’. The first stage of a synthesis process (such as a systematic review or meta-analysis) is to download a list of references from academic search engines such as ‘Web of Knowledge’ or ‘Scopus’. This information can be sorted manually (the traditional approach to systematic review), or the user can draw on tools from machine learning to help them visualise patterns in the corpus. ‘revtools’ uses topic models to render ordinations of text drawn from article titles, keywords and abstracts, and allows the user to interactively select or exclude individual references, words or topics. ‘revtools’ does not currently provide tools for analysis of data drawn from those references, features that are available in other packages such as ‘metagear’ or ‘metafor’.
Rfacebook Access to Facebook API via R
Provides an interface to the Facebook API. See <https://…/Rfacebook> for more information.
rfacebookstat Load Data from Facebook API Marketing
Loads data on campaigns, ads, ad sets and insights, at the ad account and business manager level, from the Facebook Marketing API into R. For more details see the official Facebook Marketing API documentation <https://…/>.
Rfast Fast R Functions
A collection of fast (utility) functions for data analysis. Fast covariance matrix calculation, Mahalanobis distance and column-wise variances are some of the functions.
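A minimal sketch of the column-wise and distance helpers mentioned above:

    library(Rfast)
    x <- matrix(rnorm(1000), ncol = 10)
    colVars(x)                                   # fast column-wise variances
    mahala(x, mu = colMeans(x), sigma = cov(x))  # Mahalanobis distances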
rFerns Random ferns classifier
An R implementation of the random ferns classifier by Ozuysal et al., modified for generic and multi-label classification and featuring OOB error approximation and importance measure.
RFgroove Importance Measure and Selection for Groups of Variables with Random Forests
Variable selection tools for groups of variables and functional data based on a new grouped variable importance with random forests.
rflann Basic R Interface to the FLANN C++ Library
Basic R interface for the FLANN C++ library written by Marius Muja and David Lowe. This package was written primarily for another package, ‘rcss’. This package utilises a few features from the FLANN C++ library. When I have time (and if there is sufficient demand), I will add more functions.
rfm Recency, Frequency and Monetary Value Analysis
Tools for RFM (recency, frequency and monetary value) analysis. Generate RFM score from both transaction and customer level data. Visualize the relationship between recency, frequency and monetary value using heatmap, histograms, bar charts and scatter plots. Includes a ‘shiny’ app for interactive segmentation. References: i. Blattberg R.C., Kim BD., Neslin S.A (2008) <doi:10.1007/978-0-387-72579-6_12>.
rfml MarkLogic NoSQL Database Server in-Database Analytics for R
Functionality required to efficiently use R with MarkLogic NoSQL Database Server, <http://…/>. Many basic and complex R operations are pushed down into the database, which removes the main memory boundary of R and allows full use of MarkLogic server. In order to use the package you need MarkLogic Server version 8 or higher.
Rfmtool Fuzzy Measure Tools for R
Various tools for handling fuzzy measures, calculating Shapley value and Interaction index, Choquet and Sugeno integrals, as well as fitting fuzzy measures to empirical data are provided. Construction of fuzzy measures from empirical data is done by solving a linear programming problem using the ‘lpsolve’ package, whose C source, adapted to the R environment, is included. The description of the basic theory of fuzzy measures is in the manual in the Doc folder of this package.
Rfolding The Folding Test of Unimodality
The basic algorithm to perform the folding test of unimodality. Given a dataset X (d dimensional, n samples), the test checks whether the distribution of the data is unimodal or multimodal. This package stems from the following research publication: Siffer, Alban, Pierre-Alain Fouque, Alexandre Termier, and Christine Largouët. ‘Are your data gathered?’ In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2210-2218. ACM, 2018. <doi:10.1145/3219819.3219994>.
RForcecom RForcecom provides the connection to Force.com and Salesforce.com from R
RForcecom provides the connection to Force.com and Salesforce.com from R.
RFormatter R Source Code Formatter
The R Formatter formats R source code. It is very much based on formatR, but tries to improve on it with heuristics. For example, spaces can be forced around the division operator ‘/’.
rFSA Feasible Solution Algorithm for Finding Best Subsets and Interactions
Uses the lm() and glm() functions to fit models generated from a feasible solution algorithm. The feasible solution algorithm comes up with model forms of a specific type that can have fixed variables, higher order interactions and their lower order terms.
Rfssa Functional Singular Spectrum Analysis
Methods and tools for implementing functional singular spectrum analysis for functional time series as described in Haghbin H., Najibi, S.M., Mahmoudvand R., Maadooliat M. (2019). Functional singular spectrum Analysis. Manuscript submitted for publication.
rFTRLProximal FTRL-Proximal Algorithm
An efficient C++ based implementation of ‘Follow The (Proximally) Regularized Leader’ online learning algorithm. This algorithm was proposed in McMahan et al. (2013) <DOI:10.1145/2487575.2488200>.
rfVarImpOOB Unbiased Variable Importance for Random Forests
Computes a novel variable importance for random forests: Impurity reduction importance scores for out-of-bag (OOB) data complementing the existing inbag Gini importance, see also Strobl et al (2007) <doi:10.1186/1471-2105-8-25>, Strobl et al (2007) <doi:10.1016/j.csda.2006.12.030> and Breiman (2001) <DOI:10.1023/A:1010933404324>. The Gini impurities for inbag and OOB data are combined in three different ways, after which the information gain is computed at each split. This gain is aggregated for each split variable in a tree and averaged across trees.
rfviz Interactive Visualization Tool for Random Forests
An interactive data visualization and exploration toolkit that implements Breiman and Cutler’s original random forest Java based visualization tools in R, for supervised and unsupervised classification and regression within the random forest algorithm.
rga R Google Analytics
This is a package for extracting data from Google Analytics into R.
Rga4gh An Interface to the GA4GH API
An interface to the GA4GH API that allows users to easily GET responses from and POST requests to GA4GH servers. See <http://ga4gh.org> for more information about the GA4GH project.
RGeckoboard R API for Geckoboard
Provides an interface to Geckoboard.
rgen Random Sampling Distribution C++ Routines for Armadillo
Provides C++ routines for popular sampling distributions, based on ‘Armadillo’, through a header-file approach.
RGenData Generates Multivariate Nonnormal Data and Determines How Many Factors to Retain
The GenDataSample() and GenDataPopulation() functions create, respectively, a sample or population of multivariate nonnormal data using methods described in Ruscio and Kaczetow (2008). Both of these functions call a FactorAnalysis() function to reproduce a correlation matrix. The EFACompData() function allows users to determine how many factors to retain in an exploratory factor analysis of an empirical data set using a method described in Ruscio and Roche (2012). The latter function uses populations of comparison data created by calling the GenDataPopulation() function. <DOI: 10.1080/00273170802285693>. <DOI: 10.1037/a0025697>.
rgeoapi Get Information from the GeoAPI
Provides access to information from <https://…/> about French ‘Communes’, ‘Departements’ and ‘Regions’.
RGeode Geometric Density Estimation
Provides the hybrid Bayesian method Geometric Density Estimation. On the one hand it scales the dimension of the data; on the other it performs inference. The method is fully described in the paper ‘Scalable Geometric Density Estimation’ by Y. Wang, A. Canale, D. Dunson (2016) <http://…/wang16e.pdf>.
rgeolocate IP Address Geolocation
Connectors to online and offline sources for taking IP addresses and geolocating them to country, city, timezone and other geographic ranges. For individual connectors, see the package index.
rgeospatialquality Wrapper for the Geospatial Data Quality REST API
Provides native wrappers for the functions available via the spatial quality REST API. See <http://bit.ly/bioinformatics_btw057> for more information on the API.
rgexf Build, Import and Export GEXF Graph Files
Create, read and write GEXF (Graph Exchange XML Format) graph files (used in Gephi and others). Using the XML package, it allows the user to easily build/read graph files including attributes, GEXF viz attributes (such as color, size, and position), network dynamics (for both edges and nodes) and edge weighting. Users can build/handle graphs element-by-element or massively through data-frames, visualize the graph on a web browser through ‘sigmajs’ (a javascript library) and interact with the igraph package.
RGF Regularized Greedy Forest
Regularized Greedy Forest wrapper of the ‘Regularized Greedy Forest’ <https://…/rgf_python> ‘python’ package, which also includes a Multi-core implementation (FastRGF) <https://…/fast_rgf>.
rgho Access WHO Global Health Observatory Data from R
Access WHO Global Health Observatory data from R via the Athena web service, an application program interface providing a simple query interface to the World Health Organization’s data and statistics content.
RGLUEANN Coupling between General Likelihood Uncertainty Estimation and Artificial Neural Networks
RGLUEANN provides an R implementation of the coupling between general likelihood uncertainty estimation (GLUE) and artificial neural networks (ANN).
rglwidget ‘rgl’ in ‘htmlwidgets’ Framework
Provides an ‘htmlwidgets’ framework for the ‘rgl’ package.
Rgnuplot R Interface for Gnuplot
Interface for gnuplot. Based on gnuplot_i version 1.11, the GPL code from Nicolas Devillard.
RGoogleAnalytics R Wrapper for the Google Analytics API
Provides functions for accessing and retrieving data from the Google Analytics API.
RGoogleAnalyticsPremium Unsampled Data in R for Google Analytics Premium Accounts
It fires a query to the API to get unsampled data in R for Google Analytics Premium accounts. It retrieves the data from the Google Drive document and stores it on the local drive, and the path to the downloaded file is returned by this package. The user can then read the data into R using the read.csv() function.
RGoogleFit R Interface to Google Fit API
Provides interface to Google Fit REST API v1 (see <https://…/> ).
rgoogleslides R Interface to Google Slides
When working in the Google ecosystem (using Google Drive etc.), there has been hardly any good workflow for getting values calculated in R into Google Slides. The normal and easy way out would be to just copy your work over, but when you have a number of analyses to present, with a lot of changes between each iteration, this quickly becomes quite cumbersome.
RGraphM Graph Matching Library for R
This is a wrapper package for the graph matching library ‘graphm’. The original ‘graphm’ C/C++ library can be found at <http://…/>. The latest version (0.52) of this library has been slightly modified to fit ‘Rcpp’ usage and is included in the source package. The development version of the package is also available at <https://…/RGraphM>.
rgrass7 Interface Between GRASS 7 Geographical Information System and R
Interpreted interface between GRASS 7 geographical information system and R, based on starting R from within the GRASS environment, or running free-standing R in a temporary GRASS location; the package provides facilities for using all GRASS commands from the R command line. This package may not be used for GRASS 6, for which spgrass6 should be used.
Rgretl Interface to ‘gretlcli’
An interface to ‘GNU gretl’: running ‘gretl’ scripts from R, estimating econometric models with backward passing of model results, opening ‘gretl’ data files (.gdt). ‘gretl’ can be downloaded from <http://gretl.sourceforge.net>. This package could make life on introductory/intermediate econometrics courses much easier: a full battery of the required regression diagnostics, including White’s heteroskedasticity test, restricted OLS estimation, advanced weak instrument tests after IV estimation, very convenient handling of lagged variables in models, standard case treatment in unit root tests, vector autoregressions, and vector error correction models. Datasets for 8 popular econometrics textbooks can be installed into ‘gretl’ from its server. All datasets can be easily imported using this package.
rGroovy Groovy Language Integration
Integrates the Groovy scripting language with the R Project for Statistical Computing.
rgsp Repetitive Group Sampling Plan Based on Cpk
Functions to calculate Sample Number and Average Sample Number for Repetitive Group Sampling Plan Based on Cpk as given in Aslam et al. (2013) (<DOI:10.1080/00949655.2012.663374>).
rgw Goodman-Weare Affine-Invariant Sampling
Implementation of the affine-invariant method of Goodman & Weare (2010) <DOI:10.2140/camcos.2010.5.65>, a method of producing Monte-Carlo samples from a target distribution.
rhandsontable Interface to the ‘Handsontable.js’ Library
An R interface to the Handsontable JavaScript library, which is a minimalist Excel-like data grid editor.
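A minimal sketch; the call returns an ‘htmlwidget’ that renders as an editable grid in the viewer or inside a ‘Shiny’ app:

    library(rhandsontable)
    rhandsontable(head(mtcars), rowHeaders = NULL)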
RHawkes Renewal Hawkes Process
Simulate a renewal Hawkes (RHawkes) self-exciting process, with a given immigrant hazard rate function and offspring density function. Calculate the likelihood of a RHawkes process with given hazard rate function and offspring density function for an (increasing) sequence of event times. Calculate the Rosenblatt residuals of the event times. Predict future event times based on observed event times up to a given time. For details see Chen and Stindl (2017) <doi:10.1080/10618600.2017.1341324>.
RHIPE R and Hadoop Integrated Programming Environment
RHIPE is an R package that provides a way to use Hadoop from R. It can be used on its own or as part of the Tessera environment.
rhli FIS ‘MarketMap C-Toolkit’
Complete access from ‘R’ to the FIS ‘MarketMap C-Toolkit’ (‘FAME C-HLI’). ‘FAME’ is a fully integrated software and database management system from FIS that provides the following capabilities: Time series and cross-sectional data management; Financial calculation, data analysis, econometrics, and forecasting; Table generation and detailed multicolor, presentation-quality report writing; Multicolor, presentation-quality graphics; ‘What-if’ analysis; Application development and structured programming; Data transfer to and from other applications; Tools for building customized graphical user interfaces.
rhmc Hamiltonian Monte Carlo
Implements simple Hamiltonian Monte Carlo routines in R for sampling from any desired target distribution which is continuous and smooth. See Neal (2017) <arXiv:1701.02434> for further details on Hamiltonian Monte Carlo. Automatic parameter selection is not supported.
rhmmer Utilities Parsing ‘HMMER’ Results
‘HMMER’ is a profile hidden Markov model tool used primarily for sequence analysis in bioinformatics (<http://…/> ). ‘rhmmer’ provides utilities for parsing the ‘HMMER’ output into tidy data frames.
rhnerm Random Heteroscedastic Nested Error Regression
Performs the random heteroscedastic nested error regression model described in Kubokawa, Sugasawa, Ghosh and Chaudhuri (2016) <doi:10.5705/ss.202014.0070>.
rhoR Rho for Inter Rater Reliability
Rho is used to test the generalization of inter rater reliability (IRR) statistics. Calculating rho starts by generating a large number of simulated, fully-coded data sets: a sizable collection of hypothetical populations, all of which have a kappa value below a given threshold, which indicates unacceptable agreement. Then kappa is calculated on a sample from each of those sets in the collection to see if it is equal to or higher than the kappa in the real sample. If less than five percent of the distribution of samples from the simulated data sets is greater than the actual observed kappa, the null hypothesis is rejected and one can conclude that if the two raters had coded the rest of the data, we would have acceptable agreement (kappa above the threshold).
RHPCBenchmark Benchmarks for High-Performance Computing Environments
Microbenchmarks for determining the run time performance of aspects of the R programming environment and packages relevant to high-performance computation. The benchmarks are divided into three categories: dense matrix linear algebra kernels, sparse matrix linear algebra kernels, and machine learning functionality.
rhub Connect to ‘R-hub’
Run ‘R CMD check’ on any of the ‘R-hub’ architectures, from the command line. The current architectures include ‘Windows’, ‘macOS’, ‘Solaris’ and various ‘Linux’ distributions.
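A sketch of typical calls, assuming the working directory contains a package source and an email address has been validated with the service:

    library(rhub)
    platforms()                           # list the available architectures
    check(platform = "debian-gcc-devel")  # run R CMD check on one platform
    check_for_cran()                      # convenience wrapper for CRAN-like checks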
rhymer Wrapper for the ‘Datamuse’ API to Find Rhyming and Associated Words
Wrapper for ‘Datamuse’ API to find rhyming and other associated words. This includes words of similar meaning, spelling, or other related words. Learn more about the ‘Datamuse’ API here <http://…/>.
ri2 Randomization Inference for Randomized Experiments
Randomization inference procedures for simple and complex randomized designs, including multi-armed trials, as described in Gerber and Green (2012, ISBN: 978-0393979954). Users formally describe their randomization procedure and test statistic. The randomization distribution of the test statistic under some null hypothesis is efficiently simulated.
riceware A Diceware Passphrase Implementation
The Diceware method can be used to generate strong passphrases. In short, you roll a 6-faced die 5 times in a row; the number obtained is matched against a dictionary of easily remembered words. By combining 7 words thus generated, you obtain a password that is relatively easy to remember, but would take several million years (on average) for a powerful computer to guess.
ridge Ridge Regression with Automatic Selection of the Penalty Parameter
Linear and logistic ridge regression functions. Additionally includes special functions for genome-wide single-nucleotide polymorphism (SNP) data.
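A minimal sketch of a linear ridge fit with the automatically chosen penalty:

    library(ridge)
    fit <- linearRidge(mpg ~ wt + cyl, data = mtcars)
    summary(fit)  # coefficients and the automatically selected penalty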
ridittools Useful Functions for Ridit Analysis
Functions to compute ridit scores of vectors, compute mean ridits and their standard errors for vectors compared to a reference vector, as described in Fleiss (1981, ISBN:0-471-06428-9), and compute means/SEs for multiple groups in matrices. Data can be either counts or proportions. Emphasis is on ridit analysis of ordered categorical data such as Likert items and pain-rating scales.
Riex IEX Stocks and Market Data
Efficiently and reliably retrieves Investors Exchange (‘IEX’) stock and market data using the ‘IEX Cloud API’. The platform is offered by Investors Exchange Group (IEX Group). The main goal is to leverage ‘R’ capabilities, including existing packages, to effectively provide financial and statistical analysis as well as visualization in support of fact-based decisions, and to continuously improve and enhance ‘Riex’ by applying best practices and staying in tune with users’ feedback and requirements. Please make sure to review and acknowledge Investors Exchange Group (IEX Group) terms and conditions before using ‘Riex’ (<https://…/> ).
rifle Sparse Generalized Eigenvalue Problem
Implements the algorithms for solving the sparse generalized eigenvalue problem by Tan et al. (2018). Sparse Generalized Eigenvalue Problem: Optimal Statistical Rates via Truncated Rayleigh Flow. To appear in Journal of the Royal Statistical Society: Series B. <arXiv:1604.08697>.
RImpact Calculates Measures of Scholarly Impact
The metrics() function calculates measures of scholarly impact. These include conventional measures, such as the number of publications and the total citations to all publications, as well as modern and robust metrics based on the vector of citations associated with each publication, such as the h index and many of its variants or rivals. These methods are described in Ruscio et al. (2012) <DOI: 10.1080/15366367.2012.711147>.
rinform An R Wrapper of the ‘Inform’ C Library for Information Analysis of Complex Systems
An R wrapper of the ‘Inform’ v1.0.0 C library for performing information analysis of complex systems. As for the ‘Inform’ library, ‘rinform’ is structured around the concepts of: 1) discrete empirical probability distributions, which form the basis for all of the information-theoretic measures; 2) classic information-theoretic measures built upon empirical distributions; and 3) measures of information dynamics for time series. In addition to the core components, ‘rinform’ also provides a collection of utilities to manipulate time series.
ring Circular / Ring Buffers
Circular / ring buffers in R and C. There are a couple of different buffers here with different implementations that represent different trade-offs.
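A minimal sketch, assuming the environment-based buffer and its push/take methods:

    library(ring)
    buf <- ring_buffer_env(5)  # buffer holding up to 5 R objects
    buf$push(1:3)              # push three elements
    buf$take(2)                # pop the two oldest elements
    buf$size()                 # capacity of the buffer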
RInno A Local Deployment Framework for Shiny Apps
Deploys local shiny apps using Inno Setup, an open source software that builds installers for Windows programs <http://…/>.
RInside C++ Classes to Embed R in C++ Applications
C++ classes to embed R in C++ applications. The RInside package makes it easier to have ‘R inside’ your C++ application by providing a C++ wrapper class around the embedded R interpreter. As R itself is embedded into your application, a shared library build of R is required. This works on Linux, OS X and even on Windows provided you use the same tools used to build R itself. Numerous examples are provided in the eight subdirectories of the examples/ directory of the installed package: standard, mpi (for parallel computing), qt (showing how to embed RInside inside a Qt GUI application), wt (showing how to build a ‘web-application’ using the Wt toolkit), armadillo (for RInside use with RcppArmadillo) and eigen (for RInside use with RcppEigen). The examples use GNUmakefile(s) with GNU extensions, so GNU make is required (and will use the GNUmakefile automatically). Doxygen-generated documentation of the C++ classes is available at the RInside website as well.
rIntervalTree An Interval Tree Tool for Real Numbers
This tool can be used to build binary interval trees using real number inputs. The tree supports queries of intervals overlapping a single number or an interval (start, end). Intervals with same bounds but different names are treated as distinct intervals. Insertion of intervals is also allowed. Deletion of intervals is not implemented at this point. See Mark de Berg, Otfried Cheong, Marc van Kreveld, Mark Overmars (2008). Computational Geometry: Algorithms and Applications, for a reference.
rintrojs Wrapper for the ‘intro.js’ Library
A wrapper for the ‘intro.js’ library (for more info see <http://www.introjs.com>). This package makes it easy to include step-by-step introductions and clickable hints in a ‘Shiny’ application. It supports both static introductions in the UI, and programmatic introductions from the server-side.
rio A Swiss-Army Knife for Data I/O
Streamlined data import and export by making assumptions that the user is probably willing to make: ‘import()’ and ‘export()’ determine the data structure from the file extension, reasonable defaults are used for data import and export (e.g., ‘stringsAsFactors=FALSE’), web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly without explicit decompression, and fast import packages are used where appropriate.
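A minimal sketch of the extension-driven workflow:

    library(rio)
    export(mtcars, "mtcars.csv")         # format inferred from the extension
    dat <- import("mtcars.csv")          # same inference on the way in
    convert("mtcars.csv", "mtcars.rds")  # one-step file-to-file conversion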
rioja Analysis of Quaternary Science Data
Constrained clustering, transfer functions, and other methods for analysing Quaternary science data.
rIP Passes an Array of IP Addresses to Iphub.info and Returns a Dataframe with Details of IP
Takes as its input an array of IPs and the user’s X-Key, passes these to <https://…/>, and returns a dataframe with the ip (used for merging), country code, country name, asn, isp, block, and hostname. Especially important in this is the variable ‘block’, which gives a score indicating whether the IP address is likely from a server farm and should be excluded from the data. It is coded 0 if the IP is residential/unclassified (i.e. safe IP), 1 if the IP is a non-residential IP (hosting provider, proxy, etc.; should likely be excluded), and 2 for non-residential and residential IPs (more stringent, may flag innocent respondents). The recommendation from <https://…/> is to block or exclude those who score block = 1.
Rip46 Utils for IP4 and IP6 Addresses
Utility functions and S3 classes for IPv4 and IPv6 addresses, including conversion to and from binary representation.
rise Conduct RISE Analysis
Implements techniques for educational resource inspection, selection, and evaluation (RISE) described in Bodily, Nyland, and Wiley (2017) <doi:10.19173/irrodl.v18i2.2952>. Automates the process of identifying learning materials that are not effectively supporting student learning in technology-mediated courses by synthesizing information about access to course content and performance on assessments.
rIsing High-Dimensional Ising Model Selection
Fits an Ising model to a binary dataset using L1 regularized logistic regression and extended BIC. Also includes a fast lasso logistic regression function for high-dimensional problems. Uses the ‘libLBFGS’ optimization library by Naoaki Okazaki.
Risk Computes 26 Financial Risk Measures for Any Continuous Distribution
Computes 26 financial risk measures for any continuous distribution. The 26 financial risk measures include value at risk, expected shortfall due to Artzner et al. (1999) <DOI:10.1007/s10957-011-9968-2>, tail conditional median due to Kou et al. (2013) <DOI:10.1287/moor.1120.0577>, expectiles due to Newey and Powell (1987) <DOI:10.2307/1911031>, beyond value at risk due to Longin (2001) <DOI:10.3905/jod.2001.319161>, expected proportional shortfall due to Belzunce et al. (2012) <DOI:10.1016/j.insmatheco.2012.05.003>, elementary risk measure due to Ahmadi-Javid (2012) <DOI:10.1007/s10957-011-9968-2>, omega due to Shadwick and Keating (2002), Sortino ratio due to Rollinger and Hoffman (2013), kappa due to Kaplan and Knowles (2004), Wang (1998)’s <DOI:10.1080/10920277.1998.10595708> risk measures, Stone (1973)’s <DOI:10.2307/2978638> risk measures, Luce (1980)’s <DOI:10.1007/BF00135033> risk measures, Sarin (1987)’s <DOI:10.1007/BF00126387> risk measures, Bronshtein and Kurelenkova (2009)’s risk measures.
riskParityPortfolio Design of Risk Parity Portfolios
Fast design of risk-parity portfolios for financial investment. The goal of the risk-parity portfolio formulation is to equalize or distribute the risk contributions of the different assets, which is missing if we simply consider the overall volatility of the portfolio as in the mean-variance Markowitz portfolio. In addition to the vanilla formulation, where the risk contributions are perfectly equalized subject to no shortselling and budget constraints, many other formulations are considered that allow for box constraints and shortselling, as well as the inclusion of additional objectives like the expected return and overall variance. See vignette for a detailed documentation and comparison, with several illustrative examples. The package is based on the papers: Y. Feng, and D. P. Palomar, ‘SCRIP: Successive Convex Optimization Methods for Risk Parity Portfolio Design,’ IEEE Trans. on Signal Processing, vol. 63, no. 19, pp. 5285-5300, Oct. 2015. <doi:10.1109/TSP.2015.2452219>. F. Spinu, ‘An Algorithm for Computing Risk Parity Weights,’ 2013. Available at SSRN: <https://ssrn.com/abstract=2297383> or <doi:10.2139/ssrn.2297383>. T. Griveau-Billion, J. Richard, and T. Roncalli, ‘A fast algorithm for computing High-dimensional risk parity portfolios,’ 2013. ArXiv preprint: <arXiv:1311.4057>.
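A minimal sketch of the vanilla formulation on a toy covariance matrix:

    library(riskParityPortfolio)
    set.seed(1)
    Sigma <- cov(matrix(rnorm(500), ncol = 5))  # toy 5-asset covariance matrix
    res <- riskParityPortfolio(Sigma)
    res$w  # long-only, fully invested weights with equalized risk contributions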
RiskPortfolios Computation of Risk-Based Portfolios
Collection of functions designed to compute risk-based portfolios as described in Ardia et al. (2016) <doi:10.2139/ssrn.2650644> and Ardia et al. (2017) <doi:10.21105/joss.00171>.
riskPredictClustData Assessing Risk Predictions for Clustered Data
Assessing and comparing risk prediction rules for clustered data. The method is based on the paper: Rosner B, Qiu W, and Lee MLT. (2013) <doi:10.1007/s10985-012-9240-6>.
riskR Risk Management
Computes risk measures from data, as well as performs risk management procedures such as practical risk measurement, capital requirement, capital allocation and decision-making.
riskyr Rendering Risk Literacy more Transparent
Risk-related information can be expressed in terms of probabilities or frequencies. By providing a toolbox of methods and metrics, we compute, translate, and represent risk-related information in a variety of ways. By offering different, but complementary perspectives on the interplay between key parameters, ‘riskyr’ renders teaching and training of risk literacy more transparent.
RiverLoad Load Estimation of River Compounds with Different Methods
Implements several of the most popular load estimation procedures, including averaging methods, ratio estimators and regression methods. The package provides an easy-to-use tool to rapidly calculate the load for various compounds and to compare different methods. The package also supplies additional functions to easily organize and analyze the data.
rjazz Official Client for ‘Jazz’
This is the official ‘Jazz’ client. ‘Jazz’ is a lightweight modular data processing framework, including a web server. It provides data persistence and computation capabilities accessible from ‘R’ and ‘Python’ and also through a REST API. See <https://…/Jazz>, and ?rjazz::rjazz to get a ‘Jazz’ server.
rjmcmc Reversible-Jump MCMC Using Post-Processing
Performs reversible-jump MCMC (Green, 1995) <doi:10.2307/2337340>, specifically the restriction introduced by Barker & Link (2013) <doi:10.1080/00031305.2013.791644>. By utilising a ‘universal parameter’ space, RJMCMC is treated as a Gibbs sampling problem. Previously-calculated posterior distributions are used to quickly estimate posterior model probabilities. Jacobian matrices are found using automatic differentiation.
rjsonapi Consumer for APIs that Follow the JSON API Specification
Consumer for APIs that Follow the JSON API Specification (<http://…/> ). Package mostly consumes data – with experimental support for serving JSON API data.
RJSplot Interactive Graphs with R
Creates interactive graphs with ‘R’. It joins the data analysis power of R and the visualization libraries of JavaScript in one package.
rJST Joint Sentiment Topic Modelling
Estimates the Joint Sentiment Topic model and its reversed variety, as described by Lin and He, 2009 <DOI:10.1145/1645953.1646003> and Lin, He, Everson and Ruger (2012) <DOI:10.1109/TKDE.2011.48>.
RJulia Integrating R and Julia
rjulia provides an interface between R and Julia. It allows a user to run a script in Julia from R, and maps objects between the two languages.
rkafka Using Apache ‘Kafka’ Messaging Queue Through ‘R’
Apache ‘Kafka’ is an open-source message broker project developed by the Apache Software Foundation which can be thought of as a distributed, partitioned, replicated commit log service. At a high level, producers send messages over the network to the ‘Kafka’ cluster which in turn serves them up to consumers. See <http://kafka.apache.org> for more information. Functions included in this package enable: 1. Creating ‘Kafka’ producer 2. Writing messages to a topic 3. Closing ‘Kafka’ producer 4. Creating ‘Kafka’ consumer 5. Reading messages from a topic 6. Closing ‘Kafka’ consumer. The jars required for this package are included in a separate package ‘rkafkajars’. Thanks to Mu Sigma for their continued support throughout the development of the package.
rkafkajars External Jars Required for Package ‘rkafka’
The ‘rkafkajars’ package collects all the external jars required for the ‘rkafka’ package.
RKEA R/KEA Interface
An R interface to KEA (Version 5.0). KEA (for Keyphrase Extraction Algorithm) allows for extracting keyphrases from text documents. It can be either used for free indexing or for indexing with a controlled vocabulary. For more information see <http://…/Kea>.
RKEAjars R/KEA Interface Jars
External jars required for package RKEA.
RKEEL Using Keel in R Code
KEEL is a popular Java software for a large number of different knowledge data discovery tasks. This package takes advantage of KEEL and R, allowing KEEL algorithms to be used in simple R code. The R code layer implemented between R and KEEL makes it easy both to use KEEL algorithms in R and to implement new algorithms for ‘RKEEL’ in a very simple way. It includes more than 100 algorithms for classification, regression and preprocessing, which allows a more complete experimentation process. For more information about KEEL, see <http://…/>.
RKEELdata Datasets from ‘KEEL’ for its Use in ‘RKEEL’
‘KEEL’ is a popular Java software for a large number of different knowledge data discovery tasks. Furthermore, ‘RKEEL’ is a package with an R code layer between R and ‘KEEL’, for using ‘KEEL’ in R code. This package includes the datasets from ‘KEEL’ in .dat format for use in the ‘RKEEL’ package. For more information about ‘KEEL’, see <http://…/>.
RKEELjars Java Executable .jar Files for ‘RKEEL’
KEEL is a popular Java software for a large number of different knowledge data discovery tasks. Furthermore, ‘RKEEL’ is a package with an R code layer between R and KEEL, for using KEEL in R code. This package downloads and installs the .jar files necessary for executing ‘RKEEL’ algorithms. For more information about KEEL, see <http://…/>.
RKlout Fetch Klout Scores for Twitter Users
An interface of R to Klout API v2. It fetches Klout Score for a Twitter Username/handle in real time. Klout is a website and mobile app that uses social media analytics to rank its users according to online social influence via the ‘Klout Score’, which is a numerical value between 1 and 100. In determining the user score, Klout measures the size of a user’s social media network and correlates the content created to measure how other users interact with that content.
RkMetrics Hybrid Mortality Estimation
Hybrid Mortality Modelling (HMM) provides a framework in which mortality around ‘the accident hump’ and at very old ages can be modelled under a single model. The graphics code necessary for visualizing the models’ output is included here.
rknn Random KNN Classification and Regression
Random knn classification and regression are implemented. Random knn based feature selection methods are also included. The approaches are mainly developed for high-dimensional data with small sample size.
RKUM Robust Kernel Unsupervised Methods
Robust kernel center matrix, robust kernel cross-covariance operator for kernel unsupervised methods, kernel canonical correlation analysis, and influence functions for identifying significant outliers or atypical objects from multimodal datasets. Alam, M. A., Fukumizu, K., Wang Y.-P. (2018) <doi:10.1016/j.neucom.2018.04.008>. Alam, M. A., Calhoun, C. D., Wang Y.-P. (2018) <doi:10.1016/j.csda.2018.03.013>.
rlang Functions for Base Types and Core R and ‘Tidyverse’ Features
A toolbox for working with base types, core R features like the condition system, and core ‘Tidyverse’ features like tidy evaluation.
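A minimal sketch of tidy evaluation, one of the core features mentioned above:

    library(rlang)
    var <- sym("cyl")                            # build a symbol programmatically
    eval_tidy(expr(mean(!!var)), data = mtcars)  # unquote it and evaluate with data masking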
rlas Read and Write ‘las’ and ‘laz’ Binary File Formats
Read and write ‘las’ and ‘laz’ binary file formats used to store LiDAR data.
rld Analyze and Design Repeated Low-Dose Challenge Experiments
Analyzes data from repeated low-dose challenge experiments and provides vaccine efficacy estimates. In addition, this package can provide guidance for designing repeated low-dose challenge studies.
Rlda Bayesian LDA for Mixed-Membership Clustering Analysis
Describes the Bayesian LDA model for mixed-membership clustering based on different types of data (i.e., Multinomial, Bernoulli, and Binomial entries).
rLDCP Text Generation from Data
Linguistic Descriptions of Complex Phenomena (LDCP) is an architecture and methodology that allows us to model complex phenomena, interpreting input data, and generating automatic text reports customized to the user needs (see <doi:10.1016/j.ins.2016.11.002> and <doi:10.1007/s00500-016-2430-5> ). The proposed package contains a set of methods that facilitate the development of LDCP systems. Its main goal is to increase the visibility and practical use of this research line.
RLeafAngle Estimates, Plots and Evaluates Leaf Angle Distribution Functions, Calculates Extinction Coefficients
Leaf angle distribution is described by a number of functions (e.g. ellipsoidal, Beta and rotated ellipsoidal). The parameters of leaf angle distribution functions are estimated through different empirical relationships. This package includes estimation of the parameters of different leaf angle distribution functions, plots and evaluates leaf angle distribution functions, and calculates extinction coefficients given a leaf angle distribution. Reference: Wang (2007) <doi:10.1016/j.agrformet.2006.12.003>.
rleafmap Interactive maps with R and Leaflet
rleafmap is an R package to display spatial data with interactive maps powered by Leaflet.
rlfsm Simulations and Statistical Inference for Linear Fractional Stable Motions
Contains functions for simulating linear fractional stable motions, according to techniques developed by Stoev and Taqqu (2004) <doi:10.1142/S0218348X04002379>, as well as functions for computing important statistics used with these processes introduced by Mazur, Otryakhin and Podolskij (2018) <arXiv:1802.06373>, and also different quantities related to those statistics.
Rlgt Bayesian Exponential Smoothing Models with Trend Modifications
An implementation of a number of Global Trend models for time series forecasting that are Bayesian generalizations and extensions of some Exponential Smoothing models. The main differences/additions include 1) nonlinear global trend, 2) Student-t error distribution, and 3) a function for the error size, allowing for heteroscedasticity. The methods are particularly useful for short time series. When tested on the well-known M3 dataset, they are able to outperform all classical time series algorithms. The models are fitted with MCMC using the ‘rstan’ package.
Rlibeemd Ensemble Empirical Mode Decomposition (EEMD) and Its Complete Variant (CEEMDAN)
An R interface for C library libeemd for performing the ensemble empirical mode decomposition (EEMD), its complete variant (CEEMDAN) or the regular empirical mode decomposition (EMD).
Rlinkedin Access to LinkedIn API via R
This is a development version of an R package to access the LinkedIn API. I was motivated to create this after using and contributing to Pablo Barberá’s awesome Rfacebook package. Contributions are welcome, and if you come across any errors please don’t hesitate to open a new issue. A list of functions still to be added appears at the bottom of the package readme. If you’d like to contribute or simply learn more about accessing the API, get started by visiting the LinkedIn Developer page.
Rlinsolve Iterative Solvers for (Sparse) Linear System of Equations
Solving a system of linear equations is one of the most fundamental computational problems for many fields of mathematical study, such as regression problems from statistics or numerical partial differential equations. We provide basic stationary iterative solvers such as Jacobi, Gauss-Seidel, Successive Over-Relaxation and SSOR methods. Nonstationary, or Krylov subspace, methods are also provided: Conjugate Gradient, Conjugate Gradient Squared, Biconjugate Gradient, and Biconjugate Gradient Stabilized methods. Sparse matrix computation is also supported in that solving large and sparse linear systems can be managed using the ‘Matrix’ package along with ‘RcppArmadillo’. For a more detailed description, see the book by Saad (2003) <doi:10.1137/1.9780898718003>.
rlm Robust Fitting of Linear Model
Robust fitting of linear models which can take responses in matrix form.
rlmDataDriven Robust Regression with Data Driven Tuning Parameter
Data driven approach for robust regression estimation. See Wang et al. (2007), <doi:10.1198/106186007X180156>.
rlme Rank-Based Estimation and Prediction in Random Effects Nested Models
Estimates robust rank-based fixed effects and predicts robust random effects in two- and three- level random effects nested models. The methodology is described in Bilgic & Susmann (2013) <https://…/>.
Rlof R Parallel Implementation of Local Outlier Factor(LOF)
R parallel implementation of Local Outlier Factor (LOF) which uses multiple CPUs to significantly speed up the LOF computation for large datasets. (Note: the overall performance depends on the computer, especially the number of cores.) It also supports multiple k values to be calculated in parallel, as well as various distance measures in addition to the default Euclidean distance.
RLogicalOps Process Logical Operations
Processing logical operations such as AND/OR/NOT operations dynamically. It also handles nesting in the operations.
RLT Reinforcement Learning Trees
Random forest with a variety of additional features for regression, classification and survival analysis. The features include: parallel computing with OpenMP, embedded model for selecting the splitting variable (based on Zhu, Zeng & Kosorok, 2015), subject weight, variable weight, tracking subjects used in each tree, etc.
rLTP R interface to LTP-Cloud service
R interface to the LTP-Cloud service for Natural Language Processing in Chinese. For more details please visit <http://www.ltp-cloud.com>. Visit <https://…/rLTP> for the up-to-date version.
rly Lex and Yacc
R implementation of the common parsing tools lex and yacc.
RM.weights Weighted Rasch Modeling and Extensions using Conditional Maximum Likelihood
Rasch model and extensions for survey data, using Conditional Maximum Likelihood (CML).
rma.exact Exact Confidence Intervals for Random Effects Meta-Analyses
Compute an exact CI for the population mean under a random effects model. The routines implement the algorithm described in Michael, Thornton, Xie, and Tian (2017) <https://…/Exact_Inference_Meta.pdf>.
Rmagic MAGIC – Markov Affinity-Based Graph Imputation of Cells
MAGIC (Markov affinity-based graph imputation of cells) is a method for addressing technical noise in single-cell data, including under-sampling of mRNA molecules, often termed ‘dropout’, which can severely obscure important gene-gene relationships. MAGIC shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. Read more: van Dijk et al. (2018) <DOI:10.1016/j.cell.2018.05.061>.
rmake Makefile Generator for R Analytical Projects
Creates and maintains a build process for complex analytic tasks in R. The package allows easy generation of a Makefile for the (GNU) ‘make’ tool, which drives the build process by executing build commands (possibly in parallel) in order to update results according to given dependencies on changed data or updated source files.
rmapshaper Edit ‘GeoJSON’ and Spatial Objects
Edit and simplify ‘geojson’ and ‘Spatial’ objects. This is a wrapper around the ‘mapshaper’ ‘javascript’ library <https://…/> to perform topologically-aware polygon simplification, as well as other operations such as clipping, erasing, dissolving, and converting ‘multi-part’ to ‘single-part’ geometries. It relies on the ‘geojsonio’ package for working with ‘geojson’ objects, and the ‘sp’ and ‘rgdal’ packages for working with ‘Spatial’ objects.
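A minimal sketch, borrowing the North Carolina shapefile shipped with the ‘sf’ package as example input:

    library(rmapshaper)
    library(sf)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"))
    nc_simple <- ms_simplify(nc, keep = 0.05)  # keep ~5% of vertices, topology-aware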
RMatlab RMatlab Package
This is an early release of a bi-directional interface between the R and Matlab languages. The idea is that Matlab users can call R functions as if they were regular Matlab functions without learning the R language, and similarly R users can access Matlab functionality from within the familiar R environment and syntax.
rmcfs The MCFS-ID Algorithm for Feature Selection and Interdependency Discovery
MCFS-ID (Monte Carlo Feature Selection and Interdependency Discovery) is a Monte Carlo method-based tool for feature selection. It also allows for the discovery of interdependencies between the relevant features. MCFS-ID is particularly suitable for the analysis of high-dimensional, ‘small n large p’ transactional and biological data.
rmcorr Repeated Measures Correlation
Compute the repeated measures correlation, a statistical technique for determining the overall within-individual relationship among paired measures assessed on two or more occasions, first introduced by Bland and Altman (1995). Includes functions for diagnostics, p-value, effect size with confidence interval including optional bootstrapping, as well as graphing. Also includes several example datasets.
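A minimal sketch, assuming the ‘bland1995’ example data bundled with the package:

    library(rmcorr)
    fit <- rmcorr(participant = Subject, measure1 = PaCO2,
                  measure2 = pH, dataset = bland1995)
    fit$r      # repeated measures correlation coefficient
    plot(fit)  # one regression line per subject, common slope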
rmda Risk Model Decision Analysis
Provides tools to evaluate the value of using a risk prediction instrument to decide treatment or intervention (versus no treatment or intervention). Given one or more risk prediction instruments (risk models) that estimate the probability of a binary outcome, rmda provides functions to estimate and display decision curves and other figures that help assess the population impact of using a risk model for clinical decision making. Here, ‘population’ refers to the relevant patient population. Decision curves display estimates of the (standardized) net benefit over a range of probability thresholds used to categorize observations as ‘high risk’. The curves help evaluate a treatment policy that recommends treatment for patients who are estimated to be ‘high risk’ by comparing the population impact of a risk-based policy to ‘treat all’ and ‘treat none’ intervention policies. Curves can be estimated using data from a prospective cohort. In addition, rmda can estimate decision curves using data from a case-control study if an estimate of the population outcome prevalence is available. Version 1.4 of the package provides an alternative framing of the decision problem for situations where treatment is the standard-of-care and a risk model might be used to recommend that low-risk patients (i.e., patients below some risk threshold) opt out of treatment. Confidence intervals calculated using the bootstrap can be computed and displayed. A wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.
rmdfiltr ‘Lua’ Filters for R Markdown
A collection of ‘Lua’ filters that extend the functionality of R Markdown templates (e.g., count words or post-process ‘pandoc-citeproc’ citations).
rmdformats HTML Output Formats and Templates for ‘rmarkdown’ Documents
HTML formats and templates for ‘rmarkdown’ documents, with some extra features such as automatic table of contents, lightboxed figures, dynamic crosstab helper.
rmdHelpers Helper Functions for Rmd Documents
A series of functions to aid in repeated tasks for Rmd documents. All details are to my personal preference, though I am happy to add flexibility if there are use cases I am missing. I will continue updating with new functions as I add utility functions for myself.
rmdshower ‘R’ ‘Markdown’ Format for ‘shower’ Presentations
‘R’ ‘Markdown’ format for ‘shower’ presentations, see <https://…/shower>.
rMEA Synchrony in Motion Energy Analysis (MEA) Time-Series
A suite of tools useful to read, visualize and export bivariate motion energy time-series. Lagged synchrony between subjects can be analyzed through windowed cross-correlation. Surrogate data generation allows an estimation of pseudosynchrony that helps to estimate the effect size of the observed synchronization. Ramseyer & Tschacher (2011) <doi:10.1037/a0023419>.
RMediation Mediation Analysis Confidence Intervals
We provide functions to compute confidence intervals (CIs) for a well-defined nonlinear function of the model parameters (e.g., product of k coefficients) in single-level and multilevel structural equation models.
rmetalog R Implementation of the Metalog Distribution
Implementation of the metalog distribution in R. The metalog distribution is a modern, highly flexible, data-driven distribution. Metalogs are developed by Keelin (2016) <doi:10.1287/deca.2016.0338>. This package provides functions to build these distributions from raw data. Resulting metalog objects are then useful for exploratory and probabilistic analysis.
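A sketch of fitting and sampling, assuming the ‘metalog()’/‘rmetalog()’ pair and an unbounded fit:

    library(rmetalog)
    x <- rnorm(200)
    m <- metalog(x, term_limit = 9)  # fit metalog distributions with up to 9 terms
    rmetalog(m, n = 5, term = 9)     # draw 5 samples from the 9-term fit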
rmi Mutual Information Estimators
Provides mutual information estimators based on k-nearest neighbor estimators by A. Kraskov, et al. (2004) <doi:10.1103/PhysRevE.69.066138>, S. Gao, et al. (2015) <http://…/gao15.pdf> and local density estimators by W. Gao, et al. (2017) <doi:10.1109/ISIT.2017.8006749>.
rmio Provides ‘mio’ C++11 Header Files
Provides header files of ‘mio’, a cross-platform C++11 header-only library for memory mapped file IO <https://…/mio>.
Rmixmod An Interface for MIXMOD
A collection of functions designed to run supervised and unsupervised classification with MIXture MODelling.
RMKL Multiple Kernel Learning for Classification or Regression Problems
Provides R and C++ functions that enable the user to conduct multiple kernel learning (MKL) and cross-validation for support vector machine (SVM) models. Cross-validation can be used to identify kernel shapes and hyperparameter combinations that can be used as candidate kernels for MKL. There are three implementations provided in this package: SimpleMKL (Rakotomamonjy et al., 2008), Simple and Efficient MKL (Xu et al., 2010), and Dual Augmented Lagrangian MKL (Suzuki and Tomioka, 2011) <doi:10.1007/s10994-011-5252-9>. These methods identify the convex combination of candidate kernels to construct an optimal hyperplane.
RMOA Connect R with MOA for Massive Online Analysis
Connects R with MOA (Massive Online Analysis, <http://moa.cms.waikato.ac.nz>) to build classification and regression models on streaming data or out-of-RAM data.
rmonad A Monadic Pipeline System
A monadic solution to pipeline analysis. All operations — and the errors, warnings and messages they emit — are merged into a directed graph. Infix binary operators mediate when values are stored, how exceptions are handled, and where pipelines branch and merge. The resulting structure may be queried for debugging or report generation. ‘rmonad’ complements, rather than competes with, non-monadic pipeline packages like ‘magrittr’ or ‘pipeR’.
Rmonkey A Survey Monkey R Client
Programmatic access to the Survey Monkey API <https://developer.surveymonkey.com>, which currently provides extensive functionality for monitoring surveys and retrieving survey results, and some functionality for creating new surveys and data collectors.
rMouse Automate Mouse Clicks and Send Keyboard Input
Provides wrapper functions to the Java Robot class to automate user input, like mouse movements, clicks and keyboard input.
rmpw Causal Mediation Analysis Using Weighting Approach
We implement causal mediation analysis using the methods proposed by Hong (2010) and Hong, Deutsch & Hill (2015) <doi:10.3102/1076998615583902>. It allows the estimation and hypothesis testing of causal mediation effects through ratio of mediator probability weights (RMPW). This strategy conveniently relaxes the assumption of no treatment-by-mediator interaction while greatly simplifying the outcome model specification without invoking strong distributional assumptions.
rmsfuns Quickly View Data Frames in Excel, Build Folder Paths and Create Date Vectors
Contains several useful navigation helper functions, including easily building folder paths, quickly viewing data frames in ‘Excel’, creating date vectors and changing the console prompt to reflect time.
RMThreshold Signal-Noise Separation in Random Matrices by using Eigenvalue Spectrum Analysis
An algorithm which can be used to determine an objective threshold for signal-noise separation in large random matrices (correlation matrices, mutual information matrices, network adjacency matrices) is provided. The package makes use of the results of Random Matrix Theory (RMT). The algorithm increments a suppositional threshold monotonically, thereby recording the eigenvalue spacing distribution of the matrix. According to RMT, that distribution undergoes a characteristic change when the threshold properly separates signal from noise. By using the algorithm, the modular structure of a matrix – or of the corresponding network – can be unraveled.
RMTL Regularized Multi-Task Learning
Efficient solvers for 10 regularized multi-task learning algorithms applicable to regression, classification, joint feature selection, task clustering, low-rank learning, sparse learning and network incorporation. Based on the accelerated gradient descent method, the algorithms feature a state-of-the-art computational complexity O(1/k^2). Sparse model structure is induced by solving the proximal operator. The details of the package are described in the paper by Han Cao and Emanuel Schwarz (2018) <doi:10.1093/bioinformatics/bty831>.
rmutil Utilities for Nonlinear Regression and Repeated Measurements Models
A toolkit of functions for nonlinear regression and repeated measurements models, not to be used by itself but called by other Lindsey packages such as ‘gnlm’, ‘stable’, ‘growth’, ‘repeated’, and ‘event’ (available at <http://…/rcode.html>).
RMySQL Database Interface and MySQL Driver for R
Implements a DBI-compliant interface to MySQL and MariaDB databases.
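A typical ‘DBI’ session; the connection details and table name are placeholders:

    library(DBI)
    con <- dbConnect(RMySQL::MySQL(),
                     dbname = "test", host = "localhost",
                     user = "user", password = "password")  # placeholder credentials
    dbListTables(con)
    res <- dbGetQuery(con, "SELECT COUNT(*) FROM mytable")  # hypothetical table
    dbDisconnect(con)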
rmytarget Load Data from ‘MyTarget API’
Allows working with the ‘MyTarget API’ <https://…/> and loading data on ads, campaigns and statistics from your ads account.
RNAseqNet Log-Linear Poisson Graphical Model with Hot-Deck Multiple Imputation
Infers a log-linear Poisson graphical model with an auxiliary data set. A hot-deck multiple imputation method is used to improve the reliability of the inference with the auxiliary dataset. The standard log-linear Poisson graphical model can also be used for the inference, and the Stability Approach to Regularization Selection (StARS) is implemented to drive the selection of the regularization parameter.
rnaturalearth World Map Data from Natural Earth
Facilitates mapping by making natural earth map data from <http://…/> more easily available to R users.
rnaturalearthdata World Vector Map Data from Natural Earth Used in ‘rnaturalearth’
Vector map data from <http://…/>. Access functions are provided in the accompanying package ‘rnaturalearth’.
RNaviCell Visualization of High-Throughput Data on Large-Scale Biological Networks
Provides a set of functions to access a data visualization web service. For more information and a tutorial on how to use it, see https://…/nav_web_service.html and https://…/RNaviCell.
RNeo4j Neo4j Driver for R
Neo4j, a graph database, allows users to store their data as a property graph. A graph consists of nodes that are connected by relationships; both nodes and relationships can have properties, or key-value pairs. RNeo4j is Neo4j’s R driver. It allows users to read and write data from and to Neo4j directly from their R environment by exposing an interface for interacting with nodes, relationships, paths, and more. Most notably, it allows users to retrieve Cypher query results as R data frames, where Cypher is Neo4j’s graph query language. Visit <http://www.neo4j.com> to learn more about Neo4j.
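A brief sketch of retrieving Cypher results as a data frame; the server URL and credentials are placeholders:

    library(RNeo4j)
    graph <- startGraph("http://localhost:7474/db/data/",
                        username = "neo4j", password = "password")
    # Cypher query results come back as an R data frame
    cypher(graph, "MATCH (p:Person)-[:KNOWS]->(f:Person)
                   RETURN p.name AS person, f.name AS friend LIMIT 10")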
Rnets Resistance Relationship Networks using Graphical LASSO
Novel methods are needed to analyze the large amounts of antimicrobial resistance (AMR) data generated by AMR surveillance programs. This package is used to estimate resistance relationship networks, or ‘Rnets’, from empirical antimicrobial susceptibility data. These networks can be used to study relationships between antimicrobial resistances (typically measured using MICs) and genes in populations. The ‘GitHub’ for this package is available at <https://…/Rnets>. Bug reports and feature requests should be directed to the same ‘GitHub’ site. The methods used in ‘Rnets’ are available in the following publications: An overview of the method in WJ Love, et al., ‘Markov Networks of Collateral Resistance: National Antimicrobial Resistance Monitoring System Surveillance Results from Escherichia coli Isolates, 2004-2012’ (2016) <doi:10.1371/journal.pcbi.1005160>; The graphical LASSO for sparsity in J Friedman, T Hastie, R Tibshirani ‘Sparse inverse covariance estimation with the graphical lasso’ (2007) <doi:10.1093/biostatistics/kxm045>; L1 penalty selection in H Liu, K Roeder, L Wasserman ‘Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models’ (2010) <arXiv:1006.3316>; Modularity for graphs with negative edge weights in S Gomez, P Jensen, A Arenas. ‘Analysis of community structure in networks of correlated data’ (2009) <doi:10.1103/PhysRevE.80.016114>.
RNewsflow Tools for Analyzing Content Homogeneity and News Diffusion using Computational Text Analysis
A collection of tools for measuring the similarity of news content and tracing the flow of (news) messages over time and across media.
RNGforGPD Random Number Generation for Generalized Poisson Distributions
Generation of univariate and multivariate data that follow the generalized Poisson distribution. The details of the method are explained in Demirtas (2017) <DOI:10.1080/03610918.2014.968725>.
rngSetSeed Seeding the Default RNG with a Numeric Vector
A function setVectorSeed() is provided. Its argument is a numeric vector of arbitrary nonzero length, whose components have integer values from [0, 2^32-1]. The input vector is transformed using the AES (Advanced Encryption Standard) algorithm into an initial state of the Mersenne-Twister random number generator. The function provides a better alternative to the base R function set.seed(), whose input is a single integer. Initializing a stream of random numbers with a vector is a convenient way to obtain several streams, each identified by several integer indices.
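For example, re-seeding with the same vector reproduces the stream:

    library(rngSetSeed)
    setVectorSeed(c(2024, 7, 1))  # any nonzero-length vector of integers in [0, 2^32-1]
    runif(3)
    setVectorSeed(c(2024, 7, 1))  # identical vector, identical stream
    runif(3)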
rngtools Utility functions for working with Random Number Generators
This package contains a set of functions for working with Random Number Generators (RNGs). In particular, it defines a generic S4 framework for getting/setting the current RNG, or RNG data that are embedded into objects for reproducibility. Notably, convenient default methods greatly facilitate the way current RNG settings can be changed.
RNifti Fast R and C++ Access to NIfTI Images
Provides very fast access to images stored in the NIfTI-1 file format <http://…/nifti-1>, with seamless synchronisation between compiled C and interpreted R code. Not to be confused with RNiftyReg, which provides tools for image registration.
rNMF Robust Nonnegative Matrix Factorization
An implementation of robust nonnegative matrix factorization (rNMF). The rNMF algorithm decomposes a nonnegative high dimension data matrix into the product of two low rank nonnegative matrices, while detecting and trimming outliers. The main function is rnmf(). The package also includes a visualization tool, see(), that arranges and prints vectorized images.
rnn Recurrent Neural Network
Implementation of a Recurrent Neural Network in R.
rNOMADS An Interface to the NOAA Operational Model Archive and Distribution System
An interface to the National Oceanic and Atmospheric Administration’s Operational Model Archive and Distribution System (NOMADS) that allows R users to quickly and efficiently download global and regional weather model data for processing. rNOMADS currently supports a variety of models ranging from global weather data to an altitude of 40 km, to high resolution regional weather models, to wave and sea ice models. It can also retrieve archived NOMADS models. rNOMADS can retrieve binary data in grib format as well as import ascii data directly into R by interfacing with the GrADS-DODS system.
http://rnomads.r-forge.r-project.org
https://…rnomads-with-grib-file-support-on-windows
rnr Rosenbaum and Rubin Sensitivity
Apply sensitivity analysis for offline policy evaluation, as implemented in Jung et al. (2017) <arXiv:1702.04690> based on Rosenbaum and Rubin (1983) <http://…/2345524>.
roadoi Find Free Versions of Scholarly Publications via the oaDOI Service
This web client interfaces oaDOI <https://oadoi.org>, a service finding free full-texts of academic papers by linking DOIs with open access journals and repositories. It provides unified access to various data sources for open access full-text links including Crossref, Bielefeld Academic Search Engine (BASE) and the Directory of Open Access Journals (DOAJ). API usage is free and no registration is required.
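Usage reduces to a single call; the email address identifies the API user and is a placeholder:

    library(roadoi)
    # Returns a data frame of open-access evidence and full-text links per DOI
    oadoi_fetch(dois = "10.1371/journal.pone.0018657",
                email = "name@example.com")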
roahd Robust Analysis of High Dimensional Data
A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.
RobAStRDA Interpolation Grids for Packages of the ‘RobASt’ – Family of Packages
Includes ‘sysdata.rda’ file for packages of the ‘RobASt’ – family of packages; is currently used by package ‘RobExtremes’ only.
robcbi Conditionally Unbiased Bounded Influence Estimates
Conditionally unbiased bounded influence estimates as described in Kuensch et al. (1989) <doi:10.1080/01621459.1989.10478791> in three special cases of the generalized linear model: Bernoulli, Binomial, and Poisson distributed responses.
robets Forecasting Time Series with Robust Exponential Smoothing
We provide an outlier-robust alternative to the function ets() in the ‘forecast’ package of Hyndman and Khandakar (2008) <DOI:10.18637/jss.v027.i03>. For each method in a class of exponential smoothing variants we made a robust alternative; the class includes methods with a damped trend and/or seasonal components. The robust method is developed by robustifying every aspect of the original exponential smoothing variant: we provide robust forecasting equations, robust initial values, robust smoothing parameter estimation and a robust information criterion. The method is described in more detail in Crevits and Croux (2016) <DOI:10.13140/RG.2.2.11791.18080>.
RobExtremes Optimally Robust Estimation for Extreme Value Distributions
Optimally robust estimation for extreme value distributions using S4 classes and methods (based on packages ‘distr’, ‘distrEx’, ‘distrMod’, ‘RobAStBase’, and ‘ROptEst’).
robFitConGraph Graph-Constrained Robust Covariance Estimation
Contains a single function, robFitConGraph(), which implements two algorithms for robust estimation of scatter matrices subject to zero constraints in their inverse. The methodology is described in Vogel & Tyler (2014) <doi:10.1093/biomet/asu041>. See the robFitConGraph() function documentation for further details.
robmed (Robust) Mediation Analysis
Perform mediation analysis via a bootstrap test.
robmixglm Robust Generalized Linear Models (GLM) using Mixtures
Robust generalized linear models (GLM) using a mixture method, as described in Beath (2018) <doi:10.1080/02664763.2017.1414164>. This assumes that the data are a mixture of standard observations, being a generalised linear model, and outlier observations from an overdispersed generalized linear model. The overdispersed linear model is obtained by including a normally distributed random effect in the linear predictor of the generalized linear model.
RobStatTM Robust Statistics: Theory and Methods
Companion package for the book: ‘Robust Statistics: Theory and Methods, second edition’, <http://…/robust>. This package contains code that implements the robust estimators discussed in the recent second edition of the book above, as well as the scripts reproducing all the examples in the book.
robustarima Robust ARIMA Modeling
Functions for fitting a linear regression model with ARIMA errors using a filtered tau-estimate.
robustBLME Robust Bayesian Linear Mixed-Effects Models using ABC
Bayesian robust fitting of linear mixed effects models through weighted likelihood equations and approximate Bayesian computation as proposed by Ruli et al. (2017) <arXiv:1706.01752>.
RobustCalibration Robust Calibration of Imperfect Mathematical Models
Implements full Bayesian analysis for calibrating mathematical models with new methodology for modeling the discrepancy function. It allows for emulation, calibration and prediction using complex mathematical model outputs and experimental data. See the reference: Mengyang Gu and Long Wang (2017) <arXiv:1707.08215>.
robustDA Robust Mixture Discriminant Analysis
Robust mixture discriminant analysis (RMDA, Bouveyron & Girard, 2009) allows one to build a robust supervised classifier from learning data with label noise. The idea of the proposed method is to confront an unsupervised modeling of the data with the supervised information carried by the labels of the learning data in order to detect inconsistencies. The method is afterwards able to build a robust classifier that takes the detected label inconsistencies into account.
RobustEM Robust Mixture Modeling Fitted via Spatial-EM Algorithm for Model-Based Clustering and Outlier Detection
Spatial-EM is a robust EM algorithm for finite mixture learning procedures. The algorithm utilizes median-based location and rank-based scatter estimators to replace the sample mean and sample covariance matrix in each M step, hence enhancing the stability and robustness of the algorithm. To learn more about this algorithm, read the article ‘Yu, K., Dang, X., Bart Jr, H. and Chen, Y. (2015). Robust Model-based Learning via Spatial-EM Algorithm. IEEE Transactions on Knowledge and Data Engineering, 27(6), 1670-1682. doi:10.1109/TKDE.2014.2373355’.
robustETM Robust Methods using Exponential Tilt Model
Testing homogeneity for the generalized exponential tilt model. This package includes a collection of functions for (1) implementing methods for testing homogeneity for the generalized exponential tilt model; and (2) implementing existing methods under comparison.
RobustGaSP Robust Gaussian Stochastic Process Emulation
Robust parameter estimation and prediction of Gaussian stochastic process emulators. Important functions : rgasp(), predict.rgasp().
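A minimal sketch of emulating a deterministic function; the design and test function are illustrative, and argument names follow the package documentation as we recall it:

    library(RobustGaSP)
    design <- matrix(runif(30), ncol = 3)                  # 10 runs, 3 inputs
    response <- apply(design, 1, function(z) sin(2 * pi * z[1]) + z[2]^2 + z[3])
    model <- rgasp(design = design, response = response)   # robust parameter estimation
    predict(model, testing_input = matrix(runif(15), ncol = 3))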
robustrank Robust Rank-Based Tests
Implements several rank-based tests, including the modified Wilcoxon-Mann-Whitney two sample location test, also known as the Fligner-Policello test.
robustrao An Extended Rao-Stirling Diversity Index to Handle Missing Data
A collection of functions to compute the Rao-Stirling diversity index (Porter and Rafols, 2009) <DOI:10.1007/s11192-008-2197-2> and its extension to acknowledge missing data (i.e., uncategorized references) by calculating its interval of uncertainty using mathematical optimization as proposed in Calatrava et al. (2016) <DOI:10.1007/s11192-016-1842-4>. The Rao-Stirling diversity index is a well-established bibliometric indicator to measure the interdisciplinarity of scientific publications. Apart from the obligatory dataset of publications with their respective references and a taxonomy of disciplines that categorizes references as well as a measure of similarity between the disciplines, the Rao-Stirling diversity index requires a complete categorization of all references of a publication into disciplines. Thus, it fails for an incomplete categorization; in this case, the robust extension has to be used, which encodes the uncertainty caused by missing bibliographic data as an uncertainty interval. Classification / ACM – 2012: Information systems ~ Similarity measures, Theory of computation ~ Quadratic programming, Applied computing ~ Digital libraries and archives.
robustsae Robust Bayesian Small Area Estimation
Functions for Robust Bayesian Small Area Estimation.
roccv ROC for Cross Validation Results
Cross validate large genetic data while specifying clinical variables that should always be in the model using the function cv(). An ROC plot from the cross validation data with AUC can be obtained using rocplot(), which also can be used to compare different models.
rocNIT Non-Inferiority Test for Paired ROC Curves
Non-inferiority tests and diagnostic tests are very important in clinical trials. This package computes the p value of the non-inferiority test for paired ROC curves from a diagnostic test.
ROCR Visualizing the Performance of Scoring Classifiers
ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
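The three commands in practice, using the example scores shipped with the package:

    library(ROCR)
    data(ROCR.simple)
    pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
    perf <- performance(pred, measure = "tpr", x.measure = "fpr")  # ROC curve
    plot(perf, colorize = TRUE)               # color-code the curve by cutoff
    performance(pred, "auc")@y.values[[1]]    # area under the curve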
ROCS Receiver Operating Characteristics Surface
Plots the Receiver Operating Characteristics Surface for high-throughput class-skewed data, calculates the Volume under the Surface (VUS) and the FDR-Controlled Area Under the Curve (FCAUC), and conducts tests to compare two ROC surfaces.
rocsvm.path The Entire Solution Paths for ROC-SVM
We develop the entire solution path for the ROC-SVM presented by Rakotomamonjy. The solution path algorithm greatly facilitates tuning the regularization parameter lambda in the ROC-SVM by avoiding a grid search, which may be computationally too intensive. For more information on the ROC-SVM, see the report in the ROC Analysis in AI workshop (ROCAI-2004): Hernández-Orallo, José, et al. (2004) <doi:10.1145/1046456.1046489>.
rocTree Receiver Operating Characteristic (ROC)-Guided Classification and Survival Tree
Receiver Operating Characteristic (ROC)-guided survival trees and forests algorithms are implemented, providing a unified framework for tree-structured analysis with censored survival outcomes. A time-invariant partition scheme on the survivor population was considered to incorporate time-dependent covariates. Motivated by ideas of randomized tests, generalized time-dependent ROC curves were used to evaluate the performance of survival trees and establish the optimality of the target hazard function. The optimality of the target hazard function motivates us to use a weighted average of the time-dependent area under the curve (AUC) on a set of time points to evaluate the prediction performance of survival trees and to guide splitting and pruning. A detailed description of the implemented methods can be found in Sun et al. (2019) <arXiv:1809.05627>.
RODBC ODBC Database Access
An ODBC database interface.
RODBCDBI Provides Access to Databases Through the ODBC Interface
An implementation of R’s DBI interface using the ODBC package as a back-end. This allows R to connect to any DBMS that has an ODBC driver.
rodeo A Code Generator for ODE-Based Models
Provides a reference class and several utility methods to facilitate the implementation of models based on ordinary differential equations. The heart of the package is a code generator that creates compiled ‘Fortran’ (or ‘R’) code which can be passed to a numerical solver. There is direct support for solvers contained in packages ‘deSolve’ and ‘rootSolve’.
ROI.models.miplib R Optimization Infrastructure: ‘MIPLIB’ 2010 Benchmark Instances
The mixed integer programming library ‘MIPLIB’ (see <http://…/> ) is commonly used to compare the performance of mixed integer optimization solvers. This package provides functions to access ‘MIPLIB’ from the ‘R’ Optimization Infrastructure (‘ROI’). More information about ‘MIPLIB’ can be found in the paper by Koch et al. available at <http://…/28>. The ‘README.md’ file illustrates how to use this package.
ROI.models.netlib ‘ROI’ Optimization Problems Based on ‘NETLIB-LP’
A collection of ‘ROI’ optimization problems based on the ‘NETLIB-LP’ collection. ‘Netlib’ is a software repository which, amongst much other software for scientific computing, contains a collection of linear programming problems. The purpose of this package is to make these problems easily accessible from ‘R’ as ‘ROI’ optimization problems.
ROI.plugin.alabama ‘alabama’ Plugin for the ‘R’ Optimization Infrastructure
Enhances the R Optimization Infrastructure (‘ROI’) package with the ‘alabama’ solver for solving nonlinear optimization problems.
ROI.plugin.clp ‘Clp (Coin-or linear programming)’ Plugin for the ‘R’ Optimization Interface
Enhances the R Optimization Infrastructure (ROI) package by registering the COIN-OR Clp open-source solver from the COIN-OR suite <https://…/>. It allows solving linear programs with continuous objective variables while keeping sparse constraint definitions.
ROI.plugin.cplex ROI Plug-in CPLEX
Enhances the R Optimization Infrastructure (ROI) package by registering the CPLEX commercial solver. It allows for solving mixed integer quadratically constrained programming (MIQCP) problems as well as all variants/combinations of LP, QP, QCP, IP.
ROI.plugin.deoptim ‘DEoptim’ and ‘DEoptimR’ Plugin for the ‘R’ Optimization Interface
Enhances the R Optimization Infrastructure (‘ROI’) package with the ‘DEoptim’ and ‘DEoptimR’ packages: ‘DEoptim’ is used for unconstrained optimization and ‘DEoptimR’ for constrained optimization.
ROI.plugin.ecos ROI-Plugin ECOS
Enhances the R Optimization Infrastructure (ROI) package with the Embedded Conic Solver (ECOS) for solving conic optimization problems.
ROI.plugin.ipop ROI Plug-in {ipop}
Enhances the R Optimization Infrastructure (‘ROI’) package by registering the ipop solver from package ‘kernlab’.
ROI.plugin.lpsolve ‘lp_solve’ Plugin for the ‘R’ Optimization Interface
Enhances the ‘R’ Optimization Infrastructure (‘ROI’) package with the ‘lp_solve’ solver.
ROI.plugin.msbinlp ‘Multi-Solution’ Binary Linear Problem Plugin for the ‘R’ Optimization Interface
Enhances the ‘R’ Optimization Infrastructure (‘ROI’) package with the possibility to obtain multiple solutions for linear problems with binary variables. The main function is copied (with small modifications) from the relations package.
ROI.plugin.neos ‘NEOS’ Plug-in for the ‘R’ Optimization Interface
Enhances the ‘R’ Optimization Infrastructure (‘ROI’) package with a connection to the ‘NEOS’ server. ‘ROI’ optimization problems can be sent directly to the ‘NEOS’ server and solutions obtained in the typical ‘ROI’ style.
ROI.plugin.nloptr ROI-Plugin NLOPTR
Enhances the R Optimization Infrastructure (ROI) package with the NLopt solver for solving nonlinear optimization problems.
ROI.plugin.optimx ‘ROI’-Plugin ‘optimx’
Enhances the R Optimization Infrastructure (‘ROI’) package with the ‘optimx’ package.
ROI.plugin.scs ROI-Plugin SCS
Enhances the R Optimization Infrastructure (ROI) package with the SCS solver for solving convex cone problems.
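All of the ‘ROI’ plugins above register solvers behind one common front end; a minimal sketch of that shared workflow, here with the ‘lp_solve’ plugin and an illustrative linear program:

    library(ROI)
    library(ROI.plugin.lpsolve)
    # maximize 2x + 4y + 3z subject to two linear constraints, x, y, z >= 0
    lp <- OP(objective   = L_objective(c(2, 4, 3)),
             constraints = L_constraint(L   = rbind(c(3, 4, 2),
                                                    c(2, 1, 2)),
                                        dir = c("<=", "<="),
                                        rhs = c(60, 40)),
             maximum = TRUE)
    sol <- ROI_solve(lp, solver = "lpsolve")
    solution(sol)             # optimal x, y, z
    solution(sol, "objval")   # optimal objective value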
roll Rolling Statistics
Parallel functions for computing rolling statistics of time-series data.
rollmatch Rolling Entry Matching
Functions to perform propensity score matching on rolling entry interventions for which a suitable ‘entry’ date is not observed for nonparticipants. For more details, please reference Witman, Beadles, Hoerger, Liu, Kafali, Gandhi, Amico, and Larsen (2016) <https://…/9375>.
rollply Moving-Window Add-on for ‘plyr’
Apply a function in a moving window, then combine the results in a data frame.
rollRegres Fast Rolling and Expanding Window Linear Regression
Methods for fast rolling and expanding linear regression models. That is, series of linear regression models estimated on either an expanding window of data or a moving window of data. The methods use rank-one updates and downdates of the upper triangular matrix from a QR decomposition (see J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart (1979) <doi:10.1137/1.9781611971811>).
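A short sketch with simulated data; the returned object's ‘coefs’ element is per the package documentation as we recall it:

    library(rollRegres)
    set.seed(1)
    df <- data.frame(x = rnorm(200))
    df$y <- 1 + 2 * df$x + rnorm(200)
    fit <- roll_regres(y ~ x, data = df, width = 50L)  # moving window of 50 rows
    tail(fit$coefs, 3)  # rows are NA until the first complete window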
roloc Convert Colour Specification to Colour Name
Functions to convert an R colour specification to a colour name. The user can select and create different lists of colour names and different colour metrics for the conversion.
rolocISCCNBS A Colour List and Colour Metric Based on the ISCC-NBS System of Color Designation
A colour list and colour metric based on the ISCC-NBS System of Color Designation for use with the ‘roloc’ package for converting colour specifications to colour names.
ROlogit Fit Rank-Ordered Logit (RO-Logit) Model
Implements the rank-ordered logit (RO-logit) model for stratified analysis of continuous outcomes introduced by Tan et al. (2017) <doi:10.1177/0962280217747309>. Model diagnostics based on the heuristic residuals and estimates in linear scales are available from the package, and outcomes with ties are supported.
rolr Finding Optimal Three-Group Splits Based on a Survival Outcome
Provides fast procedures for exploring all pairs of cutpoints of a single covariate with respect to survival and determining optimal cutpoints using a hierarchical method and various ordered logrank tests.
rolypoly Identifying Trait-Relevant Functional Annotations
Using enrichment of genome-wide association summary statistics to identify trait-relevant cellular functional annotations.
rootWishart Distribution of Largest Root for Single and Double Wishart Settings
Functions for hypothesis testing in single and double Wishart settings, based on Roy’s largest root. This test statistic is especially useful in multivariate analysis. The computations are based on results by Chiani (2014) <DOI:10.1016/j.jmva.2014.04.002> and Chiani (2016) <DOI:10.1016/j.jmva.2015.10.007>. They use the fact that the CDF is related to the Pfaffian of a matrix that can be computed in a finite number of iterations. This package takes advantage of the Boost and Eigen C++ libraries to perform multi-precision linear algebra.
ROP Regression Optimized: Numerical Approach for Multivariate Classification and Regression Trees
Classification and regression trees using multivariate nodes calculated by an exhaustive numerical approach. We propose a new concept of decision tree, including multivariate nodes and non-hierarchical pathways. This package’s model uses a multivariate-node tree that directly calculates a risk score for each observation for the observed state Y. Nguyen JM, Gaultier A, Antonioli D (2015) <doi:10.1016/j.respe.2018.03.088> Castillo JM, Knol AC, Nguyen JM, Khammari A, Saint Jean M, Dreno B (2016) <doi:10.1684/ejd.2016.2826> Vildy S, Nguyen JM, Gaultier A, Khammari A, Dreno B (2017) <doi:10.1684/ejd.2016.2955> Nguyen JM, Gaultier A, Antonioli D (2018) <doi:10.1016/j.respe.2018.03.088>.
rope Model Selection with FDR Control of Selected Variables
Selects one model with variable selection FDR controlled at a specified level. A q-value for each potential variable is also returned. The input, variable selection counts over many bootstraps for several levels of penalization, is modeled as coming from a beta-binomial mixture distribution.
ropendata Query and Download ‘Rapid7’ ‘Cybersecurity’ Data Sets
‘Rapid7’ collects ‘cybersecurity’ data and makes it available via their ‘Open Data’ <http://opendata.rapid7.com> portal, which has an API. Tools are provided to assist in querying for available data sets and downloading any data set authorized to a free, registered account.
ROpenDota Access OpenDota Services in R
Provides a client for the API of OpenDota. OpenDota is a web service which provides DOTA2 real-time data, collected through the Steam WebAPI. With ROpenDota you can easily grab the latest DOTA2 statistics in R, such as the latest matches in official international competitions, or analyze your own or an enemy's performance to learn their strategies, etc. Please see <https://…/ROpenDota> for more information.
roperators Additional Operators to Help you Write Cleaner R Code
Provides string arithmetic, reassignment operators, logical operators that handle missing values, and extra logical operators such as floating point equality and all or nothing. The intent is to allow R users to write code that is easier to read, write, and maintain while providing a friendlier experience to new R users from other language backgrounds (such as ‘Python’) who are used to concepts such as x += 1 and ‘foo’ + ‘bar’.
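A flavor of the style this enables; the operator names here are from the package's documentation as we recall it, so treat them as assumptions:

    library(roperators)
    "foo" %+% "bar"  # string concatenation, akin to 'foo' + 'bar' in Python
    x <- 1
    x %+=% 1         # in-place increment, akin to x += 1
    x                # 2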
Ropj Import Origin(R) Project Files
Read the data from Origin(R) project files (‘*.opj’) <https://…/Origin-File-Types>. For now, only spreadsheet objects are imported as data frames and no export is planned. More object types may be available to be imported later.
roprov Low-Level Support for Provenance Capture Between in-Memory R Objects
A suite of classes and methods which provide low-level support for modeling provenance between in-memory R objects. This is an infrastructure package and is not intended to be used directly by end-users.
roptim General Purpose Optimization in R using C++
Perform general purpose optimizations in R using C++. A unified wrapper interface is provided to call C functions of the five optimization algorithms (‘Nelder-Mead’, ‘BFGS’, ‘CG’, ‘L-BFGS-B’ and ‘SANN’) underlying optim().
ROptSpace Matrix Reconstruction from a Few Entries
Matrix reconstruction, also known as matrix completion, is the task of inferring missing entries of a partially observed matrix. This package provides a method called OptSpace, which was proposed by Keshavan, R.H., Oh, S., and Montanari, A. (2009) <doi:10.1109/ISIT.2009.5205567> for a case under low-rank assumption.
rosetta Parallel Use of Statistical Packages in Teaching
When teaching statistics, it can often be desirable to uncouple the content from specific software packages. To ease such efforts, the Rosetta Stats website (<https://rosettastats.com>) allows comparing analyses in different packages. This package is the companion to the Rosetta Stats website, aiming to provide functions that produce output similar to output from other statistical packages, thereby facilitating ‘software-agnostic’ teaching of statistics.
rosetteApi Rosette API
Rosette is an API for multilingual text analysis and information extraction. More information can be found at https://developer.rosette.com.
rosm Plot Raster Map Tiles from Open Street Map and Other Sources
Download and plot Open Street Map <http://…/>, Mapquest <http://…/>, Bing Maps <http://…/maps> and other tiled map sources in a way that works seamlessly with plotting from the ‘sp’ package. Use to create high-resolution basemaps and add hillshade to vector based maps.
rospca Robust Sparse PCA using the ROSPCA Algorithm
Implementation of robust sparse PCA using the ROSPCA algorithm of Hubert et al. (2016) <DOI:10.1080/00401706.2015.1093962>.
rosqp Quadratic Programming Solver using the ‘OSQP’ Library
Provides bindings to the ‘OSQP’ solver, which can solve sparse convex quadratic programming problems with optional equality and inequality constraints.
rosr Create Reproducible Research Projects
Creates reproducible academic projects with integrated academic elements, including datasets, references, codes, images, manuscripts, dissertations, slides and so on. These elements are well connected so that they can be easily synchronized and updated.
rotasym Tests for Rotational Symmetry on the Hypersphere
Implementation of the tests for rotational symmetry on the hypersphere proposed in García-Portugués, Paindaveine and Verdebout (2019) <arXiv:1706.05030>. The package also implements the proposed distributions on the hypersphere, based on the tangent-normal decomposition, and allows for the replication of the data application considered in the paper.
rotationForest Fit and Deploy Rotation Forest Models
Fit and deploy rotation forest models (‘Rodriguez, J.J., Kuncheva, L.I., 2006. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 28, 1619-1630’) for binary classification. Rotation forest is an ensemble method where each base classifier (tree) is fit on the principal components of the variables of random partitions of the feature set.
rotor Log Rotation and Conditional Backups
Conditionally rotate or back-up files based on their size or the date of the last backup; inspired by the ‘Linux’ utility ‘logrotate’.
roughrf Roughened Random Forests for Binary Classification
A set of functions to support Xiong K, ‘Roughened Random Forests for Binary Classification’ (2014). The functions include RRFA, RRFB, RRFC1-RRFC7, RRFD and RRFE. RRFB and RRFC6 are usually recommended. RRFB is much faster than RRFC6.
RoughSetKnowledgeReduction Simplification of Decision Tables using Rough Sets
Rough Sets were introduced by Zdzislaw Pawlak in his book “Rough Sets: Theoretical Aspects of Reasoning About Data”. Rough Sets provide a formal method to approximate crisp sets when the set-element belonging relationship is either known or undetermined. This enables the use of Rough Sets for reasoning about incomplete or contradictory knowledge. A decision table is a prescription of the decisions to make given some conditions. Such decision tables can be reduced without losing prescription ability. This package provides the classes and methods for knowledge reduction from decision tables as presented in chapter 7 of the aforementioned book, with functions for calculating both the discernibility matrix and the essential parts of decision tables.
RoundAndRound Plot Objects Moving in Orbits
Visualize objects moving in orbits in 2D and 3D. The package is under development to plot the orbits of objects in a polar coordinate system. See the examples in the demo.
Routliers Robust Outliers Detection
Detecting outliers using robust methods, i.e. the Median Absolute Deviation (MAD) for univariate outliers; Leys, Ley, Klein, Bernard, & Licata (2013) <doi:10.1016/j.jesp.2013.03.013> and the Mahalanobis-Minimum Covariance Determinant (MMCD) for multivariate outliers; Leys, C., Klein, O., Dominicy, Y. & Ley, C. (2018) <doi:10.1016/j.jesp.2017.09.011>. There is also the better-known but less robust Mahalanobis distance method, included only for comparison purposes.
routr A Simple Router for HTTP and WebSocket Requests
In order to make sure that a web request ends up in the correct handler function, a router is often used. ‘routr’ is a package implementing simple but powerful routing functionality for R-based servers. It is a fully functional ‘fiery’ plugin, but can also be used with other ‘httpuv’-based servers.
roxygen2 In-Source Documentation for R
A ‘Doxygen’-like in-source documentation system for Rd, collation, and ‘NAMESPACE’ files.
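In practice, documentation lives in comments above the object it documents; a minimal sketch:

    #' Add two numbers
    #'
    #' @param x,y Numeric values to add.
    #' @return The sum of x and y.
    #' @export
    add <- function(x, y) x + y

    # From the package root, generate Rd files and NAMESPACE entries:
    # roxygen2::roxygenise()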
rPackedBar Packed Bar Charts with ‘plotly’
Packed bar charts are a variation of treemaps for visualizing skewed data. The concept was introduced by Xan Gregg at ‘JMP’.
rpart Recursive Partitioning and Regression Trees
Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
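The canonical usage, on the kyphosis data shipped with the package:

    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Number + Start,
                 data = kyphosis, method = "class")  # classification tree
    printcp(fit)           # cross-validated complexity-parameter table for pruning
    plot(fit); text(fit)   # dendrogram with split labels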
rpatrec Recognising Visual Charting Patterns in Time Series Data
Generating visual charting patterns and noise, smoothing to find a signal in noisy time series and enabling users to apply their findings to real life data.
rpca RobustPCA: Decompose a Matrix into Low-Rank and Sparse Components
Suppose we have a data matrix which is the superposition of a low-rank component and a sparse component. Candes, E. J., Li, X., Ma, Y., & Wright, J. (2011), ‘Robust principal component analysis?’, Journal of the ACM (JACM), 58(3), 11, prove that we can recover each component individually under some suitable assumptions. It is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit: among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and the L1 norm. This package implements this decomposition algorithm, resulting in the Robust PCA approach.
RPEnsemble Random Projection Ensemble Classification
Implements the methodology of ‘Cannings, T. I. and Samworth, R. J. (2015) Random projection ensemble classification’. The random projection ensemble classifier is a very general method for classification of high-dimensional data, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. The random projections are divided into non-overlapping blocks, and within each block the projection yielding the smallest estimate of the test error is selected. The random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment.
http://…/1504.04595
Rperform R package for tracking R package-development metrics (time & memory) across git versions and branches.
Rperform is an R package that makes it easy for R package developers to track quantitative performance metrics of their code over time. It focuses on the changes in a package’s performance metrics over subsequent development versions and across git branches, most importantly relating to time and memory.
RPEXE.RPEXT Reduced Piecewise Exponential Estimate/Test Software
This reduced piecewise exponential survival software implements the likelihood ratio test and backward elimination procedure in Han, Schell, and Kim (2012 <doi:10.1080/19466315.2012.698945>, 2014 <doi:10.1002/sim.5915>), and Han et al. (2016 <doi:10.1111/biom.12590>). Inputs to the program can be either times when events/censoring occur or the vectors of total time on test and the number of events. Outputs of the programs are times and the corresponding p-values in the backward elimination. Details about the model and implementation are given in Han et al. 2014. This program can run in R version 3.2.2 and above.
rpgm Fast Simulation of Normal Random Variables
Implements the Ziggurat method in order to simulate normal random variables approximately four times faster than the usual rnorm(); reference: Marsaglia, George, Tsang, Wai Wan, et al. (2000) <DOI:10.18637/jss.v005.i08>.
rpicosat R Bindings for the ‘PicoSAT’ SAT Solver
Bindings for the ‘PicoSAT’ solver to solve Boolean satisfiability problems (SAT). The boolean satisfiability problem asks the question if a given boolean formula can be TRUE; i.e. does there exist an assignment of TRUE/FALSE for each variable such that the whole formula is TRUE? The package bundles ‘PicoSAT’ solver release 965 <http://…/>.
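A sketch of encoding clauses as integer vectors (DIMACS convention: positive = variable, negative = its negation); function names follow the package documentation as we recall it:

    library(rpicosat)
    # (x1 OR NOT x2) AND (NOT x1 OR x2)
    formula <- list(c(1L, -2L), c(-1L, 2L))
    res <- picosat_sat(formula)
    picosat_solution_status(res)  # satisfiable or unsatisfiable
    res                           # a TRUE/FALSE assignment per variable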
Rpipedrive ‘Pipedrive API’ Functions for Systems Improvement and Integration
R interaction with the ‘pipedrive.com API’. All functions were created and documented according to <https://…/>. Created with the objective of offering integration and even the development of ‘APIs’, making it possible to create workflows and to easily download databases for analysis.
rPithon rPithon
rPithon is a package which allows you to execute Python code from within R, passing R variables to Python and reading Python results back into R. The functions are based on those from rPython, but the way it works is fundamentally different: in this package, an actual Python process is started and communication with it occurs over a pipe. To exchange data, the rPithon package also makes use of RJSONIO to convert data structures to and from the JSON format.
rpivotTable Build Powerful Pivot Tables and Dynamically Slice & Dice your Data
Build powerful pivot tables (aka Pivot Grid, Pivot Chart, Cross-Tab) and dynamically slice & dice / drag ‘n’ drop your data. ‘rpivotTable’ is a wrapper of ‘pivottable’, a powerful open-source Pivot Table library implemented in ‘JavaScript’ by Nicolas Kruchten. Aligned to ‘pivottable’ v1.6.3.
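One call produces the interactive widget; the initial drag-and-drop layout can be preset from R (argument names per the package documentation):

    library(rpivotTable)
    rpivotTable(mtcars,
                rows = "cyl", cols = "gear",
                aggregatorName = "Average", vals = "mpg",
                rendererName = "Heatmap")  # further slicing happens in the widget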
rpms Recursive Partitioning for Modeling Survey Data
Fits a linear model to survey data in each node obtained by recursively partitioning the data. The splitting variables and splits selected are obtained using a procedure which adjusts for complex sample design features used to obtain the data. Likewise, the model fitting algorithm produces design-consistent coefficients for the least squares linear model between the dependent and independent variables. The first stage of the design is accounted for in the provided variance estimates. The main function returns the resulting binary tree with the linear model fit at every endnode as an R object of class ‘rpms’. The package provides a number of functions and methods for this rpms class.
rpnf Point and Figure Package
A set of functions to analyze and print the development of a commodity using the Point and Figure (P&F) approach. A P&F processor can be used to calculate daily statistics for the time series. These statistics can be used for deeper investigations as well as to create plots. Plots can be generated as well-known X/O plots in plain TXT format, and additionally in a more graphical format.
Rpolyhedra Polyhedra Database
A polyhedra database scraped from various sources, provided as R6 objects, with ‘rgl’ visualization capabilities.
RPostgres ‘Rcpp’ Interface to ‘PostgreSQL’
Fully ‘DBI’-compliant ‘Rcpp’-backed interface to ‘PostgreSQL’ <https://…/>, an open-source relational database.
rPowerSampleSize Sample Size Computations Controlling the Type-II Generalized Family-Wise Error Rate
The significance of mean difference tests in clinical trials is established if at least r null hypotheses are rejected among m that are simultaneously tested. This package enables one to compute necessary sample sizes for single-step (Bonferroni) and step-wise procedures (Holm and Hochberg). These three procedures control the q-generalized family-wise error rate (probability of making at least q false rejections). Sample size is computed (for these single-step and step-wise procedures) in such a way that the r-power (probability of rejecting at least r false null hypotheses, i.e. at least r significant endpoints among m) is above some given threshold, in the context of tests of difference of means for two groups of continuous endpoints (variables). Various types of structure of correlation are considered. It is also possible to analyse data (i.e., actually test difference in means) when these are available. The case r equals 1 is treated in separate functions that were used in Lafaye de Micheaux et al. (2014) <doi:10.1080/10543406.2013.860156>.
rpql Regularized PQL for Joint Selection in GLMMs
Performs joint selection in Generalized Linear Mixed Models (GLMMs) using penalized likelihood methods. Specifically, the Penalized Quasi-Likelihood (PQL) is used as a loss function, and penalties are then ‘added on’ to perform simultaneous fixed and random effects selection. Regularized PQL avoids the need for integration (or approximations such as the Laplace’s method) during the estimation process, and so the full solution path for model selection can be constructed relatively quickly.
rprojroot Finding Files in Project Subdirectories
Robust, reliable and flexible paths to files below a project root. The ‘root’ of a project is defined as a directory that matches a certain criterion, e.g., it contains a certain regular file.
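For example, paths can be anchored at whatever directory satisfies the chosen criterion:

    library(rprojroot)
    # Treat the nearest enclosing directory with a DESCRIPTION file as the root
    crit <- has_file("DESCRIPTION")
    find_root(crit)                                      # absolute path of the root
    find_root_file("data", "raw.csv", criterion = crit)  # robust path below the root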
RProtoBuf R Interface to the Protocol Buffers API
Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats. Additional documentation is available in the arXiv.org preprint ‘RProtoBuf: Efficient Cross-Language Data Serialization in R’ by Eddelbuettel, Stokely, and Ooms (2014) at <http://…/1401.7372>.
RPS Resistant Procrustes Superimposition
Based on RPS tools, a rather complete resistant shape analysis of landmark-based 2D and 3D datasets can be performed. In addition, landmark-based resistant shape analysis of individual asymmetry in 2D for matching or object-symmetric structures is also possible.
rpsftm Rank Preserving Structural Failure Time Models
Implements methods described by the paper Robins and Tsiatis (1991) <DOI:10.1080/03610929108830654>. These use g-estimation to estimate the causal effect of a treatment in a two-armed randomised control trial where non-compliance exists and is measured, under an assumption of an accelerated failure time model and no unmeasured confounders.
rpst Recursive Partitioning Survival Trees
An implementation of Recursive Partitioning Survival Trees via a node-splitting rule that builds decision tree models that reflected within-node and within-treatment responses. The algorithm aims to find the maximal difference in survival time among different treatments.
RPtests Goodness of Fit Tests for High-Dimensional Linear Regression Models
Performs goodness of fits tests for both high and low-dimensional linear models. It can test for a variety of model misspecifications including nonlinearity and heteroscedasticity. In addition one can test the significance of potentially large groups of variables, and also produce p-values for the significance of individual variables in high-dimensional linear regression.
rptR Repeatability Estimation for Gaussian and Non-Gaussian Data
R functions for estimating repeatability (intra-class correlation) from gaussian, binary, proportion and count data.
RPUD GPU Computing with R
RPUD is an open source R package for performing statistical computation using CUDA. You are free to use and distribute it under the GPL v3 license.
RPUDPLUS is an extension of RPUD providing additional GPU accelerated functions including Bayesian statistics and SVM learning.
RPUSVM is a standalone terminal tool for SVM training and prediction with GPUs. It is free by request upon purchase of an rpudplus license.
RPushbullet R Interface to the Pushbullet Messaging Service
An R interface to the Pushbullet messaging service, which provides fast and efficient notifications (and file transfer) between computers, phones and tablets. An account has to be registered at http://www.pushbullet.com to obtain a (free) API key.
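After storing the API key (typically in ~/.rpushbullet.json, as the package suggests), sending a note is one call:

    library(RPushbullet)
    pbPost(type = "note",
           title = "Job finished",
           body  = "The overnight simulation completed without errors.")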
Rpyplot R interface to matplotlib
R interface to matplotlib via Rcpp using Python 2.7 or 3. Contains a basic working interface to some basic plotting functions, with few options. Tested with Ubuntu 14.10 (System Python 2.7, 3.4) and Windows 7 (Anaconda Python 2.7, 3.4). Why? I often use Python and matplotlib for exploring measurement data (e.g. from accelerometers), even if I use R for the actual analysis. The reason is that I like to be able to flexibly zoom into different parts of the plot using the mouse, and this works well for me with matplotlib. So I decided to try to call matplotlib from R using Rcpp and the Python/C API. It was surprisingly simple to get it working, so I put together this package.
rPython Package allowing R to call Python
This package permits calls to Python from R.
Calling Python from R with rPython
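The interface is a handful of functions for executing, assigning, and retrieving:

    library(rPython)
    python.exec("import math")      # run arbitrary Python statements
    python.call("math.sqrt", 2)     # call a Python function, get the result in R
    python.assign("a", 1:5)         # push an R vector into Python
    python.exec("b = sum(a)")
    python.get("b")                 # pull a Python value back into R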
rqdatatable ‘rquery’ for ‘data.table’
Implements the ‘rquery’ piped query algebra using ‘data.table’. This allows for a high-speed, in-memory implementation of Codd-style data manipulation tools.
RQEntangle Quantum Entanglement of Bipartite System
It computes the Schmidt decomposition of bipartite quantum systems, discrete or continuous, and their respective entanglement metrics. See Artur Ekert, Peter L. Knight (1995) <doi:10.1119/1.17904> for more details.
RQuantLib R Interface to the ‘QuantLib’ Library
The ‘RQuantLib’ package makes parts of ‘QuantLib’ accessible from R. The ‘QuantLib’ project aims to provide a comprehensive software framework for quantitative finance. The goal is to provide a standard open source library for quantitative analysis, modeling, trading, and risk management of financial assets.
rquery Relational Query Generator for Data Manipulation
A query generator based on Edgar F. Codd’s relational algebra and operator names (plus experience using ‘SQL’ at big data scale). The design represents an attempt to make ‘SQL’ more teachable by denoting composition with a sequential pipeline notation instead of nested queries or functions. Package features include: data processing trees or pipelines as observable objects (able to report both columns produced and columns used), optimized ‘SQL’ generation as an explicit user-visible modeling step, and convenience methods for applying query trees to in-memory data.frames.
rr Statistical Methods for the Randomized Response Technique
Enables researchers to conduct multivariate statistical analyses of survey data with randomized response technique items from several designs, including mirrored question, forced question, and unrelated question. This includes regression with the randomized response as the outcome and logistic regression with the randomized response item as a predictor. In addition, tools for conducting power analysis for designing randomized response items are included. The package implements methods described in Blair, Imai, and Zhou (2015) ‘Design and Analysis of the Randomized Response Technique,’ Working paper available at http://…/randresp.pdf.
rr2 R2s for Regression Models
Three methods to calculate R2 for models with correlated errors, including Phylogenetic GLS, Phylogenetic Logistic Regression, Linear Mixed Models (LMMs), and Generalized Linear Mixed Models (GLMMs). See details in Ives 2018 <doi:10.1093/sysbio/syy060>.
rRAP Real-Time Adaptive Penalization for Streaming Lasso Models
An implementation of the Real-time Adaptive Penalization (RAP) algorithm through which to iteratively update a regularization parameter in a streaming context.
Rraven Connecting R and ‘Raven’ Sound Analysis Software
A tool to exchange data between R and ‘Raven’ sound analysis software <http://…/RavenOverview.html> (Cornell Lab of Ornithology). Functions work on data formats compatible with the R package ‘warbleR’.
rrcov3way Robust Methods for Multiway Data Analysis, Applicable also for Compositional Data
Provides methods for multiway data analysis by means of Parafac and Tucker 3 models. Robust versions (Engelen and Hubert (2011) <doi:10.1016/j.aca.2011.04.043>) and versions for compositional data are also provided (Gallo (2015) <doi:10.1080/03610926.2013.798664>, Di Palma et al. (in press)).
rrd Import Data from a RRD (Round Robin Database) File
Makes it easy to import the data from a ‘RRD’ database (<https://…/> ) directly into R data structures. The resulting objects are ‘tibble’ objects or a list of ‘tibble’ objects, making it easy to manipulate the data.
rrecsys Environment for Assessing Recommender Systems
Provides implementations of several popular recommendation systems. They can process standard recommendation datasets (user/item matrix) as input and generate rating predictions and recommendation lists. Standard algorithm implementations included in this package are: Global/Item/User-Average baselines, Item-Based KNN, FunkSVD, BPR and weighted ALS. They can be assessed according to the standard offline evaluation methodology for recommender systems, using measures such as MAE, RMSE, Precision, Recall, AUC, NDCG, RankScore and coverage measures. The package is intended for rapid prototyping of recommendation algorithms and for educational purposes.
rrefine R Client for OpenRefine API
‘OpenRefine’ (formerly ‘Google Refine’) is popular, open-source data cleaning software. This package enables users to programmatically trigger data transfer between R and ‘OpenRefine’. Available functionality includes project import, export and deletion.
RRegrs Searching the best regression model using R (correlation filter, data scaling, best regression model, etc.)
The current tool is a collection of regression tools from R that can be used to search for the best regression models for any dataset. The initial use of the script is aimed at finding QSAR models for chemoinformatics / nanotoxicology. The full R script will contain: loading the dataset, filtering the dataset, scaling the dataset, feature selection, regression models, a summary with the top models, statistics of the best model, etc. The script will be modular in order to create flexible APIs.
rrepast Invoke ‘Repast Simphony’ Simulation Models
An R and Repast integration tool for running individual-based (IbM) simulation models developed with the Repast Simphony agent-based framework directly from R code. This package integrates Repast Simphony models within the R environment, making it easier to run and analyze model output data for automated parameter calibration and for carrying out uncertainty and sensitivity analysis using the power of the R environment.
rriskDistributions Fitting Distributions to Given Data or Known Quantiles
Collection of functions for fitting distributions to given data or by known quantiles. The two main functions fit.perc() and fit.cont() provide a GUI that allows the user to choose the most appropriate distribution without any knowledge of R syntax. Note, this package is a part of the ‘rrisk’ project.
rrpack Reduced-Rank Regression
Multivariate regression methodologies including reduced-rank regression (RRR), reduced-rank ridge regression (RRS), robust reduced-rank regression (R4), generalized/mixed-response reduced-rank regression (mRRR), row-sparse reduced-rank regression (SRRR), reduced-rank regression with a sparse singular value decomposition (RSSVD), and sparse and orthogonal factor regression (SOFAR).
RRPP Linear Model Evaluation with Randomized Residuals in a Permutation Procedure
Linear model calculations are made for many random versions of data. Using residual randomization in a permutation procedure, sums of squares are calculated over many permutations to generate empirical probability distributions for evaluating model effects. This method is described by Collyer, Sekora, & Adams (2015) <doi:10.1038/hdy.2014.75>.
rrr Reduced-Rank Regression
Reduced-rank regression, diagnostics and graphics.
rrscale Robust Re-Scaling to Better Recover Latent Effects in Data
Non-linear transformations of data to better discover latent effects. Applies a sequence of three transformations (1) a Gaussianizing transformation, (2) a Z-score transformation, and (3) an outlier removal transformation.
rrtable Reproducible Research with a Table of R Codes
Makes documents containing plots and tables from a table of R codes. Can make ‘HTML’, ‘pdf’ (‘LaTeX’), ‘docx’ (‘MS Word’) and ‘pptx’ (‘MS PowerPoint’) documents with or without R code. In the package, modularized ‘shiny’ app codes are provided. These modules are intended for reuse across applications.
rrum Bayesian Estimation of the Reduced Reparameterized Unified Model with Gibbs Sampling
Implementation of a Gibbs sampling algorithm for Bayesian estimation of the Reduced Reparameterized Unified Model (‘rrum’), described by Culpepper and Hudson (2017) <doi:10.1177/0146621617707511>.
RSAgeo Resampling-Based Analysis of Geostatistical Data
RSAgeo performs parameter estimation for geostatistical data using a resampling-based stochastic approximation (RSA) method.
rsam RStudio’ Addin Manager
Toggle ‘RStudio’ addins on and off to hide in the IDE dropdown list and set/remove keyboard shortcuts for installed addins.
RSarules Random Sampling Association Rules from a Transaction Dataset
Implements the Gibbs sampling algorithm to randomly sample association rules with one pre-chosen item as the consequent from a transaction dataset. The Gibbs sampling algorithm was proposed in G. Qian, C.R. Rao, X. Sun and Y. Wu (2016) <DOI:10.1073/pnas.1604553113>.
RSCABS Rao-Scott Cochran-Armitage by Slices Trend Test
Performs the Rao-Scott Cochran-Armitage by Slices trend test (RSCABS) used in analysis of histopathological endpoints. It has functions for both command line operations along with a built in GUI.
rscala Bi-Directional Interface Between R and Scala with Callbacks
The Scala interpreter is embedded in R and callbacks to R from the embedded interpreter are supported. Conversely, the R interpreter is embedded in Scala. Scala versions 2.10 and 2.11 are supported.
RSCAT Shadow-Test Approach to Computerized Adaptive Testing
As an advanced approach to computerized adaptive testing (CAT), shadow testing (van der Linden(2005) <doi:10.1007/0-387-29054-0>) dynamically assembles entire shadow tests as a part of selecting items throughout the testing process. Selecting items from shadow tests guarantees the compliance of all content constraints defined by the blueprint. ‘RSCAT’ is an R package for the shadow-test approach to CAT. The objective of ‘RSCAT’ is twofold: 1) Enhancing the effectiveness of shadow-test CAT simulation; 2) Contributing to the academic and scientific community for CAT research.
rscimark SciMark 2.0 Benchmark for Scientific and Numerical Computing
The SciMark 2.0 benchmark was originally developed in Java as a benchmark for numerical and scientific computational performance. It measures the performance of several computational kernels that occur frequently in scientific applications. This package is a simple wrapper around the ANSI C implementation of the benchmark.
Rsconctdply Deploys Multiple ‘Shiny’ Apps using Configuration File
Provides a tool for mass deployment of shiny apps to ‘RStudio Connect’ or ‘Shiny Server’. Multiple user accounts and servers can be configured for deployment.
rsconnect Deployment Interface for R Markdown Documents and Shiny Applications
Programmatic deployment interface for ‘RPubs’, ‘shinyapps.io’, and ‘RStudio Connect’. Supported content types include R Markdown documents, Shiny applications, plots, and static web content.
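For illustration, a minimal deployment sketch (the account credentials and the app directory name are placeholders):
    library(rsconnect)
    # one-time account setup; token and secret come from the hosting service dashboard
    # setAccountInfo(name = "<account>", token = "<token>", secret = "<secret>")
    deployApp(appDir = "myapp")  # deploy the Shiny app stored in ./myapp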
RSDA R to Symbolic Data Analysis
Symbolic Data Analysis (SDA) was proposed by Professor Edwin Diday in 1987; its main purpose is to substitute a concept (a second-order statistical unit) for the set of rows (cases) in the data table. This package extends certain automatic classification techniques, as well as some linear models, to the symbolic case.
rsdmx Tools for Reading SDMX Data and Metadata
Set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework, currently focusing on the SDMX XML standard format (SDMX-ML).
rsed Stream Editing in R
Tools for stream editing: manipulating text files with insertions, replacements, deletions, substitutions, and commenting.
RSelenium R Bindings for ‘Selenium WebDriver’
Provides a set of R bindings for the ‘Selenium 2.0 WebDriver’ (see <https://…/wd.html> for more information) using the ‘JsonWireProtocol’ (see <https://…/JsonWireProtocol> for more information). ‘Selenium 2.0 WebDriver’ allows driving a web browser natively, as a user would, either locally or on a remote machine using the Selenium server; it marks a leap forward in web browser automation. Using RSelenium you can automate browsers locally or remotely.
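For illustration, a minimal sketch, assuming a Selenium server is already listening on localhost:4444:
    library(RSelenium)
    remDr <- remoteDriver(remoteServerAddr = "localhost",
                          port = 4444L, browserName = "firefox")
    remDr$open()                                 # start a browser session
    remDr$navigate("https://www.r-project.org")  # drive the browser to a page
    remDr$getTitle()                             # title of the current page
    remDr$close()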
RSentiment Analyse Sentiment of English Sentences
Analyses the sentiment of a sentence in English and assigns a score to it. It can classify sentences into the following categories of sentiment: positive, negative, very positive, very negative, neutral or sarcasm. For a vector of sentences, it counts the number of sentences in each category of sentiment. In calculating the score, negation and various degrees of adjectives are taken into consideration. It deals only with English sentences.
rsggm Robust Sparse Gaussian Graphical Modeling via the Gamma-Divergence
Robust estimation of sparse inverse covariance matrix via the gamma-divergence.
rsimsum Analysis of Simulation Studies Including Monte Carlo Error
Summarise results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modelled on the ‘simsum’ user-written command in ‘Stata’ (see White, I.R., 2010 <http://…/article.html?article=st0200>).
RSIP Remote Sensing and Image Processing
Performs operations on raster images, such as viewing maps as time series and exporting time-series values for specific locations, whole rasters, or areas bounded by a polygon. Processes remotely sensed climate variables distributed in space (2-D maps) and time (time series).
rslp A Stemming Algorithm for the Portuguese Language
Implements the ‘Stemming Algorithm for the Portuguese Language’ <DOI:10.1109/SPIRE.2001.10024>.
rslurm Submit R Calculations to a ‘SLURM’ Cluster
Functions that simplify the R interface to the ‘SLURM’ cluster workload manager and automate the process of dividing a parallel calculation across cluster nodes.
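For illustration, a minimal sketch, assuming the R session runs on a machine with SLURM available (the job name and node counts are arbitrary):
    library(rslurm)
    f <- function(mu, sd) rnorm(1, mean = mu, sd = sd)
    params <- data.frame(mu = 1:10, sd = 0.5)     # one row per function call
    sjob <- slurm_apply(f, params, jobname = "demo",
                        nodes = 2, cpus_per_node = 2)
    res <- get_slurm_out(sjob, outtype = "raw")   # collect results once finished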
RSmartlyIO Loading Facebook and Instagram Advertising Data from Smartly.io
Aims at loading Facebook and Instagram advertising data from Smartly.io into R. Smartly.io is an online advertising service that enables advertisers to display commercial ads on social media networks. The package offers an interface to query the Smartly.io API and loads data directly into R for further data processing and data analysis.
Rsmlx R Speaks ‘Monolix’
Provides methods for model building and model evaluation of mixed effects models using ‘Monolix’ <http://monolix.lixoft.com>. ‘Monolix’ is a software tool for nonlinear mixed effects modeling that must have been installed in order to use ‘Rsmlx’. Among other tasks, ‘Rsmlx’ performs statistical tests for model assessment, bootstrap simulation and likelihood profiling for computing confidence intervals. ‘Rsmlx’ also proposes several automatic covariate search methods for mixed effects models.
rsolr R to Solr Interface
A comprehensive R API for querying Apache Solr databases. A Solr core is represented as a data frame or list that supports Solr-side filtering, sorting, transformation and aggregation, all through the familiar base R API. Queries are processed lazily, i.e., a query is only sent to the database when the data are required.
rSPARCS Data Management for the SPARCS
Cleans and analyzes data from the Statewide Planning and Research Cooperative System (SPARCS) and generates data sets for statistical modeling. Other data with a similar format or study objectives can also be handled.
rsparse Statistical Learning on Sparse Matrices
Implements many algorithms for statistical learning on sparse matrices – matrix factorizations, matrix completion, elastic net regressions, factorization machines. Also ‘rsparse’ enhances the ‘Matrix’ package by providing methods for multithreaded <sparse, dense> matrix products and native slicing of sparse matrices in Compressed Sparse Row (CSR) format. List of the algorithms for regression problems: 1) Elastic Net regression via Follow The Proximally-Regularized Leader (FTRL) Stochastic Gradient Descent (SGD), as per McMahan et al. (2013, <doi:10.1145/2487575.2488200>) 2) Factorization Machines via SGD, as per Rendle (2010, <doi:10.1109/ICDM.2010.127>) List of algorithms for matrix factorization and matrix completion: 1) Weighted Regularized Matrix Factorization (WRMF) via Alternating Least Squares (ALS) – paper by Hu, Koren, Volinsky (2008, <doi:10.1109/ICDM.2008.22>) 2) Maximum-Margin Matrix Factorization via ALS, paper by Rennie, Srebro (2005, <doi:10.1145/1102351.1102441>) 3) Fast Truncated Singular Value Decomposition (SVD), Soft-Thresholded SVD, Soft-Impute matrix completion via ALS – paper by Hastie, Mazumder et al. (2014, <arXiv:1410.2596>) 4) Linear-Flow matrix factorization, from ‘Practical linear models for large-scale one-class collaborative filtering’ by Sedhain, Bui, Kawale et al. (2016, ISBN:978-1-57735-770-4) 5) GlobalVectors (GloVe) matrix factorization via SGD, paper by Pennington, Socher, Manning (2014, <https://…/D14-1162>) The package is reasonably fast and memory efficient – it allows working with large datasets of millions of rows and millions of columns. This is particularly useful for practitioners working on recommender systems.
Rspc Nelson Rules for Control Charts
Implementation of Nelson rules for control charts in ‘R’. ‘Rspc’ implements some Statistical Process Control methods, namely the Levey-Jennings type of I (individuals) chart, the Shewhart C (count) chart and the Nelson rules (as described in Montgomery, D. C. (2013) Introduction to statistical quality control. Hoboken, NJ: Wiley.). The typical workflow is to take the time series, specify the control limits, and list the Nelson rules you want to evaluate. There are several options for modifying the rules (one-sided limits, numerical parameters of rules, etc.). The package is also capable of calculating the control limits from the data (so far implemented only for the i-chart and c-chart).
RSpectra Solvers for Large Scale Eigenvalue and SVD Problems
R interface to the ‘Spectra’ library <http://…/> for large scale eigenvalue and SVD problems. It is typically used to compute a few eigenvalues/vectors of an n by n matrix, e.g., the k largest eigenvalues, which is usually more efficient than eigen() if k << n. This package provides the ‘eigs()’ function, which does a similar job to its counterparts in ‘Matlab’, ‘Octave’, ‘Python SciPy’ and ‘Julia’. It also provides the ‘svds()’ function to calculate the largest k singular values and corresponding singular vectors of a real matrix. Matrices can be given in either dense or sparse form.
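For illustration, a minimal sketch of a partial eigen and singular value decomposition:
    library(RSpectra)
    set.seed(1)
    A <- matrix(rnorm(1000 * 200), 1000, 200)
    S <- crossprod(A)                         # 200 x 200 symmetric matrix
    eigs_sym(S, k = 5, which = "LM")$values   # 5 largest eigenvalues
    str(svds(A, k = 5))                       # 5 largest singular triplets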
Rspotify Access to Spotify API
Provides an interface to the Spotify API <https://…/>.
rsppfp R’s Shortest Path Problem with Forbidden Subpaths
An implementation of functionalities to transform directed graphs that are bound to a set of known forbidden paths. There are several transformations, following the rules provided by Villeneuve and Desaulniers (2005) <doi: 10.1016/j.ejor.2004.01.032> and Hsu et al. (2009) <doi: 10.1007/978-3-642-03095-6_60>. The resulting graph is generated in a data-frame format. See the rsppfp website for more information, documentation and examples.
rsq Coefficient of Determination
Calculates a newly defined coefficient of determination, aka R^2, and a coefficient of partial determination, aka partial R^2, proposed by Zhang (2016) for generalized linear models (including quasi models with well-defined variance functions). It avoids overstating the proportion of variation explained by the model or by a group of covariates.
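For illustration, a minimal sketch on a logistic regression:
    library(rsq)
    fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)
    rsq(fit)           # coefficient of determination
    rsq.partial(fit)   # partial R^2 for each covariate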
RSQLite SQLite Interface for R
This package embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine (version 3.8.6) is included.
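For illustration, a minimal sketch using the standard DBI workflow with an in-memory database:
    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), ":memory:")
    dbWriteTable(con, "mtcars", mtcars)
    dbGetQuery(con, "SELECT cyl, AVG(mpg) AS mean_mpg FROM mtcars GROUP BY cyl")
    dbDisconnect(con)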
RSQLServer SQL Server R Database Interface (DBI) and ‘dplyr’ SQL Backend
Utilises The ‘jTDS’ project’s ‘JDBC’ 3.0 ‘SQL Server’ driver to extend ‘DBI’ classes and methods. The package also implements a ‘SQL’ backend to the ‘dplyr’ package.
rsqlserver ‘SQL Server’ Database Interface (DBI) Driver for R
A DBI-compliant ‘SQL Server’ driver for R based on the .NET Framework Data Provider for SQL Server (SqlClient), System.Data.SqlClient. The SqlClient provider uses its own protocol to communicate with SQL Server. It is lightweight and performs well because it is optimized to access SQL Server directly without adding an OLE DB or Open Database Connectivity (ODBC) layer.
Rssa A Collection of Methods for Singular Spectrum Analysis
Methods and tools for Singular Spectrum Analysis including decomposition, forecasting and gap-filling for univariate and multivariate time series.
RSSampling Ranked Set Sampling
Ranked set sampling (RSS) was introduced by McIntyre (1952) (reprinted in 2005) <doi:10.1198/000313005X54180> as an advanced data-collection method that is substantial for statistical and methodological analysis in scientific studies. This is the first package to implement RSS and its modified versions for sampling. With ‘RSSampling’, researchers can sample with basic RSS and the modified versions, namely, median RSS, extreme RSS, percentile RSS, balanced groups RSS, double RSS, L-RSS, truncation-based RSS, and robust extreme RSS. ‘RSSampling’ also allows imperfect ranking using an auxiliary (concomitant) variable, which is widely used in real-life applications. Users can also use this package for parametric and nonparametric inference such as mean, median and variance estimation, regression analysis and some distribution-free tests where the samples are obtained via basic RSS.
RSSL Implementations of Semi-Supervised Learning Approaches for Classification
A collection of implementations of semi-supervised classifiers and methods to evaluate their performance. The package includes implementations of, among others, Implicitly Constrained Learning, Moment Constrained Learning, the Transductive SVM, Manifold regularization, Maximum Contrastive Pessimistic Likelihood estimation, S4VM and WellSVM.
RSSOP Simulation of Supply Reservoir Systems using Standard Operation Policy
Simulates supply reservoir systems using the standard operation policy, with functions for plotting and evaluating supply reservoir systems.
rstack Stack Data Type as an ‘R6’ Class
An extremely simple stack data type, implemented with ‘R6’ classes. The size of the stack increases as needed, and the amortized time complexity is O(1). The stack may contain arbitrary objects.
rstan R Interface to Stan
User-facing R functions are provided by this package to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the ‘StanHeaders’ package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo and (optionally penalized) maximum likelihood estimation via optimization. In both cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.
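For illustration, a minimal sketch fitting a normal model to simulated data (chain and iteration counts are kept deliberately small):
    library(rstan)
    model_code <- "
    data { int<lower=0> N; vector[N] y; }
    parameters { real mu; real<lower=0> sigma; }
    model { y ~ normal(mu, sigma); }
    "
    fit <- stan(model_code = model_code,
                data = list(N = 50, y = rnorm(50, 1, 2)),
                chains = 2, iter = 1000)
    print(fit)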
rstanarm Bayesian Applied Regression Modeling via Stan
Estimates pre-compiled regression models using the ‘rstan’ package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.
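For illustration, a minimal sketch; no C++ compilation is needed at run time because the model is pre-compiled in the package:
    library(rstanarm)
    fit <- stan_glm(mpg ~ wt + cyl, data = mtcars,
                    family = gaussian(), chains = 2, iter = 1000)
    summary(fit)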
rstanmulticore A cross-platform R package to run RStan in parallel http://www.stat.cmu.edu/~nmv/
A cross-platform (Windows, Linux, and Mac) R package to parallelize RStan MCMC chains across multiple cores. The syntax is very simple: replace calls to stan(…) with pstan(…).
rstansim Simulation Studies with Stan
Provides a set of functions to facilitate and ease the running of simulation studies of Bayesian models using ‘stan’. Provides functionality to simulate data, fit models, and manage simulation results.
rstantools Tools for Developing R Packages Interfacing with ‘Stan’
Provides various tools for developers of R packages interfacing with ‘Stan’ <http://mc-stan.org>, including functions to set up the required package structure, S3 generics and default methods to unify function naming across ‘Stan’-based R packages, and a vignette with recommendations for developers.
rstap Spatial Temporal Aggregated Predictor Models via ‘stan’
Estimates previously compiled stap regression models using the ‘rstan’ package. Users specify models via a custom R syntax with a formula and data.frame plus additional arguments for priors.
RStata A Bit of Glue Between R and Stata
A simple R -> Stata interface allowing the user to execute Stata commands (both inline and from a .do file) from R.
rstatix Pipe-Friendly Framework for Basic Statistical Tests
Provides a simple and intuitive pipe-friendly framework, coherent with the ‘tidyverse’ design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrices. Functions are also included to facilitate the analysis of factorial experiments, including purely ‘within-Ss’ designs (repeated measures), purely ‘between-Ss’ designs, and mixed ‘within-and-between-Ss’ designs. It is also possible to compute several effect size metrics, including ‘eta squared’ for ANOVA, ‘Cohen’s d’ for t-test and ‘Cramer’s V’ for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.
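For illustration, a minimal sketch of the pipe-friendly interface:
    library(rstatix)
    ToothGrowth %>% t_test(len ~ supp)       # tidy t-test output
    ToothGrowth %>% wilcox_test(len ~ supp)  # tidy Wilcoxon test
    ToothGrowth %>% cohens_d(len ~ supp)     # effect size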
rstpm2 Flexible Link-Based Survival Models
R implementation of Stata’s stpm2 function (flexible link-based survival models), with extensions to different smoothers and penalised models.
rsurface Design of Rotatable Central Composite Experiments and Response Surface Analysis
Produces tables with the level of replication (number of replicates) and the experimental uncoded values of the quantitative factors to be used for rotatable Central Composite Design (CCD) experimentation, together with a 2-D contour plot of the corresponding variance of the predicted response according to Mead et al. (2012) <doi:10.1017/CBO9781139020879> (design_ccd()), and analyzes CCD data with response surface methodology (ccd_analysis()). A rotatable CCD provides values of the variance of the predicted response that are concentrically distributed around the average treatment combination used in the experimentation, which, with uniform precision (implied by the use of several replicates at the average treatment combination), greatly improves the search for an optimum response. These properties of a rotatable CCD represent undeniable advantages over the classical factorial design, as discussed by Panneton et al. (1999) <doi:10.13031/2013.13267> and Mead et al. (2012) <doi:10.1017/CBO9781139020879.018>, among others.
RSurvey Geographic Information System Application
A geographic information system (GIS) graphical user interface (GUI) that provides data viewing, management, and analysis tools.
rsvd Randomized Singular Value Decomposition
Randomized singular value decomposition (rsvd) is a very fast probabilistic algorithm to compute an approximate low-rank singular value decomposition of large data sets with high accuracy. SVD plays a central role in data analysis and scientific computing, and is also widely used for computing principal component analysis (PCA), a linear dimensionality reduction technique. Randomized PCA (rpca) uses the approximate singular value decomposition to compute the most significant principal components. In addition, several plot functions are provided.
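For illustration, a minimal sketch of the randomized decompositions:
    library(rsvd)
    set.seed(1)
    A <- matrix(rnorm(500 * 100), 500, 100)
    s <- rsvd(A, k = 10)   # approximate rank-10 SVD
    p <- rpca(A, k = 10)   # randomized PCA (top 10 components)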
rsvg Render SVG Images into High-Quality Bitmap Arrays
Renders vector-based ‘svg’ images into high-quality custom-size bitmap arrays using ‘librsvg’. The resulting bitmap can be written to e.g. ‘png’, ‘jpeg’ or ‘webp’ format.
rsyslog Interface to the ‘syslog’ System Logger
Functions to write messages to the ‘syslog’ system logger API, available on all ‘POSIX’-compatible operating systems. Features include tagging messages with a priority level and application type, as well as masking (hiding) messages below a given priority level.
rt.test Robustified t-Test
Performs one-sample t-test based on robustified statistics using median/MAD (TA) and Hodges-Lehmann/Shamos (TB). For more details, see Park and Wang (2018)<arXiv:1807.02215>. This work was partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. NRF-2017R1A2B4004169).
rtable Tabular Reporting Functions
Provides tabular reporting functionalities to work with ‘ReporteRs’ package: ‘as.FlexTable’ methods are available for ‘ftable’ and ‘xtable’ objects, function ‘FlexPivot’ is producing a pivot table and ‘freqtable’ a percentage table, a ‘knitr’ print method and a ‘shiny’ render function are provided for ‘FlexTable’ objects.
Rtauchen Discretization of AR(1) Processes
Discretize AR(1) process following Tauchen (1986) <http://…/0165176586901680>. A discrete Markov chain that approximates in the sense of weak convergence a continuous-valued univariate Autoregressive process of first order is generated. It is a popular method used in economics and in finance.
RTaxometrics Taxometric Analysis
We provide functions to perform taxometric analyses. This package contains 52 functions, but only 5 should be called directly by users. CheckData() should be run prior to any taxometric analysis to ensure that the data are appropriate for taxometric analysis. RunTaxometrics() performs taxometric analyses for a sample of data. RunCCFIProfile() performs a series of taxometric analyses to generate a CCFI profile. CreateData() generates a sample of categorical or dimensional data. ClassifyCases() assigns cases to groups using the base-rate classification method.
RTD Simple TD API Client
Uploads an R data.frame to Arm Treasure Data, see <https://…/>. You can also manage databases and tables on Arm Treasure Data.
rTensor Tools for Tensor Analysis and Decomposition
A set of tools for creation, manipulation, and modeling of tensors with an arbitrary number of modes. A tensor in the context of data analysis is a multidimensional array. rTensor provides an S4 class ‘Tensor’ that wraps around the base ‘array’ class, and provides common tensor operations as methods, including matrix unfolding, summing/averaging across modes, calculating the Frobenius norm, and taking the inner product between two tensors. Familiar array operations are overloaded, such as index subsetting via ‘[‘ and element-wise operations. rTensor also implements various tensor decompositions, including CP, GLRAM, MPCA, PVD, and Tucker. For tensors with 3 modes, rTensor also implements transpose, t-product, and t-SVD, as defined in Kilmer et al. (2013). Some auxiliary functions include the Khatri-Rao product, Kronecker product, and the Hadamard product for a list of matrices.
RTest An XML-Based Testing Framework for Automated Component Tests of R Packages
Provides a testing framework for R packages developed for a regulatory environment. It is based on the ‘testthat’ unit testing system and provides adapter functionalities for XML-based test case definition as well as for standardized reporting of the test results.
rtext R6 Objects for Text and Data
For natural language processing and analysis of qualitative text, coding structures that bind together text and text data are fundamental. The package provides such a structure and accompanying methods in the form of R6 objects. The ‘rtext’ class allows for text handling and text coding (character or regex based), including data updates on text transformations as well as aggregation on various levels. Furthermore, the use of R6 enables inheritance and passing by reference, which should enable ‘rtext’ instances to be used as a back-end for R-based graphical text editors or text coding GUIs.
rticles Article Formats for R Markdown
A suite of custom R Markdown formats and templates for authoring journal articles and conference submissions.
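For illustration, a minimal sketch; the template name used here, ‘jss_article’, is assumed to be one of the journal templates shipped with the package:
    # create a new article skeleton from an rticles template
    rmarkdown::draft("my_article.Rmd", template = "jss_article",
                     package = "rticles")
    # then knit the generated .Rmd file as usual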
rtika R Interface to ‘Apache Tika’
Extract text or metadata from over a thousand file types, using Apache Tika <https://…/>. Get either plain text or structured XHTML content.
rtk Rarefaction Tool Kit
Allows normalization of very large abundance tables, from metagenomics applications or other sources, to an equal depth. Additionally, diversity measures such as richness, Shannon diversity, chao1, chao2 and others are calculated on the fly. The results may be plotted using the provided plotting functions.
rtkore STK++ Core Library Integration to R using Rcpp
STK++ (<http://www.stkpp.org>) is a collection of C++ classes for statistics, clustering, linear algebra, arrays (with an Eigen-like API), regression, dimension reduction, etc. The library is integrated into R using Rcpp. The rtkore package includes the header files from the STK++ core library. All files contain only templated classes or inlined functions. STK++ is licensed under the GNU LGPL version 2 or later. rtkore (the stkpp integration into R) is licensed under the GNU GPL version 2 or later. See file LICENSE.note for details.
Rtnmin Truncated Newton Function Minimization with Bounds Constraints
Truncated Newton function minimization with bounds constraints based on the ‘Matlab’/’Octave’ codes of Stephen Nash.
RTransferEntropy Measuring Information Flow Between Time Series with Shannon and Renyi Transfer Entropy
Measuring information flow between time series with Shannon and Rényi transfer entropy. See also Dimpfl and Peter (2013) <doi:10.1515/snde-2012-0044> and Dimpfl and Peter (2014) <doi:10.1016/j.intfin.2014.03.004> for theory and applications to financial time series. Additional references can be found in the theory part of the vignette.
RTransProb Analyze and Forecast Credit Migrations
A set of functions used to automate commonly used methods in credit risk to estimate migration (transition) matrices. The package includes multiple methods for bootstrapping default rates and forecasting/stress testing credit exposures migrations, via Econometric and Machine Learning approaches. More information can be found at <https://analyticsrusers.blog>.
rtrends Analyze Download Logs from the CRAN RStudio Mirror
Analyze download logs from the CRAN RStudio mirror (<http://…/> ). This CRAN mirror is the default one used in RStudio. The available data is the result of parsed and anonymised raw log data from that CRAN mirror.
rtrie A Simple R-Based Implementation of a Trie (A.k.a. Digital Tree/Radix Tree/Prefix Tree)
A simple R-based implementation of a trie (a.k.a. digital tree, radix tree, or prefix tree), a kind of ordered search tree used to store a dynamic set or associative array where the keys are usually strings.
rtrim Trends and Indices for Monitoring Data
The TRIM model is widely used for estimating growth and decline of animal populations based on (possibly sparsely available) count data. The current package is a reimplementation of the original TRIM software developed at Statistics Netherlands by Jeroen Pannekoek. See <https://…/indices-and-trends–trim–> for more information about TRIM.
rTRNG Advanced and Parallel Random Number Generation via ‘TRNG’
Embeds sources and headers from Tina’s Random Number Generator (‘TRNG’) C++ library. Exposes some functionality for easier access, testing and benchmarking into R. Provides examples of how to use parallel RNG with ‘RcppParallel’. The methods and techniques behind ‘TRNG’ are illustrated in the package vignettes and examples. Full documentation is available in Bauke (2018) <https://…/trng.pdf>.
rtsdata R Time Series Intelligent Data Storage
A tool to download and save historical time series data for future offline use. The intelligent updating functionality downloads only the newly available information, saving time and Internet bandwidth; the full data set is re-downloaded only if inconsistencies are detected. This package supports the following data providers: ‘Yahoo’ (<https://finance.yahoo.com>), ‘FRED’ (<https://fred.stlouisfed.org>), ‘Quandl’ (<https://www.quandl.com>), ‘AlphaVantage’ (<https://www.alphavantage.co>), and ‘Tiingo’ (<https://www.tiingo.com>).
rtson Typed JSON
TSON, short for Typed JSON, is a binary-encoded serialization of JSON-like documents that supports JavaScript typed data (https://…/TSON).
rtsplot Time Series Plot
A fast and elegant time series visualization package. In addition to the standard R plot types, this package supports candlesticks, open-high-low-close, and volume plots. Useful for visualizing any time series data, e.g., stock prices and technical indicators.
RTutor Creating R Exercises with Automatic Assessment of Students’ Solutions
RTutor is an R package that allows you to develop interactive R exercises. Problem sets can be solved offline or hosted on the web via a Shiny server. Problem sets can be designed as an R Markdown .rmd file (to be solved directly in RStudio) or use a browser-based interface powered by RStudio’s Shiny.
RtutoR Tutorial App for Learning R
Contains functions for launching tutorial apps covering different aspects of the R programming language. This first version includes an ‘R Basics’ app, which presents the most commonly performed data manipulation tasks in R. The app structures the contents into different topics and provides an interactive and dynamic interface for navigation.
Rtwalk The R Implementation of the ‘t-walk’ MCMC Algorithm
The ‘t-walk’ is a general-purpose MCMC sampler for arbitrary continuous distributions that requires no tuning.
rtweet Collecting Twitter Data
An implementation of calls designed to extract and organize Twitter data via Twitter’s REST and stream APIs. Functions formulate GET and POST requests and convert response objects to more user-friendly structures, e.g., data frames or lists. Specific consideration is given to functions designed to return tweets, friends, and followers.
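For illustration, a minimal sketch, assuming Twitter API credentials have already been configured (e.g., via create_token()); the text column name follows classic rtweet releases:
    library(rtweet)
    tw <- search_tweets("#rstats", n = 100, include_rts = FALSE)
    head(tw$text)   # text of the returned tweets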
rucm Implementation of Unobserved Components Model (UCM) in R
Unobserved Components Models (introduced in Harvey, A. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge: Cambridge University Press) decompose a time series into components such as trend, seasonal, cycle, and the regression effects due to predictor series, which capture the salient features of the series in order to predict its behavior.
rucrdtw R Bindings for the UCR Suite
R bindings for functions from the UCR Suite by Rakthanmanon et al. (2012) <DOI:10.1145/2339530.2339576>, which enables ultrafast subsequence search for a best match under Dynamic Time Warping and Euclidean Distance.
ruimtehol Learn Text ‘Embeddings’ with ‘Starspace’
Wraps the ‘StarSpace’ library <https://…/StarSpace> allowing users to calculate word, sentence, article, document, webpage, link and entity ’embeddings’. By using the ’embeddings’, you can perform text based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph ’embeddings’ as well as perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper: ‘StarSpace: Embed All The Things!’ by Wu et al. (2017), available at <arXiv:1709.03856>.
ruin Simulation of Various Risk Processes
A (not yet exhaustive) collection of common models of risk processes in actuarial science, represented as formal S4 classes. Each class (risk model) has a simulator of its path, and a plotting function. Further, a Monte-Carlo estimator of a ruin probability for a finite time is implemented, using a parallel computation. Currently, the package extends two classical risk models Cramer-Lundberg and Sparre Andersen models by including capital injections, that are positive jumps (see Breuer L. and Badescu A.L. (2014) <doi:10.1080/03461238.2011.636969>). The intent of the package is to provide a user-friendly interface for ruin processes’ simulators, as well as a solid and extensible structure for future extensions.
ruler Tidy Data Validation Reports
Tools for creating data validation pipelines and tidy reports. This package offers a framework for exploring and validating data frame like objects using ‘dplyr’ grammar of data manipulation.
rUnemploymentData Data and Functions for USA State and County Unemployment Data
Contains data and visualization functions for USA unemployment data. Data comes from the US Bureau of Labor Statistics (BLS). State data is in ?df_state_unemployment and covers 2000-2013. County data is in ?df_county_unemployment and covers 1990-2013. Choropleth maps of the data can be generated with ?state_unemployment_choropleth and ?county_unemployment_choropleth respectively.
runittotestthat Convert ‘RUnit’ Test Functions into ‘testthat’ Tests
Automatically convert a file or package worth of ‘RUnit’ test functions into ‘testthat’ tests.
runner Running Operations for Vectors
The package contains running functions (a.k.a. windowed, rolling, cumulative) with varying window sizes and missing-value handling options. It also provides running streak and running which functions, extending beyond the range of functions already implemented in other R packages.
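For illustration, a minimal sketch of a rolling mean with missing-value handling:
    library(runner)
    x <- c(1, 3, 2, 5, NA, 4)
    runner(x, k = 3, f = function(v) mean(v, na.rm = TRUE))  # 3-element rolling mean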
runstats Fast Computation of Running Statistics for Time Series
Provides methods for fast computation of running sample statistics for time series. These include: (1) mean, (2) standard deviation, and (3) variance over a fixed-length window of time-series, (4) correlation, (5) covariance, and (6) Euclidean distance (L2 norm) between short-time pattern and time-series. Implemented methods utilize Convolution Theorem to compute convolutions via Fast Fourier Transform (FFT).
rusk Beautiful Graphical Representation of Multiplication Tables on a Modular Circle
Place 10 points numbered 1 to 10 on a circle and connect each point by a straight line to the point corresponding to its multiplication by 2 (point 1 is connected to 1 * 2 = 2, point 2 to 2 * 2 = 4, point 3 to 3 * 2 = 6, and so on). You will obtain an amazing geometric figure that grows more intricate and beautiful as you vary the number of points and the multiplication table you use.
rust Ratio-of-Uniforms Simulation with Transformation
Uses the generalised ratio-of-uniforms (RU) method to simulate from univariate and (low-dimensional) multivariate continuous distributions. The user specifies the log-density, up to an additive constant. The RU algorithm is applied after relocating the mode of the density to zero, and the user can choose a tuning parameter r. For details see Wakefield, Gelfand and Smith (1991) <DOI:10.1007/BF01889987>, Efficient generation of random variates via the ratio-of-uniforms method, Statistics and Computing (1991) 1, 129-133. A Box-Cox variable transformation can be used to make the input density suitable for the RU method and to improve efficiency. In the multivariate case, rotation of axes can also be used to improve efficiency. See the rust website for more information, documentation and examples.
ruta Implementation of Unsupervised Neural Architectures
Implementation of several unsupervised neural networks, from building their architecture to their training and evaluation. Available networks are auto-encoders including their main variants: sparse, contractive, denoising, robust and variational, as described in Charte et al. (2018) <doi:10.1016/j.inffus.2017.12.007>.
ruv Detect and Remove Unwanted Variation using Negative Controls
Implements the ‘RUV’ (Remove Unwanted Variation) algorithms. These algorithms attempt to adjust for systematic errors of unknown origin in high-dimensional data. The algorithms were originally developed for use with genomic data, especially microarray data, but may be useful with other types of high-dimensional data as well. These algorithms were proposed by Gagnon-Bartsch and Speed (2012), and by Gagnon-Bartsch, Jacob and Speed (2013). The algorithms require the user to specify a set of negative control variables, as described in the references. The algorithms included in this package are ‘RUV-2’, ‘RUV-4’, ‘RUV-inv’, and ‘RUV-rinv’, along with various supporting algorithms.
rv Simulation-Based Random Variable Objects
Implements a simulation-based random variable class and a suite of methods for extracting parts of random vectors, calculating extremes of random vectors, and generating random vectors under a variety of distributions following Kerman and Gelman (2007) <doi:10.1007/s11222-007-9020-4>.
rvalues R-values for Ranking in High-Dimensional Settings
A collection of functions for computing ‘r-values’ from various kinds of user input such as MCMC output or a list of effect size estimates and associated standard errors. Given a large collection of measurement units, the r-value, r, of a particular unit is a reported percentile that may be interpreted as the smallest percentile at which the unit should be placed in the top r-fraction of units.
rvcheck R/Package Version Check
Checks the latest release version of R and of R packages (on ‘CRAN’ or ‘Bioconductor’).
rvest Easily Harvest (Scrape) Web Pages
Wrappers around the ‘XML’ and ‘httr’ packages to make it easy to download, then manipulate, both HTML and XML.
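For illustration, a minimal scraping sketch:
    library(rvest)
    page  <- read_html("https://www.r-project.org")
    links <- html_nodes(page, "a")    # all anchor elements
    head(html_text(links))            # their text
    head(html_attr(links, "href"))    # their URLs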
rvg R Graphics Devices for Vector Graphics Output
Vector Graphics devices for SVG, DrawingML for Microsoft Word and DrawingML for Microsoft PowerPoint.
RViennaCL ViennaCL C++ Header Files
ViennaCL is a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP (including switches at runtime). I have placed these libraries in this package as a more efficient distribution system for CRAN. The idea is that you can write a package that depends on the ViennaCL library and yet you do not need to distribute a copy of this code with your package.
rviewgraph Animated Graph Layout Viewer
This is an ‘R’ interface to Alun Thomas’s ‘ViewGraph’ ‘Java’ graph viewing program. It takes a graph specified as an incidence matrix, list of edges, or in ‘igraph’ format and runs a graphical user interface that shows an animation of a force directed algorithm positioning the vertices in two dimensions. It works well for graphs of various structure of up to a few thousand vertices. It’s not fazed by graphs that comprise several components. The coordinates can be read as an ‘igraph’ style layout matrix at any time. The user can mess with the layout using a mouse, preferably one with 3 buttons, and some keyed commands. The ‘Java’ program ‘ViewGraph’ is contained in Alun Thomas’s ‘JPSGCS’ collection of ‘Java’ programs for statistical genetics and computational statistics. The homepage for ‘JPSGCS’ is <http://…/index.html>. The documentation page for ‘ViewGraph’ is at <http://…/ViewGraph.html>.
rvinecopulib High Performance Algorithms for Vine Copula Modeling
Provides an interface to ‘vinecopulib’, a high performance C++ library based on ‘Boost’, ‘Eigen’ and ‘NLopt’. It provides high-performance implementations of the core features of the popular ‘VineCopula’ package, in particular inference algorithms for both vine copula and bivariate copula models. Advantages over ‘VineCopula’ are a sleeker and more modern API, shorter runtimes (especially in high dimensions), and nonparametric and multi-parameter families.
Rvoterdistance Calculates the Distance Between Voter and Multiple Polling Locations
Designed to calculate the distance between each voter in a voter file — given lat/long coordinates — and many potential (early) polling or vote by mail drop box locations, then return the minimum distance.
rwavelet Wavelet Analysis
Perform wavelet analysis (orthogonal and translation invariant transforms) with applications to data compression or denoising. Most of the code is a port of ‘MATLAB’ Wavelab toolbox written by D. Donoho, A. Maleki and M. Shahram (<https://…/> ).
rwc Random Walk Covariance Models
Code to facilitate simulation and inference when connectivity is defined by underlying random walks. Methods for spatially-correlated pairwise distance data are especially considered. This provides core code to conduct analyses similar to that in Hanks and Hooten (2013) <doi:10.1080/01621459.2012.724647>.
RWDataPlyr Read and Manipulate Data from ‘RiverWare’
A tool to read and manipulate data generated from ‘RiverWare'(TM) <http://…/> simulations. ‘RiverWare’ and ‘RiverSMART’ generate data in ‘rdf’, ‘csv’, and ‘nc’ format. This package provides an interface to read, aggregate, and summarize data from one or more simulations in a ‘dplyr’ pipeline.
RWeightedKmeans Weighted Object k-Means Algorithm
Weighted object version of k-means algorithm, robust against outlier data.
RWeka R/Weka interface
An R interface to Weka (Version 3.7.12). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Package RWeka contains the interface code, the Weka jar is in a separate package RWekajars. For more information on Weka see <http://…/>.
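For illustration, a minimal sketch using Weka's C4.5 decision-tree learner:
    library(RWeka)
    fit <- J48(Species ~ ., data = iris)  # Weka's C4.5 (J48) classifier
    summary(fit)                          # confusion matrix and accuracy
    predict(fit, iris[1:5, ])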
rwfec R Wireless, Forward Error Correction
Communications simulation package supporting forward error correction.
Rwhois WHOIS Server Querying
Queries data from WHOIS servers.
rwirelesscom Basic Wireless Communications
A basic wireless communications simulation package in R. The package includes modulation functions for BPSK, QPSK, 8-PSK, 16-QAM and 64-QAM. Also included is an AWGN noise generation function. Additionally, the package includes functions to plot an I (in-phase) and Q (quadrature) scatter diagram, or density plot. Together these functions enable the evaluation of respective bit error and symbol rates in an AWGN channel and for easily viewing the respective signals and noise in a scatter plot or density plot.
rWishart Random Wishart Matrix Generation
An expansion of R’s ‘stats’ random Wishart matrix generation. This package allows the user to generate singular, Uhlig and Harald (1994) <doi:10.1214/aos/1176325375>, and pseudo Wishart, Diaz-Garcia, et al. (1997) <doi:10.1006/jmva.1997.1689>, matrices. In addition, the user can generate Wishart matrices with fractional degrees of freedom, Adhikari (2008) <doi:10.1061/(ASCE)0733-9399(2008)134:12(1029)>, commonly used in volatility modeling. Users can also use this package to create random covariance matrices.
RWsearch Lazy Search in R Packages, Task Views, CRAN, the Web. All-in-One Download
Search by keywords in R packages, task views, CRAN, the web and display the results in console, txt, html or pdf pages. Within a single instruction, download the whole documentation (html index, pdf manual, vignettes, source code, etc), either in a flat format or in subdirectories defined by the keywords. Several functions for task view maintenance. Quick links to more than 60 web search engines. Lazy evaluation of non-standard content is available throughout the package and eases the use of many functions.
RxODE Facilities for Simulating from ODE-Based Models
Facilities for running simulations from ordinary differential equation (ODE) models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers, for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the ‘R Administration and Installation’ manual. Also the code is mostly released under GPL. The VODE and LSODA are in the public domain. The information is available in the inst/COPYRIGHTS.
Ryacas R Interface to the Yacas Computer Algebra System
An interface to the yacas computer algebra system.
RYandexTranslate R Interface to Yandex Translate API
‘Yandex Translate’ (<https://translate.yandex.com>) is a statistical machine translation system. The system translates separate words, complete texts, and webpages. This package can be used to detect the language of a text and to translate it to a supported target language. For more info: https://…/About-docpage .
rym R Interface to Yandex Metrika API
Allows working with the ‘Management API’ to load counters, segments, filters, user permissions and goal lists from Yandex Metrika; the ‘Reporting API’, which provides statistics on site visits and other data without using the web interface; the ‘Logs API’, which provides non-aggregated data; and the ‘Compatible with Google Analytics Core Reporting API v3’, which provides site traffic information and other data using field names from the Google Analytics Core API. For more information see the official documents <https://…/>.
RZabbix R Module for Working with the ‘Zabbix API’
As R users we mostly perform analyses, produce reports and create interactive shiny applications. Those are rather one-time performances. Sometimes, however, the R developer enters the world of real software development, where R applications should be distributed and maintained on many machines. Then one really appreciates the value of proper application monitoring. ‘RZabbix’ is the R interface to the ‘Zabbix API’.
rzeit Interface to gather newspaper articles from ZEIT ONLINE
Interface to gather newspaper articles from ZEIT ONLINE, based on a multilevel query. Including sorting algorithms and graphical output options.
RZigZag Zig-Zag Sampler
Implements the Zig-Zag algorithm with subsampling and control variates (ZZ-CV) of (Bierkens, Fearnhead, Roberts, 2016) <arXiv:1607.03188> as applied to Bayesian logistic regression, as well as basic Zig-Zag for a Gaussian target distribution.

S

s2 Google’s S2 Library for Geometry on the Sphere
R bindings for Google’s s2 library for geometric calculations on the sphere.
S2sls Spatial Two Stage Least Squares Estimation
Fit a spatial instrumental-variable regression by two-stage least squares.
sabre Spatial Association Between Regionalizations
Calculates a degree of spatial association between regionalizations or categorical maps using the information-theoretical V-measure (Nowosad and Stepinski (2018) <doi:10.17605/OSF.IO/RCJH7>). It also offers an R implementation of the MapCurve method (Hargrove et al. (2006) <doi:10.1007/s10109-006-0025-x>).
SACCR SA Counterparty Credit Risk under Basel III
Computes the Exposure-At-Default based on standardized approach of the Basel III Regulatory framework (SA-CCR). For the generation of the trades an object-oriented solution has been created.
SACOBRA Self-Adjusting COBRA
Performs constrained optimization for expensive black-box problems.
sae Small Area Estimation
Functions for small area estimation.
sae2 Small Area Estimation: Time-series Models
Time series models for small area estimation based on area-level models.
saemix Stochastic Approximation Expectation Maximization (SAEM) algorithm
The SAEMIX package implements the Stochastic Approximation EM (SAEM) algorithm for parameter estimation in (non)linear mixed effects models. The SAEM algorithm (1) computes the maximum likelihood estimator of the population parameters, without any approximation of the model (linearisation, quadrature approximation, …); (2) provides standard errors for the maximum likelihood estimator; and (3) estimates the conditional modes, the conditional means and the conditional standard deviations of the individual parameters, using the Hastings-Metropolis algorithm. Several applications of SAEM in agronomy, animal breeding and PKPD analysis have been published by members of the Monolix group (<http://group.monolix.org>).
SAENET A Stacked Autoencoder Implementation with Interface to ‘neuralnet’
This package implements a stacked sparse autoencoder for dimension reduction of features and pre-training of feed-forward neural networks with the neuralnet package. The package also includes a predict function for the stacked autoencoder object to generate the compressed representation of new data if required. For the purposes of this package, ‘stacked’ is defined in line with http://…/Stacked_Autoencoders . The underlying sparse autoencoder is defined in the documentation of ‘autoencoder’.
saeRobust Robust Small Area Estimation
Methods to fit robust alternatives to models commonly used in Small Area Estimation. The methods used here are based on best linear unbiased predictions and linear mixed models. Currently available models include area level models incorporating spatial and temporal correlation in the random effects.
saeSim Simulation Tools for Small Area Estimation
Tools for the simulation of data in the context of small area estimation. Combine all steps of your simulation – from data generation through drawing samples to model fitting – in one object. This enables easy modification and combination of different scenarios. You can store your results in a folder or start the simulation in parallel.
SafeBayes Generalized and Safe-Bayesian Ridge and Lasso Regression
Functions for Generalized and Safe-Bayesian Ridge and Lasso Regression models with both fixed and varying variance.
safemode A ‘safemode’ Package for R
The ‘safemode’ package provides a safemode() function that creates a ‘safe mode’ session in R. In ‘safe mode’, all symbols have an ‘age’ (a last-modified time stamp) and a set of dependent symbols, and a warning is issued whenever a symbol is used in an expression and its age exceeds the age of any of its dependents (i.e., there is a warning whenever a ‘stale’ symbol is used in an expression).
safer Encrypt and Decrypt Strings, R Objects and Files
A consistent interface to encrypt and decrypt strings, R objects and files using symmetric key encryption.
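For illustration, a minimal string round-trip sketch (the key is a placeholder; function names as documented for the package):
    library(safer)
    ct <- encrypt_string("hello world", key = "my secret")
    decrypt_string(ct, key = "my secret")   # "hello world"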
safetyGraphics Create Interactive Graphics Related to Clinical Trial Safety
A framework for evaluation of clinical trial safety. Users can interactively explore their data using the ‘Shiny’ application or create standalone ‘htmlwidget’ charts. Interactive charts are built using ‘d3.js’ and ‘webcharts.js’ ‘JavaScript’ libraries.
SAGMM Clustering via Stochastic Approximation and Gaussian Mixture Models
Computes clustering by fitting Gaussian mixture models (GMM) via stochastic approximation following the methods of Nguyen and Jones (2018) <doi:10.1201/9780429446177>. It also provides some test data generation and plotting functionality to assist with this process.
sAIC Akaike Information Criterion for Sparse Estimation
Computes the Akaike information criterion for the generalized linear models (logistic regression, Poisson regression, and Gaussian graphical models) estimated by the lasso.
SALES Elastic Net and (Adaptive) Lasso Penalized Sparse Asymmetric Least Squares (SALES) and Coupled Sparse Asymmetric Least Squares (COSALES) using Coordinate Descent and Proximal Gradient Algorithms
A coordinate descent algorithm for computing the solution path of the sparse and coupled sparse asymmetric least squares, including the elastic net and (adaptive) Lasso penalized SALES and COSALES regressions.
SALTSampler Efficient Sampling on the Simplex
The SALTSampler package facilitates Monte Carlo Markov Chain (MCMC) sampling of random variables on a simplex. A Self-Adjusting Logit Transform (SALT) proposal is used so that sampling is still efficient even in difficult cases, such as those in high dimensions or with parameters that differ by orders of magnitude. Special care is also taken to maintain accuracy even when some coordinates approach 0 or 1 numerically. Diagnostic and graphic functions are included in the package, enabling easy assessment of the convergence and mixing of the chain within the constrained space.
salty Turn Clean Data into Messy Data
Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.
sambia A Collection of Techniques Correcting for Sample Selection Bias
A collection of various techniques correcting statistical models for sample selection bias is provided. In particular, the resampling-based methods ‘stochastic inverse-probability oversampling’ and ‘parametric inverse-probability bagging’ are placed at the disposal which generate synthetic observations for correcting classifiers for biased samples resulting from stratified random sampling. For further information, see the article Krautenbacher, Theis, and Fuchs (2017) <doi:10.1155/2017/7847531>. The methods may be used for further purposes where weighting and generation of new observations is needed.
SAMCpack Stochastic Approximation Monte Carlo (SAMC) Sampler and Methods
Stochastic Approximation Monte Carlo (SAMC) is one of the celebrated Markov chain Monte Carlo (MCMC) algorithms. It is known to be capable of sampling from multimodal or doubly intractable distributions. We provide generic SAMC samplers for continuous distributions. User-specified densities in R and C++ are both supported. We also provide functions for specific problems that exploit SAMC computation. See Liang et al (2010) <doi:10.1002/9780470669723> for complete introduction to the method.
samon Sensitivity Analysis for Missing Data
In a clinical trial with repeated measures designs, outcomes are often taken from subjects at fixed time-points. The focus of the trial may be to compare the mean outcome in two or more groups at some pre-specified time after enrollment. In the presence of missing data auxiliary assumptions are necessary to perform such comparisons. One commonly employed assumption is the missing at random assumption (MAR). The ‘samon’ package allows the user to perform a (parameterized) sensitivity analysis of this assumption. In particular it can be used to examine the sensitivity of tests in the difference in outcomes to violations of the MAR assumption. The sensitivity analysis can be performed under two scenarios, a) where the data exhibit a monotone missing data pattern (see the samon() function), and, b) where in addition to a monotone missing data pattern the data exhibit intermittent missing values (see the samonIM() function).
sampler Sample Design, Drawing & Data Analysis Using Data Frames
Determine sample sizes, draw samples, and conduct data analysis using data frames. It specifically enables you to determine simple random sample sizes, stratified sample sizes, and complex stratified sample sizes using a secondary variable such as population; draw simple random samples and stratified random samples from sampling data frames; determine which observations are missing from a random sample, missing by strata, duplicated within a dataset; and perform data analysis, including proportions, margins of error and upper and lower bounds for simple, stratified and cluster sample designs.
samplesize Sample Size Calculation for Various t-Tests and Wilcoxon-Test
Computes sample size for Student’s t-test and for the Wilcoxon-Mann-Whitney test for categorical data. The t-test function allows paired and unpaired (balanced / unbalanced) designs as well as homogeneous and heterogeneous variances. The Wilcoxon function allows for ties.
SampleSize4ClinicalTrials Sample Size Calculation for Mean and Proportion Comparisons in Phase 3 Clinical Trials
The design of phase 3 clinical trials can be classified into 4 types according to the goals: (1) testing for equality; (2) superiority trial; (3) non-inferiority trial; and (4) equivalence trial. Given that none of the available packages combines these designs in a single package, this package makes it possible for researchers to calculate sample size when comparing means or proportions in phase 3 clinical trials with different designs. The ssc function can calculate the sample size with pre-specified type 1 error rate, statistical power and effect size according to the hypothesis testing framework. Furthermore, the effect size comprises the true treatment difference and the non-inferiority or equivalence margins, which can be set in the ssc function. (Reference: Yin, G. (2012). Clinical Trial Design: Bayesian and Frequentist Adaptive Methods. John Wiley & Sons.)
samplesize4surveys Sample Size Calculations for Complex Surveys
Computes the required sample size for estimation of totals, means and proportions under complex sampling designs.
samplesizeCMH Power and Sample Size Calculation for the Cochran-Mantel-Haenszel Test
Calculates the power and sample size for Cochran-Mantel-Haenszel tests. There are also several helper functions for working with probability, odds, relative risk, and odds ratio values.
samplesizelogisticcasecontrol Sample Size Calculations for Case-Control Studies
Determines sample size for case-control studies to be analyzed using logistic regression.
samplingDataCRT Sampling Data Within Different Study Designs for Cluster Randomized Trials
Provides the possibility of sampling complete datasets from a normal distribution to simulate cluster randomized trials for different study designs.
SAMUR Stochastic Augmentation of Matched Data Using Restriction Methods
Augmenting a matched data set by generating multiple stochastic, matched samples from the data using a multi-dimensional histogram constructed from dropping the input matched data into a multi-dimensional grid built on the full data set. The resulting stochastic, matched sets will likely provide a collectively higher coverage of the full data set compared to the single matched set. Each stochastic match is without duplication, thus allowing downstream validation techniques such as cross-validation to be applied to each set without concern for overfitting.
sankey Sankey Diagrams
Sankey plots illustrate the flow of information or material.
santaR Short Asynchronous Time-Series Analysis
A graphical and automated pipeline for the analysis of short time-series in R (‘santaR’). This approach is designed to accommodate asynchronous time sampling (i.e. different time points for different individuals), inter-individual variability, noisy measurements and large numbers of variables. Based on a smoothing splines functional model, ‘santaR’ is able to detect variables highlighting significantly different temporal trajectories between study groups. Designed initially for metabolic phenotyping, ‘santaR’ is also suited for other Systems Biology disciplines. Command line and graphical analysis (via a ‘shiny’ application) enable fast and parallel automated analysis and reporting, intuitive visualisation and comprehensive plotting options for non-specialist users.
saotd Sentiment Analysis of Twitter Data
This analytic is an initial foray into sentiment analysis. It allows a user to access the Twitter API (once they create their own developer account), ingest tweets of interest, clean and tidy the data, perform topic modeling if desired, compute sentiment scores utilizing the ‘bing’ lexicon, and output visualizations.
SAR Smart Adaptive Recommendations
‘Smart Adaptive Recommendations’ (SAR) is the name of a fast, scalable, adaptive algorithm for personalized recommendations based on user transactions and item descriptions. It produces easily explainable/interpretable recommendations and handles ‘cold item’ and ‘semi-cold user’ scenarios. This package provides two implementations of ‘SAR’: a standalone implementation, and an interface to a web service in Microsoft’s ‘Azure’ cloud: <https://…/sar.md>. The former allows fast and easy experimentation, and the latter provides robust scalability and extra features for production use.
sarima Simulation and Prediction with Seasonal ARIMA Models
Functions, classes and methods for time series modelling with ARIMA and related models. The aim of the package is to provide a consistent interface for the user. For example, a single function autocorrelations() computes various kinds of theoretical and sample autocorrelations. This is work in progress; see the documentation and vignettes for the current functionality.
SARP.compo Network-based Interpretation of Changes in Compositional Data
Provides a set of functions to interpret changes in compositional data based on a network representation of all pairwise ratio comparisons: computation of all pairwise ratios, construction of a p-value matrix of all pairwise tests of these ratios between conditions, and conversion of this matrix to a network.
sasMap Static ‘SAS’ Code Analysis
A static code analysis tool for ‘SAS’ scripts. It is designed to load, count, extract, remove, and summarise components of ‘SAS’ code.
SASmarkdown SAS Markdown
Settings and functions to extend the ‘knitr’ ‘SAS’ engine.
SASxport Read and Write ‘SAS’ ‘XPORT’ Files
Functions for reading, listing the contents of, and writing ‘SAS’ ‘xport’ format files. The functions support reading and writing of either individual data frames or sets of data frames. Further, a mechanism has been provided for customizing how variables of different data types are stored.
satscanMapper ‘SaTScan’ (TM) Results Mapper
Supports the generation of maps based on the results from ‘SaTScan’ (TM) cluster analysis. The package handles mapping of spatial and spatial-time analyses that use the discrete Poisson, Bernoulli, and exponential models of case data, generating cluster and location (‘GIS’) records that contain the observed, expected, and observed/expected ratio for U.S. states (and DC), counties, or census tracts of individual states. Locations are identified by the U.S. ‘FIPS’ codes for state, county, and census tract, using 2000 or 2010 Census areas, ‘FIPS’ codes, and boundary data. ‘satscanMapper’ uses the ‘SeerMapper’ package for the boundary data and the mapping of locations. Not all of the ‘SaTScan’ (TM) analyses and models generate the observed, expected, and observed/expected ratio values for clusters and locations. The user can map the observed/expected ratios for locations (states, counties, or census tracts) for each cluster with a p-value less than 0.05 or a user-specified p-value. The locations are categorized and colored based on either the cluster’s observed/expected ratio or the locations’ observed/expected ratios. Place names are provided for each census tract using data from ‘NCI’, the ‘HUD’ crossover tables (tract to ZIP code) as of December 2013, the USPS ZIP code 5 database for 1999, and manual look-ups on the USPS.gov web site.
saturnin Spanning Trees Used for Network Inference
Bayesian inference of graphical model structures using spanning trees.
SAutomata Inference and Learning in Stochastic Automata
Machine learning provides algorithms that can learn from data and make inferences or predictions. Stochastic automata are a class of input/output devices which can model components. This package provides an implementation of an inference algorithm for stochastic automata which is similar to the Viterbi algorithm, specifies a learning algorithm using the expectation-maximization technique, and provides a more efficient implementation of the Baum-Welch algorithm for stochastic automata. This work is based on ‘Inference and learning in stochastic automata’ by Karl-Heinz Zimmermann (2017) <doi:10.12732/ijpam.v115i3.15>.
SAVE Bayesian Emulation, Calibration and Validation of Computer Models
Implements Bayesian statistical methodology for the analysis of complex computer models. It allows for the emulation, calibration, and validation of computer models, following methodology described in Bayarri et al 2007, Technometrics.
http://…/paper
sazedR Parameter-Free Domain-Agnostic Season Length Detection in Time Series
Spectral and Average Autocorrelation Zero Distance Density (‘sazed’) is a method for estimating the season length of a seasonal time series. ‘sazed’ is aimed at practitioners, as it employs only domain-agnostic preprocessing and does not depend on parameter tuning or empirical constants. The computation of ‘sazed’ relies on the efficient autocorrelation computation methods suggested by Thibauld Nion (2012, URL: <http://…/autocorrelations.html> ) and by Bob Carpenter (2012, URL: <https://…/> ).
sbart Sequential BART for Imputation of Missing Covariates
Implements the sequential BART (Bayesian Additive Regression Trees) approach to impute missing covariates. The algorithm applies a Bayesian nonparametric approach to factored sets of sequential conditionals of the joint distribution of the covariates and the missingness, using Bayesian additive regression trees to model each of these univariate conditionals. Each conditional distribution is then sampled using an MCMC algorithm. The published article can be found at <https://…/kxw009>. The package provides a function, seqBART(), which computes and returns the imputed values.
sbfc Selective Bayesian Forest Classifier
An MCMC algorithm for simultaneous feature selection and classification, and visualization of the selected features and feature interactions. An implementation of SBFC by Krakovna, Du and Liu (2015), <http://…/1506.02371>.
sBIC Computing the Singular BIC for Multiple Models
Computes the sBIC for various singular model collections including: binomial mixtures, factor analysis models, Gaussian mixtures, latent forests, latent class analyses, and reduced rank regressions.
sboost Machine Learning with AdaBoost on Decision Stumps
Creates a classifier for binary outcomes using Freund and Schapire’s Adaptive Boosting (AdaBoost) algorithm on decision stumps, with a fast C++ implementation. This type of classifier is nonlinear but easy to interpret and visualize. Feature vectors may be a combination of continuous (numeric) and categorical (string, factor) elements. Methods for classifier assessment, prediction, and cross-validation are also included.
sbpiper Data Analysis Functions for ‘SBpipe’ Package
Provides an API for analysing repetitive parameter estimations and simulations of mathematical models. Examples of mathematical models are ordinary differential equation (ODE) or stochastic differential equation (SDE) models. Among the analyses for parameter estimation, ‘sbpiper’ calculates statistics and generates plots for parameter density, parameter profile likelihood estimations (PLEs), and 2D parameter PLEs. These results can be generated using all or a subset of the best computed parameter sets. Among the analyses for model simulation, ‘sbpiper’ calculates statistics and generates plots for deterministic and stochastic time courses via cartesian and heatmap plots. Plots for the scan of one or two model parameters can also be generated. This package is primarily used by the software ‘SBpipe’. Citation: Dalle Pezze P, Le Novère N. SBpipe: a collection of pipelines for automating repetitive simulation and analysis tasks. BMC Systems Biology. 2017;11:46. <doi:10.1186/s12918-017-0423-3>.
sbrl Scalable Bayesian Rule Lists Model
An implementation of the Scalable Bayesian Rule Lists algorithm.
Scale Likert Type Questionnaire Item Analysis
Provides the Scale class and corresponding functions in order to facilitate data input for scale construction. Reverse items and alternative orders of administration are dealt with by the program. Computes reliability statistics and confirmatory single-factor loadings. It suggests item deletions and produces basic text output in English, for incorporation in reports. Returns list objects of all relevant functions from other packages (see Depends).
scaleboot Approximately Unbiased P-Values via Multiscale Bootstrap
Calculating approximately unbiased (AU) p-values from multiscale bootstrap probabilities. See Shimodaira (2004) <doi:10.1214/009053604000000823> and Shimodaira (2008) <doi:10.1016/j.jspi.2007.04.001>.
Scalelink Create Scale Linkage Scores
Performs a ‘probabilistic’ linkage of two data files via a scaling procedure, using the methods described in Goldstein, H., Harron, K. and Cortina-Borja, M. (2017) <doi:10.1002/sim.7287>.
scales Scale Functions for Visualization
Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.
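For example, a minimal sketch of two of the package’s building blocks, rescaling values and formatting labels:

    library(scales)
    rescale(c(0, 5, 10), to = c(0, 1)) # 0.0 0.5 1.0
    percent(0.256)                     # "25.6%"
    comma(1234567)                     # "1,234,567"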
scan Single-Case Data Analyses for Single and Multiple AB Designs
A collection of procedures for analysing single-case data of an AB-design. Some procedures support multiple-baseline designs.
scanstatistics Space-Time Anomaly Detection using Scan Statistics
Detection of anomalous space-time clusters using the scan statistics methodology. Focuses on prospective surveillance of data streams, scanning for clusters with ongoing anomalies. Hypothesis testing is made possible by the generation of Monte Carlo p-values.
SCAT Summary Based Conditional Association Test
Conditional association test based on summary data from a genome-wide association study (GWAS). SCAT adjusts for the heterogeneity in SNP coverage that exists in summary data when SNPs are not present in all of the participating studies of a GWAS meta-analysis. This commonly happens when different reference panels are used in participating studies for genotype imputation, or when a study simply lacks data for some SNPs (e.g. a different array was used, or imputed data are not available). Failing to properly adjust for this kind of heterogeneity leads to an inflated false positive rate. SCAT can also be used to conduct conventional conditional analysis when coverage heterogeneity is absent. For more details, refer to Zhang et al. (2018) Brief Bioinform. 19(6):1337-1343 <doi:10.1093/bib/bbx072>.
scatr Create Scatter Plots with Marginal Density or Box Plots
Allows you to make clean, good-looking scatter plots with the option to easily add marginal density or box plots on the axes. It is also available as a module for ‘jamovi’ (see <https://www.jamovi.org> for more information). ‘Scatr’ is based on the ‘cowplot’ package by Claus O. Wilke and the ‘ggplot2’ package by Hadley Wickham.
scatterD3 D3 Javascript Scatterplot from R
Creates ‘D3’ ‘JavaScript’ scatterplots from ‘R’ with interactive features: panning, zooming, tooltips, etc.
scatterpie Scatterpie Plot
Creates scatterpie plots, especially useful for plotting pies on a map.
SCBiclust Identifies Mean, Variance, and Hierarchically Clustered Biclusters
Identifies a bicluster, a submatrix of the data such that the features and observations within the submatrix differ from those not contained in the submatrix, using a two-step method. In the first step, observations in the bicluster are identified to maximize the sum of weighted between-cluster feature differences. The observations are identified in a similar fashion as in Witten and Tibshirani (2010) <doi:10.1198/jasa.2010.tm09415>, except with a modified objective function and no feature sparsity constraint. In the second step, features in the bicluster are identified based on their contribution to the clustering of the observations. The cluster significance test of Liu, Hayes, Nobel, and Marron (2008) <doi:10.1198/016214508000000454> can then be used to test the strength of the identified bicluster. ‘SCBiclust’ can be used to identify biclusters which differ based on feature means, feature variances, or more general differences.
scbursts Single Channel Bursts Analysis
Provides tools to import and export from several existing pieces of ion-channel analysis software such as ‘TAC’, ‘QUB’, ‘SCAN’, and ‘Clampfit’, implements procedures such as dwell-time correction and defining bursts with a critical time, and provides tools for analysis of bursts, such as tools for sorting and plotting.
SCCI Stochastic Complexity-Based Conditional Independence Test for Discrete Data
An efficient implementation of SCCI using ‘Rcpp’. SCCI is short for the Stochastic Complexity-based Conditional Independence criterion (Marx and Vreeken, 2019). SCCI is an asymptotically unbiased and L2 consistent estimator of (conditional) mutual information for discrete data.
scclust Size-Constrained Clustering
Provides wrappers for ‘scclust’, a C library for computationally efficient size-constrained clustering with near-optimal performance. See <https://…/scclust> for more information.
SCCS The Self-Controlled Case Series Method
Fits various self-controlled case series models used to investigate associations between time-varying exposures (such as vaccines, other drugs, or non-drug exposures) and an adverse event. Detailed information on the self-controlled case series method and its extensions, with more examples, can be found in Farrington, P., Whitaker, H., and Ghebremichael Weldeselassie, Y. (2018, ISBN: 978-1-4987-8159-6. Self-controlled Case Series Studies: A Modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press) and <http://…/index.html>.
scdensity Shape-Constrained Kernel Density Estimation
Implements methods for obtaining kernel density estimates subject to a variety of shape constraints (unimodality, bimodality, symmetry, tail monotonicity, bounds, and constraints on the number of inflection points). Enforcing constraints can eliminate unwanted waves or kinks in the estimate, which improves its subjective appearance and can also improve statistical performance. The main function scdensity() is very similar to the density() function in ‘stats’, allowing shape-restricted estimates to be obtained with little effort. The methods implemented in this package are described in Wolters and Braun (2017) <doi:10.1080/03610918.2017.1288247>, Wolters (2012) <doi:10.18637/jss.v047.i06>, and Hall and Huang (2002) <http://…/j12n41.htm>. See the scdensity() help for full citations.
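A minimal sketch of usage, assuming the returned object can be plotted like a density() result:

    library(scdensity)
    x <- rnorm(100)
    fit <- scdensity(x, constraint = "unimodal") # default constraint
    plot(fit)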
scdhlm Estimating Hierarchical Linear Models for Single-Case Designs
Provides a set of tools for estimating hierarchical linear models and effect sizes based on data from single-case designs. Functions are provided for calculating standardized mean difference effect sizes that are directly comparable to standardized mean differences estimated from between-subjects randomized experiments, as described in Hedges, Pustejovsky, and Shadish (2012) <DOI:10.1002/jrsm.1052>; Hedges, Pustejovsky, and Shadish (2013) <DOI:10.1002/jrsm.1086>; and Pustejovsky, Hedges, and Shadish (2014) <DOI:10.3102/1076998614547577>.
scenario Construct Reduced Trees with Predefined Nodal Structures
Uses the neural gas algorithm to construct a scenario tree for use in multi-stage stochastic programming. The primary input is a set of initial scenarios or realizations of a disturbance. The scenario tree nodal structure must be predefined using a scenario tree nodal partition matrix.
scgwr Scalable Geographically Weighted Regression
Estimates a fast and regularized version of GWR for large datasets, as detailed in Murakami, Tsutsumida, Yoshida, Nakaya, and Lu (2019) <arXiv:1905.00266>.
scheduleR An Interface to Schedule R Scripts
scheduleR is a framework that can be used to deploy R tasks, reports and apps.
• Tasks are ‘regular’ R scripts that you want to schedule to be executed on a regular basis (often ETL related scripts).
• Reports are Rmarkdown (.Rmd) reports that can be converted to a PDF or HTML. See rmarkdown for more info.
• Apps are Shiny apps, support for these in scheduleR is experimental.
An easy web interface is provided for adding and scheduling tasks, performing maintenance and viewing logs. scheduleR provides extensive logging support and error/success notifications. scheduleR is built to be used on a server. It can be used locally, but that means you have to keep a MongoDB server and scheduleR running at all times.
SchemaOnRead Automated Schema on Read
Provides schema-on-read tools including a single function call (e.g., schemaOnRead(‘filename’)) that reads text (‘TXT’), comma separated value (‘CSV’), raster image (‘BMP’, ‘PNG’, ‘GIF’, ‘TIFF’, and ‘JPG’), R data (‘RDS’), HDF5, NetCDF, spreadsheet (‘XLS’, ‘XLSX’, ‘ODS’, and ‘DIF’), Weka Attribute-Relation File Format (‘ARFF’), Epi Info (‘REC’), SPSS (‘SAV’), Systat (‘SYS’), and Stata (‘DTA’) files. It also recursively reads folders (e.g., schemaOnRead(‘folder’)), returning a nested list of the contained elements.
schumaker Schumaker Shape-Preserving Spline
This is a shape-preserving spline which is guaranteed to be monotonic and concave or convex if the data is monotonic and concave or convex. It does not use any optimisation and is therefore fast, and it converges smoothly to a fixed point in economic dynamics problems including value function iteration. It also automatically gives the first two derivatives of the spline and options for determining behaviour when evaluated outside the interpolation domain.
scico Colour Palettes Based on the Scientific Colour-Maps
Colour choice in information visualisation is important in order to avoid being misled by inherent bias in the colour palette used. The ‘scico’ package provides access to the perceptually uniform and colour-blindness-friendly palettes developed by Fabio Crameri and released under the ‘Scientific Colour-Maps’ moniker. The package contains 17 different palettes and includes both diverging and sequential types.
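A minimal sketch of pulling colours from a palette:

    library(scico)
    scico(8, palette = "lajolla") # eight colours from the 'lajolla' palette
    scico_palette_names()         # list the available palettes
    # With 'ggplot2': add scale_fill_scico(palette = "davos") to a plot.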
scientoText Text & Scientometric Analytics
Calculates bibliometric indicators from bibliometric data. It also performs pattern analysis using the text part of bibliometric data. The bibliometric data are obtained mainly from Web of Science and Scopus.
scifigure Visualize Reproducibility and Replicability in a Comparison of Scientific Studies
Users may specify what fundamental qualities of a new study have or have not changed in an attempt to reproduce or replicate an original study. A comparison of the differences is visualized. Visualization approach follows Patil, Peng, and Leek (2016) <doi:10.1101/066803>.
scmamp Statistical Comparison of Multiple Algorithms in Multiple Problems
Given a matrix with results of different algorithms for different problems, the package uses statistical tests and corrections to assess the differences between algorithms.
scorecard Credit Risk Scorecard
Makes the development of credit risk scorecards easy and efficient by providing functions for information value, variable filtering, optimal WOE binning, scorecard scaling, performance evaluation, and more. References include: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring.
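A sketch of the first steps of the workflow, with function names as recalled from the package’s examples (treat them as assumptions):

    library(scorecard)
    data("germancredit")
    dt     <- var_filter(germancredit, y = "creditability") # filter by IV
    bins   <- woebin(dt, y = "creditability")               # optimal WOE binning
    dt_woe <- woebin_ply(dt, bins)                          # apply the binning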
scorecardModelUtils Credit Scorecard Modelling Utils
Provides infrastructure functionalities such as missing value treatment, information value calculation, GINI calculation etc. which are used for developing a traditional credit scorecard as well as a machine learning based model. The functionalities defined are standard steps for any credit underwriting scorecard development, extensively used in financial domain.
ScoreGGUM Score Persons Using the Generalized Graded Unfolding Model
Estimates GGUM person parameters using pre-calibrated item parameters and binary or graded disagree-agree responses.
scorer Quickly Score Models
A set of tools to quickly score models commonly used in data analysis and data science, using uncommon scoring metrics. For example, you might want to use a weighted absolute percent error instead of a root mean square deviation to score a regression model.
scoringRules Scoring Rules for Parametric and Simulated Distribution Forecasts
Dictionary-like reference for computing scoring rules in a wide range of situations. Covers both parametric forecast distributions (such as mixtures of Gaussians) and distributions generated via simulation.
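For example, both flavours in a minimal sketch:

    library(scoringRules)
    crps_norm(y = 0.3, mean = 0, sd = 1)    # parametric (Gaussian) forecast
    crps_sample(y = 0.3, dat = rnorm(1000)) # forecast given by simulated draws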
ScottKnott The ScottKnott Clustering Algorithm
Division of an ANOVA experiment’s treatment means into homogeneous distinct groups using the clustering method of Scott & Knott.
ScottKnottESD The Scott-Knott Effect Size Difference (ESD) Test
An enhancement of the Scott-Knott test (which clusters distributions into statistically distinct ranks) that takes effect size into consideration.
scPDSI Calculation of the Conventional and Self-Calibrating Palmer Drought Severity Index
Calculating the monthly conventional and self-calibrating Palmer Drought Severity Index (PDSI and scPDSI) using precipitation and potential evapotranspiration data. The function to calculate PDSI is based on the C++ source codes developed by Nathan Wells, Steve Goddard and Michael J. Hayes, University of Nebraska-Lincoln. Reference: Palmer W. (1965). Meteorological drought. U.S. Department of Commerce Weather Bureau Research Paper, <https://…/palmer.pdf>; Wells N., Goddard S., Hayes M. J. (2004). A Self-Calibrating Palmer Drought Severity Index. Journal of Climate, 17(12):2335-2351, <DOI:10.1175/1520-0442(2004)017%3C2335:ASPDSI%3E2.0.CO;2>.
scpm An R Package for Spatial Smoothing
A group of functions for spatial smoothing using cubic splines and variogram maximum likelihood estimation. Also allows the inclusion of linear parametric terms and change-points for segmented smoothing spline models.
SCPME Shrinking Characteristics of Precision Matrix Estimators
Estimates a penalized precision matrix via an augmented ADMM algorithm. This package is an implementation of the methods described in ‘Shrinking Characteristics of Precision Matrix Estimators’ by Aaron J. Molstad, PhD and Adam J. Rothman, PhD. The manuscript can be found here: <doi:10.1093/biomet/asy023> .
scrapeR Tools for Scraping Data from HTML and XML Documents
Tools for Scraping Data from Web-Based Documents
scriptName Determine a Script’s Filename from Within the Script Itself
A small set of functions wrapping up the call stack and command line inspection needed to determine a running script’s filename from within the script itself.
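A minimal sketch:

    # Inside a script run via Rscript or source():
    library(scriptName)
    current_filename() # path of the running script (NULL at the console)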
scriptuRs Complete Text of the LDS Scriptures
Full text, in data frames containing one row per verse, of the Standard Works of The Church of Jesus Christ of Latter-day Saints (LDS). These are the Old Testament (KJV), the New Testament (KJV), the Book of Mormon, the Doctrine and Covenants, and the Pearl of Great Price.
SCRSELECT Performs Bayesian Variable Selection on the Covariates in a Semi-Competing Risks Model
Contains four functions used in the DIC-tau_g procedure. SCRSELECT() and SCRSELECTRUN() use Stochastic Search Variable Selection to select important covariates in the three hazard functions of a semi-competing risks model. These functions perform the Gibbs sampler for variable selection and a Metropolis-Hastings-Green sampler for the number of split points and parameters for the three baseline hazard functions. The function SCRSELECT() writes the posterior sample of all quantities sampled in the Gibbs sampler after a burn-in period to a desired file location, while the function SCRSELECTRUN() returns posterior values of important quantities for the DIC-tau_g procedure in a list. The function DICTAUG() returns a list containing the DIC values for the unique models visited by the DIC-tau_g grid search. The function ReturnModel() uses SCRSELECTRUN() and DICTAUG() to return a summary of the posterior coefficient vectors for the optimal model, along with saving this posterior sample to a desired path location.
scs Splitting Conic Solver
Solves convex cone programs via operator splitting. Can solve: linear programs (LPs), second-order cone programs (SOCPs), semidefinite programs (SDPs), exponential cone programs (ECPs), and power cone programs (PCPs), or problems with any combination of those cones. SCS uses AMD (a set of routines for permuting sparse matrices prior to factorization) and LDL (a sparse LDL’ factorization and solve package) from ‘SuiteSparse’ (<http://www.suitesparse.com> ).
sctransform Variance Stabilizing Transformations for Single Cell UMI Data
A normalization method for single-cell UMI count data using a variance stabilizing transformation. The transformation is based on a negative binomial regression model with regularized parameters. As part of the same regression framework, this package also provides functions for batch correction, and data correction/denoising.
scvxclustr Sparse Convex Clustering
Alternating Minimization Algorithm (AMA) and Alternating Direction Method of Multipliers (ADMM) splitting methods for sparse convex clustering.
SDALGCP Spatially Discrete Approximation to Log-Gaussian Cox Processes for Aggregated Disease Count Data
Provides a computationally efficient discrete approximation to log-Gaussian Cox process model for spatially aggregated disease count data. It uses Monte Carlo Maximum Likelihood for model parameter estimation as proposed by Christensen (2004) <doi: 10.1198/106186004X2525> and delivers prediction of spatially discrete and continuous relative risk.
sdat Signal Detection via Adaptive Test
Tests the global null hypothesis in linear models using a marginal approach.
sdcHierarchies Create and (Interactively) Modify Nested Hierarchies
Provides functionality to generate, (interactively) modify (by adding, removing and renaming nodes) and convert nested hierarchies between different formats. These tree-like structures can be used to define, for example, complex hierarchical tables used for statistical disclosure control.
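A minimal sketch of building a hierarchy, with function names as recalled from the package README (treat them as assumptions):

    library(sdcHierarchies)
    h <- hier_create(root = "Total", nodes = LETTERS[1:3]) # Total > A, B, C
    h <- hier_add(h, root = "A", nodes = c("a1", "a2"))    # nest a1, a2 under A
    hier_display(h)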
sdcTarget Statistical Disclosure Control Substitution Matrix Calculator
Classes and methods to calculate and evaluate target matrices for statistical disclosure control.
SDD Serial Dependence Diagrams
Allows for computing (and by default plotting) different types of serial dependence diagrams.
SDEFSR Subgroup Discovery with Evolutionary Fuzzy Systems in R
Implementation of evolutionary fuzzy systems for the data mining task called ‘subgroup discovery’. It also provides a Shiny app to make the analysis easier. The algorithms work with data sets provided in KEEL, ARFF and CSV formats and also with data.frame objects.
sdols Summarizing Distributions of Latent Structures
Summaries of distributions on clusterings and feature allocations are provided. Specifically, point estimates are obtained by the sequentially-allocated latent structure optimization (SALSO) algorithm to minimize squared error loss, absolute error loss, Binder loss, or the lower bound of the variation of information loss. Clustering uncertainty can be assessed with the confidence calculations and the associated plot.
sdpt3r Semi-Definite Quadratic Linear Programming Solver
Solves the general Semi-Definite Linear Programming formulation using an R implementation of SDPT3 (K.C. Toh, M.J. Todd, and R.H. Tutuncu (1999) <doi:10.1080/10556789908805762>). This includes problems such as the nearest correlation matrix problem (Higham (2002) <doi:10.1093/imanum/22.3.329>), D-optimal experimental design (Smith (1918) <doi:10.2307/2331929>), Distance Weighted Discrimination (Marron and Todd (2012) <doi:10.1198/016214507000001120>), as well as graph theory problems including the maximum cut problem. Technical details surrounding SDPT3 can be found in R.H Tutuncu, K.C. Toh, and M.J. Todd (2003) <doi:10.1007/s10107-002-0347-5>.
SDR Subgroup Discovery Algorithms for R
Implementation of some algorithms for the data mining task called ‘subgroup discovery’ without package dependencies. It also provides a Shiny app to make the analysis easier. The algorithms work with data sets provided in KEEL format. If you want more information about this format, please refer to <http://www.keel.es>.
SDT Self-Determination Theory Measures
Functions for self-determination motivation theory (SDT) to compute measures of motivation internalization, motivation simplex structure, and of the original and adjusted self-determination or relative autonomy index. SDT was introduced by Deci and Ryan (1985) <doi:10.1007/978-1-4899-2271-7>. See package?SDT for an overview.
sdwd Sparse Distance Weighted Discrimination
Solves the solution paths of the sparse distance weighted discrimination (DWD) with the L1, the elastic-net, and the adaptive elastic-net penalties.
SEA Segregation Analysis
A few major genes and a series of polygenes are responsible for each quantitative trait. Major genes are individually identified, while the polygenes are collectively detected. This is mixed major-gene-plus-polygene inheritance analysis, or segregation analysis (SEA). In the SEA, phenotypes from a single or multiple segregating populations, along with their parents, are used to fit all the possible models, and the best model is viewed as the model of the trait. There are fourteen combinations of populations available. Zhang YM, Gai JY, Yang YH (2003) <doi:10.1017/S0016672303006141>.
sealr Sealing the R Objects Test and Assert Conditions
Records the state of an R object. Outputs include the object’s class and attributes. This helps reduce errors in test results caused by manually written descriptions. The goal is to improve the efficiency of data analysis and package development.
searchable Make R Objects Searchable by Matching Names Based on Case (in)Sensitivity, Regular Expressions, ..
Makes searching named vectors and lists more configurable. The package uses ‘stringr’-style match modifiers to allow for matching by case (in)sensitivity, regular expressions or fixed expressions. It also allows searching through values rather than names. This functionality facilitates creating dictionary- and thesaurus-like structures.
searchConsoleR Google Search Console APIv3 R Client
Provides an interface with the Google Search Console API v3, formerly called Google Webmaster Tools.
searcher Query Search Interfaces
Provides a search interface to look up terms on ‘Google’, ‘Bing’, ‘DuckDuckGo’, ‘StackOverflow’, ‘GitHub’, and ‘BitBucket’. Upon searching, a browser window will open with the aforementioned search results.
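For example:

    library(searcher)
    search_google("r convert list to data frame") # opens results in the browser
    search_stackoverflow("[r] merge data frames")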
seasonalview Graphical User Interface for Seasonal Adjustment
A graphical user interface to the ‘seasonal’ package and ‘X-13ARIMA-SEATS’, the U.S. Census Bureau’s seasonal adjustment software. Unifies the code base of <http://www.seasonal.website> and the GUI in the ‘seasonal’ package.
seastests Seasonality Tests
An overall test for seasonality of a given time series in addition to a set of single seasonality tests as used in Ollech and Webel (forthcoming): An overall seasonality test. Bundesbank Discussion Paper.
SecKW The SecKW Distribution
Density, distribution function, quantile function, random generation and survival function for the Secant Kumaraswamy Weibull Distribution as defined by SOUZA, L. New Trigonometric Class of Probabilistic Distributions. 219 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2015 (available at <http://…obabilistic-distributions-602633.html> ) and BRITO, C. C. R. Method Distributions generator and Probability Distributions Classes. 241 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2014 (available upon request).
secret Share Sensitive Information in R Packages
Allows sharing sensitive information, for example passwords, ‘API’ keys, etc., in R packages, using public key cryptography.
secsse Several Examined and Concealed States-Dependent Speciation and Extinction
Combines the features of HiSSE and MuSSE to simultaneously infer state-dependent diversification across two or more traits or states while accounting for the role of a possible concealed trait. See Herrera-Alsina et al. Systematic Biology, in press <DOI:10.1093/sysbio/syy057>.
secure Sequential Co-Sparse Factor Regression
Fitting sparse factor regression using sequential estimation.
securitytxt Identify and Parse Web Security Policies Files
When security risks in web services are discovered by independent security researchers who understand the severity of the risk, they often lack the channels to properly disclose them. As a result, security issues may be left unreported. The ‘security.txt’ ‘Web Security Policies’ specification defines an ‘IETF’ draft standard <https://…/draft-foudil-securitytxt-00> to help organizations define the process for security researchers to securely disclose security vulnerabilities. Tools are provided to help identify and parse ‘security.txt’ files to enable analysis of the usage and adoption of these policies.
see Visualisation Toolbox for ‘easystats’ and Extra Geoms, Themes and Color Palettes for ‘ggplot2’
Provides plotting utilities supporting easystats-packages (<https://…/easystats> ) and some extra themes, geoms, and scales for ‘ggplot2’. Color scales are based on <https://…/colors>.
seedCCA Seeded Canonical Correlation Analysis
Functions for dimension reduction through seeded canonical correlation analysis are provided. Classical canonical correlation analysis (CCA) is a useful statistical method in multivariate data analysis, but its use is limited for large p, small n data because of the required matrix inversion. To overcome this, a seeded CCA has been proposed in Im, Gang and Yoo (2015) <DOI:10.1002/cem.2691>. The seeded CCA is a two-step procedure. The sets of variables are initially reduced by successively projecting cov(X,Y) or cov(Y,X) onto cov(X) and cov(Y), respectively, without loss of information on canonical correlation analysis, following Cook, Li and Chiaromonte (2007) <DOI:10.1093/biomet/asm038> and Lee and Yoo (2014) <DOI:10.1111/anzs.12057>. Then, the canonical correlation is finalized with the initially-reduced two sets of variables.
segclust2d Bivariate Segmentation/Clustering Methods and Tools
Provides two methods for segmentation and joint segmentation/clustering of bivariate time-series. Originally intended for ecological segmentation (home-range and behavioural modes) but easily applied to other series, the package also provides tools for analysing outputs from the R packages ‘moveHMM’ and ‘marcher’. The segmentation method is a bivariate extension of Lavielle’s method available in ‘adehabitatLT’ (Lavielle, 1999 <doi:10.1016/S0304-4149(99)00023-X> and 2005 <doi:10.1016/j.sigpro.2005.01.012>). This method relies on dynamic programming for efficient segmentation. The segmentation/clustering method alternates steps of dynamic programming with an Expectation-Maximization algorithm. This is an extension of the method of Picard et al (2007) <doi:10.1111/j.1541-0420.2006.00729.x> (formerly available in the ‘cghseg’ package) to the bivariate case. The full description of the method is not published yet.
segmented Regression Models with Breakpoints/Changepoints Estimation
Given a regression model, segmented ‘updates’ the model by adding one or more segmented (i.e., piecewise-linear) relationships. Several variables with multiple breakpoints are allowed.
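A minimal sketch of updating a linear model with one breakpoint:

    library(segmented)
    set.seed(1)
    x <- 1:100
    y <- 2 + 0.5 * pmin(x, 60) + 0.1 * pmax(x - 60, 0) + rnorm(100)
    fit <- lm(y ~ x)
    seg <- segmented(fit, seg.Z = ~x, psi = 50) # psi: initial breakpoint guess
    summary(seg)                                # estimated breakpoint near 60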
segmenTier Similarity-Based Segmentation of Multidimensional Signals
A dynamic programming solution to segmentation based on maximization of arbitrary similarity measures within segments. The general idea, theory and this implementation are described in Machne, Murray & Stadler (2017) <doi:10.1038/s41598-017-12401-8>. In addition to the core algorithm, the package provides time-series processing and clustering functions as described in the publication. These are generally applicable where a `k-means` clustering yields meaningful results, and have been specifically developed for clustering of the Discrete Fourier Transform of periodic gene expression data (`circadian’ or `yeast metabolic oscillations’). This clustering approach is outlined in the supplemental material of Machne & Murray (2012) <doi:10.1371/journal.pone.0037906>, and is used here as a basis for segment similarity measures. Notably, the time-series processing and clustering functions can also be used as stand-alone tools, independent of segmentation, e.g., for transcriptome data already mapped to genes.
segmentr Segment Data With Maximum Likelihood
Given a likelihood provided by the user, this package applies it to a given matrix dataset in order to find change points in the data that maximize the sum of the likelihoods of all the segments. This package provides a handful of algorithms with different time complexities and assumption compromises so the user is able to choose the best one for the problem at hand. The implementation of the segmentation algorithms in this package are based on the paper by Bruno M. de Castro, Florencia Leonardi (2018) <arXiv:1501.01756>. The Berlin weather sample dataset was provided by Deutscher Wetterdienst <https://dwd.de/>. You can find all the references in the Acknowledgments section of this package’s repository via the URL below.
segMGarch Multiple Change-Point Detection for High-Dimensional GARCH Processes
Implements a segmentation algorithm for multiple change-point detection in high-dimensional GARCH processes. It simultaneously segments GARCH processes by identifying ‘common’ change-points, each of which can be shared by a subset or all of the component time series as a change-point in their within-series and/or cross-sectional correlation structure.
segregation Entropy-Based Segregation Indices
Computes entropy-based segregation indices, as developed by Theil (1971) <isbn:978-0471858454>, with a focus on the Mutual Information Index (M). The M, further described by Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> and Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008>, is a measure of segregation that is highly decomposable. The package provides tools to decompose the index by units and groups (local segregation), and by within and between terms. Includes standard error estimation by bootstrapping.
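A minimal sketch, using the example data bundled with the package (as in its documentation):

    library(segregation)
    # Mutual Information Index of racial school segregation
    mutual_total(schools00, "race", "school", weight = "n")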
SEHmodel Spatial Exposure-Hazard Model for Exposure and Impact Assessment on Exposed Individuals
A model coupling polygon and point processes for assessing risk due to contaminant sources and their impact on exposed individuals.
seismic Predict Information Cascade by Self-Exciting Point Process
An implementation of the self-exciting point process model for information cascades, which occur when many people engage in the same acts after observing the actions of others (e.g. post resharings on Facebook or Twitter). It provides functions to estimate the infectiousness of an information cascade and predict its popularity given the observed history. See http://…/seismic for more information and datasets.
seismicRoll Fast Rolling Functions for Seismology using Rcpp
Fast versions of seismic analysis functions that ‘roll’ over a vector of values. See the RcppRoll package for alternative versions of basic statistical functions such as rolling mean, median, etc.
SelectBoost A General Algorithm to Enhance the Performance of Variable Selection Methods in Correlated Datasets
An implementation of the selectboost algorithm (Aouadi et al. 2018, <arXiv:1810.01670>), which is a general algorithm that improves the precision of any existing variable selection method. This algorithm is based on highly intensive simulations and takes into account the correlation structure of the data. It can either produce a confidence index for variable selection or it can be used in an experimental design planning perspective.
selection Correcting Biased Estimates Under Selection
A collection of functions for correcting biased estimates under selection (range restriction).
selectspm Select Point Pattern Models Based on Minimum Contrast and AIC
A package to fit and select point pattern models based on minimum contrast and AIC.
seleniumPipes R Client Implementing the W3C WebDriver Specification
The W3C WebDriver specification defines a way for out-of-process programs to remotely instruct the behaviour of web browsers. It is detailed at <https://…/webdriver-spec.html>. This package provides an R client implementing the W3C specification.
SELF A Structural Equation Embedded Likelihood Framework for Causal Discovery
Provides the SELF criteria to learn causal structure. Details of the algorithm can be found in ‘SELF: A Structural Equation Embedded Likelihood Framework for Causal Discovery’ (AAAI 2018).
SelvarMix Regularization for variable selection in model-based clustering and discriminant analysis
Regularization for variable selection in model-based clustering and discriminant analysis.
semantic.dashboard Dashboard with Semantic UI Support for ‘shiny’
Basic functions for creating a semantic UI dashboard. This package adds support for the powerful UI library ‘Semantic UI’ (<http://…/>) to your dashboard and enables you to stay compatible with ‘shinydashboard’ functionalities.
semds Structural Equation Multidimensional Scaling
Fits a multidimensional scaling (MDS) model for three-way data. It integrates concepts from structural equation models (SEM) by assuming an underlying, latent dissimilarity matrix. The method uses an alternating estimation procedure in which the unknown symmetric dissimilarity matrix is estimated in a SEM framework while the objects are represented in a low-dimensional space. As a special case it can also handle asymmetric input dissimilarities.
SEMID Identifiability of Linear Structural Equation Models
Provides routines to check identifiability or non-identifiability of linear structural equation models as described in Drton, Foygel, and Sullivant (2011) <DOI:10.1214/10-AOS859>, Foygel, Draisma, and Drton (2012) <DOI:10.1214/12-AOS1012>, and other works. The routines are based on the graphical representation of structural equation models by a path diagram/mixed graph.
SemiMarkov Multi-States Semi-Markov Models
Functions for fitting multi-state semi-Markov models to longitudinal data. A parametric maximum likelihood estimation method adapted to deal with Exponential, Weibull and Exponentiated Weibull distributions is considered. Right-censoring can be taken into account and both constant and time-varying covariates can be included using a Cox proportional model.
seminr Domain-Specific Language for Building PLS Structural Equation Models
A powerful, easy to write and easy to modify syntax for specifying and estimating Partial Least Squares (PLS) path models allowing for the latest estimation methods for Consistent PLS as per Dijkstra & Henseler (2015, MISQ 39(2): 297-316), adjusted interactions as per Henseler & Chin (2010) <doi:10.1080/10705510903439003> and bootstrapping utilizing parallel processing as per Hair et al. (2017, ISBN:978-1483377445).
SemiSupervised Safe Semi-Supervised Learning Tools
Implements several safe graph-based semi-supervised learning algorithms. The first algorithm is the Semi-Supervised Semi-Parametric Model (S4PM) and the fast Anchor Graph version of this approach. For additional technical details, refer to Culp and Ryan (2013) <http://…/culp13a.html>, Ryan and Culp (2015) <http://…/ryan15a.html> and the package vignette. The underlying fitting routines are executed in C++. All tuning parameter estimation is optimized using K-fold Cross-Validation.
SemNetCleaner An Automated Cleaning Tool for Semantic and Linguistic Data
Implements several functions that automate the cleaning of linguistic data for semantic network analysis: removing plurals and continuous strings, binarizing the data, converging responses, and finalizing the data. Also provides a partial bootstrapped network function and plot.
SemNetDictionaries Dictionaries for the ‘SemNetCleaner’ Package
Implements dictionaries that can be used in the ‘SemNetCleaner’ package. Also includes several functions aimed at facilitating the text cleaning analysis in the ‘SemNetCleaner’ package. This package is designed to integrate and update word lists and dictionaries based on each user’s individual needs by allowing users to store and save their own dictionaries. Dictionaries can be added to the ‘SemNetDictionaries’ package by submitting user-defined dictionaries to <https://…/SemNetDictionaries>.
semPower Power Analyses for SEM
Provides a-priori, post-hoc, and compromise power analyses for structural equation models (SEM). Moshagen & Erdfelder (2016) <doi:10.1080/10705511.2014.950896>.
SEMrushR R Interface to Access the ‘SEMrush’ API
Implements methods for querying SEO (Search Engine Optimization) and SEM (Search Engine Marketing) data from ‘SEMrush’ using its API (<https://…/> ). ‘SEMrush’ API uses a basic authentication with an API key.
semsfa Semiparametric Estimation of Stochastic Frontier Models
Semiparametric estimation of stochastic frontier models following a two-step procedure: in the first step, semiparametric or nonparametric regression techniques are used to relax parametric restrictions on the functional form representing technology, and in the second step variance parameters are obtained by pseudolikelihood estimators or by the method of moments.
semTable Structural Equation Modeling Tables
For confirmatory factor analysis (‘CFA’) and structural equation models (‘SEM’) estimated with the ‘lavaan’ package, this package provides functions to create model summary tables and model comparison tables for hypothesis testing. Tables can be produced in ‘LaTeX’, ‘HTML’, or comma separated variables (‘CSV’).
semtree Recursive Partitioning for Structural Equation Models
SEM Trees and SEM Forests — an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees. They are ensembles of SEM trees each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees.
semver ‘Semantic Versioning V2.0’ Parser
Tools and functions for parsing, rendering and operating on semantic version strings. Semantic versioning is a simple set of rules and requirements that dictate how version numbers are assigned and incremented, as outlined at <http://semver.org>.
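A minimal sketch, assuming the parsed versions support R’s comparison operators (as the package intends):

    library(semver)
    v <- parse_version(c("1.2.3", "1.10.0"))
    v[[1]] < v[[2]] # TRUE: semantic ordering, not lexicographic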
sensiPhy Sensitivity Analysis for Comparative Methods
An implementation of sensitivity analysis for phylogenetic comparative methods. The package is an umbrella of statistical and graphical methods that estimate and report different types of uncertainty in PCM: (i) Species Sampling uncertainty (sample size; influential species and clades). (ii) Phylogenetic uncertainty (different topologies and/or branch lengths). (iii) Data uncertainty (intraspecific variation and measurement error).
sensitivity2x2xk Sensitivity Analysis for 2x2xk Tables in Observational Studies
Performs exact or approximate adaptive or nonadaptive Cochran-Mantel-Haenszel-Birch tests and sensitivity analyses for one or two 2x2xk tables in observational studies.
sensitivityCalibration A Calibrated Sensitivity Analysis for Matched Observational Studies
Implements the calibrated sensitivity analysis approach for matched observational studies. Our sensitivity analysis framework views matched sets as drawn from a super-population. The unmeasured confounder is modeled as a random variable. We combine matching and model-based covariate-adjustment methods to estimate the treatment effect. The hypothesized unmeasured confounder enters the picture as a missing covariate. We adopt a state-of-the-art Expectation Maximization (EM) algorithm to handle this missing covariate problem in generalized linear models (GLMs). As our method also estimates the effect of each observed covariate on the outcome and treatment assignment, we are able to calibrate the unmeasured confounder to observed covariates. Zhang, B., Small, D. S. (2018). <arXiv:1812.00215>.
sensitivityfull Sensitivity Analysis for Full Matching in Observational Studies
Sensitivity to unmeasured biases in an observational study that is a full match.
sensitivitymult Sensitivity Analysis for Observational Studies with Multiple Outcomes
Sensitivity analysis for multiple outcomes in observational studies. For instance, all linear combinations of several outcomes may be explored using Scheffe projections in the comparison() function; see Rosenbaum (2016, Annals of Applied Statistics) <doi:10.1214/16-AOAS942>. Alternatively, attention may focus on a few principal components in the principal() function. The package includes parallel methods for individual outcomes, including tests in the senm() function and confidence intervals in the senmCI() function.
sensitivityPStrat Principal Stratification Sensitivity Analysis Functions
This package provides functions to perform principal stratification sensitivity analyses on datasets.
SensMap Sensory and Consumer Data Mapping
Obtains an external preference map to explain consumer preferences as a function of the sensory attributes of products (K. Greenhoff et al. (1994) <doi:10.1007/978-1-4615-2171-6_6>), with options for dimension-reduction methods and prediction models from linear and non-linear regressions. A smoothed version of the map is available, and a comparison of map stability across different features before and after smoothing is provided, which may help industry practitioners make good decisions about the characteristics of new product development. A ‘shiny’ application is included. It presents an easy GUI for the implemented functions as well as a comparative tool of fit-model performance using several criteria. Basic analyses such as characterization of products, panelists and sessions, as well as consumer segmentation, are available.
sensmediation Parametric Estimation and Sensitivity Analysis of Direct and Indirect Effects
We implement functions to estimate and perform sensitivity analysis to unobserved confounding of direct and indirect effects introduced in Lindmark, de Luna and Eriksson (2018) <doi:10.1002/sim.7620>. The estimation and sensitivity analysis are parametric, based on probit and/or linear regression models. Sensitivity analysis is implemented for unobserved confounding of the exposure-mediator, mediator-outcome and exposure-outcome relationships.
SensMixed Analysis of Sensory and Consumer Data in a Mixed Model Framework
Functions that facilitate analysis of sensory as well as consumer data within a mixed effects model framework are provided. The so-called mixed assessor models, which correct for the scaling effect, are implemented. The package also generates d-tilde plots and includes a shiny application for its functionalities.
sensobol Computation of High-Order Sobol’ Sensitivity Indices
Allows one to rapidly compute, bootstrap and plot up to third-order Sobol’ indices using the estimators of Saltelli et al. 2010 <doi:10.1016/j.cpc.2009.09.018> and Jansen 1999 <doi:10.1016/S0010-4655(98)00154-4>. The ‘sensobol’ package also implements the algorithm of Khorashadi Zadeh et al. 2017 <doi:10.1016/j.envsoft.2017.02.001> to calculate the approximation error in the computation of Sobol’ first and total indices, an approach that allows robust screening of influential versus non-influential model inputs. Finally, it also provides functions to obtain publication-ready figures of the model output uncertainty and sensitivity-related analyses.
SensoMineR Sensory data analysis with R
An R package for analysing sensory data.
sensors4plumes Test and Optimise Sampling Designs Based on Plume Simulations
Test sampling designs by several flexible cost functions, usually based on the simulations, and optimise sampling designs using different optimisation algorithms; load plume simulations (on lattice or points) even if they do not fit into memory.
senstrat Sensitivity Analysis for Stratified Observational Studies
Sensitivity analysis in unmatched observational studies, with or without strata. The main functions are sen2sample() and senstrat(). See Rosenbaum, P. R. and Krieger, A. M. (1990), JASA, 85, 493-498, <doi:10.1080/01621459.1990.10476226> and Gastwirth, Krieger and Rosenbaum (2000), JRSS-B, 62, 545-555 <doi:10.1111/1467-9868.00249> .
SentimentAnalysis Dictionary-Based Sentiment Analysis
Performs a sentiment analysis of textual contents in R. This implementation utilizes various existing dictionaries, such as Harvard IV, or finance-specific dictionaries. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
sentimentr Calculate Text Polarity Sentiment
Calculate text polarity sentiment at the sentence level and optionally aggregate by rows or grouping variable(s).
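For example:

    library(sentimentr)
    sentiment("I really love this package, but the documentation is thin.")
    sentiment_by(c("I love it. Best ever.", "I hate it.")) # aggregate by element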
sentometrics An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction
Time series analysis based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in many ways. As described in Ardia et al. (2017) <https://ssrn.com/abstract=3067734>, the package provides a means to model the impact of sentiment in texts on a target variable, by first computing a wide range of textual sentiment measures and then selecting those that are most informative.
seoR SEO Related Analyses
Helps SEOs (Search Engine Optimization practitioners) retrieve relevant information from various APIs and websites. It is possible to scrape SEO-relevant parts of a website, so you are able to extract links, meta tags, h-tags, and more. The package also provides functions to scrape information from search engines, such as indexed pages, the number of results for a given keyword, or complete search results. The third part of the package is the set of connected SEO-tool APIs: it is possible to get information from ‘Whois’, ‘Google Pagespeed’ and many more directly in R.
seplyr Standard Evaluation Interfaces for Common ‘dplyr’ Tasks
The ‘seplyr’ (standard evaluation data.frame ‘dplyr’) package supplies standard evaluation adapter methods for important common ‘dplyr’ methods that currently have a non-standard programming interface. This allows the analyst to use ‘dplyr’ to perform fundamental data transformation steps such as arranging rows, grouping rows, aggregating, and selecting columns without having to learn the details of ‘rlang’/’tidyeval’ non-standard evaluation and without continuing to rely on the now-deprecated ‘dplyr’ ‘underscore verbs.’ In addition, the ‘seplyr’ package supplies several new ‘key operations bound together’ methods. These include ‘group_summarize()’ (which combines grouping, arranging and calculation in an atomic unit), ‘add_group_summaries()’ (which joins grouped summaries into a ‘data.frame’ in a well-documented manner), and ‘add_group_indices()’ (which adds per-group identifiers to a ‘data.frame’ without depending on row order).
SeqAlloc Sequential Allocation for Prospective Experiments
Potential randomization schemes are prospectively evaluated when units are assigned to treatment arms upon entry into the experiment. The schemes are evaluated for balance on covariates and on predictability (i.e., how well could a site worker guess the treatment of the next unit enrolled).
seqHMM Hidden Markov Models for Life Sequences and Other Multivariate, Multichannel Categorical Time Series
Designed for fitting hidden (latent) Markov models and mixture hidden Markov models for social sequence data and other categorical time series. Some more restricted versions of these types of models are also available: Markov models, mixture Markov models, and latent class models. The package supports models for one or multiple subjects with one or multiple parallel sequences (channels). External covariates can be added to explain cluster membership in mixture models. The package provides functions for evaluating and comparing models, as well as functions for easy plotting of multichannel sequence data and hidden Markov models. Models are estimated using maximum likelihood via the EM algorithm and/or direct numerical maximization with analytical gradients. All main algorithms are written in C++ with support for parallel computation.
seqICP Sequential Invariant Causal Prediction
Contains an implementation of invariant causal prediction for sequential data. The main function in the package is ‘seqICP’, which performs linear sequential invariant causal prediction and has guaranteed type I error control. For non-linear dependencies the package also contains a non-linear method ‘seqICPnl’, which accepts any regression procedure and performs tests based on a permutation approach that is only approximately correct. In order to test whether an individual set S is invariant, the package contains the subroutines ‘seqICP.s’ and ‘seqICPnl.s’ corresponding to the respective main methods.
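A rough sketch of a call to the main function (hedged: the argument order X, Y and the summary() method are assumed from the description above; defaults are used for everything else):

    library(seqICP)
    set.seed(1)
    n <- 200
    X <- matrix(rnorm(2 * n), ncol = 2)   # sequential predictor data
    Y <- 0.6 * X[, 1] + rnorm(n)          # response depends on X1 only
    fit <- seqICP(X, Y)                   # linear sequential ICP
    summary(fit)                          # summary() method assumed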
seqmon Group Sequential Design Class for Clinical Trials
S4 class object for creating and managing group sequential designs. It calculates the efficacy and futility boundaries at each look. It allows modifying the design and tracking the design update history.
seqtest Sequential Triangular Test
Sequential triangular tests for the arithmetic mean (one and two samples), proportions (one and two samples), and Pearson’s correlation coefficient.
SequentialDesign Observational Database Study Planning using Exact Sequential Analysis for Poisson and Binomial Data
Functions to be used in conjunction with the ‘Sequential’ package that allows for planning of observational database studies that will be analyzed with exact sequential analysis. This package supports Poisson- and binomial-based data. The primary function, seq_wrapper(…), accepts parameters for simulation of a simple exposure pattern and for the ‘Sequential’ package setup and analysis functions. The exposure matrix is used to simulate the true and false positive and negative populations (Green (1983) <doi:10.1093/oxfordjournals.aje.a113521>, Brenner (1993) <doi:10.1093/oxfordjournals.aje.a116805>). Functions are then run from the ‘Sequential’ package on these populations, which allows for the exploration of outcome misclassification in data.
sergeant Tools to Transform and Query Data with ‘Apache’ ‘Drill’
‘Apache Drill’ is a low-latency distributed query engine designed to enable data exploration and ‘analytics’ on both relational and non-relational ‘datastores’, scaling to petabytes of data. Methods are provided that enable working with ‘Apache’ ‘Drill’ instances via the ‘REST’ ‘API’, ‘JDBC’ interface (optional), ‘DBI’ methods, and ‘dplyr’/’dbplyr’ idioms.
serial The Serial Interface Package
Provides functionality for the use of the internal hardware RS232/RS422/RS485 or any other virtual serial interfaces of the computer.
serieslcb Lower Confidence Bounds for Binomial Series System
Calculate lower confidence bounds for binomial series system reliability. The R ‘shiny’ application, launched by launch_app(), weaves together a workflow of customized simulations and delta coverage calculations to output recommended lower confidence bound methods.
sessioninfo R Session Information
Query and print information about the current R session. It is similar to ‘utils::sessionInfo()’, but includes more information about packages, and where they were installed from.
setter Mutators that Work with Pipes
Mutators to set attributes of variables that work well in a pipe (much like stats::setNames()).
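A small sketch of the pipe style this enables (set_names() and set_class() are assumed exported setters; prefixed here to avoid clashing with the ‘magrittr’ aliases of the same names):

    library(magrittr)
    x <- 1:3 %>%
      setter::set_names(c("a", "b", "c")) %>%   # name the elements in-pipe
      setter::set_class("example_class")        # then set the class
    x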
SetTest Group Testing Procedures for Signal Detection and Goodness-of-Fit
Provides the cumulative distribution function (CDF), quantiles, p-values, a statistical power calculator, and a random number generator for a collection of group-testing procedures, including the Higher Criticism tests, the one-sided Kolmogorov-Smirnov tests, the one-sided Berk-Jones tests, the one-sided phi-divergence tests, etc. The input is a group of p-values. The null hypothesis is that they are i.i.d. Uniform(0,1). In the context of signal detection, the null hypothesis means no signals. In the context of goodness-of-fit testing, which contrasts a group of i.i.d. random variables to a given continuous distribution, the input p-values can be obtained by the CDF transformation, and the null hypothesis means that these random variables follow the given distribution. For reference, see Hong Zhang, Jiashun Jin and Zheyang Wu. ‘Distributions and Statistical Power of Optimal Signal Detection Methods in Finite Samples’, submitted.
sf Simple Features for R
Support for simple features, a standardized way to encode spatial data, in R.
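For example, a plain data frame with coordinate columns can be promoted to a simple features object (st_as_sf() and st_geometry() are core ‘sf’ functions):

    library(sf)
    pts <- data.frame(id = 1:3, x = c(0, 1, 2), y = c(0, 1, 0))
    pts_sf <- st_as_sf(pts, coords = c("x", "y"), crs = 4326)
    pts_sf                     # a data.frame carrying a geometry column
    plot(st_geometry(pts_sf))  # spatial plotting works out of the box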
sfadv Advanced Methods for Stochastic Frontier Analysis
Stochastic frontier analysis with advanced methods. In particular, it applies the approach proposed by Latruffe et al. (2017) <DOI:10.1093/ajae/aaw077> to estimate a stochastic frontier with technical inefficiency effects when one input is endogenous.
sfc Substance Flow Computation
Provides a function sfc() to compute the substance flow with the input files ‘data’ and ‘model’. If sample.size is set to more than 1, an uncertainty analysis will be executed, with the distributions and parameters supplied in the file ‘data’.
sfdct Constrained Triangulation for Simple Features
Build a constrained ‘Delaunay’ triangulation from simple features objects, applying constraints based on input line segments and on triangle properties including maximum area and minimum internal angle.
sFFLHD Sequential Full Factorial-Based Latin Hypercube Design
Gives design points from a sequential full factorial-based Latin hypercube design, as described in Duan, Ankenman, Sanchez, and Sanchez (2015, Technometrics, <doi:10.1080/00401706.2015.1108233>).
SFS Similarity-First Search Seriation Algorithm
An implementation of the Similarity-First Search algorithm (SFS), a combinatorial algorithm which can be used to solve the seriation problem and to recognize some structured weighted graphs. The SFS algorithm represents a generalization to weighted graphs of the graph search algorithm Lexicographic Breadth-First Search (Lex-BFS), a variant of Breadth-First Search. The SFS algorithm reduces to Lex-BFS when applied to binary matrices (or, equivalently, unweighted graphs). Hence this library can also be used for Lex-BFS applications such as recognition of graph classes like chordal or unit interval graphs. In fact, the SFS seriation algorithm implemented in this package is a multisweep algorithm, which consists of repeating a finite number of SFS iterations (at most n sweeps for a matrix of size n). If the data matrix has a Robinsonian structure, then the ranking returned by the multisweep SFS algorithm is a Robinson ordering of the input matrix. Otherwise the algorithm can be used as a heuristic to return a ranking partially satisfying the Robinson property.
SFtools Space Filling Based Tools for Data Mining
Contains space filling based tools for machine learning and data mining. Some functions offer several computational techniques and handle out-of-memory processing of large data by using the ff package.
SGB Simplicial Generalized Beta Regression
Main properties and regression procedures using a generalization of the Dirichlet distribution called Simplicial Generalized Beta distribution. It is a new distribution on the simplex (i.e. on the space of compositions or positive vectors with sum of components equal to 1). The Dirichlet distribution can be constructed from a random vector of independent Gamma variables divided by their sum. The SGB follows the same construction with generalized Gamma instead of Gamma variables. The Dirichlet exponents are supplemented by an overall shape parameter and a vector of scales. The scale vector is itself a composition and can be modeled with auxiliary variables through a log-ratio transformation. Graf, M. (2017, ISBN: 978-84-947240-0-8). See also the vignette enclosed in the package.
sgd Stochastic Gradient Descent for Scalable Estimation
A fast and flexible set of tools for large-scale inference. It features many different stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.
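A hedged sketch of the formula interface (the sgd() call below assumes the formula/data/model arguments shown in the package examples; the coefficients element name is likewise assumed):

    library(sgd)
    set.seed(42)
    d <- data.frame(x = rnorm(1e4))
    d$y <- 1 + 2 * d$x + rnorm(1e4)
    fit <- sgd(y ~ x, data = d, model = "lm")  # linear model via SGD
    fit$coefficients                           # estimates approach c(1, 2)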
sgee Stagewise Generalized Estimating Equations
Stagewise techniques implemented with Generalized Estimating Equations to handle individual, group, and bi-level selection.
SGL Fit a GLM (or Cox Model) with a Combination of Lasso and Group Lasso Regularization
Fit a regularized generalized linear model via penalized maximum likelihood. The model is fit for a path of values of the penalty parameter. Fits linear, logistic and Cox models.
sglg Fitting Semi-Parametric Generalized log-Gamma Regression Models
Set of tools to fit linear multiple or semi-parametric regression models. Under this setup, the localization parameter of the response variable distribution is modeled using linear multiple regression or semi-parametric functions, whose non-parametric components may be approximated by natural cubic splines or P-splines. The supported distribution for the model error is a generalized log-gamma distribution which includes the generalized extreme value distribution as an important special case.
sglOptim Generic Sparse Group Lasso Solver
Fast generic solver for sparse group lasso optimization problems. The loss (objective) function must be defined in a C++ module. The optimization problem is solved using a coordinate gradient descent algorithm. Convergence of the algorithm is established (see reference) and the algorithm is applicable to a broad class of loss functions. Use of parallel computing for cross validation and subsampling is supported through the ‘foreach’ and ‘doParallel’ packages. Development version is on GitHub, please report package issues on GitHub.
sgmcmc Stochastic Gradient Markov Chain Monte Carlo
Provides functions that perform popular stochastic gradient Markov chain Monte Carlo (SGMCMC) methods on user specified models. The required gradients are automatically calculated using ‘TensorFlow’ <https://…/>, an efficient library for numerical computation. This means only the log likelihood and log prior functions need to be specified. The methods implemented include stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian Monte Carlo (SGHMC), stochastic gradient Nose-Hoover thermostat (SGNHT), and their respective control variate versions for increased efficiency.
sgmodel Solves a Generic Stochastic Growth Model with a Representative Agent
It computes the solutions to a generic stochastic growth model for a given set of user supplied parameters. It includes the solutions to the model, plots of the solution, a summary of the features of the model, a function that covers different types of consumption preferences, and a function that computes the moments of a Markov process. Merton, Robert C (1971) <doi:10.1016/0022-0531(71)90038-X>, Tauchen, George (1986) <doi:10.1016/0165-1765(86)90168-0>, Wickham, Hadley (2009, ISBN:978-0-387-98140-6).
sGMRFmix Sparse Gaussian Markov Random Field Mixtures for Anomaly Detection
An implementation of sparse Gaussian Markov random field mixtures presented by Ide et al. (2016) <doi:10.1109/ICDM.2016.0119>. It provides a novel anomaly detection method for multivariate noisy sensor data, automatically handles multiple operational modes, and can also compute variable-wise anomaly scores.
sgPLS Sparse Group Partial Least Square Methods
The Sparse Group Partial Least Square package (sgPLS) provides sparse, group, and sparse group versions of partial least square regression models.
shades Simple Colour Manipulation
Functions for easily manipulating colours and creating colour scales.
shadow R Package for Geometric Shade Calculations
Functions for calculating (1) shade height at a single point, (2) the shaded proportion of a building facade, (3) a polygonal layer of shade footprints on the ground, and (4) the Sky View Factor value at a single point. Typical inputs include a polygonal layer of building outlines along with the height of each building, sun azimuth, and sun elevation. The package also provides functions for related preliminary calculations: converting polygons to line segments, finding segment azimuth, shifting segments by azimuth and distance, and constructing the footprint of a line of sight between an observer and the sun.
shadowtext Shadow Text Grob and Layer
Implement shadowtextGrob() for ‘grid’ and geom_shadowtext() layer for ‘ggplot2’. These functions create/draw text grob with background shadow.
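A minimal sketch of the ‘ggplot2’ layer named above:

    library(ggplot2)
    library(shadowtext)
    d <- data.frame(x = 1:3, y = 1:3, lab = c("one", "two", "three"))
    ggplot(d, aes(x, y, label = lab)) +
      geom_shadowtext(size = 6)   # text drawn with a background shadow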
shallot Random Partition Distribution Indexed by Pairwise Information
Implementations are provided for the models described in the paper D. B. Dahl, R. Day, J. Tsai (2017), ‘Random Partition Distribution Indexed by Pairwise Information,’ Journal of the American Statistical Association, accepted. The Ewens, Ewens-Pitman, Ewens attraction, Ewens-Pitman attraction, and ddCRP distributions are available for prior simulation. We hope in the future to add posterior simulation with a user-supplied likelihood. Supporting functions for partition estimation and plotting are also planned.
ShapeChange Change-Point Estimation using Shape-Restricted Splines
In a scatterplot where the response variable is Gaussian, Poisson or binomial, we consider the case in which the mean function is smooth with a change-point, which is a mode, an inflection point or a jump point. The main routine estimates the mean curve and the change-point as well using shape-restricted B-splines. An optional subroutine delivering a bootstrapping confidence interval for the change-point is incorporated in the main routine.
ShapePattern Tools for Analyzing Planar Shape and Associated Patterns
An evolving and growing collection of tools for the quantification, assessment, and comparison of planar shape and pattern. The current flagship functionality is in the spatial decomposition of planar shapes using ‘ShrinkShape’ to incrementally shrink shapes to extinction while computing area, perimeter, and number of parts at each iteration of shrinking. The spectra of results are returned in graphic and tabular formats. Additional utility tools for handling data are provided and this package will be added to as more tools are created, cleaned-up, and documented.
shapper Wrapper of Python Library ‘shap’
Provides SHAP explanations of machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models. One of the best known methods for local explanations is SHapley Additive exPlanations (SHAP), introduced by Lundberg, S., et al. (2017) <arXiv:1705.07874>. The SHAP method is used to calculate the influence of variables on a particular observation. This method is based on Shapley values, a technique used in game theory. The R package ‘shapper’ is a port of the Python library ‘shap’.
sharpeRratio Moment-Free Estimation of Sharpe Ratios
An efficient moment-free estimator of the Sharpe ratio, or signal-to-noise ratio, for heavy-tailed data (see <https://…/1505.01333> ).
SHELF Tools to Support the Sheffield Elicitation Framework (SHELF)
Implements various methods for eliciting a probability distribution for a single parameter from an expert or a group of experts. The expert provides a small number of probability or quantile judgements, corresponding to points on his or her cumulative distribution function. A range of parametric distributions can then be fitted and displayed, with feedback provided in the form of additional quantiles. A graphical interface for the roulette elicitation method is also provided. For multiple experts, a weighted linear pool can be calculated.
shiftR Fast Enrichment Analysis via Circular Permutations
Fast enrichment analysis for locally correlated statistics via circular permutations. The analysis can be performed at multiple significance thresholds for both primary and auxiliary data sets, with efficient correction for multiple testing.
shiny Web Application Framework for R
Makes it incredibly easy to build interactive web applications with R. Automatic ‘reactive’ binding between inputs and outputs and extensive pre-built widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.
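A minimal reactive app, for illustration:

    library(shiny)
    ui <- fluidPage(
      sliderInput("n", "Observations:", min = 10, max = 500, value = 100),
      plotOutput("hist")
    )
    server <- function(input, output) {
      # re-runs automatically whenever input$n changes
      output$hist <- renderPlot(hist(rnorm(input$n)))
    }
    shinyApp(ui, server)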
shiny.i18n Shiny Applications Internationalization
Provides easy internationalization of Shiny applications. It can be used as a standalone translation package to translate reports, interactive visualizations, or graphical elements as well.
shiny.router Basic Routing for Shiny Web Applications
The minimal router for your Shiny apps. It allows you to create dynamic web applications with real-time user interface and easily share URLs to pages within your Shiny apps.
shiny.semantic Semantic UI Support for Shiny
Creating a great user interface for your Shiny apps can be a hassle, especially if you want to work purely in R and don’t want to use, for instance, HTML templates. This package adds support for the powerful UI library Semantic UI – <http://…/>. It also supports universal UI input binding that works with various DOM elements.
shinyaframe ‘WebVR’ Data Visualizations with ‘RStudio Shiny’ and ‘Mozilla A-Frame’
Make R data available in Web-based virtual reality experiences for immersive, cross-platform data visualizations. Includes the ‘gg-aframe’ JavaScript package for a Grammar of Graphics declarative HTML syntax to create 3-dimensional data visualizations with ‘Mozilla A-Frame’ <https://aframe.io>.
shinyalert Easily Create Pretty Popup Messages (Modals) in ‘Shiny’
Easily create pretty popup messages (modals) in ‘Shiny’. A modal can contain text, images, OK/Cancel buttons, an input to get a response from the user, and many more customizable options.
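A sketch of a popup triggered from the server (useShinyalert() is assumed required in the UI, as in early versions of the package):

    library(shiny)
    library(shinyalert)
    ui <- fluidPage(
      useShinyalert(),   # set up shinyalert
      actionButton("go", "Finish task")
    )
    server <- function(input, output) {
      observeEvent(input$go, {
        shinyalert(title = "Done!", text = "The task completed.", type = "success")
      })
    }
    shinyApp(ui, server)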
shinyanimate Animation for ‘shiny’ Elements
An extension of ‘animate.css’ that allows the user to easily add animations to any UI element in a ‘shiny’ app using the element’s id.
shinyBI Business Intelligence shinyApp
Delivers a simple Business Intelligence platform as a ‘shiny’ application. Users can load their own dataset, perform pivot operations utilizing the performance of ‘data.table’, and plot charts of the pivot results utilizing the interactivity of ‘rCharts’.
shinybootstrap2 Bootstrap 2 web components for use with Shiny
Provides Bootstrap 2 web components for use with the Shiny package. With versions of Shiny prior to 0.11, these Bootstrap 2 components were included as part of the package. Later versions of Shiny include Bootstrap 3, so the Bootstrap 2 components have been moved into this package for those users who rely on features specific to Bootstrap 2.
shinybusy Busy Indicator for ‘Shiny’ Applications
Add a global indicator (spinner, progress bar, gif) in your ‘shiny’ applications to show the user that the server is busy.
shiny-chart-builder Shiny app for building charts with a point-and-click interface
This shiny app is meant to be a system for basic reporting in the style of most Business Intelligence tools: you can create a report and then share it with a bookmark, without knowing any SQL or R. The package relies heavily on dplyr for database abstraction; it theoretically works with any dplyr-compatible database, but may require some tuning for certain databases.
shinycssloaders Add CSS Loading Animations to ‘shiny’ Outputs
Create a lightweight Shiny wrapper for the css-loaders created by Luke Haas <https://…/css-loaders>. Wrapping a Shiny output will automatically show a loader when the output is (re)calculating.
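For example, wrapping an output in withSpinner() shows a loader while it recalculates:

    library(shiny)
    library(shinycssloaders)
    ui <- fluidPage(
      actionButton("go", "Redraw"),
      withSpinner(plotOutput("p"))   # spinner appears while 'p' recalculates
    )
    server <- function(input, output) {
      output$p <- renderPlot({
        input$go         # take a dependency on the button
        Sys.sleep(1.5)   # simulate a slow computation
        plot(rnorm(100))
      })
    }
    shinyApp(ui, server)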
shinycustomloader Custom Loader for Shiny Outputs
Provides a custom CSS/HTML or GIF/image file for the loading screen in R ‘shiny’. It can also use a marquee for a custom text loading screen.
shinydashboard Create Dashboards with ‘Shiny’
Create dashboards with ‘Shiny’. This package provides a theme on top of ‘Shiny’, making it easy to create attractive dashboards.
shinydashboardPlus Add some ‘AdminLTE2’ Components to ‘Shinydashboard’
Extend ‘shinydashboard’ with ‘AdminLTE2’ components. Customize boxes, add timelines and a lot more.
shinyDND Shiny Drag-n-Drop
Add functionality to create drag and drop div elements in shiny.
shinyEffects Customize Your Web Apps with Fancy Effects
Add fancy CSS effects to your ‘shinydashboards’ or ‘shiny’ apps. 100% compatible with ‘shinydashboardPlus’ and ‘bs4Dash’.
shinyEventLogger Logging Events in Shiny Apps
Logging framework dedicated to complex shiny apps. Different types of events can be logged (value of a variable, multi-line output of a function, result of a unit test, custom error, warning, or diagnostic message). Each event can be logged with a list of parameters that are event-specific, common for events within the same scope, session-specific, or app-wide. Logging can be done simultaneously to the R console, the browser JavaScript console, a file log, and a database (MongoDB). Log data can be further analyzed with the help of process-mining techniques from the ‘bupaR’ package.
shinyFeedback Displays User Feedback Next to Shiny Inputs
Easily display user feedback next to Shiny inputs. The feedback message is displayed when the feedback condition evaluates to TRUE.
shinyHeatmaply Deploy ‘heatmaply’ using ‘shiny’
Access functionality of the ‘heatmaply’ package through ‘Shiny UI’.
shinyhelper Easily Add Markdown Help Files to ‘shiny’ Inputs and Outputs
Creates a lightweight way to add markdown helpfiles to ‘shiny’ apps, using modal dialog boxes, with no need to observe each help button separately.
shinyhttr Progress Bars for Downloads in ‘shiny’ Apps
Modifies the progress() function from the ‘httr’ package to let it send output to the progressBar() function from the ‘shinyWidgets’ package. It is just a tweak of the original functions from the ‘httr’ package to make things smooth for ‘shiny’ developers.
ShinyImage Image Manipulation, with an Emphasis on Journaling
Standard imaging operations, e.g. crop and contrast adjustment, but with ability to go back and forth through sequence of changes, with records being persistent. Optional Shiny interface. Useful to help with the research reproducibility problem, and as a teaching tool.
ShinyItemAnalysis Test and Item Analysis via Shiny
Interactive shiny application for analysis of educational tests and their items.
shinyjqui ‘jQuery UI’ Interactions and Effects for Shiny
An extension to shiny that brings interactions and animation effects from the ‘jQuery UI’ library.
shinyjs Perform Common JavaScript Operations in Shiny Apps using Plain R Code
Perform common JavaScript operations in Shiny applications without having to know any JavaScript. Many useful JavaScript functions are made available by shinyjs with a simple R interface so that you don’t have to write any JavaScript code. Even if you do know JavaScript, shinyjs provides convenience functions that avoid dealing with message passing and writing JavaScript code.
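A small sketch: toggling a UI element from R without writing JavaScript (useShinyjs() and toggle() are core shinyjs functions):

    library(shiny)
    library(shinyjs)
    ui <- fluidPage(
      useShinyjs(),   # must be called once in the UI
      actionButton("btn", "Show/hide the input"),
      textInput("text", "Some text")
    )
    server <- function(input, output) {
      observeEvent(input$btn, toggle("text"))   # hide or show by element id
    }
    shinyApp(ui, server)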
shinyKGode A Shiny User Interface of Time Warping for Improved Gradient Matching
Interactive shiny application to perform inference of non-linear differential equations via gradient matching. Three pre-defined models (Lotka-Volterra, FitzHugh-Nagumo, and Biopathway) are provided, and users can also load their own models (in the Systems Biology Markup Language format) into the application.
shinylogs Record Everything that Happens in a ‘Shiny’ Application
Track and record the use of applications and the user’s interactions with ‘Shiny’ inputs. Allows saving the inputs clicked, the outputs generated, and any errors.
shinyLP Bootstrap Landing Home Pages for Shiny Applications
Provides functions that wrap HTML Bootstrap components code to enable the design and layout of informative landing home pages for Shiny applications. This can lead to a better user experience for users and less HTML writing for the developer.
shinymaterial Implement Material Design in Shiny Applications
Allows shiny developers to incorporate UI elements based on Google’s Material design. See <https://…/> for more information.
shinyMatrix Shiny Matrix Input Field
Implements a custom matrix input field.
shinyrecap Shiny User Interface for Multiple Source Capture Recapture Models
Implements user interfaces for log-linear models, Bayesian model averaging and Bayesian Dirichlet process mixture models.
shinyShortcut Creates an Executable Shortcut for Shiny Applications
Provides the function shinyShortcut() that, when given the base directory of a shiny application, will produce an executable file that runs the shiny app directly in the user’s default browser. Tested on both Windows and Unix machines. Inspired by and borrowing from <http://…/>.
shinystan Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models
Most applied Bayesian data analysis requires employing a Markov chain Monte Carlo (MCMC) algorithm to obtain samples from the posterior distributions of the quantities of interest. Diagnosing convergence, checking the fit of the model, and producing graphical and numerical summaries of the parameters of interest is an essential but often laborious process that slows down the creative and exploratory process of model building. The shinyStan package and Shiny app are designed to facilitate this process in two primary ways:
• Providing interactive visual model exploration: shinyStan provides immediate, informative, customizable visual and numerical summaries of model parameters and convergence diagnostics for MCMC simulations. Although shinyStan has some special features only available for users of the Rstan package (the R interface to the Stan programming language for Bayesian statistical inference), it can also easily be used to explore the output from any other program (e.g. Jags, Bugs, SAS) or any user-written MCMC algorithm.
• Making saving and sharing more convenient: shinyStan allows you to store the basic components of an entire project (code, posterior samples, graphs, tables, notes) in a single object. Users can also export graphics into their R sessions as ggplot2 objects for further customization and easy integration in reports or post-processing for publication.
shinytest Test Shiny Apps
For automated testing of Shiny applications, using a headless browser, driven through ‘WebDriver’.
ShinyTester Functions to Minimize Bonehead Moves While Working with ‘shiny’
It’s my experience that working with ‘shiny’ is intuitive once you’re into it, but can be quite daunting at first. Several common mistakes are fairly predictable, and therefore we can control for these. The functions in this package help match up the assets listed in the UI and the SERVER files, and visualize the ad hoc structure of the ‘shiny’ app.
shinyTime A Time Input Widget for Shiny
Provides a time input widget for Shiny. This widget allows intuitive time input in the ‘[hh]:[mm]:[ss]’ (24H) format by using a separate numeric input for each part of the time. The interface with R uses ‘DateTimeClasses’ objects. See the project page for more information and examples.
shinytoastr Notifications from ‘Shiny’
Browser notifications in ‘Shiny’ apps, using ‘toastr’: <https://…/toastr#readme>.
shinyTree jsTree Bindings for Shiny
Exposes bindings to jsTree – a JavaScript library that supports interactive trees – to enable rich, editable trees in Shiny.
shinyWidgets Custom Inputs Widgets for Shiny
Custom inputs widgets to use in Shiny applications.
SHLR Shared Haplotype Length Regression
A statistical method designed to take advantage of population genetics and microevolutionary theory, specifically by testing the association between haplotype sharing length and a trait of interest.
shock Slope Heuristic for Block-Diagonal Covariance Selection in High Dimensional Gaussian Graphical Models
Block-diagonal covariance selection for high dimensional Gaussian graphical models. The selection procedure is based on the slope heuristics.
shodan R package to work with the Shodan API
An R package interface to the Shodan API.
The New and Improved R Shodan Package
ShortForm Automatic Short Form Creation
Performs automatic creation of short forms of scales with an ant colony optimization algorithm. As implemented in the package, the algorithm randomly selects items to build a model of a specified length, then updates the probability of item selection according to the fit of the best model within each set of searches. The algorithm continues until the same items are selected by multiple ants a given number of times in a row. See Leite, Huang, & Marcoulides (2008) <doi:10.1080/00273170802285743> for an applied example.
showimage Show an Image on an ‘R’ Graphics Device
Sometimes it is handy to be able to view an image file on an ‘R’ graphics device. This package just does that. Currently it supports ‘PNG’ files.
SHT Statistical Hypothesis Testing Toolbox
Provides a collection of statistical hypothesis testing procedures ranging from classical to modern methods for non-trivial settings such as the high-dimensional scenario. For the general treatment of statistical hypothesis testing, see the book by Lehmann and Romano (2005) <doi:10.1007/0-387-27605-X>.
shuffleCI Confidence Intervals Compared via Shuffling
Scripts and exercises that use card shuffling to teach confidence interval comparisons for different estimators.
SI Stochastic Integrating
An implementation of four stochastic integration methods in R: 1. Stochastic Point Method (or Monte Carlo Method); 2. Mean Value Method; 3. Importance Sampling Method; 4. Stratified Sampling Method. It can be used to estimate one-dimensional or multi-dimensional integrals by Monte Carlo methods, and the estimated variance (precision) is given. Reference: Caflisch, R. E. (1998) <doi:10.1017/S0962492900002804>.
sicegar Analysis of Single-Cell Viral Growth Curves
Classifies time course fluorescence data of viral growth. The package categorizes time course data into one of four categories, ‘ambiguous’, ‘no signal’, ‘sigmoidal’, and ‘double sigmoidal’, by fitting a series of mathematical models to the data. The package name comes from ‘SIngle CEll Growth Analysis in R’.
SID Structural Intervention Distance
The code computes the structural intervention distance (SID) between a true directed acyclic graph (DAG) and an estimated DAG. Definition and details about the implementation can be found in J. Peters and P. Bühlmann: “Structural intervention distance (SID) for evaluating causal graphs”, Neural Computation 27, pages 771-799, 2015.
SiER Signal Extraction Approach for Sparse Multivariate Response Regression
Methods for regression with high-dimensional predictors and univariate or multivariate response variables. It considers the decomposition of the coefficient matrix that leads to the best approximation to the signal part in the response given any rank, and estimates the decomposition by solving a penalized generalized eigenvalue problem followed by a least squares procedure. Ruiyan Luo and Xin Qi (2017) <doi:10.1016/j.jmva.2016.09.005>.
sievePH Sieve Analysis Methods for Proportional Hazards Models
Implements semiparametric estimation and testing procedures for a continuous, possibly multivariate, mark-specific hazard ratio (treatment/placebo) of an event of interest in a randomized treatment efficacy trial with a time-to-event endpoint, as described in Juraska M and Gilbert PB (2013), Mark-specific hazard ratio model with multivariate continuous marks: an application to vaccine efficacy. Biometrics 69(2):328-337, and in Juraska M and Gilbert PB (2015), Mark-specific hazard ratio model with missing multivariate marks. Lifetime Data Analysis 22(4):606-625. The former considers continuous multivariate marks fully observed in all subjects who experience the event of interest, whereas the latter extends the previous work to allow multivariate marks that are subject to missingness-at-random. For models with missing marks, two estimators are implemented based on (i) inverse probability weighting (IPW) of complete cases, and (ii) augmentation of the IPW estimating functions by leveraging correlations between the mark and auxiliary data to ‘impute’ the expected profile score vectors for subjects with missing marks. The augmented IPW estimator is doubly robust and recommended for use with incomplete mark data. The methods make two key assumptions: (i) the time-to-event is assumed to be conditionally independent of the mark given treatment, and (ii) the weight function in the semiparametric density ratio/biased sampling model is assumed to be exponential. Diagnostic testing procedures for evaluating validity of both assumptions are implemented. Summary and plotting functions are provided for estimation and inferential results.
sigmajs Interface to ‘Sigma.js’ Graph Visualization Library
Interface to ‘sigma.js’ graph visualization library including animations, plugins and shiny proxies.
sigmaNet Render Graphs Using ‘Sigma.js’
Create interactive graph visualizations using ‘Sigma.js’ <http://…/>. This package is meant to be used in conjunction with ‘igraph’, replacing the (somewhat underwhelming) plotting features of the package. The idea is to quickly render graphs, regardless of their size, in a way that allows for easy, iterative modification of aesthetics. Because ‘Sigma.js’ is a ‘javascript’ library, the visualizations are inherently interactive and are well suited for integration with ‘Shiny’ apps. While there are several ‘htmlwidgets’ focused on network visualization, they tend to underperform on medium to large sized graphs. ‘Sigma.js’ was designed for larger network visualizations and this package aims to make those strengths available to ‘R’ users.
sigmoid Sigmoid Functions for Machine Learning
Several different sigmoid functions are implemented, including a wrapper function, SoftMax preprocessing and inverse functions.
SignifReg Significant Variable Selection in Linear Regression
Provides a significant variable selection procedure with different directions (forward, backward, stepwise) based on diverse criteria (Mallows’ Cp, AIC, BIC, adjusted r-square, p-value). The algorithm selects a final model with only significant variables based on a chosen correction: False Discovery Rate, Bonferroni, or no correction.
sigora Signature Overrepresentation Analysis
Pathway Analysis is the process of statistically linking observations on the molecular level to biological processes or pathways on the systems (organism, organ, tissue, cell) level. Traditionally, pathway analysis methods regard pathways as collections of single genes and treat all genes in a pathway as equally informative. This can lead to identification of spurious (misleading) pathways as statistically significant, since components are often shared amongst pathways. SIGORA seeks to avoid this pitfall by focusing on genes or gene-pairs that are (as a combination) specific to a single pathway. In relying on such pathway gene-pair signatures (Pathway-GPS), SIGORA inherently uses the status of other genes in the experimental context to identify the most relevant pathways. The current version allows for pathway analysis of human and mouse data sets and contains pre-computed Pathway-GPS data for pathways in the KEGG and Reactome pathway repositories as well as mechanisms for extracting GPS for user supplied repositories.
sigr Format Significance Summaries for Reports
Succinctly format significance summaries of various models and tests. The main purpose is unified reporting and planning of experimental results, working around issues such as the difficulty of extracting model summary facts (as with ‘lm’/’glm’). This package also includes empirical tests, such as bootstrap estimates.
siland Spatial Influence of Landscape
Method to estimate the spatial influence scales of landscape variables on a response variable. The method is based on Chandler and Hepinstall-Cymerman (2016) Estimating the spatial scales of landscape effects on abundance, Landscape ecology, 31: 1383-1394, <doi:10.1007/s10980-016-0380-z>.
Sim.PLFN Simulation of Piecewise Linear Fuzzy Numbers
The definition of fuzzy random variables and methods for simulating from them are two challenging statistical problems of the past three decades. This package is organized around a special definition of fuzzy random variable and simulates fuzzy random variables as Piecewise Linear Fuzzy Numbers (PLFNs); see Coroianu et al. (2013) <doi:10.1016/j.fss.2013.02.005> for details about PLFNs. Some important statistical functions are provided for obtaining the membership function of the main statistics, such as mean, variance, summation, standard deviation, and coefficient of variation. Some applied advantages of the ‘Sim.PLFN’ package are: (1) easily generating/simulating a random sample of PLFNs, (2) drawing the membership functions of the simulated PLFNs or the membership function of a statistical result, and (3) using the simulated PLFNs for arithmetic operations or importing them into statistical computations. Finally, it must be mentioned that the ‘Sim.PLFN’ package works on the basis of the ‘FuzzyNumbers’ package.
simcausal Simulating Longitudinal Data with Causal Inference Applications
A flexible tool for simulating complex longitudinal data using structural equations, with emphasis on problems in causal inference. Specify interventions and simulate from intervened data generating distributions. Define and evaluate treatment-specific means, the average treatment effects and coefficients from working marginal structural models. User interface designed to facilitate the conduct of transparent and reproducible simulation studies, and allows concise expression of complex functional dependencies for a large number of time-varying nodes.
simcdm Simulate Cognitive Diagnostic Model (CDM) Data
Provides efficient R and ‘C++’ routines to simulate cognitive diagnostic model data for Deterministic Input, Noisy ‘And’ Gate (DINA) and reduced Reparameterized Unified Model (rRUM) from Culpepper and Hudson (2017) <doi: 10.1177/0146621617707511>, Culpepper (2015) <doi:10.3102/1076998615595403>, and de la Torre (2009) <doi:10.3102/1076998607309474>.
SimCop Simulate from Arbitrary Copulae
Provides a framework for generating random variates from arbitrary multivariate copulae, while concentrating on (bivariate) extreme value copulae. Particularly useful if the multivariate copulae are not available in closed form.
SimCorrMix Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions
Generate continuous (normal, non-normal, or mixture distributions), binary, ordinal, and count (regular or zero-inflated, Poisson or Negative Binomial) variables with a specified correlation matrix, or one continuous variable with a mixture distribution. This package can be used to simulate data sets that mimic real-world clinical or genetic data sets (i.e., plasmodes, as in Vaughan et al., 2009 <DOI:10.1016/j.csda.2008.02.032>). The methods extend those found in the ‘SimMultiCorrData’ R package. Standard normal variables with an imposed intermediate correlation matrix are transformed to generate the desired distributions. Continuous variables are simulated using either Fleishman (1978)’s third order <DOI:10.1007/BF02293811> or Headrick (2002)’s fifth order <DOI:10.1016/S0167-9473(02)00072-5> polynomial transformation method (the power method transformation, PMT). Non-mixture distributions require the user to specify mean, variance, skewness, standardized kurtosis, and standardized fifth and sixth cumulants. Mixture distributions require these inputs for the component distributions plus the mixing probabilities. Simulation occurs at the component level for continuous mixture distributions. The target correlation matrix is specified in terms of correlations with components of continuous mixture variables. These components are transformed into the desired mixture variables using random multinomial variables based on the mixing probabilities. However, the package provides functions to approximate expected correlations with continuous mixture variables given target correlations with the components. Binary and ordinal variables are simulated using a modification of ordsample() in package ‘GenOrd’. Count variables are simulated using the inverse CDF method. There are two simulation pathways which calculate intermediate correlations involving count variables differently. Correlation Method 1 adapts Yahav and Shmueli’s 2012 method <DOI:10.1002/asmb.901> and performs best with large count variable means and positive correlations or small means and negative correlations. Correlation Method 2 adapts Barbiero and Ferrari’s 2015 modification of the ‘GenOrd’ package <DOI:10.1002/asmb.2072> and performs best under the opposite scenarios. The optional error loop may be used to improve the accuracy of the final correlation matrix. The package also contains functions to calculate the standardized cumulants of continuous mixture distributions, check parameter inputs, calculate feasible correlation boundaries, and summarize and plot simulated variables.
SimDesign Structure for Organizing Monte Carlo Simulation Designs
Provides tools to help organize Monte Carlo simulations in R. The tools provided control the structure and back-end of the Monte Carlo simulations by utilizing a generate-analyse-summarise strategy. The functions control common simulation issues such as re-simulating non-convergent results, support parallel back-end computations, save and restore temporary files, aggregate results across independent nodes, and provide native support for debugging.
simdistr Assessment of Data Trial Distributions According to the Carlisle-Stouffer Method
Assessment of the distributions of baseline continuous and categorical variables in randomised trials. This method is based on the Carlisle-Stouffer method with Monte Carlo simulations. It calculates p-values for each trial baseline variable, as well as combined p-values for each trial – these p-values measure how compatible the distributions of trial baseline variables are with random sampling. This package also allows for graphically plotting the cumulative frequencies of the computed p-values. Please note that the code was partly adapted from Carlisle JB, Loadsman JA. (2017) <doi:10.1111/anae.13650>.
simEd Simulation Education
Contains various functions to be used for simulation education, including queueing simulation functions, variate generation functions capable of producing independent streams and antithetic variates, functions for illustrating random variate generation for various discrete and continuous distributions, and functions to compute time-persistent statistics. Also contains two queueing data sets (one fabricated, one real-world) to facilitate input modeling.
simest Constrained Single Index Model Estimation
Estimation of function and index vector in single index model with and without shape constraints including different smoothness conditions.
simglm Simulate Models Based on the Generalized Linear Model
Easily simulates regression models, including both simple regression and generalized linear mixed models with up to three levels of nesting. Flexible power simulations allowing the specification of missing data, unbalanced designs, and different random error distributions are built into the package.
SimHaz Simulated Survival and Hazard Analysis for Time-Dependent Exposure
Generate power for the Cox proportional hazards model by simulating survival events data with time dependent exposure status for subjects. A dichotomous exposure variable is considered with a single transition from unexposed to exposed status during the subject’s time on study.
SimilaR R Source Code Similarity Evaluation
An implementation of a novel method to determine the similarity of R functions based on program dependence graphs; see Bartoszuk, Gagolewski (2017) <doi:10.1109/FUZZ-IEEE.2017.8015582>. Possible use cases include plagiarism detection among students’ homework assignments.
SimilarityMeasures Trajectory Similarity Measures
Functions to run and assist four different similarity measures. The similarity measures included are: longest common subsequence (LCSS), Frechet distance, edit distance and dynamic time warping (DTW). Each of these similarity measures can be calculated from two n-dimensional trajectories, both in matrix form.
similr Text Similarity
Using brute-force string comparator algorithms, this package facilitates finding a particular string’s closest match amongst a target vector of strings.
SimInf A Framework for Stochastic Disease Spread Simulations
Livestock movements are important for the spread of many infectious diseases between herds. The package provides an efficient and flexible framework for stochastic disease spread modelling that integrates within-herd disease dynamics as continuous-time Markov chains and livestock movements between herds as scheduled events. The core simulation solver is implemented in C and uses ‘OpenMP’ (if available) to divide work over multiple processors. The package contains template models and can be extended with user defined models.
simIReff Stochastic Simulation for Information Retrieval Evaluation: Effectiveness Scores
Provides tools for the stochastic simulation of effectiveness scores to mitigate data-related limitations of Information Retrieval evaluation research, as described in Urbano and Nagler (2018) <doi:10.1145/3209978.3210043>. These tools include: fitting, selection and plotting distributions to model system effectiveness, transformation towards a prespecified expected value, proxy to fitting of copula models based on these distributions, and simulation of new evaluation data from these distributions and copula models.
simLife Simulation of Fatigue Lifetimes
Provides methods for simulation and analysis of a very general fatigue lifetime model for (metal matrix) composite materials.
simmer Discrete Event Simulation for R
simmer is a discrete event simulation package for the R language. It is developed with my own specific requirements for simulating day-to-day hospital processes and thus might not be suited for everyone. It is designed to be as simple to use as possible and tries to be compatible with the chaining/piping workflow introduced by the magrittr package.
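A sketch of that piping workflow (trajectory(), seize(), timeout(), release(), add_resource(), and add_generator() as in the current simmer API):

    library(simmer)
    patient <- trajectory("patient") %>%
      seize("nurse", 1) %>%
      timeout(function() rexp(1, 1 / 10)) %>%   # service time ~ Exp(1/10)
      release("nurse", 1)
    env <- simmer("clinic") %>%
      add_resource("nurse", capacity = 1) %>%
      add_generator("patient", patient, function() rexp(1, 1 / 15)) %>%
      run(until = 480)                          # simulate one 8-hour day
    head(get_mon_arrivals(env))                 # per-arrival monitoring data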
simmer.bricks Helper Methods for ‘simmer’ Trajectories
Provides wrappers for common activity patterns in ‘simmer’ trajectories.
simmer.plot Plotting Methods for ‘simmer’
A set of plotting methods for ‘simmer’ trajectories and simulations.
simml Single-Index Models with Multiple-Links
A major challenge in estimating treatment decision rules from a randomized clinical trial dataset with covariates measured at baseline lies in detecting relatively small treatment effect modification-related variability (i.e., the treatment-by-covariates interaction effects on treatment outcomes) against a relatively large non-treatment-related variability (i.e., the main effects of covariates on treatment outcomes). The class of Single-Index Models with Multiple-Links is a novel single-index model specifically designed to estimate a single-index (a linear combination) of the covariates associated with the treatment effect modification-related variability, while allowing a nonlinear association with the treatment outcomes via flexible link functions. The models provide a flexible regression approach to developing treatment decision rules based on patients’ data measured at baseline. We refer to Petkova, Tarpey, Su, and Ogden (2017) <doi: 10.1093/biostatistics/kxw035> and ‘A constrained single-index model for estimating interactions between a treatment and covariates’ (under review, 2019) for detail. The main function of this package is simml().
SimMultiCorrData Simulation of Correlated Data with Multiple Variable Types
Generate continuous (normal or non-normal), binary, ordinal, and count (Poisson or Negative Binomial) variables with a specified correlation matrix. It can also produce a single continuous variable. This package can be used to simulate data sets that mimic real-world situations (i.e. clinical data sets, plasmodes). All variables are generated from standard normal variables with an imposed intermediate correlation matrix. Continuous variables are simulated by specifying mean, variance, skewness, standardized kurtosis, and fifth and sixth standardized cumulants using either Fleishman’s Third-Order (<DOI:10.1007/BF02293811>) or Headrick’s Fifth-Order (<DOI:10.1016/S0167-9473(02)00072-5>) Polynomial Transformation. Binary and ordinal variables are simulated using a modification of GenOrd’s ordsample function. Count variables are simulated using the inverse cdf method. There are two simulation pathways which differ primarily according to the calculation of the intermediate correlation matrix. In Method 1, the intercorrelations involving count variables are determined using a simulation based, logarithmic correlation correction (adapting Yahav and Shmueli’s 2012 method, <DOI:10.1002/asmb.901>). In Method 2, the count variables are treated as ordinal (adapting Barbiero and Ferrari’s 2015 modification of GenOrd, <DOI:10.1002/asmb.2072>). There is an optional error loop that corrects the final correlation matrix to be within a user-specified precision value of the target matrix. The package also includes functions to calculate standardized cumulants for theoretical distributions or from real data sets, check if a target correlation matrix is within the possible correlation bounds (given the distributions of the simulated variables), summarize results (numerically or graphically), to verify valid power method pdfs, and to calculate lower standardized kurtosis bounds.
simode Statistical Inference for Systems of Ordinary Differential Equations using Separable Integral-Matching
Implements statistical inference for systems of ordinary differential equations, that uses the integral-matching criterion and takes advantage of the separability of parameters, in order to obtain initial parameter estimates for nonlinear least squares optimization. Dattner & Yaari (2018) <arXiv:1807.04202>. Dattner et al. (2017) <doi:10.1098/rsif.2016.0525>. Dattner & Klaassen (2015) <doi:10.1214/15-EJS1053>.
simone Statistical Inference for MOdular NEtworks (SIMoNe)
Implements the inference of co-expression networks based on partial correlation coefficients from either steady-state or time-course transcriptomic data. Note that with both types of data this package can deal with samples collected in different experimental conditions and therefore not identically distributed. In this particular case, multiple but related networks are inferred in one simone run.
simPATHy Simulation Model of Expression Data for Pathway Analysis
Simulate data from a Gaussian graphical model or a Gaussian Bayesian network in two conditions. Given a covariance matrix of a reference condition, it simulates plausible dysregulations.
simpleCache Simply Caching R Objects
Provides intuitive functions for caching R objects, encouraging reproducible, restartable, and distributed R analysis. The user selects a location to store caches, and then provides nothing more than a cache name and instructions (R code) for how to produce the R object. Also provides some advanced options like environment assignments, recreating or reloading caches, and cluster compute bindings (using the ‘batchtools’ package) making it flexible enough for use in large-scale data analysis projects.
simplegraph Simple Graph Data Types and Basic Algorithms
Simple classic graph algorithms for simple graph classes. Graphs may possess vertex and edge attributes. ‘simplegraph’ has no dependencies and is written entirely in R, so it is easy to install.
simpleRCache Simple R Cache
Simple result caching in R based on Henrik Bengtsson’s R.cache. The global environment is not considered when caching results, simplifying moving files between multiple instances of R. Relies on more base functions than R.cache (e.g. cached results are saved using saveRDS() and readRDS()).
simpleroptions Easily Manage Options Files for your Packages and Scripts
A framework to easily set up and maintain options files in your R packages and scripts.
simpleSetup Set Up R Source Code Files for Use on Multiple Machines
When working across multiple machines, and similarly for reproducible research, it can be time consuming to ensure that you have all of the needed packages installed and loaded and that the correct working directory is set. ‘simpleSetup’ provides simple functions for making these tasks more straightforward.
simplr Basic Symbolic Expression Simplification
Basic tools for symbolic expression simplification, e.g. simplify(x*1) => x, or simplify(sin(x)^2+cos(x)^2) => 1. Based on the ‘Expression v3’ (Ev3) 1.0 system by Leo Liberti.
simputation Simple Imputation
Easy to use interfaces to a number of imputation methods that fit in the not-a-pipe operator of the ‘magrittr’ package.
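For example, model-based imputation with the formula interface (impute_lm() is one of the exported methods):

    library(simputation)
    dat <- iris
    dat[1:5, "Sepal.Length"] <- NA   # introduce some missing values
    dat <- impute_lm(dat, Sepal.Length ~ Sepal.Width + Species)
    head(dat)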
simr Power Analysis for Generalised Linear Mixed Models by Simulation
Calculate power for generalised linear mixed models, using simulation. Designed to work with models fit using the ‘lme4’ package.
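A hedged sketch (powerSim() and fixed() as documented; the small nsim keeps the example fast, while real analyses use many more simulations):

    library(lme4)
    library(simr)
    fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
    # simulated power for the fixed effect of 'Days'
    powerSim(fm, test = fixed("Days"), nsim = 50)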
SimRepeat Simulation of Correlated Systems of Equations with Multiple Variable Types
Generate correlated systems of statistical equations which represent repeated measurements or clustered data. These systems contain either: a) continuous normal, non-normal, and mixture variables based on the techniques of Headrick and Beasley (2004) <DOI:10.1081/SAC-120028431> or b) continuous (normal, non-normal and mixture), ordinal, and count (regular or zero-inflated, Poisson and Negative Binomial) variables based on the hierarchical linear models (HLM) approach. Headrick and Beasley’s method for continuous variables calculates the beta (slope) coefficients based on the target correlations between independent variables and between outcomes and independent variables. The package provides functions to calculate the expected correlations between outcomes, between outcomes and error terms, and between outcomes and independent variables, extending Headrick and Beasley’s equations to include mixture variables. These theoretical values can be compared to the simulated correlations. The HLM approach requires specification of the beta coefficients, but permits group and subject-level independent variables, interactions among independent variables, and fixed and random effects, providing more flexibility in the system of equations. Both methods permit simulation of data sets that mimic real-world clinical or genetic data sets (i.e. plasmodes, as in Vaughan et al., 2009, <10.1016/j.csda.2008.02.032>). The techniques extend those found in the ‘SimMultiCorrData’ and ‘SimCorrMix’ packages. Standard normal variables with an imposed intermediate correlation matrix are transformed to generate the desired distributions. Continuous variables are simulated using either Fleishman’s third-order (<DOI:10.1007/BF02293811>) or Headrick’s fifth-order (<DOI:10.1016/S0167-9473(02)00072-5>) power method transformation (PMT). Simulation occurs at the component-level for continuous mixture distributions. These components are transformed into the desired mixture variables using random multinomial variables based on the mixing probabilities. The target correlation matrices are specified in terms of correlations with components of continuous mixture variables. Binary and ordinal variables are simulated by discretizing the normal variables at quantiles defined by the marginal distributions. Count variables are simulated using the inverse CDF method. There are two simulation pathways for the multi-variable type systems which differ by intermediate correlations involving count variables. Correlation Method 1 adapts Yahav and Shmueli’s 2012 method <DOI:10.1002/asmb.901> and performs best with large count variable means and positive correlations or small means and negative correlations. Correlation Method 2 adapts Barbiero and Ferrari’s 2015 modification of the ‘GenOrd’ package <DOI:10.1002/asmb.2072> and performs best under the opposite scenarios. There are three methods available for correcting non-positive definite correlation matrices. The optional error loop may be used to improve the accuracy of the final correlation matrices. The package also provides function to check parameter inputs and summarize the simulated systems of equations.
SimSCRPiecewise Simulates Univariate and Semi-Competing Risks Data Given Covariates and Piecewise Exponential Baseline Hazards
Contains two functions for simulating survival data from piecewise exponential hazards with a proportional hazards adjustment for covariates. The first function SimUNIVPiecewise simulates univariate survival data based on a piecewise exponential hazard, covariate matrix and true regression vector. The second function SimSCRPiecewise simulates semi-competing risks data based on three piecewise exponential hazards, three true regression vectors and three matrices of patient covariates (which can be different or the same). This simulates from the Semi-Markov model of Lee et al (2015) given patient covariates, regression parameters, patient frailties and baseline hazard functions.
simstandard Generate Standardized Data
Creates simulated data from structural equation models with standardized loadings.
simstudy Simulation of Study Data
Simulates data sets in order to explore modeling techniques or better understand data generating processes. The user specifies a set of relationships between covariates, and generates data based on these specifications. The final data sets can represent data from randomized controlled trials, repeated measures (longitudinal) designs, and cluster randomized trials. Missingness can be generated using various mechanisms (MCAR, MAR, NMAR).
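For example, a definition table can be built with defData() and data generated from it with genData() (a minimal sketch; the variable names and parameter values are invented for illustration):
    library(simstudy)
    # define two covariates; the second depends on the first
    def <- defData(varname = "age", dist = "normal", formula = 50, variance = 10)
    def <- defData(def, varname = "sbp", dist = "normal",
                   formula = "90 + 0.5 * age", variance = 25)
    dd <- genData(250, def)  # 250 simulated records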
simsurv Simulate Survival Data
Simulate survival times from standard parametric survival distributions (exponential, Weibull, Gompertz), 2-component mixture distributions, or a user-defined hazard or log hazard function. Baseline covariates can be included under a proportional hazards assumption. Time dependent effects (i.e. non-proportional hazards) can be included by interacting covariates with linear time or some transformation of time. The 2-component mixture distributions can allow for a variety of flexible baseline hazard functions. If the user wishes to provide a user-defined hazard or log hazard function then this is also possible, and the resulting cumulative hazard function does not need to have a closed-form solution. Note that this package is modelled on the ‘survsim’ package available in the ‘Stata’ software (see Crowther and Lambert (2012) <http://…/sjpdf.html?articlenum=st0275> or Crowther and Lambert (2013) <doi:10.1002/sim.5823>).
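A minimal sketch of simulating Weibull survival times with a binary treatment covariate under proportional hazards (all parameter values here are arbitrary):
    library(simsurv)
    covs <- data.frame(id = 1:200, trt = rbinom(200, 1, 0.5))
    dat <- simsurv(dist = "weibull", lambdas = 0.1, gammas = 1.5,
                   betas = c(trt = -0.5), x = covs, maxt = 5)
    head(dat)  # columns: id, eventtime, status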
simtimer Datetimes as Integers for Discrete-Event Simulations
Handles datetimes as integers for use inside Discrete-Event Simulations (DES). The conversion is made using the generic function as.numeric() of the base package. DES is described in Simulation Modeling and Analysis by Averill Law and David Kelton (1999) <doi:10.2307/2288169>.
SimTimeVar Simulate Longitudinal Dataset with Time-Varying Correlated Covariates
Flexibly simulates a dataset with time-varying covariates with user-specified exchangeable correlation structures across and within clusters. Covariates can be normal or binary and can be static within a cluster or time-varying. Time-varying normal variables can optionally have linear trajectories within each cluster. See ?make_one_dataset for the main wrapper function. See Montez-Rath et al. <arXiv:1709.10074> for methodological details.
simTool Conduct Simulation Studies with a Minimal Amount of Source Code
A tool for statistical simulations that have two components: one component generates the data and the other analyzes it. The main aims of the package are to reduce the administrative source code (mainly loops and management code for the results) and to be simple enough that the user can quickly learn how to work with it. Parallel computing is also supported. Finally, convenient functions are provided to summarize the simulation results.
simukde Simulation with Kernel Density Estimation
Generates random values from univariate and multivariate continuous distributions by using kernel density estimation based on a sample. See Duong (2007) <doi:10.18637/jss.v021.i07> and Christian P. Robert and George Casella (2010, ISBN:978-1-4419-1575-7) <doi:10.1007/978-1-4419-1576-4>.
simulator An Engine for Running Simulations
A framework for performing simulations such as those common in methodological statistics papers. The design principles of this package are described in greater depth in Bien, J. (2016) ‘The simulator: An Engine to Streamline Simulations,’ which is available at <http://…/simulator.pdf>.
simule A Constrained L1 Minimization Approach for Estimating Multiple Sparse Gaussian or Nonparanormal Graphical Models
SIMULE (Shared and Individual parts of MULtiple graphs Explicitly) is a generalized method for estimating multiple related graphs with shared and individual patterns among the graphs. For more details, please see <arXiv:1605.03468>.
sinaplot An Enhanced Chart for Simple and Truthful Representation of Single Observations over Multiple Classes
The sinaplot is a data visualization chart suitable for plotting any single variable in a multiclass dataset. It is an enhanced jitter strip chart, where the width of the jitter is controlled by the density distribution of the data within each class.
sindyr Sparse Identification of Nonlinear Dynamics
Implements the sparse identification of nonlinear dynamics (SINDy) algorithm of Brunton et al. (2016; PNAS <doi:10.1073/pnas.1517384113>) for finding the ordinary differential equations of a measured system from raw data. The package includes a set of additional tools for working with raw data, with an emphasis on cognitive science applications (Dale and Bhat, in press <doi:10.1016/j.cogsys.2018.06.020>).
sinew Create ‘roxygen2’ Skeleton with Information from Function Script
Create a ‘roxygen2’ skeleton populated with information scraped from within the function script. Also creates Imports field entries for the ‘DESCRIPTION’ file and import directives for the ‘NAMESPACE’ file. Can be run from the R console or through the ‘RStudio’ ‘addin’ menu.
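For example, a skeleton can be produced for any function object (a minimal sketch; my_fun is a made-up example function):
    library(sinew)
    my_fun <- function(x, y = 1) x + y
    makeOxygen(my_fun)  # prints a roxygen2 documentation skeleton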
sinib Sum of Independent Non-Identical Binomial Random Variables
Density, distribution function, quantile function and random generation for the sum of independent non-identical binomial random variables with parameters ‘size’ and ‘prob’.
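Assuming the dpqr naming convention implied by the description (the function names here are an assumption, not verified against the package), usage might look like this sketch:
    library(sinib)
    # S = X1 + X2 with X1 ~ Binomial(4, 0.3) and X2 ~ Binomial(6, 0.6)
    dsinib(3L, size = c(4L, 6L), prob = c(0.3, 0.6))   # P(S = 3)
    rsinib(10L, size = c(4L, 6L), prob = c(0.3, 0.6))  # random draws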
SinIW The SinIW Distribution
Density, distribution function, quantile function, random generation and survival function for the Sine Inverse Weibull Distribution as defined by SOUZA, L. New Trigonometric Class of Probabilistic Distributions. 219 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2015 (available at <http://…obabilistic-distributions-602633.html> ) and BRITO, C. C. R. Method Distributions generator and Probability Distributions Classes. 241 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2014 (available upon request).
SIRE Finding Feedback Effects in SEM and Testing for Their Significance
Provides two main functionalities. 1 – Given a system of simultaneous equations, it decomposes the matrix of coefficients weighting the endogenous variables into three submatrices: one contains the coefficients that have a causal nature in the model, and the other two contain the coefficients that have an interdependent nature, either at the systematic level or induced by the correlation between error terms. 2 – Given a decomposed model, it tests for the significance of the interdependent relationships acting in the system via maximum likelihood and a Wald test, which can be built starting from the function output. For theoretical reference see Faliva (1992) <doi:10.1007/BF02589085> and Faliva and Zoia (1994) <doi:10.1007/BF02589041>.
sisal Sequential Input Selection Algorithm
Implements the SISAL algorithm by Tikka and Hollmén. It is a sequential backward selection algorithm which uses a linear model in a cross-validation setting. Starting from the full model, one variable at a time is removed based on the regression coefficients. From this set of models, a parsimonious (sparse) model is found by choosing the model with the smallest number of variables among those models where the validation error is smaller than a threshold. Also implements extensions which explore larger parts of the search space and/or use ridge regression instead of ordinary least squares.
SISIR Sparse Interval Sliced Inverse Regression
An interval fusion procedure for functional data in the semiparametric framework of SIR. Standard ridge and sparse SIR are also included in the package.
sitmo Parallel Pseudo Random Number Generator (PPRNG) ‘sitmo’ Header Files
Provides a high-quality, fast PPRNG that can be used in an ‘OpenMP’ parallel environment compiled under either C++98 or C++11. The objective of this package release is to consolidate the distribution of the ‘sitmo’ library on CRAN by enabling others to link to the ‘sitmo’ header file instead of including a copy of ‘sitmo’ within their individual package. Lastly, the package contains example implementations using ‘sitmo’ and two accompanying vignettes that provide additional information.
sitree Single Tree Simulator
Forecasts forest plots at the individual-tree level.
sitreeE Sitree Extensions
Provides extensions for package ‘sitree’ for allometric variables, growth, mortality, recruitment, management, tree removal and external modifiers functions.
sivipm Sensitivity Indices with Dependent Inputs
Computes sensitivity indices for models with dependent (correlated) inputs, using a method based on PLS regression.
SixSigma Six Sigma Tools for Quality Control and Improvement
Functions and utilities to perform Statistical Analyses in the Six Sigma way. Through the DMAIC cycle (Define, Measure, Analyze, Improve, Control), you can manage several Quality Management studies: Gage R&R, Capability Analysis, Control Charts, Loss Function Analysis, etc. Data frames used in the books ‘Six Sigma with R’ (Springer, 2012) and ‘Quality Control with R’ (Springer, 2015) are also included in the package.
sjlabelled Labelled Data Utility Functions
Collection of functions to read and write data between R and other statistical software packages like ‘SPSS’, ‘SAS’ or ‘Stata’, and to work with labelled data. This includes easy ways to get, set or change value and variable label attributes, to convert labelled vectors into factors or numeric (and vice versa), or to deal with multiple declared missing values.
sjmisc Miscellaneous Data Management Tools
Collection of several utility functions for reading or writing data, recoding and labelling variables and some frequently used statistical tests.
https://…age-for-working-with-labelled-data-rstats
sjPlot Data Visualization for Statistics in Social Science
Collection of several plotting and table output functions for visualizing data, and utility functions.
sjstats Collection of Convenient Functions for Common Statistical Computations
Collection of convenient functions for common statistical computations, which are not directly provided by R’s base or stats packages. This package aims at providing, first, shortcuts for statistical measures, which otherwise could only be calculated with additional effort (like standard errors or root mean squared errors). Second, these shortcut functions are generic (if appropriate), and can be applied not only to vectors, but also to other objects (e.g., the coefficient of variation can be computed for vectors, linear models, or linear mixed models; the r2() function returns the r-squared value for ‘lm’, ‘glm’, ‘merMod’ or ‘lme’ objects). The focus of most functions lies on summary statistics or fit measures for regression models, including generalized linear models and mixed effects models. However, some of the functions also deal with other statistical measures, like Cronbach’s Alpha, Cramer’s V, Phi etc.
SK Segment-Based Ordinary Kriging and Segment-Based Regression Kriging for Spatial Prediction
Segment-based Kriging methods, including segment-based ordinary Kriging (SOK) and segment-based regression Kriging (SRK), for spatial prediction of line segment spatial data as described in Yongze Song (2018) <doi:10.1109/TITS.2018.2805817>. Includes functions for spatial prediction and spatial visualisation. Descriptions of the methods and case datasets can be found in the cited reference.
skeletor An R Package Skeleton Generator
A tool for bootstrapping new packages with useful defaults, including a test suite outline that passes checks and helpers for running tests, checking test coverage, building vignettes, and more. Package skeletons it creates are set up for pushing your package to ‘GitHub’ and using other hosted services for building and test automation.
skellam Densities and Sampling for the Skellam Distribution
Functions for the Skellam distribution, including: density (pmf), cdf, quantiles and random variates.
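A minimal sketch of the dpqr-style interface (the rate parameters are arbitrary):
    library(skellam)
    dskellam(0, lambda1 = 2, lambda2 = 3)   # P(X = 0) for X = N1 - N2
    pskellam(1, lambda1 = 2, lambda2 = 3)   # P(X <= 1)
    rskellam(10, lambda1 = 2, lambda2 = 3)  # random variates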
skimr Compact and Flexible Summaries of Data
A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the ‘Using skimr’ vignette and the README.
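Typical usage is a single call on a data frame (a minimal sketch):
    library(skimr)
    skim(iris)                # compact summary of every column, grouped by type
    skim(iris, Sepal.Length)  # restrict the summary to selected columns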
sklarsomega Measuring Agreement Using Sklar’s Omega Coefficient
Provides tools for applying Sklar’s Omega (Hughes, 2018) <arXiv:1706.04651> methodology to nominal, ordinal, interval, or ratio scores. The framework can accommodate any number of units, any number of coders, and missingness; and can be used to measure agreement with a gold standard, intra-coder agreement, and/or inter-coder agreement.
skm Selective k-Means
Algorithms for solving the selective k-means problem, which is defined as finding k rows in an m x n matrix such that the sum of each column’s minimum is minimized. In the scenario where m == n and each cell value in the matrix is a valid distance metric, this is equivalent to a k-means problem. Selective k-means extends the k-means problem in that m != n is possible, often with m < n, which implies that the search is limited to a small subset of rows. It also extends the k-means problem in that the instances in the row set need not appear in the column set, e.g., select 2 of 3 internet service providers (rows) for 5 houses (columns) such that the overall cost (the sum of the column minima over the 2 selected providers) is minimized.
skpr Design of Experiments Suite: Generate and Evaluate Optimal Designs
Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of split/split-split/…/N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and enhance reproducibility.
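A minimal sketch of generating and evaluating a design (factor names and settings are invented for illustration):
    library(skpr)
    cand <- expand.grid(temp = c(-1, 0, 1), speed = c(-1, 1))
    design <- gen_design(candidateset = cand, model = ~ temp + speed, trials = 8)
    eval_design(design, model = ~ temp + speed, alpha = 0.05)  # parametric power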
skynet Generates Networks from BTS Data
A flexible tool that allows generating bespoke air transport statistics for urban studies based on publicly available data from the Bureau of Transport Statistics (BTS) in the United States <https://…/databases.asp?Mode_ID=1&Mode_Desc=Aviation&Subject_ID2=0>.
slackr Send Webhook API Messages to Slack.com Channels/Users from R
Contains functions that make it possible to interact with the Slack messaging platform. When you need to share information or data from R, rather than resorting to copy/paste in e-mails or other services like Skype, you can use this package to send well-formatted output from multiple R objects and expressions to all teammates at the same time with little effort. You can also send images from the current graphics device, R objects (as RData), and upload files.
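A minimal sketch (the channel and token are placeholders, and argument names may differ across package versions):
    library(slackr)
    slackr_setup(channel = "#analysis", api_token = "xoxb-your-token")
    slackr("Model run finished")           # post the output of an expression
    slackr(summary(lm(mpg ~ wt, mtcars)))  # well-formatted R output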
sld Estimation and Use of the Quantile-Based Skew Logistic Distribution
The skew logistic distribution is a quantile-defined generalisation of the logistic distribution (van Staden and King 2015). Provides random numbers, quantiles, probabilities, densities and density quantiles for the distribution, as well as Quantile-Quantile plots and method-of-L-moments estimation (including asymptotic standard errors).
SLDAssay Serial Limiting Dilution Assay Statistics
Calculates the maximum likelihood estimate, exact and asymptotic confidence intervals, and the goodness-of-fit p-value (PGOF) for infectious units per million (IUPM) from serial limiting dilution assays. This package uses the likelihood equation, PGOF, and confidence intervals described in Meyers et al. (1994).
sleekts 4253H, Twice Smoothing
Computes the resistant time series smoothing method 4253H, twice.
sleepwalk Interactively Explore Dimension-Reduced Embeddings
A tool to interactively explore the embeddings created by dimension reduction methods such as Principal Components Analysis (PCA), Multidimensional Scaling (MDS), T-distributed Stochastic Neighbour Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), or any other method.
slickR Create Interactive Carousels with the JavaScript ‘Slick’ Library
Create and customize interactive carousels using the ‘Slick’ JavaScript library and the ‘htmlwidgets’ package. The carousels can contain plots produced in R, images, ‘iframes’, videos and other ‘htmlwidgets’. These carousels can be used directly from the R console, from ‘RStudio’, in Shiny apps and R Markdown documents.
SLIDE Single Cell Linkage by Distance Estimation (SLIDE)
This statistical method uses the nearest neighbor algorithm to estimate absolute distances between single cells based on a chosen constellation of surface proteins, with these distances being a measure of the similarity between the two cells being compared. Based on Sen, N., Mukherjee, G., and Arvin, A.M. (2015) <DOI:10.1016/j.ymeth.2015.07.008>.
slideview Compare Raster Images Side by Side with a Slider
Create a side-by-side view of raster(image)s with an interactive slider to switch between regions of the images. This can be especially useful for image comparison of the same region at different time stamps.
slim Singular Linear Models for Longitudinal Data
Fits singular linear models to longitudinal data. Singular linear models are useful when the number, or timing, of longitudinal observations may be informative about the observations themselves. They are described in Farewell (2010) <doi:10.1093/biomet/asp068>, and are extensions of the linear increments model of Diggle et al. (2007) <doi:10.1111/j.1467-9876.2007.00590.x> to general longitudinal data.
slimrec Sparse Linear Method to Predict Ratings and Top-N Recommendations
The Sparse Linear Method (SLIM) predicts ratings and top-n recommendations suited for sparse implicit positive feedback systems. SLIM is decomposed into multiple elastic-net optimization problems which are solved in parallel over multiple cores. The package is based on ‘SLIM: Sparse Linear Methods for Top-N Recommender Systems’ by Xia Ning and George Karypis <doi:10.1109/ICDM.2011.134>.
slippymath Slippy Map Tile Tools
Provides functions for performing common tasks when working with slippy map tile service APIs e.g. Google maps, Open Street Map, Mapbox, Stamen, among others. Functionality includes converting from latitude and longitude to tile numbers, determining tile bounding boxes, and compositing tiles to a georeferenced raster image.
slouch Stochastic Linear Ornstein-Uhlenbeck Comparative Hypotheses
An implementation of a phylogenetic comparative method. It can fit univariate among-species Ornstein-Uhlenbeck models of phenotypic trait evolution, where the trait evolves towards a primary optimum. The optimum can be modelled as a single parameter, as multiple discrete regimes on the phylogenetic tree, and/or with continuous covariates.
slowraker A Slow Version of the Rapid Automatic Keyword Extraction (RAKE) Algorithm
A mostly pure-R implementation of the RAKE algorithm (Rose, S., Engel, D., Cramer, N. and Cowley, W. (2010) <doi:10.1002/9780470689646.ch1>), which can be used to extract keywords from documents without any training data.
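A minimal sketch:
    library(slowraker)
    docs <- c("Compatibility of systems of linear constraints",
              "Criteria of compatibility of a system of linear equations")
    res <- slowrake(txt = docs)  # one keyword data frame per document
    res[[1]]                     # keyword, frequency and RAKE score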
smacpod Statistical Methods for the Analysis of Case-Control Point Data
Various statistical methods for analyzing case-control point data. The methods available closely follow those in chapter 6 of Applied Spatial Statistics for Public Health Data by Waller and Gotway (2004).
smallarea Fits a Fay Herriot Model
Inference techniques for the Fay-Herriot model.
SmallCountRounding Small Count Rounding of Tabular Data
A statistical disclosure control tool to protect frequency tables in cases where small values are sensitive. The function RoundViaDummy() performs small count rounding of necessary inner cells so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. This is equivalent to changing micro data since frequencies of unique combinations are changed. Thus, additivity and consistency are guaranteed. The methodology is described in Langsrud and Heldal (2018) <https://…/327768398>.
smartdata Data Preprocessing
Eases data preprocessing tasks, providing a data flow based on a pipe operator which eases cleansing, transformation, oversampling, or instance/feature selection operations.
SmartEDA Summarize and Explore the Data
Exploratory analysis of any input data, describing the structure and the relationships present in the data. The package automatically selects variables and performs the related descriptive statistics, analyzing information value, weight of evidence, custom tables, etc.
SmartSifter Online Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms
Addresses the problem of outlier detection from the viewpoint of statistical learning theory, using the method proposed by Yamanishi, K., Takeuchi, J., Williams, G. et al. (2004) <DOI:10.1023/B:DAMI.0000023676.72185.7c>. It learns a probabilistic model (a finite mixture model) through an on-line unsupervised process. After each datum is input, a score is given, with a high score indicating a high possibility of being a statistical outlier.
smartsizer Power Analysis for a SMART Design
A set of tools for determining the necessary sample size in order to identify the optimal dynamic treatment regime in a sequential, multiple assignment, randomized trial (SMART). Utilizes multiple comparisons with the best methodology to adjust for multiple comparisons. Designed for an arbitrary SMART design. Please see Artman (2018) <arXiv:1804.04587> for more details.
SmartSVA Implementation of Smart SVA
Introduces an improved Surrogate Variable Analysis algorithm that automatically captures salient features from data in the presence of confounding factors. The algorithm is based on the popular ‘SVA’ package and proposes a revision of it that achieves a 10 times faster running time with no loss of accuracy.
smbinning Optimal Binning for Scoring Modeling
At its core, smbinning categorizes a numeric variable into buckets for later use in scoring modeling. Its purpose is to automate, as much as possible, the time-consuming process of generating predictive characteristics, and also to document the SQL code, tables and plots used during the development stage.
SmCCNet Sparse Multiple Canonical Correlation Network Analysis Tool
A canonical correlation based framework for constructing phenotype-specific multi-omics networks by integrating multiple omics data types and a quantitative phenotype of interest.
smcfcs Multiple Imputation of Covariates by Substantive Model Compatible Fully Conditional Specification
Implements multiple imputation of missing covariates by Substantive Model Compatible Fully Conditional Specification. This is a modification of the popular FCS/chained equations multiple imputation approach, and allows imputation of missing covariate values from models which are compatible with the user specified substantive model.
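A minimal sketch, assuming a data frame dat with outcome y, a partially observed covariate x and a fully observed covariate z; the method vector needs one entry per column:
    library(smcfcs)
    imps <- smcfcs(originaldata = dat, smtype = "lm",
                   smformula = "y ~ x + z",
                   method = c("", "norm", ""))  # impute x by linear regression
    imps$impDatasets[[1]]                       # first imputed data set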
smds Symbolic Multidimensional Scaling
Symbolic multidimensional scaling for interval-valued dissimilarities. The hypersphere model and the hyperbox model are available.
smerc Statistical Methods for Regional Counts
Provides statistical methods for the analysis of data aggregated spatially over regions (areal data).
SMFilter Filtering Algorithms for the State Space Model on the Stiefel Manifold
Provides the filtering algorithms for the state space model on the Stiefel manifold.
smicd Statistical Methods for Interval Censored Data
Functions that provide statistical methods for interval censored (grouped) data. The package supports the estimation of linear and linear mixed regression models with interval censored dependent variables. Parameter estimates are obtained by a stochastic expectation maximization algorithm. Furthermore, the package enables the direct (without covariates) estimation of statistical indicators from interval censored data via an iterative kernel density algorithm. Survey and Organisation for Economic Co-operation and Development (OECD) weights can be included into the direct estimation (see, Groß, M., U. Rendtel, T. Schmid, S. Schmon, and N. Tzavidis (2017) <doi:10.1111/rssa.12179>).
Smisc Sego Miscellaneous
A collection of functions for statistical computing and data manipulation in R. Includes routines for data ingestion, operating on dataframes and matrices, conversion to and from lists, converting factors, filename manipulation, programming utilities, parallelization, plotting, statistical and mathematical operations, and time series.
SMLoutliers Outlier Detection Using Statistical and Machine Learning Methods
Implements the Local Correlation Integral (LOCI) method for outlier identification, based on Breunig et al. (2000) <doi:10.1145/342009.335388>.
SMM Simulation and Estimation of Multi-State Discrete-Time Semi-Markov and Markov Models
Performs parametric and non-parametric estimation and simulation for multi-state discrete-time semi-Markov processes. For the parametric estimation, several discrete distributions are considered for the sojourn times: Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial. The non-parametric estimation concerns the sojourn time distributions, where no assumptions are made about the shape of the distributions. Moreover, the estimation can be done on the basis of one or several sample paths, with or without censoring at the beginning and/or at the end of the sample paths. The implemented methods are described in Barbu, V.S., Limnios, N. (2008) <doi:10.1007/978-0-387-73173-5>, Barbu, V.S., Limnios, N. (2008) <doi:10.1080/10485250701261913> and Trevezas, S., Limnios, N. (2011) <doi:10.1080/10485252.2011.555543>. Estimation and simulation of discrete-time k-th order Markov chains are also considered.
smnet Smoothing For Stream Network Data
Fits flexible additive models to data on stream networks, taking account of flow-connectivity of the network. Models are fit using penalised least squares.
smoof Single- and Multi-Objective Optimization Functions
Offers an interface for objective functions in the context of (multi-objective) global optimization. It conveniently builds upon S3 objects, i.e., an objective function is an S3 object composed of a descriptive name, the function itself, a parameter set, box constraints or other constraints, the number of objectives and so on. Moreover, the package contains generators for a wealth of both single- and multi-objective optimization test functions which are frequently used for benchmarking optimization algorithms. The bi-objective ZDT function family by Zitzler, Deb and Thiele is included, as well as popular single-objective test functions like De Jong’s function, the Himmelblau function and the Schwefel function. Moreover, the package offers an R interface to the C implementation of the Black-Box Optimization Benchmarking (BBOB) set of noiseless test functions.
smooth Forecasting Using Smoothing Functions
A set of smoothing functions used for time series analysis and forecasting. Currently the package includes exponential smoothing models and SARIMA in state-space form, plus several simulation functions.
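A minimal sketch using the package's exponential smoothing function es() (the horizon and holdout settings are illustrative):
    library(smooth)
    fit <- es(AirPassengers, h = 12, holdout = TRUE)  # automatic model selection
    summary(fit)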
smoothAPC Smoothing of Two-Dimensional Demographic Data, Optionally Taking into Account Period and Cohort Effects
The implemented method uses bivariate thin-plate splines and bivariate lasso-type regularization for smoothing, and allows for both period and cohort effects. The mortality rates are thus modelled as the sum of four components: a smooth bivariate function of age and time, smooth one-dimensional cohort effects, smooth one-dimensional period effects and random errors.
smoothr Spatial Feature Smoothing
Smooth spatial features (i.e. lines and polygons) to remove sharp corners and make curves appear more natural or aesthetically pleasing. Two smoothing methods are available: Chaikin’s corner cutting algorithm (Chaikin 1974 <doi:10.1016/0146-664X(74)90028-8>) and spline interpolation.
smoothROCtime Smooth Time-Dependent ROC Curve Estimation
Computes smooth estimations for the Cumulative/Dynamic and Incident/Dynamic ROC curves, in presence of right censorship, based on the bivariate kernel density estimation of the joint distribution function of the Marker and Time-to-event variables.
SmoothWin Soft Windowing on Linear Regression
The main function in the package utilizes a windowing function in the form of an exponential weighting function. The bandwidth and sharpness of the window are controlled by two parameters. Then, a penalized change point detection is used to identify the right shape of the window (see Charles Kervrann (2004) <doi:10.1007/978-3-540-24672-5_11>).
smotefamily A Collection of Oversampling Techniques for Class Imbalance Problem Based on SMOTE
A collection of various oversampling techniques developed from SMOTE. SMOTE is an oversampling technique which synthesizes a new minority instance between a pair consisting of one minority instance and one of its K nearest neighbors (see <https://…/live-953-2037-jair.pdf> for more information). Other techniques adopt this concept with other criteria in order to generate balanced datasets for the class imbalance problem.
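A minimal sketch on a made-up imbalanced data set:
    library(smotefamily)
    set.seed(1)
    X <- data.frame(x1 = c(rnorm(100), rnorm(10, 3)),
                    x2 = c(rnorm(100), rnorm(10, 3)))
    y <- c(rep("majority", 100), rep("minority", 10))
    out <- SMOTE(X, y, K = 5)  # synthesize new minority instances
    table(out$data$class)      # class balance after oversampling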
smovie Some Movies to Illustrate Concepts in Statistics
Provides movies to help students to understand statistical concepts. The ‘rpanel’ package <https://…/package=rpanel> is used to create interactive plots that move to illustrate key statistical ideas and methods. There are movies to: visualise probability distributions (including user-supplied ones); illustrate sampling distributions of the sample mean (central limit theorem), the sample maximum (extremal types theorem) and (the Fisher transformation of the) Pearson product moment correlation coefficient; examine the influence of an individual observation in simple linear regression; illustrate key concepts in statistical hypothesis testing. Also provided are dpqr functions for the distribution of the Fisher transformation of the correlation coefficient under sampling from a bivariate normal distribution.
smpic Creates Images Sized for Social Media
Creates images that are the proper size for social media. Beautiful plots, charts and graphs wither and die if they are not shared. Social media is perfect for this but every platform has its own image dimensions. With ‘smpic’ you can easily save your plots with the exact dimensions needed for the different platforms.
smurf Sparse Multi-Type Regularized Feature Modeling
Implementation of the SMuRF algorithm of Devriendt et al. (2018) <arXiv:1810.03136> to fit generalized linear models (GLMs) with multiple types of predictors via regularized maximum likelihood.
snakecase Convert Strings into any Case
A consistent, flexible and easy to use tool to parse and convert strings into cases like snake or camel among others.
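For example:
    library(snakecase)
    to_snake_case("BigDataServices")          # "big_data_services"
    to_upper_camel_case("big data services")  # "BigDataServices"
    to_any_case("someMixed_Input", case = "screaming_snake")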
SnakeCharmR R and Python Integration
Run ‘Python’ code, make function calls, assign and retrieve variables, etc. from R. A fork from ‘rPython’ which uses ‘jsonlite’, ‘Rcpp’ and has several fixes and improvements.
snappier Compress and Decompress ‘Snappy’ Encoded Data
Compression and decompression with ‘Snappy’.
snem EM Algorithm for Multivariate Skew-Normal Distribution with Overparametrization
Efficient estimation of multivariate skew-normal distribution in closed form.
snfa Smooth Non-Parametric Frontier Analysis
Fitting of non-parametric production frontiers for use in efficiency analysis. Methods are provided for both a smooth analogue of Data Envelopment Analysis (DEA) and a non-parametric analogue of Stochastic Frontier Analysis (SFA). Frontiers are constructed for multiple inputs and a single output using constrained kernel smoothing as in Racine et al. (2009), which allows for the imposition of monotonicity and concavity constraints on the estimated frontier.
snht Standard Normal Homogeneity Test
Robust and non-robust SNHT tests for changepoint detection.
snn Stabilized Nearest Neighbor Classifier
Implements the K-nearest neighbor, weighted nearest neighbor, bagged nearest neighbor, optimal weighted nearest neighbor and stabilized nearest neighbor classifiers, and performs model selection for them via 5-fold cross-validation. This package also provides functions for computing the classification error and classification instability of a classification procedure.
snowboot Network Analysis with Non-Parametric Methods that Emerge from Snowball and Bootstrap Sampling
Functions for analysis of network objects, which are imported or simulated by the package. The non-parametric methods of analysis center around snowball and bootstrap sampling.
sNPLS NPLS Regression with L1 Penalization
Tools for performing variable selection in three-way data using N-PLS in combination with L1 penalization.
SNscan Scan Statistics in Social Networks
Scan statistics applied to social network data can be used to test for cluster characteristics within a social network.
SOAR Memory management in R by delayed assignments
Allows objects to be stored on disc and automatically recalled into memory, as required, by delayed assignment.
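A minimal sketch:
    library(SOAR)
    big <- rnorm(1e6)
    Store(big)  # moved to a disk cache; reloaded lazily via delayed assignment
    Objects()   # list objects currently held in the cache
    mean(big)   # using the object transparently brings it back into memory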
sobolnp Nonparametric Sobol Estimator with Bootstrap Bandwidth
Algorithm to estimate the Sobol indices using a non-parametric fit of the regression curve. The bandwidth is estimated using the bootstrap to reduce the finite-sample bias. The package is based on the paper Solís, M. (2018) <arXiv:1803.03333>.
soc.ca Specific Correspondence Analysis for the Social Sciences
Specific and class-specific multiple correspondence analysis on survey-like data. soc.ca is optimized for the needs of the social scientist and presents easily interpretable results in near publication-ready quality.
social Social Autocorrelation
A set of functions to quantify and visualise social autocorrelation.
SocialMediaLab Tools for Collecting Social Media Data and Generating Networks for Analysis
A suite of tools for collecting and constructing networks from social media data. Provides easy-to-use functions for collecting data across popular platforms (Instagram, Facebook, Twitter, and YouTube) and generating different types of networks for analysis.
socialmixr Social Mixing Matrices for Infectious Disease Modelling
Provides methods for sampling contact matrices from diary data for use in infectious disease modelling, as discussed in Mossong et al. (2008) <doi:10.1371/journal.pmed.0050074>.
socviz Utility Functions and Data Sets for Data Visualization
Supporting materials for a course and book on data visualization. It contains utility functions for graphs and several sample data sets. See Healy (2019) <ISBN 978-0691181622>.
sodavis SODA: Main and Interaction Effects Selection for Discriminant Analysis and Logistic Regression
Variable and interaction selection are essential to classification in high-dimensional settings. This package provides an implementation of the SODA procedure, a forward-backward algorithm that selects both main and interaction effects under quadratic discriminant analysis and logistic regression models.
sodium A Modern and Easy-to-Use Crypto Library
Bindings to libsodium: a modern, easy-to-use software library for encryption, decryption, signatures, password hashing and more. Sodium uses curve25519, a state-of-the-art Diffie-Hellman function by Daniel Bernstein, which has become very popular after it was discovered that the NSA had backdoored Dual EC DRBG.
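A minimal sketch of symmetric encryption (here the random nonce is generated by data_encrypt() and attached to the ciphertext):
    library(sodium)
    key <- keygen()                 # random symmetric key
    msg <- serialize(mtcars, NULL)  # any R object as raw bytes
    cipher <- data_encrypt(msg, key)
    identical(unserialize(data_decrypt(cipher, key)), mtcars)  # TRUE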
softermax Read Exported Data from ‘SoftMax Pro’
Read microtiter plate data and templates exported from Molecular Devices ‘SoftMax Pro’ software <https://…/softmax-pro-7-software>. Data exported by ‘SoftMax Pro’ version 5.4 and greater are supported.
softmaxreg Training Multi-Layer Neural Network for Softmax Regression and Classification
Implementation of ‘softmax’ regression and classification models with multi-layer neural networks. It can be used for many tasks like word-embedding-based document classification, ‘MNIST’ handwritten digit recognition and so on. Multiple optimization algorithms including ‘SGD’, ‘Adagrad’, ‘RMSprop’, ‘Moment’, ‘NAG’, etc. are also provided.
SoftRandomForest Classification Random Forests for Soft Decision Trees
Performs random forests for soft decision trees for a classification problem. Current limitations are a maximum depth of 5, resulting in 16 terminal nodes. Some data cleaning is required before input. Final graphic output currently requires exporting to ‘Microsoft Excel’ for visualization. Method based on Irsoy, Yildiz and Alpaydin (2012, ISBN: 978-4-9906441-1-6).
SOIL Sparsity Oriented Importance Learning
Sparsity Oriented Importance Learning (SOIL) provides an objective and informative profile of variable importances for high dimensional regression and classification models.
solartime Utilities Dealing with Solar Time Such as Sun Position and Time of Sunrise
Provides utilities to work with solar time, i.e., where noon is exactly when the sun culminates. Provides functions for computing sun position and the times of sunrise and sunset.
solitude An Implementation of Isolation Forest
Isolation forest is an anomaly detection method introduced in the paper ‘Isolation-Based Anomaly Detection’ (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>).
solrium General Purpose R Interface to ‘Solr’
Provides a set of functions for querying and parsing data from ‘Solr’ (<http://…/solr> ) ‘endpoints’ (local and remote), including search, ‘faceting’, ‘highlighting’, ‘stats’, and ‘more like this’. In addition, some functionality is included for creating, deleting, and updating documents in a ‘Solr’ ‘database’.
SolveRationalMatrixEquation Solve Rational Matrix Equation
Given a symmetric positive definite matrix Q and a non-singular matrix L, finds the symmetric positive definite solution X such that X = Q + L X^(-1) L^T. Reference: Benner, P. and Faßbender, H. (2007), ‘On the Solution of the Rational Matrix Equation’ <doi:10.1155/2007/21850>.
som.nn Topological k-NN Classifier Based on Self-Organising Maps
A topological version of k-NN: an abstract model is built as a 2-dimensional self-organising map. Samples of unknown class are predicted by mapping them onto the SOM and analysing the class membership of neurons in the neighbourhood.
SOMbrero SOM Bound to Realize Euclidean and Relational Outputs
The stochastic (also called on-line) version of the Self-Organising Map (SOM) algorithm is provided. Different versions of the algorithm are implemented, for numeric and relational data and for contingency tables. The package also contains many plotting features (to help the user interpret the results) and a graphical user interface based on shiny (which is also available on-line at http://…/sombrero ).
sommer Solving Mixed Model Equations in R
General mixed model equations solver, allowing the specification of variance covariance matrices of random effects and residual structures.
somspace Spatial Analysis with Self-Organizing Maps
Application of the Self-Organizing Maps technique for spatial classification of time series. The package uses spatial data, point or gridded, to create clusters with similar characteristics. The clusters can be further refined to a smaller number of regions by hierarchical clustering and their spatial dependencies can be presented as complex networks. Thus, meaningful maps can be created, representing the regional heterogeneity of a single variable. More information and an example of implementation can be found in Markonis and Strnad (2019).
sonify Data Sonification – Turning Data into Sound
Sonification (or audification) is the process of representing data by sounds in the audible range. This package provides the R function sonify() that transforms univariate data, sampled at regular or irregular intervals, into a continuous sound with time-varying frequency. The ups and downs in frequency represent the ups and downs in the data. Sonify provides a substitute for R’s plot function to simplify data analysis for the visually impaired.
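Since the description names the sonify() function, a minimal sketch:
    library(sonify)
    x <- seq(0, 2 * pi, length.out = 200)
    sonify(x, sin(x))  # pitch rises and falls with the sine wave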
SOR Estimation using Sequential Offsetted Regression
Estimation for longitudinal data following outcome dependent sampling using the sequential offsetted regression technique. Includes support for binary, count, and continuous data.
SorptionAnalysis Static Adsorption Experiment Plotting and Analysis
Provides tools to efficiently analyze and visualize laboratory data from aqueous static adsorption experiments. The package provides functions to plot Langmuir, Freundlich, and Temkin isotherms and functions to determine the statistical conformity of data points to the Langmuir, Freundlich, and Temkin adsorption models through statistical characterization of the isotherm least-squares regression lines. Scientific Reference: Dada, A.O, Olalekan, A., Olatunya, A. (2012) <doi:10.9790/5736-0313845>.
SortedEffects Estimation and Inference Methods for Sorted Causal Effects and Classification Analysis
Implements the estimation and inference methods for sorted causal effects and classification analysis as in Chernozhukov, Fernandez-Val and Luo (2018) <doi:10.3982/ECTA14415>.
sound A Sound Interface for R
Basic functions for dealing with wav files and sound samples.
sourceR Fits a Non-Parametric Bayesian Source Attribution Model
Implements a non-parametric source attribution model to attribute cases of disease to sources in a Bayesian framework with source and type effects. Type effects are clustered using a Dirichlet Process. Multiple times and locations are supported.
sourcetools Tools for the Reading and Tokenization of R Code
Tools for the reading and tokenization of R code. The ‘sourcetools’ package provides both an R and C++ interface for the tokenization of R code, and helpers for interacting with the tokenized representation of R code.
sp500SlidingWindow Sliding Window Investment Analysis
Tests the results of any given investment/expense combination for a series of sliding-window periods of the S&P 500 from 1950 to the present.
SpaCCr Spatial Convex Clustering
Genomic Region Detection via Spatial Convex Clustering. See <https://…/1611.04696> for details.
spaceNet Latent Space Models for Multidimensional Networks
Latent space models for multivariate networks (multiplex), estimated via an MCMC algorithm. See D'Angelo et al. (2018) <arXiv:1803.07166>.
spacesRGB Standard and User-Defined RGB Color Spaces, with Conversion Between RGB and CIE XYZ
Standard RGB spaces included are sRGB, ‘Adobe’ RGB, and ‘ProPhoto’ RGB. User-defined RGB spaces are also possible.
spacom Spatially Weighted Context Data for Multilevel Modelling
Provides tools to construct and exploit spatially weighted context data. Spatial weights are derived by a Kernel function from a user-defined matrix of distances between contextual units. Spatial weights can then be applied either to precise contextual measures or to aggregate estimates based on micro-level survey data, to compute spatially weighted context data. Available aggregation functions include indicators of central tendency, dispersion, or inter-group variability, and take into account survey design weights. The package further allows combining the resulting spatially weighted context data with individual-level predictor and outcome variables, for the purposes of multilevel modelling. An ad hoc stratified bootstrap resampling procedure generates robust point estimates for multilevel regression coefficients and model fit indicators, and computes confidence intervals adjusted for measurement dependency and measurement error of aggregate estimates. As an additional feature, residual and explained spatial dependency can be estimated for the tested models.
spacyr R Wrapper to the spaCy NLP Library
An R wrapper to the ‘Python’ ‘spaCy’ ‘NLP’ library, from <http://spacy.io>.
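A minimal sketch (requires a working ‘Python’ installation of ‘spaCy’):
    library(spacyr)
    spacy_initialize()  # attach the Python spaCy backend
    spacy_parse("The quick brown fox jumps over the lazy dog.")
    # returns a data frame of tokens with lemmas and part-of-speech tags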
spAddins A Set of RStudio Addins
A set of RStudio addins that are designed to be used in combination with user-defined RStudio keyboard shortcuts. These addins either insert text at a cursor position (e.g. insert operators %>%, <<-, %$%, etc.) or replace symbols in selected pieces of text (e.g., convert backslashes to forward slashes, which results in strings like ‘c:\data\’ being converted into ‘c:/data/’).
SpaDES Develop and Run Spatially Explicit Discrete Event Simulation Models
Easily implement a variety of simulation models, with a focus on spatially explicit agent-based models. These include raster-based, event-based, and agent-based models. The core simulation components are built upon a discrete event simulation framework that facilitates modularity, and easily enables the user to include additional functionality by running user-built simulation modules. Included are numerous tools to visualize raster and other maps. The suggested package ‘fastshp’ can be installed with install.packages("fastshp", repos = "http://rforge.net", type = "source").
SpaDES.addins Development Tools for ‘SpaDES’ and ‘SpaDES’ Modules
Provides ‘RStudio’ addins for ‘SpaDES’ packages and ‘SpaDES’ module development. See ‘?SpaDES.addins’ for an overview of the tools provided.
SpaDES.core Core Utilities for Developing and Running Spatially Explicit Discrete Event Simulation Models
Provide the core discrete event simulation (DES) framework for implementing spatially explicit simulation models. The core DES components facilitate modularity, and easily enable the user to include additional functionality by running user-built simulation modules.
SpaDES.tools Additional Tools for Developing Spatially Explicit Discrete Event Simulation (SpaDES) Models
Provides GIS/map utilities and additional modeling tools for developing cellular automata and agent based models in ‘SpaDES’.
spagmix Artificial Spatial and Spatiotemporal Densities on Bounded Windows
A utility package containing some simple tools to design and generate density functions on bounded regions in space and space-time, and simulate iid data therefrom. See Davies & Hazelton (2010) <doi:10.1002/sim.3995> for example.
spam64 64-Bit Extension of the SPArse Matrix R Package ‘spam’
Provides the Fortran code of the R package ‘spam’ with 64-bit integers. Loading this package together with the R package ‘spam’ enables the sparse matrix class ‘spam’ to handle huge sparse matrices with more than 2^31-1 non-zero elements.
spanr Search Partition Analysis
Carries out a search for an optimal partition in terms of a regular Boolean expression.
sparcl Perform sparse hierarchical clustering and sparse k-means clustering
Implements the sparse clustering methods of Witten and Tibshirani (2010): ‘A framework for feature selection in clustering’; published in Journal of the American Statistical Association 105(490): 713-726.
spark.sas7bdat Read in ‘SAS’ Data (‘.sas7bdat’ Files) into ‘Apache Spark’
Read in ‘SAS’ Data (‘.sas7bdat’ Files) into ‘Apache Spark’ from R. ‘Apache Spark’ is an open source cluster computing framework available at <http://spark.apache.org>. This R package uses the ‘spark-sas7bdat’ ‘Spark’ package (<https://…/spark-sas7bdat> ) to import and process ‘SAS’ data in parallel using ‘Spark’, thereby allowing ‘dplyr’ statements to be executed in parallel on top of ‘SAS’ data.
sparkavro Load Avro File into ‘Apache Spark’
Load Avro files into ‘Apache Spark’ using ‘sparklyr’. This allows reading files from ‘Apache Avro’ <https://…/>.
sparkbq Google ‘BigQuery’ Support for ‘sparklyr’
A ‘sparklyr’ extension package providing an integration with Google ‘BigQuery’. It supports direct import/export where records are directly streamed from/to ‘BigQuery’. In addition, data may be imported/exported via intermediate data extracts on Google ‘Cloud Storage’.
sparkline ‘jQuery’ Sparkline ‘htmlwidget’
Include interactive sparkline charts <http://…/jquery.sparkline> in all R contexts with the convenience of ‘htmlwidgets’.
sparklines A sparkline htmlwidget for R using jQuery Sparklines
An ‘htmlwidget’ for R based on the nifty jQuery Sparklines library. The ‘htmlwidgets’ framework has made it extremely easy to integrate, access and use HTML widgets from R.
sparklyr R Interface to Apache Spark
Provision, connect and interface to Apache Spark from within R. This package supports connecting to local and remote Apache Spark clusters, provides a ‘dplyr’ compatible back-end, and provides an interface to Spark’s built-in machine learning algorithms.
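A minimal sketch of a local connection and a ‘dplyr’ aggregation executed in Spark:
    library(sparklyr)
    library(dplyr)
    sc <- spark_connect(master = "local")
    mtcars_tbl <- copy_to(sc, mtcars)
    mtcars_tbl %>%
      group_by(cyl) %>%
      summarise(avg_mpg = mean(mpg)) %>%
      collect()          # bring the aggregated result back into R
    spark_disconnect(sc)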
sparklyr.nested A ‘sparklyr’ Extension for Nested Data
A ‘sparklyr’ extension adding the capability to work easily with nested data.
SparkR R frontend for Spark
SparkR is an R package that provides a light-weight frontend to use Spark from R.
NOTE: As of April 2015, SparkR has been merged into Apache Spark and is shipping in an upcoming release (1.4) due early summer 2015. This repo currently targets users using released versions of Spark. This repo no longer accepts new pull requests, and they should now be submitted to apache/spark.
https://…/spark
Announcing SparkR: R on Spark
SparkRext SparkR Extension for Closer Compatibility with dplyr
The package SparkRext has been created to make SparkR closer to dplyr. SparkRext redefines the functions of SparkR to enable NSE (non-standard evaluation) inputs. As a result, the functions can be used in the same way as dplyr.
sparkTable Sparklines and Graphical Tables for TeX and HTML
Create sparklines and graphical tables for documents and websites.
http://…/kowarik-meindl-templ.pdf
sparkwarc Load WARC Files into Apache Spark
Load WARC (Web ARChive) files into Apache Spark using ‘sparklyr’. This allows reading files from the Common Crawl project <http://…/>.
sparkxgb Interface for ‘XGBoost’ on ‘Apache Spark’
A ‘sparklyr’ <https://…/> extension that provides an interface for ‘XGBoost’ <https://…/xgboost> on ‘Apache Spark’. ‘XGBoost’ is an optimized distributed gradient boosting library.
sparsebn Learning Sparse Bayesian Networks from High-Dimensional Data
Fast methods for learning sparse Bayesian networks from high-dimensional data using sparse regularization. Designed to incorporate mixed experimental and observational data with thousands of variables with either continuous or discrete observations.
sparsebnUtils Utilities for Learning Sparse Bayesian Networks
A set of tools for representing and estimating sparse Bayesian networks from continuous and discrete data.
sparsediscrim Sparse and Regularized Discriminant Analysis
A collection of sparse and regularized discriminant analysis methods intended for small-sample, high-dimensional data sets. The package features the High-Dimensional Regularized Discriminant Analysis classifier.
sparseEigen Computation of Sparse Eigenvectors of a Matrix
Computation of sparse eigenvectors of a matrix (aka sparse PCA) with running time 2-3 orders of magnitude lower than existing methods and better final performance in terms of recovery of sparsity pattern and estimation of numerical values. Can handle covariance matrices as well as data matrices with real or complex-valued entries. Different levels of sparsity can be specified for each individual ordered eigenvector and the method is robust in parameter selection. See vignette for a detailed documentation and comparison, with several illustrative examples. The package is based on the paper: K. Benidis, Y. Sun, P. Babu, and D. P. Palomar (2016). ‘Orthogonal Sparse PCA and Covariance Estimation via Procrustes Reformulation,’ IEEE Transactions on Signal Processing <doi:10.1109/TSP.2016.2605073>.
sparseFLMM Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data
Estimation of functional linear mixed models for irregularly or sparsely sampled data based on functional principal component analysis.
sparseIndexTracking Design of Portfolio of Stocks to Track an Index
Computation of sparse portfolios for financial index tracking, i.e., joint selection of a subset of the assets that compose the index and computation of their relative weights (capital allocation). The level of sparsity of the portfolios, i.e., the number of selected assets, is controlled through a regularization parameter. Different tracking measures are available, namely, the empirical tracking error (ETE), downside risk (DR), Huber empirical tracking error (HETE), and Huber downside risk (HDR). See vignette for a detailed documentation and comparison, with several illustrative examples. The package is based on the paper: K. Benidis, Y. Feng, and D. P. Palomar, ‘Sparse Portfolios for High-Dimensional Financial Index Tracking,’ IEEE Trans. on Signal Processing, vol. 66, no. 1, pp. 155-170, Jan. 2018. <doi:10.1109/TSP.2017.2762286>.
sparseinv Computation of the Sparse Inverse Subset
Creates a wrapper for the ‘SuiteSparse’ routines that execute the Takahashi equations. These equations compute the elements of the inverse of a sparse matrix at locations where its Cholesky factor is non-zero. The resulting matrix is known as a sparse inverse subset. Some helper functions are also implemented. Support for spam matrices is currently limited and will be implemented in the future. See Rue and Martino (2007) <doi:10.1016/j.jspi.2006.07.016> and Zammit-Mangion and Rougier (2017) <arXiv:1707.00892> for the application of these equations to statistics.
SparseLearner Sparse Learning Algorithms Using a LASSO-Type Penalty for Coefficient Estimation and Model Prediction
Performs the LASSO-type sparse learning algorithm and its improved versions such as Bolasso, bootstrap ranking LASSO, two-stage hybrid LASSO and so on for coefficient estimation and model prediction. These estimation procedures are applied in the fields of variable selection, graphical modeling and model prediction.
SparseLPM The Sparse Latent Position Model for Nonnegative Interaction Data
Models the nonnegative entries of a rectangular adjacency matrix using a sparse latent position model, as illustrated in Rastelli, R. (2018) ‘The Sparse Latent Position Model for nonnegative weighted networks’ <arXiv:1808.09262>.
sparseMatEst Sparse Matrix Estimation and Inference
The ‘sparseMatEst’ package provides functions for estimating sparse covariance and precision matrices with error control. A false positive rate is fixed corresponding to the probability of falsely including a matrix entry in the support of the estimator. It uses the binary search method outlined in Kashlak and Kong (2019) <arXiv:1705.02679> and in Kashlak (2019) <arXiv:1903.10988>.
SparseMSE Multiple Systems Estimation for Sparse Capture Data
Implements the routines and algorithms developed and analysed in ‘Multiple systems estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists’ Chan, L, Silverman, B. W., Vincent, K (2019) <arXiv:1902.05156>. This package explicitly handles situations where there are pairs of lists which have no observed individuals in common.
sparsenet Fit Sparse Linear Regression Models via Nonconvex Optimization
Efficient procedure for fitting regularization paths between L1 and L0, using the MC+ penalty of Zhang, C.H. (2010) <doi:10.1214/09-AOS729>. Implements the methodology described in Mazumder, Friedman and Hastie (2011) <doi:10.1198/jasa.2011.tm09738>. Sparsenet computes the regularization surface over both the family parameter and the tuning parameter by coordinate descent.
sparsepca Sparse Principal Component Analysis (SPCA)
Sparse principal component analysis (SPCA) attempts to find sparse weight vectors (loadings), i.e., a weight vector with only a few ‘active’ (nonzero) values. This approach provides better interpretability for the principal components in high-dimensional data settings, because the principal components are formed as a linear combination of only a few of the original variables. This package provides efficient routines to compute SPCA. Specifically, a variable projection solver is used to compute the sparse solution. In addition, a fast randomized accelerated SPCA routine and a robust SPCA routine are provided. Robust SPCA can capture grossly corrupted entries in the data. The methods are discussed in detail by N. Benjamin Erichson et al. (2018) <arXiv:1804.00341>.
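A minimal sketch using the package's spca() routine (the penalty values are arbitrary):
    library(sparsepca)
    X <- scale(as.matrix(mtcars))
    out <- spca(X, k = 3, alpha = 1e-3, beta = 1e-3)  # 3 sparse components
    out$loadings                                      # mostly-zero weights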
sparsepp ‘Rcpp’ Interface to ‘sparsepp’
Provides an interface to ‘sparsepp’, a fast, memory-efficient hash map. It is derived from Google’s excellent ‘sparsehash’ implementation. We believe ‘sparsepp’ provides an unparalleled combination of performance and memory usage, and will outperform your compiler’s unordered_map on both counts. Only Google’s ‘dense_hash_map’ is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance).
sparseSEM Sparse-aware Maximum Likelihood for Structural Equation Models
Sparse-aware maximum likelihood estimation for structural equation models, with application to inferring gene regulatory networks.
sparsestep SparseStep Regression
Implements the SparseStep model for solving regression problems with a sparsity constraint on the parameters. The SparseStep regression model was proposed in Van den Burg, Groenen, and Alfons (2017) <https://…/1701.06967>. In the model, a regularization term is added to the regression problem which approximates the counting norm of the parameters. By iteratively improving the approximation, a sparse solution to the regression problem can be obtained. This package implements both the standard SparseStep algorithm and a path algorithm that uses golden section search to determine solutions for different values of the regularization parameter.
sparsesvd Sparse Truncated Singular Value Decomposition (from ‘SVDLIBC’)
Wrapper around the ‘SVDLIBC’ library for (truncated) singular value decomposition of a sparse matrix. Currently, only sparse real matrices in the Matrix package format are supported.
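A minimal sketch (the rank argument caps the number of singular triplets returned):
    library(Matrix)
    library(sparsesvd)
    M <- rsparsematrix(100, 50, density = 0.05)  # random sparse dgCMatrix
    res <- sparsesvd(M, rank = 10L)              # truncated SVD via 'SVDLIBC'
    str(res)                                     # list with components d, u, v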
sparseSVM Solution Paths of Sparse Linear Support Vector Machine with Lasso or Elastic-Net Regularization
Fast algorithm for fitting solution paths of sparse linear SVMs with lasso or elastic-net regularization, which generate sparse solutions.
sparsevar A Package for Sparse VAR/VECM Estimation
A wrapper for estimating sparse VAR/VECM time series models using penalties such as ENET, SCAD and MCP.
sparsio I/O Operations with Sparse Matrices
Fast ‘SVMlight’ reader and writer. ‘SVMlight’ is the most commonly used format for storing sparse matrices (possibly with a target variable) on disk. For additional information about the ‘SVMlight’ format see <http://…/>.
spass Study Planning and Adaptation of Sample Size
Sample size estimation and blinded sample size re-estimation in adaptive study designs.
spate Spatio-Temporal Modeling of Large Data Using a Spectral SPDE Approach
This package provides functionality for spatio-temporal modeling of large data sets. A Gaussian process in space and time is defined through a stochastic partial differential equation (SPDE). The SPDE is solved in the spectral space, and after discretizing in time and space, a linear Gaussian state space model is obtained. When doing inference, the main computational difficulty consists in evaluating the likelihood and in sampling from the full conditional of the spectral coefficients, or equivalently, the latent space-time process. In comparison to the traditional approach of using a spatio-temporal covariance function, the spectral SPDE approach is computationally advantageous. This package aims at providing tools for two different modeling approaches. First, the SPDE based spatio-temporal model can be used as a component in a customized hierarchical Bayesian model (HBM). The functions of the package then provide parameterizations of the process part of the model as well as computationally efficient algorithms needed for doing inference with the HBM. Alternatively, the adaptive MCMC algorithm implemented in the package can be used as an algorithm for doing inference without any additional modeling. The MCMC algorithm supports data that follow a Gaussian or a censored distribution with point mass at zero. Covariates can be included in the model through a regression term.
SpatEntropy Spatial Entropy Measures
The heterogeneity of spatial data presenting a finite number of categories can be measured via computation of spatial entropy. Functions are available for the computation of the main entropy and spatial entropy measures in the literature. They include the traditional version of Shannon’s entropy, Batty’s spatial entropy, O’Neill’s entropy, Li and Reynolds’ contagion index, Karlstrom and Ceccato’s entropy, Leibovici’s entropy, Parresol and Edwards’ entropy and Altieri’s entropy. References for all measures can be found under the topic ‘SpatEntropy’. The package is able to work with lattice and point data.
SPAtest Score Test Based on Saddlepoint Approximation
Performs score test using saddlepoint approximation to estimate the null distribution.
SpatialAcc Spatial Accessibility Measures
Provides a set of spatial accessibility measures from a set of locations (demand) to another set of locations (supply). It aims, among others, to support research on spatial accessibility to health care facilities.
spatialClust Spatial Clustering using Fuzzy Geographically Weighted Clustering
Performs spatial clustering analysis using fuzzy geographically weighted clustering, with optimization via the gravitational search algorithm.
spatialEco Spatial Analysis and Modeling
Utilities to support spatial data manipulation, query, sampling and modeling. Functions include models for species population density, download utilities for climate and global deforestation spatial products, spatial smoothing, multivariate separability, point process model for creating pseudo-absences and subsampling, polygon and point-distance landscape metrics, auto-logistic model, sampling models, cluster optimization and statistical exploratory tools.
spatialfil Application of 2D Convolution Kernel Filters to Matrices or 3D Arrays
Filter matrices or (three dimensional) array data using different convolution kernels.
SpatialFloor Spatial Floor Simulation (Isotropic)
Spatial floor simulation with exponential/Gaussian variance-covariance functions (isotropic), with specification of distance function, nugget, sill and range. The methodology follows Noel A.C. Cressie (2015) <doi:10.1002/9781119115151>.
spatialkernel Non-Parametric Estimation of Spatial Segregation in a Multivariate Point Process
Edge-corrected kernel density estimation and binary kernel regression estimation for multivariate spatial point process data. For details, see Diggle, P.J., Zheng, P. and Durr, P. A. (2005) <doi:10.1111/j.1467-9876.2005.05373.x>.
spatialnbda Performs spatial NBDA in a Bayesian context
Network-based diffusion analysis (NBDA) allows inference on the asocial and social transmission of information. This may involve the social transmission of a particular behaviour such as tool use. For the NBDA, the key parameters estimated are the social effect and baseline rate parameters. The baseline rate parameter gives the rate at which the behaviour is first performed (or acquired) asocially amongst the individuals in a given population. The social effect parameter quantifies the effect of the social associations amongst the individuals on the rate at which each individual first performs or displays the behaviour. Spatial NBDA involves incorporating spatial information in the analysis. This is done by incorporating social networks derived from spatial point patterns (of the home bases of the individuals under study). In addition, a spatial covariate such as vegetation cover or slope may be included in the modelling process.
SpatialPosition Spatial Position Models
Computes spatial position models: Stewart potentials, Reilly catchment areas, Huff catchment areas.
spatialreg Spatial Regression Analysis
A collection of all the estimation functions for spatial cross-sectional models (on lattice/areal data using spatial weights matrices) contained up to now in ‘spdep’, ‘sphet’ and ‘spse’. These model fitting functions include maximum likelihood methods for cross-sectional models proposed by ‘Cliff’ and ‘Ord’ (1973, ISBN:0850860369) and (1981, ISBN:0850860814), fitting methods initially described by ‘Ord’ (1975) <doi:10.1080/01621459.1975.10480272>. The models are further described by ‘Anselin’ (1988) <doi:10.1007/978-94-015-7799-1>. Spatial two stage least squares and spatial general method of moment models initially proposed by ‘Kelejian’ and ‘Prucha’ (1998) <doi:10.1023/A:1007707430416> and (1999) <doi:10.1111/1468-2354.00027> are provided. Impact methods and MCMC fitting methods proposed by ‘LeSage’ and ‘Pace’ (2009) <doi:10.1201/9781420064254> are implemented for the family of cross-sectional spatial regression models. Methods for fitting the log determinant term in maximum likelihood and MCMC fitting are compared by ‘Bivand et al.’ (2013) <doi:10.1111/gean.12008>, and model fitting methods by ‘Bivand’ and ‘Piras’ (2015) <doi:10.18637/jss.v063.i18>; both of these articles include extensive lists of references. ‘spatialreg’ >= 1.1-* correspond to ‘spdep’ >= 1.1-1, in which the model fitting functions are deprecated and pass through to ‘spatialreg’, but will mask those in ‘spatialreg’. From versions 1.2-*, the functions will be made defunct in ‘spdep’.
spatialrisk Calculating Concentration Risk under Solvency II
Methods for determining spatial risk, in particular concentration risk in the context of the EU insurance regulation framework (Solvency II).
SpatialVS Spatial Variable Selection
Perform variable selection for the spatial Poisson regression model under the adaptive elastic net penalty. Spatial count data with covariates is the input. We use a spatial Poisson regression model to link the spatial counts and covariates. For maximization of the likelihood under adaptive elastic net penalty, we implemented the penalized quasi-likelihood (PQL) and the approximate penalized loglikelihood (APL) methods. The proposed methods can automatically select important covariates, while adjusting for possible spatial correlations among the responses. More details are available in Xie et al. (2018, <arXiv:1809.06418>). The package also contains the Lyme disease dataset, which consists of the disease case data from 2006 to 2011, and demographic data and land cover data in Virginia. The Lyme disease case data were collected by the Virginia Department of Health. The demographic data (e.g., population density, median income, and average age) are from the 2010 census. Land cover data were obtained from the Multi-Resolution Land Cover Consortium for 2006.
spatialwarnings Spatial Early Warning Signals of Ecosystem Degradation
Tools to compute and assess significance of early-warnings signals (EWS) of ecosystem degradation on raster data sets. EWS are metrics derived from the observed spatial structure of an ecosystem — e.g. spatial autocorrelation — that increase before an ecosystem undergoes a non-linear transition (Kefi et al. (2014) <doi:10.1371/journal.pone.0092097>).
spatialwidget Converts Spatial Data to Javascript Object Notation (JSON) for Use in Htmlwidgets
Many packages use ‘htmlwidgets’ <https://…/package=htmlwidgets> for interactive plotting of spatial data. This package provides functions for converting R objects, such as simple features, into structures suitable for use in ‘htmlwidgets’ mapping libraries.
SpaTimeClus Model-Based Clustering of Spatio-Temporal Data
A mixture model is used to achieve the clustering goal. Each component is itself a mixture of polynomial autoregressive regressions whose logistic weights incorporate the spatial and temporal information.
SpatMCA Regularized Spatial Maximum Covariance Analysis
Provides regularized maximum covariance analysis incorporating smoothness, sparseness and orthogonality of coupled patterns by using the alternating direction method of multipliers algorithm. The method can be applied to either regularly or irregularly spaced data.
SpatPCA Regularized Principal Component Analysis for Spatial Data
This package provides regularized principal component analysis incorporating smoothness, sparseness and orthogonality of eigenfunctions by using the alternating direction method of multipliers (ADMM) algorithm.
SpATS Spatial Analysis of Field Trials with Splines
Analysis of field trial experiments by modelling spatial trends using two-dimensional penalised spline (P-spline) models.
spatsoc Group Animal Relocation Data by Spatial and Temporal Relationship
Detects spatial and temporal groups in GPS relocations. It can be used to convert GPS relocations to gambit-of-the-group format to build proximity-based social networks. In addition, the randomizations function provides data-stream randomization methods suitable for GPS data.
spatstat Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
Comprehensive open-source toolbox for analysing spatial data, mainly Spatial Point Patterns, including multitype/marked points and spatial covariates, in any two-dimensional spatial region. Also supports three-dimensional point patterns, space-time point patterns in any number of dimensions, and point patterns on a linear network. Contains about 2000 functions for plotting spatial data, exploratory data analysis, model-fitting, simulation, spatial sampling, model diagnostics, and formal inference. Data types include point patterns, line segment patterns, spatial windows, pixel images, tessellations, and linear networks. Exploratory methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Diggle-Cressie-Loosmore-Ford, Dao-Genton) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov) are also supported. Parametric models can be fitted to point pattern data using the functions ppm, kppm and slrm, analogous to glm. Types of models include Poisson, Gibbs, Cox and cluster point processes. Models may involve dependence on covariates, interpoint interaction, cluster formation and dependence on marks. Models are fitted by maximum likelihood, logistic regression, minimum contrast, and composite likelihood methods. Fitted point process models can be simulated automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise, AIC). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots.
shiny_spatstat
Spatstat – An introduction and measurements with Bio7
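A small sketch of the workflow described above (exploratory summary, model fitting with ppm, then simulation), using the bundled cells dataset:
    library(spatstat)
    X <- cells                            # unmarked planar point pattern
    plot(Kest(X))                         # exploratory K-function
    fit <- ppm(X ~ 1, Strauss(r = 0.1))   # stationary Strauss (Gibbs) model
    Xsim <- simulate(fit, nsim = 1)       # simulate from the fitted model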
spatstat.local Extension to ‘spatstat’ for Local Composite Likelihood
Extension to the ‘spatstat’ package, enabling the user to fit point process models to point pattern data by local composite likelihood (‘geographically weighted regression’).
spbabel Convert Spatial Data Using Tidy Tables
Functions to convert from `Spatial` classes to long table form, and from long table form to `Spatial`.
spBayes Univariate and Multivariate Spatial-temporal Modeling
Fits univariate and multivariate spatio-temporal models with Markov chain Monte Carlo (MCMC).
Spbsampling Spatially Balanced Samples
Provides functions to draw spatially balanced samples. It contains fast implementations (C++ via ‘Rcpp’) of the included sampling methods. In particular, the algorithms to draw spatially balanced samples are pwd() and swd(). These methods use a probability distribution, proportional to the within sample distance. See Benedetti R and Piersimoni F (2017) <doi:10.1002/bimj.201600194> for details. Moreover, there is a function hpwd(), which is a heuristic method to achieve approximated samples obtained by pwd() in a faster way. See Benedetti R and Piersimoni F (2017) <arXiv:1710.09116> for details. Finally, there are two functions, stprod() and stsum(), useful to standardize distance matrices in order to achieve fixed sample size using, respectively, the functions pwd() and swd().
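A rough sketch assembled from the function names above; the argument names and return structure are assumptions, so consult the package documentation before use:
    library(Spbsampling)
    xy <- cbind(runif(50), runif(50))
    d <- as.matrix(dist(xy))                         # pairwise distance matrix
    d_std <- stprod(mat = d, con = rep(0, nrow(d)))  # standardize distances (assumed interface)
    s <- pwd(dis = d_std$mat, n = 10)                # one spatially balanced sample (assumed interface)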
spca Sparse Principal Component Analysis
Computes Least Squares Sparse Principal Components either by a Branch-and-Bound search or with an iterative Backward Elimination algorithm. Sparse solutions can be plotted, printed and compared using the methods included.
SPCALDA A New Reduced-Rank Linear Discriminant Analysis Method
A new reduced-rank LDA method which works for high dimensional multi-class data.
SPCAvRP Sparse Principal Component Analysis via Random Projections (SPCAvRP)
Implements the SPCAvRP algorithm, developed and analysed in ‘Sparse principal component analysis via random projections’ Gataric, M., Wang, T. and Samworth, R. J. (2017) <arXiv:1712.05630>. The algorithm is based on the aggregation of eigenvector information from carefully-selected random projections of the sample covariance matrix.
SPCDAnalyze Design and Analyze Studies using the Sequential Parallel Comparison Design
Programs to find the sample size or power of studies using the Sequential Parallel Comparison Design (SPCD) and programs to analyze such studies. This is a clinical trial design where patients initially on placebo who did not respond are re-randomized between placebo and active drug in a second phase and the results of the two phases are pooled. The method of analyzing binary data with this design is described in Fava, Evins, Dorer and Schoenfeld (2003) <doi:10.1159/000069738>, and the method of analyzing continuous data is described in Chen, Yang, Hung and Wang (2011) <doi:10.1016/j.cct.2011.04.006>.
spCP Spatially Varying Change Points
Implements a spatially varying change point model with unique intercepts, slopes, variance intercepts and slopes, and change points at each location. Inference is within the Bayesian setting using Markov chain Monte Carlo (MCMC). The response variable can be modeled as Gaussian (no nugget), or with a probit or Tobit link, and the five spatially varying parameters are modeled jointly using a multivariate conditional autoregressive (MCAR) prior. The MCAR is a unique process that allows a dissimilarity metric to dictate the local spatial dependencies. Full details of the package can be found in the accompanying vignette.
spData Datasets for Spatial Analysis
Diverse spatial datasets for demonstrating, benchmarking and teaching spatial data analysis. It includes R data of class sf (defined by the package ‘sf’). Unlike other spatial data packages such as ‘rnaturalearth’ and ‘maps’, it also contains data stored in a range of file formats including GeoJSON, ESRI Shapefile and GeoPackage. Some of the datasets are designed to illustrate specific analysis techniques. cycle_hire_osm, for example, is designed to illustrate point pattern analysis techniques.
spdownscale Spatial Downscaling Using Bias Correction Approach
Spatial downscaling of climate data (Global Circulation Models/Regional Climate Models) using quantile-quantile bias correction technique.
spdplyr Data Manipulation Verbs for the Spatial Classes
Methods for ‘dplyr’ verbs for ‘sp’ ‘Spatial’ classes. The basic verbs that modify data attributes, remove or re-arrange rows are supported and provide complete ‘Spatial’ analogues of the input data. The group by and summarize workflow returns a non-topological spatial union. There is limited support for joins, with left and inner to copy attributes from another table.
spduration Split-Population Duration (Cure) Regression
Functions for estimating split-duration regression models and various associated generic function methods.
spearmanCI Jackknife Euclidean / Empirical Likelihood Inference for Spearman’s Rho
Functions for conducting jackknife Euclidean / empirical likelihood inference for Spearman’s rho (de Carvalho and Marques (2012) <doi:10.1080/10920277.2012.10597644>).
spec A Data Specification Format and Interface
Creates a data specification that describes the columns of a table (data.frame). Provides methods to read, write, and update the specification. Checks whether a table matches its specification. See specification.data.frame(), read.spec(), write.spec(), as.csv.spec(), respecify.character(), and %matches%.data.frame().
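A brief sketch using the generics behind the methods listed above; the file name is a placeholder and the exact signatures should be checked in the manual:
    library(spec)
    s <- specification(mtcars)      # derive a spec from a data.frame
    write.spec(s, "mtcars.spec")    # persist the specification
    s2 <- read.spec("mtcars.spec")
    mtcars %matches% s2             # check the table against its spec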
SpecDetec Change Points Detection with Spectral Clustering
Calculates change points based on spectral clustering, with the option to automatically determine the number of clusters if this information is not available.
spectral Common Methods of Spectral Data Analysis
Fourier and Hilbert transforms are utilized to perform several types of spectral analysis on the supplied data. Fragmented and irregularly spaced data can also be processed. A user-friendly interface helps to interpret the results.
spectral.methods Singular Spectrum Analysis (SSA) Tools for Time Series Analysis
Contains some implementations of Singular Spectrum Analysis (SSA) for the gapfilling and spectral decomposition of time series. It contains the code used by Buttlar et al. (2014), Nonlinear Processes in Geophysics. In addition, the iterative SSA gapfilling method of Kondrashov and Ghil (2006) is implemented. All SSA calculations are done via the truncated and fast SSA algorithm of Korobeynikov (2010) (package ‘Rssa’).
spectralAnalysis Pre-Process, Visualize and Analyse Process Analytical Data, by Spectral Data Measurements Made During a Chemical Process
Infrared, near-infrared and Raman spectroscopic data measured during chemical reactions provide structural fingerprints by which molecules can be identified and quantified. The application of these spectroscopic techniques as inline process analytical tools (PAT) provides the (pharma-)chemical industry with novel tools that allow monitoring of chemical processes, resulting in a better process understanding through insight into reaction rates, mechanisms, stability, etc. Data can be read into R via the generic spc-format, which is generally supported by spectrometer vendor software. Versatile pre-processing functions are available to perform baseline correction by linking to the ‘baseline’ package; noise reduction via the ‘signal’ package; as well as time alignment, normalization, differentiation, integration and interpolation. Implementation based on the S4 object system allows storing a pre-processing pipeline as part of a spectral data object and easily transferring it to other datasets. Interactive plotting tools are provided based on the ‘plotly’ package. Non-negative matrix factorization (NMF) has been implemented to perform multivariate analyses on individual spectral datasets or on multiple datasets at once. NMF provides a parts-based representation of the spectral data in terms of spectral signatures of the chemical compounds and their relative proportions. The functionality to read in spc-files was adapted from the ‘hyperSpec’ package.
spectralGraphTopology Learning Graphs from Data via Spectral Constraints
Block coordinate descent estimators to learn k-component, bipartite, and k-component bipartite graphs from data by imposing spectral constraints on the eigenvalues and eigenvectors of the Laplacian and adjacency matrices. These estimators leverage spectral properties of the graphical models as prior information, which turns out to play a key role in unsupervised machine learning tasks such as clustering and community detection. This package is based on the paper ‘A Unified Framework for Structured Graph Learning via Spectral Constraints’ by S. Kumar et al. (2019) <arXiv:1904.09792>.
SpectralMap Diffusion Map and Spectral Map
Implements the diffusion map method of dimensionality reduction and spectral method of combining multiple diffusion maps, including creation of the spectra and visualization of maps.
Spectrum Versatile Ultra-Fast Spectral Clustering for Single and Multi-View Data
A versatile ultra-fast spectral clustering method for single or multi-view data. ‘Spectrum’ uses a new type of adaptive density aware kernel that strengthens local connections in dense regions in the graph. For integrating multi-view data and reducing noise we use a recently developed tensor product graph data integration and diffusion system. ‘Spectrum’ contains two techniques for finding the number of clusters (K); the classical eigengap method and a novel multimodality gap procedure. The multimodality gap analyses the distribution of the eigenvectors of the graph Laplacian to decide K and tune the kernel. ‘Spectrum’ is suited for clustering a wide range of complex data.
spef Semiparametric Estimating Functions
Functions for fitting semiparametric regression models for panel count survival data.
spellcheckr Correct the Spelling of a Given Word in the English Language
Corrects the spelling of a given word in English using a modification of Peter Norvig’s spell-correct algorithm (see <http://…/spell-correct.html>) which handles up to three edits. The algorithm tries to find the spelling with the maximum probability of being the intended correction out of all possible candidate corrections derived from the original word.
spelling Tools for Spell Checking in R
Spell checking common document formats including latex, markdown, manual pages, and description files. Includes utilities to automate checking of documentation and vignettes as a unit test during ‘R CMD check’. Both British and American English are supported out of the box and other languages can be added. In addition, packages may define a ‘wordlist’ to allow custom terminology without having to abuse punctuation.
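For example (paths are placeholders):
    library(spelling)
    spell_check_package("path/to/pkg")               # manual pages, DESCRIPTION, vignettes
    spell_check_files("README.md", lang = "en_GB")   # British English dictionary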
spemd A Bi-Dimensional Implementation of the Empirical Mode Decomposition for Spatial Data
This implementation of the Empirical Mode Decomposition (EMD) works in two dimensions simultaneously and can be applied to spatial data. It can handle both gridded and un-gridded datasets.
spew SPEW Framework for Generating Synthetic Ecosystems
Tools for generating synthetic ecosystems using the SPEW (Synthetic Populations and Ecosystems of the World) framework. We provide functions for the ‘synthesis’ step of SPEW, which converts harmonized data into a synthetic ecosystem. We also provide functions for visualizing and summarizing synthetic ecosystems generated by SPEW. For details see Gallagher, S., Richardson, L.F., Ventura, S.L., Eddy W.F. (2017) <arXiv:1701.02383>.
spex Spatial Extent as Polygons with Projection
Functions to produce a fully fledged ‘Spatial’ object extent as a ‘SpatialPolygonsDataFrame’.
spfrontier Spatial Stochastic Frontier Models Estimation
A set of tools for the estimation of various spatial specifications of stochastic frontier models.
spFSR Feature Selection and Ranking by Simultaneous Perturbation Stochastic Approximation
An implementation of feature selection and ranking via simultaneous perturbation stochastic approximation (SPSA-FSR) based on works by V. Aksakalli and M. Malekipirbazari (2015) <arXiv:1508.07630> and Zeren D. Yenice et al. (2018) <arXiv:1804.05589>. The SPSA-FSR algorithm searches for a locally optimal set of features that yield the best predictive performance using a specified error measure such as mean squared error (for regression problems) or accuracy rate (for classification problems). This package requires an object of class ‘task’ and an object of class ‘Learner’ from the ‘mlr’ package.
spftir Pre-Processing and Analysis of Mid-Infrared Spectral Region
Functions to manipulate, pre-process and analyze spectra in the mid-infrared region. Pre-processing of mid-infrared spectra is a crucial step in spectral analysis. It includes smoothing, offset and baseline correction, and normalization; it is performed before the analysis of the spectra and is essential to obtain conclusive results in subsequent quantitative or qualitative analysis. This package was supported by FONDECYT 3150630, and CIPA Conicyt-Regional R08C1002 is gratefully acknowledged.
spGARCH Spatial ARCH and GARCH Models (spGARCH)
A collection of functions to deal with spatial and spatiotemporal autoregressive conditional heteroscedasticity (spatial ARCH and GARCH models) by Otto, Schmid, Garthoff (2017) <arXiv:1609.00711>: simulation of spatial ARCH-type processes (spARCH, exponential spARCH, complex spARCH); quasi-maximum-likelihood estimation of the parameters of spARCH models and spatial autoregressive models with spARCH disturbances, diagnostic checks, visualizations.
spikes Detecting Election Fraud from Irregularities in Vote-Share Distributions
Applies re-sampled kernel density method to detect vote fraud. It estimates the proportion of coarse vote-shares in the observed data relative to the null hypothesis of no fraud.
SPIn Simulation-efficient Shortest Probability Intervals
An optimal weighting strategy to compute simulation-efficient shortest probability intervals (spins).
SPINA Structure Parameter Inference Approach
Calculates constant structure parameters of endocrine homeostatic systems from equilibrium hormone concentrations. Methods and equations have been described in Dietrich et al. (2012) <doi:10.1155/2012/351864> and Dietrich et al. (2016) <doi:10.3389/fendo.2016.00057>.
spind Spatial Methods and Indices
Functions for spatial methods based on generalized estimating equations (GEE) and wavelet-revised methods (WRM), functions for scaling by wavelet multiresolution regression (WMRR), conducting multi-model inference, and stepwise model selection. Further, contains functions for spatially corrected model accuracy measures.
spinifex Manual Tours, Manual Control of Dynamic Projections of Numeric Multivariate Data
Generates the path for manual tours [Cook & Buja (1997) <doi:10.2307/1390747>]. Tours are generally available in the ‘tourr’ package [Wickham et al. (2011) <doi:10.18637/jss.v040.i02>]. The grand tour is an algorithm that shows all possible projections given sufficient time. The guided tour uses projection pursuit to steer the tour towards interesting projections. The ‘spinifex’ package implements manual control, where the contribution of a selected variable can be adjusted between -1 and 1 to examine the sensitivity of structure in the data to that variable. The result is an animation where the variable is toured into and out of the projection completely, which can be rendered using the ‘gganimate’ and ‘plotly’ packages.
spinyReg Sparse Generative Model and Its EM Algorithm
Implements a generative model that uses a spike-and-slab-like prior distribution obtained by multiplying a deterministic binary vector. Such a model allows an EM algorithm, optimizing a type-II log-likelihood.
splashr Tools to Work with the ‘Splash’ ‘JavaScript’ Rendering and Scraping Service
‘Splash’ <https://…/splash> is a ‘JavaScript’ rendering service. It is a lightweight web browser with an ‘HTTP’ API, implemented in ‘Python’ using ‘Twisted’ and ‘QT’, and provides some of the core functionality of the ‘RSelenium’ or ‘seleniumPipes’ R packages in a lightweight footprint. ‘Splash’ features include the ability to process multiple web pages in parallel; retrieve ‘HTML’ results and/or take screenshots; disable images or use ‘Adblock Plus’ rules to make rendering faster; execute custom ‘JavaScript’ in page context; and get detailed rendering info in ‘HAR’ format.
splines2 Regression Spline Functions and Classes Too
A complementary package on splines providing functions for constructing M-spline, I-spline, and integral of B-spline bases.
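A minimal sketch of the basis constructors (df fixes the number of basis functions):
    library(splines2)
    x <- seq(0, 1, length.out = 101)
    mb <- mSpline(x, df = 6)   # M-spline basis matrix
    ib <- iSpline(x, df = 6)   # I-spline (integrated M-spline) basis
    bb <- ibs(x, df = 6)       # integral of B-spline basis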
splinetree Longitudinal Regression Trees and Forests
Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.
splitFeas Multi-Set Split Feasibility
An implementation of the majorization-minimization (MM) algorithm introduced by Xu, Chi, Yang, and Lange (2017) <arXiv:1612.05614> for solving multi-set split feasibility problems. In the multi-set split feasibility problem, we seek to find a point x in the intersection of multiple closed sets and whose image under a mapping also must fall in the intersection of several closed sets.
splitfngr Combined Evaluation and Split Access of Functions
Some R functions, such as optim(), require a function and its gradient to be passed as separate arguments. When these are expensive to calculate, it may be much faster to calculate the function (fn) and gradient (gr) together, since they often share many calculations (chain rule). This package allows the user to pass in a single function that returns both the function value and gradient, then splits (hence ‘splitfngr’) them so the results can be accessed separately. The functions provided allow this to be done with any number of functions/values, not just functions and gradients.
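The idea can be sketched generically; this illustrates the shared-evaluation concept rather than the package’s exact API:
    fngr <- function(x) list(fn = sum(x^2), gr = 2 * x)   # value and gradient share work
    make_split <- function(fngr) {
      cache_x <- NULL; cache <- NULL
      eval_at <- function(x) {
        if (is.null(cache_x) || !identical(x, cache_x)) {
          cache_x <<- x; cache <<- fngr(x)                # evaluate once, reuse twice
        }
        cache
      }
      list(fn = function(x) eval_at(x)$fn, gr = function(x) eval_at(x)$gr)
    }
    s <- make_split(fngr)
    optim(c(1, 2), fn = s$fn, gr = s$gr, method = "BFGS")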
splithalf Calculate Task Split Half Reliability Estimates
A series of functions to calculate the split-half reliability of RT-based tasks. The core function performs a Monte Carlo procedure to process a user-defined number of random splits in order to provide a better reliability estimate. The current functions target the dot-probe task, but can be modified for other tasks.
SplitReg Split Regularized Regression
Functions for computing split regularized estimators defined in Christidis, Lakshmanan, Smucler and Zamar (2019) <arXiv:1712.03561>. The approach fits linear regression models that split the set of covariates into groups. The optimal split of the variables into groups and the regularized estimation of the regression coefficients are performed by minimizing an objective function that encourages sparsity within each group and diversity among them. The estimated coefficients are then pooled together to form the final fit.
SplitSoftening Softening Splits in Decision Trees
Produces and uses classification trees with probability (soft) splits.
splot Split Plot
Automates common plotting tasks to ease data exploration. Makes density plots (potentially overlaid on histograms), scatter plots with prediction lines, or bar or line plots with error bars. For each type, y, or x and y variables can be plotted at levels of other variables, all with minimal specification.
spm Spatial Predictive Modeling
Introduces some novel, accurate hybrid methods combining geostatistical and machine learning approaches for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided: one for assessing the predictive errors and accuracy of the method based on cross-validation, and one for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <h…/gcat_71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <h…/74030>.
spMaps Europe SpatialPolygonsDataFrame Builder
Build custom Europe SpatialPolygonsDataFrame objects (if you don’t know what a SpatialPolygonsDataFrame is, see SpatialPolygons() in ‘sp’), for example for mapLayout() in ‘antaresViz’. Antares is a powerful software package developed by RTE to simulate and study electric power systems (more information about ‘Antares’ here: <https://antares.rte-france.com>).
spMC Continuous-Lag Spatial Markov Chains
A set of functions is provided for
1) the stratum lengths analysis along a chosen direction,
2) fast estimation of continuous lag spatial Markov chains model parameters and probability computing (also for large data sets),
3) transition probability maps and transiograms drawing,
4) simulation methods for categorical random fields.
spmoran Moran’s Eigenvector-Based Spatial Regression Models
Functions for estimating fixed and random effects eigenvector spatial filtering models.
SpNetPrep Linear Network Preprocessing for Spatial Statistics
Launches a Shiny application that allows users to carry out some of the steps required to curate both a linear network object based on a road structure and a point pattern lying on such a network, two preliminary steps before performing a spatial statistics analysis.
SpNMF Supervised NMF
Non-negative Matrix Factorization (NMF) is a powerful tool for identifying the key features of microbial communities and a dimension-reduction method. When we are interested in the differences between the structures of two groups of communities, supervised NMF (Yun Cai, Hong Gu and Toby Kenney (2017) <doi:10.1186/s40168-017-0323-1>) provides a better way to do this, while retaining all the advantages of NMF, such as interpretability and a basis in simple biological intuition.
spnn Scale Invariant Probabilistic Neural Networks
Scale invariant version of the original PNN proposed by Specht (1990) <doi:10.1016/0893-6080(90)90049-q> with the added functionality of allowing for smoothing along multiple dimensions while accounting for covariances within the data set. It is written in the R statistical programming language. Given a data set with categorical variables, we use this algorithm to estimate the probabilities of a new observation vector belonging to a specific category. This type of neural network provides the benefits of fast training time relative to backpropagation and statistical generalization with only a small set of known observations.
sport Sequential Pairwise Online Rating Techniques
Calculates ratings for two-player or multi-player challenges. The methods included in the package are able to estimate ratings (players’ strengths) and their evolution in time, and to predict the outcome of a challenge. Algorithms are based on the Bayesian Approximation Method and do not involve any matrix inversions or likelihood estimation. Parameters are updated sequentially, and computation does not require any additional RAM, which makes estimation feasible; additionally, the core of the package is written in C++, which makes ‘sport’ computation even faster. Methods used in the package refer to Mark E. Glickman (1999) <http://…/glicko.pdf>; Mark E. Glickman (2001) <doi:10.1080/02664760120059219>; Ruby C. Weng, Chih-Jen Lin (2011) <http://…/weng11a.pdf>; W. Penny, Stephen J. Roberts (1999) <doi:10.1109/IJCNN.1999.832603>.
spotGUI Graphical User Interface for the Package ‘SPOT’
A graphical user interface for the Sequential Parameter Optimization Toolbox (package ‘SPOT’). It includes a quick, graphical setup for spot, interactive 3D plots, export possibilities and more.
spotifyr Pull Track Audio Features from the ‘Spotify’ Web API
A wrapper for pulling track audio features from the ‘Spotify’ Web API <http://…/web-api> in bulk. By automatically batching API requests, it allows you to enter an artist’s name and retrieve their entire discography in seconds, along with audio features and track/album popularity metrics. You can also pull song and playlist information for a given ‘Spotify’ user (including yourself!).
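A short sketch (credentials are placeholders; get_spotify_access_token() reads them from the environment):
    library(spotifyr)
    Sys.setenv(SPOTIFY_CLIENT_ID = "xxxx", SPOTIFY_CLIENT_SECRET = "xxxx")
    token <- get_spotify_access_token()
    beatles <- get_artist_audio_features("the beatles")   # discography with audio features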
spp ChIP-Seq Processing Pipeline
R package for analysis of ChIP-seq and other functional sequencing data.
sppmix Modeling Spatial Poisson and Related Point Processes
Implements classes and methods for modeling spatial point patterns using inhomogeneous Poisson point processes, where the intensity surface is assumed to be analogous to a finite additive mixture of normal components and the number of components is a finite, fixed or random integer. Extensions to the marked inhomogeneous Poisson point processes case are also presented. We provide an extensive suite of R functions that can be used to simulate, visualize and model point patterns, estimate the parameters of the models, assess convergence of the algorithms and perform model selection and checking in the proposed modeling context.
spray Sparse Arrays and Multivariate Polynomials
Sparse arrays interpreted as multivariate polynomials.
spreadr Simulating Spreading Activation in a Network
The notion of spreading activation is a prevalent metaphor in the cognitive sciences. This package provides the tools for cognitive scientists and psychologists to conduct computer simulations that implement spreading activation in a network representation. The algorithmic method implemented in ‘spreadr’ subroutines follows the approach described in Vitevitch, Ercal, and Adagarla (2011, Frontiers), who viewed activation as a fixed cognitive resource that could spread among nodes that were connected to each other via edges or connections (i.e., a network). See Vitevitch, M. S., Ercal, G., & Adagarla, B. (2011). Simulating retrieval from a highly clustered network: Implications for spoken word recognition. Frontiers in Psychology, 2, 369. <doi:10.3389/fpsyg.2011.00369>.
SPREDA Statistical Package for Reliability Data Analysis
The Statistical Package for REliability Data Analysis (SPREDA) implements recently developed statistical methods for the analysis of reliability data. Modern technological developments, such as sensors and smart chips, allow us to dynamically track product/system usage as well as other environmental variables, such as temperature and humidity. We refer to these variables as dynamic covariates. The package contains functions for the analysis of time-to-event data with dynamic covariates and degradation data with dynamic covariates. The package also contains functions that can be used for analyzing time-to-event data with right censoring, and with left truncation and right censoring. Financial support from NSF and DuPont is acknowledged.
SPreg Bias Reduction in the Skew-Probit Model for a Binary Response
Provides a function for the estimation of parameters in a binary regression with the skew-probit link function. Naive MLE, Jeffreys-type prior and Cauchy-prior penalization are implemented, as described in DongHyuk Lee and Samiran Sinha (2019+) <doi:10.1080/00949655.2019.1590579>.
sprintfr An Easy Interface to String Formatting
Makes string formatting easy with an accessible interface.
sprm Sparse and Non-Sparse Partial Robust M Regression
Robust methods for dimension reduction and regression analysis are implemented that yield estimates with a partial least squares alike interpretability. Partial robust M regression is robust to both vertical outliers and leverage points. Sparse partial robust M regression is a related robust method with sparse coefficient estimate, and therefore with intrinsic variable selection.
SPRT Wald’s Sequential Probability Ratio Test
Performs Wald’s Sequential Probability Ratio Test on variables with a normal, Bernoulli, exponential or Poisson distribution. Plot acceptance and continuation regions, or create your own with the help of closures.
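As a generic illustration (not the package’s own interface), Wald’s test for a Bernoulli parameter compares the cumulative log-likelihood ratio against log(A) and log(B), with A = (1 - beta)/alpha and B = beta/(1 - alpha):
    sprt_bernoulli <- function(x, p0, p1, alpha = 0.05, beta = 0.10) {
      llr <- cumsum(x * log(p1 / p0) + (1 - x) * log((1 - p1) / (1 - p0)))
      a <- log((1 - beta) / alpha)    # crossing above: accept H1
      b <- log(beta / (1 - alpha))    # crossing below: accept H0
      last <- tail(llr, 1)
      if (last >= a) "accept H1" else if (last <= b) "accept H0" else "continue"
    }
    sprt_bernoulli(rbinom(30, 1, 0.6), p0 = 0.5, p1 = 0.7)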
spsann Optimization of Sample Configurations using Spatial Simulated Annealing
Methods to optimize sample configurations using spatial simulated annealing. Multiple objective functions are implemented for various purposes, such as variogram estimation, trend estimation, and spatial interpolation. A general purpose spatial simulated annealing function enables the user to define his/her own objective function.
spselect Selecting Spatial Scale of Covariates in Regression Models
Fits spatial scale (SS) forward stepwise regression, SS incremental forward stagewise regression, SS least angle regression (LARS), and SS lasso models. All area-level covariates are considered at all available scales to enter a model, but the SS algorithms are constrained to select each area-level covariate at a single spatial scale.
spsur Spatial Seemingly Unrelated Regression Models
A collection of functions to test and estimate Seemingly Unrelated Regression (usually called SUR) models, with spatial structure, by maximum likelihood and three-stage least squares. The package estimates the most common spatial specifications, that is, SUR with Spatial Lag of X regressors (called SUR-SLX), SUR with Spatial Lag Model (called SUR-SLM), SUR with Spatial Error Model (called SUR-SEM), SUR with Spatial Durbin Model (called SUR-SDM), SUR with Spatial Durbin Error Model (called SUR-SDEM), SUR with Spatial Autoregressive terms and Spatial Autoregressive Disturbances (called SUR-SARAR) and SUR with Spatially Independent Model (called SUR-SIM). The methodology of these models can be found in the following references: Mur, J., Lopez, F., and Herrera, M. (2010) <doi:10.1080/17421772.2010.516443>; Lopez, F.A., Mur, J., and Angulo, A. (2014) <doi:10.1007/s00168-014-0624-2>.
spTDyn Spatially Varying and Spatio-Temporal Dynamic Linear Models
Fits, spatially predicts, and temporally forecasts space-time data using Gaussian Process (GP): (1) spatially varying coefficient process models and (2) spatio-temporal dynamic linear models.
sptemExp Constrained Spatiotemporal Mixed Models for Exposure Estimation
The approach of constrained spatiotemporal mixed models makes reliable estimation of air pollutant concentrations at high spatiotemporal resolution possible (Li, L., Lurmann, F., Habre, R., Urman, R., Rappaport, E., Ritz, B., Chen, J., Gilliland, F., Wu, J., (2017) <doi:10.1021/acs.est.7b01864>). This package is an extensive tool for this modeling approach, with support for block Kriging (Goovaerts, P. (1997) <http://…/229148123.pdf>), and uses PM2.5 modeling as an example. It provides the following functionality: (1) Extraction of covariates from satellite images such as GeoTiff and NC4 rasters; (2) Generation of temporal basis functions to simulate the seasonal trends in the study regions; (3) Generation of the regional monthly or yearly means of air pollutant concentration; (4) Generation of Thiessen polygons and spatial effect modeling; (5) Ensemble modeling for spatiotemporal mixed models, supporting multi-core parallel computing; (6) Integrated predictions with or without weights of the model’s performance, supporting multi-core parallel computing; (7) Constrained optimization to interpolate missing values; (8) Generation of grid surfaces of air pollutant concentration estimates at high resolution; (9) Block Kriging for regional mean estimation at multiple scales.
spTest Nonparametric Hypothesis Tests of Isotropy and Symmetry
Implements nonparametric hypothesis tests to check isotropy and symmetry properties for two-dimensional spatial data.
spup Uncertainty Propagation Analysis
Uncertainty propagation analysis in spatial environmental modelling following methodology described in Heuvelink et al. (2017) <doi:10.1080/13658810601063951> and Brown and Heuvelink (2007) <doi:10.1016/j.cageo.2006.06.015>. The package provides functions for examining the uncertainty propagation starting from input data and model parameters, via the environmental model onto model outputs. The functions include uncertainty model specification, stochastic simulation and propagation of uncertainty using Monte Carlo (MC) techniques. Uncertain variables are described by probability distributions. Both numerical and categorical data types are handled. Spatial auto-correlation within an attribute and cross-correlation between attributes is accommodated for. The MC realizations may be used as input to the environmental models called from R, or externally.
SPUTNIK SPatially aUTomatic deNoising for Ims toolKit
A set of tools for the peak filtering of mass spectrometry imaging data (MSI or IMS) based on spatial distribution of signal. Given a region-of-interest (ROI), representing the spatial region where the informative signal is expected to be localized, a series of filters determine which peak signals are characterized by an implausible spatial distribution. The filters reduce the dataset dimensionality and increase its information vs noise ratio, improving the quality of the unsupervised analysis results, reducing data dimensionality and simplifying the chemical interpretation.
SQB Sequential Bagging on Regression
Methodology: remove one observation, train on the rest of the data (sampled without replacement) and, given this observation’s input, predict its response. Replicate this N times and, for each response, take a sample from these replicates with replacement. Average each sample of responses and again replicate this step N times for each observation, approximating N new responses y’. Train on these y’ and predict, obtaining N responses for each testing observation; the average of the N responses is the final prediction. The same is done for each observation.
SQDA Sparse Quadratic Discriminant Analysis
Sparse Quadratic Discriminant Analysis (SQDA) can be performed. In SQDA, the covariance matrices are assumed to be block-diagonal and, for each block, a sparsity assumption is imposed on the covariance matrix. It is useful in high-dimensional settings.
sqldf Perform SQL Selects on R Data Frames
Manipulate R data frames using SQL.
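For example (note that dots in column names are mapped to underscores for SQL):
    library(sqldf)
    sqldf("SELECT Species, AVG(Sepal_Length) AS mean_len FROM iris GROUP BY Species")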
SqlRender Rendering Parameterized SQL and Translation to Dialects
A rendering tool for parameterized SQL that also translates into different SQL dialects. These dialects include SQL Server, Oracle, PostgreSQL, Amazon Redshift, and Microsoft PDW.
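A short sketch, assuming the render()/translate() pair of recent releases (older versions used renderSql()/translateSql()):
    library(SqlRender)
    sql <- render("SELECT * FROM @schema.person WHERE year_of_birth > @yob",
                  schema = "cdm", yob = 2000)
    translate(sql, targetDialect = "postgresql")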
sqlscore R Utilities for Generating SQL Queries from Model Objects
The sqlscore package provides utilities for generating SQL queries (particularly CREATE TABLE statements) from R model objects. The most important use case is generating SQL to score a generalized linear model or related model represented as an R object, in which case the package handles parsing formula operators and including the model’s response function.
squid Statistical Quantification of Individual Differences
A simulation-based tool made to help researchers to become familiar with multilevel variations, and to build up sampling designs for their study. This tool has two main objectives: First, it provides an educational tool useful for students, teachers and researchers who want to learn to use mixed-effects models. Users can experience how the mixed-effects model framework can be used to understand distinct biological phenomena by interactively exploring simulated multilevel data. Second, it offers research opportunities to those who are already familiar with mixed-effects models, as it enables the generation of data sets that users may download and use for a range of simulation-based statistical analyses such as power and sensitivity analysis of multilevel and multivariate data.
srd Draws Scaled Rectangle Diagrams
Draws scaled rectangle diagrams to represent a 2^k contingency table, for k <= 6.
sRDA Sparse Redundancy Analysis
Sparse redundancy analysis for high dimensional (biomedical) data. Directional multivariate analysis to express the maximum variance in the predicted data set by a linear combination of variables of the predictive data set. Implemented in a partial least squares framework, for more details see Csala et al. (2017) <doi:10.1093/bioinformatics/btx374>.
srp Smooth-Rough Partitioning of the Regression Coefficients
Performs the change-point detection in regression coefficients of linear model by partitioning the regression coefficients into two classes of smoothness. The change-point and the regression coefficients are jointly estimated.
srvyr ‘dplyr’-Like Syntax for Summary Statistics of Survey Data
Use piping, verbs like ‘group_by’ and ‘summarize’, and other ‘dplyr’ inspired syntactic style when calculating summary statistics on survey data using functions from the ‘survey’ package.
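For example, using the api data shipped with the ‘survey’ package:
    library(srvyr)
    data(api, package = "survey")
    apistrat %>%
      as_survey_design(strata = stype, weights = pw) %>%
      group_by(stype) %>%
      summarize(api00_mean = survey_mean(api00))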
ssa Simultaneous Signal Analysis
Procedures for analyzing simultaneous signals, e.g., features that are simultaneously significant in two different studies. Includes methods for detecting simultaneous signals and for identifying them under false discovery rate control.
SSBtools Statistics Norway’s Miscellaneous Small Tools
Small functions used by other packages from Statistics Norway are gathered. Both general data manipulation functions and some more special functions for statistical disclosure control are included. One reason for a separate package is possible reuse of the functions within a Renjin environment.
ssc Semi-Supervised Classification Methods
Provides a collection of self-labeled techniques for semi-supervised classification. In semi-supervised classification, both labeled and unlabeled data are used to train a classifier. This learning paradigm has obtained promising results, specifically in the presence of a reduced set of labeled examples. This package implements a collection of self-labeled techniques to construct a distance-based classification model. This family of techniques enlarges the original labeled set using the most confident predictions to classify unlabeled data. The techniques implemented can be applied to classification problems in several domains by the specification of a suitable base classifier and distance measure. At low ratios of labeled data, it can be shown to perform better than classical supervised classifiers.
sscor Robust Correlation Estimation and Testing Based on Spatial Signs
Provides the spatial sign correlation and the two-stage spatial sign correlation as well as a one-sample test for the correlation coefficient.
ssd Sample Size Determination (SSD) for Unordered Categorical Data
ssd calculates the sample size needed to detect the differences between two sets of unordered categorical data.
sSDR Tools Developed for Structured Sufficient Dimension Reduction (sSDR)
Performs groupwise OLS (gOLS) and groupwise SIR (gSIR).
ssev Sample Size Computation for Fixed N with Optimal Reward
Computes the optimal sample size for various 2-group designs (e.g., when comparing the means of two groups assuming equal variances, unequal variances, or comparing proportions) when the aim is to maximize the rewards over the full decision procedure of a) running a trial (with the computed sample size), and b) subsequently administering the winning treatment to the remaining N-n units in the population. Sample sizes and expected rewards for standard t- and z-tests are also provided.
ssfa Spatial Stochastic Frontier Analysis
Spatial Stochastic Frontier Analysis (SSFA) is an original method for controlling the spatial heterogeneity in Stochastic Frontier Analysis (SFA) models by splitting the inefficiency term into three terms: the first one related to spatial peculiarities of the territory in which each single unit operates, the second one related to the specific production features and the third one representing the error term.
ssgraph Bayesian Graphical Estimation using Spike-and-Slab Priors
For Bayesian inference in undirected graphical models using spike-and-slab priors, for multivariate continuous data. The package implements recent improvements in the Bayesian graphical models literature, including Wang (2015) <doi:10.1214/14-BA916>.
ssh Secure Shell (SSH) Client for R
Connect to a remote server over SSH to transfer files via SCP, setup a secure tunnel, or run a command or script on the host while streaming stdout and stderr directly to the client.
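A typical session (host name is a placeholder):
    library(ssh)
    session <- ssh_connect("user@example.com")
    ssh_exec_wait(session, command = "whoami")            # stream stdout/stderr back
    scp_upload(session, files = "data.csv", to = "~/")    # transfer a file via SCP
    ssh_disconnect(session)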
sskm Stable Sparse K-Means
Achieves feature selection by taking subsamples of the data and running sparse k-means on each subsample. Only features that received positive weights in a high proportion of subsamples are retained. Standard k-means is then run to cluster the data based on the subset of selected features.
SSL Semi-Supervised Learning
Semi-supervised learning has attracted the attention of the machine learning community because of its high accuracy with less annotation effort compared with supervised learning. The question that semi-supervised learning addresses is: given a relatively small labeled dataset and a large unlabeled dataset, how can classification algorithms be designed to learn from both? This package is a collection of classical semi-supervised learning algorithms from the last few decades.
SSLASSO The Spike-and-Slab LASSO
Efficient algorithms for fitting regularization paths for linear models penalized by Spike-and-Slab LASSO.
SSM Fit and Analyze Smooth Supersaturated Models
Creates an S4 class ‘SSM’ and defines functions for fitting smooth supersaturated models, a polynomial model with spline-like behaviour. Functions are defined for the computation of Sobol indices for sensitivity analysis and plotting the main effects using FANOVA methods. It also implements the estimation of the SSM metamodel error using a GP model with a variety of defined correlation functions.
SSMMATLAB
ssmn Skew Scale Mixtures of Normal Distributions
Performs the EM algorithm for regression models using Skew Scale Mixtures of Normal Distributions.
ssMousetrack Bayesian State-Space Modeling of Mouse-Tracking Experiments via Stan
Estimates previously compiled state-space modeling for mouse-tracking experiments using the ‘rstan’ package, which provides the R interface to the Stan C++ library for Bayesian estimation.
ssmsn Scale-Shape Mixtures of Skew-Normal Distributions
It provides the density and random number generator for the Scale-Shape Mixtures of Skew-Normal Distributions proposed by Jamalizadeh and Lin (2016) <doi:10.1007/s00180-016-0691-1>.
SSOSVM Stream Suitable Online Support Vector Machines
Soft-margin support vector machines (SVMs) are a common class of classification models. The training of SVMs usually requires that the data be available all at once in a single batch; however, the stochastic majorization-minimization (SMM) algorithm framework allows for the training of SVMs on streamed data instead; see Nguyen, Jones & McLachlan (2018) <doi:10.1007/s42081-018-0001-y>. This package utilizes the SMM framework to provide functions for training SVMs with hinge loss, squared-hinge loss, and logistic loss.
SSRA Sakai Sequential Relation Analysis
Takeya Semantic Structure Analysis (TSSA) and Sakai Sequential Relation Analysis (SSRA) for polytomous items for examining whether each pair of items has a sequential or equal relation. Package includes functions for generating a sequential relation table and a treegram to visualize sequential or equal relations between pairs of items.
ssrm.logmer Sample Size Determination for Longitudinal Designs with Binary Outcome
Provides the necessary sample size for a longitudinal study with binary outcome in order to attain a pre-specified power while strictly maintaining the Type I error rate. Kapur K, Bhaumik R, Tang XC, Hur K, Reda DJ, Bhaumik D (2014) <doi:10.1002/sim.6203>.
SSRMST Sample Size Calculation using Restricted Mean Survival Time
Calculates the power and sample size based on the difference in Restricted Mean Survival Times.
Sstack Bootstrap Stacking of Random Forest Models for Heterogeneous Data
Generates and predicts a set of linearly stacked Random Forest models using bootstrap sampling. Individual datasets may be heterogeneous (not all samples have full sets of features). Contains support for parallelization but the user should register their cores before running. This is an extension of the method found in Matlock (2018) <doi:10.1186/s12859-018-2060-2>.
stable Probability Functions and Generalized Regression Models for Stable Distributions
Density, distribution, quantile and hazard functions of a stable variate; generalized regression models for the parameters of a stable distribution.
StableEstim Estimate the Four Parameters of Stable Laws using Different Methods
Estimate the four parameters of stable laws using maximum likelihood method, generalised method of moments with finite and continuum number of points, iterative Koutrouvelis regression and Kogon-McCulloch method. The asymptotic properties of the estimators (covariance matrix, confidence intervals) are also provided.
stablelearner Stability Assessment of Statistical Learning Methods
Graphical and computational methods that can be used to assess the stability of results from supervised statistical learning.
stablespec Stable Specification Search in Structural Equation Models
An exploratory and heuristic approach to specification search in Structural Equation Modeling. The basic idea is to subsample the original data and search for the optimal models on each subset. Here an optimal model is defined by two objectives: fitting the data well and being simple (parsimonious). As these objectives conflict, NSGA-II is applied to obtain optimal models across the whole range of model complexities. From these optimal models, the model specifications (structures) that are both stable and parsimonious, called relevant structures, are observed. Finally, a causal model is inferred from these relevant structures.
stabm Stability Measures for Feature Selection
An implementation of many measures for the assessment of the stability of feature selection. Both simple measures and measures which take into account the similarities between features are available, see Bommert et al. (2017) <doi:10.1155/2017/7907163>.
stackoverflow Stack Overflow’s Greatest Hits
Consists of helper functions collected from StackOverflow.com, a question and answer site for professional and enthusiast programmers.
StagedChoiceSplineMix Mixture of Two-Stage Logistic Regressions with Fixed Candidate Knots
Analyzing a mixture of two-stage logistic regressions with fixed candidate knots. See Bruch, E., F. Feinberg, K. Lee (in press) <DOI:10.1073/pnas.1522494113>.
StakeholderAnalysis Measuring Stakeholder Influence
Proposes an original instrument for measuring stakeholder influence on the development of an infrastructure project that is carried through by a municipality. Hester, P.T., & Adams, K.M. (2013) <doi:10.1016/j.procs.2013.09.282>; Hester, P.T., Bradley, J.M., MacGregor K.A. (2012) <doi:10.1504/IJSSE.2012.052687>.
stampr Spatial Temporal Analysis of Moving Polygons
Perform spatial-temporal analysis of moving polygons, a longstanding analysis problem in Geographic Information Systems. Facilitates directional analysis, shape analysis, and some other simple functionality for examining spatial-temporal patterns of moving polygons.
STAND Statistical Analysis of Non-Detects
Provides functions for the analysis of occupational and environmental data with non-detects. Maximum likelihood (ML) methods for censored log-normal data and non-parametric methods based on the product limit estimate (PLE) for left censored data are used to calculate all of the statistics recommended by the American Industrial Hygiene Association (AIHA) for the complete data case. Functions for the analysis of complete samples using exact methods are also provided for the lognormal model. Revised from 2007-11-05 ‘survfit~1’.
standardize Tools for Standardizing Variables for Regression in R
Tools which allow regression variables to be placed on similar scales, offering computational benefits as well as easing interpretation of regression output.
StanHeaders C++ Header Files for Stan
The C++ header files associated with the Stan project. Stan is a probabilistic programming language implementing full Bayesian statistical inference with MCMC sampling and penalized maximum likelihood estimation with optimization. See http://mc-stan.org for more information.
stapler Simultaneous Truth and Performance Level Estimation
An implementation of Simultaneous Truth and Performance Level Estimation (STAPLE) <doi:10.1109/TMI.2004.828354>. This method is used when there are multiple raters for an object, typically an image, and this method fuses these ratings into one rating. It uses an expectation-maximization method to estimate this rating and the individual specificity/sensitivity for each rater.
staplr A Toolkit for PDF Files
Provides functions to manipulate PDF files: merge multiple PDF files into one; split a single input PDF document into individual pages; remove selected pages from a file; rename multiple files in a directory.
starma Modelling STARMA Processes
Statistical functions to identify, estimate and diagnose a STARMA model.
starmie Population Structure Model Inference and Visualisation
Data structures and methods for manipulating output of genetic population structure clustering algorithms. ‘starmie’ can parse output from ‘STRUCTURE’ (see <https://…/structure.html> for details) or ‘ADMIXTURE’ (see <https://…/> for details). ‘starmie’ performs model selection via information criterion, and provides functions for MCMC diagnostics, correcting label switching and visualisation of admixture coefficients.
stars Scalable, Spatiotemporal Tidy Arrays for R
Support for Scalable, Spatiotemporal Tidy Arrays in R, using GDAL bindings.
STARTdesign Single to Double Arm Transition Design for Phase II Clinical Trials
Calibrates the design parameters for the single-to-double arm transition design proposed by Shi and Yin (2017). The calibration is performed via numerical enumeration to find the optimal design that satisfies the constraints on the type I and II error rates.
startR Automatically Retrieve Multidimensional Distributed Data Sets
Tool to automatically fetch, transform and arrange subsets of multidimensional data sets (collections of files) stored in local and/or remote file systems or servers, using multicore capabilities where possible. The tool provides an interface to perceive a collection of data sets as a single large multidimensional data array, and enables the user to request automatic retrieval, processing and arrangement of subsets of the large array. Wrapper functions to add support for custom file formats can be plugged in/out, making the tool suitable for any research field where large multidimensional data sets are involved.
STARTS Functions for the STARTS Model
Contains functions for estimating the STARTS model of Kenny and Zautra (1995, 2001) <DOI:10.1037/0022-006X.63.1.52>, <DOI:10.1037/10409-008>.
startup Friendly R Startup Configuration
Adds support for R startup configuration via ‘.Renviron.d’ and ‘.Rprofile.d’ directories in addition to ‘.Renviron’ and ‘.Rprofile’ files. This makes it possible to keep private / secret environment variables separate from other environment variables. It also makes it easier to share specific startup settings by simply copying a file to a directory.
STAT Interactive Document for Working with Basic Statistical Analysis
An interactive document on the topic of basic statistical analysis using the ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided within the package functions as well as at <https://…/>.
statar Tools Inspired by ‘Stata’ to Manipulate Tabular Data
A set of tools inspired by ‘Stata’ to explore data.frames (‘summarize’, ‘tabulate’, ‘xtile’, ‘pctile’, ‘binscatter’, elapsed quarters/month, lead/lag).
statcomp Statistical Complexity and Information Measures for Time Series Analysis
An implementation of local and global statistical complexity measures (aka Information Theory Quantifiers, ITQ) for time series analysis based on ordinal statistics (Bandt and Pompe (2002) <DOI:10.1103/PhysRevLett.88.174102>). Several distance measures that operate on ordinal pattern distributions, auxiliary functions for ordinal pattern analysis, and generating functions for stochastic and deterministic-chaotic processes for ITQ testing are provided.
statGraph Statistical Methods for Graphs
Contains statistical methods to analyze graphs, such as graph parameter estimation, model selection based on the GIC (Graph Information Criterion), statistical tests to discriminate two or more populations of graphs (ANOGVA - Analysis of Graph Variability), correlation between graphs, and clustering of graphs.
stationery Working Examples for Reproducible Research Documents
Templates, guides, and scripts for writing documents in ‘LaTeX’ and ‘R markdown’ to produce guides, slides, and reports. Special care is taken to illustrate use of templates and customization opportunities. Challenges and opportunities of ‘HTML’ output from ‘R markdown’ receive special attention. Includes several vignettes to assist new users of literate programming.
statip Miscellaneous Basic Statistical Functions
A collection of miscellaneous statistical functions for probability distributions: dbern(), pbern(), qbern(), rbern() for the Bernoulli distribution, and distr2name(), name2distr() for distribution names; probability density estimation: densityfun(); most frequent value estimation: mfv(), mfv1(); calculation of the Hellinger distance: hellinger(); use of classical kernels: kernelfun(), kernel_properties().
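For example, using the functions listed above (a minimal sketch):
  library(statip)
  x <- c(2, 3, 3, 5, 3, 2)
  mfv(x)                        # most frequent value: 3
  dbern(1, prob = 0.7)          # Bernoulli density at 1: 0.7
  f <- densityfun(rnorm(100))   # density estimate, returned as a function
  f(0)                          # estimated density at 0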
statisticalModeling Functions for Teaching Statistical Modeling
Provides graphics and other functions that evaluate and display models across many different kinds of model architecture. For instance, you can evaluate the effect size of a model input in the same way, regardless of architecture, interaction terms, etc.
StatMeasures Easy Data Manipulation, Data Quality and Statistical Checks
Offers useful functions to perform day-to-day data manipulation operations, data quality checks and post-modelling statistical checks. One can effortlessly change the class of a number of variables to factor, remove duplicate observations from the data, create deciles of a variable, perform data quality checks for continuous (integer or numeric), categorical (factor) and date variables, and compute goodness-of-fit measures such as AUC for statistical models. The functions are consistent for objects of class ‘data.frame’ and ‘data.table’, which is an enhanced ‘data.frame’ implemented in the package ‘data.table’.
statnetWeb A Graphical User Interface for Network Modeling with ‘Statnet’
A graphical user interface for network modeling with the ‘statnet’ software.
statoo Miscellaneous Basic Statistical Functions
A collection of miscellaneous statistical functions for probability distributions: ‘dbern’, ‘pbern’, ‘qbern’, ‘rbern’ for the Bernoulli distribution, and ‘distr2name’, ‘name2distr’ for distribution names; probability density estimation (‘densityfun’); most frequent value estimation (‘mfv’, ‘mfv1’); calculation of the Hellinger distance (‘hellinger’); use of classical kernels (‘kernelfun’, ‘kernel_properties’).
staTools Statistical Tools for Social Network Analysis
A collection of statistical tools for social network analysis, with strong emphasis on the analysis of discrete power-law distributions and statistical hypothesis tests.
StatPerMeCo Statistical Performance Measures to Evaluate Covariance Matrix Estimates
Statistical performance measures used in the econometric literature to evaluate conditional covariance/correlation matrix estimates (MSE, MAE, Euclidean distance, Frobenius distance, Stein distance, asymmetric loss function, eigenvalue loss function and the loss function defined in Eq. (4.6) of Engle et al. (2016) <doi:10.2139/ssrn.2814555>). Additionally, compute Eq. (3.1) and (4.2) of Li et al. (2016) <doi:10.1080/07350015.2015.1092975> to compare the factor loading matrix. The statistical performance measures implemented have been previously used in, for instance, Laurent et al. (2012) <doi:10.1002/jae.1248>, Amendola et al. (2015) <doi:10.1002/for.2322> and Becker et al. (2015) <doi:10.1016/j.ijforecast.2013.11.007>.
statquotes Quotes on Statistics, Data Visualization and Science
Generates a random quotation from a database of quotes on topics in statistics, data visualization and science.
StatRank Statistical Rank Aggregation: Inference, Evaluation, and Visualization
A set of methods implementing Generalized Method of Moments and Maximum Likelihood methods for Random Utility Models. These methods are meant to provide inference on rank comparison data. They accept full, partial, and pairwise rankings, and provide methods to break down full or partial rankings into their pairwise components. Please see ‘Generalized Method-of-Moments for Rank Aggregation’ from NIPS 2013 for a description of some of these methods.
stats R statistical functions
This package contains functions for statistical calculations and random number generation. For a complete list of functions, use library(help = ‘stats’).
statVisual Statistical Visualization Tools
Visualization functions for applications in translational medicine (TM) and biomarker (BM) development, used to compare groups by statistically visualizing data and/or analysis results. Examples include displaying different groups’ histograms, boxplots, densities, scatter plots, error-bar plots, or trajectory plots in one figure; displaying scatter plots of top principal components or dendrograms with data points colored by group; and visualizing volcano plots to check the results of whole-genome analyses for differential gene expression.
STB Simultaneous Tolerance Bounds
Provides an implementation of simultaneous tolerance bounds (STB), useful for checking whether a numeric vector fits a hypothetical null distribution or not. Furthermore, there are functions for computing STB (bands, intervals) for random variates of linear mixed models fitted with package ‘VCA’. All kinds of possibly transformed (studentized, standardized, Pearson-type) random variates (residuals, random effects) can be assessed employing STB methodology.
stcm Tools for Inference with Set-Theoretic Comparative Methods
Provides a number of functions for carrying out inference with set-theoretic comparative methods, including facilities for examining causal paths, assessing the sensitivity of results to measurement and model specification error, and performing Random Forest Comparative Analysis.
stcos Space-Time Change of Support
Spatio-temporal change of support (STCOS) methods are designed for statistical inference on geographic and/or time domains that differ from those on which the data were observed. ‘stcos’ implements a parsimonious class of Bayesian hierarchical spatio-temporal models for STCOS with Gaussian outcomes introduced by Bradley, Wikle, and Holan <doi:10.1002/sta4.94>.
stcov Stein’s Covariance Estimator
Estimates a covariance matrix using Stein’s isotonized covariance estimator, or a related estimator suggested by Haff.
stddiff Calculate the Standardized Difference for Numeric, Binary and Category Variables
Contains three main functions including stddiff.numeric(), stddiff.binary() and stddiff.category(). These are used to calculate the standardized difference between two groups. It is especially used to evaluate the balance between two groups before and after propensity score matching.
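For example, a minimal sketch assuming the column-index interface (gcol for the group column, vcol for the covariate columns) and hypothetical data:
  library(stddiff)
  df <- data.frame(group = rbinom(100, 1, 0.5),   # hypothetical two-group data
                   age   = rnorm(100, 50, 10),
                   bmi   = rnorm(100, 25, 3))
  stddiff.numeric(data = df, gcol = 1, vcol = 2:3)  # standardized differences for age and bmi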
stdReg Regression Standardization
Contains functionality for regression standardization. Two general classes of models are allowed: generalized linear models and Cox proportional hazards models.
stdvectors C++ Standard Library Vectors in R
Allows the creation and manipulation of C++ std::vector objects in R.
steadyICA ICA and Tests of Independence via Multivariate Distance Covariance
Functions related to multivariate measures of independence and ICA: estimate independent components by minimizing distance covariance; conduct a test of mutual independence based on distance covariance; estimate independent components via infomax (a popular method that generally performs worse than mdcovica, ProDenICA, and/or fastICA, but is useful for comparisons); order independent components by skewness; match independent components from multiple estimates; and other functions useful in ICA.
SteinerNet Steiner Tree Approach for Graph Analysis
A set of graph functions to find Steiner trees on graphs. It provides tools for analysing Steiner tree applications on networks, with applications in biological pathway network analysis (Sadeghi 2013) <doi:10.1186/1471-2105-14-144>.
SteinIV Semi-Parametric Stein-Like Estimator with Instrumental Variables
Routines for computing different types of linear estimators, based on instrumental variables (IVs), including the semi-parametric Stein-like (SPS) estimator, originally introduced by Judge and Mittelhammer (2004) <DOI:10.1198/016214504000000430>.
stellaRbase A ‘Stellar’ Client
An R wrapper for interacting with Horizon. Horizon is a RESTful API that allows applications to query the ‘Stellar’ payments network and pull or stream data in real time. A full overview of Horizon can be found at <https://…/>.
StempCens Spatio-Temporal Estimation and Prediction for Censored/Missing Responses
Estimates the parameters of spatio-temporal models for censored or missing data using the SAEM algorithm (Delyon et al., 1999 <doi:10.1214/aos/1018031103>). This algorithm is a stochastic approximation of the widely used EM algorithm and an important tool for models in which the E-step does not have an analytic form. Besides the expressions used to estimate the parameters of the proposed model, it includes the calculations for the observed information matrix using the method developed by Louis (1982) <https://…/2345828>. To examine the performance of the fitted model, case-deletion measures are provided.
stepPenal Stepwise Forward Variable Selection in Penalized Regression
Model selection based on combined penalties. This package implements a stepwise forward variable selection algorithm based on a penalized likelihood criterion that combines the L0 norm with the L2 or L1 norm.
stepR Fitting Step-Functions
Allows fitting of step-functions to univariate serial data where neither the number of jumps nor their positions are known.
StepReg Stepwise Regression Analysis
Stepwise regression analysis for variable selection can be used to obtain the best candidate final regression model in univariate or multivariate regression analysis with ‘forward’ and ‘stepwise’ steps. The procedure uses the Akaike information criterion, the small-sample-size corrected version of the Akaike information criterion, the Bayesian information criterion, the Hannan and Quinn information criterion, the corrected form of the Hannan and Quinn information criterion, the Schwarz criterion and significance levels as selection criteria, where the significance levels for entry and for stay are set to 0.15 by default. Multicollinearity detection in the regression model is performed by checking the tolerance value, which is set to 1e-7 by default. Continuous variables nested within a class effect are also considered in this package.
steps Spatially- and Temporally-Explicit Population Simulator
Software to simulate population dynamics across space and time.
StepSignalMargiLike Step-Wise Signal Extraction via Marginal Likelihood
Provides functions to estimate multiple change points using the marginal likelihood method. See the manual file in the data folder for a detailed description of all functions and a walk-through tutorial. For more information on the method, please see Du, Kao and Kou (2016) <doi:10.1080/01621459.2015.1006365>.
StepwiseTest Multiple Testing Method to Control Generalized Family-Wise Error Rate and False Discovery Proportion
Collection of stepwise procedures to conduct multiple hypotheses testing. The details of the stepwise algorithm can be found in Romano and Wolf (2007) <DOI:10.1214/009053606000001622> and Hsu, Kuan, and Yen (2014) <DOI:10.1093/jjfinec/nbu014>.
stevedore Docker Client
Work with containers over the Docker API. Rather than using system calls to interact with a docker client, using the API directly means that we can receive richer information from docker. The interface in the package is automatically generated using the ‘OpenAPI’ (a.k.a., ‘swagger’) specification, and all return values are checked in order to make them type stable.
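For example, a minimal sketch assuming a running local Docker daemon:
  library(stevedore)
  docker <- docker_client()                  # interface auto-generated from the OpenAPI spec
  res <- docker$container$run("alpine:3.8", c("echo", "hello world"))
  res$logs                                   # type-stable return value with the container output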
stheoreme Klimontovich’s S-Theorem Algorithm Implementation and Data Preparation Tools
Functions implementing the procedure of entropy comparison between two data samples after the renormalization of respective probability distributions with the algorithm designed by Klimontovich (Zeitschrift fur Physik B Condensed Matter. 1987, Volume 66, Issue 1, pp 125-127) and extended by Anishchenko (Proc. SPIE 2098, Computer Simulation in Nonlinear Optics. 1994, pp.130-136). The package also includes data preparation tools which can also be used separately for various applications.
sticky Persist Object Attributes Across Data Operations
In base R, object attributes are lost when objects are modified by common data operations such as subset, filter, slice, append, extract, etc. This package allows objects to be marked as ‘sticky’ and have resilient attributes that persist during these operations or when inserted into or extracted from recursive (i.e. list- or table-like) objects.
stima Simultaneous Threshold Interaction Modeling Algorithm
Regression trunk model estimation proposed by Dusseldorp and Meulman (2004) <doi:10.1007/bf02295641> and Dusseldorp, Conversano, Van Os (2010) <doi:10.1198/jcgs.2010.06089>, integrating a regression tree and a multiple regression model.
stlplus Enhanced Seasonal Decomposition of Time Series by Loess
Decompose a time series into seasonal, trend, and remainder components using an implementation of Seasonal Decomposition of Time Series by Loess (STL) that provides several enhancements over the STL method in the stats package. These enhancements include handling missing values, providing higher order (quadratic) loess smoothing with automated parameter choices, frequency component smoothing beyond the seasonal and trend components, and some basic plot methods for diagnostics.
stm Estimation of the Structural Topic Model
The Structural Topic Model (STM) allows researchers to estimate topic models with document-level covariates. The package also includes tools for model selection, visualization, and estimation of topic-covariate regressions.
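For example, a minimal sketch of the standard workflow using the bundled ‘gadarian’ data:
  library(stm)
  processed <- textProcessor(gadarian$open.ended.response, metadata = gadarian)
  out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
  fit <- stm(out$documents, out$vocab, K = 3,
             prevalence = ~ treatment, data = out$meta)   # document-level covariate
  labelTopics(fit)                                        # inspect top words per topic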
stmBrowser Structural Topic Model Browser
This visualization allows users to interactively explore the relationships between topics and the covariates estimated from the stm package in R.
stmCorrViz A Tool for Structural Topic Model Visualizations
Generates an interactive visualization of topic correlations/hierarchy in a Structural Topic Model (STM) of Roberts, Stewart, and Tingley. The package performs a hierarchical clustering of topics which are then exported to a JSON object and visualized using D3.
STMedianPolish Spatio-Temporal Median Polish
Analysis of spatio-temporal data using decomposition into n-dimensional arrays and the median polish technique.
stmgui Shiny Application for Creating STM Models
Provides an application that acts as a GUI for the ‘stm’ text processing package.
stminsights A ‘Shiny’ Application for Inspecting Structural Topic Models
This app enables interactive validation, interpretation and visualization of structural topic models from the ‘stm’ package by Roberts and others (2014) <doi:10.1111/ajps.12103>. It also includes helper functions for model diagnostics and extracting data from effect estimates.
StMoMo Stochastic Mortality Modelling
Implementation of the family of generalised age-period-cohort stochastic mortality models. This family of models encompasses many models proposed in the actuarial and demographic literature including the Lee-Carter model and the Cairns-Blake-Dowd model. It includes functions for fitting mortality models, analysing their goodness-of-fit and performing mortality projections and simulations.
STMotif Discovery of Motifs in Spatial-Time Series
Discovery of motifs in a dataset containing numeric values. A motif is a previously unknown subsequence of a time series with a relevant number of occurrences. To discover motifs, the Combined Series Approach (CSA) is used.
StochKit2R Efficient Discrete Stochastic Simulation
Efficient discrete stochastic simulation using the Gillespie algorithm (aka the Stochastic Simulation Algorithm or SSA) and adaptive tau-leaping. Provides an R interface to the simulation algorithms, as well as functions for visualizing (plotting) simulation output.
stopwords Multilingual Stopword Collection
Exposes the full Stopwords ISO dataset as an easy to use data structure.
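For example (the default source is ‘snowball’; ‘stopwords-iso’ exposes the full ISO lists):
  library(stopwords)
  head(stopwords("en"))                              # "i" "me" "my" ...
  length(stopwords("de", source = "stopwords-iso"))  # full ISO list for German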
storr Simple Key Value Stores
Creates and manages simple key-value stores. These can use a variety of approaches for storing the data. This package implements the base methods and support for file system and in-memory stores. A vignette shows how additional drivers can be created, and stubs exist for supporting ‘Redis’ databases.
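For example, a minimal sketch using the file-system (RDS) driver:
  library(storr)
  st <- storr_rds(tempfile())        # file-system backed key-value store
  st$set("cars", mtcars)
  identical(st$get("cars"), mtcars)  # TRUE
  st$list()                          # "cars"
  st$destroy()                       # remove the store and its files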
stoRy Theme Enrichment Analysis for Stories
An implementation of the hypergeometric test to check for over-represented themes in a storyset relative to a background set of stories.
stplanr Sustainable Transport Planning
Functionality and data access tools for transport planning, including origin-destination analysis, route allocation and modelling travel patterns.
stpm Stochastic Model for Analysis of Longitudinal Data
Utilities to estimate parameters of stochastic processes and model survival trajectories and time-to-event outcomes observed in longitudinal studies. Miscellaneous functions for data preparation are also provided. For more information, see: ‘Stochastic model for analysis of longitudinal data on aging and mortality’ by Yashin A. et al, 2007, Mathematical Biosciences, 208(2), 538-551 <DOI:10.1016/j.mbs.2006.11.006>.
stpp Space-Time Point Pattern simulation, visualisation and analysis
A package for analysing, simulating and displaying space-time point patterns.
stR STR Decomposition
Methods for decomposing seasonal data: STR (a Seasonal-Trend decomposition procedure based on Regression) and Robust STR. In some ways, STR is similar to Ridge Regression and Robust STR can be related to LASSO. They allow for multiple seasonal components, multiple linear covariates with constant, flexible and seasonal influence. Seasonal patterns (for both seasonal components and seasonal covariates) can be fractional and flexible over time; moreover they can be either strictly periodic or have a more complex topology. The methods provide confidence intervals for the estimated components. The methods can be used for forecasting.
stranger Simple Toolkit in R for ANomalies Get, Explain and Report
Framework for unsupervised anomaly detection that simplifies the user experience: one does not need to be concerned with the many packages and functions otherwise required. Package ‘stranger’ acts as a wrapper around existing packages (‘à la ‘caret’ for modeling’) and provides a clean and uniform toolkit for evaluation/explanation/reporting purposes.
strapgod Resampled Data Frames
Create data frames with virtual groups that can be used with ‘dplyr’ to efficiently compute resampled statistics, generate the data for hypothetical outcome plots, and fit multiple models on resampled variations of the original data.
strat An Implementation of the Stratification Index
An implementation of the stratification index proposed by Zhou (2012) <DOI:10.1177/0081175012452207>. The package provides two functions, srank, which returns stratum-specific information, including population share and average percentile rank; and strat, which returns the stratification index and its approximate standard error. When a grouping factor is specified, strat also provides a detailed decomposition of the overall stratification into between-group and within-group components.
stratbr Optimal Stratification in Stratified Sampling
An optimization algorithm applied to the univariate stratification problem. This function aims at constructing optimal strata with an optimization algorithm based on a global optimization technique called Biased Random Key Genetic Algorithms.
strategicplayers Strategic Players
Identifies individuals in a social network who should be the intervention subjects for a network intervention in which you have a group of targets, a group of avoiders, and a group that is neither.
Strategy Generic Framework to Analyze Trading Strategies
Users can build and test customized quantitative trading strategies. Some quantitative trading strategies are already implemented, e.g. various moving-average filters with trend following approaches. The implemented class called ‘Strategy’ allows users to access several methods to analyze performance figures, plots and backtest the strategies. Furthermore, custom strategies can be added, a generic template is available. The custom strategies require a certain input and output so they can be called from the Strategy-constructor.
stratEst Strategy Estimation
Implements variants of the strategy frequency estimation method by Dal Bo & Frechette (2011) <doi:10.1257/aer.101.1.411>, including its adaptation to behavioral memory-one Markov strategies by Breitmoser (2015) <doi:10.1257/aer.20130675>, and the extension in the spirit of latent-class regression by Dvorak & Fehrler (2018) <doi:10.2139/ssrn.2986445>.
StratifiedBalancing Performs Stratified Covariate Balancing for Data with Discrete and Continuous Outcome Variables
Stratified covariate balancing through naturally occurring strata to adjust for confounding and interaction effects. Contains 2 primary functions which perform stratification and return adjusted odds along with naturally occurring strata.
StratifiedRF Builds Trees by Sampling Variables from Groups
Random Forest that works with groups of predictor variables. When building a tree, a number of variables is taken randomly from each group separately, thus ensuring that it contains variables from each group. Useful when rows contain information about different things (e.g. user information and product information) and it’s not sensible to make a prediction with information from only one group of variables, or when there are far more variables from one group than the other and it’s desired to have groups appear evenly on trees. Trees are grown using the C5.0 algorithm. Currently works for classification only.
stratifyR Optimal Stratification of Univariate Populations
Implements the stratification of univariate populations under stratified sampling designs using the method of Khan et al. (2002) <doi:10.1177/0008068320020518>, Khan et al. (2008) (<http://…/10761-eng.pdf>) and Khan et al. (2015) <doi:10.1080/02664763.2015.1018674>. It determines the Optimum Strata Boundaries (OSB) and Optimum Sample Sizes (OSS) for the study variable, y, using the best-fit frequency distribution of a survey variable (if data are available) or a hypothetical distribution (if data are not available). The method formulates the problem of determining the OSB as a mathematical programming problem which is solved by using a dynamic programming technique. If a dataset of the population is available to the surveyor, the method estimates its best-fit distribution and determines the OSB and OSS under Neyman allocation directly. When the dataset is not available, stratification is made based on the assumption that the values of the study variable, y, are available as hypothetical realizations of proxy values of y from recent surveys. Thus, it requires certain distributional assumptions about the study variable. At present, it handles stratification for populations where the study variable follows a continuous distribution, namely, Pareto, Triangular, Right-triangular, Weibull, Gamma, Exponential, Uniform, Normal, Log-normal and Cauchy distributions.
stratvns Optimal Stratification in Stratified Sampling Optimization Algorithm
An optimization algorithm applied to the stratification problem. It aims to delimit the population strata and define the sample allocation, with the objective of minimizing the sample size given a fixed precision level. An exhaustive enumeration method is applied to small problems, while for problems of greater complexity the algorithm is based on the Variable Neighborhood Decomposition Search metaheuristic with path relinking.
streamgraph An R htmlwidget for creating streamgraph visualizations
A streamgraph (or “stream graph”) is a type of stacked area graph which is displaced around a central axis, resulting in a flowing, organic shape. Streamgraphs were developed by Lee Byron and popularized by their use in a February 2008 New York Times article on movie box office revenues.
stremr Streamlined Estimation of Survival for Static, Dynamic and Stochastic Treatment and Monitoring Regimes
Analysis of longitudinal time-to-event or time-to-failure data. Estimates the counterfactual discrete survival curve under static, dynamic and stochastic interventions on treatment (exposure) and monitoring events over time. Estimators (IPW, MSM-IPW, GCOMP, longitudinal TMLE) adjust for measured time-varying confounding and informative right-censoring. Model fitting can be performed either with GLM or H2O-3 machine learning libraries, including the ensemble-based SuperLearner (‘h2oEnsemble’). The exposure, monitoring and censoring variables can be coded as either binary, categorical or continuous. Each can be multivariate (e.g., can use more than one column of dummy indicators for different censoring events). The input data needs to be in long format.
strex Extra String Manipulation Functions
There are some things that I wish were easier with the ‘stringr’ or ‘stringi’ packages. The foremost of these is the extraction of numbers from strings. ‘stringr’ and ‘stringi’ make you figure out the regular expression for yourself; ‘strex’ takes care of this for you. There are many other handy functionalities in ‘strex’. Contributions to this package are encouraged: it is intended as a miscellany of string manipulation functions that cannot be found in ‘stringi’ or ‘stringr’.
strict Make R behave a little more strictly
The goal of strict is to make R behave a little more strictly, making base functions more likely to throw an error rather than returning potentially ambiguous results. library(strict) forces you to confront potential problems now, instead of in the future. This has both pros and cons: often you can most easily fix a potential ambiguity while you’re working on the code (rather than in six months’ time when you’ve forgotten how it works), but it also forces you to resolve ambiguities that might never occur with your code/data.
strider Strided Iterator and Range
The strided iterator adapts multidimensional buffers to work with the C++ standard library and range-based for-loops. Given a pointer or iterator into a multidimensional data buffer, one can generate an iterator range using make_strided to construct strided versions of the standard library’s begin and end. For constructing range-based for-loops, a strided_range class is provided. These help authors avoid integer-based indexing, which in some cases can impede algorithm performance and introduce indexing errors. This library exists primarily to expose the header file to other R projects.
String2AdjMatrix Creates an Adjacency Matrix from a List of Strings
Takes a list of character strings and forms an adjacency matrix for the times the specified characters appear together in the strings provided. For use in social network analysis and data wrangling. Simple package, comprised of three functions.
stringb Convenient Base R String Handling
Base R already ships with string handling capabilities ‘out of the box’ but lacks streamlined function names and workflow. The ‘stringi’ (‘stringr’) package, by contrast, has well-named functions, extensive Unicode support and a streamlined workflow, but it adds dependencies, and regular expression interpretation may differ between base R and ‘stringi’ functions. This package aims to serve the use case of avoiding unwanted dependencies while still allowing streamlined text processing. Its functions are solely based on wrapping base R functions into ‘stringr’/‘stringi’-like function names. Along the way it adds one or two extra functions and, last but not least, provides all functions as generics, allowing methods to be added for text structures other than plain character vectors.
stringdist Approximate String Matching and String Distance Functions
Implements an approximate string matching version of R’s native ‘match’ function. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well.
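For example:
  library(stringdist)
  stringdist("leia", "leela", method = "osa")      # optimal string alignment distance: 2
  stringdist("martha", "mathra", method = "jw")    # Jaro-Winkler distance
  amatch("leia", c("leela", "luke"), maxDist = 2)  # approximate 'match': returns 1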
stringformattr Dynamic String Formatting
Pass named and unnamed character vectors into specified positions in strings. This represents an attempt to replicate some of Python’s string formatting.
stringi Character String Processing Facilities
Allows for fast, correct, consistent, portable, as well as convenient character string/text processing in every locale and any native encoding. Owing to the use of the ICU library, the package provides R users with platform-independent functions known to Java, Perl, Python, PHP, and Ruby programmers. Among available features there are: pattern searching (e.g. via regular expressions), random string generation, string collation, transliteration, concatenation, date-time formatting and parsing, etc.
stringi: The string processing package for R
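For example, a few of the features named above:
  library(stringi)
  stri_detect_regex("R 3.5.1", "[0-9]+")     # pattern searching: TRUE
  stri_rand_strings(2, 8)                    # random string generation
  stri_trans_general("Gruß", "Latin-ASCII")  # transliteration: "Gruss"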
stringr Make it easier to work with strings
stringr is a set of simple wrappers that make R’s string functions more consistent, simpler and easier to use. It does this by ensuring that: function and argument names (and positions) are consistent, all functions deal with NAs and zero-length character vectors appropriately, and the output data structure of each function matches the input data structures of other functions.
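For example, note the consistent handling of NA and empty strings:
  library(stringr)
  x <- c("apple pie", NA, "")
  str_detect(x, "apple")       # TRUE NA FALSE
  str_c("id_", 1:3)            # "id_1" "id_2" "id_3"
  str_sub("statistics", 1, 4)  # "stat"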
strip Lighten your R Model Outputs
The strip function deletes components of R model outputs that are useless for specific purposes, such as predict[ing], print[ing], summary[izing], etc.
striprtf Extract Text from RTF File
Extracts plain text from RTF (Rich Text Format) file.
StroupGLMM R Codes and Datasets for Generalized Linear Mixed Models: Modern Concepts, Methods and Applications by Walter W. Stroup
R Codes and Datasets for Stroup, W. W. (2012). Generalized Linear Mixed Models: Modern Concepts, Methods and Applications, CRC Press.
strptimer An Easy Interface to Time Formatting
Makes time formatting easy with an accessible interface.
strucchange Testing, Monitoring, and Dating Structural Changes
Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
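For example, a minimal sketch using the classic annual Nile flow series from the ‘datasets’ package:
  library(strucchange)
  # OLS-based CUSUM test and breakpoint dating for a change in the mean level
  ocus <- efp(Nile ~ 1, type = "OLS-CUSUM")
  sctest(ocus)                  # significance test of the fluctuation process
  bp <- breakpoints(Nile ~ 1)   # date the structural break(s)
  summary(bp)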
structree Tree-Structured Clustering
Tree-structured modelling of categorical predictors or measurement units.
stubthat Stubbing Framework for R
Create stubs of functions for use while testing.
stUPscales Spatio-Temporal Uncertainty Propagation Across Multiple Scales
Provides several R functions for the temporal aggregation of environmental variables used in e.g. Urban Drainage Models (UDMs), such as precipitation and pollutants. It also provides methods and functions for uncertainty propagation via Monte Carlo simulation. Moreover, this package provides specific analysis functions for urban drainage system simulation to evaluate water quantity and quality in combined sewer overflows (CSOs).
STV Single Transferable Vote Counting
Implementations of the Single Transferable Vote counting system. By default, it uses the Cambridge method for surplus allocation and Droop method for quota calculation. Fractional surplus allocation and the Hare quota are available as options.
StVAR Student’s t Vector Autoregression (StVAR)
Estimation of multivariate Student’s t dynamic regression models for given degrees of freedom and lag length. Users can also specify trends and dummies of any kind in matrix form.
styler Non-Invasive Pretty Printing of R Code
Pretty-prints R code without changing the user’s formatting intent.
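For example, under the default tidyverse style guide:
  library(styler)
  style_text("x=c( 1,2,  3 )")   # -> x <- c(1, 2, 3)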
stylest Estimating Speaker Style Distinctiveness
Estimates distinctiveness in speakers’ (authors’) style. Fits models that can be used for predicting speakers of new texts. Methods developed in Spirling et al (2018) <doi:10.2139/ssrn.3235506> (working paper).
subcopem2D Bivariate Empirical Subcopula
Calculate empirical subcopula and dependence measures from a given bivariate sample.
subdetect Detect Subgroup with an Enhanced Treatment Effect
A test for the existence of a subgroup with an enhanced treatment effect, and a sample size calculation procedure for the subgroup detection test.
subgroup.discovery Subgroup Discovery and Bump Hunting
Developed to assist in discovering interesting subgroups in high-dimensional data. The PRIM implementation is based on the 1998 paper ‘Bump hunting in high-dimensional data’ by Jerome H. Friedman and Nicholas I. Fisher <doi:10.1023/A:1008894516817>. PRIM involves finding a set of ‘rules’ which combined imply unusually large (or small) values of some other target variable. Specifically, one tries to find a set of subregions in which the target variable is substantially larger than the overall mean. The objective of bump hunting in general is to find regions in the input (attribute/feature) space with relatively high (low) values of the target variable. The regions are described by simple rules of the type if: condition-1 and … and condition-n then: estimated target value. Given the data (or a subset of the data), the goal is to produce a box B within which the target mean is as large as possible. There are many problems where finding such regions is of considerable practical interest. Often these are problems where a decision maker can in a sense choose or select the values of the input variables so as to optimize the value of the target variable. In bump hunting it is customary to follow a so-called covering strategy. This means that the same box construction (rule induction) algorithm is applied sequentially to subsets of the data.
submax Effect Modification in Observational Studies Using the Submax Method
Effect modification occurs if a treatment effect is larger or more stable in certain subgroups defined by observed covariates. The submax or subgroup-maximum method of Lee et al. (2017) <arXiv:1702.00525> does an overall test and separate tests in subgroups, correcting for multiple testing using the joint distribution.
subniche Within Outlying Mean Index Analysis: Exploratory Niche Methods
Complementary multivariate analysis to the Outlying Mean Index analysis to explore the niche shift of a community within a Euclidean space, with graphical displays.
subprocess Manage Sub-Processes in R
Create and handle multiple sub-processes in R, exchange data over standard input and output streams, control their life cycle.
subsamp Subsample Winner Algorithm for Variable Selection in Linear Regression with a Large Number of Variables
This subsample winner algorithm (SWA) for regression with large-p data (X, Y) selects the important variables (or features) among the p features X in explaining the response Y. The SWA first uses a base procedure, here a linear regression, on each of the subsamples randomly drawn from the p variables, and then computes the scores of all features, i.e., the p variables, according to the performance of these features collected in each of the subsample analyses. It then obtains the ‘semifinalists’ of the features based on the resulting scores and determines the ‘finalists’, i.e., the important features, from the ‘semifinalists’. Fan, Sun and Qiao (2017) <http://…/>.
subscreen Systematic Screening of Study Data for Subgroup Effects
Systematically screens study data for subgroup effects and visualizes these.
subspace Interface to OpenSubspace
An interface to ‘OpenSubspace’, an open source framework for evaluation and exploration of subspace clustering algorithms in WEKA (see <http://…/opensubspace> for more information). Also performs visualization.
subspaceMOA Interface to ‘subspaceMOA’
An interface to ‘subspaceMOA’, a framework for the evaluation of subspace stream clustering algorithms (see <http://…/subspacemoa> for more information).
subtee Subgroup Treatment Effect Estimation in Clinical Trials
Naive and adjusted treatment effect estimation for subgroups. Model averaging and bagging are proposed to address the problem of selection bias in treatment effect estimates for subgroups. The package can be used for all commonly encountered types of outcomes in clinical trials (continuous, binary, survival, count). Additional functions are provided to build the subgroup variables to be used and to plot the results using forest plots.
SubTite Subgroup Specific Optimal Dose Assignment
Contains functions for choosing subgroup specific optimal doses in a phase I dose finding clinical trial and simulating a clinical trial under the subgroup specific time to event continual reassessment method.
suddengains Identify Sudden Gains in Longitudinal Data
Identify sudden gains based on the criteria outlined by Tang and DeRubeis (1999) <doi:10.1037/0022-006X.67.6.894>. Sudden losses, defined as the opposite of sudden gains, can also be identified. It applies all three criteria to a dataset while adjusting the third criterion for missing values. It handles multiple gains per person by creating two datasets, one with all sudden gains and one with one selected sudden gain for each participant. It can extract and plot scores around sudden gains on multiple measures. See the GitHub repository for more information and examples.
sugrrants Supporting Graphs for Analysing Time Series
Provides ‘ggplot2’ graphics for analysing time series data. It aims to fit into the ‘tidyverse’ and grammar of graphics framework for handling temporal data.
sumFREGAT Fast Region-Based Association Tests on Summary Statistics
An adaptation of classical region/gene-based association analysis techniques that uses summary statistics (P values and effect sizes) and correlations between genetic variants as input. It is a tool to perform the most common and efficient gene-based tests on the results of genome-wide association (meta-)analyses without having the original genotypes and phenotypes at hand.
summariser Easy Calculation and Visualisation of Confidence Intervals
Functions to speed up the exploratory analysis of simple datasets using ‘dplyr’ and ‘ggplot2’. Functions are provided to do the common tasks of calculating confidence intervals and visualising the results.
SUMMER Spatio-Temporal Under-Five Mortality Methods for Estimation
Provides methods for estimating, projecting, and plotting spatio-temporal under-five mortality rates, described in Mercer et al. (2015) <doi:10.1214/15-AOAS872>.
sunburstR ‘Htmlwidget’ for ‘Kerry Rodden’ ‘d3.js’ Sequence Sunburst
Make interactive ‘d3.js’ sequence sunburst diagrams in R with the convenience and infrastructure of an ‘htmlwidget’.
Sunclarco Survival Analysis using Copulas
Survival analysis for unbalanced clusters using Archimedean copulas (Prenen et al. (2016) <DOI:10.1111/rssb.12174>).
sundialr An Interface to ‘SUNDIALS’ Ordinary Differential Equation (ODE) Solvers
Provides a way to call the functions in the ‘SUNDIALS’ C ODE solving library (<https://…/sundials>). Currently, the serial version of the ODE solver ‘CVODE’ from the library can be accessed. The package requires the ODE to be written using ‘Rcpp’ and does not require the libraries to be installed on the system.
supc The Self-Updating Process Clustering Algorithms
Implements the self-updating process clustering algorithms proposed in Shiu and Chen (2016) <doi:10.1080/00949655.2015.1049605>.
supcluster Supervised Cluster Analysis
Clusters features under the assumption that each cluster has a random effect and there is an outcome variable that is related to the random effects by a linear regression. In this way the cluster analysis is ‘supervised’ by the outcome variable. An alternate specification is that features in each cluster have the same compound symmetric normal distribution, and the conditional distribution of the outcome given the features has the same coefficient for each feature in a cluster.
SuperExactTest Exact Test and Visualization of Multi-Set Intersections
Efficient statistical testing and scalable visualization of intersections among multiple sets.
SuperGauss Superfast Likelihood Inference for Stationary Gaussian Time Series
Likelihood evaluations for stationary Gaussian time series are typically obtained via the Durbin-Levinson algorithm, which scales as O(n^2) in the number of time series observations. This package provides a ‘superfast’ O(n log^2 n) algorithm written in C++, crossing over with Durbin-Levinson around n = 300. Efficient implementations of the score and Hessian functions are also provided, leading to superfast versions of inference algorithms such as Newton-Raphson and Hamiltonian Monte Carlo. The C++ code provides a Toeplitz matrix class packaged as a header-only library, to simplify low-level usage in other packages and outside of R.
superheat A Graphical Tool for Exploring Complex Datasets Using Heatmaps
A system for generating extendable and customizable heatmaps for exploring complex datasets, including big data and data with multiple data types.
SuperLearner Super Learner Prediction
This package implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.
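For example, a minimal sketch with two built-in learner wrappers and simulated data:
  library(SuperLearner)
  set.seed(1)
  n <- 200
  X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  Y <- rbinom(n, 1, plogis(X$x1 - X$x2))
  fit <- SuperLearner(Y = Y, X = X, family = binomial(),
                      SL.library = c("SL.mean", "SL.glm"))
  fit$coef   # weights the ensemble assigns to each candidate learner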
superml Build Machine Learning Models Like Using Python’s Scikit-Learn Library in R
The idea is to provide a standard interface to users who use both R and Python for building machine learning models. This package provides scikit-learn’s fit/predict interface to train machine learning models in R faster.
supernova Judd & McClelland Formatting for ANOVA Output
Produces ANOVA tables in the format used by Judd, McClelland, and Ryan (2017, ISBN:978-1138819832) in their introductory textbook, Data Analysis. This includes proportional reduction in error and formatting to ease the transition between the book and R.
SuperPCA Supervised Principal Component Analysis
Dimension reduction of complex data with supervision from auxiliary information. The package contains a series of methods for different data types (e.g., multi-view or multi-way data) including the supervised singular value decomposition (SupSVD), supervised sparse and functional principal component (SupSFPC), supervised integrated factor analysis (SIFA) and supervised PARAFAC/CANDECOMP factorization (SupCP). When auxiliary data are available and potentially affect the intrinsic structure of the data of interest, the methods will accurately recover the underlying low-rank structure by taking into account the supervision from the auxiliary data. For more details, see the paper by Gen Li, <DOI:10.1111/biom.12698>.
SuperRanker Sequential Rank Agreement
Tools for analysing the agreement of two or more rankings of the same items. Examples are importance rankings of predictor variables and risk predictions of subjects. Benchmarks for agreement are computed based on random permutation and bootstrap.
supervisedPRIM Supervised Classification Learning and Prediction using Patient Rules Induction Method (PRIM)
The Patient Rules Induction Method (PRIM) is typically used for ‘bump hunting’ data mining to identify regions with abnormally high concentrations of data with large or small values. This package extends this methodology so that it can be applied to binary classification problems and used for prediction.
SupMZ Detecting Structural Change with Heteroskedasticity
Calculates the sup MZ value to detect unknown structural break points under heteroskedasticity, as given in Ahmed et al. (2017) <DOI:10.1080/03610926.2016.1235200>.
suppdata Downloading Supplementary Data from Published Manuscripts
Downloads supplementary data materials from manuscripts, using papers’ DOIs as references. Facilitates open, reproducible research workflows: scientists re-analyzing published datasets can work with them as easily as if they were stored on their own computer, and others can track their analysis workflow painlessly. The main function suppdata() returns a (temporary) location on the user’s computer where the file is stored, making it simple to use suppdata() with standard functions like read.csv().
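For example, a minimal sketch (the DOI is hypothetical; ‘si’ is assumed to select which supplement to fetch):
  library(suppdata)
  path <- suppdata("10.xxxx/hypothetical.doi", si = 1)  # hypothetical DOI
  dat  <- read.csv(path)                                # works like any local file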
support Support Points
Provides the function sp() for generating the support points proposed in Mak and Joseph (2017) <arXiv:1609.01811>. Support points are representative points of a possibly non-uniform distribution, and can be used as optimal sampling or integration points for a distribution of choice. The provided function sp() can be used to generate support points for standard distributions or for reducing big data (e.g., from Markov-chain Monte Carlo methods). A detailed description of the algorithm is found in Mak and Joseph (2017).
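For example, a minimal sketch using the sp() function named above (assuming ‘dist.str’ names the marginal distributions):
  library(support)
  D <- sp(n = 50, p = 2, dist.str = rep("normal", 2))  # 50 support points for a bivariate normal
  plot(D$sp)                                           # one row per support point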
support.BWS Basic Functions for Supporting an Implementation of Best-Worst Scaling
Provides three basic functions that support an implementation of object case (Case 1) best-worst scaling: one for converting a two-level orthogonal main-effect design/balanced incomplete block design into questions; one for creating a data set suitable for analysis; and one for calculating count-based scores.
support.BWS2 Basic Functions for Supporting an Implementation of Case 2 Best-Worst Scaling
Provides three basic functions that support an implementation of Case 2 (profile case) best-worst scaling. The first is to convert an orthogonal main-effect design into questions, the second is to create a dataset suitable for analysis, and the third is to calculate count-based scores.
supportInt Calculates Likelihood Support Intervals for Common Data Types
Calculates likelihood-based support intervals for several common data types including binomial, Poisson, normal, lm(), and glm(). For binomial, Poisson, and normal data, likelihood intervals are calculated via a root-finding algorithm. Additional parameters allow the user to specify whether they would like to receive a parametric bootstrap estimate of the confidence level of the support interval. For lm() and glm(), the function returns profile likelihoods for each coefficient in the model.
sure Surrogate Residuals for Ordinal and General Regression Models
An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017) <doi:10.1080/01621459.2017.1292915>. These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an ‘autoplot’ function for producing standard diagnostic plots using ‘ggplot2’ graphics. The package currently supports cumulative link models from packages ‘MASS’, ‘ordinal’, ‘rms’, and ‘VGAM’. Support for binary regression models using the standard ‘glm’ function is also available.
SurfaceTortoise Find Optimal Sampling Locations Based on Spatial Covariate(s)
Create sampling designs using the surface reconstruction algorithm. Original method by: Olsson, D. 2002. A method to optimize soil sampling from ancillary data. Poster presented at: NJF seminar no. 336, Implementation of Precision Farming in Practical Agriculture, 10-12 June 2002, Skara, Sweden.
suropt Surrogate-Based Optimization
Multi-objective optimization based on surrogate models. Important functions: build_surmodel, train_hego, train_mego, train_sme.
SurrogateOutcome Estimation of the Proportion of Treatment Effect Explained by Surrogate Outcome Information
Provides functions to estimate the proportion of treatment effect on a censored primary outcome that is explained by the treatment effect on a censored surrogate outcome/event. All methods are described in detail in ‘Assessing the Value of a Censored Surrogate Outcome’ by Parast L, Tian L, and Cai T which is currently in press at Lifetime Data Analysis. The main functions are (1) R.q.event() which calculates the proportion of the treatment effect (the difference in restricted mean survival time at time t) explained by surrogate outcome information observed up to a selected landmark time, (2) R.t.estimate() which calculates the proportion of the treatment effect explained by primary outcome information only observed up to a selected landmark time, and (3) IV.event() which calculates the incremental value of the surrogate outcome information.
SurrogateTest Early Testing for a Treatment Effect using Surrogate Marker Information
Provides functions to test for a treatment effect in terms of the difference in survival between a treatment group and a control group using surrogate marker information obtained at some early time point in a time-to-event outcome setting. Nonparametric kernel estimation is used to estimate the test statistic and perturbation resampling is used for variance estimation. More details will be available in the future in: Parast L, Cai T, Tian L (2017) ‘Using a Surrogate Marker for Early Testing of a Treatment Effect’ (under review).
surrosurv Evaluation of Failure Time Surrogate Endpoints in Individual Patient Data Meta-Analyses
Provides functions for the evaluation of surrogate endpoints when both the surrogate and the true endpoint are failure time variables. The approaches implemented are: (1) the two-step approach (Burzykowski et al, 2001) <DOI:10.1111/1467-9876.00244> with a copula model (Clayton, Plackett, Hougaard) at the first step and a linear regression of log-hazard ratios at the second step (adjusted or not for measurement error); (2) mixed proportional hazard models estimated via mixed Poisson GLM (Rotolo et al, 2019 <DOI:10.1177/0962280217718582>).
surrosurvROC Surrogate Survival ROC
Nonparametric and semiparametric estimation of the time-dependent ROC curve for incomplete failure time data with surrogate failure time endpoints.
survAccuracyMeasures Estimate accuracy measures for risk prediction markers from survival data
Provides a function to estimate the AUC, TPR(c), FPR(c), PPV(c), and NPV(c) for a specific timepoint and marker cutoff value c, using non-parametric and semi-parametric estimators. Standard errors and confidence intervals are also computed. Either analytic or bootstrap standard errors can be computed.
survAWKMT2 Two-Sample Tests Based on Differences of Kaplan-Meier Curves
Tests for equality of two survival functions based on integrated weighted differences of two Kaplan-Meier curves.
survBootOutliers Concordance Based Bootstrap Methods for Outlier Detection in Survival Analysis
Three new methods to perform outlier detection in a survival context. In total, six methods are provided: the first three are traditional residual-based outlier detection methods, and the other three are concordance-based. Package developed during the work on the two following publications: Pinto J., Carvalho A. and Vinga S. (2015) <doi:10.5220/0005225300750082>; Pinto J.D., Carvalho A.M., Vinga S. (2015) <doi:10.1007/978-3-319-27926-8_22>.
survClip Survival Analysis for Pathways
Survival analysis using pathway topology. Data reduction techniques with graphical models are used to identify pathways or modules that are associated to survival.
SurvCorr Correlation of Bivariate Survival Times
Estimates correlation coefficients with associated confidence limits for bivariate, partially censored survival times. Uses the iterative multiple imputation approach proposed by Schemper, Kaider, Wakounig and Heinze, Statistics in Medicine 2013. Provides a scatterplot function to visualize the bivariate distribution, either on the original time scale or as a copula.
survELtest Comparing Multiple Survival Functions via Empirical Likelihood (EL) Based Tests
Contains routines for computing the one-sided/two-sided integrated/maximally selected EL statistics for simultaneous testing, the one-sided/two-sided EL tests for pointwise testing, and an initial test that precedes one-sided testing to exclude the possibility of crossings or alternative orderings.
surveybootstrap Tools for the Bootstrap with Survey Data
Tools for using different kinds of bootstrap for estimating sampling variation using complex survey data.
surveyoutliers Helps Manage Outliers in Sample Surveys
At present, the only functionality is the calculation of optimal one-sided winsorisation cutoffs. The main function is optimal.onesided.cutoff.bygroup. It calculates the optimal tuning parameter for one-sided winsorisation and the resulting winsorised values for a variable of interest. See the help file for this function for more details and an example.
surveyplanning Survey Planning Tools
Tools for sample survey planning, including sample size calculation, estimation of expected precision for the estimates of totals, and calculation of optimal sample size allocation.
surveysd Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs
Calculate point estimates and their standard errors in complex household surveys using bootstrap replicates. Bootstrapping considers survey design with a rotating panel. A comprehensive description of the methodology can be found under <https://…/methodology.html>.
SurvGSD Group Sequential Design for a Clinical Trial with Censored Survival Data
Sample size calculation utilizing the information fraction and the alpha spending function in a group sequential clinical trial with censored survival data from underlying generalized gamma or log-logistic survival distributions. Hsu, C.-H., Chen, C.-H., Hsu, K.-N. and Lu, Y.-H. (2018) A useful design utilizing the information fraction in a group sequential clinical trial with censored survival data. To appear in Biometrics.
survHE Survival Analysis in Health Economic Evaluation
Contains a suite of functions for survival analysis in health economics. These can be used to run survival models under a frequentist approach (based on maximum likelihood) or a Bayesian approach (based on either Integrated Nested Laplace Approximation or Hamiltonian Monte Carlo). The user can specify a set of parametric models using a common notation and select the preferred mode of inference. The results can also be post-processed to produce probabilistic sensitivity analysis and can be used to export the output to an Excel file (e.g. for a Markov model, as often done by modellers and practitioners).
survidm Inference and Prediction in an Illness-Death Model
Newly developed methods for the estimation of several probabilities in an illness-death model. The package can be used to obtain nonparametric and semiparametric estimates for: transition probabilities, occupation probabilities, cumulative incidence function and the sojourn time distributions. Several auxiliary functions are also provided which can be used for marginal estimation of the survival functions.
survivalAnalysis High-Level Interface for Survival Analysis and Associated Plots
A high-level interface to perform survival analysis, including Kaplan-Meier analysis, log-rank tests, and Cox regression. Aims to provide a clear and elegant syntax, support for use in a pipeline, structured output, and plotting. Builds upon the survminer package for Kaplan-Meier plots and provides a customizable implementation for forest plots.
survivALL Continuous Biomarker Assessment by Exhaustive Survival Analysis
In routine practice, biomarker performance is calculated by splitting a patient cohort at some arbitrary level, often by median gene expression. The logic behind this is to divide patients into “high” or “low” expression groups that in turn correlate with either good or poor prognosis. However, this median-split approach assumes that the data set composition adheres to a strict 1:1 proportion of high vs. low expression, i.e. that for every one “low” there is an equivalent “high”. In reality, data sets are often heterogeneous in their composition (Perou, CM et al., 2000 <doi:10.1038/35021093>), i.e. this 1:1 relationship is unlikely to exist and the true relationship is unknown. Given this limitation, it remains difficult to determine where the most significant separation should be made. For example, estrogen receptor (ER) status determined by immunohistochemistry is standard practice in predicting hormone therapy response, where ER is found in an ~1:3 ratio (-:+) in the population (Selli, C et al., 2016 <doi:10.1186/s13058-016-0779-0>). We would expect, therefore, upon dividing patients by ER expression, 25% to be classified “low” and 75% “high”, and an otherwise 50-50 split to incorrectly classify 25% of our patient cohort, rendering our survival estimate underpowered. ‘survivALL’ is a data-driven approach to calculate the relative survival estimates for all possible points of separation – i.e. at all possible ratios of “high” vs. “low” – allowing a measure’s relationship with survival to be more reliably determined and quantified. We see this as a solution to a flaw in common research practice, namely the failure to detect a true biomarker as part of a meta-analysis.
survivalsvm Survival Support Vector Analysis
Performs support vectors analysis for data sets with survival outcome. Three approaches are available in the package: The regression approach takes censoring into account when formulating the inequality constraints of the support vector problem. In the ranking approach, the inequality constraints set the objective to maximize the concordance index for comparable pairs of observations. The hybrid approach combines the regression and ranking constraints in the same model.
survminer Drawing Survival Curves using ‘ggplot2’
Contains the function ‘ggsurvplot()’ for drawing easily beautiful and ‘ready-to-publish’ survival curves using ‘ggplot2’. It includes also some options for displaying the p-value and the ‘number at risk’ table under the survival curves.
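For example, combined with a fit from the ‘survival’ package:

    library(survival)
    library(survminer)
    fit <- survfit(Surv(time, status) ~ sex, data = lung)
    ggsurvplot(fit, data = lung,
               pval = TRUE,        # display the log-rank p-value
               risk.table = TRUE)  # add the 'number at risk' table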
survMisc Miscellaneous Functions for Survival Data
A collection of functions to help in the analysis of right-censored survival data. These extend the methods available in package:survival.
survPen Multidimensional Penalized Splines for Survival and Net Survival Models
Fits hazard and excess hazard models with multidimensional penalized splines allowing for time-dependent effects, non-linear effects and interactions between several continuous covariates.
SurvRank Rank Based Survival Modelling
Estimation of the prediction accuracy in a unified survival AUC approach. Model selection and prediction estimation based on a survival AUC. Stepwise model selection, based on several ranking approaches.
survRM2 Comparing Restricted Mean Survival Time
Performs two-sample comparisons using the restricted mean survival time (RMST) as a summary measure of the survival time distribution. Three kinds of between-group contrast metrics (i.e., the difference in RMST, the ratio of RMST and the ratio of the restricted mean time lost (RMTL)) are computed. It performs an ANCOVA-type covariate adjustment as well as unadjusted analyses for those measures.
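The description does not name the entry point; a minimal sketch, assuming the main function is rmst2() with time, status, arm, and truncation time tau arguments:

    library(survRM2)
    set.seed(1)
    df <- data.frame(time   = rexp(100, rate = 0.1),
                     status = rbinom(100, 1, 0.7),  # 1 = event, 0 = censored
                     arm    = rep(0:1, each = 50))  # two treatment arms
    # rmst2() is our assumption for the package's main function
    rmst2(time = df$time, status = df$status, arm = df$arm, tau = 10)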
survRM2adapt Flexible and Coherent Test/Estimation Procedure Based on Restricted Mean Survival Times
Performs the procedure proposed by Horiguchi et al. (2018) <doi:10.1002/sim.7661>. The method specifies a set of truncation time points tau’s for calculating restricted mean survival times (RMST), performs testing for equality, and estimates the difference in RMST between two groups at the specified tau’s. Multiplicity by specifying several tau’s is taken into account in this procedure.
survsim Simulation of Simple and Complex Survival Data
Simulation of simple and complex survival data including recurrent and multiple events and competing risks.
survsup Plotting Survival Curves with Numbers at Risk Table
Implements functions to plot survival curves. Provides the capability to add numbers at risk tables, and allows for using the pipe operator to create more complex plots.
survtmle Compute Targeted Minimum Loss-Based Estimates in Right-Censored Survival Settings
Targeted minimum loss-based estimates of marginal cumulative incidence in survival settings with and without competing risks, including estimators that respect bounds (Benkeser, Carone, and Gilbert (2017) <doi:10.1002/sim.7337>).
survutils Utility Functions for Survival Analysis
Functional programming principles to iteratively run Cox regression and plot its results. The results are reported in tidy data frames. Additional utility functions are available for working with other aspects of survival analysis such as survival curves, C-statistics, etc.
survxai Visualization of the Local and Global Survival Model Explanations
Survival models may have very different structures. This package contains functions for creating a unified representation of survival models, which can be further processed by various survival explainers. Tools implemented in ‘survxai’ help to understand how input variables are used in the model and what impact they have on the final model prediction. Currently, four explanation methods are implemented; they can be divided into two groups: local and global.
sutteForecastR Forecasting Data using Alpha-Sutte Indicator
The alpha-Sutte indicator (alpha-Sutte) was developed from the original Sutte indicator, which was used to predict the movement of stocks. The method has since been extended to forecast not only stock movements but also financial, insurance, and other time series data. Ahmar, A.S. (2017) <doi:10.17605/osf.io/rknsv>.
sValues Measures of the Sturdiness of Regression Coefficients
Implements the s-values proposed by Edward E. Leamer.
svars Data-Driven Identification of SVAR Models
Implements data-driven identification methods for structural vector autoregressive (SVAR) models. Based on an existing VAR model object (provided by e.g. VAR() from the ‘vars’ package), the structural impact matrix is obtained via data-driven identification techniques (i.e. changes in volatility (Rigobon, R. (2003) <doi:10.1162/003465303772815727>), least dependent innovations (Herwartz, H., Ploedt, M., (2016) <doi:10.1016/j.jimonfin.2015.11.001>), or non-Gaussian maximum likelihood (Lanne, M., Meitz, M., Saikkonen, P. (2017) <doi:10.1016/j.jeconom.2016.06.002>)).
svdvis Singular Value Decomposition Visualization
Visualize singular value decompositions (SVD), principal component analysis (PCA), factor analysis (FA) and related methods.
svenssonm Svensson’s Method
Obtain parameters of Svensson’s Method, including percentage agreement, systematic change and individual change. Also, the contingency table can be generated. Svensson’s Method is a rank-invariant nonparametric method for the analysis of ordered scales which measures the level of change both from systematic and individual aspects. For the details, please refer to Svensson E. Analysis of systematic and random differences between paired ordinal categorical data [dissertation]. Stockholm: Almqvist & Wiksell International; 1993.
svglite An SVG Graphics Device
A graphics device for R that produces ‘Scalable Vector Graphics’. ‘svglite’ is a fork of the older ‘RSvgDevice’ package.
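Usage follows the standard R graphics-device pattern:

    library(svglite)
    svglite("scatter.svg", width = 6, height = 4)  # open the SVG device
    plot(mpg ~ wt, data = mtcars)                  # draw as usual
    dev.off()                                      # close and write the file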
svgPanZoom R ‘Htmlwidget’ to Add Pan and Zoom to Almost any R Graphic
This ‘htmlwidget’ provides pan and zoom interactivity to R graphics, including ‘base’, ‘lattice’, and ‘ggplot2’. The interactivity is provided through the ‘svg-pan-zoom.js’ library. Various options to the widget can tailor the pan and zoom experience to nearly any user desire.
svgViewR 3D Animated Interactive Visualizations using SVG
Creates 3D animated, interactive visualizations in Scalable Vector Graphics (SVG) format that can be viewed in a web browser.
svmadmm Linear/Nonlinear SVM Classification Solver Based on ADMM and IADMM Algorithms
Solve large-scale regularised linear/kernel classification by using ADMM and IADMM algorithms. This package provides linear L2-regularised primal classification (both ADMM and IADMM are available), kernel L2-regularised dual classification (IADMM) as well as L1-regularised primal classification (both ADMM and IADMM are available).
SVMMatch Causal Effect Estimation and Diagnostics with Support Vector Machines
Causal effect estimation in observational data often requires identifying a set of untreated observations that are comparable to some treated group of interest. This package provides a suite of functions for identifying such a set of observations and for implementing standard and new diagnostics tools. The primary function, svmmatch(), uses support vector machines to identify a region of common support between treatment and control groups. A sensitivity analysis, balance checking, and assessment of the region of overlap between treated and control groups is included. The Bayesian implementation allows for recovery of uncertainty estimates for the treatment effect and all other parameters.
svmplus Implementation of Support Vector Machines Plus (SVM+)
Implementation of Support Vector Machines Plus (SVM+) for classification problems. See (Vapnik et al., 2009, <doi:10.1016/j.neunet.2009.06.042>) for theoretical details and see (Li et al., 2016, <https://…/svmplus_matlab>) for implementation details in ‘MATLAB’.
SVN Statistically Validated Networks
Determines networks of significant synchronization between the discrete states of nodes; see Tumminello et al <doi:10.1371/journal.pone.0017994>.
svrpath The SVR Path Algorithm
Computes the entire solution paths for Support Vector Regression with low cost. See Wang, G. et al (2008) <doi:10.1109/TNN.2008.2002077> for details regarding the method.
svs Tools for Semantic Vector Spaces
Various tools for semantic vector spaces, such as correspondence analysis (simple, multiple and discriminant), latent semantic analysis, probabilistic latent semantic analysis, non-negative matrix factorization and EM clustering. Furthermore, there are specialized distance measures, plotting functions and some helper functions.
svydiags Linear Regression Model Diagnostics for Survey Data
Contains functions for computing diagnostics for fixed effects linear regression models fitted with survey data. Extensions of standard diagnostics to complex survey data are included: standardized residuals, leverages, Cook’s D, dfbetas, dffits, condition indexes, and variance inflation factors.
swa Subsampling Winner Algorithm for Classification
This algorithm conducts variable selection in the classification setting. It repeatedly subsamples variables and runs linear discriminant analysis (LDA) on the subsampled variables. Variables are scored based on the AUC and the t-statistics. Variables then enter a competition and the semi-finalist variables are evaluated in a final round of LDA classification. The algorithm then outputs a list of selected variables. Qiao, Sun and Fan (2017) <http://…/swa.html>.
swagger Dynamically Generates Documentation from a ‘Swagger’ Compliant API
A collection of ‘HTML’, ‘JavaScript’, and ‘CSS’ assets that dynamically generate beautiful documentation from a ‘Swagger’ compliant API: <https://…/>.
swapClass A Null Model Adapted to Abundance Class Data in Ecology
A null model randomizing semi-quantitative multi-classes (or ordinal) data by swapping sub-matrices while both the row and the column marginal sums are held constant.
SwarmSVM Ensemble Learning Algorithms Based on Support Vector Machines
Three ensemble learning algorithms based on support vector machines. They all train support vector machines on subsets of the data and combine the results.
swatches Read, Inspect, and Manipulate Color Swatch Files
There are numerous places to create and download color palettes. These are usually shared in ‘Adobe’ swatch file formats of some kind. There is also often the need to use standard palettes developed within an organization to ensure that aesthetics are carried over into all projects and output. Now there is a way to read these swatch files in R and avoid transcribing or converting color values by hand or with other programs. This package provides functions to read and inspect ‘Adobe Color’ (‘ACO’), ‘Adobe Swatch Exchange’ (‘ASE’), ‘GIMP Palette’ (‘GPL’), ‘OpenOffice’ palette (‘SOC’) files and ‘KDE Palette’ (‘colors’) files. Detailed descriptions of ‘Adobe Color’ and ‘Swatch Exchange’ file formats as well as other swatch file formats can be found at <http://…/fileformats.php>.
swCRTdesign Stepped Wedge Cluster Randomized Trial (SW CRT) Design
A set of tools for examining the design and analysis aspects of stepped wedge cluster randomized trials (SW CRT) based on a repeated cross-sectional sampling scheme (Hussey MA and Hughes JP (2007) Contemporary Clinical Trials 28:182-191. <doi:10.1016/j.cct.2006.05.007>).
swdft Sliding Window Discrete Fourier Transform (SWDFT)
Implements the Sliding Window Discrete Fourier Transform (SWDFT). Also provides statistical and graphical tools for analyzing the output. For details see Richardson, L.F., Eddy W.F. (2018) <arXiv:1807.07797>.
sweep Tidy Tools for Forecasting
Tidies up the forecasting modeling and prediction work flow, extends the ‘broom’ package with ‘sw_tidy’, ‘sw_glance’, ‘sw_augment’, and ‘sw_tidy_decomp’ functions for various forecasting models, and enables converting ‘forecast’ objects to ‘tidy’ data frames with ‘sw_sweep’.
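A minimal sketch with an ARIMA model from the ‘forecast’ package:

    library(forecast)
    library(sweep)
    fit <- auto.arima(USAccDeaths)
    sw_tidy(fit)                     # coefficients as a tidy tibble
    sw_glance(fit)                   # one-row model summary
    sw_sweep(forecast(fit, h = 12))  # forecast output as a tidy data frame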
swgee Simulation Extrapolation Inverse Probability Weighted Generalized Estimating Equations
Simulation extrapolation and inverse probability weighted generalized estimating equations method for longitudinal data with missing observations and measurement error in covariates. References: Yi, G. Y. (2008) <doi:10.1093/biostatistics/kxm054>; Cook, J. R. and Stefanski, L. A. (1994) <doi:10.1080/01621459.1994.10476871>; Little, R. J. A. and Rubin, D. B. (2002, ISBN:978-0-471-18386-0).
switchr Installing, Managing, and Switching Between Distinct Sets of Installed Packages
Provides an abstraction for managing, installing, and switching between sets of installed R packages. This allows users to maintain multiple package libraries simultaneously, e.g. to maintain strict, package-version-specific reproducibility of many analyses, or work within a devel/release paradigm. Introduces a generalized package installation process which supports multiple repository and non-repository sources and tracks package provenance.
switchrGist Publish Package Manifests to GitHub Gists
Provides a simple plugin to the switchr framework which allows users to publish manifests of packages – or of specific versions thereof – as single-file GitHub repositories (Gists). These manifest files can then be used as remote seeds (see switchr documentation) when creating new package libraries.
sylcount Syllable Counting and Readability Measurements
An English language syllable counter, plus readability score measures. The package has been carefully optimized and should be very efficient, both in terms of run time performance and memory consumption. The main methods are ‘vectorized’ by document, and scores for multiple documents are computed in parallel via ‘OpenMP’.
syllabifyr Syllabifier for CMU Dictionary Transcriptions
Implements tidy syllabification of transcription. Based on @kylebgorman’s ‘python’ implementation <https://…/syllabify>.
syllable A Small Collection of Syllable Counting Functions
Tools for counting syllables and polysyllables. The tools rely primarily on a ‘data.table’ hash table lookup, resulting in fast syllable counting.
sylly Hyphenation and Syllable Counting for Text Analysis
Provides the hyphenation algorithm used for ‘TeX’/‘LaTeX’ and similar software, as proposed by Liang (1983, <https://…/> ). Mainly contains the function hyphen() to be used for hyphenation/syllable counting of text objects. It was originally developed for and part of the ‘koRpus’ package, but was later released as a separate package so that this particular functionality is available to other packages in a lighter form. Support for various languages needs to be added on-the-fly or by plugin packages; this package does not include any language-specific data. Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, request features, or discuss the development of the package, please subscribe to the koRpus-dev mailing list (<http://korpusml.reaktanz.de>).
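A minimal sketch, assuming the English patterns are supplied by the plugin package ‘sylly.en’:

    library(sylly)
    library(sylly.en)  # assumed plugin package providing English patterns
    hyph <- hyphen(c("incomprehensibilities", "further"),
                   hyph.pattern = "en")
    hyph  # hyphenated words together with syllable counts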
sym.arma Autoregressive and Moving Average Symmetric Models
Functions for fitting the Autoregressive and Moving Average Symmetric Model for univariate time series introduced by Maior and Cysneiros (2018), <doi:10.1007/s00362-016-0753-z>. Fitting method: conditional maximum likelihood estimation. For details see: Wei (2006), Time Series Analysis: Univariate and Multivariate Methods, Section 7.2.
symDMatrix Partitioned Symmetric Matrices
A class that partitions a symmetric matrix into matrix-like objects (blocks) while behaving similarly to a base R matrix. Very large symmetric matrices are supported if the blocks are memory-mapped objects.
SymTS Symmetric Tempered Stable Distributions
Contains methods for simulation and for evaluating the pdf, cdf, and quantile functions for symmetric stable, symmetric classical tempered stable, and symmetric power tempered stable distributions.
synapser Interface to Synapse, a Collaborative Workspace for Reproducible Data Intensive Research Projects
The synapser package provides an interface to Synapse, a collaborative workspace for reproducible data intensive research projects, providing support for:
• integrated presentation of data, code and text
• fine grained access control
• provenance tracking
The synapser package lets you communicate with the Synapse platform to create collaborative data analysis projects and access data using the R programming language. Other Synapse clients exist for Python, Java, and the web browser.
SyncRNG A Synchronized Tausworthe RNG for R and Python
Random number generation designed for cross-language usage.
syntaxr An ‘SPSS’ Syntax Generator for Multi-Variable Manipulation
A set of functions for generating ‘SPSS’ syntax files from the R environment.
sys Portable System Utilities
Powerful replacements for base system2 with consistent behavior across platforms. Supports interruption, background tasks, and full control over STDOUT / STDERR binary or text streams.
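The description does not name functions; a sketch on a Unix-alike, assuming the exec_wait()/exec_internal() interface:

    library(sys)
    # exec_wait() streams output and returns the program's exit status;
    # exec_internal() captures stdout/stderr as raw vectors instead.
    status <- exec_wait("echo", "hello world")
    res <- exec_internal("echo", "hello world")
    rawToChar(res$stdout)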
syt Standard Young Tableaux
Deals with standard Young tableaux (field of combinatorics). Performs enumeration, counting, random generation, the Robinson-Schensted correspondence, and conversion to and from paths on the Young lattice.
syuzhet Extracts Sentiment and Sentiment-Derived Plot Arcs from Text
Extracts sentiment and sentiment-derived plot arcs from text using three sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include “afinn” developed by Finn Arup Nielsen, “bing” developed by Minqing Hu and Bing Liu, and “nrc” developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the “get_sentiment” function. The package also provides a method for implementing Stanford’s coreNLP sentiment parser. The package provides several methods for plot arc normalization.
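For example, scoring a few sentences with two of the bundled dictionaries:

    library(syuzhet)
    txt <- c("I love this package.", "This result is terrible.")
    get_sentiment(txt, method = "afinn")  # one numeric score per sentence
    get_sentiment(txt, method = "bing")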

T

T2EQ Functions for Applying the T^2-Test for Equivalence
Contains functions for applying the T^2-test for equivalence. The T^2-test for equivalence is a multivariate two-sample equivalence test. Distance measure of the test is the Mahalanobis distance. For multivariate normally distributed data the T^2-test for equivalence is exact and UMPI. The function T2EQ() implements the T^2-test for equivalence according to Wellek (2010) <DOI:10.1201/ebk1439808184>. The function T2EQ.dissolution.profiles.hoffelder() implements a variant of the T^2-test for equivalence according to Hoffelder (2016) <http://…/suse_item.php?suseId=Z|pi|8430> for the equivalence comparison of highly variable dissolution profiles.
taber Split and Recombine Your Data
Sometimes you need to split your data and work on the two chunks independently before bringing them back together. ‘Taber’ allows you to do that with its two functions.
tablaxlsx Write Formatted Tables in Excel Workbooks
Some functions are included in this package for writing tables in Excel format suitable for distribution.
table1 Tables of Descriptive Statistics in HTML
Create HTML tables of descriptive statistics, as one would expect to see as the first table (i.e. ‘Table 1’) in a medical/epidemiological journal article.
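A minimal sketch with made-up data (the column names are illustrative):

    library(table1)
    set.seed(42)
    df <- data.frame(
      age = rnorm(120, 55, 12),
      sex = factor(sample(c("Female", "Male"), 120, replace = TRUE)),
      arm = factor(sample(c("Placebo", "Treated"), 120, replace = TRUE))
    )
    table1(~ age + sex | arm, data = df)  # one column per arm, plus Overall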
tableHTML A Tool to Create HTML Tables
A tool to easily create and style HTML tables, which are compatible with shiny.
tableMatrix Combines ‘data.table’ and ‘matrix’ Classes
Provides two classes extending ‘data.table’ class. Simple ‘tableList’ class wraps ‘data.table’ and any additional structures together. More complex ‘tableMatrix’ class combines strengths of ‘data.table’ and ‘matrix’. See <http://…/tableMatrix> for more information and examples.
tableone Create ‘Table 1’ to Describe Baseline Characteristics
Creates ‘Table 1’, i.e., a description of baseline patient characteristics, which is essential in medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the ‘survey’ package. See ‘github’ for a screen cast. ‘tableone’ was inspired by descriptive statistics functions in ‘Deducer’, a Java-based GUI package by Ian Fellows. This package does not require a GUI or Java and is intended for command-line users.
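The description does not name functions; a sketch assuming the usual CreateTableOne() entry point, using the ‘pbc’ data from ‘survival’:

    library(tableone)
    library(survival)  # for the 'pbc' example data
    tab <- CreateTableOne(vars = c("age", "sex", "bili"),
                          strata = "trt", data = pbc)
    print(tab, smd = TRUE)  # include standardized mean differences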
tablerDash Tabler’ API for ‘Shiny’
R’ interface to the ‘Tabler’ HTML template. See more here <https://tabler.io>. ‘tablerDash’ is a light ‘Bootstrap 4’ dashboard template. There are different layouts available such as a one page dashboard or a multi page template, where the navigation menu is contained in the navigation bar. A fancy example is available at <https://…/>.
tableschema.r Frictionless Data Table Schema
Allows working with ‘Table Schema’ (<http://…/> ). ‘Table Schema’ is well suited for use cases around handling and validating tabular data in text formats such as ‘csv’, but its utility extends well beyond this core usage, towards a range of applications where data benefits from a portable schema format. The ‘tableschema.r’ package can load and validate any table schema descriptor, allows the creation and modification of descriptors, and exposes methods for reading and streaming data that conforms to a ‘Table Schema’ via the ‘Tabular Data Resource’ abstraction.
tabplot Tableplot, a Visualization of Large Datasets
A tableplot is a visualisation of a (large) dataset with a dozen variables, both numeric and categorical. Each column represents a variable and each row bin is an aggregate of a certain number of records. Numeric variables are visualized as bar charts, and categorical variables as stacked bar charts. Missing values are taken into account. Also supports large ‘ffdf’ datasets from the ‘ff’ package.
tabulizer Bindings for ‘Tabula’ PDF Table Extractor Library
Bindings for the ‘Tabula’ <http://…/> ‘Java’ library, which can extract tables from PDF documents. The ‘tabulizerjars’ package <https://…/tabulizerjars> provides versioned ‘Java’ .jar files, including all dependencies, aligned to releases of ‘Tabula’.
tabulizerjars ‘Java’ .jar Files for ‘tabulizer’
‘Java’ .jar files for the ‘Tabula’ <http://…/> ‘Java’ library, which are required by the ‘tabulizer’ R package. The version numbering of this package corresponds to versions of ‘tabula-java’ library releases <https://…/>.
tactile New and Extended Plots, Methods, and Panel Functions for ‘lattice’
Extensions to ‘lattice’, providing new high-level functions, methods for existing functions, panel functions, and a theme.
tagcloud Tag Clouds
Tag and Word Clouds
tailDepFun Minimum Distance Estimation of Tail Dependence Models
Provides functions implementing minimum distance estimation methods for parametric tail dependence models.
tailloss Estimate the Probability in the Upper Tail of the Aggregate Loss Distribution
Set of tools to estimate the probability in the upper tail of the aggregate loss distribution using different methods: Panjer recursion, Monte Carlo simulations, Markov bound, Cantelli bound, Moment bound, and Chernoff bound.
tailr Automatic Tail Recursion Optimisation
Implements meta-programming functions for automatically translating recursive functions into looping functions or trampolines.
taipan Tool for Annotating Images in Preparation for Analysis
A tool to help create shiny apps for selecting and annotating elements of images. Users must supply images, questions, and answer choices. The user interface is a dynamic shiny app that displays the images, questions, and answer choices. The data generated can be saved to a file for subsequent analysis. The original purpose was to annotate still images from tennis video for face recognition and emotion detection purposes.
tanaka Design Shaded Contour Lines (or Tanaka) Maps
The Tanaka method enhances the representation of topography on a map using shaded contour lines. In this simplified implementation of the method, north-west white contours represent illuminated topography and south-east black contours represent shaded topography.
tangram The Grammar of Tables
Provides an extensible formula system to quickly and easily create production quality tables. The process proceeds in separate, user-definable steps: formula parsing, statistical content generation from data, and rendering, creating a set of building blocks for highly extensible table generation. A user is not limited by any of the choices of the package creator other than the formula grammar. For example, one could choose to add a different S3 rendering function and output a format not provided in the default package, or have Gini coefficients as statistical content. Routines to achieve New England Journal of Medicine style, Lancet style and Hmisc::summaryM() statistics are provided. The package contains rendering for HTML5, Rmarkdown and an indexing format for use in tracing and tracking.
tapkee Wrapper for ‘tapkee’ Dimension Reduction Library
Wrapper for the ‘tapkee’ command line utility; allows one to run it from inside R and capture the results for further analysis and plotting. ‘Tapkee’ is a program for fast dimension reduction (see <http://…/> for more details).
TAR Bayesian Modeling of Autoregressive Threshold Time Series Models
Identification and estimation of autoregressive threshold models with Gaussian noise, as well as positive-valued time series. The package provides identification of the number of regimes, the thresholds and the autoregressive orders, as well as estimation of the remaining parameters. The package implements the methodology from the 2005 paper: Modeling Bivariate Threshold Autoregressive Processes in the Presence of Missing Data <DOI:10.1081/STA-200054435>.
Tariff Replicate Tariff Method for Verbal Autopsy
Implements the Tariff algorithm for coding cause of death from verbal autopsies. It also provides simple graphical representations of individual and population level statistics.
TAShiny ‘Text Analyzer Shiny’
Interactive shiny application for working with text mining and text analytics. Various visualizations are provided.
taskscheduleR Schedule R Scripts and Processes with the Windows Task Scheduler
Schedule R scripts/processes with the Windows task scheduler. This allows R users to automate R processes on specific time points from R itself.
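A sketch (Windows only); the argument names follow the package documentation as we recall it, so check ?taskscheduler_create:

    library(taskscheduleR)
    # Schedule an R script to run every day at 09:00 (hypothetical path)
    taskscheduler_create(taskname  = "daily_report",
                         rscript   = "C:/scripts/report.R",
                         schedule  = "DAILY",
                         starttime = "09:00")
    taskscheduler_delete("daily_report")  # remove the scheduled task again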
tatest Two-Group Ta-Test
The ta-test is a modified two-sample or two-group t-test of Gosset (1908). In small samples with fewer than 15 replicates, the ta-test significantly reduces the type I error rate but has almost the same power as the t-test, and hence can greatly enhance the reliability and reproducibility of discoveries in biology and medicine. The ta-test can test a single null hypothesis or multiple null hypotheses without needing to correct p-values.
tatoo Combine and Export Data Frames
Functions to combine data.frames in ways that require additional effort in base R, and to add metadata (id, title, …) that can be used for printing and xlsx export. The ‘Tatoo_report’ class is provided as a convenient helper to write several such tables to a workbook, one table per worksheet.
tau Text Analysis Utilities
Utilities for text analysis.
taucharts An R ‘htmlwidget’ Interface to the ‘TauCharts’ JavaScript Library
An R ‘htmlwidget’ interface to the ‘TauCharts’ JavaScript library. Examples are available at <http://rpubs.com/hrbrmstr/taucharts>.
TauStar Efficient Computation of the t* Statistic of Bergsma and Dassios (2014)
Computes the t* statistic corresponding to the tau star population coefficient introduced by Bergsma and Dassios (Bernoulli 20(2), 2014, 1006-1028) and does so in O(n^2*log(n)) time. Can provide both the V-statistic and U-statistic related to the tau star measure depending on user preference.
taxa Taxonomic Classes
Provides taxonomic classes for groupings of taxonomic names without data, and those with data. Methods provided are ‘taxonomically aware’, in that they know about ordering of ranks, and methods that filter based on taxonomy also filter associated data.
TaxicabCA Taxicab Correspondence Analysis
Computation and visualization of Taxicab Correspondence Analysis, Choulakian (2006) <doi:10.1007/s11336-004-1231-4>. Classical correspondence analysis (CA) is a statistical method to analyse 2-dimensional tables of positive numbers and is typically applied to contingency tables (Benzecri, J.-P. (1973). L’Analyse des Donnees. Volume II. L’Analyse des Correspondances. Paris, France: Dunod). Classical CA is based on the Euclidean distance. Taxicab CA is like classical CA but is based on the Taxicab or Manhattan distance. For some tables, Taxicab CA gives more informative results than classical CA.
taxizedb Tools for Working with ‘Taxonomic’ Databases
Tools for working with ‘taxonomic’ databases, including utilities for downloading databases, loading them into various ‘SQL’ databases, cleaning up files, and providing a ‘SQL’ connection that can be used to do ‘SQL’ queries directly or used in ‘dplyr’.
taxlist Handling Taxonomic Lists
Handling taxonomic lists through objects of class ‘taxlist’. This package provides functions to import species lists from ‘Turboveg’ (<https://…/turboveg>) and the possibility to create backups from the resulting R objects. Quick displays are also implemented as summary methods.
taxonomizr Functions to Work with NCBI Accessions and Taxonomy
Functions for assigning taxonomy to NCBI accession numbers and taxon IDs based on NCBI’s accession2taxid and taxdump files. This package allows the user to download NCBI data dumps and create a local database for fast, local taxonomic assignment.
taxotools Tools to Handle Taxonomic Lists
Some tools to work with taxonomic name lists.
tbd Estimation of Causal Effects with Outcomes Truncated by Death
Estimation of the survivor average causal effect under outcomes truncated by death, which requires the existence of a substitution variable. It can be applied to both experimental and observational data.
TBEST Tree Branches Evaluated Statistically for Tightness
Our method introduces mathematically well-defined measures for tightness of branches in a hierarchical tree. Statistical significance of the findings is determined, for all branches of the tree, by performing permutation tests, optionally with generalized Pareto p-value estimation.
TBFmultinomial TBF Methodology Extension for Multinomial Outcomes
Extends the test-based Bayes factor (TBF) methodology to multinomial regression models and discrete time-to-event models with competing risks. The TBF methodology has been well developed and implemented for the generalised linear model [Held et al. (2015) <doi:10.1214/14-STS510>] and for the Cox model [Held et al. (2016) <doi:10.1002/sim.7089>].
tbl2xts Convert Tibbles or Data Frames to Xts Easily
Facilitates the movement from data frames to ‘xts’. Particularly useful when moving from the ‘tidyverse’ to the widely used ‘xts’ package, which is the input format of choice for various other packages. It also allows the user to use a ‘spread_by’ argument for ‘xts’ conversion by a character column.
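A minimal sketch, assuming the main function is tbl_xts(); only the ‘spread_by’ argument is named above, and quoting conventions may differ by package version:

    library(tbl2xts)
    df <- data.frame(
      date   = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 5), 2),
      ticker = rep(c("AAA", "BBB"), each = 5),
      price  = rnorm(10, 100, 5)
    )
    # One xts column per ticker, indexed by date (tbl_xts() is assumed)
    tbl_xts(df, cols_to_xts = price, spread_by = ticker)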
tbm Transformation Boosting Machines
Boosting the likelihood of conditional and shift transformation models.
tbrf Time-Based Rolling Functions
Provides rolling statistical functions based on date and time windows instead of n-lagged observations.
TCA Tensor Composition Analysis
Tensor Composition Analysis (TCA) allows the deconvolution of two-dimensional data (features by observations) coming from a mixture of sources into a three-dimensional matrix of signals (features by observations by sources). TCA further allows testing the features in the data for different statistical relations with an outcome of interest while modeling source-specific effects (TCA regression); particularly, it allows looking for statistical relations between source-specific signals and an outcome. For example, TCA can deconvolve bulk tissue-level DNA methylation data (methylation sites by individuals) into a tensor of cell-type-specific methylation levels for each individual (methylation sites by individuals by cell types), and it allows detecting cell-type-specific relations (associations) with an outcome of interest. For more details see Rahmani et al. (2018) <DOI:10.1101/437368>.
tccox Treatment Choice Cox Model
Builds time-varying covariate terms needed and fits Treatment Choice Cox models (Parametric Treatment Choice, Hybrid Treatment Choice, or Interval Treatment Choice) for observational time-to-event studies. See Troendle, JF, Leifer, E, Zhang Z, Yang, S, and Tewes H (2017) <doi:10.1002/sim.7377>.
tcie Topologically Correct Isosurface Extraction
Isosurface extraction algorithms are a powerful tool in the interpretation of volumetric data. Isosurfaces play an important role in several scientific fields, such as biology, medicine, chemistry and computational fluid dynamics, and, for the data to be correctly interpreted, it is crucial that the isosurface be correctly represented. The Marching Cubes algorithm, proposed by Lorensen and Cline <doi:10.1145/37401.37422> in 1987, is clearly one of the most popular isosurface extraction algorithms, and an important tool for many visualization specialists and researchers. The generalized adoption of the Marching Cubes has resulted in many improvements to its algorithm, including the establishment of the topological correctness of the generated mesh. In 2013, Custodio et al. <doi:10.1016/j.cag.2013.04.004> noted and corrected algorithmic inaccuracies that compromised the topological correctness of the mesh generated by the last version of the Marching Cubes algorithm: the Marching Cubes 33 proposed by Chernyaev in 1995, implemented in 2003 by Lewiner et al. <doi:10.1080/10867651.2003.10487582>. In 2019, Custodio et al. (in the work An Extended Triangulation to the Marching Cubes 33 Algorithm) proposed an extended triangulation to the Marching Cubes 33 algorithm, in which the grid vertices are labeled with ‘+’, ‘-’ and ‘=’, according to the relationship between their scalar field value and the isovalue. The inclusion of the ‘=’ grid vertex label naturally avoids degenerate triangles, a well-known issue in meshes generated by the Marching Cubes. The Marching Cubes algorithm has been implemented in many software programs and languages: C++, by Lewiner et al. (2003); ‘MATLAB’, by Hammer (2011); and R, by Feng and Tierney (2008). Marching Cubes is also integrated into many visualization toolkits. The complexity of an algorithm increases considerably when it aims to reproduce the topology of the trilinear interpolant correctly, and this complexity can sometimes result in errors in the algorithm or in its implementation. During our experiments we observed that all the implementations mentioned have critical issues that compromise the continuity and the topological correctness of the generated mesh. The ‘tcie’ package is a toolkit with a topologically correct implementation of the Marching Cubes algorithm, based on the work of Custodio et al., which implements the most recent improvements to the algorithm.
tclust Robust Trimmed Clustering
Robust Trimmed Clustering
TDA Statistical Tools for Topological Data Analysis
Tools for the statistical analysis of persistent homology and for density clustering. For that, this package provides an R interface for the efficient algorithms of the C++ libraries ‘GUDHI’ <http://…/>, ‘Dionysus’ <http://…/>, and ‘PHAT’ <https://…/>. This package also implements the methods in Fasy et al. (2014) <doi:10.1214/14-AOS1252> and Chazal et al. (2014) <doi:10.1145/2582112.2582128> for analyzing the statistical significance of persistent homology features.
TDAmapper Analyze High-Dimensional Data Using Discrete Morse Theory
Topological Data Analysis using Mapper (discrete Morse theory). Generate a 1-dimensional simplicial complex from a filter function defined on the data: 1. Define a filter function (lens) on the data. 2. Perform clustering within each level set and generate one node (vertex) for each cluster. 3. For each pair of clusters in adjacent level sets with a nonempty intersection, generate one edge between vertices. The function mapper1D uses a filter function with codomain R, while the function mapper2D uses a filter function with codomain R^2.
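A sketch on the iris data; the argument names are as we recall them from the mapper1D() documentation, so check ?mapper1D:

    library(TDAmapper)
    m <- mapper1D(distance_matrix = dist(iris[, 1:4]),
                  filter_values = iris$Sepal.Length,  # the lens/filter
                  num_intervals = 10,
                  percent_overlap = 50,
                  num_bins_when_clustering = 10)
    m$num_vertices  # number of nodes in the resulting complex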
TDAstats Pipeline for Topological Data Analysis
A comprehensive toolset for any useR conducting topological data analysis, specifically via the calculation of persistent homology in a Vietoris-Rips complex. The tools this package currently provides can be conveniently split into three main sections: (1) calculating persistent homology; (2) conducting statistical inference on persistent homology calculations; (3) visualizing persistent homology and statistical inference. For a general background on computing persistent homology for topological data analysis, see Otter et al. (2017) <doi:10.1140/epjds/s13688-017-0109-5>. To learn more about how the permutation test is used for nonparametric statistical inference in topological data analysis, read Robinson & Turner (2017) <doi:10.1007/s41468-017-0008-7>. To learn more about how TDAstats calculates persistent homology, you can visit the GitHub repository for Ripser, the software that works behind the scenes at <https://…/ripser>.
TDboost A Boosted Tweedie Compound Poisson Model
A boosted Tweedie compound Poisson model using gradient boosting. It is capable of fitting a flexible nonlinear Tweedie compound Poisson model (or a gamma model) and capturing interactions among predictors.
TDMR Tuned Data Mining in R
Tuned Data Mining in R (‘TDMR’) performs the complete tuning of a data mining task (predictive analytics, that is classification and regression). Preprocessing parameters and modeling parameters can be tuned simultaneously. It incorporates a variety of tuners (among them ‘SPOT’ and ‘CMA’ with package ‘rCMA’) and allows integration of additional tuners. Noise handling in the data mining optimization process is supported, see Koch et al. (2015) <doi:10.1016/j.asoc.2015.01.005>.
tdr Target Diagram
Implementation of target diagrams using ‘lattice’ and ‘ggplot2’ graphics. Target diagrams provide a graphical overview of the respective contributions of the unbiased RMSE and MBE to the total RMSE (Jolliff, J. et al., 2009. ‘Summary Diagrams for Coupled Hydrodynamic-Ecosystem Model Skill Assessment.’ Journal of Marine Systems 76: 64-82.)
tdROC Nonparametric Estimation of Time-Dependent ROC Curve from Right Censored Survival Data
Compute time-dependent ROC curve from censored survival data using nonparametric weight adjustments.
tdsc Time Domain Signal Coding
Functions for performing time domain signal coding as used in Chesmore (2001) <doi:10.1016/S0003-682X(01)00009-3>, and related tasks. This package creates the standard S-matrix and A-matrix (with variable lag), has tools to convert coding matrices into distributed matrices, provides published codebooks and allows for extraction of code sequences.
tea Threshold Estimation Approaches
Different approaches for selecting the threshold in generalized Pareto distributions. Most of them are based on minimizing the AMSE-criterion or at least by reducing the bias of the assumed GPD-model. Others are heuristically motivated by searching for stable sample paths, i.e. a nearly constant region of the tail index estimator with respect to k, which is the number of data in the tail. The third class is motivated by graphical inspection. In addition to the very helpful eva package which includes many goodness of fit tests for the generalized Pareto distribution, the sequential testing procedure provided in Thompson et al. (2009) <doi:10.1016/j.coastaleng.2009.06.003> is also implemented here.
TeachingSampling Selection of Samples and Parameter Estimation in Finite Population
Allows the user to draw probabilistic samples and make inferences from a finite population based on several sampling designs.
teda An Implementation of the Typicality and Eccentricity Data Analysis Framework
The typicality and eccentricity data analysis (TEDA) framework was put forward by Angelov (2013) <DOI:10.14313/JAMRIS_2-2014/16>. It has been further developed into multiple different techniques since, and provides a non-parametric way of determining how similar an observation, from a process that is not purely random, is to other observations generated by the process. This package provides code to use the batch and recursive TEDA methods that have been published.
telefit Estimation and Prediction for Remote Effects Spatial Process Models
Implementation of the remote effects spatial process (RESP) model for teleconnection. The RESP model is a geostatistical model that allows a spatially-referenced variable (like average precipitation) to be influenced by covariates defined on a remote domain (like sea surface temperatures). The RESP model is introduced in Hewitt et al. (2018) <arXiv:1612.06303>. Sample code for working with the RESP model is available at <https://…/resp_example>.
telegram R Wrapper Around the Telegram Bot API
R wrapper around the Telegram Bot API (<http://…/api>) to access Telegram’s messaging facilities with ease (e.g. sending messages, images, and files from R to your smartphone).
telegram.bot Develop a ‘Telegram Bot’ with R
Features a number of tools to make the development of ‘Telegram’ bots with R easy and straightforward, providing an easy-to-use interface that takes some work off the programmer. It is built on top of the pure API implementation, being an extension of the ‘telegram’ package, an R wrapper around the ‘Telegram Bot API’ <http://…/api>.
TELP Social Representation Theory Application: The Free Evocation of Words Technique
Using the Free Evocation of Words Technique, this package performs social representation analysis and related analyses. The Free Evocation of Words Technique consists of collecting a number of words evoked by a subject facing exposure to an inducer term. The purpose of this technique is to understand the relationships created between the words evoked by the individual and the inducer term. This technique belongs to the theory of social representations and, based on the information transmitted by an individual, seeks to create a profile that defines a social group.
TempCont Temporal Contributions on Trends using Mixed Models
Method to estimate the effect of the trend in predictor variables on the observed trend of the response variable using mixed models with temporal autocorrelation. See Fernández-Martínez et al. (2017 and 2019) <doi:10.1038/s41598-017-08755-8> <doi:10.1038/s41558-018-0367-7>.
templates A System for Working with Templates
Provides tools to work with template code and text in R. It aims to provide a simple substitutions mechanism for R-expressions inside these templates. Templates can be written in other languages like ‘SQL’, can simply be represented by characters in R, or can themselves be R-expressions or functions.
TempleMetrics Estimating Conditional Distributions
Estimates conditional distributions and conditional quantiles. The versions of the methods in this package are primarily for use in multiple step procedures where the first step is to estimate a conditional distribution. In particular, there are functions for implementing distribution regression. Distribution regression provides a way to flexibly model the distribution of some outcome Y conditional on covariates X without imposing parametric assumptions on the conditional distribution but providing more structure than fully nonparametric estimation (See Foresi and Peracchi (1995) <doi:10.2307/2291056> and Chernozhukov, Fernandez-Val, and Melly (2013) <doi:10.3982/ECTA10582>).
tempoR Characterizing Temporal Dysregulation
TEMPO (TEmporal Modeling of Pathway Outliers) is a pathway-based outlier detection approach for finding pathways showing significant changes in temporal expression patterns across conditions. Given a gene expression data set where each sample is characterized by an age or time point as well as a phenotype (e.g. control or disease), and a collection of gene sets or pathways, TEMPO ranks each pathway by a score that characterizes how well a partial least squares regression (PLSR) model can predict age as a function of gene expression in the controls and how poorly that same model performs in the disease. TEMPO v1.0.3 is described in Pietras (2018) <doi:10.1145/3233547.3233559>.
Temporal Parametric Time to Event Analysis
Performs likelihood-based estimation and inference on time to event data, possibly subject to non-informative right censoring. fitParaSurv() provides maximum likelihood estimates of model parameters and distributional characteristics. compParaSurv() compares the mean and median survival experiences of two treatment arms. Candidate distributions currently include the exponential, gamma, generalized gamma, log-logistic, log-normal, and Weibull.
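A minimal sketch; fitParaSurv() is named above, while the ‘dist’ argument name is our assumption (check ?fitParaSurv):

    library(Temporal)
    set.seed(7)
    time   <- rweibull(200, shape = 1.2, scale = 10)  # simulated event times
    status <- rbinom(200, 1, 0.8)                     # 1 = event, 0 = censored
    # 'dist' (assumed argument name) selects the candidate distribution
    fitParaSurv(time = time, status = status, dist = "weibull")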
tempR Temporal Sensory Data Analysis
Analysis and visualization of data from temporal sensory methods, including for temporal check-all-that-apply (TCATA) and temporal dominance of sensations (TDS).
tenispolaR Provides ZENIT-POLAR Substitution Cipher Method of Encryption
Implements the ZENIT-POLAR substitution cipher method of encryption, using the TENIS-POLAR cipher by default. This cipher became famous through the collection of Brazilian books ‘Os Karas’ by the author Pedro Bandeira. For more details, see ‘A Cryptographic Dictionary’ (GC&CS, 1944).
tensorBF Bayesian Tensor Factorization
Bayesian Tensor Factorization for decomposition of tensor data sets using the trilinear CANDECOMP/PARAFAC (CP) factorization, with automatic component selection. The complete data analysis pipeline is provided, including functions and recommendations for data normalization and model definition, as well as missing value prediction and model visualization. The method performs factorization for three-way tensor datasets and the inference is implemented with Gibbs sampling.
tensorBSS Blind Source Separation Methods for Tensor-Valued Observations
Contains several utility functions for manipulating tensor-valued data (centering, multiplication from a single mode etc.) and the implementations of the following blind source separation methods for tensor-valued data: tFOBI, tJADE, tgFOBI, tgJADE and tSOBI.
tensorflow R Interface to TensorFlow
Interface to ‘TensorFlow’ <https://…/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
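A minimal sketch, assuming ‘TensorFlow’ itself is installed and running eagerly (TensorFlow 2) so results print immediately:

    library(tensorflow)
    a <- tf$constant(matrix(c(1, 2, 3, 4), nrow = 2))
    b <- tf$constant(matrix(c(5, 6, 7, 8), nrow = 2))
    tf$matmul(a, b)  # 2x2 matrix product, returned as a tensor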
tensorr Sparse Tensors in R
Provides methods to manipulate and store sparse tensors. Tensors are multidimensional generalizations of matrices (two dimensional) and vectors (one dimensional).
tensr Covariance Inference and Decompositions for Tensor Datasets
A collection of functions for Kronecker structured covariance estimation and testing under the array normal model. For estimation, maximum likelihood and Bayesian equivariant estimation procedures are implemented. For testing, a likelihood ratio testing procedure is available. This package also contains additional functions for manipulating and decomposing tensor data sets. This work was partially supported by NSF grant DMS-1505136.
Ternary Plot Ternary Diagrams
Plots ternary diagrams using the standard graphics functions. An alternative to ‘ggtern’, which uses the ‘ggplot2’ family of plotting functions.
tesseract Open Source OCR Engine
An OCR engine with unicode (UTF-8) support that can recognize over 100 languages out of the box.
testassay A Hypothesis Testing Framework for Validating an Assay for Precision
A common way of validating a biological assay for precision is through a procedure where m levels of an analyte are measured with n replicates at each level; if all m estimates of the coefficient of variation (CV) are less than some prespecified level, the assay is declared validated for precision within the range of the m analyte levels. Two limitations of this procedure are: there is no clear statistical statement of precision upon passing, and it is unclear how to modify the procedure for assays with constant standard deviation. We provide tools to convert such a procedure into a set of m hypothesis tests. This reframing motivates the m:n:q procedure, which upon completion delivers a 100q% upper confidence limit on the CV. Additionally, for a post-validation assay output of y, the method gives an “effective standard deviation interval” of log(y) plus or minus r, which is a 68% confidence interval on log(mu), where mu is the expected value of the assay output for that sample. Further, the m:n:q procedure can be straightforwardly applied to constant standard deviation assays. We illustrate these tools by applying them to a growth inhibition assay.
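An illustrative base-R sketch of the inputs to such a procedure (simulated data, not the ‘testassay’ API): estimate the CV at each of m = 3 analyte levels from n = 5 replicates and apply the classical pass/fail rule.

    set.seed(1)
    levels <- c(10, 50, 250)                                        # m = 3 analyte levels
    reps   <- lapply(levels, function(mu) rnorm(5, mu, 0.08 * mu))  # n = 5 replicates each
    cv_hat <- sapply(reps, function(y) sd(y) / mean(y))             # m CV estimates
    all(cv_hat < 0.15)  # classical rule: pass if every estimated CV is below 15%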
TestCor FWER and FDR Controlling Procedures for Multiple Correlation Tests
Different multiple testing procedures for correlation tests are implemented. These procedures were shown to theoretically control asymptotically the Family Wise Error Rate (Roux (2018) <https://…/tel-01971574v1> ) or the False Discovery Rate (Cai & Liu (2016) <doi:10.1080/01621459.2014.999157>). The package gathers four test statistics used in correlation testing, four FWER procedures with either single-step or stepdown versions, and four FDR procedures.
TestDataImputation Missing Item Responses Imputation for Test and Assessment Data
Functions for imputing missing item responses for dichotomous and polytomous test and assessment data. This package provides imputation methods that are suitable for test and assessment data, including: listwise (LW) deletion, treating as incorrect (IN), person mean imputation (PM), item mean imputation (IM), two-way imputation (TW), logistic regression imputation (LR), and EM imputation.
testDriveR Teaching Data for Statistics and Data Science
Provides data sets for teaching statistics and data science courses. It includes a sample of data from John Edmund Kerrich’s famous coinflip experiment. These are data that I use for teaching SOC 4015 / SOC 5050 at Saint Louis University.
testextra Extract Test Blocks
A collection of testing enhancements and utilities, including utilities for extracting inline test blocks from package source files.
testforDEP Dependence Tests for Two Variables
Provides test statistics, p-value, and confidence intervals based on 10 hypothesis tests for dependence.
TestFunctions Test Functions for Simulation Experiments and Evaluating Optimization and Emulation Algorithms
Test functions are often used to test computer code. They are used in optimization to test algorithms and in metamodeling to evaluate model predictions. This package provides test functions that can be used for any purpose. Some functions are taken from <https://…/~ssurjano>, but their R code is not used.
TestingSimilarity Bootstrap Test for Similarity of Dose Response Curves Concerning the Maximum Absolute Deviation
Provides a bootstrap test which decides whether two dose response curves can be assumed equal with respect to their maximum absolute deviation. Many choices of model type are available; these can be found in the ‘DoseFinding’ package, which is used for fitting the models.
testthis Utils and ‘RStudio’ Addins to Make Testing Even More Fun
Utility functions and ‘RStudio’ addins to ease the life of people using ‘testthat’, ‘devtools’ and ‘usethis’ in their package development workflow. Hotkeyable addins are provided for such common tasks as switching between a source file and an associated test file, or running unit tests in a single file. ‘testthis’ also provides utility functions to manage and run tests in subdirectories of the test/testthat directory.
tetraclasse Satisfaction Analysis using Tetraclasse Model and Llosa Matrix
Satisfaction analysis using the tetraclasse model of Sylvie Llosa. See Llosa (1997) <http://…/40592578>.
TeXCheckR Parses LaTeX Documents for Errors
Checks LaTeX documents and .bib files for typing errors, such as spelling errors and incorrect quotation marks. Also provides useful functions for parsing and linting bibliography files.
TexExamRandomizer Personalizes and Randomizes Exams Written in ‘LaTeX’
Randomizing exams with ‘LaTeX’. If you can compile your main document with ‘LaTeX’, the program should be able to compile the randomized versions without much extra effort when creating the document.
texPreview Compile and Preview Snippets of ‘LaTeX’ in ‘RStudio’
Compile and preview snippets of ‘LaTeX’. Can be used directly from the R console, from ‘RStudio’, in Shiny apps and R Markdown documents. Must have ‘pdflatex’ or ‘xelatex’ or ‘lualatex’ in ‘PATH’.
texreg Conversion of R regression output to LaTeX or HTML tables
texreg converts coefficients, standard errors, significance stars, and goodness-of-fit statistics of statistical models into LaTeX tables, HTML tables/MS Word documents, or nicely formatted screen output for the R console, for easy model comparison. A list of several models can be combined in a single table. The output is highly customizable. New model types can be easily implemented.
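A minimal sketch: fit two nested models and print a side-by-side comparison (texreg() and htmlreg() produce the LaTeX and HTML variants of the same table).

    library(texreg)
    m1 <- lm(mpg ~ wt, data = mtcars)
    m2 <- lm(mpg ~ wt + hp, data = mtcars)
    screenreg(list(m1, m2))  # formatted console table; swap in texreg()/htmlreg()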
text2speech Text to Speech
Unifies different text to speech engines, such as Google, Microsoft, and Amazon. Text synthesis can be done in any engine with a simple switch of an argument denoting the service requested.
text2vec Fast and Modern Text Mining Framework – Vectorization and Word Embeddings
Very fast and memory-friendly tools for text vectorization and learning word embeddings (GloVe). The package also provides a source-agnostic streaming API, which allows analysis of collections of documents that are much larger than the available RAM.
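A minimal sketch using the movie_review data bundled with the package: stream tokens through itoken(), build a vocabulary, and create a document-term matrix.

    library(text2vec)
    it  <- itoken(movie_review$review, preprocessor = tolower,
                  tokenizer = word_tokenizer)
    v   <- create_vocabulary(it)
    dtm <- create_dtm(it, vocab_vectorizer(v))
    dim(dtm)  # documents x vocabulary terms, stored sparsely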
textclean Text Cleaning Tools
Tools to clean and process text.
texteffect Discovering Latent Treatments in Text Corpora and Estimating Their Causal Effects
Implements the approach described in Fong and Grimmer (2016) <https://…/P16-1151.pdf> for automatically discovering latent treatments from a corpus and estimating the average marginal component effect (AMCE) of each treatment. The data is divided into a training and test set. The supervised Indian Buffet Process (sibp) is used to discover latent treatments in the training set. The fitted model is then applied to the test set to infer the values of the latent treatments in the test set. Finally, Y is regressed on the latent treatments in the test set to estimate the causal effect of each treatment.
textfeatures Extracts Features from Text
A tool for extracting some generic features (e.g., number of words, line breaks, characters per word, URLs, lower case, upper case, commas, periods, exclamation points, etc.) from strings of text.
TextForecast Regression Analysis and Forecasting Using Textual Data from a Time-Varying Dictionary
Provides functionalities based on the paper ‘Time Varying Dictionary and the Predictive Power of FED Minutes’ (Lima, 2018) <doi:10.2139/ssrn.3312483>. It selects the most predictive terms, which we call the time-varying dictionary, using machine learning techniques such as the lasso and elastic net.
textgRid Praat TextGrid Objects in R
The software application Praat can be used to annotate waveform data (e.g., to mark intervals of interest or to label events). (See <http://…/> for more information about Praat.) These annotations are stored in a Praat TextGrid object, which consists of a number of interval tiers and point tiers. An interval tier consists of sequential (i.e., not overlapping) labeled intervals. A point tier consists of labeled events that have no duration. The ‘textgRid’ package provides S4 classes, generics, and methods for accessing information that is stored in Praat TextGrid objects.
textmining Integration of Text Mining and Topic Modeling Packages
A framework for text mining and topic modelling. It provides an easy interface for using different topic modeling methods within R, by integrating the already existing packages. Full functionality of the package requires a local installation of ‘TreeTagger’.
TextoMineR Textual Statistics
Multidimensional statistical methods for textual analysis.
textrank Summarize Text by Ranking Sentences
The ‘textrank’ algorithm is an extension of the ‘Pagerank’ algorithm for text. It summarizes text by calculating how sentences are related to one another, looking at overlapping terminology in order to set up links between sentences. The resulting sentence network is then plugged into the ‘Pagerank’ algorithm, which identifies the most important sentences in your text and ranks them. More information can be found in the paper by Mihalcea, Rada & Tarau, Paul (2004) <http://…/W04-3252>.
textreadr Read Text Documents into R
A small collection of convenience tools for reading text documents into R.
textrecipes Extra ‘Recipes’ for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the ‘recipes’ package. These steps allow for tokenization, filtering, counting (tf and tf-idf) and feature hashing.
textreg n-gram Text Regression, aka Concise Comparative Summarization
Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.
textreuse Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
textshape Tools for Reshaping Text
Tools that can be used to reshape text data.
textstem Tools for Stemming and Lemmatizing Text
Tools that stem and lemmatize text. Stemming is a process that removes endings such as affixes. Lemmatization is the process of grouping inflected forms together as a single base form.
textTinyR Text Processing for Small or Big Data Files
Processes big text data files in batches efficiently. For this purpose, it offers functions for splitting, parsing, tokenizing and creating a vocabulary. Moreover, it includes functions for building either a document-term matrix or a term-document matrix and extracting information from those (term-associations, most frequent terms). Lastly, it embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. The source code is based on ‘C++11’ and exported in R through the ‘Rcpp’, ‘RcppArmadillo’ and ‘BH’ packages.
textutils Utilities for Handling Strings and Text
Utilities for handling character vectors that store human-readable text (either plain or with markup, such as HTML or LaTeX). The package provides, in particular, functions that help with the preparation of plain-text reports (e.g. for expanding and aligning strings that form the lines of such reports); the package also provides generic functions for transforming R objects to HTML and to plain text.
tfdatasets Interface to ‘TensorFlow’ Datasets
Interface to ‘TensorFlow’ Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://…/datasets> for additional details.
tfdeploy Deploy ‘TensorFlow’ Models
Tools to deploy ‘TensorFlow’ <https://…/> models across multiple services. Currently, it provides a local server for testing ‘cloudml’ compatible services.
tfestimators Interface to ‘TensorFlow’ Estimators
Interface to ‘TensorFlow’ Estimators <https://…/estimators>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
tfio Interface to ‘TensorFlow IO’
Interface to ‘TensorFlow IO’, datasets and filesystem extensions maintained by ‘TensorFlow SIG-IO’ <https://…/CHARTER.md>.
TFisher Optimal Thresholding Fisher’s P-Value Combination Method
We provide the cumulative distribution function (CDF), quantile, and statistical power calculator for a collection of thresholding Fisher’s p-value combination methods, including Fisher’s p-value combination method, the truncated product method and, in particular, the soft-thresholding Fisher’s p-value combination method, which is proven to be optimal in some contexts of signal detection. The p-value calculator for the omnibus version of these tests is also included. For reference, please see Hong Zhang and Zheyang Wu. ‘Optimal Thresholding of Fisher’s P-value Combination Tests for Signal Detection’, submitted.
tfruns Training Run Directories for ‘TensorFlow’
Create and manage unique directories for each ‘TensorFlow’ training run. Provides a unique, timestamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.
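A minimal sketch, where ‘train.R’ is a hypothetical training script in the working directory:

    library(tfruns)
    training_run("train.R")  # executes the script inside a unique, timestamped run directory
    latest_run()             # metadata for the most recent run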
tfse Tools for Script Editing
A collection of useful tools for programming and writing scripts. Several functions are simple wrappers around base R functions that extend their functionality while also providing some convenient properties: regular expression functions that automatically detect look-ahead and look-behind statements, a read-line function that suppresses incomplete-final-line warnings and automatically opens and closes connections, a version of substring that starts from the end of strings, etc. Other functions are useful for working with hexadecimal colors, installing packages, omitting missing data, and showing in-use connections.
tglm Binary Regressions under Independent Student-t Priors
Use Gibbs sampler with Polya-Gamma data augmentation to fit logistic and probit regression under independent Student-t priors (including Cauchy priors and normal priors as special cases).
thankr Find Out Who Maintains the Packages you Use
Find out who maintains the packages you use in your current session or in your package library and maybe say ‘thank you’.
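A minimal sketch, assuming shoulders() as the package’s main entry point:

    library(thankr)
    shoulders(where = "session")  # maintainers of the packages loaded in this session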
ThankYouStars Give your Dependencies Stars on GitHub!
A tool for starring GitHub repositories.
thief Temporal Hierarchical Forecasting
Methods and tools for generating forecasts at different temporal frequencies using a hierarchical time series approach.
thinkr Tools for Cleaning Up Messy Files
Some tools for cleaning up messy ‘Excel’ files so that they are suitable for R. People who have been working with ‘Excel’ for years have built more or less complicated sheets with names, characters and formats that are not homogeneous. To be able to use them in R, this package provides a set of functions that avoid the majority of importation problems and keep as much of the data as possible.
thor Interface to ‘LMDB’
Key-value store, implemented as a wrapper around ‘LMDB’; the ‘lightning memory-mapped database’ <https://…/>. ‘LMDB’ is a transactional key value store that uses a memory map for efficient access. This package wraps the entire ‘LMDB’ interface, providing objects for transactions and cursors.
threejs Interactive 3D Scatter Plots and Globes
Create interactive 3D scatter plots and globes using the ‘three.js’ visualization library (<http://threejs.org>).
threg Threshold Regression
Fit a threshold regression model based on the first-hitting-time of a boundary by the sample path of a Wiener diffusion process. The threshold regression methodology is well suited to applications involving survival and time-to-event data.
thregI Threshold Regression for Interval-Censored Data with Cure-Rate or without Cure-Rate Model
Fit a threshold regression model for interval-censored data based on the first-hitting-time of a boundary by the sample path of a Wiener diffusion process. The threshold regression methodology is well suited to applications involving survival and time-to-event data.
Thresher Threshing and Reaping for Principal Components
Defines the classes used to identify outliers (threshing) and compute the number of significant principal components and number of clusters (reaping) in a joint application of PCA and hierarchical clustering. See Wang et al., 2018, <doi:10.1186/s12859-017-1998-9>.
threshr Threshold Selection and Uncertainty for Extreme Value Analysis
Provides functions for the selection of thresholds for use in extreme value models, based mainly on the methodology in Northrop, Attalides and Jonathan (2017) <doi:10.1111/rssc.12159>. It also performs predictive inferences about future extreme values, based either on a single threshold or on a weighted average of inferences from multiple thresholds, using the ‘revdbayes’ package <https://…/package=revdbayes>. At the moment only the case where the data can be treated as independent identically distributed observations is considered. See the ‘threshr’ website for more information, documentation and examples.
thriftr Apache Thrift Client Server
Pure R implementation of Apache Thrift. This library doesn’t require any code generation. To learn more about Thrift go to <https://thrift.apache.org>.
thsls Three-Stage Least Squares Estimation for Systems of Simultaneous Equations
Fits simultaneous systems of linear equations using three-stage least squares.
tibble Simple Data Frames
Provides a ‘tbl_df’ class that offers better checking and printing capabilities than traditional data frames.
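A minimal sketch of the stricter behaviour:

    library(tibble)
    tb <- tibble(x = 1:3, y = x^2)  # columns may refer to ones defined just before
    tb     # prints only what fits on screen, with column types shown
    tb$z   # unknown column: tibbles warn instead of silently returning NULL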
tibbletime Time Aware Tibbles
Built on top of the ‘tibble’ package, ‘tibbletime’ is an extension that allows for the creation of time aware tibbles. Some immediate advantages of this include: the ability to perform time based subsetting on tibbles, quickly summarising and aggregating results by time periods, and calling functions similar in spirit to the map family from ‘purrr’ on time based tibbles.
tidybayes Tidy Data and ‘Geoms’ for Bayesian Models
Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models (‘JAGS’, ‘Stan’, ‘rstanarm’, ‘brms’, ‘MCMCglmm’, ‘coda’, …) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, ‘ggplot2’ ‘geoms’ and ‘stats’ are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.
tidyboot Tidyverse-Compatible Bootstrapping
Compute arbitrary non-parametric bootstrap statistics on data in tidy data frames.
tidycode Analyze Lines of R Code the Tidy Way
Analyze lines of R code using tidy principles. This allows you to input lines of R code and output a data frame with one row per function included. Additionally, it facilitates code classification via included lexicons.
tidygraph A Tidy API for Graph Manipulation
A graph, while not ‘tidy’ in itself, can be thought of as two tidy data frames describing node and edge data respectively. ‘tidygraph’ provides an approach to manipulate these two virtual data frames using the API defined in the ‘dplyr’ package, as well as provides tidy interfaces to a lot of common graph algorithms.
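A minimal sketch: activate one of the two virtual data frames and mutate it with ‘dplyr’ verbs.

    library(tidygraph)
    g <- create_ring(6) %>%                  # small example graph
      activate(nodes) %>%                    # address the node table
      mutate(degree = centrality_degree())   # add a computed node column
    g  # prints the node and edge tibbles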
tidyimpute Imputation the Tidyverse Way
Functions and methods for imputing missing values (NA) in tables and lists, patterned after the tidyverse approach of ‘dplyr’ and ‘rlang’; works with data.tables as well.
tidyinftheo Some Information-Theoretic Functions in the ‘Tidy’ Style
A frontend to the Shannon entropy, conditional Shannon entropy, and mutual information calculations provided by the ‘infotheo’ package. See Cover and Thomas (2001) <doi:10.1002/0471200611> for an explanation of these measures. Also provides a convenient heatmap to compare more than two columns in a pairwise fashion.
tidyjson A Grammar for Turning ‘JSON’ into Tidy Tables
An easy and consistent way to turn ‘JSON’ into tidy data frames that are natural to work with in ‘dplyr’, ‘ggplot2’ and other tools.
tidylog Logging for ‘dplyr’ Functions
Provides feedback about basic ‘dplyr’ operations.
tidyLPA Easily Carry Out Latent Profile Analysis
An interface to the ‘mclust’ package to easily carry out latent profile analysis (‘LPA’). Provides functionality to estimate commonly-specified models. Follows a tidy approach, in that output is in the form of a data frame that can subsequently be computed on. Also has functions to interface to the commercial ‘MPlus’ software via the ‘MplusAutomation’ package.
tidymodels Easily Install and Load the ‘Tidymodels’ Packages
The tidy modeling ‘verse’ is a collection of packages for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the tidyverse.
tidymv Tidy Model Visualisation for Generalised Additive Models
Provides functions for visualising generalised additive models and getting predicted values using tidy tools from the ‘tidyverse’ packages.
tidync A Tidy Approach to ‘NetCDF’ Data Exploration and Extraction
Tidy tools for ‘NetCDF’ data sources. Explore the contents of a ‘NetCDF’ source (file or URL) presented as variables organized by grid with a database-like interface. The hyper_filter() interactive function translates the filter value or index expressions to array-slicing form. No data is read until explicitly requested, as a data frame or list of arrays via hyper_tibble() or hyper_array().
tidyposterior Bayesian Analysis to Compare Models using Resampling Statistics
Bayesian analysis is used here to answer the question: ‘when looking at resampling results, are the differences between models real?’ To answer this, a model can be created where the performance statistic is the resampling statistic (e.g. accuracy or RMSE). These values are explained by the model types. In doing this, we can get parameter estimates for each model’s effect on performance and make statistical (and practical) comparisons between models. The methods included here are similar to Benavoli et al (2017) <http://…/16-305.html>.
tidypredict Run Predictions Inside the Database
Parses a fitted ‘R’ model object and returns a formula in ‘Tidy Eval’ code that calculates the predictions. It works with several database back-ends because it leverages ‘dplyr’ and ‘dbplyr’ for the final ‘SQL’ translation of the algorithm. It currently supports lm(), glm() and randomForest() models.
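A minimal sketch with lm(); rendering through a simulated ‘dbplyr’ connection is shown as one plausible workflow, not the only interface.

    library(tidypredict)
    m <- lm(mpg ~ wt + cyl, data = mtcars)
    tidypredict_fit(m)                            # Tidy Eval formula reproducing predict(m)
    tidypredict_sql(m, dbplyr::simulate_mssql())  # the same formula rendered as SQL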
tidyquant Tidy Quantitative Financial Analysis
Bringing quantitative financial analysis to the ‘tidyverse’. The ‘tidyquant’ package provides a convenient wrapper to various ‘xts’, ‘quantmod’ and ‘TTR’ package functions and returns the objects in the tidy ‘tibble’ format. The main advantage is being able to use quantitative functions with the ‘tidyverse’ functions including ‘purrr’, ‘dplyr’, ‘tidyr’, ‘ggplot2’, ‘lubridate’, etc. See the ‘tidyquant’ website for more information, documentation and examples.
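A minimal sketch, assuming an internet connection (‘AAPL’ is just an example ticker):

    library(tidyquant)
    prices <- tq_get("AAPL", get = "stock.prices",
                     from = "2018-01-01", to = "2018-06-30")
    head(prices)  # returned as a tidy tibble rather than an 'xts' object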
tidyr Easily Tidy Data with spread() and gather() Functions
An evolution of reshape2. It’s designed specifically for data tidying (not general reshaping or aggregating) and works well with dplyr data pipelines.
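A minimal sketch of the two title functions; spread() is the inverse of gather().

    library(tidyr)
    wide <- data.frame(id = 1:2, a = c(10, 20), b = c(30, 40))
    long <- gather(wide, key = "variable", value = "value", a, b)  # wide -> long
    spread(long, key = "variable", value = "value")                # back to wide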
tidyRSS Tidy RSS for R
With the objective of including data from RSS feeds into your analysis, tidyRSS parses RSS and Atom xml feeds and returns a tidy data frame.
tidyselect Select from a Set of Strings
A backend for the selecting functions of the ‘tidyverse’. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other ‘tidyverse’ interfaces for selection.
tidystats Create a Tidy Statistics Output File
Produce a data file containing the output of statistical models and assist with a workflow aimed at writing scientific papers using ‘R Markdown’. Supported statistical functions are: t.test(), cor.test(), lm(), aov(), anova(). The package is based on tidy principles (i.e., the ‘tidyverse’; Wickham, 2017).
tidystopwords Customizable Lists of Stopwords in 53 Languages
Functions to generate stopword lists in 53 languages, in a way consistent across all the languages supported. The generated lists are based on the morphological tagset from the Universal Dependencies.
tidystringdist String Distance Calculation with Tidy Data Principles
Calculation of string distance following the tidy data principles. Built on top of the ‘stringdist’ package.
tidytext Text Mining using ‘dplyr’, ‘ggplot2’, and Other Tidy Tools
Text mining for word processing and sentiment analysis using ‘dplyr’, ‘ggplot2’, and other tidy tools.
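A minimal sketch: one token per row, ready for ‘dplyr’ verbs.

    library(dplyr)
    library(tidytext)
    df <- tibble::tibble(line = 1:2,
                         text = c("tidy text mining", "text analysis the tidy way"))
    df %>% unnest_tokens(word, text) %>% count(word, sort = TRUE)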
tidytidbits A Collection of Tools and Helpers Extending the Tidyverse
A selection of various tools to extend a data analysis workflow based on the ‘tidyverse’ packages. This includes high-level data frame editing methods (in the style of ‘mutate’/’mutate_at’), some methods in the style of ‘purrr’ and ‘forcats’, ‘lookup’ methods for dict-like lists, a generic method for lumping a data frame by a given count, various low-level methods for special treatment of ‘NA’ values, ‘python’-style tuple-assignment and ‘truthy’/’falsy’ checks, saving to PDF and PNG from a pipe and various small utilities.
tidytransit Read, Validate, Analyze, and Map Files in the General Transit Feed Specification
Read General Transit Feed Specification (GTFS) zipfiles into a list of R dataframes. Perform validation of the data structure against the specification. Analyze the headways and frequencies at routes and stops. Create maps and perform spatial analysis on the routes and stops. Please see the GTFS documentation here for more detail: <http://…/>.
tidyxl Read Untidy Excel Files
Imports non-tabular data from Excel files into R. Exposes cell content, position and formatting in a tidy structure for further manipulation. Provides functions for selecting cells by position and relative position, and for associating data cells with header cells by proximity in given directions. Supports ‘.xlsx’ and ‘.xlsm’ via the embedded ‘RapidXML’ C++ library <http://rapidxml.sourceforge.net>. Does not support ‘.xlsb’ or ‘.xls’.
tigerhitteR Pre-Process of Time Series Data Set in R
Pre-processes discrete time series data sets that are not continuous in the ‘date’ column. Fills in records for missing dates (and the other columns) so that the final data set can be used for time series analysis.
tigreBrowserWriter ‘tigreBrowser’ Database Writer
Write modelling results into a database for ‘tigreBrowser’, a web-based tool for browsing figures and summary data of independent model fits, such as Gaussian process models fitted for each gene or other genomic element. The browser is available at <https://…/tigreBrowser>.
tilegramsR R Spatial Data for Tilegrams
R spatial objects for Tilegrams. Tilegrams are tiled maps where the region size is proportional to certain characteristics of the dataset.
TileManager Tile Manager
Tools for creating and detecting tiling schemes for raster datasets.
tiler Create Geographic and Non-Geographic Map Tiles
Creates geographic map tiles from geospatial map files or non-geographic map tiles from simple image files. This package provides a tile generator function for creating map tile sets for use with packages such as ‘leaflet’. In addition to generating map tiles based on a common raster layer source, it also handles the non-geographic edge case, producing map tiles from arbitrary images. These map tiles, which have a non-geographic, simple coordinate reference system (CRS), can also be used with ‘leaflet’ when applying the simple CRS option. Map tiles can be created from an input file with any of the following extensions: tif, grd and nc for spatial maps and png, jpg and bmp for basic images. This package requires ‘Python’ and the ‘gdal’ library for ‘Python’. ‘Windows’ users are recommended to install ‘OSGeo4W’ (<https://…/>) as an easy way to obtain the required ‘gdal’ support for ‘Python’.
tilting Variable Selection via Tilted Correlation Screening Algorithm
Implements an algorithm for variable selection in high-dimensional linear regression using the ’tilted correlation’, a new way of measuring the contribution of each variable to the response which takes into account high correlations among the variables in a data-driven way.
time2event Survival and Competing Risk Analyses with Time-to-Event Data as Covariates
Cox proportional hazard and competing risk regression analyses can be performed with time-to-event data as covariates.
timechange Efficient Changing of Date-Times
Efficient routines for manipulation of date-time objects while accounting for time-zones and daylight saving times. The package includes utilities for updating of date-time components (year, month, day etc.), modification of time-zones, rounding of date-times, period addition and subtraction etc. Parts of the ‘CCTZ’ source code, released under the Apache 2.0 License, are included in this package. See <https://…/cctz> for more details.
timekit Simplified and Extensible Time Series Coercion Tools
Coerces between time-based tibbles (‘tbl’) and the primary time series object classes including ‘xts’, ‘zoo’, and ‘ts’. Additionally, provides methods for maintaining the non-regularized time index during coercion to regularized time series objects.
timelineR Visualization for Time Series Data
Helps to visualize multi-variate time series having numeric and factor variables. You can use the package for visual analysis of data by plotting the data for each variable in the desired order, and study the interaction between a factor and a numeric variable by creating overlapping plots.
timelineS Timeline and Time Duration-Related Tools
An easy tool for plotting annotated timelines, grouped timelines, and exploratory graphics (boxplot/histogram/density plot/scatter plot/line plot). Filter, summarize date data by duration and convert to calendar units.
timeR Time Your Codes
Provides a ‘timer’ class that makes timing code easier. One can create ‘timer’ objects and use them to record all timings, and extract the recordings as a data frame for later use.
TimeSeries.OBeu Time Series Analysis ‘OpenBudgets’
Estimate and return the needed parameters for visualisations designed for ‘OpenBudgets’ <http://…/> time series data. Calculate time series model and forecast parameters in budget time series data of municipalities across Europe, according to the ‘OpenBudgets’ data model. There are functions for measuring deterministic and stochastic trend of the input time series data, decomposing with local regression models or seasonal trend decomposition, modelling the appropriate auto regressive integrated moving average model and providing forecasts for the input ‘OpenBudgets’ time series fiscal data. It can also be used more generally to extract visualisation parameters, convert them to ‘JSON’ format and use them as input in a different graphical interface.
timeseriesdb Store and Organize Time Series in a Database
An R package to store and organize time series and their multi-lingual meta information in a database. The timeseriesdb package suggests a simple yet powerful database structure to store a large amount of time series in a relational PostgreSQL database. The package provides an interface to the R user to create, update and delete time series.
timetk A Tool Kit for Working with Time Series in R
Get the time series index, signature, and summary from time series objects and time-based tibbles. Create future time series based on properties of existing time series index. Coerce between time-based tibbles (‘tbl’) and ‘xts’, ‘zoo’, and ‘ts’.
timevis Create Interactive Timeline Visualizations in R
Create rich and fully interactive timeline visualizations. Timelines can be included in Shiny apps and R markdown documents, or viewed from the R console and RStudio Viewer. ‘timevis’ includes an extensive API to manipulate a timeline after creation, and supports getting data out of the visualization into R. Based on the ‘vis.js’ Timeline module and the ‘htmlwidgets’ R package.
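A minimal sketch: a timeline needs only a data frame with content and start columns.

    library(timevis)
    timevis(data.frame(
      id      = 1:3,
      content = c("Kickoff", "Review", "Release"),
      start   = c("2019-01-10", "2019-02-01", "2019-03-15")
    ))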
TimeVTree Survival Analysis of Time Varying Coefficients Using a Tree-Based Approach
Estimates time varying regression effects under Cox type models in survival data using classification and regression tree. The codes in this package were originally written in S-Plus for the paper ‘Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach,’ by Xu, R. and Adak, S. (2002) <doi:10.1111/j.0006-341X.2002.00305.x>, Biometrics, 58: 305-315. Development of this package was supported by NIH grants AG053983 and AG057707, and by the UCSD Altman Translational Research Institute, NIH grant UL1TR001442. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The example data are from the Honolulu Heart Program/Honolulu Asia Aging Study (HHP/HAAS).
tinsel Transform Functions using Decorators
Instead of nesting function calls, annotate and transform functions using ‘#.’ comments.
tint Tint is not Tufte
A ‘tufte’-alike style for ‘rmarkdown’.
tinter Generate a Monochromatic Palette
Generate a palette of tints, shades or both from a single colour.
tinyProject A Lightweight Template for Data Analysis Projects
Creates useful files and folders for data analysis projects and provides functions to manage data, scripts and output files. Also provides a project template for ‘Rstudio’.
tinytest Lightweight but Feature Complete Unit Testing Framework
Provides a lightweight (zero-dependency) and easy to use unit testing framework. Main features: easily install tests with the package. Test results are treated as data that can be stored and manipulated. Test files are R scripts interspersed with test commands, that can be programmed over. Fully automated build-install-test sequence for packages. Skip tests when not run locally (e.g. on CRAN). Flexible and configurable output printing. Compare computed output with output stored with the package.
tinytex Helper Functions to Install and Maintain ‘LaTeX’, and Compile ‘LaTeX’ Documents
Helper functions to install and maintain the ‘LaTeX’ distribution named ‘TinyTeX’ (<https://…/> ), a lightweight and portable version of ‘TeX Live’. This package also contains helper functions to compile ‘LaTeX’ documents, and install missing ‘LaTeX’ packages automatically.
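A minimal sketch (‘doc.tex’ is a hypothetical file):

    library(tinytex)
    # install_tinytex()  # one-time installation of the TinyTeX distribution
    pdflatex("doc.tex")  # compiles; missing LaTeX packages are installed on the fly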
TippingPoint Enhanced Tipping Point Displays the Results of Sensitivity Analyses for Missing Data
Using the idea of ‘tipping point’ (proposed in Gregory Campbell, Gene Pennello and Lilly Yue(2011) <DOI:10.1080/10543406.2011.550094>) to visualize the results of sensitivity analysis for missing data, the package provides a set of functions to list out all the possible combinations of the values of missing data in two treatment arms, calculate corresponding estimated treatment effects and p values and draw a colored heat-map to visualize them. It could deal with randomized experiments with a binary outcome or a continuous outcome. In addition, the package provides a visualized method to compare various imputation methods by adding the rectangles or convex hulls on the basic plot.
tippy Add Tooltips to ‘R markdown’ Documents or ‘Shiny’ Apps
‘Htmlwidget’ of ‘Tippy.js’ to add tooltips to ‘Shiny’ apps and ‘R markdown’ documents.
tipr Tipping Point Analyses
The strength of evidence provided by epidemiological and observational studies is inherently limited by the potential for unmeasured confounding. We focus on three key quantities: the observed bound of the confidence interval closest to the null, a plausible residual effect size for an unmeasured continuous or binary confounder, and a realistic mean difference or prevalence difference for this hypothetical confounder. Building on the methods put forth by Lin, Psaty, & Kronmal (1998) <doi:10.2307/2533848>, we can use these quantities to assess how an unmeasured confounder may tip our result to insignificance, rendering the study inconclusive.
tkRplotR Display Resizable Plots
Display a plot in a Tk canvas.
Tlasso Non-Convex Optimization and Statistical Inference for Sparse Tensor Graphical Models
An optimal alternating optimization algorithm for estimation of precision matrices of sparse tensor graphical models, and an efficient inference procedure for support recovery of the precision matrices.
TLBC Two-Level Behavior Classification
Contains functions for training and applying two-level random forest and hidden Markov models for human behavior classification from raw tri-axial accelerometer and/or GPS data. Includes functions for training a two-level model, applying the model to data, and computing performance.
TLMoments Calculate TL-Moments and Convert Them to Distribution Parameters
Calculates empirical TL-moments (trimmed L-moments) of arbitrary order and trimming, and converts them to distribution parameters.
tls Tools of Total Least Squares in Error-in-Variables Models
Functions for point and interval estimation in error-in-variables models via total least squares or generalized total least squares method. See Golub and Van Loan (1980) <doi:10.1137/0717073>, Gleser (1981) <https://…/2240867>, Ivan Markovsky and Huffel (2007) <doi:10.1016/j.sigpro.2007.04.004> for more information.
tm Text Mining Package
A framework for text mining applications within R.
tmap Thematic Maps
Thematic maps are geographical maps in which statistical data are visualized. This package offers a flexible, layer-based, way to create thematic maps, such as choropleths and bubble maps.
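A minimal sketch using the World dataset shipped with the package: layers are stacked with +, in the spirit of ‘ggplot2’.

    library(tmap)
    data(World)
    tm_shape(World) +            # the spatial object to draw
      tm_polygons("life_exp")    # choropleth fill by a data column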
tmaptools Thematic Map Tools
Set of tools for reading and processing spatial data. The aim is to supply the workflow to create thematic maps. This package also facilitates tmap, the package for visualizing thematic maps.
TMB Template Model Builder: A General Random Effect Tool Inspired by ADMB
With this tool, a user should be able to quickly implement complex random effect models through simple C++ templates. The package combines CppAD (C++ automatic differentiation), Eigen (templated matrix-vector library) and CHOLMOD (sparse matrix routines available from R) to obtain an efficient implementation of the applied Laplace approximation with exact derivatives. Key features are: Automatic sparseness detection, parallelism through BLAS and parallel user templates.
tmbstan MCMC Sampling from ‘TMB’ Model Object using ‘Stan’
Enables all ‘rstan’ functionality for a ‘TMB’ model object, in particular MCMC sampling and chain visualization. Sampling can be performed with or without Laplace approximation for the random effects.
tmcn A Text Mining Toolkit for Chinese
A text mining toolkit for Chinese, which includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. Moreover, it provides some functions to support the ‘tm’ package in Chinese.
Tmisc Turner Miscellaneous
Miscellaneous data and utility functions for manipulating data and your R environment.
tmle Targeted Maximum Likelihood Estimation
tmle implements targeted maximum likelihood estimation, first described in van der Laan and Rubin (Targeted Maximum Likelihood Learning, The International Journal of Biostatistics, 2(1), 2006). This version adds the tmleMSM function to the package, for estimating the parameters of a marginal structural model (MSM) for a binary point treatment effect. The tmle function calculates the adjusted marginal difference in mean outcome associated with a binary point treatment, for continuous or binary outcomes. Relative risk and odds ratio estimates are also reported for binary outcomes. Missingness in the outcome is allowed, but not in treatment assignment or baseline covariate values. Effect estimation stratified by a binary mediating variable is also available. The population mean is calculated when there is missingness, and no variation in the treatment assignment. An ID argument can be used to identify repeated measures. Default settings call SuperLearner to estimate the Q and g portions of the likelihood, unless values or a user-supplied regression function are passed in as arguments.
tmlenet Targeted Maximum Likelihood Estimation for Network Data
Estimation of average causal effects for single time point interventions in network-dependent data (e.g., in the presence of spillover and/or interference). Supports arbitrary interventions (static or stochastic). Implemented estimation algorithms are the targeted maximum likelihood estimation (TMLE), the inverse-probability-of-treatment (IPTW) estimator and the parametric G-computation formula estimator. Asymptotically correct influence-curve-based confidence intervals are constructed for the TMLE and IPTW. The data are assumed to consist of rows of unit-specific observations, each row i represented by variables (F.i,W.i,A.i,Y.i), where F.i is a vector of friend IDs of unit i (i’s network), W.i is a vector of i’s baseline covariates, A.i is i’s exposure (can be binary, categorical or continuous) and Y.i is i’s binary outcome. Exposure A.i depends on (multivariate) user-specified baseline summary measure(s) sW.i, where sW.i is any function of i’s baseline covariates W.i and the baseline covariates of i’s friends in F.i. Outcome Y.i depends on sW.i and (multivariate) user-specified summary measure(s) sA.i, where sA.i is any function of i’s baseline covariates and exposure (W.i,A.i) and the baseline covariates and exposures of i’s friends. The summary measures are defined with functions def.sW and def.sA. See ?’tmlenet-package’ for a general overview.
tmt Estimation of the Rasch Model for Multistage Tests
Provides conditional maximum likelihood (CML) estimation of item parameters in multistage designs (Zwitser & Maris, 2013, <doi:10.1007/s11336-013-9369-6>) and CML estimation for conventional designs. Additional features are the likelihood ratio test (Andersen, 1973, <doi:10.1007/BF02291180>) and simulation of multistage designs.
tmuxr Manage ‘tmux’
Create, control, and record ‘tmux’ sessions, windows, and panes using a pipeable API.
tmvmixnorm Efficient Sampling Truncated Scale of Normals with Constraints
Efficient sampling of truncated multivariate (scale) mixtures of normals under linear inequality constraints is nontrivial due to the analytically intractable normalizing constant. Meanwhile, traditional methods may be subject to numerical issues, especially when the dimension is high and dependence is strong. Algorithms proposed by Li and Ghosh (2015) <doi: 10.1080/15598608.2014.996690> are adopted for overcoming difficulties in simulating truncated random numbers. Efficient rejection sampling for simulating the truncated univariate normal distribution is included in the package, which shows superiority in terms of acceptance rate and numerical stability compared to existing methods and R packages. An efficient function for sampling from the truncated multivariate normal distribution subject to convex polytope restriction regions, based on a Gibbs sampler for the conditional truncated univariate distribution, is provided. By extending the sampling method, a function for sampling the truncated multivariate Student-t distribution is also developed. Moreover, the proposed method and computation remain valid for high dimensional and strong dependence scenarios. Empirical results in Li and Ghosh (2015) <doi: 10.1080/15598608.2014.996690> illustrated the superior performance in terms of various criteria (e.g. mixing and integrated auto-correlation time).
tmvnsim Truncated Multivariate Normal Simulation
Importance sampling from the truncated multivariate normal using the GHK (Geweke-Hajivassiliou-Keane) simulator. Unlike Gibbs sampling which can get stuck in one truncation sub-region depending on initial values, this package allows truncation based on disjoint regions that are created by truncation of absolute values. The GHK algorithm uses simple Cholesky transformation followed by recursive simulation of univariate truncated normals hence there are also no convergence issues. Importance sample is returned along with sampling weights, based on which, one can calculate integrals over truncated regions for multivariate normals.
tnam Temporal Network Autocorrelation Models (TNAM)
Temporal and cross-sectional network autocorrelation models (TNAM).
TNC Temporal Network Centrality (TNC) Measures
Node centrality measures for temporal networks. Available measures are temporal degree centrality, temporal closeness centrality and temporal betweenness centrality defined by Kim and Anderson (2012) <doi:10.1103/PhysRevE.85.026107>. Applying the REN algorithm by Hanke and Foraita (2017) <doi:10.1186/s12859-017-1677-x> when calculating the centrality measures keeps the computational running time linear in the number of graph snapshots. Further, all methods can run in parallel up to the number of nodes in the network.
TOC Total Operating Characteristic Curve and ROC Curve
Construction of the Total Operating Characteristic (TOC) Curve and the Receiver (aka Relative) Operating Characteristic (ROC) Curve for spatial and non-spatial data. The TOC method is a modification of the ROC method which measures the ability of an index variable to diagnose either presence or absence of a characteristic. The diagnosis depends on whether the value of an index variable is above a threshold. Each threshold generates a two-by-two contingency table, which contains four entries: hits (H), misses (M), false alarms (FA), and correct rejections (CR). While ROC shows for each threshold only two ratios, H/(H + M) and FA/(FA + CR), TOC reveals the size of every entry in the contingency table for each threshold (Pontius Jr., R.G., Si, K. 2014. The total operating characteristic to measure diagnostic ability for multiple thresholds. Int. J. Geogr. Inf. Sci. 28 (3), 570-583).
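An illustrative base-R sketch (simulated data, not the ‘TOC’ API) of the contingency-table entries behind a single threshold:

    set.seed(1)
    index    <- runif(100)             # index variable
    presence <- rbinom(100, 1, index)  # true presence/absence
    pred <- index >= 0.5               # diagnosis at one threshold
    H  <- sum(pred & presence == 1)    # hits
    M  <- sum(!pred & presence == 1)   # misses
    FA <- sum(pred & presence == 0)    # false alarms
    CR <- sum(!pred & presence == 0)   # correct rejections
    c(TPR = H / (H + M), FPR = FA / (FA + CR))  # the two ratios that ROC displays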
todor Find All TODO Comments and More
A simple ‘RStudio’ addin that finds all ‘TODO’, ‘FIX ME’, ‘CHANGED’ etc. comments in your project and shows them as a markers list.
togglr ‘Toggl.com’ API for ‘RStudio’
Use the <http://toggl.com> time tracker API through R.
tokenbrowser Create Full Text Browsers from Annotated Token Lists
Create browsers for reading full texts from a token list format. Information obtained from text analyses (e.g., topic modeling, word scaling) can be used to annotate the texts.
tokenizers Tokenize Text
Convert natural language text into tokens. The tokenizers have a consistent interface and are compatible with Unicode, thanks to being built on the ‘stringi’ package. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, lines, and regular expressions.
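A minimal sketch of the consistent interface:

    library(tokenizers)
    x <- "The quick brown fox jumps over the lazy dog."
    tokenize_words(x)          # word tokens
    tokenize_ngrams(x, n = 2)  # bigrams
    tokenize_sentences("One sentence. Another one.")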
tolBasis Fundamental Definitions and Utilities of the Time Oriented Language (TOL)
Imports the fundamental definitions and utilities of the Time Oriented Language (TOL), focused on time series analysis and stochastic processes, and provides the basis for the integration of TOL in R. See <https://www.tol-project.org> for more information about the TOL project.
toolmaRk Tests for Same-Source of Toolmarks
Implements two tests for same-source of toolmarks. The chumbley_non_random() test follows the paper ‘An Improved Version of a Tool Mark Comparison Algorithm’ by Hadler and Morris (2017) <doi:10.1111/1556-4029.13640>. This is an extension of the Chumbley score as previously described in ‘Validation of Tool Mark Comparisons Obtained Using a Quantitative, Comparative, Statistical Algorithm’ by Chumbley et al (2010) <doi:10.1111/j.1556-4029.2010.01424.x>. fixed_width_no_modeling() is based on correlation measures in a diamond shaped area of the toolmark as described in Hadler (2017).
tools4uplift Tools for Uplift Modeling
Uplift modeling aims at predicting the causal effect of an action such as medical treatment or a marketing campaign on a particular individual by taking into consideration the response to a treatment. In order to simplify the task for practitioners in uplift modeling, we propose a combination of tools that can be separated into the following ingredients: i) categorization, ii) visualization, iii) feature engineering, iv) feature selection and v) model validation. For a review of uplift modeling, please read Gutierrez and Gérardy (2017) <http://…/gutierrez17a.html>.
ToolsForCoDa Multivariate Tools for Compositional Data Analysis
Provides functions for multivariate analysis with compositional data. Includes a function for doing compositional canonical correlation analysis. This analysis requires two data matrices of compositions, which can be adequately transformed and used as entries in a specialized program for canonical correlation analysis, that is able to deal with singular covariance matrices. The methodology is described in Graffelman et al. (2017) <doi:10.1101/144584>.
TooManyCellsR An R Wrapper for ‘TooManyCells’
An R wrapper for using ‘TooManyCells’, a command line program for clustering, visualizing, and quantifying cell clade relationships. See <https://…/> for more details.
toOrdinal Function for Converting Cardinal to Ordinal Numbers by Adding a Language Specific Ordinal Indicator to the Number
Function for converting cardinal to ordinal numbers by adding a language specific ordinal indicator (<http://…/Ordinal_indicator>) to the number.
tor Import Multiple Files From a Single Directory at Once
The goal of tor (to-R) is to help you import multiple files from a single directory at once, and to do so as quickly, flexibly, and simply as possible. It makes a frequent task less painful.
tosca Tools for Statistical Content Analysis
A framework for statistical analysis in content analysis. In addition to a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the ‘lda’ package, plots are offered for the descriptive analysis of text corpora and topic models. In addition, an implementation of Chang’s intruder words and intruder topics is provided.
TOSTER Two One-Sided Tests (TOST) Equivalence Testing
Two one-sided tests (TOST) procedure to test equivalence for t-tests, correlations, and meta-analyses, including power analysis for t-tests and correlations. Allows you to specify equivalence bounds in raw scale units or in terms of effect sizes.
touchard Touchard Model and Regression
Tools for analyzing count data with the Touchard model. It includes univariate estimation (ML and MM) and regression tools, following Matsushita et al. (2018) The Touchard distribution, Communications in Statistics – Theory and Methods <doi:10.1080/03610926.2018.1444177>.
toxboot Bootstrap Methods for ‘ToxCast’ High Throughput Screening Data
Provides methods to use bootstrapping to quantify uncertainty in fitting ‘ToxCast’ concentration response data. Data is stored in memory, written to file, or stored in ‘MySQL’ or ‘MongoDB’ databases.
toxplot Batch Processing, Modeling and Visualizing the Dose-Response of High-Throughput Screening Bioassay
A convenient interface to batch process high-throughput toxicology bioassay screening data. It is designed specifically for screening experiments that feature a primary inhibition-type assay and a companion cytotoxicity assay. This package provides functions for data normalization, quality-control analysis, dose-response curve fitting (using the Hill model provided in the ‘tcpl’ package), visualization, and a unique toxicity-adjusted potency ranking system.
tpAUC Estimation and Inference of Two-Way pAUC, pAUC and pODC
Tools for estimating and inferring two-way partial area under receiver operating characteristic curves (two-way pAUC), partial area under receiver operating characteristic curves (pAUC), and partial area under ordinal dominance curves (pODC). Methods include the Mann-Whitney statistic, the jackknife, etc. Plots of regions under the corresponding curves can also be generated.
TPD Methods for Measuring Functional Diversity Based on Trait Probability Density
Tools to calculate trait probability density functions (TPD) at any scale (e.g. populations, species, communities). TPD functions are used to compute several indices of functional diversity, as well as its partition across scales. These indices constitute a unified framework that incorporates the underlying probabilistic nature of trait distributions into uni- or multidimensional functional trait-based studies. See Carmona et al. (2016) <doi:10.1016/j.tree.2016.02.003> for further information.
TPEA A Novel Topology-Based Pathway Enrichment Analysis Approach
Implements a novel topology-based pathway enrichment analysis (TPEA), which integrates the global position of the nodes and the topological properties of the pathways in the KEGG database.
tracer Slick Call Stacks
Better looking call stacks after an error.
trackdem Particle Tracking and Demography
Obtain population density and body size structure, using video material or image sequences as input. Functions assist in the creation of image sequences from videos, background detection and subtraction, particle identification and tracking. An artificial neural network can be trained for noise filtering. The goal is to supply accurate estimates of population size, structure and/or individual behavior, for use in evolutionary and ecological studies.
trackeR Infrastructure for Running and Cycling Data from GPS-Enabled Tracking Devices
The aim of this package is to provide infrastructure for handling running and cycling data from GPS-enabled tracking devices. After extraction and appropriate manipulation of the training or competition attributes, the data are placed into session-based and unit-aware data objects of class trackeRdata (S3 class). The information in the resultant data objects can then be visualised, summarised, and analysed through corresponding flexible and extensible methods.
trackr Semantic Annotation and Discoverability System for R-Based Artifacts
Automatically annotates R-based artifacts with relevant descriptive and provenance-related information, and provides a backend-agnostic storage and discoverability system for organizing, retrieving, and interrogating such artifacts.
trade Tools for Trade Practitioners
A collection of tools for trade practitioners, including the ability to calibrate different consumer demand systems and simulate the effects of tariffs and quotas under different competitive regimes.
TRADER Tree Ring Analysis of Disturbance Events in R
The Tree Ring Analysis of Disturbance Events in R (‘TRADER’) package provides a single, unified approach for disturbance reconstruction from tree-ring data.
tradestatistics Open Trade Statistics API Wrapper and Utility Program
Access Open Trade Statistics API from R to download international trade data.
Trading Trades, Curves, Rating Tables, Add-on Tables, CSAs
Contains trades from the five major asset classes and also functionality to use pricing curves, rating tables, CSAs and add-on tables. The implementation follows an object-oriented logic whereby each trade inherits from more abstract classes, while the curves/tables are also objects. Much of the functionality focuses on counterparty credit risk calculations; however, the package can be used for trading applications in general.
trafo Estimation, Comparison and Selection of Transformations
Estimation, selection and comparison of several families of transformations. The families of transformations included in the package are the following: Bickel-Doksum (Bickel and Doksum 1981 <doi:10.2307/2287831>), Box-Cox, Dual (Yang 2006 <doi:10.1016/j.econlet.2006.01.011>), Glog (Durbin et al. 2002 <doi:10.1093/bioinformatics/18.suppl_1.S105>), Gpower1, Log, Log-shift opt (Feng et al. 2016 <doi:10.1002/sta4.104>), Manly, Modulus (John and Draper 1980 <doi:10.2307/2986305>), Neglog (Whittaker et al. 2005 <doi:10.1111/j.1467-9876.2005.00520.x>), Reciprocal and Yeo-Johnson. The package makes it simple to compare linear models with untransformed and transformed dependent variables, as well as linear models where the dependent variable is transformed with different transformations. Furthermore, the package employs maximum likelihood approaches, moments optimization and divergence minimization to estimate the optimal transformation parameter.
traitdataform Formatting and Harmonizing Ecological Trait-Data
Assistance for handling ecological trait data and applying the Ecological Trait-Data Standard terminology (Schneider et al. 2018 <doi:10.1101/328302>). There are two major use cases: (1) preparation of one's own trait datasets for upload into public databases, and (2) harmonizing trait datasets from different sources by re-formatting them into a unified format. See the ‘traitdataform’ website for full documentation.
traj Trajectory Analysis
Implements the three-step procedure proposed by Leffondré et al. (2004) to identify clusters of individual longitudinal trajectories. The procedure involves (1) calculating 24 measures describing the features of the trajectories; (2) using factor analysis to select a subset of the 24 measures; and (3) using cluster analysis to identify clusters of trajectories and classify each individual trajectory into one of the clusters.
TrajDataMining Trajectories Data Mining
Contains a set of methods for trajectory data preparation, such as filtering, compressing and clustering, and for trajectory pattern discovery.
trajr Animal Trajectory Analysis
A toolbox to assist with statistical analysis of 2-dimensional animal trajectories. It provides simple access to algorithms for calculating and assessing a variety of characteristics such as speed and acceleration, as well as multiple measures of straightness or tortuosity. See Turchin (1998, ISBN:0878938478).
tram Transformation Models
Formula-based user-interfaces to specific transformation models implemented in package ‘mlt’. Available models include Cox models, some parametric survival models (Weibull, etc.), models for ordered categorical variables, normal and non-normal (Box-Cox type) linear models, and continuous outcome logistic regression (Lohse et al., 2017, <DOI:10.12688/f1000research.12934.1>). The underlying theory is described in Hothorn et al. (2018) <DOI:10.1111/sjos.12291>.
transcribeR Automated Transcription of Audio Files Through the HP IDOL API
Transcribes audio to text with the HP IDOL API. Includes functions to upload files, retrieve transcriptions, and monitor jobs.
transformr Polygon and Path Transformations
In order to smoothly animate the transformation of polygons and paths, many aspects need to be taken into account, such as differing numbers of control points, changing centers of rotation, etc. The ‘transformr’ package provides an extensive framework for manipulating the shapes of polygons and paths and can be seen as the spatial brother to the ‘tweenr’ package.
translateSPSS2R Toolset for Translating SPSS-Syntax to R-Code
Provides R translations of SPSS commands, with usage oriented toward SPSS syntax. The package has two main purposes: it helps SPSS users change over to R, and it aids migration projects from SPSS to R.
TransP Implementation of Transportation Problem Algorithms
Implementation of two transportation problem algorithms: (1) the North West Corner Method and (2) the Minimum (Least) Cost Method. For more technical details about the algorithms, see <http://…/nw.htm> and <http://…/chapter7.pdf>.
tranSurv Estimating a Survival Distribution in the Presence of Dependent Left Truncation and Right Censoring
A structural transformation model for a latent, quasi-independent truncation time as a function of the observed dependent truncation time and the event time, and an unknown dependence parameter. The dependence parameter is chosen to minimize the conditional Kendall’s tau. The marginal distributions of the truncation time and the event time are left completely unspecified.
trawl Estimation and Simulation of Trawl Processes
Contains R functions for simulating and estimating integer-valued trawl processes as described in the article ‘Modelling, simulation and inference for multivariate time series of counts using trawl processes’ by A. E. D. Veraart (Journal of Multivariate Analysis, 2018, to appear, preprint available at: <https://…/papers.cfm?abstract_id=3100076>) and for simulating random vectors from the bivariate negative binomial and the bi- and trivariate logarithmic series distributions.
treatSens Sensitivity Analysis for Causal Inference
Utilities to investigate sensitivity to unmeasured confounding in parametric models with either binary or continuous treatment.
tree Classification and Regression Trees
Classification and regression trees.
tree.bins Recategorization of Factor Variables by Decision Tree Leaves
Provides users the ability to categorize categorical variables dependent on a response variable. It creates a decision tree by using one of the categorical variables (class factor) and the selected response variable. The decision tree is created from the rpart() function from the ‘rpart’ package. The rules from the leaves of the decision tree are extracted, and used to recategorize the appropriate categorical variable (predictor). This step is performed for each of the categorical variables that are fed into the data component of the function. Only variables containing more than 2 factor levels will be considered in the function. The final output generates a data set containing the recategorized variables or a list containing a mapping table for each of the candidate variables. For more details see T. Hastie et al (2009, ISBN: 978-0-387-84857-0).
TreeBUGS Hierarchical Multinomial Processing Tree Modeling
User-friendly analysis of hierarchical multinomial processing tree (MPT) models that are often used in cognitive psychology. Implements the latent-trait MPT approach (Klauer, 2010) and the beta-MPT approach (Smith & Batchelder, 2010) to model heterogeneity of participants. MPT models are conveniently specified by an .eqn-file as used by other MPT software. Data are provided either as a comma-separated file (.csv) or directly in R. Models are fitted either by calling JAGS (Plummer, 2003) or by an MPT-tailored Gibbs sampler in C++ (only for nonhierarchical and beta MPT models). Provides tests of heterogeneity and MPT-tailored summaries and plotting functions.
treeClust Cluster Distances Through Trees
Create a measure of inter-point dissimilarity useful for clustering mixed data, and, optionally, perform the clustering.
treeDA Tree-Based Discriminant Analysis
Performs sparse discriminant analysis on a combination of node and leaf predictors when the predictor variables are structured according to a tree.
treeHFM Hidden Factor Graph Models
Hidden Factor graph models generalise Hidden Markov Models to tree structured data. The distinctive feature of ‘treeHFM’ is that it learns a transition matrix for first order (sequential) and for second order (splitting) events. It can be applied to all discrete and continuous data that is structured as a binary tree. In the case of continuous observations, ‘treeHFM’ has Gaussian distributions as emissions.
treelet An Adaptive Multi-Scale Basis for High-Dimensional, Sparse and Unordered Data
Treelets provides a novel construction of multi-scale bases that extends wavelets to non-smooth signals. It returns a multi-scale orthonormal basis, where the final computed basis functions are supported on nested clusters in a hierarchical tree. Both the tree and the basis, which are constructed simultaneously, reflect the internal structure of the data.
treeman Phylogenetic Tree Manipulation Class and Methods
S4 class and methods for efficient phylogenetic tree manipulation for simulating evolution, running phylogenetic statistics and plotting.
treemapify Draw Treemaps in ‘ggplot2’
Provides ‘ggplot2’ geoms for drawing treemaps.
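A minimal sketch of the intended use, assuming the geom_treemap() geom with an ‘area’ aesthetic (names not stated above):

    library(ggplot2)
    library(treemapify)
    df <- data.frame(group = c("a", "b", "c"), n = c(5, 2, 9))
    # each row becomes a tile whose area is proportional to 'n'
    ggplot(df, aes(area = n, fill = group)) + geom_treemap()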
treeplyr ‘dplyr’ Functionality for Matched Tree and Data Objects
Matches phylogenetic trees and trait data, and allows simultaneous manipulation of the tree and data using ‘dplyr’.
trelliscope Create and Navigate Large Multi-Panel Visual Displays
An extension of Trellis Display that enables creation, organization, and interactive viewing of multi-panel displays created against potentially very large data sets. The dynamic viewer tiles panels of a display across the screen in a web browser and allows the user to interactively page through the panels and sort and filter them based on ‘cognostic’ metrics computed for each panel. Panels can be created using many of R’s plotting capabilities, including base R graphics, ‘lattice’, ‘ggplot2’, and many ‘htmlwidgets’. Conditioning is handled through the ‘datadr’ package, which enables ‘Trelliscope’ displays with potentially millions of panels to be created against terabytes of data on systems like ‘Hadoop’. While designed to scale, ‘Trelliscope’ displays can also be very useful for small data sets.
trelliscopejs Create Interactive Trelliscope Displays
Trelliscope is a scalable, flexible, interactive approach to visualizing data (Hafen, 2013 <doi:10.1109/LDAV.2013.6675164>). This package provides methods that make it easy to create a Trelliscope display specification for TrelliscopeJS. High-level functions are provided for creating displays from within ‘dplyr’ or ‘ggplot2’ workflows. Low-level functions are also provided for creating new interfaces.
trend Non-Parametric Trend Tests and Change-Point Detection
The analysis of environmental data often requires the detection of trends and change-points. This package provides the Mann-Kendall Trend Test, seasonal Mann-Kendall Test, correlated seasonal Mann-Kendall Test, partial Mann-Kendall Trend test, (Seasonal) Sen’s slope, partial correlation trend test and change-point test after Pettitt.
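A minimal sketch of the tests listed above, assuming the function names exported by ‘trend’ (mk.test, sens.slope, pettitt.test):

    library(trend)
    mk.test(Nile)       # Mann-Kendall trend test on the built-in Nile series
    sens.slope(Nile)    # Sen's slope estimate
    pettitt.test(Nile)  # change-point test after Pettitt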
trendchange Innovative Trend Analysis and Time-Series Change Point Analysis
Innovative Trend Analysis is a graphical method to examine the trends in time series data. The sequential Mann-Kendall test uses the intersection of prograde and retrograde series to indicate the possible change point in time series data. Distribution-free cumulative sum charts indicate the location and significance of the change point in time series. Şen, Z. (2011) <doi:10.1061/(ASCE)HE.1943-5584.0000556>. Grayson, R. B. et al. (1996). Hydrological Recipes: Estimation Techniques in Australian Hydrology. Cooperative Research Centre for Catchment Hydrology, Australia, p. 125. Sneyers, R. (1990). On the statistical analysis of series of observations. Technical Note No. 143, WMO No. 415. Secretariat of the World Meteorological Organization, Geneva, 192 pp.
TrendInTrend Odds Ratio Estimation for the Trend-in-Trend Model
Estimation of causal odds ratio given trends in exposure prevalence and outcome frequencies of stratified data.
trendsegmentR Linear Trend Segmentation and Point Anomaly Detection
Performs the detection of point anomalies and linear trend changes for univariate time series by implementing the bottom-up unbalanced wavelet transformation proposed by H. Maeng and P. Fryzlewicz (2019) <http://…/>. The estimated number and locations of the change-points are returned with the piecewise-linear estimator for the signal.
trendyy A Tidy Wrapper Around ‘gtrendsR’
Access Google Trends information. This package provides a tidy wrapper to the ‘gtrendsR’ package.
TRES Tensor Regression with Envelope Structure and Three Generic Envelope Estimation Approaches
Provides three estimators for tensor response regression (TRR) and tensor predictor regression (TPR) models with tensor envelope structure. The three types of estimation approaches are generic and can be applied to any envelope estimation problem. The full Grassmannian (FG) optimization is often associated with likelihood-based estimation but requires heavy computation and good initialization; the one-directional optimization approaches (1D and ECD algorithms) are faster and more stable and do not require carefully chosen initial values; the SIMPLS-type estimator is motivated by partial least squares regression and is computationally the least expensive.
triangulation Determine Position of Observer
Measuring angles between points in a landscape is much easier than measuring distances. When the location of three points is known, the position of the observer can be determined based solely on the angles between these points as seen by the observer. This task (known as triangulation), however, requires onerous calculations; these calculations are automated by this package.
tribe Play with the Tribe of Attributes
Functions to make manipulation of object attributes easier. It also contains a few functions that extend the ‘dplyr’ package for data manipulation, and it provides new pipe operators, including the pipe ‘%@>%’ similar to the ‘magrittr’ ‘%>%’, but with the additional functionality to enable attributes propagation.
tricolore A Flexible Color Scale for Ternary Compositions
A flexible color scale for ternary compositions with options for discretization, centering and scaling.
triebeard ‘Radix’ Trees in ‘Rcpp’
‘Radix trees’, or ‘tries’, are key-value data structures optimised for efficient lookups, similar in purpose to hash tables. ‘triebeard’ provides an implementation of ‘radix trees’ for use in R programming and in developing packages with ‘Rcpp’.
trimcluster Cluster Analysis with Trimming
Trimmed k-means clustering.
trimr An Implementation of Common Response Time Trimming Methods
Provides various commonly-used response time trimming methods, including the recursive / moving-criterion methods reported by Van Selst and Jolicoeur (1994). By passing raw data files to the trimming functions, the package returns trimmed data ready for inferential testing.
trinROC Statistical Tests for Assessing Trinormal ROC Data
Several statistical test functions as well as a function for exploratory data analysis to investigate classifiers allocating individuals to one of three disjoint and ordered classes. In a single classifier assessment the discriminatory power is compared to classification by chance. In a comparison of two classifiers the null hypothesis corresponds to equal discriminatory power of the two classifiers.
TrioSGL Trio Model with a Combination of Lasso and Group Lasso Regularization
Fit a trio model via penalized maximum likelihood. The model is fit for a path of values of the penalty parameter. This package is based on Noah Simon, et al. (2011) <doi:10.1080/10618600.2012.681250>.
triversity Diversity Measures on Tripartite Graphs
Computing diversity measures on tripartite graphs. The package first implements a parametrized family of such diversity measures which apply to probability distributions. Sometimes called ‘True Diversity’, this family contains famous measures such as richness, the Shannon entropy, the Herfindahl-Hirschman index, and the Berger-Parker index. Second, the package allows these measures to be applied to probability distributions resulting from random walks between the levels of tripartite graphs. By defining an initial distribution at a given level of the graph and a path to follow between the three levels, the probability of the walker’s position within the final level is then computed, thus providing a particular instance of diversity to measure.
tropAlgebra Tropical Algebra Functions
Includes tropical algebraic functions such as tropical addition and multiplication, for scalars as well as vectors and matrices. In tropical algebra, the sum of two numbers is their minimum and the product of two numbers is their ordinary sum. For more information see also <https://…/Tropical_geometry>.
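The min-plus arithmetic can be illustrated in plain R (illustrative only; these are not necessarily the package's own function names):

    trop_add <- function(a, b) pmin(a, b)  # tropical sum = minimum
    trop_mul <- function(a, b) a + b       # tropical product = ordinary sum
    trop_add(3, 7)  # 3
    trop_mul(3, 7)  # 10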
tropicalSparse Sparse Tropical Algebra
Some of the basic tropical algebra functionality is provided for sparse matrices by applying sparse matrix storage techniques. Some of these are addition and multiplication of vectors and matrices, dot product of the vectors in tropical form and some general equations are also solved using tropical algebra.
trread Transit File Reader
Read General Transit Feed Specification (GTFS) zipfiles into a list of R dataframes. Perform validation of the data structure against the specification. Please see the GTFS documentation for more detail: <http://…/>.
trtf Transformation Trees and Forests
Recursive partytioning of transformation models with corresponding random forest for conditional transformation models as described in ‘Transformation Forests’ (Hothorn and Zeileis, 2017, <arXiv:1701.02110>) and ‘Top-Down Transformation Choice’ (Hothorn, 2017, <arXiv:1706.08269>).
TruncatedNormal Truncated Multivariate Normal
A collection of functions to deal with the truncated univariate and multivariate normal distributions.
tryCatchLog Advanced ‘tryCatch()’ and ‘try()’ Functions
Advanced tryCatch() and try() functions for better error handling (logging, stack trace with source code references and support for post-mortem analysis).
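A minimal sketch, assuming the tryLog() convenience wrapper and the ‘futile.logger’ backend (both assumptions, not stated above):

    library(tryCatchLog)
    library(futile.logger)
    flog.appender(appender.file("app.log"))  # send log entries to a file
    tryLog(log(-1))  # the warning is logged with a stack trace; execution continues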
tsbox Class-Agnostic Time Series
Time series toolkit with identical behavior for all time series classes: ‘ts’, ‘xts’, ‘data.frame’, ‘data.table’, ‘tibble’, ‘zoo’, ‘timeSeries’, ‘tsibble’. Also converts reliably between these classes.
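A minimal sketch, assuming the ts_*() converter naming convention of ‘tsbox’:

    library(tsbox)
    x <- ts_xts(AirPassengers)  # 'ts' -> 'xts'
    head(ts_tbl(x))             # 'xts' -> 'tibble', same series in long format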
tsBSS Tools for Blind Source Separation for Time Series
Different estimates are provided to solve the blind source separation problem for time series with stochastic volatility.
tsc Likelihood-ratio Tests for Two-Sample Comparisons
Performs the two-sample comparisons using the following exact test procedures: the exact likelihood-ratio test (LRT) for equality of two normal populations proposed in Zhang et al. (2012); the combined test based on the LRT and Shapiro-Wilk test for normality via the Bonferroni correction technique; the newly proposed density-based empirical likelihood (DBEL) ratio test. To calculate p-values of the DBEL procedures, three procedures are used: (a) the traditional Monte Carlo (MC) method implemented in C++, (b) a new interpolation method based on regression techniques to operate with tabulated critical values of the test statistic; (c) a Bayesian type method that uses the tabulated critical values as the prior information and MC generated DBEL-test-statistic’s values as data.
TSclust Time Series Clustering Utilities
This package contains a set of measures of dissimilarity between time series to perform time series clustering. Metrics based on raw data, on generating models and on the forecast behavior are implemented. Some additional utilities related to time series clustering are also provided, such as clustering algorithms and cluster evaluation metrics.
tscount Analysis of Count Time Series
Likelihood-based methods for model fitting and assessment, prediction and intervention analysis of count time series following generalized linear models are provided. Models with the identity and with the logarithmic link function are allowed. The conditional distribution can be Poisson or Negative Binomial.
TSCS Time Series Cointegrated System
A set of functions to implement Time Series Cointegrated System (TSCS) spatial interpolation and relevant data visualization.
TSdata TSdbi Illustration
TSdata illustrates the various TSdbi packages using time series data from several sources. It also illustrates some simple time series manipulation and plotting using packages tframe and tfplot.
tsdb Terribly-Simple Data Base for Time Series
A terribly-simple data base for numeric time series, written purely in R, so no external database-software is needed. Series are stored in plain-text files (the most-portable and enduring file type) in CSV format. Timestamps are encoded using R’s native numeric representation for ‘Date’/’POSIXct’, which makes them fast to parse, but keeps them accessible with other software. The package provides tools for saving and updating series in this standardised format, for retrieving and joining data, for summarising files and directories, and for coercing series from and to other data types (such as ‘zoo’ series).
TSdbi TSdbi: Time Series Database Interface
Provides a common interface to time series databases. The objective is to define a standard interface so users can retrieve time series data from various sources with a simple, common set of commands, and so programs can be written to be portable with respect to the data source. The SQL implementations also provide a database table design, so users needing to set up a time series database have a reasonably complete way to do this easily. The interface provides for a variety of options with respect to the representation of time series in R. The interface, and the SQL implementations, also handle vintages of time series data (sometimes called editions or real-time data). There is also a (not yet well tested) mechanism to handle multilingual data documentation. Comprehensive examples of all the TS* packages are provided in the vignette Guide.pdf with the TSdata package.
tsdecomp Decomposition of Time Series Data
ARIMA-model-based decomposition of quarterly and monthly time series data. The methodology is developed and described, among others, in Burman (1980) <DOI:10.2307/2982132> and Hillmer and Tiao (1982) <DOI:10.2307/2287770>.
tsdf Two-/Three-Stage Designs for Phase 1&2 Clinical Trials
Calculate optimal Zhong’s two-/three-stage Phase II designs (see Zhong (2012) <doi:10.1016/j.cct.2012.07.006>). Generate two-/three-stage dose finding decision tables. This package also allows users to run dose-finding simulations based on a customized decision table.
tsdisagg2 Time Series Disaggregation
Disaggregates low frequency time series data to higher frequency series. Implements the following methods for temporal disaggregation: Boot, Feibes and Lisman (1967) <DOI:10.2307/2985238>, Chow and Lin (1971) <DOI:10.2307/1928739>, Fernandez (1981) <DOI:10.2307/1924371> and Litterman (1983) <DOI:10.2307/1391858>.
tsensembler Dynamic Ensembles for Time Series Forecasting
A framework for dynamically combining forecasting models for time series forecasting predictive tasks. It leverages machine learning models from other packages to automatically combine expert advice using metalearning and other state-of-the-art forecasting combination approaches. The predictive methods receive a data matrix as input, representing an embedded time series, and return a predictive ensemble model. The ensemble uses the generic functions ‘predict()’ and ‘forecast()’ to forecast future values of the time series. Moreover, an ensemble can be updated using methods such as ‘update_weights()’ or ‘update_base_models()’. A complete description of the methods can be found in: Cerqueira, V., Torgo, L., Pinto, F., and Soares, C. ‘Arbitrated Ensemble for Time Series Forecasting.’ to appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017; and Cerqueira, V., Torgo, L., and Soares, C.: ‘Arbitrated Ensemble for Solar Radiation Forecasting.’ International Work-Conference on Artificial Neural Networks. Springer, 2017 <doi:10.1007/978-3-319-59153-7_62>.
tseriesEntropy Entropy Based Analysis and Tests for Time Series
Implements an Entropy measure of dependence based on the Bhattacharya-Hellinger-Matusita distance. Can be used as a (nonlinear) autocorrelation/crosscorrelation function for continuous and categorical time series. The package includes tests for serial dependence and nonlinearity based on it. Some routines have a parallel version that can be used in a multicore/cluster environment. The package makes use of S4 classes.
TSF Two Stage Forecasting (TSF) for Long Memory Time Series in Presence of Structural Break
Forecasting of long memory time series in presence of structural break by using TSF algorithm by Papailias and Dias (2015) <doi:10.1016/j.ijforecast.2015.01.006>.
tsfeatures Time Series Feature Extraction
Methods for extracting various features from time series data. The features provided are those from Hyndman, Wang and Laptev (2015) <doi:10.1109/ICDMW.2015.104>, Kang, Hyndman and Smith-Miles (2017) <doi:10.1016/j.ijforecast.2016.09.004> and from Fulcher, Little and Jones (2013) <doi:10.1098/rsif.2013.0048>. Features include spectral entropy, autocorrelations, measures of the strength of seasonality and trend, and so on. Users can also define their own feature functions.
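A minimal sketch, assuming tsfeatures() accepts a list of time series:

    library(tsfeatures)
    tsfeatures(list(AirPassengers, USAccDeaths))  # one row of features per series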
tsfknn Time Series Forecasting Using Nearest Neighbors
Allows forecasting of time series using nearest neighbors regression, following Francisco Martinez, Maria P. Frias, Maria D. Perez-Godoy and Antonio J. Rivera (2017) <doi:10.1007/s10462-017-9593-z>. When the forecasting horizon is higher than 1, two multi-step ahead forecasting strategies can be used. The model built is autoregressive, that is, it is only based on the observations of the time series. The nearest neighbors used in a prediction can be consulted and plotted.
tsgui Gui for Simulating Time Series
This GUI shows realisations of time series, currently ARMA and GARCH processes. It might be helpful for teaching and studying.
tsibble Tidy Temporal Data Frames and Tools
Provides a ‘tbl_ts’ class (the ‘tsibble’) to store and manage temporal-context data in a data-centric format, which is built on top of the ‘tibble’. The ‘tsibble’ aims at manipulating and analysing temporal data in a tidy and modern manner, including easy interpolation of missing values, aggregation over calendar periods, and rolling window calculations.
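A minimal sketch, assuming the as_tsibble() constructor with an ‘index’ argument:

    library(tsibble)
    df <- data.frame(date = as.Date("2019-01-01") + 0:9, value = rnorm(10))
    tx <- as_tsibble(df, index = date)  # a 'tbl_ts' with a daily interval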
TSMining Mining Univariate and Multivariate Motifs in Time-Series Data
Implementations of a number of functions used to mine numeric time-series data. It covers the implementation of SAX transformation, univariate motif discovery (based on the random projection method), multivariate motif discovery (based on graph clustering), and several functions used for the ease of visualizing the motifs discovered. The details of SAX transformation can be found in J. Lin, E. Keogh, L. Wei, S. Lonardi, Experiencing SAX: A novel symbolic representation of time series, Data Mining and Knowledge Discovery 15 (2) (2007) 107-144. Details on the univariate motif discovery method implemented can be found in B. Chiu, E. Keogh, S. Lonardi, Probabilistic discovery of time series motifs, ACM SIGKDD, Washington, DC, USA, 2003, pp. 493-498. Details on the multivariate motif discovery method implemented can be found in A. Vahdatpour, N. Amini, M. Sarrafzadeh, Towards unsupervised activity discovery using multi-dimensional motif detection in time series, IJCAI 2009 21st International Joint Conference on Artificial Intelligence.
TSmisc ‘TSdbi’ Extensions to Wrap Miscellaneous Data Sources
Methods to retrieve data from several different sources. These include historical quote data from Yahoo and Oanda, economic data from FRED, and xls and csv data from different sources. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.
TSMN Truncated Scale Mixtures of Normal Distributions
Return the first four moments of the SMN distributions (Normal, Student-t, Pearson VII, Slash or Contaminated Normal).
tsmp Time Series with Matrix Profile
A toolkit implementing the Matrix Profile concept that was created by CS-UCR <http://…/MatrixProfile.html>.
tsna Tools for Temporal Social Network Analysis
Temporal SNA tools for continuous- and discrete-time longitudinal networks having vertex, edge, and attribute dynamics stored in the ‘networkDynamic’ format. This work was supported by grant R01HD68395 from the National Institutes of Health.
tsne T-distributed Stochastic Neighbor Embedding for R (t-SNE)
A “pure R” implementation of the t-SNE algorithm.
tsoutliers Detection of Outliers in Time Series
Detection of outliers in time series following the Chen and Liu (1993) procedure. Innovative outliers, additive outliers, level shifts, temporary changes and seasonal level shifts are considered.
tsPI Improved Prediction Intervals for ARIMA Processes and Structural Time Series
Prediction intervals for ARIMA and structural time series models using an importance sampling approach with uninformative priors for model parameters, leading to more accurate coverage probabilities in the frequentist sense. Instead of sampling the future observations and hidden states of the state space representation of the model, only model parameters are sampled, and the method is based on solving the equations corresponding to the conditional coverage probability of the prediction intervals. This makes the method relatively fast compared to, for example, MCMC methods, and standard errors of prediction limits can also be computed straightforwardly.
TSPred Functions for Baseline-Based Time Series Prediction
Functions for time series prediction and accuracy assessment using automatic ARIMA modelling. The generated ARIMA models and their prediction errors are intended to be used as baselines for evaluating the practical value of other time series prediction methods and creating a demand for the refinement of such methods. For this purpose, benchmark data from prediction competitions may be used.
tsqn Applications of the Qn Estimator to Time Series (Univariate and Multivariate)
Time Series Qn is a package with applications of the Qn estimator of Rousseeuw and Croux (1993) <doi:10.1080/01621459.1993.10476408> to univariate and multivariate time series in time and frequency domains. More specifically, the robust estimation of autocorrelation or autocovariance matrix functions from Ma and Genton (2000, 2001) <doi:10.1111/1467-9892.00203>, <doi:10.1006/jmva.2000.1942> and Cotta (2017) <doi:10.13140/RG.2.2.14092.10883> is provided. The robust pseudo-periodogram of Molinares et al. (2009) <doi:10.1016/j.jspi.2008.12.014> is also given. This package also provides the M-estimator of the long-memory parameter d based on the robustification of the GPH estimator proposed by Reisen et al. (2017) <doi:10.1016/j.jspi.2017.02.008>.
TSrepr Time Series Representations
Methods for representations (i.e. dimensionality reduction, preprocessing, feature extraction) of time series to support more accurate and effective time series data mining. Non-data-adaptive, data-adaptive, model-based and data-dictated (clipped) representation methods are implemented. Min-max and z-score normalisations, as well as forecasting accuracy measures, are also implemented.
TSS.RESTREND Time Series Segmentation of Residual Trends
To perform the Time Series Segmented Residual Trend (TSS-RESTREND) method. Full details are available in Burrell et al. (2016; reference to be updated once the paper is published).
TSsdmx ‘TSdbi’ Extension to Connect with ‘SDMX’
Methods to retrieve data in the Statistical Data and Metadata Exchange (‘SDMX’) format from several databases (for example, EuroStat, the European Central Bank, the Organisation for Economic Co-operation and Development, the UNESCO Institute for Statistics, and the International Labour Organization). This is a wrapper for package ‘RJSDMX’. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.
tsSelect Execution of Time Series Models
Execution of various time series models, choosing the best one either by a specific error metric or by majority vote. The models are based on the ‘forecast’ package, written by Prof. Rob Hyndman.
TSSS Time Series Analysis with State Space Model
Functions for statistical analysis, modeling and simulation of time series with state space model, based on the methodology in Kitagawa (1993, ISBN: 4-00-007703-1 and 2005, ISBN: 4-00-005455-4).
TSstudio Tools for Time Series Analysis and Forecasting
The TSstudio package provides a set of functions for time series analysis, including interactive data visualization tools based on the ‘plotly’ engine and supporting multiple time series objects such as ts, xts, and zoo. In addition, the package provides utility functions for preprocessing time series data, as well as backtesting applications for forecasting models from the ‘forecast’, ‘forecastHybrid’ and ‘bsts’ packages.
tstools A Time Series Toolbox for Official Statistics
Plot official statistics’ time series conveniently: automatic legends, highlight windows, stacked bar charts with positive and negative contributions, sum-as-line option, two y-axes with automatic horizontal grids that fit both axes and other popular chart types. ‘tstools’ comes with a plethora of defaults to let you plot without setting an abundance of parameters first, but gives you the flexibility to tweak the defaults. In addition to charts, ‘tstools’ provides a super fast, ‘data.table’ backed time series I/O that allows the user to export / import long format, wide format and transposed wide format data to various file types.
TSTr Ternary Search Tree
A ternary search tree is a type of prefix tree with up to three children and the ability for incremental string search. The package uses this ability for word auto-completion and includes a dataset with the 10001 most frequent English words.
tsutils Time Series Exploration, Modelling and Forecasting
Includes: (i) tests and visualisations that can help the modeller explore time series components and perform decomposition; (ii) modelling shortcuts, such as functions to construct lagmatrices and seasonal dummy variables of various forms; (iii) an implementation of the Theta method; (iv) tools to facilitate the design of the forecasting process, such as ABC-XYZ analyses; and (v) ‘quality of life’ functions, such as treating time series for trailing and leading values.
TSVC Tree-Structured Modelling of Varying Coefficients
Fitting tree-structured varying coefficient models (Berger, M., Tutz, G. & Schmid, M. (2018) <doi:10.1007/s11222-018-9804-8>). Simultaneous detection of covariates with varying coefficients and effect modifiers that induce varying coefficients if they are present.
tsvr Timescale-Specific Variance Ratio for Use in Community Ecology
Tools for timescale decomposition of the classic variance ratio of community ecology. Tools are as described in Zhao et al (in prep), extending commonly used methods introduced by Peterson et al (1975) <doi: 10.2307/1936306>.
tsxtreme Bayesian Modelling of Extremal Dependence in Time Series
Characterisation of the extremal dependence structure of time series, avoiding pre-processing and filtering as done typically with peaks-over-threshold methods. It uses the conditional approach of Heffernan and Tawn (2004) <DOI:10.1111/j.1467-9868.2004.02050.x> which is very flexible in terms of extremal and asymptotic dependence structures, and Bayesian methods improve efficiency and allow for deriving measures of uncertainty. For example, the extremal index, related to the size of clusters in time, can be estimated and samples from its posterior distribution obtained.
TTCA Transcript Time Course Analysis
The analysis of microarray time series promises a deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while other genes change their expression slowly over time. The existing methods for the detection of significant expression dynamics often fail when the expression dynamics show large heterogeneity, and often cannot cope with irregular and sparse measurements. The method proposed here is specifically designed for the analysis of perturbation responses. It combines different scores to capture fast and transient dynamics as well as slow expression changes, and deals with low replicate numbers and irregular sampling times. The results are given in the form of tables linked to figures. These allow the user to quickly recognize the relevance of a detection, to identify possible false positives and to distinguish between changes in early and later expression. An extension of the method allows the analysis of the expression dynamics of functional groups of genes, providing a quick overview of the cellular response. The performance of this package was tested on microarray data derived from lung cancer cells stimulated with epidermal growth factor. See the publication ‘TTCA: An R package for the identification of differentially expressed genes in time course microarray data’.
TTmoment Sampling and Calculating the First and Second Moments for the Doubly Truncated Multivariate t Distribution
Computing the first two moments of the truncated multivariate t (TMVT) distribution under double truncation. Applying the slice sampling algorithm to generate random variates from the TMVT distribution.
TTS Master Curve Estimates Corresponding to Time-Temperature Superposition
Time-Temperature Superposition analysis is often applied to frequency-modulated data obtained by Dynamic Mechanic Analysis (DMA) and Rheometry in the analytical chemistry and physics areas. These techniques provide estimates of material mechanical properties (such as moduli) at different temperatures in a wider range of time. This package provides the Time-Temperature Superposition master curve at a reference temperature via three methods: the two most widely used, Arrhenius-based methods and WLF, and a newer methodology based on a derivatives procedure. The master curve is smoothed by a B-spline basis. The package output is composed of plots of experimental data, horizontal and vertical shifts, TTS data, and TTS data fitted using B-splines with bootstrap confidence intervals.
ttTensor Tensor-Train Decomposition
Tensor-train is a compact representation for higher-order tensors. Some algorithms for performing tensor-train decomposition are available such as TT-SVD, TT-WOPT, and TT-Cross. For the details of the algorithms, see I. V. Oseledets (2011) <doi:10.1137/090752286>, Yuan Longao, et al (2017) <arXiv:1709.02641>, I. V. Oseledets (2010) <doi:10.1016/j.laa.2009.07.024>.
tttplot Time to Target Plot
Implementation of the Time to Target plot based on the work of Ribeiro and Rosseti (2015) <DOI:10.1007/s11590-014-0760-8>, which describes a numerical method that gives the probability that an algorithm A finds a solution at least as good as a given target value in smaller computation time than algorithm B.
tuber Client for the YouTube API
Get comments posted on YouTube videos, information on how many times a video has been liked, search for videos with particular content, and much more. You can also scrape captions from a few videos. To learn more about the YouTube API, see <https://…/>.
tubern R Client for the YouTube Analytics and Reporting API
Get statistics and reports from YouTube. To learn more about the YouTube Analytics and Reporting API, see <https://…/>.
tuckerR.mmgg Three-Mode Principal Components Analysis
Performs Three-Mode Principal Components Analysis, which fits Tucker models.
tufte Tufte’s Styles for R Markdown Documents
Provides R Markdown output formats to use Tufte styles for PDF and HTML output.
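A minimal sketch, assuming the tufte_html() output format (the file name is hypothetical):

    # render an R Markdown document with the Tufte HTML style
    rmarkdown::render("notes.Rmd", output_format = tufte::tufte_html())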
TukeyRegion Tukey Region and Median
Tukey regions are polytopes in the Euclidean space, viz. upper-level sets of the Tukey depth function on given data. The bordering hyperplanes of a Tukey region are computed as well as its vertices, facets, centroid, and volume. In addition, the Tukey median set, which is the non-empty Tukey region having highest depth level, and its barycenter (= Tukey median) are calculated. Tukey regions are visualized in dimension two and three. For details see Liu, Mosler, and Mozharovskyi (2017) <arXiv:1412.5122>.
tukeytrend Tukey’s Trend Test via Multiple Marginal Models
Provides wrapper functions to the multiple marginal model function mmm() of package ‘multcomp’ to implement the trend test of Tukey, Ciminera and Heyse (1985) <DOI:10.2307/2530666> for general parametric models.
TULIP A Toolbox for Linear Discriminant Analysis with Penalties
Integrates several popular high-dimensional methods based on Linear Discriminant Analysis (LDA) and provides a comprehensive and user-friendly toolbox for linear, semi-parametric and tensor-variate classification as mentioned in Yuqing Pan, Qing Mai and Xin Zhang (2019) <arXiv:1904.03469>. Functions are included for covariate adjustment, model fitting, cross validation and prediction.
tuneRanger Tune Random Forest of the ‘ranger’ Package
Tuning of random forests with one line of code. The package is mainly based on the packages ‘ranger’ and ‘mlrMBO’.
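A minimal sketch, assuming a task-based interface built on ‘mlr’ tasks (an assumption, not stated above):

    library(tuneRanger)
    library(mlr)
    task <- makeClassifTask(data = iris, target = "Species")
    res <- tuneRanger(task)  # model-based optimisation of 'ranger' hyperparameters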
turfR TURF Analysis for R
Analyzes TURF (Total Unduplicated Reach and Frequency) data in R. The absence of looping in the TURF algorithm results in fast processing times. Allows for individual-level weights, depth specification, and user-truncated combination set(s). Allows the user to substitute Monte Carlo simulated combination set(s) after set(s) exceed a user-specified limit.
tuts Time Uncertain Time Series Analysis
Models of time-uncertain time series addressing frequency and non-frequency behavior of continuous and discrete (counting) data.
tvR Total Variation Regularization
Provides tools for denoising noisy signal and images via Total Variation Regularization. Reducing the total variation of the given signal is known to remove spurious detail while preserving essential structural details. For the seminal work on the topic, see Rudin et al (1992) <doi:10.1016/0167-2789(92)90242-F>.
tvReg Time-Varying Coefficients Linear Regression for Single and Multiple Equations
Fitting simultaneous equations with time-varying coefficients, both for the case of independent equations and for the case of correlated equations.
TVsMiss Variable Selection for Missing Data
Uses a regularized likelihood method for variable selection; it can be used with the lasso, smoothly clipped absolute deviation (SCAD) and minimax concave (MCP) penalties. Tuning parameter selection techniques include cross validation (CV), Bayesian information criterion (BIC) (low and high), stability of variable selection (sVS), stability of BIC (sBIC), and stability of estimation (sEST). For more details see Zhao, Jiwei, Yang Yang, and Yang Ning (2018) <arXiv:1703.06379> ‘Penalized pairwise pseudo likelihood for variable selection with nonignorable missing data.’ Statistica Sinica.
twilio An Interface to the Twilio API for R
The Twilio web service provides an API for computer programs to interact with telephony. The included functions wrap the SMS and MMS portions of Twilio’s API, allowing users to send and receive text messages from R. See <https://…/> for more information.
twl Two-Way Latent Structure Clustering Model
Implementation of a Bayesian two-way latent structure model for integrative genomic clustering. The model clusters samples in relation to distinct data sources, with each subject-dataset receiving a latent cluster label, though cluster labels have across-dataset meaning because of the model formulation. A common scaling across data sources is unneeded, and inference is obtained by a Gibbs Sampler. The model can fit multivariate Gaussian distributed clusters or a heavier-tailed modification of a Gaussian density. Uniquely among integrative clustering models, the formulation makes no nestedness assumptions of samples across data sources — the user can still fit the model if a study subject only has information from one data source. The package provides a variety of post-processing functions for model examination including ones for quantifying observed alignment of clusterings across genomic data sources. Run time is optimized so that analyses of datasets on the order of thousands of features on fewer than 5 datasets and hundreds of subjects can converge in 1 or 2 days on a single CPU. See ‘Swanson DM, Lien T, Bergholtz H, Sorlie T, Frigessi A, Investigating Coordinated Architectures Across Clusters in Integrative Studies: a Bayesian Two-Way Latent Structure Model, 2018, <doi:10.1101/387076>, Cold Spring Harbor Laboratory’ at <https://…/387076.full.pdf> for model details.
TwoRegression Process Data from Wearable Research Devices Using Two-Regression Algorithms
Application of two-regression algorithms for wearable research devices. It provides an easy way for users to read in device data files and apply an appropriate two-regression algorithm. More information is available from Hibbing PR, LaMunion SR, Kaplan AS, & Crouter SE (2017) <doi:10.1249/MSS.0000000000001532>.
twosamples Fast Permutation Based Two Sample Tests
Fast randomization based two sample tests. Testing the hypothesis that two samples come from the same distribution using randomization to create p-values. Included tests are: Kolmogorov-Smirnov, Kuiper, Cramer-von Mises, and Anderson-Darling. There is also a very efficient test based on the Wasserstein Distance. The default test ‘two_sample’ builds on the Wasserstein distance by using a weighting scheme like that of Anderson-Darling. We also include the permutation scheme to make test building simple for others.
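A minimal sketch of the default test named above:

    library(twosamples)
    a <- rnorm(100)
    b <- rnorm(100, mean = 0.5)
    two_sample(a, b)  # permutation p-value for the Wasserstein-weighted statistic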
TwoSampleTest.HD A Two-Sample Test for the Equality of Distributions for High-Dimensional Data
For high-dimensional data whose main feature is a large number, p, of variables but a small sample size, the null hypothesis that the marginal distributions of p variables are the same for two groups is tested. We propose a test statistic motivated by the simple idea of comparing, for each of the p variables, the empirical characteristic functions computed from the two samples. If one rejects this global null hypothesis of no differences in distributions between the two groups, a set of permutation p-values is reported to identify which variables are not equally distributed in both groups.
twoway Analysis of Two-Way Tables
Carries out analyses of two-way tables with one observation per cell, together with graphical displays for an additive fit and a diagnostic plot for removable ‘non-additivity’ via a power transformation of the response. It implements Tukey’s Exploratory Data Analysis methods, including a 1-degree-of-freedom test for row*column ‘non-additivity’, linear in the row and column effects.
txtq A Small Message Queue for Parallel Processes
This queue is a data structure that lets parallel processes send and receive messages, and it can help coordinate the work of complicated parallel tasks. Processes can push new messages to the queue, pop old messages, and obtain a log of all the messages ever pushed. File locking preserves the integrity of the data even when multiple processes access the queue simultaneously.
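A minimal sketch, assuming an R6-style push/pop interface (method names are assumptions):

    library(txtq)
    q <- txtq(tempfile())  # the queue lives in a directory of plain-text files
    q$push(title = "job1", message = "start")
    q$pop()  # oldest message is returned and marked as consumed
    q$log()  # history of all messages ever pushed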
types Type Annotations
Provides a simple type annotation for R that is usable in scripts, in the R console and in packages. It is intended as a convention to allow other packages to use the type information to provide error checking, automatic documentation or optimizations.

U

uaparserjs Parse Browser ‘User-Agent’ Strings into Data Frames
Despite there being a section in RFC 7231 <https://…/rfc7231#section-5.5.3> defining a suggested structure for ‘User-Agent’ headers, this data is notoriously difficult to parse consistently. A function is provided that will take in user agent strings and return structured R objects. This is a ‘V8’-backed package based on the ‘ua-parser’ project <https://…/ua-parser>.
UBL An Implementation of Several Approaches to Utility-Based Learning for Both Classification and Regression Tasks
Provides a set of functions that can be used to obtain better predictive performance on cost-sensitive and cost/benefits tasks (for both regression and classification). This includes re-sampling approaches, cost-based methods, special purpose evaluation metrics as well as specific learning systems.
uclust Clustering and Classification Inference with U-Statistics
Clustering and classification inference for high dimension low sample size (HDLSS) data with U-statistics. The package contains implementations of nonparametric statistical tests for sample homogeneity, group separation, clustering, and classification of multivariate data. The methods have high statistical power and are tailored for data in which the dimension L is much larger than sample size n. See Gabriela B. Cybis, Marcio Valk and Sílvia RC Lopes (2018) <doi:10.1080/00949655.2017.1374387> and Marcio Valk and Gabriela B. Cybis (2018) <arXiv:1805.12179>.
UCSCXenaShiny A Shiny App for UCSC Xena Database
Provides a web app for downloading, analyzing and visualizing datasets from UCSC Xena, which is a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others.
udpipe Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the ‘UDPipe’ ‘NLP’ Toolkit
This natural language processing toolkit provides language-agnostic ‘tokenization’, ‘parts of speech tagging’, ‘lemmatization’ and ‘dependency parsing’ of raw text. Next to text parsing, the package also allows you to train annotation models based on data of ‘treebanks’ in ‘CoNLL-U’ format as provided at <http://…/format.html>. The techniques are explained in detail in the paper: ‘Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe’, available at <doi:10.18653/v1/K17-3009>.
ufs Quantitative Analysis Made Accessible
This is a new version of the ‘userfriendlyscience’ package, which has grown a bit unwieldy. This first submission contains a number of basic functions to easily format values, work with scales, and format vectors in a single character value. Over time, more functions presently in ‘userfriendlyscience’ will be migrated over to this package.
uGMAR Estimate Univariate Gaussian or Student’s t Mixture Autoregressive Model
Maximum likelihood estimation of univariate Gaussian Mixture Autoregressive (GMAR) and Student’s t Mixture Autoregressive (StMAR) models, quantile residual tests, graphical diagnostics, and forecasting and simulation from GMAR and StMAR processes. General linear constraints, and restricting the autoregressive parameters to be the same for all regimes, are also supported. Leena Kalliovirta, Mika Meitz, Pentti Saikkonen (2015) <doi:10.1111/jtsa.12108>, Leena Kalliovirta (2012) <doi:10.1111/j.1368-423X.2011.00364.x>.
uHMM Construct an Unsupervised Hidden Markov Model
Construct a Hidden Markov Model with states learnt by unsupervised classification.
uiucthemes ‘R’ ‘Markdown’ Themes for ‘UIUC’ Documents and Presentations
A set of custom ‘R’ ‘Markdown’ templates for documents and presentations with the University of Illinois at Urbana-Champaign (UIUC) color scheme and identity standards.
Ultimixt Bayesian Analysis of a Non-Informative Parametrization for Gaussian Mixture Distributions
A generic reference Bayesian analysis of unidimensional mixtures of Gaussian distributions obtained by a location-scale parameterisation of the model is implemented. Included functions can be applied to produce a Bayesian analysis of Gaussian mixtures with an arbitrary number of components, with no need to define the prior distribution.
umap Uniform Manifold Approximation and Projection
Uniform manifold approximation and projection is a technique for dimension reduction. The algorithm was described by McInnes and Healy (2018) in <arXiv:1802.03426>. This package provides an interface for two implementations. One is written from scratch, including components for nearest-neighbor search and for embedding. The second implementation is a wrapper for ‘python’ package ‘umap-learn’ (requires separate installation, see vignette for more details).
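A minimal sketch using the default from-scratch implementation; the switch to ‘umap-learn’ via a ‘method’ argument is an assumption:

    library(umap)
    emb <- umap(as.matrix(iris[, 1:4]))  # 'naive' R implementation by default
    head(emb$layout)                     # 2-D coordinates, one row per observation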
Umoments Unbiased Central Moment Estimates
Calculates one-sample unbiased central moment estimates and two-sample pooled estimates up to 6th order, including unbiased estimates of powers and products of central moments. Provides the machinery for obtaining unbiased central moment estimators beyond 6th order.
umx Helper Functions for Structural Equation Modelling in OpenMx
Helper functions for making, running, and reporting SEM models in OpenMx. If you are just starting, try typing ?umx.
UncDecomp Uncertainty Decomposition
If a procedure consists of several stages and there are several scenarios that can be selected for each stage, the uncertainty of the procedure can be decomposed by stages or scenarios. cum_uncertainty() is used to decompose uncertainty based on the cumulative uncertainty. stage_uncertainty() and scenario_uncertainty() are used to decompose uncertainty based on the second-order interaction ANOVA model; in these two functions, the uncertainty from the interaction effect of two stages is distributed equally to each stage.
UncerIn2 Implements Models of Uncertainty into the Interpolation Functions
Provides basic (random) data, grids, 6 models of uncertainty, 3 automatic interpolation methods (IDW, spline, kriging), variograms and basic data visualization.
UncertainInterval Uncertain Area Methods for Cut-Point Determination in Tests
Functions for the determination of an Uncertain Interval, i.e., a range of test scores that are inconclusive and do not allow a diagnosis, other than ‘Uncertain’.
uncertainty Uncertainty Estimation and Contribution Analysis
Implements the Gaussian method of first and second order, the Kragten numerical method and the Monte Carlo simulation method for uncertainty estimation and analysis.
UNCLES Unification of Clustering Results from Multiple Datasets using External Specifications
Consensus clustering by the unification of clustering results from multiple datasets using external specifications.
understandBPMN Calculator of Understandability Metrics for BPMN
Calculate several understandability metrics of BPMN models. BPMN stands for business process modelling notation and is a language for expressing business processes into business process diagrams. Examples of these understandability metrics are: average connector degree, maximum connector degree, sequentiality, cyclicity, diameter, depth, token split, control flow complexity, connector mismatch, connector heterogeneity, separability, structuredness and cross connectivity. See R documentation and paper on metric implementation included in this package for more information concerning the metrics.
UNF Tools for Creating Universal Numeric Fingerprints for Data
Computes a universal numeric fingerprint (UNF) for an R data object. UNF is a cryptographic hash or signature that can be used to uniquely identify (a version of) a rectangular dataset, or a subset thereof. UNF can be used, in tandem with a DOI, to form a persistent citation to a versioned dataset.
ungroup Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data
Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are so estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, covers a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) <doi:10.1093/aje/kwv020>.
uniah Unimodal Additive Hazards Model
Nonparametric estimation of a unimodal or U-shape covariate effect under additive hazards model.
UniDOE Uniform Design of Experiments
Efficient procedures for constructing uniform design of experiments under various space-filling criteria. It is based on a stochastic and adaptive threshold accepting algorithm with flexible initialization, adaptive threshold, and stochastic evolution. The package may also construct the augmented uniform designs in a sequential manner. View details at: Zhang, A. and Li, H. (2017). UniDOE: An R package for constructing uniform design of experiments via stochastic and adaptive threshold accepting algorithm.
unifDAG Uniform Sampling of Directed Acyclic Graphs
Uniform sampling of Directed Acyclic Graphs (DAG) using exact enumeration by relating each DAG to a sequence of outpoints (nodes with no incoming edges) and then to a composition of integers as suggested by Kuipers, J. and Moffa, G. (2015) <doi:10.1007/s11222-013-9428-y>.
unifed The Unifed Distribution
Introduced in Quijano Xacur (2018) <arXiv:1812.00251>. This package contains the density, distribution, quantile and random generation functions for the unifed distribution. It also contains functions for the unifed family and quasifamily that can be used with the glm() function.
uniformly Uniform Sampling
Uniform sampling on various geometric shapes, such as spheres, ellipsoids, simplices.
UniIsoRegression Unimodal and Isotonic L1, L2 and Linf Regression
Perform L1 or L2 isotonic and unimodal regression on 1D weighted or unweighted input vectors, and isotonic regression on 2D weighted or unweighted input vectors. It also performs L-infinity isotonic and unimodal regression on 1D unweighted input vectors. References: Quentin F. Stout (2008) <doi:10.1016/j.csda.2008.08.005>; Spouge, J., Wan, H. & Wilbur, W. (2003) <doi:10.1023/A:1023901806339>; Q.F. Stout (2013) <doi:10.1007/s00453-012-9628-4>.
unine Unine Light Stemmer
Implementation of ‘light’ stemmers for French, German, Italian, Spanish, Portuguese, Finnish and Swedish. They are based on the same work as the ‘light’ stemmers found in ‘Solr’ <https://…/> or ‘ElasticSearch’ <https://…/elasticsearch>. A ‘light’ stemmer consists of removing inflections only from nouns and adjectives. Indexing verbs for these languages is not of primary importance compared to nouns and adjectives. The stemming procedure for French is described in (Savoy, 1999) <doi:10.1002/(SICI)1097-4571(1999)50:10%3C944::AID-ASI9%3E3.3.CO;2-H>.
uniqueAtomMat Finding Unique or Duplicated Rows or Columns for Atomic Matrices
An alternative implementation and extension (grpDuplicated) of base::duplicated.matrix, base::anyDuplicated.matrix and base::unique.matrix for matrices of atomic mode, avoiding the time-consuming collapse of the matrix into a character vector.
UnitCircle Check if Roots of a Polynomial Lie Outside the Unit Circle
The uc.check() function checks whether the roots of a given polynomial lie outside the unit circle. You can also easily draw a unit circle.
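A small sketch of uc.check(), assuming it takes the polynomial's coefficients in increasing order of power:
  library(UnitCircle)
  # check the roots of 1 - 0.5x + 0.25x^2
  uc.check(c(1, -0.5, 0.25))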
unitizer Interactive R Unit Tests
Simplifies regression tests by comparing objects produced by test code with earlier versions of those same objects. If objects are unchanged the tests pass, otherwise execution stops with error details. If in interactive mode, tests can be reviewed through the provided interactive environment.
units Measurement Units of Physical Quantities for R Vectors
Support for measurement units of physical quantities in R vectors, based on the udunits library from unidata.
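For example, a minimal sketch using the package's set_units() interface:
  library(units)
  speed <- set_units(1:3, m/s)
  time <- set_units(10, s)
  speed * time            # units propagate: the result is in metres
  set_units(speed, km/h)  # explicit unit conversion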
unival Assessing Essential Unidimensionality Using External Validity Information
Assesses essential unidimensionality using external validity information, following the procedure proposed by Ferrando & Lorenzo-Seva (2019) <doi:10.1177/0013164418824755>. Provides two indices for assessing differential and incremental validity, both based on a second-order modelling schema for the general factor.
univOutl Detection of Univariate Outliers
Well known outlier detection techniques in the univariate case. Methods to deal with skewed distribution are included too. The Hidiroglou-Berthelot (1986) method to search for outliers in ratios of historical data is implemented as well.
UnivRNG Univariate Pseudo-Random Number Generation
Pseudo-random number generation of 17 univariate distributions.
unix Unix System Utilities
Bindings to system utilities found in most Unix systems, mainly POSIX functions which are not part of the Standard C Library.
unjoin Separate a Data Frame by Normalization
Separates a data frame into two based on key columns. The function unjoin() provides an inside-out version of a nested data frame. This is used to identify duplication and normalize it (in the database sense) by linking two tables with the redundancy removed. This is a basic requirement for detecting topology within spatial structures, which motivated this package as a building block for workflows within more applied projects.
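A minimal sketch of the unjoin() function named above, assuming it takes a data frame followed by the key columns:
  library(unjoin)
  d <- data.frame(grp = c("a", "a", "b"), value = 1:3)
  unjoin(d, grp)  # two linked tables, with duplication in 'grp' factored out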
unpivotr Unpivot Complex and Irregular Data Layouts
Tools for converting data from complex or irregular layouts to a columnar structure. For example, tables with multilevel column or row headers, or spreadsheets. Header and data cells are selected by their contents and position, as well as formatting and comments where available, and are associated with one another by their proximity in given directions.
unrepx Analysis and Graphics for Unreplicated Experiments
Provides half-normal plots, reference plots, and Pareto plots of effects from an unreplicated experiment, along with various pseudo-standard-error measures, simulated reference distributions, and other tools. Many of these methods are described in Daniel C. (1959) <doi:10.1080/00401706.1959.10489866> and/or Lenth R.V. (1989) <doi:10.1080/00401706.1989.10488595>, but some new approaches are added and integrated in one package.
unrtf Extract Text from Rich Text Format (RTF) Documents
Wraps the ‘unrtf’ utility to extract text from RTF files. Supports document conversion to HTML, LaTeX or plain text. Output in HTML is recommended because ‘unrtf’ has limited support for converting between character encodings.
unsystation Stationarity Test Based on Unsystematic Sub-Sampling
Performs a test for second-order stationarity of time series based on unsystematic sub-samples.
upmfit Unified Probability Model Fitting
Fitting a Unified Probability Model for household-community tuberculosis transmission dynamics.
UpSetR Visualization of Intersecting Sets
Creates visualizations of intersecting sets using a novel matrix design, along with visualizations of several common set, element and attribute related tasks.
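For instance, a short sketch using the package's upset() and fromList() functions:
  library(UpSetR)
  sets <- list(A = 1:20, B = 10:30, C = 25:40)
  upset(fromList(sets), order.by = "freq")  # matrix-based intersection plot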
UPSvarApprox Approximate the Variance of the Horvitz-Thompson Total Estimator
Variance approximations for the Horvitz-Thompson total estimator in Unequal Probability Sampling using only first-order inclusion probabilities. See Matei and Tillé (2005) and Haziza, Mecatti and Rao (2008) for details.
uptasticsearch Get Data Frame Representations of ‘Elasticsearch’ Results
‘Elasticsearch’ is an open-source, distributed, document-based datastore (<https://…/elasticsearch> ). It provides an ‘HTTP’ ‘API’ for querying the database and extracting datasets, but that ‘API’ was not designed for common data science workflows like pulling large batches of records and normalizing those documents into a data frame that can be used as a training dataset for statistical models. ‘uptasticsearch’ provides an interface for ‘Elasticsearch’ that is explicitly designed to make these data science workflows easy and fun.
urbin Unifying Estimation Results with Binary Dependent Variables
Calculate unified measures that quantify the effect of a covariate on a binary dependent variable (e.g., for meta-analyses). This can be particularly important if the estimation results are obtained with different models/estimators (e.g., linear probability model, logit, probit, …) and/or with different transformations of the explanatory variable of interest (e.g., linear, quadratic, interval-coded, …). The calculated unified measures are: (a) semi-elasticities of linear, quadratic, or interval-coded covariates and (b) effects of linear, quadratic, interval-coded, or categorical covariates when a linear or quadratic covariate changes between distinct intervals, the reference category of a categorical variable or the reference interval of an interval-coded variable needs to be changed, or some categories of a categorical covariate or some intervals of an interval-coded covariate need to be grouped together. Approximate standard errors of the unified measures are also calculated.
urlshorteneR R Wrapper for the Bit.ly, Goo.gl and Is.gd URL Shortening Services
Allows using different URL shortening services, which also provide expanding and analytic functions. Specifically developed for Bit.ly, Goo.gl (both OAUTH2) and is.gd (no API key). Others can be added on request.
urltools Vectorised Tools for URL Handling and Parsing
A toolkit for handling URLs that so far includes functions for URL encoding and decoding, parsing, and parameter extraction. All functions are designed to be both fast and entirely vectorised. It is intended to be useful for people dealing with web-related datasets, such as server-side logs, although it may be useful for other situations involving large sets of URLs.
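A quick sketch of the vectorised interface (url_parse(), url_decode() and param_get() are exported by the package):
  library(urltools)
  u <- "https://example.org/search?q=r%20packages"
  url_parse(u)                # one row per URL: scheme, domain, path, ...
  url_decode("r%20packages")  # "r packages"
  param_get(u, "q")           # extract the 'q' query parameter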
uroot Unit Root Tests for Seasonal Time Series
Seasonal unit roots and seasonal stability tests. P-values based on response surface regressions are available for both tests. P-values based on bootstrap are available for seasonal unit root tests. A parallel implementation of the bootstrap method requires a CUDA capable GPU with compute capability >= 3.0, otherwise a debugging version fully coded in R is used.
ursa Non-Interactive Spatial Tools for Raster Processing and Visualization
S3 classes and methods for manipulation of georeferenced raster data: reading/writing, processing, and multi-panel visualization.
usedist Distance Matrix Utilities
Functions to re-arrange, extract, and work with distances.
userfriendlyscience Quantitative Analysis Made Accessible
Contains a number of functions that serve two goals. First, to make R more accessible to people migrating from SPSS by adding a number of functions that behave roughly like their SPSS equivalents (also see <https://rosettastats.com> ). Second, to make a number of slightly more advanced functions more user friendly to relatively novice users. The package also conveniently houses a number of additional functions that are intended to increase the quality of methodology and statistics in psychology, not by offering technical solutions, but by shifting perspectives, for example towards reasoning based on sampling distributions as opposed to on point estimates.
usethis Automate Package and Project Setup
Automate package and project setup tasks that are otherwise performed manually. This includes setting up unit testing, test coverage, continuous integration, Git, ‘GitHub’, licenses, ‘Rcpp’, ‘RStudio’ projects, and more.
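Typical usage, as a sketch (the package path is hypothetical; the helpers shown are part of the package's interface):
  library(usethis)
  create_package("~/projects/mypkg")  # hypothetical location
  use_git()                           # initialise a Git repository
  use_testthat()                      # set up unit testing
  use_mit_license("Jane Doe")         # add a license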
utc Coordinated Universal Time Transformations
Three functions are provided: the first converts local time to UTC, the second converts UTC to local time, and the third returns the difference between local time and UTC. A %h+% operator is also provided; it adds hours to a time.
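A sketch of the %h+% operator described above (the three conversion functions are not named in the description, so only the operator is shown):
  library(utc)
  t <- as.POSIXct("2019-01-01 12:00:00")
  t %h+% 3  # three hours later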
utf8 Unicode Text Processing
Processing and printing ‘UTF-8’ encoded international text (Unicode). Functions to input, validate, normalize, encode, format, and display.
utf8latex Importing, Exporting and Converting Between Datasets and LaTeX
Methods to assist with importing data stored in text files with Unicode characters and to convert text or data with foreign characters or mathematical symbols to LaTeX. It also escapes UTF8 code points (fixing the ‘warning: found non-ASCII strings’ problem), detects languages, encodings and more.
utiml Utilities for Multi-Label Learning
Multi-label learning methods and other utilities to support multi-label classification in R.
uwIntroStats Descriptive Statistics, Inference, Regression, and Plotting in an Introductory Statistics Course
A set of tools designed to facilitate easy adoption of R for students in introductory classes with little programming experience. Compiles output from existing routines together in an intuitive format, and adds functionality to existing functions. For instance, the regression function can perform linear models, generalized linear models, Cox models, or generalized estimating equations. The user can also specify multiple-partial F-tests to print out with the model coefficients. We also give many routines for descriptive statistics and plotting.
uwot The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction
An implementation of the Uniform Manifold Approximation and Projection dimensionality reduction by McInnes et al. (2018) <arXiv:1802.03426>. It also provides means to transform new data and to carry out supervised dimensionality reduction. An implementation of the related LargeVis method of Tang et al. (2016) <arXiv:1602.00370> is also provided. This is a complete re-implementation in R (and C++, via the ‘Rcpp’ package): no Python installation is required. See the uwot website (<https://…/uwot> ) for more documentation and examples.
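For example, a minimal sketch with the package's main umap() function:
  library(uwot)
  emb <- umap(iris[, 1:4], n_neighbors = 15, min_dist = 0.1)
  plot(emb, col = iris$Species)  # 2-D embedding of the iris measurements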

V

V8 Embedded JavaScript Engine
V8 is Google’s open source, high performance JavaScript engine. It is written in C++ and implements ECMAScript as specified in ECMA-262, 5th edition. The V8 R package builds on the C++ library to provide a completely standalone JavaScript engine within R. A major advantage over the other foreign language interfaces is that V8 requires no compilers, external executables or other run-time dependencies. The entire engine is contained within a 6MB package (2MB zipped) and works on all major platforms.
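A short sketch of evaluating JavaScript from R with a v8() context:
  library(V8)
  ct <- v8()  # create a JavaScript context
  ct$eval("var x = [1, 2, 3].map(function(i) { return i * 2 })")
  ct$get("x")  # returns c(2, 4, 6)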
vagam Variational Approximations for Generalized Additive Models
Fits generalized additive models (GAMs) using a variational approximations (VA) framework. In brief, the VA framework provides a fully, or at least close to fully, tractable lower bound approximation to the marginal likelihood of a GAM when it is parameterized as a mixed model (using penalized splines, say). In doing so, the VA framework offers both the stability and natural inference tools available in the mixed model approach to GAMs, while achieving computation times comparable to those of the penalized likelihood approach to GAMs. See Hui et al. (2018) <doi:10.1080/01621459.2018.1518235>.
valaddin Functional Input Validation
A set of basic tools to transform functions into functions with input validation checks, in a manner suitable for both programmatic and interactive use.
valection Sampler for Verification Studies
A binding for the ‘valection’ program which offers various ways to sample the outputs of competing algorithms or parameterizations, and fairly assess their performance against each other. The ‘valection’ C library is required to use this package and can be downloaded from: <http://…/valection>. Cooper CI, et al; Valection: Design Optimization for Validation and Verification Studies; Biorxiv 2018; <doi:10.1101/254839>.
validann Validation Tools for Artificial Neural Networks
Methods and tools for analysing and validating the outputs and modelled functions of artificial neural networks (ANNs) in terms of predictive, replicative and structural validity. Also provides a method for fitting feed-forward ANNs with a single hidden layer.
validate Data Validation Infrastructure
Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity.
validatejsonr Validate JSON Against JSON Schemas
The current implementation uses the C++ library ‘RapidJSON’ to supply the schema functionality, it supports JSON Schema Draft v4. As of 2016-09-09, ‘RapidJSON’ passed 262 out of 263 tests in JSON Schema Test Suite (JSON Schema draft 4).
validatetools Checking and Simplifying Validation Rule Sets
Rule sets with validation rules may contain redundancies or contradictions. Functions for finding redundancies and problematic rules are provided, given a set of rules formulated with ‘validate’.
valorate Velocity and Accuracy of the LOg-RAnk TEst
The algorithm implemented in this package was designed to quickly estimate the distribution of the log-rank test, especially for heavily unbalanced groups. VALORATE estimates the null distribution and the p-value of the log-rank test based on a recent formulation. For a given number of alterations that define the size of survival groups, the estimation involves a weighted sum of distributions that are conditional on a co-occurrence term where mutations and events are both present. The estimation of conditional distributions is quite fast, allowing the analysis of large datasets in a few minutes <http://…/valorate>.
valuer Pricing of Variable Annuities
Pricing of variable annuity life insurance contracts by means of Monte Carlo methods. Monte Carlo is used to price the contract in case the policyholder cannot surrender while Least Squares Monte Carlo is used if the insured can surrender. A state-dependent fee structure with a single barrier is implemented.
vamc A Monte Carlo Valuation Framework for Variable Annuities
Implementation of a Monte Carlo simulation engine for valuing synthetic portfolios of variable annuities, which reflect realistic features of common annuity contracts in practice. It aims to facilitate the development and dissemination of research related to the efficient valuation of a portfolio of large variable annuities. The main valuation methodology was proposed by Gan (2017) <doi:10.1515/demo-2017-0021>.
vanquish Variant Quality Investigation Helper
Imports a Variant Call Format (VCF) file into R. It can detect whether a sample contains a contaminant from the same species. In the first stage of the approach, a change-point detection method is used to identify copy number variations for filtering. Next, features are extracted from the data for a support vector machine model. For log-likelihood calculation, the deviation parameter is estimated by the maximum likelihood method. Using a radial basis function kernel support vector machine, the contamination of a sample can be detected.
vapour Lightweight Access to the ‘Geospatial Data Abstraction Library’ (‘GDAL’)
Provides low-level access to ‘GDAL’ functionality for R packages. The aim is to minimize the level of interpretation put on the ‘GDAL’ facilities, to enable direct use of it for a variety of purposes. ‘GDAL’ is the ‘Geospatial Data Abstraction Library’ a translator for raster and vector geospatial data formats that presents a single raster abstract data model and single vector abstract data model to the calling application for all supported formats <http://…/>. Other available packages ‘rgdal’ and ‘sf’ also provide access to the ‘GDAL’ library, but neither can be used for these lower level tasks, and both do many other tasks.
varband Variable Banding of Large Precision Matrices
Implementation of the variable banding procedure for modeling local dependence and estimating precision matrices that is introduced in Yu & Bien (2016) and is available at <https://…/1604.07451>.
varbin Optimal Binning of Continuous and Categorical Variables
Tool for easy and efficient discretization of continuous and categorical data. The package calculates the optimal binning of a given explanatory variable with respect to a user-specified target variable. The purpose is to assign a unique Weight-of-Evidence value to each of the calculated binpoints in order to recode the original variable. The package allows users to impose certain restrictions on the functional form of the resulting binning while maximizing the overall information value in the original data. The package is well suited for logistic scoring models where input variables may be subject to restrictions such as linearity by e.g. regulatory authorities. An excellent source describing in detail the development of scorecards, and the role of Weight-of-Evidence coding in credit scoring, is Siddiqi (2006, ISBN: 978-0-471-75451-0). The package utilizes the discrete nature of decision trees and isotonic regression to accommodate the trade-off between flexible functional forms and maximum information value.
VarED Variance Estimation using Difference-Based Methods
Generating functions for both optimal and ordinary difference sequences, and the difference-based estimation functions.
varhandle Functions for Robust Variable Handling
Variables are the fundamental parts of every programming language, but handling them can be frustrating for programmers from time to time. This package contains functions to help users (especially data explorers) make more sense of their variables and get the most out of them, as well as of their hardware. These functions were written, collected and crafted over years of experience in statistical data analysis, and each of them answers a real need. Functions in this package are supposed to be efficient and easy to use, and they will be frequently updated to make them more convenient.
variables Variable Descriptions
Abstract descriptions of (yet) unobserved variables.
VariableScreening High-Dimensional Screening for Semiparametric Longitudinal Regression
Implements a screening procedure proposed by Chu, Li, and Reimherr (2016) for varying coefficient longitudinal models with ultra-high dimensional predictors <http://…/next_issue.html>. The effect of each predictor is allowed to vary over time, approximated by a low-dimensional B-spline. Within-subject correlation is handled using a generalized estimating equation approach with structure specified by the user. Variance is allowed to change over time, also approximated by a B-spline.
varian Variability Analysis in R
Uses a Bayesian model to estimate the variability in a repeated-measures outcome and to use that variability as an outcome or a predictor in a second-stage model.
varImp RF Variable Importance for Arbitrary Measures
Computes the random forest variable importance (VIMP) for the conditional inference random forest (cforest) of the ‘party’ package. Includes a function (varImp) that computes the VIMP for arbitrary measures from the ‘measures’ package. For calculating the VIMP regarding the measures accuracy and AUC two extra functions exist (varImpACC and varImpAUC).
variosig Spatial Dependence Based on Empirical Variograms
Apply Monte Carlo permutation to compute pointwise variogram envelopes, and check spatial dependence using permutation test adjusted for multiple testing.
varjmcm Estimations for the Covariance of Estimated Parameters in Joint Mean-Covariance Models
The goal of the package is to equip the ‘jmcm’ package (current version 0.1.8.0) with estimations of the covariance of estimated parameters. Two methods are provided. The first method is to use the inverse of estimated Fisher’s information matrix, see M. Pourahmadi (2000) <doi:10.1093/biomet/87.2.425>, M. Maadooliat, M. Pourahmadi and J. Z. Huang (2013) <doi:10.1007/s11222-011-9284-6>, and W. Zhang, C. Leng, C. Tang (2015) <doi:10.1111/rssb.12065>. The second method is bootstrap based, see Liu, R.Y. (1988) <doi:10.1214/aos/1176351062> for reference.
varrank Heuristics Tools Based on Mutual Information for Variable Ranking
A computational toolbox of heuristic approaches for performing variable ranking and feature selection based on mutual information, well adapted to multivariate system epidemiology datasets. The core function is a general implementation of the minimum redundancy maximum relevance model; R. Battiti (1994) <doi:10.1109/72.298224>. Continuous variables are discretized using a large choice of rules. Variable rankings can be learned with a sequential forward/backward search algorithm. The two main problems that can be addressed by this package are the selection of the most representative variable within a group of variables of interest (i.e. dimension reduction) and variable ranking with respect to a set of features of interest.
VarReg Semi-Parametric Variance Regression
Methods for fitting semi-parametric mean and variance models, with normal or censored data. Also extended to allow a regression in the location, scale and shape parameters.
VARSEDIG An Algorithm for Morphometric Characters Selection and Statistical Validation in Morphological Taxonomy
An algorithm which identifies the morphometric features that significantly discriminate two taxa and validates the morphological distinctness between them via a Monte-Carlo test, polar coordinates and overlap of the area under the density curve.
varSel Sequential Forward Floating Selection using Jeffries-Matusita Distance
Feature selection using Sequential Forward Floating feature Selection and Jeffries-Matusita distance. It returns a suboptimal set of features to use for image classification. Reference: Dalponte, M., Oerka, H.O., Gobakken, T., Gianelle, D. & Naesset, E. (2013). Tree Species Classification in Boreal Forests With Hyperspectral Data. IEEE Transactions on Geoscience and Remote Sensing, 51, 2632-2645, <DOI:10.1109/TGRS.2012.2216272>.
VarSelLCM Variable Selection for Model-Based Clustering using the Integrated Complete-Data Likelihood of a Latent Class Model
Variable Selection for model-based clustering by using a mixture model of Gaussian distributions assuming conditional independence between variables. The algorithm carries out the model selection by optimizing the MICL criterion which has a closed form for such a distribution.
VARsignR Sign Restrictions, Bayesian, Vector Autoregression Models
Provides routines for identifying structural shocks in vector autoregressions (VARs) using sign restrictions.
VARtests Tests for Error Autocorrelation and ARCH Errors in Vector Autoregressive Models
Implements the Wild bootstrap tests for autocorrelation in vector autoregressive models of Ahlgren, N. & Catani, P. (2016, <doi:10.1007/s00362-016-0744-0>) and the Combined LM test for ARCH in VAR models of Catani, P. & Ahlgren, N. (2016, <doi:10.1016/j.ecosta.2016.10.006>).
vaultr Vault Client for Secrets and Sensitive Data
Provides an interface to a ‘HashiCorp’ vault server over its http API (typically these are self-hosted; see <https://www.vaultproject.io> ). This allows for secure storage and retrieval of secrets over a network, such as tokens, passwords and certificates. Authentication with vault is supported through several backends including user name/password and authentication via ‘GitHub’.
VCA Variance Component Analysis
ANOVA-type estimation (prediction) of random effects and variance components in linear mixed models is implemented. Random models, a subset of mixed models, can be fit by applying a Variance Component Analysis (VCA). This is a special type of analysis frequently used in verifying the precision performance of diagnostics. The Satterthwaite approximation of the total degrees of freedom is implemented. There are several functions for extracting random effects, fixed effects, and variance-covariance matrices of random and fixed effects. Residuals can be extracted as raw, standardized and studentized residuals. Additionally, a variability chart is implemented for visualizing the variability in sub-classes emerging from an experimental design (‘varPlot’).
vcov Variance-Covariance Matrices and Standard Errors
Methods for faster extraction (about 5x faster in a few test cases) of variance-covariance matrices and standard errors from models. Methods in the ‘stats’ package tend to rely on the summary method, which may waste time computing other summary statistics which are summarily ignored.
vcr Record ‘HTTP’ Calls to Disk
Records test suite ‘HTTP’ requests and replays them during future runs. A port of the Ruby gem of the same name (<https://…/> ). Works by hooking into the ‘webmockr’ R package for matching ‘HTTP’ requests by various rules (‘HTTP’ method, ‘URL’, query parameters, headers, body, etc.), and then caching real ‘HTTP’ responses on disk in ‘cassettes’. Subsequent ‘HTTP’ requests matching any previous requests in the same ‘cassette’ use a cached ‘HTTP’ response.
vctrs Vector Helpers
Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analyzing function interfaces.
vdiffr Visual Regression Testing and Graphical Diffing
An extension to the ‘testthat’ package that makes it easy to add graphical unit tests. It provides a Shiny application to manage the test cases.
VDSPCalibration Statistical Methods for Designing and Analyzing a Calibration Study
Provides statistical methods for the design and analysis of a calibration study, which aims at calibrating measurements using two different methods. The package includes sample size calculation, sample selection, regression analysis with errors in measurements, and change-point regression. The method is described in Tian, Durazo-Arvizu, Myers, et al. (2014) <DOI:10.1002/sim.6235>.
veccompare Perform Set Operations on Vectors, Automatically Generating All n-Wise Comparisons, and Create Markdown Output
Automates set operations (i.e., comparisons of overlap) between multiple vectors. It also contains a function for automating reporting in ‘RMarkdown’, by generating markdown output for easy analysis, as well as an ‘RMarkdown’ template for use with ‘RStudio’.
vegalite Tools to Encode Visualizations with the ‘Grammar of Graphics’-Like ‘Vega-Lite’ ‘Spec’
The ‘Vega-Lite’ ‘JavaScript’ framework provides a higher-level grammar for visual analysis, akin to ‘ggplot’ or ‘Tableau’, that generates complete ‘Vega’ specifications. Functions exist which enable building a valid ‘spec’ from scratch or importing a previously created ‘spec’ file. Functions also exist to export ‘spec’ files and to generate code which will enable plots to be embedded in properly configured web pages. The default behavior is to generate an ‘htmlwidget’.
vegan Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
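For example, a brief sketch using the package's bundled dune vegetation data:
  library(vegan)
  data(dune)
  diversity(dune, index = "shannon")  # Shannon diversity per site
  ord <- metaMDS(dune)                # NMDS ordination
  plot(ord)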
vegawidget Htmlwidget Renderer for Vega and Vega-Lite
Vega and Vega-Lite parse text in JSON notation to render chart-specifications into HTML. This package is used to facilitate the rendering. It also provides a means to interact with signals, events, and datasets in a Vega chart using JavaScript or Shiny.
velox Fast Raster Manipulation and Extraction
C++ accelerated raster manipulation and extraction.
vembedr Functions to Embed Video in HTML
A set of functions for generating HTML to embed hosted video in your R Markdown documents or Shiny apps.
venn Draw Venn Diagrams
Draws and displays Venn diagrams up to 7 sets, and any boolean union of set intersections.
vennLasso Variable Selection for Heterogeneous Populations
Provides variable selection and estimation routines for models stratified based on binary factors.
vennplot Venn Diagrams in 2D and 3D
Calculate and plot Venn diagrams in 2D and 3D.
versions Query and Install Specific Versions of Packages on CRAN
Installs specified versions of R packages hosted on CRAN and provides functions to list available versions and the versions of currently installed packages. These tools can be used to help make R projects and packages more reproducible. ‘versions’ fits in the narrow gap between the ‘devtools’ install_version() function and the ‘checkpoint’ package. devtools::install_version() installs a stated package version from source files stored on the CRAN archives. However, CRAN does not store binary versions of packages, so Windows users need to have RTools installed, and Windows and OSX users get longer installation times. ‘checkpoint’ uses the Revolution Analytics MRAN server to install packages (from source or binary) as they were available on a given date. It also provides a helpful interface to detect the packages in use in a directory and install all of those packages for a given date. ‘checkpoint’ doesn’t provide install.packages-like functionality, however, and that’s what ‘versions’ aims to do, by querying MRAN. As MRAN only goes back to 2014-09-17, ‘versions’ can’t install packages from before this date.
VertexSimilarity Creates Vertex Similarity Matrix for an Undirected Graph
Creates Vertex Similarity matrix of an undirected graph based on the method stated by E. A. Leicht, Petter Holme, AND M. E. J. Newman in their paper <DOI:10.1103/PhysRevE.73.026120>.
VertexSort Network Decomposition and Randomization
Permits applying the ‘Vertex Sort’ algorithm (Jothi et al. (2009) <doi:10.1038/msb.2009.52>) to a graph in order to elucidate its hierarchical structure. It also allows graphic visualization of the sorted graph by exporting the results to a cytoscape-friendly format. Moreover, it offers five different algorithms of graph randomization: 1) randomizing a graph while preserving node degrees, 2) while preserving similar node degrees, 3) without preserving node degrees, 4) while preserving node in-degrees, and 5) while preserving node out-degrees.
VeryLargeIntegers Very Large Integers: Store and Manage Arbitrarily Big Integers
Multi-precision library that allows storing and managing arbitrarily big integers without loss of precision. It includes a large list of tools for working with them, such as: arithmetic and logic operators, modular-arithmetic operators, computer number theory utilities, probabilistic primality tests, factorization algorithms, and random generators of different kinds of integers.
vesselr Gradient and Vesselness Tools for Arrays and NIfTI Images
Simple functions capable of providing gradient, Hessian, and vesselness for a given 3-dimensional volume.
vetr Trust, but Verify
Declarative template-based framework for verifying that objects meet structural requirements, and auto-composing error messages when they do not.
vfcp Computation of v Values for U and Copula C(U, v)
Computes the value of one of two uniformly distributed marginals when the copula probability value is known and the value of the second marginal is also known. Also computes and plots the corresponding cumulative distribution function or survival function.
vfprogression Visual Field (VF) Progression Analysis and Plotting Methods
Realization of published methods to analyze visual field (VF) progression. Introduction to the plotting methods (designed by author TE) for VF output visualization. A sample dataset for two eyes, each with 10 follow-ups, is included. The VF analysis methods can be found in Musch et al. (1999) <doi:10.1016/S0161-6420(99)90147-1>, Nouri-Mahdavi et al. (2012) <doi:10.1167/iovs.11-9021>, Schell et al. (2014) <doi:10.1016/j.ophtha.2014.02.021>, and Aptel et al. (2015) <doi:10.1111/aos.12788>.
VGAM Vector Generalized Linear and Additive Models
An implementation of about 6 major classes of statistical regression models. At the heart of it are the vector generalized linear and additive model (VGLM/VGAM) classes. Currently only fixed-effects models are implemented, i.e., no random-effects models. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE, using Fisher scoring. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs (i.e., with smoothing). The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)—these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change, especially before version 1.0.0 is released; see the NEWS file for latest changes.
VGAMextra Additions and Extensions of the ‘VGAM’ Package
Extending the functionalities of the ‘VGAM’ package with additional functions and datasets. At present, ‘VGAMextra’ comprises new family functions (ffs) to estimate several time series models by maximum likelihood using Fisher scoring, unlike popular packages in CRAN relying on optim(), including ARMA-GARCH-like models, the Order-(p, d, q) ARIMAX model (non- seasonal), the Order-(p) VAR model, error correction models for cointegrated time series, and ARMA-structures with Student-t errors. For independent data, new ffs to estimate the inverse- Weibull, the inverse-gamma, the generalized beta of the second kind and the general multivariate normal distributions are available. In addition, ‘VGAMextra’ incorporates new VGLM-links for the mean-function, and the quantile-function (as an alternative to ordinary quantile modelling) of several 1-parameter distributions, that are compatible with the class of VGLM/VGAM family functions. Currently, only fixed-effects models are implemented. All functions are subject to change; see the NEWS for further details on the latest changes.
vhica Vertical and Horizontal Inheritance Consistence Analysis
The ‘Vertical and Horizontal Inheritance Consistence Analysis’ method is described in the following publication: ‘VHICA: a new method to discriminate between vertical and horizontal transposon transfer: application to the mariner family within Drosophila’ by G. Wallau. et al. (2016) <DOI:10.1093/molbev/msv341>. The purpose of the method is to detect horizontal transfers of transposable elements, by contrasting the divergence of transposable element sequences with that of regular genes.
VICmodel The Variable Infiltration Capacity (VIC) Model
The Variable Infiltration Capacity (VIC) model is a macroscale hydrologic model that solves full water and energy balances, originally developed by Xu Liang at the University of Washington (UW). The version of the VIC source code used is 5.0.1, from <https://…/>; see Hamman et al. (2018). Development and maintenance of the current official version of the VIC model is at present led by the UW Hydro (Computational Hydrology group) in the Department of Civil and Environmental Engineering at UW. VIC is a research model and in its various forms it has been applied to most of the major river basins around the world, as well as globally. If you make use of this model, please acknowledge the appropriate references listed in the help page of this package or on the references page <http://…/> of the VIC official documentation website. These should include Liang et al. (1994) plus any references relevant to the features you are using. Reference: Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges (1994), A simple hydrologically based model of land surface water and energy fluxes for general circulation models, J. Geophys. Res., 99(D7), 14415-14428, <doi:10.1029/94JD00483>. Hamman et al. (2018) about VIC 5.0.1 can also be considered: Hamman, J. J., Nijssen, B., Bohn, T. J., Gergel, D. R., and Mao, Y. (2018), The Variable Infiltration Capacity model version 5 (VIC-5): infrastructure improvements for new applications and reproducibility, Geosci. Model Dev., 11, 3481-3496, <doi:10.5194/gmd-11-3481-2018>.
VIFCP Detecting Change-Points via VIFCP Method
Contains a function to support the paper ‘A sequential multiple change-point detection procedure via VIF regression’.
VIM Visualization and Imputation of Missing Values
New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data, including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.
vimp Nonparametric Variable Importance
Calculate point estimates of and valid confidence intervals for nonparametric variable importance measures in high and low dimensions, using flexible estimators of the underlying regression functions. For more information about the methods, please see Williamson et al. (2017) <https://…/>.
vinereg D-Vine Quantile Regression
Implements D-vine quantile regression models with parametric or nonparametric pair-copulas. See Kraus and Czado (2017) <doi:10.1016/j.csda.2016.12.009> and Schallhorn et al. (2017) <arXiv:1705.08310>.
vip Variable Importance Plots
A general framework for constructing variable importance plots from various types of machine learning models in R. Aside from some standard model-based variable importance measures, this package also provides a novel approach based on partial dependence plots (PDPs) and individual conditional expectation (ICE) curves as described in Greenwell et al. (2018) <arXiv:1805.04755>.
vipor Plot Categorical Data Using Quasirandom Noise and Density Estimates
Generate a violin point plot, a combination of a violin/histogram plot and a scatter plot by offsetting points within a category based on their density using quasirandom noise.
VIRF Computation of Volatility Impulse Response Function of Multivariate Time Series
Computation of the volatility impulse response function for multivariate time series models using the algorithm by Jin, Lin and Tamvakis (2012) <doi:10.1016/j.eneco.2012.03.003>.
viridis Matplotlib Default Color Map
Port of the new Matplotlib default color map (‘viridis’) to R. This color map is designed in such a way that it will analytically be perfectly perceptually-uniform, both in regular form and also when converted to black-and-white. It is also designed to be perceived by readers with the most common form of color blindness.
Using the new ‘viridis’ colormap in R
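For instance, a minimal sketch; viridis(n) returns n hex colors:
  library(viridis)
  viridis(5)                          # five perceptually-uniform colors
  image(volcano, col = viridis(100))  # apply the palette to a base graphic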
viridisLite Default Color Maps from ‘matplotlib’ (Lite Version)
Port of the new ‘matplotlib’ color maps (‘viridis’, the default, plus ‘magma’, ‘plasma’ and ‘inferno’) to ‘R’. ‘matplotlib’ <http://matplotlib.org > is a popular plotting library for ‘python’. These color maps are designed in such a way that they will analytically be perfectly perceptually-uniform, both in regular form and also when converted to black-and-white. They are also designed to be perceived by readers with the most common form of color blindness. This is the ‘lite’ version of the more complete ‘viridis’ package that can be found at <https://…/package=viridis>.
virtualPollen Simulating Pollen Curves from Virtual Taxa with Different Life and Niche Traits
Tools to generate virtual environmental drivers with a given temporal autocorrelation, and to simulate pollen curves at annual resolution over millennial time-scales based on these drivers and virtual taxa with different life traits and niche features. It also provides the means to simulate quasi-realistic pollen-data conditions by applying simulated accumulation rates and given depth intervals between consecutive samples.
virtuoso Interface to ‘Virtuoso’ using ‘ODBC’
Provides users with a simple and convenient mechanism to manage and query a ‘Virtuoso’ database using the ‘DBI’ (DataBase Interface) compatible ‘ODBC’ (Open Database Connectivity) interface. ‘Virtuoso’ is a high-performance ‘universal server,’ which can act as both a relational database, supporting standard Structured Query Language (‘SQL’) queries, while also supporting data following the Resource Description Framework (‘RDF’) model for Linked Data. ‘RDF’ data can be queried using ‘SPARQL’ (‘SPARQL’ Protocol and ‘RDF’ Query Language) queries, a graph-based query language that supports semantic reasoning. This allows users to leverage the performance of local or remote ‘Virtuoso’ servers using popular ‘R’ packages such as ‘DBI’ and ‘dplyr’, while also providing a high-performance solution for working with large ‘RDF’ triplestores from ‘R.’ The package also provides helper routines to install, launch, and manage a ‘Virtuoso’ server locally on ‘Mac’, ‘Windows’ and ‘Linux’ platforms using the standard interactive installers from the ‘R’ command-line. By automatically handling these setup steps, the package can make using ‘Virtuoso’ considerably faster and easier for most users to deploy in a local environment. Managing the bulk import of triples from common serializations with a single intuitive command is another key feature of this package. Bulk import performance can be tens to hundreds of times faster than comparable imports using existing ‘R’ tools, including the ‘rdflib’ and ‘redland’ packages.
virustotal R Client for the Virustotal API
Use VirusTotal, a Google service that analyzes files and URLs for viruses, worms, trojans and the like, provides the category of the content hosted by a domain from a variety of prominent services, and provides passive DNS information, among other things. See <http://www.virustotal.com> for more information.
visdat Preliminary Data Visualisation
Create preliminary exploratory data visualisations of an entire dataset to identify problems or unexpected features using ‘ggplot2’.
ViSiElse A Visual Tool for Behaviour Analysis
A graphical tool designed to visualize and give an overview of behavioural observations realized on individuals or groups. ViSiElse allows visualization of raw data during experimental observations of the realization of a procedure. It graphically presents an overview of individual and group actions, usually acquired from timestamps during video-recorded sessions. Package options allow adding graphical information such as statistical indicators (mean, standard deviation, quantiles or statistical tests), as well as, for each action, green or black zones providing visual information about the accuracy of the realized actions.
visNetwork R package, using vis.js library for network visualization
An R package for network visualization, using the ‘vis.js’ JavaScript library.
http://visjs.org
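A minimal sketch, passing a nodes and an edges data frame to visNetwork():
  library(visNetwork)
  nodes <- data.frame(id = 1:4, label = paste("Node", 1:4))
  edges <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))
  visNetwork(nodes, edges)  # interactive network widget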
visova VISOVA (VISualization Of VAriance)
A novel method for exploratory data analysis. It is essentially an extension of trellis graphics, developing their grid concept with parallel coordinates and permitting visualization of many dimensions at once. This package includes functions allowing users to perform VISOVA analysis and compare different column/variable ordering methods, making high-dimensional structures easier to perceive even when the data is complicated.
vistime Interactive R Timelines using ‘plotly.js’
Create fully interactive time lines or Gantt charts using ‘plotly.js’. The charts can be included in Shiny apps and manipulated via ‘plotly_build()’.
visTree Visualization of Subgroups for Decision Trees
Provides a visualization for characterizing subgroups defined by a decision tree structure. The visualization simplifies the ability to interpret individual pathways to subgroups; each sub-plot describes the distribution of observations within individual terminal nodes and percentile ranges for the associated inner nodes.
vistributions Visualize Probability Distributions
Visualize and compute percentiles/probabilities of the normal, t, F, chi-square and binomial distributions.
visualR Generates a 3D Graphic, Plotting Stock Option Parameters Over Time
Generates a 3D graph which plots a selected stock option parameter over time. The default setting plots the net parameter position of a double vertical spread over time.
VisuClust Visualisation of Clusters in Multivariate Data
Displays multivariate data, based on Sammon’s nonlinear mapping.
visvow A Tool for the Visualization of Vowel Variation
Visualizes vowel variation in f0, F1, F2, F3 and duration.
vita Variable Importance Testing Approaches
Implements the novel testing approach by Janitza et al.(2015) <http://…ver.pl?urn=nbn:de:bvb:19-epub-25587-4> for the permutation variable importance measure in a random forest and the PIMP-algorithm by Altmann et al.(2010) <doi:10.1093/bioinformatics/btq134>. Janitza et al.(2015) <http://…ver.pl?urn=nbn:de:bvb:19-epub-25587-4> do not use the ‘standard’ permutation variable importance but the cross-validated permutation variable importance for the novel test approach. The cross-validated permutation variable importance is not based on the out-of-bag observations but uses a similar strategy which is inspired by the cross-validation procedure. The novel test approach can be applied for classification trees as well as for regression trees. However, the use of the novel testing approach has not been tested for regression trees so far, so this routine is meant for the expert user only and its current state is rather experimental.
vitae Curriculum Vitae for R Markdown
Provides templates and functions to simplify the production and maintenance of curriculum vitae.
vlad Variable Life Adjusted Display
Contains functions to set up risk-adjusted quality control charts in health care. For the variable life adjusted display (VLAD) proposed by Lovegrove et al. (1997) <doi:10.1016/S0140-6736(97)06507-0> and the risk-adjusted cumulative sum chart based on log-likelihood ratio statistic introduced by Steiner et al. (2000) <doi:10.1093/biostatistics/1.4.441> the average run length and control limits can be computed.
vMask Detect Small Changes in Process Mean using CUSUM Control Chart by v-Mask
The cumulative sum (CUSUM) control chart is considered to be an alternative or complement to Shewhart control charts in statistical process control (SPC) applications, owing to its higher sensitivity to small shifts in the process mean. It utilizes all the available data rather than only the last few observations used in Shewhart control charts for quick decision making. The v-mask is a traditional technique for separating meaningful data from unusual circumstances in a CUSUM control chart; for details about the v-mask see Montgomery (1985, ISBN:978-0471656319). The mask is a V-shaped overlay placed on the CUSUM chart so that one arm of the V lines up with the slope of the data points, making it easy to see data points that lie outside the slope and to determine whether these points should be discarded as random events or treated as a performance trend that should be addressed. However, complex computation is one disadvantage of the v-mask method for detecting small changes in the mean using a CUSUM control chart. Package ‘vMask’ helps applied users overcome this challenge by considering six different methods, each based on different information.
vmd Variational Mode Decomposition
A port and extension to the original ‘Matlab’ code made public by Dragomiretskiy and Zosso, for conducting Variational Mode Decomposition (VMD) as described within their 2013 publication (publication: <doi:10.1109/TSP.2013.2288675>, source: <https://goo.gl/fJH1d5> ).
vocaldia Create and Manipulate Vocalisation Diagrams
Create adjacency matrices of vocalisation graphs from dataframes containing sequences of speech and silence intervals, transforming these matrices into Markov diagrams, and generating datasets for classification of these diagrams by ‘flattening’ them and adding global properties (functionals) etc. Vocalisation diagrams date back to early work in psychiatry (Jaffe and Feldstein, 1970) and social psychology (Dabbs and Ruback, 1987) but have only recently been employed as a data representation method for machine learning tasks including meeting segmentation (Luz, 2012) <doi:10.1145/2328967.2328970> and classification (Luz, 2013) <doi:10.1145/2522848.2533788>.
volesti Volume Approximation and Sampling of Convex Polytopes
Provides an R interface for the ‘volesti’ C++ package. ‘volesti’ computes estimations of the volume of polytopes given by a set of points, linear inequalities, or a Minkowski sum of segments (zonotopes). There are two algorithms for volume estimation (I.Z. Emiris and V. Fisikopoulos (2014) <arXiv:1312.2873> and B. Cousins, S. Vempala (2016) <arXiv:1409.6011>) as well as algorithms for sampling, rounding and rotating polytopes. Moreover, ‘volesti’ provides algorithms for estimating copulas (L. Cales, A. Chalkis, I.Z. Emiris, V. Fisikopoulos (2018) <arXiv:1803.05861>).
voronoiTreemap Voronoi Treemaps with Added Interactivity by Shiny
The d3.js framework with the plugins d3-voronoi-map, d3-voronoi-treemap and d3-weighted-voronoi is used to generate Voronoi treemaps in R and in a shiny application. The computation of the Voronoi treemaps is based on Nocaj and Brandes (2012) <doi:10.1111/j.1467-8659.2012.03078.x>.
vortexR Post Vortex Simulation Analysis
Facilitates post-Vortex simulation analysis by offering tools to collate multiple Vortex (v10) output files into one R object and analyse the collated output statistically. Vortex is software for developing individual-based models for population dynamics simulation (see <http://…/Vortex10.aspx> ).
vosonSML Tools for Collecting Social Media Data and Generating Networks for Analysis
A suite of tools for collecting and constructing networks from social media data. Provides easy-to-use functions for collecting data across popular platforms (Instagram, Facebook, Twitter, and YouTube) and generating different types of networks for analysis.
votesys Voting Systems, Instant-Runoff Voting, Borda Method, Various Condorcet Methods
Various methods to count ballots in voting systems are provided: Instant-runoff voting described in Reynolds, Reilly and Ellis (2005, ISBN:9789185391189), Borda method in Emerson (2013) <doi:10.1007/s00355-011-0603-9>, original Condorcet method in Stahl and Johnson (2017, ISBN:9780486807386), Dodgson method in McCabe-Dansted and Slinko (2008) <doi:10.1007/s00355-007-0282-8>, Simpson-Kramer method in Levin and Nalebuff (1995) <doi:10.1257/jep.9.1.3>, Schulze method in Schulze (2011) <doi:10.1007/s00355-010-0475-4>, Ranked pairs method in Tideman (1987) <doi:10.1007/BF00433944>. Functions to check validity of ballots are also provided to ensure flexibility.
vpc Create Visual Predictive Checks
Visual predictive checks are a commonly used diagnostic plot in pharmacometrics, showing how certain statistics (percentiles) for observed data compare to those same statistics for data simulated from a model. The package can generate VPCs for continuous, categorical, censored, and (repeated) time-to-event data.
vrcp Change Point Estimation for Regression with Varying Segments and Heteroscedastic Variances
Estimation of varying regression segments and a change point in 2-segment regression models with heteroscedastic variances, and with or without a smoothness constraint at the change point.
vroom Read and Write Rectangular Text Data Quickly
The goal of ‘vroom’ is to read and write data (like ‘csv’, ‘tsv’ and ‘fwf’) quickly. When reading, it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.
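For example, a sketch with hypothetical file paths:
  library(vroom)
  df <- vroom("data/measurements.csv")      # fast, lazy read
  vroom_write(df, "data/measurements.tsv")  # parallel formatting, asynchronous write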
VRPM Visualizing Risk Prediction Models
This is a package to visualize risk prediction models. For each predictor, a color bar represents the contribution to the linear predictor or latent variable. A conversion from the linear predictor to the estimated risk or survival is also given. (Cumulative) contribution charts make it possible to visualize how the estimated risk for one particular observation is obtained by the model. Several options allow choosing different color maps and selecting the zero level of the contributions. The package is able to deal with ‘glm’, ‘coxph’, ‘mfp’, ‘multinom’ and ‘ksvm’ objects. For ‘ksvm’ objects, the visualization is not always exact. Functions providing tools to indicate the accuracy of the approximation are provided in addition to the visualization.
VSE Variant Set Enrichment
Calculates the enrichment of associated variant set (AVS) for an array of genomic regions. The AVS is the collection of disjoint LD blocks computed from a list of disease associated SNPs and their linked (LD) SNPs. VSE generates a null distribution of matched random variant sets (MRVSs) from 1000 Genome Project Phase III data that are identical to AVS, LD block by block. It then computes the enrichment of AVS intersecting with user provided genomic features (e.g., histone marks or transcription factor binding sites) compared with the null distribution.
vsgoftest Goodness-of-Fit Tests Based on Kullback-Leibler Divergence
An implementation of Vasicek and Song goodness-of-fit tests. Several functions are provided to estimate differential Shannon entropy, i.e., estimate Shannon entropy of real random variables with density, and test the goodness-of-fit of some family of distributions, including uniform, Gaussian, log-normal, exponential, gamma, Weibull, Pareto, Fisher, Laplace and beta distributions; see Lequesne and Regnault (2018) <arXiv:1806.07244>.
vstsr Access to Visual Studio Team Services API via R
Implementation of Visual Studio Team Services <https://…/> API calls. It enables the extraction of information about repositories, build and release definitions and individual releases. It also helps create repositories and work items within a project without logging into Visual Studio. There is the ability to use any API service with a shell for any non-predefined call.
VSURF Variable Selection Using Random Forests
A three-step variable selection procedure based on random forests. Initially developed to handle high-dimensional data (for which the number of variables largely exceeds the number of observations), the package is very versatile and can treat most data dimensions, for regression and supervised classification problems. The first step is dedicated to eliminating irrelevant variables from the dataset. The second step aims to select all variables related to the response for interpretation purposes. The third step refines the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purposes.
vtable Variable Table
Automatically generates HTML variable documentation including variable names, labels, classes, value labels (if applicable), value ranges, and summary statistics. See the vignette ‘vtable’ for a package overview.
vtreat Variable treatment for R data frames
Variable treatment package for R data frames from Win-Vector LLC.
http://…esigning-a-package-for-variable-treatment
vtree Display Information About Nested Subsets of a Data Frame
A tool for drawing ‘variable trees’. Variable trees display information about hierarchical subsets of a data frame defined by values of categorical variables.
VWPre Tools for Preprocessing Visual World Data
Gaze data from the Visual World Paradigm requires significant preprocessing prior to plotting and analyzing the data. This package provides functions for preparing visual world eye-tracking data for statistical analysis and plotting. It can prepare data for either linear analyses (e.g., ANOVA, Gaussian-family LMER, Gaussian-family GAMM) as well as logistic analyses (e.g., binomial-family LMER and binomial-family GAMM). Additionally, it contains a plotting function for creating grand average and conditional average plots. See the vignette for samples of the functionality. Currently, the functions in this package are designed for handling data collected with SR Research Eyelink eye trackers using Sample Reports created in SR Research Data Viewer; however, in subsequent releases we would like to add functionality for data collected with Tobii and SMI systems.

W

WACS Multivariate Weather-State Approach Conditionally Skew-Normal Generator
Multivariate weather generator for daily climate variables, based on weather states and using a Markov chain to model the succession of weather states. Conditionally on the weather state, the multivariate variables are modeled using the family of complete skew-normal distributions. Parameters are estimated from measured series. Data must include the variable ‘Rain’ and can include as many other variables as desired.
waffle Create Waffle Chart Visualizations in R
Square pie charts (a.k.a. waffle charts) can be used to communicate parts of a whole for categorical quantities. To emulate the percentage view of a pie chart, a 10×10 grid should be used with each square representing 1% of the total. Modern uses of waffle charts do not necessarily adhere to this rule and can be created with a grid of any rectangular shape. Best practices suggest keeping the number of categories small, just as should be done when creating pie charts.
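A minimal sketch of the percentage view described above (the category counts are made up):
  library(waffle)
  # three parts summing to 100 on a 10x10 grid; one square per percent
  waffle(c(Frontend = 40, Backend = 35, Ops = 25), rows = 10)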
waiter Loading Screen for ‘Shiny’
Full screen splash loading screens for ‘Shiny’.
wakefield Generate random data sets
wakefield is designed to quickly generate random data sets. The user passes n (the number of rows) and predefined vectors to the r_data_frame() function to produce a dplyr::tbl_df object.
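A minimal sketch of that interface (the column choices are arbitrary; each bare name is a predefined column generator):
  library(wakefield)
  r_data_frame(n = 500, id, age, sex, grade)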
walker Efficient Bayesian Linear Regression with Time-Varying Coefficients
Fully Bayesian linear regression where the regression coefficients are allowed to vary over ‘time’ as independent random walks. All computations are done using Hamiltonian Monte Carlo provided by Stan, using a state space representation of the model in order to marginalise over the coefficients for efficient sampling.
walkr Random Walks in the Intersection of the N-Simplex and Hyperplanes
Consider the intersection of two spaces: the complete solution space to Ax = b and the N-simplex. The intersection of these two spaces is a convex polytope. The package walkr samples from this intersection using two Markov chain Monte Carlo (MCMC) methods: hit-and-run and the Dikin walk. walkr also provides tools to examine sample quality.
wally The Wally Calibration Plot for Risk Prediction Models
A prediction model is calibrated if, roughly, for any percentage x we can expect that x subjects out of 100 experience the event among all subjects that have a predicted risk of x%. A calibration plot provides a simple, yet useful, way of assessing the calibration assumption. The Wally plot consists of a sequence of usual calibration plots. Among the plots contained within the sequence, one is the actual calibration plot which has been obtained from the data and the others are obtained from similar simulated data under the calibration assumption. It provides the investigator with a direct visual understanding of the shape and sampling variability that are common under the calibration assumption. The original calibration plot from the data is included randomly among the simulated calibration plots, similarly to a police lineup. If the original calibration plot is not easily identified then the calibration assumption is not contradicted by the data. The method handles the common situations in which the data contain censored observations and occurrences of competing events.
walmartAPI Walmart Open API Wrapper
Provides API access to the Walmart Open API <https://…/>, which contains data about stores, the Value of the Day, and products, including names, sale prices, shipping rates and taxonomies.
walrus Robust Statistical Methods
A toolbox of common robust statistical tests, including robust descriptives, robust t-tests, and robust ANOVA. It is also available as a module for ‘jamovi’ (see <https://www.jamovi.org> for more information). Walrus is based on the WRS2 package by Patrick Mair, which is in turn based on the scripts and work of Rand Wilcox. These analyses are described in depth in the book ‘Introduction to Robust Estimation & Hypothesis Testing’.
wand Retrieve ‘Magic’ Attributes from Files and Directories
The ‘libmagic’ library provides functions to determine ‘MIME’ type and other metadata from files through their ‘magic’ attributes. This is useful when you do not wish to rely solely on the honesty of a user or the extension on a file name. It also incorporates other metadata from the mime-db database <https://…/mime-db>.
washeR Time Series Outlier Detection (washer)
Time series outlier detection by means of a non-parametric test. Two methodologies are covered: single time series variability (vector input) and grouped time series similarity (data.frame input).
waterfall Waterfall Charts
Provides support for creating waterfall charts in R using both traditional base and lattice graphics.
waterfalls Create Waterfall Charts
There is currently no simple way to create ‘waterfall charts’ in ‘ggplot2’. This package contains a single function, waterfall(), that draws a waterfall chart as a ‘ggplot2’ object. Some flexibility in the appearance is available.
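A minimal sketch of the single exported function (the values and labels are made up):
  library(waterfalls)
  # cumulative contributions drawn as a ggplot2 waterfall
  waterfall(values = c(100, -20, 10, -5),
            labels = c("Q1", "Q2", "Q3", "Q4"))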
wavefunction Wave Function Representation of Real Distributions
Real probability distributions can be represented as the square of an orthogonal sum in the Hermite basis. This representation is formally similar to the representation of quantum mechanical states as wave functions, whose squared modulus is a probability density. This is described in more detail in ‘Wave function representation of probability distributions,’ by Madeleine B. Thompson <arXiv:1712.07764>. This package provides a reference implementation of the technique.
WaveletANN Wavelet ANN Model
Fits hybrid Wavelet ANN model for time series forecasting using algorithm by Anjoy and Paul (2017) <DOI: 10.1007/s00521-017-3289-9>.
WaveletArima Wavelet ARIMA Model
Fits hybrid Wavelet ARIMA model for time series forecasting using algorithm by Aminghafari and Poggi (2012) <doi:10.1142/S0219691307002002>.
WaveletComp Computational Wavelet Analysis
Wavelet analysis and reconstruction of time series, cross-wavelets and phase-difference (with filtering options), significance with simulation algorithms.
WaveLetLongMemory Estimating Long Memory Parameter using Wavelet
Estimation of the long memory parameter using wavelets. Other estimation techniques, like GPH (Geweke and Porter-Hudak, 1983, <DOI:10.1111/j.1467-9892.1983.tb00371.x>) and semiparametric methods (Robinson, P. M., 1995, <DOI:10.1214/aos/1176324317>), have also been included.
WaverR Data Estimation using Weighted Averages of Multiple Regressions
For multivariate datasets, this function estimates missing data using the Weighted AVERage of all possible Regressions based on the data available.
wavScalogram Wavelet Scalogram Tools for Time Series Analysis
Provides scalogram based wavelet tools for time series analysis: wavelet power spectrum, scalogram, windowed scalogram, windowed scalogram difference (see Bolos et al. (2017) <doi:10.1016/j.amc.2017.05.046>), scale index and windowed scale index (Benitez et al. (2010) <doi:10.1016/j.camwa.2010.05.010>).
wBoot wBootstrap Routines
Supplies bootstrap alternatives to traditional hypothesis-test and confidence-interval procedures such as one-sample and two-sample inferences for means, standard deviations, and proportions; simple linear regression; and more. Suitable for general audiences, including individual and group users, introductory statistics courses, and more advanced statistics courses that desire an introduction to bootstrap methods.
wbstats Programmatic Access to Data and Statistics from the World Bank API
Tools for searching and downloading data and statistics from the World Bank Data API (<http://…/api-overview>) and the World Bank Data Catalog API (<http://…/data-catalog-api>).
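A minimal sketch, assuming the wbsearch()/wb() interface of the package (the indicator code follows the World Bank's naming):
  library(wbstats)
  # find candidate indicators by keyword
  wbsearch(pattern = "life expectancy")
  # download one indicator for a range of years
  wb(indicator = "SP.DYN.LE00.IN", startdate = 2000, enddate = 2015)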
wbsts Multiple Change-Point Detection for Nonstationary Time Series
Detects the number and locations of change-points in a time series using Wild Binary Segmentation and the Locally Stationary Wavelet model.
wccsom SOM Networks for Comparing Patterns with Peak Shifts
SOMs can be useful tools to group patterns containing several peaks. If peaks do not always occur at exactly the same position, classical distance measures cannot be used. This package provides SOM technology using the weighted crosscorrelation (WCC) distance.
WCE Weighted Cumulative Exposure Models
WCE implements a flexible method for modeling cumulative effects of time-varying exposures, weighted according to their relative proximity in time, and represented by time-dependent covariates. The current implementation estimates the weight function in the Cox proportional hazards model. The function that assigns weights to doses taken in the past is estimated using cubic regression splines.
wCorr Weighted Correlations
Calculates Pearson, Spearman, polychoric, and polyserial correlation coefficients, in weighted or unweighted form. The package implements tetrachoric correlation as a special case of the polychoric and biserial correlation as a specific case of the polyserial.
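A minimal sketch of the package's weightedCorr() function (simulated data):
  library(wCorr)
  x <- rnorm(100)
  y <- x + rnorm(100)
  w <- runif(100)
  # weighted Pearson correlation; method may also be "Spearman",
  # "Polychoric" or "Polyserial"
  weightedCorr(x, y, method = "Pearson", weights = w)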
wdm Weighted Dependence Measures
Provides efficient implementations of weighted dependence measures and related asymptotic tests for independence. Implemented measures are the Pearson correlation, Spearman’s rho, Kendall’s tau, Blomqvist’s beta, and Hoeffding’s D; see, e.g., Nelsen (2006) <doi:10.1007/0-387-28678-0> and Hollander et al. (2015, ISBN:9780470387375).
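A minimal sketch of the main wdm() function (simulated data; the weights are arbitrary):
  library(wdm)
  x <- rnorm(100)
  y <- x + rnorm(100)
  w <- runif(100)
  # weighted Kendall's tau
  wdm(x, y, method = "kendall", weights = w)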
wdman ‘Webdriver’/’Selenium’ Binary Manager
There are a number of binary files associated with the ‘Webdriver’/’Selenium’ project (see <http://…/>, <https://…/>, <https://…/geckodriver>, <http://…/download.html> and <https://…/InternetExplorerDriver> for more information). This package provides functions to download these binaries and to manage processes involving them.
webddx Interact with Online Differential Diagnosis-Generating Tools
Freely available online differential-diagnosis generating tools are changing clinical medicine and biomedical research. With webddx, useRs can generate differential diagnosis lists given a set of symptoms. The web tools would likely be directly used in clinical practice, but programmatic interaction and data manipulation can sharply increase efficiency and reproducibility of research in clinical informatics. Relevant visualizations can also be created with webddx.
webdriver ‘WebDriver’ Client for ‘PhantomJS’
A client for the ‘WebDriver’ ‘API’. It allows driving a (probably headless) web browser, and can be used to test web applications, including ‘Shiny’ apps. In theory it works with any ‘WebDriver’ implementation, but it was only tested with ‘PhantomJS’.
webex Create Interactive Web Exercises in ‘R Markdown’
Functions for easily creating interactive web pages using ‘R Markdown’ that students can use in self-guided learning.
WebGestaltR The R Version of WebGestalt
The web version of WebGestalt <http://www.webgestalt.org> supports 12 organisms, 324 gene identifiers and 150,937 function categories. Users can upload data and functional categories with their own gene identifiers. In addition to Over-Representation Analysis, WebGestalt also supports Gene Set Enrichment Analysis. The user-friendly output interface allows interactive and efficient exploration of enrichment results. The WebGestaltR package supports all of the above functions and can additionally be integrated into other pipelines or analyze multiple gene lists simultaneously.
webglobe 3D Interactive Globes
Displays geospatial data on an interactive 3D globe in the web browser.
webmockr Stubbing and Setting Expectations on ‘HTTP’ Requests
Stubbing and setting expectations on ‘HTTP’ requests. Includes tools for stubbing ‘HTTP’ requests, including expected request conditions and response conditions. Match on ‘HTTP’ method, query parameters, request body, headers and more.
WebPower Basic and Advanced Statistical Power Analysis
This is a collection of tools for conducting both basic and advanced statistical power analysis including correlation, proportion, t-test, one-way ANOVA, two-way ANOVA, linear regression, logistic regression, Poisson regression, mediation analysis, longitudinal data analysis, structural equation modeling and multilevel modeling. It also serves as the engine for conducting power analysis online at <https://webpower.psychstat.org>.
webreadr Tools for Reading Formatted Access Log Files
R is used by a vast array of people for a vast array of purposes – including web analytics. This package contains functions for consuming and munging various common forms of request log, including the Common and Combined Web Log formats and AWS logs.
websearchr Access Domains and Search Popular Websites
Functions that allow for accessing domains and a number of search engines.
webshot Take Screenshots of Web Pages
Takes screenshots of web pages, including Shiny applications.
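A minimal sketch (the output file name is illustrative):
  library(webshot)
  # capture a page to PNG; vwidth/vheight set the virtual browser size
  webshot("https://www.r-project.org/", file = "r-project.png",
          vwidth = 992, vheight = 744)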
webuse Import Stata ‘webuse’ Datasets
A Stata-style ‘webuse()’ function for importing named datasets from Stata’s online collection.
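A minimal sketch, assuming the single webuse() entry point described above:
  library(webuse)
  # load Stata's example 'auto' dataset from the online collection
  webuse("auto")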
wec Weighted Effect Coding
Provides functions to create factor variables with contrasts based on weighted effect coding. In weighted effect coding the estimates from a first-order regression model show the deviations per group from the sample mean. This is especially useful when a researcher has no directional hypotheses and uses a sample from a population in which the number of observations per group differs. The package also provides functionality for interactions between two factor variables based on weighted effect coding. Please note that this is a beta version: while functional, it does not follow all R conventions.
wedge The Exterior Calculus
Provides functionality for working with differentials, k-forms, wedge products, Stokes’s theorem, and related concepts from the exterior calculus. The canonical reference would be: M. Spivak (1965, ISBN:0-8053-9021-9). ‘Calculus on Manifolds’, Benjamin Cummings.
weibullness Goodness-of-Fit Test for Weibull (Weibullness Test)
Performs a goodness-of-fit test of Weibull distribution (weibullness test). For more details, see Park (2017) <http://…/2848>. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. NRF-2017R1A2B4004169).
WeibullR Weibull Analysis for Reliability Engineering
Life data analysis in the graphical tradition of Waloddi Weibull. Methods derived from Robert B. Abernethy (2008, ISBN 0-965306-3-2), Wayne Nelson (1982, ISBN: 9780471094586) <DOI:10.1002/0471725234>, William Q. Meeker and Lois A. Escobar (1998, ISBN: 1-471-14328-6), John I. McCool, (2012, ISBN: 9781118217986) <DOI:10.1002/9781118351994>.
weibulltools Statistical Methods for Life Data Analysis
Contains methods for examining bench test or field data using the well-known Weibull Analysis. It includes Monte Carlo simulation for estimating the life span of products that have not failed, taking account of registering and reporting delays as stated in (Verband der Automobilindustrie e.V. (VDA), 2016, <ISSN:0943-9412>). If the products looked upon are vehicles, the covered mileage can be estimated as well. It also provides non-parametric estimators like Median Ranks, Kaplan-Meier (Abernethy, 2006, <ISBN:978-0-9653062-3-2>), Johnson (Johnson, 1964, <ISBN:978-0444403223>), and Nelson-Aalen for failure probability estimation within samples that contain failures as well as censored data. Methods for estimating the parameters of lifetime distributions, like Maximum Likelihood and Median-Rank Regression, (Genschel and Meeker, 2010, <DOI:10.1080/08982112.2010.503447>) as well as the computation of confidence intervals of quantiles and probabilities using the delta method related to Fisher’s confidence intervals (Meeker and Escobar, 1998, <ISBN:9780471673279>) and the beta-binomial confidence bounds are also included. If desired, the data can automatically be divided into subgroups using segmented regression. Besides the calculation, methods for interactive visualization of the edited data using *Plotly* are provided as well. These visualizations include the layout of a probability plot for a specified distribution, the graphical technique of probability plotting and the possibility of adding regression lines and confidence bounds to existing plots.
Weighted.Desc.Stat Weighted Descriptive Statistics
Weighted descriptive statistics is the discipline of quantitatively describing the main features of real-valued fuzzy data, which are usually obtained from a fuzzy population. This package can summarize such fuzzy data numerically or graphically. To interpret the properties of one or several sets of real-valued fuzzy data, numerical summaries are available via the weighted statistics designed in this package, such as the mean, variance, covariance and correlation coefficient. Graphical interpretation is supported by weighted histograms and weighted scatter plots.
WeightedROC Fast, Weighted ROC Curves
Fast computation of Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) for weighted binary classification problems (weights are example-specific cost values).
WeightIt Weighting for Covariate Balance in Observational Studies
Generates weights to form equivalent groups in observational studies by easing and extending the functionality of the R packages ‘twang’ (Ridgeway et al., 2017) <https://…/package=twang> for generalized boosted modeling, ‘CBPS’ (Fong, Ratkovic, & Imai, 2017) <https://…/package=CBPS> for covariate balancing propensity score weighting, ‘ebal’ (Hainmueller, 2014) <https://…/package=ebal> for entropy balancing, and ‘ATE’ (Haris & Chan, 2015) <https://…/package=ATE> for empirical balancing calibration weighting. Also allows for assessment of weights and checking of covariate balance by interfacing directly with ‘cobalt’ (Greifer, 2017) <https://…/package=cobalt>.
weightQuant Weights for Incomplete Longitudinal Data and Quantile Regression
Estimation of observation-specific weights for incomplete longitudinal data and bootstrap procedure for weighted quantile regressions.
weightr Estimating Weight-Function Models for Publication Bias in R
Set of functions for estimating the Vevea and Hedges (1995) weight-function model in R. By specifying arguments, users can also estimate the modified model described in Vevea and Woods (2005), which may be more practical with small datasets. Users can also specify moderators to estimate a linear model. The package functionality allows users to easily extract the results of these analyses as R objects for other uses. In addition, the package includes a function to launch both models as a Shiny application. Although the Shiny application is also available online, this function allows users to launch it locally if they choose.
welchADF Welch-James Statistic for Robust Hypothesis Testing under Heteroscedasticity and Non-Normality
Implementation of Johansen’s general formulation of the Welch-James statistic with Approximate Degrees of Freedom, which makes it suitable for testing any linear hypothesis concerning cell means in univariate and multivariate mixed model designs when the data exhibit non-normality and non-homogeneous variance. Some improvements, namely trimmed means and Winsorized variances, and bootstrapping for calculating an empirical critical value, have been added to the classical formulation. The code departs from a previous SAS implementation by L.M. Lix and H.J. Keselman, available at <http://…/Program.pdf> and published in Keselman, H.J., Wilcox, R.R., and Lix, L.M. (2003) <DOI:10.1111/1469-8986.00060>.
wellknown Convert Between ‘WKT’ and ‘GeoJSON’
Convert ‘WKT’ to ‘GeoJSON’ and ‘GeoJSON’ to ‘WKT’. Functions are included for converting between ‘GeoJSON’ and ‘WKT’, creating both ‘GeoJSON’ features and non-features, creating ‘WKT’ from R objects (e.g., lists, data.frames, vectors), and linting ‘WKT’.
WeMix Weighted Mixed-Effects Models, using Multilevel Pseudo Maximum Likelihood Estimation
Run mixed-effects models that include weights at every level. The ‘WeMix’ package fits a weighted mixed model, also known as a multilevel, mixed, or hierarchical linear model. The weights could be inverse selection probabilities, such as those developed for an education survey where schools are sampled probabilistically, and then students inside those schools are sampled probabilistically. Although mixed-effects models are already available in ‘R’, ‘WeMix’ is unique in implementing methods for mixed models using weights at multiple levels. The model is fit using adaptive quadrature following the methodology of Rabe-Hesketh, S., and Skrondal, A. (2006) <doi:10.1111/j.1467-985X.2006.00426.x>.
wevid Quantifying Performance of a Binary Classifier Through Weight of Evidence
The distributions of the weight of evidence (log Bayes factor) favouring case over noncase status in a test dataset (or test folds generated by cross-validation) can be used to quantify the performance of a diagnostic test (McKeigue P., Quantifying performance of a diagnostic test as the expected information for discrimination: relation to the C-statistic. Statistical Methods for Medical Research 2018, in press). The package can be used with any test dataset on which you have observed case-control status and have computed prior and posterior probabilities of case status using a model learned on a training dataset. To quantify how the predictor will behave as a risk stratifier, the quantiles of the distributions of weight of evidence in cases and controls can be calculated and plotted.
wfg Weighted Fast Greedy Algorithm
Implementation of Weighted Fast Greedy algorithm for community detection in networks with mixed types of attributes.
wgeesel Weighted Generalized Estimating Equations and Model Selection
Weighted generalized estimating equations (WGEE) is an extension of generalized linear models to longitudinal clustered data that incorporates the within-cluster correlation when data are missing at random (MAR). The parameters in the mean, scale and correlation structures are estimated based on quasi-likelihood. Multiple model selection criteria are provided for selecting the mean model and working correlation structure based on WGEE/GEE.
wheatmap Incrementally Build Complex Plots using Natural Semantics
Builds complex plots, heatmaps in particular, using natural semantics. Bigger plots can be assembled using directives such as ‘LeftOf’, ‘RightOf’, ‘TopOf’, and ‘Beneath’. Other features include clustering, dendrograms and integration with ‘ggplot2’-generated grid objects. This package is particularly designed for bioinformaticians to assemble complex plots for publication.
whereami Reliably Return the Source and Call Location of a Command
Robust and reliable functions to return informative outputs to console with the run or source location of a command. This can be from the ‘RScript’/R terminal commands or ‘RStudio’ console, source editor, ‘Rmarkdown’ document and a Shiny application.
whiboclustering White Box Clustering Algorithm Design
White Box Cluster Algorithm Design allows you to create representative-based clustering algorithms from reusable components. This way one can recreate existing clustering algorithms (e.g., K-Means, K-Means++, PAM) but also create new clustering algorithms not available in the literature or in any other software. For more information see papers <doi:10.1007/s10462-009-9133-6> and <doi:10.1016/j.datak.2012.03.005>.
whitebox ‘WhiteboxTools’ R Frontend
An R frontend of the ‘WhiteboxTools’ library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph’s Geomorphometry and Hydrogeomatics Research Group. ‘WhiteboxTools’ can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. ‘WhiteboxTools’ also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
whitening Whitening and High-Dimensional Canonical Correlation Analysis
Implements the whitening methods (ZCA, PCA, Cholesky, ZCA-cor, and PCA-cor) discussed in Kessy, Lewin, and Strimmer (2018) ‘Optimal whitening and decorrelation’, The American Statistician, <doi:10.1080/00031305.2016.1277159>, as well as the whitening approach to Canonical Correlation Analysis allowing negative canonical correlations described in Jendoubi and Strimmer (2018) ‘Probabilistic canonical correlation analysis: a whitening approach’, <arXiv:1802.03490>.
whoapi A ‘Whoapi’ API Client
Retrieve data from the ‘Whoapi’ (<https://whoapi.com>) store of domain information, including a domain’s geographic location, registration status and search prominence.
wicket Utilities to Handle WKT Spatial Data
Utilities to generate bounding boxes from ‘WKT’ (Well-Known Text) objects and R data types, validate ‘WKT’ objects and convert object types from the ‘sp’ package into ‘WKT’ representations.
widgetframe ‘Htmlwidgets’ in Responsive ‘iframes’
Provides two functions, ‘frameableWidget()’ and ‘frameWidget()’. The ‘frameableWidget()’ function is used to add extra code to a ‘htmlwidget’ which allows it to be rendered correctly inside a responsive ‘iframe’. The ‘frameWidget()’ function is a ‘htmlwidget’ which displays the content of another ‘htmlwidget’ inside a responsive ‘iframe’. These functions allow for easier embedding of ‘htmlwidgets’ in content management systems such as ‘wordpress’, ‘blogger’ etc. They also allow for separation of widget content from the main HTML content, where the CSS of the main HTML could interfere with the widget.
widyr Widen, Process, then Re-Tidy Data
Encapsulates the pattern of untidying data into a wide matrix, performing some processing, then turning it back into a tidy form. This is useful for several operations such as co-occurrence counts, correlations, or clustering that are mathematically convenient on wide matrices.
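A minimal sketch of the widen-process-retidy pattern using pairwise_count() (toy data):
  library(widyr)
  dat <- data.frame(doc  = c(1, 1, 2, 2),
                    word = c("a", "b", "a", "c"))
  # count how often pairs of words co-occur within the same doc
  pairwise_count(dat, word, doc, sort = TRUE)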
WikidataQueryServiceR API Client Library for ‘Wikidata Query Service’
An API client for the ‘Wikidata Query Service’ <https://…/>.
WikidataR API Client Library for ‘Wikidata’
An API client for the ‘Wikidata’ store of semantic data.
WikipediaR R-Based Wikipedia Client
Provides an interface to the Wikipedia web application programming interface (API), using an internet connection. Three functions provide details for a specific Wikipedia page: all links that are present, all pages that link to it, and all contributions (revisions for main pages, and discussions for talk pages). Two functions provide details for a specific user: all contributions, and general information (such as name, gender, rights or groups). It provides additional information compared to other packages, such as ‘WikipediR’. It does not require a login. The multiplex network that can be constructed from the results of the functions of ‘WikipediaR’ can be modeled as a Stochastic Block Model as in Barbillon P., Donnet, S., Lazega E., and Bar-Hen A.: “Stochastic Block Models for Multiplex networks: an application to networks of researchers”, ArXiv 1501.06444, http://…/1501.06444 .
wikipediatrend Public Subject Attention via Wikipedia Page Access Statistics
Public attention is an interesting field of study. The internet not only allows access to information on virtually any subject in no time; via the page access statistics gathered by website authors, the subjects of attention themselves can be studied as well. For the omnipresent Wikipedia, those access statistics are made available via ‘http://stats.grok.se’, a server providing the information as file dumps as well as via a web API. This package provides an easy-to-use, consistent and traffic-minimizing approach for making those data accessible within R.
Using Wikipediatrend
WikiSocio A MediaWiki API Wrapper
MediaWiki is a wiki platform. Providing the infrastructure of Wikipedia, it also offers very sophisticated archiving functionalities. This package is built to store these wiki archives in R objects – data frames, lists, vectors and variables. All data are downloaded with the help of the MediaWiki REST API. For instance, you can get all revisions made by a contributor – contrib_list(), all the revisions of a page – page_revisions(), or create corpora of contributors – corpus_contrib_create() – and pages – corpus_page_create(). Then, you can enrich these corpora with data about contributors or pages – corpus_contrib_data() or corpus_page_data().
wikisourcer Download Public Domain Works from Wikisource
Download public domain works from Wikisource <https://…/>, a free library from the Wikimedia Foundation project.
wikitaxa Taxonomic Information from ‘Wikipedia’
‘Taxonomic’ information from ‘Wikipedia’, ‘Wikicommons’, ‘Wikispecies’, and ‘Wikidata’. Functions are included for getting taxonomic information from each of the sources just listed, as well as for performing taxonomic search.
wildcard Templates for Data Frames
Generate data frames from templates.
wilson Web-Based Interactive Omics Visualization
Tool-set of modules for creating web-based applications that use plot based strategies to visualize and analyze multi-omics data. This package utilizes the ‘shiny’ and ‘plotly’ frameworks to provide a user friendly dashboard for interactive plotting.
wingui Advanced Windows Functions
Helpers for interfacing with the operating system, particularly for Windows.
winRatioAnalysis Estimates the Win-Ratio as a Function of Time
Fits a model to data separately for each treatment group and then calculates the win ratio as a function of follow-up time.
wiod World Input Output Database 1995-2011
Data sets from the World Input Output Database, for the years 1995-2011.
WiSEBoot Wild Scale-Enhanced Bootstrap
Perform the Wild Scale-Enhanced (WiSE) bootstrap. Specifically, the user may supply a single or multiple equally-spaced time series and use the WiSE bootstrap to select a wavelet-smoothed model. Conversely, a pre-selected smooth level may also be specified for the time series. Quantities such as the bootstrap sample of wavelet coefficients, smoothed bootstrap samples, and specific hypothesis testing and confidence region results of the wavelet coefficients may be obtained. Additional functions are available to the user which help format the time series before analysis. This methodology is recommended to aid in model selection and signal extraction. Note: This package specifically uses wavelet bases in the WiSE bootstrap methodology, but the theoretical construct is much more versatile.
wiseR A Shiny Application for End-to-End Bayesian Decision Network Analysis and Web-Deployment
A Shiny application for learning Bayesian Decision Networks from data. This package can be used for probabilistic reasoning (in the observational setting), causal inference (in the presence of interventions) and learning policy decisions (in Decision Network setting). Functionalities include end-to-end implementations for data-preprocessing, structure-learning, exact inference, approximate inference, extending the learned structure to Decision Networks and policy optimization using statistically rigorous methods such as bootstraps, resampling, ensemble-averaging and cross-validation. In addition to Bayesian Decision Networks, it also features correlation networks, community-detection, graph visualizations, graph exports and web-deployment of the learned models as Shiny dashboards.
withr Run Code ‘With’ Temporarily Modified Global State
A set of functions to run code ‘with’ safely and temporarily modified global state. Many of these functions were originally part of the ‘devtools’ package; this package provides them with limited dependencies.
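A minimal sketch of two of the with_*() helpers (the environment variable name is made up):
  library(withr)
  # temporarily change an option; restored when the call returns
  with_options(list(digits = 3), print(pi))
  # temporarily set an environment variable for one expression
  with_envvar(c(MY_FLAG = "1"), Sys.getenv("MY_FLAG"))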
wktmo Converting Weekly Data to Monthly Data
Converts weekly data to monthly data. Users can use three types of week formats: ISO week, epidemiology week (epi week) and calendar date.
WLreg Regression Analysis Based on Win Loss Endpoints
Use various regression models for the analysis of win loss endpoints adjusting for non-binary and multivariate covariates.
Wmisc Wamser Misc: Reading Files by Tokens, Stateful Computations, Utility Functions
A tokenizer to read a text file token by token with a very lightweight API, a framework for stateful computations with finite state machines and a few string utility functions.
wmwpow Power Calculations for Two-Sample Wilcoxon-Mann-Whitney Test
Software for power calculations for two-sample Wilcoxon-Mann-Whitney test for a continuous outcome (Mann and Whitney 1947) <doi:10.1214/aoms/1177730491>, (Shieh, Jan, and Randles 2006) <doi:10.1080/10485250500473099>.
WMWssp Wilcoxon-Mann-Whitney Sample Size Planning
Calculates the minimal sample size for the Wilcoxon-Mann-Whitney test that is needed for a given power and two-sided type I error rate. The method works for metric data with and without ties, count data, ordered categorical data, and even dichotomous data. Data are needed for the reference group in order to generate synthetic data for the treatment group based on a relevant effect. For details, see Brunner, E., Bathke A. C. and Konietschke, F.: Rank- and Pseudo-Rank Procedures in Factorial Designs – Using R and SAS, Springer Verlag, to appear.
wNNSel Weighted Nearest Neighbor Imputation of Missing Values using Selected Variables
New tools for the imputation of missing values in high-dimensional data are introduced using the non-parametric nearest neighbor methods. It includes weighted nearest neighbor imputation methods that use specific distances for selected variables. It includes an automatic procedure of cross validation and does not require prespecified values of the tuning parameters. It can be used to impute missing values in high-dimensional data when the sample size is smaller than the number of predictors. For more information see Faisal and Tutz (2017) <doi:10.1515/sagmb-2015-0098>.
woe Computes Weight of Evidence and Information Values
Shows the relationship between an independent and dependent variable through Weight of Evidence and Information Value.
woeBinning Supervised WOE Binning of Numeric Variables and Factors
Implements an automated fine and coarse classing to bin numeric variables and factors with respect to a dichotomous target variable. Numeric variables are binned by merging a specified initial number of bins with similar frequencies. As a start sparse bins are merged with adjacent ones. Afterwards nearby bins with most similar WOE values are joined step by step until the information value (IV) decreases more than specified by a percentage value. Factors are binned by merging factor levels. At the beginning sparse levels are merged to a ‘miscellaneous’ level. Afterwards bins with most similar WOE values are joined step by step until the IV decreases more than specified by a percentage value. The package can be used with single variables or an entire data frame. It provides flexible tools for exploring different binning solutions and for deploying them to (new) data.
woeR Weight of Evidence Based Segmentation of a Variable
Segment a numeric variable based on a dichotomous dependent variable by using the weight of evidence (WOE) approach (Ref: Siddiqi, N. (2006) <doi:10.1002/9781119201731.biblio>). The underlying algorithm adopts a recursive approach to create segments that are diverse in respect of their WOE values and meet the demands of user-defined parameters. The algorithm also aims to maintain a monotonic trend in WOE values of consecutive segments. As such, it can be particularly helpful in improving robustness of linear and logistic regression models.
womblR Spatiotemporal Boundary Detection Model for Areal Unit Data
Implements a spatiotemporal boundary detection model with a dissimilarity metric for areal data with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC). The response variable can be modeled as Gaussian (no nugget), probit or Tobit link and spatial correlation is introduced at each time point through a conditional autoregressive (CAR) prior. Temporal correlation is introduced through a hierarchical structure and can be specified as exponential or first-order autoregressive. Full details of the package can be found in the accompanying vignette.
word.alignment Computing Word Alignment Using IBM Model 1 (and Symmetrization) for a Given Parallel Corpus and Evaluation
For a given sentence-aligned parallel corpus, it aligns words for each sentence pair. It considers many-to-one and symmetrized alignments. Moreover, it evaluates the word alignments produced by this package or by other software or methods. It also builds an automatic dictionary of two languages based on the given parallel corpus.
wordbankr Accessing the Wordbank Database
Tools for connecting to Wordbank, an open repository for developmental vocabulary data.
wordcloud2 Create Word Cloud by htmlWidget
A fast visualization tool for creating word clouds using ‘wordcloud2.js’.
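A minimal sketch using the demo frequency table shipped with the package:
  library(wordcloud2)
  # demoFreq is a data frame of words and frequencies included in the package
  wordcloud2(demoFreq, size = 1)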
WordR Rendering Word Documents with R Inline Code
Renders MS Word documents with inline R code and inserts tables and plots.
wordspace Distributional Semantic Models in R
An interactive laboratory for research on distributional semantic models (‘DSM’, see <https://…/Distributional_semantics> for more information).
workflowr A Framework for Reproducible and Collaborative Data Science
Combines literate programming (‘knitr’ and ‘rmarkdown’) and version control (‘Git’, via ‘git2r’) to generate a website containing time-stamped, versioned, and documented results.
WPC Weighted Predictiveness Curve
Implements the weighted predictiveness curve to visualize the marker-by-treatment relationship and measure the performance of biomarkers for guiding treatment decisions.
wPerm Permutation Tests
Supplies permutation-test alternatives to traditional hypothesis-test procedures such as two-sample tests for means, medians, and standard deviations; correlation tests; tests for homogeneity and independence; and more. Suitable for general audiences, including individual and group users, introductory statistics courses, and more advanced statistics courses that desire an introduction to permutation tests.
WPKDE Weighted Piecewise Kernel Density Estimation
Weighted Piecewise Kernel Density Estimation for large data.
wqs Weighted Quantile Sum Regression
Fits weighted quantile sum regression models, calculates weighted quantile sum index and estimated component weights.
wrangle A Systematic Data Wrangling Idiom
Supports systematic scrutiny, modification, and integration of data. The function status() counts rows that have missing values in grouping columns (returned by na() ), have non-unique combinations of grouping columns (returned by dup() ), and that are not locally sorted (returned by unsorted() ). Functions enumerate() and itemize() give sorted unique combinations of columns, with or without occurrence counts, respectively. Function ignore() drops columns in x that are present in y, and informative() drops columns in x that are entirely NA. Data that have defined unique combinations of grouping values behave more predictably during merge operations.
Wrapped Computes Pdf, Cdf, Quantile, Random Numbers and Provides Estimation for 40 Univariate Wrapped Distributions
Computes the probability density function, cumulative distribution function, quantile function and random numbers for 40 univariate wrapped distributions. They include the wrapped normal, wrapped Gumbel, wrapped logistic, wrapped t, wrapped Cauchy, wrapped skew normal, wrapped skew t, wrapped asymmetric Laplace, wrapped normal Laplace, wrapped skew Laplace, wrapped skew logistic, wrapped exponential power, wrapped skew power exponential, wrapped power exponential t, wrapped skew generalized t, wrapped skew hyperbolic, wrapped generalized hyperbolic Student t, wrapped power hyperbola logistic, wrapped Kiener, wrapped Laplace mixture, wrapped skew Laplace, wrapped polynomial tail Laplace, wrapped generalized asymmetric t, wrapped variance gamma, wrapped normal inverse gamma, wrapped skew Cauchy, wrapped slash, wrapped ex Gaussian, wrapped stable and wrapped log gamma distributions. Also given are maximum likelihood estimates of the parameters, standard errors, 95 percent confidence intervals, log-likelihood values, AIC values, CAIC values, BIC values, HQIC values, values of the W statistic, values of the A statistic, values of the KS statistic and the associated p-values.
wrapr Wrap R Functions for Debugging and Ease of Use
Provides ‘DebugFnW()’ to capture function context on error for debugging, and ‘let()’ which converts non-standard evaluation interfaces to standard evaluation interfaces.
writexl Export Data Frames to ‘xlsx’ Format
Portable, light-weight data frame to ‘xlsx’ exporter based on ‘libxlsxwriter’. No ‘Java’ or ‘Excel’ required.
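A minimal sketch (the file name is illustrative); a named list becomes one sheet per element:
  library(writexl)
  write_xlsx(list(iris = iris, cars = mtcars), "datasets.xlsx")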
wrswoR.benchmark Benchmark and Correctness Data for Weighted Random Sampling Without Replacement
Includes performance measurements and results of repeated experiment runs (for correctness checks) for code in the ‘wrswoR’ package.
wru Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation
This open-source software package enables researchers to predict individual ethnicity using surname, geolocation, and other attributes such as gender and age. The method applies Bayes’ rule to compute the posterior probability of each racial category for any given voter. The package implements methods described in Imai and Khanna (2015) ‘Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records.’
wsyn Wavelet Approaches to Studies of Synchrony in Ecology and Other Fields
Tools for a wavelet-based approach to analyzing spatial synchrony, principally in ecological data. Some tools will be useful for studying community synchrony.
wTO Computing Weighted Topological Overlaps (wTO) & Consensus wTO Network
Computes Weighted Topological Overlap (wTO) networks given a data.frame containing the count/expression/abundance per sample and a vector containing the nodes of interest. It also computes the cut-off threshold or p-value based on bootstrapping individuals or reshuffling values per individual. It additionally allows the construction of a consensus network based on multiple wTO networks, and includes a visualization tool for the final network.
WVPlots Common Plots for Analysis
Example ‘ggplot2’ plots we have found useful, under a standardized calling interface.
WWR Weighted Win Loss Statistics and their Variances
Calculate the (weighted) win loss statistics including the win ratio, win difference and win product and their variances, with which the p-values are also calculated. The variance estimation is based on Luo et al. (2015) <doi:10.1111/biom.12225>.

X

x.ent eXtraction of ENTity
Provides a tool for extracting information (entities and relations between them) from text datasets. It also emphasizes exploring the results with graphical displays. It is a rule-based system and works with hand-made dictionaries and local grammars defined by users. x.ent uses parsing with Perl functions and JavaScript to define user preferences through a browser, and R to display and support analysis of the extracted results. Local grammars are defined and compiled with Unitex, a tool developed by University Paris Est that supports multiple languages. See ?xconfig for an introduction.
x13binary Provide the ‘x13ashtml’ Seasonal Adjustment Binary
The US Census provides a seasonal adjustment program now called ‘X-13ARIMA-SEATS’ building on both earlier Census programs called X-11 and X-12 as well as the SEATS program by the Bank of Spain. Census offers both source and binary versions – which this package integrates for use by other R packages.
xaringan Presentation Ninja
Create HTML5 slides with R Markdown and the JavaScript library ‘remark.js’ (<https://remarkjs.com> ).
xdcclarge Estimating a (c)DCC-GARCH Model in Large Dimensions
Functions for estimating a (c)DCC-GARCH model in large dimensions based on publications by Engle et al. (2017) <doi:10.1080/07350015.2017.1345683> and Nakagawa et al. (2018) <doi:10.3390/ijfs6020052>. This estimation method consists of the composite likelihood method of Pakel et al. (2014) <http://…/Cavit-Pakel.pdf> and (non-)linear shrinkage estimation of covariance matrices by Ledoit and Wolf (2004, 2015, 2016) (<doi:10.1016/S0047-259X(03)00096-4>, <doi:10.1214/12-AOS989>, <doi:10.1016/j.jmva.2015.04.006>).
xergm Extensions of Exponential Random Graph Models
Extensions of Exponential Random Graph Models (ERGM): Temporal Exponential Random Graph Models (TERGM), Generalized Exponential Random Graph Models (GERGM), and Temporal Network Autocorrelation Models (TNAM).
xesreadR Read and Write XES Files
Read and write XES Files to create event log objects used by the ‘bupaR’ framework. XES (Extensible Event Stream) is the IEEE standard for storing and sharing event data (see <http://…/1849-2016.html> for more info).
xfun Miscellaneous Functions by ‘Yihui Xie’
Miscellaneous functions commonly used in other packages maintained by ‘Yihui Xie’.
xgb2sql Convert Trained ‘XGBoost’ Model to SQL Query
This tool enables in-database scoring of ‘XGBoost’ models built in R by translating trained model objects into SQL queries. ‘XGBoost’ <https://…/index.html> provides parallel tree boosting (also known as gradient boosting machine, or GBM) algorithms in a highly efficient, flexible and portable way. The GBM algorithm was introduced by Friedman (2001) <doi:10.1214/aos/1013203451>, and more details on ‘XGBoost’ can be found in Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>.
xgboost Gradient Boosting (GBDT, GBRT or GBM) Library for large-scale and distributed machine learning, on single node, Hadoop YARN and more <https://github.com/dmlc/xgboost>
An optimized general purpose gradient boosting library. The library is parallelized, and also provides an optimized distributed version. It implements machine learning algorithms under the gradient boosting framework, including the generalized linear model and gradient boosted regression trees (GBDT). XGBoost can also be run distributed and scales to terascale data.
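A minimal sketch using the mushroom data bundled with the package:
  library(xgboost)
  data(agaricus.train, package = "xgboost")
  # sparse feature matrix plus 0/1 labels; a small binary classification fit
  bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
                 nrounds = 10, objective = "binary:logistic", verbose = 0)
  preds <- predict(bst, agaricus.train$data)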
xkcdcolors Color Names from the XKCD Color Survey
The XKCD color survey asked participants to name colours. Randall Munroe published the (roughly one thousand) most common names and their sRGB hex values. This package lets you use them.
XKCDdata Get XKCD Comic Data
Download data from individual XKCD comics, written by Randall Munroe <https://xkcd.com/>.
xLLiM High Dimensional Locally-Linear Mapping
Provides a tool for non-linear mapping (non-linear regression) using a mixture of regression models and an inverse regression strategy. The methods include the GLLiM model (see Deleforge et al (2015) <DOI:10.1007/s11222-014-9461-5>) based on Gaussian mixtures and a robust version of GLLiM, named SLLiM (see Perthame et al (2016) <https://…/hal-01347455>), based on a mixture of generalized Student distributions.
xlsimple ‘XLConnect’ Wrapper
Provides a simple wrapper for some ‘XLConnect’ functions. ‘XLConnect’ is a package that allows for reading, writing, and manipulating Microsoft Excel files. This package, ‘xlsimple’, adds some documentation and pre-defined formatting to the outputted Excel file. Individual sheets can include a description on the first row to remind users what is in the data set. Auto filters and freeze rows are turned on. A brief readme file is created that provides a summary listing of the created sheets and, where provided, the descriptions.
xltabr Automatically Write Beautifully Formatted Cross Tabulations/Contingency Tables to Excel
Writes beautifully formatted cross tabulations to Excel using ‘openxlsx’. It has been developed to help automate the process of publishing Official Statistics. The user provides a dataframe, which is outputted to Excel with various types of rich formatting which are automatically detected from the structure of the cross tabulation. Documentation can be found at the following url <https://…/xltabr>.
xlutils3 Extract Multiple Excel Files at Once
Extract Excel files from folder. Also display extracted data and compute a summary of it. Based on the ‘readxl’ package.
xmeta A Tool Box for Multivariate Meta-Analysis
A comprehensive collection of functions for multivariate meta-analysis. This package includes functions that implement methods to estimate the pooled effect sizes and the components of the between-study covariance matrix for multivariate meta-analysis with continuous outcomes or binary outcomes. This package also provides functions for detecting publication bias when the within-study correlations are unknown.
XML Tools for parsing and generating XML within R and S-Plus
This package provides many approaches for both reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP. It also offers access to an XPath ‘interpreter’.
xml2 Parse XML
Work with XML files using a simple, consistent interface. Built on top of the ‘libxml2’ C library.
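A minimal sketch of the read/search/extract cycle:
  library(xml2)
  doc <- read_xml("<tasks><task id='1'>write</task><task id='2'>review</task></tasks>")
  # XPath search, then pull attributes and text out of the nodeset
  nodes <- xml_find_all(doc, ".//task")
  xml_attr(nodes, "id")
  xml_text(nodes)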
xmlparsedata Parse Data of ‘R’ Code as an ‘XML’ Tree
Convert the output of ‘utils::getParseData()’ to an ‘XML’ tree that is searchable and easier to manipulate in general.
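A minimal sketch: parse R code with source references kept, then convert the parse data to XML:
  library(xmlparsedata)
  p <- parse(text = "f <- function(x) x + 1", keep.source = TRUE)
  # returns the parse tree as an XML string, searchable with e.g. 'xml2'
  xml_parse_data(p)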
xmlrpc2 Implementation of the Remote Procedure Call Protocol (‘XML-RPC’)
‘XML-RPC’ is a remote procedure call protocol based on ‘XML’. The ‘xmlrpc2’ package is inspired by the ‘XMLRPC’ package but uses the ‘curl’ and ‘xml2’ packages instead of ‘RCurl’ and ‘XML’.
xmrr Generate XMR Control Chart Data from Time-Series Data
XMRs combine X-Bar control charts and Moving Range control charts. These functions will also recalculate the reference lines when a significant change has occurred.
xopen Open System Files, ‘URLs’, Anything
Cross platform solution to open files, directories or ‘URLs’ with their associated programs.
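A minimal sketch:
  library(xopen)
  # hand a URL (or file, or directory) to the system default handler
  xopen("https://www.r-project.org")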
xplain Providing Interactive Interpretations and Explanations of Statistical Results
Provides live interpretations and explanations of statistical functions in R. These interpretations and explanations are shown when the explained function is called by the user. They can interact with the values of the explained function’s actual results to offer relevant, meaningful insights. The interpretations and explanations are based on an easy-to-use XML format that allows R code to be included to interact with the returns of the explained function.
xplorerr Tools for Interactive Data Exploration
Tools for interactive data exploration built using ‘shiny’. Includes apps for descriptive statistics, visualizing probability distributions, inferential statistics, linear regression, logistic regression and RFM analysis.
Xplortext Statistical Analysis of Textual Data
A complete set of functions devoted to statistical analysis of documents.
xpose Diagnostics for Pharmacometric Models
Diagnostics for non-linear mixed-effects (population) models from ‘NONMEM’ <http://…/>. ‘xpose’ facilitates data import, creation of numerical run summaries and provides ‘ggplot2’-based graphics for data exploration and model diagnostics.
xptr Manipulating External Pointer
There is limited native support for external pointers in the R interface. This package provides some basic tools to verify, create and modify ‘externalptr’ objects.
XR A Structure for Interfaces from R
Support for interfaces from R to other languages, built around a class for evaluators and a combination of functions, classes and methods for communication. Will be used through a specific language interface package. Described in the book ‘Extending R’.
xray X Ray Vision on your Datasets
Tools to analyze datasets prior to any statistical modeling. Has various functions designed to find inconsistencies and to understand the distribution of the data.
xrf eXtreme RuleFit
An implementation of the RuleFit algorithm as described in Friedman & Popescu (2008) <doi:10.1214/07-AOAS148>. eXtreme Gradient Boosting (‘XGBoost’) is used to build rules, and ‘glmnet’ is used to fit a sparse linear model on the raw and rule features. The result is a model that learns similarly to a tree ensemble, while often offering improved interpretability and achieving improved scoring runtime in live applications. Several algorithms for reducing rule complexity are provided, most notably hyperrectangle de-overlapping. All algorithms scale to several million rows and support sparse representations to handle tens of thousands of dimensions.
XRJulia Structured Interface to Julia
A Julia interface structured according to the general form described in package XR and in the book ‘Extending R’.
xROI Delineate Regions of Interest (ROIs) and Extract Time-Series Data from Digital Repeat Photography Images
Digital repeat photography and near-surface remote sensing have been used by environmental scientists to study environmental change for nearly a decade. However, a user-friendly, reliable, and robust platform to extract color-based statistics and time series from a large stack of images is still lacking. Here, we present an interactive open-source toolkit, called ‘xROI’, that facilitates the process of time-series extraction and improves the quality of the final data. ‘xROI’ provides a responsive environment for scientists to interactively a) delineate regions of interest (ROI), b) handle field of view (FOV) shifts, and c) extract and export time series data characterizing image color (i.e. red, green and blue channel digital numbers for the defined ROI). Using ‘xROI’, users can detect FOV shifts with minimal difficulty. The software gives users the opportunity to readjust the mask files or redraw new ones every time an FOV shift occurs. ‘xROI’ helps to significantly improve data accuracy and continuity.
XRPython Structured Interface to Python
A Python interface structured according to the general form described in package XR and in the book ‘Extending R’.
xslt Transform XML Documents with XSLT Stylesheets in R
Lightweight XSLT processing package for R, based on ‘xmlwrapp’.
xsp The Chi-Square Periodogram
The circadian period of time series data is predicted, and the statistical significance of the periodicity is calculated, using the chi-square periodogram.
xspliner Assisted Model Building, using Surrogate Black-Box Models to Train Interpretable Spline Based Additive Models
Builds generalized linear models with automatic data transformation. ‘xspliner’ helps to build simple, interpretable models that inherit information provided by more complicated ones. The resulting model may be treated as an explanation of the black-box model supplied to the algorithm.
xtensor Headers for the ‘xtensor’ Library
The ‘xtensor’ C++ library for numerical analysis with multi-dimensional array expressions is provided as a header-only C++14 library. It offers an extensible expression system enabling lazy broadcasting; an API following the idioms of the C++ standard library; and tools to manipulate array expressions and build upon ‘xtensor’.
xts eXtensible Time Series
Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
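A minimal sketch of constructing and date-subsetting an ‘xts’ series:
  library(xts)
  x <- xts(rnorm(10), order.by = as.Date("2024-01-01") + 0:9)
  # ISO-8601 style range subsetting on the time index
  x["2024-01-03/2024-01-06"]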
xtune Regularized Regression with Differential Penalties Integrating External Information
Extends standard penalized regression (Lasso and Ridge) to allow differential shrinkage based on external information with the goal of achieving a better prediction accuracy. Examples of external information include the grouping of predictors, prior knowledge of biological importance, external p-values, function annotations, etc. The choice of multiple tuning parameters is done using an Empirical Bayes approach. A majorization-minimization algorithm is employed for implementation.
xwf Extrema-Weighted Feature Extraction
Extrema-weighted feature extraction for varying length functional data. Functional data analysis method that performs dimensionality reduction based on predefined features and allows for quantile weighting. Method implemented as presented in Van den Boom et al. (2017) <arXiv:1709.10467>.
xxIRT Item Response Theory
An implementation of Item Response Theory (IRT) in R, comprising five modules: (1) common and utility functions, (2) estimation/calibration procedures, (3) a computerized adaptive testing (CAT) framework, (4) an automated test assembly (ATA) framework, and (5) a multistage testing (MST) framework. See documentation at https://…/README.md.
xyloplot A Method for Creating Xylophone-Like Frequency Density Plots
A method for creating vertical histograms sharing a y-axis using base graphics.
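A minimal sketch, assuming the package exposes an eponymous xyloplot() function that accepts a list of numeric vectors (a hypothetical call; see the package help for the exact interface):

    library(xyloplot)

    # Two xylophone-style frequency density plots sharing a y-axis
    xyloplot(list(rnorm(1000), rexp(1000)))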
xyz The ‘xyz’ Algorithm for Fast Interaction Search in High-Dimensional Data
High dimensional interaction search by brute force requires a quadratic computational cost in the number of variables. The xyz algorithm provably finds strong interactions in almost linear time. For details of the algorithm see: G. Thanei, N. Meinshausen and R. Shah (2016). The xyz algorithm for fast interaction search in high-dimensional data <https://…/1610.05108v1.pdf>.

Y

yakmoR A Simple Wrapper for the k-Means Library Yakmo
This is a simple wrapper for the yakmo k-means library (developed by Naoki Yoshinaga, see http://…/yakmo). It performs fast and robust (orthogonal) k-means clustering.
yardstick Tidy Characterizations of Model Performance
Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).
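A brief sketch of the tidy metric interface (the data frame and column names are invented):

    library(yardstick)

    # Observed vs. predicted values in a plain data frame
    df <- data.frame(obs  = c(1.2, 2.5, 3.1, 4.8),
                     pred = c(1.0, 2.7, 3.3, 4.5))

    rmse(df, truth = obs, estimate = pred)   # returns a tidy tibble of the metric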
yarr Yet Another ‘ARFF’ Reader
A parser and a writer for the ‘WEKA’ Attribute-Relation File Format <https://…/> in pure R, with no dependencies. As opposed to other R implementations, this package can read standard (dense) as well as sparse files, i.e. those where each row contains only the nonzero components. Unlike ‘RWeka’, ‘yarr’ does not require any ‘Java’ installation, nor does it depend on external software. This implementation is generalized from those in packages ‘mldr’ and ‘mldr.datasets’.
yasp String Functions for Compact and Expressive Code
A collection of string functions designed for writing more compact and expressive code. ‘yasp’ (Yet Another String Package) is simple, fast, dependency-free, and written in pure R. The package provides: a coherent set of abbreviations for paste() from package ‘base’ with a variety of defaults, such as p() for ‘paste’ and pcc() for ‘paste and collapse with commas’; wrap(), bracket(), and others for wrapping a string in flanking characters; unwrap() for removing pairs of characters (at any position in a string); and sentence() for cleaning whitespace around punctuation and capitalization appropriate for prose sentences.
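A short sketch using the functions named above (the expected outputs in comments are our reading of the description):

    library(yasp)

    pcc("apples", "bananas", "cherries")  # paste and collapse with commas:
                                          #   "apples, bananas, cherries"
    bracket("x")                          # wrap in flanking brackets: "[x]"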
yearn Use and if Needed Install Packages from CRAN, BioConductor, CRAN Archive, and GitHub
This tries to attach a package if you have it; if not, it tries to install it from BioConductor or CRAN; if not available there, it tries to install it from the CRAN mirror on GitHub, which includes packages that have been removed from CRAN; if not available there, it looks for a matching package elsewhere on GitHub to install. Note that this is sloppy practice and prone to all sorts of risks. However, there are use cases, such as quick scripting, or in a class where students already know best practices, where this can be useful. ‘yearn’ was inspired by teaching in PhyloMeth, a course funded by an NSF CAREER award to the author (NSF DEB-1453424).
yesno Ask a Custom Yes-No Question
Asks a custom yes-no question with variable responses. The order and phrasing of the possible responses vary randomly to ensure the user consciously chooses (as opposed to automatically typing a response).
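A minimal sketch, assuming the package’s main yesno() function returns TRUE or FALSE from the interactive prompt:

    library(yesno)

    # The wording and order of the choices vary between calls
    if (yesno("Delete all cached files?")) {
      unlink("cache", recursive = TRUE)
    }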
ykmeans K-means using a target variable
Performs k-means clustering guided by a target variable. The number of clusters is determined from the variance of the target variable within each cluster.
yll Compute Expected Years of Life Lost (YLL) and Average YLL
Compute the standard expected years of life lost (YLL), as developed by the Global Burden of Disease Study (Murray, C.J., Lopez, A.D. and World Health Organization, 1996). The YLL is based on comparing the age of death to an external standard life expectancy curve. It also computes the average YLL, which highlights premature causes of death and brings attention to preventable deaths (Aragon et al., 2008).
youtubecaption Downloading YouTube Subtitle Transcription in a Tidy ‘Tibble’ Data_Frame
Although some R packages are tailored to the YouTube API (e.g., ‘tuber’), downloading YouTube video subtitles (i.e., captions) in a tidy form has never been low-hanging fruit. Using the ‘youtube-transcript-api’ Python package under the hood, this R package provides users with a convenient way of parsing and converting a desired YouTube caption into a handy ‘tibble’ data frame object. Furthermore, users can easily save the caption data as a tidy Excel file without advanced programming knowledge.
YPInterimTesting Interim Monitoring Using Adaptively Weighted Log-Rank Test in Clinical Trials
Provides monitoring boundaries for interim testing using the adaptively weighted log-rank test developed by Yang and Prentice (2010 <doi:10.1111/j.1541-0420.2009.01243.x>). The package uses a re-sampling method to obtain stopping boundaries in sequential designs. The output consists of stopping boundaries at the interim looks, along with nominal p-values defined as the probability of the test exceeding the specific observed value or critical value, regardless of the test’s behavior at other looks. The asymptotic distribution of the test statistics of the adaptively weighted log-rank test at the interim looks is examined in Yang (2017, pre-print).
YRmisc Y&R Miscellaneous R Functions
Miscellaneous functions for data analysis, graphics, data manipulation, and statistical investigation, including descriptive statistics, creating leading and lagging variables, portfolio return analysis, time series difference and percentage change calculation, and stacking data for more efficient analysis.
yuimaGUI A Graphical User Interface for the ‘yuima’ Package
Provides a graphical user interface for the ‘yuima’ package.
yum Utilities to Extract and Process ‘YAML’ Fragments
Provides a number of functions to facilitate extracting information in ‘YAML’ fragments from one or multiple files, optionally structuring the information in a ‘data.tree’. ‘YAML’ (recursive acronym for ‘YAML ain’t Markup Language’) is a convention for specifying structured data in a format that is both machine- and human-readable. ‘YAML’ therefore lends itself well to embedding (meta)data in plain text files, such as Markdown files. This principle is implemented in ‘yum’ with minimal dependencies (i.e. only the ‘yaml’ package; the ‘data.tree’ package can be used to enable additional functionality).

Z

zeallot Multiple and Unpacking Variable Assignment
Provides a %<-% operator to perform multiple or unpacking assignment in R. The operator unpacks a list of values and assigns these values to a single or multiple corresponding names.
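For example (a minimal sketch):

    library(zeallot)

    # Unpack a two-element list into two names in one assignment
    c(lat, lon) %<-% list(38.06, -122.79)
    lat  # 38.06
    lon  # -122.79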
zebu Local Association Measures
Implements the estimation of local (and global) association measures: Ducher’s Z, pointwise mutual information, and normalized pointwise mutual information. The significance of local (and global) association is assessed using p-values estimated by permutation. Finally, using local association subgroup analysis, it identifies whether the association between variables depends on the value of another variable.
zeitgebr Analysis of Circadian Behaviours
Use behavioural variables to compute period, rhythmicity and other circadian parameters. Methods include computation of chi square periodograms (Sokolove and Bushell (1978) <DOI:10.1016/0022-5193(78)90022-X>), Lomb-Scargle periodograms (Lomb (1976) <DOI:10.1007/BF00648343>, Scargle (1982) <DOI:10.1086/160554>, Ruf (1999) <DOI:10.1076/brhm.30.2.178.1422>), and autocorrelation-based periodograms.
Zelig Everyone’s Statistical Software
A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the collective library. Zelig allows each individual package, for each statistical model, to be accessed by a common, uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical work-flow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences, and predicted and expected values) to interpret and visualize complex models.
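The uniform call structure looks roughly like this (a sketch using the built-in ‘swiss’ data, following Zelig’s estimate / set covariates / simulate convention):

    library(Zelig)

    z.out <- zelig(Fertility ~ Education, model = "ls", data = swiss)  # estimate
    x.out <- setx(z.out, Education = 10)                               # set covariate values
    s.out <- sim(z.out, x = x.out)                                     # simulate quantities of interest
    summary(s.out)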
zeligverse Easily Install and Load Stable Zelig Packages
Provides an easy way to load stable Core Zelig and ancillary Zelig packages.
zenplots Zigzag Expanded Navigation Plots
Graphical tools for visualizing high-dimensional data with a path of pairs.
zeroEQpart Zero Order vs (Semi) Partial Correlation Test and CI
Uses the bootstrap to test whether a zero-order correlation is equal to a partial or semi-partial correlation (one- or two-tailed). Confidence intervals for the parameter (zero order minus partial) can also be determined. Implements the bias-corrected and accelerated bootstrap method as described in ‘An Introduction to the Bootstrap’, Efron (1983) <ISBN:0-412-04231-2>.
zip Cross-Platform ‘zip’ Compression
A cross-platform ‘zip’ compression library. A replacement for the ‘zip’ function that does not require any additional external tools on any platform.
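A minimal sketch (the file names are invented):

    library(zip)

    # Create and inspect an archive without any external 'zip' binary
    writeLines("hello", "hello.txt")
    zip::zip("example.zip", files = "hello.txt")
    zip::zip_list("example.zip")   # contents of the archive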
zipfextR Zipf Extended Distributions
Implementation of three extensions of the Zipf distribution: the Marshall-Olkin Extended Zipf (MOEZipf), Pérez-Casany, M., & Casellas, A. (2013) <arXiv:1304.4540>, the Zipf-Poisson Extreme (Zipf-PE), and the Zipf-Poisson Stopped Sum (Zipf-PSS) distributions. In log-log scale, the first two extensions allow for top-concavity and top-convexity, while the third one only allows for top-concavity. All the extensions maintain the linearity associated with the Zipf model in the tail.
ziphsmm Zero-Inflated Poisson Hidden (Semi-)Markov Models
Fits zero-inflated Poisson hidden (semi-)Markov models with or without covariates by directly maximizing the log-likelihood function. Multiple starting values should be used to avoid local optima.
zipR Pythonic Zip() for R
Implements Python-style zip() for R. It is a more flexible version of cbind().
zoib Bayesian Inference for Beta Regression and Zero/One Inflated Beta Regression
Fits beta regression and zero/one inflated beta regression and obtains Bayesian Inference of the model via the Markov Chain Monte Carlo approach implemented in JAGS.
ZOIP ZOIP Distribution, ZOIP Regression, ZOIP Mixed Regression
The ZOIP distribution (Zeros and Ones Inflated Proportional) is a distribution for proportional data inflated with zeros and/or ones. It is defined over the best-known proportional data distributions, the beta and simplex distributions, Jørgensen and Barndorff-Nielsen (1991) <doi:10.1016/0047-259X(91)90008-P>, and also allows different parameterizations of the beta distribution, Ferrari and Cribari-Neto (2004) <doi:10.1080/0266476042000214501>, Rigby and Stasinopoulos (2005) <doi:10.18637/jss.v023.i07>. The ZOIP distribution has four parameters, two of which correspond to the proportions of zeros and ones, while the other two correspond to the chosen distribution for the proportional data. The ‘ZOIP’ package allows fitting regression models with fixed and mixed effects for proportional data inflated with zeros and/or ones.
zoltr Interface to the ‘Zoltar’ Forecast Repository API
‘Zoltar’ <https://…/> is a website that provides a repository of model forecast results in a standardized format and a central location. It supports storing, retrieving, comparing, and analyzing time series forecasts for prediction challenges of interest to the modeling community. This package provides functions for working with the ‘Zoltar’ API, including connecting and authenticating, getting information about projects, models, and forecasts, deleting and uploading forecast data, and downloading scores.
zoo S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)
An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
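A small sketch of an irregular series (the values and dates are invented):

    library(zoo)

    # Three observations at irregular dates
    z <- zoo(c(1.1, 2.3, 1.9),
             as.Date(c("2020-01-01", "2020-01-03", "2020-01-07")))

    # Regularize onto a daily grid and interpolate the gaps
    grid <- zoo(, seq(start(z), end(z), by = "day"))
    na.approx(merge(z, grid))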
zoocat ‘zoo’ Objects with Column Attributes
Tools for manipulating multivariate time series data by extending the ‘zoo’ class.
zoomgrid Grid Search Algorithm with a Zoom
Provides a grid search algorithm with a zoom, which aims to help solve difficult optimization problems in which there are many local optima inside the domain of the target function. It offers a suitable initial or starting value for a subsequent optimization procedure, provided that the global optimum exists in the neighbourhood of that value. The grid search algorithm with a zoom saves time tremendously in cases with high-dimensional arguments.
zoon Reproducible, Accessible & Shareable Species Distribution Modelling
Reproducible and remixable species distribution modelling. The package reads user submitted modules from an online repository, runs full SDM workflows and returns output that is fully reproducible.
ZRA Dynamic Plots for Time Series Forecasting
Combines a forecast of a time series, using the function forecast(), with the dynamic plots from dygraphs.
Zseq Integer Sequence Generator
Generates well-known integer sequences. The ‘Rmpfr’ package is adopted for computing with arbitrarily large numbers at user-specified bit precision. Every function has a hyperlink to its corresponding item in the OEIS (The On-Line Encyclopedia of Integer Sequences) on its help page. For interested readers, see Sloane and Plouffe (1995, ISBN:978-0125586306).
zstdr R Bindings to Zstandard Compression Library
Provides R bindings to the ‘Zstandard’ compression library. ‘Zstandard’ is a real-time compression algorithm providing high compression ratios. It offers a very wide range of compression/speed trade-offs, backed by a very fast decoder. See <http://…/> for more information.
ztable Zebra-Striped Tables in LaTeX and HTML Formats
Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, or lm, aov, anova, glm, or coxph object.
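A minimal sketch (selecting the output format via the package option is our reading of its conventions):

    library(ztable)

    # Zebra-striped HTML table from a data.frame
    options(ztable.type = "html")
    ztable(head(mtcars))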
zTree Functions to Import Data from ‘z-Tree’ into R
Reads ‘.xls’ and ‘.sbj’ files written by the Microsoft Windows program ‘z-Tree’, a software package for developing and carrying out economic experiments (see <http://…/> for more information).
ZVCV Zero-Variance Control Variates
Zero-variance control variates (ZV-CV, Mira et al. (2013) <doi:10.1007/s11222-012-9344-6>) is a post-processing method to reduce the variance of Monte Carlo estimators of expectations using the derivatives of the log target. Once the derivatives are available, the only additional computational effort is in solving a linear regression problem. Recently, this method has been extended to higher dimensions using regularisation (South et al., 2018 <arXiv:1811.05073>). This package can be used to easily perform ZV-CV or regularised ZV-CV when a set of samples, derivatives and function evaluations are available. Additional functions for applying ZV-CV to two estimators for the normalising constant of the posterior distribution in Bayesian statistics are also supplied.
