R Packages

|R Packages| = 3331


A3 Accurate, Adaptable, and Accessible Error Metrics for Predictive Models
Supplies tools for tabulating and analyzing the results of predictive models. The methods employed are applicable to virtually any predictive model and make comparisons between different methodologies straightforward.
abc Tools for Approximate Bayesian Computation (ABC)
Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates and for calculating the misclassification probabilities of different models.
abc.data Data Only: Tools for Approximate Bayesian Computation (ABC)
Contains data which are used by functions of the ‘abc’ package.
ABCanalysis Computed ABC Analysis
For a given data set, the package provides a novel method of computing precise limits to acquire subsets which are easily interpreted. Closely related to the Lorenz curve, the ABC curve visualizes the data by graphically representing the cumulative distribution function. Based on an ABC analysis, the algorithm calculates the optimal limits with the help of the ABC curve, exploiting the mathematical properties of the distribution of the analyzed items. The data, containing positive values, is divided into three disjoint subsets A, B and C: subset A comprises the very profitable values, i.e. the largest data values (“the important few”); subset B comprises values whose profit roughly equals the effort required to obtain them; and subset C comprises the non-profitable values, i.e. the smallest data values (“the trivial many”).
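The three-way split can be illustrated with a short sketch (here only the classic cumulative-share heuristic with made-up 80%/95% limits stands in; the package itself derives precise, data-driven limits from the ABC curve):

```python
def abc_split(values, a_limit=0.8, b_limit=0.95):
    """Partition positive values into ABC subsets by cumulative share.

    Items are sorted in decreasing order; set A holds items that together
    contribute up to `a_limit` of the total ("the important few"), B up to
    `b_limit`, and C the remainder ("the trivial many").
    """
    order = sorted(values, reverse=True)
    total = sum(order)
    groups = {"A": [], "B": [], "C": []}
    running = 0.0
    for v in order:
        running += v
        if running / total <= a_limit:
            groups["A"].append(v)
        elif running / total <= b_limit:
            groups["B"].append(v)
        else:
            groups["C"].append(v)
    return groups
```

For example, `abc_split([50, 30, 10, 5, 3, 2])` places 50 and 30 in A, 10 and 5 in B, and the rest in C.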
abcrf Approximate Bayesian Computation via Random Forests
Performs Approximate Bayesian Computation (ABC) model choice via random forests.
abctools Tools for ABC Analyses
Tools for approximate Bayesian computation including summary statistic selection and assessing coverage.
An R Package for Tuning Approximate Bayesian Computation Analyses
abodOutlier Angle-Based Outlier Detection
Performs angle-based outlier detection on a given data frame. Three methods are available: a full but slow implementation using all the data, which has cubic complexity; a fully randomized one, which is far more efficient; and another using k-nearest neighbours. These algorithms are especially well suited for outlier detection in high-dimensional data.
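The idea behind angle-based detection can be sketched as follows: a point far from the bulk of the data sees the remaining points within a narrow cone, so the variance of the angles it subtends is small. This is a simplified, unweighted toy version of the angle-based outlier factor, not the package's implementation:

```python
import math
from itertools import combinations

def angle_variance(p, others):
    """Variance of the angles at point p subtended by all pairs of other
    points (2-D). A low variance flags a likely outlier: outliers see
    most of the data within a narrow cone."""
    angles = []
    for a, b in combinations(others, 2):
        va = (a[0] - p[0], a[1] - p[1])
        vb = (b[0] - p[0], b[1] - p[1])
        dot = va[0] * vb[0] + va[1] * vb[1]
        cos = dot / (math.hypot(*va) * math.hypot(*vb))
        # clamp to guard against floating-point drift outside [-1, 1]
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)
```

On a small cluster near the origin plus a distant point, the distant point yields a markedly smaller angle variance than any cluster member.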
ACA Abrupt Change-Point or Aberration Detection in Point Series
Offers an interactive function for the detection of breakpoints in series.
accelmissing Missing Value Imputation for Accelerometer Data
Imputation for missing count values in accelerometer data. The methodology includes both parametric and semi-parametric multiple imputation under the zero-inflated Poisson lognormal model. The package also provides multiple functions to pre-process accelerometer data prior to missing-data imputation. These include detecting wearing and non-wearing time, selecting valid days and subjects, and creating plots.
ACDm Tools for Autoregressive Conditional Duration Models
Package for Autoregressive Conditional Duration (ACD; Engle and Russell, 1998) models. Creates trade, price or volume durations from transaction (tick) data, performs diurnal adjustments, and fits and tests various ACD models.
Acinonyx High-Performance interactive graphics system iPlots eXtreme
Acinonyx (a genus of cheetah – for its speed) is the codename for the next generation of iPlots eXtreme, a high-performance interactive graphics system. It is a continuation of the iPlots project, allowing visualization and exploratory analysis of large data. Due to its highly flexible design and focus on speed optimization, it can also be used as a general graphics system (e.g. it is the fastest R graphics device if you have a good GPU) and an interactive toolkit. It is a complete re-write of iPlots from scratch, taking the best from the iPlots design and focusing on speed and flexibility. Compared to the previous iPlots project, the main focus is on:
• speed and scalability to support large data (it uses OpenGL, optimized native code and object sharing to allow visualization of millions of data points)
• enhanced support for adding statistical models to plots with full interactivity
• seamless integration in GUIs (Windows and Mac OS X)
AcousticNDLCodeR Coding Sound Files for Use with NDL
Creates acoustic cues for use with the R packages ‘ndl’ or ‘ndl2’. The package implements functions used in the PLOS ONE paper: Denis Arnold, Fabian Tomaschek, Konstantin Sering, Florence Lopez, and R. Harald Baayen (accepted). Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PLOS ONE. More details can be found in the paper and the supplement. ‘ndl’ is available on CRAN; ‘ndl2’ is available by request from <>.
acp Autoregressive Conditional Poisson
Time series analysis of count data.
AcrossTic A Cost-Minimal Regular Spanning Subgraph with TreeClust
Construct minimum-cost regular spanning subgraph as part of a non-parametric two-sample test for equality of distribution.
acrt Autocorrelation Robust Testing
Functions for testing affine hypotheses on the regression coefficient vector in regression models with autocorrelated errors.
AdapEnetClass A Class of Adaptive Elastic Net Methods for Censored Data
Provides new approaches to variable selection for the accelerated failure time (AFT) model.
adapr Implementation of an Accountable Data Analysis Process
Tracks reading and writing within R scripts that are organized into a directed acyclic graph. Contains an interactive shiny application adaprApp(). Uses Git and file hashes to track version histories of input and output.
adaptDA Adaptive Mixture Discriminant Analysis
Adaptive mixture discriminant analysis (AMDA) adapts a model-based classifier to the situation where a class represented in the test set may not have been encountered earlier, in the learning phase.
AdaptGauss Gaussian Mixture Models (GMM)
Multimodal distributions can be modelled as a mixture of components. The model is derived using the Pareto Density Estimation (PDE) for an estimation of the pdf. PDE has been designed in particular to identify groups/classes in a dataset. Precise limits for the classes can be calculated using the theorem of Bayes. Verification of the model is possible by QQ plot and Chi-squared test.
ADCT Adaptive Design in Clinical Trials
Implements existing adaptive design methods for clinical trials. The package includes power and stopping-boundary (sample size) calculation functions for two-group group sequential designs, adaptive designs with co-primary endpoints, biomarker-informed adaptive designs, etc.
addhaz Binomial and Multinomial Additive Hazards Models
Functions to fit the binomial and multinomial additive hazards models and to calculate the contribution of diseases/conditions to the disability prevalence, as proposed by Nusselder and Looman (2004) <DOI:10.1353/dem.2004.0017>.
addhazard Fit Additive Hazards Models for Survival Analysis
Contains tools to fit additive hazards models to random sampling, two-phase sampling and two-phase sampling with auxiliary information. This package provides regression parameter estimates and their model-based and robust standard errors. It also offers tools to predict individual-specific hazards.
ADDT A Package for Analysis of Accelerated Destructive Degradation Test Data
Accelerated destructive degradation tests (ADDT) are often used to collect necessary data for assessing the long-term properties of polymeric materials. Based on the collected data, a thermal index (TI) is estimated. The TI can be useful for material rating and comparison. This package performs the least squares (LS) and maximum likelihood (ML) procedures for estimating TI for polymeric materials. The LS approach is a two-step approach that is currently used in industrial standards, while the ML procedure is widely used in the statistical literature. The ML approach allows one to do statistical inference such as quantifying uncertainties in estimation, hypothesis testing, and predictions. Two publicly available datasets are provided to allow users to experiment and practice with the functions.
adegraphics An S4 Lattice-Based Package for the Representation of Multivariate Data
Graphical functionalities for the representation of multivariate data. It is a complete re-implementation of the functions available in the ‘ade4’ package.
adespatial Multivariate Multiscale Spatial Analysis
Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran’s Eigenvectors Maps, MEM).
ADMMnet Regularized Model with Selecting the Number of Non-Zeros
Fits linear and Cox models regularized with net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalties, and their adaptive forms, such as the adaptive lasso and net adjusting for signs of linked coefficients. In addition, it treats the number of non-zero coefficients as another tuning parameter and selects it simultaneously with the regularization parameter. The package uses a one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of the coefficients.
ADPclust Fast Clustering Using Adaptive Density Peak Detection
An implementation of the ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work builds and improves upon the idea of Rodriguez and Laio (2014). ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid-selection and parameter-optimization algorithm, which finds the number of clusters and the cluster centroids by comparing average silhouettes on a grid of candidate clustering results. It also includes an interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional density-distance plot.
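The density-peak idea can be sketched in a few lines. This is a bare-bones illustration only — local density as a neighbour count within a user-chosen radius, and delta as the distance to the nearest denser point — whereas the package uses multivariate Gaussian density estimation and silhouette-based selection:

```python
import math

def density_peaks(points, radius):
    """For each 2-D point compute rho (number of neighbours within
    `radius`) and delta (distance to the nearest point of strictly
    higher density). Cluster centres stand out with both rho and
    delta large; that is what the density-distance plot shows."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    n = len(points)
    rho = [sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < radius)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        # the densest points get the largest distance by convention
        delta.append(min(higher) if higher else
                     max(dist(points[i], points[j]) for j in range(n) if j != i))
    return rho, delta
```

On two small clusters, the two centre points come out with the highest rho and large delta, while satellites get small delta (their centre is nearby and denser).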
advclust Object Oriented Advanced Clustering
S4 object-oriented framework for advanced fuzzy clustering and fuzzy consensus clustering. Techniques provided by this package are Fuzzy C-Means, Gustafson-Kessel (Babuska version), Gath-Geva, Sum Voting Consensus, Product Voting Consensus, and Borda Voting Consensus. The package also provides visualization via biplots and radar plots.
AEDForecasting Change Point Analysis in ARIMA Forecasting
Package to incorporate change point analysis in ARIMA forecasting.
affluenceIndex Affluence Indices
Computes the statistical indices of affluence (richness) and constructs bootstrap confidence intervals for these indices. Also computes the Wolfson polarization index.
AFM Atomic Force Microscope Image Analysis
Provides Atomic Force Microscope images analysis such as Power Spectrum Density, roughness against lengthscale, variogram and variance, fractal dimension and scale.
after Run Code in the Background
Run an R function in the background, possibly after a delay. The current version uses the Tcl event loop and was ported from the ‘tcltk2’ package.
aftgee Accelerated Failure Time Model with Generalized Estimating Equations
This package features both rank-based and least-squares estimates for the Accelerated Failure Time (AFT) model. For rank-based estimation, it provides approaches that include the computationally efficient Gehan weight and general weights such as the log-rank weight. For least-squares estimation, the estimating equation is solved with Generalized Estimating Equations (GEE). Moreover, in multivariate cases, the dependence working correlation structure can be specified in the GEE setting.
AhoCorasickTrie Fast Searching for Multiple Keywords in Multiple Texts
Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the number of keywords (N) and the size of the text (M). Compare this to the naive approach which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. If you want to search multiple texts with the same trie, the function will take a list or vector of texts and return a list of matches to each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported.
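As an illustration of the technique the description refers to, here is a compact sketch of the algorithm — a generic textbook construction (trie plus BFS-built failure links), not the package's C++ implementation:

```python
from collections import deque

def build_automaton(keywords):
    # Trie as parallel arrays: goto maps char -> child index,
    # fail holds failure links, out collects keywords ending at a node.
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({}); fail.append(0); out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)
    # BFS to set failure links: longest proper suffix that is a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            target = goto[f].get(ch, 0)
            fail[child] = target if target != child else 0
            out[child] += out[fail[child]]  # inherit suffix matches
    return goto, fail, out

def search(text, keywords):
    """Return (keyword, end_index) pairs for every match in text;
    each text character is consumed once, never re-scanned."""
    goto, fail, out = build_automaton(keywords)
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        matches.extend((w, i) for w in out[node])
    return matches
```

For instance, searching "ushers" for the keywords "he", "she", "his", "hers" reports "she" and "he" ending at index 3 and "hers" ending at index 5, including the overlapping matches a naive per-keyword scan would find only with repeated passes.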
ahp Analytical Hierarchy Process (AHP) with R
An R package to model and analyse complex decision-making problems using the Analytic Hierarchy Process (AHP).
AHR Estimation and Testing of Average Hazard Ratios
Methods for estimation of multivariate average hazard ratios as defined by Kalbfleisch and Prentice. The underlying survival functions of the event of interest in each group can be estimated using either the (weighted) Kaplan-Meier estimator or the Aalen-Johansen estimator for the transition probabilities in Markov multi-state models. Right-censored and left-truncated data is supported. Moreover, the difference in restricted mean survival can be estimated.
Ake Associated Kernel Estimations
Continuous and discrete (count or categorical) estimation of density, probability mass function (pmf) and regression functions are performed using associated kernels. The cross-validation technique and the local Bayesian procedure are also implemented for bandwidth selection.
algorithmia Allows you to Easily Interact with the Algorithmia Platform
The company, Algorithmia, houses the largest marketplace of online algorithms. This package essentially holds a bunch of REST wrappers that make it very easy to call algorithms in the Algorithmia platform and access files and directories in the Algorithmia data API. To learn more about the services they offer and the algorithms in the platform visit <>. More information for developers can be found at <>.
algstat Algebraic statistics in R
algstat provides functionality for algebraic statistics in R. Current applications include exact inference in log-linear models for contingency table data, analysis of ranked and partially ranked data, and general purpose tools for multivariate polynomials, building on the mpoly package. To aid in the process, algstat has ports to Macaulay2, Bertini, LattE-integrale and 4ti2.
AlignStat Comparison of Alternative Multiple Sequence Alignments
Methods for comparing two alternative multiple sequence alignments (MSAs) to determine whether they align homologous residues in the same columns as one another. It then classifies similarities and differences into conserved gaps, conserved sequence, merges, splits or shifts of one MSA relative to the other. Summarising these categories for each MSA column yields information on which sequence regions are agreed upon by both MSAs, and which differ. Several plotting functions enable easy visualisation of the comparison data for analysis.
alineR Alignment of Phonetic Sequence Using the ‘ALINE’ Algorithm
Functions are provided to calculate the ‘ALINE’ distance between a cognate pair. The score is based on phonetic features represented using the Unicode-compliant International Phonetic Alphabet (IPA). Parameterized feature weights are used to determine the optimal alignment, and functions are provided to estimate optimum values. This project was funded by the National Science Foundation Cultural Anthropology Program (grant number SBS-1030031) and the University of Maryland College of Behavioral and Social Sciences.
allanvar Allan Variance Analysis
A collection of tools for stochastic sensor error characterization using the Allan Variance technique originally developed by D. Allan.
alluvial Alluvial Diagrams
Creating alluvial diagrams (also known as parallel sets plots) for multivariate and time series-like data.
alphaOutlier Obtain Alpha-Outlier Regions for Well-Known Probability Distributions
Given the parameters of a distribution, the package uses the concept of alpha-outliers by Davies and Gather (1993) to flag outliers in a data set. See Davies, L. and Gather, U. (1993): The identification of multiple outliers, JASA, 88(423), 782–792, <doi:10.1080/01621459.1993.10476339> for details.
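For a concrete case, the alpha-outlier region of a normal distribution is the pair of tails outside the central region of probability 1 − alpha. The sketch below is illustrative only (function names are mine, not the package's API):

```python
from statistics import NormalDist

def normal_alpha_outlier_region(mu, sigma, alpha):
    """For N(mu, sigma^2), the alpha-outlier region is where the density
    falls below the threshold leaving total probability alpha outside;
    by symmetry, the two tails beyond mu +/- z_{1-alpha/2} * sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mu - z * sigma, mu + z * sigma  # values outside are alpha-outliers

def is_alpha_outlier(x, mu, sigma, alpha):
    lo, hi = normal_alpha_outlier_region(mu, sigma, alpha)
    return x < lo or x > hi
```

With alpha = 0.05 and a standard normal, the region boundary sits at roughly ±1.96, so 3.0 is flagged and 1.5 is not.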
altmeta Alternative Meta-Analysis Methods
Provides alternative statistical methods for meta-analysis, including new heterogeneity tests, estimators of between-study variance, and heterogeneity measures that are robust to outliers.
ampd An Algorithm for Automatic Peak Detection in Noisy Periodic and Quasi- Periodic Signals
A method for automatic detection of peaks in noisy periodic and quasi-periodic signals. This method, called automatic multiscale-based peak detection (AMPD), is based on the calculation and analysis of the local maxima scalogram, a matrix comprising the scale-dependent occurrences of local maxima.
analyz Model Layer for Automatic Data Analysis
Class with methods to read and execute R commands described as steps in a CSV file.
anfis Adaptive Neuro Fuzzy Inference System in R
The package implements an ANFIS Type 3 Takagi and Sugeno fuzzy if-then rule network with the following features: (1) an independent number of membership functions (MF) for each input, and also different extensible MF types; (2) Type 3 Takagi and Sugeno fuzzy if-then rules; (3) full rule combinations, e.g. 2 inputs with 2 membership functions each -> 4 fuzzy rules; (4) hybrid learning, i.e. gradient descent for antecedents and least-squares estimation for consequents; (5) multiple outputs.
aniDom Inferring Dominance Hierarchies and Estimating Uncertainty
Provides: (1) Tools to infer dominance hierarchies based on calculating Elo scores, but with custom functions to improve estimates in animals with relatively stable dominance ranks. (2) Tools to plot the shape of the dominance hierarchy and estimate the uncertainty of a given data set.
ANLP Build Text Prediction Model
Library to sample and clean text data, build N-gram models, apply the backoff algorithm, etc.
anocva A Non-Parametric Statistical Test to Compare Clustering Structures
Provides ANOCVA (ANalysis Of Cluster VAriability), a non-parametric statistical test to compare clustering structures with applications in functional magnetic resonance imaging data (fMRI). The ANOCVA allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering.
ANOM Analysis of Means
Analysis of means (ANOM) as used in technometrical computing. The package takes results from multiple comparisons with the grand mean (obtained with multcomp, SimComp, nparcomp, or MCPAN) or corresponding simultaneous confidence intervals as input and produces ANOM decision charts that illustrate which group means deviate significantly from the grand mean.
anomalous Anomalous time series package for R
It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. A common use-case is to identify servers that are behaving unusually. Methods in this package compute a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then a robust principal component decomposition is used on the features, and various bivariate outlier detection methods are applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and alpha-hulls. For demo purposes, this package contains both synthetic and real data from Yahoo.
anomalous-acm Anomalous time series package for R (ACM)
It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. A common use-case is to identify servers that are behaving unusually. Methods in this package compute a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then a robust principal component decomposition is used on the features, and various bivariate outlier detection methods are applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and alpha-hulls. For demo purposes, this package contains both synthetic and real data from Yahoo.
anomalyDetection Implementation of Augmented Network Log Anomaly Detection Procedures
Implements procedures to aid in detecting network log anomalies. By combining various multivariate analytic approaches relevant to network anomaly detection, it provides cyber analysts efficient means to detect suspected anomalies requiring further evaluation.
AnomalyDetection Anomaly Detection with R
AnomalyDetection is an open-source R package for detecting anomalies that is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. The AnomalyDetection package can be used in a wide variety of contexts: for example, detecting anomalies in system metrics after a new software release or in user engagement after an A/B test, or for problems in econometrics, financial engineering, and the political and social sciences.
anonymizer Anonymize Data Containing Personally Identifiable Information
Allows users to quickly and easily anonymize data containing Personally Identifiable Information (PII) through convenience functions.
anytime Anything to ‘POSIXct’ Converter
Convert input in character, integer, or numeric form into ‘POSIXct’ objects, using one of a number of predefined formats, and relying on Boost facilities for date and time parsing.
apa Format Outputs of Statistical Tests According to APA Guidelines
Formatter functions in the ‘apa’ package take the return value of a statistical test function, e.g. a call to chisq.test() and return a string formatted according to the guidelines of the APA (American Psychological Association).
apc Age-Period-Cohort Analysis
Functions for age-period-cohort analysis. The data can be organised in matrices indexed by age-cohort, age-period or cohort-period. The data can include dose and response or just doses. The statistical model is a generalized linear model (GLM) allowing for 3, 2, 1 or 0 of the age-period-cohort factors. The canonical parametrisation of Kuang, Nielsen and Nielsen (2008) is used; thus the analysis does not rely on ad hoc identification.
apc: An R Package for Age-Period-Cohort Analysis
apdesign An Implementation of the Additive Polynomial Design Matrix
An implementation of the additive polynomial (AP) design matrix. It constructs and appends an AP design matrix to a data frame for use with longitudinal data subject to seasonality.
APfun Geo-Processing Base Functions
Base tools for facilitating the creation of geo-processing functions in R.
apricom Tools for the a Priori Comparison of Regression Modelling Strategies
Tools to compare several model adjustment and validation methods prior to application in a final analysis.
APtools Average Positive Predictive Values (AP) for Binary Outcomes and Censored Event Times
We provide tools to estimate two prediction performance metrics for risk scores or markers: the average positive predictive value (AP) and the well-known AUC (the area under the receiver operating characteristic curve). The outcome of interest is either binary or a censored event time. Note that for censored event times, our functions estimate the AP and the AUC as time-dependent quantities for pre-specified time interval(s). A function that compares the APs of two risk scores/markers is also included. Optional outputs include positive predictive values and true positive fractions at specified marker cut-off values, and a plot of the time-dependent AP versus time (available for event time data).
arabicStemR Arabic Stemmer for Text Analysis
Allows users to stem Arabic texts for text analysis.
arc Association Rule Classification
Implements the Classification Based on Association Rules (CBA) algorithm for association rule classification (ARC). The package also contains several convenience methods that allow CBA parameters (minimum confidence, minimum support) to be set automatically, and it natively handles numeric attributes by integrating a pre-discretization step. The rule generation phase is handled by the ‘arules’ package.
ARCensReg Fitting Univariate Censored Linear Regression Model with Autoregressive Errors
Fits a univariate left- or right-censored linear regression model with autoregressive errors under the normal distribution. It provides estimates and standard errors of the parameters and prediction of future observations, and it supports missing values on the dependent variable. It also provides convergence plots when at least one censored observation exists.
ArfimaMLM Arfima-MLM Estimation For Repeated Cross-Sectional Data
Functions to facilitate the estimation of Arfima-MLM models for repeated cross-sectional data and pooled cross-sectional time-series data (see Lebo and Weber 2015). The estimation procedure uses double filtering with Arfima methods to account for autocorrelation in repeated cross-sectional data followed by multilevel modeling (MLM) to estimate aggregate as well as individual-level parameters simultaneously.
ArgumentCheck Improved Communication to Users with Respect to Problems in Function Arguments
The typical process of checking arguments in functions is iterative. In this process, an error may be returned and the user may fix it only to receive another error on a different argument. ‘ArgumentCheck’ facilitates a more helpful way to perform argument checks allowing the programmer to run all of the checks and then return all of the errors and warnings in a single message.
arqas Application in R for Queueing Analysis and Simulation
Provides functions to compute the main characteristics of the following queueing models: M/M/1, M/M/s, M/M/1/k, M/M/s/k, M/M/1/Inf/H, M/M/s/Inf/H, M/M/s/Inf/H with Y replacements, M/M/Inf, open Jackson networks and closed Jackson networks. Moreover, it is also possible to simulate similar queueing models with any type of arrival or service distribution: G/G/1, G/G/s, G/G/1/k, G/G/s/k, G/G/1/Inf/H, G/G/s/Inf/H, G/G/s/Inf/H with Y replacements, open networks and closed networks. Finally, it contains functions for fitting data to a statistical distribution.
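For the simplest of these models, M/M/1, the main characteristics have well-known closed forms. The sketch below shows the standard textbook formulas, not the package's functions:

```python
def mm1_metrics(lam, mu):
    """Steady-state characteristics of an M/M/1 queue with arrival rate
    lam and service rate mu (stability requires lam < mu)."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    rho = lam / mu              # server utilisation
    L = rho / (1 - rho)         # mean number in system
    Lq = rho ** 2 / (1 - rho)   # mean number waiting in queue
    W = 1 / (mu - lam)          # mean time in system (Little's law: L = lam * W)
    Wq = rho / (mu - lam)       # mean waiting time in queue
    return {"rho": rho, "L": L, "Lq": Lq, "W": W, "Wq": Wq}
```

For example, with lam = 2 and mu = 4, utilisation is 0.5, the mean number in the system is 1, and the mean time in the system is 0.5.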
arsenal An Arsenal of ‘R’ Functions for Large-Scale Statistical Summaries
An Arsenal of ‘R’ functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in ‘R’ and ‘RStudio’ and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types ‘by’ the levels of a categorical variable; modelsum(), which performs simple model fits on the same endpoint for many variables (univariate or adjusted for standard covariates); and freqlist(), a powerful frequency table across many categorical variables.
ART Aligned Rank Transform for Nonparametric Factorial Analysis
An implementation of the Aligned Rank Transform technique for factorial analysis (see references below for details) including models with missing terms (unsaturated factorial models). The function first computes a separate aligned ranked response variable for each effect of the user-specified model, and then runs a classic ANOVA on each of the aligned ranked responses. For further details, see Higgins, J. J. and Tashtoush, S. (1994). An aligned rank transform test for interaction. Nonlinear World 1 (2), pp. 201-211. Wobbrock, J.O., Findlater, L., Gergle, D. and Higgins,J.J. (2011). The Aligned Rank Transform for nonparametric factorial analyses using only ANOVA procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’11). New York: ACM Press, pp. 143-146. <doi:10.1145/1978942.1978963>.
artfima Fit ARTFIMA Model
Fit and simulate ARTFIMA. Theoretical autocovariance function and spectral density function for stationary ARTFIMA.
ARTIVA Time-Varying DBN Inference with the ARTIVA (Auto Regressive TIme VArying) Model
Reversible Jump MCMC (RJ-MCMC) sampling for approximating the posterior distribution of a time varying regulatory network, under the Auto Regressive TIme VArying (ARTIVA) model (for a detailed description of the algorithm, see Lebre et al. BMC Systems Biology, 2010). Starting from time-course gene expression measurements for a gene of interest (referred to as ‘target gene’) and a set of genes (referred to as ‘parent genes’) which may explain the expression of the target gene, the ARTIVA procedure identifies temporal segments for which a set of interactions occur between the ‘parent genes’ and the ‘target gene’. The time points that delimit the different temporal segments are referred to as changepoints (CP).
arules Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.
arulesCBA Classification Based on Association Rules
Provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier.
aRxiv Interface to the arXiv API
An interface to the API for arXiv, a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.
as.color Assign Random Colors to Unique Items in a Vector
The as.color function takes an R vector of any class as an input, and outputs a vector of unique hexadecimal color values that correspond to the unique input values. This is most handy when overlaying points and lines for data that correspond to different levels or factors. The function will also print the random seed used to generate the colors. If you like the color palette generated, you can save the seed and reuse those colors.
asht Applied Statistical Hypothesis Tests
Some hypothesis test functions with a focus on non-asymptotic methods that have matching confidence intervals.
AsioHeaders Asio C++ Header Files
Asio is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach. ‘Asio’ is also included in Boost, but requires linking when used from Boost; used standalone, it is header-only, provided a recent-enough compiler. ‘Asio’ is written and maintained by Christopher M. Kohlhoff and is released under the ‘Boost Software License’, Version 1.0.
aSPU Adaptive Sum of Powered Score Test
R codes for the (adaptive) Sum of Powered Score (‘SPU’ and ‘aSPU’) tests, inverse variance weighted Sum of Powered Score (‘SPUw’ and ‘aSPUw’) tests, and gene-based and some pathway-based association tests (pathway-based Sum of Powered Score tests (‘SPUpath’), the adaptive ‘SPUpath’ (‘aSPUpath’) test, the Gene-based Association Test that uses an extended Simes procedure (‘GATES’), the Hybrid Set-based Test (‘HYST’), and an extended version of the ‘GATES’ test for pathway-based association testing (‘Gates-Simes’)). The tests can be used with genetic and other data sets with covariates. The response variable is binary or quantitative.
asremlPlus Augments the Use of ‘Asreml’ in Fitting Mixed Models
Provides functions that assist in automating the testing of terms in mixed models when ‘asreml’ is used to fit the models. The package ‘asreml’ is marketed by ‘VSNi’ as ‘asreml-R’ and provides a computationally efficient algorithm for fitting mixed models using Residual Maximum Likelihood. The content falls into the following natural groupings: (i) Data, (ii) Object manipulation functions, (iii) Model modification functions, (iv) Model testing functions, (v) Model diagnostics functions, (vi) Prediction production and presentation functions, (vii) Response transformation functions, and (viii) Miscellaneous functions. A history of the fitting of a sequence of models is kept in a data frame. Procedures are available for choosing models that conform to the hierarchy or marginality principle and for displaying predictions for significant terms in tables and graphs.
AssayCorrector Detection and Correction of Spatial Bias in HTS Screens
(1) Detects plate-specific spatial bias by identifying the rows and columns of all plates of the assay affected by this bias, and assay-specific spatial bias by identifying affected well locations (i.e., well positions scanned across all plates of a given assay), both following the results of the Mann-Whitney U test; (2) allows one to correct plate-specific spatial bias using either the additive or multiplicative PMP (Partial Mean Polish) method (the most appropriate spatial bias model can be specified by the user or determined by the program following the results of the Kolmogorov-Smirnov two-sample test), and to correct assay-specific spatial bias by computing robust Z-scores within each plate of the assay and then traditional Z-scores across well locations.
assertive.data Assertions to Check Properties of Data
A set of predicates and assertions for checking the properties of (country independent) complex data types. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.data.us Assertions to Check Properties of Strings
A set of predicates and assertions for checking the properties of US-specific complex data types. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.files Assertions to Check Properties of Files
A set of predicates and assertions for checking the properties of files and connections. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.numbers Assertions to Check Properties of Numbers
A set of predicates and assertions for checking the properties of numbers. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.properties Assertions to Check Properties of Variables
A set of predicates and assertions for checking the properties of variables, such as length, names and attributes. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.reflection Assertions for Checking the State of R
A set of predicates and assertions for checking the state and capabilities of R, the operating system it is running on, and the IDE being used. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.sets Assertions to Check Properties of Sets
A set of predicates and assertions for checking the properties of sets. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.strings Assertions to Check Properties of Strings
A set of predicates and assertions for checking the properties of strings. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.types Assertions to Check Types of Variables
A set of predicates and assertions for checking the types of variables. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertr Assertive programming for R analysis pipelines
The assertr package supplies a suite of functions designed to verify assumptions about data early in a dplyr/magrittr analysis pipeline so that data errors are spotted early and can be addressed quickly.
assist A Suite of R Functions Implementing Spline Smoothing Techniques
A comprehensive package for fitting various non-parametric/semi-parametric linear/nonlinear fixed/mixed smoothing spline models.
assortnet Calculate the Assortativity Coefficient of Weighted and Binary Networks
Functions to calculate the assortment of vertices in social networks. This can be measured on both weighted and binary networks, with discrete or continuous vertex values.
asVPC Average Shifted Visual Predictive Checks
Visual predictive checks (VPC) are a well-known method for validating nonlinear mixed-effects models, especially in pharmacometrics. Average shifted visual predictive checks are a newly developed variant that combines VPC with the idea of the average shifted histogram.
asymmetry The Slide-Vector Model for Multidimensional Scaling of Asymmetric Data
The slide-vector model is provided in this package together with functions for the analysis and graphical display of asymmetry. The slide-vector model is a scaling model for asymmetric data: a distance model is fitted to the symmetric part of the data, while the asymmetric part is represented by projections of the coordinates onto the slide-vector. The slide-vector points in the direction of the large asymmetries in the data. The distance is modified so that the distance between two points whose connecting line is parallel to the slide-vector is larger in the direction of this vector and smaller in the opposite direction. If the line connecting two points is perpendicular to the slide-vector, the difference between the two projections is zero, and the distance between the two points is symmetric. The algorithm for fitting this model is derived from the majorization approach to multidimensional scaling.
ATE Inference for Average Treatment Effects using Covariate Balancing
Nonparametric estimation and inference for average treatment effects based on covariate balancing.
aTSA Alternative Time Series Analysis
Contains tools for testing and analyzing time series data and for fitting popular time series models such as ARIMA, Moving Average and Holt-Winters. Most functions also provide clear, SAS-style output; for example, identify, estimate and forecast mirror the statements of the same names in SAS PROC ARIMA.
attrCUSUM Tools for Attribute VSI CUSUM Control Chart
An implementation of tools for the design of attribute variable-sampling-interval cumulative sum (CUSUM) charts. It currently provides information for monitoring a mean increase, such as the average number of samples to signal, the average time to signal, a matrix of transient probabilities, and suitable control limits when the data follow a (zero-inflated) Poisson/binomial distribution. The tools can easily be applied to other count processes and might be extended to more complicated cumulative sum control charts; we leave these issues as future work.
auRoc Various Methods to Estimate the AUC
Estimate the AUC using a variety of methods: (1) frequentist nonparametric methods based on the Mann-Whitney statistic or kernel methods; (2) frequentist parametric methods using the likelihood ratio test based on higher-order asymptotic results, the signed log-likelihood ratio test, the Wald test, or the approximate ‘t’ solution to the Behrens-Fisher problem; (3) Bayesian parametric MCMC methods.
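As a quick illustration of the first family of methods, the Mann-Whitney view of the AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half) can be sketched in a few lines. This is a language-agnostic Python sketch, not the package's R interface; the function name is illustrative.

```python
def auc_mann_whitney(pos, neg):
    """Nonparametric AUC estimate from the Mann-Whitney statistic.

    pos, neg: scores for the positive and negative cases.
    Each positive/negative pair contributes 1 if the positive scores
    higher, 0.5 on a tie, and 0 otherwise.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Perfectly separated scores give AUC = 1.0
print(auc_mann_whitney([0.9, 0.8, 0.7], [0.6, 0.5, 0.4]))  # -> 1.0
```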
automagic Automagically Document and Install Packages Necessary to Run R Code
Parse R code in a given directory for R packages and attempt to install them from CRAN or GitHub. Optionally use a dependencies file for tighter control over which package versions to install.
AutoModel Automated Hierarchical Multiple Regression with Assumptions Checking
A set of functions that automates the process and produces reasonable output for hierarchical multiple regression models. It allows you to specify predictor blocks, from which it generates all of the linear models, and checks the assumptions of the model, producing the requisite plots and statistics to allow you to judge the suitability of the model.
AutoregressionMDE Minimum Distance Estimation in Autoregressive Model
Considers an autoregressive model of order p, where the distribution function of the innovations is unknown but the innovations are independent and symmetrically distributed. The package contains a function named ARMDE that takes X (a vector of n observations) and p (the order of the model) as input arguments and returns the minimum distance estimator of the parameters of the model.
autoSEM Performs Specification Search in Structural Equation Models
Implements multiple heuristic search algorithms for automatically creating structural equation models.
aVirtualTwins Adaptation of Virtual Twins Method from Jared Foster
Searches for subgroups in randomized clinical trials with a binary outcome and two treatment groups. This is an adaptation of Jared Foster's method.
AWR AWS’ Java ‘SDK’ for R
Installs the compiled Java modules of the Amazon Web Services (‘AWS’) ‘SDK’ to be used in downstream R packages interacting with ‘AWS’. See <https://…/sdk-for-java> for more information on the ‘AWS’ ‘SDK’ for Java.
AWR.Kinesis Amazon ‘Kinesis’ Consumer Application for Stream Processing
Fetching data from Amazon ‘Kinesis’ Streams using the Java-based ‘MultiLangDaemon’ interacting with Amazon Web Services (‘AWS’) for easy stream processing from R. For more information on ‘Kinesis’, see <https://…/kinesis>.
AWR.KMS A Simple Client to the ‘AWS’ Key Management Service
Encrypt plain text and ‘decrypt’ cipher text using encryption keys hosted at Amazon Web Services (‘AWS’) Key Management Service (‘KMS’), on which see <https://…/kms> for more information.
aws.alexa Client for the Amazon Alexa Web Information Services API
Use the Amazon Alexa Web Information Services API to find information about domains, including the kind of content they carry, how popular they are (rank and traffic history), and sites linking to them, among other things. See <https://…/> for more information.
aws.polly Client for AWS Polly
A client for AWS Polly <http://…/polly>, a speech synthesis service.
aws.ses AWS SES Client Package
A simple client package for the Amazon Web Services (AWS) Simple Email Service (SES) <http://…/> REST API.
aws.signature Amazon Web Services Request Signatures
Generates request signatures for Amazon Web Services (AWS) APIs.
aws.sns AWS SNS Client Package
A simple client package for the Amazon Web Services (AWS) Simple Notification Service (SNS) API.
aws.sqs AWS SQS Client Package
A simple client package for the Amazon Web Services (AWS) Simple Queue Service (SQS) API.
awsjavasdk Boilerplate R Access to the Amazon Web Services (‘AWS’) Java SDK
Provides boilerplate access to all of the classes included in the Amazon Web Services (‘AWS’) Java Software Development Kit (SDK) via package:’rJava’. According to Amazon, the ‘SDK helps take the complexity out of coding by providing Java APIs for many AWS services including Amazon S3, Amazon EC2, DynamoDB, and more’. You can read more about the included Java code on Amazon’s website: <https://…/>.
AzureML Discover, Publish and Consume Web Services on Microsoft Azure Machine Learning
Provides an interface with Microsoft Azure to easily publish functions and trained models as a web service, and discover and consume web service.


BACCT Bayesian Augmented Control for Clinical Trials
Implements the Bayesian Augmented Control (BAC, a.k.a. Bayesian historical data borrowing) method under clinical trial setting by calling ‘Just Another Gibbs Sampler’ (‘JAGS’) software. In addition, the ‘BACCT’ package evaluates user-specified decision rules by computing the type-I error/power, or probability of correct go/no-go decision at interim look. The evaluation can be presented numerically or graphically. Users need to have ‘JAGS’ 4.0.0 or newer installed due to a compatibility issue with ‘rjags’ package. Currently, the package implements the BAC method for binary outcome only. Support for continuous and survival endpoints will be added in future releases. We would like to thank AbbVie’s Statistical Innovation group and Clinical Statistics group for their support in developing the ‘BACCT’ package.
backpipe Backward Pipe Operator
Provides a backward-pipe operator for ‘magrittr’ (%<%) or ‘pipeR’ (%<<%) that allows performing operations from right to left. This is useful where right-to-left ordering is natural, as with nested structures such as trees/directories and markup languages such as HTML and XML.
backports Reimplementations of Functions Introduced Since R-3.0.0
Provides implementations of functions which have been introduced in R since version 3.0.0. The backports are conditionally exported, so that R resolves the function names to the versions shipped with R (if available) and falls back to the backports otherwise. This way package developers can make use of the new functions without worrying about the minimum required R version.
backShift Learning Causal Cyclic Graphs from Unknown Shift Interventions
Code for ‘backShift’, an algorithm to estimate the connectivity matrix of a directed (possibly cyclic) graph with hidden variables. The underlying system is required to be linear and we assume that observations under different shift interventions are available. For more details, see http://…/1506.02494 .
bacr Bayesian Adjustment for Confounding
Estimating the average causal effect based on the Bayesian Adjustment for Confounding (BAC) algorithm.
badger Badge for R Package
Query information and generate badges for use in README files and on GitHub Pages.
BalanceCheck Balance Check for Multiple Covariates in Matched Observational Studies
Two practical tests are provided for assessing whether multiple covariates in a treatment group and a matched control group are balanced in observational studies.
BAMBI Bivariate Angular Mixture Models
Fit (using Bayesian methods) and simulate mixtures of univariate and bivariate angular distributions.
bamlss Bayesian Additive Models for Location Scale and Shape (and Beyond)
R infrastructure for Bayesian regression models.
BANFF Bayesian Network Feature Finder
Provides efficient Bayesian nonparametric models for network feature selection.
bannerCommenter Make Banner Comments with a Consistent Format
A convenience package for use while drafting code. It facilitates making stand-out comment lines decorated with bands of characters. The input text strings are converted into R comment lines, suitably formatted. These are then displayed in a console window and, if possible, automatically transferred to a clipboard ready for pasting into an R script. Designed to save time when drafting R scripts that will need to be navigated and maintained by other programmers.
Barnard Barnard’s Unconditional Test
Barnard’s unconditional test for 2×2 contingency tables.
bartMachine Bayesian Additive Regression Trees
An advanced implementation of Bayesian Additive Regression Trees with expanded features for data analysis and visualization.
bartMachineJARs bartMachine JARs
These are bartMachine’s Java dependency libraries. Note: this package has no functionality of its own and should not be installed as a standalone package without bartMachine.
Barycenter Wasserstein Barycenter
Computation of a Wasserstein Barycenter. The package implements a method described in Cuturi (2014) ‘Fast Computation of Wasserstein Barycenters’. The paper is available at <http://…/cuturi14.pdf>. To speed up the computation time the main iteration step is based on ‘RcppArmadillo’.
BAS Bayesian Model Averaging using Bayesian Adaptive Sampling
Package for Bayesian Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors, the hyper-g priors of Liang et al. (JASA, 2008), or the mixtures of g-priors in GLMs of Li and Clyde (2015). Other model selection criteria include AIC and BIC. Sampling probabilities may be updated based on the sampled models using sampling without replacement, or an MCMC algorithm samples models using the BAS tree structure as an efficient hash table. Allows uniform or beta-binomial prior distributions on models, and variables may be forced to always be included.
base64url Fast and URL-Safe Base64 Encoder and Decoder
In contrast to RFC 3548, the 62nd character (‘+’) is replaced with ‘-’, and the 63rd character (‘/’) is replaced with ‘_’. Furthermore, the encoder does not pad the string with trailing ‘=’. The resulting encoded strings comply with the regular expression pattern ‘[A-Za-z0-9_-]’ and thus are safe to use in URLs or for file names.
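The same alphabet substitution and padding removal can be sketched with Python's standard library (an illustrative sketch of the base64url scheme the description refers to, not this package's R API; the function names are assumptions):

```python
import base64

def b64url_encode(data: bytes) -> str:
    # Standard base64 with '+' -> '-' and '/' -> '_' (which is what
    # urlsafe_b64encode does), then the trailing '=' padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(s: str) -> bytes:
    # Restore the stripped padding before decoding.
    pad = "=" * (-len(s) % 4)
    return base64.urlsafe_b64decode(s + pad)

# Bytes 0xfb 0xff map to '+/8=' in standard base64, '-_8' in base64url.
print(b64url_encode(b"\xfb\xff"))  # -> -_8
```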
basefun Infrastructure for Computing with Basis Functions
Some very simple infrastructure for basis functions.
BASS Bayesian Adaptive Spline Surfaces
Bayesian fitting and sensitivity analysis methods for adaptive spline surfaces. Built to handle continuous and categorical inputs as well as functional or scalar output. An extension of the methodology in Denison, Mallick and Smith (1998) <doi:10.1023/A:1008824606259>.
bastah Big Data Statistical Analysis for High-Dimensional Models
Big data statistical analysis for high-dimensional models, made possible by modifying lasso.proj() in the ‘hdi’ package, replacing its nodewise regression with sparse precision matrix computation using ‘BigQUIC’.
BatchExperiments Statistical Experiments on Batch Computing Clusters
Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.
BatchGetSymbols Downloads and Organizes Financial Data for Multiple Tickers
Makes it easy to download large amounts of trade data for many tickers from Yahoo or Google Finance.
BatchJobs Batch Computing with R
Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.
batchtools Tools for Computation on Batch Systems
As a successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high performance computing systems managed by the schedulers ‘IBM Spectrum LSF’ (<http://…/> ), ‘OpenLava’ (<http://…/> ), ‘Univa Grid Engine’/’Oracle Grid Engine’ (<http://…/> ), ‘Slurm’ (<http://…/> ), ‘Torque/PBS’ (<http://…/> ), or ‘Docker Swarm’ (<https://…/> ). A multicore and socket mode allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.
BaTFLED3D Bayesian Tensor Factorization Linked to External Data
BaTFLED is a machine learning algorithm designed to make predictions and determine interactions in data that varies along three independent modes. For example BaTFLED was developed to predict the growth of cell lines when treated with drugs at different doses. The first mode corresponds to cell lines and incorporates predictors such as cell line genomics and growth conditions. The second mode corresponds to drugs and incorporates predictors indicating known targets and structural features. The third mode corresponds to dose and there are no dose-specific predictors (although the algorithm is capable of including predictors for the third mode if present). See ‘BaTFLED3D_vignette.rmd’ for a simulated example.
batteryreduction An R Package for Data Reduction by Battery Reduction
Battery reduction is a method used in data reduction. It uses Gram-Schmidt orthogonal rotations to find a subset of variables that best represents the original set of variables.
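The Gram-Schmidt orthogonalization at the heart of battery reduction can be sketched as follows (a plain Python sketch for illustration; the function name is an assumption, not this package's API):

```python
def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthogonalize a list of vectors.

    Each input vector is a list of floats. Each vector has the
    projections onto the previously accepted basis vectors subtracted;
    (near-)linearly-dependent vectors are dropped.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = dot(w, b) / dot(b, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        if dot(w, w) > 1e-12:  # skip vectors with (almost) nothing left
            basis.append(w)
    return basis
```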
bayesAB Fast Bayesian Methods for AB Testing
bayesAB provides a suite of functions that allow the user to analyze A/B test data in a Bayesian framework. bayesAB is intended to be a drop-in replacement for common frequentist hypothesis tests such as the t-test and chi-squared test. Bayesian methods provide several benefits over frequentist methods in the context of A/B tests, namely in interpretability: instead of p-values you get direct probabilities on whether A is better than B (and by how much), and instead of point estimates your posterior distributions are parametrized random variables which can be summarized any number of ways. Bayesian tests are also immune to ‘peeking’ and are thus valid whenever a test is stopped.
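For intuition, the core computation for binary outcomes can be sketched as a Monte Carlo comparison of two Beta posteriors (a minimal Python sketch assuming conjugate Beta(1, 1) priors; the function name is illustrative and this is not bayesAB's API):

```python
import random

def prob_a_beats_b(succ_a, n_a, succ_b, n_b, draws=20000, seed=42):
    """Monte Carlo estimate of P(rate_A > rate_B).

    With Beta(1, 1) priors and binomial data, the posterior of each
    conversion rate is Beta(1 + successes, 1 + failures); we draw from
    both posteriors and count how often A's draw exceeds B's.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + succ_a, 1 + n_a - succ_a)
        b = rng.betavariate(1 + succ_b, 1 + n_b - succ_b)
        if a > b:
            wins += 1
    return wins / draws

# 60/100 conversions vs 40/100: A is very likely the better variant.
print(prob_a_beats_b(60, 100, 40, 100))
```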
BayesBinMix Bayesian Estimation of Mixtures of Multivariate Bernoulli Distributions
Fully Bayesian inference for estimating the number of clusters and related parameters for heterogeneous binary data.
bayesboot An Implementation of Rubin’s (1981) Bayesian Bootstrap
Functions for performing the Bayesian bootstrap as introduced by Rubin (1981) <doi:10.1214/aos/1176345338> and for summarizing the result. The implementation can handle both summary statistics that work on a weighted version of the data and summary statistics that work on a resampled data set.
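Rubin's scheme can be sketched as drawing Dirichlet(1, ..., 1) weights (normalized unit exponentials) and recomputing a weighted statistic in each replicate (an illustrative Python sketch of the weighted-statistic case, not bayesboot's API; the function name is an assumption):

```python
import random

def bayes_boot_mean(data, reps=2000, seed=1):
    """Bayesian bootstrap posterior draws for the mean of `data`.

    Each replicate draws Dirichlet(1, ..., 1) weights over the
    observations (unit exponentials, normalized) and returns the
    weighted mean, giving a sample from the posterior of the mean.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        g = [rng.expovariate(1.0) for _ in data]
        total = sum(g)
        out.append(sum(w * x for w, x in zip(g, data)) / total)
    return out

posterior = bayes_boot_mean([1, 2, 3, 4, 5])  # 2000 posterior draws
```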
BayesBridge Bridge Regression
Bayesian bridge regression.
BayesCombo Bayesian Evidence Combination
Combine diverse evidence across multiple studies to test a high level scientific theory. The methods can also be used as an alternative to a standard meta-analysis.
bayesDP Tools for the Bayesian Discount Prior Function
Functions for augmenting data with historical controls using the Bayesian discount prior function for one-arm and two-arm clinical trials.
BayesFactor Computation of Bayes Factors for Common Designs
A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.
BayesFactorExtras Extra functions for use with the BayesFactor R package
BayesFactorExtras is an R package which contains extra features related to the BayesFactor package, such as plots and analysis reports.
BayesFM Bayesian Inference for Factor Modeling
Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: Bayesian Exploratory Factor Analysis (befa), an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling. More approaches will be included in future releases of this package.
BayesH Bayesian Regression Model with Mixture of Two Scaled Inverse Chi Square as Hyperprior
Functions to perform Bayesian regression with a mixture of two scaled inverse chi-square distributions as the hyperprior for the variance of each regression coefficient.
BayesianNetwork Bayesian Network Modeling and Analysis
A Shiny web application for creating interactive Bayesian Network models, learning the structure and parameters of Bayesian networks, and utilities for classical network analysis.
BayesianTools General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
General-purpose MCMC and SMC samplers, as well as plot and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. Implemented samplers include various Metropolis MCMC variants (including adaptive and/or delayed rejection MH), the T-walk, two differential evolution MCMCs, two DREAM MCMCs, and a sequential Monte Carlo (SMC) particle filter.
bayesImageS Bayesian Methods for Image Segmentation using a Potts Model
Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, and approximate Bayesian computation (ABC).
BayesLCA Bayesian Latent Class Analysis
Bayesian Latent Class Analysis using several different methods.
bayesloglin Bayesian Analysis of Contingency Table Data
The function MC3() searches for log-linear models with the highest posterior probability. The function gibbsSampler() is a blocked Gibbs sampler for sampling from the posterior distribution of the log-linear parameters. The functions findPostMean() and findPostCov() compute the posterior mean and covariance matrix for decomposable models which, for these models, is available in closed form.
BayesMAMS Designing Bayesian Multi-Arm Multi-Stage Studies
Calculating Bayesian sample sizes for multi-arm trials where several experimental treatments are compared to a common control, perhaps even at multiple stages.
bayesmeta Bayesian Random-Effects Meta-Analysis
A collection of functions allowing one to derive the posterior distribution of the two parameters in a random-effects meta-analysis, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, etc.
BayesPiecewiseICAR Hierarchical Bayesian Model for a Hazard Function
Fits a piecewise exponential hazard to survival data using a hierarchical Bayesian model with an Intrinsic Conditional Autoregressive formulation for the spatial dependency in the hazard rates for each piece. The function uses Metropolis-Hastings-Green MCMC to allow the number of split points to vary, and outputs graphics displaying the histogram of the number of split points and the trace plots of the hierarchical parameters. It returns a list containing the posterior samples for the number of split points, the locations of the split points, and the log hazard rates corresponding to these splits, along with the posterior samples of the two hierarchical parameters, Mu and Sigma^2.
bayesplot Plotting for Bayesian Models
Plotting functions for posterior analysis, model checking, and MCMC diagnostics. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with Stan.
bayesreg Bayesian Regression Models with Continuous Shrinkage Priors
Fits linear or logistic regression model using Bayesian continuous shrinkage prior distributions. Handles ridge, lasso, horseshoe and horseshoe+ regression with logistic, Gaussian, Laplace or Student-t distributed targets.
BayesS5 Bayesian Variable Selection Using Simplified Shotgun Stochastic Search with Screening (S5)
In p >> n settings, full posterior sampling using existing Markov chain Monte Carlo (MCMC) algorithms is highly inefficient and often not feasible from a practical perspective. To overcome this problem, we propose a scalable stochastic search algorithm, the Simplified Shotgun Stochastic Search (S5), aimed at rapidly exploring interesting regions of the model space and finding the maximum a posteriori (MAP) model. S5 also provides an approximation of the posterior probability of each model (including the marginal inclusion probabilities).
BayesSpec Bayesian Spectral Analysis Techniques
An implementation of methods for spectral analysis using the Bayesian framework. It includes functions for modelling spectrum as well as appropriate plotting and output estimates. There is segmentation capability with RJ MCMC (Reversible Jump Markov Chain Monte Carlo). The package takes these methods predominantly from the 2012 paper ‘AdaptSPEC: Adaptive Spectral Estimation for Nonstationary Time Series’ <DOI:10.1080/01621459.2012.716340>.
BayesSummaryStatLM MCMC Sampling of Bayesian Linear Models via Summary Statistics
Methods for generating Markov Chain Monte Carlo (MCMC) posterior samples of Bayesian linear regression model parameters that require only summary statistics of data as input. Summary statistics are useful for systems with very limited amounts of physical memory. The package provides two functions: one function that computes summary statistics of data and one function that carries out the MCMC posterior sampling for Bayesian linear regression models where summary statistics are used as input. The function utilizes the R package ‘ff’ to handle data sets that are too large to fit into a user’s physical memory, by reading in data in chunks.
BayesTree Bayesian Additive Regression Trees
Implementation of BART: Bayesian Additive Regression Trees (Chipman, George and McCulloch, 2010).
BayesTreePrior Bayesian Tree Prior Simulation
Provides a way to simulate from the prior distribution of Bayesian trees by Chipman et al. (1998) <DOI:10.2307/2669832>. The prior distribution of Bayesian trees is highly dependent on the design matrix X, so using the hyperparameters suggested by Chipman et al. (1998) <DOI:10.2307/2669832> is not recommended and could lead to an unexpected prior distribution. This work is part of my master’s thesis (in revision, expected 2016) and a journal publication I am working on.
bazar Miscellaneous Basic Functions
A collection of miscellaneous functions for copying objects to the clipboard (‘Copy’); manipulating strings (‘concat’, ‘mgsub’, ‘trim’, ‘verlan’); loading or showing packages (‘library_with_rep’, ‘require_with_rep’, ‘sessionPackages’); creating or testing for named lists (‘nlist’, ‘as.nlist’, ‘is.nlist’), formulas (‘is.formula’), empty objects (‘as.empty’, ‘is.empty’), whole numbers (‘as.wholenumber’, ‘is.wholenumber’); testing for equality (‘almost.equal’, ‘’); getting modified versions of usual functions (‘rle2’, ‘sumNA’); making a pause or a stop (‘pause’, ‘stopif’); and others (‘erase’, ‘%nin%’, ‘unwhich’).
BCEA Bayesian Cost Effectiveness Analysis
Produces an economic evaluation of a Bayesian model in the form of MCMC simulations. Given suitable variables of cost and effectiveness / utility for two or more interventions, BCEA computes the most cost-effective alternative and produces graphical summaries and probabilistic sensitivity analysis.
BCEE The Bayesian Causal Effect Estimation Algorithm
Implementation of the Bayesian Causal Effect Estimation algorithm, a data-driven method for the estimation of the causal effect of a continuous exposure on a continuous outcome. For more details, see Talbot et al. (2015).
bcpa Behavioral change point analysis of animal movement
Behavioral change point analysis (BCPA) is a method for identifying hidden shifts in the underlying parameters of a time series, developed specifically for irregularly sampled animal movement data. The method is based on: E. Gurarie, R. Andrews and K. Laidre, ‘A novel method for identifying behavioural changes in animal movement data’ (2009), Ecology Letters 12(5): 395-408.
bcROCsurface Bias-Corrected Methods for Estimating the ROC Surface of Continuous Diagnostic Tests
Bias-corrected estimation methods for the receiver operating characteristic (ROC) surface and the volume under the ROC surface (VUS) under the missing at random (MAR) assumption.
bcrypt ‘Blowfish’ Password Hashing Algorithm
An R interface to the ‘OpenBSD Blowfish’ password hashing algorithm, as described in ‘A Future-Adaptable Password Scheme’ by ‘Niels Provos’. The implementation is derived from the ‘py-bcrypt’ module for Python which is a wrapper for the ‘OpenBSD’ implementation.
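A minimal sketch of salted hashing and verification with the package: hashpw() generates a random salt internally and embeds it in the hash string, and checkpw() verifies a candidate password against the stored hash.

```r
library(bcrypt)

# Hash a password; a random salt is generated and embedded in the result.
hash <- hashpw("correct horse battery staple")

# Verify candidate passwords against the stored hash.
checkpw("correct horse battery staple", hash)  # TRUE
checkpw("wrong guess", hash)                   # FALSE
```

Because the salt is stored inside the hash, no separate salt column is needed when persisting hashes.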
bdots Bootstrapped Differences of Time Series
Analyze differences among time series curves with Oleson et al.'s modified p-value technique.
bdpopt Optimisation of Bayesian Decision Problems
Optimisation of the expected utility in single-stage and multi-stage Bayesian decision problems. The expected utility is estimated by simulation. For single-stage problems, JAGS is used to draw MCMC samples.
bdvis Biodiversity Data Visualizations
Biodiversity data visualizations using R, helpful for understanding the completeness of a biodiversity inventory; the extent of geographical, taxonomic and temporal coverage; and gaps and biases in the data.
BDWreg Bayesian Inference for Discrete Weibull Regression
A Bayesian regression model for discrete response, where the conditional distribution is modelled via a discrete Weibull distribution. This package provides an implementation of Metropolis-Hastings and Reversible-Jumps algorithms to draw samples from the posterior. It covers a wide range of regularizations through any two parameter prior. Examples are Laplace (Lasso), Gaussian (ridge), Uniform, Cauchy and customized priors like a mixture of priors. An extensive visual toolbox is included to check the validity of the results as well as several measures of goodness-of-fit.
benchr High-Precision Measurement of R Expression Execution Time
Provides infrastructure to accurately measure and compare the execution time of R expressions.
bentcableAR Bent-Cable Regression for Independent Data or Autoregressive Time Series
Included are two main interfaces for fitting and diagnosing bent-cable regressions for autoregressive time-series data or independent data (time series or otherwise). Some components in the package can also be used as stand-alone functions. The bent cable (linear-quadratic-linear) generalizes the broken stick (linear-linear), which is also handled by this package. Version 0.2 corrects a glitch in the computation of confidence intervals for the CTP. References that were updated from Versions 0.2.1 and 0.2.2 appear in Version 0.2.3 and up. Version 0.3.0 improves the robustness of the error-message-producing mechanism. It is the author's intention to distribute any future updates via GitHub.
Bergm Bayesian Exponential Random Graph Models
Set of tools to analyse Bayesian exponential random graph models.
betacal Beta Calibration
Fit beta calibration models and obtain calibrated probabilities from them.
betas Standardized Beta Coefficients
Computes standardized beta coefficients and corresponding standard errors for the following models: linear regression models with numerical covariates only; linear regression models with numerical and factorial covariates; weighted linear regression models; and robust linear regression models with numerical covariates only.
beyondWhittle Bayesian Spectral Inference for Stationary Time Series
Implementations of a Bayesian parametric (autoregressive), a Bayesian nonparametric (Whittle likelihood with Bernstein-Dirichlet prior) and a Bayesian semiparametric (autoregressive likelihood with Bernstein-Dirichlet correction) procedure are provided. The work is based on the corrected parametric likelihood by C. Kirch et al (2017) <arXiv:1701.04846>. It was supported by DFG grant KI 1443/3-1.
bfork Basic Unix Process Control
Wrappers for fork()/waitpid() meant to allow R users to quickly and easily fork child processes and wait for them to finish.
bgsmtr Bayesian Group Sparse Multi-Task Regression
Fits a Bayesian group-sparse multi-task regression model using Gibbs sampling. The hierarchical prior encourages shrinkage of the estimated regression coefficients at both the gene and SNP level. The model has been applied successfully to imaging phenotypes of dimension up to 100; it can be used more generally for multivariate (non-imaging) phenotypes.
BH Boost C++ Header Files
Boost provides free peer-reviewed portable C++ source libraries. A large part of Boost is provided as C++ template code which is resolved entirely at compile-time without linking. This package aims to provide the most useful subset of Boost libraries for template use among CRAN packages. By placing these libraries in this package, we offer a more efficient distribution system for CRAN as replication of this code in the sources of other packages is avoided.
bib2df Parse a BibTeX File to a Tibble
Parse a BibTeX file to a tidy tibble (trimmed down version of data.frame) to make it accessible for further analysis and visualization.
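A minimal round-trip sketch: write a small BibTeX file, then parse it with bib2df() into a tibble with one row per entry. The file content here is made up for illustration; note that the package upper-cases field names (e.g. TITLE, AUTHOR) in the resulting columns.

```r
library(bib2df)

# A throwaway .bib file with a single (fictional) article entry.
bib <- tempfile(fileext = ".bib")
writeLines(c(
  "@article{smith2017,",
  "  author  = {Smith, Jane},",
  "  title   = {An Example Entry},",
  "  journal = {Journal of Examples},",
  "  year    = {2017}",
  "}"
), bib)

df <- bib2df(bib)  # one row per BibTeX entry
df$TITLE           # "An Example Entry"
```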
BiBitR R Wrapper for Java Implementation of BiBit
A simple R wrapper for the Java BiBit algorithm from ‘A biclustering algorithm for extracting bit-patterns from binary datasets’ from Domingo et al. (2011) <DOI:10.1093/bioinformatics/btr464>. An adaption for the BiBit algorithm which allows noise in the biclusters is also included.
bife Binary Choice Models with Fixed Effects
Estimates fixed effects binary choice models (logit and probit) with potentially many individual fixed effects and computes average partial effects. Incidental parameter bias can be reduced with a bias-correction proposed by Hahn and Newey (2004) <doi:10.1111/j.1468-0262.2004.00533.x>.
bigFastlm Fast Linear Models for Objects from the ‘bigmemory’ Package
A reimplementation of the fastLm() functionality of ‘RcppEigen’ for big.matrix objects for fast out-of-memory linear model fitting.
biglasso Big Lasso: Extending Lasso Model Fitting to Big Data in R
Extend lasso and elastic-net model fitting for ultrahigh-dimensional, multi-gigabyte data sets that cannot be loaded into memory. Compared to existing lasso-fitting packages, it preserves equivalently fast computation speed but is much more memory-efficient, thus allowing for very powerful big data analysis even with only a single laptop.
bigReg Generalized Linear Models (GLM) for Large Data Sets
Allows the user to carry out GLM on very large data sets. Data can be created using the data_frame() function and appended to the object with object$append(data); data_frame and data_matrix objects are available that allow the user to store large data on disk. The data is stored as doubles in binary format; any character columns are transformed to factors and then stored as numeric (binary) data, while a look-up table is stored in a separate .meta_data file in the same folder. The data is stored in blocks, and the GLM regression algorithm is modified to carry out a MapReduce-like algorithm to fit the model. The functions bglm(), summary() and bglm_predict() are available for creating and post-processing models. The library requires Armadillo installed on your system. It probably won't function on Windows, since multi-core processing is done using mclapply(), which forks R on Unix/Linux-type operating systems.
bigrquery An Interface to Google’s BigQuery API
Easily talk to Google’s BigQuery database from R.
bigRR Generalized Ridge Regression (with special advantage for p >> n cases)
Fits large-scale (generalized) ridge regression for various distributions of response. The shrinkage parameters (lambdas) can be pre-specified or estimated using an internal update routine (fitting a heteroscedastic effects model, or HEM). Any subset of parameters in the model can be shrunk. The package has a special computational advantage when the number of shrinkage parameters exceeds the number of observations; for example, it is very useful for fitting large-scale omics data, such as high-throughput genotype data (genomics), gene expression data (transcriptomics), metabolomics data, etc.
BigSEM Constructing Large Systems of Structural Equations
Construct large systems of structural equations using the two-stage penalized least squares (2SPLS) method proposed by Chen, Zhang and Zhang (2016).
bigstep Stepwise Selection for Large Data Sets
Selecting linear models for large data sets using modified stepwise procedure and modern selection criteria (like modifications of Bayesian Information Criterion). Selection can be performed on data which exceed RAM capacity. Special selection strategy is available, faster than classical stepwise procedure.
bigtcr Nonparametric Analysis of Bivariate Gap Time with Competing Risks
For studying recurrent disease and death with competing risks, comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events. Alternatively, comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrence disease. This package implements a nonparametric estimator for the conditional cumulative incidence function and a nonparametric conditional bivariate cumulative incidence function for the bivariate gap times proposed in Huang et al. (2016) <doi:10.1111/biom.12494>.
bimixt Estimates Mixture Models for Case-Control Data
Estimates non-Gaussian mixture models of case-control data. The four types of models supported are binormal, two component constrained, two component unconstrained, and four component. The most general model is the four component model, under which both cases and controls are distributed according to a mixture of two unimodal distributions. In the four component model, the two component distributions of the control mixture may be distinct from the two components of the case mixture distribution. In the two component unconstrained model, the components of the control and case mixtures are the same; however the mixture probabilities may differ for cases and controls. In the two component constrained model, all controls are distributed according to one of the two components while cases follow a mixture distribution of the two components. In the binormal model, cases and controls are distributed according to distinct unimodal distributions. These models assume that Box-Cox transformed case and control data with a common lambda parameter are distributed according to Gaussian mixture distributions. Model parameters are estimated using the expectation-maximization (EM) algorithm. Likelihood ratio test comparison of nested models can be performed using the lr.test function. AUC and PAUC values can be computed for the model-based and empirical ROC curves using the auc and pauc functions, respectively. The model-based and empirical ROC curves can be graphed using the roc.plot function. Finally, the model-based density estimates can be visualized by plotting a model object created with the bimixt.model function.
Binarize Binarization of One-Dimensional Data
Provides methods for the binarization of one-dimensional data and some visualization functions.
BinaryEMVS Variable Selection for Binary Data Using the EM Algorithm
Implements variable selection for high dimensional datasets with a binary response variable using the EM algorithm. Both probit and logit models are supported. Also included is a useful function to generate high dimensional data with correlated variables.
BinaryEPPM Mean and Variance Modeling of Binary Data
Modeling under- and over-dispersed binary data using extended Poisson process models (EPPM).
binaryLogic Binary Logic
Convert to binary numbers (Base2). Shift, rotate, summary. Based on logical vector.
bindr Parametrized Active Bindings
Provides a simple interface for creating active bindings where the bound function accepts additional arguments.
bindrcpp An ‘Rcpp’ Interface to Active Bindings
Provides an easy way to fill an environment with active bindings that call a C++ function.
binman A Binary Download Manager
Tools and functions for managing the download of binary files. Binary repositories are defined in ‘YAML’ format. Defining new pre-download, download and post-download templates allow additional repositories to be added.
binomen ‘Taxonomic’ Specification and Parsing Methods
Includes functions for working with taxonomic data, including functions for combining, separating, and filtering taxonomic groups by any rank or name. Allows standard evaluation (SE) and non-standard evaluation (NSE).
binsmooth Generate PDFs and CDFs from Binned Data
Provides several methods for generating density functions based on binned data. Data are assumed to be nonnegative, but the bin widths need not be uniform, and the top bin may be unbounded. All PDF smoothing methods maintain the areas specified by the binned data. (Equivalently, all CDF smoothing methods interpolate the points specified by the binned data.) An estimate for the mean of the distribution may be supplied as an optional argument, which greatly improves the reliability of statistics computed from the smoothed density functions. Methods include step function, recursive subdivision, and optimized spline.
binst Data Preprocessing, Binning for Classification and Regression
Various supervised and unsupervised binning tools including using entropy, recursive partition methods and clustering.
Biocomb Feature Selection and Classification with the Embedded Validation Procedures for Biomedical Data Analysis
Contains functions for data analysis with an emphasis on biological data, including several algorithms for feature ranking, feature selection, and classification, with embedded validation procedures. The functions can deal with numerical as well as nominal features. Also includes functions for calculating feature AUC (Area Under the ROC Curve) and HUM (hypervolume under manifold) values and for constructing 2D and 3D ROC curves. Biocomb provides calculation of Area Above the RCC (AAC) values and construction of Relative Cost Curves (RCC) to estimate classifier performance under the unequal misclassification costs problem. Biocomb has a special function to deal with missing values, including different imputing schemes.
biogeo Point Data Quality Assessment and Coordinate Conversion
Functions for error detection and correction in point data quality datasets that are used in species distribution modelling. Includes functions for parsing and converting coordinates into decimal degrees from various formats.
bioplots Visualization of Overlapping Results with Heatmap
Visualization of complex biological datasets is essential for understanding complementary aspects of biology in the big data era. In addition, analyzing multiple datasets enables deeper and more accurate understanding of biological processes. Multiple datasets produce multiple analysis results, and their overlaps are usually visualized in a Venn diagram. bioplots is a tiny R package that generates a heatmap to visualize overlaps instead of using a Venn diagram.
biorxivr Search and Download Papers from the bioRxiv Preprint Server
The bioRxiv preprint server ( ) is a website where scientists can post preprints of scholarly texts in biology. Users can search and download PDFs in bulk from the preprint server. The text of abstracts is stored as raw text within R, and PDFs can easily be saved and imported for text mining with packages such as ‘tm’.
bipartite Visualising bipartite networks and calculating some (ecological) indices
Bipartite provides functions to visualise webs and calculate a series of indices commonly used to describe patterns in ecological webs. It focuses on webs consisting of only two trophic levels, e.g. pollination webs or predator-prey webs. Visualisation is important to get an idea of what we are actually looking at, while the indices summarise different aspects of the web's topology.
BiplotGUI Interactive Biplots in R
Provides a GUI with which users can construct and interact with biplots.
birdnik Connector for the Wordnik API
A connector to the API for ‘Wordnik’ <>, a dictionary service that also provides bigram generation, word frequency data, and a whole host of other functionality.
bitops Bitwise Operations
Functions for bitwise operations on integer vectors.
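For example, with 12 being 1100 and 10 being 1010 in binary:

```r
library(bitops)

bitAnd(12, 10)   # 8  (1000)
bitOr(12, 10)    # 14 (1110)
bitXor(12, 10)   # 6  (0110)
bitShiftL(1, 4)  # 16 (1 shifted left four places)
```

All of these are vectorized, so they also operate elementwise on integer vectors.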
BiTrinA Binarization and Trinarization of One-Dimensional Data
Provides methods for the binarization and trinarization of one-dimensional data and some visualization functions.
BivRegBLS Tolerance Intervals and Errors-in-Variables Regressions in Method Comparison Studies
Assess the agreement in method comparison studies by tolerance intervals and errors-in-variables regressions. The Ordinary Least Square regressions (OLSv and OLSh), the Deming Regression (DR), and the (Correlated)-Bivariate Least Square regressions (BLS and CBLS) can be used with unreplicated or replicated data. The BLS and CBLS are the two main functions to estimate a regression line, while XY.plot and MD.plot are the two main graphical functions to display, respectively, an (X,Y) plot or (M,D) plot with the BLS or CBLS results. Assuming no proportional bias, the (M,D) plot (Bland-Altman plot) may be simplified by calculating horizontal line intervals with tolerance intervals (beta-expectation (type I) or beta-gamma content (type II)).
bivrp Bivariate Residual Plots with Simulation Polygons
Generates bivariate residual plots with simulation polygons for any diagnostics and bivariate model from which functions to extract the desired diagnostics, simulate new data and refit the models are available.
bkmr Bayesian Kernel Machine Regression
Implementation of a statistical approach for estimating the joint health effects of multiple concurrent exposures.
BKPC Bayesian Kernel Projection Classifier
Bayesian kernel projection classifier is a nonlinear multicategory classifier which performs the classification of the projections of the data to the principal axes of the feature space. A Gibbs sampler is implemented to find the posterior distributions of the parameters.
blackbox Black Box Optimization and Exploration of Parameter Space
Performs prediction of a response function from simulated response values, allowing black-box optimization of functions estimated with some error. blackbox includes a simple user interface for such applications, as well as more specialized functions designed to be called by the Migraine software (see URL). The latter functions are used for prediction of likelihood surfaces and implied likelihood ratio confidence intervals, and for exploration of predictor space of the surface. Prediction of the response is based on ordinary kriging (with residual error) of the input. Estimation of smoothing parameters is performed by generalized cross validation.
BlandAltmanLeh Plots (slightly extended) Bland-Altman plots
Bland-Altman Plots using base graphics as well as ggplot2, slightly extended by confidence intervals, with detailed return values and a sunflowerplot option for data with ties.
blatr Send Emails Using ‘Blat’ for Windows
A wrapper around the ‘Blat’ command line SMTP mailer for Windows. Blat is public domain software, but be sure to read the license before use. It can be found at the Blat website.
blavaan Bayesian Latent Variable Analysis
Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models.
BLCOP Black-Litterman and Copula Opinion Pooling Frameworks
An implementation of the Black-Litterman Model and Atilio Meucci’s copula opinion pooling framework.
blendedLink A New Link Function that Blends Two Specified Link Functions
A new link function that equals one specified link function up to a cutover, then a linear rescaling of another specified link function. For use in glm() or glm2(). The intended use is in binary regression, in which case the first link should be set to ‘log’ and the second to ‘logit’. This ensures that fitted probabilities are between 0 and 1 and that exponentiated coefficients can be interpreted as relative risks for probabilities up to the cutover.
blkbox Data Exploration with Multiple Machine Learning Algorithms
Allows data to be processed by multiple machine learning algorithms at the same time and enables feature selection by a single algorithm or combinations of multiple algorithms. An easy-to-use tool for k-fold cross-validation and nested cross-validation.
blob A Simple S3 Class for Representing Vectors of Binary Data (‘BLOBS’)
R's raw vector is useful for storing a single binary object. What if you want to put a vector of them in a data frame? The blob package provides the blob object, a list of raw vectors, suitable for use as a column in a data frame.
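A minimal sketch: blob() wraps raw vectors into a single vector-like object whose elements may hold payloads of different lengths.

```r
library(blob)

# Two binary payloads of different lengths in one blob vector.
b <- blob(as.raw(c(0x01, 0x02, 0x03)), charToRaw("hi"))

length(b)   # 2 elements
lengths(b)  # 3 2 : number of bytes per element
```

Such a blob can then be used as a column, e.g. `tibble::tibble(id = 1:2, payload = b)`.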
blockseg Two Dimensional Change-Points Detection
Segments a matrix in blocks with constant values.
Blossom Functions for making statistical comparisons with distance-function based permutation tests
Blossom is an R package with functions for making statistical comparisons with distance-function based permutation tests developed by P.W. Mielke, Jr. and colleagues at Colorado State University and for testing parameters estimated in linear models with permutation procedures developed by B. S. Cade and colleagues at the Fort Collins Science Center, U.S. Geological Survey. This implementation in R has allowed for numerous improvements not supported by the Cade and Richards Fortran implementation, including use of categorical predictor variables in most routines.
blsAPI Request Data From The U.S. Bureau of Labor Statistics API
Allows users to request data for one or multiple series through the U.S. Bureau of Labor Statistics API. Users provide parameters as specified in http://…/api_signature.htm and the function returns a JSON string.
BMA Bayesian Model Averaging
Package for Bayesian model averaging for linear models, generalized linear models and survival models (Cox regression).
BMAmevt Multivariate Extremes: Bayesian Estimation of the Spectral Measure
Toolkit for Bayesian estimation of the dependence structure in Multivariate Extreme Value parametric models.
bmixture Bayesian Estimation for Finite Mixture of Distributions
Provides statistical tools for Bayesian estimation for finite mixture of distributions, mainly mixture of Gamma, Normal and t-distributions.
bmlm Bayesian Multilevel Mediation
Easy estimation of Bayesian multilevel mediation models with Stan.
bnclassify Learning Bayesian Network Classifiers from Data
Implementation of different algorithms for learning discrete Bayesian network classifiers from data, including wrapper algorithms and those based on Chow-Liu’s algorithm.
BNDataGenerator Data Generator based on Bayesian Network Model
Data generator based on a Bayesian network model.
bnnSurvival Bagged k-Nearest Neighbors Survival Prediction
Implements a bootstrap aggregated (bagged) version of the k-nearest neighbors survival probability prediction method (Lowsky et al. 2013). In addition to the bootstrapping of training samples, the features can be subsampled in each baselearner to break the correlation between them. The Rcpp package is used to speed up the computation.
bnormnlr Bayesian Estimation for Normal Heteroscedastic Nonlinear Regression Models
Implementation of Bayesian estimation in normal heteroscedastic nonlinear regression Models following Cepeda-Cuervo, (2001)
bnpa Bayesian Networks & Path Analysis
We propose a hybrid approach using the computational and statistical resources of Bayesian Networks to learn a network structure from a data set, using 4 different algorithms, and the robustness of the statistical methods present in Structural Equation Modeling to check the goodness of fit of the model to the data. We built an intermediate algorithm to join the features of the ‘bnlearn’ and ‘lavaan’ R packages. The Bayesian Network structure learning algorithms used were ‘Hill-Climbing’, ‘Max-Min Hill-Climbing’, ‘Restricted Maximization’ and ‘Tabu Search’.
BNPMIXcluster Bayesian Nonparametric Model for Clustering with Mixed Scale Variables
Bayesian nonparametric approach for clustering that is capable of combining different types of variables (continuous, ordinal and nominal) and also accommodates different sampling probabilities in a complex survey design. The model is based on a location mixture model with a Poisson-Dirichlet process prior on the location parameters of the associated latent variables. The package performs the clustering model described in Carmona, C., Nieto-Barajas, L. E., Canale, A. (2016) <http://…/1612.00083>.
BNPTSclust A Bayesian Nonparametric Algorithm for Time Series Clustering
Performs the algorithm for time series clustering described in Nieto-Barajas and Contreras-Cristan (2014).
BNSL Bayesian Network Structure Learning
From a given dataframe, this package learns its Bayesian network structure based on a selected score.
bnspatial Spatial Implementation of Bayesian Networks and Mapping
Package for the spatial implementation of Bayesian Networks and mapping in geographical space. It makes maps of expected value (or most likely state) given known and unknown conditions, maps of uncertainty measured as either coefficient of variation or Shannon index (entropy), and maps of the probability associated with any state of any node of the network. Some additional features are provided as well, such as parallel processing options, data discretization routines and function wrappers designed for users with minimal knowledge of the R programming language.
bnstruct Bayesian Network Structure Learning from Data with Missing Values
Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Hill-climbing heuristic search, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, BIC. The package also implements methods for generating and using bootstrap samples, imputed data, and inference.
BonEV An Improved Multiple Testing Procedure for Controlling False Discovery Rates
An improved multiple testing procedure for controlling false discovery rates which is developed based on the Bonferroni procedure with integrated estimates from the Benjamini-Hochberg procedure and Storey's q-value procedure. It controls false discovery rates through controlling the expected number of false discoveries.
bookdown Authoring Books with R Markdown
Output formats and utilities for authoring books with R Markdown.
BoolFilter Optimal Estimation of Partially Observed Boolean Dynamical Systems
Tools for optimal and approximate state estimation as well as network inference of Partially-Observed Boolean Dynamical Systems.
boostmtree Boosted Multivariate Trees for Longitudinal Data
Implements Friedman’s gradient descent boosting algorithm for longitudinal data using multivariate tree base learners. A time-covariate interaction effect is modeled using penalized B-splines (P-splines) with estimated adaptive smoothing parameter.
bootnet Bootstrap Methods for Various Network Estimation Routines
Bootstrap standard errors on various network estimation routines, such as EBICglasso from the qgraph package and IsingFit from the IsingFit package.
bootsPLS Bootstrap Subsamplings of Sparse Partial Least Squares – Discriminant Analysis for Classification and Signature Identification
Bootstrap Subsamplings of sparse Partial Least Squares – Discriminant Analysis (sPLS-DA) for Classification and Signature Identification. The method is applicable to any classification problem with more than 2 classes. It relies on bootstrap subsamplings of sPLS-DA and provides tools to select the most stable variables (defined as the ones consistently selected over the bootstrap subsamplings) and to predict the class of test samples.
bootTimeInference Robust Performance Hypothesis Testing with the Sharpe Ratio
Applied researchers often test for the difference of the Sharpe ratios of two investment strategies. A very popular tool to this end is the test of Jobson and Korkie, which has been corrected by Memmel. Unfortunately, this test is not valid when returns have tails heavier than the normal distribution or are of a time series nature. Instead, we propose the use of robust inference methods. In particular, we suggest constructing a studentized time series bootstrap confidence interval for the difference of the Sharpe ratios and declaring the two ratios different if zero is not contained in the obtained interval. This approach has the advantage that one can simply resample from the observed data as opposed to some null-restricted data.
boottol Bootstrap Tolerance Levels for Credit Scoring Validation Statistics
Used to create bootstrap tolerance levels for the Kolmogorov-Smirnov (KS) statistic, the area under receiver operator characteristic curve (AUROC) statistic, and the Gini coefficient for each score cutoff.
BootWPTOS Test Stationarity using Bootstrap Wavelet Packet Tests
Provides significance tests for second-order stationarity for time series using bootstrap wavelet packet tests.
bpa Basic Pattern Analysis
Run basic pattern analyses on character sets, digits, or combined input containing both characters and numeric digits. Useful for data cleaning and for identifying columns containing multiple or nonstandard formats.
bpp Computations Around Bayesian Predictive Power
Implements functions to update Bayesian Predictive Power Computations after not stopping a clinical trial at an interim analysis. Such an interim analysis can either be blinded or unblinded. Code is provided for Normally distributed endpoints with known variance, with a prominent example being the hazard ratio.
braidReports Visualize Combined Action Response Surfaces and Report BRAID Analyses
Provides functions to generate, format, and style surface plots for visualizing combined action data. Also provides functions for reporting on a BRAID analysis, including plotting curve-shifts, calculating IAE values, and producing full BRAID analysis reports.
braidrm Fitting Dose Response with the BRAID Combined Action Model
Contains functions for evaluating, analyzing, and fitting combined action dose response surfaces with the Bivariate Response to Additive Interacting Dose (BRAID) model of combined action.
brant Test for Parallel Regression Assumption
Tests the parallel regression assumption for ordinal logit models generated with the function polr() from the package MASS.
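A minimal sketch using the housing data shipped with MASS: fit an ordinal logit model with polr(), then pass the fit to brant(), which prints omnibus and per-variable chi-square tests of the parallel regression (proportional odds) assumption.

```r
library(MASS)   # polr() and the built-in housing data
library(brant)

fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
brant(fit)  # chi-square tests of the parallel regression assumption
```

A small p-value for a term indicates that the proportional odds assumption is questionable for that covariate.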
brea Bayesian Recurrent Event Analysis
A function to produce MCMC samples for posterior inference in semiparametric Bayesian discrete time competing risks recurrent events models.
BreakoutDetection Breakout Detection via Robust E-Statistics
BreakoutDetection is an open-source R package that makes breakout detection simple and fast. The BreakoutDetection package can be used in a wide variety of contexts, for example detecting a breakout in user engagement after an A/B test, detecting behavioral change, or for problems in econometrics, financial engineering, and political and social sciences.
bridgedist An Implementation of the Bridge Distribution with Logit-Link as in Wang and Louis (2003)
An implementation of the bridge distribution with logit-link in R. In Wang and Louis (2003) <doi:10.1093/biomet/90.4.765>, such a univariate bridge distribution was derived as the distribution of the random intercept that ‘bridged’ a marginal logistic regression and a conditional logistic regression. The conditional and marginal regression coefficients are a scalar multiple of each other. Such would not be the case if the random intercept distribution were Gaussian.
briskaR Biological Risk Assessment
A spatio-temporal exposure-hazard model for assessing biological risk and impact. The model is based on stochastic geometry for describing the landscape and the exposed individuals, a dispersal kernel for the dissemination of contaminants and an ecotoxicological equation.
brm Binary Regression Model
Fits novel models for the conditional relative risk, risk difference and odds ratio.
brms Bayesian Regression Models using Stan
Write and fit Bayesian generalized linear mixed models using Stan for full Bayesian inference.
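A minimal sketch of fitting a multilevel model (assumes ‘brms’ and a working Stan toolchain are installed; the epilepsy data and its variable names come from the dataset bundled with brms):

```r
# Poisson mixed model with a patient-level random intercept,
# fit by full Bayesian inference via Stan.
library(brms)

fit <- brm(count ~ zAge + zBase * Trt + (1 | patient),
           data = epilepsy, family = poisson())
summary(fit)  # posterior summaries of population- and group-level effects
```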
broom Convert Statistical Analysis Objects into Tidy Data Frames
Convert statistical analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like dplyr, tidyr and ggplot2. The package provides three S3 generics: tidy, which summarizes a model’s statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.
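The three generics can be sketched on an ordinary linear model:

```r
# tidy/augment/glance each return a data frame, ready for dplyr or ggplot2.
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)
tidy(fit)     # one row per coefficient: estimate, std.error, p.value, ...
augment(fit)  # original data plus per-row columns such as .fitted, .resid
glance(fit)   # one-row model summary: r.squared, AIC, BIC, ...
```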
brotli A Compression Format Optimized for the Web
A lossless compressed data format that compresses data using a combination of the LZ77 algorithm and Huffman coding, with efficiency comparable to the best currently available general-purpose compression methods. Brotli is similar in speed to deflate but offers more dense compression.
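A small sketch of a compression round trip (assuming the ‘brotli’ R package is installed):

```r
# Compress a raw vector with Brotli and verify lossless decompression.
library(brotli)

raw_in  <- charToRaw(paste(rep("hello brotli", 100), collapse = " "))
raw_cmp <- brotli_compress(raw_in)
stopifnot(identical(brotli_decompress(raw_cmp), raw_in))
length(raw_cmp) < length(raw_in)  # repetitive input compresses well
```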
Brq Bayesian Analysis of Quantile Regression Models
Bayesian estimation and variable selection for quantile regression models.
brr Bayesian Inference on the Ratio of Two Poisson Rates
Implementation of the Bayesian inference for the two independent Poisson samples model, using the semi-conjugate family of prior distributions.
brt Biological Relevance Testing
Analyses of large-scale -omics datasets commonly use p-values as the indicators of statistical significance. However, considering p-values alone neglects the importance of effect size (i.e., the mean difference between groups) in determining the biological relevance of a significant difference. Here, we present a novel algorithm for computing a new statistic, the biological relevance testing (BRT) index, in the frequentist hypothesis testing framework to address this problem.
bsearchtools Binary Search Tools
Exposes the binary search functions of the C++ standard library (std::lower_bound, std::upper_bound) plus other convenience functions, allowing faster lookups on sorted vectors.
BSGS Bayesian Sparse Group Selection
The integration of Bayesian variable and sparse group variable selection approaches for regression models.
BSGW Bayesian Survival Model using Generalized Weibull Regression
Bayesian survival model using Weibull regression on both scale and shape parameters.
bshazard Nonparametric Smoothing of the Hazard Function
The function estimates the hazard function nonparametrically from a survival object (possibly adjusted for covariates). The smoothed estimate is based on B-splines from the perspective of generalized linear mixed models. Left-truncated and right-censored data are allowed.
btb Beyond the Border
Kernel density estimation dedicated to urban geography.
btergm Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood
Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov Chain Monte Carlo maximum likelihood. Goodness of fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs.
BTR Training and Analysing Asynchronous Boolean Models
Tools for inferring asynchronous Boolean models from single-cell expression data.
BUCSS Bias and Uncertainty Corrected Sample Size
Implements a method of correcting for publication bias and uncertainty when planning sample sizes in a future study from an original study.
bupaR Business Process Analytics in R
Functionalities for process analysis in R. This package implements an S3 class for event log objects and related handler functions, and imports related packages for subsetting event data, computing descriptive statistics, handling Petri net objects, and visualizing process maps.
bvarsv Bayesian Analysis of a Vector Autoregressive Model with Stochastic Volatility and Time-Varying Parameters
R/C++ implementation of the model proposed by Primiceri (‘Time Varying Structural Vector Autoregressions and Monetary Policy’, Review of Economic Studies, 2005), with a focus on generating posterior predictive distributions.
BWStest Baumgartner Weiss Schindler Test of Equal Distributions
Performs the ‘Baumgartner-Weiss-Schindler’ two-sample test of equal probability distributions.
bytescircle Statistics About Bytes Contained in a File as a Circle Plot
Shows statistics about bytes contained in a file as a circle graph of deviations from the mean in sigma increments. The function can be useful for statistically analyzing the content of files at a glimpse: text files are shown as a green centered crown; compressed and encrypted files appear as evenly distributed variations with a very low CV (sigma/mean); and other types of files fall between these two categories depending on their text vs. binary content, which can help to quickly determine how information is stored inside them (databases, multimedia files, etc.).


c060 Extended Inference for Lasso and Elastic-Net Regularized Cox and Generalized Linear Models
Provides additional functions to perform stability selection, model validation and parameter tuning for glmnet models.
CADStat Provides a GUI to Several Statistical Methods
Using JGR, provides a GUI to several statistical methods: scatterplot, boxplot, linear regression, generalized linear regression, quantile regression, conditional probability calculations, and regression trees.
caesar Encrypts and Decrypts Strings
Encrypts and decrypts strings using either the Caesar cipher or a pseudorandom number generation (using set.seed()) method.
calACS Count All Common Subsequences
Count all common subsequences between two string sequences, with items separated by the same delimiter. The first string input is a length-one vector; the second string input can be a vector or list containing multiple strings. Algorithm from Wang, H., ‘All common subsequences’ (2007), IJCAI International Joint Conference on Artificial Intelligence, pp. 635-640.
Calculator.LR.FNs Calculator for LR Fuzzy Numbers
The arithmetic operations of scalar multiplication, addition, subtraction, multiplication and division of LR fuzzy numbers (based on Zadeh’s extension principle) take complicated forms in fuzzy statistics, fuzzy mathematics, machine learning, fuzzy data analysis, etc. The Calculator for LR Fuzzy Numbers package, i.e. the Calculator.LR.FNs package, helps applied users obtain simple, closed forms for complicated operators on LR fuzzy numbers, and also lets the user easily draw the membership function of the obtained result.
CALF Coarse Approximation Linear Function
Contains a greedy algorithm for coarse approximation linear function.
CalibrateSSB Weighting and Estimation for Panel Data with Non-Response
Function to calculate weights and estimates for panel data with non-response.
callr Call R from R
It is sometimes useful to perform a computation in a separate R process, without affecting the current R process at all. This package does exactly that.
CAM Causal Additive Model (CAM)
The code takes an n x p data matrix and fits a Causal Additive Model (CAM) for estimating the causal structure of the underlying process. The output is a p x p adjacency matrix (a one in entry (i,j) indicates an edge from i to j). Details of the algorithm can be found in: P. Bühlmann, J. Peters, J. Ernest: “CAM: Causal Additive Models, high-dimensional order search and penalized regression”, Annals of Statistics 42:2526-2556, 2014.
canvasXpress Visualization Package for CanvasXpress in R
Enables creation of visualizations using the CanvasXpress framework in R. CanvasXpress is a standalone JavaScript library for reproducible research with complete tracking of data and end-user modifications stored in a single PNG image that can be played back. See <> for more information.
caret Classification and Regression Training
Misc functions for training and plotting classification and regression models.
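A brief sketch of the uniform training interface (assumes the ‘caret’ and ‘rpart’ packages are installed):

```r
# Train a decision-tree classifier with 5-fold cross-validated tuning.
library(caret)

fit <- train(Species ~ ., data = iris, method = "rpart",
             trControl = trainControl(method = "cv", number = 5))
predict(fit, head(iris))  # predicted classes for the first rows
```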
caretEnsemble Ensembles of Caret Models
Functions for creating ensembles of caret models: caretList, caretEnsemble, and caretStack. caretList is a convenience function for fitting multiple caret::train models to the same dataset. caretEnsemble will make a linear combination of these models using greedy forward selection, and caretStack will make linear or non-linear combinations of these models, using a caret::train model as a meta-model.
carpenter Build Common Tables of Summary Statistics for Reports
Mainly used to build tables that are commonly presented for bio-medical/health research, such as basic characteristic tables or descriptive statistics.
cartogram Create Cartograms with R
Construct a continuous area cartogram by a rubber sheet distortion algorithm.
Cartographer Interactive Maps for Data Exploration
Cartographer provides interactive maps in R Markdown documents or at the R console. These maps are suitable for data exploration. This package is an R wrapper around Elijah Meeks’s d3-carto-map and d3.js, using htmlwidgets for R.
cartography Thematic Cartography
Create and integrate maps in your R workflow. This package allows various cartographic representations: proportional symbols, choropleth, typology, flows, discontinuities… It also offers some additional useful features: cartographic palettes, layout elements (scale bar, north arrow, title…), labels, legends, and access to cartographic APIs…
carx Censored Autoregressive Model with Exogenous Covariates
A censored time series class is designed. An estimation procedure is implemented to estimate the Censored AutoRegressive time series with eXogenous covariates (CARX), assuming normality of the innovations. Some other functions that might be useful are also included.
catdap Categorical Data Analysis Program Package
Categorical data analysis program package.
cate High Dimensional Factor Analysis and Confounder Adjusted Testing and Estimation
Provides several methods for factor analysis in high dimension (both n,p >> 1) and methods to adjust for possible confounders in multiple hypothesis testing.
CatEncoders Encoders for Categorical Variables
Contains some commonly used categorical variable encoders, such as ‘LabelEncoder’ and ‘OneHotEncoder’. Inspired by the encoders implemented in python ‘sklearn.preprocessing’ package (see <http://…/preprocessing.html> ).
CATkit Chronomics Analysis Toolkit (CAT): Analyze Periodicity
Performs analysis of sinusoidal rhythms in time series data: actogram, smoothing, autocorrelation, crosscorrelation, several flavors of cosinor.
CATT The Cochran-Armitage Trend Test
The Cochran-Armitage trend test can be applied to a two-by-k contingency table. The test statistic (Z) and p-value are reported. A linear trend in the frequencies is assessed using the default weights (0, 1, 2).
CausalFX Methods for Estimating Causal Effects from Observational Data
Estimate causal effects of one variable on another, currently for binary data only. Methods include instrumental variable bounds, adjustment by a given covariate set, adjustment by an induced covariate set using a variation of the PC algorithm, and an effect bounding method (the Witness Protection Program) based on covariate adjustment with observable independence constraints.
CausalImpact An R package for causal inference in time series
This R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty using a structural Bayesian time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred. As with all approaches to causal inference on non-experimental data, valid conclusions require strong assumptions. The CausalImpact package, in particular, assumes that the outcome time series can be explained in terms of a set of control time series that were themselves not affected by the intervention. Furthermore, the relation between treated series and control series is assumed to be stable during the post-intervention period. Understanding and checking these assumptions for any given application is critical for obtaining valid conclusions.
cbird Clustering of Multivariate Binary Data with Dimension Reduction via L1-Regularized Likelihood Maximization
The clustering of binary data with reducing the dimensionality (CLUSBIRD) proposed by Yamamoto and Hayashi (2015) <doi:10.1016/j.patcog.2015.05.026>.
cccp Cone Constrained Convex Problems
Routines for solving convex optimization problems with cone constraints by means of interior-point methods. The implemented algorithms are partially ported from CVXOPT, a Python module for convex optimization.
ccdrAlgorithm CCDr Algorithm for Learning Sparse Gaussian Bayesian Networks
Implementation of the CCDr (Concave penalized Coordinate Descent with reparametrization) structure learning algorithm as described in Aragam and Zhou (2015) <http://…/aragam15a.html>. This is a fast, score-based method for learning Bayesian networks that uses sparse regularization and block-cyclic coordinate descent.
CCMnet Simulate Congruence Class Model for Networks
Tools to simulate networks based on Congruence Class models.
cdcsis Conditional Distance Correlation and Its Related Feature Screening Method
Gives conditional distance correlation and performs the conditional distance correlation sure independence screening procedure for ultrahigh dimensional data. The conditional distance correlation is a novel conditional dependence measurement of two random variables given a third variable. The conditional distance correlation sure independence screening is used for screening variables in ultrahigh dimensional setting.
CDVineCopulaConditional Sampling from Conditional C- and D-Vine Copulas
Provides tools for sampling from a conditional copula density decomposed via Pair-Copula Constructions as C- or D- vine. Here, the vines which can be used for such sampling are those which sample as first the conditioning variables (when following the sampling algorithms shown in Aas et al. (2009) <DOI:10.1016/j.insmatheco.2007.02.001>). The used sampling algorithm is presented and discussed in Bevacqua et al. (2017) <DOI:10.5194/hess-2016-652>, and it is a modified version of that from Aas et al. (2009) <DOI:10.1016/j.insmatheco.2007.02.001>. A function is available to select the best vine (based on information criteria) among those which allow for such conditional sampling. The package includes a function to compare scatterplot matrices and pair-dependencies of two multivariate datasets.
CEC Cross-Entropy Clustering
Cross-Entropy Clustering (CEC) divides the data into Gaussian-type clusters. It performs automatic reduction of unnecessary clusters, while at the same time allowing the simultaneous use of various types of Gaussian mixture models.
cellWise Analyzing Data with Cellwise Outliers
Tools for detecting cellwise outliers and robust methods to analyze data which may contain them.
cems Conditional Expectation Manifolds
Conditional expectation manifolds are an approach to compute principal curves and surfaces.
censorcopula Estimate Parameter of Bivariate Copula
Implements an interval-censoring method to break ties when fitting a bivariate copula to data with ties.
CensSpatial Censored Spatial Models
Fits linear regression models for censored spatial data. It provides different estimation methods, such as the SAEM (Stochastic Approximation of Expectation Maximization) algorithm and a seminaive method that uses kriging prediction to estimate the response at censored locations and predict new values at unknown locations. It also offers graphical tools for assessing the fitted model.
cents Censored time series
Fits censored time series models.
CEoptim Cross-Entropy R Package for Optimization
Optimization solver based on the Cross-Entropy method.
CepLDA Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability
Performs cepstral based discriminant analysis of groups of time series when there exists Variability in power spectra from time series within the same group as described in R.T. Krafty (2016) ‘Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability’ Journal of Time Series Analysis.
CFC Cause-Specific Framework for Competing-Risk Analysis
Functions for combining survival curves of competing risks to produce cumulative incidence and event-free probability functions, and for summarizing and plotting the results. Survival curves can be either time-denominated or probability-denominated. Point estimates as well as Bayesian, sample-based representations of survival can utilize this framework.
cghRA Array CGH Data Analysis and Visualization
Provides functions to import data from Agilent CGH arrays and process them according to the cghRA workflow. Implements several algorithms such as WACA, STEPS and cnvScore and an interactive graphical interface.
CGP Composite Gaussian process models
Fit composite Gaussian process (CGP) models as described in Ba and Joseph (2012) ‘Composite Gaussian Process Models for Emulating Expensive Functions’, Annals of Applied Statistics. The CGP model is capable of approximating complex surfaces that are not second-order stationary. Important functions in this package are CGP, print.CGP, summary.CGP, predict.CGP and plotCGP.
cgwtools Miscellaneous Tools
A set of tools the author has found useful for performing quick observations or evaluations of data, including a variety of ways to list objects by size, class, etc. Several other tools mimic Unix shell commands, including ‘head’, ‘tail’, ‘pushd’, and ‘popd’. The functions ‘seqle’ and ‘reverse.seqle’ mimic the base ‘rle’ but can search for linear sequences. The function ‘splatnd’ allows the user to generate zero-argument commands without the need for ‘makeActiveBinding’.
changepoint An R package for changepoint analysis
Implements various mainstream and specialised changepoint methods for finding single and multiple changepoints within data. Many popular non-parametric and frequentist methods are included. The cpt.mean, cpt.var, and cpt.meanvar functions should be your first point of call.
Methods for Nonparametric Changepoint Detection
Implements the multiple changepoint algorithm PELT with a nonparametric cost function based on the empirical distribution of the data. The function should be your first point of call. This package is an extension to the \code{changepoint} package which uses parametric changepoint methods. For further information on the methods see the documentation for \code{changepoint}.
ChangepointTesting Change Point Estimation for Clustered Signals
A multiple testing procedure for clustered alternative hypotheses. It is assumed that the p-values under the null hypotheses follow U(0,1) and that the distributions of p-values from the alternative hypotheses are stochastically smaller than U(0,1). By aggregating information, this method is more sensitive to detecting signals of low magnitude than standard methods. Additionally, sporadic small p-values appearing within a null hypothesis sequence are avoided by averaging over the neighboring p-values.
ChannelAttributionApp Shiny Web Application for the Multichannel Attribution Problem
Shiny Web Application for the Multichannel Attribution Problem. It is basically a user-friendly graphical interface for running and comparing all the attribution models in package ‘ChannelAttribution’. For customizations or interest in other statistical methodologies for web data analysis please contact <>.
Chaos01 0-1 Test for Chaos
Computes and plots the results of the 0-1 test for chaos proposed by Gottwald and Melbourne (2004) <DOI:10.1137/080718851>. The algorithm is available in parallel for independent values of the parameter c.
checkpoint Install Packages from Snapshots on the Checkpoint Server for Reproducibility
The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint allows you to install packages as they existed on CRAN on a specific snapshot date, as if you had a CRAN time machine. To achieve reproducibility, the checkpoint() function installs the packages required or called by your project and scripts to a local library exactly as they existed at the specified point in time. Only those packages are available to your project, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint’s checkpoint() can ensure the reproducibility of your scripts or projects at any time. To create the snapshot archives, once a day (at midnight UTC) we refresh the Austria CRAN mirror on the ‘Managed R Archived Network’ server. Immediately after completion of the rsync mirror process, we take a snapshot, thus creating the archive. Snapshot archives exist starting from 2014-09-17.
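In a project script, usage reduces to one call near the top (a sketch, assuming the ‘checkpoint’ package is installed and the machine can reach the snapshot server):

```r
# Pin this project's package library to the CRAN snapshot of a given day.
library(checkpoint)
checkpoint("2015-04-26")  # scans the project's scripts, installs the
                          # packages they use as of that date, and points
                          # the library path at the snapshot library
```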
CHFF Closest History Flow Field Forecasting for Bivariate Time Series
The software matches the current history to the closest history in a time series to build a forecast.
chi2x3way Chi-Squared and Tau Index Partitions for Three-Way Contingency Tables
Provides two index partitions for three-way contingency tables: partition of the association measure chi-squared and of the predictability index tau under several representative hypotheses about the expected frequencies (hypothesized probabilities).
ChIPtest Nonparametric Methods for Identifying Differential Enrichment Regions with ChIP-Seq Data
Nonparametric tests to identify differential enrichment regions for two-condition or time-course ChIP-seq data. It includes: a data preprocessing function, estimation of a small constant used in hypothesis testing, a kernel-based two-sample nonparametric test, and two assumption-free two-sample nonparametric tests.
chopthin The Chopthin Resampler
Resampling is a standard step in particle filtering and in sequential Monte Carlo. This package implements the chopthin resampler, which keeps a bound on the ratio between the largest and the smallest weights after resampling.
ChoR Chordalysis R Package
Learning the structure of graphical models from datasets with thousands of variables. More information about the research papers detailing the theory behind Chordalysis is available at <http://…/Research> (KDD 2016, SDM 2015, ICDM 2014, ICDM 2013). The R package development site is <https://…/Monash-ChoR>.
choroplethr Simplify the Creation of Choropleth Maps in R
Choropleths are thematic maps where geographic regions, such as states, are colored according to some metric, such as the number of people who live in that state. This package simplifies this process by 1. Providing ready-made functions for creating choropleths of common maps. 2. Providing data and API connections to interesting data sources for making choropleths. 3. Providing a framework for creating choropleths from arbitrary shapefiles. Please see the vignettes for more details.
chunked Chunkwise Text-File Processing for ‘dplyr’
Text data can be processed chunkwise using ‘dplyr’ commands. These are recorded and executed per data chunk, so large files can be processed with limited memory using the ‘LaF’ package.
CircOutlier Detecting of Outliers in Circular Regression
Detection of outliers in circular-circular regression models and estimation of the models’ parameters.
cIRT Choice Item Response Theory
Jointly models the accuracy of cognitive responses and item choices within a Bayesian hierarchical framework as described by Culpepper and Balamuta (2015) <doi:10.1007/s11336-015-9484-7>. In addition, the package contains the datasets used in the paper’s analysis.
Cite An RStudio Addin to Insert BibTex Citation in Rmarkdown Documents
Contains an RStudio addin to insert BibTeX citations in R Markdown documents with a minimal user interface.
citr RStudio Add-in to Insert Markdown Citations
Functions and an RStudio add-in to search a BibTeX-file to create and insert formatted Markdown citations into the current document.
clarifai Access to Clarifai API
Get descriptions of images from the Clarifai API. Clarifai uses a large deep-learning cloud to come up with descriptive labels of the things in an image, and also reports how confident it is about each label.
classifierplots Generates a Visualization of Classifier Performance as a Grid of Diagnostic Plots
Generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call. Includes ROC curves, prediction density, accuracy, precision, recall and calibration plots, all using ggplot2 for easy modification. Debug your binary classifiers faster and easier!
cleanEHR The Critical Care Clinical Data Processing Tools
A toolset to deal with the Critical Care Health Informatics Collaborative dataset. It is created to address various data reliability and accessibility problems of electronic healthcare records (EHR). It provides a unique platform which enables data manipulation, transformation, reduction, anonymisation, cleaning and validation.
cleanNLP A Tidy Data Model for Natural Language Processing
Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford’s CoreNLP library. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part of speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.
cleanr Helps You to Code Cleaner
Check your R code for some of the most common layout flaws. Many have tried to teach us how to write code less dreadful, be it implicitly as B. W. Kernighan and D. M. Ritchie (1988) <ISBN:0-13-110362-8> in ‘The C Programming Language’ did, be it explicitly as R. C. Martin (2008) <ISBN:0-13-235088-2> in ‘Clean Code: A Handbook of Agile Software Craftsmanship’ did. So we should check our code for files too long or wide, functions with too many lines, too-wide lines, too many arguments, or too many levels of nesting. Note: this is not a static code analyzer like pylint or the like. Check out https://…/lintr instead.
clickR Fix Data and Create Report Tables from Different Objects
Fixes data errors in numerical, factor and date variables and produces report tables from models and summaries.
clikcorr Censoring Data and Likelihood-Based Correlation Estimation
A profile likelihood based method of estimation and inference on the correlation coefficient of bivariate data with different types of censoring and missingness.
climbeR Calculate Average Minimal Depth of a Maximal Subtree for ‘ranger’ Package Forests
Calculates first- and second-order average minimal depth of a maximal subtree for a forest object produced by the R ‘ranger’ package. This variable importance metric is implemented as described in Ishwaran et al. (‘High-Dimensional Variable Selection for Survival Data’, March 2010, <doi:10.1198/jasa.2009.tm08622>).
clipr Read and Write from the System Clipboard
Simple utility functions to read from and write to the system clipboards of Windows, OS X, and Linux.
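A minimal sketch (assumes ‘clipr’ is installed and a system clipboard is available, which may not be the case on headless Linux):

```r
library(clipr)

write_clip("hello from R")  # put text on the system clipboard
read_clip()                 # read it back as a character vector
```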
clisymbols Unicode Symbols at the R Prompt
A small subset of Unicode symbols that are useful when building command-line applications. They fall back to alternatives on terminals that do not support Unicode. Many symbols were taken from the ‘figures’ ‘npm’ package (see https://…/figures ).
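A short sketch of how the symbols are used in command-line output (assuming the ‘clisymbols’ package is installed):

```r
library(clisymbols)

# `symbol` is a list of named glyphs; on terminals without Unicode
# support the package substitutes ASCII fallbacks automatically.
cat(symbol$tick, "tests passed\n")
cat(symbol$cross, "build failed\n")
```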
CLME Constrained Inference for Linear Mixed Effects Models
Constrained inference for linear mixed effects models using residual bootstrap methodology.
clogitboost Boosting Conditional Logit Model
A set of functions to fit a boosting conditional logit model.
clogitLasso Lasso Estimation of Conditional Logistic Regression Models
Fit a sequence of conditional logistic regression models with lasso, for small to large sized samples.
clubSandwich Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections
Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models. Several adjustments are incorporated to improve small-sample performance. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm(), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and (from ‘metafor’).
ClueR CLUster Evaluation (CLUE)
CLUE is an R package for identifying optimal number of clusters in a given time-course dataset clustered by cmeans or kmeans algorithms.
CluMix Clustering and Visualization of Mixed-Type Data
Provides utilities for clustering subjects and variables of mixed data types. Similarities between subjects are measured by Gower’s general similarity coefficient with an extension of Podani for ordinal variables. Similarities between variables are assessed by combination of appropriate measures of association for different pairs of data types. Alternatively, variables can also be clustered by the ‘ClustOfVar’ approach. The main feature of the package is the generation of a mixed-data heatmap. For visualizing similarities between either subjects or variables, a heatmap of the corresponding distance matrix can be drawn. Associations between variables can be explored by a ‘confounderPlot’, which allows visual detection of possible confounding, collinear, or surrogate factors for some variables of primary interest. Distance matrices and dendrograms for subjects and variables can be derived and used for further visualizations and applications.
clusrank Wilcoxon Rank Sum Test for Clustered Data
Non-parametric tests (Wilcoxon rank sum test and Wilcoxon signed rank test) for clustered data.
clust.bin.pair Statistical Methods for Analyzing Clustered Matched Pair Data
Tests, utilities, and case studies for analyzing significance in clustered binary matched-pair data. The central function clust.bin.pair uses one of several tests to calculate a Chi-square statistic. Implemented are the tests Eliasziw, Obuchowski, Durkalski, and Yang with McNemar included for comparison. The utility functions and convert data between various useful formats. Thyroids and psychiatry are the canonical datasets from Obuchowski and Petryshen respectively.
cluster Cluster Analysis Extended Rousseeuw et al
Cluster analysis methods. Much extended the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990).
clusterCrit Clustering Indices
Compute clustering validation indices.
Optimal Distance-Based Clustering for Multidimensional Data with Sequential Constraint
A dynamic programming algorithm for optimally clustering multidimensional data with a sequential constraint. The algorithm minimizes the sum of squares of within-cluster distances. The sequential constraint allows only subsequent items of the input data to form a cluster, and is typically required when clustering data streams or items with time stamps, such as video frames, GPS signals of a vehicle, movement data of a person, e-pen data, etc. The algorithm represents an extension of Ckmeans.1d.dp to multidimensional spaces. Similarly to the one-dimensional case, the algorithm guarantees optimality and repeatability of clustering. Method can find the optimal clustering if the number of clusters is known. Otherwise, methods and can be used.
ClusterR Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans and K-Medoids Clustering
Gaussian mixture models, k-means, mini-batch-kmeans and k-medoids clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of ‘RcppArmadillo’ to speed up the computationally intensive parts of the functions.
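A hedged sketch of the mini-batch k-means workflow (assuming ‘ClusterR’ is installed; MiniBatchKmeans() and predict_MBatchKMeans() are documented functions of the package):

```r
library(ClusterR)

dat <- as.matrix(iris[, 1:4])

# Fit mini-batch k-means centroids, then assign each observation to a cluster.
mb <- MiniBatchKmeans(dat, clusters = 3, batch_size = 20, num_init = 5)
cl <- predict_MBatchKMeans(dat, mb$centroids)
table(cl)
```

The ‘RcppArmadillo’ backend makes the mini-batch variant practical for data sets far larger than this toy example.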
ClusterRankTest Rank Tests for Clustered Data
Nonparametric rank based tests (rank-sum tests and signed-rank tests) for clustered data, especially useful for clusters having informative cluster size and intra-cluster group size.
ClusterStability Assessment of Stability of Individual Object or Clusters in Partitioning Solutions
Allows one to assess the stability of individual objects, clusters and whole clustering solutions based on repeated runs of the K-means and K-medoids partitioning algorithms.
clustertend Check the Clustering Tendency
Calculates statistics to help assess the clustering tendency of a given data set. In the first version, Hopkins’ statistic is implemented.
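A minimal sketch (assuming ‘clustertend’ is installed; hopkins() is the package’s documented function, and the interpretation below follows this implementation’s convention):

```r
library(clustertend)

set.seed(1)
# Hopkins' statistic on the numeric iris columns, using n = 30 sampled points.
# In this implementation's convention, values close to 0 suggest a clusterable
# data set, while values around 0.5 suggest spatial randomness.
h <- hopkins(iris[, 1:4], n = 30)
h$H
```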
clustMixType k-Prototypes Clustering for Mixed Variable-Type Data
Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z.Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304, <DOI:10.1023/A:1009769707641>.
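A hedged sketch of k-prototypes on mixed-type data (assuming ‘clustMixType’ is installed; kproto() is the package’s documented function, and the toy data frame is an illustration):

```r
library(clustMixType)

# Build a small mixed-type data set: two numeric and one categorical variable.
set.seed(42)
dat <- data.frame(
  x = c(rnorm(50, 0), rnorm(50, 4)),
  y = c(rnorm(50, 0), rnorm(50, 4)),
  g = factor(rep(c("a", "b"), each = 50))
)

# k-prototypes combines Euclidean distance (numeric) with simple matching
# (categorical), weighted by lambda (estimated automatically if omitted).
fit <- kproto(dat, k = 2)
table(fit$cluster)
```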
ClustMMDD Variable Selection in Clustering by Mixture Models for Discrete Data
An implementation of a variable selection procedure for clustering by mixtures of multinomial models for discrete data. Genotype data are an example of such data, with two unordered observations (alleles) at each locus for a diploid individual. The two-fold problem is seen as a model selection problem, where competing models are characterized by the number of clusters K and the subset S of clustering variables. Competing models are compared by penalized maximum likelihood criteria. We considered asymptotic criteria such as the Akaike and Bayesian Information Criteria, and a family of penalized criteria whose penalty function is calibrated in a data-driven way.
clustRcompaR Easy Interface for Clustering a Set of Documents and Exploring Group-Based Patterns
Provides an interface to perform cluster analysis on a corpus of text. Interfaces with ‘quanteda’ to assemble text corpora easily. Deviationalizes text vectors prior to clustering using the technique described by Sherin (Sherin, B. [2013]. A computational study of commonsense science: An exploration in the automated analysis of clinical interview data. Journal of the Learning Sciences, 22(4), 600-638. http://…/10508406.2013.836654 ). Uses cosine similarity as the distance metric for a two-stage clustering process involving Ward’s hierarchical agglomerative clustering and k-means clustering. Selects the optimal number of clusters to maximize ‘variance explained’ by clusters, adjusted for the number of clusters. Provides plotted as well as printed output of the clustering results. Assesses the ‘model fit’ of the clustering solution to a set of preexisting groups in the dataset.
ClustVarLV Clustering of Variables Around Latent Variables
The clustering of variables is a strategy for deciphering the underlying structure of a data set. Adopting an exploratory data analysis point of view, the Clustering of Variables around Latent Variables (CLV) approach has been proposed by Vigneau and Qannari (2003). Based on a family of optimization criteria, the CLV approach is adaptable to many situations. In particular, constraints may be introduced in order to take account of additional information about the observations and/or the variables. In this paper, the CLV method is depicted and the R package ClustVarLV including a set of functions developed so far within this framework is introduced. Considering successively different types of situations, the underlying CLV criteria are detailed and the various functions of the package are illustrated using real case studies.
cmaesr Covariance Matrix Adaption Evolutionary Strategy
Pure R implementation of the Covariance Matrix Adaption – Evolution Strategy (CMA-ES) with optional restarts (IPOP-CMA-ES).
CMplot Circle Manhattan Plot
Manhattan plots were born to visualize the results of genome-wide association studies (GWAS), but drawing an elaborate one can take much time. This package provides a function named ‘CMplot’ that easily solves the problem: inputting GWAS results and adjusting a few parameters yields the desired Manhattan plot. It also puts forward the circle Manhattan plot, which demonstrates multiple traits in one circular plot; such a condensed figure can spare the length of a paper.
cmprskQR Analysis of Competing Risks Using Quantile Regressions
Estimation, testing and regression modeling of subdistribution functions in competing risks using quantile regressions, as described in Peng and Fine (2009) <DOI:10.1198/jasa.2009.tm08228>.
cna A Package for Coincidence Analysis (CNA)
Provides functions for performing Coincidence Analysis (CNA).
CNLTreg Complex-Valued Wavelet Lifting for Signal Denoising
Implementations of recent complex-valued wavelet shrinkage procedures for smoothing irregularly sampled signals.
cobalt Covariate Balance Tables and Plots
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with ‘MatchIt’, ‘twang’, ‘Matching’, and ‘CBPS’ for assessing balance on the output of their preprocessing functions. Users can also supply data not generated by the above packages.
cocor Comparing Correlations
Statistical tests for the comparison between two correlations based on either independent or dependent groups. Dependent correlations can be either overlapping or nonoverlapping. A web interface is available on the website. A plugin for the R GUI and IDE RKWard is included; please install RKWard to use this feature. The respective R package ‘rkward’ cannot be installed directly from a repository, as it is part of RKWard.
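A minimal sketch for the independent-groups case (assuming ‘cocor’ is installed; cocor.indep.groups() is the package’s documented function, and the correlations and sample sizes below are made-up illustration values):

```r
library(cocor)

# Compare a correlation of .55 (n = 100) against .35 (n = 120) observed in two
# independent groups; the printout reports several tests, including Fisher's z.
res <- cocor.indep.groups(r1.jk = 0.55, r2.hm = 0.35, n1 = 100, n2 = 120)
res
```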
cocoreg Extracts Shared Variation in Collections of Datasets Using Regression Models
The cocoreg algorithm extracts shared variation from a collection of datasets using regression models.
cOde Automated C Code Generation for Use with the ‘deSolve’ and ‘bvpSolve’ Packages
Generates all necessary C functions allowing the user to work with the compiled-code interface of ode() and bvptwp(). The implementation supports “forcings” and “events”. The package also provides functions to symbolically compute Jacobians, sensitivity equations and adjoint sensitivities being the basis for sensitivity analysis.
codingMatrices Alternative Factor Coding Matrices for Linear Model Formulae
A collection of coding functions as alternatives to the standard functions in the stats package, which have names starting with ‘contr.’. Their main advantage is that they provide a consistent method for defining marginal effects in multi-way factorial models. In a simple one-way ANOVA model the intercept term is always the simple average of the class means.
codyn Community Dynamics Metrics
A toolbox of ecological community dynamics metrics that are explicitly temporal. Functions fall into two categories: temporal diversity indices and community stability metrics. The diversity indices are temporal analogs to traditional diversity indices such as richness and rank-abundance curves. Specifically, functions are provided to calculate species turnover, mean rank shifts, and lags in community similarity between time points. The community stability metrics calculate overall stability and patterns of species covariance and synchrony over time.
cofeatureR Generate Cofeature Matrices
Generate cofeature (feature by sample) matrices. The package utilizes ggplot2::geom_tile to generate the matrix, allowing for easy additions to the base matrix.
CoFRA Complete Functional Regulation Analysis
Calculates complete functional regulation analysis and visualizes the results in a single heatmap. The provided example data are biological, but the methodology can be used on large data sets to compare quantitative entities that can be grouped. For example, a store might divide entities into clothing, food, car products, etc., and want to see how sales change in the groups after some event. The theoretical background for the calculations is provided in New insights into functional regulation in MS-based drug profiling, Ana Sofia Carvalho, Henrik Molina & Rune Matthiesen, Scientific Reports, <doi:10.1038/srep18826>.
cointmonitoR Consistent Monitoring of Stationarity and Cointegrating Relationships
We propose a consistent monitoring procedure to detect a structural change from a cointegrating relationship to a spurious relationship. The procedure is based on residuals from modified least squares estimation, using either Fully Modified, Dynamic or Integrated Modified OLS. It is inspired by Chu et al. (1996) <DOI:10.2307/2171955> in that it is based on parameter estimation on a pre-break ‘calibration’ period only, rather than being based on sequential estimation over the full sample. See the discussion paper <DOI:10.2139/ssrn.2624657> for further information. This package provides the monitoring procedures for both the cointegration and the stationarity case (while the latter is just a special case of the former one) as well as printing and plotting methods for a clear presentation of the results.
cointReg Parameter Estimation and Inference in a Cointegrating Regression
Cointegration methods are widely used in empirical macroeconomics and empirical finance. It is well known that in a cointegrating regression the ordinary least squares (OLS) estimator of the parameters is super-consistent, i.e. converges at rate equal to the sample size T. When the regressors are endogenous, the limiting distribution of the OLS estimator is contaminated by so-called second order bias terms, see e.g. Phillips and Hansen (1990) <DOI:10.2307/2297545>. The presence of these bias terms renders inference difficult. Consequently, several modifications to OLS that lead to zero mean Gaussian mixture limiting distributions have been proposed, which in turn make standard asymptotic inference feasible. These methods include the fully modified OLS (FM-OLS) approach of Phillips and Hansen (1990) <DOI:10.2307/2297545>, the dynamic OLS (D-OLS) approach of Phillips and Loretan (1991) <DOI:10.2307/2298004>, Saikkonen (1991) <DOI:10.1017/S0266466600004217> and Stock and Watson (1993) <DOI:10.2307/2951763> and the new estimation approach called integrated modified OLS (IM-OLS) of Vogelsang and Wagner (2014) <DOI:10.1016/j.jeconom.2013.10.015>. The latter is based on an augmented partial sum (integration) transformation of the regression model. IM-OLS is similar in spirit to the FM- and D-OLS approaches, with the key difference that it does not require estimation of long run variance matrices and avoids the need to choose tuning parameters (kernels, bandwidths, lags). However, inference does require that a long run variance be scaled out. This package provides functions for the parameter estimation and inference with all three modified OLS approaches. That includes the automatic bandwidth selection approaches of Andrews (1991) <DOI:10.2307/2938229> and of Newey and West (1994) <DOI:10.2307/2297912> as well as the calculation of the long run variance.
colf Constrained Optimization on Linear Function
Performs least squares constrained optimization on a linear objective function. It contains a number of algorithms to choose from and offers a formula syntax similar to lm().
CollapsABEL Generalized CDH (GCDH) Analysis
Implements a generalized version of the CDH test <DOI:10.1371/journal.pone.0028145> for detecting compound heterozygosity on a genome-wide level, due to usage of generalized linear models it allows flexible analysis of binary and continuous traits with covariates.
collapsibleTree Interactive Collapsible Tree Diagrams using ‘D3.js’
Interactive Reingold-Tilford tree diagrams created using ‘D3.js’, where every node can be expanded and collapsed by clicking on it. Tooltips and color gradients can be mapped to nodes using a numeric column in the source data frame. See ‘collapsibleTree’ website for more information and examples.
collpcm Collapsed Latent Position Cluster Model for Social Networks
Markov chain Monte Carlo based inference routines for collapsed latent position cluster models for social networks, including searches over the model space (the number of clusters in the latent position cluster model). The label switching algorithm used is that of Nobile and Fearnside (2007) <doi:10.1007/s11222-006-9014-7>, which relies on the algorithm of Carpaneto and Toth (1980) <doi:10.1145/355873.355883>.
collUtils Auxiliary Package for Package ‘CollapsABEL’
Provides some low level functions for processing PLINK input and output files.
coloredICA Implementation of Colored Independent Component Analysis and Spatial Colored Independent Component Analysis
Implements colored Independent Component Analysis (Lee et al., 2011) and spatial colored Independent Component Analysis (Shen et al., 2014), two algorithms for performing ICA when sources are assumed to be temporal or spatial stochastic processes, respectively.
ColorPalette Color Palettes Generator
Different methods to generate a color palette based on a specified base color and a number of colors that should be created.
colorplaner A ggplot2 Extension to Visualize Two Variables per Color Aesthetic Through Color Space Projections
A ggplot2 extension to visualize two variables through one color aesthetic via mapping to a color space projection. With this technique for 2-D color mapping, one can create a dichotomous choropleth in R as well as other visualizations with bivariate color scales. Includes two new scales and a new guide for ggplot2.
colorscience Color Science Methods and Data
Methods and data for color science – color conversions by observer, illuminant and gamma. Color matching functions and chromaticity diagrams. Color indices, color differences and spectral data conversion/analysis.
colorspace Color Space Manipulation
Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB. Qualitative, sequential, and diverging color palettes based on HCL colors are provided.
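A minimal sketch of the conversions and palettes described above (assuming ‘colorspace’ is installed; sRGB(), hex(), coords() and diverge_hcl() are documented functions of the package):

```r
library(colorspace)

# Convert an sRGB coordinate to a hex string and to polar CIELUV (HCL).
red <- sRGB(1, 0, 0)
hex(red)                      # hex code for pure red
coords(as(red, "polarLUV"))   # luminance / chroma / hue coordinates

# HCL-based palettes: e.g. a 5-colour diverging palette.
diverge_hcl(5)
```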
colorSpec Color Calculations with Emphasis on Spectral Data
Calculate with spectral properties of light sources, materials, cameras, eyes, and scanners. Build complex systems from simpler parts using a spectral product algebra. For light sources, compute CCT and CRI. For object colors, compute optimal colors and Logvinenko coordinates. Work with the standard CIE illuminants and color matching functions, and read spectra from text files, including CGATS files. Sample text files, and 4 vignettes are included.
colourpicker A Colour Picker Widget for Shiny Apps, RStudio, R-markdown, and ‘htmlwidgets’
A colour picker that can be used as an input in Shiny apps or R-markdown documents. A colour picker RStudio addin is provided to let you select colours for use in your R code. The colour picker is also available as an ‘htmlwidgets’ widget.
colr Functions to Select and Rename Data
Powerful functions to select and rename columns in data frames, lists, and numeric types by ‘Perl’ regular expression. Regular expressions (‘regex’) are a very powerful grammar for matching strings, such as column names.
Combine Game-Theoretic Probability Combination
Suite of R functions for combination of probabilities using a game-theoretic method.
combiter Combinatorics Iterators
Provides iterators for combinations, permutations, and subsets, which allow one to go through all elements without creating a huge set of all possible values.
cometExactTest Exact Test from the Combinations of Mutually Exclusive Alterations (CoMEt) Algorithm
An algorithm for identifying combinations of mutually exclusive alterations in cancer genomes. CoMEt represents the mutations in a set M of k genes with a 2^k dimensional contingency table, and then computes the tail probability of observing T(M) exclusive alterations using an exact statistical test.
commonmark Bindings to the ‘CommonMark’ Reference Implementation
The ‘CommonMark’ spec is a rationalized version of Markdown syntax. This package converts markdown text to various formats including a parse tree in XML format.
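A minimal sketch (assuming ‘commonmark’ is installed; markdown_html() and markdown_xml() are documented converters of the package):

```r
library(commonmark)

md <- "Hello *world*\n\n- one\n- two"

markdown_html(md)       # render the markdown to an HTML fragment
cat(markdown_xml(md))   # inspect the parse tree in XML format
```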
COMMUNAL Robust Selection of Cluster Number K
Facilitates optimal clustering of a data set. Provides a framework to run a wide range of clustering algorithms to determine the optimal number (k) of clusters in the data. Then analyzes the cluster assignments from each clustering algorithm to identify samples that repeatedly classify to the same group. We call these ‘core clusters’, providing a basis for later class discovery.
CompareCausalNetworks Interface to Diverse Estimation Methods of Causal Networks
Unified interface for the estimation of causal networks, including the methods ‘backShift’ (from package ‘backShift’), ‘bivariateANM’ (bivariate additive noise model), ‘bivariateCAM’ (bivariate causal additive model), ‘CAM’ (causal additive model) (from package ‘CAM’), ‘hiddenICP’ (invariant causal prediction with hidden variables), ‘ICP’ (invariant causal prediction) (from package ‘InvariantCausalPrediction’), ‘GES’ (greedy equivalence search), ‘GIES’ (greedy interventional equivalence search), ‘LINGAM’, ‘PC’ (PC Algorithm), ‘RFCI’ (really fast causal inference) (all from package ‘pcalg’) and regression.
compareDF Do a Git Style Diff of the Rows Between Two Dataframes with Similar Structure
Compares two dataframes which have the same column structure to show the rows that have changed. Also gives a git style diff format to quickly see what has changed, in addition to summary statistics.
compete Analyzing Social Hierarchies
Organizing and analyzing social dominance hierarchy data.
CompetingRisk The Semi-Parametric Cumulative Incidence Function
Computing the point estimator and pointwise confidence interval of the cumulative incidence function from the cause-specific hazards model.
Compind Composite Indicators Functions
Contains several functions to enhance approaches to the Composite Indicators (http://…/detail.asp?ID=6278) methods, focusing in particular on the normalisation and weighting-aggregation steps.
compLasso Implements the Component Lasso Method Functions
Implements the Component Lasso method for linear regression using the sample covariance matrix connected-components structure, described in ‘A Component Lasso’ by Hussami and Tibshirani (2013).
complexity Calculate the Proportion of Permutations in Line with an Informative Hypothesis
Allows for the easy computation of complexity: the proportion of the parameter space in line with the hypothesis by chance.
Compositional Compositional Data Analysis
A collection of R functions for compositional data analysis.
compositions Compositional Data Analysis
The package provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by Aitchison and Pawlowsky-Glahn.
CompR Paired Comparison Data Analysis
Different tools for describing and analysing paired comparison data are presented. The main methods estimate product scores according to the Bradley-Terry-Luce model. A segmentation of the individuals can be conducted on the basis of a mixture distribution approach. The number of classes can be tested by the use of Monte Carlo simulations. This package also deals with multi-criteria paired comparison data.
Conake Continuous Associated Kernel Estimation
Continuous smoothing of probability density function on a compact or semi-infinite support is performed using four continuous associated kernels: extended beta, gamma, lognormal and reciprocal inverse Gaussian. The cross-validation technique is also implemented for bandwidth selection.
concatenate Human-Friendly Text from Unknown Strings
Simple functions for joining strings. Construct human-friendly messages whose elements aren’t known in advance, like in stop, warning, or message, from clean code.
conclust Pairwise Constraints Clustering
There are 3 main functions in this package: ckmeans(), lcvqe() and mpckm(). They take an unlabeled dataset and two lists of must-link and cannot-link constraints as input and produce a clustering as output.
concordance Product Concordance
A set of utilities for matching products in different classification codes used in international trade research. It supports concordance between HS (Combined), ISIC Rev. 2,3, and SITC1,2,3,4 product classification codes, as well as BEC, NAICS, and SIC classifications. It also provides code nomenclature / descriptions look-up, Rauch classification look-up (via concordance to SITC2) and trade elasticity look-up (via concordance to SITC2/3 or
condformat Conditional Formatting in Data Frames
Apply and visualize conditional formatting to data frames in R. It presents a data frame as an HTML table with cells CSS-formatted according to criteria defined by rules, using a syntax similar to ‘ggplot2’. The table is printed either by opening a web browser or within the ‘RStudio’ viewer if available. The conditional formatting rules allow highlighting cells that match a condition or adding a gradient background to a given column based on its values.
condir Computation of P Values and Bayes Factors for Conditioning Data
Set of functions for the easy analyses of conditioning data.
conditions Standardized Conditions for R
Implements specialized conditions, i.e., typed errors, warnings and messages. Offers a set of standardized conditions (value error, deprecated warning, io message, …) in the fashion of Python’s built-in exceptions.
condSURV Estimation of the Conditional Survival Function for Ordered Multivariate Failure Time Data
Implements some newly developed methods for the estimation of the conditional survival function for ordered multivariate failure time data.
condvis Conditional Visualization for Statistical Models
Exploring fitted model structures by interactively taking 2-D and 3-D sections in data space.
configr An Implementation of Parsing and Writing Configuration File (JSON/INI/YAML)
Implements YAML, JSON and INI parsers for reading and writing configuration files in R. The functionality of this package is similar to that of package ‘config’.
confinterpret Descriptive Interpretations of Confidence Intervals
Produces descriptive interpretations of confidence intervals. Includes (extensible) support for various test types, specified as sets of interpretations dependent on where the lower and upper confidence limits sit.
conformal Conformal Prediction for Regression and Classification
Implementation of conformal prediction using caret models for classification and regression.
confSAM Estimates and Bounds for the False Discovery Proportion, by Permutation
For multiple testing. Computes estimates and confidence bounds for the False Discovery Proportion (FDP), the fraction of false positives among all rejected hypotheses. The methods in the package use permutations of the data. Doing so, they take into account the dependence structure in the data.
Conigrave Flexible Tools for Multiple Imputation
Provides a set of tools that can be used across ‘data.frame’ and ‘imputationList’ objects.
connect3 A Tool for Reproducible Research by Converting ‘LaTeX’ Files Generated by R Sweave to Rich Text Format Files
Converts ‘LaTeX’ files (with extension ‘.tex’) generated by R Sweave using package ‘knitr’ to Rich Text Format files (with extension ‘.rtf’). Rich Text Format files can be read and written by most word processors.
conover.test Conover-Iman Test of Multiple Comparisons Using Rank Sums
Computes the Conover-Iman test (1979) for stochastic dominance and reports the results among multiple pairwise comparisons after a Kruskal-Wallis test for stochastic dominance among k groups (Kruskal and Wallis, 1952). The interpretation of stochastic dominance requires an assumption that the CDF of one group does not cross the CDF of the other. conover.test makes k(k-1)/2 multiple pairwise comparisons based on Conover-Iman t-test-statistic of the rank differences. The null hypothesis for each pairwise comparison is that the probability of observing a randomly selected value from the first group that is larger than a randomly selected value from the second group equals one half; this null hypothesis corresponds to that of the Wilcoxon-Mann-Whitney rank-sum test. Like the rank-sum test, if the data can be assumed to be continuous, and the distributions are assumed identical except for a difference in location, Conover-Iman test may be understood as a test for median difference. conover.test accounts for tied ranks. The Conover-Iman test is strictly valid if and only if the corresponding Kruskal-Wallis null hypothesis is rejected.
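A minimal sketch of the procedure described above (assuming ‘conover.test’ is installed; conover.test() is the package’s documented function, and the three simulated groups are an illustration):

```r
library(conover.test)

# Three groups with shifted locations; Kruskal-Wallis runs first, then the
# k(k-1)/2 = 3 Conover-Iman pairwise comparisons with Bonferroni adjustment.
set.seed(7)
x <- c(rnorm(20, 0), rnorm(20, 1), rnorm(20, 2))
g <- rep(1:3, each = 20)
ct <- conover.test(x, g, method = "bonferroni")
```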
ConSpline Partial Linear Least-Squares Regression using Constrained Splines
Given response y, continuous predictor x, and covariate matrix, the relationship between E(y) and x is estimated with a shape-constrained regression spline. Function outputs fits and various types of inference.
ConsRank Compute the Median Ranking(s) According to the Kemeny’s Axiomatic Approach
Computes the median ranking according to Kemeny’s axiomatic approach. Rankings may or may not contain ties, and can be either complete or incomplete.
ContaminatedMixt Model-Based Clustering and Classification with the Multivariate Contaminated Normal Distribution
Fits mixtures of multivariate contaminated normal distributions (with eigen-decomposed scale matrices) via the expectation conditional- maximization algorithm under a clustering or classification paradigm.
controlTest Median Comparison for Two-Sample Right-Censored Survival Data
Nonparametric two-sample procedure for comparing the median survival time.
convertGraph Convert Graphical Files Format
Converts graphical file formats (SVG, PNG, JPEG, BMP, GIF, PDF, etc.) to one another. The exceptions are the SVG format, which can only be converted to other formats, and the PDF format, which can only be created from other graphical formats. The main purpose of the package is to provide a solution for converting SVG to PNG, which is often needed for exporting graphical files produced by R widgets.
convertr Convert Between Units
Provides conversion functionality between a broad range of scientific, historical, and industrial unit types.
convey Income Concentration Analysis with Complex Survey Samples
Variance estimation on indicators of income concentration and poverty using linearized or replication-based survey designs. Wrapper around the survey package.
convoSPAT Convolution-Based Nonstationary Spatial Modeling
Fits convolution-based nonstationary Gaussian process models to point-referenced spatial data. The nonstationary covariance function allows the user to specify the underlying correlation structure and which spatial dependence parameters should be allowed to vary over space: the anisotropy, nugget variance, and process variance. The parameters are estimated via maximum likelihood, using a local likelihood approach. Also provided are functions to fit stationary spatial models for comparison, calculate the kriging predictor and standard errors, and create various plots to visualize nonstationarity.
coop Co-Operation: Fast Covariance, Correlation, and Cosine Similarity Operations
Fast implementations of the co-operations: covariance, correlation, and cosine similarity. The implementations are fast and memory-efficient and their use is resolved automatically based on the input data, handled by R’s S3 methods. Full descriptions of the algorithms and benchmarks are available in the package vignettes.
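A minimal sketch (assuming ‘coop’ is installed; cosine(), pcor() and covar() are documented functions of the package):

```r
library(coop)

set.seed(3)
m <- matrix(rnorm(200), nrow = 20, ncol = 10)

# Column-wise cosine similarity, Pearson correlation, and covariance;
# each call returns a 10 x 10 matrix.
cs <- cosine(m)
pc <- pcor(m)
cv <- covar(m)

# pcor() agrees with base R's cor() up to numerical error.
max(abs(pc - cor(m)))
```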
copCAR Fitting the copCAR Regression Model for Discrete Areal Data
Provides tools for fitting the copCAR regression model for discrete areal data. Three types of estimation are supported: continuous extension, composite marginal likelihood, and distributional transform.
coprimary Sample Size Calculation for Two Primary Time-to-Event Endpoints in Clinical Trials
Computes the required number of patients for two time-to-event endpoints as co-primary endpoints in a phase III clinical trial.
coRanking Co-Ranking Matrix
Calculates the co-ranking matrix to assess the quality of a dimensionality reduction.
Corbi Collection of Rudimentary Bioinformatics Tools
Provides a bundle of basic and fundamental bioinformatics tools, such as network querying and alignment.
cord Community Estimation in G-Models via CORD
Partition data points (variables) into communities/clusters, similar to clustering algorithms, such as k-means and hierarchical clustering. This package implements a clustering algorithm based on a new metric CORD, defined for high dimensional parametric or semi-parametric distributions. Read http://…/1508.01939 for more details.
CORE Cores of Recurrent Events
Given a collection of intervals with integer start and end positions, finds recurrently targeted regions and estimates the significance of the findings. Randomization is implemented by parallel methods, either using local host machines or submitting grid engine jobs.
corehunter Fast and Flexible Core Subset Selection
Interface to the Core Hunter software for core subset selection. Cores can be constructed based on genetic marker data, phenotypic traits, a precomputed distance matrix, or any combination of these. Various measures are included such as Modified Rogers’ distance and Shannon’s diversity index (for genotypes) and Gower’s distance (for phenotypes). Core Hunter can also optimize a weighted combination of multiple measures, to bring the different perspectives closer together.
CORElearn Classification, Regression and Feature Evaluation
This is a suite of machine learning algorithms written in C++ with R interface. It contains several machine learning model learning techniques in classification and regression, for example classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. It is especially strong in feature evaluation where it contains several variants of Relief algorithm and many impurity based attribute evaluation functions, e.g., Gini, information gain, MDL, DKM. These methods can be used for example to discretize numeric attributes. Its additional strength is OrdEval algorithm and its visualization used for evaluation of data sets with ordinal features and class enabling analysis according to the Kano model. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.
coreSim Core Functionality for Simulating Quantities of Interest from Generalised Linear Models
Core functions for simulating quantities of interest from generalised linear models (GLM). This package will form the backbone of a series of other packages that improve the interpretation of GLM estimates.
corkscrew Preprocessor for Data Modeling
Includes binning categorical variables into lesser number of categories based on t-test, converting categorical variables into continuous features using the mean of the response variable for the respective categories, understanding the relationship between the response variable and predictor variables using data transformations.
corlink Record Linkage, Incorporating Imputation for Missing Agreement Patterns, and Modeling Correlation Patterns Between Fields
A matrix of agreement patterns and counts for record pairs is the input for the procedure. An EM algorithm is used to impute plausible values for missing record pairs. A second EM algorithm, incorporating possible correlations between per-field agreement, is used to estimate posterior probabilities that each pair is a true match – i.e. constitutes the same individual.
CorporaCoCo Corpora Co-Occurrence Comparison
A set of functions used to compare co-occurrence between two corpora.
corr2D Implementation of 2D Correlation Analysis
Implementation of two-dimensional (2D) correlation analysis based on the Fourier-transformation approach described by I. Noda (1993) <DOI:10.1366/0003702934067694>. Additionally, there are two plot functions for the resulting correlation matrix: the first creates coloured 2D plots, while the second generates 3D plots.
correctedAUC Correcting AUC for Measurement Error
Correcting area under ROC (AUC) for measurement error based on probit-shift model.
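The uncorrected AUC that such a correction starts from can be computed via the rank-sum identity; a minimal base-R sketch (the probit-shift correction itself is not shown and this is not the package's interface):

```r
# AUC via the Wilcoxon rank-sum identity: probability that a random case
# scores higher than a random control.
auc <- function(score, label) {              # label: 1 = case, 0 = control
  r <- rank(score)
  n1 <- sum(label == 1); n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc(c(0.9, 0.8, 0.3, 0.2), c(1, 1, 0, 0))    # perfectly separated -> 1
```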
corregp Functions and Methods for Correspondence Regression
A collection of tools for correspondence regression, i.e. the correspondence analysis of the crosstabulation of a categorical variable Y as a function of another one X, where X can in turn be made up of the combination of various categorical variables. Consequently, correspondence regression can be used to analyze the effects for a polytomous or multinomial outcome variable.
corrr Correlations in R
A tool for exploring correlations. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualising the matrix in terms of the strength of the correlations.
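The routine tasks described (ignoring the diagonal, focusing on one variable, ordering by strength) can be sketched in base R; this is an illustration of the workflow, not corrr's own interface:

```r
# Build a correlation matrix, blank the diagonal, and focus on one variable.
m <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])
diag(m) <- NA                                    # ignore self-correlations
focus_mpg <- m["mpg", !colnames(m) %in% "mpg"]   # correlations of mpg vs the rest
# order the remaining variables by strength of association with mpg
focus_mpg[order(abs(focus_mpg), decreasing = TRUE)]
```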
CorrToolBox Modeling Correlational Magnitude Transformations in Discretization Contexts
Modeling the correlation transitions under specified distributional assumptions within the realm of discretization in the context of the latency and threshold concepts.
corset Arbitrary Bounding of Series and Time Series Objects
Set of methods to constrain numerical series and time series within arbitrary boundaries.
CosW The CosW Distribution
Density, distribution function, quantile function, random generation and survival function for the Cosine Weibull Distribution as defined by SOUZA, L. New Trigonometric Class of Probabilistic Distributions. 219 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2015 (available at <http://…obabilistic-distributions-602633.html> ) and BRITO, C. C. R. Method Distributions generator and Probability Distributions Classes. 241 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2014 (available upon request).
Counterfactual Estimation and Inference Methods for Counterfactual Analysis
Implements the estimation and inference methods for counterfactual analysis described in Chernozhukov, Fernandez-Val and Melly (2013) <DOI:10.3982/ECTA10582> ‘Inference on Counterfactual Distributions,’ Econometrica, 81(6). The counterfactual distributions considered are the result of changing either the marginal distribution of covariates related to the outcome variable of interest, or the conditional distribution of the outcome given the covariates. They can be applied to estimate quantile treatment effects and wage decompositions.
Countr Flexible Univariate and Bivariate Count Process Probability
Flexible univariate and bivariate count models based on the Weibull distribution. The models may include covariates and can be specified with familiar formula syntax.
COUSCOus A Residue-Residue Contact Detecting Method
Contact prediction using shrunk covariance (COUSCOus). COUSCOus is a residue-residue contact detecting method that approaches contact inference using the glassofast implementation of Matyas and Sustik (2012, The University of Texas at Austin UTCS Technical Report 2012:1-3. TR-12-29.), which solves the L_1-regularised Gaussian maximum likelihood estimation of the inverse of a covariance matrix. Prior to the inverse covariance matrix estimation, we utilise a covariance matrix shrinkage approach, the empirical Bayes covariance estimator, which was shown by Haff (1980) <DOI:10.1214/aos/1176345010> to be the best estimator in a Bayesian framework, especially dominating estimators of the form aS, such as the smoothed covariance estimator applied in the related contact inference technique PSICOV.
covafillr Local Polynomial Regression of State Dependent Covariates in State-Space Models
Facilitates local polynomial regression for state dependent covariates in state-space models. The functionality can also be used from ‘C++’ based model builder tools such as ‘Rcpp’/’inline’, ‘TMB’, or ‘JAGS’.
covmat Covariance Matrix Estimation
We implement a collection of techniques for estimating covariance matrices. Covariance matrices can be built using missing data. Stambaugh Estimation and FMMC methods can be used to construct such matrices. Covariance matrices can be built by denoising or shrinking the eigenvalues of a sample covariance matrix. Such techniques work by exploiting the tools in Random Matrix Theory to analyse the distribution of eigenvalues. Covariance matrices can also be built assuming that data has many underlying regimes. Each regime is allowed to follow a Dynamic Conditional Correlation model. Robust covariance matrices can be constructed by multivariate cleaning and smoothing of noisy data.
covr Test Coverage for Packages
Track and report code coverage for your package and (optionally) upload the results to a coverage service like Codecov or Coveralls. Code coverage is a measure of the amount of code being exercised by the tests. It is an indirect measure of test quality. This package is compatible with any testing methodology or framework and tracks coverage of both R code and compiled C/C++/Fortran code.
CovSelHigh Model-Free Covariate Selection in High Dimensions
Model-free selection of covariates in high dimensions under unconfoundedness for situations where the parameter of interest is an average causal effect. This package is based on model-free backward elimination algorithms proposed in de Luna, Waernbaum and Richardson (2011) <DOI:10.1093/biomet/asr041> and VanderWeele and Shpitser (2011) <DOI:10.1111/j.1541-0420.2011.01619.x>. Confounder selection can be performed via either Markov/Bayesian networks, random forests or LASSO.
cowplot Streamlined Plot Theme and Plot Annotations for ‘ggplot2’
Some helpful extensions and modifications to the ‘ggplot2’ library. In particular, this package makes it easy to combine multiple ‘ggplot2’ plots into one and label them with letters, e.g. A, B, C, etc., as is often required for scientific publications. The package also provides a streamlined and clean theme that is used in the Wilke lab, hence the package name, which stands for Claus O. Wilke’s plot library.
Coxnet Regularized Cox Model
Cox model regularized with net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalty. In addition, it efficiently solves an approximate L0 variable selection based on truncated likelihood function. Moreover, it can also handle the adaptive version of these regularization forms, such as adaptive lasso and net adjusting for signs of linked coefficients. The package uses one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of coefficients.
coxphMIC Sparse Estimation Method for Cox Proportional Hazards Models
Implements the sparse estimation method for Cox proportional hazards models via approximated information criterion (Su et al., 2016, Biometrics). The methodology is named MIC, which stands for ‘Minimizing approximated Information Criteria’. A reparameterization step is introduced to enforce sparsity while keeping the objective function smooth. As a result, MIC is computationally fast with superior performance in sparse estimation.
CoxPlus Cox Regression (Proportional Hazards Model) with Multiple Causes and Mixed Effects
A high performance package estimating the Proportional Hazards Model when an event can have more than one cause, including support for random and fixed effects, tied events, and time-varying variables.
CP Conditional Power Calculations
Functions for calculating the conditional power for different models in survival time analysis within randomized clinical trials with two different treatments to be compared and survival as an endpoint.
cpm Sequential and Batch Change Detection Using Parametric and Nonparametric Methods
Sequential and batch change detection for univariate data streams, using the change point model framework. Functions are provided to allow the parametric monitoring of sequences of Gaussian, Bernoulli and Exponential random variables, along with functions implementing more general nonparametric methods for monitoring sequences which have an unspecified or unknown distribution.
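The change point model idea can be illustrated in base R for the simplest case, a single mean shift in a batch of Gaussian data; this toy detector maximizes a Welch-type statistic over candidate split points and is not the cpm API:

```r
# Batch detection of a single mean shift by scanning all split points.
set.seed(42)
x <- c(rnorm(50, 0), rnorm(50, 2))           # mean shift after index 50
n <- length(x)
stat <- sapply(2:(n - 1), function(k) {
  a <- x[1:k]; b <- x[(k + 1):n]
  abs(mean(a) - mean(b)) / sqrt(var(a) / k + var(b) / (n - k))
})
tau <- which.max(stat) + 1                   # estimated change point (~50 here)
```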
cpr Control Polygon Reduction
Implementation of the Control Polygon Reduction and Control Net Reduction methods for finding parsimonious B-spline regression models.
CPsurv Nonparametric Change Point Estimation for Survival Data
Nonparametric change point estimation for survival data based on p-values of exact binomial tests.
cpt Classification Permutation Test
Non-parametric test for equality of multivariate distributions. Trains a classifier to classify (multivariate) observations as coming from one of two distributions. If the classifier is able to classify the observations better than would be expected by chance (using permutation inference), then the null hypothesis that the two distributions are equal is rejected.
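A hedged base-R sketch of this idea, using a simple nearest-centroid classifier (the cpt package's own choice of classifier and inference details may differ):

```r
# Train a nearest-centroid classifier, then compare its accuracy to a
# permutation null where the group labels are shuffled.
set.seed(1)
x <- rbind(matrix(rnorm(100, 0), 50), matrix(rnorm(100, 1.5), 50))
y <- rep(0:1, each = 50)
acc <- function(x, y) {
  c0 <- colMeans(x[y == 0, , drop = FALSE])
  c1 <- colMeans(x[y == 1, , drop = FALSE])
  d0 <- rowSums((x - matrix(c0, nrow(x), 2, byrow = TRUE))^2)
  d1 <- rowSums((x - matrix(c1, nrow(x), 2, byrow = TRUE))^2)
  mean((d1 < d0) == (y == 1))                # classification accuracy
}
obs <- acc(x, y)
null <- replicate(200, acc(x, sample(y)))    # permutation null distribution
p <- mean(null >= obs)                       # small p => distributions differ
```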
cpumemlog Monitor CPU and RAM usage of a process (and its children) is a Bash shell script that monitors CPU and RAM usage of a given process and its children. The main aim for writing this script was to get insight about the behaviour of a process and to spot bottlenecks without GUI tools, e.g., it is very useful to spot that the computationally intensive process on a remote server died due to hitting RAM limit or something of that sort. The statistics about CPU, RAM, and all that are gathered from the system utility ps. While the utility top can be used for this interactively, it is tedious to stare at its dynamic output and quite hard to spot consumption at the peak and follow the trends etc. Yet another similar utility is time, which though only gives consumption of resources at the peak. cpumemlogplot.R is a companion R script to used to summarize and plot the gathered data.
cqrReg Quantile, Composite Quantile Regression and Regularized Versions
Estimates quantile regression (QR) and composite quantile regression (CQR), with or without an adaptive lasso penalty, using interior point (IP), majorize-and-minimize (MM), coordinate descent (CD), and alternating direction method of multipliers (ADMM) algorithms.
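The objective these solvers minimize is the quantile check loss. A minimal base-R sketch of that formulation, using optim() purely for illustration (cqrReg's IP/MM/CD/ADMM solvers are not reproduced here):

```r
# Median regression (tau = 0.5) by directly minimizing the check loss rho_tau.
set.seed(2)
x <- runif(200); y <- 1 + 2 * x + rnorm(200)
tau <- 0.5
check_loss <- function(b) {
  r <- y - b[1] - b[2] * x
  sum(r * (tau - (r < 0)))                   # rho_tau(r): tau*r or (tau-1)*r
}
fit <- optim(c(0, 0), check_loss)
fit$par                                      # close to the true (1, 2)
```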
cquad Conditional Maximum Likelihood for Quadratic Exponential Models for Binary Panel Data
Estimation, based on conditional maximum likelihood, of the quadratic exponential model proposed by Bartolucci, F. & Nigro, V. (2010, Econometrica) and of a simplified and a modified version of this model. The quadratic exponential model is suitable for the analysis of binary longitudinal data when state dependence (further to the effect of the covariates and a time-fixed individual intercept) has to be taken into account. Therefore, this is an alternative to the dynamic logit model, having the advantage of easily allowing conditional inference in order to eliminate the individual intercepts and then obtain consistent estimates of the parameters of main interest (for the covariates and the lagged response). The simplified version of this model does not distinguish, as the original model does, between the last time occasion and the previous occasions. The modified version formulates the interaction terms in a different way and may be used to easily test state dependence, as shown in Bartolucci, F., Nigro, V. & Pigini, C. (2013, Econometric Reviews). The package also includes estimation of the dynamic logit model by a pseudo conditional estimator based on the quadratic exponential model, as proposed by Bartolucci, F. & Nigro, V. (2012, Journal of Econometrics).
crandatapkgs Find Data-Only Packages on CRAN
Provides a data.frame listing of known data-only and data-heavy packages available on CRAN.
cranlogs Download Logs from the RStudio CRAN Mirror
API to the database of CRAN package downloads from the RStudio CRAN mirror. See https://…/ for the raw API.
crisp Fits a Model that Partitions the Covariate Space into Blocks in a Data- Adaptive Way
Implements convex regression with interpretable sharp partitions (CRISP), which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://…/15-344.pdf>.
cronR Schedule R Scripts and Processes with the ‘cron’ Job Scheduler
Create, edit, and remove ‘cron’ jobs on your unix-alike system. The package provides a set of easy-to-use wrappers to ‘crontab’. It also provides an RStudio add-in to easily launch and schedule your scripts.
crop Graphics Cropping Tool
A device closing function which is able to crop graphics (e.g., PDF, PNG files) on Unix-like operating systems with the required underlying command-line tools installed.
CrossClustering A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters and Identification of Outliers
Implements a partial clustering algorithm that combines the Ward minimum variance and complete linkage algorithms, providing automatic estimation of a suitable number of clusters and identification of outlier elements.
crossdes Construction of Crossover Designs
Contains functions for the construction of carryover-balanced crossover designs. In addition, it contains functions to check given designs for balance.
Crossover Analysis and Search of Crossover Designs
Package Crossover provides different crossover designs from combinatorial or search algorithms as well as from the literature, and a GUI to access them.
crosstalk Inter-Widget Interactivity for HTML Widgets
Provides building blocks for allowing HTML widgets to communicate with each other, with Shiny or without (i.e. static .html files). Currently supports linked brushing and filtering.
crrp Penalized Variable Selection in Competing Risks Regression
In competing risks regression, the proportional subdistribution hazards(PSH) model is popular for its direct assessment of covariate effects on the cumulative incidence function. This package allows for penalized variable selection for the PSH model. Penalties include LASSO, SCAD, MCP, and their group versions.
crskdiag Diagnostics for Fine and Gray Model
Provides the implementation of analytical and graphical approaches for checking the assumptions of the Fine and Gray model.
crsnls Nonlinear Regression Parameters Estimation by ‘CRS4HC’ and ‘CRS4HCe’
Functions for nonlinear regression parameters estimation by algorithms based on Controlled Random Search algorithm. Both functions (crs4hc(), crs4hce()) adapt current search strategy by four heuristics competition. In addition, crs4hce() improves adaptability by adaptive stopping condition.
crtests Classification and Regression Tests
Provides wrapper functions for running classification and regression tests using different machine learning techniques, such as Random Forests and decision trees. The package provides standardized methods for preparing data to suit the algorithm’s needs, training a model, making predictions, and evaluating results. Also, some functions are provided to run multiple instances of a test.
CRTgeeDR Doubly Robust Inverse Probability Weighted Augmented GEE Estimator
Implements a semi-parametric GEE estimator accounting for missing data with Inverse-probability weighting (IPW) and for imbalance in covariates with augmentation (AUG). The estimator IPW-AUG-GEE is Doubly robust (DR).
crul HTTP Client
A simple HTTP client, with tools for making HTTP requests, and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby’s ‘faraday’ gem (<https://…/faraday> ). The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package ‘curl’, an interface to ‘libcurl’ (<https://…/libcurl> ).
crunch Data Tools
The Crunch service provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
CSeqpat Frequent Contiguous Sequential Pattern Mining of Text
Mines contiguous sequential patterns in text.
csn Closed Skew-Normal Distribution
Provides functions for computing the density and the log-likelihood function of closed-skew normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42.
csp Correlates of State Policy Data Set in R
Provides the Correlates of State Policy data set for easy use in R.
csrplus Methods to Test Hypotheses on the Distribution of Spatial Point Processes
Includes two functions to evaluate the hypothesis of complete spatial randomness (csr) in point processes. The function ‘mwin’ calculates quadrat counts to estimate the intensity of a spatial point process through the moving window approach proposed by Bailey and Gatrell (1995). Event counts are computed within a window of a set size over a fine lattice of points within the region of observation. The function ‘pielou’ uses the nearest neighbor test statistic and asymptotic distribution proposed by Pielou (1959) to compare the observed point process to one generated under csr. The value can be compared to that given by the more widely used test proposed by Clark and Evans (1954).
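The nearest-neighbour comparison mentioned above can be sketched in base R in the spirit of Clark and Evans (1954): compare the observed mean nearest-neighbour distance to its expectation under CSR (illustrative only; this is not the package's ‘pielou’ or ‘mwin’ function, and no edge correction is applied):

```r
# Clark-Evans style ratio: observed mean NN distance over its CSR expectation.
set.seed(3)
n <- 100
pts <- cbind(runif(n), runif(n))             # points in the unit square
d <- as.matrix(dist(pts)); diag(d) <- Inf
nn <- apply(d, 1, min)                       # nearest-neighbour distances
expected <- 1 / (2 * sqrt(n))                # E[NN distance] under CSR, area = 1
R <- mean(nn) / expected                     # ~1 under CSR; <1 clustered; >1 regular
```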
cssTools Cognitive Social Structure Tools
A collection of tools for estimating a network from a random sample of cognitive social structure (CSS) slices. Also contains functions for evaluating a CSS in terms of various error types observed in each slice.
cstab Selection of Number of Clusters via Normalized Clustering Instability
Selection of the number of clusters in cluster analysis using stability methods.
csv Read and Write CSV Files with Selected Conventions
Reads and writes CSV with selected conventions. Uses the same generic function for reading and writing to promote consistent formats.
cthreshER Continuous Threshold Expectile Regression
Estimation and inference methods for the continuous threshold expectile regression. It can fit the continuous threshold expectile regression and test for the existence of a change point, as described in Feipeng Zhang and Qunhua Li (2016), ‘A continuous threshold expectile regression’, submitted.
CTM A Text Mining Toolkit for Chinese Document
The CTM package is designed to solve text mining problems and is specific to Chinese documents.
ctmcd Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data
Functions for estimating Markov generator matrices from discrete-time observations. The implemented approaches comprise diagonal adjustment, weighted adjustment and quasi-optimization of matrix logarithm based candidate solutions, an expectation-maximization algorithm as well as a Gibbs sampler.
ctqr Censored and Truncated Quantile Regression
Estimation of quantile regression models for survival data.
ctsem Continuous Time Structural Equation Modelling
An easily accessible continuous (and discrete) time dynamic modelling package for panel and time series data, reliant upon the ‘OpenMx’ package for computation. Most dynamic modelling approaches to longitudinal data rely on the assumption that time intervals between observations are consistent. When this assumption is adhered to, the data gathering process is necessarily limited to a specific schedule, and when it is broken, the resulting parameter estimates may be biased and reduced in power. Continuous time models are conceptually similar to vector autoregressive models (and thus also to the latent change models popularised in a structural equation modelling context); however, by explicitly including the length of time between observations, continuous time models are freed from the assumption that measurement intervals are consistent. This allows data to be gathered irregularly; the elimination of noise and bias due to varying measurement intervals; and parsimonious structures for complex dynamics. The application of such a model in this SEM framework allows full-information maximum-likelihood estimates for both N = 1 and N > 1 cases, multiple measured indicators per latent process, and the flexibility to incorporate additional elements, including individual heterogeneity in the latent process and manifest intercepts, and time-dependent and time-independent exogenous covariates. Furthermore, due to the SEM implementation we are able to estimate a random effects model where the impact of time-dependent and time-independent predictors can be assessed simultaneously, but without the classic problem of random effects models assuming no covariance between unit-level effects and predictors.
ctsmr Continuous Time Stochastic Modelling for R
CTSM is a tool for estimating embedded parameters in a continuous time stochastic state space model. CTSM has been developed at DTU Compute (formerly DTU Informatics) over several years. CTSM-R provides a new scripting interface through the statistical language R. Mixing CTSM with R provides easy access to the data handling and plotting tools required in any kind of modelling.
CTTShiny Classical Test Theory via Shiny
Interactive shiny application for running classical test theory (item analysis).
CUB A Class of Mixture Models for Ordinal Data
Estimating and testing models for ordinal data within the family of CUB models and their extensions (where CUB stands for Combination of a discrete Uniform and a shifted Binomial distributions).
Cubist Rule- and Instance-Based Regression Modeling
Regression modeling using rules with added instance-based corrections.
CuCubes MultiDimensional Feature Selection (MDFS)
Functions for MultiDimensional Feature Selection (MDFS): * calculating multidimensional information gains, * finding interesting tuples for chosen variables, * scoring variables, * finding important variables, * plotting selection results. CuCubes is also known as CUDA Cubes and it is a library that allows fast CUDA-accelerated computation of information gains in binary classification problems. This package wraps CuCubes and provides an alternative CPU version as well as helper functions for building MultiDimensional Feature Selectors.
CUFF Charles’s Utility Function using Formula
Utility functions that provide wrappers to descriptive base functions like correlation, mean, and table. It makes use of the formula interface to pass variables to functions. It also provides operators such as %+% to concatenate, and tools to repeat and manage character vectors for nice display.
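A base-R sketch of a concatenation operator like the %+% mentioned above (CUFF's own definition may differ in details):

```r
# Define an infix string-concatenation operator.
`%+%` <- function(a, b) paste0(a, b)
"cor = " %+% round(cor(mtcars$mpg, mtcars$wt), 2)   # "cor = -0.87"
```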
curl A Modern and Flexible Web Client for R
The curl() and curl_download() functions provide highly configurable drop-in replacements for base url() and download.file() with better performance, support for encryption (https://, ftps://), ‘gzip’ compression, authentication, and other ‘libcurl’ goodies. The core of the package implements a framework for performing fully customized requests where data can be processed either in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of ‘libcurl’ is recommended; for a more-user-friendly web client see the ‘httr’ package which builds on this package with HTTP specific tools and logic.
curlconverter Tools to Transform ‘cURL’ Command-Line Calls to ‘httr’ Requests
Deciphering web/’REST’ ‘API’ and ‘XHR’ calls can be tricky, which is one reason why internet browsers provide ‘Copy as cURL’ functionality within their ‘Developer Tools’ pane(s). These ‘cURL’ command-lines can be difficult to wrangle into an ‘httr’ ‘GET’ or ‘POST’ request, but you can now ‘straighten’ these ‘cURLs’ either from data copied to the system clipboard or by passing in a vector of ‘cURL’ command-lines and getting back a list of parameter elements which can be used to form ‘httr’ requests. You can also make a complete/working/callable ‘httr::VERB’ function right from the tools provided.
curry Partial Function Application with %<%, %-<%
Partial application is the process of reducing the arity of a function by fixing one or more arguments, thus creating a new function lacking the fixed arguments. The curry package provides three different ways of performing partial function application by fixing arguments from either end of the argument list (currying and tail currying) or by fixing multiple named arguments (partial application). This package provides this functionality through the %<%, %-<%, and %><% operators, which allow for a programming style comparable to modern functional languages. Compared to other implementations such as purrr::partial(), the operators in curry compose functions with named arguments, aiding in autocomplete, etc.
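What such an operator does can be sketched in a few lines of base R; the `%partial%` operator below is a hypothetical stand-in, not one of curry's own operators:

```r
# Fix the first argument of a function, returning a function of the rest.
`%partial%` <- function(f, arg1) function(...) f(arg1, ...)
add3 <- function(a, b, c) a + b + c
add_1 <- add3 %partial% 1     # arity reduced from 3 to 2
add_1(2, 3)                   # add3(1, 2, 3) -> 6
```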
customizedTraining Customized Training for Lasso and Elastic-Net Regularized Generalized Linear Models
Customized training is a simple technique for transductive learning, when the test covariates are known at the time of training. The method identifies a subset of the training set to serve as the training set for each of a few identified subsets in the test set. This package implements customized training for the glmnet() and cv.glmnet() functions.
CUSUMdesign Compute Decision Interval and Average Run Length for CUSUM Charts
Computation of decision intervals (H) and average run lengths (ARL) for CUSUM charts.
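The decision interval H and the chart it parameterizes can be illustrated with a hand-rolled tabular CUSUM for detecting a mean increase (a toy sketch, not CUSUMdesign's ARL computations):

```r
# One-sided tabular CUSUM: S_t = max(0, S_{t-1} + x_t - k),
# signalling the first time S_t exceeds the decision interval H.
cusum <- function(x, k = 0.5, H = 4) {
  s <- 0
  for (t in seq_along(x)) {
    s <- max(0, s + x[t] - k)
    if (s > H) return(t)                     # first index where the chart signals
  }
  NA                                         # no signal
}
set.seed(4)
x <- c(rnorm(30), rnorm(30, mean = 2))       # in-control, then an upward shift
cusum(x)                                     # signals shortly after the shift
```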
cvequality Tests for the Equality of Coefficients of Variation from Multiple Groups
Contains functions for testing for significant differences between multiple coefficients of variation. Includes Feltz and Miller’s (1996) <DOI:10.1002/(SICI)1097-0258(19960330)15:6%3C647::AID-SIM184%3E3.0.CO;2-P> asymptotic test and Krishnamoorthy and Lee’s (2014) <DOI:10.1007/s00180-013-0445-2> modified signed-likelihood ratio test. See the vignette for more, including full details of citations.
CVR Canonical Variate Regression
Perform canonical variate regression (CVR) for two sets of covariates and a univariate response, with regularization and weight parameters tuned by cross validation.
cvxbiclustr Convex Biclustering Algorithm
An iterative algorithm for solving a convex formulation of the biclustering problem.
cyclocomp Cyclomatic Complexity of R Code
Cyclomatic complexity is a software metric (measurement), used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program’s source code. It was developed by Thomas J. McCabe, Sr. in 1976.
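The metric itself is M = E - N + 2P for a control-flow graph with E edges, N nodes, and P connected components; a toy calculation with made-up graph sizes (not cyclocomp's code analysis):

```r
# McCabe's cyclomatic complexity from control-flow graph counts.
cyclomatic <- function(edges, nodes, components = 1) {
  edges - nodes + 2 * components
}
cyclomatic(9, 8)   # e.g. a function whose graph has 9 edges, 8 nodes -> 3
```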
Cyclops Cyclic Coordinate Descent for Logistic, Poisson and Survival Analysis
This model fitting tool incorporates cyclic coordinate descent and majorization-minimization approaches to fit a variety of regression models found in large-scale observational healthcare data. Implementations focus on computational optimization and fine-scale parallelization to yield efficient inference in massive datasets.


d3heatmap A D3.js-based heatmap htmlwidget for R
This is an R package that implements a heatmap htmlwidget. It has the following features:
• Highlight rows/columns by clicking axis labels
• Click and drag over colormap to zoom in (click on colormap to zoom out)
• Optional clustering and dendrograms, courtesy of base::heatmap
D3M Two Sample Test with Wasserstein Metric
Two sample test based on Wasserstein metric. This is motivated from detection of differential DNA-methylation sites based on underlying distributions.
D3partitionR Plotting D3 Hierarchical Plots in R and Shiny
Plotting hierarchical plots in R such as Sunburst, Treemap, Circle Treemap and Partition Chart.
d3r ‘d3.js’ Utilities for R
Helper functions for using ‘d3.js’ in R.
dad Three-Way Data Analysis Through Densities
The three-way data consists of a set of variables measured on several groups of individuals. To each group is associated an estimated probability density function. The package provides functional methods (principal component analysis, multidimensional scaling, discriminant analysis…) for such probability densities.
daff Diff, Patch and Merge for Data.frames
Diff, patch and merge for data frames. Document changes in data sets and use them to apply patches. Changes to data can be made visible by using render_diff. Daff uses the V8 package to wrap the ‘daff.js’ javascript library which is included in the package. Daff exposes a subset of ‘daff.js’ functionality, tailored for usage within R.
dagitty Graphical Analysis of Structural Causal Models
A port of the web-based software “DAGitty” for analyzing structural causal models (also known as directed acyclic graphs or DAGs). The package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.
dashboard Interactive Data Visualization with D3.js
The dashboard package allows users to create web pages which display interactive data visualizations working in a standard modern browser. It displays them locally using the Rook server. Neither knowledge of web technologies nor an Internet connection is required. D3.js is a JavaScript library for manipulating documents based on data. D3 helps the dashboard package bring data to life using HTML, SVG and CSS.
dat Tools for Data Manipulation
An implementation of common higher-order functions with syntactic sugar for anonymous functions. Also provides a link to ‘dplyr’ for common transformations on data frames, to work around non-standard evaluation by default.
data.table Extension of data.frame
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread). Offers a natural and flexible syntax, for faster development.
data.tree Hierarchical Data Structures
Create tree structures from hierarchical data, and use the utility methods to traverse the tree in various orders. Aggregate, print, convert to and from data.frame, and apply functions to your tree data. Useful for decision trees, machine learning, finance, and many other applications.
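The traversal idea can be sketched in base R with nested lists (data.tree's Node objects are much richer than this):

```r
# Represent a tree as nested lists and collect node names depth-first.
tree <- list(name = "root", children = list(
  list(name = "a", children = list(list(name = "a1", children = list()))),
  list(name = "b", children = list())
))
traverse <- function(node) {
  c(node$name, unlist(lapply(node$children, traverse)))
}
traverse(tree)   # "root" "a" "a1" "b"
```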
datacheckr Data Frame Column Name, Class and Value Checking
The primary function check_data() checks a data frame for column presence, column class and column values. If the user-defined conditions are met, the function returns an invisible copy of the original data frame; otherwise it throws an informative error.
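A hedged sketch of that pattern, validate columns then return the data invisibly, with a hypothetical helper rather than datacheckr's actual check_data() signature:

```r
# Check that named columns exist and have the expected classes;
# stop with an informative error otherwise, else return the data invisibly.
check_columns <- function(data, classes) {
  absent <- setdiff(names(classes), names(data))
  if (length(absent))
    stop("missing columns: ", paste(absent, collapse = ", "))
  for (nm in names(classes))
    if (!inherits(data[[nm]], classes[[nm]]))
      stop("column '", nm, "' is not of class ", classes[[nm]])
  invisible(data)
}
ok <- check_columns(mtcars, list(mpg = "numeric", cyl = "numeric"))
```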
DataClean Data Cleaning
Includes functions that researchers or practitioners may use to clean raw data, converting html, xlsx, and txt data files into other formats. It can also be used to manipulate text variables, extract numeric variables from text variables, and perform other variable cleaning processes. It originated from the author’s project focusing on creative performance in online education environments; the resulting paper of that study will be published soon.
datadr Divide and Recombine for Large, Complex Data
Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. Comes with a generic MapReduce interface as well. Works with key-value pairs stored in memory, on local disk, or on HDFS, in the latter case using the R and Hadoop Integrated Programming Environment (RHIPE).
DataEntry Make it Easier to Enter Questionnaire Data
This is a GUI application for defining attributes and setting valid values of variables, and then, entering questionnaire data in a data.frame.
DataExplorer Data Explorer
Data exploration process for data analysis and model building, so that users could focus on understanding data and extracting insights. The package automatically scans through each variable and does data profiling. Typical graphical techniques will be performed for both discrete and continuous features.
datafsm Estimating Finite State Machine Models from Data
Our method automatically generates models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data, and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
DataLoader Import Multiple File Types
Functions to import multiple files of multiple data file types (‘.xlsx’, ‘.xls’, ‘.csv’, ‘.txt’) from a given directory into R data frames.
dataMaid A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Cleaning Process
Data cleaning is an important first step of any statistical analysis. dataMaid provides an extendable suite of tests for common potential errors in a dataset. It produces a document with a thorough summary of the checks and the results that a human can use to identify possible errors.
datapack A Flexible Container to Transport and Manipulate Data and Associated Resources
Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated metadata and ancillary files. Individual data objects have associated system-level metadata, and data files are linked together using the OAI-ORE standard resource map, which describes the relationships between the files. The OAI-ORE standard is described at <https://…/ore>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://…/draft-kunze-bagit-08>.
datarobot DataRobot Predictive Modeling API
For working with the DataRobot predictive modeling platform’s API.
datasets.load Interface for Loading Datasets
Visual interface for loading datasets in RStudio from all installed (unloaded) packages.
Datasmith Tools to Complete Euclidean Distance Matrices
Implements several algorithms for Euclidean distance matrix completion, Sensor Network Localization, and sparse Euclidean distance matrix completion using the minimum spanning tree.
datastepr An Implementation of a SAS-Style Data Step
Based on a SAS data step. This allows for row-wise dynamic building of data, iteratively importing slices of existing dataframes, conducting analyses, and exporting to a results frame. This is particularly useful for differential or time-series analyses, which are often not well suited to vector-based operations.
dawai Discriminant Analysis with Additional Information
In applications it is usual that some additional information is available. This package, dawai (an acronym for Discriminant Analysis With Additional Information), performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the population means. It also computes several estimates of the true error rate.
dbfaker A Tool to Ensure the Validity of Database Writes
A tool to ensure the validity of database writes. It provides a set of utilities to analyze and type check the properties of data frames that are to be written to databases with SQL support.
dbscan Density Based Clustering of Applications with Noise (DBSCAN)
A fast reimplementation of the DBSCAN clustering algorithm using the kd-tree data structure for speedup.
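A brief sketch of the `dbscan()` interface on synthetic data: two tight blobs become clusters 1 and 2, and a far-away point is labeled 0 (noise).

```r
library(dbscan)

# Two well-separated blobs plus one distant outlier.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.2), ncol = 2),
           matrix(rnorm(100, mean = 5, sd = 0.2), ncol = 2),
           c(50, 50))
cl <- dbscan(x, eps = 1, minPts = 5)  # cluster labels in cl$cluster
```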
DClusterm Model-Based Detection of Disease Clusters
Model-based methods for the detection of disease clusters using GLMs, GLMMs and zero-inflated models.
DCM Data Converter Module
Data Converter Module (DCM) converts a dataset between split and stacked formats, in both directions.
dCovTS Distance Covariance and Correlation for Time Series Analysis
Computing and plotting the distance covariance and correlation function of a univariate or a multivariate time series. Test statistics for testing pairwise independence are also implemented. Some data sets are also included.
ddpcr Analysis and Visualization of Droplet Digital PCR in R and on the Web
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing duplex ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.
ddR Distributed Data Structures in R
Provides distributed data structures and simplifies distributed computing in R.
DDRTree Learning Principal Graphs with DDRTree
Project data into a reduced dimensional space and construct a principal graph from the reduced dimension.
deadband Statistical Deadband Algorithms Comparison
Statistical deadband algorithms are based on the Send-On-Delta concept as in Miskowicz (2006, <doi:10.3390/s6010049>). A collection of functions to compare the effectiveness and fidelity of signals sampled with statistical deadband algorithms.
debugme Debug R Packages
Specify debug messages as special string constants, and control debugging of packages via environment variables.
decision Statistical Decision Analysis
Contains a function called dmur() which accepts four parameters like possible values, probabilities of the values, selling cost and preparation cost. The dmur() function generates various numeric decision parameters like MEMV (Maximum (optimum) expected monitory value), best choice, EPPI (Expected profit with perfect information), EVPI (Expected value of the perfect information), EOL (Expected opportunity loss), which facilitate effective decision-making.
DecisionCurve Calculate and Plot Decision Curves
Decision curves are a useful tool to evaluate the population impact of adopting a risk prediction instrument into clinical practice. Given one or more instruments (risk models) that predict the probability of a binary outcome, this package calculates and plots decision curves, which display estimates of the standardized net benefit by the probability threshold used to categorize observations as ‘high risk.’ Curves can be estimated using data from an observational cohort, or from case-control studies when an estimate of the population outcome prevalence is available. Confidence intervals calculated using the bootstrap can be displayed and a wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.
decisionSupport Quantitative Support of Decision Making under Uncertainty
Supporting the quantitative analysis of binary welfare based decision making processes using Monte Carlo simulations. Decision support is given on two levels: (i) The actual decision level is to choose between two alternatives under probabilistic uncertainty. This package calculates the optimal decision based on maximizing expected welfare. (ii) The meta decision level is to allocate resources to reduce the uncertainty in the underlying decision problem, i.e to increase the current information to improve the actual decision making process. This problem is dealt with using the Value of Information Analysis. The Expected Value of Information for arbitrary prospective estimates can be calculated as well as Individual and Clustered Expected Value of Perfect Information. The probabilistic calculations are done via Monte Carlo simulations. This Monte Carlo functionality can be used on its own.
decoder Decode Coded Variables to Plain Text (and Vice Versa)
Main function ‘decode’ is used to decode coded key values to plain text. Function ‘code’ can be used to code plain text if there is a 1:1 relation between the two. The concept relies on ‘keyvalue’ objects used for translation. Several ‘keyvalue’ objects are included, covering geographical regional codes, administrative health care unit codes, diagnosis codes et cetera, but it is also easy to extend the use to arbitrary code sets.
deconvolveR Empirical Bayes Estimation Strategies
Empirical Bayes methods for learning prior distributions from data. An unknown prior distribution (g) has yielded (unobservable) parameters, each of which produces a data point from a parametric exponential family (f). The goal is to estimate the unknown prior (‘g-modeling’) by deconvolution and Empirical Bayes methods.
DecorateR Fit and Deploy DECORATE Trees
DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) builds an ensemble of J48 trees by recursively adding artificial samples of the training data (‘Melville, P., & Mooney, R. J. (2005). Creating diversity in ensembles using artificial data. Information Fusion, 6(1), 99-111. <doi:10.1016/j.inffus.2004.04.001>’).
deductive Data Correction and Imputation Using Deductive Methods
Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.
deepboost Deep Boosting Ensemble Modeling
Provides deep boosting models training, evaluation, predicting and hyper parameter optimising using grid search and cross validation. Based on Google’s Deep Boosting algorithm, and Google’s C++ implementation. Cortes, C., Mohri, M., & Syed, U. (2014) <URL: http://…/icml2014c2_cortesb14>.
deeplearning An Implementation of Deep Neural Network for Regression and Classification
An implementation of deep neural networks with rectifier linear units trained with the stochastic gradient descent method and batch normalization. A combination of these methods has achieved state-of-the-art performance in ImageNet classification by overcoming the gradient saturation problem experienced by many deep-architecture neural network models in the past. In addition, batch normalization and dropout are implemented as a means of regularization. The deeplearning package is inspired by the darch package and uses its class DArch.
deepnet Deep Learning Toolkit in R
Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoders and so on.
deformula Integration of One-Dimensional Functions with Double Exponential Formulas
Numerical quadrature of functions of one variable over a finite or infinite interval with double exponential formulas.
delt Estimation of Multivariate Densities Using Adaptive Partitions
Implements methods for estimating multivariate densities, including a discretized kernel estimator, adaptive histograms (a greedy histogram and a CART-histogram), stagewise minimization, and bootstrap aggregation.
deming Deming, Theil-Sen and Passing-Bablok Regression
Generalized Deming regression, Theil-Sen regression and Passing-Bablok regression functions.
dendextend Extending R’s Dendrogram Functionality
Offers a set of functions for extending dendrogram objects in R, letting you visualize and compare trees of hierarchical clusterings. You can (1) adjust a tree’s graphical parameters – the color, size, type, etc. of its branches, nodes and labels; and (2) visually and statistically compare different dendrograms to one another.
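The two capabilities can be sketched as follows, assuming the `set()` interface for graphical parameters and `cor_cophenetic()` for statistical comparison:

```r
library(dendextend)

# Two hierarchical clusterings of the same data.
dend1 <- as.dendrogram(hclust(dist(USArrests), method = "complete"))
dend2 <- as.dendrogram(hclust(dist(USArrests), method = "average"))

# (1) adjust graphical parameters: color branches by 3-cluster cut
dend1 <- set(dend1, "branches_k_color", k = 3)

# (2) statistically compare the two trees
sim <- cor_cophenetic(dend1, dend2)  # cophenetic correlation in [-1, 1]
```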
denoiseR Regularized low rank matrix estimation
Methods for regularized low-rank matrix estimation.
denseFLMM Functional Linear Mixed Models for Densely Sampled Data
Estimation of functional linear mixed models for densely sampled data based on functional principal component analysis.
densityClust Clustering by fast search and find of density peaks
An implementation of the clustering algorithm described by Alex Rodriguez and Alessandro Laio (Science, 2014 vol. 344), along with tools to inspect and visualize the results.
DensParcorr Dens-Based Method for Partial Correlation Estimation in Large Scale Brain Networks
Provide a Dens-based method for estimating functional connection in large scale brain networks using partial correlation.
densratio Density Ratio Estimation
Density ratio estimation. The estimated density ratio function can be used in many applications, such as inlier-based outlier detection, covariate shift adaptation, etc.
DEoptim Global Optimization by Differential Evolution
Implements the differential evolution algorithm for global optimization of a real-valued function of a real-valued parameter vector.
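A minimal sketch of the `DEoptim()` call on a simple test function whose global minimum is known to be at the origin:

```r
library(DEoptim)

# Minimize the 2-D sphere function over the box [-5, 5]^2.
sphere <- function(par) sum(par^2)
set.seed(42)
fit <- DEoptim(sphere, lower = c(-5, -5), upper = c(5, 5),
               control = DEoptim.control(itermax = 100, trace = FALSE))
best <- fit$optim$bestmem   # best parameter vector found
```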
depmixS4 Dependent Mixture Models – Hidden Markov Models of GLMs and Other Distributions in S4
Fit latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models.
depth.plot Multivariate Analogy of Quantiles
Can be used to obtain spatial depths, spatial ranks and outliers of multivariate random variables, and to visualize DD-plots (a multivariate generalization of QQ-plots).
dequer An R ‘Deque’ Container
Offers a special data structure called a ‘deque’ (pronounced like ‘deck’), which is a list-like structure. However, unlike R’s list structure, data put into a ‘deque’ is not necessarily stored contiguously, making insertions and deletions at the front/end of the structure much faster. The implementation here is new and uses a doubly linked list, and hence does not rely on R’s environments. To avoid unnecessary data copying, most ‘deque’ operations are performed via side-effects.
desc Manipulate DESCRIPTION Files
Tools to read, write, create, and manipulate DESCRIPTION files. It is intended for packages that create or manipulate other packages.
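A brief sketch, assuming the R6 `description` class documented for the package (`"!new"` creates a fresh template):

```r
library(desc)

# Create a DESCRIPTION from a new-package template, set fields,
# and read one back.
d <- description$new("!new")
d$set(Package = "mypkg", Title = "An Example Package")
pkg <- d$get("Package")
```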
describer Describe Data in R Using Common Descriptive Statistics
Allows users to quickly and easily describe data using common descriptive statistics.
descriptr Descriptive Statistics & Distributions Exploration
Generate descriptive statistics such as measures of location, dispersion, frequency tables, cross tables, group summaries and multiple one/two way tables. Visualize and compute percentiles/probabilities of normal, t, f, chi square and binomial distributions.
designGLMM Finding Optimal Block Designs for a Generalised Linear Mixed Model
Use simulated annealing to find optimal designs for Poisson regression models with blocks.
deSolve General Solvers for Initial Value Problems of Ordinary Differential Equations (ODE), Partial Differential Equations (PDE), Differential Algebraic Equations (DAE), and Delay Differential Equations (DDE)
Functions that solve initial value problems of a system of first-order ordinary differential equations (ODE), of partial differential equations (PDE), of differential algebraic equations (DAE), and of delay differential equations. The functions provide an interface to the FORTRAN functions lsoda, lsodar, lsode, lsodes of the ODEPACK collection, to the FORTRAN functions dvode and daspk and a C-implementation of solvers of the Runge-Kutta family with fixed or variable time steps. The package contains routines designed for solving ODEs resulting from 1-D, 2-D and 3-D partial differential equations (PDE) that have been converted to ODEs by numerical differencing.
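A minimal initial-value problem illustrating the `ode()` interface — logistic growth dN/dt = r·N·(1 − N/K), for which the closed-form solution is known:

```r
library(deSolve)

# Derivative function: returns a list whose first element is dN/dt.
logistic <- function(t, y, parms) {
  with(as.list(c(y, parms)), list(r * N * (1 - N / K)))
}
out <- ode(y = c(N = 1), times = seq(0, 20, by = 0.1),
           func = logistic, parms = c(r = 0.5, K = 100))
```

The returned matrix has one column per state variable plus the time column; here `N` approaches the carrying capacity K = 100.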
DESP Estimation of Diagonal Elements of Sparse Precision-Matrices
Several estimators of the diagonal elements of a sparse precision (inverse covariance) matrix from a sample of Gaussian vectors for a given matrix of estimated marginal regression coefficients. To install package ‘gurobi’, instructions at http://…/gurobi-optimizer and http://…/r_api_overview.html.
desplot Plotting Field Plans for Agricultural Experiments
A function for plotting maps of agricultural field experiments that are laid out in grids.
detector Detect Data Containing Personally Identifiable Information
Allows users to quickly and easily detect data containing Personally Identifiable Information (PII) through convenience functions.
DetMCD DetMCD Algorithm (Robust and Deterministic Estimation of Location and Scatter)
DetMCD is a new algorithm for robust and deterministic estimation of location and scatter. The benefits of robust and deterministic estimation are explained in Hubert, M., Rousseeuw, P.J. and Verdonck, T. (2012),’A deterministic algorithm for robust location and scatter’, Journal of Computational and Graphical Statistics, Volume 21, Number 3, Pages 618-637.
DetR Suite of Deterministic and Robust Algorithms for Linear Regression
DetLTS, DetMM (and DetS) Algorithms for Deterministic, Robust Linear Regression.
devEMF EMF Graphics Output Device
Output graphics to EMF (enhanced metafile).
devtools Tools to Make Developing R Packages Easier
Collection of package development tools.
dfphase1 Phase I Control Charts (with Emphasis on Distribution-Free Methods)
Statistical methods for retrospectively detecting changes in location and/or dispersion of univariate and multivariate variables. Data can be individual (one observation at each instant of time) or subgrouped (more than one observation at each instant of time). Control limits are computed, often using a permutation approach, so that a prescribed false alarm probability is guaranteed without making any parametric assumptions on the stable (in-control) distribution.
dga Capture-Recapture Estimation using Bayesian Model Averaging
Performs Bayesian model averaging for capture-recapture. This includes code to stratify records, check the strata for suitable overlap to be used for capture-recapture, and some functions to plot the estimated population size.
dGAselID Genetic Algorithm with Incomplete Dominance for Feature Selection
Feature selection from high dimensional data using a diploid genetic algorithm with Incomplete Dominance for genotype to phenotype mapping and Random Assortment of chromosomes approach to recombination.
dggridR Discrete Global Grids for R
Spatial analyses involving binning require that every bin have the same area, but this is impossible using a rectangular grid laid over the Earth or over any projection of the Earth. Discrete global grids use hexagons, triangles, and diamonds to overcome this issue, overlaying the Earth with equally-sized bins. This package provides utilities for working with discrete global grids, along with utilities to aid in plotting such data.
DHARMa Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models
The ‘DHARMa’ package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals from fitted generalized linear mixed models. Currently supported are ‘lme4’, ‘glm’ (except quasi-distributions) and ‘lm’ model classes. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and spatial and temporal autocorrelation.
dHSIC Independence Testing via Hilbert Schmidt Independence Criterion
Contains an implementation of the d-variable Hilbert Schmidt independence criterion and several hypothesis tests based on it.
diagis Diagnostic Plot and Multivariate Summary Statistics of Weighted Samples from Importance Sampling
Fast functions for effective sample size, weighted multivariate mean and variance computation, and weight diagnostic plot for generic importance sampling type results.
diagonals Block Diagonal Extraction or Replacement
Several tools for handling block-matrix diagonals and similar constructs are implemented. Block-diagonal matrices can be extracted or removed using two small functions implemented here. In addition, non-square matrices are supported. Block diagonal matrices occur when two dimensions of a data set are combined along one edge of a matrix. For example, trade-flow data in the ‘decompr’ and ‘gvc’ packages have each country-industry combination occur along both edges of the matrix.
DiagrammeR Create diagrams and flowcharts using R
Create diagrams and flowcharts using R.
DiallelAnalysisR Diallel Analysis with R
Performs Diallel Analysis with R using Griffing’s and Hayman’s approaches. Four different methods (1: Method-I (Parents + F1’s + reciprocals); 2: Method-II (Parents and one set of F1’s); 3: Method-III (One set of F1’s and reciprocals); 4: Method-IV (One set of F1’s only)) and two methods (1: Fixed Effects Model; 2: Random Effects Model) can be applied using Griffing’s approach.
dichromat Color Schemes for Dichromats
Collapse red-green or green-blue distinctions to simulate the effects of different types of color-blindness.
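A short sketch of the `dichromat()` call, which maps a vector of colors to their simulated appearance under a chosen deficiency:

```r
library(dichromat)

# Simulate deuteranopia for a red-green pair; the two colors
# become much harder to tell apart.
pal    <- c("#FF0000", "#00FF00")
deutan <- dichromat(pal, type = "deutan")  # also "protan", "tritan"
```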
DidacticBoost A Simple Implementation and Demonstration of Gradient Boosting
A basic, clear implementation of tree-based gradient boosting designed to illustrate the core operation of boosting models. Tuning parameters (such as stochastic subsampling, modified learning rate, or regularization) are not implemented. The only adjustable parameter is the number of training rounds. If you are looking for a high performance boosting implementation with tuning parameters, consider the ‘xgboost’ package.
diezeit R Interface to the ZEIT ONLINE Content API
A wrapper for the ZEIT ONLINE Content API, available at <>. ‘diezeit’ gives access to articles and corresponding metadata from the ZEIT archive and from ZEIT ONLINE. A personal API key is required for usage.
DIFboost Detection of Differential Item Functioning (DIF) in Rasch Models by Boosting Techniques
Performs detection of Differential Item Functioning using the method DIFboost as proposed in Schauberger and Tutz (2015): Detection of Differential item functioning in Rasch models by boosting techniques, British Journal of Mathematical and Statistical Psychology.
Difdtl Difference of Two Precision Matrices Estimation
Difference of two precision matrices is estimated by the d-trace loss with lasso penalty, given two sample classes.
diffobj Diffs for R Objects
Generate a colorized diff of two R objects for an intuitive visualization of their differences.
diffrprojects Projects for Text Version Comparison and Analytics in R
Provides data structures and methods for measuring, coding, and analysing text within text corpora. The package allows for manual as well as computer-aided coding at the character, token and text-pair level.
diffrprojectswidget Visualization for ‘diffrprojects’
Interactive visualizations and tabulations for diffrprojects. All presentations are based on the htmlwidgets framework allowing for interactivity via HTML and Javascript, Rstudio viewer integration, RMarkdown integration, as well as Shiny compatibility.
diffusr Network Diffusion Algorithms
Implementation of network diffusion algorithms such as insulated heat propagation or Markov random walks. Network diffusion algorithms generally spread information in the form of node weights along the edges of a graph to other nodes. These weights can for example be interpreted as temperature, an initial amount of water, the activation of neurons in the brain, or the location of a random surfer on the internet. The information (node weights) is iteratively propagated to other nodes until an equilibrium state or stop criterion is reached.
difNLR Detection of Dichotomous Differential Item Functioning (DIF) by Non-Linear Regression Function
Detection of differential item functioning among dichotomously scored items with non-linear regression procedure.
difR Collection of methods to detect dichotomous differential item functioning (DIF) in psychometrics
The difR package contains several traditional methods to detect DIF in dichotomously scored items. Both uniform and non-uniform DIF effects can be detected, with methods relying upon item response models or not. Some methods deal with more than one focal group.
digest Create Cryptographic Hash Digests of R Objects
Implementation of a function ‘digest()’ for the creation of hash digests of arbitrary R objects (using the md5, sha-1, sha-256, crc32, xxhash and murmurhash algorithms) permitting easy comparison of R language objects, as well as a function ‘hmac()’ to create hash-based message authentication codes. The md5 algorithm by Ron Rivest is specified in RFC 1321, the sha-1 and sha-256 algorithms are specified in FIPS-180-1 and FIPS-180-2, and the crc32 algorithm is described at <>. For md5, sha-1, sha-256 and aes, this package uses small standalone implementations that were provided by Christophe Devine. For crc32, code from the zlib library is used. For sha-512, an implementation by Aaron D. Gifford is used. For xxHash, the implementation by Yann Collet is used. For murmurhash, an implementation by Shane Day is used. Please note that this package is not meant to be deployed for cryptographic purposes, for which more comprehensive (and widely tested) libraries such as OpenSSL should be used.
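A brief sketch of both functions. Note that `digest()` hashes R's serialization of an object by default; `serialize = FALSE` hashes the raw bytes of a string instead, so the result matches the standard md5 of that string.

```r
library(digest)

# md5 of the literal bytes "abc" (not of the serialized R object)
h <- digest("abc", algo = "md5", serialize = FALSE)

# keyed message authentication code (key and message are made up here)
mac <- hmac("secret-key", "message", algo = "sha1")
```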
digitize Use Data from Published Plots in R
Import data from a digital image; it requires user input for calibration and to locate the data points. The end result is similar to ‘DataThief’ and other programs that ‘digitize’ published plots or graphs.
dimple dimple charts for R
The aim of dimple is to open up the power and flexibility of d3 to analysts. It aims to give a gentle learning curve and minimal code to achieve something productive. It also exposes the d3 objects so you can pick them up and run to create some really cool stuff.
dimRed A Framework for Dimensionality Reduction
Collects dimensionality reduction techniques from R packages and provides a common interface for calling the methods.
Directional Directional Statistics
A collection of R functions for directional data analysis.
DirectStandardisation Adjusted Means and Proportions by Direct Standardisation
Calculate adjusted means and proportions of a variable by groups defined by another variable by direct standardisation, standardised to the structure of the dataset.
dirmcmc Directional Metropolis Hastings Algorithm
Implementation of Directional Metropolis Hastings Algorithm for MCMC.
discreteRV Create and Manipulate Discrete Random Variables
Create, manipulate, transform, and simulate from discrete random variables. The syntax is modeled after that which is used in mathematical statistics and probability courses, but with powerful support for more advanced probability calculations. This includes the creation of joint random variables, and the derivation and manipulation of their conditional and marginal distributions.
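A sketch of the course-style syntax, assuming the `RV()` constructor and the overloaded `P()`/`E()` operators described for the package:

```r
library(discreteRV)

# A fair six-sided die as a discrete random variable.
X <- RV(outcomes = 1:6, probs = rep(1/6, 6))
m <- E(X)       # expected value, 3.5 for a fair die
p <- P(X > 4)   # probability of rolling 5 or 6
```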
DisimForMixed Calculate Dissimilarity Matrix for Dataset with Mixed Attributes
Implement the methods proposed by Ahmad & Dey (2007) <doi:10.1016/j.datak.2007.03.016> in calculating the dissimilarity matrix at the presence of mixed attributes. This Package includes functions to discretize quantitative variables, calculate conditional probability for each pair of attribute values, distance between every pair of attribute values, significance of attributes, calculate dissimilarity between each pair of objects.
disparityfilter Disparity Filter Algorithm of Weighted Network
Disparity filter is a network reduction algorithm to extract the backbone structure of both directed and undirected weighted networks. Disparity filter can reduce the network without destroying the multi-scale nature of the network. The algorithm has been developed by M. Angeles Serrano, Marian Boguna, and Alessandro Vespignani in Extracting the multiscale backbone of complex weighted networks.
distance.sample.size Calculates Study Size Required for Distance Sampling
Calculates the study size (either number of detections, or proportion of region that should be covered) to achieve a target precision for the estimated abundance. The calculation allows for the penalty due to unknown detection function, and for overdispersion. The user must specify a guess at the true detection function.
distances Tools for Distances and Metrics
Provides tools for constructing, manipulating and using distance metrics.
distcomp Distributed Computations
Distcomp, a new R package available on GitHub from a group of Stanford researchers, has the potential to significantly advance the practice of collaborative computing with large data sets distributed over separate sites that may be unwilling to explicitly share data. The fundamental idea is to be able to rapidly set up a web service based on Shiny and opencpu technology that manages and performs a series of master / slave computations which require sharing only intermediate results. The particular target application for distcomp is any group of medical researchers who would like to fit a statistical model using the data from several data sets, but face daunting difficulties with data aggregation or are constrained by privacy concerns. Distcomp and its methodology, however, ought to be of interest to any organization with data spread across multiple heterogeneous database environments.
DISTRIB Four Essential Functions for Statistical Distributions Analysis: A New Functional Approach
A different way of calculating the pdf/pmf, cdf, quantiles and random data, such that the user passes the name of the related distribution as an argument, which can therefore easily be changed. It must be mentioned that the core and computational basis of package ‘DISTRIB’ is package ‘stats’. Although similar functions were introduced previously in package ‘stats’, package ‘DISTRIB’ has some special applications in special computational programs.
DJL Distance Measure Based Judgment and Learning
Implements various decision support tools related to the new product development. Subroutines include productivity evaluation using distance measures, benchmarking, risk analysis, technology adoption model, inverse optimization, etc.
DLASSO Implementation of Differentiable Lasso Penalty in Linear Models
An implementation of the differentiable lasso (dlasso) using an iterative ridge algorithm. This package allows selecting the tuning parameter by AIC, BIC and GCV.
dlib Allow Access to the ‘Dlib’ C++ Library
Interface for ‘Rcpp’ users to ‘dlib’ <> which is a ‘C++’ toolkit containing machine learning algorithms and computer vision tools. It is used in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments. This package allows R users to use ‘dlib’ through ‘Rcpp’.
dlm Bayesian and Likelihood Analysis of Dynamic Linear Models
Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
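A short sketch of the filtering/smoothing workflow on the built-in Nile data, assuming a local-level model (the variance values below are illustrative, not fitted here):

```r
library(dlm)

# Local level model: random walk state plus observation noise.
mod  <- dlmModPoly(order = 1, dV = 15100, dW = 1470)
filt <- dlmFilter(Nile, mod)   # Kalman filter
smo  <- dlmSmooth(filt)        # Kalman smoother
```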
dlsem Distributed-Lag Structural Equation Modelling
Fit distributed-lag structural equation models and perform path analysis at different time lags.
dlstats Download Stats of R Packages
Monthly download stats of ‘CRAN’ and ‘Bioconductor’ packages. Download stats of ‘CRAN’ packages is from the ‘RStudio’ ‘CRAN mirror’, see <>. ‘Bioconductor’ package download stats is at <https://…/>.
dml Distance Metric Learning in R
The state-of-the-art algorithms for distance metric learning, including global and local methods such as Relevant Component Analysis, Discriminative Component Analysis, Local Fisher Discriminant Analysis, etc. These distance metric learning methods are widely applied in feature extraction, dimensionality reduction, clustering, classification, information retrieval, and computer vision problems.
dmm Dyadic Mixed Model for Pedigree Data
Dyadic mixed model analysis with multi-trait responses and pedigree-based partitioning of individual variation into a range of environmental and genetic variance components for individual and maternal effects.
dMod Dynamic Modeling and Parameter Estimation in ODE Models
The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives.
dmutate Mutate Data Frames with Random Variates
Work within the ‘dplyr’ workflow to add random variates to your data frame. Variates can be added at any level of an existing column. Also, bounds can be specified for simulated variates.
dnc Dynamic Network Clustering
Community detection for dynamic networks, i.e., networks measured repeatedly over a sequence of discrete time points, using a latent space approach.
DNLC Differential Network Local Consistency Analysis
Using Local Moran’s I for detection of differential network local consistency.
DNMF Discriminant Non-Negative Matrix Factorization
Discriminant Non-Negative Matrix Factorization aims to extend the Non-negative Matrix Factorization algorithm in order to extract features that enforce not only spatial locality, but also the separability between classes in a discriminant manner. The algorithm is based on Zafeiriou, Stefanos, et al., ‘Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification’, IEEE Transactions on Neural Networks 17(3) (2006): 683-695.
docopulae Optimal Designs for Copula Models
A direct approach to optimal designs for copula models based on the Fisher information. Provides flexible functions for building joint PDFs, evaluating the Fisher information and finding Ds-optimal designs. It includes an extensible solution to summation and integration called ‘nint’, functions for transforming, plotting and comparing designs, as well as a set of tools for common low-level tasks.
docstring Provides Docstring Capabilities to R Functions
Provides the ability to display something analogous to Python’s docstrings within R. By allowing users to document their functions as comments at the beginning of the function, without requiring them to put the function into a package, more users can easily provide documentation for their functions. The documentation can be viewed just like the help files for functions provided by packages.
doctr Easily Check Data Consistency and Quality
A tool that helps you check the consistency and the quality of data. Like a real doctor, it has functions for examining, diagnosing and assessing the progress of its ‘patients’.
docxtractr Extract Tables from Microsoft Word Documents with R
docxtractr is an R package for extracting tables from Word (docx) documents. Microsoft Word docx files provide an XML structure that is fairly straightforward to navigate, especially when it applies to Word tables. The docxtractr package provides tools to determine table count and table structure, and to extract tables from Microsoft Word docx documents.
DODR Detection of Differential Rhythmicity
Detects differences in rhythmic time series using linear least squares and the robust semi-parametric rfit() method. Differences in harmonic fitting can be detected, as well as differences in the scale of the noise distribution.
doFuture Foreach Parallel Adaptor using the Future API of the ‘future’ Package
Provides a ‘%dopar%’ adaptor such that any type of futures can be used as backends for the ‘foreach’ framework.
domaintools R Interface to the DomainTools API
The following functions are implemented:
• domaintools_api_key: Get or set DOMAINTOOLS_API_KEY value
• domaintools_username: Get or set DOMAINTOOLS_API_USERNAME value
• domain_profile: Domain Profile
• hosting_history: Hosting History
• parsed_whois: Parsed Whois
• reverse_ip: Reverse IP
• reverse_ns: Reverse Nameserver
• shared_ips: Shared IPs
• whois: Whois Lookup
• whois_history: Whois History
doMC Foreach parallel adaptor for the multicore package
Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package.
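A minimal usage sketch (doMC relies on process forking, so it is not available on Windows):

```r
library(doMC)     # forking backend for foreach; Unix-like systems only
library(foreach)

registerDoMC(cores = 2)  # register 2 worker processes

# run the iterations in parallel and combine results with c()
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2
squares  # 1 4 9 16
```

The same loop falls back to sequential execution with %do% if no backend is registered.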
DOT Render and Export DOT Graphs in R
Renders DOT diagram markup language in R and also provides the possibility to export the graphs in PostScript and SVG (Scalable Vector Graphics) formats. In addition, it supports literate programming packages such as ‘knitr’ and ‘rmarkdown’.
DoTC Distribution of Typicality Coefficients
Calculation of cluster typicality coefficients as being generated by fuzzy k-means clustering.
dotwhisker Dot-and-Whisker Plots of Regression Coefficients from Tidy Data Frames
Quick and easy dot-and-whisker plots of regression models saved in tidy data frames.
Dowd Functions Ported from ‘MMR2’ Toolbox Offered in Kevin Dowd’s Book Measuring Market Risk
Kevin Dowd’s book Measuring Market Risk is widely read in the area of risk measurement by students and practitioners alike. As he notes, ‘MATLAB’ may indeed have been the most suitable language when he originally wrote the functions, but with the growing popularity of R that is no longer entirely the case. As Dowd’s code was not intended to be error-free and was mainly for reference, some functions in this package have inherited those errors; an attempt will be made in future releases to identify and correct them. Dowd’s original code can be downloaded from . It should be noted that Dowd offers both the ‘MMR2’ and ‘MMR1’ toolboxes; only ‘MMR2’ was ported to R. ‘MMR2’ is the more recent version of the ‘MMR1’ toolbox and the two have mostly similar functions. The toolbox mainly contains different parametric and non-parametric methods for the measurement of market risk, as well as backtesting of risk measurement methods.
downsize A Tool to Scale Down Large Workflows for Testing
Toggles the test and production versions of a large workflow.
dplyr A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
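A short pipeline on the built-in mtcars data illustrates the grammar:

```r
library(dplyr)

# mean mpg per cylinder count, largest first
mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg)) %>%
  arrange(desc(mean_mpg))
```

Each verb (group_by, summarise, arrange) takes a data frame and returns a data frame, which is what makes the steps composable with the pipe.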
dplyrr Utilities for comfortable use of dplyr with databases
dplyr is one of the most powerful packages for data handling in R, and it can also work with databases (see its vignettes). But the database functionality in dplyr is still developing. I’m trying to make using dplyr with databases more comfortable by providing some functions; for that purpose, I’ve created the dplyrr package.
dplyrXdf dplyr backend for Revolution Analytics xdf files
The dplyr package is a popular toolkit for data transformation and manipulation. Over the last year and a half, dplyr has become a hot topic in the R community, for the way in which it streamlines and simplifies many common data manipulation tasks. Out of the box, dplyr supports data frames, data tables (from the data.table package), and the following SQL databases: MySQL/MariaDB, SQLite, and PostgreSQL. However, a feature of dplyr is that it’s extensible: by writing a specific backend, you can make it work with many other kinds of data sources. For example the development version of the RSQLServer package implements a dplyr backend for Microsoft SQL Server. The dplyrXdf package implements such a backend for the xdf file format, a technology supplied as part of Revolution R Enterprise. All of the data transformation and modelling functions provided with Revolution R Enterprise support xdf files, which allow you to break R’s memory barrier: by storing the data on disk, rather than in memory, they make it possible to work with multi-gigabyte or terabyte-sized datasets. dplyrXdf brings the benefits of dplyr to xdf files, including support for pipeline notation, all major verbs, and the ability to incorporate xdfs into dplyr pipelines.
dpmr Data Package Manager for R
Create, install, and summarise data packages that follow the Open Knowledge Foundation’s Data Package Protocol.
dprep Data Pre-Processing and Visualization Functions for Classification
Data preprocessing techniques for classification. Functions for normalization, handling of missing values, discretization, outlier detection, feature selection, and data visualization are included.
drake Data Frames in R for Make
Efficiently keep your results up to date with your code.
drat Drat R Archive Template
Creation and use of R repositories via two helper functions: one to insert packages into a repository, and one to add repository information to the current R session. Two primary types of repositories are supported: gh-pages on GitHub, as well as local repositories on either the same machine or a local network. Drat is a recursive acronym which stands for Drat R Archive Template.
DRaWR Discriminative Random Walk with Restart
We present DRaWR, a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types, preserving more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only the relevant properties. We then rerank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork.
DrBats Data Representation: Bayesian Approach That’s Sparse
Feed longitudinal data into a Bayesian Latent Factor Model to obtain a low-rank representation. Parameters are estimated using a Hamiltonian Monte Carlo algorithm with STAN. See G. Weinrott, B. Fontez, N. Hilgert and S. Holmes, ‘Bayesian Latent Factor Model for Functional Data Analysis’, Actes des JdS 2016.
DREGAR Regularized Estimation of Dynamic Linear Regression in the Presence of Autocorrelated Residuals (DREGAR)
A penalized/non-penalized implementation for dynamic regression in the presence of autocorrelated residuals (DREGAR) using iterative penalized/ordinary least squares. It applies Mallows CP, AIC, BIC and GCV to select the tuning parameters.
DrillR R Driver for Apache Drill
Provides an R driver for Apache Drill<>, which can connect to an Apache Drill cluster<https://…/installing-drill-on-the-cluster> or drillbit<https://…/embedded-mode-prerequisites>, get results (as a data frame) from SQL queries, and check the current configuration status. This link <https://…/docs> contains more information about Apache Drill.
DRIP Discontinuous Regression and Image Processing
This is a collection of functions for discontinuous regression analysis and image processing.
dsmodels A Language to Facilitate the Creation and Visualization of Two- Dimensional Dynamical Systems
An expressive language to facilitate the creation and visualization of two-dimensional dynamical systems. The basic elements of the language are a model wrapping around a function(x,y) which outputs a list(x = xprime, y = yprime), and a range. The language supports three types of visual objects: visualizations, features, and backgrounds. Visualizations, including dots and arrows, depict the behavior of the dynamical system over the entire range. Features display user-defined curves and points, and their images under the system. Backgrounds define and color regions of interest, such as areas of convergence and divergence. The language can also automatically guess attractors and regions of convergence and divergence.
DSsim Distance Sampling Simulations
Performs distance sampling simulations. It repeatedly generates instances of a user defined population within a given survey region, generates realisations of a survey design (currently these must be pregenerated using Distance software <http://…/> ) and simulates the detection process. The data are then analysed so that the results can be compared for accuracy and precision across all replications. This will allow users to select survey designs which will give them the best accuracy and precision given their expectations about population distribution. Any uncertainty in population distribution or population parameters can be included by running the different survey designs for a number of different population descriptions. An example simulation can be found in the help file for make.simulation.
dst Using Dempster-Shafer Theory
This package allows you to make basic probability assignments on a set of possibilities (events) and combine these events with Dempster’s rule of combination.
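Dempster’s rule of combination itself can be illustrated with a small base-R sketch on a two-element frame of discernment (the function combine_bpa and the string encoding of subsets are illustrative, not the ‘dst’ package’s API):

```r
# Dempster's rule for two basic probability assignments m1, m2.
# Subsets of the frame {a, b} are encoded as strings ("a", "b", "ab").
combine_bpa <- function(m1, m2) {
  masses <- list()
  conflict <- 0
  for (A in names(m1)) for (B in names(m2)) {
    inter <- intersect(strsplit(A, "")[[1]], strsplit(B, "")[[1]])
    w <- m1[[A]] * m2[[B]]
    if (length(inter) == 0) {
      conflict <- conflict + w           # mass falling on the empty set
    } else {
      key <- paste(sort(inter), collapse = "")
      masses[[key]] <- if (is.null(masses[[key]])) w else masses[[key]] + w
    }
  }
  lapply(masses, function(w) w / (1 - conflict))  # normalise away conflict
}

m1 <- list(a = 0.6, ab = 0.4)  # "ab" is the whole frame
m2 <- list(b = 0.5, ab = 0.5)
combine_bpa(m1, m2)  # m({a}) = 3/7, m({b}) = 2/7, m({a,b}) = 2/7
```

The conflict mass here is 0.6 × 0.5 = 0.3 (the product of the masses on the disjoint sets {a} and {b}), and the remaining masses are renormalised by 1 − 0.3 = 0.7.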
dSVA Direct Surrogate Variable Analysis
Functions for direct surrogate variable analysis, which can identify hidden factors in high-dimensional biomedical data.
DT R Interface to the jQuery Plug-in DataTables
This package provides a function datatable() to display R data via the DataTables library (N.B. not to be confused with the data.table package).
An R interface to the DataTables library
dtables Simplifying Descriptive Frequencies and Statistics
Towards automation of descriptive frequencies and statistics tables.
dtplyr Data Table Back-End for ‘dplyr’
This implements the data table back-end for ‘dplyr’ so that you can seamlessly use data table and ‘dplyr’ together.
dtq data.table query
Auditing data transformation can be simply described as gathering metadata about the transformation process. The most basic metadata would be a timestamp, an atomic transformation description, data volume on input, data volume on output, and time elapsed. If you work with R only interactively, you may find it no more than a fancy tool. On the other hand, for automated scheduled R jobs it can be quite helpful to have traceability at a lower grain of processing than just binary success or failure after the script is executed, for example by logging each query against the data. Similar features have been available in ETL tools for decades. I’ve addressed this in my dtq package.
DTRlearn Learning Algorithms for Dynamic Treatment Regimes
Dynamic treatment regimes (DTRs) are sequential decision rules tailored at each stage to potentially time-varying patient features and intermediate outcomes observed in previous stages. There are three main types of methods, O-learning, Q-learning and P-learning, for learning optimal dynamic treatment regimes with continuous variables. This package provides these state-of-the-art algorithms for learning DTRs.
DTRreg DTR Estimation and Inference via G-Estimation, Dynamic WOLS, and Q-Learning
Dynamic treatment regime estimation and inference via G-estimation, dynamic weighted ordinary least squares (dWOLS) and Q-learning. Inference via bootstrap and (for G-estimation) recursive sandwich estimation.
dtwclust Time Series Clustering with Dynamic Time Warping
Time series clustering using different techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Additionally, an implementation of k-Shape clustering is available.
dtwSat Time-Weighted Dynamic Time Warping for Remote Sensing Time Series Analysis
Provides a Time-Weighted Dynamic Time Warping (TWDTW) algorithm to measure similarity between two temporal sequences. This adaptation of the classical Dynamic Time Warping (DTW) algorithm is flexible to compare events that have a strong time dependency, such as phenological stages of cropland systems and tropical forests. This package provides methods for visualization of minimum cost paths, time series alignment, and time intervals classification.
DWreg Parametric Regression for Discrete Response
Regression for a discrete response, where the conditional distribution is modelled via a discrete Weibull distribution.
dwtools Data Warehouse related functions
Handy wrappers for extraction, loading, denormalization and normalization. Additionally: the data.table Nth-key feature, timing + logging, and more.
dygraphs Interface to Dygraphs Interactive Time Series Charting Library
An R interface to the dygraphs JavaScript charting library (a copy of which is included in the package). Provides rich facilities for charting time-series data in R, including highly configurable series- and axis-display and interactive features like zoom/pan and series/point highlighting.
DYM Did You Mean?
Add a ‘Did You Mean’ feature to interactive R sessions. With this package, error messages for misspelled variable or package names suggest what you probably meant, in addition to notifying you of the mistake.
dynamichazard Dynamic Hazard Models using State Space Models
Contains functions that let you fit dynamic hazard models with binary outcomes using state space models. The methods are originally described in Fahrmeir (1992) <doi:10.1080/01621459.1992.10475232> and Fahrmeir (1994) <doi:10.1093/biomet/81.2.317>. The functions also provide an extension in which the Extended Kalman filter is replaced by an Unscented Kalman filter. Models are fitted with a regular coxph()-like formula.
dynaTree Dynamic Trees for Learning and Design
Inference by sequential Monte Carlo for dynamic tree regression and classification models with hooks provided for sequential design and optimization, fully online learning with drift, variable selection, and sensitivity analysis of inputs. Illustrative examples from the original dynamic trees paper are facilitated by demos in the package; see demo(package=’dynaTree’).
dynetNLAResistance Resisting Neighbor Label Attack in a Dynamic Network
An anonymization algorithm to resist neighbor label attack in a dynamic network.
dynOmics Fast Fourier Transform to Identify Associations Between Time Course Omics Data
Implements the fast Fourier transform to estimate delays of expression initiation between trajectories to integrate and analyse time course omics data.
dynpanel Dynamic Panel Data Models
Computes the first stage GMM estimate of a dynamic linear model with p lags of the dependent variables.
dynRB Dynamic Range Boxes
Improves the concept of multivariate range boxes, which is highly susceptible to outliers and does not consider the distribution of the data. The package uses dynamic range boxes to overcome these problems.
dynsbm Dynamic Stochastic Block Models
Dynamic stochastic block model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the nodes groups through time, developed in Matias and Miele (2016) <doi:10.1111/rssb.12200>.
DynTxRegime Methods for Estimating Dynamic Treatment Regimes
A comprehensive toolkit for estimating Dynamic Treatment Regimes. Available methods include Interactive Q-Learning, Q-Learning, and value-search methods based on Augmented Inverse Probability Weighted estimators and Inverse Probability Weighted estimators.
DySeq Functions for Dyadic Sequence Analyses
Small collection of functions for dyadic binary/dichotomous sequence analyses, e.g. transforming sequences into time-to-event data, implementation of Bakeman & Gottman’s (1997) approach of aggregated logit-models, and simulating expected number of low/zero frequencies for state-transition tables. Further functions will be added in future releases. References: Bakeman, R., & Gottman, J. M. (1997) <DOI:10.1017/cbo9780511527685>.


e1071 Misc Functions of the Department of Statistics (e1071), TU Wien
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
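For example, one of the package’s best-known functions, svm(), fits a support vector machine classifier on the built-in iris data:

```r
library(e1071)

# fit an SVM classifier and inspect its in-sample confusion matrix
fit <- svm(Species ~ ., data = iris)
table(predicted = predict(fit), actual = iris$Species)
```

The default kernel is radial basis; other kernels can be selected via the kernel argument.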
eAnalytics Dynamic Web Analytics for the Energy Industry
A ‘Shiny’ web application for energy industry analytics. Take an overview of the industry, measure Key Performance Indicators, identify changes in the industry over time, and discover new relationships in the data.
earth Multivariate Adaptive Regression Splines
Build regression models using the techniques in Friedman’s papers ‘Fast MARS’ and ‘Multivariate Adaptive Regression Splines’. (The term ‘MARS’ is trademarked and thus not used in the name of the package.)
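A minimal example on the built-in trees data:

```r
library(earth)

# MARS-style regression: piecewise-linear hinge terms selected automatically
fit <- earth(Volume ~ Girth + Height, data = trees)
summary(fit)                      # selected hinge terms and coefficients
predict(fit, newdata = trees[1:3, ])
```

summary() reports the selected basis functions, so the fitted model remains interpretable as a sum of hinge terms.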
earthtones Derive a Color Palette from a Particular Location on Earth
Downloads a satellite image via Google Maps/Earth (these are originally from a variety of aerial photography sources), translates the image into a perceptually uniform color space, runs one of a few different clustering algorithms on the colors in the image searching for a user-supplied number of colors, and returns the resulting color palette.
easyDes An Easy Way to Descriptive Analysis
Descriptive analysis is essential for publishing medical articles, and this package provides an easy way to conduct it. 1. Both numeric and factor variables can be handled; for numeric variables, a normality test is applied to choose between parametric and nonparametric tests. 2. Two or more groups can be handled; for more than two groups, a post hoc test is applied, ‘Tukey’ for numeric variables and ‘FDR’ for factor variables. 3. ANOVA or the Fisher test can be forced to apply.
easyformatr Tools for Building Formats
Builds format strings for both times and numbers.
easypackages Easy Loading and Installing of Packages
Easily load and install multiple packages from different sources, including CRAN and GitHub. The libraries function allows you to load or attach multiple packages in the same function call. The packages function will load one or more packages, and install any packages that are not installed on your system (after prompting you). Also included is a from_import function that allows you to import specific functions from a package into the global environment.
easypower Sample Size Estimation for Experimental Designs
Power analysis is used in the estimation of sample sizes for experimental designs. Most programs and R packages will only output the highest recommended sample size to the user. Often the user input can be complicated and computing multiple power analyses for different treatment comparisons can be time consuming. This package simplifies the user input and allows the user to view all of the sample size recommendations or just the ones they want to see. The calculations used to calculate the recommended sample sizes are from the ‘pwr’ package.
easyreg Easy Regression
Performs regression analysis for simple designs with quantitative treatments, including mixed models and non-linear models. Plots graphics (equations and data).
easySdcTable Easy Interface to the Statistical Disclosure Control Package ‘sdcTable’
The main function, ProtectTable(), performs table suppression according to a frequency rule with a data set as the only required input. Within this function, protectTable() or protectLinkedTables() in package ‘sdcTable’ is called. Lists of level-hierarchy (parameter ‘dimList’) and other required input to these functions are created automatically.
easyVerification Ensemble Forecast Verification for Large Datasets
Set of tools to simplify application of atomic forecast verification metrics for (comparative) verification of ensemble forecasts to large datasets. The forecast metrics are imported from the ‘SpecsVerification’ package, and additional forecast metrics are provided with this package. Alternatively, new user-defined forecast scores can be implemented using the example scores provided and applied using the functionality of this package.
EBASS Sample Size Calculation Method for Cost-Effectiveness Studies Based on Expected Value of Perfect Information
We propose a new sample size calculation method for trial-based cost-effectiveness analyses. Our strategy is based on the value of perfect information that would remain after the completion of the study.
EBrank Empirical Bayes Ranking
Empirical Bayes ranking applicable to parallel-estimation settings where the estimated parameters are asymptotically unbiased and normal, with known standard errors. A mixture normal prior for each parameter is estimated using Empirical Bayes methods; subsequently, ranks for each parameter are simulated from the resulting joint posterior over all parameters (the marginal posterior densities for each parameter are assumed independent). Finally, experiments are ordered by expected posterior rank, although computations minimizing other plausible rank-loss functions are also given.
ECctmc Simulation from Endpoint-Conditioned Continuous Time Markov Chains
Draw sample paths for endpoint-conditioned continuous time Markov chains via modified rejection sampling or uniformization.
ecd Elliptic Distribution Based on Elliptic Curves
An implementation of the univariate elliptic distribution and elliptic option pricing model. It provides detailed functionality and data sets for the distribution and modelling. Especially, it contains functions for the computation of density, probability, quantile, fitting procedures, option prices, volatility smile. It also comes with sample financial data, and plotting routines.
ecdfHT Empirical CDF for Heavy Tailed Data
Computes and plots a transformed empirical CDF (ecdf) as a diagnostic for heavy tailed data, specifically data with power law decay on the tails. Routines for annotating the plot, comparing data to a model, fitting a nonparametric model, and some multivariate extensions are given.
ECharts2Shiny Embedding Charts Generated with ECharts Library into Shiny Applications
With this package, users can embed interactive charts into their Shiny applications. These charts are generated by the ECharts library developed by Baidu ( ). The current version supports line charts, bar charts, pie charts and gauge charts.
ecm Build Error Correction Models
Functions for easy building of error correction models (ECM) for time series regression.
EconDemand General Analysis of Various Economics Demand Systems
Tools for general properties including price, quantity, elasticity, convexity, marginal revenue and manifold of various economics demand systems including Linear, Translog, CES, LES and CREMR.
ECOSolveR Embedded Conic Solver in R
R interface to the Embedded COnic Solver (ECOS) for convex problems. Conic and equality constraints can be specified in addition to mixed integer problems.
ecp Nonparametric Multiple Change-Point Analysis of Multivariate Data
Implements hierarchical procedures to find multiple change-points through the use of U-statistics. The procedures do not make any distributional assumptions other than the existence of certain absolute moments. Both agglomerative and divisive procedures are included. These methods return the set of estimated change-points as well as other summary information.
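For example, the divisive procedure e.divisive() applied to a univariate series with a single mean shift (the sig.lvl and R values below are the package’s documented defaults):

```r
library(ecp)

set.seed(1)
# 100 observations with mean 0 followed by 100 with mean 3
x <- matrix(c(rnorm(100, mean = 0), rnorm(100, mean = 3)), ncol = 1)
fit <- e.divisive(x, sig.lvl = 0.05, R = 199)
fit$estimates  # estimated change points; endpoints 1 and 201 are included
```

With a shift this large, the estimated interior change point lands at or very near index 101.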
ecr Evolutionary Computing in R
Provides a powerful framework for evolutionary computing in R. The user can easily construct powerful evolutionary algorithms for tackling both single- and multi-objective problems by plugging in different predefined evolutionary building blocks, e.g., operators for mutation, recombination and selection, with just a few lines of code. Your problem cannot be easily solved with a standard EA that works on real-valued vectors, permutations or binary strings? No problem: ‘ecr’ has been developed with that in mind, and extending the framework with your own operators is also possible. Additionally there are various comfort functions, like monitoring, logging and more.
edarf Exploratory Data Analysis using Random Forests
Functions useful for exploratory data analysis using random forests which can be used to compute multivariate partial dependence, observation, class, and variable-wise marginal and joint permutation importance as well as observation-specific measures of distance (supervised or unsupervised). All of the aforementioned functions are accompanied by ‘ggplot2’ plotting functions.
edci Edge Detection and Clustering in Images
Detection of edge points in images based on the difference of two asymmetric M-kernel estimators. Linear and circular regression clustering based on redescending M-estimators. Detection of linear edges in images.
edeaR Exploratory and Descriptive Event-Based Data Analysis
Functions for exploratory and descriptive analysis of event based data. Can be used to import and export xes-files, the IEEE eXtensible Event Stream standard. Provides methods for describing and selecting process data.
edesign Maximum Entropy Sampling
An implementation of maximum entropy sampling for spatial data is provided. An exact branch-and-bound algorithm as well as greedy and dual greedy heuristics are included.
edfun Creating Empirical Distribution Functions
Easily creating empirical distribution functions from data: ‘dfun’, ‘pfun’, ‘qfun’ and ‘rfun’.
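What the package automates can be sketched in base R: deriving d/p/q/r-style functions from a data vector (the name make_edfun is illustrative, not the package’s API):

```r
# Base-R sketch: build density, CDF, quantile and sampling functions
# from an observed data vector.
make_edfun <- function(x) {
  dens <- density(x)
  list(
    dfun = approxfun(dens$x, dens$y, yleft = 0, yright = 0),  # smoothed density
    pfun = ecdf(x),                                           # empirical CDF
    qfun = function(p) as.numeric(quantile(x, probs = p)),    # quantiles
    rfun = function(n) sample(x, n, replace = TRUE)           # resampling
  )
}

set.seed(1)
f <- make_edfun(rnorm(2000))
f$pfun(0)    # close to 0.5
f$qfun(0.5)  # close to 0
```

The four returned closures mirror the d/p/q/r naming convention used throughout ‘stats’.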
edgeCorr Spatial Edge Correction
Facilitates basic spatial edge correction to point pattern data.
EditImputeCont Simultaneous Edit-Imputation for Continuous Microdata
An integrated editing and imputation method for continuous microdata under linear constraints is implemented. It relies on a Bayesian nonparametric hierarchical modeling approach in which the joint distribution of the data is estimated by a flexible joint probability model. The generated edit-imputed data are guaranteed to satisfy all imposed edit rules, whose types include ratio edits, balance edits and range restrictions.
editR An Rmarkdown Editor with Instant Preview
editR is a basic Rmarkdown editor with instant previewing of your document. It allows you to create and edit Rmarkdown documents while instantly previewing the result of your writing and coding. It also allows you to render your Rmarkdown file in any format supported by the rmarkdown R package.
edstan Stan Models for Item Response Theory
Provides convenience functions and pre-programmed Stan models related to item response theory. Its purpose is to make fitting common item response theory models using Stan easy.
eefAnalytics Analysing Education Trials
Provides tools for analysing education trials. Making different methods accessible in a single place is essential for sensitivity analysis of education trials, particularly the implication of the different methods in analysing simple randomised trials, cluster randomised trials and multisite trials.
eel Extended Empirical Likelihood
Compute the extended empirical log likelihood ratio (Tsao & Wu, 2014) for the mean and parameters defined by estimating equations.
EFAutilities Utility Functions for Exploratory Factor Analysis
A number of utility functions for exploratory factor analysis are included in this package. In particular, it computes standard errors for parameter estimates and factor correlations under a variety of conditions.
effectFusion Bayesian Effect Fusion for Categorical Predictors
Variable selection and Bayesian effect fusion for categorical predictors in linear regression models. Effect fusion aims at the question which categories have a similar effect on the response and therefore can be fused to obtain a sparser representation of the model. Effect fusion and variable selection can be obtained either with a prior that has an interpretation as spike and slab prior on the level effect differences or with a sparse finite mixture prior on the level effects. The regression coefficients are estimated with a flat uninformative prior after model selection or model averaged. For posterior inference, an MCMC sampling scheme is used that involves only Gibbs sampling steps.
EffectLiteR Average and Conditional Effects
Use structural equation modeling to estimate average and conditional effects of a treatment variable on an outcome variable, taking into account multiple continuous and categorical covariates.
EffectStars Visualization of Categorical Response Models
The package provides functions to visualize regression models with categorical response. The effects of the covariates are plotted with star plots in order to allow for an optical impression of the fitted model.
EffectTreat Prediction of Therapeutic Success
In personalized medicine, one wants to know, for a given patient and his or her outcome for a predictor (pre-treatment variable), how likely it is that a treatment will be more beneficial than an alternative treatment. This package allows for the quantification of the predictive causal association (i.e., the association between the predictor variable and the individual causal effect of the treatment) and related metrics.
EfficientMaxEigenpair Efficient Initials for Computing the Maximal Eigenpair
An implementation for using efficient initials to compute the maximal eigenpair in R. It provides two algorithms to find the efficient initials under two cases: the tridiagonal matrix case and the general matrix case. It also provides algorithms for the next-to-maximal eigenpair under these two cases.
efflog The Causal Effects for a Causal Loglinear Model
Fits a causal loglinear model and calculates the causal effects for the model, with or without multiplicative interaction, obtaining the natural direct, indirect and total effects. It also calculates the cell effect, which is a new interaction effect.
EFS Tool for Ensemble Feature Selection
Provides a function to check the importance of a feature based on a dependent classification variable. An ensemble of correlation and importance measure tests is used to determine the normed importance value of all features. Combining these methods in one function (building the sum of the importance values) leads to a better tool for selecting the most important features. This selection can also be viewed in a barplot using the barplot_fs() function and validated using the provided logistic regression function logreg_test().
elasso Enhanced Least Absolute Shrinkage Operator
Performs enhanced variable selection algorithms based on the least absolute shrinkage operator for regression models.
elasticsearchr A Lightweight Interface for Interacting with Elasticsearch from R
A lightweight R interface to ‘Elasticsearch’ – a NoSQL search-engine and column store database (see <https://…/elasticsearch> for more information). This package implements a simple Domain-Specific Language (DSL) for indexing, deleting, querying, sorting and aggregating data using ‘Elasticsearch’.
elhmc Sampling from an Empirical Likelihood Bayesian Posterior of Parameters Using Hamiltonian Monte Carlo
A tool to draw samples from an Empirical Likelihood Bayesian posterior of parameters using Hamiltonian Monte Carlo.
ELMR Extreme Machine Learning (ELM)
Training and prediction functions are provided for the Extreme Learning Machine algorithm (ELM). The ELM uses a Single Hidden Layer Feedforward Neural Network (SLFN) with randomly generated weights and no gradient-based backpropagation. The training time is very short, and the online version allows the model to be updated using small chunks of the training set at each iteration. The only parameters to tune are the hidden layer size and the learning function.
EloChoice Preference Rating for Visual Stimuli Based on Elo Ratings
Allows calculating global scores for characteristics of visual stimuli. Stimuli are presented as a sequence of pairwise comparisons (‘contests’), during each of which a rater expresses preference for one stimulus over the other. The algorithm for calculating global scores is based on Elo rating, which updates individual scores after each single pairwise contest. Elo rating is widely used to rank chess players according to their performance. Its core feature is that dyadic contests with expected outcomes lead to smaller changes in participants’ scores than outcomes that were unexpected. As such, Elo rating is an efficient tool to rate individual stimuli when a large number of such stimuli are paired against each other in experiments whose goal is to rank stimuli according to some characteristic of interest.
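The Elo update at the heart of this approach is standard and easy to sketch. The snippet below illustrates the generic formula, with the common K-factor and 400-point logistic scale as assumptions; it is not the EloChoice API:

```python
def elo_update(r_winner, r_loser, k=32):
    """Standard Elo update: expected score from the logistic curve,
    then shift both ratings by k times the surprise (actual - expected)."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# An upset (low-rated stimulus preferred) moves scores more than an
# expected outcome does:
upset = elo_update(1400, 1600)      # winner was the underdog
expected = elo_update(1600, 1400)   # winner was the favourite
```

Note how the total rating mass is conserved: the winner gains exactly what the loser loses.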
elpatron Bicycling Data Analysis with R
Functions to facilitate cycling analysis within the R environment.
EMbC Expectation-Maximization Binary Clustering
Unsupervised, multivariate, clustering algorithm yielding a meaningful binary clustering taking into account the uncertainty in the data. A specific constructor for trajectory movement analysis yields behavioural annotation of the tracks based on estimated local measures of velocity and turning angle, eventually with solar position covariate as a daytime indicator.
emdi Estimating and Mapping Disaggregated Indicators
Functions that support estimating, assessing and mapping regional disaggregated indicators. So far, estimation methods comprise the model-based Empirical Best Prediction approach (see ‘Small area estimation of poverty indicators’ by Molina and Rao (2010) <doi:10.1002/cjs.10051>), as well as its precision estimates. The assessment of the used model is supported by a summary and diagnostic plots. For a suitable presentation of estimates, map plots can be easily created. Furthermore, results can easily be exported to Excel.
emIRT EM Algorithms for Estimating Item Response Theory Models
Various Expectation-Maximization (EM) algorithms are implemented for item response theory (IRT) models. The current implementation includes IRT models for binary and ordinal responses, along with dynamic and hierarchical IRT models with binary responses. The latter two models are derived and implemented using variational EM.
eMLEloglin Fitting log-Linear Models in Sparse Contingency Tables
Log-linear modeling is a popular method for the analysis of contingency table data. When the table is sparse, the data can fall on the boundary of the convex support, and we say that ‘the MLE does not exist’ in the sense that some parameters cannot be estimated. However, an extended MLE always exists, and a subset of the original parameters will be estimable. The ‘eMLEloglin’ package determines which sampling zeros contribute to the non-existence of the MLE. These problematic zero cells can be removed from the contingency table and the model can then be fit (as far as is possible) using the glm() function.
EMMIXcskew Fitting Mixtures of CFUST Distributions
Functions to fit finite mixture of multivariate canonical fundamental skew t (FM-CFUST) distributions, random sample generation, 2D and 3D contour plots.
EMMLi A Maximum Likelihood Approach to the Analysis of Modularity
Fit models of modularity to morphological landmarks. Perform model selection on results. Fit models with a single within-module correlation or with separate within-module correlations fitted to each module.
emojifont Emoji Fonts for using in R
An implementation of using emoji font in both base and ‘ggplot2’ graphics.
EMSaov The Analysis of Variance with EMS
The analysis of variance table including the expected mean squares (EMS) for various types of experimental design is provided. When some variables are random effects, or a special experimental design such as a nested design, repeated-measures design, or split-plot design is used, it is not easy to find the appropriate test, especially the denominator for the F-statistic, which depends on the EMS.
EMSC Extended Multiplicative Signal Correction
Background correction of spectral-like data. Handles variations in scaling, polynomial baselines and interferents. Parameters for corrections are stored for further analysis, and spectra are corrected accordingly.
emuR Main Package of the EMU Speech Database Management System
Provides the next iteration of the EMU Speech Database Management System (EMU_SDMS) with database management, data extraction, data preparation and data visualization facilities.
encode Represent Ordered Lists and Pairs as Strings
Interconverts between ordered lists and compact string notation. Useful for capturing code lists, and pair-wise codes and decodes, for text storage. Analogous to factor levels and labels. Generics ‘encode’ and ‘decode’ perform interconversion, while ‘codes’ and ‘decodes’ extract components of an encoding. The function ‘encoded’ checks whether something is interpretable as an encoding.
endogenous Classical Simultaneous Equation Models
Likelihood-based approaches to estimate linear regression parameters and treatment effects in the presence of endogeneity. Specifically, this package includes James Heckman’s classical simultaneous equation models: the sample selection model for outcome selection bias and the hybrid model with structural shift for endogenous treatment. For more information, see the seminal paper of Heckman (1978) <DOI:10.3386/w0177>, in which the details of these models are provided. This package accommodates repeated measures on subjects with a working independence approach. The hybrid model further accommodates treatment effect modification.
endtoend Transmissions and Receptions in an End to End Network
Computes the expectation of the number of transmissions and receptions considering an End-to-End transport model with limited number of retransmissions per packet. It provides theoretical results and also estimated values based on Monte Carlo simulations.
enpls Ensemble Partial Least Squares (EnPLS) Regression
R package for ensemble partial least squares regression, a unified framework for feature selection, outlier detection, and ensemble learning.
enrichwith Methods to Enrich R Objects with Extra Components
The enrichwith package provides the ‘enrich’ method to enrich list-like R objects with new, relevant components. The current version has methods for enriching objects of class ‘family’, ‘link-glm’ and ‘glm’. The resulting objects preserve their class, so all methods associated to them still apply. The package can also be used to produce customisable source code templates for the structured implementation of methods to compute new components.
EnsembleCV Extensible Package for Cross-Validation-Based Integration of Base Learners
This package extends the base classes and methods of EnsembleBase package for cross-validation-based integration of base learners. Default implementation calculates average of repeated CV errors, and selects the base learner / configuration with minimum average error. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. The package can be extended, e.g. by adding variants of the current implementation.
EnsemblePCReg Extensible Package for Principal-Component-Regression-based Integration of Base Learners
This package extends the base classes and methods of EnsembleBase package for Principal-Components-Regression-based (PCR) integration of base learners. Default implementation uses cross-validation error to choose the optimal number of PC components for the final predictor. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in EnsembleBase package as well as this package.
EnsemblePenReg Extensible Classes and Methods for Penalized-Regression-based Integration of Base Learners
Extending the base classes and methods of EnsembleBase package for Penalized-Regression-based (Ridge and Lasso) integration of base learners. Default implementation uses cross-validation error to choose the optimal lambda (shrinkage parameter) for the final predictor. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in EnsembleBase package as well as this package.
ensembleR Ensemble Models in R
Functions to use ensembles of several machine learning models specified in the ‘caret’ package.
EntropyExplorer Tools for Exploring Differential Shannon Entropy, Differential Coefficient of Variation and Differential Expression
Rows of two matrices are compared for Shannon entropy, coefficient of variation, and expression. P-values can be requested for all metrics.
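The row-wise metrics being compared here are standard quantities; a minimal sketch of Shannon entropy and the coefficient of variation (generic formulas, not the package’s API) might look like:

```python
import math

def shannon_entropy(row):
    """Shannon entropy (bits) of a row after normalising it to a
    probability vector; zero entries contribute nothing."""
    total = sum(row)
    probs = [v / total for v in row if v > 0]
    return -sum(p * math.log2(p) for p in probs)

def coeff_variation(row):
    """Coefficient of variation: sample standard deviation over mean."""
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row) / (n - 1)
    return math.sqrt(var) / mean

# Row-wise differential entropy between two matrices:
m1 = [[1, 1, 1, 1], [8, 1, 1, 0]]
m2 = [[4, 0, 0, 0], [2, 2, 2, 2]]
diff_entropy = [shannon_entropy(a) - shannon_entropy(b) for a, b in zip(m1, m2)]
```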
envestigate R package to interrogate environments.
R package to interrogate environments. Scary, I know.
EnviroPRA Environmental Probabilistic Risk Assessment Tools
Methods to perform a Probabilistic Environmental Risk Assessment for exposure to toxic substances, e.g. USEPA (1997) <https://…iding-principles-monte-carlo-analysis>.
epandist Statistical Functions for the Censored and Uncensored Epanechnikov Distribution
Analyzing censored variables usually requires the use of optimization algorithms. This package provides an alternative algebraic approach to the task of determining the expected value of a random censored variable with a known censoring point. Likewise this approach allows for the determination of the censoring point if the expected value is known. These results are derived under the assumption that the variable follows an Epanechnikov kernel distribution with known mean and range prior to censoring. Statistical functions related to the uncensored Epanechnikov distribution are also provided by this package.
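The algebraic idea can be illustrated with the standard Epanechnikov density f(x) = (3/4)(1 − x²) on [−1, 1] (a parameterisation assumed here for illustration, which may differ from the package’s): for a left-censoring point c, the expected value E[max(X, c)] = c·F(c) + ∫ from c to 1 of x·f(x) dx has a closed form, verified below against a plain Riemann sum:

```python
def epan_cdf(c):
    """CDF of the standard Epanechnikov density 3/4*(1 - x^2) on [-1, 1]."""
    return 0.75 * c - 0.25 * c ** 3 + 0.5

def censored_mean(c):
    """Closed-form E[max(X, c)]: mass censored at c plus the upper-tail
    integral of x * 3/4 * (1 - x^2) from c to 1."""
    tail = 3.0 / 16.0 - 3.0 * c ** 2 / 8.0 + 3.0 * c ** 4 / 16.0
    return c * epan_cdf(c) + tail

def censored_mean_numeric(c, n=20000):
    """Midpoint Riemann sum of max(x, c) * f(x) over [-1, 1] as a check."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        total += max(x, c) * 0.75 * (1.0 - x * x) * h
    return total
```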
EPGLM Gaussian Approximation of Bayesian Binary Regression Models
The main functions compute the expectation propagation approximation of Bayesian probit/logit models with a Gaussian prior. More information can be found in Chopin and Ridgway (2015). More models and priors should follow.
EpiWeek Conversion Between Epidemiological Weeks and Calendar Dates
Users can easily derive the calendar dates from epidemiological weeks, and vice versa.
equSA Estimate a Single or Multiple Graphical Models and Construct Networks
Provides an equivalent measure of partial correlation coefficients for high-dimensional Gaussian Graphical Models to learn and visualize the underlying relationships between variables from single or multiple datasets. See Liang, F., Song, Q. and Qiu, P. (2015) <doi:10.1080/01621459.2015.1012391> for more detail. Based on this method, the package also provides a method for constructing networks for Next Generation Sequencing data. It also includes a method for jointly estimating Gaussian Graphical Models of multiple datasets.
ergm.rank Fit, Simulate and Diagnose Exponential-Family Models for Rank-Order Relational Data
A set of extensions for the ‘ergm’ package to fit weighted networks whose edge weights are ranks.
errint Build Error Intervals
Build and analyze error intervals for a particular model’s predictions, assuming different distributions for noise in the data.
errorizer Function Errorizer
Provides a function to convert existing R functions into ‘errorized’ versions with added logging and handling functionality when encountering errors or warnings. The errorize function accepts an existing R function as its first argument and returns an R function with the exact same arguments and functionality. However, if an error or warning occurs when running that ‘errorized’ R function, it will save a .Rds file to the current working directory with the relevant objects and information required to immediately recreate the error.
errorlocate Locate Errors with Validation Rules
Errors in data can be located and removed using validation rules from package ‘validate’.
esaBcv Estimate Number of Latent Factors and Factor Matrix for Factor Analysis
These functions estimate the latent factors of a given matrix, whether or not it is high-dimensional. The method first estimates the number of factors using bi-cross-validation and then estimates the latent factor matrix and the noise variances. For more information about the method, see Art B. Owen and Jingshu Wang’s 2015 archived article on factor models (<http://…/1503.03515>).
esaddle Extended Empirical Saddlepoint Density Approximation
Tools for fitting the Extended Empirical Saddlepoint (EES) density.
esc Effect Size Computation for Meta Analysis
Implementation of the web-based ‘Practical Meta-Analysis Effect Size Calculator’ from David B. Wilson in R. Based on the input, the effect size can be returned as standardized mean difference, Hedges’ g, correlation coefficient r or Fisher’s transformation z, odds ratio or log odds effect size.
ESKNN Ensemble of Subset of K-Nearest Neighbours Classifiers for Classification and Class Membership Probability Estimation
Functions for classification and group membership probability estimation are given. The issue of non-informative features in the data is addressed by utilizing the ensemble method. A few optimal models are selected for the ensemble from an initially large set of base k-nearest neighbours (KNN) models, generated on subsets of features from the training data. A two-stage assessment is applied in the selection of optimal models for the ensemble in the training function. The prediction functions for classification and class membership probability estimation return class outcomes and class membership probability estimates for the test data. The package includes measures of classification error and Brier score, for the classification and probability estimation tasks respectively.
EstHer Estimation of Heritability in High Dimensional Sparse Linear Mixed Models using Variable Selection
Provides a variable selection method to select active components in sparse linear mixed models in order to estimate the heritability. The selection reduces the size of the data sets, which improves the accuracy of the estimations. The package also provides a confidence interval for the estimated heritability.
estimability Estimability Tools for Linear Models
Provides tools for determining estimability of linear functions of regression coefficients, and alternative epredict methods for lm, glm, and mlm objects that handle non-estimable cases correctly.
EstimateGroupNetwork Perform the Joint Graphical Lasso and Selects Tuning Parameters
Can be used to simultaneously estimate networks (Gaussian Graphical Models) in data from different groups or classes via Joint Graphical Lasso. Tuning parameters are selected via information criteria (AIC / BIC / eBIC) or crossvalidation.
EstSimPDMP Estimation and Simulation for PDMPs
This package deals with the estimation of the jump rate for piecewise-deterministic Markov processes (PDMPs) from a single observation of the process over a long time. The main functions provide an estimate of this jump rate. The state space may be discrete or continuous. The associated paper has been published in the Scandinavian Journal of Statistics and is given in the references. Other functions provide a method to simulate random variables from their (conditional) hazard rate, and then to simulate PDMPs.
etrunct Computes Moments of Univariate Truncated t Distribution
Computes moments of the univariate truncated t distribution. There is only one exported function, e_trunct(); see its documentation for details.
eulerr Area-Proportional Euler Diagrams
If possible, generates exactly area-proportional Euler diagrams, or otherwise approximately proportional diagrams using numeric optimization. An Euler diagram is a generalization of a Venn diagram, relaxing the criterion that all interactions need to be represented.
EvaluationMeasures Collection of Model Evaluation Measure Functions
Provides some of the most important evaluation measures for evaluating a model. Just by giving the real and predicted classes, measures such as accuracy, sensitivity, specificity, ppv, npv, fmeasure, mcc and so on will be returned.
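The listed measures are standard confusion-matrix quantities; a generic sketch of the usual formulas (not the package’s function signatures) is:

```python
def evaluation_measures(actual, predicted):
    """Binary-classification measures from real and predicted labels
    (1 = positive, 0 = negative)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),   # recall on positives
        "specificity": tn / (tn + fp),   # recall on negatives
        "ppv": tp / (tp + fp),           # precision
        "npv": tn / (tn + fn),
    }

m = evaluation_measures([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```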
evaluator Information Security Quantified Risk Assessment Toolkit
An open source information security strategic risk analysis toolkit based on the OpenFAIR taxonomy <https://…/C13K> and risk assessment standard <https://…/C13G>. Empowers an organization to perform a quantifiable, repeatable, and data-driven review of its security program.
evclass Evidential Distance-Based Classification
Different evidential distance-based classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule and the evidential neural network.
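The Dempster-Shafer mass functions such classifiers output follow the usual evidential calculus; for illustration, two mass functions over a frame {A, B} can be fused with Dempster’s rule of combination (a generic sketch, independent of this package’s API):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions (dict: frozenset -> mass).
    Mass on conflicting (empty) intersections is discarded and the
    remainder renormalised."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

A, B = frozenset("A"), frozenset("B")
m1 = {A: 0.6, A | B: 0.4}   # mostly believes A, some ignorance
m2 = {B: 0.5, A | B: 0.5}   # mostly believes B, some ignorance
m12 = dempster_combine(m1, m2)
```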
evclust Evidential Clustering
Various clustering algorithms that produce a credal partition, i.e., a set of Dempster-Shafer mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership uncertainty of the objects. The algorithms are: Evidential c-Means (ECM), Relational Evidential c-Means (RECM), Constrained Evidential c-Means (CECM), EVCLUS and EK-NNclus.
event Event History Procedures and Models
Functions for setting up and analyzing event history data.
evidenceFactors Reporting Tools for Sensitivity Analysis of Evidence Factors in Observational Studies
Integrated Sensitivity Analysis of Evidence Factors in Observational Studies.
Evomorph Evolutionary Morphometric Simulation
Evolutionary process simulation using geometric morphometric data. Manipulation of landmark data files (TPS), shape plotting and distance plotting functions.
evoper Evolutionary Parameter Estimation for ‘Repast Simphony’ Models
The EvoPER, Evolutionary Parameter Estimation for ‘Repast Simphony’ Agent-Based framework, provides optimization-driven parameter estimation methods based on evolutionary computation techniques, which could be more efficient and require, in some cases, fewer model evaluations than other alternatives relying on experimental design.
EW Edgeworth Expansion
Edgeworth Expansion calculation.
exampletestr Help for Writing Tests Based on Function Examples
Take the examples written in your documentation of functions and use them to create shells (skeletons which must be manually completed by the user) of test files to be tested with the ‘testthat’ package. Documentation must be done with ‘roxygen2’.
exif Read EXIF Metadata from JPEGs
Extracts Exchangeable Image File Format (EXIF) metadata, such as camera make and model, ISO speed and the date-time the picture was taken, from JPEG images. Incorporates the ‘easyexif’ (<https://…/easyexif>) library.
exifr EXIF Image Data in R
Reads EXIF data using ExifTool <http://…/> and returns results as a data frame. ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files. ExifTool supports many different metadata formats including EXIF, GPS, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, as well as the maker notes of many digital cameras by Canon, Casio, FLIR, FujiFilm, GE, HP, JVC/Victor, Kodak, Leaf, Minolta/Konica-Minolta, Motorola, Nikon, Nintendo, Olympus/Epson, Panasonic/Leica, Pentax/Asahi, Phase One, Reconyx, Ricoh, Samsung, Sanyo, Sigma/Foveon and Sony.
expandFunctions Feature Matrix Builder
Generates feature matrix outputs from R object inputs using a variety of expansion functions. The generated feature matrices have applications as inputs for a variety of machine learning algorithms. The expansion functions are based on coercing the input to a matrix, treating the columns as features and converting individual columns or combinations into blocks of columns. Currently these include expansion of columns by efficient sparse embedding by vectors of lags, quadratic expansion into squares and unique products, powers by vectors of degree, vectors of orthogonal polynomial functions, and block random affine projection transformations (RAPTs). The transformations are magrittr- and cbind-friendly, and can be used in a building block fashion. For instance, taking the cos() of the output of the RAPT transformation generates a stationary kernel expansion via Bochner’s theorem, and this expansion can then be cbind-ed with other features. Additionally, there are utilities for replacing features, removing rows with NAs, creating matrix samples of a given distribution, a simple wrapper for LASSO with CV, a Freeman-Tukey transform, generalizations of the outer function, matrix size-preserving discrete difference by row, plotting, etc.
ExpDE Modular Differential Evolution for Experimenting with Operators
Modular implementation of the Differential Evolution algorithm for experimenting with different types of operators.
expint Exponential Integral and Incomplete Gamma Function
The exponential integrals E_1(x), E_2(x), E_n(x) and Ei(x), and the incomplete gamma function G(a, x) defined for negative values of its first argument. The package also gives easy access to the underlying C routines through an API; see the package vignette for details. A test package included in sub-directory example_API provides an implementation. C routines derived from the GNU Scientific Library <https://…/>.
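The exponential integral E_1 has a classical convergent series for x > 0, E_1(x) = −γ − ln x + Σ_{k≥1} (−1)^{k+1} x^k / (k·k!). A plain-Python sketch of that series (the package itself wraps GSL-derived C routines, not this code):

```python
import math

def e1(x, terms=60):
    """Exponential integral E1(x) for x > 0 via the classical series
    E1(x) = -gamma - ln(x) + sum_{k>=1} (-1)^(k+1) x^k / (k * k!)."""
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    total = -gamma - math.log(x)
    term = 1.0  # running x^k / k!
    for k in range(1, terms + 1):
        term *= x / k
        total += (-1) ** (k + 1) * term / k
    return total
```

The series is accurate for small to moderate x; for large x an asymptotic expansion or continued fraction is used instead, which is one reason to rely on a vetted library in practice.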
ExplainPrediction Explanation of Predictions for Classification and Regression Models
Contains methods to generate explanations for individual predictions of classification and regression models. Weighted averages of individual explanations form an explanation of the whole model. The package extends the ‘CORElearn’ package, but other prediction models can also be explained using a wrapper.
explor Interactive Interfaces for Results Exploration
Shiny interfaces and graphical functions for multivariate analysis results exploration.
exploreR Tools for Quickly Exploring Data
Simplifies some complicated and labor intensive processes involved in exploring and explaining data. Allows you to quickly and efficiently visualize the interaction between variables and simplifies the process of discovering covariation in your data. Also includes some convenience features designed to remove as much redundant typing as possible.
expm Matrix exponential
Computation of the matrix exponential and related quantities.
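The matrix exponential is defined by the Taylor series exp(A) = Σ A^k / k!. A naive truncated-series sketch is shown below for illustration only; robust implementations (including methods in ‘expm’) use scaling-and-squaring with Padé approximants for numerical stability:

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=30):
    """Matrix exponential by the truncated Taylor series
    exp(A) = I + A + A^2/2! + ... (illustrative; not numerically robust)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]  # A^0 = I
    fact = 1.0
    for k in range(1, terms + 1):
        power = mat_mul(power, a)
        fact *= k
        result = [[result[i][j] + power[i][j] / fact for j in range(n)]
                  for i in range(n)]
    return result

# Nilpotent example: exp([[0,1],[0,0]]) = [[1,1],[0,1]] exactly,
# since all powers beyond A^1 vanish.
e = mat_exp([[0.0, 1.0], [0.0, 0.0]])
```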
expss Some Useful Functions from Spreadsheets and ‘SPSS’ Statistics
Implements several popular functions from Excel (‘COUNTIF’, ‘VLOOKUP’, etc.) and ‘SPSS’ Statistics (‘RECODE’, ‘COUNT’, etc.). There are also functions for basic tables with value label/variable label support. The package aims to help people move data processing from Excel/’SPSS’ to R.
exreport Fast, Reliable and Elegant Reproducible Research
Analysis of experimental results and automatic report generation in both interactive HTML and LaTeX. This package ships with a rich interface for data modeling and built in functions for the rapid application of statistical tests and generation of common plots and tables with publish-ready quality.
EXRQ Extreme Regression of Quantiles
Estimation for high conditional quantiles based on quantile regression.
ExtDist Extending the Range of Functions for Probability Distributions
A consistent, unified and extensible framework for estimation of parameters for probability distributions, including parameter estimation procedures that allow for weighted samples; the current set of distributions included are: the standard beta, the four-parameter beta, Burr, gamma, Gumbel, Johnson SB and SU, Laplace, logistic, normal, symmetric truncated normal, truncated normal, symmetric-reflected truncated beta, standard symmetric-reflected truncated beta, triangular, uniform, and Weibull distributions; decision criteria and selections based on these decision criteria.
exteriorMatch Constructs the Exterior Match from Two Matched Control Groups
If one treated group is matched to one control reservoir in two different ways to produce two sets of treated-control matched pairs, then the two control groups may be entwined, in the sense that some control individuals are in both control groups. The exterior match is used to compare the two control groups.
extracat Categorical Data Analysis and Visualization
Categorical Data Analysis and Visualization.
ExtremeBounds ExtremeBounds: Extreme Bounds Analysis in R
An implementation of Extreme Bounds Analysis (EBA), a global sensitivity analysis that examines the robustness of determinants in regression models. The package supports both Leamer’s and Sala-i-Martin’s versions of EBA, and allows users to customize all aspects of the analysis.
extremefit Estimation of Extreme Conditional Quantiles and Probabilities
Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.
extremeStat Extreme Value Statistics and Quantile Estimation
Code to fit, plot and compare several (extreme value) distribution functions. Can also compute (truncated) distribution quantile estimates and draw a plot with return periods on a linear scale.
extremogram Estimation of Extreme Value Dependence for Time Series Data
Estimation of the sample univariate, cross and return time extremograms. The package can also add empirical confidence bands to each of the extremogram plots via a permutation procedure, under the assumption that the data are independent. Finally, the stationary bootstrap allows us to construct credible confidence bands for the extremograms.
ezknitr Avoid the Typical Working Directory Pain When Using ‘knitr’
An extension of ‘knitr’ that adds flexibility in several ways. One common source of frustration with ‘knitr’ is that it assumes the directory where the source file lives should be the working directory, which is often not true. ‘ezknitr’ addresses this problem by giving you complete control over where all the inputs and outputs are, and adds several other convenient features to make rendering markdown/HTML documents easier.
ezsummary Summarise Data in the Quick and Easy Way
Functions that fill the gap between the outcomes of ‘dplyr’ and a print-ready summary table.


fabCI FAB Confidence Intervals
Frequentist assisted by Bayes (FAB) confidence interval construction. See ‘Adaptive multigroup confidence intervals with constant coverage’ by Yu and Hoff <https://…/1612.08287>.
face Fast Covariance Estimation for Sparse Functional Data
Fast covariance estimation for sparse functional data.
factoextra Extract and Visualize the Results of Multivariate Data Analyses
Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including ‘PCA’ (Principal Component Analysis), ‘CA’ (Correspondence Analysis), ‘MCA’ (Multiple Correspondence Analysis), ‘MFA’ (Multiple Factor Analysis) and ‘HMFA’ (Hierarchical Multiple Factor Analysis) functions from different R packages. It also contains functions for simplifying some clustering analysis steps and provides elegant ‘ggplot2’-based data visualization.
FactoMineR Multivariate Exploratory Data Analysis and Data Mining
Exploratory data analysis methods such as principal component methods and clustering.
factorcpt Simultaneous Change-Point and Factor Analysis
Identifies change-points in the common and the idiosyncratic components via factor modelling.
FactoRizationMachines Machine Learning with Higher-Order Factorization Machines
Implementation of three machine learning approaches: Support Vector Machines (SVM) with a linear kernel, second-order Factorization Machines (FM), and higher-order Factorization Machines (HoFM).
factorstochvol Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models
Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix.
Factoshiny Perform Factorial Analysis from FactoMineR with a Shiny Application
Perform factorial analysis with a menu and draw graphs interactively thanks to FactoMineR and a Shiny application.
faisalconjoint Faisal Conjoint Model: A New Approach to Conjoint Analysis
It is used for systematic analysis of decisions based on attributes and their levels.
fakeR Simulates Data from a Data Frame of Different Variable Types
Generates fake data from a dataset of different variable types. The package contains the functions simulate_dataset and simulate_dataset_ts to simulate time-independent and time-dependent data. It randomly samples character and factor variables from contingency tables and numeric and ordered factors from a multivariate normal distribution. It currently supports the simulation of stationary and zero-inflated count time series.
fancycut A Fancy Version of ‘base::cut’
Provides the function fancycut(), which is like cut() except that you can mix left-open and right-open intervals with point values, intervals closed on both ends, and intervals open on both ends.
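A hedged sketch of mixing interval types in one call; the named-argument style (bucket name = interval string) is assumed from the package documentation, so treat the exact signature as an assumption.

```r
library(fancycut)

x <- c(0, 0.5, 1, 1.5, 2)
fancycut(x,
         zero = '[0,0]',    # a single point value
         low  = '(0,1)',    # open on both ends
         high = '[1,2]')    # closed on both ends
```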
fanovaGraph Building Kriging Models from FANOVA Graphs
Estimation and plotting of a function’s FANOVA graph to identify the interaction structure and fitting, prediction and simulation of a Kriging model modified by the identified structure. The interactive function plotManipulate() can only be run in the RStudio IDE with RStudio’s package ‘manipulate’ loaded. RStudio is freely available and includes the package ‘manipulate’. The equivalent function plotTk() relies on CRAN packages only.
fanplot Visualisation of Sequential Probability Distributions Using Fan Charts
Visualise sequential distributions using a range of plotting styles. Sequential distribution data can be input as either simulations or values corresponding to percentiles over time. Plots are added to existing graphic devices using the fan function. Users can choose from four different styles, including fan chart type plots, where a set of coloured polygons, with shading corresponding to the percentile values, is layered to represent different uncertainty levels.
farff A Faster ‘ARFF’ File Reader and Writer
Reads and writes ‘ARFF’ files. ‘ARFF’ (Attribute-Relation File Format) files are like ‘CSV’ files, with a little bit of added meta information in a header and standardized NA values. They are quite often used for machine learning data sets and were introduced for the ‘WEKA’ machine learning ‘Java’ toolbox. See <http://…/ARFF> for further info on ‘ARFF’ and <http://…/> for more info on ‘WEKA’. ‘farff’ gets rid of the ‘Java’ dependency that ‘RWeka’ enforces, and it is at least a faster reader (for bigger files). It uses ‘readr’ as parser back-end for the data section of the ‘ARFF’ file. Consistency with ‘RWeka’ is tested on ‘Github’ and ‘Travis CI’ with hundreds of ‘ARFF’ files from ‘OpenML’. Note that the ‘OpenML’ package is currently only available from ‘Github’ at: <https://…/openml-r>.
fastAdaboost A Fast Implementation of Adaboost
Implements Adaboost based on C++ backend code. This is blazingly fast and especially useful for large, in-memory data sets. The package uses decision trees as weak classifiers. Once the classifiers have been trained, they can be used to predict new data. Currently, only binary classification tasks are supported. The package implements the Adaboost.M1 algorithm and the real Adaboost (SAMME.R) algorithm.
FastBandChol Fast Estimation of a Covariance Matrix by Banding the Cholesky Factor
Fast and numerically stable estimation of a covariance matrix by banding the Cholesky factor using a modified Gram-Schmidt algorithm implemented in RcppArmadillo. See <http://…/~molst029> for details on the algorithm.
fastcmh Significant Interval Discovery with Categorical Covariates
A method which uses the Cochran-Mantel-Haenszel test with significant pattern mining to detect intervals in binary genotype data which are significantly associated with a particular phenotype, while accounting for categorical covariates.
fastdigest Fast, Low Memory-Footprint Digests of R Objects
Provides an R interface to Bob Jenkin’s streaming, non-cryptographic ‘SpookyHash’ hash algorithm for use in digest-based comparisons of R objects. ‘fastdigest’ plugs directly into R’s internal serialization machinery, allowing digests of all R objects the serialize() function supports, including reference-style objects via custom hooks. Speed is high and scales linearly by object size; memory usage is constant and negligible.
fasteraster Raster Images Processing and Vector Recognition
Existing packages for recognising edges on a raster image, a bitmap, or any kind of matrix perform only 90-degree vectorization. Artefact images are typically linear in nature and can be vectorized far more efficiently than by drawing a series of 90-degree lines. The fasteraster package recognises lines in a single pass.
fastGraph Fast Drawing and Shading of Graphs of Statistical Distributions
Provides functionality to produce graphs of probability density functions and cumulative distribution functions with few keystrokes, allows shading under the curve of the probability density function to illustrate concepts such as p-values and critical values, and fits a simple linear regression line on a scatter plot with the equation as the main title.
fastHorseshoe The Elliptical Slice Sampler for Bayesian Horseshoe Regression
The elliptical slice sampler for Bayesian shrinkage linear regression, such as horseshoe, double-exponential and user-specified priors.
FastKM A Fast Multiple-Kernel Method Based on a Low-Rank Approximation
A computationally efficient and statistically rigorous fast Kernel Machine method for multi-kernel analysis. The approach is based on a low-rank approximation to the nuisance effect kernel matrices. The algorithm is applicable to continuous, binary, and survival traits and is implemented using the existing single-kernel analysis software ‘SKAT’ and ‘coxKM’. ‘coxKM’ can be obtained from http://…/software.html.
FastKNN Fast k-Nearest Neighbors
Compute labels for a test set according to the k-Nearest Neighbors classification. This is a fast way to do k-Nearest Neighbors classification because the distance matrix -between the features of the observations- is an input to the function rather than being calculated in the function itself every time.
fastLSU Fast Linear Step Up Procedure of Benjamini-Hochberg FDR Method for Huge-Scale Testing Problems
An efficient algorithm to apply the Benjamini-Hochberg linear step-up FDR controlling procedure in huge-scale testing problems (proposed in Vered Madar and Sandra Batista (2016) <DOI:10.1093/bioinformatics/btw029>). Unlike the ‘BH’ method, the package does not require any p-value ordering. In addition, it permits separating p-values arbitrarily into computationally feasible chunks of arbitrary size and produces the same results as applying the linear step-up BH procedure to the entire set of tests.
fastnet Large-Scale Social Network Analysis
We present an implementation of the algorithms required to simulate large-scale social networks and retrieve their most relevant metrics.
fastpseudo Fast Pseudo Observations
Computes pseudo-observations for survival analysis on right-censored data based on restricted mean survival time.
fasttime Fast Utility Function for Time Parsing and Conversion
Fast functions for timestamp manipulation that avoid system calls and take shortcuts to facilitate operations on very large data.
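A minimal sketch of the package's main entry point, fastPOSIXct(), which parses timestamp strings assumed to be in GMT; the timestamp below is illustrative.

```r
library(fasttime)

# Parse a GMT timestamp string into POSIXct without strptime()
ts <- fastPOSIXct("2015-01-01 12:34:56", tz = "GMT")
ts
```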
fauxpas HTTP Error Helpers
HTTP error helpers. Methods included for general purpose HTTP error handling, as well as individual methods for every HTTP status code, both via status code numbers as well as their descriptive names. Supports ability to adjust behavior to stop, message or warning. Includes ability to use custom whisker template to have any configuration of status code, short description, and verbose message. Currently supports integration with ‘crul’, ‘curl’, and ‘httr’.
fbRads Analyzing and Managing Facebook Ads from R
Wrapper functions around the Facebook Marketing ‘API’ to create, read, update and delete custom audiences, images, campaigns, ad sets, ads and related content.
fbroc Fast Algorithms to Bootstrap ROC Curves
Implements a very fast C++ algorithm to quickly bootstrap ROC Curves and derived performance metrics (e.g. AUC). You can also plot the results and calculate confidence intervals. Currently the calculation of 100000 bootstrap replicates for 500 observations takes about one second.
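A hedged sketch with toy data: boot.roc() and perf() are the names given in the package docs, but treat the exact signatures as assumptions.

```r
library(fbroc)

set.seed(1)
pred  <- runif(500)             # toy classifier scores
truth <- runif(500) < pred      # toy logical labels correlated with scores

roc <- boot.roc(pred, truth, n.boot = 1000)  # bootstrap the ROC curve
perf(roc, "auc")                             # bootstrap CI for the AUC
plot(roc)                                    # ROC curve with confidence region
```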
FCMapper Fuzzy Cognitive Mapping
Provides several functions to create and manipulate fuzzy cognitive maps. It is based on FCMapper for Excel, distributed at http://…/joomla, developed by Michael Bachhofer and Martin Wildenberg. Maps are input as adjacency matrices. Attributes of the maps and the equilibrium values of the concepts (including with user-defined constrained values) can be calculated. The maps can be graphed with a function that calls ‘igraph’. Multiple maps with shared concepts can be aggregated.
FCNN4R Fast Compressed Neural Networks for R
The FCNN4R package provides an interface to kernel routines from the FCNN C++ library. FCNN is based on a completely new Artificial Neural Network representation that offers unmatched efficiency, modularity, and extensibility. FCNN4R provides standard teaching (backpropagation, Rprop) and pruning algorithms (minimum magnitude, Optimal Brain Surgeon), but it is first and foremost an efficient computational engine. Users can easily implement their algorithms by taking advantage of fast gradient computing routines, as well as network reconstruction functionality (removing weights and redundant neurons).
fdapace Functional Data Analysis and Empirical Dynamics
Provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm or numerical integration. PACE is useful for the analysis of data that have been generated by a sample of underlying (but usually not fully observed) random trajectories. It does not rely on pre-smoothing of trajectories, which is problematic if functional data are sparsely sampled. PACE provides options for functional regression and correlation, for Longitudinal Data Analysis, the analysis of stochastic processes from samples of realized trajectories, and for the analysis of underlying dynamics. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ ‘glue’.
fdaPDE Regression with Partial Differential Regularizations, using the Finite Element Method
An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization.
FDboost Boosting Functional Regression Models
Regression models for functional data, i.e. scalar-on-function, function-on-scalar and function-on-function regression models are fitted using a component-wise gradient boosting algorithm.
fdcov Analysis of Covariance Operators
Provides a variety of tools for the analysis of covariance operators.
FDRsampsize Compute Sample Size that Meets Requirements for Average Power and FDR
Defines a collection of functions to compute average power and sample size for studies that use the false discovery rate as the final measure of statistical significance.
FeaLect Scores Features for Feature Selection
For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data, and for each random subset, various linear models are fitted using the lars method. A score is assigned to each feature based on the tendency of the LASSO to include that feature in the models. Finally, the average score and the models are returned as the output. Features with relatively low scores should be ignored because they can lead to overfitting of the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned. These are useful for applying Bolasso, an alternative feature selection method that recommends the intersection of feature subsets.
FeatureHashing Implement Feature Hashing on Model Matrix
Feature hashing, also called the hashing trick, is a method to transform features into a vector. Without looking the indices up in an associative array, it applies a hash function to the features and uses their hash values as indices directly. This package implements the method of feature hashing proposed in Weinberger et al. (2009) with Murmurhash3 and provides a formula interface in R.
FedData Functions to Automate Downloading Geospatial Data Available from Several Federated Data Sources
Functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package allows for retrieval of four datasets: The National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS); The National Hydrography Dataset (USGS); The Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; and the Global Historical Climatology Network (GHCN), coordinated by National Climatic Data Center at NOAA. Additional data sources are in the works, including global DEM resources (ETOPO1, ETOPO5, ETOPO30, SRTM), global soils (HWSD), tree-ring records (ITRDB), MODIS satellite data products, the National Atlas (US), Natural Earth, PRISM, and WorldClim.
feedeR Read RSS/Atom Feeds from R
Retrieve data from RSS/Atom feeds.
FENmlm Fixed Effects Nonlinear Maximum Likelihood Models
Efficient estimation of fixed-effect maximum likelihood models with, possibly, non-linear right hand sides.
ffstream Forgetting Factor Methods for Change Detection in Streaming Data
An implementation of the adaptive forgetting factor scheme described in Bodenham and Adams (2016) <doi:10.1007/s11222-016-9684-8> which adaptively estimates the mean and variance of a stream in order to detect multiple changepoints in streaming data. The implementation is in C++ and uses Rcpp. Additionally, implementations of the fixed forgetting factor scheme from the same paper, as well as the classic CUSUM and EWMA methods, are included.
FFTrees Generate, Visualise, and Compare Fast and Frugal Decision Trees (FFTs)
Fast and Frugal Trees (FFTs) are very simple decision trees for classifying cases (e.g., breast cancer patients) into one of two classes (e.g., no cancer vs. cancer). FFTs can be preferable to more complex algorithms (such as logistic regression) because they are easy to communicate and implement, and are robust against noisy data. This package contains several functions that allow users to input their own data, set model criteria and visualize the best tree(s) for their data.
fheatmap Draw Heatmaps with Colored Dendrogram
R function to plot a high-quality, elegant heatmap using ‘ggplot2’ graphics. Important features of this package include coloring of the row/column side tree with respect to a user-defined number of cuts in the cluster, annotations for both columns and rows, the option to input an annotation palette for tree and column annotations, and multiple parameters to modify the aesthetics (style, color, font) of text in the plot.
fiery A Lightweight and Flexible Web Framework
A very flexible framework for building server-side logic in R. The framework is unopinionated when it comes to how HTTP requests and WebSocket messages are handled and supports all levels of app complexity, from serving static content to full-blown dynamic web apps. Fiery does not hold your hand as much as, e.g., the shiny package does, but instead sets you free to create your web app the way you want.
filematrix File-Backed Matrix Class with Convenient Read and Write Access
Interface for working with large matrices stored in files, not in computer memory. Supports multiple data types (double, integer, logical and raw) of different sizes (e.g. 4, 2, or 1 byte integers). Access to parts of the matrix is done by indexing, exactly as with usual R matrices. Supports very large matrices (tested on 1 terabyte matrix), allowing for more than 2^32 rows or columns. Cross-platform as the package has R code only, no C/C++.
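A short sketch of the create/index/close cycle; fm.create() is the constructor named in the package docs, and the dimensions here are illustrative.

```r
library(filematrix)

# Create a file-backed 1000 x 100 double matrix in a temporary file
fm <- fm.create(filenamebase = tempfile(), nrow = 1000, ncol = 100)

fm[1:3, 1:2] <- matrix(1:6, nrow = 3)  # write a block, indexed like a matrix
fm[1:3, 1:2]                           # read it back

close(fm)                              # release the file handle
```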
filenamer Easy Management of File Names
Create descriptive file names with ease. New file names are automatically (but optionally) time stamped and placed in date stamped directories. Streamline your analysis pipeline with input and output file names that have informative tags and proper file extensions.
fileplyr Chunk Processing or Split-Apply-Combine on Delimited Files (CSV etc.)
Perform chunk processing or split-apply-combine on data in a delimited file (e.g., CSV) across multiple cores of a single machine with a low memory footprint. These functions are a convenient wrapper over the versatile package ‘datadr’.
filesstrings Handy String and File Manipulation
Handy string and file processing and manipulation tools. Built on top of the functionality of base and ‘stringr’. Good for those who like to do all of their file and string manipulation from within R.
FinAna Financial Analysis and Regression Diagnostic Analysis
Functions for regression analysis and financial modeling, including batch graph generation, beta calculation, and descriptive statistics.
findviews A View Generator for Multidimensional Data
A tool to explore wide data sets, by detecting, ranking and plotting groups of statistically dependent columns.
finreportr Financial Data from U.S. Securities and Exchange Commission
Download and display company financial data from the U.S. Securities and Exchange Commission’s EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See <https://…/companysearch.html> for more information.
fitur Fit Univariate Distributions
Wrapper for computing parameters and then assigning to distribution function families.
fixedTimeEvents The Distribution of Distances Between Discrete Events in Fixed Time
Distribution functions and test for over-representation of short distances in the Liland distribution. Simulation functions are included for comparison.
FixSeqMTP Fixed Sequence Multiple Testing Procedures
Generalized Fixed Sequence Multiple Testing Procedures (g-FSMTPs) are used to test a sequence of pre-ordered hypotheses. The three proposed Family-wise Error Rate (FWER) controlling g-FSMTPs utilize the numbers of rejections and acceptances; all of these procedures are designed under arbitrary dependence. The two proposed False Discovery Rate (FDR) controlling g-FSMTPs allow up to a given number of acceptances (k>=1); these procedures are designed for arbitrary dependence and independence. The main functions for each proposed g-FSMTP calculate adjusted p-values and critical values, respectively. For users’ convenience, the output also includes the option of decision rules.
flacco Feature-Based Landscape Analysis of Continuous and Constraint Optimization Problems
Contains tools and features, which can be used for an exploratory landscape analysis of continuous optimization problems. Those are able to quantify rather complex properties, such as the global structure, separability, etc., of continuous optimization problems.
flare Family of Lasso Regression
The ‘flare’ package provides the implementation of a family of Lasso variants, including the Dantzig Selector, LAD Lasso, SQRT Lasso, and Lq Lasso, for estimating high-dimensional sparse linear models. We adopt the alternating direction method of multipliers and convert the original optimization problem into a sequential L1-penalized least squares minimization problem, which can be efficiently solved by a linearization algorithm. A multi-stage screening approach is adopted for further acceleration. Besides sparse linear model estimation, we also provide extensions of these Lasso variants to sparse Gaussian graphical model estimation, including TIGER and CLIME, using either L1 or adaptive penalties. Missing values can be tolerated for the Dantzig selector and CLIME. The computation is memory-optimized using sparse matrix output.
flars Functional LARS
Variable selection algorithm for functional linear regression with scalar response variable and mixed scalar/functional predictors.
FlexDir Tools to Work with the Flexible Dirichlet Distribution
Provides tools to work with the Flexible Dirichlet distribution. The main features are an E-M algorithm for computing the maximum likelihood estimate of the parameter vector and a function based on conditional bootstrap to estimate its asymptotic variance-covariance matrix. It also contains functions to plot graphs, generate random observations and handle compositional data.
FlexParamCurve Tools to Fit Flexible Parametric Curves
Model selection tools and ‘selfStart’ functions to fit parametric curves in ‘nls’, ‘nlsList’ and ‘nlme’ frameworks.
flexPM Flexible Parametric Models for Censored and Truncated Data
Estimation of flexible parametric models for survival data.
flexrsurv Flexible Relative Survival
Perform relative survival analyses using approaches described in Remontet et al. (2007) <DOI:10.1002/sim.2656> and Mahboubi et al. (2011) <DOI:10.1002/sim.4208>. It implements non-linear, non-proportional effects, and both non-proportional and non-linear effects, using splines (B-spline and truncated power basis).
flifo Don’t Get Stuck with Stacks in R
Functions to create and manipulate FIFO (First In First Out), LIFO (Last In First Out), and NINO (Not In or Never Out) stacks in R.
FLIM Farewell’s Linear Increments Model
FLIM fits linear models for the observed increments in a longitudinal dataset, and imputes missing values according to the models.
flock Process Synchronization Using File Locks
Implements synchronization between R processes (spawned by using the ‘parallel’ package for instance) using file locks. Supports both exclusive and shared locking.
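A hedged sketch of guarding a critical section with an exclusive file lock; lock() and unlock() are the names given in the package docs, and the lock-file path is illustrative.

```r
library(flock)

lock.file <- tempfile()

lck <- lock(lock.file)   # blocks until the exclusive lock is acquired
# ... critical section: safely modify a resource shared across R processes ...
unlock(lck)              # release so other processes can proceed
```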
flowr Streamlining Design and Deployment of Complex Workflows
An interface to streamline design of complex workflows and their deployment to a High Performance Computing Cluster.
flows Flow Selection and Analysis
Selections on flow matrices, statistics on selected flows, map and graph visualisations.
fmbasics Financial Market Building Blocks
Implements basic financial market objects like currencies, currency pairs, interest rates and interest rate indices. You will be able to use Benchmark instances of these objects which have been defined using their most common conventions or those defined by International Swap Dealer Association (ISDA, <> ) legal documentation.
FMC Factorial Experiments with Minimum Level Changes
Generate cost effective minimally changed run sequences for symmetrical as well as asymmetrical factorial designs.
fmrs Variable Selection in Finite Mixture of AFT Regression and FMR
Provides parameter estimation as well as variable selection in Finite Mixture of Accelerated Failure Time Regression Models and Finite Mixture of Regression models. It also provides the Ridge regression and Elastic Net.
FMsmsnReg Regression Models with Finite Mixtures of Skew Heavy-Tailed Errors
Fit linear regression models where the random errors follow a finite mixture of Skew Heavy-Tailed Errors.
foghorn Summarizes CRAN Check Results in the Terminal
The CRAN check results in your R terminal.
fontquiver Set of Installed Fonts
Provides a set of fonts with permissive licences. This is useful when you want to avoid system fonts to make sure your outputs are reproducible.
forcats Tools for Working with Categorical Variables (Factors)
Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, anonymising, and manually recoding).
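A minimal sketch of the reordering and level-modifying helpers on a toy factor:

```r
library(forcats)

f <- factor(c("b", "b", "a", "c", "c", "c"))

fct_infreq(f)        # reorder levels by frequency: c, b, a
fct_rev(f)           # reverse the current level order
fct_lump(f, n = 1)   # keep the most frequent level, lump the rest into "Other"
```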
foreach Foreach looping construct for R
Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn’t require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
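A minimal sketch of the idiom, using the return value rather than side effects:

```r
library(foreach)

# %do% evaluates sequentially; with a registered parallel backend
# (e.g. doParallel), %dopar% runs the iterations in parallel instead.
squares <- foreach(i = 1:4, .combine = c) %do% i^2
squares   # 1 4 9 16
```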
ForecastCombinations Forecast Combinations
Aim: Supports the most frequently used methods to combine forecasts. Among others: Simple average, Ordinary Least Squares, Least Absolute Deviation, Constrained Least Squares, Variance-based, Best Individual model, Complete subset regressions and Information-theoretic (information criteria based).
forecastHybrid Convenient Functions for Ensemble Time Series Forecasts
Convenient functions for ensemble forecasts in R combining approaches from the ‘forecast’ package. Forecasts generated from auto.arima(), ets(), nnetar(), stlm(), and tbats() can be combined with equal weights or weights based on in-sample errors. Future methods such as cross validation are planned.
forecastSNSTS Forecasting for Stationary and Non-Stationary Time Series
Methods to compute linear h-step prediction coefficients based on localised and iterated Yule-Walker estimates and empirical mean square prediction errors for the resulting predictors.
forecTheta Forecasting Time Series by Theta Method
Routines for forecasting univariate time series using the Theta Method and the Optimised Theta Method (Fioruci et al., 2015). Contains two cross-validation routines of Tashman (2000).
forega Floating-Point Genetic Algorithms with Statistical Forecast Based Inheritance Operator
The implemented algorithm performs a floating-point genetic algorithm search with a statistical forecasting operator that generates offspring likely to be produced in future generations. Use of this operator enhances the search capabilities of floating-point genetic algorithms, because offspring that the usual genetic operators would produce are rapidly forecast without performing more generations.
forestFloor Visualizes Random Forests with Feature Contributions
Enables users to form appropriate visualizations of the high-dimensional mapping curvature of random forests.
forestinventory Design-Based Global and Small-Area Estimations for Multiphase Forest Inventories
Extensive global and small-area estimation procedures for multiphase forest inventories under the design-based Monte-Carlo approach are provided. The implementation includes estimators for simple and cluster sampling published by Daniel Mandallaz in 2007 (<DOI:10.1201/9781584889779>), 2013 (<DOI:10.1139/cjfr-2012-0381>, <DOI:10.1139/cjfr-2013-0181>, <DOI:10.1139/cjfr-2013-0449>, <DOI:10.3929/ethz-a-009990020>) and 2016 (<DOI:10.3929/ethz-a-010579388>). It provides point estimates, their external- and design-based variances as well as confidence intervals. The procedures have also been optimized for the use of remote sensing data as auxiliary information.
forestmodel Forest Plots from Regression Models
Produces forest plots using ‘ggplot2’ from models produced by functions such as stats::lm(), stats::glm() and survival::coxph().
forestplot Advanced Forest Plot Using ‘grid’ Graphics
The plot allows for multiple confidence intervals per row, custom fonts for each text element, custom confidence intervals, text mixed with expressions, and more. The aim is to extend the use of forest plots beyond meta-analyses. This is a more general version of the original ‘rmeta’ package’s forestplot function and relies heavily on the ‘grid’ package.
ForestTools Analysing Remotely Sensed Forest Data
Forest Tools provides functions for analyzing remotely sensed forest data.
formattable Formattable Data Structures
Provides functions to create formattable vectors and data frames. Formattable vectors are printed with text formatting, and formattable data frames are printed with multiple types of formatting in markdown to improve the readability of data presented in tabular form rendered as web pages.
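A short sketch of formattable vectors and a formatted data frame; percent() and color_bar() are taken from the package docs, and the data are illustrative.

```r
library(formattable)

p <- percent(c(0.1, 0.856))   # prints as percentages but stays numeric
p + 0.1                       # arithmetic still works on the underlying values

df <- data.frame(id = 1:3, rate = percent(c(0.2, 0.5, 0.9)))
formattable(df, list(rate = color_bar("lightblue")))  # rendered as a table
```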
forward Forward search
Forward search approach to robust analysis in linear and generalized linear regression models.
ForwardSearch Forward Search using asymptotic theory
Forward Search analysis of time series regressions. Implements the asymptotic theory developed in Johansen and Nielsen (2013, 2014).
fourierin Computes Numeric Fourier Integrals
Computes Fourier integrals of functions of one and two variables using the Fast Fourier transform. The Fourier transforms must be evaluated on a regular grid.
fourPNO Bayesian 4 Parameter Item Response Model
Estimate Lord & Barton’s four parameter IRT model with lower and upper asymptotes using Bayesian formulation described by Culpepper (2015).
fpa Spatio-Temporal Fixation Pattern Analysis
Spatio-temporal Fixation Pattern Analysis (FPA) is a new method of analyzing eye movement data, developed by Mr. Jinlu Cao under the supervision of Prof. Chen Hsuan-Chih at The Chinese University of Hong Kong, and Prof. Wang Suiping at the South China Normal University. The package ‘fpa’ is an R implementation which makes FPA analysis much easier. There are four major functions in the package: ft2fp(), get_pattern(), plot_pattern(), and lineplot(). The function ft2fp() is the core function, which can complete all the preprocessing within moments. The other three functions are supportive functions which visualize the eye fixation patterns.
FPCA2D Two Dimensional Functional Principal Component Analysis
Compute the two-dimensional functional principal component scores for a series of two-dimensional images.
fpCompare Reliable Comparison of Floating Point Numbers
Comparisons of floating point numbers are problematic due to errors associated with the binary representation of decimal numbers. Despite being aware of these problems, people still use numerical methods that fail to account for these and other rounding errors (this pitfall is the first to be highlighted in Circle 1 of Burns (2012, http://…/R_inferno.pdf)). This package provides four new relational operators useful for performing floating point number comparisons with a set tolerance.
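A minimal sketch with the classic 0.1 + 0.2 example; %==% is one of the package's tolerance-aware relational operators.

```r
library(fpCompare)

x <- 0.1 + 0.2
x == 0.3      # FALSE: exact binary comparison fails
x %==% 0.3    # TRUE: equal within the package tolerance
```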
FPDclustering PD-Clustering and Factor PD-Clustering
Probabilistic distance clustering (PD-clustering) is an iterative, distribution free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant. PD-clustering is a flexible method that can be used with non-spherical clusters, outliers, or noisy data. Factor PD-clustering (FPDC) is a recently proposed factor clustering method that involves a linear transformation of variables and a cluster optimizing the PD-clustering criterion. It allows clustering of high dimensional data sets.
fractional Vulgar Fractions in R
The main function of this package allows numerical vector objects to be displayed with their values in vulgar fractional form. This is convenient if patterns can then be more easily detected. In some cases replacing the components of a numeric vector by a rational approximation can also be expected to remove some component of round-off error. The main functions form a re-implementation of the functions ‘fractions’ and ‘rational’ of the MASS package, but using a radically improved programming strategy.
fragilityindex Fragility Index
Implements the fragility index calculation for dichotomous results as described in Walsh, Srinathan, McAuley, Mrkobrada, Levine, Ribic, Molnar, Dattani, Burke, Guyatt, Thabane, Walter, Pogue and Devereaux PJ (2014) <DOI:10.1016/j.jclinepi.2013.10.019>.
frailtyEM Fitting Frailty Models with the EM Algorithm
Contains functions for fitting shared frailty models with a semi-parametric baseline hazard with the Expectation-Maximization algorithm. Supported data formats include clustered failures with left truncation and recurrent events in gap-time or Andersen-Gill format. Several frailty distributions, such as the gamma, positive stable and the Power Variance Family, are supported.
frailtySurv General Semiparametric Shared Frailty Model
Simulates and fits semiparametric shared frailty models under a wide range of frailty distributions using a consistent and asymptotically-normal estimator. Currently supports: gamma, power variance function, log-normal, and inverse Gaussian frailty models.
franc Detect the Language of Text
Detects the language of text, with no external dependencies and support for 335 languages (all languages spoken by more than one million speakers). ‘Franc’ is a port of the ‘JavaScript’ project of the same name, see <https://…/franc>.
frbs Fuzzy Rule-Based Systems for Classification and Regression Tasks
An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows constructing an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, squares methods, etc. Furthermore, in this version we provide a universal framework named ‘frbsPMML’, which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language to provide a standard for describing models produced by data mining and machine learning algorithms. Therefore, an FRBS model can be exported to and imported from ‘frbsPMML’. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.
freqdist Frequency Distribution
Generates a frequency distribution. The frequency distribution includes raw frequencies, percentages in each category, and cumulative frequencies. The frequency distribution can be stored as a data frame.
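The table it describes is simple to sketch (a minimal Python illustration of the same summary, not the package's R interface):

```python
# Frequency distribution with raw counts, percentages and cumulative
# frequencies, mirroring the output freqdist describes (illustrative).
from collections import Counter

def freq_table(values):
    counts = Counter(values)
    n = len(values)
    rows, cum = [], 0
    for cat, c in sorted(counts.items()):
        cum += c  # running total across categories
        rows.append({"category": cat, "freq": c,
                     "percent": 100 * c / n, "cum_freq": cum})
    return rows

for row in freq_table(list("aabbbc")):
    print(row)
```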
freqdom Frequency Domain Analysis for Multivariate Time Series
Methods for the analysis of multivariate time series using frequency domain techniques. Implementations of dynamic principal components analysis (DPCA) and estimators of operators in lagged regression. Examples of usage in a functional data analysis setup.
FreqProf Frequency Profiles Computing and Plotting
Tools for generating an informative type of line graph, the frequency profile, which allows single behaviors, multiple behaviors, or the specific behavioral patterns of individual subjects to be graphed from occurrence/nonoccurrence behavioral data.
FRK Fixed Rank Kriging
Fixed Rank Kriging is a tool for spatial/spatio-temporal modelling and prediction with large datasets. The approach, discussed in Cressie and Johannesson (2008), decomposes the field, and hence the covariance function, using a fixed set of n basis functions, where n is typically much smaller than the number of data points (or polygons) m. The method naturally allows for non-stationary, anisotropic covariance functions and the use of observations with varying support (with known error variance). The projected field is a key building block of the Spatial Random Effects (SRE) model, on which this package is based. The package FRK provides helper functions to model, fit, and predict using an SRE with relative ease. Reference: Cressie, N. and Johannesson, G. (2008) <DOI:10.1111/j.1467-9868.2007.00633.x>.
fromo Fast Robust Moments
Fast computation of moments via ‘Rcpp’. Supports computation on vectors and matrices, and Monoidal append of moments.
FSelectorRcpp ‘Rcpp’ Implementation of ‘FSelector’ Entropy-Based Feature Selection Algorithms with Sparse Matrix Support
‘Rcpp’ (free of ‘Java’/’Weka’) implementation of ‘FSelector’ entropy-based feature selection algorithms with sparse matrix support. It is also equipped with a parallel backend.
FSInteract Fast Searches for Interactions
Performs fast detection of interactions in large-scale data using the method of random intersection trees introduced in ‘Shah, R. D. and Meinshausen, N. (2014) Random Intersection Trees’. The algorithm finds potentially high-order interactions in high-dimensional binary two-class classification data, without requiring lower order interactions to be informative. The search is particularly fast when the matrices of predictors are sparse. It can also be used to perform market basket analysis when supplied with a single binary data matrix. Here it will find collections of columns which for many rows contain all 1’s.
fst Lightning Fast Serialization of Data Frames for R
Read and write data frames at high speed. Compress your data with fast and efficient type-optimized algorithms that allow for random access of stored data frames (columns and rows).
FTRLProximal FTRL Proximal Implementation for Elastic Net Regression
Implementation of Follow The Regularized Leader (FTRL) Proximal algorithm used for online training of large scale regression models using a mixture of L1 and L2 regularization.
ftsspec Spectral Density Estimation and Comparison for Functional Time Series
Functions for estimating spectral density operator of functional time series (FTS) and comparing the spectral density operator of two functional time series, in a way that allows detection of differences of the spectral density operator in frequencies and along the curve length.
fullfact Full Factorial Breeding Analysis
Package for the analysis of full factorial breeding designs.
fulltext Full Text of ‘Scholarly’ Articles Across Many Data Sources
Provides a single interface to many sources of full text ‘scholarly’ data, including ‘Biomed Central’, Public Library of Science, ‘Pubmed Central’, ‘eLife’, ‘F1000Research’, ‘PeerJ’, ‘Pensoft’, ‘Hindawi’, ‘arXiv’ preprints, and more. Functionality is included for searching for articles, downloading full or partial text, and converting to various data formats used in and outside of R.
funchir Convenience Functions by Michael Chirico
A set of functions, some subset of which I use in every .R file I write. Examples are table2(), which adds useful functionalities to base table (sorting, built-in proportion argument, etc.); lyx.xtable(), which converts xtable() output to a format more easily copy-pasted into LyX; pdf2(), which writes a plot to file while also displaying it in the RStudio plot window; and abbr_to_colClass(), which is a much more concise way of feeding many types to a colClass argument in a data reader.
functools Extending Functional Programming in R
Extending functional programming in R by providing support to the usual higher order functional suspects (Map, Reduce, Filter, etc.).
funcy Functional Clustering Algorithms
Unified framework to cluster functional data according to one of seven models. All models are based on the projection of the curves onto a basis. The main function funcit() calls wrapper functions for the existing algorithms, so that input parameters are the same. A list is returned with each entry representing the same or extended output for the corresponding method. Method specific as well as general visualization tools are available.
funData An S4 Class for Functional Data
S4 classes for univariate and multivariate functional data with utility functions.
funFEM Clustering in the Discriminative Functional Subspace
The funFEM algorithm (Bouveyron et al., 2014) allows clustering of functional data by modeling the curves within a common and discriminative functional subspace.
fungible Fungible Coefficients and Monte Carlo Functions
Functions for computing fungible coefficients and Monte Carlo data.
funHDDC Model-based clustering in group-specific functional subspaces
The package provides the funHDDC algorithm (Bouveyron & Jacques, 2011), which allows clustering of functional data by modeling each group within a specific functional subspace.
funModeling Learning Data Cleaning, Visual Analysis and Model Performance
Tools for learning data cleaning, visual data analysis and model performance assessment (KS, AUC and ROC). The package core is in the vignette documentation, which explains these topics as a tutorial.
funr Simple Utility Providing Terminal Access to all R Functions
A small utility which wraps Rscript and provides access to all R functions from the shell.
funrar Functional Rarity Indices Computation
Computes functional rarity indices as proposed by Violle et al. (in revision). Various indices can be computed using both regional and local information. Functional rarity combines the functional aspect of rarity with the extent aspect of rarity.
FUNTA Functional Tangential Angle Pseudo-Depth
Computes the functional tangential angle pseudo-depth and its robustified version from the paper by Kuhnt and Rehage (2016). See Kuhnt, S.; Rehage, A. (2016): An angle-based multivariate functional pseudo-depth for shape outlier detection, JMVA 146, 325-340, <doi:10.1016/j.jmva.2015.10.016> for details.
funtimes Functions for Time Series Analysis
Includes non-parametric estimators and tests for time series analysis. The functions allow testing for the presence of possibly non-monotonic trends and for synchronism of trends in multiple time series, using modern bootstrap techniques and robust non-parametric difference-based estimators.
future A Future API for R
A Future API for R is provided. In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available. Futures are useful constructs in for instance concurrent evaluation, e.g. multicore parallel processing and distributed processing on compute clusters. The purpose of this package is to provide a lightweight interface for using futures in R. Functions ‘future()’ and ‘value()’ exist for creating futures and requesting their values. An infix assignment operator ‘%<=%’ exists for creating futures whose values are accessible by the assigned variables (as promises). This package implements the synchronous ‘lazy’ and ‘eager’ futures, and the asynchronous ‘multicore’ future (not on Windows). Additional types of futures are provided by other packages enhancing this package.
future.BatchJobs A Future for BatchJobs
Simple parallel and distributed processing using futures that utilize the ‘BatchJobs’ framework, e.g. ‘fit %<-% { … }’. This package implements the Future API of the ‘future’ package.
fuzzr Fuzz-Test R Functions
Test function arguments with a wide array of inputs, and produce reports summarizing messages, warnings, errors, and returned values.
Fuzzy.p.value Computing Fuzzy p-Value
The main goal of this package is drawing the membership function of the fuzzy p-value, which is defined as a fuzzy set on the unit interval, for the three following problems: (1) testing crisp hypotheses based on fuzzy data, (2) testing fuzzy hypotheses based on crisp data, and (3) testing fuzzy hypotheses based on fuzzy data. In all cases, the fuzziness of the data and/or the fuzziness of the boundary of the null fuzzy hypothesis is transported via the p-value function and produces the fuzzy p-value. If the p-value is fuzzy, it is more appropriate to consider a fuzzy significance level for the problem. Therefore, the comparison of the fuzzy p-value and the fuzzy significance level is evaluated by a fuzzy ranking method in this package.
FuzzyAHP (Fuzzy) AHP Calculation
Calculation of AHP (Analytic Hierarchy Process – <http://…/Analytic_hierarchy_process> ) with classic and fuzzy weights based on Saaty’s pairwise comparison method for determination of weights.
fuzzyforest Fuzzy Forests
Fuzzy forests, a new algorithm based on random forests, is designed to reduce the bias seen in random forest feature selection caused by the presence of correlated features. Fuzzy forests uses recursive feature elimination with random forests to select features from separate blocks of correlated features, where the correlation within each block of features is high and the correlation between blocks of features is low. One final random forest is fit using the surviving features. This package fits random forests using the ‘randomForest’ package and allows for easy use of ‘WGCNA’ to split features into distinct blocks.
fuzzyjoin Join Tables Together on Inexact Matching
Join tables together based not on whether columns match exactly, but whether they are similar by some comparison. Implementations include string distance and regular expression matching.
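The general idea — pairing rows whose keys are merely similar — can be sketched with a standard-library similarity measure (a Python illustration of the concept; fuzzyjoin itself offers string-distance and regular-expression matchers in R):

```python
# Join two tables on approximate string match: the idea behind fuzzyjoin,
# with difflib similarity standing in for its string-distance matchers.
from difflib import SequenceMatcher

def fuzzy_join(left, right, key, threshold=0.8):
    """Pair rows whose key values are similar enough."""
    out = []
    for l in left:
        for r in right:
            sim = SequenceMatcher(None, l[key], r[key]).ratio()
            if sim >= threshold:
                out.append((l, r, sim))
    return out

left = [{"name": "colour"}, {"name": "cat"}]
right = [{"name": "color"}, {"name": "dog"}]
# Only "colour"/"color" are similar enough to be joined.
print(fuzzy_join(left, right, "name"))
```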
FuzzyLP Fuzzy Linear Programming
Methods to solve Fuzzy Linear Programming Problems with fuzzy constraints (by Verdegay, Zimmermann, Werner, Tanaka), fuzzy costs (multiobjective, interval arithmetic, stratified piecewise reduction, defuzzification-based), and fuzzy technological matrix.
FuzzyMCDM Multi-Criteria Decision Making Methods for Fuzzy Data
Implementation of several MCDM methods for fuzzy data (triangular fuzzy numbers) for decision making problems. The methods that are implemented in this package are Fuzzy TOPSIS (with two normalization procedures), Fuzzy VIKOR, Fuzzy Multi-MOORA and Fuzzy WASPAS. In addition, function MetaRanking() calculates a new ranking from the sum of the rankings calculated, as well as an aggregated ranking.
FuzzyNumbers.Ext.2 Apply Two Fuzzy Numbers on a Monotone Function
Package ‘FuzzyNumbers.Ext.2’ makes it easy to draw the membership function of f(x,y), where f(.,.) is assumed monotone and x and y are two fuzzy numbers. This is done with function f2apply(), an extension of function fapply() from package ‘FuzzyNumbers’ to two-variable monotone functions.
FuzzyR Fuzzy Logic Toolkit for R
Design and simulate fuzzy logic systems using Type 1 Fuzzy Logic. This toolkit includes a graphical user interface (GUI) and an adaptive neuro-fuzzy inference system (ANFIS). It is a continuation of the previous package (‘FuzzyToolkitUoN’). Produced by the Intelligent Modelling & Analysis Group, University of Nottingham.
FuzzyStatTra Statistical Methods for Trapezoidal Fuzzy Numbers
The aim of the package is to provide some basic functions for doing statistics with trapezoidal fuzzy numbers. In particular, the package contains several functions for simulating trapezoidal fuzzy numbers, as well as for calculating some central tendency measures (mean and two types of median), some scale measures (variance, ADD, MDD, Sn, Qn, Tn and some M-estimators) and one diversity index and one inequality index. Moreover, functions for calculating the 1-norm distance, the mid/spr distance and the (phi,theta)-wabl/ldev/rdev distance between fuzzy numbers are included, and a function to calculate the value phi-wabl given a sample of trapezoidal fuzzy numbers.


GAabbreviate Abbreviating Questionnaires (or Other Measures) Using Genetic Algorithms
GAabbreviate uses genetic algorithms as an optimization tool to create abbreviated forms of lengthy questionnaires (or other measures) that maximally capture the variance in the original data of the long form of the measure.
gafit Genetic Algorithm for Curve Fitting
A group of sample points is evaluated against a user-defined expression; the sample points are lists of parameters whose values may be substituted into that expression. The genetic algorithm attempts to make the result of the expression as low as possible (usually this would be the sum of squared residuals).
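The procedure it describes — evolving candidate parameters to minimise the value of an expression — can be sketched as a toy (illustrative Python with made-up names, not gafit's interface):

```python
# Toy genetic algorithm minimising a sum of squared residuals over one
# parameter, illustrating gafit's idea (names here are illustrative).
import random

def ga_fit(residual_ss, lo, hi, pop=30, gens=60, seed=1):
    rng = random.Random(seed)
    # Initial population of candidate parameter values.
    popn = [rng.uniform(lo, hi) for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=residual_ss)
        survivors = popn[:pop // 2]           # selection: keep the fittest half
        children = [p + rng.gauss(0, 0.1)     # mutation: small Gaussian jitter
                    for p in survivors]
        popn = survivors + children
    return min(popn, key=residual_ss)

# Fit y = a * x to data generated with a = 2.
data = [(x, 2 * x) for x in range(1, 6)]
ss = lambda a: sum((y - a * x) ** 2 for x, y in data)
print(ga_fit(ss, -10, 10))  # approaches 2
```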
gains Gains Table Package
This package constructs gains tables and lift charts for prediction algorithms. Gains tables and lift charts are commonly used in direct marketing applications.
gamCopula Generalized Additive Models for Bivariate Conditional Dependence Structures and Vine Copulas
Implementation of various inference and simulation tools to apply generalized additive models to bivariate dependence structures and non-simplified vine copulas.
GAMens Applies GAMbag, GAMrsm and GAMens Ensemble Classifiers for Binary Classification
Ensemble classifiers based upon generalized additive models for binary classification (De Bock et al. (2010) <DOI:10.1016/j.csda.2009.12.013>). The ensembles implement Bagging (Breiman (1996) <DOI:10.1023/A:1018054314350>), the Random Subspace Method (Ho (1998) <DOI:10.1109/34.709601>), or both, and use Hastie and Tibshirani’s (1990) generalized additive models (GAMs) as base classifiers. Once an ensemble classifier has been trained, it can be used for predictions on new data. A function for cross validation is also included.
GameTheory Cooperative Game Theory
Implementation of a common set of punctual solutions for Cooperative Game Theory.
GameTheoryAllocation Tools for Calculating Allocations in Game Theory
Many situations can be modeled as game-theoretic situations. Some procedures are included in this package to calculate the most important allocation rules in Game Theory: the Shapley value, the Owen value or the nucleolus, among others. First, the value of the unions of the involved agents must be defined as an argument via the characteristic function.
gammSlice Generalized additive mixed model analysis via slice sampling
Uses a slice sampling-based Markov chain Monte Carlo to conduct Bayesian fitting and inference for generalized additive mixed models (GAMM). Generalized linear mixed models and generalized additive models are also handled as special cases of GAMM.
gamreg Robust and Sparse Regression via Gamma-Divergence
Robust regression via gamma-divergence with L1, elastic net and ridge.
gamsel Fit Regularization Path for Generalized Additive Models
Using overlap grouped lasso penalties, gamsel selects whether a term in a gam is nonzero, linear, or a non-linear spline (up to a specified max df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, both for gaussian and binomial families.
GAR Authorize and Request Google Analytics Data
The functions included are used to obtain initial authentication with Google Analytics as well as simple and organized data retrieval from the API. Allows for retrieval from multiple profiles at once.
GAS Generalized Autoregressive Score Models
Simulate, estimate and forecast using univariate and multivariate GAS models.
gaselect Genetic Algorithm (GA) for Variable Selection from High-Dimensional Data
Provides a genetic algorithm for finding variable subsets in high dimensional data with high prediction performance. The genetic algorithm can use ordinary least squares (OLS) regression models or partial least squares (PLS) regression models to evaluate the prediction power of variable subsets. By supporting different cross-validation schemes, the user can fine-tune the tradeoff between speed and quality of the solution.
gatepoints Easily Gate or Select Points on a Scatter Plot
Allows user to choose/gate a region on the plot and returns points within it.
gbm Generalized Boosted Regression Models
An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart).
gbp A Bin Packing Problem Solver
Basic infrastructure and several algorithms for the 1d-4d bin packing problem. This package provides a set of c-level classes and solvers for the 1d-4d bin packing problem, and an r-level solver for the 4d bin packing problem, which is a wrapper over the c-level 4d bin packing problem solver. The 4d bin packing problem solver aims to solve the bin packing problem, a.k.a. the container loading problem, with an additional constraint on weight. Given a set of rectangular-shaped items, and a set of rectangular-shaped bins with weight limits, the solver looks for an orthogonal packing solution that minimizes the number of bins and maximizes volume utilization. Each rectangular-shaped item i = 1, .. , n is characterized by length l_i, depth d_i, height h_i, and weight w_i, and each rectangular-shaped bin j = 1, .. , m is specified similarly by length l_j, depth d_j, height h_j, and weight limit w_j. The item can be rotated into any orthogonal direction, and no further restrictions are imposed.
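For the 1d case, a classical heuristic such as first-fit decreasing conveys the flavour of the problem (an illustrative Python sketch, unrelated to gbp's exact solver):

```python
# First-fit decreasing heuristic for 1d bin packing: place each item,
# largest first, into the first bin with enough remaining capacity.
def first_fit_decreasing(items, capacity):
    bins = []  # each bin is a list of item sizes
    for size in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])  # no bin fits: open a new one
    return bins

print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
# [[8, 2], [4, 4, 1, 1]]
```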
gbts Hyperparameter Search for Gradient Boosted Trees
An implementation of hyperparameter optimization for Gradient Boosted Trees on binary classification and regression problems. The current version provides two optimization methods: active learning and random search.
GCalignR Simple Peak Alignment for Gas-Chromatography Data
Aligns chromatography peaks with a three-step algorithm: (1) linear transformation of retention times to maximise shared peaks among samples, (2) alignment of peaks within a certain error interval, (3) merging of rows that likely represent the same substance (i.e. no sample shows peaks in both rows and the rows have similar retention time means). The method was first described in Stoffel et al. (2015) <doi:10.1073/pnas.1506076112>.
gcerisk Generalized Competing Event Model
Generalized competing event model based on the Cox PH model and the Fine-Gray model. This function is designed to develop optimized risk-stratification methods for competing risks data, such as described in: 1. Carmona R, Gulaya S, Murphy JD, Rose BS, Wu J, Noticewala S, McHale MT, Yashar CM, Vaida F, and Mell LK. (2014) Validated competing event model for the stage I-II endometrial cancer population. Int J Radiat Oncol Biol Phys. 89:888-98. <DOI:10.1016/j.ijrobp.2014.03.047>. 2. Carmona R, Zakeri K, Green G, Hwang L, Gulaya S, Xu B, Verma R, Williamson CW, Triplett DP, Rose BS, Shen H, Vaida F, Murphy JD, and Mell LK. (2016) Improved method to stratify elderly cancer patients at risk for competing events. J Clin Oncol, in press. <DOI:10.1200/JCO.2015.65.0739>.
gcKrig Analyze and Interpolate Geostatistical Count Data using Gaussian Copula
Provides a variety of functions to analyze and model geostatistical count data with Gaussian copulas, including 1) data simulation and visualization; 2) correlation structure assessment (here also known as NORTA); 3) calculation of multivariate normal rectangle probabilities; 4) likelihood inference and parallel prediction at unsampled locations.
GDAtools A toolbox for the analysis of categorical data in social sciences, and especially Geometric Data Analysis
This package contains functions for ‘specific’ MCA (Multiple Correspondence Analysis), ‘class specific’ MCA, computing and plotting structuring factors and concentration ellipses, ‘standardized’ MCA, inductive tests and other tools for Geometric Data Analysis. It also provides functions for the translation of logit model coefficients into percentages (forthcoming), weighted contingency tables and an association measure, i.e. Percentages of Maximum Deviation from Independence (PEM).
gdm Functions for Generalized Dissimilarity Modeling
A toolkit with functions to fit, plot, and summarize Generalized Dissimilarity Models.
gdns Tools to Work with Google DNS Over HTTPS API
To address the problem of insecurity of UDP-based DNS requests, Google Public DNS offers DNS resolution over an encrypted HTTPS connection. DNS-over-HTTPS greatly enhances privacy and security between a client and a recursive resolver, and complements DNSSEC to provide end-to-end authenticated DNS lookups. Functions are provided for both individual requests, which return detailed responses, and bulk requests. Support for reverse lookups is also provided. See <https://…/dns-over-https> for more information.
gdpc Generalized Dynamic Principal Components
Functions to compute the Generalized Dynamic Principal Components introduced in Peña and Yohai (2016) <DOI:10.1080/01621459.2015.1072542>.
gds Descriptive Statistics of Grouped Data
Contains a function called gds() which accepts three input parameters: the lower limits, the upper limits and the frequencies of the corresponding classes. The gds() function calculates and returns the values of the mean (‘gmean’), median (‘gmedian’), mode (‘gmode’), variance (‘gvar’), standard deviation (‘gstdev’), coefficient of variation (‘gcv’), quartiles (‘gq1’, ‘gq2’, ‘gq3’), inter-quartile range (‘gIQR’), skewness (‘g1’), and kurtosis (‘g2’), which facilitate effective data analysis. Skewness and kurtosis are calculated using moments.
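Grouped-data statistics of this kind follow the usual midpoint convention, which is easy to sketch (a Python illustration of the arithmetic, using the population variance; not the gds() interface):

```python
# Mean and variance from grouped data: class limits plus frequencies,
# evaluated at class midpoints (illustrative, population variance).
def grouped_mean_var(lower, upper, freq):
    mids = [(l + u) / 2 for l, u in zip(lower, upper)]
    n = sum(freq)
    mean = sum(m * f for m, f in zip(mids, freq)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(mids, freq)) / n
    return mean, var

# Classes 0-10, 10-20, 20-30 with frequencies 2, 5, 3.
mean, var = grouped_mean_var([0, 10, 20], [10, 20, 30], [2, 5, 3])
print(mean, var)  # midpoints 5, 15, 25 -> mean 16.0, variance 49.0
```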
gdtools Utilities for Graphical Rendering
Useful tools for writing vector graphics devices.
gear Geostatistical Analysis in R
Implements common geostatistical methods in a clean, straightforward, efficient manner. A quasi reboot of the SpatialTools R package.
gee4 Generalised Estimating Equations (GEE/WGEE) using ‘Armadillo’ and S4
Fit joint mean-covariance models for longitudinal data within the framework of (weighted) generalised estimating equations (GEE/WGEE). The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Armadillo’ C++ library for numerical linear algebra and ‘RcppArmadillo’ glue.
GEEmediate Mediation Analysis for Generalized Linear Models Using the Difference Method
Causal mediation analysis for a single exposure/treatment and a single mediator, both allowed to be either continuous or binary. The package implements the difference method and provides point and interval estimates as well as tests for the natural direct and indirect effects and the mediation proportion.
gelnet Generalized Elastic Nets
The package implements several extensions of the elastic net regularization scheme. These extensions include individual feature penalties for the L1 term and feature-feature penalties for the L2 term.
gemmR General Monotone Model
An R-language implementation of the General Monotone Model proposed by Michael Dougherty and Rick Thomas. It is a procedure for estimating weights for a set of independent predictors that minimize the rank-order inversions between the model predictions and some outcome.
gencve General Cross Validation Engine
Engines for cross-validation of many types of regression and class prediction models are provided. These engines include built-in support for ‘glmnet’, ‘lars’, ‘plus’, ‘MASS’, ‘rpart’, ‘C50’ and ‘randomforest’. It is easy for the user to add other regression or classification algorithms. The ‘parallel’ package is used to improve speed. Several data generation algorithms for problems in regression and classification are provided.
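The engine pattern — any fit/predict pair plugged into a shared cross-validation loop — can be sketched generically (illustrative Python with made-up names, not gencve's R interface):

```python
# Generic k-fold cross-validation loop: any fit/predict/loss triple can
# be plugged in, mirroring the "engine" idea (illustrative only).
def k_fold_cv(xs, ys, fit, predict, loss, k=5):
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    total = 0.0
    for test_idx in folds:
        train_idx = [i for i in range(n) if i not in test_idx]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            total += loss(ys[i], predict(model, xs[i]))
    return total / n  # average out-of-fold loss

# Plug in a trivial "predict the training mean" model with squared loss.
fit = lambda X, Y: sum(Y) / len(Y)
predict = lambda m, x: m
sq = lambda y, p: (y - p) ** 2
print(k_fold_cv(list(range(10)), [2.0 * v for v in range(10)], fit, predict, sq))
```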
genderizeR Gender Prediction Based on First Names
Utilizes the API to predict gender from first names extracted from a text vector. The accuracy of prediction can be controlled by two parameters: the count of a first name in the database and the probability of prediction.
genderNames Client for the Genderize API That Determines the Gender of Names
API client for the Genderize API, which will tell you the gender of the name you input. Use the first name of the person you are interested in to find their gender.
gendist Generated Probability Distribution Models
Computes the probability density function (pdf), cumulative distribution function (cdf), quantile function (qf) and generates random values (rg) for the following general models : mixture models, composite models, folded models, skewed symmetric models and arc tan models.
GeneralOaxaca Blinder-Oaxaca Decomposition for Generalized Linear Model
Performs the Blinder-Oaxaca decomposition for generalized linear models with bootstrapped standard errors. The twofold and threefold decompositions are given, as well as the generalized linear model output for each group.
GeneralTree General Tree Data Structure
A general tree data structure implementation in R.
generator Generate Data Containing Fake Personally Identifiable Information
Allows users to quickly and easily generate fake data containing Personally Identifiable Information (PII) through convenience functions.
GeNetIt Spatial Graph-Theoretic Genetic Gravity Modelling
Implementation of spatial graph-theoretic genetic gravity models. The model framework is applicable for other types of spatial flow questions. Includes functions for constructing spatial graphs, sampling and summarizing associated raster variables and building unconstrained and singly constrained gravity models.
GenForImp The Forward Imputation: A Sequential Distance-Based Approach for Imputing Missing Data
Two methods based on the Forward Imputation approach are implemented for the imputation of quantitative missing data. One method alternates Nearest Neighbour Imputation and Principal Component Analysis (function ‘ForImp.PCA’); the other uses Nearest Neighbour Imputation.
genie A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm
A new hierarchical clustering linkage criterion: the Genie algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini index) of the cluster sizes does not increase drastically above a given threshold. Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed.
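The inequity measure in question is typically the Gini index of the cluster sizes, which is straightforward to compute (a Python sketch of the measure itself, not the linkage algorithm):

```python
# Gini index of cluster sizes: the inequity measure the Genie linkage
# keeps below a threshold (illustrative, standard rank-weighted formula).
def gini(sizes):
    """Gini index of positive cluster sizes (0 = perfectly equal)."""
    s = sorted(sizes)
    n, total = len(s), sum(s)
    return sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(s)) / (n * total)

print(gini([10, 10, 10]))  # equal sizes -> 0.0
print(gini([1, 1, 28]))    # one dominant cluster -> 0.6, highly unequal
```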
genpathmox Generalized PATHMOX Algorithm for PLS-PM, LS and LAD Regression
genpathmox provides a solution for handling segmentation variables in complex statistical methodology. It contains an extended version of the PATHMOX algorithm in the context of partial least squares path modeling (Sanchez, 2009), including the F-block test (to detect the latent endogenous equations responsible for the difference), the F-coefficient test (to detect the path coefficients responsible for the difference) and the invariance test (to compare the sub-models’ latent variables). Furthermore, the package contains a generalized version of the PATHMOX algorithm to approach different methodologies: linear regression and least absolute deviation regression models.
gensphere Generalized Spherical Distributions
Define and compute with generalized spherical distributions – multivariate probability laws that are specified by a star shaped contour (directional behavior) and a radial component.
geoaxe Split ‘Geospatial’ Objects into Pieces
Split ‘geospatial’ objects into pieces. Includes support for some spatial object inputs, ‘Well-Known Text’, and ‘GeoJSON’.
geofd Spatial Prediction for Function Value Data
Kriging based methods are used for predicting functional data (curves) with spatial dependence.
geoGAM Select Sparse Geoadditive Models for Spatial Prediction
A model building procedure to select a sparse geoadditive model from a large number of covariates. Continuous, binary and ordered categorical responses are supported. The model building is based on component wise gradient boosting with linear effects and smoothing splines. The resulting covariate set after gradient boosting is further reduced through cross validated backward selection and aggregation of factor levels. The package provides a model based bootstrap method to simulate prediction intervals for point predictions. A test data set of a soil mapping case study is provided.
geohash Tools for Geohash Creation and Manipulation
Provides tools to encode lat/long pairs into geohashes, decode those geohashes, and identify their neighbours.
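As a language-neutral illustration of what such encoding does (a minimal Python sketch of the standard geohash algorithm, not this package’s API), a geohash is built by alternately bisecting the longitude and latitude intervals and packing the resulting bits into base-32 digits:

```python
# Hypothetical helper (not the R package's interface): standard geohash
# encoding -- interleave longitude/latitude bisection bits, five bits per
# base-32 character.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=11):
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, nbits, even = [], 0, 0, True  # even-numbered bits refine longitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        bits <<= 1
        if val >= mid:
            bits |= 1
            rng[0] = mid  # value lies in the upper half-interval
        else:
            rng[1] = mid  # value lies in the lower half-interval
        even, nbits = not even, nbits + 1
        if nbits == 5:  # five bits form one base-32 digit
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)
```

Shorter prefixes of the same hash denote coarser cells, which is what makes geohash neighbours and prefix searches work.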
geojson Classes for ‘GeoJSON’
Classes for ‘GeoJSON’ to make working with ‘GeoJSON’ easier.
geojsonio Convert Data from and to ‘geoJSON’ or ‘topoJSON’
Convert data to ‘geoJSON’ or ‘topoJSON’ from various R classes, including vectors, lists, data frames, shape files, and spatial classes. ‘geojsonio’ does not aim to replace packages like ‘sp’, ‘rgdal’, ‘rgeos’, but rather aims to be a high level client to simplify conversions of data from and to ‘geoJSON’ and ‘topoJSON’.
geojsonlint Tools for Validating ‘GeoJSON’
Tools for linting ‘GeoJSON’. Includes tools for interacting with the online tool <>, the ‘Javascript’ library ‘geojsonhint’ (<https://…/geojsonhint> ), and validating against a GeoJSON schema via the ‘Javascript’ library (<https://…/is-my-json-valid> ). Some tools work locally while others require an internet connection.
geojsonR A GeoJson Processing Toolkit
Includes functions for processing GeoJson objects <https://…/GeoJSON> relying on ‘RFC 7946’ <https://…/rfc7946.pdf>. The geojson encoding is based on ‘json11’, a tiny JSON library for ‘C++11’ <https://…/json11>. Furthermore, the source code is exposed to R through the ‘Rcpp’ and ‘RcppArmadillo’ packages.
GeomComb (Geometric) Forecast Combination Methods
Provides eigenvector-based (geometric) forecast combination methods; also includes simple approaches (simple average, median, trimmed and winsorized mean, inverse rank method) and regression-based combination. Tools for data pre-processing are available in order to deal with common problems in forecast combination (missingness, collinearity).
geomorph Geometric Morphometric Analyses of 2D/3D Landmark Data
Geomorph allows users to read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.
geonames Interface to the GeoNames Web Service
Code for querying the GeoNames web service.
geoparser Interface to the API for Identifying and Disambiguating Places Mentioned in Text
A wrapper for the API version 0.4.0 (see <https://…/> ), which is a web service that identifies places mentioned in text, disambiguates those places, and returns detailed data about the places found in the text. Basic, limited API access is free with paid plans to accommodate larger workloads.
geosapi GeoServer REST API R Interface
Provides an R interface to the GeoServer REST API, allowing users to upload and publish data in a GeoServer web application and to expose data through OGC Web Services. The package currently supports all CRUD (Create, Read, Update, Delete) operations on GeoServer workspaces, namespaces, datastores (stores of vector data), featuretypes, layers and styles, as well as vector data upload operations. For more information about the GeoServer REST API, see <http://…/>.
geosptdb Spatio-Temporal; Inverse Distance Weighting and Radial Basis Functions with Distance-Based Regression
Spatio-temporal: Inverse Distance Weighting (IDW) and radial basis functions; optimization, prediction, summary statistics from leave-one-out cross-validation, adjusting distance-based linear regression model and generation of the principal coordinates of a new individual from Gower’s distance.
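The IDW part of this description can be illustrated with a short language-neutral sketch (a hypothetical Python helper, not the package’s API): the value at an unobserved location is a weighted average of the observations, with weights proportional to 1/distance^power.

```python
import math

# Hypothetical illustration of inverse distance weighting (IDW) prediction.
def idw_predict(x0, y0, samples, power=2.0):
    """samples: iterable of (x, y, value) tuples."""
    num = den = 0.0
    for x, y, v in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return v  # prediction at an observed location is the observation itself
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den
```

A point equidistant from two samples gets their simple average; larger `power` values localize the prediction more strongly around nearby samples.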
GERGM Estimation and Fit Diagnostics for Generalized Exponential Random Graph Models
Estimation and diagnosis of the convergence of Generalized Exponential Random Graph Models (GERGM) via Gibbs sampling or Metropolis-Hastings with exponential down-weighting.
gesca Generalized Structured Component Analysis (GSCA)
Fit a variety of component-based structural equation models.
getPass Masked User Input
A micro-package for reading ‘passwords’, i.e. reading user input with masking, so that the input is not displayed as it is typed. Currently we have support for ‘RStudio’, the command line (every OS), and any platform where ‘tcltk’ is present.
gets General-to-Specific (GETS) Modelling and Indicator Saturation Methods
Automated multi-path General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting structural breaks in the mean. The mean can be specified as an autoregressive model with covariates (an ‘AR-X’ model), and the variance can be specified as a log-variance model with covariates (a ‘log-ARCH-X’ model). The four main functions of the package are arx, getsm, getsv and isat. The first function, arx, estimates an AR-X model with log-ARCH-X errors. The second function, getsm, undertakes GETS model selection of the mean specification of an arx object. The third function, getsv, undertakes GETS model selection of the log-variance specification of an arx object. The fourth function, isat, undertakes GETS model selection of an indicator saturated mean specification.
gettz Get the Timezone Information
A function to retrieve the system timezone on Unix systems, which often finds an answer when ‘Sys.timezone()’ has failed. It is based on an answer by Duane McCully posted on ‘StackOverflow’, adapted to be callable from R.
GFA Group Factor Analysis
Factor analysis implementation for multiple data sources, i.e., for groups of variables. The whole data analysis pipeline is provided, including functions and recommendations for data normalization and model definition, as well as missing value prediction and model visualization. The model group factor analysis (GFA) is inferred with Gibbs sampling.
GFD Tests for General Factorial Designs
Implemented are the Wald-type statistic, a permuted version thereof as well as the ANOVA-type statistic for general factorial designs, even with non-normal error terms and/or heteroscedastic variances, for crossed designs with an arbitrary number of factors and nested designs with up to three factors.
ggalt Extra Coordinate Systems, Geoms and Statistical Transformations for ‘ggplot2’
A compendium of ‘geoms’, ‘coords’ and ‘stats’ for ‘ggplot2’, including splines, 1d and 2d densities, univariate average shifted histograms and a new map coordinate system based on the ‘PROJ.4’ library.
ggbeeswarm Categorical Scatter (Violin Point) Plots
Provides two methods of plotting categorical scatter plots such that the arrangement of points within a category reflects the density of data at that region, and avoids over-plotting.
ggcorrplot Visualization of a Correlation Matrix using ‘ggplot2’
The ‘ggcorrplot’ package makes it easy to visualize a correlation matrix using ‘ggplot2’. It provides a solution for reordering the correlation matrix and displays the significance level on the plot. It also includes a function for computing a matrix of correlation p-values.
ggdmc Dynamic Model of Choice with Parallel Computation, and C++ Capabilities
A fast engine for computing hierarchical Bayesian models implemented in the Dynamic Model of Choice.
ggenealogy Visualization Tools for Genealogical Data
Methods for searching through genealogical data and displaying the results. Plotting algorithms assist with data exploration and publication-quality image generation. Uses the Grammar of Graphics.
ggExtra Collection of Functions and Layers to Enhance ggplot2
Collection of functions and layers to enhance ggplot2.
ggforce Accelerating ‘ggplot2’
The aim of ‘ggplot2’ is to aid in visual data investigations. This focus has led to a lack of facilities for composing specialised plots. ‘ggforce’ aims to be a collection of mainly new stats and geoms that fills this gap. All additional functionality is aimed to come through the official extension system so using ‘ggforce’ should be a stable experience.
ggfortify Data Visualization Tools for Statistical Analysis Results
Unified plotting tools for commonly used statistical analyses, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots them in a unified style using ‘ggplot2’.
ggghost Capture the Spirit of Your ‘ggplot2’ Calls
Creates a reproducible ‘ggplot2’ object by storing the data and calls.
ggimage Use Image in ‘ggplot2’
Supports aesthetic mapping of image files to be visualized in the ‘ggplot2’ graphic system, e.g. plotting image files as points in a scatterplot.
ggiraph Make ‘ggplot2’ Graphics Interactive Using ‘htmlwidgets’
Create interactive ‘ggplot2’ graphics that are usable in the ‘RStudio’ viewer pane, in ‘R Markdown’ documents and in ‘Shiny’ apps.
ggiraphExtra Make Interactive ‘ggplot2’ Plots: Extension to ‘ggplot2’ and ‘ggiraph’
Collection of functions to enhance ‘ggplot2’ and ‘ggiraph’. Provides functions for exploratory plots. All plots can be rendered as a ‘static’ plot or an ‘interactive’ plot using ‘ggiraph’.
gglogo Geom for Logo Sequence Plots
Visualize sequences in (modified) logo plots. The design choices used by these logo plots allow sequencing data to be more easily analyzed. Because it is integrated into the ‘ggplot2’ geom framework, these logo plots support native features such as faceting.
ggloop Create ‘ggplot2’ Plots in a Loop
Pass a data frame and mapping aesthetics to ggloop() in order to create a list of ‘ggplot2’ plots. The way x-y and dots are paired together is controlled by the remapping arguments. Geoms, themes, facets, and other features can be added with the special %L+% (L-plus) operator.
ggm Functions for graphical Markov models
Functions and datasets for maximum likelihood fitting of some classes of graphical Markov models.
ggmap Spatial Visualization with Google Maps and OpenStreetMap
Easily visualize spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps with ggplot2.
ggmosaic Mosaic Plots in the ‘ggplot2’ Framework
Mosaic plots in the ‘ggplot2’ framework. Mosaic plot functionality is provided in a single ‘ggplot2’ layer by calling the geom ‘mosaic’.
GGMridge Gaussian Graphical Models Using Ridge Penalty Followed by Thresholding and Reestimation
Estimation of the partial correlation matrix using a ridge penalty followed by thresholding and reestimation. Under the multivariate Gaussian assumption, the matrix constitutes a Gaussian graphical model (GGM).
ggnetwork Geometries to Plot Networks with ‘ggplot2’
Geometries to plot network objects with ‘ggplot2’.
ggplot2 An Implementation of the Grammar of Graphics
An implementation of the grammar of graphics in R. It combines the advantages of both base and lattice graphics: conditioning and shared axes are handled automatically, and you can still build up a plot step by step from multiple data sources. It also implements a sophisticated multidimensional conditioning system and a consistent interface to map data to aesthetic attributes. See the ‘ggplot2’ website for more information, documentation and examples.
ggpmisc Miscellaneous Extensions to ‘ggplot2’
Implements extensions to ‘ggplot2’ respecting the grammar of graphics paradigm. Provides new stats to locate and tag peaks and valleys in 2D plots, a stat to add a label by group with the equation of a polynomial fitted with lm(), or R^2 or adjusted R^2 values for any model fitted with function lm(). Provides a function for flexibly converting time series to data frames suitable for plotting with ggplot(). In addition provides two stats useful for diagnosing what data are passed to compute_group() and compute_panel() functions.
ggpolypath Polygons with Holes for the Grammar of Graphics
Tools for working with polygons with holes in ‘ggplot2’, with a new ‘geom’ for drawing a ‘polypath’ applying the ‘evenodd’ or ‘winding’ rules.
ggpubr ‘ggplot2’ Based Publication Ready Plots
‘ggplot2’ is an excellent and flexible package for elegant data visualization in R. However, the default generated plots require some formatting before we can send them for publication. Furthermore, to customize a ‘ggplot’, the syntax is opaque, and this raises the level of difficulty for researchers with no advanced R programming skills. ‘ggpubr’ provides some easy-to-use functions for creating and customizing ‘ggplot2’-based publication ready plots.
ggraptR Allows Interactive Visualization of Data Through a Web Browser GUI
Intended for both technical and non-technical users to create interactive data visualizations through a web browser GUI without writing any code.
ggrepel Repulsive Text and Label Geoms for ‘ggplot2’
Provides text and label geoms for ‘ggplot2’ that help to avoid overlapping text labels. Labels repel away from each other and away from the data points.
ggsci Scientific Journal and Sci-Fi Themed Color Palettes for ‘ggplot2’
A collection of ‘ggplot2’ color palettes inspired by scientific journals and science fiction TV shows.
ggseas Seasonal Adjustment on the Fly Extension for ggplot2
Convenience functions that let you easily do seasonal adjustment on the fly with ggplot. Depends on the seasonal package to give you access to X13-SEATS-ARIMA.
ggspectra Extensions to ‘ggplot2’ for Radiation Spectra
Additional annotations, stats and scales for plotting ‘light’ spectra with ‘ggplot2’, together with specializations of ggplot() and plot() methods for spectral data stored in objects of the classes defined in package ‘photobiology’ and a plot() method for objects of class ‘waveband’, also defined in package ‘photobiology’.
ggstance Horizontal ‘ggplot2’ Components
A ‘ggplot2’ extension that provides flipped components: horizontal versions of ‘Stats’ and ‘Geoms’, and vertical versions of ‘Positions’.
ggtern An Extension to ‘ggplot2’, for the Creation of Ternary Diagrams
Extends the functionality of ggplot2, providing the capability to plot ternary diagrams for (a subset of) the ggplot2 geometries. Additionally, ggtern has implemented several new geometries which are unavailable in the standard ggplot2 release. For further examples and documentation, please proceed to the ggtern website.
ggThemeAssist Add-in to Customize ‘ggplot2’ Themes
RStudio add-in that delivers a graphical interface for editing ‘ggplot2’ theme elements.
ggtree A phylogenetic tree viewer for different types of tree annotations
ggtree extends the ggplot2 plotting system, which implements the grammar of graphics. ggtree is designed for visualizing phylogenetic trees and different types of associated annotation data.
ggvis Interactive Grammar of Graphics
An implementation of an interactive grammar of graphics, taking the best parts of ggplot2, combining them with shiny’s reactive framework and drawing web graphics using vega.
ghit Lightweight GitHub Package Installer
A lightweight, vectorized drop-in replacement for ‘devtools::install_github()’ that uses native git and R methods to clone and install a package from GitHub.
gimme Group Iterative Multiple Model Estimation
Automated identification and estimation of group- and individual-level relations in time series data from within a structural equation modeling framework.
GiniWegNeg Computing the Gini Coefficient for Weighted and Negative Attributes
Computation of the Gini coefficient in the presence of weighted and/or negative attributes. Two different approaches are considered in order to fulfill, in the case of negative attributes, the normalization principle, that is, a value of the Gini coefficient bounded in the closed range [0,1]. The first approach is based on the proposal by Chen, Tsaur and Rhai (1982) and Berrebi and Silber (1985), while the second approach is based on a recent proposal by Raffinetti, Siletti and Vernizzi (2015). The plot of the curve of maximum inequality, defined in the contribution of Raffinetti, Siletti and Vernizzi (2015), is provided.
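For orientation, the classical unweighted Gini coefficient for non-negative values can be sketched in a few lines via the mean absolute difference (a language-neutral Python illustration; the package’s extensions to weighted and negative attributes are not reproduced here):

```python
# Classical Gini coefficient: mean absolute difference over all pairs,
# normalized by twice the mean. Assumes non-negative values with mean > 0.
def gini(values):
    n = len(values)
    mean = sum(values) / n
    mad = sum(abs(a - b) for a in values for b in values) / (n * n)
    return mad / (2.0 * mean)
```

Perfect equality gives 0; concentrating everything in one observation out of many pushes the coefficient towards 1, which is the normalization property the package preserves when negative attributes appear.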
GiRaF Gibbs Random Fields Analysis
Allows calculation on, and sampling from, Gibbs random fields, and more precisely the general homogeneous Potts model. The primary tool is the exact computation of the intractable normalising constant for small rectangular lattices. Besides the latter, the package contains methods that give exact samples from the likelihood for small enough rectangular lattices, or approximate samples from the likelihood using MCMC samplers for larger lattices.
gistr Work with GitHub Gists
Work with GitHub gists from R (e.g., http://…/GitHub#Gist , https://…/about-gists ). A gist is simply one or more files with code/text/images/etc. gistr allows the user to create new gists, update gists with new files, rename files, delete files, get and delete gists, star and un-star gists, fork gists, open a gist in your default browser, get embed code for a gist, list gist commits, and get rate limit information when authenticated. Some requests require authentication and some do not. Gists website: .
git2r Provides Access to Git Repositories
Interface to the libgit2 library, which is a pure C implementation of the Git core methods. Provides access to Git repositories to extract data and to run some basic git commands.
gitgadget RStudio Addin for Version Control and Assignment Management using Git
An RStudio addin for version control that allows users to clone repos, create and delete branches, and sync forks on GitHub, GitLab, etc. Furthermore, the addin uses the GitLab API to allow instructors to create forks and merge requests for all students/teams with one click of a button.
givitiR The GiViTI Calibration Test and Belt
Functions to assess the calibration of logistic regression models with the GiViTI (Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva, Italian Group for the Evaluation of the Interventions in Intensive Care Units – see <http://…/> ) approach. The approach consists in a graphical tool, namely the GiViTI calibration belt, and in the associated statistical test. These tools can be used both to evaluate the internal calibration (i.e. the goodness of fit) and to assess the validity of an externally developed model.
gjam Generalized Joint Attribute Modeling
Analyzes joint attribute data (e.g., species abundance) that are combinations of continuous and discrete data with Gibbs sampling.
gk g-and-k and g-and-h Distribution Functions
Functions for the g-and-k and generalised g-and-h distributions.
GK2011 Gaines and Kuklinski (2011) Estimators for Hybrid Experiments
Implementations of the treatment effect estimators for hybrid (self-selection) experiments, as developed by Brian J. Gaines and James H. Kuklinski, (2011), ‘Experimental Estimation of Heterogeneous Treatment Effects Related to Self-Selection,’ American Journal of Political Science 55(3): 724-736.
glamlasso Lasso Penalization in Large Scale Generalized Linear Array Models
Efficient design-matrix-free procedure for Lasso regularized estimation in large scale 3-dimensional generalized linear array models. The Gaussian model with identity link, the Binomial model with logit link, the Poisson model with log link and the Gamma model with log link are currently implemented.
GLDEX Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods
The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method and L-moment matching are also provided. Diagnostics on goodness of fit can be done via QQ plots, KS-resample tests and comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution.
GLDreg Fit GLD Regression Model and GLD Quantile Regression Model to Empirical Data
Owing to the rich shapes of GLDs, GLD standard/quantile regression is a competitive flexible model compared to standard/quantile regression. The proposed method has some major advantages: 1) it provides a reference line which is very robust to outliers with the attractive property of zero mean residuals and 2) it gives a unified, elegant quantile regression model from the reference line with smooth regression coefficients across different quantiles. The goodness of fit of the proposed model can be assessed via QQ plots and the Kolmogorov-Smirnov test, to ensure the appropriateness of the statistical inference under consideration. Statistical distributions of coefficients of the GLD regression line are obtained using simulation, and interval estimates are obtained directly from simulated data.
glm.ddR Distributed ‘glm’ for Big Data using ‘ddR’ API
Distributed training and prediction of generalized linear models using ‘ddR’ (Distributed Data Structures) API in the ‘ddR’ package.
glm.predict Predicted Values and Discrete Changes for GLM
Functions to calculate predicted values and the difference between the two cases with confidence interval for glm, glm.nb, polr and multinom.
GLMaSPU An Adaptive Test on High Dimensional Parameters in Generalized Linear Models
Several tests for high dimensional generalized linear models have been proposed recently. In this package, we implemented a new test called adaptive sum of powered score (aSPU) for high dimensional generalized linear models, which is often more powerful than the existing methods in a wide range of scenarios. We also implemented permutation-based versions of several existing methods for research purposes. We recommend users use the aSPU test for their real testing problems. You can learn more about the tests implemented in the package via the following papers: 1. Pan, W., Kim, J., Zhang, Y., Shen, X. and Wei, P. (2014) <DOI:10.1534/genetics.114.165035> A powerful and adaptive association test for rare variants, Genetics, 197(4). 2. Guo, B., and Chen, S. X. (2016) <DOI:10.1111/rssb.12152>. Tests for high dimensional generalized linear models. Journal of the Royal Statistical Society: Series B. 3. Goeman, J. J., Van Houwelingen, H. C., and Finos, L. (2011) <DOI:10.1093/biomet/asr016>. Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control. Biometrika, 98(2).
glmbb All Hierarchical Models for Generalized Linear Model
Find all hierarchical models of a specified generalized linear model with information criterion (AIC, BIC, or AICc) within a specified cutoff of the minimum value. Uses a branch-and-bound algorithm so that not all models need to be fitted.
glmBfp Bayesian Fractional Polynomials for GLMs
Implements the Bayesian paradigm for fractional polynomials in generalized linear models. See package ‘bfp’ for the treatment of normal models.
glmgraph Graph-Constrained Regularization for Sparse Generalized Linear Models
We propose to use a sparse regression model to achieve variable selection while accounting for graph constraints among coefficients. Different linear combinations of a sparsity penalty (L1) and a smoothness penalty (MCP) have been used, which induce both sparsity of the solution and a certain smoothness of the linear coefficients.
glmm Generalized Linear Mixed Models via Monte Carlo Likelihood Approximation
Approximates the likelihood of a generalized linear mixed model using Monte Carlo likelihood approximation. Then maximizes the likelihood approximation to return maximum likelihood estimates, observed Fisher information, and other model information.
GLMMRR GLM for Binary Randomized Response Data
Generalized linear models for binary randomized response data. Includes Cauchit, Log-log, Logistic, and Probit link functions for Bernoulli distributed RR data.
glmmsr Fit a Generalized Linear Mixed Model
Conduct inference about generalized linear mixed models, with a choice about which method to use to approximate the likelihood. In addition to the Laplace and adaptive Gaussian quadrature approximations, which are borrowed from ‘lme4’, the likelihood may be approximated by the sequential reduction approximation, or an importance sampling approximation. These methods provide an accurate approximation to the likelihood in some situations where it is not possible to use adaptive Gaussian quadrature.
glmmTMB Generalized Linear Mixed Models using Template Model Builder
Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via ‘TMB’ (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.
glmnet Lasso and Elastic-Net Regularized Generalized Linear Models
Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model. Two recent additions are the multiple-response Gaussian, and the grouped multinomial. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the paper linked to via the URL below.
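The cyclical coordinate descent that the description refers to can be sketched for the plain lasso in a few lines (a toy language-neutral illustration of the update rule, assuming standardized columns and no intercept; glmnet’s actual Fortran implementation is far more sophisticated): each coefficient is refit against the partial residual and soft-thresholded.

```python
# Toy sketch of cyclical coordinate descent for the lasso (illustrative,
# not glmnet's implementation). Assumes standardized columns, no intercept.
def soft_threshold(z, gamma):
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual
            # (all other coefficients held fixed)
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / norm
    return beta
```

Running this over a decreasing grid of `lam` values, warm-starting each fit from the previous one, is the path-wise strategy the description mentions.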
glmnetUtils Utilities for ‘Glmnet’
Provides a formula interface for the ‘glmnet’ package for elasticnet regression, a method for cross-validating the alpha parameter, and other quality-of-life tools.
glmvsd Variable Selection Deviation Measures and Instability Tests for High-Dimensional Generalized Linear Models
Variable selection deviation (VSD) measures and instability tests for high-dimensional model selection methods such as LASSO, SCAD and MCP, to decide whether the sparse patterns identified by those methods are reliable.
globals Identify Global Objects in R Expressions
Identifies global (‘unknown’) objects in R expressions by code inspection using various strategies, e.g. conservative or liberal. The objective of this package is to make it as simple as possible to identify global objects for the purpose of exporting them in distributed compute environments.
globe Plot 2D and 3D Views of the Earth, Including Major Coastline
Basic functions for plotting 2D and 3D views of a sphere, by default the Earth with its major coastline, and additional lines and points.
glogis Fitting and Testing Generalized Logistic Distributions
Tools for the generalized logistic distribution (Type I, also known as skew-logistic distribution), encompassing basic distribution functions (p, q, d, r, score), maximum likelihood estimation, and structural change methods.
glrt Generalized Logrank Tests for Interval-censored Failure Time Data
Functions to conduct four generalized logrank tests and a score test under a proportional hazards model.
gmapsdistance Distance and Travel Time Between Two Points from Google Maps
Get distance and travel time between two points from Google Maps. Supports four modes of transportation: bicycling, walking, driving and public transportation.
GMDH Predicting and Forecasting Time Series via GMDH-Type Neural Network Algorithms
The group method of data handling (GMDH)-type neural network algorithm is a heuristic self-organization method for modelling complex systems. In this package, GMDH-type neural network algorithms are applied to predict and forecast a univariate time series.
Gmedian Geometric Median, k-Median Clustering and Robust Median PCA
Fast algorithms based on averaged stochastic gradient for robust estimation with large samples (with data whose dimension is larger than 2). Estimation of the geometric median, robust k-Gmedian clustering, and robust PCA based on the Gmedian covariation matrix.
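To make the estimand concrete, the geometric median minimizes the sum of Euclidean distances to the data points. A minimal sketch using the classical Weiszfeld iteration (a language-neutral Python illustration; the package itself uses averaged stochastic gradient algorithms suited to large samples):

```python
import math

# Weiszfeld iteration for the geometric median: repeatedly take the
# inverse-distance-weighted average of the points.
def geometric_median(points, n_iter=100):
    dim = len(points[0])
    m = [sum(p[k] for p in points) / len(points) for k in range(dim)]  # start at the mean
    for _ in range(n_iter):
        num, den = [0.0] * dim, 0.0
        for p in points:
            d = math.dist(p, m)
            if d == 0.0:
                continue  # skip a point coinciding with the current iterate
            w = 1.0 / d
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        if den == 0.0:
            break
        m = [c / den for c in num]
    return m
```

Unlike the coordinate-wise mean, this estimate has a bounded influence function, which is why median-based clustering and PCA are robust to outliers.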
gmeta Meta-Analysis via a Unified Framework of Confidence Distribution
An implementation of an all-in-one function for a wide range of meta-analysis problems. It contains a single function gmeta() that unifies all standard meta-analysis methods and also several newly developed ones under a framework of combining confidence distributions (CDs). Specifically, the package can perform classical p-value combination methods (such as methods of Fisher, Stouffer, Tippett, etc.), fit meta-analysis fixed-effect and random-effects models, and synthesizes 2×2 tables. Furthermore, it can perform robust meta-analysis, which provides protection against model-misspecifications, and limits the impact of any unknown outlying studies. In addition, the package implements two exact meta-analysis methods from synthesizing 2×2 tables with rare events (e.g., zero total event). A plot function to visualize individual and combined CDs through extended forest plots is also available.
gmnl Multinomial Logit Models with Random Parameters
An implementation of maximum simulated likelihood method for the estimation of multinomial logit models with random coefficients. Specifically, it allows estimating models with continuous heterogeneity such as the mixed multinomial logit and the generalized multinomial logit. It also allows estimating models with discrete heterogeneity such as the latent class and the mixed-mixed multinomial logit model.
gMOIP 2D Plots of Linear or Integer Programming Models
Make 2D plots of the polyhedron of an LP or IP problem, including integer points and iso-profit curve. Can also make a plot of a bi-objective criterion space.
gmum.r GMUM Machine Learning Group Package
Direct R interface to Support Vector Machine libraries (‘LIBSVM’ and ‘SVMLight’) and efficient C++ implementations of Growing Neural Gas and models developed by ‘GMUM’ group (Cross Entropy Clustering and 2eSVM).
gmwm Generalized Method of Wavelet Moments
Generalized Method of Wavelet Moments (GMWM) is an estimation technique for the parameters of time series models. It uses the wavelet variance in a moment matching approach that makes it particularly suitable for the estimation of certain state-space models. Furthermore, there exists a robust implementation of GMWM, which allows the robust estimation of some state-space models and ARIMA models. Lastly, the package provides the ability to quickly generate time series data, perform different wavelet decompositions, and visualizations.
gnlm Generalized Nonlinear Regression Models
A variety of functions to fit linear and nonlinear regression with a large selection of distributions.
gofastr Fast DocumentTermMatrix and TermDocumentMatrix Creation
Harness the power of ‘quanteda’, ‘data.table’ & ‘stringi’ to quickly generate ‘tm’ DocumentTermMatrix and TermDocumentMatrix data structures.
gofCopula Goodness-of-Fit Tests for Copulae
Several GoF tests for Copulae are provided. A new hybrid test is implemented which supports all of the individual tests. Estimation methods for the margins are provided. All the tests support parameter estimation and predefined values. The parameters are estimated by pseudo maximum likelihood but if it fails the estimation switches automatically to inversion of Kendall’s tau.
GofKmt Khmaladze Martingale Transformation Goodness-of-Fit Test
Consider a goodness-of-fit (GOF) problem of testing whether a random sample comes from a one-sample location-scale model where the location and scale parameters are unknown. It is well known that the Khmaladze martingale transformation method provides an asymptotically distribution-free test for the GOF problem. This package contains one function: KhmaladzeTrans(). In this version, KhmaladzeTrans() provides the test statistic and critical value of the GOF test for normal, Cauchy, and logistic distributions.
GoodmanKruskal Association Analysis for Categorical Variables
Association analysis between categorical variables using the Goodman and Kruskal tau measure. This asymmetric association measure allows the detection of asymmetric relations between categorical variables (e.g., one variable obtained by re-grouping another).
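The tau measure itself is simple to state from a contingency table: it is the relative reduction in prediction error for Y obtained by knowing X. A short language-neutral sketch (an illustration of the statistic, not this package’s interface):

```python
# Goodman-Kruskal tau(Y|X) from a contingency table
# (rows = levels of X, columns = levels of Y).
def gk_tau(table):
    n = float(sum(sum(row) for row in table))
    ncol = len(table[0])
    col_tot = [sum(row[j] for row in table) for j in range(ncol)]
    # error probability when predicting Y from its marginal alone
    e_marginal = 1.0 - sum((c / n) ** 2 for c in col_tot)
    # expected error probability when predicting Y within each level of X
    e_conditional = 1.0
    for row in table:
        row_tot = sum(row)
        if row_tot > 0:
            e_conditional -= sum(cell * cell for cell in row) / (n * row_tot)
    return (e_marginal - e_conditional) / e_marginal
```

Because tau(Y|X) and tau(X|Y) generally differ, the measure can reveal asymmetric relations, e.g. when one variable is a re-grouping of another.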
googleAnalyticsR Google Analytics API into R
R library for interacting with the Google Analytics Reporting API v3 and v4.
googleAuthR Easy Authentication with Google OAuth2 APIs
Create R functions that interact with OAuth2 Google APIs easily, with auto-refresh and Shiny compatibility.
googleCloudStorageR R Interface with Google Cloud Storage
Interact with Google Cloud Storage API in R. Part of the ‘cloudyr’ project.
googleComputeEngineR R Interface with Google Compute Engine
Interact with the Google Compute Engine API in R. Lets you create, start and stop instances in the Google Cloud. Support for preconfigured instances, with templates for common R needs.
googleformr Collect Data Programmatically by POST Methods to Google Forms
GET and POST data to Google Forms; more secure than having to expose Google Sheets in order to POST data.
googlePublicData Working with Google Public Data Explorer DSPL Metadata Files
Provides a collection of functions designed for working with ‘Google Public Data Explorer’. Automatically builds up the corresponding DSPL (XML) metadata files and CSV files; compressing all the files and leaving them ready to be published on the ‘Public Data Explorer’.
googlesheets Google Spreadsheets R API
Access and manage Google spreadsheets from R with googlesheets. Features:
• Access a spreadsheet by its title, key or URL.
• Extract data or edit data.
• Create | delete | rename | copy | upload | download spreadsheets and worksheets.
googleVis R Interface to Google Charts
R interface to Google Charts API, allowing users to create interactive charts based on data frames. Charts are displayed locally via the R HTTP help server. A modern browser with Internet connection is required and for some charts a Flash player. The data remains local and is not uploaded to Google.
googleway Retrieve Routes from Google Directions API and Decode Encoded Polylines
Retrieves routes and decodes polylines generated from Google’s directions API (https://…/directions).
GORCure Fit Generalized Odds Rate Mixture Cure Model with Interval Censored Data
The Generalized Odds Rate Mixture Cure (GORMC) model is a flexible model for fitting survival data with a cure fraction, including the Proportional Hazards Mixture Cure (PHMC) model and the Proportional Odds Mixture Cure model as special cases. This package fits the GORMC model to interval-censored data.
Goslate Goslate Interface
An interface to the Python package Goslate (Version 1.5.0). Goslate provides an API to Google’s free online language translation service by querying the Google translation website. See <https://…/> for more information about the Python package.
gower Gower’s Distance
Compute Gower’s distance (or similarity) coefficient between records. Compute the top-n matches between records. Core algorithms are executed in parallel on systems supporting OpenMP.
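Gower’s coefficient itself is simple: per-variable contributions (range-scaled absolute differences for numeric columns, 0/1 mismatches for categorical ones) are averaged. A minimal Python sketch of that definition (not the package’s API; the names are ours):

```python
def gower_dist(a, b, ranges):
    # a, b: records of mixed numeric/categorical values.
    # ranges: per-variable range for numeric columns, None for categorical ones.
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:                    # categorical: 0/1 mismatch
            total += 0.0 if x == y else 1.0
        else:                            # numeric: range-scaled absolute difference
            total += abs(x - y) / r
    return total / len(a)
```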
GPareto Gaussian Processes for Pareto Front Estimation and Optimization
Gaussian process regression models, a.k.a. kriging models, are applied to global multiobjective optimization of black-box functions. Multiobjective Expected Improvement and Stepwise Uncertainty Reduction sequential infill criteria are available. A quantification of uncertainty on Pareto fronts is provided using conditional simulations.
GPB Generalized Poisson Binomial Distribution
Functions for the Generalized Poisson Binomial distribution, providing the cdf, pmf, quantile function, and random number generation.
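For intuition, the pmf of the ordinary (ungeneralized) Poisson binomial distribution can be built by convolving independent Bernoulli trials one at a time; a Python sketch of that textbook recursion (not this package’s algorithm or API):

```python
def poisson_binomial_pmf(probs):
    # pmf of the number of successes among independent Bernoulli(p_i) trials,
    # built by dynamic-programming convolution over the trials.
    pmf = [1.0]  # distribution of the sum after zero trials
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)      # this trial fails
            new[k + 1] += mass * p        # this trial succeeds
        pmf = new
    return pmf
```

With equal probabilities it reduces to the binomial distribution.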
GPfit Gaussian Processes Modeling
A computationally stable approach to fitting a Gaussian Process (GP) model to a deterministic simulator. GP models are commonly used statistical metamodels for emulating expensive computer simulators. Fitting a GP model can be numerically unstable if any pair of design points in the input space are close together. Ranjan, Haynes, and Karsten (2011) proposed a computationally stable approach for fitting GP models to deterministic computer simulators, using a genetic algorithm based approach that is robust but computationally intensive for maximizing the likelihood. This package implements a slightly modified version of the model proposed by Ranjan et al. (2011). A novel parameterization of the spatial correlation function and a clustering based multi-start gradient based optimization algorithm yield robust optimization that is typically faster than the genetic algorithm based approach. Two examples illustrate the usage of the main functions in GPfit. Several test functions are used for performance comparison with the popular R package mlegp. GPfit has also been used for a real application, i.e., emulating the tidal kinetic energy model for the Bay of Fundy, Nova Scotia, Canada. GPfit is free software, distributed under the General Public License and available from the Comprehensive R Archive Network.
gpg GNU Privacy Guard for R
Bindings to GnuPG for working with OpenPGP (RFC4880) cryptographic methods. Includes utilities for public key encryption, creating and verifying digital signatures, and managing your local keyring. Note that some functionality depends on the version of GnuPG that is installed on the system. In particular GnuPG 2.1 mandates the use of ‘gpg-agent’ for entering passphrases, which only works if R runs in a terminal session.
GPrank Gaussian Process Ranking of Multiple Time Series
Implements a Gaussian process (GP)-based ranking method which can be used to rank multiple time series according to their temporal activity levels. An example is the case when expression levels of all genes are measured over a time course and the main concern is to identify the most active genes, i.e. genes which show significant non-random variation in their expression levels. This is achieved by computing Bayes factors for each time series by comparing the marginal likelihoods under time-dependent and time-independent GP models. Additional variance information from pre-processing of the observations is incorporated into the GP models, which makes the ranking more robust against model overfitting. The package supports exporting the results to ‘tigreBrowser’ for visualisation, filtering or ranking.
gpuR GPU Functions for R Objects
Provides GPU enabled functions for R objects in a simple and approachable manner. New gpu* and vcl* classes have been provided to wrap typical R objects (e.g. vector, matrix), in both host and device spaces, to mirror typical R syntax without the need to know OpenCL.
Grace Graph-Constrained Estimation and Hypothesis Testing
Use the graph-constrained estimation (Grace) procedure to estimate graph-guided linear regression coefficients and use the Grace and GraceR tests to perform graph-guided hypothesis test on the association between the response and the predictor.
gradDescent Gradient Descent for Regression Tasks
An implementation of various learning algorithms based on gradient descent for regression tasks. The variants provided are: Mini-Batch Gradient Descent (MBGD), which uses a portion of the training data at each step to reduce the computation load; Stochastic Gradient Descent (SGD), which uses a single randomly chosen observation per step to reduce the computation load drastically; Stochastic Average Gradient (SAG), an SGD-based algorithm that averages past gradients to reduce the variance of the stochastic steps; Momentum Gradient Descent (MGD), which speeds up gradient descent learning; Accelerated Gradient Descent (AGD), which accelerates gradient descent learning; Adagrad, which accumulates previous gradients for adaptive learning rates; Adadelta, which uses a Hessian approximation for adaptive learning rates; RMSprop, which combines the adaptive-learning abilities of Adagrad and Adadelta; and Adam, which uses mean and variance moments for adaptive learning rates.
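The mini-batch variant can be sketched in a few lines; this is an illustrative Python version for least-squares regression on one feature (the names, defaults and task are our assumptions, not the package’s API):

```python
import random

def mbgd(xs, ys, lr=0.05, batch=2, epochs=200, seed=0):
    # Mini-batch gradient descent for the model y = w*x + b under squared loss.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)                       # new random batches each epoch
        for i in range(0, len(data), batch):
            chunk = data[i:i + batch]
            # gradient of mean squared error over the mini-batch only
            gw = sum(2 * (w * x + b - y) * x for x, y in chunk) / len(chunk)
            gb = sum(2 * (w * x + b - y) for x, y in chunk) / len(chunk)
            w -= lr * gw
            b -= lr * gb
    return w, b
```

Setting `batch=1` recovers SGD; setting `batch=len(xs)` recovers full-batch gradient descent.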
GRANBase Creating Continuously Integrated Package Repositories from Manifests
Repository based tools for department and analysis level reproducibility. ‘GRANBase’ allows creation of custom branched, continuous integration-ready R repositories, including incremental testing of only packages which have changed versions since the last repository build.
GraphFactor Network Topology of Intravariable Clusters with Intervariable Links
A Network Implementation of Fuzzy Sets: Build Network Objects from Multivariate Flat Files. For more information on fuzzy sets, refer to: Zadeh, L.A. (1965) <DOI:10.1016/S0019-9958(65)90241-X>.
graphicalVAR Graphical VAR for Experience Sampling Data
Estimates within and between time point interactions in experience sampling data, using the Graphical VAR model in combination with LASSO and EBIC.
graphkernels Graph Kernels
A fast C++ implementation of various graph kernels.
GraphKit Estimating Structural Invariants of Graphical Models
Efficient methods for constructing confidence intervals of monotone graph invariants, as well as testing for monotone graph properties. Many packages are available to estimate precision matrices; this package serves as a tool to extract structural properties from their induced graphs. By iteratively bootstrapping on only the relevant edge set, we are able to obtain the optimal interval size.
graphql A GraphQL Query Parser
Bindings to the ‘libgraphqlparser’ C++ library. Currently parses GraphQL and exports the AST in JSON format.
graphscan Cluster Detection with Hypothesis Free Scan Statistic
Multiple scan statistic with variable window for one dimension data and scan statistic based on connected components in 2D or 3D.
graphTweets Visualise Twitter Interactions
Allows building an edge table from a data frame of tweets; also provides a function to build vertices (metadata).
gravity A Compilation of Different Estimation Methods for Gravity Models
One can use gravity models to explain bilateral flows related to the sizes of bilateral partners, a measure of distance between them and other influences on interaction costs. The underlying idea is rather simple. The greater the masses of two bodies and the smaller the distance between them, the stronger they attract each other. This concept is applied to several research topics such as trade, migration or foreign direct investment. Even though the basic idea of gravity models is rather simple, they can become very complex when it comes to the choice of models or estimation methods. The package gravity aims to provide R users with the functions necessary to execute the most common estimation methods for gravity models, especially for cross-sectional data. It contains the functions Ordinary Least Squares (OLS), Fixed Effects, Double Demeaning (DDM), Bonus vetus OLS with simple averages (BVU) and with GDP-weights (BVW), Structural Iterated Least Squares (SILS), Tetrads as well as Poisson Pseudo Maximum Likelihood (PPML). By considering the descriptions of the estimation methods, users can see which method and data may be suited for a certain research question. In order to illustrate the estimation methods, this package includes a dataset called Gravity (see the description of the dataset for more information). On the Gravity Cookbook website (<https://…/>) Keith Head and Thierry Mayer provide Stata code for the most common estimation methods for gravity models when using cross-sectional data. In order to get comparable results in R, the methods presented in the package gravity are designed to be consistent with this Stata code when choosing the option of robust variance estimation. However, compared to the Stata code available, the functions presented in this package provide users with more flexibility regarding the type of estimation (robust or not robust), the number and type of independent variables as well as the possible data.
The functions all estimate gravity models, but they differ in whether they estimate them in their multiplicative or additive form, their requirements with respect to the data, their handling of Multilateral Resistance terms as well as their possibilities concerning the inclusion of unilateral independent variables. Therefore, they normally lead to different estimation results. We refer the user to the Gravity Cookbook website (<https://…/> ) for more information on gravity models in general. Head, K. and Mayer, T. (2014) <DOI:10.1016/B978-0-444-54314-1.00003-3> provide a comprehensive and accessible overview of the theoretical and empirical development of the gravity literature as well as the use of gravity models and the various estimation methods, especially their merits and potential problems regarding applicability as well as different gravity datasets. All functions were tested to work on cross-sectional data and are consistent with the Stata code mentioned above. For the use with panel data no tests were performed. Therefore, it is up to the user to ensure that the functions can be applied to panel data. For a comprehensive overview of gravity models for panel data see Egger, P., & Pfaffermayr, M. (2003) <DOI:10.1007/s001810200146>, Gomez-Herrera, E. (2013) <DOI:10.1007/s00181-012-0576-2> and Head, K., Mayer, T., & Ries, J. (2010) <DOI:10.1016/j.jinteco.2010.01.002> as well as the references therein (see also the references included in the descriptions of the different functions). Depending on the panel dataset and the variables – specifically the type of fixed effects – included in the model, it may easily occur that the model is not computable. Also, note that by including bilateral fixed effects such as country-pair effects, the coefficients of time-invariant observables such as distance can no longer be estimated. 
Depending on the specific model, the code of the respective function may have to be changed in order to exclude the distance variable from the estimation. At the very least, the user should take special care with respect to the meaning of the estimated coefficients and variances as well as the decision about which effects to include in the estimation. As, to our knowledge, there is at the moment no explicit literature covering the estimation of a gravity equation by Double Demeaning, Structural Iterated Least Squares or Bonus Vetus OLS using panel data, we do not recommend applying these methods in this case. Contributions, extensions and error corrections are very welcome. Please do not hesitate to contact us.
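The multiplicative gravity equation and its additive (log-linear) form, which the OLS-type estimators above fit, can be made concrete with a toy sketch (our own notation, not the package’s API):

```python
import math

def gravity_flow(mass_i, mass_j, dist, g=1.0, a=1.0, b=1.0, c=1.0):
    # multiplicative form: flow = g * m_i^a * m_j^b / d^c
    return g * mass_i ** a * mass_j ** b / dist ** c

def log_gravity_flow(mass_i, mass_j, dist, g=1.0, a=1.0, b=1.0, c=1.0):
    # additive form: ln flow = ln g + a*ln m_i + b*ln m_j - c*ln d
    # (this linear-in-parameters form is what OLS-type estimators regress)
    return (math.log(g) + a * math.log(mass_i)
            + b * math.log(mass_j) - c * math.log(dist))
```

PPML, by contrast, estimates the multiplicative form directly, which is one reason the two families of estimators can give different results.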
Greg Regression Helper Functions
Methods for manipulating regression models and for describing these in a style adapted for medical journals. Contains functions for generating an HTML table with crude and adjusted estimates, plotting hazard ratios, and plotting model estimates and confidence intervals using forest plots, extending this to comparing multiple models in a single forest plot. In addition to the descriptive methods, there are add-ons for the robust covariance matrix provided by the sandwich package, a function for adding non-linearities to a model, and a wrapper around the Epi package’s Lexis functions for time-splitting a dataset when modeling non-proportional hazards in Cox regressions.
greyzoneSurv Fit a Grey-Zone Model with Survival Data
Allows one to classify patients into low, intermediate, and high risk groups for disease progression based on a continuous marker that is associated with progression-free survival. It uses a latent class model to link the marker and survival outcome and produces two cutoffs for the marker to divide patients into three groups. See the References section for more details.
gridGraphics Redraw Base Graphics Using grid Graphics
Functions to convert a page of plots drawn with the graphics package into identical output drawn with the grid package. The result looks like the original graphics-based plot, but consists of grid grobs and viewports that can then be manipulated with grid functions (e.g., edit grobs and revisit viewports).
gridsample Tools for Grid-Based Survey Sampling Design
Multi-stage cluster household surveys are commonly performed by governments and programs to monitor population demographic, social, economic, and health outcomes. In these surveys, communities are sampled in a first stage from within subpopulations of interest (strata), households are sampled in a second stage, and sometimes individuals are listed and further sampled within households. The communities sampled in the first stage are called Primary Sampling Units (PSUs), while the households are Secondary Sampling Units (SSUs). Census data are typically used to select PSUs within strata. If census data are outdated, inaccurate, or not available at a fine enough scale, however, gridded population data can be used instead. This tool selects PSUs within user-defined strata using gridded population data, given desired numbers of sampled households within each PSU. The population densities used to create PSUs are drawn from rasters such as the population data from the WorldPop Project. PSUs are defined within a stratum using a serpentine sampling method, and can be set to have a certain ratio of urban and rural PSUs, or to be evenly distributed across a coarse, user-defined grid.
gridsampler A Simulation Tool to Determine the Required Sample Size for Repertory Grid Studies
Simulation tool to facilitate determination of required sample size to achieve category saturation for studies using multiple repertory grids in conjunction with content analysis.
gromovlab Gromov-Hausdorff Type Distances for Labeled Metric Spaces
Computing Gromov-Hausdorff type l^p distances for labeled metric spaces. These distances were introduced in V. Liebscher, ‘Gromov meets Phylogenetics – new Animals for the Zoo of Metrics on Tree Space’ (preprint arXiv:1504.05795) for phylogenetic trees, but may apply in many more situations.
groupdata2 Creating Groups from Data
Subsetting methods for balanced cross-validation, time series windowing, and general grouping and splitting of data.
groupRemMap Regularized Multivariate Regression for Identifying Master Predictors Using the GroupRemMap Penalty
An implementation of the GroupRemMap penalty for fitting regularized multivariate response regression models under the high-dimension-low-sample-size setting. When the predictors naturally fall into groups, the GroupRemMap penalty encourages the procedure to select groups of predictors, while controlling the overall sparsity of the final model.
GroupTest Multiple Testing Procedure for Grouped Hypotheses
Contains functions for a two-stage multiple testing procedure for grouped hypothesis, aiming at controlling both the total posterior false discovery rate and within-group false discovery rate.
grove Wavelet Functional ANOVA Through Markov Groves
Functional denoising and functional ANOVA through wavelet-domain Markov groves. For more details see: Ma L. and Soriano J. (2016) Efficient functional ANOVA through wavelet-domain Markov groves. <arXiv:1602.03990v2 [stat.ME]>.
GrowingSOM Growing Self-Organizing Maps
A growing self-organizing map (GrowingSOM, GSOM) is a growing variant of the popular self-organizing map (SOM). A growing self-organizing map is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional representation of the input space of the training samples, called a map.
growth Multivariate Normal and Elliptically-Contoured Repeated Measurements Models
Functions for fitting various normal theory (growth curve) and elliptically-contoured repeated measurements models with ARMA and random effects dependence.
growthrates Estimate Growth Rates from Experimental Data
A collection of methods to determine growth rates from experimental data, in particular from batch experiments and plate reader trials.
grpregOverlap Penalized Regression Models with Overlapping Grouped Covariates
Fit the regularization path of linear, logistic or Poisson models with overlapping grouped covariates based on the latent group lasso approach. Latent group MCP/SCAD as well as bi-level selection methods, namely the group exponential lasso and the composite MCP, are also available. This package serves as an extension of the R package ‘grpreg’ (by Dr. Patrick Breheny) for grouped variable selection involving overlaps between groups.
grpSLOPE Group Sorted L1 Penalized Estimation
Group SLOPE is a penalized linear regression method used for adaptive selection of groups of significant predictors in a high-dimensional linear model. The Group SLOPE method can control the (group) false discovery rate at a user-specified level (i.e., control the expected proportion of irrelevant groups among all selected groups of predictors).
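The sorted-L1 penalty underlying SLOPE pairs the largest coefficient magnitude with the largest penalty weight, the second largest with the second largest, and so on; in Group SLOPE the same idea is applied to group norms rather than individual coefficients. An illustrative Python sketch of the penalty value for the scalar case (not the package’s API):

```python
def sorted_l1(beta, lam):
    # Sorted-L1 (SLOPE) penalty: sum of lambda_(i) * |beta|_(i), where both
    # sequences are sorted in decreasing order before being paired up.
    mags = sorted((abs(b) for b in beta), reverse=True)
    lams = sorted(lam, reverse=True)
    return sum(l * m for l, m in zip(lams, mags))
```

With all lambdas equal, the penalty reduces to the ordinary lasso penalty.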
grpss Group Screening and Selection
Contains the tools to screen grouped variables, and select screened grouped variables afterwards. The main function grpss() can perform the grouped variables screening as well as selection for ultra-high dimensional data with group structure. The screening step is primarily used to reduce the dimensions of data so that the selection procedure can easily handle the moderate or low dimensions instead of ultra-high dimensions.
GrpString Patterns and Statistical Differences Between Two Groups of Strings
Methods include converting series of event names to strings, discovering common patterns in a group of strings, discovering ‘unique’ patterns when comparing two groups of strings as well as the number and starting position of each ‘unique’ pattern in each string, finding the transition information, and statistically comparing the difference between two groups of strings.
grr Alternate Implementations of Base R Functions
Alternate implementations of some base R functions, including sort, order, and match. Functions are faster and/or have been otherwise augmented.
GRS.test GRS Test for Portfolio Efficiency and Its Statistical Power Analysis
Computational resources for the test proposed by Gibbons, Ross, and Shanken (1989) <DOI:10.2307/1913625>.
GSED Group Sequential Enrichment Design
Provides function to apply ‘Group sequential enrichment design incorporating subgroup selection’ (GSED) method proposed by Magnusson and Turnbull (2013) <doi:10.1002/sim.5738>.
gSEM Semi-Supervised Generalized Structural Equation Modelling
Conducts a semi-gSEM statistical analysis (semi-supervised generalized structural equation modeling) on a data frame of coincident observations of multiple continuous variables, via two functions, sgSEMp1() and sgSEMp2(), representing fittings based on two statistical principles. Principle 1 determines the univariate relationships in the spirit of a Markovian process. The relationship between each pair of system elements, including predictors and the system-level response, is determined with the Markovian property that the value of the current predictor is sufficient in relating to the next-level variable, i.e., the relationship is independent of the specific value of the preceding-level variable, given the current value. Principle 2 resembles the multiple regression principle in the way multiple predictors are considered simultaneously. Specifically, the first-level predictors of the system-level variable, such as time and unit-level variables, act on the system-level variable collectively via an additive model. This collective additive model can be found with a generalized stepwise variable selection (using the step() function in R, which performs variable selection on the basis of AIC), and this proceeds iteratively.
gsheet Download Google Sheets Using Just the URL
Simple package to download Google Sheets using just the sharing link. Spreadsheets can be downloaded as a data frame, or as plain text to parse manually. Google Sheets is the new name for Google Docs Spreadsheets.
GSparO Group Sparse Optimization
Approaches a group sparse solution of an underdetermined linear system. It implements the proximal gradient algorithm to solve a lower regularization model of group sparse learning. For details, please refer to the paper ‘Y. Hu, C. Li, K. Meng, J. Qin and X. Yang. Group sparse optimization via l_{p,q} regularization. Journal of Machine Learning Research, to appear, 2017’.
gt4ireval Generalizability Theory for Information Retrieval Evaluation
Provides tools to measure the reliability of an Information Retrieval test collection. It allows users to estimate reliability using Generalizability Theory and map those estimates onto well-known indicators such as Kendall tau correlation or sensitivity.
gtable Arrange grobs in tables
Tools to make it easier to work with ‘tables’ of grobs.
gTests Graph-Based Two-Sample Tests
Three graph-based tests are provided for testing whether two samples are from the same distribution.
gtheory Apply Generalizability Theory with R
Estimates variance components, generalizability coefficients, universe scores, and standard errors when observed scores contain variation from one or more measurement facets (e.g., items and raters).
gtop Game-Theoretically OPtimal (GTOP) Reconciliation Method
In hierarchical time series (HTS) forecasting, the hierarchical relation between multiple time series is exploited to make better forecasts. This hierarchical relation implies one or more aggregate consistency constraints that the series are known to satisfy. Many existing approaches, such as bottom-up or top-down forecasting, therefore attempt to achieve this goal in a way that guarantees that the forecasts will also be aggregate consistent. This package provides an implementation of the Game-Theoretically OPtimal (GTOP) reconciliation method proposed in van Erven and Cugliari (2015), which is guaranteed to only improve any given set of forecasts. This opens up new possibilities for constructing the forecasts. For example, it is not necessary to assume that bottom-level forecasts are unbiased, and aggregate forecasts may be constructed by regressing both on bottom-level forecasts and on other covariates that may only be available at the aggregate level.
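The aggregate-consistency constraint can be illustrated with a toy least-squares projection that spreads any discrepancy between the aggregate forecast and the sum of the bottom-level forecasts across all series. This is a hedged sketch of the constraint only, not the GTOP method itself:

```python
def reconcile(bottom, total):
    # Orthogonal projection of (bottom, total) onto the consistency set
    # { (b', t') : sum(b') == t' }: each bottom series is shifted up by mu
    # and the aggregate down by mu, minimizing total squared adjustment.
    mu = (total - sum(bottom)) / (len(bottom) + 1)
    return [b + mu for b in bottom], total - mu
```

After reconciliation, the bottom-level forecasts sum exactly to the aggregate forecast.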
gtrendsR R Functions to Perform and Display Google Trends Queries
An interface for retrieving and displaying the information returned online by Google Trends is provided. Trends (number of hits) over time as well as a geographic representation of the results can be displayed.
guess Adjust Estimates of Learning for Guessing
Provides standard guessing correction and a latent class model that leverages informative pre-post transitions. For details of the latent class model, see <http://…/guess.pdf>.
GUIProfiler Graphical User Interface for Rprof()
Graphical User Interface for Rprof().
gvcm.cat Regularized Categorical Effects/Categorical Effect Modifiers/Continuous/Smooth Effects in GLMs
Generalized structured regression models with regularized categorical effects, categorical effect modifiers, continuous effects and smooth effects.
gwdegree A Shiny App to Aid Interpretation of Geometrically-Weighted Degree Estimates in Exponential Random Graph Models
This is a Shiny application intended to provide better understanding of how geometrically-weighted degree terms function in exponential random graph models of networks.
gWidgets2RGtk2 Implementation of gWidgets2 for the RGtk2 Package
Implements the ‘gWidgets2’ API for ‘RGtk2’.
gWidgets2tcltk Toolkit Implementation of gWidgets2 for tcltk
Port of the ‘gWidgets2’ API for the ‘tcltk’ package.
GWLelast Geographically Weighted Logistic Elastic Net Regression
Fit a geographically weighted logistic elastic net regression.
gWQS Generalized Weighted Quantile Sum Regression
Fits Weighted Quantile Sum (WQS) regressions for continuous or binomial outcomes.
gym Provides Access to the OpenAI Gym API
OpenAI Gym is an open-source Python toolkit for developing and comparing reinforcement learning algorithms. This is a wrapper for the OpenAI Gym API, and enables access to an ever-growing variety of environments. For more details on OpenAI Gym, please see here: <https://…/gym>. For more details on the OpenAI Gym API specification, please see here: <https://…/gym-http-api>.


haploReconstruct Reconstruction of Haplotype-Blocks from Time Series Data
Reconstruction of founder haplotype blocks from time series data.
HarmonicRegression Harmonic Regression to One or more Time Series
Fits the first harmonics in a Fourier expansion to one or more time series. Trend elimination can be performed. Computed values include estimates of amplitudes and phases, as well as confidence intervals and p-values for the null hypothesis of Gaussian noise.
HARtools Read HTTP Archive (‘HAR’) Data
The goal of ‘HARtools’ is to provide a simple set of functions to read/parse, write and visualise HTTP Archive (‘HAR’) files in R.
Harvest.Tree Harvest the Classification Tree
Applies the Harvest classification tree algorithm, a modified version of the classic classification tree. It was first used in the drug discovery field, but it also performs well on other kinds of data, especially when the active region is unrelated. To learn more about the harvest classification algorithm, go to http://…/220.pdf for more information.
hashids Generate Short Unique YouTube-Like IDs (Hashes) from Integers
An R port of the hashids library. hashids generates YouTube-like hashes from integers or vectors of integers. Hashes generated from integers are relatively short, unique and non-sequential. hashids can be used to generate unique ids for URLs and to hide database row numbers from the user. By default hashids will avoid generating common English curse words by preventing certain letters from appearing next to each other. hashids are not one-way: it is easy to encode an integer to a hashid and decode a hashid back into an integer.
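The “not one-way” property means a hashid is an invertible encoding, not a cryptographic hash. A toy Python sketch of the underlying idea (a plain base-alphabet positional encoding, much simpler than the real hashids algorithm, which also salts and shuffles the alphabet):

```python
# Toy alphabet for illustration only; the real hashids alphabet differs.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def encode(n):
    # Repeatedly take the remainder in base len(ALPHABET); since every step
    # is reversible, the whole mapping is reversible too.
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, len(ALPHABET))
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode(s):
    # Exact inverse of encode: re-accumulate the digits.
    n = 0
    for ch in s:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n
```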
hashmap The Faster Hash Map
Provides a hash table class for fast key-value storage of atomic vector types. Internally, hashmap makes extensive use of Rcpp, boost::variant, and boost::unordered_map to achieve high performance, type-safety, and versatility, while maintaining compliance with the C++98 standard.
haven Import SPSS, Stata and SAS Files
Import foreign statistical formats into R via the embedded ReadStat C library (https://…/ReadStat ). Package includes preliminary support for writing Stata and SPSS formats.
hBayesDM Hierarchical Bayesian Modeling of Decision-Making Tasks
Fit an array of decision-making tasks with computational models in a hierarchical Bayesian framework. Can perform hierarchical Bayesian analysis of various computational models with a single line of code.
HDGLM Tests for High Dimensional Generalized Linear Models
Test the significance of coefficients in high dimensional generalized linear models.
HDInterval Highest (Posterior) Density Intervals
A generic function and a set of methods to calculate highest density intervals for a variety of classes of objects which can specify a probability density distribution, including MCMC output, fitted density objects, and functions.
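For unimodal MCMC output, a highest density interval can be found as the shortest window over the sorted draws; a Python sketch of that common approach (not necessarily this package’s exact method, and only one of the interval styles it supports):

```python
import math

def hdi(samples, cred=0.95):
    # Shortest interval containing a `cred` fraction of the draws: slide a
    # window of k consecutive sorted values and keep the narrowest one.
    xs = sorted(samples)
    k = math.ceil(cred * len(xs))
    start = min(range(len(xs) - k + 1), key=lambda j: xs[j + k - 1] - xs[j])
    return xs[start], xs[start + k - 1]
```

Unlike an equal-tailed interval, this shortest-interval construction excludes a distant outlier rather than stretching toward it.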
hdm High-Dimensional Metrics
Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/structural parameters are provided which appear in high-dimensional approximately sparse models. Including functions for fitting heteroscedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty.
hdnom Nomograms for High-Dimensional Cox Models
Build nomograms for high-dimensional Cox models, with support for model validation and calibration.
HDoutliers Leland Wilkinson’s Algorithm for Detecting Multidimensional Outliers
An implementation of an algorithm for outlier detection that can handle a) data with mixed categorical and continuous variables, b) many columns of data, c) many rows of data, d) outliers that mask other outliers, and e) both unidimensional and multidimensional datasets. Unlike ad hoc methods found in many machine learning papers, HDoutliers is based on a distributional model that uses probabilities to determine outliers.
hdpca Principal Component Analysis in High-Dimensional Data
In high-dimensional settings: Estimate the number of distant spikes based on the Generalized Spiked Population (GSP) model. Estimate the population eigenvalues, angles between the sample and population eigenvectors, correlations between the sample and population PC scores, and the asymptotic shrinkage factors. Adjust the shrinkage bias in the predicted PC scores.
hds Hazard Discrimination Summary
Functions for calculating the hazard discrimination summary and its standard errors, as described in Liang and Heagerty (2016) <doi:10.1111/biom.12628>.
healthcareai Tools for Healthcare Machine Learning
A machine learning toolbox tailored to healthcare data. Aids in data cleaning, model development, hyperparameter tuning, and model deployment in a production SQL environment. Algorithms currently supported are Lasso, Random Forest, and Linear Mixed Model.
heatmaply Interactive Heat Maps Using ‘plotly’
Create interactive heatmaps that are usable from the R console, in the ‘RStudio’ viewer pane, in ‘R Markdown’ documents, and in ‘Shiny’ apps. Hover the mouse pointer over a cell to show details or drag a rectangle to zoom. A heatmap is a popular graphical method for visualizing high-dimensional data, in which a table of numbers is encoded as a grid of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by dendrograms. Heatmaps are used in many fields for visualizing observations, correlations, missing value patterns, and more. Interactive heatmaps allow the inspection of specific values by hovering the mouse over a cell, as well as zooming into a region of the heatmap by dragging a rectangle around the relevant area. This work is based on ‘ggplot2’ and the ‘plotly.js’ engine. It produces heatmaps similar to ‘d3heatmap’, with the advantages of speed (‘plotly.js’ can handle larger matrices) and the ability to zoom from the dendrogram panes.
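A minimal usage sketch, assuming the package is installed (the data set and the column-scaling choice are purely illustrative):

```r
# Interactive heatmap of a built-in data set, scaled by column.
library(heatmaply)

w <- heatmaply(mtcars, scale = "column")
w   # a 'plotly' htmlwidget; renders in the RStudio viewer or a browser
```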
hellno Providing ‘stringsAsFactors=FALSE’ Variants of ‘data.frame()’ and ‘as.data.frame()’
Base R’s default setting for ‘stringsAsFactors’ within ‘data.frame()’ and ‘as.data.frame()’ is supposedly the most often complained-about piece of code in the R infrastructure. The ‘hellno’ package provides an explicit solution without changing R itself or having to mess around with options. It solves the problem by providing alternative ‘data.frame()’ and ‘as.data.frame()’ functions that are in fact simple wrappers around their base R counterparts with the ‘stringsAsFactors’ option set to ‘HELLNO’ (which in turn equals FALSE) by default.
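A minimal sketch of the effect, assuming the package is installed:

```r
# With 'hellno' attached, character columns stay characters by default.
library(hellno)

df <- data.frame(x = c("a", "b", "c"))
class(df$x)   # "character" rather than "factor"
```

Note that since R 4.0.0, base R itself defaults to ‘stringsAsFactors = FALSE’, so on modern R versions the package is chiefly of historical interest.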
hetmeta Heterogeneity Measures in Meta-Analysis
Assess the presence of statistical heterogeneity and quantify its impact in the context of meta-analysis. It includes tests for heterogeneity as well as other statistical measures (R_b, I^2, R_I).
heuristica Heuristics Including Take the Best and Unit-Weight Linear
Implements various heuristics, such as Take The Best and unit-weight linear, which perform two-alternative choice: which of two objects will have the higher criterion? Also offers functions to assess performance, e.g. percent correct across all row pairs in a data set and finding row pairs where models disagree. New models can be added by implementing fit and predict functions; see the vignette.
hextri Hexbin Plots with Triangles
Display hexagonally binned scatterplots for multi-class data, using coloured triangles to show class proportions.
hgm Holonomic Gradient Method and Gradient Descent
The holonomic gradient method (HGM, hgm) gives a way to evaluate normalization constants of unnormalized probability distributions by utilizing holonomic systems of differential or difference equations. The holonomic gradient descent (HGD, hgd) gives a method to find maximal likelihood estimates by utilizing the HGM.
HiDimDA High Dimensional Discriminant Analysis
Performs linear discriminant analysis in high dimensional problems based on reliable covariance estimators for problems with (many) more variables than observations. Includes routines for classifier training, prediction, cross-validation and variable selection.
hierarchicalSets Set Data Visualization Using Hierarchies
Pure set data visualization approaches are often limited in scalability due to the combinatorial explosion of distinct set families as the number of sets under investigation increases. hierarchicalSets applies a set-centric hierarchical clustering of the sets under investigation and uses this hierarchy as a basis for a range of scalable visual representations. hierarchicalSets is especially well suited for collections of sets that describe comparable entities, as it relies on the sets having a meaningful relational structure.
hierband Convex Banding of the Covariance Matrix
Implementation of the convex banding procedure (using a hierarchical group lasso penalty) for covariance estimation that is introduced in Bien, Bunea, Xiao (2015) Convex Banding of the Covariance Matrix. Accepted for publication in JASA.
hiertest Convex Hierarchical Testing of Interactions
Implementation of the convex hierarchical testing (CHT) procedure introduced in Bien, Simon, and Tibshirani (2015) Convex Hierarchical Testing of Interactions. Annals of Applied Statistics. Vol. 9, No. 1, 27-42.
highcharter A Wrapper for the ‘Highcharts’ Library
A wrapper for the ‘Highcharts’ library including shortcut functions to plot R objects. ‘Highcharts’ <http://…/> is a charting library offering numerous chart types with a simple configuration syntax.
HighDimOut Outlier Detection Algorithms for High-Dimensional Data
Three high-dimensional outlier detection algorithms and an outlier unification scheme are implemented in this package. The angle-based outlier detection (ABOD) algorithm is based on the work of Kriegel, Schubert, and Zimek [2008]. The subspace outlier detection (SOD) algorithm is based on the work of Kriegel, Kroger, Schubert, and Zimek [2009]. The feature bagging-based outlier detection (FBOD) algorithm is based on the work of Lazarevic and Kumar [2005]. The outlier unification scheme is based on the work of Kriegel, Kroger, Schubert, and Zimek [2011].
highlightHTML Highlight HTML Text and Tables
A tool to highlight specific cells in an HTML table, or more generally text from an HTML document. This may be helpful for those using markdown to create reproducible documents. In addition, documents can be compiled directly from R Markdown files using the ‘knitr’ package.
highmean Two-Sample Tests for High-Dimensional Mean Vectors
Provides various tests for comparing high-dimensional mean vectors in two groups.
HistDAWass Histogram-Valued Data Analysis
In the framework of Symbolic Data Analysis, a relatively new approach to the statistical analysis of multi-valued data, we consider histogram-valued data, i.e., data described by univariate histograms. The methods and the basic statistics for histogram-valued data are mainly based on the L2 Wasserstein metric between distributions, i.e., a Euclidean metric between quantile functions. The package contains unsupervised classification techniques, least squares regression and tools for histogram-valued data and for histogram time series.
histmdl A Most Informative Histogram-Like Model
Using the MDL principle, it is possible to estimate parameters for a histogram-like model. The package contains the implementation of such an estimation method.
HistogramTools Utility Functions for R Histograms
Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.
hit Hierarchical Inference Testing
Hierarchical inference testing (HIT) for linear models with correlated covariates applicable to high-dimensional settings.
hkclustering Ensemble Clustering using K Means and Hierarchical Clustering
Implements an ensemble clustering algorithm combining k-means and hierarchical clustering approaches.
hkevp Spatial Extreme Value Analysis with the Hierarchical Model of Reich and Shaby (2012)
Simulation and fitting procedures for a particular hierarchical max-stable model: the HKEVP of Reich and Shaby (2012) <DOI:10.1214/12-AOAS591>. Spatial prediction and marginal distribution extrapolation are also available, which allows a risk estimation at an ungauged site.
HKprocess Hurst-Kolmogorov Process
Methods to make inference about the Hurst-Kolmogorov and the AR(1) process.
HLSM Hierarchical Latent Space Network Model (HLSM)
Implements a hierarchical latent space network model for an ensemble of networks.
hmi Hierarchical Multiple Imputation
Runs single-level and multilevel imputation models. The user just has to pass the data to the main function and, optionally, an analysis model. The package then translates this analysis model into commands that impute the data accordingly, using functions from ‘mice’, ‘MCMCglmm’ or routines built for this package.
HMM Hidden Markov Models
An easy-to-use library to set up, apply, and make inference with discrete-time and discrete-space hidden Markov models.
hmmm Hierarchical Multinomial Marginal Models
Functions for specifying and fitting marginal models for contingency tables proposed by Bergsma and Rudas (2002) here called hierarchical multinomial marginal models (hmmm) and their extensions presented by Bartolucci et al. (2007); multinomial Poisson homogeneous (mph) models and homogeneous linear predictor (hlp) models for contingency tables proposed by Lang (2004) and (2005); hidden Markov models where the distribution of the observed variables is described by a marginal model. Inequality constraints on the parameters are allowed and can be tested.
HMVD Group Association Test using a Hidden Markov Model
Performs an association test between a group of variables and the outcome.
hNMF Hierarchical Non-Negative Matrix Factorization
Hierarchical non-negative matrix factorization for tumor segmentation based on multi-parametric MRI data.
hoa Higher Order Likelihood Inference
Performs likelihood-based inference for a wide range of regression models. Provides higher-order approximations for inference based on extensions of saddlepoint type arguments as discussed in the book Applied Asymptotics: Case Studies in Small-Sample Statistics by Brazzale, Davison, and Reid (2007).
Homeric Doughnut Plots
A simple implementation of doughnut plots – pie charts with a blank center. The package is named after Homer Simpson – arguably the best-known lover of doughnuts.
horizon Horizon Search Algorithm
Calculates horizon elevation angle and sky view factor from a digital terrain model.
hornpa Horn’s (1965) Test to Determine the Number of Components/Factors
A stand-alone function that generates a user-specified number of random datasets and computes eigenvalues from them (i.e., implements Horn’s parallel analysis). Users then compare the resulting eigenvalues (the mean or a specified percentile) from the random datasets (i.e., eigenvalues resulting from noise) to the eigenvalues generated from the user’s data. Can be used for both principal components analysis (PCA) and common/exploratory factor analysis (EFA). The output table shows how large eigenvalues can be merely as a result of randomly generated data. If the user’s own dataset has actual eigenvalues greater than the corresponding random-data eigenvalues, that lends support to retaining that factor/component. In other words, if the i-th eigenvalue from the actual data is larger than the chosen percentile of the i-th eigenvalue generated from random data, empirical support is provided to retain that factor/component. Horn, J. (1965). A rationale and test for the number of factors in factor analysis.
horseshoe Implementation of the Horseshoe Prior
Contains functions for applying the horseshoe prior to high-dimensional linear regression, yielding the posterior mean and credible intervals, amongst other things. The key parameter tau can be equipped with a prior or estimated via maximum marginal likelihood estimation (MMLE). The main function, horseshoe, is for linear regression. In addition, there are functions specifically for the sparse normal means problem, allowing for faster computation of, for example, the posterior mean and posterior variance. Finally, there is a function available to perform variable selection, using either a form of thresholding or credible intervals.
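A small simulated sketch, assuming the package is installed (the dimensions, sparsity pattern, and chain lengths below are illustrative, not recommendations):

```r
# Horseshoe regression on simulated sparse data.
library(horseshoe)

set.seed(42)
n <- 50; p <- 100
X <- matrix(rnorm(n * p), n)
beta <- c(rep(3, 5), rep(0, p - 5))        # only 5 active coefficients
y <- as.vector(X %*% beta + rnorm(n))

fit <- horseshoe(y, X, method.tau = "halfCauchy", method.sigma = "Jeffreys",
                 burn = 500, nmc = 1000)
head(fit$BetaHat)                          # posterior means of coefficients
```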
hot.deck Multiple Hot-deck Imputation
Performs multiple hot-deck imputation of categorical and continuous variables in a data frame.
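A toy sketch, assuming the package is installed (the tiny data frame and the number of imputations are invented; real use would involve larger data):

```r
# Multiple hot-deck imputation of a data frame with missing values.
library(hot.deck)

df <- data.frame(x = c(1, 2, NA, 4, 5, NA, 7, 8),
                 y = c(NA, 1, 0, 1, 0, 1, NA, 0))
out <- hot.deck(df, m = 5)   # five completed data sets
length(out$data)
```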
HotDeckImputation Hot Deck Imputation Methods for Missing Data
This package provides hot deck imputation methods to resolve missing data.
hotspot Software Hotspot Analysis
Contains data for software hotspot analysis, along with a function performing the analysis itself.
hqreg Regularization Paths for Huber Loss Regression and Quantile Regression Penalized by Lasso or Elastic-Net
Efficient algorithms for fitting entire regularization paths for Huber loss regression and quantile regression penalized by lasso or elastic-net.
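A minimal sketch, assuming the package is installed (the data are simulated; ‘huber’ is one of the supported loss choices, alongside ‘quantile’ and ‘ls’):

```r
# Lasso-penalized Huber regression path on simulated data.
library(hqreg)

set.seed(7)
X <- matrix(rnorm(100 * 20), 100)
y <- X[, 1] + rnorm(100)

fit <- hqreg(X, y, method = "huber")   # alpha = 1 (default) is the lasso penalty
is.matrix(fit$beta)                    # coefficients across the lambda path
```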
hrbrthemes Additional Themes, Theme Components and Utilities for ‘ggplot2’
A compilation of extra ‘ggplot2’ themes, scales and utilities, including a spell-check function for plot label fields and an overall emphasis on typography. A copy of the ‘Google’ font ‘Roboto Condensed’ <https://…/> is also included to support one of the typography-oriented themes.
HSAR Hierarchical Spatial Autoregressive Model (HSAR)
A library of the Hierarchical Spatial Autoregressive Model (HSAR), based on a Bayesian Markov Chain Monte Carlo (MCMC) algorithm.
htdp Horizontal Time Dependent Positioning
Provides bindings to the National Geodetic Survey (NGS) Horizontal Time Dependent Positioning (HTDP) utility, v3.2.5, written by Richard Snay, Chris Pearson, and Jarir Saleh of NGS. HTDP is a utility that allows users to transform positional coordinates across time and between spatial reference frames. See <https://…/Htdp.shtml> for more information.
htmltab Assemble Data Frames from HTML Tables
htmltab is a package for extracting structured information from HTML tables. It is similar to readHTMLTable() of the XML package but provides two major advantages. First, the package automatically expands row and column spans in the header and body cells. Second, users are given more control over which header and body rows will end up in the R table. Additionally, the function preprocesses table code and removes unneeded parts, alleviating the need for tedious post-processing.
htmlTable Advanced Tables for Markdown/HTML
A package for creating tables with state-of-the-art layout containing row spanners, column spanners, table spanners, zebra striping, and more. While allowing advanced layout, the underlying CSS structure is kept simple in order to maximize compatibility with MS Word/LibreOffice. The package also contains a few text formatting functions that help output text compatible with HTML/LaTeX.
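A minimal sketch, assuming the package is installed (the matrix contents, group label, and caption are invented):

```r
# A small HTML table with a row spanner and a caption.
library(htmlTable)

m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))
out <- htmlTable(m,
                 rgroup = "Group 1", n.rgroup = 2,
                 caption = "A minimal table with a row spanner")
out   # prints the rendered table in the viewer; the value is the HTML string
```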
htmltidy Clean Up or Pretty Print Gnarly HTML and XHTML
HTML documents can be beautiful and pristine. They can also be wretched, evil, malformed demon-spawn. Now, you can tidy up that HTML and XHTML before processing it with your favorite angle-bracket crunching tools, going beyond the limited tidying that ‘libxml2’ affords in the ‘XML’ and ‘xml2’ packages and taming even the ugliest HTML code generated by the likes of Google Docs and Microsoft Word. It’s also possible to use the functions provided to format or ‘pretty print’ HTML content as it is being tidied.
htmlwidgets HTML Widgets for R
A framework for creating HTML widgets that render in various contexts including the R console, R Markdown documents, and Shiny web applications.
hts Hierarchical and Grouped Time Series
Methods for analysing and forecasting hierarchical and grouped time series.