R Packages: 3869

A

A3 Accurate, Adaptable, and Accessible Error Metrics for Predictive Models Supplies tools for tabulating and analyzing the results of predictive models. The methods employed are applicable to virtually any predictive model and make comparisons between different methodologies straightforward. abc Tools for Approximate Bayesian Computation (ABC) Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates and for calculating the misclassification probabilities of different models. abc.data Data Only: Tools for Approximate Bayesian Computation (ABC) Contains data which are used by functions of the ‘abc’ package. ABCanalysis Computed ABC Analysis For a given data set, the package provides a novel method of computing precise limits to acquire subsets which are easily interpreted. Closely related to the Lorenz curve, the ABC curve visualizes the data by graphically representing the cumulative distribution function. Based on an ABC analysis, the algorithm calculates, with the help of the ABC curve, the optimal limits by exploiting the mathematical properties pertaining to the distribution of the analyzed items. The data, containing positive values, are divided into three disjoint subsets A, B and C: subset A comprises the very profitable values, i.e. the largest data values (“the important few”); subset B comprises values for which the profit equals the effort required to obtain it; and subset C comprises the non-profitable values, i.e. the smallest data values (“the trivial many”). abcrf Approximate Bayesian Computation via Random Forests Performs Approximate Bayesian Computation (ABC) model choice via random forests.
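The ABC split described for ‘ABCanalysis’ can be sketched language-agnostically. The Python function below is a simplified illustration only: the fixed 80%/95% cumulative-share cut-offs are an assumption made for the sketch, whereas the package itself derives optimal limits from the ABC curve; the function name is invented and is not the package's R interface.

```python
def abc_split(values, a_share=0.8, b_share=0.95):
    """Naive ABC split: rank values in descending order and cut by
    cumulative share of the total. The ABCanalysis package instead
    computes optimal limits from the ABC curve; the fixed 80%/95%
    thresholds here are a simplifying assumption."""
    total = sum(values)
    ranked = sorted(values, reverse=True)
    groups, cum = {"A": [], "B": [], "C": []}, 0.0
    for v in ranked:
        cum += v
        if cum / total <= a_share:
            groups["A"].append(v)       # "the important few"
        elif cum / total <= b_share:
            groups["B"].append(v)       # profit roughly equals effort
        else:
            groups["C"].append(v)       # "the trivial many"
    return groups
```

For example, six items contributing 50, 30, 10, 5, 3 and 2 units split into A = {50, 30}, B = {10, 5}, C = {3, 2}.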
abctools Tools for ABC Analyses Tools for approximate Bayesian computation including summary statistic selection and assessing coverage. An R Package for Tuning Approximate Bayesian Computation Analyses. abodOutlier Angle-Based Outlier Detection Performs angle-based outlier detection on a given data frame. Three methods are available: a full but slow implementation using all the data, which has cubic complexity; a fully randomized one, which is considerably more efficient; and another using k-nearest neighbours. These algorithms are especially well suited to outlier detection in high-dimensional data. ACA Abrupt Change-Point or Aberration Detection in Point Series Offers an interactive function for the detection of breakpoints in series. accelmissing Missing Value Imputation for Accelerometer Data Imputation for the missing count values in accelerometer data. The methodology includes both parametric and semi-parametric multiple imputations under the zero-inflated Poisson lognormal model. This package also provides multiple functions to pre-process the accelerometer data prior to the missing data imputation. These include detecting wearing and non-wearing time, selecting valid days and subjects, and creating plots. ACDm Tools for Autoregressive Conditional Duration Models Package for Autoregressive Conditional Duration (ACD; Engle and Russell, 1998) models. Creates trade, price or volume durations from transactions (tic) data, performs diurnal adjustments, and fits and tests various ACD models. Acinonyx High-Performance Interactive Graphics System iPlots eXtreme Acinonyx (genus of the cheetah – for its speed) is the codename for the next generation of the high-performance interactive graphics system iPlots eXtreme. It is a continuation of the iPlots project, allowing visualization and exploratory analysis of large data. Due to its highly flexible design and focus on speed optimization, it can also be used as a general graphics system (e.g.
it is the fastest R graphics device if you have a good GPU) and an interactive toolkit. It is a complete re-write of iPlots from scratch, taking the best from the iPlots design and focusing on speed and flexibility. The main focus compared to the previous iPlots project is on: • speed and scalability to support large data (it uses OpenGL, optimized native code and object sharing to allow visualization of millions of data points) • enhanced support for adding statistical models to plots with full interactivity • seamless integration in GUIs (Windows and Mac OS X). AcousticNDLCodeR Coding Sound Files for Use with NDL Makes acoustic cues for use with the R packages ‘ndl’ or ‘ndl2’. The package implements functions used in the PLOS ONE paper: Denis Arnold, Fabian Tomaschek, Konstantin Sering, Florence Lopez, and R. Harald Baayen (accepted). Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PLOS ONE. More details can be found in the paper and the supplement. ‘ndl’ is available on CRAN. ‘ndl2’ is available by request from . acp Autoregressive Conditional Poisson Time series analysis of count data. AcrossTic A Cost-Minimal Regular Spanning Subgraph with TreeClust Constructs a minimum-cost regular spanning subgraph as part of a non-parametric two-sample test for equality of distribution. acrt Autocorrelation Robust Testing Functions for testing affine hypotheses on the regression coefficient vector in regression models with autocorrelated errors. AdapEnetClass A Class of Adaptive Elastic Net Methods for Censored Data Provides new approaches to variable selection for the AFT model. adapr Implementation of an Accountable Data Analysis Process Tracks reading and writing within R scripts that are organized into a directed acyclic graph. Contains an interactive shiny application adaprApp().
Uses Git and file hashes to track version histories of input and output. adaptDA Adaptive Mixture Discriminant Analysis The adaptive mixture discriminant analysis (AMDA) allows one to adapt a model-based classifier to the situation where a class represented in the test set may not have been encountered earlier in the learning phase. AdaptGauss Gaussian Mixture Models (GMM) Multimodal distributions can be modelled as a mixture of components. The model is derived using Pareto Density Estimation (PDE) to estimate the pdf. PDE has been designed in particular to identify groups/classes in a dataset. Precise limits for the classes can be calculated using Bayes’ theorem. Verification of the model is possible by QQ plot and Chi-squared test. adaptiveGPCA Adaptive Generalized PCA Implements adaptive gPCA, as described in: Fukuyama, J. (2017) . The package also includes functionality for applying the method to ‘phyloseq’ objects so that the method can be easily applied to microbiome data, and a ‘shiny’ app for interactive visualization. ADCT Adaptive Design in Clinical Trials Existing adaptive design methods in clinical trials. The package includes power and stopping-boundary (sample size) calculation functions for two-group group sequential designs, adaptive designs with co-primary endpoints, biomarker-informed adaptive designs, etc. addhaz Binomial and Multinomial Additive Hazards Models Functions to fit the binomial and multinomial additive hazards models and to calculate the contribution of diseases/conditions to the disability prevalence, as proposed by Nusselder and Looman (2004) . addhazard Fit Additive Hazards Models for Survival Analysis Contains tools to fit additive hazards models to data from random sampling, two-phase sampling and two-phase sampling with auxiliary information. This package provides regression parameter estimates and their model-based and robust standard errors. It also offers tools to predict individual-specific hazards.
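The Gaussian mixture model that ‘AdaptGauss’ estimates can be illustrated with a bare-bones EM iteration for a two-component univariate mixture. This is a hedged Python sketch of the underlying model only — the package's own estimation uses Pareto Density Estimation and interactive tools, and the function name below is invented.

```python
import math

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a two-component univariate Gaussian mixture.
    Illustrates the mixture model AdaptGauss works with; the package
    itself uses Pareto Density Estimation, not this textbook EM."""
    mu = [min(x), max(x)]          # crude initialization
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for xi in x:
            p = [w[k] / math.sqrt(2 * math.pi * var[k]) *
                 math.exp(-(xi - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: update weights, means, variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = sum(r[k] * (xi - mu[k]) ** 2
                         for r, xi in zip(resp, x)) / nk + 1e-6
    return mu, var, w
```

On data drawn from two well-separated clumps, the two fitted means land near the clump centres.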
ADDT A Package for Analysis of Accelerated Destructive Degradation Test Data Accelerated destructive degradation tests (ADDT) are often used to collect necessary data for assessing the long-term properties of polymeric materials. Based on the collected data, a thermal index (TI) is estimated. The TI can be useful for material rating and comparison. This package performs the least squares (LS) and maximum likelihood (ML) procedures for estimating TI for polymeric materials. The LS approach is a two-step approach that is currently used in industrial standards, while the ML procedure is widely used in the statistical literature. The ML approach allows one to do statistical inference such as quantifying uncertainties in estimation, hypothesis testing, and predictions. Two publicly available datasets are provided to allow users to experiment and practice with the functions. adeba Adaptive Density Estimation by Bayesian Averaging Univariate and multivariate non-parametric kernel density estimation with adaptive bandwidth using a Bayesian approach to Abramson’s square root law. adegraphics An S4 Lattice-Based Package for the Representation of Multivariate Data Graphical functionalities for the representation of multivariate data. It is a complete re-implementation of the functions available in the ‘ade4’ package. adepro A Shiny Application for the (Audio-)Visualization of Adverse Event Profiles The name of this package is an abbreviation for Animation of Adverse Event Profiles and refers to a shiny application which (audio-)visualizes adverse events occurring in clinical trials. As this data is usually considered sensitive, this tool is provided as a stand-alone application that can be launched from any local machine on which the data is stored. adespatial Multivariate Multiscale Spatial Analysis Tools for the multiscale spatial analysis of multivariate data. 
Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran’s Eigenvector Maps, MEM). ADMMnet Regularized Model with Selecting the Number of Non-Zeros Fit linear and Cox models regularized with net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalty, and their adaptive forms, such as adaptive lasso and net adjusting for signs of linked coefficients. In addition, it treats the number of non-zero coefficients as another tuning parameter and selects it simultaneously with the regularization parameter. The package uses a one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of the coefficients. ADPclust Fast Clustering Using Adaptive Density Peak Detection An implementation of ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work builds and improves upon the idea of Rodriguez and Laio (2014). ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid selection and parameter optimization algorithm, which finds the number of clusters and the cluster centroids by comparing average silhouettes on a grid of candidate clustering results. It also includes a user-interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional ‘density-distance plot’. advclust Object Oriented Advanced Clustering S4 object-oriented implementation of advanced fuzzy clustering and fuzzy consensus clustering. Techniques provided by this package are Fuzzy C-Means, Gustafson-Kessel (Babuska version), Gath-Geva, Sum Voting Consensus, Product Voting Consensus, and Borda Voting Consensus. This package also provides visualization via biplots and radar plots. AEDForecasting Change Point Analysis in ARIMA Forecasting Package to incorporate change point analysis in ARIMA forecasting.
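The density-distance construction behind ‘ADPclust’ (following Rodriguez and Laio, 2014) can be sketched as follows. This Python illustration uses a simple cut-off density rather than the package's adaptive Gaussian density estimation, and the function name is hypothetical.

```python
def density_distance(points, dc=1.0):
    """For each point compute (rho, delta): rho is a cut-off density
    (neighbours within radius dc), delta is the distance to the
    nearest point of higher density. Cluster centers show up with
    both rho and delta large. ADPclust replaces the cut-off density
    with adaptive multivariate Gaussian density estimation."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    rho = [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) < dc)
           for i, p in enumerate(points)]
    delta = []
    for i, p in enumerate(points):
        higher = [dist(p, q) for j, q in enumerate(points) if rho[j] > rho[i]]
        # points of maximal density get the largest distance by convention
        delta.append(min(higher) if higher
                     else max(dist(p, q) for q in points))
    return rho, delta
```

A lone point far from any dense region ends up with low rho but large delta — exactly the signature an outlier shows in the density-distance plot.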
afc Generalized Discrimination Score This is an implementation of the Generalized Discrimination Score (also known as the Two Alternatives Forced Choice score, 2AFC) for various representations of forecasts and verifying observations. The Generalized Discrimination Score is a generic forecast verification framework which can be applied to any of the following verification contexts: dichotomous, polychotomous (ordinal and nominal), continuous, probabilistic, and ensemble. A comprehensive description of the Generalized Discrimination Score, including all equations used in this package, is provided by Mason and Weigel (2009) . afex Analysis of Factorial Experiments Convenience functions for analyzing factorial experiments using ANOVA or mixed models. aov_ez(), aov_car(), and aov_4() allow specification of between, within (i.e., repeated-measures), or mixed between-within (i.e., split-plot) ANOVAs for data in long format (i.e., one observation per row), aggregating multiple observations per individual and cell of the design. mixed() fits mixed models using lme4::lmer() and computes p-values for all fixed effects using either the Kenward-Roger or Satterthwaite approximation for degrees of freedom (LMMs only), parametric bootstrap (LMMs and GLMMs), or likelihood ratio tests (LMMs and GLMMs). afex uses type 3 sums of squares by default (imitating commercial statistical software). affluenceIndex Affluence Indices Computes the statistical indices of affluence (richness) and constructs bootstrap confidence intervals for these indices. Also computes the Wolfson polarization index. AFM Atomic Force Microscope Image Analysis Provides Atomic Force Microscope image analysis, such as power spectral density, roughness against length scale, variogram and variance, and fractal dimension and scale. after Run Code in the Background Run an R function in the background, possibly after a delay. The current version uses the Tcl event loop and was ported from the ‘tcltk2’ package.
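For the dichotomous case, the 2AFC score implemented by ‘afc’ reduces to the probability that a randomly drawn event receives a higher forecast than a randomly drawn non-event — equivalently, the area under the ROC curve. A minimal Python sketch of that special case (not the package's R API, which also covers ordinal, continuous and ensemble forecasts):

```python
def afc_dichotomous(forecasts, observed):
    """2AFC score for a dichotomous outcome: the fraction of
    (event, non-event) pairs in which the event got the higher
    forecast, counting ties as 1/2. Equivalent to the area under
    the ROC curve."""
    events = [f for f, o in zip(forecasts, observed) if o == 1]
    non_events = [f for f, o in zip(forecasts, observed) if o == 0]
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in non_events)
    return wins / (len(events) * len(non_events))
```

A score of 0.5 means the forecasts cannot discriminate events from non-events; 1.0 means perfect discrimination.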
aftgee Accelerated Failure Time Model with Generalized Estimating Equations This package features both rank-based estimates and least-squares estimates for the Accelerated Failure Time (AFT) model. For rank-based estimation, it provides approaches that include the computationally efficient Gehan’s weight and general weights such as the log-rank weight. For least-squares estimation, the estimating equation is solved with Generalized Estimating Equations (GEE). Moreover, in multivariate cases, the dependence working correlation structure can be specified in the GEE setting. AhoCorasickTrie Fast Searching for Multiple Keywords in Multiple Texts Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the number of keywords (N) and the size of the text (M). Compare this to the naive approach, which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. If you want to search multiple texts with the same trie, the function will take a list or vector of texts and return a list of matches for each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported. ahp Analytical Hierarchy Process (AHP) with R An R package to model and analyse complex decision-making problems using the Analytic Hierarchy Process (AHP). AHR Estimation and Testing of Average Hazard Ratios Methods for estimation of multivariate average hazard ratios as defined by Kalbfleisch and Prentice.
The underlying survival functions of the event of interest in each group can be estimated using either the (weighted) Kaplan-Meier estimator or the Aalen-Johansen estimator for the transition probabilities in Markov multi-state models. Right-censored and left-truncated data are supported. Moreover, the difference in restricted mean survival can be estimated. Ake Associated Kernel Estimations Continuous and discrete (count or categorical) estimation of density, probability mass function (pmf) and regression functions is performed using associated kernels. The cross-validation technique and the local Bayesian procedure are also implemented for bandwidth selection. algorithmia Allows you to Easily Interact with the Algorithmia Platform The company, Algorithmia, houses the largest marketplace of online algorithms. This package provides a set of REST wrappers that make it very easy to call algorithms on the Algorithmia platform and to access files and directories in the Algorithmia data API. To learn more about the services they offer and the algorithms in the platform visit . More information for developers can be found at . algstat Algebraic statistics in R algstat provides functionality for algebraic statistics in R. Current applications include exact inference in log-linear models for contingency table data, analysis of ranked and partially ranked data, and general purpose tools for multivariate polynomials, building on the mpoly package. To aid in the process, algstat has ports to Macaulay2, Bertini, LattE-integrale and 4ti2. AlignStat Comparison of Alternative Multiple Sequence Alignments Methods for comparing two alternative multiple sequence alignments (MSAs) to determine whether they align homologous residues in the same columns as one another. It then classifies similarities and differences into conserved gaps, conserved sequence, merges, splits or shifts of one MSA relative to the other.
Summarising these categories for each MSA column yields information on which sequence regions are agreed upon by both MSAs, and which differ. Several plotting functions enable easy visualisation of the comparison data for analysis. alineR Alignment of Phonetic Sequences Using the ‘ALINE’ Algorithm Functions are provided to calculate the ‘ALINE’ distance between a cognate pair. The score is based on phonetic features represented using the Unicode-compliant International Phonetic Alphabet (IPA). Parameterized feature weights are used to determine the optimal alignment, and functions are provided to estimate optimum values. This project was funded by the National Science Foundation Cultural Anthropology Program (Grant number SBS-1030031) and the University of Maryland College of Behavioral and Social Sciences. allanvar Allan Variance Analysis A collection of tools for stochastic sensor error characterization using the Allan Variance technique originally developed by D. Allan. alluvial Alluvial Diagrams Creating alluvial diagrams (also known as parallel sets plots) for multivariate and time-series-like data. alphaOutlier Obtain Alpha-Outlier Regions for Well-Known Probability Distributions Given the parameters of a distribution, the package uses the concept of alpha-outliers by Davies and Gather (1993) to flag outliers in a data set. See Davies, L.; Gather, U. (1993): The identification of multiple outliers, JASA 88(423), 782-792, doi: 10.1080/01621459.1993.10476339 for details. altmeta Alternative Meta-Analysis Methods Provides alternative statistical methods for meta-analysis, including new heterogeneity tests, estimators of between-study variance, and heterogeneity measures that are robust to outliers. AMCTestmakeR Generate LaTeX Code for Auto-Multiple-Choice (AMC) Generate code for use with the Optical Mark Recognition free software Auto Multiple Choice (AMC).
More specifically, this package provides functions that take the question and answer texts as input and output the LaTeX code for AMC. ampd An Algorithm for Automatic Peak Detection in Noisy Periodic and Quasi-Periodic Signals A method for automatic detection of peaks in noisy periodic and quasi-periodic signals. This method, called automatic multiscale-based peak detection (AMPD), is based on the calculation and analysis of the local maxima scalogram, a matrix comprising the scale-dependent occurrences of local maxima. analyz Model Layer for Automatic Data Analysis Class with methods to read and execute R commands described as steps in a CSV file. anfis Adaptive Neuro Fuzzy Inference System in R The package implements the ANFIS Type 3 Takagi and Sugeno fuzzy if-then rule network with the following features: (1) an independent number of membership functions (MF) for each input, with different extensible MF types; (2) Type 3 Takagi and Sugeno fuzzy if-then rules; (3) full rule combinations, e.g. 2 inputs with 2 membership functions each -> 4 fuzzy rules; (4) hybrid learning, i.e. gradient descent for antecedent (premise) parameters and least squares estimation for consequent parameters; (5) multiple outputs. aniDom Inferring Dominance Hierarchies and Estimating Uncertainty Provides: (1) tools to infer dominance hierarchies based on calculating Elo scores, but with custom functions to improve estimates in animals with relatively stable dominance ranks; (2) tools to plot the shape of the dominance hierarchy and estimate the uncertainty of a given data set. ANLP Build Text Prediction Model Library to sample and clean text data, build N-gram models, the Backoff algorithm, etc. anMC Compute High Dimensional Orthant Probabilities Computationally efficient method to estimate orthant probabilities of high-dimensional Gaussian vectors. Further implements a function to compute conservative estimates of excursion sets under Gaussian random field priors.
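The Elo-score machinery that ‘aniDom’ builds on can be illustrated by the standard rating update for a single winner–loser interaction. This is a generic Python sketch of the Elo update, not the package's R functions, which add randomized orderings and uncertainty estimation on top.

```python
def elo_update(rating_winner, rating_loser, k=32):
    """Standard Elo update after one dominance interaction: the
    winner gains (and the loser loses) k times the winner's prior
    probability of losing, so upsets move ratings more than
    expected outcomes do."""
    expected_win = 1.0 / (1.0 + 10 ** ((rating_loser - rating_winner) / 400))
    delta = k * (1.0 - expected_win)
    return rating_winner + delta, rating_loser - delta
```

Two equally rated animals exchange exactly k/2 points; a win by a much lower-rated animal shifts the ratings by nearly the full k.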
anocva A Non-Parametric Statistical Test to Compare Clustering Structures Provides ANOCVA (ANalysis Of Cluster VAriability), a non-parametric statistical test to compare clustering structures, with applications in functional magnetic resonance imaging (fMRI) data. ANOCVA allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering. ANOM Analysis of Means Analysis of means (ANOM) as used in technometrical computing. The package takes results from multiple comparisons with the grand mean (obtained with multcomp, SimComp, nparcomp, or MCPAN) or corresponding simultaneous confidence intervals as input and produces ANOM decision charts that illustrate which group means deviate significantly from the grand mean. anomalous Anomalous time series package for R It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. A common use case is to identify servers that are behaving unusually. Methods in this package compute a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then a robust principal component decomposition is used on the features, and various bivariate outlier detection methods are applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and alpha-hulls. For demo purposes, this package contains both synthetic and real data from Yahoo.
anomalous-acm Anomalous time series package for R (ACM) Shares its description and functionality with the ‘anomalous’ package above: features are computed for each time series, a robust principal component decomposition is applied, and the most unusual series are identified by bivariate outlier detection on the first two principal components. anomalyDetection Implementation of Augmented Network Log Anomaly Detection Procedures Implements procedures to aid in detecting network log anomalies. By combining various multivariate analytic approaches relevant to network anomaly detection, it provides cyber analysts efficient means to detect suspected anomalies requiring further evaluation. AnomalyDetection Anomaly Detection with R AnomalyDetection is an open-source R package to detect anomalies that is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. The AnomalyDetection package can be used in a wide variety of contexts: for example, detecting anomalies in system metrics after a new software release, in user engagement after an A/B test, or in problems in econometrics, financial engineering, and the political and social sciences.
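The feature-then-outlier pipeline described for ‘anomalous’ can be miniaturized as follows. This sketch uses only two toy features and a median-distance rule in place of the package's much richer feature set, robust PCA, and highest-density-region methods; all names are invented for illustration.

```python
import statistics

def series_features(x):
    """Two toy features per series: variance and lag-1
    autocorrelation. The anomalous package computes many more
    (seasonality strength, spectral entropy, ...)."""
    mean = sum(x) / len(x)
    var = statistics.pvariance(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    acf1 = num / (var * len(x)) if var else 0.0
    return var, acf1

def most_unusual(series_list):
    """Index of the series whose feature vector lies farthest from
    the per-feature medians -- a crude stand-in for the robust
    PCA + bivariate outlier detection step described above."""
    feats = [series_features(s) for s in series_list]
    med = [statistics.median(f[i] for f in feats) for i in range(2)]
    dists = [sum((fi - mi) ** 2 for fi, mi in zip(f, med)) for f in feats]
    return max(range(len(dists)), key=dists.__getitem__)
```

Given three similar low-variance series and one wildly oscillating one, the oscillating series is flagged.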
anonymizer Anonymize Data Containing Personally Identifiable Information Allows users to quickly and easily anonymize data containing Personally Identifiable Information (PII) through convenience functions. antiword Extract Text from Microsoft Word Documents Wraps the ‘AntiWord’ utility to extract text from Microsoft Word documents. The utility only supports the old ‘doc’ format, not the new XML-based ‘docx’ format. anytime Anything to ‘POSIXct’ Converter Convert input in character, integer, or numeric form into ‘POSIXct’ objects, using one of a number of predefined formats, and relying on Boost facilities for date and time parsing. apa Format Outputs of Statistical Tests According to APA Guidelines Formatter functions in the ‘apa’ package take the return value of a statistical test function, e.g. a call to chisq.test(), and return a string formatted according to the guidelines of the APA (American Psychological Association). apc Age-Period-Cohort Analysis Functions for age-period-cohort analysis. The data can be organised in matrices indexed by age-cohort, age-period or cohort-period. The data can include dose and response or just doses. The statistical model is a generalized linear model (GLM) allowing for 3, 2, 1 or 0 of the age-period-cohort factors. The canonical parametrisation of Kuang, Nielsen and Nielsen (2008) is used; thus, the analysis does not rely on ad hoc identification. apc: An R Package for Age-Period-Cohort Analysis. apdesign An Implementation of the Additive Polynomial Design Matrix An implementation of the additive polynomial (AP) design matrix. It constructs and appends an AP design matrix to a data frame for use with longitudinal data subject to seasonality. APfun Geo-Processing Base Functions Base tools for facilitating the creation of geo-processing functions in R. aphid Analysis with Profile Hidden Markov Models Designed for the development and application of hidden Markov models and profile HMMs for biological sequence analysis.
Contains functions for multiple and pairwise sequence alignment, model construction and parameter optimization, file import/export, implementation of the forward, backward and Viterbi algorithms for conditional sequence probabilities, tree-based sequence weighting, and sequence simulation. Features a wide variety of potential applications including database searching, gene finding and annotation, phylogenetic analysis and sequence classification. APML0 Augmented and Penalized Minimization Method L0 Fit linear and Cox models regularized with L0, lasso (L1), elastic-net (L1 and L2), or net (L1 and Laplacian) penalty, and their adaptive forms, such as adaptive lasso / elastic-net and net adjusting for signs of linked coefficients. It solves the L0 penalty problem by simultaneously selecting the regularization parameters and the number of non-zero coefficients. This augmented and penalized minimization method provides an approximate solution to the L0 penalty problem, but runs as fast as an L1 regularization problem. The package uses a one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of the coefficients. It can deal with very high-dimensional data and has superior selection performance. apng Convert Png Files into Animated Png Convert several PNG files into an animated PNG file. This package exports only a single function, ‘apng’. Call the apng function with a vector of file names (which should be PNG files) to convert them to a single animated PNG file. apricom Tools for the a Priori Comparison of Regression Modelling Strategies Tools to compare several model adjustment and validation methods prior to application in a final analysis.
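The coordinate descent at the core of packages like ‘ADMMnet’ and ‘APML0’ rests on a soft-thresholding update per coordinate. Below is a plain, unoptimized cyclic coordinate descent for the lasso, sketched in Python under the simplifying assumption of centered data; it illustrates the soft-thresholding step only, not the packages' one-step variants, extra penalties, or sparsity tricks.

```python
def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for the lasso. Each coordinate
    update is a closed-form soft-thresholding of the partial
    residual correlation: beta_j <- S(rho_j, lam) / (x_j'x_j / n)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of predictor j with the partial residual
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                      for k in range(p) if k != j)) for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # soft-thresholding operator S(rho, lam)
            sign = 1.0 if rho > 0 else -1.0
            beta[j] = max(abs(rho) - lam, 0.0) * sign / z
    return beta
```

With an irrelevant orthogonal predictor, its coefficient is thresholded exactly to zero while the true signal is shrunk by lam.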
APtools Average Positive Predictive Values (AP) for Binary Outcomes and Censored Event Times We provide tools to estimate two prediction performance metrics, the average positive predictive value (AP) as well as the well-known AUC (the area under the receiver operating characteristic curve), for risk scores or markers. The outcome of interest is either binary or a censored event time. Note that for censored event times, the functions estimate time-dependent AP and AUC for pre-specified time interval(s). A function that compares the APs of two risk scores/markers is also included. Optional outputs include positive predictive values and true positive fractions at the specified marker cut-off values, and a plot of the time-dependent AP versus time (available for event time data). AR Another Look at the Acceptance-Rejection Method In mathematics, ‘rejection sampling’ is a basic technique used to generate observations from a distribution. It is also commonly called ‘the Acceptance-Rejection method’ or the ‘Accept-Reject algorithm’ and is a type of Monte Carlo method. The Acceptance-Rejection method is based on the observation that to sample a random variable one can perform a uniformly random sampling of the 2D Cartesian graph and keep the samples in the region under the graph of its density function. Package ‘AR’ is able to generate/simulate random data from a probability density function by the Acceptance-Rejection method. Moreover, this package is a useful teaching resource for the graphical presentation of the Acceptance-Rejection method. From a practical point of view, the user needs to calculate a constant for the Acceptance-Rejection method; package ‘AR’ is able to compute this constant with optimization tools. Several numerical examples are provided to illustrate the graphical presentation of the Acceptance-Rejection method. arabicStemR Arabic Stemmer for Text Analysis Allows users to stem Arabic texts for text analysis.
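The Acceptance-Rejection method implemented by ‘AR’ can be sketched in a few lines. In this Python illustration the bounding constant c is supplied by the caller, whereas the package finds it by optimization; the proposal is fixed to a uniform distribution and the function name is invented.

```python
import random

def rejection_sample(pdf, c, n, lo=0.0, hi=1.0, seed=1):
    """Acceptance-rejection sampling against a Uniform(lo, hi)
    proposal: draw x ~ U(lo, hi) and accept it with probability
    pdf(x) / (c * g(x)), where g is the uniform density and c
    bounds pdf/g from above. Accepted draws follow pdf."""
    rng = random.Random(seed)
    g = 1.0 / (hi - lo)          # uniform proposal density
    samples = []
    while len(samples) < n:
        x = rng.uniform(lo, hi)
        if rng.random() <= pdf(x) / (c * g):
            samples.append(x)
    return samples
```

For the triangular density f(x) = 2x on [0, 1] with c = 2, the accepted draws have mean close to the true value 2/3.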
arc Association Rule Classification Implements the Classification Based on Association Rules (CBA) algorithm for association rule classification (ARC). The package also contains several convenience methods that allow CBA parameters (minimum confidence, minimum support) to be set automatically, and it natively handles numeric attributes by integrating a pre-discretization step. The rule generation phase is handled by the ‘arules’ package. ARCensReg Fitting Univariate Censored Linear Regression Model with Autoregressive Errors Fits a univariate left- or right-censored linear regression model with autoregressive errors under the normal distribution. It provides estimates and standard errors of the parameters, supports prediction of future observations, and handles missing values on the dependent variable. It also provides convergence plots when at least one censored observation exists. ArCo Artificial Counterfactual Package Set of functions to analyse and estimate Artificial Counterfactual models from Carvalho, Masini and Medeiros (2016) . ArfimaMLM Arfima-MLM Estimation For Repeated Cross-Sectional Data Functions to facilitate the estimation of Arfima-MLM models for repeated cross-sectional data and pooled cross-sectional time-series data (see Lebo and Weber 2015). The estimation procedure uses double filtering with Arfima methods to account for autocorrelation in repeated cross-sectional data, followed by multilevel modeling (MLM) to estimate aggregate as well as individual-level parameters simultaneously. argon2 Secure Password Hashing Utilities for secure password hashing via the argon2 algorithm. It is a relatively new hashing algorithm and is believed to be very secure. The ‘argon2’ implementation included in the package is the reference implementation. The package also includes some utilities that should be useful for digest authentication, including a wrapper of ‘blake2b’. For similar R packages, see sodium and ‘bcrypt’. See or for more information.
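The rule-based classification scheme behind ‘arc’ can be illustrated in miniature: an ordered list of association rules is consulted and the first rule whose antecedent is satisfied assigns the class, with a default class as fallback. This is a toy Python sketch of the general CBA idea, not the package's API; all names and the example rules are invented.

```python
def cba_classify(rules, default_class, record):
    """CBA-style classification: rules are (antecedent_items,
    class, confidence) triples, tried in order of decreasing
    confidence; the first rule whose antecedent is a subset of the
    record's items fires. If none fires, fall back to the default
    class (in CBA, typically the majority class)."""
    for antecedent, cls, _conf in sorted(rules, key=lambda r: -r[2]):
        if antecedent <= record:
            return cls
    return default_class
```

For instance, a high-confidence specific rule ({sunny, humid} -> no-play) outranks a weaker general one ({sunny} -> play).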
ArgumentCheck Improved Communication to Users with Respect to Problems in Function Arguments The typical process of checking arguments in functions is iterative. In this process, an error may be returned and the user may fix it only to receive another error on a different argument. ‘ArgumentCheck’ facilitates a more helpful way to perform argument checks, allowing the programmer to run all of the checks and then return all of the errors and warnings in a single message. arqas Application in R for Queueing Analysis and Simulation Provides functions for computing the main characteristics of the following queueing models: M/M/1, M/M/s, M/M/1/k, M/M/s/k, M/M/1/Inf/H, M/M/s/Inf/H, M/M/s/Inf/H with Y replacements, M/M/Inf, Open Jackson Networks and Closed Jackson Networks. Moreover, it is also possible to simulate similar queueing models with any type of arrival or service distribution: G/G/1, G/G/s, G/G/1/k, G/G/s/k, G/G/1/Inf/H, G/G/s/Inf/H, G/G/s/Inf/H with Y replacements, Open Networks and Closed Networks. Finally, it contains functions for fitting data to a statistical distribution. arsenal An Arsenal of ‘R’ Functions for Large-Scale Statistical Summaries An arsenal of ‘R’ functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in ‘R’ and ‘RStudio’ and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types ‘by’ the levels of a categorical variable; modelsum(), which performs simple model fits on the same endpoint for many variables (univariate or adjusted for standard covariates); and freqlist(), a powerful frequency table across many categorical variables. ART Aligned Rank Transform for Nonparametric Factorial Analysis An implementation of the Aligned Rank Transform technique for factorial analysis (see references below for details), including models with missing terms (unsaturated factorial models). 
The function first computes a separate aligned ranked response variable for each effect of the user-specified model, and then runs a classic ANOVA on each of the aligned ranked responses. For further details, see Higgins, J. J. and Tashtoush, S. (1994). An aligned rank transform test for interaction. Nonlinear World 1 (2), pp. 201-211. Wobbrock, J. O., Findlater, L., Gergle, D. and Higgins, J. J. (2011). The Aligned Rank Transform for nonparametric factorial analyses using only ANOVA procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’11). New York: ACM Press, pp. 143-146. artfima Fit ARTFIMA Model Fits and simulates ARTFIMA models. Provides the theoretical autocovariance function and spectral density function for stationary ARTFIMA processes. ARTIVA Time-Varying DBN Inference with the ARTIVA (Auto Regressive TIme VArying) Model Reversible Jump MCMC (RJ-MCMC) sampling for approximating the posterior distribution of a time-varying regulatory network, under the Auto Regressive TIme VArying (ARTIVA) model (for a detailed description of the algorithm, see Lebre et al., BMC Systems Biology, 2010). Starting from time-course gene expression measurements for a gene of interest (referred to as the ‘target gene’) and a set of genes (referred to as ‘parent genes’) which may explain the expression of the target gene, the ARTIVA procedure identifies temporal segments for which a set of interactions occur between the ‘parent genes’ and the ‘target gene’. The time points that delimit the different temporal segments are referred to as changepoints (CP). arules Mining Association Rules and Frequent Itemsets Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt. 
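The ‘arqas’ entry above lists the standard Markovian queueing models; for the simplest of these, M/M/1, the steady-state characteristics have well-known closed forms, sketched here in plain R (illustrative only, not the ‘arqas’ API):

```r
# Closed-form steady-state characteristics of the M/M/1 queue:
# utilization rho, expected number in system L, expected sojourn time W.
mm1 <- function(lambda, mu) {
  stopifnot(lambda < mu)            # stable only when rho = lambda/mu < 1
  rho <- lambda / mu                # server utilization
  list(rho = rho,
       L = rho / (1 - rho),         # mean number of customers in the system
       W = 1 / (mu - lambda))       # mean time in system; note L = lambda * W
}

q <- mm1(lambda = 2, mu = 5)        # arrival rate 2/hr, service rate 5/hr
```

The identity L = lambda * W is Little's law, which holds far beyond M/M/1 and is a handy sanity check for any queueing computation.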
arulesCBA Classification Based on Association Rules Provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier. aRxiv Interface to the arXiv API An interface to the API for arXiv, a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics. as.color Assign Random Colors to Unique Items in a Vector The as.color function takes an R vector of any class as an input, and outputs a vector of unique hexadecimal color values that correspond to the unique input values. This is most handy when overlaying points and lines for data that correspond to different levels or factors. The function will also print the random seed used to generate the colors. If you like the color palette generated, you can save the seed and reuse those colors. asht Applied Statistical Hypothesis Tests Some hypothesis test functions with a focus on non-asymptotic methods that have matching confidence intervals. AsioHeaders Asio C++ Header Files Asio is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach. ‘Asio’ is also included in Boost but requires linking when used with Boost. Used standalone, it is header-only, provided a recent enough compiler. ‘Asio’ is written and maintained by Christopher M. Kohlhoff, and is released under the ‘Boost Software License’, Version 1.0. ASIP Automated Satellite Image Processing Perform complex satellite image processes automatically and efficiently. The package currently supports satellite images from the most widely used Landsat 4, 5, 7 and 8 and ASTER L1T data. The primary uses of this package are given below. 1. Conversion of optical bands to top-of-atmosphere reflectance. 2. Conversion of thermal bands to corresponding temperature images. 3. 
Derive application-oriented products directly from source satellite image bands. 4. Compute user-defined equations and produce corresponding image products. 5. Other basic tools for satellite image processing. References: Chander and Markham (2003); Roy et al. (2014); Abrams (2000). aSPC An Adaptive Sum of Powered Correlation Test (aSPC) for Global Association Between Two Random Vectors The aSPC test is designed to test global association between two groups of variables potentially with moderate to high dimension (e.g. in hundreds). The aSPC is particularly useful when the association signals between two groups of variables are sparse. aSPU Adaptive Sum of Powered Score Test R code for the (adaptive) Sum of Powered Score (‘SPU’ and ‘aSPU’) tests, inverse variance weighted Sum of Powered Score (‘SPUw’ and ‘aSPUw’) tests, and gene-based and some pathway-based association tests: Pathway-based Sum of Powered Score tests (‘SPUpath’), the adaptive ‘SPUpath’ (‘aSPUpath’) test, the Gene-based Association Test that uses an extended Simes procedure (‘GATES’), the Hybrid Set-based Test (‘HYST’), and an extended version of the ‘GATES’ test for pathway-based association testing (‘GATES-Simes’). The tests can be used with genetic and other data sets with covariates. The response variable is binary or quantitative. asremlPlus Augments the Use of ‘Asreml’ in Fitting Mixed Models Provides functions that assist in automating the testing of terms in mixed models when ‘asreml’ is used to fit the models. The package ‘asreml’ is marketed by ‘VSNi’ (http://www.vsni.co.uk) as ‘asreml-R’ and provides a computationally efficient algorithm for fitting mixed models using Residual Maximum Likelihood. 
The content falls into the following natural groupings: (i) Data, (ii) Object manipulation functions, (iii) Model modification functions, (iv) Model testing functions, (v) Model diagnostics functions, (vi) Prediction production and presentation functions, (vii) Response transformation functions, and (viii) Miscellaneous functions. A history of the fitting of a sequence of models is kept in a data frame. Procedures are available for choosing models that conform to the hierarchy or marginality principle and for displaying predictions for significant terms in tables and graphs. AssayCorrector Detection and Correction of Spatial Bias in HTS Screens (1) Detects plate-specific spatial bias by identifying rows and columns of all plates of the assay affected by this bias (following the results of the Mann-Whitney U test) as well as assay-specific spatial bias by identifying well locations (i.e., well positions scanned across all plates of a given assay) affected by this bias (also following the results of the Mann-Whitney U test); (2) Allows one to correct plate-specific spatial bias using either the additive or multiplicative PMP (Partial Mean Polish) method (the most appropriate spatial bias model can be either specified by the user or determined by the program following the results of the Kolmogorov-Smirnov two-sample test) to correct the assay measurements as well as to correct assay-specific spatial bias by carrying out robust Z-scores within each plate of the assay and then traditional Z-scores across well locations. assertive.data Assertions to Check Properties of Data A set of predicates and assertions for checking the properties of (country independent) complex data types. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. 
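The ‘assertive.*’ entries above all follow the same predicate/assertion pattern: a predicate returns TRUE or FALSE, and a matching assertion stops with an informative message. A minimal sketch of that pattern in plain R (the function names here are invented for illustration, not the packages' actual API):

```r
# Predicate: pure check, returns TRUE/FALSE, never throws.
is_positive_number <- function(x) {
  is.numeric(x) && length(x) == 1L && !is.na(x) && x > 0
}

# Assertion: wraps the predicate and stops with a descriptive message,
# naming the offending argument via deparse(substitute(...)).
assert_is_positive_number <- function(x) {
  if (!is_positive_number(x)) {
    stop(sprintf("%s is not a single positive number",
                 deparse(substitute(x))))
  }
  invisible(TRUE)
}

ok  <- is_positive_number(3.14)   # TRUE
bad <- is_positive_number(-1)     # FALSE
```

Package developers call the assertions at the top of their own functions for run-time argument checking; end-users calling the predicates directly get plain logical answers.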
assertive.data.us Assertions to Check Properties of Strings A set of predicates and assertions for checking the properties of US-specific complex data types. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. assertive.files Assertions to Check Properties of Files A set of predicates and assertions for checking the properties of files and connections. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. assertive.numbers Assertions to Check Properties of Numbers A set of predicates and assertions for checking the properties of numbers. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. assertive.properties Assertions to Check Properties of Variables A set of predicates and assertions for checking the properties of variables, such as length, names and attributes. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. assertive.reflection Assertions for Checking the State of R A set of predicates and assertions for checking the state and capabilities of R, the operating system it is running on, and the IDE being used. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. assertive.sets Assertions to Check Properties of Sets A set of predicates and assertions for checking the properties of sets. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. 
assertive.strings Assertions to Check Properties of Strings A set of predicates and assertions for checking the properties of strings. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. assertive.types Assertions to Check Types of Variables A set of predicates and assertions for checking the types of variables. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly. assertr Assertive Programming for R Analysis Pipelines The assertr package supplies a suite of functions designed to verify assumptions about data early in a dplyr/magrittr analysis pipeline, so that data errors are spotted early and can be addressed quickly. assist A Suite of R Functions Implementing Spline Smoothing Techniques A comprehensive package for fitting various non-parametric/semi-parametric linear/nonlinear fixed/mixed smoothing spline models. assortnet Calculate the Assortativity Coefficient of Weighted and Binary Networks Functions to calculate the assortment of vertices in social networks. This can be measured on both weighted and binary networks, with discrete or continuous vertex values. asVPC Average Shifted Visual Predictive Checks Visual predictive checks are a well-known method to validate nonlinear mixed-effect models, especially in the pharmacometrics area. Average shifted visual predictive checks are a newly developed method combining visual predictive checks with the idea of the average shifted histogram. asymmetry The Slide-Vector Model for Multidimensional Scaling of Asymmetric Data The slide-vector model is provided in this package together with functions for the analysis and graphical display of asymmetry. The slide-vector model is a scaling model for asymmetric data. 
A distance model is fitted to the symmetric part of the data, whereas the asymmetric part of the data is represented by projections of the coordinates onto the slide-vector. The slide-vector points in the direction of large asymmetries in the data. The distance is modified in such a way that the distance between two points that are parallel to the slide-vector is larger in the direction of this vector, and smaller in the opposite direction. If the line connecting two points is perpendicular to the slide-vector, the difference between the two projections is zero; in this case the distance between the two points is symmetric. The algorithm for fitting this model is derived from the majorization approach to multidimensional scaling. ATE Inference for Average Treatment Effects using Covariate Balancing Nonparametric estimation and inference for average treatment effects based on covariate balancing. aTSA Alternative Time Series Analysis Contains tools for testing and analyzing time series data and for fitting popular time series models such as ARIMA, Moving Average and Holt Winters. Most functions also provide nice and clear outputs like SAS does, such as identify, estimate and forecast, which are the same statements as in PROC ARIMA in SAS. attrCUSUM Tools for Attribute VSI CUSUM Control Chart An implementation of tools for the design of attribute variable sampling interval cumulative sum (CUSUM) charts. It currently provides information for monitoring a mean increase, such as the average number of samples to signal, the average time to signal, a matrix of transient probabilities, and suitable control limits when the data follow a (zero-inflated) Poisson/binomial distribution. Functions in the tools can be easily applied to other count processes, and the tools might be extended to more complicated cumulative sum control charts; we leave these issues as ongoing work. 
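The kind of chart designed by ‘attrCUSUM’ above accumulates evidence of a mean increase over time. A hedged base-R sketch of the classic one-sided upper CUSUM statistic (illustrative only, not the package API; ‘k’ is the reference value and ‘h’ the control limit):

```r
# One-sided upper CUSUM: S_i = max(0, S_{i-1} + x_i - k); alarm when S_i > h.
cusum_upper <- function(x, k, h) {
  s <- numeric(length(x))
  for (i in seq_along(x)) {
    prev <- if (i == 1) 0 else s[i - 1]
    s[i] <- max(0, prev + x[i] - k)           # accumulate only upward drift
  }
  list(statistic = s, signal = which(s > h))  # indices where the chart alarms
}

x <- c(2, 1, 3, 2, 6, 7, 8)   # counts with an upward shift at observation 5
res <- cusum_upper(x, k = 3, h = 4)
```

The reference value k absorbs in-control variation, so the statistic stays near zero until the mean shifts upward, after which it climbs and crosses the limit h.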
aurelius Generates PFA Documents from R Code and Optionally Runs Them Provides tools for converting R objects and syntax into the Portable Format for Analytics (PFA). Allows for testing validity and runtime behavior of PFA documents through rPython and Titus, a more complete implementation of PFA for Python. The Portable Format for Analytics is a specification for event-based processors that perform predictive or analytic calculations and is aimed at helping smooth the transition from statistical model development to large-scale and/or online production. auRoc Various Methods to Estimate the AUC Estimate the AUC using a variety of methods as follows: (1) frequentist nonparametric methods based on the Mann-Whitney statistic or kernel methods; (2) frequentist parametric methods using the likelihood ratio test based on higher-order asymptotic results, the signed log-likelihood ratio test, the Wald test, or the approximate ‘t’ solution to the Behrens-Fisher problem; (3) Bayesian parametric MCMC methods. autoBagging Learning to Rank Bagging Workflows with Metalearning A framework for automated machine learning. Concretely, the focus is on the optimisation of bagging workflows. A bagging workflow is composed of three phases: (i) generation: which and how many predictive models to learn; (ii) pruning: after learning a set of models, the worst ones are cut off from the ensemble; and (iii) integration: how the models are combined for predicting a new observation. autoBagging optimises these processes by combining metalearning and a learning-to-rank approach to learn from metadata. It automatically ranks 63 bagging workflows by exploiting past performance and dataset characterization. A complete description of the method can be found in: Pinto, F., Cerqueira, V., Soares, C., Mendes-Moreira, J. (2017): ‘autoBagging: Learning to Rank Bagging Workflows with Metalearning’, arXiv preprint arXiv:1706.09367. 
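The nonparametric Mann-Whitney AUC estimate mentioned in the ‘auRoc’ entry above has a direct probabilistic reading: the chance that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. A base-R sketch (illustrative, not the ‘auRoc’ API):

```r
# AUC via the Mann-Whitney statistic: compare every positive score with
# every negative score; wins count 1, ties count 0.5.
auc_mw <- function(score_pos, score_neg) {
  cmp <- outer(score_pos, score_neg, FUN = "-")
  mean((cmp > 0) + 0.5 * (cmp == 0))
}

a <- auc_mw(c(0.9, 0.8, 0.6), c(0.7, 0.4, 0.2))  # wins 8 of the 9 pairs
```

Perfectly separated scores give AUC = 1, and a scoreless coin flip gives 0.5, matching the usual interpretation of the ROC curve's area.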
automagic Automagically Document and Install Packages Necessary to Run R Code Parse R code in a given directory for R packages and attempt to install them from CRAN or GitHub. Optionally use a dependencies file for tighter control over which package versions to install. AutoModel Automated Hierarchical Multiple Regression with Assumptions Checking A set of functions that automates the process and produces reasonable output for hierarchical multiple regression models. It allows you to specify predictor blocks, from which it generates all of the linear models, and checks the assumptions of the model, producing the requisite plots and statistics to allow you to judge the suitability of the model. AutoregressionMDE Minimum Distance Estimation in Autoregressive Model Considers an autoregressive model of order p where the distribution function of the innovations is unknown, but the innovations are independent and symmetrically distributed. The package contains a function named ARMDE which takes X (vector of n observations) and p (order of the model) as input arguments and returns the minimum distance estimator of the parameters in the model. autoSEM Performs Specification Search in Structural Equation Models Implements multiple heuristic search algorithms for automatically creating structural equation models. aVirtualTwins Adaptation of Virtual Twins Method from Jared Foster Research of subgroups in randomized clinical trials with binary outcome and two treatment groups. This is an adaptation of the Jared Foster method. AWR AWS’ Java ‘SDK’ for R Installs the compiled Java modules of the Amazon Web Services (‘AWS’) ‘SDK’ to be used in downstream R packages interacting with ‘AWS’. AWR.Kinesis Amazon ‘Kinesis’ Consumer Application for Stream Processing Fetching data from Amazon ‘Kinesis’ Streams using the Java-based ‘MultiLangDaemon’ interacting with Amazon Web Services (‘AWS’) for easy stream processing from R. 
AWR.KMS A Simple Client to the ‘AWS’ Key Management Service Encrypt plain text and decrypt cipher text using encryption keys hosted at the Amazon Web Services (‘AWS’) Key Management Service (‘KMS’). aws.alexa Client for the Amazon Alexa Web Information Services API Use the Amazon Alexa Web Information Services API to find information about domains, including the kind of content that they carry, how popular they are (rank and traffic history), and sites linking to them, among other things. aws.cloudtrail AWS CloudTrail Client Package A simple client package for the Amazon Web Services (‘AWS’) ‘CloudTrail’ ‘API’. aws.iam AWS IAM Client Package A simple client for the Amazon Web Services (‘AWS’) Identity and Access Management (‘IAM’) ‘API’. aws.polly Client for AWS Polly A client for AWS Polly, a speech synthesis service. aws.s3 AWS S3 Client Package A simple client package for the Amazon Web Services (AWS) Simple Storage Service (S3) REST API. aws.ses AWS SES Client Package A simple client package for the Amazon Web Services (AWS) Simple Email Service (SES) REST API. aws.signature Amazon Web Services Request Signatures Generates request signatures for Amazon Web Services (AWS) APIs. aws.sns AWS SNS Client Package A simple client package for the Amazon Web Services (AWS) Simple Notification Service (SNS) API. aws.sqs AWS SQS Client Package A simple client package for the Amazon Web Services (AWS) Simple Queue Service (SQS) API. awsjavasdk Boilerplate R Access to the Amazon Web Services (‘AWS’) Java SDK Provides boilerplate access to all of the classes included in the Amazon Web Services (‘AWS’) Java Software Development Kit (SDK) via package:’rJava’. According to Amazon, the ‘SDK helps take the complexity out of coding by providing Java APIs for many AWS services including Amazon S3, Amazon EC2, DynamoDB, and more’. 
You can read more about the included Java code on Amazon’s website. awspack Amazon Web Services Bundle Package A bundle of all of the ‘cloudyr’ project packages for Amazon Web Services (‘AWS’). It depends upon all of the ‘cloudyr’ project’s ‘AWS’ packages. It is mainly useful for installing the entire suite of packages; more likely than not you will only want to load individual packages one at a time. AzureML Discover, Publish and Consume Web Services on Microsoft Azure Machine Learning Provides an interface with Microsoft Azure to easily publish functions and trained models as web services, and to discover and consume web services. B BACCT Bayesian Augmented Control for Clinical Trials Implements the Bayesian Augmented Control (BAC, a.k.a. Bayesian historical data borrowing) method in the clinical trial setting by calling the ‘Just Another Gibbs Sampler’ (‘JAGS’) software. In addition, the ‘BACCT’ package evaluates user-specified decision rules by computing the type-I error/power, or the probability of a correct go/no-go decision at an interim look. The evaluation can be presented numerically or graphically. Users need to have ‘JAGS’ 4.0.0 or newer installed due to a compatibility issue with the ‘rjags’ package. Currently, the package implements the BAC method for binary outcomes only. Support for continuous and survival endpoints will be added in future releases. We would like to thank AbbVie’s Statistical Innovation group and Clinical Statistics group for their support in developing the ‘BACCT’ package. backpipe Backward Pipe Operator Provides a backward-pipe operator for ‘magrittr’ (%<%) or ‘pipeR’ (%<<%) that allows for performing operations from right to left. This is useful in instances where right-to-left ordering is commonly observed, as with nested structures such as trees/directories and with markup languages such as HTML and XML. 
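The backward-pipe idea in the ‘backpipe’ entry above can be sketched with a user-defined operator in plain R. This is a deliberately simplified sketch, not the ‘backpipe’ implementation (the real operator rewrites pipe expressions; this one just applies the left-hand function to the right-hand value):

```r
# Simplified backward pipe: 'f %<% x' applies f to x, reading right to left.
`%<%` <- function(lhs, rhs) {
  fn <- match.fun(lhs)   # resolve the left-hand side to a function
  fn(rhs)
}

total   <- sum %<% c(1, 2, 3)   # same as sum(c(1, 2, 3))
rounded <- round %<% (2.7182)   # same as round(2.7182)
```

Reading right to left mirrors how nested markup is written: the innermost value appears last, just as the innermost tag appears deepest in an HTML tree.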
backports Reimplementations of Functions Introduced Since R-3.0.0 Provides implementations of functions which have been introduced in R since version 3.0.0. The backports are conditionally exported, which results in R resolving the function names to the version shipped with R (if available) and using the implemented backports as a fallback. This way package developers can make use of the new functions without worrying about the minimum required R version. backShift Learning Causal Cyclic Graphs from Unknown Shift Interventions Code for ‘backShift’, an algorithm to estimate the connectivity matrix of a directed (possibly cyclic) graph with hidden variables. The underlying system is required to be linear, and we assume that observations under different shift interventions are available. For more details, see http://…/1506.02494 . bacr Bayesian Adjustment for Confounding Estimating the average causal effect based on the Bayesian Adjustment for Confounding (BAC) algorithm. badger Badge for R Package Query information and generate badges for use in a README and on GitHub Pages. BalanceCheck Balance Check for Multiple Covariates in Matched Observational Studies Two practical tests are provided for assessing whether multiple covariates in a treatment group and a matched control group are balanced in observational studies. BAMBI Bivariate Angular Mixture Models Fit (using Bayesian methods) and simulate mixtures of univariate and bivariate angular distributions. bamlss Bayesian Additive Models for Location Scale and Shape (and Beyond) R infrastructures for Bayesian regression models. BANFF Bayesian Network Feature Finder Provides efficient Bayesian nonparametric models for network feature selection. bannerCommenter Make Banner Comments with a Consistent Format A convenience package for use while drafting code. It facilitates making stand-out comment lines decorated with bands of characters. The input text strings are converted into R comment lines, suitably formatted. 
These are then displayed in a console window and, if possible, automatically transferred to a clipboard ready for pasting into an R script. Designed to save time when drafting R scripts that will need to be navigated and maintained by other programmers. BarBorGradient Function Minimum Approximator Tool to find where a function has its lowest value (minimum). The function can have any number of dimensions. Recommended use is with eps=10^-10, but it can be run with 10^-20, although this depends on the function. Two more methods are in this package, the simple gradient method (Gradmod) and the Powell method (Powell). These are not recommended for use; their purpose is purely comparison. Barnard Barnard’s Unconditional Test Barnard’s unconditional test for 2×2 contingency tables. BART Bayesian Additive Regression Trees Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) and Sparapani, Logan, McCulloch and Laud (2016). bartMachine Bayesian Additive Regression Trees An advanced implementation of Bayesian Additive Regression Trees with expanded features for data analysis and visualization. bartMachineJARs bartMachine JARs These are bartMachine’s Java dependency libraries. Note: this package has no functionality of its own and should not be installed as a standalone package without bartMachine. Barycenter Wasserstein Barycenter Computation of a Wasserstein Barycenter. The package implements a method described in Cuturi (2014), ‘Fast Computation of Wasserstein Barycenters’. To speed up the computation time, the main iteration step is based on ‘RcppArmadillo’. BAS Bayesian Model Averaging using Bayesian Adaptive Sampling Package for Bayesian Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. 
Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors or the Liang et al. hyper-g priors (JASA 2008), or mixtures of g-priors in GLMs from Li and Clyde (2015). Other model selection criteria include AIC and BIC. Sampling probabilities may be updated based on the sampled models using sampling without replacement, or an MCMC algorithm samples models using the BAS tree structure as an efficient hash table. Allows uniform or beta-binomial prior distributions on models, and may force variables to always be included. base64url Fast and URL-Safe Base64 Encoder and Decoder In contrast to RFC 3548, the 62nd character (‘+’) is replaced with ‘-’, and the 63rd character (‘/’) is replaced with ‘_’. Furthermore, the encoder does not pad the string with trailing ‘=’. The resulting encoded strings comply with the regular expression pattern ‘[A-Za-z0-9_-]’ and thus are safe to use in URLs or for file names. basefun Infrastructure for Computing with Basis Functions Some very simple infrastructure for basis functions. BASS Bayesian Adaptive Spline Surfaces Bayesian fitting and sensitivity analysis methods for adaptive spline surfaces. Built to handle continuous and categorical inputs as well as functional or scalar output. An extension of the methodology in Denison, Mallick and Smith (1998). bastah Big Data Statistical Analysis for High-Dimensional Models Big data statistical analysis for high-dimensional models is made possible by modifying lasso.proj() in the ‘hdi’ package, replacing its nodewise regression with sparse precision matrix computation using ‘BigQUIC’. BatchExperiments Statistical Experiments on Batch Computing Clusters Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page. 
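The alphabet substitution described in the ‘base64url’ entry above is easy to reproduce in base R on a string that is already base64-encoded (a hedged sketch, not the package API; the sample input is an invented illustration rather than real base64 output):

```r
# URL-safe transform of a base64 string: '+' -> '-', '/' -> '_',
# and trailing '=' padding stripped.
to_base64url <- function(b64) {
  sub("=+$", "", chartr("+/", "-_", b64))
}

u  <- to_base64url("A+B/C==")          # illustrative input
ok <- grepl("^[A-Za-z0-9_-]*$", u)     # matches the URL-safe pattern above
```

Because ‘-’ and ‘_’ are unreserved in URLs and legal in file names, the transformed string needs no percent-encoding, which is the whole point of the URL-safe alphabet.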
BatchGetSymbols Downloads and Organizes Financial Data for Multiple Tickers Makes it easy to download a large amount of trade data from Yahoo or Google Finance. BatchJobs Batch Computing with R Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page. batchtools Tools for Computation on Batch Systems As a successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high performance computing systems managed by the schedulers ‘IBM Spectrum LSF’, ‘OpenLava’, ‘Univa Grid Engine’/’Oracle Grid Engine’, ‘Slurm’, ‘Torque/PBS’, or ‘Docker Swarm’. A multicore and socket mode allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way. BaTFLED3D Bayesian Tensor Factorization Linked to External Data BaTFLED is a machine learning algorithm designed to make predictions and determine interactions in data that varies along three independent modes. For example, BaTFLED was developed to predict the growth of cell lines when treated with drugs at different doses. The first mode corresponds to cell lines and incorporates predictors such as cell line genomics and growth conditions. The second mode corresponds to drugs and incorporates predictors indicating known targets and structural features. The third mode corresponds to dose, and there are no dose-specific predictors (although the algorithm is capable of including predictors for the third mode if present). See ‘BaTFLED3D_vignette.rmd’ for a simulated example. 
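The Map/Reduce pattern that ‘BatchJobs’ and ‘batchtools’ scale out to cluster schedulers can be previewed locally with the base ‘parallel’ package (a hedged sketch of the pattern only, not the batchtools API):

```r
# Map a function over inputs on a local two-worker socket cluster,
# then Reduce the per-job results, mirroring the cluster workflow.
library(parallel)

cl <- makeCluster(2)                             # two local worker processes
squares <- parLapply(cl, 1:5, function(i) i^2)   # the "Map" step, one job each
stopCluster(cl)

total <- Reduce(`+`, squares)                    # the "Reduce" step
```

On a real batch system, each element of the Map would become a scheduler job with its own registry entry, but the functional shape of the computation is the same.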
batteryreduction An R Package for Data Reduction by Battery Reduction Battery reduction is a method used in data reduction. It uses Gram-Schmidt orthogonal rotations to find a subset of variables best representing the original set of variables. bayesAB Fast Bayesian Methods for AB Testing bayesAB provides a suite of functions that allow the user to analyze A/B test data in a Bayesian framework. bayesAB is intended to be a drop-in replacement for common frequentist hypothesis tests such as the t-test and chi-squared test. Bayesian methods provide several benefits over frequentist methods in the context of A/B tests, namely in interpretability. Instead of p-values you get direct probabilities on whether A is better than B (and by how much). Instead of point estimates, your posterior distributions are parametrized random variables which can be summarized any number of ways. Bayesian tests are also immune to ‘peeking’ and are thus valid whenever a test is stopped. BayesBinMix Bayesian Estimation of Mixtures of Multivariate Bernoulli Distributions Fully Bayesian inference for estimating the number of clusters and related parameters in heterogeneous binary data. bayesboot An Implementation of Rubin’s (1981) Bayesian Bootstrap Functions for performing the Bayesian bootstrap as introduced by Rubin (1981) and for summarizing the result. The implementation can handle both summary statistics that work on a weighted version of the data and summary statistics that work on a resampled data set. BayesBridge Bridge Regression Bayesian bridge regression. bayesCL Bayesian Inference on a GPU using OpenCL Bayesian inference on a GPU. The package currently supports sampling from Polya-Gamma, multinomial logit and Bayesian lasso models. BayesCombo Bayesian Evidence Combination Combine diverse evidence across multiple studies to test a high-level scientific theory. The methods can also be used as an alternative to a standard meta-analysis. 
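The weighted-data variant of Rubin's (1981) Bayesian bootstrap mentioned in the ‘bayesboot’ entry above can be sketched in base R (illustrative only, not the ‘bayesboot’ API): each posterior draw reweights the observations with flat Dirichlet weights and recomputes the statistic, here the mean.

```r
# Bayesian bootstrap for the mean: Dirichlet(1, ..., 1) weights are obtained
# by normalizing independent Gamma(1) draws, then used as a weighted mean.
bayes_boot_mean <- function(x, n_draws = 2000) {
  replicate(n_draws, {
    w <- rgamma(length(x), shape = 1)  # Gamma(1) draws
    w <- w / sum(w)                    # normalize to Dirichlet weights
    sum(w * x)                         # statistic under these weights
  })
}

set.seed(42)
post <- bayes_boot_mean(c(2, 4, 4, 5, 7, 9))
```

Unlike the classical bootstrap, no observation ever gets exactly zero weight, which smooths the resulting posterior of the statistic.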
bayesDP Tools for the Bayesian Discount Prior Function Functions for augmenting data with historical controls using the Bayesian discount prior function for 1-arm and 2-arm clinical trials. BayesFactor Computation of Bayes Factors for Common Designs A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression. BayesFactorExtras Extra functions for use with the BayesFactor R package BayesFactorExtras is an R package that contains extra features related to the BayesFactor package, such as plots and analysis reports. BayesFM Bayesian Inference for Factor Modeling Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: Bayesian Exploratory Factor Analysis (befa), an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling. More approaches will be included in future releases of this package. BayesH Bayesian Regression Model with Mixture of Two Scaled Inverse Chi Square as Hyperprior Functions to perform Bayesian regression with a mixture of two scaled inverse chi-squared distributions as the hyperprior for the variance of each regression coefficient. BayesianGLasso Bayesian Graphical Lasso Implements a data-augmented block Gibbs sampler for simulating the posterior distribution of concentration matrices for specifying the topology and parameterization of a Gaussian Graphical Model (GGM). This sampler was originally proposed in Wang (2012). BayesianNetwork Bayesian Network Modeling and Analysis A Shiny web application for creating interactive Bayesian Network models, learning the structure and parameters of Bayesian networks, and utilities for classical network analysis.
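The Bayes factors that BayesFactor computes for standard designs can be shown with a hand-computable case: a binomial test of theta = 0.5 against a Beta(a, b) alternative, where both marginal likelihoods have closed forms (an illustrative sketch with a hypothetical helper, not BayesFactor's internal routine):

```r
# Bayes factor for H1: theta ~ Beta(a, b) versus H0: theta = 0.5,
# given x successes in n Bernoulli trials. Both marginal likelihoods
# are available in closed form for this conjugate setup.
binom_bf10 <- function(x, n, a = 1, b = 1) {
  m1 <- choose(n, x) * beta(x + a, n - x + b) / beta(a, b)  # marginal under H1
  m0 <- dbinom(x, n, 0.5)                                   # likelihood under H0
  m1 / m0
}
binom_bf10(8, 10)   # about 2: modest evidence against theta = 0.5
binom_bf10(5, 10)   # below 1: the data favor the null
```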
BayesianTools General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics General-purpose MCMC and SMC samplers, as well as plot and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. Implemented samplers include various Metropolis MCMC variants (including adaptive and/or delayed rejection MH), the T-walk, two differential evolution MCMCs, two DREAM MCMCs, and a sequential Monte Carlo (SMC) particle filter. bayesImageS Bayesian Methods for Image Segmentation using a Potts Model Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with an external field prior. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, and approximate Bayesian computation (ABC). BayesLCA Bayesian Latent Class Analysis Bayesian Latent Class Analysis using several different methods. bayeslm Efficient Sampling for Gaussian Linear Regression with Arbitrary Priors Efficient sampling for Gaussian linear regression with arbitrary priors. bayesloglin Bayesian Analysis of Contingency Table Data The function MC3() searches for log-linear models with the highest posterior probability. The function gibbsSampler() is a blocked Gibbs sampler for sampling from the posterior distribution of the log-linear parameters. The functions findPostMean() and findPostCov() compute the posterior mean and covariance matrix, which for decomposable models are available in closed form. BayesMAMS Designing Bayesian Multi-Arm Multi-Stage Studies Calculating Bayesian sample sizes for multi-arm trials where several experimental treatments are compared to a common control, perhaps even at multiple stages.
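The simplest of the Metropolis MCMC variants implemented by BayesianTools is a random-walk Metropolis kernel; a minimal base-R sketch targeting a standard normal (illustrative only; the package's samplers add adaptation, delayed rejection, and diagnostics):

```r
# Minimal random-walk Metropolis sampler for an arbitrary log-target.
set.seed(1)
metropolis <- function(log_target, n_iter, start = 0, step = 1) {
  chain <- numeric(n_iter)
  chain[1] <- start
  for (i in 2:n_iter) {
    prop <- chain[i - 1] + rnorm(1, sd = step)   # symmetric proposal
    log_ratio <- log_target(prop) - log_target(chain[i - 1])
    chain[i] <- if (log(runif(1)) < log_ratio) prop else chain[i - 1]
  }
  chain
}
chain <- metropolis(function(x) dnorm(x, log = TRUE), 20000)
c(mean = mean(chain), sd = sd(chain))   # approximately 0 and 1
```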
bayesmeta Bayesian Random-Effects Meta-Analysis A collection of functions for deriving the posterior distribution of the two parameters in a random-effects meta-analysis and for evaluating joint and marginal posterior probability distributions, predictive distributions, etc. BayesNetBP Bayesian Network Belief Propagation Belief propagation methods in Bayesian Networks to propagate evidence through the network. The implementation of these methods is based on the article: Cowell, RG (2005), Local Propagation in Conditional Gaussian Bayesian Networks. BayesPiecewiseICAR Hierarchical Bayesian Model for a Hazard Function Fits a piecewise exponential hazard to survival data using a hierarchical Bayesian model with an Intrinsic Conditional Autoregressive formulation for the spatial dependency in the hazard rates for each piece. This function uses Metropolis-Hastings-Green MCMC to allow the number of split points to vary. This function outputs graphics that display the histogram of the number of split points and the trace plots of the hierarchical parameters. The function outputs a list that contains the posterior samples for the number of split points, the location of the split points, and the log hazard rates corresponding to these splits. Additionally, this outputs the posterior samples of the two hierarchical parameters, Mu and Sigma^2. bayesplot Plotting for Bayesian Models Plotting functions for posterior analysis, model checking, and MCMC diagnostics. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with Stan. bayesreg Bayesian Regression Models with Continuous Shrinkage Priors Fits linear or logistic regression models using Bayesian continuous shrinkage prior distributions.
Handles ridge, lasso, horseshoe and horseshoe+ regression with logistic, Gaussian, Laplace or Student-t distributed targets. BayesS5 Bayesian Variable Selection Using Simplified Shotgun Stochastic Search with Screening (S5) In p >> n settings, full posterior sampling using existing Markov chain Monte Carlo (MCMC) algorithms is highly inefficient and often not feasible from a practical perspective. To overcome this problem, we propose a scalable stochastic search algorithm called the Simplified Shotgun Stochastic Search (S5), aimed at rapidly exploring interesting regions of the model space and finding the maximum a posteriori (MAP) model. Also, S5 provides an approximation of the posterior probability of each model (including the marginal inclusion probabilities). BayesSpec Bayesian Spectral Analysis Techniques An implementation of methods for spectral analysis using the Bayesian framework. It includes functions for modelling the spectrum as well as appropriate plotting and output of estimates. There is segmentation capability with RJ MCMC (Reversible Jump Markov Chain Monte Carlo). The package takes these methods predominantly from the 2012 paper ‘AdaptSPEC: Adaptive Spectral Estimation for Nonstationary Time Series’. BayesSummaryStatLM MCMC Sampling of Bayesian Linear Models via Summary Statistics Methods for generating Markov Chain Monte Carlo (MCMC) posterior samples of Bayesian linear regression model parameters that require only summary statistics of data as input. Summary statistics are useful for systems with very limited amounts of physical memory. The package provides two functions: one function that computes summary statistics of data and one function that carries out the MCMC posterior sampling for Bayesian linear regression models where summary statistics are used as input. The function read.regress.data.ff utilizes the R package ‘ff’ to handle data sets that are too large to fit into a user’s physical memory, by reading in data in chunks.
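The summary-statistics idea behind BayesSummaryStatLM rests on the fact that a Gaussian linear model's posterior depends on the data only through X'X and X'y, which can be accumulated chunk by chunk. A base-R sketch (the N(0, tau^2 I) prior and fixed sigma^2 here are illustrative assumptions, not the package's exact model):

```r
# Regression from summary statistics: accumulate X'X and X'y over chunks,
# then form a conjugate posterior mean. Prior N(0, tau^2 I), sigma^2 fixed
# (illustrative assumptions).
set.seed(7)
n <- 1000
X <- cbind(1, rnorm(n))
y <- X %*% c(2, 3) + rnorm(n)
chunks <- split(seq_len(n), rep(1:4, each = 250))  # pretend data arrive in chunks
XtX <- matrix(0, 2, 2); Xty <- numeric(2)
for (idx in chunks) {                              # only summaries are retained
  XtX <- XtX + crossprod(X[idx, , drop = FALSE])
  Xty <- Xty + crossprod(X[idx, , drop = FALSE], y[idx])
}
sigma2 <- 1; tau2 <- 100                           # assumed variances
beta_post <- solve(XtX + (sigma2 / tau2) * diag(2), Xty)
drop(beta_post)                                    # close to the true (2, 3)
```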
BayesTree Bayesian Additive Regression Trees Implementation of BART: Bayesian Additive Regression Trees, Chipman, George, McCulloch (2010). BayesTreePrior Bayesian Tree Prior Simulation Provides a way to simulate from the prior distribution of Bayesian trees by Chipman et al. (1998). The prior distribution of Bayesian trees is highly dependent on the design matrix X, therefore using the hyperparameters suggested by Chipman et al. (1998) is not recommended and could lead to unexpected prior distributions. This work is part of my master’s thesis (in revision, expected 2016) and a journal publication I’m working on. bazar Miscellaneous Basic Functions A collection of miscellaneous functions for copying objects to the clipboard (‘Copy’); manipulating strings (‘concat’, ‘mgsub’, ‘trim’, ‘verlan’); loading or showing packages (‘library_with_rep’, ‘require_with_rep’, ‘sessionPackages’); creating or testing for named lists (‘nlist’, ‘as.nlist’, ‘is.nlist’), formulas (‘is.formula’), empty objects (‘as.empty’, ‘is.empty’), whole numbers (‘as.wholenumber’, ‘is.wholenumber’); testing for equality (‘almost.equal’, ‘almost.zero’); getting modified versions of usual functions (‘rle2’, ‘sumNA’); making a pause or a stop (‘pause’, ‘stopif’); and others (‘erase’, ‘%nin%’, ‘unwhich’). BCEA Bayesian Cost Effectiveness Analysis Produces an economic evaluation of a Bayesian model in the form of MCMC simulations. Given suitable variables of cost and effectiveness / utility for two or more interventions, BCEA computes the most cost-effective alternative and produces graphical summaries and probabilistic sensitivity analysis. BCEE The Bayesian Causal Effect Estimation Algorithm Implementation of the Bayesian Causal Effect Estimation algorithm, a data-driven method for the estimation of the causal effect of a continuous exposure on a continuous outcome. For more details, see Talbot et al. (2015).
bcpa Behavioral change point analysis of animal movement The Behavioral Change Point Analysis (BCPA) is a method of identifying hidden shifts in the underlying parameters of a time series, developed specifically to be applied to animal movement data which is irregularly sampled. The method is based on: E. Gurarie, R. Andrews and K. Laidre, ‘A novel method for identifying behavioural changes in animal movement data’ (2009), Ecology Letters 12(5):395-408. bcROCsurface Bias-Corrected Methods for Estimating the ROC Surface of Continuous Diagnostic Tests Bias-corrected estimation methods for the receiver operating characteristic (ROC) surface and the volume under ROC surfaces (VUS) under the missing at random (MAR) assumption. bcrypt ‘Blowfish’ Password Hashing Algorithm An R interface to the ‘OpenBSD Blowfish’ password hashing algorithm, as described in ‘A Future-Adaptable Password Scheme’ by ‘Niels Provos’. The implementation is derived from the ‘py-bcrypt’ module for Python which is a wrapper for the ‘OpenBSD’ implementation. bcs Bayesian Compressive Sensing Using Laplace Priors A Bayesian method for solving the compressive sensing problem. In particular, this package implements the algorithm ‘Fast Laplace’ found in the paper ‘Bayesian Compressive Sensing Using Laplace Priors’ by Babacan, Molina, Katsaggelos (2010). bdlp Transparent and Reproducible Artificial Data Generation The main function generateDataset() processes a user-supplied .R file that contains metadata parameters in order to generate actual data. The metadata parameters have to be structured in the form of metadata objects, the format of which is outlined in the package vignette. This approach allows artificial data to be generated in a transparent and reproducible manner. bdots Bootstrapped Differences of Time Series Analyze differences among time series curves with Oleson et al.’s modified p-value technique.
bdpopt Optimisation of Bayesian Decision Problems Optimisation of the expected utility in single-stage and multi-stage Bayesian decision problems. The expected utility is estimated by simulation. For single-stage problems, JAGS is used to draw MCMC samples. bdvis Biodiversity Data Visualizations Biodiversity data visualizations in R help users understand the completeness of a biodiversity inventory; the extent of geographical, taxonomic and temporal coverage; and gaps and biases in the data. BDWreg Bayesian Inference for Discrete Weibull Regression A Bayesian regression model for discrete response, where the conditional distribution is modelled via a discrete Weibull distribution. This package provides an implementation of Metropolis-Hastings and Reversible-Jump algorithms to draw samples from the posterior. It covers a wide range of regularizations through any two-parameter prior. Examples are Laplace (Lasso), Gaussian (ridge), Uniform, Cauchy and customized priors like a mixture of priors. An extensive visual toolbox is included to check the validity of the results as well as several measures of goodness-of-fit. benchr High-Precision Measurement of R Expression Execution Time Provides infrastructure to accurately measure and compare the execution time of R expressions. bentcableAR Bent-Cable Regression for Independent Data or Autoregressive Time Series Included are two main interfaces for fitting and diagnosing bent-cable regressions for autoregressive time-series data or independent data (time series or otherwise): ‘bentcable.ar()’ and ‘bentcable.dev.plot()’. Some components in the package can also be used as stand-alone functions. The bent cable (linear-quadratic-linear) generalizes the broken stick (linear-linear), which is also handled by this package. Version 0.2 corrects a glitch in the computation of confidence intervals for the CTP. References that were updated from Versions 0.2.1 and 0.2.2 appear in Version 0.2.3 and up.
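What benchr automates can be sketched with base R: time an expression over many replications and summarize the distribution of elapsed times (the time_fun helper is a hypothetical illustration; benchr uses higher-resolution timers and a proper comparison interface):

```r
# Repeatedly time a function and summarize elapsed wall-clock seconds.
time_fun <- function(f, times = 50) {
  elapsed <- vapply(seq_len(times), function(i) {
    t0 <- proc.time()[["elapsed"]]
    f()
    proc.time()[["elapsed"]] - t0
  }, numeric(1))
  c(min = min(elapsed), median = median(elapsed), max = max(elapsed))
}
time_fun(function() sum(sqrt(1:1e5)))
```

Summarizing by the minimum or median, rather than a single run, guards against one-off slowdowns from garbage collection or other processes.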
Version 0.3.0 improves robustness of the error-message producing mechanism. It is the author’s intention to distribute any future updates via GitHub. Bergm Bayesian Exponential Random Graph Models Set of tools to analyse Bayesian exponential random graph models. BeSS Best Subset Selection for Sparse Generalized Linear Model and Cox Model An implementation of best subset selection in generalized linear models and the Cox proportional hazards model via the primal dual active set algorithm. The algorithm formulates coefficient parameters and residuals as primal and dual variables and utilizes efficient active set selection strategies based on the complementarity of the primal and dual variables. betacal Beta Calibration Fit beta calibration models and obtain calibrated probabilities from them. betas Standardized Beta Coefficients Computes standardized beta coefficients and corresponding standard errors for the following models: linear regression models with numerical covariates only; linear regression models with numerical and factorial covariates; weighted linear regression models; and robust linear regression models with numerical covariates only. beyondWhittle Bayesian Spectral Inference for Stationary Time Series Implementations of a Bayesian parametric (autoregressive), a Bayesian nonparametric (Whittle likelihood with Bernstein-Dirichlet prior) and a Bayesian semiparametric (autoregressive likelihood with Bernstein-Dirichlet correction) procedure are provided. The work is based on the corrected parametric likelihood by C. Kirch et al. (2017). It was supported by DFG grant KI 1443/3-1. bfork Basic Unix Process Control Wrappers for fork()/waitpid() meant to allow R users to quickly and easily fork child processes and wait for them to finish. bfp Bayesian Fractional Polynomials Implements the Bayesian paradigm for fractional polynomial models under the assumption of normally distributed error terms.
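The standardized coefficients that betas computes satisfy b_std = b * sd(x) / sd(y) for a numerical covariate, which matches refitting on scaled variables; a base-R check with the built-in cars data (illustrative only; betas also handles factors, weights, and robust fits):

```r
# Standardized beta by hand versus a regression on scaled variables.
fit   <- lm(dist ~ speed, data = cars)
b_std <- coef(fit)[["speed"]] * sd(cars$speed) / sd(cars$dist)
fit_scaled <- lm(scale(dist) ~ scale(speed), data = cars)
c(by_hand = b_std, from_scaled_fit = coef(fit_scaled)[[2]])  # identical values
```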
bgeva Binary Generalized Extreme Value Additive Models Routine for fitting regression models for binary rare events with linear and nonlinear covariate effects when using the quantile function of the Generalized Extreme Value random variable. bgsmtr Bayesian Group Sparse Multi-Task Regression Fits a Bayesian group-sparse multi-task regression model using Gibbs sampling. The hierarchical prior encourages shrinkage of the estimated regression coefficients at both the gene and SNP level. The model has been applied successfully to imaging phenotypes of dimension up to 100; it can be used more generally for multivariate (non-imaging) phenotypes. BH Boost C++ Header Files Boost provides free peer-reviewed portable C++ source libraries. A large part of Boost is provided as C++ template code which is resolved entirely at compile-time without linking. This package aims to provide the most useful subset of Boost libraries for template use among CRAN packages. By placing these libraries in this package, we offer a more efficient distribution system for CRAN as replication of this code in the sources of other packages is avoided. BHPMF Uncertainty Quantified Matrix Completion using Bayesian Hierarchical Matrix Factorization Fills the gaps of a matrix incorporating hierarchical side information while providing uncertainty quantification. bib2df Parse a BibTeX File to a Tibble Parse a BibTeX file to a tidy tibble (trimmed down version of data.frame) to make it accessible for further analysis and visualization. BiBitR R Wrapper for Java Implementation of BiBit A simple R wrapper for the Java BiBit algorithm from ‘A biclustering algorithm for extracting bit-patterns from binary datasets’ from Domingo et al. (2011). An adaptation of the BiBit algorithm which allows noise in the biclusters is also included.
BibPlots Plot Functions for JIF (Journal Impact Factor) and Paper Percentiles Currently, the package provides two functions for plotting and analyzing bibliometric data (JIF and paper percentile values). Further extension to more plot variants is planned. biclique Maximal Complete Bipartite Graphs A tool for enumerating maximal complete bipartite graphs. The input should be an edge list file or a binary matrix file. The output is the set of maximal complete bipartite graphs. The algorithms used can be found in Y. Zhang et al., BMC Bioinformatics 2014, 15:110. bife Binary Choice Models with Fixed Effects Estimates fixed effects binary choice models (logit and probit) with potentially many individual fixed effects and computes average partial effects. Incidental parameter bias can be reduced with a bias-correction proposed by Hahn and Newey (2004). BIGDAWG Case-Control Analysis of Multi-Allelic Loci Data sets and functions for chi-squared Hardy-Weinberg and case-control association tests of highly polymorphic genetic data [e.g., human leukocyte antigen (HLA) data]. Performs association tests at multiple levels of polymorphism (haplotype, locus and HLA amino-acids) as described in Pappas DJ, Marin W, Hollenbach JA, Mack SJ (2016). Combines rare variants into a common class to account for sparse cells in tables as described by Hollenbach JA, Mack SJ, Thomson G, Gourraud PA (2012). bigFastlm Fast Linear Models for Objects from the ‘bigmemory’ Package A reimplementation of the fastLm() functionality of ‘RcppEigen’ for big.matrix objects for fast out-of-memory linear model fitting. bigKRLS Optimized Kernel Regularized Least Squares Functions for Kernel-Regularized Least Squares optimized for speed and memory usage are provided along with visualization tools. For working papers, sample code, and recent presentations visit .
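The chi-squared Hardy-Weinberg test that BIGDAWG applies to highly polymorphic loci reduces, for a biallelic locus, to a few lines of base R (the hw_chisq helper is an illustrative sketch, not BIGDAWG's interface):

```r
# Hardy-Weinberg chi-squared test for a biallelic locus from genotype counts.
hw_chisq <- function(n_AA, n_Aa, n_aa) {
  n <- n_AA + n_Aa + n_aa
  p <- (2 * n_AA + n_Aa) / (2 * n)                 # allele frequency of A
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)
  observed <- c(n_AA, n_Aa, n_aa)
  chisq <- sum((observed - expected)^2 / expected)
  c(chisq = chisq, p_value = pchisq(chisq, df = 1, lower.tail = FALSE))
}
hw_chisq(298, 489, 213)   # counts close to HW expectations: large p-value
```

BIGDAWG extends this to many alleles, pooling rare variants into a common class to avoid sparse cells.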
biglasso Big Lasso: Extending Lasso Model Fitting to Big Data in R Extends lasso and elastic-net model fitting to ultrahigh-dimensional, multi-gigabyte data sets that cannot be loaded into memory. Compared to existing lasso-fitting packages, it preserves equivalently fast computation speed but is much more memory-efficient, thus allowing for very powerful big data analysis even with only a single laptop. bigReg Generalized Linear Models (GLM) for Large Data Sets Allows the user to carry out GLM on very large data sets. Data can be created using the data_frame() function and appended to the object with object$append(data); data_frame and data_matrix objects are available that allow the user to store large data on disk. The data is stored as doubles in binary format and any character columns are transformed to factors and then stored as numeric (binary) data while a look-up table is stored in a separate .meta_data file in the same folder. The data is stored in blocks and the GLM regression algorithm is modified to carry out a MapReduce-like algorithm to fit the model. The functions bglm(), summary() and bglm_predict() are available for creating and post-processing of models. The library requires Armadillo installed on your system. It probably won’t function on Windows since multi-core processing is done using mclapply(), which forks R on Unix/Linux type operating systems. bigrquery An Interface to Google’s BigQuery API Easily talk to Google’s BigQuery database from R. bigRR Generalized Ridge Regression (with special advantage for p >> n cases) The package fits large-scale (generalized) ridge regression for various distributions of response. The shrinkage parameters (lambdas) can be pre-specified or estimated using an internal update routine (fitting a heteroscedastic effects model, or HEM). It allows shrinking any subset of parameters in the model.
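The closed-form ridge solution at the core of what bigRR scales up is beta = (X'X + lambda I)^(-1) X'y; a base-R sketch with a fixed lambda (bigRR estimates the shrinkage parameters and exploits p >> n structure, which this small example does not):

```r
# Ridge regression in closed form with a fixed penalty lambda.
set.seed(3)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1, -1, 0.5, 0, 0) + rnorm(n, sd = 0.5)
lambda <- 2
beta_ridge <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
drop(beta_ridge)   # shrunk toward zero relative to ordinary least squares
```

Increasing lambda shrinks the coefficient vector further; lambda = 0 recovers ordinary least squares.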
It has a special computational advantage for cases in which the number of shrinkage parameters exceeds the number of observations. For example, the package is very useful for fitting large-scale omics data, such as high-throughput genotype data (genomics), gene expression data (transcriptomics), metabolomics data, etc. BigSEM Constructing Large Systems of Structural Equations Construct large systems of structural equations using the two-stage penalized least squares (2SPLS) method proposed by Chen, Zhang and Zhang (2016). bigstep Stepwise Selection for Large Data Sets Selecting linear models for large data sets using a modified stepwise procedure and modern selection criteria (like modifications of the Bayesian Information Criterion). Selection can be performed on data which exceed RAM capacity. A special selection strategy, faster than the classical stepwise procedure, is also available. bigtcr Nonparametric Analysis of Bivariate Gap Time with Competing Risks For studying recurrent disease and death with competing risks, comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events. Alternatively, comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrent disease. This package implements a nonparametric estimator for the conditional cumulative incidence function and a nonparametric conditional bivariate cumulative incidence function for the bivariate gap times proposed in Huang et al. (2016). bimixt Estimates Mixture Models for Case-Control Data Estimates non-Gaussian mixture models of case-control data. The four types of models supported are binormal, two component constrained, two component unconstrained, and four component. The most general model is the four component model, under which both cases and controls are distributed according to a mixture of two unimodal distributions.
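The kind of BIC-based stepwise selection that bigstep scales to out-of-memory data can be run in-memory with base R's step(), using k = log(n) to turn the AIC penalty into BIC (an in-memory illustration of the criterion only, not bigstep's faster strategy):

```r
# Backward stepwise selection under BIC with base R's step().
fit_full <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)
fit_bic <- step(fit_full, direction = "backward", k = log(n), trace = 0)
formula(fit_bic)   # a reduced model chosen by BIC
```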
In the four component model, the two component distributions of the control mixture may be distinct from the two components of the case mixture distribution. In the two component unconstrained model, the components of the control and case mixtures are the same; however, the mixture probabilities may differ for cases and controls. In the two component constrained model, all controls are distributed according to one of the two components while cases follow a mixture distribution of the two components. In the binormal model, cases and controls are distributed according to distinct unimodal distributions. These models assume that Box-Cox transformed case and control data with a common lambda parameter are distributed according to Gaussian mixture distributions. Model parameters are estimated using the expectation-maximization (EM) algorithm. Likelihood ratio test comparison of nested models can be performed using the lr.test function. AUC and PAUC values can be computed for the model-based and empirical ROC curves using the auc and pauc functions, respectively. The model-based and empirical ROC curves can be graphed using the roc.plot function. Finally, the model-based density estimates can be visualized by plotting a model object created with the bimixt.model function. BimodalIndex The Bimodality Index Defines the functions used to compute the bimodal index as defined by Wang et al. (2009). Binarize Binarization of One-Dimensional Data Provides methods for the binarization of one-dimensional data and some visualization functions. BinaryEMVS Variable Selection for Binary Data Using the EM Algorithm Implements variable selection for high dimensional datasets with a binary response variable using the EM algorithm. Both probit and logit models are supported. Also included is a useful function to generate high dimensional data with correlated variables.
BinaryEPPM Mean and Variance Modeling of Binary Data Modeling under- and over-dispersed binary data using extended Poisson process models (EPPM). binaryLogic Binary Logic Convert to binary numbers (base 2). Shift, rotate, summary. Based on logical vectors. bindr Parametrized Active Bindings Provides a simple interface for creating active bindings where the bound function accepts additional arguments. bindrcpp An ‘Rcpp’ Interface to Active Bindings Provides an easy way to fill an environment with active bindings that call a C++ function. binman A Binary Download Manager Tools and functions for managing the download of binary files. Binary repositories are defined in ‘YAML’ format. Defining new pre-download, download and post-download templates allows additional repositories to be added. binomen ‘Taxonomic’ Specification and Parsing Methods Includes functions for working with taxonomic data, including functions for combining, separating, and filtering taxonomic groups by any rank or name. Allows standard (SE) and non-standard evaluation (NSE). binsmooth Generate PDFs and CDFs from Binned Data Provides several methods for generating density functions based on binned data. Data are assumed to be nonnegative, but the bin widths need not be uniform, and the top bin may be unbounded. All PDF smoothing methods maintain the areas specified by the binned data. (Equivalently, all CDF smoothing methods interpolate the points specified by the binned data.) An estimate for the mean of the distribution may be supplied as an optional argument, which greatly improves the reliability of statistics computed from the smoothed density functions. Methods include step function, recursive subdivision, and optimized spline. binst Data Preprocessing, Binning for Classification and Regression Various supervised and unsupervised binning tools, including entropy-based methods, recursive partitioning and clustering.
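The CDF interpolation property that all of binsmooth's methods share (the smoothed CDF passes through the points fixed by the binned counts) can be seen with the simplest possible smoother, piecewise-linear interpolation in base R (a sketch with made-up bins; binsmooth's step, subdivision, and spline methods refine this):

```r
# A piecewise-linear CDF through the points determined by binned counts.
breaks <- c(0, 10, 25, 50, 100)   # bin edges (top bin bounded here)
counts <- c(40, 30, 20, 10)       # counts per bin
cdf_pts <- c(0, cumsum(counts)) / sum(counts)
cdf <- approxfun(breaks, cdf_pts, rule = 2)  # linear between bin boundaries
cdf(25)   # 0.7: 70% of the mass lies at or below 25, exactly as the bins say
```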
Biocomb Feature Selection and Classification with the Embedded Validation Procedures for Biomedical Data Analysis Contains functions for data analysis with an emphasis on biological data, including several algorithms for feature ranking, feature selection, and classification with embedded validation procedures. The functions can deal with numerical as well as nominal features. Also includes functions for calculating feature AUC (Area Under the ROC Curve) and HUM (hypervolume under manifold) values and constructing 2D and 3D ROC curves. Biocomb provides the calculation of Area Above the RCC (AAC) values and construction of Relative Cost Curves (RCC) to estimate classifier performance under the unequal misclassification costs problem. Biocomb has a special function to deal with missing values, including different imputing schemes. biogeo Point Data Quality Assessment and Coordinate Conversion Functions for error detection and correction in point data quality datasets that are used in species distribution modelling. Includes functions for parsing and converting coordinates into decimal degrees from various formats. bioplots Visualization of Overlapping Results with Heatmap Visualization of complex biological datasets is essential to understand complementary aspects of biology in the big data era. In addition, analyzing multiple datasets enables deeper and more accurate understanding of biological processes. Multiple datasets produce multiple analysis results, and these overlaps are usually visualized in a Venn diagram. bioplots is a tiny R package that generates a heatmap to visualize overlaps instead of using a Venn diagram. biorxivr Search and Download Papers from the bioRxiv Preprint Server The bioRxiv preprint server (http://www.biorxiv.org) is a website where scientists can post preprints of scholarly texts in biology. Users can search and download PDFs in bulk from the preprint server.
The text of abstracts is stored as raw text within R, and PDFs can easily be saved and imported for text mining with packages such as ‘tm’. bipartite Visualising bipartite networks and calculating some (ecological) indices Bipartite provides functions to visualise webs and calculate a series of indices commonly used to describe patterns in ecological webs. It focuses on webs consisting of only two trophic levels, e.g. pollination webs or predator-prey webs. Visualisation is important to get an idea of what we are actually looking at, while the indices summarise different aspects of the web’s topology. BiplotGUI Interactive Biplots in R Provides a GUI with which users can construct and interact with biplots. birdnik Connector for the Wordnik API A connector to the API for ‘Wordnik’, a dictionary service that also provides bigram generation, word frequency data, and a whole host of other functionality. bitops Bitwise Operations Functions for bitwise operations on integer vectors. BiTrinA Binarization and Trinarization of One-Dimensional Data Provides methods for the binarization and trinarization of one-dimensional data and some visualization functions. BivRegBLS Tolerance Intervals and Errors-in-Variables Regressions in Method Comparison Studies Assess the agreement in method comparison studies by tolerance intervals and errors-in-variables regressions. The Ordinary Least Square regressions (OLSv and OLSh), the Deming Regression (DR), and the (Correlated)-Bivariate Least Square regressions (BLS and CBLS) can be used with unreplicated or replicated data. The BLS and CBLS are the two main functions to estimate a regression line, while XY.plot and MD.plot are the two main graphical functions to display, respectively, an (X,Y) plot or an (M,D) plot with the BLS or CBLS results.
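The Deming regression that BivRegBLS offers for method comparison has a closed form; a base-R sketch for the case where the ratio of error variances (delta) equals 1 (the deming_slope helper is an illustrative function, not BivRegBLS's interface, and conventions for delta vary):

```r
# Deming regression slope in closed form (delta = ratio of error variances).
deming_slope <- function(x, y, delta = 1) {
  sxx <- var(x); syy <- var(y); sxy <- cov(x, y)
  (syy - delta * sxx + sqrt((syy - delta * sxx)^2 + 4 * delta * sxy^2)) /
    (2 * sxy)
}
x <- c(1, 2, 3, 4, 5)                  # readings from method 1
y <- c(1.1, 2.0, 2.9, 4.2, 5.0)        # readings from method 2
b1 <- deming_slope(x, y)
b0 <- mean(y) - b1 * mean(x)
c(intercept = b0, slope = b1)          # slope near 1: the methods roughly agree
```

Unlike ordinary least squares, Deming regression treats both variables as measured with error, which is the relevant setting when comparing two measurement methods.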
Assuming no proportional bias, the (M,D) plot (Bland-Altman plot) may be simplified by calculating horizontal line intervals with tolerance intervals (beta-expectation (type I) or beta-gamma content (type II)). bivrp Bivariate Residual Plots with Simulation Polygons Generates bivariate residual plots with simulation polygons for any diagnostics and bivariate model from which functions to extract the desired diagnostics, simulate new data and refit the models are available. bkmr Bayesian Kernel Machine Regression Implementation of a statistical approach for estimating the joint health effects of multiple concurrent exposures. BKPC Bayesian Kernel Projection Classifier Bayesian kernel projection classifier is a nonlinear multicategory classifier which performs the classification of the projections of the data to the principal axes of the feature space. A Gibbs sampler is implemented to find the posterior distributions of the parameters. blackbox Black Box Optimization and Exploration of Parameter Space Performs prediction of a response function from simulated response values, allowing black-box optimization of functions estimated with some error. blackbox includes a simple user interface for such applications, as well as more specialized functions designed to be called by the Migraine software (see URL). The latter functions are used for prediction of likelihood surfaces and implied likelihood ratio confidence intervals, and for exploration of the predictor space of the surface. Prediction of the response is based on ordinary kriging (with residual error) of the input. Estimation of smoothing parameters is performed by generalized cross-validation. BlandAltmanLeh Plots (slightly extended) Bland-Altman plots Bland-Altman plots using base graphics as well as ggplot2, slightly extended by confidence intervals, with detailed return values and a sunflowerplot option for data with ties.
blatr Send Emails Using ‘Blat’ for Windows A wrapper around the Blat command line SMTP mailer for Windows. Blat is public domain software, but be sure to read the license before use. It can be found at the Blat website http://www.blat.net. blavaan Bayesian Latent Variable Analysis Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models. BLCOP Black-Litterman and Copula Opinion Pooling Frameworks An implementation of the Black-Litterman Model and Attilio Meucci’s copula opinion pooling framework. blendedLink A New Link Function that Blends Two Specified Link Functions A new link function that equals one specified link function up to a cutover, then a linear rescaling of another specified link function. For use in glm() or glm2(). The intended use is in binary regression, in which case the first link should be set to ‘log’ and the second to ‘logit’. This ensures that fitted probabilities are between 0 and 1 and that exponentiated coefficients can be interpreted as relative risks for probabilities up to the cutover. blkbox Data Exploration with Multiple Machine Learning Algorithms Allows data to be processed by multiple machine learning algorithms at the same time, and enables feature selection of data by a single algorithm or combinations of multiple algorithms. An easy-to-use tool for k-fold cross-validation and nested cross-validation. BLModel Black-Litterman Posterior Distribution The posterior distribution in the Black-Litterman model is computed from a prior distribution given in the form of a time series of asset returns and a continuous distribution of views provided by the user as an external function. blob A Simple S3 Class for Representing Vectors of Binary Data (‘BLOBS’) R’s raw vector is useful for storing a single binary object. What if you want to put a vector of them in a data frame?
The blob package provides the blob object, a list of raw vectors, suitable for use as a column in a data frame. blockseg Two Dimensional Change-Points Detection Segments a matrix in blocks with constant values. Blossom Functions for Making Statistical Comparisons with Distance-Function Based Permutation Tests Blossom is an R package with functions for making statistical comparisons with distance-function based permutation tests developed by P.W. Mielke, Jr. and colleagues at Colorado State University, and for testing parameters estimated in linear models with permutation procedures developed by B. S. Cade and colleagues at the Fort Collins Science Center, U.S. Geological Survey. This implementation in R has allowed for numerous improvements not supported by the Cade and Richards Fortran implementation, including use of categorical predictor variables in most routines. Blossom Statistical Package for R blsAPI Request Data from the U.S. Bureau of Labor Statistics API Allows users to request data for one or multiple series through the U.S. Bureau of Labor Statistics API. Users provide parameters as specified in http://…/api_signature.htm and the function returns a JSON string. BMA Bayesian Model Averaging Package for Bayesian model averaging for linear models, generalized linear models and survival models (Cox regression). BMAmevt Multivariate Extremes: Bayesian Estimation of the Spectral Measure Toolkit for Bayesian estimation of the dependence structure in Multivariate Extreme Value parametric models. BMisc Miscellaneous Functions for Panel Data, Quantiles, and Printing Results These are miscellaneous functions for working with panel data, quantiles, and printing results. For panel data, the package includes functions for making panel data balanced (that is, dropping individuals with missing observations in any time period), converting id numbers to row numbers, and treating repeated cross sections as panel data under the assumption of rank invariance.
For quantiles, there are functions to make ecdf functions from a set of data points (this is particularly useful when a distribution function is created in several steps) and to combine distribution functions based on some external weights; these distribution functions can easily be inverted to obtain quantiles. Finally, there are several other miscellaneous functions for obtaining weighted means, weighted distribution functions, and weighted quantiles; to generate summary statistics and their differences for two groups; and to drop covariates from formulas. bmixture Bayesian Estimation for Finite Mixture of Distributions Provides statistical tools for Bayesian estimation for finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. bmlm Bayesian Multilevel Mediation Easy estimation of Bayesian multilevel mediation models with Stan. bnclassify Learning Bayesian Network Classifiers from Data Implementation of different algorithms for learning discrete Bayesian network classifiers from data, including wrapper algorithms and those based on Chow-Liu’s algorithm. BNDataGenerator Data Generator Based on Bayesian Network Model Data generator based on a Bayesian network model. bnnSurvival Bagged k-Nearest Neighbors Survival Prediction Implements a bootstrap aggregated (bagged) version of the k-nearest neighbors survival probability prediction method (Lowsky et al. 2013). In addition to the bootstrapping of training samples, the features can be subsampled in each base learner to break the correlation between them. The Rcpp package is used to speed up the computation.
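The idea behind the weighted distribution functions described above — build a weighted ECDF from data points, then invert it to obtain quantiles — can be sketched in a few lines. This is an illustrative Python re-implementation of the concept, not BMisc's R API; the function names `weighted_ecdf` and `weighted_quantile` are hypothetical:

```python
def weighted_ecdf(points, weights):
    """Return F(x) = normalized cumulative weight of all points <= x."""
    total = sum(weights)
    pairs = sorted(zip(points, weights))
    def F(x):
        return sum(w for p, w in pairs if p <= x) / total
    return F

def weighted_quantile(points, weights, tau):
    """Invert the weighted ECDF: smallest point with cumulative weight >= tau."""
    total = sum(weights)
    acc = 0.0
    for p, w in sorted(zip(points, weights)):
        acc += w / total
        if acc >= tau - 1e-12:
            return p
    return max(points)
```

With equal weights this reduces to the ordinary empirical quantile; unequal weights shift probability mass toward the heavily weighted observations.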
bnormnlr Bayesian Estimation for Normal Heteroscedastic Nonlinear Regression Models Implementation of Bayesian estimation in normal heteroscedastic nonlinear regression models following Cepeda-Cuervo (2001). bnpa Bayesian Networks & Path Analysis A hybrid approach that uses the computational and statistical resources of Bayesian networks to learn a network structure from a data set with four different algorithms, together with the robustness of the statistical methods of structural equation modeling to check the goodness of fit of the model to the data. An intermediate algorithm joins the features of the ‘bnlearn’ and ‘lavaan’ R packages. The Bayesian network structure learning algorithms used are ‘Hill-Climbing’, ‘Max-Min Hill-Climbing’, ‘Restricted Maximization’ and ‘Tabu Search’. BNPMIXcluster Bayesian Nonparametric Model for Clustering with Mixed Scale Variables A Bayesian nonparametric approach for clustering that is capable of combining different types of variables (continuous, ordinal and nominal) and also accommodates different sampling probabilities in a complex survey design. The model is based on a location mixture model with a Poisson-Dirichlet process prior on the location parameters of the associated latent variables. The package performs the clustering model described in Carmona, C., Nieto-Barajas, L. E., Canale, A. (2016). BNPTSclust A Bayesian Nonparametric Algorithm for Time Series Clustering Performs the algorithm for time series clustering described in Nieto-Barajas and Contreras-Cristan (2014). BNSL Bayesian Network Structure Learning From a given dataframe, this package learns its Bayesian network structure based on a selected score. bnspatial Spatial Implementation of Bayesian Networks and Mapping Package for the spatial implementation of Bayesian networks and mapping in geographical space.
It makes maps of expected value (or most likely state) given known and unknown conditions, maps of uncertainty measured as either the coefficient of variation or the Shannon index (entropy), and maps of the probability associated with any state of any node of the network. Some additional features are provided as well, such as parallel processing options, data discretization routines and function wrappers designed for users with minimal knowledge of the R programming language. bnstruct Bayesian Network Structure Learning from Data with Missing Values Bayesian network structure learning from data with missing values. The package implements the Silander-Myllymaki complete search, the Max-Min Hill-climbing heuristic search, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, and BIC. The package also implements methods for generating and using bootstrap samples, imputed data, and inference. BonEV An Improved Multiple Testing Procedure for Controlling False Discovery Rates An improved multiple testing procedure for controlling false discovery rates, developed from the Bonferroni procedure with integrated estimates from the Benjamini-Hochberg procedure and Storey’s q-value procedure. It controls false discovery rates through controlling the expected number of false discoveries. bookdown Authoring Books with R Markdown Output formats and utilities for authoring books with R Markdown. bookdownplus Generate Varied Books and Documents with R ‘bookdown’ Package A collection and selector of R ‘bookdown’ templates. ‘bookdownplus’ helps you write academic journal articles, guitar books, chemical equations, mails, calendars, and diaries. R ‘bookdownplus’ extends the features of ‘bookdown’ and simplifies the procedure. Users only have to choose a template, clarify the book title and author name, and then focus on writing the text. No need to struggle with YAML and LaTeX.
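The classical Benjamini-Hochberg step-up procedure that BonEV builds on is simple to state: sort the m p-values ascending, find the largest k with p_(k) ≤ (k/m)·α, and reject the k hypotheses with the smallest p-values. A minimal Python sketch of that procedure (illustrative only — BonEV's own method modifies this with Bonferroni and q-value estimates):

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected by the BH step-up procedure.

    Find the largest rank k such that p_(k) <= k/m * alpha, then reject
    the hypotheses corresponding to the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])
```

Note the step-up character: a p-value above its own threshold can still be rejected if a larger-ranked p-value passes.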
BoolFilter Optimal Estimation of Partially Observed Boolean Dynamical Systems Tools for optimal and approximate state estimation, as well as network inference, of Partially-Observed Boolean Dynamical Systems. boostmtree Boosted Multivariate Trees for Longitudinal Data Implements Friedman’s gradient descent boosting algorithm for longitudinal data using multivariate tree base learners. A time-covariate interaction effect is modeled using penalized B-splines (P-splines) with an adaptively estimated smoothing parameter. bootnet Bootstrap Methods for Various Network Estimation Routines Bootstrap standard errors on various network estimation routines, such as EBICglasso from the qgraph package and IsingFit from the IsingFit package. bootsPLS Bootstrap Subsamplings of Sparse Partial Least Squares – Discriminant Analysis for Classification and Signature Identification Bootstrap subsamplings of sparse Partial Least Squares – Discriminant Analysis (sPLS-DA) for classification and signature identification. The method is applicable to any classification problem with more than 2 classes. It relies on bootstrap subsamplings of sPLS-DA and provides tools to select the most stable variables (defined as the ones consistently selected over the bootstrap subsamplings) and to predict the class of test samples. bootTimeInference Robust Performance Hypothesis Testing with the Sharpe Ratio Applied researchers often test for the difference of the Sharpe ratios of two investment strategies. A very popular tool to this end is the test of Jobson and Korkie, which has been corrected by Memmel. Unfortunately, this test is not valid when returns have tails heavier than the normal distribution or are of a time-series nature. Instead, we propose the use of robust inference methods. In particular, we suggest constructing a studentized time series bootstrap confidence interval for the difference of the Sharpe ratios and declaring the two ratios different if zero is not contained in the obtained interval.
This approach has the advantage that one can simply resample from the observed data, as opposed to some null-restricted data. boottol Bootstrap Tolerance Levels for Credit Scoring Validation Statistics Used to create bootstrap tolerance levels for the Kolmogorov-Smirnov (KS) statistic, the area under the receiver operating characteristic curve (AUROC) statistic, and the Gini coefficient for each score cutoff. BootWPTOS Test Stationarity using Bootstrap Wavelet Packet Tests Provides significance tests for second-order stationarity for time series using bootstrap wavelet packet tests. Boruta Wrapper Algorithm for All Relevant Feature Selection An all-relevant feature selection wrapper algorithm. It finds relevant features by comparing original attributes’ importance with importance achievable at random, estimated using their permuted copies. bpa Basic Pattern Analysis Run basic pattern analyses on character sets, digits, or combined input containing both characters and numeric digits. Useful for data cleaning and for identifying columns containing multiple or nonstandard formats. bpp Computations Around Bayesian Predictive Power Implements functions to update Bayesian Predictive Power computations after not stopping a clinical trial at an interim analysis. Such an interim analysis can either be blinded or unblinded. Code is provided for Normally distributed endpoints with known variance, with a prominent example being the hazard ratio. BradleyTerryScalable Fits the Bradley-Terry Model to Potentially Large and Sparse Networks of Comparison Data Facilities are provided for fitting the simple, unstructured Bradley-Terry model to networks of binary comparisons. The implemented methods are designed to scale well to large, potentially sparse, networks.
A fairly high degree of scalability is achieved through the use of EM and MM algorithms, which are relatively undemanding in terms of memory usage (relative to some other commonly used methods, such as iterative weighted least squares). Both maximum likelihood and Bayesian MAP estimation methods are implemented. The package provides various standard methods for a newly defined ‘btfit’ model class, such as the extraction and summarisation of model parameters and the simulation of new datasets from a fitted model. Tools are also provided for reshaping data into the newly defined ‘btdata’ class, and for analysing the comparison network prior to fitting the Bradley-Terry model. This package complements, rather than replaces, the existing ‘BradleyTerry2’ package. (BradleyTerry2 has rather different aims, which are mainly the specification and fitting of ‘structured’ Bradley-Terry models in which the strength parameters depend on covariates.) braidReports Visualize Combined Action Response Surfaces and Report BRAID Analyses Provides functions to generate, format, and style surface plots for visualizing combined action data. Also provides functions for reporting on a BRAID analysis, including plotting curve-shifts, calculating IAE values, and producing full BRAID analysis reports. braidrm Fitting Dose Response with the BRAID Combined Action Model Contains functions for evaluating, analyzing, and fitting combined action dose response surfaces with the Bivariate Response to Additive Interacting Dose (BRAID) model of combined action. brant Test for Parallel Regression Assumption Tests the parallel regression assumption for ordinal logit models generated with the function polr() from the package MASS. brea Bayesian Recurrent Event Analysis A function to produce MCMC samples for posterior inference in semiparametric Bayesian discrete time competing risks recurrent events models.
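The memory-friendly MM iteration for the Bradley-Terry model mentioned above (Hunter's MM algorithm) has a compact form: each strength parameter is updated as the item's win count divided by a weighted sum of its comparison counts. A minimal Python sketch of the generic algorithm, not BradleyTerryScalable's implementation (the function name is hypothetical):

```python
def bradley_terry_mm(wins, n_iter=200):
    """Fit Bradley-Terry strengths by the MM algorithm.

    wins[i][j] = number of times item i beat item j. MM update:
    p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j), where W_i is the
    total wins of i and n_ij the number of i-vs-j comparisons;
    strengths are renormalized to mean 1 after each sweep.
    """
    m = len(wins)
    p = [1.0] * m
    for _ in range(n_iter):
        new = []
        for i in range(m):
            w_i = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(m) if j != i)
            new.append(w_i / denom if denom > 0 else p[i])
        s = sum(new)
        p = [v * m / s for v in new]
    return p
```

Each sweep touches only the nonzero comparison counts, which is why the approach scales to sparse comparison networks.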
breakfast Multiple Change-Point Detection and Segmentation Performs multiple change-point detection in data sequences, or data sequence segmentation, using computationally efficient multiscale methods. This version only implements the ‘Tail-Greedy Unbalanced Haar’ change-point detection methodology; more methods will be added in future versions. To start with, see the function segment.mean. BreakoutDetection Breakout Detection via Robust E-Statistics BreakoutDetection is an open-source R package that makes breakout detection simple and fast. The BreakoutDetection package can be used in a wide variety of contexts: for example, detecting a breakout in user engagement after an A/B test, detecting behavioral change, or for problems in econometrics, financial engineering, and the political and social sciences. brglm2 Bias Reduction in Generalized Linear Models Estimation and inference from generalized linear models based on various methods for bias reduction. The brglmFit fitting method can achieve reduction of estimation bias either through the adjusted score equations approach in Firth (1993) and Kosmidis and Firth (2009), or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as in Cordeiro and McCullagh (1991). In the special case of generalized linear models for binomial and multinomial responses, the adjusted score equations approach returns estimates with improved frequentist properties that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation). Estimation in all cases takes place via a quasi Fisher scoring algorithm, and S3 methods for the construction of confidence intervals for the reduced-bias estimates are provided. bridgedist An Implementation of the Bridge Distribution with Logit-Link as in Wang and Louis (2003) An implementation of the bridge distribution with logit-link in R.
In Wang and Louis (2003), such a univariate bridge distribution was derived as the distribution of the random intercept that ‘bridged’ a marginal logistic regression and a conditional logistic regression. The conditional and marginal regression coefficients are a scalar multiple of each other. This would not be the case if the random intercept distribution were Gaussian. briskaR Biological Risk Assessment A spatio-temporal exposure-hazard model for assessing biological risk and impact. The model is based on stochastic geometry for describing the landscape and the exposed individuals, a dispersal kernel for the dissemination of contaminants, and an ecotoxicological equation. brlrmr Bias Reduction with Missing Binary Response Provides two main functions, il() and fil(). The il() function implements the EM algorithm developed by Ibrahim and Lipsitz (1996) to estimate the parameters of a logistic regression model with missing response when the missing data mechanism is nonignorable. The fil() function implements the algorithm proposed by Maity et al. (2017+) to reduce the bias produced by the method of Ibrahim and Lipsitz (1996). brm Binary Regression Model Fits novel models for the conditional relative risk, risk difference and odds ratio. brms Bayesian Regression Models using Stan Write and fit Bayesian generalized linear mixed models using Stan for full Bayesian inference. broom Convert Statistical Analysis Objects into Tidy Data Frames Convert statistical analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like dplyr, tidyr and ggplot2. The package provides three S3 generics: tidy, which summarizes a model’s statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.
http://…/broom-intro http://…/broom-slides brotli A Compression Format Optimized for the Web A lossless compressed data format that compresses data using a combination of the LZ77 algorithm and Huffman coding, with efficiency comparable to the best currently available general-purpose compression methods. Brotli is similar in speed to deflate but offers denser compression. Brq Bayesian Analysis of Quantile Regression Models Bayesian estimation and variable selection for quantile regression models. brr Bayesian Inference on the Ratio of Two Poisson Rates Implementation of Bayesian inference for the two independent Poisson samples model, using the semi-conjugate family of prior distributions. brt Biological Relevance Testing Analyses of large-scale -omics datasets commonly use p-values as the indicators of statistical significance. However, considering the p-value alone neglects the importance of effect size (i.e., the mean difference between groups) in determining the biological relevance of a significant difference. Here, we present a novel algorithm for computing a new statistic, the biological relevance testing (BRT) index, in the frequentist hypothesis testing framework to address this problem. bsearchtools Binary Search Tools Exposes the binary search functions of the C++ standard library (std::lower_bound, std::upper_bound) plus other convenience functions, allowing faster lookups on sorted vectors. BSGS Bayesian Sparse Group Selection The integration of Bayesian variable and sparse group variable selection approaches for regression models. BSGW Bayesian Survival Model using Generalized Weibull Regression Bayesian survival model using Weibull regression on both scale and shape parameters. bshazard Nonparametric Smoothing of the Hazard Function The function estimates the hazard function nonparametrically from a survival object (possibly adjusted for covariates).
The smoothed estimate is based on B-splines from the perspective of generalized linear mixed models. Left-truncated and right-censored data are allowed. bsplinePsd Bayesian Nonparametric Spectral Density Estimation Using B-Spline Priors Implementation of a Metropolis-within-Gibbs MCMC algorithm to flexibly estimate the spectral density of a stationary time series. The algorithm updates a nonparametric B-spline prior using the Whittle likelihood to produce pseudo-posterior samples, and is based on the work presented by Edwards, Meyer, and Christensen (2017). bssm Bayesian Inference of State Space Models Efficient methods for Bayesian inference of state space models via particle Markov chain Monte Carlo and importance sampling type corrected Markov chain Monte Carlo. Gaussian, Poisson, binomial, or negative binomial observation densities and Gaussian state dynamics, as well as general non-linear Gaussian models, are supported. btb Beyond the Border Kernel density estimation dedicated to urban geography. btergm Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov chain Monte Carlo maximum likelihood. Goodness-of-fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs. BTR Training and Analysing Asynchronous Boolean Models Tools for inferring asynchronous Boolean models from single-cell expression data. BUCSS Bias and Uncertainty Corrected Sample Size Implements a method of correcting for publication bias and uncertainty when planning sample sizes in a future study from an original study. bupaR Business Process Analytics in R Functionalities for process analysis in R. This package implements an S3 class for event log objects, and related handler functions.
Imports related packages for subsetting event data, computation of descriptive statistics, handling of Petri net objects and visualization of process maps. bvarsv Bayesian Analysis of a Vector Autoregressive Model with Stochastic Volatility and Time-Varying Parameters R/C++ implementation of the model proposed by Primiceri (‘Time Varying Structural Vector Autoregressions and Monetary Policy’, Review of Economic Studies, 2005), with a focus on generating posterior predictive distributions. BWStest Baumgartner Weiss Schindler Test of Equal Distributions Performs the ‘Baumgartner-Weiss-Schindler’ two-sample test of equal probability distributions. bytescircle Statistics About Bytes Contained in a File as a Circle Plot Shows statistics about bytes contained in a file as a circle graph of deviations from the mean in sigma increments. The function can be useful for statistically analyzing the content of files in a glimpse: text files are shown as a green centered crown, compressed and encrypted files should be shown as equally distributed variations with a very low CV (sigma/mean), and other types of files can be classified between these two categories depending on their text vs binary content, which can be useful to quickly determine how information is stored inside them (databases, multimedia files, etc). C c060 Extended Inference for Lasso and Elastic-Net Regularized Cox and Generalized Linear Models c060 provides additional functions to perform stability selection, model validation and parameter tuning for glmnet models. CADStat Provides a GUI to Several Statistical Methods Using JGR, provides a GUI to several statistical methods: scatterplot, boxplot, linear regression, generalized linear regression, quantile regression, conditional probability calculations, and regression trees. caesar Encrypts and Decrypts Strings Encrypts and decrypts strings using either the Caesar cipher or a pseudorandom number generation (using set.seed()) method.
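The Caesar cipher implemented by the caesar package is a fixed-shift substitution over the alphabet. A minimal Python sketch of the idea (illustrative only, not the package's R interface):

```python
def caesar_shift(text, shift):
    """Shift alphabetic characters by `shift` positions, wrapping around;
    non-alphabetic characters pass through unchanged. Decrypt with -shift."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)
```

For example, `caesar_shift("Attack at dawn", 3)` yields `"Dwwdfn dw gdzq"`, and shifting by -3 recovers the original.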
calACS Count All Common Subsequences Count all common subsequences between 2 string sequences, with items separated by the same delimiter. The first string input is a length-one vector; the second string input can be a vector or list containing multiple strings. Algorithm from Wang, H. All common subsequences (2007) IJCAI International Joint Conference on Artificial Intelligence, pp. 635-640. Calculator.LR.FNs Calculator for LR Fuzzy Numbers The arithmetic operations of scalar multiplication, addition, subtraction, multiplication and division of LR fuzzy numbers (which are based on Zadeh’s extension principle) have a complicated form for use in fuzzy statistics, fuzzy mathematics, machine learning, fuzzy data analysis, etc. The Calculator for LR Fuzzy Numbers package, i.e. the Calculator.LR.FNs package, helps applied users achieve a simple and closed form for some complicated operators based on LR fuzzy numbers, and the user can easily draw the membership function of the obtained result with this package. CALF Coarse Approximation Linear Function Contains a greedy algorithm for coarse approximation linear functions. CalibrateSSB Weighting and Estimation for Panel Data with Non-Response Function to calculate weights and estimates for panel data with non-response. callr Call R from R It is sometimes useful to perform a computation in a separate R process, without affecting the current R process at all. This package does exactly that. CAM Causal Additive Model (CAM) The code takes an n x p data matrix and fits a Causal Additive Model (CAM) for estimating the causal structure of the underlying process. The output is a p x p adjacency matrix (a one in entry (i,j) indicates an edge from i to j). Details of the algorithm can be found in: P. Bühlmann, J. Peters, J. Ernest: “CAM: Causal Additive Models, high-dimensional order search and penalized regression”, Annals of Statistics 42:2526-2556, 2014.
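Counting all common subsequences of two sequences, as calACS does, admits a simple dynamic program. A Python sketch of the standard recurrence (illustrative, not the package's code; the count here includes the empty subsequence):

```python
def count_common_subsequences(s, t):
    """Count common subsequences of s and t, including the empty one.

    N[i][j] counts common subsequences of s[:i] and t[:j]:
    if s[i-1] == t[j-1]: N[i][j] = N[i-1][j] + N[i][j-1]
    else:                N[i][j] = N[i-1][j] + N[i][j-1] - N[i-1][j-1]
    """
    n, m = len(s), len(t)
    N = [[1] * (m + 1) for _ in range(n + 1)]  # empty subsequence only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                N[i][j] = N[i - 1][j] + N[i][j - 1]
            else:
                N[i][j] = N[i - 1][j] + N[i][j - 1] - N[i - 1][j - 1]
    return N[n][m]
```

For identical strings of distinct characters the count is 2^n (every subset of positions); subtract 1 if the empty subsequence should be excluded.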
canvasXpress Visualization Package for CanvasXpress in R Enables creation of visualizations using the CanvasXpress framework in R. CanvasXpress is a standalone JavaScript library for reproducible research with complete tracking of data and end-user modifications stored in a single PNG image that can be played back. See for more information. capn Capital Asset Pricing for Nature Implements approximation methods for natural capital asset prices suggested by Fenichel and Abbott (2014) in the Journal of the Association of Environmental and Resource Economists (JAERE), Fenichel et al. (2016) in Proceedings of the National Academy of Sciences (PNAS), and Yun et al. (2017) in PNAS (accepted), and their extensions: creating Chebyshev polynomial nodes and grids, calculating bases of Chebyshev polynomials, and approximation and simulations for V-approximation (single and multiple stocks, PNAS), P-approximation (single stock, PNAS), and Pdot-approximation (single stock, JAERE). Development of this package was generously supported by the Knobloch Family Foundation. caret Classification and Regression Training Misc functions for training and plotting classification and regression models. caretEnsemble Ensembles of Caret Models Functions for creating ensembles of caret models: caretList, caretEnsemble, and caretStack. caretList is a convenience function for fitting multiple caret::train models to the same dataset. caretEnsemble will make a linear combination of these models using greedy forward selection, and caretStack will make linear or non-linear combinations of these models, using a caret::train model as a meta-model. carpenter Build Common Tables of Summary Statistics for Reports Mainly used to build tables that are commonly presented for bio-medical/health research, such as basic characteristic tables or descriptive statistics. cartogram Create Cartograms with R Construct a continuous area cartogram by a rubber sheet distortion algorithm.
Cartographer Interactive Maps for Data Exploration Cartographer provides interactive maps in R Markdown documents or at the R console. These maps are suitable for data exploration. This package is an R wrapper around Elijah Meeks’s d3-carto-map and d3.js, using htmlwidgets for R. cartography Thematic Cartography Create and integrate maps in your R workflow. This package allows various cartographic representations: proportional symbols, choropleth, typology, flows, discontinuities… It also offers some additional useful features: cartographic palettes, layout (scale, north arrow, title…), labels, legends, access to cartographic APIs… carx Censored Autoregressive Model with Exogenous Covariates A censored time series class is designed. An estimation procedure is implemented to estimate the Censored AutoRegressive time series with eXogenous covariates (CARX) model, assuming normality of the innovations. Some other functions that might be useful are also included. casebase Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression Implements the case-base sampling approach of Hanley and Miettinen (2009), Saarela and Arjas (2015), and Saarela (2015), for fitting flexible hazard regression models to survival data with a single event type or multiple competing causes via logistic and multinomial regression. From the fitted hazard function, cumulative incidence and risk functions of time, treatment and profile can be derived. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots. catdap Categorical Data Analysis Program Package Categorical data analysis program package.
cate High Dimensional Factor Analysis and Confounder Adjusted Testing and Estimation Provides several methods for factor analysis in high dimension (both n,p >> 1) and methods to adjust for possible confounders in multiple hypothesis testing. CatEncoders Encoders for Categorical Variables Contains some commonly used categorical variable encoders, such as ‘LabelEncoder’ and ‘OneHotEncoder’. Inspired by the encoders implemented in the python ‘sklearn.preprocessing’ package (see ). CATkit Chronomics Analysis Toolkit (CAT): Analyze Periodicity Performs analysis of sinusoidal rhythms in time series data: actogram, smoothing, autocorrelation, crosscorrelation, several flavors of cosinor. catSurv Computerized Adaptive Testing for Survey Research Provides methods of computerized adaptive testing for survey researchers. Includes functionality for data fit with the classic item response methods, including the latent trait model, Birnbaum’s three-parameter model, the graded response model, and the generalized partial credit model. Additionally, includes several ability parameter estimation and item selection routines. During item selection, all calculations are done in compiled C++ code. CATT The Cochran-Armitage Trend Test The Cochran-Armitage trend test can be applied to a two by k contingency table. The test statistic (Z) and p-value will be reported. A linear trend in the frequencies is tested; the weights (0,1,2) are used by default. CausalFX Methods for Estimating Causal Effects from Observational Data Estimate causal effects of one variable on another, currently for binary data only. Methods include instrumental variable bounds, adjustment by a given covariate set, adjustment by an induced covariate set using a variation of the PC algorithm, and an effect bounding method (the Witness Protection Program) based on covariate adjustment with observable independence constraints.
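The trend statistic behind a Cochran-Armitage-style test can be sketched via a well-known correlation identity: the trend chi-square is, up to a factor of N/(N-1) depending on the variant, N·r², where r is the Pearson correlation between the dose score and the case indicator over the N subjects. An illustrative Python sketch under that identity (hypothetical helper, not the CATT package's interface):

```python
import math

def trend_test_z(cases, controls, scores=None):
    """Approximate trend Z for a 2 x k table via the correlation identity.

    Expands the table into N (score, indicator) pairs, computes their
    Pearson correlation r, and returns Z = r * sqrt(N - 1).
    Default scores are 0, 1, ..., k-1.
    """
    k = len(cases)
    scores = scores if scores is not None else list(range(k))
    xs, ys = [], []
    for i in range(k):
        xs += [scores[i]] * (cases[i] + controls[i])
        ys += [1] * cases[i] + [0] * controls[i]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 1)
```

A table with no dose-response relationship gives Z near 0; a monotone increase in case proportion across the score levels gives a positive Z.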
CausalImpact An R package for causal inference in time series This R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty using a Bayesian structural time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred. As with all approaches to causal inference on non-experimental data, valid conclusions require strong assumptions. The CausalImpact package, in particular, assumes that the outcome time series can be explained in terms of a set of control time series that were themselves not affected by the intervention. Furthermore, the relation between treated series and control series is assumed to be stable during the post-intervention period. Understanding and checking these assumptions for any given application is critical for obtaining valid conclusions. cbanalysis Coffee Break Descriptive Analysis Contains a function that subsets the input data frame based on variable types and returns a list of data frames. cbar Contextual Bayesian Anomaly Detection in R Detect contextual anomalies in time-series data with Bayesian data analysis. It focuses on determining a normal range of the target value, and provides simple-to-use functions to abstract the outcome. cbird Clustering of Multivariate Binary Data with Dimension Reduction via L1-Regularized Likelihood Maximization Clustering of binary data with dimensionality reduction (CLUSBIRD), as proposed by Yamamoto and Hayashi (2015). cccp Cone Constrained Convex Problems Routines for solving convex optimization problems with cone constraints by means of interior-point methods.
The implemented algorithms are partially ported from CVXOPT, a Python module for convex optimization (see http://cvxopt.org for more information). ccdrAlgorithm CCDr Algorithm for Learning Sparse Gaussian Bayesian Networks Implementation of the CCDr (Concave penalized Coordinate Descent with reparametrization) structure learning algorithm as described in Aragam and Zhou (2015). This is a fast, score-based method for learning Bayesian networks that uses sparse regularization and block-cyclic coordinate descent. CCMnet Simulate Congruence Class Model for Networks Tools to simulate networks based on Congruence Class models. cdata Wrappers for ‘tidyr::gather()’ and ‘tidyr::spread()’ Supplies deliberately verbose wrappers for ‘tidyr::gather()’ and ‘tidyr::spread()’, and an explanatory vignette. Useful for training and for enforcing preconditions. cdcsis Conditional Distance Correlation and Its Related Feature Screening Method Gives conditional distance correlation and performs the conditional distance correlation sure independence screening procedure for ultrahigh dimensional data. The conditional distance correlation is a novel conditional dependence measurement of two random variables given a third variable. The conditional distance correlation sure independence screening is used for screening variables in the ultrahigh dimensional setting. CDVineCopulaConditional Sampling from Conditional C- and D-Vine Copulas Provides tools for sampling from a conditional copula density decomposed via Pair-Copula Constructions as a C- or D-vine. Here, the vines which can be used for such sampling are those which sample the conditioning variables first (when following the sampling algorithms shown in Aas et al. (2009)). The sampling algorithm used is presented and discussed in Bevacqua et al. (2017), and it is a modified version of that from Aas et al. (2009).
A function is available to select the best vine (based on information criteria) among those which allow for such conditional sampling. The package includes a function to compare scatterplot matrices and pair-dependencies of two multivariate datasets. CEC Cross-Entropy Clustering Cross-Entropy Clustering (CEC) divides the data into Gaussian type clusters. It performs the automatic reduction of unnecessary clusters, while at the same time allowing the simultaneous use of various types of Gaussian mixture models. cellWise Analyzing Data with Cellwise Outliers Tools for detecting cellwise outliers and robust methods to analyze data which may contain them. cems Conditional Expectation Manifolds Conditional expectation manifolds are an approach to compute principal curves and surfaces. censorcopula Estimate Parameter of Bivariate Copula Implements an interval censoring method to break ties when fitting a bivariate copula to data with ties. CensSpatial Censored Spatial Models Fits linear regression models for censored spatial data. It provides different estimation methods, such as the SAEM (Stochastic Approximation of Expectation Maximization) algorithm and a semi-naive method that uses kriging prediction to estimate the response at censored locations and predict new values at unknown locations. It also offers graphical tools for assessing the fitted model. centiserve Find Graph Centrality Indices Calculates centrality indices additional to the ‘igraph’ package centrality functions. cents Censored Time Series Fits censored time series models. CEoptim Cross-Entropy R Package for Optimization Optimization solver based on the Cross-Entropy method. CepLDA Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability Performs cepstral-based discriminant analysis of groups of time series when there is variability in power spectra from time series within the same group, as described in R.T.
Krafty (2016) ‘Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability’, Journal of Time Series Analysis. cfa Configural Frequency Analysis (CFA) Analysis of configuration frequencies for simple and repeated measures, multiple-samples CFA, hierarchical CFA, bootstrap CFA, functional CFA, Kieser-Victor CFA, and Lindner’s test using a conventional and an accelerated algorithm. CFC Cause-Specific Framework for Competing-Risk Analysis Functions for combining survival curves of competing risks to produce cumulative incidence and event-free probability functions, and for summarizing and plotting the results. Survival curves can be either time-denominated or probability-denominated. Point estimates as well as Bayesian, sample-based representations of survival can utilize this framework. cghRA Array CGH Data Analysis and Visualization Provides functions to import data from Agilent CGH arrays and process them according to the cghRA workflow. Implements several algorithms such as WACA, STEPS and cnvScore and an interactive graphical interface. CGP Composite Gaussian process models Fit composite Gaussian process (CGP) models as described in Ba and Joseph (2012) ‘Composite Gaussian Process Models for Emulating Expensive Functions’, Annals of Applied Statistics. The CGP model is capable of approximating complex surfaces that are not second-order stationary. Important functions in this package are CGP, print.CGP, summary.CGP, predict.CGP and plotCGP. cgwtools Miscellaneous Tools A set of tools the author has found useful for performing quick observations or evaluations of data, including a variety of ways to list objects by size, class, etc. Several other tools mimic Unix shell commands, including ‘head’, ‘tail’, ‘pushd’, and ‘popd’. The functions ‘seqle’ and ‘reverse.seqle’ mimic the base ‘rle’ but can search for linear sequences. The function ‘splatnd’ allows the user to generate zero-argument commands without the need for ‘makeActiveBinding’.
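The ‘seqle’ function mentioned in the ‘cgwtools’ entry generalizes run-length encoding from repeated values to linear sequences; a rough Python analogue of the idea (the name and signature here are mine, not the package’s API):

```python
def seqle(values, step=1):
    """Run-length-encode arithmetic runs: each run is reported as
    (start_value, length), where consecutive elements differ by `step`.
    Analogous in spirit to base R's rle(), extended to linear sequences."""
    runs = []
    i = 0
    while i < len(values):
        j = i
        # extend the run while the arithmetic progression holds
        while j + 1 < len(values) and values[j + 1] - values[j] == step:
            j += 1
        runs.append((values[i], j - i + 1))
        i = j + 1
    return runs
```

Here seqle([1, 2, 3, 7, 8, 10]) returns [(1, 3), (7, 2), (10, 1)]: two arithmetic runs and a singleton.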
changepoint An R package for changepoint analysis Implements various mainstream and specialised changepoint methods for finding single and multiple changepoints within data. Many popular non-parametric and frequentist methods are included. The cpt.mean, cpt.var, cpt.meanvar functions should be your first point of call. changepoint.np Methods for Nonparametric Changepoint Detection Implements the multiple changepoint algorithm PELT with a nonparametric cost function based on the empirical distribution of the data. The cpt.np() function should be your first point of call. This package is an extension to the ‘changepoint’ package, which uses parametric changepoint methods. For further information on the methods see the documentation for ‘changepoint’. ChangepointTesting Change Point Estimation for Clustered Signals A multiple testing procedure for clustered alternative hypotheses. It is assumed that the p-values under the null hypotheses follow U(0,1) and that the distributions of p-values from the alternative hypotheses are stochastically smaller than U(0,1). By aggregating information, this method is more sensitive to detecting signals of low magnitude than standard methods. Additionally, sporadic small p-values appearing within a null-hypothesis sequence are avoided by averaging over the neighboring p-values. ChannelAttributionApp Shiny Web Application for the Multichannel Attribution Problem Shiny Web Application for the Multichannel Attribution Problem. It is basically a user-friendly graphical interface for running and comparing all the attribution models in package ‘ChannelAttribution’. For customizations or interest in other statistical methodologies for web data analysis, please contact the package maintainer. Chaos01 0-1 Test for Chaos Computes and plots the results of the 0-1 test for chaos proposed by Gottwald and Melbourne (2004). The algorithm can be run in parallel over independent values of the parameter c.
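The single-changepoint-in-mean problem that cpt.mean in ‘changepoint’ addresses can be illustrated by an exhaustive scan over split points; a simplified Python sketch of the idea, not the package’s PELT or segmentation machinery:

```python
def best_mean_changepoint(x):
    """Return (tau, cost): the index tau that splits x into x[:tau] and
    x[tau:] minimizing the total within-segment sum of squared
    deviations from each segment's mean."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    best = min(range(1, len(x)),
               key=lambda tau: sse(x[:tau]) + sse(x[tau:]))
    return best, sse(x[:best]) + sse(x[best:])
```

For a series like [0, 0, 0, 0, 5, 5, 5, 5] the scan recovers the split at index 4 with zero residual cost; real methods trade this O(n²) scan for penalized multi-changepoint searches such as PELT.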
CharFun Numerical Computation of the Cumulative Distribution Function and Probability Density Function from the Characteristic Function The Characteristic Functions Toolbox (CharFun) consists of a set of algorithms for evaluating selected characteristic functions and algorithms for numerical inversion of the (combined and/or compound) characteristic functions, used to evaluate the probability density function (PDF) and the cumulative distribution function (CDF). charlatan Make Fake Data Make fake data, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers (‘DOIs’), jobs, phone numbers, ‘DNA’ sequences, doubles and integers from distributions and within a range. checkarg Check the Basic Validity of a (Function) Argument Utility functions that allow checking the basic validity of a function argument or any other value, including generating an error and assigning a default in a single line of code. The main purpose of the package is to provide simple and easily readable argument checking to improve code robustness. checkpoint Install Packages from Snapshots on the Checkpoint Server for Reproducibility The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint allows you to install packages as they existed on CRAN on a specific snapshot date as if you had a CRAN time machine. To achieve reproducibility, the checkpoint() function installs the packages required or called by your project and scripts to a local library exactly as they existed at the specified point in time. Only those packages are available to your project, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint() can ensure the reproducibility of your scripts or projects at any time.
To create the snapshot archives, once a day (at midnight UTC) we refresh the Austria CRAN mirror, on the “Managed R Archived Network” server (http://mran.revolutionanalytics.com). Immediately after completion of the rsync mirror process, we take a snapshot, thus creating the archive. Snapshot archives exist starting from 2014-09-17. CHFF Closest History Flow Field Forecasting for Bivariate Time Series The software matches the current history to the closest history in a time series to build a forecast. chi2x3way Chi-Squared and Tau Index Partitions for Three-Way Contingency Tables Provides two index partitions for three-way contingency tables: partition of the association measure chi-squared and of the predictability index tau under several representative hypotheses about the expected frequencies (hypothesized probabilities). ChIPtest Nonparametric Methods for Identifying Differential Enrichment Regions with ChIP-Seq Data Nonparametric tests to identify differential enrichment regions for two conditions or time-course ChIP-seq data. It includes: a data preprocessing function, estimation of a small constant used in hypothesis testing, a kernel-based two-sample nonparametric test, and two assumption-free two-sample nonparametric tests. CHMM Coupled Hidden Markov Models Exact and variational inference for coupled hidden Markov models, applied to the joint detection of copy number variations. chopthin The Chopthin Resampler Resampling is a standard step in particle filtering and in sequential Monte Carlo. This package implements the chopthin resampler, which keeps a bound on the ratio between the largest and the smallest weights after resampling. ChoR Chordalysis R Package Learning the structure of graphical models from datasets with thousands of variables. More information about the research papers detailing the theory behind Chordalysis (KDD 2016, SDM 2015, ICDM 2014, ICDM 2013) is available online.
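For context on the ‘chopthin’ entry: standard systematic resampling, which chopthin improves on by additionally bounding the ratio between the largest and smallest weights, can be sketched as follows. This is the classical algorithm, not chopthin itself, and the uniform offset is passed in explicitly for reproducibility:

```python
def systematic_resample(weights, n, u0):
    """Systematic resampling: draw n particle indices from (unnormalized)
    `weights` using a single uniform offset u0, assumed to lie in
    [0, 1/n).  Returns the list of selected indices."""
    total = sum(weights)
    positions = [u0 + i / n for i in range(n)]  # evenly spaced probes
    indices, cum, j = [], weights[0] / total, 0
    for p in positions:
        # advance to the particle whose cumulative weight covers p
        while p >= cum:
            j += 1
            cum += weights[j] / total
        indices.append(j)
    return indices
```

With weights (0.1, 0.2, 0.3, 0.4), n = 10 and offset 0.05, each particle is drawn in proportion to its weight: counts 1, 2, 3 and 4. Chopthin additionally “chops” overly heavy particles and “thins” light ones so the post-resampling weight ratio stays bounded.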
choroplethr Simplify the Creation of Choropleth Maps in R Choropleths are thematic maps where geographic regions, such as states, are colored according to some metric, such as the number of people who live in that state. This package simplifies this process by 1. Providing ready-made functions for creating choropleths of common maps. 2. Providing data and API connections to interesting data sources for making choropleths. 3. Providing a framework for creating choropleths from arbitrary shapefiles. Please see the vignettes for more details. chunked Chunkwise Text-File Processing for ‘dplyr’ Text data can be processed chunkwise using ‘dplyr’ commands. These are recorded and executed per data chunk, so large files can be processed with limited memory using the ‘LaF’ package. CircOutlier Detection of Outliers in Circular Regression Detects outliers in circular-circular regression models and estimates model parameters. cIRT Choice Item Response Theory Jointly model the accuracy of cognitive responses and item choices within a Bayesian hierarchical framework as described by Culpepper and Balamuta (2015). In addition, the package contains the datasets used within the analysis of the paper. Cite An RStudio Addin to Insert BibTex Citation in Rmarkdown Documents Contains an RStudio addin to insert BibTex citations in Rmarkdown documents with a minimal user interface. citr RStudio Add-in to Insert Markdown Citations Functions and an RStudio add-in to search a BibTeX-file to create and insert formatted Markdown citations into the current document. clampSeg Idealisation of Patch Clamp Recordings Allows for idealisation of patch clamp recordings by implementing the non-parametric JUmp Local dEconvolution Segmentation filter JULES. clarifai Access to Clarifai API Get description of images from the Clarifai API. For more information, see http://clarifai.com. Clarifai uses a large deep learning cloud to come up with descriptive labels of the things in an image.
It also reports how confident it is about each of the labels. classifierplots Generates a Visualization of Classifier Performance as a Grid of Diagnostic Plots Generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call. Includes ROC curves, prediction density, accuracy, precision, recall and calibration plots, all using ggplot2 for easy modification. Debug your binary classifiers faster and easier! classiFunc Classification of Functional Data Efficient implementation of a k-nearest neighbor estimator and a kernel estimator for functional data classification. cld2 Google’s Compact Language Detector 2 Bindings to Google’s C++ library Compact Language Detector 2. Probabilistically detects over 80 languages in UTF-8 text (plain text or HTML). For mixed-language input it returns the top three languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). cld3 Google’s Compact Language Detector 3 Google’s Compact Language Detector 3 is a neural network model for language identification and the successor of ‘cld2’ (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from ‘cld2’. cleanEHR The Critical Care Clinical Data Processing Tools A toolset to deal with the Critical Care Health Informatics Collaborative dataset. It is created to address various data reliability and accessibility problems of electronic healthcare records (EHR). It provides a unique platform which enables data manipulation, transformation, reduction, anonymisation, cleaning and validation. cleanNLP A Tidy Data Model for Natural Language Processing Provides a set of fast tools for converting a textual corpus into a set of normalized tables.
The underlying natural language processing pipeline utilizes Stanford’s CoreNLP library. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part of speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish. cleanr Helps You to Code Cleaner Check your R code for some of the most common layout flaws. Many tried to teach us how to write code less dreadful, be it implicitly as B. W. Kernighan and D. M. Ritchie (1988) in ‘The C Programming Language’ did, be it explicitly as R.C. Martin (2008) in ‘Clean Code: A Handbook of Agile Software Craftsmanship’ did. So we should check our code for files too long or wide, functions with too many lines, too wide lines, too many arguments or too many levels of nesting. Note: This is not a static code analyzer like pylint or the like. Check out https://…/lintr instead. clickR Fix Data and Create Report Tables from Different Objects Fixes data errors in numerical, factor and date variables and produces report tables from models and summaries. clikcorr Censoring Data and Likelihood-Based Correlation Estimation A profile likelihood based method of estimation and inference on the correlation coefficient of bivariate data with different types of censoring and missingness. climbeR Calculate Average Minimal Depth of a Maximal Subtree for ‘ranger’ Package Forests Calculates first- and second-order average minimal depth of a maximal subtree for a forest object produced by the R ‘ranger’ package. This variable importance metric is implemented as described in Ishwaran et al. (‘High-Dimensional Variable Selection for Survival Data’, March 2010).
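Layout checks of the kind ‘cleanr’ performs (lines too wide, files too long) reduce to simple counting; a toy Python illustration with invented thresholds and names, not the package’s actual rules or API:

```python
def layout_flaws(source, max_line_width=80, max_file_lines=300):
    """Return a list of (line_number, message) layout complaints for a
    source-code string, in the spirit of cleanr's file checks."""
    lines = source.splitlines()
    flaws = [(i + 1, "line too wide (%d > %d)" % (len(line), max_line_width))
             for i, line in enumerate(lines) if len(line) > max_line_width]
    if len(lines) > max_file_lines:
        flaws.append((len(lines), "file too long (%d > %d lines)"
                      % (len(lines), max_file_lines)))
    return flaws
```

A static analyzer in the lintr sense would parse the code; a layout checker like this only measures its shape, which is exactly the distinction the ‘cleanr’ entry draws.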
clipr Read and Write from the System Clipboard Simple utility functions to read from and write to the system clipboards of Windows, OS X, and Linux. clisymbols Unicode Symbols at the R Prompt A small subset of Unicode symbols that are useful when building command line applications. They fall back to alternatives on terminals that do not support Unicode. Many symbols were taken from the ‘figures’ ‘npm’ package (see https://…/figures). CLME Constrained Inference for Linear Mixed Effects Models Constrained inference for linear mixed effects models using residual bootstrap methodology. clogitboost Boosting Conditional Logit Model A set of functions to fit a boosting conditional logit model. clogitLasso Lasso Estimation of Conditional Logistic Regression Models Fit a sequence of conditional logistic regression models with lasso, for small to large sized samples. clubSandwich Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models. Several adjustments are incorporated to improve small-sample performance. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm(), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and rma.mv() (from ‘metafor’). ClueR CLUster Evaluation (CLUE) CLUE is an R package for identifying the optimal number of clusters in a given time-course dataset clustered by cmeans or kmeans algorithms.
CluMix Clustering and Visualization of Mixed-Type Data Provides utilities for clustering subjects and variables of mixed data types. Similarities between subjects are measured by Gower’s general similarity coefficient with an extension of Podani for ordinal variables. Similarities between variables are assessed by combination of appropriate measures of association for different pairs of data types. Alternatively, variables can also be clustered by the ‘ClustOfVar’ approach. The main feature of the package is the generation of a mixed-data heatmap. For visualizing similarities between either subjects or variables, a heatmap of the corresponding distance matrix can be drawn. Associations between variables can be explored by a ‘confounderPlot’, which allows visual detection of possible confounding, collinear, or surrogate factors for some variables of primary interest. Distance matrices and dendrograms for subjects and variables can be derived and used for further visualizations and applications. clusrank Wilcoxon Rank Sum Test for Clustered Data Non-parametric tests (Wilcoxon rank sum test and Wilcoxon signed rank test) for clustered data. clust.bin.pair Statistical Methods for Analyzing Clustered Matched Pair Data Tests, utilities, and case studies for analyzing significance in clustered binary matched-pair data. The central function clust.bin.pair uses one of several tests to calculate a Chi-square statistic. Implemented are the tests of Eliasziw, Obuchowski, Durkalski, and Yang, with McNemar included for comparison. The utility functions nested.to.contingency and paired.to.contingency convert data between various useful formats. Thyroids and psychiatry are the canonical datasets from Obuchowski and Petryshen respectively. cluster Cluster Analysis Extended Rousseeuw et al. Cluster analysis methods, much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990).
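McNemar’s test, included in ‘clust.bin.pair’ as the unclustered baseline, has a one-line statistic; a quick Python sketch for an ordinary (non-clustered) matched-pair table:

```python
def mcnemar_chisq(b, c):
    """McNemar chi-squared statistic for a matched-pair 2x2 table,
    where b and c are the counts of the two discordant cells.
    Only the discordant pairs carry information about the difference."""
    if b + c == 0:
        raise ValueError("no discordant pairs")
    return (b - c) ** 2 / (b + c)
```

With b = 15 and c = 5 discordant pairs the statistic is 100/20 = 5.0, exceeding the 3.84 critical value of the chi-squared distribution with one degree of freedom at the 5% level. The clustered tests in the package (Eliasziw, Obuchowski, Durkalski, Yang) adjust this kind of statistic for within-cluster correlation.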
ClusterBootstrap Analyze Clustered Data with Generalized Linear Models using the Cluster Bootstrap Provides functionality for the analysis of clustered data using the cluster bootstrap. clusterCrit Clustering Indices Computes clustering validation indices. clustering.sc.dp Optimal Distance-Based Clustering for Multidimensional Data with Sequential Constraint A dynamic programming algorithm for optimally clustering multidimensional data with a sequential constraint. The algorithm minimizes the sum of squares of within-cluster distances. The sequential constraint allows only subsequent items of the input data to form a cluster. The sequential constraint is typically required in clustering data streams or items with time stamps such as video frames, GPS signals of a vehicle, movement data of a person, e-pen data, etc. The algorithm represents an extension of Ckmeans.1d.dp to multiple dimensional spaces. Similarly to the one-dimensional case, the algorithm guarantees optimality and repeatability of clustering. Method clustering.sc.dp can find the optimal clustering if the number of clusters is known. Otherwise, methods findwithinss.sc.dp and backtracking.sc.dp can be used. ClusterR Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans and K-Medoids Clustering Gaussian mixture models, k-means, mini-batch-kmeans and k-medoids clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of ‘RcppArmadillo’ to speed up the computationally intensive parts of the functions. ClusterRankTest Rank Tests for Clustered Data Nonparametric rank based tests (rank-sum tests and signed-rank tests) for clustered data, especially useful for clusters having informative cluster size and intra-cluster group size.
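With the sequential constraint described in the ‘clustering.sc.dp’ entry, clusters must be contiguous segments, so the optimal partition can be found by dynamic programming over segment boundaries. A simplified one-dimensional Python sketch of that general idea (my own implementation, not the package’s code; an O(k·n²) DP with a naive cost function rather than the prefix-sum version):

```python
def seq_cluster(x, k):
    """Optimal partition of the sequence x into k contiguous clusters
    minimizing total within-cluster sum of squares.  Returns
    (total_wss, starts) where starts[i] is the index at which
    cluster i begins."""
    n = len(x)

    def wss(i, j):  # within-segment sum of squares for x[i:j]
        seg = x[i:j]
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    INF = float("inf")
    # cost[c][j]: best total WSS for splitting x[:j] into c clusters
    cost = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + wss(i, j)
                if cand < cost[c][j]:
                    cost[c][j], back[c][j] = cand, i
    # recover cluster start indices by walking the backpointers
    starts, j = [], n
    for c in range(k, 0, -1):
        j = back[c][j]
        starts.append(j)
    return cost[k][n], starts[::-1]
```

For [1, 2, 10, 11] with k = 2 this returns a total within-cluster sum of squares of 1.0 with clusters starting at indices 0 and 2, and the contiguity guarantee is what makes the optimum reachable by DP rather than by combinatorial search.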
ClusterStability Assessment of Stability of Individual Objects or Clusters in Partitioning Solutions Allows one to assess the stability of individual objects, clusters and whole clustering solutions based on repeated runs of the K-means and K-medoids partitioning algorithms. clustertend Check the Clustering Tendency Calculates some statistics that help assess the clustering tendency of given data. In the first version, Hopkins’ statistic is implemented. clustMixType k-Prototypes Clustering for Mixed Variable-Type Data Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z. Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304. ClustMMDD Variable Selection in Clustering by Mixture Models for Discrete Data An implementation of a variable selection procedure in clustering by mixtures of multinomial models for discrete data. Genotype data are examples of such data, with two unordered observations (alleles) at each locus for diploid individuals. The two-fold problem is seen as a model selection problem where competing models are characterized by the number of clusters K and the subset S of clustering variables. Competing models are compared by penalized maximum likelihood criteria. We considered asymptotic criteria such as the Akaike and Bayesian Information criteria, and a family of penalized criteria with a data-driven calibrated penalty function. clustRcompaR Easy Interface for Clustering a Set of Documents and Exploring Group-Based Patterns Provides an interface to perform cluster analysis on a corpus of text. Interfaces to Quanteda to assemble text corpora easily. Deviationalizes text vectors prior to clustering using a technique described by Sherin (Sherin, B. [2013]. A computational study of commonsense science: An exploration in the automated analysis of clinical interview data.
Journal of the Learning Sciences, 22(4), 600-638. http://…/10508406.2013.836654). Uses cosine similarity as the distance metric for a two-stage clustering process, involving Ward’s algorithm hierarchical agglomerative clustering and k-means clustering. Selects the optimal number of clusters to maximize ‘variance explained’ by clusters, adjusted by the number of clusters. Provides plotted output of clustering results as well as printed output. Assesses ‘model fit’ of the clustering solution to a set of preexisting groups in the dataset. ClustVarLV Clustering of Variables Around Latent Variables The clustering of variables is a strategy for deciphering the underlying structure of a data set. Adopting an exploratory data analysis point of view, the Clustering of Variables around Latent Variables (CLV) approach has been proposed by Vigneau and Qannari (2003). Based on a family of optimization criteria, the CLV approach is adaptable to many situations. In particular, constraints may be introduced in order to take account of additional information about the observations and/or the variables. Here, the CLV method is depicted and the R package ClustVarLV, including a set of functions developed so far within this framework, is introduced. Considering successively different types of situations, the underlying CLV criteria are detailed and the various functions of the package are illustrated using real case studies. cmaesr Covariance Matrix Adaptation Evolutionary Strategy Pure R implementation of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) with optional restarts (IPOP-CMA-ES). CMplot Circle Manhattan Plot To visualize the results of a Genome-Wide Association Study, the Manhattan plot was born. However, it will take much time to draw an elaborate one. Here, this package provides a function named ‘CMplot’ that can easily solve the problem. Inputting the results of a GWAS and adjusting certain parameters, users will obtain the desired Manhattan plot.
Also, a circle Manhattan plot is first put forward, which demonstrates multiple traits in one circle plot. A more visualized figure can spare the length of a paper and lift the paper to a higher level. cmprskQR Analysis of Competing Risks Using Quantile Regressions Estimation, testing and regression modeling of subdistribution functions in competing risks using quantile regressions, as described in Peng and Fine (2009). cna A Package for Coincidence Analysis (CNA) Provides functions for performing Coincidence Analysis (CNA). cnbdistr Conditional Negative Binomial Distribution Provides R functions for working with the Conditional Negative Binomial distribution. CNLTreg Complex-Valued Wavelet Lifting for Signal Denoising Implementations of recent complex-valued wavelet shrinkage procedures for smoothing irregularly sampled signals. cobalt Covariate Balance Tables and Plots Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with ‘MatchIt’, ‘twang’, ‘Matching’, and ‘CBPS’ for assessing balance on the output of their preprocessing functions. Users can also supply data not generated through the above packages. cocor Comparing Correlations Statistical tests for the comparison between two correlations based on either independent or dependent groups. Dependent correlations can either be overlapping or nonoverlapping. A web interface is available on the website http://comparingcorrelations.org. A plugin for the R GUI and IDE RKWard is included. Please install RKWard from https://rkward.kde.org to use this feature. The respective R package ‘rkward’ cannot be installed directly from a repository, as it is a part of RKWard. cocoreg Extracts Shared Variation in Collections of Datasets Using Regression Models The cocoreg algorithm extracts shared variation from a collection of datasets using regression models.
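For the independent-groups case handled by ‘cocor’, the classical comparison uses Fisher’s z transformation; a compact Python sketch of that textbook test (one of several tests the package implements, with a function name of my own):

```python
import math

def compare_indep_correlations(r1, n1, r2, n2):
    """Test r1 (from n1 observations) against r2 (from n2 observations)
    for two independent groups via Fisher's z transformation.
    Returns the approximately standard normal test statistic."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # variance-stabilizing transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se
```

For example, r1 = 0.5 and r2 = 0.3 with 103 observations each give a statistic of about 1.70, short of the 1.96 two-sided 5% threshold. The dependent-groups (overlapping and nonoverlapping) cases that ‘cocor’ also covers require corrections for the correlation between the two coefficients.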
coda.base A Basic Set of Functions for Compositional Data Analysis A minimum set of functions to perform compositional data analysis using the log-ratio approach introduced by John Aitchison in 1982. Main functions have been implemented in C++ for better performance. cOde Automated C Code Generation for Use with the ‘deSolve’ and ‘bvpSolve’ Packages Generates all necessary C functions allowing the user to work with the compiled-code interface of ode() and bvptwp(). The implementation supports “forcings” and “events”. The package also provides functions to symbolically compute Jacobians, sensitivity equations and adjoint sensitivities, being the basis for sensitivity analysis. CodeDepends Analysis of R Code for Reproducible Research and Code Comprehension Tools for analyzing R expressions or blocks of code and determining the dependencies between them. It focuses on R scripts, but can be used on the bodies of functions. There are many facilities, including the ability to summarize or get a high-level view of code, determine dependencies between variables, and suggest code improvements. codingMatrices Alternative Factor Coding Matrices for Linear Model Formulae A collection of coding functions as alternatives to the standard functions in the stats package, which have names starting with ‘contr.’. Their main advantage is that they provide a consistent method for defining marginal effects in multi-way factorial models. In a simple one-way ANOVA model the intercept term is always the simple average of the class means. codyn Community Dynamics Metrics A toolbox of ecological community dynamics metrics that are explicitly temporal. Functions fall into two categories: temporal diversity indices and community stability metrics. The diversity indices are temporal analogs to traditional diversity indices such as richness and rank-abundance curves.
Specifically, functions are provided to calculate species turnover, mean rank shifts, and lags in community similarity between time points. The community stability metrics calculate overall stability and patterns of species covariance and synchrony over time. cofeatureR Generate Cofeature Matrices Generate cofeature (feature by sample) matrices. The package utilizes ggplot2::geom_tile to generate the matrix, allowing for easy additions to the base matrix. CoFRA Complete Functional Regulation Analysis Calculates complete functional regulation analysis and visualizes the results in a single heatmap. The provided example data is biological, but the methodology can be used for large data sets to compare quantitative entities that can be grouped. For example, a store might divide entities into clothes, food, car products, etc., and want to see how sales change in the groups after some event. The theoretical background for the calculations is provided in New insights into functional regulation in MS-based drug profiling, Ana Sofia Carvalho, Henrik Molina & Rune Matthiesen, Scientific Reports. coga Convolution of Gamma Distributions Convolution of gamma distributions in R. The convolution of gamma distributions is the distribution of a sum of independent gamma random variables, each of which may have different parameters. This package can calculate the density and distribution function and perform simulation. cointmonitoR Consistent Monitoring of Stationarity and Cointegrating Relationships We propose a consistent monitoring procedure to detect a structural change from a cointegrating relationship to a spurious relationship. The procedure is based on residuals from modified least squares estimation, using either Fully Modified, Dynamic or Integrated Modified OLS. It is inspired by Chu et al. (1996) in that it is based on parameter estimation on a pre-break ‘calibration’ period only, rather than being based on sequential estimation over the full sample.
See the discussion paper for further information. This package provides the monitoring procedures for both the cointegration and the stationarity case (while the latter is just a special case of the former one) as well as printing and plotting methods for a clear presentation of the results. cointReg Parameter Estimation and Inference in a Cointegrating Regression Cointegration methods are widely used in empirical macroeconomics and empirical finance. It is well known that in a cointegrating regression the ordinary least squares (OLS) estimator of the parameters is super-consistent, i.e. converges at rate equal to the sample size T. When the regressors are endogenous, the limiting distribution of the OLS estimator is contaminated by so-called second order bias terms, see e.g. Phillips and Hansen (1990). The presence of these bias terms renders inference difficult. Consequently, several modifications to OLS that lead to zero mean Gaussian mixture limiting distributions have been proposed, which in turn make standard asymptotic inference feasible. These methods include the fully modified OLS (FM-OLS) approach of Phillips and Hansen (1990), the dynamic OLS (D-OLS) approach of Phillips and Loretan (1991), Saikkonen (1991), and Stock and Watson (1993), and the new estimation approach called integrated modified OLS (IM-OLS) of Vogelsang and Wagner (2014). The latter is based on an augmented partial sum (integration) transformation of the regression model. IM-OLS is similar in spirit to the FM- and D-OLS approaches, with the key difference that it does not require estimation of long run variance matrices and avoids the need to choose tuning parameters (kernels, bandwidths, lags). However, inference does require that a long run variance be scaled out. This package provides functions for the parameter estimation and inference with all three modified OLS approaches.
That includes the automatic bandwidth selection approaches of Andrews (1991) and of Newey and West (1994) as well as the calculation of the long run variance. colf Constrained Optimization on Linear Function Performs least squares constrained optimization on a linear objective function. It contains a number of algorithms to choose from and offers a formula syntax similar to lm(). CollapsABEL Generalized CDH (GCDH) Analysis Implements a generalized version of the CDH test for detecting compound heterozygosity on a genome-wide level; due to the use of generalized linear models, it allows flexible analysis of binary and continuous traits with covariates. collapsibleTree Interactive Collapsible Tree Diagrams using ‘D3.js’ Interactive Reingold-Tilford tree diagrams created using ‘D3.js’, where every node can be expanded and collapsed by clicking on it. Tooltips and color gradients can be mapped to nodes using a numeric column in the source data frame. See the ‘collapsibleTree’ website for more information and examples. collpcm Collapsed Latent Position Cluster Model for Social Networks Markov chain Monte Carlo based inference routines for collapsed latent position cluster models for social networks, which include searches over the model space (number of clusters in the latent position cluster model). The label switching algorithm used is that of Nobile and Fearnside (2007), which relies on the algorithm of Carpaneto and Toth (1980). collUtils Auxiliary Package for Package ‘CollapsABEL’ Provides some low level functions for processing PLINK input and output files. coloredICA Implementation of Colored Independent Component Analysis and Spatial Colored Independent Component Analysis Implements colored Independent Component Analysis (Lee et al., 2011) and spatial colored Independent Component Analysis (Shen et al., 2014). These are two algorithms to perform ICA when sources are assumed to be temporal or spatial stochastic processes, respectively.
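The constrained least-squares idea behind ‘colf’ (described above) can be sketched in base R using optim()'s box-constrained "L-BFGS-B" method. This is a conceptual illustration under assumed bounds, not colf's actual interface; the data and the upper bound of 2.5 on the slope are arbitrary.

```r
# Illustrative sketch (not colf's API): least squares with box constraints
# on the coefficients, via base optim() with method "L-BFGS-B".
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.2)   # true slope 3
X <- cbind(1, x)

rss <- function(b) sum((y - X %*% b)^2)  # residual sum of squares

fit <- optim(c(0, 0), rss, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(10, 2.5))  # cap the slope at 2.5
fit$par  # the slope estimate is pushed to its upper bound of 2.5
```

Because the residual sum of squares is convex and its unconstrained minimizer (slope near 3) violates the bound, the constrained solution sits on the boundary.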
ColorPalette Color Palettes Generator Different methods to generate a color palette based on a specified base color and the number of colors that should be created. colorpatch Optimized Rendering of Fold Changes and Confidence Values Shows color patches for encoding fold changes (e.g. log ratios) together with confidence values within a single diagram. This is especially useful for rendering gene expression data as well as other types of differential experiments. In addition to different rendering methods (ggplot extensions), functionality for perceptually optimizing color palettes is provided. Furthermore, the package provides extension methods for the colorspace color class to simplify working with palettes (among others, length, as.list, and append are supported). colorplaner A ggplot2 Extension to Visualize Two Variables per Color Aesthetic Through Color Space Projections A ggplot2 extension to visualize two variables through one color aesthetic via mapping to a color space projection. With this technique for 2-D color mapping, one can create a dichotomous choropleth in R as well as other visualizations with bivariate color scales. Includes two new scales and a new guide for ggplot2. colorscience Color Science Methods and Data Methods and data for color science – color conversions by observer, illuminant and gamma. Color matching functions and chromaticity diagrams. Color indices, color differences and spectral data conversion/analysis. colorspace Color Space Manipulation Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB. Qualitative, sequential, and diverging color palettes based on HCL colors are provided. colorSpec Color Calculations with Emphasis on Spectral Data Calculate with spectral properties of light sources, materials, cameras, eyes, and scanners. Build complex systems from simpler parts using a spectral product algebra. For light sources, compute CCT and CRI.
For object colors, compute optimal colors and Logvinenko coordinates. Work with the standard CIE illuminants and color matching functions, and read spectra from text files, including CGATS files. Sample text files and 4 vignettes are included. colourpicker A Colour Picker Widget for Shiny Apps, RStudio, R-markdown, and ‘htmlwidgets’ A colour picker that can be used as an input in Shiny apps or R-markdown documents. A colour picker RStudio addin is provided to let you select colours for use in your R code. The colour picker is also available as an ‘htmlwidgets’ widget. colr Functions to Select and Rename Data Powerful functions to select and rename columns in dataframes, lists and numeric types by ‘Perl’ regular expression. Regular expressions (‘regex’) are a very powerful grammar for matching strings, such as column names. Combine Game-Theoretic Probability Combination Suite of R functions for combination of probabilities using a game-theoretic method. combiter Combinatorics Iterators Provides iterators for combinations, permutations, and subsets, which allow one to go through all elements without creating a huge set of all possible values. cometExactTest Exact Test from the Combinations of Mutually Exclusive Alterations (CoMEt) Algorithm An algorithm for identifying combinations of mutually exclusive alterations in cancer genomes. CoMEt represents the mutations in a set M of k genes with a 2^k dimensional contingency table, and then computes the tail probability of observing T(M) exclusive alterations using an exact statistical test. commonmark Bindings to the ‘CommonMark’ Reference Implementation The ‘CommonMark’ spec is a rationalized version of Markdown syntax. This package converts markdown text to various formats, including a parse tree in XML format. commonsMath JAR Files of the Apache Commons Mathematics Library Java JAR files for the Apache Commons Mathematics Library for use by users and other packages.
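The iterator idea behind ‘combiter’ (above) — visiting each k-combination in turn without materializing the full set — can be sketched in base R with the classic lexicographic next-combination step. The helper below is a hypothetical illustration of the concept, not combiter's API.

```r
# Hypothetical sketch (not combiter's API): advance a k-combination of
# 1..n to its lexicographic successor, or return NULL when exhausted.
next_comb <- function(comb, n) {
  k <- length(comb)
  i <- k
  while (i >= 1 && comb[i] == n - k + i) i <- i - 1  # rightmost movable slot
  if (i == 0) return(NULL)                           # all combinations visited
  comb[i] <- comb[i] + 1
  if (i < k) comb[(i + 1):k] <- comb[i] + seq_len(k - i)  # reset the tail
  comb
}

comb <- 1:3
count <- 0
while (!is.null(comb)) {        # visit each combination one at a time
  count <- count + 1
  comb <- next_comb(comb, 5)
}
count  # equals choose(5, 3) = 10
```

Only one combination is held in memory at a time, which is the point of an iterator when choose(n, k) is huge.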
COMMUNAL Robust Selection of Cluster Number K Facilitates optimal clustering of a data set. Provides a framework to run a wide range of clustering algorithms to determine the optimal number (k) of clusters in the data. Then analyzes the cluster assignments from each clustering algorithm to identify samples that repeatedly classify to the same group. We call these ‘core clusters’, providing a basis for later class discovery. CompareCausalNetworks Interface to Diverse Estimation Methods of Causal Networks Unified interface for the estimation of causal networks, including the methods ‘backShift’ (from package ‘backShift’), ‘bivariateANM’ (bivariate additive noise model), ‘bivariateCAM’ (bivariate causal additive model), ‘CAM’ (causal additive model) (from package ‘CAM’), ‘hiddenICP’ (invariant causal prediction with hidden variables), ‘ICP’ (invariant causal prediction) (from package ‘InvariantCausalPrediction’), ‘GES’ (greedy equivalence search), ‘GIES’ (greedy interventional equivalence search), ‘LINGAM’, ‘PC’ (PC Algorithm), ‘RFCI’ (really fast causal inference) (all from package ‘pcalg’) and regression. compareDF Do a Git Style Diff of the Rows Between Two Dataframes with Similar Structure Compares two dataframes which have the same column structure to show the rows that have changed. Also gives a git style diff format to quickly see what has changed, in addition to summary statistics. compete Analyzing Social Hierarchies Tools for organizing and analyzing social dominance hierarchy data. CompetingRisk The Semi-Parametric Cumulative Incidence Function Computing the point estimator and pointwise confidence interval of the cumulative incidence function from the cause-specific hazards model.
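The row-diff idea behind ‘compareDF’ (above) can be sketched in base R by serializing each row to a key and anti-joining the two data frames. This is a conceptual sketch only, with made-up toy data; compareDF's actual output format (the git-style diff) is richer than this.

```r
# Base-R sketch of diffing two data frames with the same column structure
# (not compareDF's API or output format). Toy data for illustration.
old <- data.frame(id = 1:3, value = c("a", "b", "c"))
new <- data.frame(id = c(1, 2, 4), value = c("a", "B", "d"))

key <- function(df) do.call(paste, c(df, sep = "\r"))  # serialize each row
removed <- old[!(key(old) %in% key(new)), ]            # rows only in 'old'
added   <- new[!(key(new) %in% key(old)), ]            # rows only in 'new'
removed
added
```

Here rows 2 and 3 of `old` count as removed (row 2's value changed, row 3 disappeared), and the corresponding rows of `new` count as added.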
Compind Composite Indicators Functions The Compind package contains several functions to support Composite Indicators methods (http://…/detail.asp?ID=6278, https://composite-indicators.jrc.ec.europa.eu), focusing in particular on the normalisation and weighting-aggregation steps. compLasso Implements the Component Lasso Method Implements the Component lasso method for linear regression using the sample covariance matrix connected-components structure, described in A Component Lasso, by Hussami and Tibshirani (2013). complexity Calculate the Proportion of Permutations in Line with an Informative Hypothesis Allows for the easy computation of complexity: the proportion of the parameter space in line with the hypothesis by chance. Compositional Compositional Data Analysis A collection of R functions for compositional data analysis. compositions Compositional Data Analysis The package provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by Aitchison and Pawlowsky-Glahn. CompR Paired Comparison Data Analysis Different tools for describing and analysing paired comparison data are presented. Main methods are estimation of product scores according to the Bradley-Terry-Luce model. A segmentation of the individuals can be conducted on the basis of a mixture distribution approach. The number of classes can be tested by the use of Monte Carlo simulations. This package also deals with multi-criteria paired comparison data. Conake Continuous Associated Kernel Estimation Continuous smoothing of probability density functions on a compact or semi-infinite support is performed using four continuous associated kernels: extended beta, gamma, lognormal and reciprocal inverse Gaussian. The cross-validation technique is also implemented for bandwidth selection. concatenate Human-Friendly Text from Unknown Strings Simple functions for joining strings.
Construct human-friendly messages whose elements aren’t known in advance, like in stop, warning, or message, from clean code. conclust Pairwise Constraints Clustering There are 3 main functions in this package: ckmeans(), lcvqe() and mpckm(). They take an unlabeled dataset and two lists of must-link and cannot-link constraints as input and produce a clustering as output. concordance Product Concordance A set of utilities for matching products in different classification codes used in international trade research. It supports concordance between HS (Combined), ISIC Rev. 2,3, and SITC1,2,3,4 product classification codes, as well as BEC, NAICS, and SIC classifications. It also provides code nomenclature / descriptions look-up, Rauch classification look-up (via concordance to SITC2) and trade elasticity look-up (via concordance to SITC2/3 or HS3.ss). condformat Conditional Formatting in Data Frames Apply and visualize conditional formatting to data frames in R. It presents a data frame as an HTML table with cells CSS formatted according to criteria defined by rules, using a syntax similar to ‘ggplot2’. The table is printed either by opening a web browser or within the ‘RStudio’ viewer if available. The conditional formatting rules allow highlighting cells matching a condition or adding a gradient background to a given column based on its values. CondIndTests Nonlinear Conditional Independence Tests Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011), Residual Prediction test (based on Shah and Buehlmann), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al.). condir Computation of P Values and Bayes Factors for Conditioning Data Set of functions for the easy analyses of conditioning data.
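A helper in the spirit of ‘concatenate’ (described above) can be written in a few lines of base R: join an unknown number of strings into a readable list. The function below is hypothetical, not part of the package's API.

```r
# Hypothetical helper (not concatenate's API): join strings whose number
# isn't known in advance into a human-friendly list like "x, y and z".
friendly <- function(x, conj = "and") {
  n <- length(x)
  if (n <= 1) return(paste(x, collapse = ""))
  paste(paste(x[-n], collapse = ", "), conj, x[n])
}

friendly(c("apples", "pears", "plums"))  # "apples, pears and plums"
```

Such a helper is handy inside stop() or warning() calls, where the offending elements are only known at run time.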
conditions Standardized Conditions for R Implements specialized conditions, i.e., typed errors, warnings and messages. Offers a set of standardized conditions (value error, deprecated warning, io message, …) in the fashion of Python’s built-in exceptions. condSURV Estimation of the Conditional Survival Function for Ordered Multivariate Failure Time Data Implements some newly developed methods for the estimation of the conditional survival function. condvis Conditional Visualization for Statistical Models Exploring fitted model structures by interactively taking 2-D and 3-D sections in data space. configr An Implementation of Parsing and Writing Configuration Files (JSON/INI/YAML) Implements YAML, JSON and INI parsers for reading and writing configuration files in R. The functionality of this package is similar to that of package ‘config’. confinterpret Descriptive Interpretations of Confidence Intervals Produces descriptive interpretations of confidence intervals. Includes (extensible) support for various test types, specified as sets of interpretations dependent on where the lower and upper confidence limits sit. conformal Conformal Prediction for Regression and Classification Implementation of conformal prediction using caret models for classification and regression. ConfoundedMeta Sensitivity Analyses for Unmeasured Confounding in Meta-Analyses Conducts sensitivity analyses for unmeasured confounding in random-effects meta-analysis per Mathur & VanderWeele (in preparation). Given output from a random-effects meta-analysis with a relative risk outcome, computes point estimates and inference for: (1) the proportion of studies with true causal effect sizes more extreme than a specified threshold of scientific significance; and (2) the minimum bias factor and confounding strength required to reduce to less than a specified threshold the proportion of studies with true effect sizes of scientifically significant size.
Creates plots and tables for visualizing these metrics across a range of bias values. confSAM Estimates and Bounds for the False Discovery Proportion, by Permutation For multiple testing. Computes estimates and confidence bounds for the False Discovery Proportion (FDP), the fraction of false positives among all rejected hypotheses. The methods in the package use permutations of the data. Doing so, they take into account the dependence structure in the data. Conigrave Flexible Tools for Multiple Imputation Provides a set of tools that can be used across ‘data.frame’ and ‘imputationList’ objects. connect3 A Tool for Reproducible Research by Converting ‘LaTeX’ Files Generated by R Sweave to Rich Text Format Files Converts ‘LaTeX’ files (with extension ‘.tex’) generated by R Sweave using package ‘knitr’ to Rich Text Format files (with extension ‘.rtf’). Rich Text Format files can be read and written by most word processors. conover.test Conover-Iman Test of Multiple Comparisons Using Rank Sums Computes the Conover-Iman test (1979) for stochastic dominance and reports the results among multiple pairwise comparisons after a Kruskal-Wallis test for stochastic dominance among k groups (Kruskal and Wallis, 1952). The interpretation of stochastic dominance requires an assumption that the CDF of one group does not cross the CDF of the other. conover.test makes k(k-1)/2 multiple pairwise comparisons based on the Conover-Iman t-test statistic of the rank differences. The null hypothesis for each pairwise comparison is that the probability of observing a randomly selected value from the first group that is larger than a randomly selected value from the second group equals one half; this null hypothesis corresponds to that of the Wilcoxon-Mann-Whitney rank-sum test. Like the rank-sum test, if the data can be assumed to be continuous, and the distributions are assumed identical except for a difference in location, the Conover-Iman test may be understood as a test for median difference.
conover.test accounts for tied ranks. The Conover-Iman test is strictly valid if and only if the corresponding Kruskal-Wallis null hypothesis is rejected. ConSpline Partial Linear Least-Squares Regression using Constrained Splines Given response y, continuous predictor x, and a covariate matrix, the relationship between E(y) and x is estimated with a shape-constrained regression spline. Function outputs fits and various types of inference. ConsRank Compute the Median Ranking(s) According to Kemeny’s Axiomatic Approach Compute the median ranking according to Kemeny’s axiomatic approach. Rankings may or may not contain ties, and can be either complete or incomplete. constants Reference on Constants, Units and Uncertainty CODATA internationally recommended values of the fundamental physical constants, provided as symbols for direct use within the R language. Optionally, the values with errors and/or the values with units are also provided if the ‘errors’ and/or the ‘units’ packages are installed. The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee of the International Council for Science which periodically provides the internationally accepted set of values of the fundamental physical constants. This package contains the ‘2014 CODATA’ version, published on 25 June 2015: Mohr, P. J., Newell, D. B. and Taylor, B. N. (2016). ContaminatedMixt Model-Based Clustering and Classification with the Multivariate Contaminated Normal Distribution Fits mixtures of multivariate contaminated normal distributions (with eigen-decomposed scale matrices) via the expectation conditional-maximization algorithm under a clustering or classification paradigm. ContourFunctions Create Contour Plots from Data or a Function Provides functions for making contour plots. The contour plot can be created from grid data, a function, or a data set. If non-grid data is given, then a Gaussian process is fit to the data and used to create the contour plot.
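The workflow that ‘conover.test’ (above) implements — an omnibus Kruskal-Wallis test followed by k(k-1)/2 pairwise rank-based comparisons — can be approximated with base R's kruskal.test() and pairwise.wilcox.test(). Note the pairwise step here uses Wilcoxon rank-sum statistics, not the Conover-Iman t-statistics themselves; the simulated three-group data are purely illustrative.

```r
# Base-R approximation of the omnibus-then-pairwise workflow (the pairwise
# tests are Wilcoxon rank-sum, not Conover-Iman t-statistics).
set.seed(42)
x <- c(rnorm(20, 0), rnorm(20, 1), rnorm(20, 2))   # three shifted groups
g <- factor(rep(c("A", "B", "C"), each = 20))

kruskal.test(x, g)                                    # omnibus test across k groups
pairwise.wilcox.test(x, g, p.adjust.method = "holm")  # k(k-1)/2 comparisons
```

As the entry notes, the pairwise step is strictly valid only when the omnibus Kruskal-Wallis null is rejected.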
controlTest Median Comparison for Two-Sample Right-Censored Survival Data Nonparametric two-sample procedure for comparing the median survival time. convertGraph Convert Graphical File Formats Converts graphical file formats (SVG, PNG, JPEG, BMP, GIF, PDF, etc.) to one another. The exceptions are the SVG file format, which can only be converted to other formats, and in contrast the PDF format, which can only be created from other graphical formats. The main purpose of the package was to provide a solution for converting the SVG file format to PNG, which is often needed for exporting graphical files produced by R widgets. convertr Convert Between Units Provides conversion functionality between a broad range of scientific, historical, and industrial unit types. convexjlr Disciplined Convex Programming in R using Convex.jl Package convexjlr provides a simple high-level wrapper for the Julia package ‘Convex.jl’, which makes it easy to describe and solve convex optimization problems in R. The problems that can be handled include: linear programs, second-order cone programs, semidefinite programs, exponential cone programs. convey Income Concentration Analysis with Complex Survey Samples Variance estimation on indicators of income concentration and poverty using linearized or replication-based survey designs. Wrapper around the survey package. convoSPAT Convolution-Based Nonstationary Spatial Modeling Fits convolution-based nonstationary Gaussian process models to point-referenced spatial data. The nonstationary covariance function allows the user to specify the underlying correlation structure and which spatial dependence parameters should be allowed to vary over space: the anisotropy, nugget variance, and process variance. The parameters are estimated via maximum likelihood, using a local likelihood approach.
Also provided are functions to fit stationary spatial models for comparison, calculate the kriging predictor and standard errors, and create various plots to visualize nonstationarity. coop Co-Operation: Fast Covariance, Correlation, and Cosine Similarity Operations Fast implementations of the co-operations: covariance, correlation, and cosine similarity. The implementations are fast and memory-efficient, and their use is resolved automatically based on the input data, handled by R’s S3 methods. Full descriptions of the algorithms and benchmarks are available in the package vignettes. copCAR Fitting the copCAR Regression Model for Discrete Areal Data Provides tools for fitting the copCAR regression model for discrete areal data. Three types of estimation are supported: continuous extension, composite marginal likelihood, and distributional transform. coprimary Sample Size Calculation for Two Primary Time-to-Event Endpoints in Clinical Trials Computes the required number of patients for two time-to-event endpoints as primary endpoints in a phase III clinical trial. coRanking Co-Ranking Matrix Calculates the co-ranking matrix to assess the quality of a dimensionality reduction. Corbi Collection of Rudimentary Bioinformatics Tools Provides a bundle of basic and fundamental bioinformatics tools, such as network querying and alignment. cord Community Estimation in G-Models via CORD Partition data points (variables) into communities/clusters, similar to clustering algorithms, such as k-means and hierarchical clustering. This package implements a clustering algorithm based on a new metric CORD, defined for high dimensional parametric or semi-parametric distributions. Read http://…/1508.01939 for more details. CORE Cores of Recurrent Events Given a collection of intervals with integer start and end positions, finds recurrently targeted regions and estimates the significance of the findings.
Randomization is implemented by parallel methods, either using local host machines, or submitting grid engine jobs. corehunter Fast and Flexible Core Subset Selection Interface to the Core Hunter software for core subset selection. Cores can be constructed based on genetic marker data, phenotypic traits, a precomputed distance matrix, or any combination of these. Various measures are included, such as Modified Rogers’ distance and Shannon’s diversity index (for genotypes) and Gower’s distance (for phenotypes). Core Hunter can also optimize a weighted combination of multiple measures, to bring the different perspectives closer together. CORElearn Classification, Regression and Feature Evaluation This is a suite of machine learning algorithms written in C++ with an R interface. It contains several model-learning techniques for classification and regression, for example classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. It is especially strong in feature evaluation, where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, DKM. These methods can be used, for example, to discretize numeric attributes. Its additional strength is the OrdEval algorithm and its visualization, used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn. coreSim Core Functionality for Simulating Quantities of Interest from Generalised Linear Models Core functions for simulating quantities of interest from generalised linear models (GLM). This package will form the backbone of a series of other packages that improve the interpretation of GLM estimates.
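The simulation approach that ‘coreSim’ (above) builds on can be sketched with base R plus the recommended MASS package: draw coefficient vectors from their estimated sampling distribution, then compute a quantity of interest per draw. This is a generic sketch of the technique, not coreSim's API; the logit model on mtcars and the covariate value wt = 3 are arbitrary choices for illustration.

```r
# Generic sketch of simulating a quantity of interest from a GLM
# (not coreSim's API). Model and covariate value are illustrative.
library(MASS)  # for mvrnorm(); MASS is a recommended package shipped with R

set.seed(1)
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
sims <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))  # simulated betas
qi   <- plogis(sims %*% c(1, 3))        # P(am = 1) at wt = 3, per draw
quantile(qi, c(0.025, 0.5, 0.975))      # central interval for the QI
```

The spread of the 1000 simulated probabilities conveys estimation uncertainty on the probability scale, which is the interpretive payoff of this approach over reading raw coefficients.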
corkscrew Preprocessor for Data Modeling Includes binning categorical variables into a smaller number of categories based on t-tests, converting categorical variables into continuous features using the mean of the response variable for the respective categories, and understanding the relationship between the response variable and predictor variables using data transformations. corlink Record Linkage, Incorporating Imputation for Missing Agreement Patterns, and Modeling Correlation Patterns Between Fields A matrix of agreement patterns and counts for record pairs is the input for the procedure. An EM algorithm is used to impute plausible values for missing record pairs. A second EM algorithm, incorporating possible correlations between per-field agreement, is used to estimate posterior probabilities that each pair is a true match – i.e. constitutes the same individual. CorporaCoCo Corpora Co-Occurrence Comparison A set of functions used to compare co-occurrence between two corpora. corpus Text Corpus Analysis Text corpus data analysis, with full support for UTF-8-encoded Unicode text. The package provides the ability to seamlessly read and process text from large JSON files without holding all of the data in memory simultaneously. corr2D Implementation of 2D Correlation Analysis Implementation of two-dimensional (2D) correlation analysis based on the Fourier-transformation approach described by Isao Noda (I. Noda, 1993). Additionally, there are two plot functions for the resulting correlation matrix: the first one creates coloured 2D plots, while the second one generates 3D plots. correctedAUC Correcting AUC for Measurement Error Correcting area under ROC (AUC) for measurement error based on a probit-shift model. corregp Functions and Methods for Correspondence Regression A collection of tools for correspondence regression, i.e.
the correspondence analysis of the crosstabulation of a categorical variable Y in function of another one X, where X can in turn be made up of the combination of various categorical variables. Consequently, correspondence regression can be used to analyze the effects for a polytomous or multinomial outcome variable. corrr Correlations in R A tool for exploring correlations. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualising the matrix in terms of the strength of the correlations. CorrToolBox Modeling Correlational Magnitude Transformations in Discretization Contexts Modeling the correlation transitions under specified distributional assumptions within the realm of discretization in the context of the latency and threshold concepts. corset Arbitrary Bounding of Series and Time Series Objects Set of methods to constrain numerical series and time series within arbitrary boundaries. CosW The CosW Distribution Density, distribution function, quantile function, random generation and survival function for the Cosine Weibull Distribution as defined by SOUZA, L. New Trigonometric Class of Probabilistic Distributions. 219 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2015 (available at ) and BRITO, C. C. R. Method Distributions generator and Probability Distributions Classes. 241 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2014 (available upon request). 
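The routine correlation-exploration tasks that ‘corrr’ (above) streamlines — ignoring the diagonal, locating the strongest relationships — can be done manually in base R. This sketch is not corrr's API; the mtcars columns are an arbitrary example.

```r
# Base-R sketch of correlation exploration (not corrr's API):
# ignore self-correlations and find the most strongly correlated pair.
m <- cor(mtcars[, c("mpg", "wt", "hp", "disp")])
diag(m) <- NA                            # drop the trivial diagonal
idx <- which(abs(m) == max(abs(m), na.rm = TRUE), arr.ind = TRUE)
rownames(m)[idx[1, ]]                    # names of the strongest pair
```

corrr wraps tasks like this in a pipeline-friendly interface and adds rearrangement and visualization of the matrix by correlation strength.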
Counterfactual Estimation and Inference Methods for Counterfactual Analysis Implements the estimation and inference methods for counterfactual analysis described in Chernozhukov, Fernandez-Val and Melly (2013) ‘Inference on Counterfactual Distributions,’ Econometrica, 81(6). The counterfactual distributions considered are the result of changing either the marginal distribution of covariates related to the outcome variable of interest, or the conditional distribution of the outcome given the covariates. They can be applied to estimate quantile treatment effects and wage decompositions. Countr Flexible Univariate and Bivariate Count Process Probability Flexible univariate and bivariate count models based on the Weibull distribution. The models may include covariates and can be specified with familiar formula syntax. COUSCOus A Residue-Residue Contact Detecting Method Contact prediction using shrunken covariance (COUSCOus). COUSCOus is a residue-residue contact detecting method approaching the contact inference using the glassofast implementation of Matyas and Sustik (2012, The University of Texas at Austin UTCS Technical Report 2012:1-3. TR-12-29.) that solves the L_1 regularised Gaussian maximum likelihood estimation of the inverse of a covariance matrix. Prior to the inverse covariance matrix estimation we utilise a covariance matrix shrinkage approach, the empirical Bayes covariance estimator, which has been shown by Haff (1980) to be the best estimator in a Bayesian framework, especially dominating estimators of the form aS, such as the smoothed covariance estimator applied in a related contact inference technique, PSICOV. covafillr Local Polynomial Regression of State Dependent Covariates in State-Space Models Facilitates local polynomial regression for state dependent covariates in state-space models. The functionality can also be used from ‘C++’ based model builder tools such as ‘Rcpp’/’inline’, ‘TMB’, or ‘JAGS’.
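Local polynomial regression itself, the building block that ‘covafillr’ (above) applies to state-space covariates, is available in base R via loess(). The simulated data below are purely illustrative and this is not covafillr's interface.

```r
# Base-R local polynomial regression with loess() (not covafillr's API).
# Simulated noisy sine data for illustration only.
set.seed(7)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)

fit <- loess(y ~ x, degree = 2, span = 0.3)  # local quadratic fit
p <- predict(fit, newdata = data.frame(x = c(1, 5, 9)))
p  # smoothed estimates near sin(1), sin(5), sin(9)
```

The span controls the fraction of points in each local neighbourhood; smaller values track the signal more closely at the cost of more variance.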
covatest Tests on Properties of Space-Time Covariance Functions Tests on properties of space-time covariance functions. Tests on symmetry, separability and for assessing different forms of non-separability are available. Moreover, tests on some classes of covariance functions, such as the classes of product-sum models, Gneiting models and integrated product models, are provided. covmat Covariance Matrix Estimation We implement a collection of techniques for estimating covariance matrices. Covariance matrices can be built using missing data. Stambaugh Estimation and FMMC methods can be used to construct such matrices. Covariance matrices can be built by denoising or shrinking the eigenvalues of a sample covariance matrix. Such techniques work by exploiting tools from Random Matrix Theory to analyse the distribution of eigenvalues. Covariance matrices can also be built assuming that the data has many underlying regimes. Each regime is allowed to follow a Dynamic Conditional Correlation model. Robust covariance matrices can be constructed by multivariate cleaning and smoothing of noisy data. covr Test Coverage for Packages Track and report code coverage for your package and (optionally) upload the results to a coverage service like Codecov (http://codecov.io) or Coveralls (http://coveralls.io). Code coverage is a measure of the amount of code being exercised by the tests. It is an indirect measure of test quality. This package is compatible with any testing methodology or framework and tracks coverage of both R code and compiled C/C++/Fortran code. CovSelHigh Model-Free Covariate Selection in High Dimensions Model-free selection of covariates in high dimensions under unconfoundedness for situations where the parameter of interest is an average causal effect. This package is based on model-free backward elimination algorithms proposed in de Luna, Waernbaum and Richardson (2011) and VanderWeele and Shpitser (2011). 
Confounder selection can be performed via either Markov/Bayesian networks, random forests or LASSO. cowbell Performs Segmented Linear Regression on Two Independent Variables Implements a specific form of segmented linear regression with two independent variables. The visualization of that function looks like a quarter segment of a cowbell, giving the package its name. The package has been specifically constructed for the case where the minimum and maximum values of the dependent and the two independent variables are known a priori, which is usually the case when those values are derived from Likert scales. cowplot Streamlined Plot Theme and Plot Annotations for ‘ggplot2’ Some helpful extensions and modifications to the ‘ggplot2’ library. In particular, this package makes it easy to combine multiple ‘ggplot2’ plots into one and label them with letters, e.g. A, B, C, etc., as is often required for scientific publications. The package also provides a streamlined and clean theme that is used in the Wilke lab, hence the package name, which stands for Claus O. Wilke’s plot library. Coxnet Regularized Cox Model Cox model regularized with net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalty. In addition, it efficiently solves an approximate L0 variable selection based on a truncated likelihood function. Moreover, it can also handle the adaptive version of these regularization forms, such as adaptive lasso and net adjusting for signs of linked coefficients. The package uses a one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of the coefficients. coxphMIC Sparse Estimation Method for Cox Proportional Hazards Models Implements the sparse estimation method for Cox proportional hazards models via approximated information criterion (Su et al., 2016, Biometrics). The developed methodology is named MIC, which stands for ‘Minimizing approximated Information Criteria’. 
A reparameterization step is introduced to enforce sparsity while at the same time keeping the objective function smooth. As a result, MIC is computationally fast with superior performance in sparse estimation. CoxPlus Cox Regression (Proportional Hazards Model) with Multiple Causes and Mixed Effects A high-performance package for estimating the Proportional Hazards Model when an event can have more than one cause, including support for random and fixed effects, tied events, and time-varying variables. CP Conditional Power Calculations Functions for calculating the conditional power for different models in survival time analysis within randomized clinical trials with two different treatments to be compared and survival as an endpoint. cpm Sequential and Batch Change Detection Using Parametric and Nonparametric Methods Sequential and batch change detection for univariate data streams, using the change point model framework. Functions are provided to allow the parametric monitoring of sequences of Gaussian, Bernoulli and Exponential random variables, along with functions implementing more general nonparametric methods for monitoring sequences which have an unspecified or unknown distribution. cpr Control Polygon Reduction Implementation of the Control Polygon Reduction and Control Net Reduction methods for finding parsimonious B-spline regression models. CPsurv Nonparametric Change Point Estimation for Survival Data Nonparametric change point estimation for survival data based on p-values of exact binomial tests. cpt Classification Permutation Test Non-parametric test for equality of multivariate distributions. Trains a classifier to classify (multivariate) observations as coming from one of two distributions. If the classifier is able to classify the observations better than would be expected by chance (using permutation inference), then the null hypothesis that the two distributions are equal is rejected. 
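The cowplot entry above lends itself to a short sketch (plot_grid() and its labels argument are part of cowplot's documented API; ggplot2 is assumed to be installed):

```r
# Combine two ggplot2 plots into one figure, labelled A and B,
# as is common in scientific publications.
library(ggplot2)
library(cowplot)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

plot_grid(p1, p2, labels = c("A", "B"))
```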
cpumemlog Monitor CPU and RAM usage of a process (and its children) cpumemlog.sh is a Bash shell script that monitors CPU and RAM usage of a given process and its children. The main aim for writing this script was to get insight into the behaviour of a process and to spot bottlenecks without GUI tools; e.g., it is very useful for spotting that a computationally intensive process on a remote server died due to hitting the RAM limit or something of that sort. The statistics about CPU, RAM, and so on are gathered from the system utility ps. While the utility top can be used for this interactively, it is tedious to stare at its dynamic output, and it is quite hard to spot consumption at the peak and follow the trends, etc. Yet another similar utility is time, though it only gives consumption of resources at the peak. cpumemlogplot.R is a companion R script to cpumemlog.sh used to summarize and plot the gathered data. cqrReg Quantile, Composite Quantile Regression and Regularized Versions Estimates quantile regression (QR) and composite quantile regression (CQR), with adaptive lasso penalty, using interior point (IP), majorize-and-minimize (MM), coordinate descent (CD), and alternating direction method of multipliers (ADMM) algorithms. cquad Conditional Maximum Likelihood for Quadratic Exponential Models for Binary Panel Data Estimation, based on conditional maximum likelihood, of the quadratic exponential model proposed by Bartolucci, F. & Nigro, V. (2010, Econometrica) and of a simplified and a modified version of this model. The quadratic exponential model is suitable for the analysis of binary longitudinal data when state dependence (further to the effect of the covariates and a time-fixed individual intercept) has to be taken into account. 
Therefore, this is an alternative to the dynamic logit model, having the advantage of easily allowing conditional inference in order to eliminate the individual intercepts and thereby obtain consistent estimates of the parameters of main interest (for the covariates and the lagged response). The simplified version of this model does not distinguish, as the original model does, between the last time occasion and the previous occasions. The modified version formulates the interaction terms in a different way and may be used to easily test for state dependence, as shown in Bartolucci, F., Nigro, V. & Pigini, C. (2013, Econometric Reviews). The package also includes estimation of the dynamic logit model by a pseudo conditional estimator based on the quadratic exponential model, as proposed by Bartolucci, F. & Nigro, V. (2012, Journal of Econometrics). crandatapkgs Find Data-Only Packages on CRAN Provides a data.frame listing of known data-only and data-heavy packages available on CRAN. cranlike Tools for ‘CRAN’-Like Repositories A set of functions to manage ‘CRAN’-like repositories efficiently. cranlogs Download Logs from the RStudio CRAN Mirror API to the database of CRAN package downloads from the RStudio CRAN mirror. The database itself is at http://cranlogs.r-pkg.org , see https://…/cranlogs.app for the raw API. CRANsearcher RStudio Addin for Searching Packages in CRAN Database Based on Keywords One of the strengths of R is its vast package ecosystem. Indeed, R packages extend from visualization to Bayesian inference and from spatial analyses to pharmacokinetics ( ). There is probably not an area of quantitative research that isn’t represented by at least one R package. At the time of this writing, there are more than 10,000 active CRAN packages. Because of this massive ecosystem, it is important to have tools to search and learn about packages related to your personal R needs. 
For this reason, we developed an RStudio addin capable of searching available CRAN packages directly within RStudio. credsubs Credible Subsets Functions for constructing simultaneous credible bands and identifying subsets via the ‘credible subsets’ (also called ‘credible subgroups’) method. crisp Fits a Model that Partitions the Covariate Space into Blocks in a Data-Adaptive Way Implements convex regression with interpretable sharp partitions (CRISP), which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31. crminer Fetch ‘Scholarly’ Full Text from ‘Crossref’ Text mining client for ‘Crossref’ ( ). Includes functions for getting links to full text of articles, fetching full text articles from those links or Digital Object Identifiers (‘DOIs’), and text extraction from ‘PDFs’. crochet Implementation Helper for [ and [<- Of Custom Matrix-Like Types Functions to help implement the extraction / subsetting / indexing function [ and replacement function [<- of custom matrix-like types (based on S3, S4, etc.), modeled as closely to the base matrix class as possible (with tests to prove it). cronR Schedule R Scripts and Processes with the ‘cron’ Job Scheduler Create, edit, and remove ‘cron’ jobs on your unix-alike system. The package provides a set of easy-to-use wrappers to ‘crontab’. It also provides an RStudio add-in to easily launch and schedule your scripts. 
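A hedged sketch of scheduling an R script with the cronR package described above (cron_rscript() and cron_add() are cronR wrappers; the script path and job id are placeholders):

```r
# Illustrative only: register a cron job that runs an R script nightly.
library(cronR)

# Build the shell command that runs the script (path is hypothetical).
cmd <- cron_rscript("/home/user/analysis.R")

# Add the job to the user's crontab via the cron_add() wrapper.
cron_add(command = cmd, frequency = "daily", at = "03:00",
         id = "nightly_analysis", description = "Nightly analysis run")
```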
crop Graphics Cropping Tool A device closing function which is able to crop graphics (e.g., PDF, PNG files) on Unix-like operating systems with the required underlying command-line tools installed. CrossClustering A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters and Identification of Outliers Computes a partial clustering algorithm that combines Ward’s minimum variance and Complete Linkage algorithms, providing automatic estimation of a suitable number of clusters and identification of outlier elements. crossdes Construction of Crossover Designs Contains functions for the construction of carryover balanced crossover designs. In addition, it contains functions to check given designs for balance. Crossover Analysis and Search of Crossover Designs Package Crossover provides different crossover designs from combinatorial or search algorithms as well as from the literature, and a GUI to access them. crosstalk Inter-Widget Interactivity for HTML Widgets Provides building blocks for allowing HTML widgets to communicate with each other, with Shiny or without (i.e. static .html files). Currently supports linked brushing and filtering. crrp Penalized Variable Selection in Competing Risks Regression In competing risks regression, the proportional subdistribution hazards (PSH) model is popular for its direct assessment of covariate effects on the cumulative incidence function. This package allows for penalized variable selection for the PSH model. Penalties include LASSO, SCAD, MCP, and their group versions. crskdiag Diagnostics for Fine and Gray Model Provides the implementation of analytical and graphical approaches for checking the assumptions of the Fine and Gray model. crsnls Nonlinear Regression Parameters Estimation by ‘CRS4HC’ and ‘CRS4HCe’ Functions for nonlinear regression parameters estimation by algorithms based on the Controlled Random Search algorithm. 
Both functions (crs4hc(), crs4hce()) adapt current search strategy by four heuristics competition. In addition, crs4hce() improves adaptability by adaptive stopping condition. crtests Classification and Regression Tests Provides wrapper functions for running classification and regression tests using different machine learning techniques, such as Random Forests and decision trees. The package provides standardized methods for preparing data to suit the algorithm’s needs, training a model, making predictions, and evaluating results. Also, some functions are provided to run multiple instances of a test. CRTgeeDR Doubly Robust Inverse Probability Weighted Augmented GEE Estimator Implements a semi-parametric GEE estimator accounting for missing data with Inverse-probability weighting (IPW) and for imbalance in covariates with augmentation (AUG). The estimator IPW-AUG-GEE is Doubly robust (DR). crul HTTP Client A simple HTTP client, with tools for making HTTP requests, and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby’s ‘faraday’ gem ( ). The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package ‘curl’, an interface to ‘libcurl’ ( ). crunch Crunch.io Data Tools The Crunch.io service (http://crunch.io ) provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface. crunchy Shiny Apps on Crunch To facilitate building custom dashboards on the Crunch data platform , the ‘crunchy’ package provides tools for working with ‘shiny’. 
These tools include utilities to manage authentication and authorization automatically and custom stylesheets to help match the look and feel of the Crunch web application. CSeqpat Frequent Contiguous Sequential Pattern Mining of Text Mines contiguous sequential patterns in text. csn Closed Skew-Normal Distribution Provides functions for computing the density and the log-likelihood function of closed-skew normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42. csp Correlates of State Policy Data Set in R Provides the Correlates of State Policy data set for easy use in R. csrplus Methods to Test Hypotheses on the Distribution of Spatial Point Processes Includes two functions to evaluate the hypothesis of complete spatial randomness (csr) in point processes. The function ‘mwin’ calculates quadrat counts to estimate the intensity of a spatial point process through the moving window approach proposed by Bailey and Gatrell (1995). Event counts are computed within a window of a set size over a fine lattice of points within the region of observation. The function ‘pielou’ uses the nearest neighbor test statistic and asymptotic distribution proposed by Pielou (1959) to compare the observed point process to one generated under csr. The value can be compared to that given by the more widely used test proposed by Clark and Evans (1954). cssTools Cognitive Social Structure Tools A collection of tools for estimating a network from a random sample of cognitive social structure (CSS) slices. Also contains functions for evaluating a CSS in terms of various error types observed in each slice. 
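Returning to the crul entry above, a minimal request sketch (HttpClient is crul's R6 client class; the URL is illustrative):

```r
# Illustration of the crul HTTP client described earlier.
library(crul)

# Create a client bound to a base URL, then issue a GET request.
x <- HttpClient$new(url = "https://httpbin.org")
res <- x$get("get")

res$status_code     # HTTP status of the response
res$parse("UTF-8")  # response body as text
```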
cstab Selection of Number of Clusters via Normalized Clustering Instability Selection of the number of clusters in cluster analysis using stability methods. csv Read and Write CSV Files with Selected Conventions Reads and writes CSV with selected conventions. Uses the same generic function for reading and writing to promote consistent formats. cthreshER Continuous Threshold Expectile Regression Estimation and inference methods for the continuous threshold expectile regression. It can fit the continuous threshold expectile regression and test for the existence of a change point, as described in ‘Feipeng Zhang and Qunhua Li (2016). A continuous threshold expectile regression, submitted.’ CTM A Text Mining Toolkit for Chinese Documents The CTM package is designed to solve text mining problems and is specific to Chinese documents. ctmcd Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data Functions for estimating Markov generator matrices from discrete-time observations. The implemented approaches comprise diagonal adjustment, weighted adjustment and quasi-optimization of matrix logarithm based candidate solutions, an expectation-maximization algorithm as well as a Gibbs sampler. ctqr Censored and Truncated Quantile Regression Estimation of quantile regression models for survival data. ctsem Continuous Time Structural Equation Modelling An easily accessible continuous (and discrete) time dynamic modelling package for panel and time series data, reliant upon the OpenMx package (http://openmx.psyc.virginia.edu) for computation. Most dynamic modelling approaches to longitudinal data rely on the assumption that time intervals between observations are consistent. When this assumption is adhered to, the data gathering process is necessarily limited to a specific schedule, and when it is broken, the resulting parameter estimates may be biased and reduced in power. 
Continuous time models are conceptually similar to vector autoregressive models (and thus also to the latent change models popularised in a structural equation modelling context); however, by explicitly including the length of time between observations, continuous time models are freed from the assumption that measurement intervals are consistent. This allows: data to be gathered irregularly; the elimination of noise and bias due to varying measurement intervals; parsimonious structures for complex dynamics. The application of such a model in this SEM framework allows full-information maximum-likelihood estimates for both N = 1 and N > 1 cases, multiple measured indicators per latent process, and the flexibility to incorporate additional elements, including individual heterogeneity in the latent process and manifest intercepts, and time dependent and independent exogenous covariates. Furthermore, due to the SEM implementation we are able to estimate a random effects model where the impact of time dependent and time independent predictors can be assessed simultaneously, but without the classic problem of random effects models assuming no covariance between unit level effects and predictors. ctsmr Continuous Time Stochastic Modelling for R CTSM is a tool for estimating embedded parameters in a continuous time stochastic state space model. CTSM has been developed at DTU Compute (formerly DTU Informatics) over several years. CTSM-R provides a new scripting interface through the statistical language R. Mixing CTSM with R provides easy access to the data handling and plotting tools required in any kind of modelling. CTTShiny Classical Test Theory via Shiny Interactive shiny application for running classical test theory (item analysis). CUB A Class of Mixture Models for Ordinal Data Estimating and testing models for ordinal data within the family of CUB models and their extensions (where CUB stands for Combination of a discrete Uniform and a shifted Binomial distribution). 
Cubist Rule- and Instance-Based Regression Modeling Regression modeling using rules with added instance-based corrections. CuCubes MultiDimensional Feature Selection (MDFS) Functions for MultiDimensional Feature Selection (MDFS): * calculating multidimensional information gains, * finding interesting tuples for chosen variables, * scoring variables, * finding important variables, * plotting selection results. CuCubes is also known as CUDA Cubes; it is a library that allows fast CUDA-accelerated computation of information gains in binary classification problems. This package wraps CuCubes and provides an alternative CPU version as well as helper functions for building MultiDimensional Feature Selectors. CUFF Charles’s Utility Function using Formula Utility functions that provide wrappers for descriptive base functions like correlation, mean and table. It makes use of the formula interface to pass variables to functions. It also provides operators such as %+% to concatenate, and operators to repeat and manage character vectors for nice display. curl A Modern and Flexible Web Client for R The curl() and curl_download() functions provide highly configurable drop-in replacements for base url() and download.file() with better performance, support for encryption (https://, ftps://), ‘gzip’ compression, authentication, and other ‘libcurl’ goodies. The core of the package implements a framework for performing fully customized requests where data can be processed either in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of ‘libcurl’ is recommended; for a more user-friendly web client see the ‘httr’ package, which builds on this package with HTTP-specific tools and logic. 
The curl package: a modern R interface to libcurl curlconverter Tools to Transform ‘cURL’ Command-Line Calls to ‘httr’ Requests Deciphering web/’REST’ ‘API’ and ‘XHR’ calls can be tricky, which is one reason why internet browsers provide ‘Copy as cURL’ functionality within their ‘Developer Tools’ pane(s). These ‘cURL’ command-lines can be difficult to wrangle into an ‘httr’ ‘GET’ or ‘POST’ request, but you can now ‘straighten’ these ‘cURLs’ either from data copied to the system clipboard or by passing in a vector of ‘cURL’ command-lines and getting back a list of parameter elements which can be used to form ‘httr’ requests. You can also make a complete/working/callable ‘httr::VERB’ function right from the tools provided. curry Partial Function Application with %<%, %-<% Partial application is the process of reducing the arity of a function by fixing one or more arguments, thus creating a new function lacking the fixed arguments. The curry package provides three different ways of performing partial function application by fixing arguments from either end of the argument list (currying and tail currying) or by fixing multiple named arguments (partial application). This package provides this functionality through the %<%, %-<%, and %><% operators, which allow for a programming style comparable to modern functional languages. Compared to other implementations such as purrr::partial(), the operators in curry compose functions with named arguments, aiding in autocompletion etc. customizedTraining Customized Training for Lasso and Elastic-Net Regularized Generalized Linear Models Customized training is a simple technique for transductive learning, when the test covariates are known at the time of training. The method identifies a subset of the training set to serve as the training set for each of a few identified subsets in the test set. This package implements customized training for the glmnet() and cv.glmnet() functions. 
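The curl entry above can be sketched as follows (curl_download() and curl_fetch_memory() are the package's documented entry points; the URLs and file name are illustrative):

```r
# Illustration of the curl package described earlier.
library(curl)

# Drop-in replacement for download.file(), with libcurl features.
curl_download("https://example.com/data.csv", destfile = "data.csv")

# Fully in-memory request: the response body arrives as a raw vector.
res <- curl_fetch_memory("https://example.com")
res$status_code        # HTTP status code
rawToChar(res$content) # body decoded to a character string
```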
CUSUMdesign Compute Decision Interval and Average Run Length for CUSUM Charts Computation of decision intervals (H) and average run lengths (ARL) for CUSUM charts. cvequality Tests for the Equality of Coefficients of Variation from Multiple Groups Contains functions for testing for significant differences between multiple coefficients of variation. Includes Feltz and Miller’s (1996) asymptotic test and Krishnamoorthy and Lee’s (2014) modified signed-likelihood ratio test. See the vignette for more, including full details of citations. CVR Canonical Variate Regression Perform canonical variate regression (CVR) for two sets of covariates and a univariate response, with regularization and weight parameters tuned by cross validation. cvxbiclustr Convex Biclustering Algorithm An iterative algorithm for solving a convex formulation of the biclustering problem. cyclocomp Cyclomatic Complexity of R Code Cyclomatic complexity is a software metric (measurement), used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program’s source code. It was developed by Thomas J. McCabe, Sr. in 1976. Cyclops Cyclic Coordinate Descent for Logistic, Poisson and Survival Analysis This model fitting tool incorporates cyclic coordinate descent and majorization-minimization approaches to fit a variety of regression models found in large-scale observational healthcare data. Implementations focus on computational optimization and fine-scale parallelization to yield efficient inference in massive datasets. D d3heatmap A D3.js-based heatmap htmlwidget for R This is an R package that implements a heatmap htmlwidget. 
It has the following features: • Highlight rows/columns by clicking axis labels • Click and drag over colormap to zoom in (click on colormap to zoom out) • Optional clustering and dendrograms, courtesy of base::heatmap Interactive heat maps D3M Two Sample Test with Wasserstein Metric Two sample test based on Wasserstein metric. This is motivated from detection of differential DNA-methylation sites based on underlying distributions. D3partitionR Plotting D3 Hierarchical Plots in R and Shiny Plotting hierarchical plots in R such as Sunburst, Treemap, Circle Treemap and Partition Chart. d3r d3.js’ Utilities for R Helper functions for using ‘d3.js’ in R. d3Tree Create Interactive Collapsible Trees with the JavaScript ‘D3’ Library Create and customize interactive collapsible ‘D3’ trees using the ‘D3’ JavaScript library and the ‘htmlwidgets’ package. These trees can be used directly from the R console, from ‘RStudio’, in Shiny apps and R Markdown documents. When in Shiny the tree layout is observed by the server and can be used as a reactive filter of structured data. DA.MRFA Dimensionality Assessment using Minimum Rank Factor Analysis Performs Parallel Analysis for assessing the dimensionality of a set of variables using Minimum Rank Factor Analysis (see Timmerman & Lorenzo-Seva (2011) and ten Berge & Kiers (1991) for more information). The package also includes the option to compute Minimum Rank Factor Analysis by itself, as well as the Greater Lower Bound calculation. dad Three-Way Data Analysis Through Densities The three-way data consists of a set of variables measured on several groups of individuals. To each group is associated an estimated probability density function. The package provides functional methods (principal component analysis, multidimensional scaling, discriminant analysis…) for such probability densities. daff Diff, Patch and Merge for Data.frames Diff, patch and merge for data frames. Document changes in data sets and use them to apply patches. 
Changes to data can be made visible by using render_diff. Daff uses the V8 package to wrap the ‘daff.js’ JavaScript library, which is included in the package. Daff exposes a subset of ‘daff.js’ functionality, tailored for usage within R. dagitty Graphical Analysis of Structural Causal Models A port of the web-based software “DAGitty” for analyzing structural causal models (also known as directed acyclic graphs or DAGs). The package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation. dashboard Interactive Data Visualization with D3.js The dashboard package allows users to create web pages which display interactive data visualizations working in a standard modern browser. It displays them locally using the Rook server. Neither knowledge of web technologies nor an Internet connection is required. D3.js is a JavaScript library for manipulating documents based on data. D3 helps the dashboard package bring data to life using HTML, SVG and CSS. dat Tools for Data Manipulation An implementation of common higher-order functions with syntactic sugar for anonymous functions. Also provides a link to ‘dplyr’ for common transformations on data frames, to work around non-standard evaluation by default. data.table Extension of data.frame Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread). Offers a natural and flexible syntax, for faster development. data.tree Hierarchical Data Structures Create tree structures from hierarchical data, and use the utility methods to traverse the tree in various orders. Aggregate, print, convert to and from data.frame, and apply functions to your tree data. 
Useful for decision trees, machine learning, finance, and many other applications. data.world Main Package for Working with ‘data.world’ Data Sets High-level tools for working with data.world data sets. data.world is a community where you can find interesting data, store and showcase your own data and data projects, and find and collaborate with other members. In addition to exploring, querying and charting data on the data.world site, you can access data via ‘API’ endpoints and integrations. Use this package to access, query and explore data sets, and to integrate data into R projects. Visit , for additional information. datacheckr Data Frame Column Name, Class and Value Checking The primary function check_data() checks a data frame for column presence, column class and column values. If the user-defined conditions are met, the function returns an invisible copy of the original data frame; otherwise, the function throws an informative error. DataClean Data Cleaning Includes functions that researchers or practitioners may use to clean raw data, converting html, xlsx, and txt data files into other formats. It can also be used to manipulate text variables, extract numeric variables from text variables, and perform other variable cleaning processes. It originated from the author’s project on creative performance in online education environments. The resulting paper of that study will be published soon. dataCompareR Compare Two Data Frames and Summarise the Difference Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn’t intended to replace all.equal() as a way to test for equality. 
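The data.table syntax mentioned above (fast grouped aggregation and by-reference column updates) can be sketched briefly; as.data.table() and the DT[i, j, by] form are core parts of the package:

```r
# Short illustration of data.table's grouped aggregation and
# add-by-reference semantics, using the built-in mtcars data.
library(data.table)

DT <- as.data.table(mtcars)

# Grouped aggregation with no copies: mean mpg per cylinder count.
DT[, .(mean_mpg = mean(mpg)), by = cyl]

# Add a column in place (by reference), without copying the table.
DT[, kpl := mpg * 0.4251]
```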
datadr Divide and Recombine for Large, Complex Data Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. Comes with a generic MapReduce interface as well. Works with key-value pairs stored in memory, on local disk, or on HDFS, in the latter case using the R and Hadoop Integrated Programming Environment (RHIPE). DataEntry Make it Easier to Enter Questionnaire Data This is a GUI application for defining attributes and setting valid values of variables, and then entering questionnaire data in a data.frame. DataExplorer Data Explorer Data exploration process for data analysis and model building, so that users can focus on understanding data and extracting insights. The package automatically scans through each variable and does data profiling. Typical graphical techniques will be performed for both discrete and continuous features. datafsm Estimating Finite State Machine Models from Data Our method automatically generates models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data, and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity. DataLoader Import Multiple File Types Functions to import multiple files of multiple data file types (‘.xlsx’, ‘.xls’, ‘.csv’, ‘.txt’) from a given directory into R data frames. dataMaid A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Cleaning Process Data cleaning is an important first step of any statistical analysis.
dataMaid provides an extendable suite of tests for common potential errors in a dataset. It produces a document with a thorough summary of the checks and the results that a human can use to identify possible errors. dataMeta Create and Append a Data Dictionary for an R Dataset Designed to create a basic data dictionary and append to the original dataset’s attributes list. The package makes use of a tidy dataset and creates a data frame that will serve as a linker that will aid in building the dictionary. The dictionary is then appended to the list of the original dataset’s attributes. The user will have the option of entering variable and item descriptions by writing code or by using alternate functions that will prompt the user to add these. datapack A Flexible Container to Transport and Manipulate Data and Associated Resources Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated meta data and ancillary files. Individual data objects have associated system level meta data, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI-ORE standard is described at . Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at . datarobot DataRobot Predictive Modeling API For working with the DataRobot predictive modeling platform’s API. datasauRus Datasets from the Datasaurus Dozen The Datasaurus Dozen is a set of datasets with the same summary statistics. They retain the same summary statistics despite having radically different distributions. The datasets represent a larger and quirkier object lesson that is typically taught via Anscombe’s Quartet (available in the ‘datasets’ package).
Anscombe’s Quartet contains four very different distributions with the same summary statistics and as such highlights the value of visualisation in understanding data, over and above summary statistics. As well as being an engaging variant on the Quartet, the data is generated in a novel way. The simulated annealing process used to derive datasets from the original Datasaurus is detailed in ‘Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing’ . datasets.load Interface for Loading Datasets Visual interface for loading datasets in RStudio from all installed (unloaded) packages. Datasmith Tools to Complete Euclidean Distance Matrices Implements several algorithms for Euclidean distance matrix completion, Sensor Network Localization, and sparse Euclidean distance matrix completion using the minimum spanning tree. datastepr An Implementation of a SAS-Style Data Step Based on a SAS data step. This allows for row-wise dynamic building of data, iteratively importing slices of existing dataframes, conducting analyses, and exporting to a results frame. This is particularly useful for differential or time-series analyses, which are often not well suited to vector-based operations. datastructures Implementation of Core Data Structures Implementation of advanced data structures such as hashmaps, heaps, or queues. Advanced data structures are essential in many computer science and statistics problems, for example graph algorithms or string analysis. The package uses ‘Boost’ and ‘STL’ data types and extends these to R with ‘Rcpp’ modules. dawai Discriminant Analysis with Additional Information In applications it is usual that some additional information is available. This package dawai (an acronym for Discriminant Analysis With Additional Information) performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the population means.
It also computes several estimations of the true error rate. dbfaker A Tool to Ensure the Validity of Database Writes A tool to ensure the validity of database writes. It provides a set of utilities to analyze and type check the properties of data frames that are to be written to databases with SQL support. dbplyr A ‘dplyr’ Back End for Databases A ‘dplyr’ back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a ‘DBI’ back end; more advanced features require ‘SQL’ translation to be provided by the package author. dbscan Density Based Clustering of Applications with Noise (DBSCAN) A fast reimplementation of the DBSCAN clustering algorithm using the kd-tree data structure for speedup. dc3net Inferring Condition-Specific Networks via Differential Network Inference Performs differential network analysis to infer disease specific gene networks. DCA Dynamic Correlation Analysis for High Dimensional Data Finding dominant latent signals that regulate dynamic correlation between many pairs of variables. DClusterm Model-Based Detection of Disease Clusters Model-based methods for the detection of disease clusters using GLMs, GLMMs and zero-inflated models. DCM Data Converter Module Data Converter Module (DCM) converts the dataset format from split to stack and the reverse. dCovTS Distance Covariance and Correlation for Time Series Analysis Computing and plotting the distance covariance and correlation function of a univariate or a multivariate time series. Test statistics for testing pairwise independence are also implemented. Some data sets are also included. DDM Death Registration Coverage Estimation A set of three two-census methods to estimate the degree of death registration coverage for a population. Implemented methods include the Generalized Growth Balance method (GGB), the Synthetic Extinct Generation method (SEG), and a hybrid of the two, GGB-SEG.
Each method offers automatic estimation, but users may also specify exact parameters or use a graphical interface to guess parameters in the traditional way if desired. ddpcr Analysis and Visualization of Droplet Digital PCR in R and on the Web An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing duplex ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R. ddR Distributed Data Structures in R Provides distributed data structures and simplifies distributed computing in R. DDRTree Learning Principal Graphs with DDRTree Project data into a reduced dimensional space and construct a principal graph from the reduced dimension. deadband Statistical Deadband Algorithms Comparison Statistical deadband algorithms are based on the Send-On-Delta concept as in Miskowicz (2006). A collection of functions compares the effectiveness and fidelity of sampled signals using statistical deadband algorithms. debugme Debug R Packages Specify debug messages as special string constants, and control debugging of packages via environment variables. decision Statistical Decision Analysis Contains a function called dmur() which accepts four parameters: possible values, probabilities of the values, selling cost and preparation cost. The dmur() function generates various numeric decision parameters like MEMV (maximum (optimum) expected monetary value), best choice, EPPI (expected profit with perfect information), EVPI (expected value of perfect information), EOL (expected opportunity loss), which facilitate effective decision-making. DecisionCurve Calculate and Plot Decision Curves Decision curves are a useful tool to evaluate the population impact of adopting a risk prediction instrument into clinical practice.
Given one or more instruments (risk models) that predict the probability of a binary outcome, this package calculates and plots decision curves, which display estimates of the standardized net benefit by the probability threshold used to categorize observations as ‘high risk.’ Curves can be estimated using data from an observational cohort, or from case-control studies when an estimate of the population outcome prevalence is available. Confidence intervals calculated using the bootstrap can be displayed and a wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided. decisionSupport Quantitative Support of Decision Making under Uncertainty Supporting the quantitative analysis of binary welfare-based decision making processes using Monte Carlo simulations. Decision support is given on two levels: (i) The actual decision level is to choose between two alternatives under probabilistic uncertainty. This package calculates the optimal decision based on maximizing expected welfare. (ii) The meta decision level is to allocate resources to reduce the uncertainty in the underlying decision problem, i.e. to increase the current information to improve the actual decision making process. This problem is dealt with using the Value of Information Analysis. The Expected Value of Information for arbitrary prospective estimates can be calculated as well as Individual and Clustered Expected Value of Perfect Information. The probabilistic calculations are done via Monte Carlo simulations. This Monte Carlo functionality can be used on its own. decoder Decode Coded Variables to Plain Text (and Vice Versa) Main function ‘decode’ is used to decode coded key values to plain text. Function ‘code’ can be used to code plain text to code if there is a 1:1 relation between the two. The concept relies on ‘keyvalue’ objects used for translation.
There are several ‘keyvalue’ objects included in the areas of geographical regional codes, administrative health care unit codes, diagnosis codes et cetera but it is also easy to extend the use by arbitrary code sets. decomposedPSF Time Series Prediction with PSF and Decomposition Methods (EMD and EEMD) Predicts future values with hybrid methods that combine Pattern Sequence based Forecasting (PSF), Autoregressive Integrated Moving Average (ARIMA), Empirical Mode Decomposition (EMD) and Ensemble Empirical Mode Decomposition (EEMD). deconvolveR Empirical Bayes Estimation Strategies Empirical Bayes methods for learning prior distributions from data. An unknown prior distribution (g) has yielded (unobservable) parameters, each of which produces a data point from a parametric exponential family (f). The goal is to estimate the unknown prior (‘g-modeling’) by deconvolution and Empirical Bayes methods. DecorateR Fit and Deploy DECORATE Trees DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) builds an ensemble of J48 trees by recursively adding artificial samples of the training data (‘Melville, P., & Mooney, R. J. (2005). Creating diversity in ensembles using artificial data. Information Fusion, 6(1), 99-111. ’). deductive Data Correction and Imputation Using Deductive Methods Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data. deepboost Deep Boosting Ensemble Modeling Provides deep boosting model training, evaluation, prediction and hyper-parameter optimisation using grid search and cross validation. Based on Google’s Deep Boosting algorithm, and Google’s C++ implementation. Cortes, C., Mohri, M., & Syed, U. (2014) .
deeplearning An Implementation of Deep Neural Network for Regression and Classification An implementation of deep neural networks with rectified linear units trained with the stochastic gradient descent method and batch normalization. A combination of these methods has achieved state-of-the-art performance in ImageNet classification by overcoming the gradient saturation problem experienced by many deep architecture neural network models in the past. In addition, batch normalization and dropout are implemented as a means of regularization. The deeplearning package is inspired by the darch package and uses its class DArch. deepnet Deep Learning Toolkit in R Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoders and so on. deformula Integration of One-Dimensional Functions with Double Exponential Formulas Numerical quadrature of functions of one variable over a finite or infinite interval with double exponential formulas. delt Estimation of Multivariate Densities Using Adaptive Partitions We implement methods for estimating multivariate densities. We include a discretized kernel estimator, an adaptive histogram (a greedy histogram and a CART-histogram), stagewise minimization, and bootstrap aggregation. deming Deming, Theil-Sen and Passing-Bablock Regression Generalized Deming regression, Theil-Sen regression and Passing-Bablock regression functions. dendextend Extending R’s Dendrogram Functionality Offers a set of functions for extending dendrogram objects in R, letting you visualize and compare trees of hierarchical clusterings. You can (1) Adjust a tree’s graphical parameters – the color, size, type, etc. of its branches, nodes and labels. (2) Visually and statistically compare different dendrograms to one another.
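The ‘dendextend’ entry above can be sketched as follows, assuming the package is installed; color_branches() and cor_cophenetic() are two of the functions it exports, and the choice of USArrests and k = 4 is an invented example:

```r
library(dendextend)

# Two hierarchical clusterings of the same data, as dendrograms
hc1 <- hclust(dist(USArrests), method = "complete")
hc2 <- hclust(dist(USArrests), method = "average")
dend1 <- as.dendrogram(hc1)
dend2 <- as.dendrogram(hc2)

# (1) Adjust graphical parameters: color branches by a k = 4 cut
dend1 <- color_branches(dend1, k = 4)

# (2) Statistically compare the two trees: cophenetic correlation
cor_cophenetic(dend1, dend2)
```

tanglegram(dend1, dend2) would add the corresponding visual comparison.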
denoiSeq Differential Expression Analysis Using a Bottom-Up Model Given count data from two conditions, it determines which transcripts are differentially expressed across the two conditions using Bayesian inference of the parameters of a bottom-up model for PCR amplification. This model is developed in Ndifon Wilfred, Hilah Gal, Eric Shifrut, Rina Aharoni, Nissan Yissachar, Nir Waysbort, Shlomit Reich Zeliger, Ruth Arnon, and Nir Friedman (2012), , and results in a distribution for the counts that is a superposition of the binomial and negative binomial distribution. denoiseR Regularized Low Rank Matrix Estimation Regularized estimation of low rank matrices. denseFLMM Functional Linear Mixed Models for Densely Sampled Data Estimation of functional linear mixed models for densely sampled data based on functional principal component analysis. densityClust Clustering by fast search and find of density peaks An implementation of the clustering algorithm described by Alex Rodriguez and Alessandro Laio (Science, 2014 vol. 344), along with tools to inspect and visualize the results. DensParcorr Dens-Based Method for Partial Correlation Estimation in Large Scale Brain Networks Provides a Dens-based method for estimating functional connection in large scale brain networks using partial correlation. densratio Density Ratio Estimation Density ratio estimation. The estimated density ratio function can be used in many applications such as inlier-based outlier detection, covariate shift adaptation, etc. DEoptim Global Optimization by Differential Evolution Implements the differential evolution algorithm for global optimization of a real-valued function of a real-valued parameter vector.
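The ‘DEoptim’ entry above can be illustrated with a short sketch, assuming the package is installed; the Rosenbrock function is a standard test problem, not part of the package:

```r
library(DEoptim)

# Rosenbrock banana function: global minimum 0 at (1, 1)
rosenbrock <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

# Differential evolution over the box [-2, 2] x [-2, 2]
set.seed(42)
out <- DEoptim(rosenbrock, lower = c(-2, -2), upper = c(2, 2),
               control = DEoptim.control(itermax = 200, trace = FALSE))

out$optim$bestmem  # best parameter vector found, near c(1, 1)
out$optim$bestval  # objective value at that point, near 0
```

No gradient is required, which is the main appeal for rough or multimodal objectives.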
depmixS4 Dependent Mixture Models – Hidden Markov Models of GLMs and Other Distributions in S4 Fit latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models. depth.plot Multivariate Analogy of Quantiles Could be used to obtain spatial depths, spatial ranks and outliers of multivariate random variables. Could also be used to visualize DD-plots (a multivariate generalization of QQ-plots). dequer An R ‘Deque’ Container Offers a special data structure called a ‘deque’ (pronounced like ‘deck’), which is a list-like structure. However, unlike R’s list structure, data put into a ‘deque’ is not necessarily stored contiguously, making insertions and deletions at the front/end of the structure much faster. The implementation here is new and uses a doubly linked list, and hence does not rely on R’s environments. To avoid unnecessary data copying, most ‘deque’ operations are performed via side-effects. desc Manipulate DESCRIPTION Files Tools to read, write, create, and manipulate DESCRIPTION files. It is intended for packages that create or manipulate other packages. describer Describe Data in R Using Common Descriptive Statistics Allows users to quickly and easily describe data using common descriptive statistics. descriptr Descriptive Statistics & Distributions Exploration Generate descriptive statistics such as measures of location, dispersion, frequency tables, cross tables, group summaries and multiple one/two way tables. Visualize and compute percentiles/probabilities of normal, t, f, chi square and binomial distributions. desctable Produce Descriptive and Comparative Tables Easily Easily create descriptive and comparative tables. It makes use of and integrates directly with the tidyverse family of packages, and pipes.
Tables are produced as data frames/lists of data frames for easy manipulation after creation, and ready to be saved as csv, or piped to DT::datatable() or pander::pander() to integrate into reports. DescToolsAddIns Some Functions to be Used as Shortcuts in RStudio RStudio recently added the option to define addins and assign shortcuts to them. This package contains AddIns for a few of the most used functions in an analyst’s daily work (at least mine), like str(), example(), plot(), head(), view(), Desc(). Most of these functions will get the current selection in RStudio’s editor window and send the specific command to the console while instantly executing it. Assigning shortcuts to these AddIns will spare you quite a few keystrokes. designGLMM Finding Optimal Block Designs for a Generalised Linear Mixed Model Use simulated annealing to find optimal designs for Poisson regression models with blocks. deSolve General Solvers for Initial Value Problems of Ordinary Differential Equations (ODE), Partial Differential Equations (PDE), Differential Algebraic Equations (DAE), and Delay Differential Equations (DDE) Functions that solve initial value problems of a system of first-order ordinary differential equations (ODE), of partial differential equations (PDE), of differential algebraic equations (DAE), and of delay differential equations. The functions provide an interface to the FORTRAN functions lsoda, lsodar, lsode, lsodes of the ODEPACK collection, to the FORTRAN functions dvode and daspk and a C-implementation of solvers of the Runge-Kutta family with fixed or variable time steps. The package contains routines designed for solving ODEs resulting from 1-D, 2-D and 3-D partial differential equations (PDE) that have been converted to ODEs by numerical differencing.
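As a minimal sketch of the ‘deSolve’ interface described above (assuming the package is installed), the logistic growth equation dy/dt = r y (1 - y/K) can be solved with ode(); the parameter values are invented for the example:

```r
library(deSolve)

# Logistic growth: dy/dt = r * y * (1 - y / K)
logistic <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dy <- r * y * (1 - y / K)
    list(dy)  # ode() expects a list whose first element is the derivatives
  })
}

yini  <- c(y = 1)                 # initial state
parms <- c(r = 0.5, K = 100)      # growth rate and carrying capacity
times <- seq(0, 20, by = 0.5)

out <- ode(y = yini, times = times, func = logistic, parms = parms)
head(out)  # matrix with columns: time, y
```

By t = 20 the solution is close to the carrying capacity K = 100, matching the analytical solution.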
DESP Estimation of Diagonal Elements of Sparse Precision-Matrices Several estimators of the diagonal elements of a sparse precision (inverse covariance) matrix from a sample of Gaussian vectors for a given matrix of estimated marginal regression coefficients. To install the ‘gurobi’ package, follow the instructions at http://…/gurobi-optimizer and http://…/r_api_overview.html. desplot Plotting Field Plans for Agricultural Experiments A function for plotting maps of agricultural field experiments that are laid out in grids. detector Detect Data Containing Personally Identifiable Information Allows users to quickly and easily detect data containing Personally Identifiable Information (PII) through convenience functions. DetMCD DetMCD Algorithm (Robust and Deterministic Estimation of Location and Scatter) DetMCD is a new algorithm for robust and deterministic estimation of location and scatter. The benefits of robust and deterministic estimation are explained in Hubert, M., Rousseeuw, P.J. and Verdonck, T. (2012), ‘A deterministic algorithm for robust location and scatter’, Journal of Computational and Graphical Statistics, Volume 21, Number 3, Pages 618-637. DetR Suite of Deterministic and Robust Algorithms for Linear Regression DetLTS, DetMM (and DetS) Algorithms for Deterministic, Robust Linear Regression. devEMF EMF Graphics Output Device Output graphics to EMF (enhanced metafile). devtools Tools to Make Developing R Packages Easier Collection of package development tools. dfCompare Compare Two Dataframes and Return Adds, Changes, and Deletes Compares two dataframes with a common key and returns the delta records. The package will return three dataframes that contain the added, changed, and deleted records. dfphase1 Phase I Control Charts (with Emphasis on Distribution-Free Methods) Statistical methods for retrospectively detecting changes in location and/or dispersion of univariate and multivariate variables.
Data can be individual (one observation at each instant of time) or subgrouped (more than one observation at each instant of time). Control limits are computed, often using a permutation approach, so that a prescribed false alarm probability is guaranteed without making any parametric assumptions on the stable (in-control) distribution. dga Capture-Recapture Estimation using Bayesian Model Averaging Performs Bayesian model averaging for capture-recapture. This includes code to stratify records, check the strata for suitable overlap to be used for capture-recapture, and some functions to plot the estimated population size. dGAselID Genetic Algorithm with Incomplete Dominance for Feature Selection Feature selection from high dimensional data using a diploid genetic algorithm with Incomplete Dominance for genotype to phenotype mapping and Random Assortment of chromosomes approach to recombination. dggridR Discrete Global Grids for R Spatial analyses involving binning require that every bin have the same area, but this is impossible using a rectangular grid laid over the Earth or over any projection of the Earth. Discrete global grids use hexagons, triangles, and diamonds to overcome this issue, overlaying the Earth with equally-sized bins. This package provides utilities for working with discrete global grids, along with utilities to aid in plotting such data. dgo Dynamic Estimation of Group-Level Opinion Fit dynamic group-level IRT and MRP models from individual or aggregated item response data. This package handles common preprocessing tasks and extends functions for inspecting results, poststratification, and quick iteration over alternative models. DHARMa Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models The ‘DHARMa’ package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals from fitted generalized linear mixed models. 
Currently supported are ‘lme4’, ‘glm’ (except quasi-distributions) and ‘lm’ model classes. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and spatial and temporal autocorrelation. dHSIC Independence Testing via Hilbert Schmidt Independence Criterion Contains an implementation of the d-variable Hilbert Schmidt independence criterion and several hypothesis tests based on it. diagis Diagnostic Plot and Multivariate Summary Statistics of Weighted Samples from Importance Sampling Fast functions for effective sample size, weighted multivariate mean and variance computation, and weight diagnostic plot for generic importance sampling type results. diagonals Block Diagonal Extraction or Replacement Several tools for handling block-matrix diagonals and similar constructs are implemented. Block-diagonal matrices can be extracted or removed using two small functions implemented here. In addition, non-square matrices are supported. Block diagonal matrices occur when two dimensions of a data set are combined along one edge of a matrix. For example, trade-flow data in the ‘decompr’ and ‘gvc’ packages have each country-industry combination occur along both edges of the matrix. DiagrammeR Create diagrams and flowcharts using R Create diagrams and flowcharts using R. https://…/DiagrammeR DiallelAnalysisR Diallel Analysis with R Performs Diallel Analysis with R using Griffing’s and Hayman’s approaches. Four different methods (1: Method-I (Parents + F1’s + reciprocals); 2: Method-II (Parents and one set of F1’s); 3: Method-III (One set of F1’s and reciprocals); 4: Method-IV (One set of F1’s only)) and two models (1: Fixed Effects Model; 2: Random Effects Model) can be applied using Griffing’s approach.
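The simulation-based workflow of the ‘DHARMa’ entry above typically looks like the following sketch (assuming the package is installed); the Poisson GLM on simulated data is an invented example:

```r
library(DHARMa)

# An invented Poisson GLM on simulated data
set.seed(1)
d <- data.frame(x = runif(100))
d$y <- rpois(100, lambda = exp(0.5 + 1.5 * d$x))
fit <- glm(y ~ x, family = poisson, data = d)

# Scaled (quantile) residuals via simulation from the fitted model
res <- simulateResiduals(fittedModel = fit)

plot(res)            # QQ-plot plus residual-vs-predicted diagnostics
testDispersion(res)  # test for over/underdispersion
```

Because the data were simulated from the fitted model family, the diagnostics should look unremarkable here; misspecified models show clear departures.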
diceR Diverse Cluster Ensemble in R Performs cluster analysis using an ensemble clustering framework. Results from a diverse set of algorithms are pooled together using methods such as majority voting, K-Modes, LinkCluE, and CSPA. There are options to compare cluster assignments across algorithms using internal and external indices, visualizations such as heatmaps, and significance testing for the existence of clusters. dichromat Color Schemes for Dichromats Collapse red-green or green-blue distinctions to simulate the effects of different types of color-blindness. DidacticBoost A Simple Implementation and Demonstration of Gradient Boosting A basic, clear implementation of tree-based gradient boosting designed to illustrate the core operation of boosting models. Tuning parameters (such as stochastic subsampling, modified learning rate, or regularization) are not implemented. The only adjustable parameter is the number of training rounds. If you are looking for a high performance boosting implementation with tuning parameters, consider the ‘xgboost’ package. didrooRFM Compute Recency Frequency Monetary Scores for your Customer Data This hosts the findRFM function which generates RFM scores on a 1-5 point scale for customer transaction data. The function consumes a data frame with Transaction Number, Customer ID, Date of Purchase (in date format) and Amount of Purchase as the attributes. The function returns a data frame with RFM data for the sales information. diezeit R Interface to the ZEIT ONLINE Content API A wrapper for the ZEIT ONLINE Content API, available at . ‘diezeit’ gives access to articles and corresponding metadata from the ZEIT archive and from ZEIT ONLINE. A personal API key is required for usage. 
DIFboost Detection of Differential Item Functioning (DIF) in Rasch Models by Boosting Techniques Performs detection of Differential Item Functioning using the method DIFboost as proposed in Schauberger and Tutz (2015): Detection of Differential item functioning in Rasch models by boosting techniques, British Journal of Mathematical and Statistical Psychology. difconet Differential Coexpressed Networks Estimation of DIFferential COexpressed NETworks using diverse and user metrics. This package provides three groups of functions related to the estimation of differential coexpression. First, to estimate differential coexpression where the coexpression is estimated, by default, by Spearman correlation. For this, a metric to compare two correlation distributions is needed. The package includes 6 metrics. Some of them need a threshold. A new metric can also be specified as a user function with specific parameters (see difconet.run). The significance is estimated by permutations. Second, to generate datasets with controlled differential correlation data. This is done by either adding noise, or adding specific correlation structure. Third, to show the results of differential correlation analyses. Please see for further information. Difdtl Difference of Two Precision Matrices Estimation Difference of two precision matrices is estimated by the d-trace loss with lasso penalty, given two sample classes. diffobj Diffs for R Objects Generate a colorized diff of two R objects for an intuitive visualization of their differences. diffpriv Easy Differential Privacy An implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006) . Example mechanisms include the Laplace mechanism for releasing numeric aggregates, and the exponential mechanism for releasing set elements.
A sensitivity sampler (Rubinstein & Alda, 2017) permits sampling target non-private function sensitivity; combined with the generic mechanisms, it permits turn-key privatization of arbitrary programs. diffrprojects Projects for Text Version Comparison and Analytics in R Provides data structures and methods for measuring, coding, and analysing text within text corpora. The package allows for manual as well as computer-aided coding on character, token and text pair level. diffrprojectswidget Visualization for ‘diffrprojects’ Interactive visualizations and tabulations for diffrprojects. All presentations are based on the htmlwidgets framework allowing for interactivity via HTML and Javascript, RStudio viewer integration, RMarkdown integration, as well as Shiny compatibility. diffusr Network Diffusion Algorithms Implementation of network diffusion algorithms such as insulated heat propagation or Markov random walks. Network diffusion algorithms generally spread information in the form of node weights along the edges of a graph to other nodes. These weights can for example be interpreted as temperature, an initial amount of water, the activation of neurons in the brain, or the location of a random surfer in the internet. The information (node weights) is iteratively propagated to other nodes until an equilibrium state is reached or a stop criterion is met. difNLR Detection of Dichotomous Differential Item Functioning (DIF) by Non-Linear Regression Function Detection of differential item functioning among dichotomously scored items with a non-linear regression procedure. difR Collection of methods to detect dichotomous differential item functioning (DIF) in psychometrics The difR package contains several traditional methods to detect DIF in dichotomously scored items. Both uniform and non-uniform DIF effects can be detected, with methods relying upon item response models or not. Some methods deal with more than one focal group.
digest Create Cryptographic Hash Digests of R Objects Implementation of a function ‘digest()’ for the creation of hash digests of arbitrary R objects (using the md5, sha-1, sha-256, crc32, xxhash and murmurhash algorithms) permitting easy comparison of R language objects, as well as a function ‘hmac()’ to create hash-based message authentication code. The md5 algorithm by Ron Rivest is specified in RFC 1321, the sha-1 and sha-256 algorithms are specified in FIPS-180-1 and FIPS-180-2, and the crc32 algorithm is described in ftp://ftp.rocksoft.com/clients/rocksoft/papers/crc_v3.txt. For md5, sha-1, sha-256 and aes, this package uses small standalone implementations that were provided by Christophe Devine. For crc32, code from the zlib library is used. For sha-512, an implementation by Aaron D. Gifford is used. For xxHash, the implementation by Yann Collet is used. For murmurhash, an implementation by Shane Day is used. Please note that this package is not meant to be deployed for cryptographic purposes for which more comprehensive (and widely tested) libraries such as OpenSSL should be used. digitize Use Data from Published Plots in R Import data from a digital image; it requires user input for calibration and to locate the data points. The end result is similar to ‘DataThief’ and other programs that ‘digitize’ published plots or graphs. dimple dimple charts for R The aim of dimple is to open up the power and flexibility of d3 to analysts. It aims to give a gentle learning curve and minimal code to achieve something productive. It also exposes the d3 objects so you can pick them up and run to create some really cool stuff. dimRed A Framework for Dimensionality Reduction A collection of dimensionality reduction techniques from R packages and provides a common interface for calling the methods. Directional Directional Statistics A collection of R functions for directional data analysis.
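Returning to the ‘digest’ entry above, a short sketch (assuming the package is installed) of digest() and hmac(); the key and object values are invented for the example:

```r
library(digest)

# Hash an arbitrary R object; identical objects give identical digests
x <- list(a = 1:3, b = "text")
digest(x, algo = "sha256")

# Compare two objects cheaply via their digests
y <- x
identical(digest(x), digest(y))  # TRUE

# Keyed hash-based message authentication code
hmac(key = "secret", object = "message", algo = "sha1")
```

As the entry notes, this is for object comparison and checksums, not for serious cryptographic deployments.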
DirectStandardisation Adjusted Means and Proportions by Direct Standardisation Calculate adjusted means and proportions of a variable by groups defined by another variable by direct standardisation, standardised to the structure of the dataset. dirmcmc Directional Metropolis Hastings Algorithm Implementation of the Directional Metropolis Hastings Algorithm for MCMC. discord Functions for Discordant Kinship Modeling Functions for discordant kinship modeling (and other sibling-based quasi-experimental designs). Currently, the package contains data restructuring functions and functions for generating genetically- and environmentally-informed data for kin pairs. discreteRV Create and Manipulate Discrete Random Variables Create, manipulate, transform, and simulate from discrete random variables. The syntax is modeled after that which is used in mathematical statistics and probability courses, but with powerful support for more advanced probability calculations. This includes the creation of joint random variables, and the derivation and manipulation of their conditional and marginal distributions. http://…/hare-buja-hofmann.pdf DisimForMixed Calculate Dissimilarity Matrix for Dataset with Mixed Attributes Implements the methods proposed by Ahmad & Dey (2007) for calculating the dissimilarity matrix in the presence of mixed attributes. This package includes functions to discretize quantitative variables, and to calculate the conditional probability for each pair of attribute values, the distance between every pair of attribute values, the significance of attributes, and the dissimilarity between each pair of objects. disparityfilter Disparity Filter Algorithm of Weighted Network Disparity filter is a network reduction algorithm to extract the backbone structure of both directed and undirected weighted networks. Disparity filter can reduce the network without destroying the multi-scale nature of the network. The algorithm has been developed by M. 
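The course-style syntax described in the ‘discreteRV’ entry can be sketched as follows; a minimal example assuming the RV()/P()/E() interface from the package documentation:

```r
library(discreteRV)

# A fair six-sided die as a discrete random variable.
X <- RV(outcomes = 1:6, probs = 1/6)

P(X > 4)          # probability of rolling a 5 or 6
E(X)              # expected value, 3.5

# Distribution of the sum of two independent rolls.
S <- SofIID(X, n = 2)
P(S == 7)
```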
Angeles Serrano, Marian Boguna, and Alessandro Vespignani in Extracting the multiscale backbone of complex weighted networks. distance.sample.size Calculates Study Size Required for Distance Sampling Calculates the study size (either number of detections, or proportion of region that should be covered) to achieve a target precision for the estimated abundance. The calculation allows for the penalty due to unknown detection function, and for overdispersion. The user must specify a guess at the true detection function. distances Tools for Distances and Metrics Provides tools for constructing, manipulating and using distance metrics. distcomp Distributed Computations Distcomp, a new R package available on GitHub from a group of Stanford researchers, has the potential to significantly advance the practice of collaborative computing with large data sets distributed over separate sites that may be unwilling to explicitly share data. The fundamental idea is to be able to rapidly set up a web service based on Shiny and OpenCPU technology that manages and performs a series of master/slave computations which require sharing only intermediate results. The particular target application for distcomp is any group of medical researchers who would like to fit a statistical model using the data from several data sets, but face daunting difficulties with data aggregation or are constrained by privacy concerns. Distcomp and its methodology, however, ought to be of interest to any organization with data spread across multiple heterogeneous database environments. DISTRIB Four Essential Functions for Statistical Distributions Analysis: A New Functional Approach A different way of calculating the pdf/pmf, cdf, quantile and random data, in which the user passes the name of the distribution as an argument and can switch distributions simply by changing that argument. It must be mentioned that the core and computation base of package ‘DISTRIB’ is package ‘stats’. 
Although similar functions were introduced previously in package ‘stats’, the package ‘DISTRIB’ has some special applications in certain computational programs. DJL Distance Measure Based Judgment and Learning Implements various decision support tools related to new product development. Subroutines include productivity evaluation using distance measures, benchmarking, risk analysis, technology adoption model, inverse optimization, etc. DLASSO Implementation of Differentiable Lasso Penalty in Linear Models An implementation of the differentiable lasso (dlasso) using an iterative ridge algorithm. This package allows selecting the tuning parameter by AIC, BIC and GCV. dlib Allow Access to the ‘Dlib’ C++ Library Interface for ‘Rcpp’ users to ‘dlib’ which is a ‘C++’ toolkit containing machine learning algorithms and computer vision tools. It is used in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments. This package allows R users to use ‘dlib’ through ‘Rcpp’. dlm Bayesian and Likelihood Analysis of Dynamic Linear Models Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models. dlsem Distributed-Lag Structural Equation Modelling Fit distributed-lag structural equation models and perform path analysis at different time lags. dlstats Download Stats of R Packages Monthly download stats of ‘CRAN’ and ‘Bioconductor’ packages. Download stats of ‘CRAN’ packages are from the ‘RStudio’ ‘CRAN mirror’, see . ‘Bioconductor’ package download stats are at . dml Distance Metric Learning in R The state-of-the-art algorithms for distance metric learning, including global and local methods such as Relevant Component Analysis, Discriminative Component Analysis, Local Fisher Discriminant Analysis, etc. 
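The name-as-argument approach of ‘DISTRIB’ can be sketched in base R; the wrapper names below are illustrative, not the package’s actual API:

```r
# Dispatch to the d/p/q/r functions in 'stats' from a distribution name.
ddist <- function(dist, x, ...) do.call(paste0("d", dist), list(x, ...))
pdist <- function(dist, q, ...) do.call(paste0("p", dist), list(q, ...))

# Switching distributions is just a change of argument:
ddist("norm", 0)                # same as dnorm(0)
ddist("pois", 2, lambda = 1)    # same as dpois(2, lambda = 1)
pdist("exp", 1, rate = 2)       # same as pexp(1, rate = 2)
```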
These distance metric learning methods are widely applied in feature extraction, dimensionality reduction, clustering, classification, information retrieval, and computer vision problems. dmm Dyadic Mixed Model for Pedigree Data Dyadic mixed model analysis with multi-trait responses and pedigree-based partitioning of individual variation into a range of environmental and genetic variance components for individual and maternal effects. dMod Dynamic Modeling and Parameter Estimation in ODE Models The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives. dmutate Mutate Data Frames with Random Variates Work within the ‘dplyr’ workflow to add random variates to your data frame. Variates can be added at any level of an existing column. Also, bounds can be specified for simulated variates. dnc Dynamic Network Clustering Community detection for dynamic networks, i.e., networks measured repeatedly over a sequence of discrete time points, using a latent space approach. DNLC Differential Network Local Consistency Analysis Using Local Moran’s I for detection of differential network local consistency. DNMF Discriminant Non-Negative Matrix Factorization Discriminant Non-Negative Matrix Factorization aims to extend the Non-negative Matrix Factorization algorithm in order to extract features that enforce not only the spatial locality, but also the separability between classes in a discriminant manner. This algorithm is based on the article: Zafeiriou, Stefanos, et al. “Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification.” IEEE Transactions on Neural Networks 17.3 (2006): 683-695. 
docker Wraps Docker Python SDK Allows accessing ‘Docker’ ‘SDK’ from ‘R’ via the ‘Docker’ ‘Python’ ‘SDK’ using the ‘reticulate’ package. This is a very thin wrapper that tries to do very little and get out of the way. The user is expected to know how to use the ‘reticulate’ package to access ‘Python’ modules, and how the ‘Docker’ ‘Python’ ‘SDK’ works. docopulae Optimal Designs for Copula Models A direct approach to optimal designs for copula models based on the Fisher information. Provides flexible functions for building joint PDFs, evaluating the Fisher information and finding Ds-optimal designs. It includes an extensible solution to summation and integration called ‘nint’, functions for transforming, plotting and comparing designs, as well as a set of tools for common low-level tasks. docstring Provides Docstring Capabilities to R Functions Provides the ability to display something analogous to Python’s docstrings within R. By allowing users to document their functions as comments at the beginning of the function, without requiring the function to be put into a package, we allow more users to easily provide documentation for their functions. The documentation can be viewed just like any other help files for functions provided by packages as well. doctr Easily Check Data Consistency and Quality A tool that helps you check the consistency and the quality of data. Like a real doctor, it has functions for examining, diagnosing and assessing the progress of its ‘patients’. document Run ‘roxygen2’ on (Chunks of) Single Code Files Have you ever been tempted to create ‘roxygen2’-style documentation comments for one of your functions that was not part of one of your packages (yet)? This is exactly what this package is about: running ‘roxygen2’ on (chunks of) a single code file. docuSignr Connect to ‘DocuSign’ API Connect to the ‘DocuSign’ REST API, which supports embedded signing and sending of documents. 
docxtractr Extract Tables from Microsoft Word Documents with R docxtractr is an R package for extracting tables out of Word documents (docx). Microsoft Word docx files provide an XML structure that is fairly straightforward to navigate, especially when it applies to Word tables. The docxtractr package provides tools to determine table count, table structure and extract tables from Microsoft Word docx documents. DODR Detection of Differential Rhythmicity Detect differences in rhythmic time series using linear least squares and the robust semi-parametric rfit() method. Differences in harmonic fitting can be detected, as well as differences in the scale of the noise distribution. doFuture Foreach Parallel Adaptor using the Future API of the ‘future’ Package Provides a ‘%dopar%’ adaptor such that any type of futures can be used as backends for the ‘foreach’ framework. domaintools R API interface to the DomainTools API The following functions are implemented: • domaintools_api_key: Get or set DOMAINTOOLS_API_KEY value • domaintools_username: Get or set DOMAINTOOLS_API_USERNAME value • domain_profile: Domain Profile • hosting_history: Hosting History • parsed_whois: Parsed Whois • reverse_ip: Reverse IP • reverse_ns: Reverse Nameserver • shared_ips: Shared IPs • whois: Whois Lookup • whois_history: Whois History doMC Foreach parallel adaptor for the multicore package Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package. DOT Render and Export DOT Graphs in R Renders DOT diagram markup language in R and also provides the possibility to export the graphs in PostScript and SVG (Scalable Vector Graphics) formats. In addition, it supports literate programming packages such as ‘knitr’ and ‘rmarkdown’. DoTC Distribution of Typicality Coefficients Calculation of cluster typicality coefficients as generated by fuzzy k-means clustering. 
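The ‘doMC’ entry can be illustrated with a short sketch (forking is unavailable on Windows, so this assumes a Unix-alike):

```r
library(doMC)             # also attaches foreach

registerDoMC(cores = 2)   # register the multicore backend

# %dopar% now dispatches iterations to the registered backend.
res <- foreach(i = 1:4, .combine = c) %dopar% i^2
res   # 1 4 9 16
```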
dotwhisker Dot-and-Whisker Plots of Regression Coefficients from Tidy Data Frames Quick and easy dot-and-whisker plots of regression models saved in tidy data frames. Dowd Functions Ported from ‘MMR2’ Toolbox Offered in Kevin Dowd’s Book Measuring Market Risk Kevin Dowd’s book Measuring Market Risk is widely read in the area of risk measurement by students and practitioners alike. As he claims, ‘MATLAB’ indeed might have been the most suitable language when he originally wrote the functions, but, with the growing popularity of R, this is no longer entirely valid. As Dowd’s code was not intended to be error free and was mainly for reference, some functions in this package have inherited those errors. An attempt will be made in future releases to identify and correct them. Dowd’s original code can be downloaded from http://www.kevindowd.org/measuring-market-risk/. It should be noted that Dowd offers both ‘MMR2’ and ‘MMR1’ toolboxes. Only ‘MMR2’ was ported to R. ‘MMR2’ is the more recent version of the ‘MMR1’ toolbox and the two have mostly similar functions. The toolbox mainly contains different parametric and non-parametric methods for measurement of market risk as well as backtesting of risk measurement methods. downsize A Tool to Scale Down Large Workflows for Testing Toggles the test and production versions of a large workflow. dparser Port of Dparser Package A scannerless GLR parser/parser generator. Note that GLR stands for ‘generalized LR’, where L stands for ‘left-to-right’ and R stands for ‘rightmost (derivation)’. For more information see . This parser is based on the Tomita (1987) algorithm. (The paper can be found at .) The original dparser package documentation can be found at . This allows you to add mini-languages to R (like RxODE’s ODE mini-language; Wang, Hallow, and James 2015) or to parse other languages like NONMEM to automatically translate them to R code. 
To use this in your code, add a LinkingTo ‘dparser’ in your DESCRIPTION file and instead of using ‘#include ’ use ‘#include ’. This also provides an R-based port of the make_dparser command called ‘mkdparser’. Additionally you can parse an arbitrary grammar within R using the ‘dparse’ function. dplyr A Grammar of Data Manipulation A fast, consistent tool for working with data frame like objects, both in memory and out of memory. dplyrr Utilities for comfortable use of dplyr with databases dplyr is the most powerful package for data handling in R, and it also has the ability to work with databases (see the vignette). But the database functionality in dplyr is still developing. I’m trying to make using dplyr with databases more comfortable through some helper functions; for that purpose, I’ve created the dplyrr package. dplyrXdf dplyr backend for Revolution Analytics xdf files The dplyr package is a popular toolkit for data transformation and manipulation. Over the last year and a half, dplyr has become a hot topic in the R community, for the way in which it streamlines and simplifies many common data manipulation tasks. Out of the box, dplyr supports data frames, data tables (from the data.table package), and the following SQL databases: MySQL/MariaDB, SQLite, and PostgreSQL. However, a feature of dplyr is that it’s extensible: by writing a specific backend, you can make it work with many other kinds of data sources. For example, the development version of the RSQLServer package implements a dplyr backend for Microsoft SQL Server. The dplyrXdf package implements such a backend for the xdf file format, a technology supplied as part of Revolution R Enterprise. All of the data transformation and modelling functions provided with Revolution R Enterprise support xdf files, which allow you to break R’s memory barrier: by storing the data on disk, rather than in memory, they make it possible to work with multi-gigabyte or terabyte-sized datasets. 
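The grammar that the ‘dplyr’ entry describes reads as a pipeline of verbs; a minimal sketch on a built-in dataset:

```r
library(dplyr)

# Summarise fuel efficiency by cylinder count in the built-in mtcars data.
mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg)) %>%
  arrange(desc(mean_mpg))
```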
dplyrXdf brings the benefits of dplyr to xdf files, including support for pipeline notation, all major verbs, and the ability to incorporate xdfs into dplyr pipelines. dpmr Data Package Manager for R Create, install, and summarise data packages that follow the Open Knowledge Foundation’s Data Package Protocol. dprep Data Pre-Processing and Visualization Functions for Classification Data preprocessing techniques for classification. Functions for normalization, handling of missing values, discretization, outlier detection, feature selection, and data visualization are included. DPWeibull Dirichlet Process Weibull Mixture Model for Survival Data Use the Dirichlet process Weibull mixture model and the dependent Dirichlet process Weibull mixture model for survival data with and without competing risks. The Dirichlet process Weibull mixture model is used for data without covariates and the dependent Dirichlet process model is used for regression data. The package is designed to handle exact/right-censored/interval-censored observations without competing risks and exact/right-censored observations for data with competing risks. Inside each cluster of the Dirichlet process, we assume a multiplicative effect of covariates as in the Cox model and the Fine and Gray model. In addition, we provide a wrapper for the DPdensity() function from the R package ‘DPpackage’. This wrapper automatically uses the Low Information Omnibus prior and can model one- and two-dimensional data with a Dirichlet mixture of Gaussian distributions. drake Data Frames in R for Make Efficiently keep your results up to date with your code. drat Drat R Archive Template Creation and Use of R Repositories via two helper functions to insert packages into a repository, and to add repository information to the current R session. Two primary types of repositories are supported: gh-pages at GitHub, as well as local repositories on either the same machine or a local network. Drat is a recursive acronym which stands for Drat R Archive Template. 
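The two helper functions mentioned in the ‘drat’ entry can be sketched like this (the package name, account and paths are hypothetical):

```r
library(drat)

# Maintainer side: insert a built package into a local drat repository.
drat::insertPackage("myPkg_0.1.0.tar.gz", repodir = "~/git/drat")

# User side: register a GitHub-hosted drat repo for the current session;
# install.packages() will then also search it.
drat::addRepo("someAccount")
install.packages("myPkg")
```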
DRaWR Discriminative Random Walk with Restart We present DRaWR, a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types, preserving more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only the relevant properties. We then rerank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. DrBats Data Representation: Bayesian Approach That’s Sparse Feed longitudinal data into a Bayesian Latent Factor Model to obtain a low-rank representation. Parameters are estimated using a Hamiltonian Monte Carlo algorithm with STAN. See G. Weinrott, B. Fontez, N. Hilgert and S. Holmes, ‘Bayesian Latent Factor Model for Functional Data Analysis’, Actes des JdS 2016. DREGAR Regularized Estimation of Dynamic Linear Regression in the Presence of Autocorrelated Residuals (DREGAR) A penalized/non-penalized implementation for dynamic regression in the presence of autocorrelated residuals (DREGAR) using iterative penalized/ordinary least squares. It applies Mallows’ Cp, AIC, BIC and GCV to select the tuning parameters. DrillR R Driver for Apache Drill Provides an R driver for Apache Drill that can connect to an Apache Drill cluster or drillbit, return results from SQL queries as data frames, and check the current configuration status. This link contains more information about Apache Drill. 
DRIP Discontinuous Regression and Image Processing A collection of functions for discontinuous regression analysis and image processing. dsmodels A Language to Facilitate the Creation and Visualization of Two-Dimensional Dynamical Systems An expressive language to facilitate the creation and visualization of two-dimensional dynamical systems. The basic elements of the language are a model wrapping around a function(x,y) which outputs a list(x = xprime, y = yprime), and a range. The language supports three types of visual objects: visualizations, features, and backgrounds. Visualizations, including dots and arrows, depict the behavior of the dynamical system over the entire range. Features display user-defined curves and points, and their images under the system. Backgrounds define and color regions of interest, such as areas of convergence and divergence. The language can also automatically guess attractors and regions of convergence and divergence. dsrTest Tests and Confidence Intervals on Directly Standardized Rates for Several Methods Perform a test of a simple null hypothesis about a directly standardized rate and obtain the matching confidence interval using a choice of methods. DSsim Distance Sampling Simulations Performs distance sampling simulations. It repeatedly generates instances of a user defined population within a given survey region, generates realisations of a survey design (currently these must be pregenerated using Distance software) and simulates the detection process. The data are then analysed so that the results can be compared for accuracy and precision across all replications. This will allow users to select survey designs which will give them the best accuracy and precision given their expectations about population distribution. Any uncertainty in population distribution or population parameters can be included by running the different survey designs for a number of different population descriptions. 
An example simulation can be found in the help file for make.simulation. dst Using Dempster-Shafer Theory This package allows you to make basic probability assignments on a set of possibilities (events) and combine these events with Dempster’s rule of combination. dSVA Direct Surrogate Variable Analysis Functions for direct surrogate variable analysis, which can identify hidden factors in high-dimensional biomedical data. DT R Interface to the jQuery Plug-in DataTables http://rstudio.github.io/DT This package provides a function datatable() to display R data via the DataTables library (N.B. not to be confused with the data.table package). An R interface to the DataTables library dtables Simplifying Descriptive Frequencies and Statistics Towards automation of descriptive frequencies and statistics tables. dtplyr Data Table Back-End for ‘dplyr’ This implements the data table back-end for ‘dplyr’ so that you can seamlessly use data table and ‘dplyr’ together. dtq data.table query Auditing data transformation can be simply described as gathering metadata about the transformation process. The most basic metadata would be a timestamp, atomic transformation description, data volume on input, data volume on output, and time elapsed. If you work with R only interactively, you may find it more like a fancy tool. On the other hand, for automated scheduled R jobs it may be quite helpful to have traceability at a lower grain of processing than just binary success or failure after the script is executed, for example logging each query against the data. Similar features have been available in ETL tools for decades. I’ve addressed this in my dtq package. http://…/dtq.html dtree Decision Trees Combines various decision tree algorithms, plus both linear regression and ensemble methods into one package. Allows for the use of both continuous and categorical outcomes. 
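The datatable() function from the ‘DT’ entry can be called from the console or an R Markdown chunk; a minimal sketch:

```r
library(DT)

# Render a data frame as an interactive DataTables widget:
# sortable, searchable, and paginated in the viewer or browser.
datatable(iris, options = list(pageLength = 5))
```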
An optional feature is to quantify the (in)stability of the decision tree methods, indicating when results can be trusted and when ensemble methods may be preferable. DTRlearn Learning Algorithms for Dynamic Treatment Regimes Dynamic treatment regimens (DTRs) are sequential decision rules tailored at each stage by potentially time-varying patient features and intermediate outcomes observed in previous stages. There are three main types of methods, O-learning, Q-learning and P-learning, to learn optimal Dynamic Treatment Regimes with continuous variables. This package provides these state-of-the-art algorithms to learn DTRs. DTRreg DTR Estimation and Inference via G-Estimation, Dynamic WOLS, and Q-Learning Dynamic treatment regime estimation and inference via G-estimation, dynamic weighted ordinary least squares (dWOLS) and Q-learning. Inference via bootstrap and (for G-estimation) recursive sandwich estimation. dtwclust Time Series Clustering with Dynamic Time Warping Time series clustering using different techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Additionally, an implementation of k-Shape clustering is available. dtwSat Time-Weighted Dynamic Time Warping for Remote Sensing Time Series Analysis Provides a Time-Weighted Dynamic Time Warping (TWDTW) algorithm to measure similarity between two temporal sequences. This adaptation of the classical Dynamic Time Warping (DTW) algorithm is flexible to compare events that have a strong time dependency, such as phenological stages of cropland systems and tropical forests. This package provides methods for visualization of minimum cost paths, time series alignment, and time intervals classification. dwapi A Client for Data.world’s REST API A set of wrapper functions for data.world’s REST API endpoints. DWreg Parametric Regression for Discrete Response Regression for a discrete response, where the conditional distribution is modelled via a discrete Weibull distribution. 
dwtools Data Warehouse related functions Handy wrappers for extraction, loading, denormalization and normalization. Additionally, a data.table Nth-key feature, timing+logging and more. dygraphs Interface to Dygraphs Interactive Time Series Charting Library An R interface to the dygraphs JavaScript charting library (a copy of which is included in the package). Provides rich facilities for charting time-series data in R, including highly configurable series- and axis-display and interactive features like zoom/pan and series/point highlighting. http://…/dygraphs DYM Did You Mean? Add a ‘Did You Mean’ feature to the R interactive session. With this package, error messages for misspelled input of variable names or package names suggest what you really want to do in addition to notification of the mistake. dynamichazard Dynamic Hazard Models using State Space Models Contains functions that let you fit dynamic hazard models with binary outcomes using state space models. The methods are originally described in Fahrmeir (1992) and Fahrmeir (1994). The functions also provide an extension thereof where the Extended Kalman filter is replaced by an Unscented Kalman filter. Models are fitted with a regular coxph()-like formula. dynaTree Dynamic Trees for Learning and Design Inference by sequential Monte Carlo for dynamic tree regression and classification models with hooks provided for sequential design and optimization, fully online learning with drift, variable selection, and sensitivity analysis of inputs. Illustrative examples from the original dynamic trees paper are facilitated by demos in the package; see demo(package=’dynaTree’). dynetNLAResistance Resisting Neighbor Label Attack in a Dynamic Network An anonymization algorithm to resist neighbor label attack in a dynamic network. 
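The charting facilities in the ‘dygraphs’ entry can be sketched with a built-in time series:

```r
library(dygraphs)

# Interactive chart of a monthly time series, with zoom/pan
# and a draggable range selector underneath.
dygraph(ldeaths, main = "Monthly UK Lung Disease Deaths") %>%
  dyRangeSelector()
```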
dynOmics Fast Fourier Transform to Identify Associations Between Time Course Omics Data Implements the fast Fourier transform to estimate delays of expression initiation between trajectories to integrate and analyse time course omics data. dynpanel Dynamic Panel Data Models Computes the first stage GMM estimate of a dynamic linear model with p lags of the dependent variables. dynRB Dynamic Range Boxes Improves the concept of multivariate range boxes, which is highly susceptible to outliers and does not consider the distribution of the data. The package uses dynamic range boxes to overcome these problems. dynsbm Dynamic Stochastic Block Models Dynamic stochastic block model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the node groups through time, developed in Matias and Miele (2016). DynTxRegime Methods for Estimating Dynamic Treatment Regimes A comprehensive toolkit for estimating Dynamic Treatment Regimes. Available methods include Interactive Q-Learning, Q-Learning, and value-search methods based on Augmented Inverse Probability Weighted estimators and Inverse Probability Weighted estimators. DySeq Functions for Dyadic Sequence Analyses A small collection of functions for dyadic binary/dichotomous sequence analyses, e.g. transforming sequences into time-to-event data, an implementation of Bakeman & Gottman’s (1997) approach of aggregated logit-models, and simulating the expected number of low/zero frequencies for state-transition tables. Further functions will be added in future releases. Reference: Bakeman, R., & Gottman, J. M. (1997). DZEXPM Estimation and Prediction of Skewed Spatial Processes A collection of functions designed to estimate and predict skewed spatial processes, and a real data set. 
E e1071 Misc Functions of the Department of Statistics (e1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, … eAnalytics Dynamic Web Analytics for the Energy Industry A ‘Shiny’ web application for energy industry analytics. Take an overview of the industry, measure Key Performance Indicators, identify changes in the industry over time, and discover new relationships in the data. earth Multivariate Adaptive Regression Splines Build regression models using the techniques in Friedman’s papers ‘Fast MARS’ and ‘Multivariate Adaptive Regression Splines’. (The term ‘MARS’ is trademarked and thus not used in the name of the package.) earthtones Derive a Color Palette from a Particular Location on Earth Downloads a satellite image via Google Maps/Earth (these are originally from a variety of aerial photography sources), translates the image into a perceptually uniform color space, runs one of a few different clustering algorithms on the colors in the image searching for a user-supplied number of colors, and returns the resulting color palette. easyDes An Easy Way to Descriptive Analysis Descriptive analysis is essential for publishing medical articles. This package provides an easy way to conduct the descriptive analysis. 1. Both numeric and factor variables can be handled. For numeric variables, a normality test will be applied to choose between parametric and nonparametric tests. 2. Two or more groups can be handled. For more than two groups, post hoc tests will be applied, ‘Tukey’ for the numeric variables and ‘FDR’ for the factor variables. 3. ANOVA or Fisher test can be forced to apply. easyformatr Tools for Building Formats Builds format strings for both times and numbers. easyml Easily Build and Evaluate Machine Learning Models Easily build and evaluate machine learning models on a dataset. 
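Two of the methods listed in the ‘e1071’ entry, support vector machines and the naive Bayes classifier, can be sketched on the built-in iris data:

```r
library(e1071)

# Support vector machine classifier.
fit <- svm(Species ~ ., data = iris)
table(predict(fit, iris), iris$Species)

# Naive Bayes classifier from the same package.
nb <- naiveBayes(Species ~ ., data = iris)
predict(nb, head(iris))
```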
Machine learning models supported include penalized linear models, penalized linear models with interactions, random forest, support vector machines, neural networks, and deep neural networks. EasyMx Easy Model-Builder Functions for OpenMx Utilities for building certain kinds of common matrices and models in the extended structural equation modeling package, OpenMx. easyNCDF Tools to Easily Read/Write NetCDF Files into/from Multidimensional R Arrays Set of wrappers for the ‘ncdf4’ package to simplify and extend its reading/writing capabilities into/from multidimensional R arrays. easypackages Easy Loading and Installing of Packages Easily load and install multiple packages from different sources, including CRAN and GitHub. The libraries function allows you to load or attach multiple packages in the same function call. The packages function will load one or more packages, and install any packages that are not installed on your system (after prompting you). Also included is a from_import function that allows you to import specific functions from a package into the global environment. easypower Sample Size Estimation for Experimental Designs Power analysis is used in the estimation of sample sizes for experimental designs. Most programs and R packages will only output the highest recommended sample size to the user. Often the user input can be complicated and computing multiple power analyses for different treatment comparisons can be time consuming. This package simplifies the user input and allows the user to view all of the sample size recommendations or just the ones they want to see. The calculations used to compute the recommended sample sizes are from the ‘pwr’ package. easyreg Easy Regression Performs analysis of regression in simple designs with quantitative treatments, including mixed models and non-linear models. Plots graphics (equations and data). 
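The libraries() and packages() functions from the ‘easypackages’ entry can be sketched as follows (the package names are arbitrary examples):

```r
library(easypackages)

# Attach several already-installed packages in one call.
libraries("stats", "utils")

# Load one or more packages, offering to install any that are missing.
packages("dplyr", "ggplot2")
```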
easySdcTable Easy Interface to the Statistical Disclosure Control Package ‘sdcTable’ The main function, ProtectTable(), performs table suppression according to a frequency rule with a data set as the only required input. Within this function, protectTable() or protectLinkedTables() in package ‘sdcTable’ is called. Lists of level-hierarchy (parameter ‘dimList’) and other required input to these functions are created automatically. easyVerification Ensemble Forecast Verification for Large Datasets Set of tools to simplify application of atomic forecast verification metrics for (comparative) verification of ensemble forecasts to large datasets. The forecast metrics are imported from the ‘SpecsVerification’ package, and additional forecast metrics are provided with this package. Alternatively, new user-defined forecast scores can be implemented using the example scores provided and applied using the functionality of this package. EBASS Sample Size Calculation Method for Cost-Effectiveness Studies Based on Expected Value of Perfect Information We propose a new sample size calculation method for trial-based cost-effectiveness analyses. Our strategy is based on the value of perfect information that would remain after the completion of the study. EBrank Empirical Bayes Ranking Empirical Bayes ranking applicable to parallel-estimation settings where the estimated parameters are asymptotically unbiased and normal, with known standard errors. A mixture normal prior for each parameter is estimated using Empirical Bayes methods; subsequently, ranks for each parameter are simulated from the resulting joint posterior over all parameters (the marginal posterior densities for each parameter are assumed independent). Finally, experiments are ordered by expected posterior rank, although computations minimizing other plausible rank-loss functions are also given.
ECctmc Simulation from Endpoint-Conditioned Continuous Time Markov Chains Draw sample paths for endpoint-conditioned continuous time Markov chains via modified rejection sampling or uniformization. ecd Elliptic Distribution Based on Elliptic Curves An implementation of the univariate elliptic distribution and elliptic option pricing model. It provides detailed functionality and data sets for the distribution and modelling. In particular, it contains functions for the computation of density, probability, quantile, fitting procedures, option prices, and the volatility smile. It also comes with sample financial data, and plotting routines. ecdfHT Empirical CDF for Heavy Tailed Data Computes and plots a transformed empirical CDF (ecdf) as a diagnostic for heavy tailed data, specifically data with power law decay on the tails. Routines for annotating the plot, comparing data to a model, fitting a nonparametric model, and some multivariate extensions are given. ECharts2Shiny Embedding Charts Generated with ECharts Library into Shiny Applications With this package, users can embed interactive charts into their Shiny applications. These charts will be generated by the ECharts library developed by Baidu (http://echarts.baidu.com). The current version supports line charts, bar charts, pie charts and gauges. ecm Build Error Correction Models Functions for easy building of error correction models (ECM) for time series regression. ecolottery Coalescent-Based Simulation of Ecological Communities Coalescent-based simulation of ecological communities as proposed by Munoz et al. (2017). The package includes a tool for estimating parameters of community assembly by using Approximate Bayesian Computation. EconDemand General Analysis of Various Economics Demand Systems Tools for general properties including price, quantity, elasticity, convexity, marginal revenue and manifold of various economics demand systems including Linear, Translog, CES, LES and CREMR.
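The ‘ecm’ entry above concerns error correction models; as a base-R illustration of the classic two-step (Engle-Granger style) construction that such packages automate (a generic sketch on simulated data, not the package’s own API):

```r
# Generic two-step error correction model (ECM) in base R, on simulated
# cointegrated data; this sketches the idea, not the 'ecm' package API.
set.seed(1)
n <- 200
x <- cumsum(rnorm(n))        # integrated regressor
y <- 0.5 * x + rnorm(n)      # cointegrated response

long_run <- lm(y ~ x)        # step 1: long-run equilibrium relation
ect <- residuals(long_run)   # error correction term

dy <- diff(y)                # step 2: short-run dynamics
dx <- diff(x)
lag_ect <- ect[-length(ect)] # equilibrium error at t - 1
ecm_fit <- lm(dy ~ dx + lag_ect)
coef(ecm_fit)                # lag_ect coefficient = speed of adjustment
```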
ECOSolveR Embedded Conic Solver in R R interface to the Embedded COnic Solver (ECOS) for convex problems. Conic and equality constraints can be specified in addition to mixed integer problems. ecp Nonparametric Multiple Change-Point Analysis of Multivariate Data Implements hierarchical procedures to find multiple change-points through the use of U-statistics. The procedures do not make any distributional assumptions other than the existence of certain absolute moments. Both agglomerative and divisive procedures are included. These methods return the set of estimated change-points as well as other summary information. ecr Evolutionary Computing in R Provides a powerful framework for evolutionary computing in R. The user can easily construct powerful evolutionary algorithms for tackling both single- and multi-objective problems by plugging in different predefined evolutionary building blocks, e.g., operators for mutation, recombination and selection, with just a few lines of code. Your problem cannot be easily solved with a standard EA which works on real-valued vectors, permutations or binary strings? No problem, ‘ecr’ has been developed with that in mind. Extending the framework with your own operators is also possible. Additionally there are various comfort functions, like monitoring, logging and more. edarf Exploratory Data Analysis using Random Forests Functions useful for exploratory data analysis using random forests which can be used to compute multivariate partial dependence, observation, class, and variable-wise marginal and joint permutation importance as well as observation-specific measures of distance (supervised or unsupervised). All of the aforementioned functions are accompanied by ‘ggplot2’ plotting functions. edci Edge Detection and Clustering in Images Detection of edge points in images based on the difference of two asymmetric M-kernel estimators. Linear and circular regression clustering based on redescending M-estimators.
Detection of linear edges in images. edeaR Exploratory and Descriptive Event-Based Data Analysis Functions for exploratory and descriptive analysis of event-based data. Can be used to import and export xes-files, the IEEE eXtensible Event Stream standard. Provides methods for describing and selecting process data. edesign Maximum Entropy Sampling An implementation of maximum entropy sampling for spatial data is provided. An exact branch-and-bound algorithm as well as greedy and dual greedy heuristics are included. edfun Creating Empirical Distribution Functions Easily creating empirical distribution functions from data: ‘dfun’, ‘pfun’, ‘qfun’ and ‘rfun’. edgeCorr Spatial Edge Correction Facilitates basic spatial edge correction to point pattern data. EditImputeCont Simultaneous Edit-Imputation for Continuous Microdata An integrated editing and imputation method for continuous microdata under linear constraints is implemented. It relies on a Bayesian nonparametric hierarchical modeling approach in which the joint distribution of the data is estimated by a flexible joint probability model. The generated edit-imputed data are guaranteed to satisfy all imposed edit rules, whose types include ratio edits, balance edits and range restrictions. editR An Rmarkdown editor with instant preview editR is a basic Rmarkdown editor with instant previewing of your document. It allows you to create and edit Rmarkdown documents while instantly previewing the result of your writing and coding. It also allows you to render your Rmarkdown file in any format permitted by the rmarkdown R package. edpclient Empirical Data Platform Client R client for Empirical Data Platform. More information is at . For support, contact support@empirical.com. edstan Stan Models for Item Response Theory Provides convenience functions and pre-programmed Stan models related to item response theory. Its purpose is to make fitting common item response theory models using Stan easy.
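A hypothetical use of ‘edfun’ above, assuming its constructor edfun() returns the four functions named in the entry:

```r
# Hypothetical 'edfun' usage; the assumption (from the entry) is that
# edfun() returns the empirical 'dfun', 'pfun', 'qfun' and 'rfun'.
library(edfun)
x <- rnorm(1000)
f <- edfun(x)
f$pfun(0)     # empirical P(X <= 0)
f$qfun(0.95)  # empirical 95th percentile
f$rfun(10)    # 10 draws resampled from the data
```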
eefAnalytics Analysing Education Trials Provides tools for analysing education trials. Making different methods accessible in a single place is essential for sensitivity analysis of education trials, particularly the implication of the different methods in analysing simple randomised trials, cluster randomised trials and multisite trials. eel Extended Empirical Likelihood Compute the extended empirical log likelihood ratio (Tsao & Wu, 2014) for the mean and parameters defined by estimating equations. eesim Simulate and Evaluate Time Series for Environmental Epidemiology Provides functions to create simulated time series of environmental exposures (e.g., temperature, air pollution) and health outcomes for use in power analysis and simulation studies in environmental epidemiology. This package also provides functions to evaluate the results of simulation studies based on these simulated time series. This work was supported by a grant from the National Institute of Environmental Health Sciences (R00ES022631) and a fellowship from the Colorado State University Programs for Research and Scholarly Excellence. EFAutilities Utility Functions for Exploratory Factor Analysis A number of utility functions for exploratory factor analysis are included in this package. In particular, it computes standard errors for parameter estimates and factor correlations under a variety of conditions. effectFusion Bayesian Effect Fusion for Categorical Predictors Variable selection and Bayesian effect fusion for categorical predictors in linear regression models. Effect fusion aims at the question which categories have a similar effect on the response and therefore can be fused to obtain a sparser representation of the model. Effect fusion and variable selection can be obtained either with a prior that has an interpretation as spike and slab prior on the level effect differences or with a sparse finite mixture prior on the level effects.
The regression coefficients are estimated with a flat uninformative prior after model selection or are model-averaged. For posterior inference, an MCMC sampling scheme is used that involves only Gibbs sampling steps. EffectLiteR Average and Conditional Effects Use structural equation modeling to estimate average and conditional effects of a treatment variable on an outcome variable, taking into account multiple continuous and categorical covariates. EffectStars Visualization of Categorical Response Models The package provides functions to visualize regression models with categorical response. The effects of the covariates are plotted with star plots in order to allow for an optical impression of the fitted model. EffectTreat Prediction of Therapeutic Success In personalized medicine, one wants to know, for a given patient and his or her outcome for a predictor (pre-treatment variable), how likely it is that a treatment will be more beneficial than an alternative treatment. This package allows for the quantification of the predictive causal association (i.e., the association between the predictor variable and the individual causal effect of the treatment) and related metrics. EfficientMaxEigenpair Efficient Initials for Computing the Maximal Eigenpair An implementation for using efficient initials to compute the maximal eigenpair in R. It provides two algorithms to find the efficient initials under two cases: the tridiagonal matrix case and the general matrix case. It also provides algorithms for the next-to-maximal eigenpair under these two cases. efflog The Causal Effects for a Causal Loglinear Model Fitting a causal loglinear model and calculating the causal effects for a causal loglinear model with or without the multiplicative interaction, obtaining the natural direct, indirect and total effects. It also calculates the cell effect, which is a new interaction effect.
EFS Tool for Ensemble Feature Selection Provides a function to check the importance of a feature based on a dependent classification variable. An ensemble of correlation and importance measure tests are used to determine the normed importance value of all features. Combining these methods in one function (building the sum of the importance values) leads to a better tool for selecting the most important features. The selection can also be viewed as a barplot using the barplot_fs() function and validated with the provided logistic regression function logreg_test(). elasso Enhanced Least Absolute Shrinkage Operator Performs some enhanced variable selection algorithms based on the least absolute shrinkage operator for regression models. elasticsearchr A Lightweight Interface for Interacting with Elasticsearch from R A lightweight R interface to ‘Elasticsearch’ – a NoSQL search-engine and column store database (see for more information). This package implements a simple Domain-Specific Language (DSL) for indexing, deleting, querying, sorting and aggregating data using ‘Elasticsearch’. elhmc Sampling from an Empirical Likelihood Bayesian Posterior of Parameters Using Hamiltonian Monte Carlo A tool to draw samples from an Empirical Likelihood Bayesian posterior of parameters using Hamiltonian Monte Carlo. ELMR Extreme Machine Learning (ELM) Training and prediction functions are provided for the Extreme Learning Machine algorithm (ELM). The ELM uses a Single Hidden Layer Feedforward Neural Network (SLFN) with randomly generated weights and no gradient-based backpropagation. The training time is very short and the online version allows updating the model with small chunks of the training set at each iteration. The only parameters to tune are the hidden layer size and the learning function. EloChoice Preference Rating for Visual Stimuli Based on Elo Ratings Allows calculating global scores for characteristics of visual stimuli.
Stimuli are presented as a sequence of pairwise comparisons (‘contests’), during each of which a rater expresses preference for one stimulus over the other. The algorithm for calculating global scores is based on Elo rating, which updates individual scores after each single pairwise contest. Elo rating is widely used to rank chess players according to their performance. Its core feature is that dyadic contests with expected outcomes lead to smaller changes of participants’ scores than outcomes that were unexpected. As such, Elo rating is an efficient tool to rate individual stimuli when a large number of such stimuli are paired against each other in the context of experiments where the goal is to rank stimuli according to some characteristic of interest. elpatron Bicycling Data Analysis with R Functions to facilitate cycling analysis within the R environment. EMAtools Data Management Tools for Real-Time Monitoring/Ecological Momentary Assessment Data Provides data management functions common in real-time monitoring (also called: ecological momentary assessment, experience sampling, micro-longitudinal) data, including centering on participant means and merging event-level data into momentary data sets where you need the events to correspond to the nearest data point in the momentary data. This is VERY early release software, and more features will be added over time. EMbC Expectation-Maximization Binary Clustering Unsupervised, multivariate, clustering algorithm yielding a meaningful binary clustering taking into account the uncertainty in the data. A specific constructor for trajectory movement analysis yields behavioural annotation of the tracks based on estimated local measures of velocity and turning angle, eventually with solar position covariate as a daytime indicator. EMCC Evolutionary Monte Carlo (EMC) Methods for Clustering Evolutionary Monte Carlo methods for clustering, temperature ladder construction and placement.
This package implements methods introduced in Goswami, Liu and Wong (2007). The paper introduced probabilistic genetic-algorithm-style crossover moves for clustering and applied the algorithm to several clustering problems including Bernoulli clustering, biological sequence motif clustering, BIC-based variable selection, and mixture of Normals clustering, showing that the proposed algorithm performed better both as a sampler and as a stochastic optimizer than the existing tools, namely, Gibbs sampling, the “split-merge” Metropolis-Hastings algorithm, K-means clustering, and the MCLUST algorithm (in the package ‘mclust’). Emcdf Computation and Visualization of Empirical Joint Distribution (Empirical Joint CDF) Computes and visualizes the empirical joint distribution of multivariate data with optimized algorithms and multi-thread computation. There is a faster algorithm using dynamic programming to compute the whole empirical joint distribution of bivariate data. There are optimized algorithms for computing empirical joint CDF function values for other multivariate data. Visualization is focused on bivariate data. Levelplots and wireframes are included. emdi Estimating and Mapping Disaggregated Indicators Functions that support estimating, assessing and mapping regional disaggregated indicators. So far, estimation methods comprise the model-based approach Empirical Best Prediction (see ‘Small area estimation of poverty indicators’ by Molina and Rao (2010)), as well as their precision estimates. The assessment of the used model is supported by a summary and diagnostic plots. For a suitable presentation of estimates, map plots can be easily created. Furthermore, results can easily be exported to Excel. emIRT EM Algorithms for Estimating Item Response Theory Models Various Expectation-Maximization (EM) algorithms are implemented for item response theory (IRT) models.
The current implementation includes IRT models for binary and ordinal responses, along with dynamic and hierarchical IRT models with binary responses. The latter two models are derived and implemented using variational EM. eMLEloglin Fitting log-Linear Models in Sparse Contingency Tables Log-linear modeling is a popular method for the analysis of contingency table data. When the table is sparse, the data can fall on the boundary of the convex support, and we say that ‘the MLE does not exist’ in the sense that some parameters cannot be estimated. However, an extended MLE always exists, and a subset of the original parameters will be estimable. The ‘eMLEloglin’ package determines which sampling zeros contribute to the non-existence of the MLE. These problematic zero cells can be removed from the contingency table and the model can then be fit (as far as is possible) using the glm() function. EMMIXcontrasts2 Contrasts in Mixed Effects for EMMIX Model with Random Effects 2 For forming contrasts in the mixed effects for mixtures of linear mixed models fitted to the gene profiles. EMMIXcskew Fitting Mixtures of CFUST Distributions Functions to fit finite mixture of multivariate canonical fundamental skew t (FM-CFUST) distributions, random sample generation, 2D and 3D contour plots. EMMLi A Maximum Likelihood Approach to the Analysis of Modularity Fit models of modularity to morphological landmarks. Perform model selection on results. Fit models with a single within-module correlation or with separate within-module correlations fitted to each module. emojifont Emoji Fonts for using in R An implementation of using emoji font in both base and ‘ggplot2’ graphics. ems Epimed Solutions Collection for Data Editing, Analysis, and Benchmarking of Health Units Collection of functions for data analysis and editing. Most of them are related to benchmarking with prediction models. 
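As the ‘eMLEloglin’ entry above notes, log-linear models can be fit with glm(); a minimal base-R example on a built-in contingency table:

```r
# Log-linear modelling via glm(), as referenced in the 'eMLEloglin'
# entry: a Poisson GLM on the cell counts of a contingency table.
tab <- as.data.frame(UCBAdmissions)  # Admit x Gender x Dept counts
fit <- glm(Freq ~ Admit + Gender + Dept + Admit:Dept + Gender:Dept,
           family = poisson, data = tab)
anova(fit, test = "Chisq")           # chi-squared tests for each term
```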
EMSaov The Analysis of Variance with EMS The analysis of variance table including the expected mean squares (EMS) for various types of experimental design is provided. When some variables are random effects or we use special experimental designs such as nested designs, repeated-measures designs, or split-plot designs, it is not easy to find the appropriate test, especially the denominator of the F-statistic, which depends on the EMS. EMSC Extended Multiplicative Signal Correction Background correction of spectral-like data. Handles variations in scaling, polynomial baselines and interferents. Parameters for corrections are stored for further analysis, and spectra are corrected accordingly. emuR Main Package of the EMU Speech Database Management System Provides the next iteration of the EMU Speech Database Management System (EMU_SDMS) with database management, data extraction, data preparation and data visualization facilities. encode Represent Ordered Lists and Pairs as Strings Interconverts between ordered lists and compact string notation. Useful for capturing code lists, and pair-wise codes and decodes, for text storage. Analogous to factor levels and labels. Generics ‘encode’ and ‘decode’ perform interconversion, while ‘codes’ and ‘decodes’ extract components of an encoding. The function ‘encoded’ checks whether something is interpretable as an encoding. endogenous Classical Simultaneous Equation Models Likelihood-based approaches to estimate linear regression parameters and treatment effects in the presence of endogeneity. Specifically, this package includes James Heckman’s classical simultaneous equation models: the sample selection model for outcome selection bias and the hybrid model with structural shift for endogenous treatment. For more information, see the seminal paper of Heckman (1978) in which the details of these models are provided. This package accommodates repeated measures on subjects with a working independence approach.
The hybrid model further accommodates treatment effect modification. endtoend Transmissions and Receptions in an End to End Network Computes the expectation of the number of transmissions and receptions considering an End-to-End transport model with a limited number of retransmissions per packet. It provides theoretical results and also estimated values based on Monte Carlo simulations. enpls Ensemble Partial Least Squares (EnPLS) Regression R package for ensemble partial least squares regression, a unified framework for feature selection, outlier detection, and ensemble learning. enrichwith Methods to Enrich R Objects with Extra Components The enrichwith package provides the ‘enrich’ method to enrich list-like R objects with new, relevant components. The current version has methods for enriching objects of class ‘family’, ‘link-glm’ and ‘glm’. The resulting objects preserve their class, so all methods associated with them still apply. The package can also be used to produce customisable source code templates for the structured implementation of methods to compute new components. EnsembleCV Extensible Package for Cross-Validation-Based Integration of Base Learners This package extends the base classes and methods of the EnsembleBase package for cross-validation-based integration of base learners. The default implementation calculates the average of repeated CV errors, and selects the base learner / configuration with minimum average error. The package takes advantage of the file method provided in the EnsembleBase package for writing estimation objects to disk in order to circumvent the RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. The package can be extended, e.g. by adding variants of the current implementation.
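The ‘EMSaov’ entry earlier concerns choosing F-test denominators from expected mean squares; base R’s aov() illustrates the same issue for a split-plot design via Error() strata (simulated data, for illustration only):

```r
# Why the F denominator matters (cf. the 'EMSaov' entry): in a split-plot
# design the whole-plot factor A must be tested against the Block:A
# stratum, which aov() handles through an Error() term.
set.seed(42)
d <- expand.grid(Block = factor(1:4), A = factor(1:3), B = factor(1:2))
d$y <- rnorm(nrow(d)) + as.numeric(d$A)

fit <- aov(y ~ A * B + Error(Block/A), data = d)
summary(fit)  # A appears in the Block:A stratum, B and A:B in Within
```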
EnsemblePCReg Extensible Package for Principal-Component-Regression-based Integration of Base Learners This package extends the base classes and methods of EnsembleBase package for Principal-Components-Regression-based (PCR) integration of base learners. Default implementation uses cross-validation error to choose the optimal number of PC components for the final predictor. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in EnsembleBase package as well as this package. EnsemblePenReg Extensible Classes and Methods for Penalized-Regression-based Integration of Base Learners Extending the base classes and methods of EnsembleBase package for Penalized-Regression-based (Ridge and Lasso) integration of base learners. Default implementation uses cross-validation error to choose the optimal lambda (shrinkage parameter) for the final predictor. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in EnsembleBase package as well as this package. ensembleR Ensemble Models in R Functions to use ensembles of several machine learning models specified in caret package. 
EntropyExplorer Tools for Exploring Differential Shannon Entropy, Differential Coefficient of Variation and Differential Expression Rows of two matrices are compared for Shannon entropy, coefficient of variation, and expression. P-values can be requested for all metrics. envestigate R package to interrogate environments. Scary, I know. EnviroPRA Environmental Probabilistic Risk Assessment Tools Methods to perform a probabilistic environmental risk assessment from exposure to toxic substances (i.e., USEPA, 1997). epandist Statistical Functions for the Censored and Uncensored Epanechnikov Distribution Analyzing censored variables usually requires the use of optimization algorithms. This package provides an alternative algebraic approach to the task of determining the expected value of a random censored variable with a known censoring point. Likewise this approach allows for the determination of the censoring point if the expected value is known. These results are derived under the assumption that the variable follows an Epanechnikov kernel distribution with known mean and range prior to censoring. Statistical functions related to the uncensored Epanechnikov distribution are also provided by this package. EPGLM Gaussian Approximation of Bayesian Binary Regression Models The main functions compute the expectation propagation approximation of Bayesian probit/logit models with a Gaussian prior. More information can be found in Chopin and Ridgway (2015). More models and priors should follow. EpistemicGameTheory Constructing an Epistemic Model for the Games with Two Players Constructing an epistemic model such that, for every player i and for every choice c(i) which is optimal, there is one type that expresses common belief in rationality. EpiWeek Conversion Between Epidemiological Weeks and Calendar Dates Users can easily derive the calendar dates from epidemiological weeks, and vice versa.
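For ‘EpiWeek’-style conversions, base R can approximate with ISO-8601 week numbers (note that epidemiological week definitions vary, e.g. CDC weeks start on Sunday, so this is only a rough stand-in, not the package’s API):

```r
# Rough base-R stand-in for week/date conversion; ISO-8601 weeks, which
# differ slightly from some epidemiological week definitions.
d <- as.Date("2017-03-15")
format(d, "%G-W%V")          # ISO year-week label, here "2017-W11"
as.integer(format(d, "%u"))  # ISO weekday (1 = Monday)
```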
equSA Estimate a Single or Multiple Graphical Models and Construct Networks Provides an equivalent measure of partial correlation coefficients for high-dimensional Gaussian Graphical Models to learn and visualize the underlying relationships between variables from single or multiple datasets. You can refer to Liang, F., Song, Q. and Qiu, P. (2015) for more detail. Based on this method, the package also provides the method for constructing networks for Next Generation Sequencing Data. It also includes a method for jointly estimating Gaussian Graphical Models of multiple datasets. ercv Fitting Tails by the Empirical Residual Coefficient of Variation Provides a simple and trustworthy methodology for the analysis of extreme values and multiple threshold tests for a generalized Pareto distribution, together with an automatic threshold selection algorithm. See del Castillo, J, Daoudi, J and Lockhart, R (2014). ergm.rank Fit, Simulate and Diagnose Exponential-Family Models for Rank-Order Relational Data A set of extensions for the ‘ergm’ package to fit weighted networks whose edge weights are ranks. errint Build Error Intervals Build and analyze error intervals for a particular model’s predictions assuming different distributions for noise in the data. errorizer Function Errorizer Provides a function to convert existing R functions into ‘errorized’ versions with added logging and handling functionality when encountering errors or warnings. The errorize function accepts an existing R function as its first argument and returns an R function with the exact same arguments and functionality. However, if an error or warning occurs when running that ‘errorized’ R function, it will save a .Rds file to the current working directory with the relevant objects and information required to immediately recreate the error. errorlocate Locate Errors with Validation Rules Errors in data can be located and removed using validation rules from package ‘validate’.
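A base-R sketch of the idea behind ‘errorizer’ (the real package’s errorize() differs in detail): wrap a function so that failing calls are snapshotted before the error is re-raised:

```r
# Base-R sketch of function 'errorizing': capture the arguments and the
# condition to an .Rds file when the wrapped call fails, then re-raise.
with_error_snapshot <- function(f, file = "error_snapshot.Rds") {
  function(...) {
    tryCatch(f(...), error = function(e) {
      saveRDS(list(args = list(...), condition = e), file)
      stop(e)
    })
  }
}

safe_log <- with_error_snapshot(log)
safe_log(10)       # behaves exactly like log(10)
# safe_log("a")    # would snapshot the call, then signal the error
```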
errors Error Propagation for R Vectors Support for painless automatic error propagation in numerical operations. esaBcv Estimate Number of Latent Factors and Factor Matrix for Factor Analysis These functions estimate the latent factors of a given matrix, whether or not it is high-dimensional. They first estimate the number of factors using bi-cross-validation and then estimate the latent factor matrix and the noise variances. For more information about the method, see Art B. Owen and Jingshu Wang’s 2015 archived article on factor models (http://…/1503.03515 ). esaddle Extended Empirical Saddlepoint Density Approximation Tools for fitting the Extended Empirical Saddlepoint (EES) density. esc Effect Size Computation for Meta Analysis Implementation of the web-based ‘Practical Meta-Analysis Effect Size Calculator’ from David B. Wilson in R. Based on the input, the effect size can be returned as standardized mean difference, Hedges’ g, correlation coefficient r or Fisher’s transformation z, odds ratio or log odds effect size. eshrink Shrinkage for Effect Estimation Computes shrinkage estimators for regression problems. Selects the penalty parameter by minimizing bias and variance in the effect estimate, where bias and variance are estimated from the posterior predictive distribution. ESKNN Ensemble of Subset of K-Nearest Neighbours Classifiers for Classification and Class Membership Probability Estimation Functions for classification and group membership probability estimation are given. The issue of non-informative features in the data is addressed by utilizing the ensemble method. A few optimal models are selected in the ensemble from an initially large set of base k-nearest neighbours (KNN) models, generated on subsets of features from the training data. A two stage assessment is applied in selection of optimal models for the ensemble in the training function.
The prediction functions for classification and class membership probability estimation return class outcomes and class membership probability estimates for the test data. The package includes measures of classification error and Brier score for the classification and probability estimation tasks, respectively. EstHer Estimation of Heritability in High Dimensional Sparse Linear Mixed Models using Variable Selection Our method is a variable selection method to select active components in sparse linear mixed models in order to estimate the heritability. The selection allows us to reduce the size of the data sets, which improves the accuracy of the estimations. Our package also provides a confidence interval for the estimated heritability. estimability Estimability Tools for Linear Models Provides tools for determining estimability of linear functions of regression coefficients, and alternative epredict methods for lm, glm, and mlm objects that handle non-estimable cases correctly. EstimateGroupNetwork Perform the Joint Graphical Lasso and Selects Tuning Parameters Can be used to simultaneously estimate networks (Gaussian Graphical Models) in data from different groups or classes via the Joint Graphical Lasso. Tuning parameters are selected via information criteria (AIC / BIC / eBIC) or cross-validation. EstSimPDMP Estimation and Simulation for PDMPs This package deals with the estimation of the jump rate for piecewise-deterministic Markov processes (PDMPs), from only one observation of the process within a long time. The main functions provide an estimate of this jump rate. The state space may be discrete or continuous. The associated paper has been published in the Scandinavian Journal of Statistics and is given in the references. Other functions provide a method to simulate random variables from their (conditional) hazard rate, and then to simulate PDMPs. etrunct Computes Moments of Univariate Truncated t Distribution Computes moments of the univariate truncated t distribution.
There is only one exported function, e_trunct(); see its documentation for details. eulerr Area-Proportional Euler Diagrams If possible, generates exactly area-proportional Euler diagrams, or otherwise approximately proportional diagrams using numeric optimization. An Euler diagram is a generalization of a Venn diagram, relaxing the criterion that all interactions need to be represented. EvaluationMeasures Collection of Model Evaluation Measure Functions Provides some of the most important evaluation measures for evaluating a model. Given the real and predicted classes, measures such as accuracy, sensitivity, specificity, PPV, NPV, F-measure and MCC are returned. evaluator Information Security Quantified Risk Assessment Toolkit An open source information security strategic risk analysis toolkit based on the OpenFAIR taxonomy and risk assessment standard. Empowers an organization to perform a quantifiable, repeatable, and data-driven review of its security program. evclass Evidential Distance-Based Classification Different evidential distance-based classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule and the evidential neural network. evclust Evidential Clustering Various clustering algorithms that produce a credal partition, i.e., a set of Dempster-Shafer mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership uncertainty of the objects. The algorithms are: Evidential c-Means (ECM), Relational Evidential c-Means (RECM), Constrained Evidential c-Means (CECM), EVCLUS and EK-NNclus. event Event History Procedures and Models Functions for setting up and analyzing event history data. eventdataR Event Data Repository Event dataset repository including both real-life and artificial event logs. They can be used in combination with functionalities provided by the ‘bupaR’ packages ‘edeaR’, ‘processmapR’, etc.
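The measures listed for ‘EvaluationMeasures’ above are all simple functions of the binary confusion matrix. A minimal Python sketch of the idea (an illustration of the formulas, not the package's R API):

```python
import math

def confusion(actual, predicted, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

def metrics(actual, predicted):
    tp, tn, fp, fn = confusion(actual, predicted)
    sens = tp / (tp + fn)  # sensitivity (recall)
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sens,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "fmeasure": 2 * ppv * sens / (ppv + sens),
        "mcc": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }
```

Unlike accuracy, MCC stays informative under heavy class imbalance, which is why such packages report it alongside the simpler rates.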
evidenceFactors Reporting Tools for Sensitivity Analysis of Evidence Factors in Observational Studies Integrated sensitivity analysis of evidence factors in observational studies. Evomorph Evolutionary Morphometric Simulation Evolutionary process simulation using geometric morphometric data. Manipulation of landmark data files (TPS), shape plotting and distance plotting functions. evoper Evolutionary Parameter Estimation for ‘Repast Simphony’ Models The EvoPER, Evolutionary Parameter Estimation for the ‘Repast Simphony’ Agent-Based framework, provides optimization-driven parameter estimation methods based on evolutionary computation techniques, which can be more efficient and require, in some cases, fewer model evaluations than alternatives relying on experimental design. EW Edgeworth Expansion Edgeworth expansion calculation. exampletestr Help for Writing Tests Based on Function Examples Takes the examples written in your function documentation and uses them to create shells (skeletons which must be manually completed by the user) of test files to be tested with the ‘testthat’ package. Documentation must be done with ‘roxygen2’. ExcessMass Excess Mass Calculation and Plots Implementation of a function which calculates the empirical excess mass for a given lambda and a given maximal number of modes (excessm()). Offers powerful plot features to visualize empirical excess mass (exmplot()), including the possibility of drawing several plots (with different maximal numbers of modes / cut-off values) in a single graph. exif Read EXIF Metadata from JPEGs Extracts Exchangeable Image File Format (EXIF) metadata, such as camera make and model, ISO speed and the date-time the picture was taken, from JPEG images. Incorporates the ‘easyexif’ (https://…/easyexif ) library. exifr EXIF Image Data in R Reads EXIF data using ExifTool and returns results as a data frame.
ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files. ExifTool supports many different metadata formats including EXIF, GPS, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, as well as the maker notes of many digital cameras by Canon, Casio, FLIR, FujiFilm, GE, HP, JVC/Victor, Kodak, Leaf, Minolta/Konica-Minolta, Motorola, Nikon, Nintendo, Olympus/Epson, Panasonic/Leica, Pentax/Asahi, Phase One, Reconyx, Ricoh, Samsung, Sanyo, Sigma/Foveon and Sony. expandFunctions Feature Matrix Builder Generates feature matrix outputs from R object inputs using a variety of expansion functions. The generated feature matrices have applications as inputs for a variety of machine learning algorithms. The expansion functions are based on coercing the input to a matrix, treating the columns as features and converting individual columns or combinations into blocks of columns. Currently these include expansion of columns by efficient sparse embedding by vectors of lags, quadratic expansion into squares and unique products, powers by vectors of degree, vectors of orthogonal polynomial functions, and block random affine projection transformations (RAPTs). The transformations are magrittr- and cbind-friendly, and can be used in a building-block fashion. For instance, taking the cos() of the output of the RAPT transformation generates a stationary kernel expansion via Bochner’s theorem, and this expansion can then be cbind-ed with other features. Additionally, there are utilities for replacing features, removing rows with NAs, creating matrix samples of a given distribution, a simple wrapper for LASSO with CV, a Freeman-Tukey transform, generalizations of the outer function, matrix size-preserving discrete difference by row, plotting, etc.
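The remark above about taking cos() of a RAPT output is an instance of random Fourier features: by Bochner's theorem, cos(w·x + b) with Gaussian weights w and uniform phases b approximates the RBF kernel in expectation. A small Python sketch of that general idea (dimensions and names are illustrative, not the package's API):

```python
import math
import random

def make_projection(dim_in, dim_out, rng):
    """Gaussian weights and uniform phases; this pairing targets
    the RBF kernel exp(-|x - y|^2 / 2)."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]
    biases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim_out)]
    return weights, biases

def rapt_cos_features(x, weights, biases):
    """Random affine projection followed by cos(), scaled so that
    the feature inner product estimates the kernel."""
    d = len(weights)
    return [math.sqrt(2.0 / d)
            * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, biases)]

rng = random.Random(42)
weights, biases = make_projection(2, 4000, rng)
x, y = [0.3, -0.1], [0.0, 0.4]
zx = rapt_cos_features(x, weights, biases)
zy = rapt_cos_features(y, weights, biases)
approx = sum(a * b for a, b in zip(zx, zy))  # feature-space inner product
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)  # RBF kernel
```

With 4000 features the inner product lands close to the exact kernel value; the approximation error shrinks roughly as one over the square root of the feature count.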
ExpDE Modular Differential Evolution for Experimenting with Operators Modular implementation of the Differential Evolution algorithm for experimenting with different types of operators. expint Exponential Integral and Incomplete Gamma Function The exponential integrals E_1(x), E_2(x), E_n(x) and Ei(x), and the incomplete gamma function G(a, x) defined for negative values of its first argument. The package also gives easy access to the underlying C routines through an API; see the package vignette for details. A test package included in sub-directory example_API provides an implementation. C routines derived from the GNU Scientific Library. ExplainPrediction Explanation of Predictions for Classification and Regression Models Contains methods to generate explanations for individual predictions of classification and regression models. Weighted averages of individual explanations form an explanation of the whole model. The package extends the ‘CORElearn’ package, but other prediction models can also be explained using a wrapper. explor Interactive Interfaces for Results Exploration Shiny interfaces and graphical functions for multivariate analysis results exploration. exploreR Tools for Quickly Exploring Data Simplifies some complicated and labor-intensive processes involved in exploring and explaining data. Allows you to quickly and efficiently visualize the interaction between variables and simplifies the process of discovering covariation in your data. Also includes some convenience features designed to remove as much redundant typing as possible. expm Matrix exponential Computation of the matrix exponential and related quantities. ExpRep Experiment Repetitions Calculates the probabilities of occurrence of an event in a large number of repetitions of a Bernoulli experiment, through the application of the local and integral theorems of De Moivre-Laplace, and the theorem of Poisson.
Results can be shown graphically and analytically, and the results obtained by application of the above theorems can be compared with those calculated by direct application of the binomial formula. The package is mainly useful for educational purposes. expss Some Useful Functions from Spreadsheets and ‘SPSS’ Statistics Implements several popular functions from Excel (‘COUNTIF’, ‘VLOOKUP’, etc.) and ‘SPSS’ Statistics (‘RECODE’, ‘COUNT’, etc.). There are also functions for basic tables with value label/variable label support. The package aims to help people move data processing from Excel/’SPSS’ to R. exreport Fast, Reliable and Elegant Reproducible Research Analysis of experimental results and automatic report generation in both interactive HTML and LaTeX. This package ships with a rich interface for data modeling and built-in functions for the rapid application of statistical tests and generation of common plots and tables with publication-ready quality. EXRQ Extreme Regression of Quantiles Estimation for high conditional quantiles based on quantile regression. ExtDist Extending the Range of Functions for Probability Distributions A consistent, unified and extensible framework for estimation of parameters for probability distributions, including parameter estimation procedures that allow for weighted samples; the current set of distributions included are: the standard beta, the four-parameter beta, Burr, gamma, Gumbel, Johnson SB and SU, Laplace, logistic, normal, symmetric truncated normal, truncated normal, symmetric-reflected truncated beta, standard symmetric-reflected truncated beta, triangular, uniform, and Weibull distributions; decision criteria and selections based on these decision criteria.
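The two approximation theorems that ‘ExpRep’ above applies can be illustrated in a few lines of stdlib Python (a sketch of the mathematics, not the package itself): the local De Moivre-Laplace theorem replaces the binomial pmf with a normal density, and the Poisson theorem applies when n is large and p is small.

```python
import math

def binom_pmf(n, k, p):
    """Exact binomial probability of k successes in n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def de_moivre_laplace(n, k, p):
    """Local De Moivre-Laplace theorem: normal density approximation to the pmf."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return math.exp(-((k - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def poisson_approx(n, k, p):
    """Poisson theorem: accurate when n is large and p is small (lambda = n*p)."""
    lam = n * p
    return math.exp(-lam) * lam**k / math.factorial(k)

# Large n, moderate p: the normal approximation is close to the exact value.
print(binom_pmf(1000, 500, 0.5), de_moivre_laplace(1000, 500, 0.5))
# Large n, small p: the Poisson approximation is close.
print(binom_pmf(1000, 3, 0.002), poisson_approx(1000, 3, 0.002))
```

Comparing the exact binomial value with both approximations, as the package does graphically, makes their respective regimes of validity obvious.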
exteriorMatch Constructs the Exterior Match from Two Matched Control Groups If one treated group is matched to one control reservoir in two different ways to produce two sets of treated-control matched pairs, then the two control groups may be entwined, in the sense that some control individuals are in both control groups. The exterior match is used to compare the two control groups. extracat Categorical Data Analysis and Visualization Categorical data analysis and visualization. ExtremeBounds ExtremeBounds: Extreme Bounds Analysis in R An implementation of Extreme Bounds Analysis (EBA), a global sensitivity analysis that examines the robustness of determinants in regression models. The package supports both Leamer’s and Sala-i-Martin’s versions of EBA, and allows users to customize all aspects of the analysis. extremefit Estimation of Extreme Conditional Quantiles and Probabilities Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities. extremeStat Extreme Value Statistics and Quantile Estimation Code to fit, plot and compare several (extreme value) distribution functions. Can also compute (truncated) distribution quantile estimates and draw a plot with return periods on a linear scale. extremogram Estimation of Extreme Value Dependence for Time Series Data Estimation of the sample univariate, cross and return time extremograms. The package can also add empirical confidence bands to each of the extremogram plots via a permutation procedure under the assumption that the data are independent. Finally, the stationary bootstrap allows construction of credible confidence bands for the extremograms. ezknitr Avoid the Typical Working Directory Pain When Using ‘knitr’ An extension of ‘knitr’ that adds flexibility in several ways.
One common source of frustration with ‘knitr’ is that it assumes the directory where the source file lives should be the working directory, which is often not true. ‘ezknitr’ addresses this problem by giving you complete control over where all the inputs and outputs are, and adds several other convenient features to make rendering markdown/HTML documents easier. ezsummary Summarise Data in the Quick and Easy Way Functions that fill the gap between the outcomes of ‘dplyr’ and a print-ready summary table. F fabCI FAB Confidence Intervals Frequentist assisted by Bayes (FAB) confidence interval construction. See ‘Adaptive multigroup confidence intervals with constant coverage’ by Yu and Hoff. face Fast Covariance Estimation for Sparse Functional Data Fast covariance estimation for sparse functional data. facebook.S4 Access to Facebook API V2 via a Set of S4 Classes Provides an interface to the Facebook API and builds collections of elements that reflect the graph architecture of Facebook. See for more information. factoextra Extract and Visualize the Results of Multivariate Data Analyses Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including ‘PCA’ (Principal Component Analysis), ‘CA’ (Correspondence Analysis), ‘MCA’ (Multiple Correspondence Analysis), ‘MFA’ (Multiple Factor Analysis) and ‘HMFA’ (Hierarchical Multiple Factor Analysis) functions from different R packages. It also contains functions for simplifying some clustering analysis steps and provides elegant ‘ggplot2’-based data visualization. FactoInvestigate Automatic Description of Factorial Analysis Brings a set of tools to help automatically realise the description of principal component analyses (from ‘FactoMineR’ functions). Detection of existing outliers, identification of the informative components, graphical views and dimension descriptions are performed through dedicated functions.
The Investigate() function performs all these functions in one, and returns the result as a report document (Word, PDF or HTML). FactoMineR Multivariate Exploratory Data Analysis and Data Mining Exploratory data analysis methods such as principal component methods and clustering. factorcpt Simultaneous Change-Point and Factor Analysis Identifies change-points in the common and the idiosyncratic components via factor modelling. FactoRizationMachines Machine Learning with Higher-Order Factorization Machines Implementation of three machine learning approaches: Support Vector Machines (SVM) with a linear kernel, second-order Factorization Machines (FM), and higher-order Factorization Machines (HoFM). factorMerger Hierarchical Algorithm for Post-Hoc Testing A set of tools to support the results of post-hoc testing and enable extraction of the hierarchical structure of factors. Work on this package was financially supported by the ‘NCN Opus grant 2016/21/B/ST6/02176’. factorstochvol Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix. Factoshiny Perform Factorial Analysis from FactoMineR with a Shiny Application Perform factorial analysis with a menu and draw graphs interactively thanks to FactoMineR and a Shiny application. faisalconjoint Faisal Conjoint Model: A New Approach to Conjoint Analysis Used for systematic analysis of decisions based on attributes and their levels. fakeR Simulates Data from a Data Frame of Different Variable Types Generates fake data from a dataset of different variable types. The package contains the functions simulate_dataset and simulate_dataset_ts to simulate time-independent and time-dependent data.
It randomly samples character and factor variables from contingency tables, and numeric and ordered factors from a multivariate normal distribution. It currently supports the simulation of stationary and zero-inflated count time series. fancycut A Fancy Version of ‘base::cut’ Provides the function fancycut() which is like cut() except you can mix left-open and right-open intervals with point values, intervals that are closed on both ends and intervals that are open on both ends. fanovaGraph Building Kriging Models from FANOVA Graphs Estimation and plotting of a function’s FANOVA graph to identify the interaction structure, and fitting, prediction and simulation of a Kriging model modified by the identified structure. The interactive function plotManipulate() can only be run in the RStudio IDE with RStudio’s package ‘manipulate’ loaded. RStudio is freely available (www.rstudio.org), and includes package ‘manipulate’. The equivalent function plotTk() relies only on CRAN packages. fanplot Visualisation of Sequential Probability Distributions Using Fan Charts Visualise sequential distributions using a range of plotting styles. Sequential distribution data can be input as either simulations or values corresponding to percentiles over time. Plots are added to existing graphic devices using the fan function. Users can choose from four different styles, including fan chart type plots, where a set of coloured polygons, with shading corresponding to the percentile values, is layered to represent different uncertainty levels. farff A Faster ‘ARFF’ File Reader and Writer Reads and writes ‘ARFF’ files. ‘ARFF’ (Attribute-Relation File Format) files are like ‘CSV’ files, with a little bit of added meta information in a header and standardized NA values. They are quite often used for machine learning data sets and were introduced for the ‘WEKA’ machine learning ‘Java’ toolbox. See for further info on ‘ARFF’ and for more info on ‘WEKA’.
‘farff’ gets rid of the ‘Java’ dependency that ‘RWeka’ enforces, and it is at least a faster reader (for bigger files). It uses ‘readr’ as parser back-end for the data section of the ‘ARFF’ file. Consistency with ‘RWeka’ is tested on ‘Github’ and ‘Travis CI’ with hundreds of ‘ARFF’ files from ‘OpenML’. Note that the ‘OpenML’ package is currently only available from ‘Github’ at: . fasjem A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models The FASJEM (A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models) is a joint estimator which is fast and scalable for learning multiple related sparse Gaussian graphical models. For more details, please see . fastAdaboost A Fast Implementation of Adaboost Implements Adaboost based on C++ backend code. This is blazingly fast and especially useful for large, in-memory data sets. The package uses decision trees as weak classifiers. Once the classifiers have been trained, they can be used to predict new data. Currently, only binary classification tasks are supported. The package implements the Adaboost.M1 algorithm and the Real AdaBoost (SAMME.R) algorithm. FastBandChol Fast Estimation of a Covariance Matrix by Banding the Cholesky Factor Fast and numerically stable estimation of a covariance matrix by banding the Cholesky factor using a modified Gram-Schmidt algorithm implemented in RcppArmadillo. See for details on the algorithm. fastcmh Significant Interval Discovery with Categorical Covariates A method which uses the Cochran-Mantel-Haenszel test with significant pattern mining to detect intervals in binary genotype data which are significantly associated with a particular phenotype, while accounting for categorical covariates. fastdigest Fast, Low Memory-Footprint Digests of R Objects Provides an R interface to Bob Jenkins’s streaming, non-cryptographic ‘SpookyHash’ hash algorithm for use in digest-based comparisons of R objects.
‘fastdigest’ plugs directly into R’s internal serialization machinery, allowing digests of all R objects the serialize() function supports, including reference-style objects via custom hooks. Speed is high and scales linearly by object size; memory usage is constant and negligible. fastDummies Fast Creation of Dummy (Binary) Columns from Categorical Variables Creates dummy columns from columns that have categorical variables (character or factor types). This package provides a significant speed increase over creating dummy variables through model.matrix(). fasteraster Raster Images Processing and Vector Recognition Recognises edges on a raster image, a bitmap, or any kind of matrix. Existing packages typically do only 90-degree vectorization; since the nature of artefact images is linear, they can be vectorized much more efficiently than by drawing a series of 90-degree lines. The fasteraster package does recognition of lines using only one pass. fastGraph Fast Drawing and Shading of Graphs of Statistical Distributions Provides functionality to produce graphs of probability density functions and cumulative distribution functions with few keystrokes, allows shading under the curve of the probability density function to illustrate concepts such as p-values and critical values, and fits a simple linear regression line on a scatter plot with the equation as the main title. fastHorseshoe The Elliptical Slice Sampler for Bayesian Horseshoe Regression The elliptical slice sampler for Bayesian shrinkage linear regression, such as horseshoe, double-exponential and user-specified priors. FastKM A Fast Multiple-Kernel Method Based on a Low-Rank Approximation A computationally efficient and statistically rigorous fast Kernel Machine method for multi-kernel analysis. The approach is based on a low-rank approximation to the nuisance effect kernel matrices.
The algorithm is applicable to continuous, binary, and survival traits and is implemented using the existing single-kernel analysis software ‘SKAT’ and ‘coxKM’. ‘coxKM’ can be obtained from http://…/software.html. FastKNN Fast k-Nearest Neighbors Compute labels for a test set according to k-Nearest Neighbors classification. This is a fast way to do k-Nearest Neighbors classification because the distance matrix (between the features of the observations) is an input to the function rather than being calculated in the function itself every time. fastLink Fast Probabilistic Record Linkage with Missing Data Implements a Fellegi-Sunter probabilistic record linkage model that allows for missing data and the inclusion of auxiliary information. This includes functionalities to conduct a merge of two datasets under the Fellegi-Sunter model using the Expectation-Maximization algorithm. In addition, tools for preparing, adjusting, and summarizing data merges are included. The package implements methods described in Enamorado, Fifield, and Imai (2017) ”Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records”, available at . fastLSU Fast Linear Step Up Procedure of Benjamini-Hochberg FDR Method for Huge-Scale Testing Problems An efficient algorithm to apply the Benjamini-Hochberg Linear Step Up FDR controlling procedure in huge-scale testing problems (proposed by Vered Madar and Sandra Batista (2016)). Unlike the ‘BH’ method, the package does not require any p value ordering. It also permits separating the p values arbitrarily into computationally feasible chunks of any size, and produces the same results as applying the linear step up BH procedure to the entire set of tests. fastnet Large-Scale Social Network Analysis An implementation of the algorithms required to simulate large-scale social networks and retrieve their most relevant metrics.
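The linear step-up procedure that ‘fastLSU’ above accelerates is short enough to state directly. A plain Python sketch of standard Benjamini-Hochberg (the baseline method, not the chunked fastLSU algorithm itself):

```python
def bh_rejections(pvalues, alpha=0.05):
    """Benjamini-Hochberg linear step-up: reject the k smallest p-values,
    where k is the largest rank i with p_(i) <= alpha * i / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k])  # indices of rejected hypotheses
```

Note the step-up behaviour: a p-value above its own threshold is still rejected if some larger p-value passes its threshold; the sort is exactly the O(m log m) cost that fastLSU's chunking avoids at huge scale.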
fastpseudo Fast Pseudo Observations Computes pseudo-observations for survival analysis on right-censored data based on restricted mean survival time. fastqcr Quality Control of Sequencing Data ‘FastQC’ is the most widely used tool for evaluating the quality of high throughput sequencing data. It produces, for each sample, an ‘HTML’ report and a compressed file containing the raw data. If you have hundreds of samples, you are not going to open up each ‘HTML’ page. You need some way of looking at these data in aggregate. ‘fastqcr’ provides helper functions to easily parse, aggregate and analyze ‘FastQC’ reports for large numbers of samples. It provides a convenient solution for building a ‘Multi-QC’ report, as well as a ‘one-sample’ report with result interpretations. fastTextR An Interface to the ‘fastText’ Library An interface to the ‘fastText’ library. The package can be used for text classification and to learn word vectors. The install folder contains the ‘PATENTS’ file. An example of how to use ‘fastTextR’ can be found in the ‘README’ file. fasttime Fast Utility Function for Time Parsing and Conversion Fast functions for timestamp manipulation that avoid system calls and take shortcuts to facilitate operations on very large data. fauxpas HTTP Error Helpers HTTP error helpers. Methods included for general purpose HTTP error handling, as well as individual methods for every HTTP status code, both via status code numbers as well as their descriptive names. Supports the ability to adjust behavior to stop, message or warning. Includes the ability to use a custom whisker template to have any configuration of status code, short description, and verbose message. Currently supports integration with ‘crul’, ‘curl’, and ‘httr’. fbRads Analyzing and Managing Facebook Ads from R Wrapper functions around the Facebook Marketing ‘API’ to create, read, update and delete custom audiences, images, campaigns, ad sets, ads and related content.
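The shortcut behind ‘fasttime’ above, parsing a fixed ‘YYYY-MM-DD hh:mm:ss’ layout with plain integer arithmetic instead of locale-aware system calls, can be sketched in Python (an illustration of the trick, not the package's C code; the civil-calendar day count is standard Gregorian arithmetic):

```python
def fast_parse_utc(ts):
    """Parse 'YYYY-MM-DD hh:mm:ss' (assumed UTC) to POSIX seconds by slicing
    fixed positions; no strptime, no locale lookup, no system calls."""
    y, m, d = int(ts[0:4]), int(ts[5:7]), int(ts[8:10])
    h, mi, s = int(ts[11:13]), int(ts[14:16]), int(ts[17:19])
    # Days since 1970-01-01 via civil-calendar arithmetic: shift the year so
    # leap days fall at the end, then count days within 400-year eras.
    y -= m <= 2
    era = y // 400
    yoe = y - era * 400                      # year of era [0, 399]
    doy = (153 * (m + (-3 if m > 2 else 9)) + 2) // 5 + d - 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    days = era * 146097 + doe - 719468      # 719468 days from 0000-03-01 to epoch
    return days * 86400 + h * 3600 + mi * 60 + s
```

Skipping strptime entirely is what makes this style of parser fast on millions of uniform timestamps; the trade-off is that it trusts the input format and time zone completely.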
fbroc Fast Algorithms to Bootstrap ROC Curves Implements a very fast C++ algorithm to quickly bootstrap ROC curves and derived performance metrics (e.g. AUC). You can also plot the results and calculate confidence intervals. Currently the calculation of 100000 bootstrap replicates for 500 observations takes about one second. fcm Inference of Fuzzy Cognitive Maps (FCMs) Provides a selection of 6 different inference rules and 4 threshold functions in order to obtain the inference of an FCM (Fuzzy Cognitive Map). Moreover, the ‘fcm’ package returns a data frame of the concepts’ values of each state after the inference procedure. Fuzzy cognitive maps were introduced by Kosko (1986), providing ideal causal cognition tools for modeling and simulating dynamic systems. FCMapper Fuzzy Cognitive Mapping Provides several functions to create and manipulate fuzzy cognitive maps. It is based on FCMapper for Excel, distributed at http://…/joomla, developed by Michael Bachhofer and Martin Wildenberg. Maps are input as adjacency matrices. Attributes of the maps and the equilibrium values of the concepts (including with user-defined constrained values) can be calculated. The maps can be graphed with a function that calls “igraph”. Multiple maps with shared concepts can be aggregated. FCNN4R Fast Compressed Neural Networks for R The FCNN4R package provides an interface to kernel routines from the FCNN C++ library. FCNN is based on a completely new Artificial Neural Network representation that offers unmatched efficiency, modularity, and extensibility. FCNN4R provides standard teaching (backpropagation, Rprop) and pruning algorithms (minimum magnitude, Optimal Brain Surgeon), but it is first and foremost an efficient computational engine. Users can easily implement their algorithms by taking advantage of fast gradient computing routines, as well as network reconstruction functionality (removing weights and redundant neurons).
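The resampling scheme behind ‘fbroc’ above fits in a few lines: compute the AUC as the rank-sum (Mann-Whitney) statistic, then bootstrap cases and take a percentile interval. A toy Python sketch (far slower than the package's C++, and with hypothetical function names):

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; ties count one half (Mann-Whitney form)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # the resample needs both classes
            stats.append(auc(lab, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * len(stats))]
    hi = stats[int((1 + level) / 2 * len(stats)) - 1]
    return lo, hi
```

The quadratic pairwise loop in auc() is the part fbroc replaces with a sort-based O(n log n) computation, which is why it can run 100000 replicates in about a second.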
fdapace Functional Data Analysis and Empirical Dynamics Provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm or numerical integration. PACE is useful for the analysis of data that have been generated by a sample of underlying (but usually not fully observed) random trajectories. It does not rely on pre-smoothing of trajectories, which is problematic if functional data are sparsely sampled. PACE provides options for functional regression and correlation, for Longitudinal Data Analysis, the analysis of stochastic processes from samples of realized trajectories, and for the analysis of underlying dynamics. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ ‘glue’. fdaPDE Regression with Partial Differential Regularizations, using the Finite Element Method An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization. FDboost Boosting Functional Regression Models Regression models for functional data, i.e. scalar-on-function, function-on-scalar and function-on-function regression models are fitted using a component-wise gradient boosting algorithm. fdcov Analysis of Covariance Operators Provides a variety of tools for the analysis of covariance operators. 
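‘FDboost’ above relies on component-wise gradient boosting: at each step every base learner is fitted to the current residuals and only the best-fitting one is updated by a small step. A scalar toy version in Python (L2 loss, one least-squares slope per covariate; illustrative only, unrelated to the package's functional base learners):

```python
def boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting: returns one coefficient per column of X."""
    p = len(X[0])
    coef = [0.0] * p
    resid = list(y)
    for _ in range(n_steps):
        best_j, best_b, best_loss = 0, 0.0, float("inf")
        for j in range(p):
            xj = [row[j] for row in X]
            b = sum(v * r for v, r in zip(xj, resid)) / sum(v * v for v in xj)
            loss = sum((r - b * v) ** 2 for r, v in zip(resid, xj))
            if loss < best_loss:  # keep only the best base learner this step
                best_j, best_b, best_loss = j, b, loss
        coef[best_j] += nu * best_b          # shrunken update
        resid = [r - nu * best_b * row[best_j] for r, row in zip(resid, X)]
    return coef
```

Because only one component moves per iteration, early stopping yields sparse models, the property FDboost exploits for variable selection among functional covariates.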
fDMA Dynamic Model Averaging and Dynamic Model Selection for Continuous Outcomes Allows estimation of Dynamic Model Averaging, Dynamic Model Selection and the Median Probability Model. The original methods (see References) are implemented, as well as selected further modifications of these methods. In particular, the user may choose between recursive moment estimation and an exponentially moving average for variance updating. Inclusion probabilities may be modified using Google Trends data. The code is written in a way which minimises the computational burden (which is quite an obstacle for Dynamic Model Averaging if many variables are used). For example, this package allows for parallel computations on Windows machines. Additionally, the user may reduce the set of models according to a certain algorithm. The package is designed in a way that is hoped to be especially useful in economics and finance. (Research funded by the Polish National Science Centre grant under the contract number DEC-2015/19/N/HS4/00205.) FDRsampsize Compute Sample Size that Meets Requirements for Average Power and FDR Defines a collection of functions to compute average power and sample size for studies that use the false discovery rate as the final measure of statistical significance. FeaLect Scores Features for Feature Selection For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data and for each random subset, various linear models are fitted using the ‘lars’ method. A score is assigned to each feature based on the tendency of LASSO to include that feature in the models. Finally, the average score and the models are returned as the output. Features with relatively low scores are recommended to be ignored, because they can lead to overfitting of the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned.
They are useful for applying Bolasso, the alternative feature selection method that recommends the intersection of feature subsets. FeatureHashing Implement Feature Hashing on Model Matrix Feature hashing, also called the hashing trick, is a method to transform features into a vector. Without looking the indices up in an associative array, it applies a hash function to the features and uses their hash values as indices directly. This package implements the method of feature hashing proposed in Weinberger et al. (2009) with MurmurHash3 and provides a formula interface in R. See the README.md for more information. featurizer Some Helper Functions that Help Create Features from Data A collection of functions that help one to build features based on external data. Very useful for data scientists in day-to-day work. Many functions create features using parallel computation. Since the nitty-gritty of parallel computation is hidden under the hood, the user need not worry about creating clusters and shutting them down. FedData Functions to Automate Downloading Geospatial Data Available from Several Federated Data Sources Functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package allows for retrieval of four datasets: The National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS); The National Hydrography Dataset (USGS); The Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; and the Global Historical Climatology Network (GHCN), coordinated by the National Climatic Data Center at NOAA.
Additional data sources are in the works, including global DEM resources (ETOPO1, ETOPO5, ETOPO30, SRTM), global soils (HWSD), tree-ring records (ITRDB), MODIS satellite data products, the National Atlas (US), Natural Earth, PRISM, and WorldClim. feedeR Read RSS/Atom Feeds from R Retrieve data from RSS/Atom feeds. fence Using Fence Methods for Model Selection Implements a new class of model selection strategies for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. References: 1. Jiang J., Rao J.S., Gu Z., Nguyen T. (2008), Fence Methods for Mixed Model Selection. The Annals of Statistics, 36(4): 1669-1692. 2. Jiang J., Nguyen T., Rao J.S. (2009), A Simplified Adaptive Fence Procedure. Statistics and Probability Letters, 79, 625-629. 3. Jiang J., Nguyen T., Rao J.S. (2010), Fence Method for Nonparametric Small Area Estimation. Survey Methodology, 36(1), 3-11. 4. Jiming Jiang, Thuan Nguyen and J. Sunil Rao (2011), Invisible fence methods and the identification of differentially expressed gene sets. Statistics and Its Interface, Volume 4, 403-415. 5. Thuan Nguyen & Jiming Jiang (2012), Restricted fence method for covariate selection in longitudinal data analysis. Biostatistics, 13(2), 303-314. 6. Thuan Nguyen, Jie Peng, Jiming Jiang (2014), Fence Methods for Backcross Experiments. Statistical Computation and Simulation, 84(3), 644-662. 7. Jiang, J. (2014), The fence methods, in Advances in Statistics, Hindawi Publishing Corp., Cairo. 8. Jiming Jiang and Thuan Nguyen (2015), The Fence Methods, World Scientific, Singapore.
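The fence idea described above — eliminate clearly inadequate models first, then pick the most parsimonious model among those left inside the fence — can be sketched in base R. This is a loose toy analogy, not the fence package's API; the lack-of-fit measure (deviance), the candidate formulas, and the fixed fence width `cc` are all illustrative choices (the real method chooses the tuning constant adaptively).

```r
# Toy sketch of the fence principle (NOT the fence package's algorithm):
# fit candidate linear models, keep those whose lack of fit is within a
# "fence" of the best-fitting model, then pick the most parsimonious one.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is irrelevant by design
y  <- 1 + 2 * x1 - x2 + rnorm(n)

candidates <- list(y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3)
fits <- lapply(candidates, lm)

Q <- vapply(fits, deviance, numeric(1))                     # lack-of-fit measure
p <- vapply(fits, function(f) length(coef(f)), integer(1))  # model size

cc     <- 10                                   # fence width (arbitrary here)
inside <- Q <= min(Q) + cc                     # models within the fence
best   <- which(inside)[which.min(p[inside])]  # most parsimonious inside
deparse(candidates[[best]])                    # the underfit model is fenced out
```

With this setup the model missing `x2` has a much larger deviance and falls outside the fence, while the model carrying the irrelevant `x3` stays inside but loses on parsimony.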
FENmlm Fixed Effects Nonlinear Maximum Likelihood Models Efficient estimation of fixed-effect maximum likelihood models with, possibly, non-linear right-hand sides. ffstream Forgetting Factor Methods for Change Detection in Streaming Data An implementation of the adaptive forgetting factor scheme described in Bodenham and Adams (2016), which adaptively estimates the mean and variance of a stream in order to detect multiple changepoints in streaming data. The implementation is in C++ and uses Rcpp. Additionally, implementations of the fixed forgetting factor scheme from the same paper, as well as the classic CUSUM and EWMA methods, are included. FFTrees Generate, Visualise, and Compare Fast and Frugal Decision Trees (FFTs) Fast and Frugal Trees (FFTs) are very simple decision trees for classifying cases (e.g., breast cancer patients) into one of two classes (e.g., cancer vs. no cancer). FFTs can be preferable to more complex algorithms (such as logistic regression) because they are easy to communicate and implement, and are robust against noisy data. This package contains several functions that allow users to input their own data, set model criteria and visualize the best tree(s) for their data. FHDI Fractional Hot Deck and Fully Efficient Fractional Imputation Impute general multivariate missing data with fractional hot deck imputation. fheatmap Draw Heatmaps with Colored Dendrogram R function to plot high-quality, elegant heatmaps using ‘ggplot2’ graphics. Some of the important features of this package are: coloring of the row/column side tree with respect to the number of user-defined cuts in the cluster, annotations for both columns and rows, an option to input an annotation palette for tree and column annotations, and multiple parameters to modify the aesthetics (style, color, font) of text in the plot. fiery A Lightweight and Flexible Web Framework A very flexible framework for building server-side logic in R.
The framework is unopinionated when it comes to how HTTP requests and WebSocket messages are handled, and supports all levels of app complexity; from serving static content to full-blown dynamic web apps. Fiery does not hold your hand as much as e.g. the shiny package does, but instead sets you free to create your web app the way you want. filematrix File-Backed Matrix Class with Convenient Read and Write Access Interface for working with large matrices stored in files, not in computer memory. Supports multiple data types (double, integer, logical and raw) of different sizes (e.g. 4, 2, or 1 byte integers). Access to parts of the matrix is done by indexing, exactly as with usual R matrices. Supports very large matrices (tested on a 1 terabyte matrix), allowing for more than 2^32 rows or columns. Cross-platform, as the package has R code only, no C/C++. filenamer Easy Management of File Names Create descriptive file names with ease. New file names are automatically (but optionally) time-stamped and placed in date-stamped directories. Streamline your analysis pipeline with input and output file names that have informative tags and proper file extensions. fileplyr Chunk Processing or Split-Apply-Combine on Delimited Files (CSV etc.) Perform chunk processing or split-apply-combine on data in a delimited file (e.g., CSV) across multiple cores of a single machine with a low memory footprint. These functions are a convenient wrapper over the versatile package ‘datadr’. filesstrings Handy String and File Manipulation Handy string and file processing and manipulation tools. Built on top of the functionality of base and ‘stringr’. Good for those who like to do all of their file and string manipulation from within R. FinAna Financial Analysis and Regression Diagnostic Analysis Functions for regression analysis and financial modeling, including batch graph generation, beta calculation, and descriptive statistics.
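The beta calculation mentioned for FinAna above is a one-liner in base R, since beta is just the slope of the stock's returns regressed on the market's. A minimal sketch on simulated returns (the numbers below are made up for illustration, not real market data, and this is not FinAna's actual interface):

```r
# Beta from simulated daily returns: cov(stock, market) / var(market),
# which equals the OLS slope of stock ~ market.
set.seed(42)
market <- rnorm(250, mean = 0.0004, sd = 0.01)            # market daily returns
stock  <- 0.0002 + 1.3 * market + rnorm(250, sd = 0.005)  # stock with true beta 1.3

beta_cov <- cov(stock, market) / var(market)
beta_lm  <- unname(coef(lm(stock ~ market))[2])
round(c(beta_cov, beta_lm), 3)   # the two estimates agree
```

The equivalence of the covariance formula and the regression slope holds exactly for simple linear regression with an intercept, which is why either route is fine in practice.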
findviews A View Generator for Multidimensional Data A tool to explore wide data sets by detecting, ranking and plotting groups of statistically dependent columns. finreportr Financial Data from U.S. Securities and Exchange Commission Download and display company financial data from the U.S. Securities and Exchange Commission’s EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See for more information. fitur Fit Univariate Distributions Wrapper for computing parameters and then assigning to distribution function families. fixedTimeEvents The Distribution of Distances Between Discrete Events in Fixed Time Distribution functions and a test for over-representation of short distances in the Liland distribution. Simulation functions are included for comparison. FixSeqMTP Fixed Sequence Multiple Testing Procedures Generalized Fixed Sequence Multiple Testing Procedures (g-FSMTPs) are used to test a sequence of pre-ordered hypotheses. The three proposed Family-wise Error Rate (FWER) controlling g-FSMTPs utilize the numbers of rejections and acceptances; all these procedures are designed under arbitrary dependence. The two proposed False Discovery Rate (FDR) controlling g-FSMTPs allow up to a given number of acceptances (k>=1); these procedures are designed for arbitrary dependence and independence. The main functions for each proposed g-FSMTP calculate adjusted p-values and critical values, respectively. For users’ convenience, the output also includes decision rules. flacco Feature-Based Landscape Analysis of Continuous and Constrained Optimization Problems Contains tools and features which can be used for an exploratory landscape analysis of continuous optimization problems.
Those are able to quantify rather complex properties, such as the global structure, separability, etc., of continuous optimization problems. flare Family of Lasso Regression The package ‘flare’ provides the implementation of a family of Lasso variants, including the Dantzig Selector, LAD Lasso, SQRT Lasso, and Lq Lasso, for estimating high dimensional sparse linear models. We adopt the alternating direction method of multipliers and convert the original optimization problem into a sequential L1-penalized least squares minimization problem, which can be efficiently solved by a linearization algorithm. A multi-stage screening approach is adopted for further acceleration. Besides sparse linear model estimation, we also provide an extension of these Lasso variants to sparse Gaussian graphical model estimation, including TIGER and CLIME, using either an L1 or an adaptive penalty. Missing values can be tolerated for the Dantzig selector and CLIME. The computation is memory-optimized using sparse matrix output. flars Functional LARS Variable selection algorithm for functional linear regression with scalar response variable and mixed scalar/functional predictors. FlexDir Tools to Work with the Flexible Dirichlet Distribution Provides tools to work with the Flexible Dirichlet distribution. The main features are an E-M algorithm for computing the maximum likelihood estimate of the parameter vector and a function based on conditional bootstrap to estimate its asymptotic variance-covariance matrix. It also contains functions to plot graphs, generate random observations and handle compositional data. FlexParamCurve Tools to Fit Flexible Parametric Curves Model selection tools and ‘selfStart’ functions to fit parametric curves in ‘nls’, ‘nlsList’ and ‘nlme’ frameworks. flexPM Flexible Parametric Models for Censored and Truncated Data Estimation of flexible parametric models for survival data.
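The L1-penalized estimation that ‘flare’ implements can be illustrated with a few lines of base R. This is a toy coordinate-descent loop with soft-thresholding, chosen for brevity (flare itself uses ADMM, as described above); the standardization shortcut and the fixed `lambda` are illustrative assumptions, not flare's interface.

```r
# Toy L1-penalized regression via coordinate descent with soft-thresholding.
# Assumes standardized columns, so the per-coordinate denominator is ~1.
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

lasso_cd <- function(X, y, lambda, iters = 200) {
  X <- scale(X); y <- y - mean(y)     # standardize predictors, drop intercept
  n <- nrow(X); p <- ncol(X)
  beta <- rep(0, p)
  for (it in seq_len(iters)) {
    for (j in seq_len(p)) {
      r_j     <- y - X[, -j, drop = FALSE] %*% beta[-j]   # partial residual
      beta[j] <- soft(crossprod(X[, j], r_j) / n, lambda)
    }
  }
  beta
}

set.seed(7)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- X[, 1] * 2 + rnorm(100)        # only the first predictor matters
b <- lasso_cd(X, y, lambda = 0.5)
round(b, 2)                         # noise coefficients shrink to exactly 0
```

The soft-thresholding step is what produces exact zeros, which is the defining behaviour of the Lasso family the package covers.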
flexrsurv Flexible Relative Survival Perform relative survival analyses using the approaches described in Remontet et al. (2007) and Mahboubi et al. (2011). It implements non-linear and non-proportional effects using splines (B-spline and truncated power basis). flexsurvcure Flexible Parametric Cure Models Flexible parametric mixture and non-mixture cure models for time-to-event data. flextable Tabular Reporting API Create pretty tables for ‘Microsoft Word’, ‘Microsoft PowerPoint’ and ‘HTML’ documents. Functions are provided to let users create tables, and modify and format their content. It extends the package ‘officer’, which does not contain any feature for customized tabular reporting. Function ‘tabwid’ produces an ‘htmlwidget’ ready to be used in ‘Shiny’ or ‘R Markdown (*.Rmd)’ documents. See the ‘flextable’ website for more information. flifo Don’t Get Stuck with Stacks in R Functions to create and manipulate FIFO (First In First Out), LIFO (Last In First Out), and NINO (Not In or Never Out) stacks in R. FLIM Farewell’s Linear Increments Model FLIM fits linear models for the observed increments in a longitudinal dataset, and imputes missing values according to the models. flock Process Synchronization Using File Locks Implements synchronization between R processes (spawned by using the ‘parallel’ package, for instance) using file locks. Supports both exclusive and shared locking. flowr Streamlining Design and Deployment of Complex Workflows An interface to streamline the design of complex workflows and their deployment to a High Performance Computing cluster. flows Flow Selection and Analysis Selections on flow matrices, statistics on selected flows, map and graph visualisations. fmbasics Financial Market Building Blocks Implements basic financial market objects like currencies, currency pairs, interest rates and interest rate indices.
You will be able to use benchmark instances of these objects which have been defined using their most common conventions or those defined by International Swap Dealer Association (ISDA) legal documentation. FMC Factorial Experiments with Minimum Level Changes Generate cost-effective minimally changed run sequences for symmetrical as well as asymmetrical factorial designs. fmrs Variable Selection in Finite Mixture of AFT Regression and FMR Provides parameter estimation as well as variable selection in Finite Mixture of Accelerated Failure Time Regression Models and Finite Mixture of Regression models. It also provides Ridge regression and the Elastic Net. FMsmsnReg Regression Models with Finite Mixtures of Skew Heavy-Tailed Errors Fit linear regression models where the random errors follow a finite mixture of Skew Heavy-Tailed Errors. foghorn Summarizes CRAN Check Results in the Terminal The CRAN check results in your R terminal. fold A Self-Describing Dataset Format and Interface Defines a compact data format that includes metadata. The function fold() creates the format by converting from data.frame, and unfold() converts back. The predictability of the folded format supports reusability of data processing tools, while the presence of embedded metadata improves portability, interpretability, and efficiency. fontquiver Set of Installed Fonts Provides a set of fonts with permissive licences. This is useful when you want to avoid system fonts to make sure your outputs are reproducible. forcats Tools for Working with Categorical Variables (Factors) Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, anonymising, and manually recoding). foreach Foreach Looping Construct for R Support for the foreach looping construct.
Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn’t require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel. ForecastCombinations Forecast Combinations Supports the most frequently used methods to combine forecasts, among others: simple average, Ordinary Least Squares, Least Absolute Deviation, Constrained Least Squares, variance-based, best individual model, complete subset regressions and information-theoretic (information criteria based). forecastHybrid Convenient Functions for Ensemble Time Series Forecasts Convenient functions for ensemble forecasts in R combining approaches from the ‘forecast’ package. Forecasts generated from auto.arima(), ets(), nnetar(), stlm(), and tbats() can be combined with equal weights or weights based on in-sample errors. Future methods such as cross-validation are planned. forecastSNSTS Forecasting for Stationary and Non-Stationary Time Series Methods to compute linear h-step prediction coefficients based on localised and iterated Yule-Walker estimates, and empirical mean square prediction errors for the resulting predictors. forecTheta Forecasting Time Series by Theta Method Routines for forecasting univariate time series using the Theta Method and the Optimised Theta Method (Fioruci et al., 2015). Contains two cross-validation routines of Tashman (2000). forega Floating-Point Genetic Algorithms with Statistical Forecast Based Inheritance Operator The implemented algorithm performs a floating-point genetic algorithm search with a statistical forecasting operator that generates offspring which would probably be generated in future generations.
Use of this operator enhances the search capabilities of floating-point genetic algorithms, because offspring that the usual genetic operators would only produce after more generations are forecasted early. forestFloor Visualizes Random Forests with Feature Contributions Enables the user to form appropriate visualizations of the high dimensional mapping curvature of random forests. forestinventory Design-Based Global and Small-Area Estimations for Multiphase Forest Inventories Extensive global and small-area estimation procedures for multiphase forest inventories under the design-based Monte-Carlo approach are provided. The implementation includes estimators for simple and cluster sampling published by Daniel Mandallaz in 2007, 2013 and 2016. It provides point estimates, their external- and design-based variances, as well as confidence intervals. The procedures have also been optimized for the use of remote sensing data as auxiliary information. forestmodel Forest Plots from Regression Models Produces forest plots using ‘ggplot2’ from models produced by functions such as stats::lm(), stats::glm() and survival::coxph(). forestplot Advanced Forest Plot Using ‘grid’ Graphics The plot allows for multiple confidence intervals per row, custom fonts for each text element, custom confidence intervals, text mixed with expressions, and more. The aim is to extend the use of forest plots beyond meta-analyses. This is a more general version of the original ‘rmeta’ package’s forestplot function and relies heavily on the ‘grid’ package. ForestTools Analysing Remotely Sensed Forest Data Forest Tools provides functions for analyzing remotely sensed forest data. formattable Formattable Data Structures Provides functions to create formattable vectors and data frames.
Formattable vectors are printed with text formatting, and formattable data frames are printed with multiple types of formatting in markdown to improve the readability of data presented in tabular form rendered as web pages. forward Forward Search Forward search approach to robust analysis in linear and generalized linear regression models. ForwardSearch Forward Search using Asymptotic Theory Forward Search analysis of time series regressions. Implements the asymptotic theory developed in Johansen and Nielsen (2013, 2014). fourierin Computes Numeric Fourier Integrals Computes Fourier integrals of functions of one and two variables using the Fast Fourier Transform. The Fourier transforms must be evaluated on a regular grid. fourPNO Bayesian 4 Parameter Item Response Model Estimate Lord & Barton’s four parameter IRT model with lower and upper asymptotes using the Bayesian formulation described by Culpepper (2015). fpa Spatio-Temporal Fixation Pattern Analysis Spatio-temporal Fixation Pattern Analysis (FPA) is a new method of analyzing eye movement data, developed by Mr. Jinlu Cao under the supervision of Prof. Chen Hsuan-Chih at The Chinese University of Hong Kong, and Prof. Wang Suiping at the South China Normal University. The package ‘fpa’ is an R implementation which makes FPA analysis much easier. There are four major functions in the package: ft2fp(), get_pattern(), plot_pattern(), and lineplot(). The function ft2fp() is the core function, which can complete all the preprocessing within moments. The other three functions are supportive functions which visualize the eye fixation patterns. FPCA2D Two Dimensional Functional Principal Component Analysis Compute the two dimension functional principal component scores for a series of two dimension images. fpCompare Reliable Comparison of Floating Point Numbers Comparisons of floating point numbers are problematic due to errors associated with the binary representation of decimal numbers.
Despite being aware of these problems, people still use numerical methods that fail to account for these and other rounding errors (this pitfall is the first to be highlighted in Circle 1 of Burns (2012, http://…/R_inferno.pdf )). This package provides four new relational operators useful for performing floating point number comparisons with a set tolerance. FPDclustering PD-Clustering and Factor PD-Clustering Probabilistic distance clustering (PD-clustering) is an iterative, distribution-free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant. PD-clustering is a flexible method that can be used with non-spherical clusters, outliers, or noisy data. Factor PD-clustering (FPDC) is a recently proposed factor clustering method that involves a linear transformation of variables and a cluster optimizing the PD-clustering criterion. It allows clustering of high dimensional data sets. fpest Estimating Finite Population Total Given the values of sampled units and selection probabilities, the desraj function in the package computes the estimated value of the total as well as the estimated variance. fractional Vulgar Fractions in R The main function of this package allows numerical vector objects to be displayed with their values in vulgar fractional form. This is convenient if patterns can then be more easily detected. In some cases replacing the components of a numeric vector by a rational approximation can also be expected to remove some component of round-off error. The main functions form a re-implementation of the functions ‘fractions’ and ‘rational’ of the MASS package, but using a radically improved programming strategy. fragilityindex Fragility Index Implements the fragility index calculation for dichotomous results as described in Walsh, Srinathan, McAuley,
Mrkobrada, Levine, Ribic, Molnar, Dattani, Burke, Guyatt, Thabane, Walter, Pogue and Devereaux (2014). frailtyEM Fitting Frailty Models with the EM Algorithm Contains functions for fitting shared frailty models with a semi-parametric baseline hazard with the Expectation-Maximization algorithm. Supported data formats include clustered failures with left truncation and recurrent events in gap-time or Andersen-Gill format. Several frailty distributions, such as the gamma, positive stable, and the Power Variance Family, are supported. frailtySurv General Semiparametric Shared Frailty Model Simulates and fits semiparametric shared frailty models under a wide range of frailty distributions using a consistent and asymptotically-normal estimator. Currently supports: gamma, power variance function, log-normal, and inverse Gaussian frailty models. franc Detect the Language of Text With no external dependencies and support for 335 languages; all languages spoken by more than one million speakers. ‘Franc’ is a port of the ‘JavaScript’ project of the same name. frbs Fuzzy Rule-Based Systems for Classification and Regression Tasks An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows the construction of an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation.
Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, squares methods, etc. Furthermore, in this version we provide a universal framework named ‘frbsPMML’, which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language that provides a standard for describing models produced by data mining and machine learning algorithms. Therefore, an FRBS model can be exported to and imported from ‘frbsPMML’. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community. frbs: Fuzzy Rule-Based Systems for Classification and Regression in R freqdist Frequency Distribution Generates a frequency distribution. The frequency distribution includes raw frequencies, percentages in each category, and cumulative frequencies. The frequency distribution can be stored as a data frame. freqdom Frequency Domain Analysis for Multivariate Time Series Methods for the analysis of multivariate time series using frequency domain techniques. Implementations of dynamic principal components analysis (DPCA) and estimators of operators in lagged regression. Examples of usage in a functional data analysis setup. FreqProf Frequency Profiles Computing and Plotting Tools for generating an informative type of line graph, the frequency profile, which allows single behaviors, multiple behaviors, or the specific behavioral patterns of individual subjects to be graphed from occurrence/nonoccurrence behavioral data. frequencies Create Frequency Tables with Counts and Rates Provides functions to create frequency tables which display both counts and rates.
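The kind of frequency distribution that freqdist and frequencies produce — counts, percentages, and cumulative frequencies in a data frame — can be sketched in base R. The column names below are illustrative, not either package's actual output format:

```r
# Frequency distribution in base R: counts, percentages, cumulative counts.
x <- c("a", "b", "a", "c", "a", "b")

counts <- table(x)
freq <- data.frame(
  value      = names(counts),
  n          = as.integer(counts),
  percent    = 100 * as.integer(counts) / length(x),
  cumulative = cumsum(as.integer(counts))
)
freq
```

Storing the result as a data frame, as the freqdist description notes, makes it easy to feed the distribution into further processing or reporting.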
frequencyConnectedness Spectral Decomposition of Connectedness Measures Accompanies a paper (Barunik, Krehlik (2017)) dedicated to the spectral decomposition of connectedness measures and their interpretation. We implement all the developed estimators as well as the historical counterparts. For more information, see the help or the GitHub page. FRK Fixed Rank Kriging Fixed Rank Kriging is a tool for spatial/spatio-temporal modelling and prediction with large datasets. The approach, discussed in Cressie and Johannesson (2008), decomposes the field, and hence the covariance function, using a fixed set of n basis functions, where n is typically much smaller than the number of data points (or polygons) m. The method naturally allows for non-stationary, anisotropic covariance functions and the use of observations with varying support (with known error variance). The projected field is a key building block of the Spatial Random Effects (SRE) model, on which this package is based. The package FRK provides helper functions to model, fit, and predict using an SRE with relative ease. Reference: Cressie, N. and Johannesson, G. (2008). fromo Fast Robust Moments Fast computation of moments via ‘Rcpp’. Supports computation on vectors and matrices, and monoidal append of moments. FSelectorRcpp ‘Rcpp’ Implementation of ‘FSelector’ Entropy-Based Feature Selection Algorithms with a Sparse Matrix Support ‘Rcpp’ (free of ‘Java’/’Weka’) implementation of ‘FSelector’ entropy-based feature selection algorithms with sparse matrix support. It is also equipped with a parallel backend. FSInteract Fast Searches for Interactions Performs fast detection of interactions in large-scale data using the method of random intersection trees introduced in ‘Shah, R. D. and Meinshausen, N. (2014) Random Intersection Trees’.
The algorithm finds potentially high-order interactions in high-dimensional binary two-class classification data, without requiring lower-order interactions to be informative. The search is particularly fast when the matrices of predictors are sparse. It can also be used to perform market basket analysis when supplied with a single binary data matrix. Here it will find collections of columns which for many rows contain all 1’s. fst Lightning Fast Serialization of Data Frames for R Read and write data frames at high speed. Compress your data with fast and efficient type-optimized algorithms that allow for random access of stored data frames (columns and rows). FTRLProximal FTRL Proximal Implementation for Elastic Net Regression Implementation of the Follow The Regularized Leader (FTRL) Proximal algorithm used for online training of large scale regression models using a mixture of L1 and L2 regularization. ftsspec Spectral Density Estimation and Comparison for Functional Time Series Functions for estimating the spectral density operator of functional time series (FTS) and comparing the spectral density operators of two functional time series, in a way that allows detection of differences between the spectral density operators in frequencies and along the curve length. fullfact Full Factorial Breeding Analysis Package for the analysis of full factorial breeding designs. fulltext Full Text of ‘Scholarly’ Articles Across Many Data Sources Provides a single interface to many sources of full text ‘scholarly’ data, including ‘Biomed Central’, Public Library of Science, ‘Pubmed Central’, ‘eLife’, ‘F1000Research’, ‘PeerJ’, ‘Pensoft’, ‘Hindawi’, ‘arXiv’ preprints, and more. Functionality included for searching for articles, downloading full or partial text, and converting to various data formats used in and outside of R. funchir Convenience Functions by Michael Chirico A set of functions, some subset of which I use in every .R file I write.
Examples are table2(), which adds useful functionality to base table (sorting, a built-in proportion argument, etc.); lyx.xtable(), which converts xtable() output to a format more easily copy-pasted into LyX; pdf2(), which writes a plot to file while also displaying it in the RStudio plot window; and abbr_to_colClass(), which is a much more concise way of feeding many types to a colClass argument in a data reader. functools Extending Functional Programming in R Extending functional programming in R by providing support to the usual higher-order functional suspects (Map, Reduce, Filter, etc.). funcy Functional Clustering Algorithms Unified framework to cluster functional data according to one of seven models. All models are based on the projection of the curves onto a basis. The main function funcit() calls wrapper functions for the existing algorithms, so that the input parameters are the same. A list is returned with each entry representing the same or extended output for the corresponding method. Method-specific as well as general visualization tools are available. funData An S4 Class for Functional Data S4 classes for univariate and multivariate functional data with utility functions. funFEM Clustering in the Discriminative Functional Subspace The funFEM algorithm (Bouveyron et al., 2014) allows clustering of functional data by modeling the curves within a common and discriminative functional subspace. fungible Fungible Coefficients and Monte Carlo Functions Functions for computing fungible coefficients and Monte Carlo data. funHDDC Model-based clustering in group-specific functional subspaces The package provides the funHDDC algorithm (Bouveyron & Jacques, 2011) which allows clustering of functional data by modeling each group within a specific functional subspace.
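The "usual higher-order functional suspects" that functools extends already ship with base R, so the starting point can be shown without the package itself (this is plain base R, not functools' API):

```r
# Base R's built-in higher-order functions: Map, Filter, and Reduce.
squares <- Map(function(x) x^2, 1:5)             # list of 1, 4, 9, 16, 25
evens   <- Filter(function(x) x %% 2 == 0, 1:10) # keeps 2, 4, 6, 8, 10
total   <- Reduce(`+`, 1:10)                     # fold: 55
total
```

Packages like functools layer extra combinators and ergonomics on top of these primitives rather than replacing them.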
funModeling Learning Data Cleaning, Visual Analysis and Model Performance Learn data cleaning, visual data analysis, and model performance assessment (KS, AUC and ROC); the package core is in the vignette documentation, which explains these topics as a tutorial. funr Simple Utility Providing Terminal Access to all R Functions A small utility which wraps Rscript and provides access to all R functions from the shell. funrar Functional Rarity Indices Computation Computes functional rarity indices as proposed by Violle et al. (in revision). Various indices can be computed using both regional and local information. Functional rarity combines both the functional aspect of rarity and the extent aspect of rarity. FUNTA Functional Tangential Angle Pseudo-Depth Computes the functional tangential angle pseudo-depth and its robustified version from the paper by Kuhnt and Rehage (2016). See Kuhnt, S.; Rehage, A. (2016): An angle-based multivariate functional pseudo-depth for shape outlier detection, JMVA 146, 325-340, for details. funtimes Functions for Time Series Analysis Includes non-parametric estimators and tests for time series analysis. The functions allow testing for the presence of possibly non-monotonic trends and for synchronism of trends in multiple time series, using modern bootstrap techniques and robust non-parametric difference-based estimators. future A Future API for R A Future API for R is provided. In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available. Futures are useful constructs in, for instance, concurrent evaluation, e.g. multicore parallel processing and distributed processing on compute clusters. The purpose of this package is to provide a lightweight interface for using futures in R. Functions ‘future()’ and ‘value()’ exist for creating futures and requesting their values.
An infix assignment operator ‘%<-%’ exists for creating futures whose values are accessible by the assigned variables (as promises). This package implements the synchronous ‘lazy’ and ‘eager’ futures, and the asynchronous ‘multicore’ future (not on Windows). Additional types of futures are provided by other packages enhancing this package. future.BatchJobs A Future for BatchJobs Simple parallel and distributed processing using futures that utilize the ‘BatchJobs’ framework, e.g. ‘fit %<-% { glm.fit(x, y) }’. This package implements the Future API of the ‘future’ package. future.batchtools A Future API for Parallel and Distributed Processing using ‘batchtools’ Implements the Future API on top of the ‘batchtools’ package. This allows you to process futures, as defined by the ‘future’ package, in parallel out of the box, not only on your local machine or ad-hoc cluster of machines, but also via high-performance compute (‘HPC’) job schedulers such as ‘LSF’, ‘OpenLava’, ‘Slurm’, ‘SGE’, and ‘TORQUE’ / ‘PBS’, e.g. ‘y <- future_lapply(files, FUN = process)’. fuzzr Fuzz-Test R Functions Test function arguments with a wide array of inputs, and produce reports summarizing messages, warnings, errors, and returned values. Fuzzy.p.value Computing Fuzzy p-Value The main goal of this package is drawing the membership function of the fuzzy p-value, which is defined as a fuzzy set on the unit interval, for the following three problems: (1) testing crisp hypotheses based on fuzzy data, (2) testing fuzzy hypotheses based on crisp data, and (3) testing fuzzy hypotheses based on fuzzy data. In all cases, the fuzziness of the data and/or the fuzziness of the boundary of the null fuzzy hypothesis is transported via the p-value function and produces the fuzzy p-value. If the p-value is fuzzy, it is more appropriate to consider a fuzzy significance level for the problem.
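The unresolved/resolved future life cycle described for the ‘future’ package above can be mimicked in base R with a lazy promise via delayedAssign(); this is only a conceptual sketch, not the package's implementation (which adds asynchronous multicore and cluster backends):

```r
# A toy lazy "future": the expression is not evaluated until its value
# is first requested, loosely mirroring future::future() / future::value().
make_lazy_future <- function(expr) {
  env <- new.env()
  delayedAssign("value", expr, assign.env = env)  # store an unresolved promise
  function() env$value                            # first access resolves it
}

f <- make_lazy_future({ 21 * 2 })  # unresolved until requested
stopifnot(f() == 42)               # resolved on demand; later calls reuse the value
```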
Therefore, the comparison of the fuzzy p-value and the fuzzy significance level is evaluated by a fuzzy ranking method in this package. FuzzyAHP (Fuzzy) AHP Calculation Calculation of AHP (Analytic Hierarchy Process) with classic and fuzzy weights based on Saaty’s pairwise comparison method for the determination of weights. fuzzyforest Fuzzy Forests Fuzzy forests, a new algorithm based on random forests, is designed to reduce the bias seen in random forest feature selection caused by the presence of correlated features. Fuzzy forests uses recursive feature elimination random forests to select features from separate blocks of correlated features, where the correlation within each block of features is high and the correlation between blocks of features is low. One final random forest is fit using the surviving features. This package fits random forests using the ‘randomForest’ package and allows for easy use of ‘WGCNA’ to split features into distinct blocks. fuzzyjoin Join Tables Together on Inexact Matching Join tables together based not on whether columns match exactly, but on whether they are similar by some comparison. Implementations include string distance and regular expression matching. FuzzyLP Fuzzy Linear Programming Methods to solve fuzzy linear programming problems with fuzzy constraints (by Verdegay, Zimmermann, Werner, Tanaka), fuzzy costs (multiobjective, interval arithmetic, stratified piecewise reduction, defuzzification-based), and a fuzzy technological matrix. FuzzyMCDM Multi-Criteria Decision Making Methods for Fuzzy Data Implementation of several MCDM methods for fuzzy data (triangular fuzzy numbers) for decision-making problems. The methods implemented in this package are Fuzzy TOPSIS (with two normalization procedures), Fuzzy VIKOR, Fuzzy Multi-MOORA and Fuzzy WASPAS. In addition, the function MetaRanking() calculates a new ranking from the sum of the rankings calculated, as well as an aggregated ranking.
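The idea behind ‘fuzzyjoin’ (joining on similarity rather than equality) can be sketched with base R's adist(), which computes Levenshtein distances; the tables and column names below are made up for illustration and this is not fuzzyjoin's API:

```r
# Match rows whose keys are within a small Levenshtein distance.
left  <- data.frame(name = c("colour", "center"), stringsAsFactors = FALSE)
right <- data.frame(name = c("color", "centre"), val = c(1, 2),
                    stringsAsFactors = FALSE)

d    <- adist(left$name, right$name)    # Levenshtein distance matrix
hits <- which(d <= 2, arr.ind = TRUE)   # index pairs considered a match
joined <- cbind(left[hits[, "row"], , drop = FALSE],
                right[hits[, "col"], , drop = FALSE])

stopifnot(nrow(joined) == 2)            # "colour"~"color", "center"~"centre"
```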
FuzzyNumbers.Ext.2 Apply Two Fuzzy Numbers on a Monotone Function Draws the membership function of f(x,y), where f(.,.) is assumed to be monotone and x and y are two fuzzy numbers. This is done using the function f2apply(), an extension of the function fapply() from package ‘FuzzyNumbers’ to two-variable monotone functions. FuzzyR Fuzzy Logic Toolkit for R Design and simulate fuzzy logic systems using Type 1 Fuzzy Logic. This toolkit includes a graphical user interface (GUI) and an adaptive neuro-fuzzy inference system (ANFIS). It is a continuation of the previous package (‘FuzzyToolkitUoN’). Produced by the Intelligent Modelling & Analysis Group, University of Nottingham. FuzzyStatTra Statistical Methods for Trapezoidal Fuzzy Numbers The aim of the package is to provide some basic functions for doing statistics with trapezoidal fuzzy numbers. In particular, the package contains several functions for simulating trapezoidal fuzzy numbers, as well as for calculating some central tendency measures (the mean and two types of median), some scale measures (variance, ADD, MDD, Sn, Qn, Tn and some M-estimators), one diversity index and one inequality index. Moreover, functions for calculating the 1-norm distance, the mid/spr distance and the (phi,theta)-wabl/ldev/rdev distance between fuzzy numbers are included, as well as a function to calculate the value phi-wabl given a sample of trapezoidal fuzzy numbers. fuzzywuzzyR Fuzzy String Matching Fuzzy string matching implementation of the ‘fuzzywuzzy’ ‘python’ package. It uses the Levenshtein Distance to calculate the differences between sequences. G GAabbreviate Abbreviating Questionnaires (or Other Measures) Using Genetic Algorithms Uses genetic algorithms as an optimization tool to create abbreviated forms of lengthy questionnaires (or other measures) that maximally capture the variance in the original data of the long form of the measure.
GADAG A Genetic Algorithm for Learning Directed Acyclic Graphs Learns sparse large directed acyclic graphs with a combination of a convex program and a tailored genetic algorithm (see Champion et al., 2017). gafit Genetic Algorithm for Curve Fitting A group of sample points is evaluated against a user-defined expression; the sample points are lists of parameter values that may be substituted into that expression. The genetic algorithm attempts to make the result of the expression as low as possible (usually this would be the sum of squared residuals). gains Gains Table Package Constructs gains tables and lift charts for prediction algorithms. Gains tables and lift charts are commonly used in direct marketing applications. gamCopula Generalized Additive Models for Bivariate Conditional Dependence Structures and Vine Copulas Implementation of various inference and simulation tools to apply generalized additive models to bivariate dependence structures and non-simplified vine copulas. GAMens Applies GAMbag, GAMrsm and GAMens Ensemble Classifiers for Binary Classification Ensemble classifiers based upon generalized additive models for binary classification (De Bock et al., 2010). The ensembles implement Bagging (Breiman, 1996), the Random Subspace Method (Ho, 1998), or both, and use Hastie and Tibshirani’s (1990) generalized additive models (GAMs) as base classifiers. Once an ensemble classifier has been trained, it can be used for predictions on new data. A function for cross-validation is also included. GameTheory Cooperative Game Theory Implementation of a common set of punctual solutions for Cooperative Game Theory. GameTheoryAllocation Tools for Calculating Allocations in Game Theory Many situations can be modeled as game-theoretic situations. Some procedures are included in this package to calculate the most important allocation rules in Game Theory: the Shapley value, the Owen value or the nucleolus, among others.
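A gains table of the kind ‘gains’ constructs ranks observations by model score, buckets them into deciles, and reports how much of the total response each decile captures; a base-R sketch of the idea (simulated data, not the package's interface):

```r
set.seed(1)
score  <- runif(1000)                    # predicted scores
actual <- rbinom(1000, 1, score)         # binary outcome, correlated with score

decile  <- cut(rank(-score), breaks = 10, labels = FALSE)  # 1 = highest scores
gains   <- tapply(actual, decile, sum)   # responses captured per decile
cum_pct <- cumsum(gains) / sum(actual)   # cumulative share of all responses

stopifnot(length(gains) == 10,
          gains[1] > gains[10],          # top decile captures more than bottom
          abs(cum_pct[10] - 1) < 1e-12)  # all responses captured by decile 10
```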
First, the value of each coalition of the involved agents must be defined as an argument via the characteristic function. gamlss.inf Fitting Mixed (Inflated and Adjusted) Distributions This is an add-on package to ‘gamlss’. The purpose of this package is to allow users to fit GAMLSS (Generalised Additive Models for Location Scale and Shape) models when the response variable is defined either in the intervals [0,1), (0,1] and [0,1] (inflated at zero and/or one distributions), or in the positive real line including zero (zero-adjusted distributions). The mass points at zero and/or one are treated as extra parameters with the possibility to include a linear predictor for both. The package also allows transformed or truncated distributions from the GAMLSS family to be used for the continuous part of the distribution. Standard methods and GAMLSS diagnostics can be used with the resulting fitted object. gamlssbssn Bimodal Skew Symmetric Normal Distribution Density, distribution function, quantile function and random generation for the bimodal skew symmetric normal distribution of Hassan and El-Bassiouni (2016). gammSlice Generalized additive mixed model analysis via slice sampling Uses slice-sampling-based Markov chain Monte Carlo to conduct Bayesian fitting and inference for generalized additive mixed models (GAMM). Generalized linear mixed models and generalized additive models are also handled as special cases of GAMM. gamreg Robust and Sparse Regression via Gamma-Divergence Robust regression via gamma-divergence with L1, elastic net and ridge penalties. gamRR Calculate the RR for the GAM Calculates the relative risk (RR) for generalized additive models. gamsel Fit Regularization Path for Generalized Additive Models Using overlap grouped lasso penalties, gamsel selects whether a term in a GAM is nonzero, linear, or a non-linear spline (up to a specified maximum df per variable).
It fits the entire regularization path on a grid of values for the overall penalty lambda, both for Gaussian and binomial families. GAR Authorize and Request Google Analytics Data The functions included are used to obtain initial authentication with Google Analytics as well as simple and organized data retrieval from the API. Allows for retrieval from multiple profiles at once. GAS Generalized Autoregressive Score Models Simulate, estimate and forecast using univariate and multivariate GAS models. gaselect Genetic Algorithm (GA) for Variable Selection from High-Dimensional Data Provides a genetic algorithm for finding variable subsets in high-dimensional data with high prediction performance. The genetic algorithm can use ordinary least squares (OLS) regression models or partial least squares (PLS) regression models to evaluate the prediction power of variable subsets. By supporting different cross-validation schemes, the user can fine-tune the tradeoff between speed and quality of the solution. gatepoints Easily Gate or Select Points on a Scatter Plot Allows the user to choose/gate a region on the plot and returns the points within it. gbm Generalized Boosted Regression Models An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). gbp A Bin Packing Problem Solver Basic infrastructure and several algorithms for the 1d-4d bin packing problem. This package provides a set of C-level classes and solvers for the 1d-4d bin packing problem, and an R-level solver for the 4d bin packing problem, which is a wrapper over the C-level 4d bin packing problem solver.
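The gradient boosting machine that ‘gbm’ implements fits a weak learner to the current residuals and adds a shrunken copy of it at each step; a minimal least-squares sketch with single-split stumps (this illustrates the idea only, not gbm's API or its tree learner):

```r
set.seed(42)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

fit_stump <- function(x, r) {            # best single split on x for residuals r
  cuts <- quantile(x, probs = seq(0.05, 0.95, by = 0.05))
  sse  <- sapply(cuts, function(cc) {
    sum((r[x <= cc] - mean(r[x <= cc]))^2) +
      sum((r[x > cc] - mean(r[x > cc]))^2)
  })
  cc <- cuts[which.min(sse)]
  list(cut = cc, left = mean(r[x <= cc]), right = mean(r[x > cc]))
}

pred <- rep(mean(y), length(y))          # start from the overall mean
eta  <- 0.1                              # learning rate (shrinkage)
for (m in 1:100) {
  s    <- fit_stump(x, y - pred)         # fit the current residuals
  pred <- pred + eta * ifelse(x <= s$cut, s$left, s$right)
}

stopifnot(mean((y - pred)^2) < mean((y - mean(y))^2))  # boosting reduced the MSE
```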
The 4d bin packing problem solver aims to solve the bin packing problem, a.k.a. the container loading problem, with an additional constraint on weight. Given a set of rectangular-shaped items and a set of rectangular-shaped bins with weight limits, the solver looks for an orthogonal packing solution that minimizes the number of bins and maximizes volume utilization. Each rectangular-shaped item i = 1, .., n is characterized by length l_i, depth d_i, height h_i, and weight w_i, and each rectangular-shaped bin j = 1, .., m is specified similarly by length l_j, depth d_j, height h_j, and weight limit w_j. An item can be rotated into any orthogonal direction, and no further restrictions are imposed. gbts Hyperparameter Search for Gradient Boosted Trees An implementation of hyperparameter optimization for Gradient Boosted Trees on binary classification and regression problems. The current version provides two optimization methods: active learning and random search. GCalignR Simple Peak Alignment for Gas-Chromatography Data Aligns chromatography peaks with a three-step algorithm: (1) linear transformation of retention times to maximise shared peaks among samples, (2) alignment of peaks within a certain error interval, (3) merging of rows that likely represent the same substance (i.e. no sample shows peaks in both rows and the rows have similar retention time means). The method was first described in Stoffel et al. (2015). gcerisk Generalized Competing Event Model Generalized competing event model based on the Cox PH model and the Fine-Gray model. This function is designed to develop optimized risk-stratification methods for competing risks data, such as described in: 1. Carmona R, Gulaya S, Murphy JD, Rose BS, Wu J, Noticewala S, McHale MT, Yashar CM, Vaida F, and Mell LK (2014). Validated competing event model for the stage I-II endometrial cancer population. Int J Radiat Oncol Biol Phys. 89:888-98. 2.
Carmona R, Zakeri K, Green G, Hwang L, Gulaya S, Xu B, Verma R, Williamson CW, Triplett DP, Rose BS, Shen H, Vaida F, Murphy JD, and Mell LK (2016). Improved method to stratify elderly cancer patients at risk for competing events. J Clin Oncol, in press. gcite Google Citation Parser Scrapes Google Citation pages and creates data frames of citations over time. gcKrig Analyze and Interpolate Geostatistical Count Data using Gaussian Copula Provides a variety of functions to analyze and model geostatistical count data with Gaussian copulas, including 1) data simulation and visualization; 2) correlation structure assessment (here also known as the NORTA); 3) calculation of multivariate normal rectangle probabilities; 4) likelihood inference and parallel prediction at unsampled locations. GDAtools A toolbox for the analysis of categorical data in social sciences, and especially Geometric Data Analysis This package contains functions for ‘specific’ MCA (Multiple Correspondence Analysis), ‘class specific’ MCA, computing and plotting structuring factors and concentration ellipses, ‘standardized’ MCA, inductive tests and other tools for Geometric Data Analysis. It also provides functions for the translation of logit model coefficients into percentages (forthcoming), weighted contingency tables and an association measure, i.e. Percentages of Maximum Deviation from Independence (PEM). gdm Functions for Generalized Dissimilarity Modeling A toolkit with functions to fit, plot, and summarize Generalized Dissimilarity Models. gdns Tools to Work with Google DNS Over HTTPS API To address the problem of insecurity of UDP-based DNS requests, Google Public DNS offers DNS resolution over an encrypted HTTPS connection. DNS-over-HTTPS greatly enhances privacy and security between a client and a recursive resolver, and complements DNSSEC to provide end-to-end authenticated DNS lookups.
Functions are provided for both individual queries and bulk requests that return detailed responses. Support for reverse lookups is also provided. gdpc Generalized Dynamic Principal Components Functions to compute the Generalized Dynamic Principal Components introduced in Peña and Yohai (2016). gds Descriptive Statistics of Grouped Data Contains a function called gds() which accepts three input parameters: the lower limits, the upper limits and the frequencies of the corresponding classes. The gds() function calculates and returns the values of the mean (‘gmean’), median (‘gmedian’), mode (‘gmode’), variance (‘gvar’), standard deviation (‘gstdev’), coefficient of variation (‘gcv’), quartiles (‘gq1’, ‘gq2’, ‘gq3’), inter-quartile range (‘gIQR’), skewness (‘g1’), and kurtosis (‘g2’), which facilitate effective data analysis. For skewness and kurtosis calculations, moments are used. gdtools Utilities for Graphical Rendering Useful tools for writing vector graphics devices. gear Geostatistical Analysis in R Implements common geostatistical methods in a clean, straightforward, efficient manner. A quasi-reboot of the SpatialTools R package. gee4 Generalised Estimating Equations (GEE/WGEE) using ‘Armadillo’ and S4 Fit joint mean-covariance models for longitudinal data within the framework of (weighted) generalised estimating equations (GEE/WGEE). The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Armadillo’ C++ library for numerical linear algebra and ‘RcppArmadillo’ glue. GEEmediate Mediation Analysis for Generalized Linear Models Using the Difference Method Causal mediation analysis for a single exposure/treatment and a single mediator, both allowed to be either continuous or binary.
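The grouped statistics that gds() reports follow the standard class-midpoint formulas; a base-R sketch of the grouped mean and variance (the variable names here are illustrative, not gds()'s interface):

```r
lower <- c(0, 10, 20, 30)      # class lower limits
upper <- c(10, 20, 30, 40)     # class upper limits
freq  <- c(5, 12, 8, 3)        # class frequencies

mid   <- (lower + upper) / 2   # class midpoints stand in for the raw data
n     <- sum(freq)
gmean <- sum(freq * mid) / n                     # grouped mean
gvar  <- sum(freq * (mid - gmean)^2) / (n - 1)   # grouped sample variance

stopifnot(abs(gmean - 510 / 28) < 1e-12, gvar > 0)
```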
The package implements the difference method and provides point and interval estimates, as well as testing, for the natural direct and indirect effects and the mediation proportion. gelnet Generalized Elastic Nets The package implements several extensions of the elastic net regularization scheme. These extensions include individual feature penalties for the L1 term and feature-feature penalties for the L2 term. gemmR General Monotone Model An R-language implementation of the General Monotone Model proposed by Michael Dougherty and Rick Thomas. It is a procedure for estimating weights for a set of independent predictors that minimize the rank-order inversions between the model predictions and some outcome. gems Generalized Multistate Simulation Model Simulate and analyze multistate models with general hazard functions. gems provides functionality for the preparation of hazard functions and parameters, simulation from a general multistate model and prediction of future events. The multistate model is not required to be a Markov model and may take the history of previous events into account. In the basic version, it allows simulation from transition-specific hazard functions whose parameters are multivariate normally distributed. gencve General Cross Validation Engine Engines for cross-validation of many types of regression and class prediction models are provided. These engines include built-in support for ‘glmnet’, ‘lars’, ‘plus’, ‘MASS’, ‘rpart’, ‘C50’ and ‘randomforest’. It is easy for the user to add other regression or classification algorithms. The ‘parallel’ package is used to improve speed. Several data generation algorithms for problems in regression and classification are provided. genderizeR Gender Prediction Based on First Names Utilizes the genderize.io API to predict gender from first names extracted from a text vector. The accuracy of prediction can be controlled by two parameters: the count of a first name in the database and the probability of prediction.
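The cross-validation loop at the heart of an engine like ‘gencve’ is short in base R; a sketch estimating out-of-sample MSE for lm() with k folds (the function name cv_mse is made up for this example):

```r
cv_mse <- function(x, y, k = 5) {
  folds <- sample(rep(1:k, length.out = length(y)))  # random fold labels
  errs <- sapply(1:k, function(f) {
    train <- folds != f                              # hold out fold f
    fit   <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))
    pred  <- predict(fit, newdata = data.frame(x = x[!train]))
    mean((y[!train] - pred)^2)                       # held-out squared error
  })
  mean(errs)
}

set.seed(7)
x <- rnorm(100)
y <- 2 * x + rnorm(100)            # true noise variance is 1
err <- cv_mse(x, y)
stopifnot(err > 0.3, err < 3)      # CV MSE should be near the noise variance
```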
genderNames Client for the Genderize API That Determines the Gender of Names API client for genderize.io, which will tell you the gender of the name you input. Use the first name of the person you are interested in to find their gender. gendist Generated Probability Distribution Models Computes the probability density function (pdf), cumulative distribution function (cdf), quantile function (qf) and generates random values (rg) for the following general models: mixture models, composite models, folded models, skewed symmetric models and arc tan models. GeneralizedUmatrix Credible Visualization for Two-Dimensional Projections of Data Projections from a high-dimensional data space onto a two-dimensional plane are used to detect structures, such as clusters, in multivariate data. The generalized Umatrix is able to visualize errors of these two-dimensional scatter plots by using a 3D topographic map. GeneralOaxaca Blinder-Oaxaca Decomposition for Generalized Linear Model Performs the Blinder-Oaxaca decomposition for generalized linear models with bootstrapped standard errors. Both the twofold and threefold decompositions are given, as well as the generalized linear model output for each group. GeneralTree General Tree Data Structure A general tree data structure implementation in R. generator Generate Data Containing Fake Personally Identifiable Information Allows users to quickly and easily generate fake data containing Personally Identifiable Information (PII) through convenience functions. GeNetIt Spatial Graph-Theoretic Genetic Gravity Modelling Implementation of spatial graph-theoretic genetic gravity models. The model framework is applicable to other types of spatial flow questions. Includes functions for constructing spatial graphs, sampling and summarizing associated raster variables and building unconstrained and singly constrained gravity models.
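Fake-PII generation of the kind ‘generator’ provides boils down to sampling from plausible value pools; a base-R sketch with made-up pools (these are not generator's functions or data):

```r
set.seed(123)
first <- c("Ana", "Ben", "Chloe", "Dev")      # illustrative name pools
last  <- c("Garcia", "Ng", "Okafor", "Smith")

n <- 5
fake <- data.frame(
  name  = paste(sample(first, n, replace = TRUE),
                sample(last,  n, replace = TRUE)),
  email = paste0("user", seq_len(n), "@example.com"),
  phone = sprintf("555-%04d", sample(0:9999, n)),
  stringsAsFactors = FALSE
)

stopifnot(nrow(fake) == 5, all(grepl("@example.com$", fake$email)))
```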
GenForImp The Forward Imputation: A Sequential Distance-Based Approach for Imputing Missing Data Two methods based on the Forward Imputation approach are implemented for the imputation of quantitative missing data. One method alternates Nearest Neighbour Imputation and Principal Component Analysis (function ‘ForImp.PCA’); the other uses Nearest Neighbour Imputation. genie A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm A new hierarchical clustering linkage criterion: the Genie algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini index) of the cluster sizes does not increase drastically above a given threshold. Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of clustering quality while retaining the speed of single linkage. genpathmox Generalized PATHMOX Algorithm for PLS-PM, LS and LAD Regression genpathmox provides a flexible solution for handling segmentation variables in complex statistical methodology. It contains an extended version of the PATHMOX algorithm in the context of partial least squares path modeling (Sanchez, 2009), including the F-block test (to detect the latent endogenous equations responsible for the difference), the F-coefficient test (to detect the path coefficients responsible for the difference) and the invariance test (to compare the sub-models’ latent variables). Furthermore, the package contains a generalized version of the PATHMOX algorithm to cover different methodologies: linear regression and least absolute deviation regression models. gensphere Generalized Spherical Distributions Define and compute with generalized spherical distributions – multivariate probability laws that are specified by a star-shaped contour (directional behavior) and a radial component. geoaxe Split ‘Geospatial’ Objects into Pieces Split ‘geospatial’ objects into pieces.
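The inequity measure that the Genie criterion monitors can be, for example, the Gini index of the cluster sizes; a base-R sketch of the standard sorted-vector formula (not genie's internals):

```r
# Gini index of a vector of cluster sizes: 0 = perfectly balanced,
# approaching 1 as a single cluster dominates.
gini <- function(sizes) {
  s <- sort(sizes)
  n <- length(s)
  sum((2 * seq_len(n) - n - 1) * s) / (n * sum(s))
}

stopifnot(gini(c(10, 10, 10)) == 0,   # balanced clusters
          gini(c(1, 1, 98)) > 0.6)    # one dominant cluster
```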
Includes support for some spatial object inputs, ‘Well-Known Text’, and ‘GeoJSON’. geofacet ‘ggplot2’ Faceting Utilities for Geographical Data Provides geofaceting functionality for ‘ggplot2’. Geofaceting arranges a sequence of plots of data for different geographical entities into a grid that preserves some of the geographical orientation. geofd Spatial Prediction for Function Value Data Kriging-based methods are used for predicting functional data (curves) with spatial dependence. geoGAM Select Sparse Geoadditive Models for Spatial Prediction A model-building procedure to select a sparse geoadditive model from a large number of covariates. Continuous, binary and ordered categorical responses are supported. The model building is based on component-wise gradient boosting with linear effects and smoothing splines. The resulting covariate set after gradient boosting is further reduced through cross-validated backward selection and aggregation of factor levels. The package provides a model-based bootstrap method to simulate prediction intervals for point predictions. A test data set from a soil mapping case study is provided. geohash Tools for Geohash Creation and Manipulation Provides tools to encode lat/long pairs into geohashes, decode those geohashes, and identify their neighbours. geojson Classes for ‘GeoJSON’ Classes for ‘GeoJSON’ to make working with ‘GeoJSON’ easier. geojsonio Convert Data from and to ‘geoJSON’ or ‘topoJSON’ Convert data to ‘geoJSON’ or ‘topoJSON’ from various R classes, including vectors, lists, data frames, shape files, and spatial classes. ‘geojsonio’ does not aim to replace packages like ‘sp’, ‘rgdal’, ‘rgeos’, but rather aims to be a high-level client to simplify conversions of data from and to ‘geoJSON’ and ‘topoJSON’. geojsonlint Tools for Validating ‘GeoJSON’ Tools for linting ‘GeoJSON’.
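Geohash encoding, as performed by the ‘geohash’ package, interleaves longitude and latitude bisection bits and maps each 5-bit group to a base-32 digit; a base-R sketch of an encoder following that scheme (an assumption-laden illustration, not the package's implementation):

```r
geohash_encode <- function(lat, lon, precision = 6) {
  base32  <- strsplit("0123456789bcdefghjkmnpqrstuvwxyz", "")[[1]]
  lat_rng <- c(-90, 90); lon_rng <- c(-180, 180)
  bits <- integer(0)
  even <- TRUE                           # even-numbered bits encode longitude
  while (length(bits) < precision * 5) {
    if (even) {
      mid  <- mean(lon_rng)
      bits <- c(bits, as.integer(lon >= mid))
      lon_rng <- if (lon >= mid) c(mid, lon_rng[2]) else c(lon_rng[1], mid)
    } else {
      mid  <- mean(lat_rng)
      bits <- c(bits, as.integer(lat >= mid))
      lat_rng <- if (lat >= mid) c(mid, lat_rng[2]) else c(lat_rng[1], mid)
    }
    even <- !even
  }
  digits <- colSums(matrix(bits, nrow = 5) * 2^(4:0))  # 5 bits -> base-32 index
  paste(base32[digits + 1], collapse = "")
}

stopifnot(geohash_encode(0, 0, 2) == "s0")  # the equator / prime-meridian cell
```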
Includes tools for interacting with an online validation tool and the ‘Javascript’ library ‘geojsonhint’, and for validating against a GeoJSON schema via a ‘Javascript’ library. Some tools work locally while others require an internet connection. geojsonR A GeoJson Processing Toolkit Includes functions for processing GeoJson objects relying on ‘RFC 7946’. The geojson encoding is based on ‘json11’, a tiny JSON library for ‘C++11’. Furthermore, the source code is exported in R through the ‘Rcpp’ and ‘RcppArmadillo’ packages. GeomComb (Geometric) Forecast Combination Methods Provides eigenvector-based (geometric) forecast combination methods; also includes simple approaches (simple average, median, trimmed and winsorized mean, inverse rank method) and regression-based combination. Tools for data pre-processing are available in order to deal with common problems in forecast combination (missingness, collinearity). geometa Tools for Reading and Writing ISO/OGC Geographic Metadata Provides facilities to handle reading and writing of geographic metadata defined with the OGC/ISO 19115 and 19139 (XML) standards. geomorph Geometric Morphometric Analyses of 2D/3D Landmark Data Geomorph allows users to read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation. geonames Interface to the http://www.geonames.org Web Service Code for querying the web service at http://www.geonames.org. geoparser Interface to the Geoparser.io API for Identifying and Disambiguating Places Mentioned in Text A wrapper for the Geoparser.io API version 0.4.0, a web service that identifies places mentioned in text, disambiguates those places, and returns detailed data about the places found in the text. Basic, limited API access is free, with paid plans to accommodate larger workloads.
geosapi GeoServer REST API R Interface Provides an R interface to the GeoServer REST API, allowing users to upload and publish data in a GeoServer web application and expose data to OGC Web Services. The package currently supports all CRUD (Create, Read, Update, Delete) operations on GeoServer workspaces, namespaces, datastores (stores of vector data), featuretypes, layers and styles, as well as vector data upload operations. geosptdb Spatio-Temporal Inverse Distance Weighting and Radial Basis Functions with Distance-Based Regression Spatio-temporal Inverse Distance Weighting (IDW) and radial basis functions; optimization, prediction, summary statistics from leave-one-out cross-validation, adjusting a distance-based linear regression model and generation of the principal coordinates of a new individual from Gower’s distance. geotoolsR Tools to Improve the Use of Geostatistics Provides tools to help researchers work with geostatistics. Initially, it presents a collection of functions that allow researchers to deal with spatial data using a bootstrap procedure. There are five methods available and two ways to display them: the bootstrap confidence interval, which provides a two-sided bootstrap confidence interval; and the bootstrap plot, a graphic with the original variogram and each of the B bootstrap variograms. GERGM Estimation and Fit Diagnostics for Generalized Exponential Random Graph Models Estimation and diagnosis of the convergence of Generalized Exponential Random Graph Models (GERGM) via Gibbs sampling or Metropolis-Hastings with exponential down-weighting. gesca Generalized Structured Component Analysis (GSCA) Fit a variety of component-based structural equation models. getmstatistic Quantifying Systematic Heterogeneity in Meta-Analysis Quantifying systematic heterogeneity in meta-analysis using R.
The \code{M} statistic aggregates heterogeneity information across multiple variants to identify systematic heterogeneity patterns and their direction of effect in meta-analysis. Its primary use is to identify outlier studies, which either show ‘null’ effects or consistently show stronger or weaker genetic effects than average across the panel of variants examined in a GWAS meta-analysis. In contrast to conventional heterogeneity metrics (Q-statistic, I-squared and tau-squared), which measure random heterogeneity at individual variants, \code{M} measures systematic (non-random) heterogeneity across multiple independently associated variants. Systematic heterogeneity can arise in a meta-analysis due to differences in the characteristics of participating studies. Some of these differences may include: ancestry, allele frequencies, phenotype definition, age of disease onset, family history, gender, linkage disequilibrium and quality control thresholds. getPass Masked User Input A micro-package for reading ‘passwords’, i.e. reading user input with masking, so that the input is not displayed as it is typed. Currently there is support for ‘RStudio’, the command line (every OS), and any platform where ‘tcltk’ is present. gets General-to-Specific (GETS) Modelling and Indicator Saturation Methods Automated multi-path General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting structural breaks in the mean. The mean can be specified as an autoregressive model with covariates (an ‘AR-X’ model), and the variance can be specified as a log-variance model with covariates (a ‘log-ARCH-X’ model). The four main functions of the package are arx, getsm, getsv and isat. The first function, arx, estimates an AR-X model with log-ARCH-X errors. The second function, getsm, undertakes GETS model selection of the mean specification of an arx object.
The third function, getsv, undertakes GETS model selection of the log-variance specification of an arx object. The fourth function, isat, undertakes GETS model selection of an indicator-saturated mean specification. gettz Get the Timezone Information A function to retrieve the system timezone on Unix systems, which can return an answer even when ‘Sys.timezone()’ fails. It is based on an answer by Duane McCully posted on ‘StackOverflow’, adapted to be callable from R. GFA Group Factor Analysis Factor analysis implementation for multiple data sources, i.e., for groups of variables. The whole data analysis pipeline is provided, including functions and recommendations for data normalization and model definition, as well as missing value prediction and model visualization. The group factor analysis (GFA) model is inferred with Gibbs sampling. GFD Tests for General Factorial Designs Implemented are the Wald-type statistic, a permuted version thereof, and the ANOVA-type statistic for general factorial designs, even with non-normal error terms and/or heteroscedastic variances, for crossed designs with an arbitrary number of factors and nested designs with up to three factors. gfmR Implements Group Fused Multinomial Regression Software to implement methodology to perform automatic response category combination in multinomial logistic regression. There are functions for both cross-validation and AIC for model selection. The method provides regression coefficient estimates that may be useful for better understanding the true probability distribution of multinomial logistic regression when category probabilities are similar. These methods are not recommended for a large number of predictor variables.
ggalt Extra Coordinate Systems, Geoms and Statistical Transformations for ‘ggplot2’ A compendium of ‘geoms’, ‘coords’ and ‘stats’ for ‘ggplot2’, including splines, 1d and 2d densities, univariate average shifted histograms and a new map coordinate system based on the ‘PROJ.4’ library. ggbeeswarm Categorical Scatter (Violin Point) Plots Provides two methods of plotting categorical scatter plots such that the arrangement of points within a category reflects the density of data at that region, and avoids over-plotting. ggcorrplot Visualization of a Correlation Matrix using ‘ggplot2’ The ‘ggcorrplot’ package can be used to easily visualize a correlation matrix using ‘ggplot2’. It provides a solution for reordering the correlation matrix and displays the significance level on the plot. It also includes a function for computing a matrix of correlation p-values. ggdmc Dynamic Model of Choice with Parallel Computation, and C++ Capabilities A fast engine for computing hierarchical Bayesian models implemented in the Dynamic Model of Choice. ggedit Interactive ‘ggplot2’ Layer and Theme Aesthetic Editor Interactively edit ‘ggplot2’ layer and theme aesthetic definitions. ggenealogy Visualization Tools for Genealogical Data Methods for searching through genealogical data and displaying the results. Plotting algorithms assist with data exploration and publication-quality image generation. Uses the Grammar of Graphics. ggExtra Collection of Functions and Layers to Enhance ggplot2 Collection of functions and layers to enhance ggplot2. ggforce Accelerating ‘ggplot2’ The aim of ‘ggplot2’ is to aid in visual data investigations. This focus has led to a lack of facilities for composing specialised plots. ‘ggforce’ aims to be a collection of mainly new stats and geoms that fills this gap. All additional functionality is intended to come through the official extension system, so using ‘ggforce’ should be a stable experience.
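The ggcorrplot workflow above can be sketched as follows; a minimal example, assuming the ‘ggcorrplot’ package is installed (cor_pmat() is its p-value helper):

```r
# Sketch of a reordered correlation plot with significance marks,
# using the 'ggcorrplot' package (assumed installed)
library(ggcorrplot)

corr  <- cor(mtcars)        # correlation matrix
p.mat <- cor_pmat(mtcars)   # matrix of correlation p-values

# Hierarchically reorder the matrix, show the lower triangle,
# and mark non-significant correlations using p.mat
ggcorrplot(corr, hc.order = TRUE, type = "lower", p.mat = p.mat)
```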
ggformula Formula Interface to the Grammar of Graphics Provides a formula interface to ‘ggplot2’ graphics. ggfortify Data Visualization Tools for Statistical Analysis Results Unified plotting tools for commonly used statistical methods, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots in a unified style using ‘ggplot2’. ggghost Capture the Spirit of Your ‘ggplot2’ Calls Creates a reproducible ‘ggplot2’ object by storing the data and calls. ggimage Use Image in ‘ggplot2’ Supports aesthetic mapping of image files to be visualized in the ‘ggplot2’ graphics system, e.g. plotting image files as the points of a scatterplot. ggiraph Make ‘ggplot2’ Graphics Interactive Using ‘htmlwidgets’ Create interactive ‘ggplot2’ graphics that are usable in the ‘RStudio’ viewer pane, in ‘R Markdown’ documents and in ‘Shiny’ apps. ggiraphExtra Make Interactive ‘ggplot2’. Extension to ‘ggplot2’ and ‘ggiraph’ Collection of functions to enhance ‘ggplot2’ and ‘ggiraph’. Provides functions for exploratory plots. Every plot can be either a ‘static’ plot or an ‘interactive’ plot using ‘ggiraph’. ggjoy Joyplots in ‘ggplot2’ Joyplots provide a convenient way of visualizing changes in distributions over time or space. This package enables the creation of such plots in ‘ggplot2’. gglogo Geom for Logo Sequence Plots Visualize sequences in (modified) logo plots. The design choices used by these logo plots allow sequencing data to be more easily analyzed. Because it is integrated into the ‘ggplot2’ geom framework, these logo plots support native features such as faceting. ggloop Create ‘ggplot2’ Plots in a Loop Pass a data frame and mapping aesthetics to ggloop() in order to create a list of ‘ggplot2’ plots. The way x-y and dots are paired together is controlled by the remapping arguments. Geoms, themes, facets, and other features can be added with the special %L+% (L-plus) operator.
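A joyplot of the kind described above can be sketched in a few lines; a minimal example, assuming the ‘ggjoy’ package is installed (geom_joy() is its main geom):

```r
# Sketch of a joyplot using the 'ggjoy' package (assumed installed):
# one density ridge per group, stacked to show how distributions shift
library(ggplot2)
library(ggjoy)

ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_joy()
```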
ggm Functions for Graphical Markov Models Functions and datasets for maximum likelihood fitting of some classes of graphical Markov models. ggmap Spatial Visualization with Google Maps and OpenStreetMap Easily visualize spatial data and models on top of Google Maps, OpenStreetMap, Stamen Maps, or CloudMade Maps with ggplot2. ggmosaic Mosaic Plots in the ‘ggplot2’ Framework Mosaic plots in the ‘ggplot2’ framework. Mosaic plot functionality is provided in a single ‘ggplot2’ layer by calling the geom ‘mosaic’. GGMridge Gaussian Graphical Models Using Ridge Penalty Followed by Thresholding and Reestimation Estimation of a partial correlation matrix using a ridge penalty followed by thresholding and reestimation. Under the multivariate Gaussian assumption, the matrix constitutes a Gaussian graphical model (GGM). ggnetwork Geometries to Plot Networks with ‘ggplot2’ Geometries to plot network objects with ‘ggplot2’. ggplot2 An Implementation of the Grammar of Graphics An implementation of the grammar of graphics in R. It combines the advantages of both base and lattice graphics: conditioning and shared axes are handled automatically, and you can still build up a plot step by step from multiple data sources. It also implements a sophisticated multidimensional conditioning system and a consistent interface to map data to aesthetic attributes. See http://ggplot2.org for more information, documentation and examples. ggpmisc Miscellaneous Extensions to ‘ggplot2’ Implements extensions to ‘ggplot2’ respecting the grammar of graphics paradigm. Provides new stats to locate and tag peaks and valleys in 2D plots, a stat to add a label by group with the equation of a polynomial fitted with lm(), or with R^2 or adjusted R^2 values for any model fitted with lm(). Provides a function for flexibly converting time series to data frames suitable for plotting with ggplot().
In addition, provides two stats useful for diagnosing what data are passed to compute_group() and compute_panel() functions. ggpolypath Polygons with Holes for the Grammar of Graphics Tools for working with polygons with holes in ‘ggplot2’, with a new ‘geom’ for drawing a ‘polypath’ applying the ‘evenodd’ or ‘winding’ rules. ggpubr ‘ggplot2’ Based Publication Ready Plots ‘ggplot2’ is an excellent and flexible package for elegant data visualization in R. However, the default generated plots require some formatting before we can send them for publication. Furthermore, customizing a ‘ggplot’ involves opaque syntax, which raises the level of difficulty for researchers with no advanced R programming skills. ‘ggpubr’ provides some easy-to-use functions for creating and customizing ‘ggplot2’-based publication ready plots. ggpval Annotate Statistical Tests for ‘ggplot2’ Automatically perform desired statistical tests (e.g. wilcox.test(), t.test()) to compare between groups, and add test p-values to the plot with annotations.
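A publication-style group comparison of the kind ggpubr targets can be sketched as follows; a minimal example, assuming the ‘ggpubr’ package is installed (ggboxplot() and stat_compare_means() are its exported helpers):

```r
# Sketch of an annotated comparison plot using 'ggpubr' (assumed installed)
library(ggpubr)

# Boxplot of mpg by transmission type, with a Wilcoxon rank-sum
# test p-value added as an annotation layer
p <- ggboxplot(mtcars, x = "am", y = "mpg", color = "am")
p + stat_compare_means(method = "wilcox.test")
```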