Multivariate Symmetric Uncertainty and Other Measurements (msu)
Estimators for multivariate symmetrical uncertainty based on the work of Gustavo Sosa et al. (2016) <arXiv:1709.08730>, total correlation, information gain and symmetrical uncertainty of categorical variables.
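For the two-variable case, symmetrical uncertainty can be sketched in a few lines. This is a minimal illustration in Python, not the msu package's API: the function names are made up for the example, and it uses the standard definition SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) with plug-in entropy estimates.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 means independent, 1 means fully dependent."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    if hx + hy == 0:
        return 0.0  # both variables are constant; SU is defined as 0 here (an assumption)
    info_gain = hx + hy - hxy  # mutual information I(X; Y)
    return 2 * info_gain / (hx + hy)
```

Two identical variables give a value of 1, and two independent balanced variables give 0.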
Higher-Order Generalized Singular Value Decomposition (hogsvdR)
Implementation of higher-order generalized singular value decomposition (HO GSVD). Based on Ponnapalli, Saunders, et al. (2011) <doi:10.1371/journal.pone.0028072>.
Modified Mann Kendall Trend Tests with Variance Correction Approach (modifiedmk)
The power of the non-parametric Mann-Kendall test is highly influenced by serial correlation in the data. To address this issue, the original time series is modified by removing any trend component present in the data, and an effective sample size is calculated. Hamed, K. H., & Ramachandra Rao, A. (1998). A modified Mann-Kendall trend test for autocorrelated data. Journal of Hydrology, 204(1-4), 182-196. <doi:10.1016/S0022-1694(97)00125-X>. Yue, S., & Wang, C. Y. (2004). The Mann-Kendall test modified by effective sample size to detect trend in serially correlated hydrological series. Water Resources Management, 18(3), 201-218. <doi:10.1023/B:WARM.0000043140.61082.60>.
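The unmodified Mann-Kendall statistic that these corrections start from can be sketched as follows. This is an illustrative Python version, not the modifiedmk interface: it assumes a series without ties, and `variance_correction` is a hypothetical parameter standing in for the factor that an effective-sample-size method would supply. The sketch does not compute that factor.

```python
from math import sqrt


def mann_kendall(x, variance_correction=1.0):
    """Return (S, Var(S), Z) for the Mann-Kendall trend test, assuming no tied values."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S under independence
    var_s *= variance_correction  # hypothetical hook for a serial-correlation correction
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z
```

A correction factor above 1 inflates the variance and shrinks Z, which is how positive serial correlation makes an apparent trend less significant.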
Mark-Recapture Analysis (mra)
Performs mark-recapture analysis with covariates. Available models include the Cormack-Jolly-Seber open population model (Cormack (1972) <doi:10.2307/2556151>; Jolly (1965) <doi:10.2307/2333826>; Seber (1965) <doi:10.2307/2333827>) and the Huggins (1989) <doi:10.2307/2336377> closed population model. Link functions include logit, sine, and hazard. Model selection, model averaging, plot, and simulation routines are included. Open population size is estimated with the Horvitz-Thompson (1959) <doi:10.2307/2280784> estimator.
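The idea behind a Horvitz-Thompson size estimate is that each captured animal stands in for 1/p animals, where p is its estimated capture probability. A minimal Python sketch follows; the function name and inputs are hypothetical, not part of mra.

```python
def horvitz_thompson_size(capture_probs):
    """Estimate population size as the sum of 1/p over the animals actually captured.

    capture_probs holds one estimated capture probability per captured animal.
    """
    if any(not 0 < p <= 1 for p in capture_probs):
        raise ValueError("capture probabilities must lie in (0, 1]")
    return sum(1.0 / p for p in capture_probs)
```

For example, three captured animals with probabilities 0.5, 0.5 and 0.25 represent 2 + 2 + 4 = 8 animals in the population.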
Mode Testing and Exploring (multimode)
Different examples and methods for testing (including different proposals described in Ameijeiras-Alonso et al., 2016 <arXiv:1609.05188>) and exploring (including the mode tree, mode forest and SiZer) the number of modes using nonparametric techniques.
Identify and Parse Web Security Policies Files (securitytxt)
When security risks in web services are discovered by independent security researchers who understand the severity of the risk, they often lack the channels to properly disclose them. As a result, security issues may be left unreported. The ‘security.txt’ ‘Web Security Policies’ specification defines an ‘IETF’ draft standard <https://…/draft-foudil-securitytxt-00> to help organizations define the process for security researchers to securely disclose security vulnerabilities. Tools are provided to help identify and parse ‘security.txt’ files to enable analysis of the usage and adoption of these policies.
“Improving Visual Data Discovery:
1. Always have new data sources.
2. Always have new techniques.
3. Always have new tools and platforms.
Visual data discovery is not once and done. It is an iterative process that requires communication and exploration.” Analise Polsky (2014)
Algorithms which compute properties over graphs have always been of interest in computer science, with some of the fundamental algorithms, such as Dijkstra’s algorithm, dating back to the 50s. Since the 70s there has been interest in computing over graphs which are constantly changing, in a way which is more efficient than simply recomputing each time the graph changes. In this paper we provide a survey of both the foundational and the state-of-the-art algorithms which solve either shortest path or transitive closure problems in either fully or partially dynamic graphs. We balance this with the known conditional lower bounds. Dynamic Shortest Path and Transitive Closure Algorithms: A Survey
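The baseline that dynamic algorithms try to beat, recomputing from scratch after every change, can be sketched with Dijkstra's algorithm. This Python example is an illustration only and is not taken from the survey: the graph representation (a dict mapping each node to a dict of neighbor weights) and the function names are assumptions, and edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> {neighbor: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def update_and_recompute(graph, u, v, weight, source):
    """Naive dynamic baseline: apply one edge update, then rerun Dijkstra from scratch."""
    graph.setdefault(u, {})[v] = weight
    return dijkstra(graph, source)
```

Every update costs a full run here, whatever the size of the change. That cost is what the surveyed dynamic algorithms aim to reduce.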
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state-of-the-art which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and nicely scales to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images. …
Deep Ritz Method
We propose a deep learning based method, the Deep Ritz Method, for numerically solving variational problems, particularly the ones that arise from partial differential equations. The Deep Ritz method is naturally nonlinear, naturally adaptive and has the potential to work in rather high dimensions. The framework is quite simple and fits well with the stochastic gradient descent method used in deep learning. We illustrate the method on several problems including some eigenvalue problems. …
rApache is a project supporting web application development using the R statistical language and environment and the Apache web server. The current software distribution runs on UNIX/Linux and Mac OS X operating systems. Apache servers with threaded Multi-Processing Modules are now supported, but the Apache Prefork Multi-Processing Module is still recommended (refer to the Multi-Processing Modules chapter from Apache for more about this). The rApache software distribution provides the Apache module named mod_R that embeds the R interpreter inside the web server. It also comes bundled with libapreq, an Apache module for manipulating client request data. Together, they provide the glue to transform R into a server-side scripting environment. Another important project that’s not bundled with rApache, but plays an important role in server-side scripting, is the R package brew (also available on CRAN). It implements a templating framework for report generation, and it’s perfect for generating HTML on the fly. Its syntax is similar to PHP, Ruby’s erb module, Java Server Pages, and Python’s psp module. brew can be used stand-alone as well, so it’s not part of the distribution.
Nested Loop Cross Validation (nlcv)
Nested loop cross validation for estimating the misclassification error rate of classifiers. The package supports several methodologies for feature selection (random forest, Student t-test, limma) and provides an interface to the following classification methods in the ‘MLInterfaces’ package: linear and quadratic discriminant analyses, random forest, bagging, prediction analysis for microarray, generalized linear model, and support vector machine (svm and ksvm). Visualizations to assess the quality of the classifier are included: a plot of the ranks of the features, a scores plot for a specific classification algorithm and number of features, the misclassification rate for the different numbers of features and classification algorithms tested, and a ROC plot. For further details about the methodology, please check: Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann (2004) <doi:10.2202/1544-6115.1078>.
Estimated Marginal Means, aka Least-Squares Means (emmeans)
Obtain estimated marginal means (EMMs) for many linear, generalized linear, and mixed models. Compute contrasts or linear functions of EMMs, trends, and comparisons of slopes. Plots and compact letter displays. Least-squares means are discussed, and the term ‘estimated marginal means’ is suggested, in Searle, Speed, and Milliken (1980) Population marginal means in the linear model: An alternative to least squares means, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>.
Highlight Lines and Points in ‘ggplot2’ (gghighlight)
Make it easier to explore data with highlights.
Diversity Measures on Tripartite Graphs (triversity)
Computing diversity measures on tripartite graphs. This package first implements a parametrized family of such diversity measures which apply to probability distributions. Sometimes called ‘True Diversity’, this family contains famous measures such as the richness, the Shannon entropy, the Herfindahl-Hirschman index, and the Berger-Parker index. Second, the package allows these measures to be applied to probability distributions resulting from random walks between the levels of tripartite graphs. By defining an initial distribution at a given level of the graph and a path to follow between the three levels, the probability of the walker’s position within the final level is then computed, thus providing a particular instance of diversity to measure.
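The parametrized family can be illustrated with the usual definition of true diversity of order q, D_q = (sum of p_i^q)^(1/(1-q)). The Python sketch below is not the triversity API, and the function name is made up. The mapping of orders to the named measures follows the standard convention: order 0 gives the richness, order 1 the exponential of the Shannon entropy, order 2 the inverse of the Herfindahl-Hirschman index, and order infinity the inverse of the Berger-Parker index.

```python
from math import exp, inf, log


def true_diversity(p, q):
    """True diversity of order q for a probability distribution p (a list of probabilities)."""
    p = [pi for pi in p if pi > 0]  # zero-probability categories do not contribute
    if q == 1:
        return exp(-sum(pi * log(pi) for pi in p))  # limit case: exp of Shannon entropy
    if q == inf:
        return 1.0 / max(p)  # limit case: inverse Berger-Parker index
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))
```

For a uniform distribution over k categories, every order returns k. This is why the family is read as an effective number of categories.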
Rename and Encode Data Frames Using External Crosswalk Files (crosswalkr)
A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in ‘Stata’.
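The renaming half of this idea can be sketched with a small crosswalk that maps old column names to new ones. The Python example below only illustrates the concept and is not the crosswalkr interface: the function names, the crosswalk column names (`old`, `new`) and the sample data are all hypothetical.

```python
import csv
import io


def read_crosswalk(text, old_col, new_col):
    """Parse crosswalk CSV text into a dict mapping old names to new names."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[old_col]: row[new_col] for row in reader}


def rename_columns(rows, mapping):
    """Rename keys in each record; keys missing from the crosswalk are left unchanged."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


# Hypothetical crosswalk and data, for illustration only.
crosswalk_csv = "old,new\nid_num,id\nst,state\n"
mapping = read_crosswalk(crosswalk_csv, "old", "new")
renamed = rename_columns([{"id_num": 1, "st": "KY", "score": 3}], mapping)
```

Keeping the mapping in an external file means the same crosswalk can be applied to every input file, which is what makes the variable names consistent in the combined data set.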
Meta-CART: A Flexible Approach to Identify Moderators in Meta-Analysis (metacart)
Fits meta-CART by integrating classification and regression trees (CART) into meta-analysis. Meta-CART is a flexible approach to identify interaction effects between moderators in meta-analysis. The methods are described in Dusseldorp et al. (2014) <doi:10.1037/hea0000018> and Li et al. (2017) <doi:10.1111/bmsp.12088>.
2. Logistic Regression
4. Naïve Bayes
9. Bagging with Random Forests
10. Boosting with AdaBoost
“Improvements in technology and big data trends have given rise to improvements in machine learning. The sheer volume of data is growing exponentially, and companies are looking for faster speeds and real-time analytics. Cognitive computing combines machine learning and artificial intelligence to go beyond data mining and provide actionable insights.” Gil Allouche (January 9, 2015)