Jaccard Index The Jaccard index, also known as the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets.
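The definition above translates almost directly into set operations. The following is a minimal sketch; the function name `jaccard_index` and the choice to return 1.0 for two empty sets are assumptions made for illustration, since the ratio is undefined in that case.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two finite collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Both sets are empty, so the ratio is undefined.
        # Returning 1.0 is a convention assumed here, not part of the definition.
        return 1.0
    return len(set_a & set_b) / len(union)


# Intersection {b, c} has 2 elements, union {a, b, c, d} has 4.
print(jaccard_index({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```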
Jackknife Regression Jackknife logistic and linear regression for clustering and predictions. Our goal is to produce a regression tool that can be used as a black box, be very robust and parameter-free, and usable and easy-to-interpret by non-statisticians. It is part of a bigger project: automating many fundamental data science tasks, to make it easy, scalable and cheap for data consumers, not just for data experts.
Jackknife Resampling In statistics, the jackknife is a resampling technique especially useful for variance and bias estimation. The jackknife predates other common resampling methods such as the bootstrap. The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset, calculating the estimate on the remaining observations, and then averaging these calculations. Given a sample of size N, the jackknife estimate is found by aggregating the estimates computed from each of the N subsamples of size N – 1.
The jackknife technique was developed in Quenouille (1949, 1956). Tukey (1958) expanded on the technique and proposed the name “jackknife” since, like a Boy Scout’s jackknife, it is a “rough and ready” tool that can solve a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.
The jackknife represents a linear approximation of the bootstrap.
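The leave-one-out procedure described in this entry can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function name `jackknife` and the returned dictionary keys are hypothetical, and the bias and variance expressions are the standard textbook jackknife formulas.

```python
import math
from statistics import mean


def jackknife(data, estimator):
    """Jackknife estimate, bias and standard error of `estimator` on `data`."""
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("the jackknife needs at least two observations")
    # Recompute the estimator on each of the n subsamples of size n - 1.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = mean(leave_one_out)
    full_estimate = estimator(data)
    # Standard jackknife bias and variance formulas.
    bias = (n - 1) * (loo_mean - full_estimate)
    variance = (n - 1) / n * sum((v - loo_mean) ** 2 for v in leave_one_out)
    return {"estimate": loo_mean, "bias": bias, "std_error": math.sqrt(variance)}


result = jackknife([1.0, 2.0, 3.0, 4.0, 5.0], mean)
print(result)
```

For the sample mean, the jackknife standard error coincides with the usual s / sqrt(N), which makes this a convenient sanity check.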
James-Stein Estimator The James-Stein estimator is a biased estimator of the mean of Gaussian random vectors. It can be shown that, for vectors of dimension three or more, the James-Stein estimator dominates the ‘ordinary’ least squares approach, i.e., it has lower mean squared error on average. It is the best-known example of Stein’s phenomenon. An earlier version of the estimator was developed by Charles Stein in 1956, and is sometimes referred to as Stein’s estimator. The result was improved by Willard James and Charles Stein in 1961.
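As a rough illustration, the sketch below applies the classical form of the estimator to a single observed vector. It assumes a known common variance (the parameter `sigma2`, a name chosen here), shrinkage toward the origin, and at least three dimensions; the function name `james_stein` is hypothetical.

```python
def james_stein(x, sigma2=1.0):
    """Shrink one observation x ~ N(theta, sigma2 * I) toward the origin."""
    p = len(x)
    if p < 3:
        raise ValueError("the James-Stein estimator requires dimension >= 3")
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        # The shrinkage factor is undefined at the origin; return x unchanged
        # (a choice made for this sketch).
        return list(x)
    shrink = 1.0 - (p - 2) * sigma2 / norm_sq
    return [shrink * v for v in x]


# p = 4 and ||x||^2 = 4, so the shrinkage factor is 1 - 2/4 = 0.5.
print(james_stein([1.0, 1.0, 1.0, 1.0]))  # [0.5, 0.5, 0.5, 0.5]
```

The unshrunk vector `x` itself is the ordinary estimate that the James-Stein estimate is compared against.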
Jamovi The jamovi project was founded to develop a free and open statistical platform which is intuitive to use, and can provide the latest developments in statistical methodology. At the core of the jamovi philosophy, is that scientific software should be “community driven”, where anyone can develop and publish analyses, and make them available to a wide audience.
Jaro-Winkler Distance In computer science and statistics, the Jaro-Winkler distance (Winkler, 1990) is a measure of similarity between two strings. It is a variant of the Jaro distance metric (Jaro, 1989, 1995), a type of string edit distance, and mainly used in the area of record linkage (duplicate detection). The higher the Jaro-Winkler distance for two strings is, the more similar the strings are. The Jaro-Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match.
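To make the scoring concrete, here is a minimal sketch of the Jaro similarity with Winkler's common-prefix adjustment. The prefix scaling factor of 0.1 and the maximum prefix length of 4 are the commonly used defaults and are assumptions here, as are the function names.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Boost the Jaro score for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```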
Java Class Library for Evolutionary Computation
JCLEC is a software system for Evolutionary Computation (EC) research, developed in the Java programming language. It provides a high-level software framework to do any kind of Evolutionary Algorithm (EA), providing support for genetic algorithms (binary, integer and real encoding), genetic programming (Koza’s style, strongly typed, and grammar based) and evolutionary programming.
Java Data Mining
Java Data Mining (JDM) is a standard Java API for developing data mining applications and tools. JDM defines an object model and Java API for data mining objects and processes. JDM enables applications to integrate data mining technology for developing predictive analytics applications and tools. The JDM 1.0 standard was developed under the Java Community Process as JSR 73. In 2006, the JDM 2.0 specification was being developed under JSR 247, but it was withdrawn in 2011 without standardization.
Various data mining functions and techniques like statistical classification and association, regression analysis, data clustering, and attribute importance are covered by the 1.0 release of this standard.
jpmml jpmml is an open-source PMML scoring engine for rapidly deploying predictive models into production.
JavaScript 3D Library
The aim of the project is to create a lightweight 3D library with a very low level of complexity. The library provides <canvas>, <svg>, CSS3D and WebGL renderers.
JavaScript Object Notation
JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.
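A short round trip shows what the attribute-value text looks like in practice. This sketch uses Python's standard `json` module; the record contents are made up for illustration.

```python
import json

# A hypothetical data object made of attribute-value pairs.
record = {
    "name": "sensor-1",
    "active": True,
    "readings": [20.5, 21.0],
    "location": None,
}

# Serialize to human-readable text, as a server might send it.
text = json.dumps(record, sort_keys=True)
print(text)

# Parse the text back into native objects, as a client would.
restored = json.loads(text)
print(restored == record)  # True
```

Note that Python's `True` and `None` are written as `true` and `null` in the JSON text.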
Jaya Optimisation Algorithm An Efficient Multi-core Implementation of the Jaya Optimisation Algorithm
jblas jblas is a fast linear algebra library for Java. jblas is based on BLAS and LAPACK, the de facto industry standard for matrix computations, and uses state-of-the-art implementations like ATLAS for all its computational routines, making jblas very fast. jblas is essentially a lightweight wrapper around the BLAS and LAPACK routines. These packages originated in the Fortran community, which explains their often archaic API. On the other hand, modern implementations are hard to beat performance-wise. jblas aims to make this functionality available to Java programmers so that they do not have to worry about writing JNI interfaces and the calling conventions of Fortran code. jblas depends on an implementation of the LAPACK and BLAS routines. Currently it is tested with ATLAS and BLAS/LAPACK (http://…/lapack).
Jeffreys-Lindley Paradox
Lindley’s paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution. The problem of the disagreement between the two approaches was discussed in Harold Jeffreys’ 1939 textbook; it became known as Lindley’s paradox after Dennis Lindley called the disagreement a paradox in a 1957 paper.
Jeffries-Matusita Distance Jeffries-Matusita Distance calculates the separability of a pair of probability distributions. This can be particularly meaningful for evaluating the results of Maximum Likelihood classifications.
Jensen-Shannon Distance
In probability theory and statistics, the Jensen-Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) or total divergence to the average. It is based on the Kullback-Leibler divergence, with some notable (and useful) differences, including that it is symmetric and it is always a finite value. The square root of the Jensen-Shannon divergence is a metric often referred to as Jensen-Shannon distance.
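The relationship between the Kullback-Leibler divergence, the Jensen-Shannon divergence and the Jensen-Shannon distance can be sketched for discrete distributions as follows. The function names are illustrative, and the use of base-2 logarithms (which bounds the divergence by 1) is an assumption of this sketch.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Average KL divergence of p and q from their midpoint distribution m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 whenever p_i > 0 or q_i > 0, so the result is always finite.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Square root of the divergence, which is a metric."""
    # max() guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, js_divergence(p, q)))


print(js_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
print(js_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

Unlike the Kullback-Leibler divergence, swapping the two arguments does not change the result, and distributions with disjoint support still give a finite value.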
JMP SAS created JMP in 1989 to empower scientists and engineers to explore data visually. Since then, JMP has grown from a single product into a family of statistical discovery tools, each one tailored to meet specific needs. All of our software is visual, interactive, comprehensive and extensible.
Job Safety Analysis
A Job Safety Analysis (JSA) is one of the risk assessment tools used to identify and control workplace hazards. A JSA is a second tier risk assessment with the aim of preventing personal injury to a person, or their colleagues, and any other person passing or working adjacent, above or below. JSAs are also known as Activity Hazard Analysis (AHA), Job Hazard Analysis (JHA) and Task Hazard Analysis (THA).
Joint and Individual Variation Explained
Research in several fields now requires the analysis of datasets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data, and provides new directions for the visual exploration of joint and individual structure. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types.
Joint Approximate Diagonalization of Eigenmatrices
Joint Matrix Factorization Nonnegative matrix factorization (NMF) is a powerful tool in data exploratory analysis by discovering the hidden features and part-based patterns from high-dimensional data. NMF and its variants have been successfully applied into diverse fields such as pattern recognition, signal processing, data mining, bioinformatics and so on. Recently, NMF has been extended to analyze multiple matrices simultaneously. However, a unified framework is still lacking. In this paper, we introduce a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration. Next, we present four update algorithms to solve this framework. The merits and demerits of these algorithms are systematically explored. Furthermore, extensive computational experiments using both synthetic data and real data demonstrate the effectiveness of JMF framework and related algorithms on pattern recognition and data mining.
Joint Maximum Likelihood Estimation
JMLE ‘Joint Maximum Likelihood Estimation’ is also called UCON, ‘Unconditional maximum likelihood estimation’. It was devised by Wright & Panchapakesan. In this formulation, the estimate of the Rasch parameter (for which the observed data are most likely, assuming those data fit the Rasch model) occurs when the observed raw score for the parameter matches the expected raw score. ‘Joint’ means that the estimates for the persons (rows) and items (columns) and rating scale structures (if any) of the data matrix are obtained simultaneously.
Joint Probability Distribution In the study of probability, given at least two random variables X, Y, …, that are defined on a probability space, the joint probability distribution for X, Y, … is a probability distribution that gives the probability that each of X, Y, … falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.
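For two discrete variables, a bivariate distribution can be written as a table of probabilities indexed by value pairs. The sketch below uses made-up probabilities and a hypothetical helper `marginal` to recover the distribution of each variable on its own by summing over the other.

```python
from collections import defaultdict

# Hypothetical joint distribution of (X, Y); keys are (x, y) pairs.
joint = {
    (0, 0): 0.25,
    (0, 1): 0.25,
    (1, 0): 0.125,
    (1, 1): 0.375,
}


def marginal(joint, axis):
    """Sum the joint probabilities over every variable except `axis`."""
    totals = defaultdict(float)
    for outcome, prob in joint.items():
        totals[outcome[axis]] += prob
    return dict(totals)


print(marginal(joint, 0))  # {0: 0.5, 1: 0.5}
print(marginal(joint, 1))  # {0: 0.375, 1: 0.625}
```

Here P(X=0, Y=0) = 0.25 differs from P(X=0) * P(Y=0) = 0.1875, so the joint table carries information that the two marginals alone do not.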
Joint Random Forest
Joint-Policy Correlation To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents’ policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.
Joyplot joyplot: a series of histograms, density plots or time series for a number of data segments, all aligned to the same horizontal scale and presented with a slight overlap.
jQuery jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and Ajax much simpler with an easy-to-use API that works across a multitude of browsers. With a combination of versatility and extensibility, jQuery has changed the way that millions of people write JavaScript.
JSON-stat JSON-stat is a simple lightweight JSON dissemination format best suited for data visualization, mobile apps or open data initiatives, that has been designed for all kinds of disseminators. JSON-stat also proposes an HTML microdata schema to enrich HTML tables and put the JSON-stat vocabulary in the browser. Fortunately, there are already tools that ease the use of JSON-stat, like the JSON-stat Javascript Toolkit, a library to process JSON-stat responses.
Jubatus Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes these functionalities:
• Online Machine Learning Library: Classification, Regression, Recommendation (Nearest Neighbor Search), Graph Mining, Anomaly Detection, Clustering
• Feature Vector Converter (fv_converter): Data Preprocess and Feature Extraction
• Framework for Distributed Online Machine Learning with Fault Tolerance
Jupyter The Jupyter Notebook is a web application for interactive data science and scientific computing. It allows users to author documents that combine live code with narrative text, equations, images, video and visualizations. These documents encode a complete and reproducible record of a computation that can be shared with others on GitHub, Dropbox and the Jupyter Notebook Viewer.
Just Another Gibbs Sampler
Just another Gibbs sampler (JAGS) is a program for simulation from Bayesian hierarchical models using Markov chain Monte Carlo (MCMC), developed by Martyn Plummer. JAGS has been employed for statistical work in many fields, for example ecology, management, and genetics. JAGS aims for compatibility with WinBUGS/OpenBUGS through the use of a dialect of the same modeling language (informally, BUGS), but it provides no GUI for model building and MCMC sample postprocessing, which must therefore be treated in a separate program (for example calling JAGS from R through a library such as rjags and post-processing MCMC output in R). The main advantage of JAGS in comparison to the members of the original BUGS family (WinBUGS and OpenBUGS) is its platform independence. It is written in C++, while the BUGS family is written in Component Pascal, a less widely known programming language. In addition, JAGS is already part of many repositories of Linux distributions such as Ubuntu. It can also be compiled as a 64-bit application on 64-bit platforms, thus making all the addressable space available to BUGS models. JAGS can be used via the command line or run in batch mode through script files. This means that there is no need to redo the settings with every run and that the program can be called and controlled from within another program (e.g. from R via rjags as outlined above). JAGS is licensed under the GNU General Public License.