|Jaccard Index||The Jaccard index, also known as the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets.|
|Jackknife Regression||Jackknife logistic and linear regression for clustering and predictions. Our goal is to produce a regression tool that can be used as a black box, be very robust and parameter-free, and usable and easy-to-interpret by non-statisticians. It is part of a bigger project: automating many fundamental data science tasks, to make it easy, scalable and cheap for data consumers, not just for data experts.|
|Jackknife Resampling||In statistics, the jackknife is a resampling technique especially useful for variance and bias estimation. The jackknife predates other common resampling methods such as the bootstrap. The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset and calculating the estimate and then finding the average of these calculations. Given a sample of size N, the jackknife estimate is found by aggregating the estimates of each N – 1 estimate in the sample.
The jackknife technique was developed in Quenouille (1949, 1956). Tukey (1958) expanded on the technique and proposed the name “jackknife” since, like a Boy Scout’s jackknife, it is a “rough and ready” tool that can solve a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.
The jackknife represents a linear approximation of the bootstrap.
|JacSketch||We develop a new family of variance reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method –JacSketch– is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a stochastic quasi-gradient method. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the stochastic average gradient (SAGA) method, and several of its existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al (2015). The rates we obtain for minibatch SAGA are also superior to existing rates.|
|James-Stein Estimator||The James-Stein estimator is a biased estimator of the mean of Gaussian random vectors. It can be shown that the James-Stein estimator dominates the ‘ordinary’ least squares approach, i.e., it has lower mean squared error on average. It is the best-known example of Stein’s phenomenon. An earlier version of the estimator was developed by Charles Stein in 1956, and is sometimes referred to as Stein’s estimator. The result was improved by Willard James and Charles Stein in 1961.|
|Jamovi||The jamovi project was founded to develop a free and open statistical platform which is intuitive to use, and can provide the latest developments in statistical methodology. At the core of the jamovi philosophy, is that scientific software should be ‘community driven’, where anyone can develop and publish analyses, and make them available to a wide audience.
jamovi for R: Easy but Controversial
|Jaro-Winker Distance||In computer science and statistics, the Jaro-Winkler distance (Winkler, 1990) is a measure of similarity between two strings. It is a variant of the Jaro distance metric (Jaro, 1989, 1995), a type of string edit distance, and mainly used in the area of record linkage (duplicate detection). The higher the Jaro-Winkler distance for two strings is, the more similar the strings are. The Jaro-Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match.|
|Java Class Library for Evolutionary Computation
|JCLEC is a software system for Evolutionary Computation (EC) research, developed in the Java programming language. It provides a high-level software framework to do any kind of Evolutionary Algorithm (EA), providing support for genetic algorithms (binary, integer and real encoding), genetic programming (Koza’s style, strongly typed, and grammar based) and evolutionary programming.|
|Java Data Mining
|Java Data Mining (JDM) is a standard Java API for developing data mining applications and tools. JDM defines an object model and Java API for data mining objects and processes. JDM enables applications to integrate data mining technology for developing predictive analytics applications and tools. The JDM 1.0 standard was developed under the Java Community Process as JSR 73. In 2006, the JDM 2.0 specification was being developed under JSR 247, but has been withdrawn in 2011 without standardization.
Various data mining functions and techniques like statistical classification and association, regression analysis, data clustering, and attribute importance are covered by the 1.0 release of this standard.
|jpmml, the world’s leading open-source PMML scoring engine to rapidly deploy predictive models into production.|
|The aim of the project is to create a lightweight 3D library with a very low level of complexity. The library provides <canvas>, <svg>, CSS3D and WebGL renderers.
|Jaya Optimisation Algorithm||An Efficient Multi-core Implementation of the Jaya Optimisation Algorithm|
|Jazz||Jazz is a lightweight modular data processing framework, including a web server. It provides data persistence and computation capabilities accessible from R and Python and also through a REST API.
|jblas||jblas is a fast linear algebra library for Java. jblas is based on BLAS and LAPACK, the de-facto industry standard for matrix computations, and uses state-of-the-art implementations like ATLAS for all its computational routines, making jBLAS very fast. jblas can is essentially a light-wight wrapper around the BLAS and LAPACK routines. These packages have originated in the Fortran community which explains their often archaic API. On the other hand modern implementations are hard to beat performance wise. jblas aims to make this functionality available to Java programmers such that they do not have to worry about writing JNI interfaces and calling conventions of Fortran code. jblas depends on an implementation of the LAPACK and BLAS routines. Currently it is tested with ATLAS (http://math-atlas.sourceforge.net ) and BLAS/LAPACK (http://…/lapack)|
|JEDI||With the increasing demand for large amount of labeled data, crowdsourcing has been used in many large-scale data mining applications. However, most existing works in crowdsourcing mainly focus on label inference and incentive design. In this paper, we address a different problem of adaptive crowd teaching, which is a sub-area of machine teaching in the context of crowdsourcing. Compared with machines, human beings are extremely good at learning a specific target concept (e.g., classifying the images into given categories) and they can also easily transfer the learned concepts into similar learning tasks. Therefore, a more effective way of utilizing crowdsourcing is by supervising the crowd to label in the form of teaching. In order to perform the teaching and expertise estimation simultaneously, we propose an adaptive teaching framework named JEDI to construct the personalized optimal teaching set for the crowdsourcing workers. In JEDI teaching, the teacher assumes that each learner has an exponentially decayed memory. Furthermore, it ensures comprehensiveness in the learning process by carefully balancing teaching diversity and learner’s accurate learning in terms of teaching usefulness. Finally, we validate the effectiveness and efficacy of JEDI teaching in comparison with the state-of-the-art techniques on multiple data sets with both synthetic learners and real crowdsourcing workers.|
|Lindley’s paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution. The problem of the disagreement between the two approaches was discussed in Harold Jeffreys’ 1939 textbook; it became known as Lindley’s paradox after Dennis Lindley called the disagreement a paradox in a 1957 paper.|
|Jeffries-Matusita Distance||Jeffries-Matusita Distance calculates the separability of a pair of probability distributions. This can be particularly meaningful for evaluating the results of Maximum Likelihood classifications.
|In probability theory and statistics, the Jensen-Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) or total divergence to the average. It is based on the Kullback-Leibler divergence, with some notable (and useful) differences, including that it is symmetric and it is always a finite value. The square root of the Jensen-Shannon divergence is a metric often referred to as Jensen-Shannon distance.|
|JMP||SAS created JMP in 1989 to empower scientists and engineers to explore data visually. Since then, JMP has grown from a single product into a family of statistical discovery tools, each one tailored to meet specific needs. All of our software is visual, interactive, comprehensive and extensible.|
|Job Safety Analysis
|A Job Safety Analysis (JSA) is one of the risk assessment tools used to identify and control workplace hazards. A JSA is a second tier risk assessment with the aim of preventing personal injury to a person, or their colleagues, and any other person passing or working adjacent, above or below. JSAs are also known as Activity Hazard Analysis (AHA), Job Hazard Analysis (JHA) and Task Hazard Analysis (THA).|
|Joint and Individual Variation Explained
|Research in several fields now requires the analysis of datasets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data, and provides new directions for the visual exploration of joint and individual structure. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types.
|Joint Approximate Diagonalization of Eigenmatrices
|Joint Greedy Equivalence Search
|We consider the problem of jointly estimating multiple related directed acyclic graph (DAG) models based on high-dimensional data from each graph. This problem is motivated by the task of learning gene regulatory networks based on gene expression data from different tissues, developmental stages or disease states. We prove that under certain regularity conditions, the proposed $\ell_0$-penalized maximum likelihood estimator converges in Frobenius norm to the adjacency matrices consistent with the data-generating distributions and has the correct sparsity. In particular, we show that this joint estimation procedure leads to a faster convergence rate than estimating each DAG model separately. As a corollary we also obtain high-dimensional consistency results for causal inference from a mix of observational and interventional data. For practical purposes, we propose jointGES consisting of Greedy Equivalence Search (GES) to estimate the union of all DAG models followed by variable selection using lasso to obtain the different DAGs, and we analyze its consistency guarantees. The proposed method is illustrated through an analysis of simulated data as well as epithelial ovarian cancer gene expression data.|
|Joint Matrix Factorization||Nonnegative matrix factorization (NMF) is a powerful tool in data exploratory analysis by discovering the hidden features and part-based patterns from high-dimensional data. NMF and its variants have been successfully applied into diverse fields such as pattern recognition, signal processing, data mining, bioinformatics and so on. Recently, NMF has been extended to analyze multiple matrices simultaneously. However, a unified framework is still lacking. In this paper, we introduce a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration. Next, we present four update algorithms to solve this framework. The merits and demerits of these algorithms are systematically explored. Furthermore, extensive computational experiments using both synthetic data and real data demonstrate the effectiveness of JMF framework and related algorithms on pattern recognition and data mining.|
|Joint Maximum Likelihood Estimation
|JMLE ‘Joint Maximum Likelihood Estimation’ is also called UCON, ‘Unconditional maximum likelihood estimation’. It was devised by Wright & Panchapakesan, http://www.rasch.org/memo46.htm. In this formulation, the estimate of the Rasch parameter (for which the observed data are most likely, assuming those data fit the Rasch model) occurs when the observed raw score for the parameter matches the expected raw score. ‘Joint’ means that the estimates for the persons (rows) and items (columns) and rating scale structures (if any) of the data matrix are obtained simultaneously.|
|Joint Probability Distribution||In the study of probability, given at least two random variables X, Y, …, that are defined on a probability space, the joint probability distribution for X, Y, … is a probability distribution that gives the probability that each of X, Y, … falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.|
|Joint Random Forest
|JointDNN||Deep neural networks are among the most influential architectures of deep learning algorithms, being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants (IPAs), autonomous cars, and smart home services often employ either simple local models or complex remote models on the cloud. Mobile-only and cloud-only computations are currently the status quo approaches. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and cloud for DNNs in both inference and training phase. JointDNN not only provides an energy and performance efficient method of querying DNNs for the mobile side, but also benefits the cloud server by reducing the amount of its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward and backward propagation in DNNs, which can adapt to mobile battery limitations and cloud server load constraints and quality of service. JointDNN achieves up to 18X and 32X reductions on the latency and mobile energy consumption of querying DNNs, respectively.|
|JointGAN||A new generative adversarial network is developed for joint distribution matching. Distinct from most existing approaches, that only learn conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains). This is achieved by learning to sample from conditional distributions between the domains, while simultaneously learning to sample from the marginals of each individual domain. The proposed framework consists of multiple generators and a single softmax-based critic, all jointly trained via adversarial learning. From a simple noise source, the proposed framework allows synthesis of draws from the marginals, conditional draws given observations from a subset of random variables, or complete draws from the full joint distribution. Most examples considered are for joint analysis of two domains, with examples for three domains also presented.|
|Joint-Policy Correlation||To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents’ policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.|
|Joyplot||joyplot: a series of histograms, density plots or time series for a number of data segments, all aligned to the same horizontal scale and presented with a slight overlap.|
|Jubatus||Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes these functionalities:
· Online Machine Learning Library: Classification, Regression, Recommendation (Nearest Neighbor Search), Graph Mining, Anomaly Detection, Clustering
· Feature Vector Converter (fv_converter): Data Preprocess and Feature Extraction
· Framework for Distributed Online Machine Learning with Fault Tolerance
|Jumping Knowledge Network
|Recent deep learning approaches for representation learning on graphs follow a neighborhood aggregation procedure. We analyze some important properties of these models, and propose a strategy to overcome those. In particular, the range of ‘neighboring’ nodes that a node’s representation draws from strongly depends on the graph structure, analogous to the spread of a random walk. To adapt to local neighborhood properties and tasks, we explore an architecture — jumping knowledge (JK) networks — that flexibly leverages, for each node, different neighborhood ranges to enable better structure-aware representation. In a number of experiments on social, bioinformatics and citation networks, we demonstrate that our model achieves state-of-the-art performance. Furthermore, combining the JK framework with models like Graph Convolutional Networks, GraphSAGE and Graph Attention Networks consistently improves those models’ performance.|
|Juniper||Nonconvex mixed-integer nonlinear programs (MINLPs) represent a challenging class of optimization problems that often arise in engineering and scientific applications. Because of nonconvexities, these programs are typically solved with global optimization algorithms, which have limited scalability. However, nonlinear branch-and-bound has recently been shown to be an effective heuristic for quickly finding high-quality solutions to large-scale nonconvex MINLPs, such as those arising in infrastructure network optimization. This work proposes Juniper, a Julia-based open-source solver for nonlinear branch-and-bound. Leveraging the high-level Julia programming language makes it easy to modify Juniper’s algorithm and explore extensions, such as branching heuristics, feasibility pumps, and parallelization. Detailed numerical experiments demonstrate that the initial release of Juniper is comparable with other nonlinear branch-and-bound solvers, such as Bonmin, Minotaur, and Knitro, illustrating that Juniper provides a strong foundation for further exploration in utilizing nonlinear branch-and-bound algorithms as heuristics for nonconvex MINLPs.|
|Jupyter||The Jupyter Notebook is a web application for interactive data science and scientific computing. It allows users to author documents that combine live-code with narrative text, equations, images, video and visualizations. These documents encode a complete and reproducible record of a computation that can be shared with others on GitHub, Dropbox and the Jupyter Notebook Viewer.|
|Just Another Gibbs Sampler
|Just another Gibbs sampler (JAGS) is a program for simulation from Bayesian hierarchical models using Markov chain Monte Carlo (MCMC), developed by Martyn Plummer. JAGS has been employed for statistical work in many fields, for example ecology, management, and genetics. JAGS aims for compatibility with WinBUGS/OpenBUGS through the use of a dialect of the same modeling language (informally, BUGS), but it provides no GUI for model building and MCMC sample postprocessing, which must therefore be treated in a separate program (for example calling JAGS from R through a library such as rjags and post-processing MCMC output in R). The main advantage of JAGS in comparison to the members of the original BUGS family (WinBUGS and OpenBUGS) is its platform independence. It is written in C++, while the BUGS family is written in Component Pascal, a less widely known programming language. In addition, JAGS is already part of many repositories of Linux distributions such as Ubuntu. It can also be compiled as a 64-bit application on 64-bit platforms, thus making all the addressable space available to BUGS models. JAGS can be used via the command line or run in batch mode through script files. This means that there is no need to redo the settings with every run and that the program can be called and controlled from within another program (e.g. from R via rjags as outlined above). JAGS is licensed under the GNU General Public License.|