R  R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
R Consortium  The R Consortium, Inc. is a group of businesses organized under an open source governance and foundation model to provide support to the R community, the R Foundation, and groups and individuals using, maintaining and distributing R software. The R language is an open source environment for statistical computing and graphics, and runs on a wide variety of computing platforms. The R language has enjoyed significant growth, and now supports over 2 million users. A broad range of industries have adopted the R language, including biotech, finance, research and high technology industries. The R language is often integrated into third party analysis, visualization and reporting applications. The central mission of the R Consortium is to work with and provide support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software through the identification, development and implementation of infrastructure projects. From a governance perspective, the business of the consortium is managed by a Board of Directors. The technical aspects of the project, including the development and implementation of infrastructure projects, are overseen by an Infrastructure Steering Committee. While the initial members of the Infrastructure Steering Committee consist of representatives of the founding members of the R Consortium, Inc., project leads of key infrastructure projects will become voting members of the Infrastructure Steering Committee. Potential infrastructure projects include: • strengthening the R Forge infrastructure; • assisting the Stanford University group running useR! 2016; • developing documentation; and • encouraging increased communication and collaboration among users and developers of the R language.
R Service Bus (RSB) 
Having the right algorithm is a first big step towards getting advanced analytics to solve your problem and inform your decisions. The next one is to have the algorithm work for you and to integrate it into your workflows and business processes. The R Service Bus is a Swiss Army knife that allows you to plug R into your processes independently of the technology used by other software applications involved in the workflow. The prime objective of the R Service Bus is to integrate smoothly into your existing infrastructure, and it therefore supports communication using a plethora of protocols such as • SOAP and RESTful web services • various email protocols • folder monitoring, (s)ftp • messaging protocols such as JMS or STOMP • … The R Service Bus is based on mature open source projects and was developed to maximize reliability, flexibility, high availability and scalability of R-based analytics applications. It is in use at major pharmaceutical and financial institutions to power business-critical modeling activities. The R Service Bus is open source and freely available from our downloads page. The R Service Bus has also been packaged for all current versions of Debian/Ubuntu and is available from our repository.
R.NET  R.NET enables the .NET Framework to interoperate with the R statistical language in the same process. R.NET requires .NET Framework 4 and the native R DLLs installed with the R environment. R.NET works on Windows, Linux and macOS. Enjoy statistics and programming in your special language with R.
Rabix  An open-source toolkit for developing and running portable workflows based on the Common Workflow Language specification and Docker. liftr
Race Track Concordance Charts  One way to help keep track of things from the perspective of a particular driver, rather than the race leader, is to rebase the origin of the x-axis relative to that driver.
Radial Basis Function (RBF) 
A radial basis function (RBF) is a real-valued function whose value depends only on the distance from the origin, so that $\Phi(\mathbf{x}) = \Phi(\|\mathbf{x}\|)$; or alternatively on the distance from some other point $\mathbf{c}$, called a center. Any function $\Phi$ that satisfies this property is a radial function. The norm is usually Euclidean distance, although other distance functions are also possible. For example, using the Łukaszyk-Karmowski metric, it is possible for some radial functions to avoid problems with ill conditioning of the matrix solved to determine the coefficients $w_i$, since $\|\mathbf{x}\|$ is always greater than zero. Sums of radial basis functions are typically used to approximate given functions. This approximation process can also be interpreted as a simple kind of neural network; this was the context in which they were originally invented, by David Broomhead and David Lowe in 1988. RBFs are also used as a kernel in support vector classification.
Radial Basis Function Kernel (RBF) 
In machine learning, the (Gaussian) radial basis function kernel, or RBF kernel, is a popular kernel function used in support vector machine classification. 
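As a rough illustration, the Gaussian RBF kernel is commonly written as $K(\mathbf{x}, \mathbf{y}) = \exp(-\gamma \|\mathbf{x} - \mathbf{y}\|^2)$. The sketch below evaluates it in plain Python; the function names and the `gamma` parametrization are assumptions chosen for the example, not part of any particular library.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors.

    `gamma` is an illustrative bandwidth parameter (assumed name and default).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma=1.0):
    """Kernel (Gram) matrix of pairwise kernel values, as a list of lists."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
    for row in gram_matrix(pts, gamma=0.5):
        print([round(v, 4) for v in row])
```

The kernel value is 1 for identical points and decays towards 0 as the points move apart, which is why it can be read as a similarity measure.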
Radial Basis Function Networks (RBF) 
In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the Royal Signals and Radar Establishment. 
Rainforest Plots  Research has shown that forest plots are a gold standard in the visualization of meta-analytic results. However, research on the general interpretation of forest plots and the role of researchers’ meta-analysis experience and field of study is still unavailable. Additionally, the traditional display of effect sizes, confidence intervals, and weights has repeatedly been criticized. The current work presents an online statistical cognition experiment in which a total of 279 researchers with experience in meta-analysis from 36 countries evaluated conventional forest plots and two novel versions of forest plots, namely, thick forest plots and rainforest plots. The results indicate certain biases in the interpretation of forest plots, especially with regard to heterogeneity, the distribution of weights, and the theoretical concept of confidence intervals. Although the two novel displays (thick forest plots and rainforest plots) are associated with slightly longer viewing times, they are at least as well-suited and esthetically and perceptively pleasing as the conventional displays while facilitating the correct and exhaustive interpretation of the meta-analytic information. Furthermore, it is advisable to combine conventional forest plots with distribution information of the individual effects, to make confidence lines more visually striking, and to display a background grid in the graph. metaviz
Ramer-Douglas-Peucker Algorithm (RDP)
The Ramer-Douglas-Peucker algorithm (RDP) is an algorithm for reducing the number of points in a curve that is approximated by a series of points. The initial form of the algorithm was independently suggested in 1972 by Urs Ramer and 1973 by David Douglas and Thomas Peucker and several others in the following decade. This algorithm is also known under the names Douglas-Peucker algorithm, iterative end-point fit algorithm and split-and-merge algorithm. The purpose of the algorithm is, given a curve composed of line segments, to find a similar curve with fewer points. The algorithm defines ‘dissimilar’ based on the maximum distance between the original curve and the simplified curve. The simplified curve consists of a subset of the points that defined the original curve. http://…/rdp
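The recursive idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: points are 2-D `(x, y)` tuples, `epsilon` is a hypothetical name for the distance tolerance, and the distance is measured to the segment between the end points (many descriptions use the infinite line instead).

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (all 2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Return a subset of `points` approximating the curve within `epsilon`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    # Find the interior point farthest from the segment first-last.
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep that point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [first, last]


if __name__ == "__main__":
    curve = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
    print(rdp(curve, epsilon=1.0))
```

A larger `epsilon` removes more points; the end points of the curve are always kept.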
Rand Index  The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements; this is the adjusted Rand index. From a mathematical standpoint, the Rand index is related to accuracy, but is applicable even when class labels are not used. mri
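To make the definition concrete: the (unadjusted) Rand index is the fraction of element pairs on which the two clusterings agree, i.e. pairs placed together in both or apart in both. The following is a small sketch; the function name and the representation of a clustering as a list of labels are choices made for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of element pairs on which two clusterings agree.

    A pair agrees if it is in the same cluster in both clusterings
    or in different clusters in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same elements")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two elements")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because only co-membership is compared, the index does not depend on how the clusters are named.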
Random Assignment  Random assignment or random placement is an experimental technique for assigning subjects to different treatments (or no treatment). The thinking behind random assignment is that, by randomizing treatment assignment, the group attributes for the different treatments will be roughly equivalent, so any effect observed between treatment groups can be linked to the treatment effect rather than to a characteristic of the individuals in the group. In experimental design, random assignment of participants to treatment and control groups helps to ensure that any differences between and within the groups are not systematic at the outset of the experiment. Random assignment does not guarantee that the groups are “matched” or equivalent, only that any differences are due to chance. Random assignment facilitates comparison in experiments by creating similar groups, so that the comparison is “apple to apple” and “orange to orange”. The procedure: Step 1: Begin with a collection of subjects, for example 20 people. Step 2: Devise a method of randomization that is purely mechanical (e.g. flipping a coin). Step 3: Assign subjects with “Heads” to one group, the control group, and subjects with “Tails” to the other group, the experimental group.
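The three steps can be sketched in Python, with a pseudo-random number generator standing in for the coin. The function name, the group labels and the `seed` argument are assumptions introduced for this example.

```python
import random


def randomly_assign(subjects, seed=None):
    """Assign each subject to 'control' or 'experimental' by a fair coin flip.

    `seed` is optional and only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    groups = {"control": [], "experimental": []}
    for subject in subjects:
        heads = rng.random() < 0.5  # Step 2: purely mechanical randomization
        # Step 3: heads -> control group, tails -> experimental group
        groups["control" if heads else "experimental"].append(subject)
    return groups


if __name__ == "__main__":
    people = [f"subject_{i}" for i in range(1, 21)]  # Step 1: 20 people
    result = randomly_assign(people, seed=42)
    print(len(result["control"]), "in control,",
          len(result["experimental"]), "in experimental")
```

Independent coin flips do not force equal group sizes; if balanced groups are required, shuffling the subject list and splitting it in half is a common alternative.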
Random Average Shifted Histogram (RASH) 
A new density estimator called RASH, for Random Average Shifted Histogram, obtained by averaging several histograms as proposed in Average Shifted Histograms, is presented. The principal difference between the two methods is that in RASH each histogram is built over a grid with random shifted breakpoints. The asymptotic behavior of this estimator is established and its performance through several simulations is analyzed. RASH is compared to several classic density estimators and to some recent ensemble methods. Although RASH does not always outperform the other methods, it is very simple to implement, being also more intuitive. 
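The averaging idea can be sketched as follows. This is only an illustration under stated assumptions, not the authors' exact construction: every histogram uses the same bin width, each grid is shifted by an offset drawn uniformly from `[0, bin_width]`, and the names `rash_density`, `bin_width` and `n_histograms` are hypothetical.

```python
import math
import random


def rash_density(data, x, bin_width, n_histograms=50, seed=None):
    """Estimate the density at x by averaging randomly shifted histograms.

    Assumption: equal-width bins whose grid origin is shifted by a uniform
    random offset for each histogram.
    """
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(n_histograms):
        shift = rng.uniform(0.0, bin_width)
        # Index of the bin containing x on this shifted grid.
        target_bin = math.floor((x - shift) / bin_width)
        count = sum(
            1 for v in data if math.floor((v - shift) / bin_width) == target_bin
        )
        total += count / (n * bin_width)  # histogram density estimate at x
    return total / n_histograms


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
    for point in (-2.0, 0.0, 2.0):
        print(point, round(rash_density(sample, point, bin_width=0.5, seed=1), 3))
```

Averaging over many shifted grids smooths out the dependence of a single histogram on where its breakpoints happen to fall.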
Random Decision Forests (RDF) 

Random Dot Product Graph (RDPG) 

Random Effects Model  In statistics, a random effect(s) model, also called a variance components model, is a kind of hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of different populations whose differences relate to that hierarchy. In econometrics, random effects models are used in the analysis of hierarchical or panel data when one assumes no fixed effects (it allows for individual effects). The random effects model is a special case of the fixed effects model. Contrast this to the biostatistics definitions, as biostatisticians use ‘fixed’ and ‘random’ effects to respectively refer to the population-average and subject-specific effects (and where the latter are generally assumed to be unknown, latent variables).
Random Erasing  In this paper, we introduce Random Erasing, a simple yet effective data augmentation technique for training convolutional neural networks (CNNs). In the training phase, Random Erasing randomly selects a rectangular region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of network overfitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated into most CNN-based recognition models. Albeit simple, Random Erasing yields consistent improvement in image classification, object detection and person re-identification (re-ID). For image classification, our method improves WRN-28-10: the top-1 error rate drops from 3.72% to 3.08% on CIFAR-10, and from 18.68% to 17.65% on CIFAR-100. For object detection on PASCAL VOC 2007, Random Erasing improves Fast-RCNN from 74.8% to 76.2% in mAP. For person re-ID, when using Random Erasing in recent deep models, we achieve state-of-the-art accuracy: the rank-1 accuracy is 89.13% for Market-1501, 84.02% for DukeMTMC-reID, and 63.93% for CUHK03 under the new evaluation protocol.
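A minimal sketch of the erasing step on a grayscale image stored as a list of rows is given below. The parameter names and default ranges (`area_range`, `aspect_range`, `max_tries`) are illustrative assumptions, not values confirmed by the text above, and a real training pipeline would normally apply the step only with some probability per image.

```python
import math
import random


def random_erase(image, area_range=(0.02, 0.4), aspect_range=(0.3, 3.3),
                 max_tries=100, rng=None):
    """Return a copy of `image` with one random rectangle set to random values.

    `image` is a list of rows of integer pixel values in 0..255.
    All default ranges are assumptions made for this illustration.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the input untouched
    for _ in range(max_tries):
        target_area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(*aspect_range)  # rectangle height / width
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h <= height and 0 < w <= width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for r in range(top, top + h):
                for c in range(left, left + w):
                    out[r][c] = rng.randint(0, 255)  # erase with random values
            return out
    return out  # no rectangle fitted within max_tries; image is unchanged


if __name__ == "__main__":
    img = [[128] * 16 for _ in range(16)]
    erased = random_erase(img, rng=random.Random(0))
    changed = sum(a != b for ra, rb in zip(img, erased) for a, b in zip(ra, rb))
    print("pixels changed:", changed)
```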
Random Ferns Method / Classifier  Random ferns is a machine learning algorithm proposed by Ozuysal, Fua, and Lepetit (2007) for matching the same elements between two images of the same scene, allowing one to recognize certain objects or trace them on videos. The original motivation behind this method was to create a simple and efficient algorithm by extending the naive Bayes classifier; still the authors acknowledged its strong connection to decision tree ensembles like the random forest algorithm (Breiman 2001). Since introduction, random ferns have been applied in numerous computer vision applications, like image recognition (Bosch, Zisserman, and Munoz 2007), action recognition (Oshin, Gilbert, Illingworth, and Bowden 2009) or augmented reality (Wagner, Reitmayr, Mulloni, Drummond, and Schmalstieg 2010). However, it has not gathered attention outside this field; thus, this work aims to bring this algorithm to a much wider spectrum of applications. In order to do that, I propose a generalized version of the algorithm, implemented in the R (R Core Team 2014) package rFerns (Kursa 2014) which is available from the Comprehensive R Archive Network (CRAN) at http://…/package=rFerns. rFerns
Random Fields  A random field is a generalization of a stochastic process such that the underlying parameter need no longer be a simple real or integer valued “time”, but can instead take values that are multidimensional vectors, or points on some manifold. At its most basic, discrete case, a random field is a list of random numbers whose indices are mapped onto a space (of n dimensions). When used in the natural sciences, values in a random field are often spatially correlated in one way or another. In its most basic form this might mean that adjacent values (i.e. values with adjacent indices) do not differ as much as values that are further apart. This is an example of a covariance structure, many different types of which may be modeled in a random field. More generally, the values might be defined over a continuous domain, and the random field might be thought of as a “function valued” random variable. 
Random Forest  Random forests are an ensemble learning method for classification (and regression) that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes output by individual trees. The algorithm for inducing a random forest was developed by Leo Breiman and Adele Cutler, and “Random Forests” is their trademark. The term came from random decision forests that was first proposed by Tin Kam Ho of Bell Labs in 1995. The method combines Breiman’s “bagging” idea and the random selection of features, introduced independently by Ho and Amit and Geman in order to construct a collection of decision trees with controlled variance. The selection of a random subset of features is an example of the random subspace method, which, in Ho’s formulation, is a way to implement classification proposed by Eugene Kleinberg. ranger 
Random Geometric Graph (RGG) 
We propose an interdependent random geometric graph (RGG) model for interdependent networks. Based on this model, we study the robustness of two interdependent spatially embedded networks where interdependence exists between geographically nearby nodes in the two networks. We study the emergence of the giant mutual component in two interdependent RGGs as node densities increase, and define the percolation threshold as a pair of node densities above which the giant mutual component first appears. In contrast to the case for a single RGG, where the percolation threshold is a unique scalar for a given connection distance, for two interdependent RGGs, multiple pairs of percolation thresholds may exist, given that a smaller node density in one RGG may increase the minimum node density in the other RGG in order for a giant mutual component to exist. We derive analytical upper bounds on the percolation thresholds of two interdependent RGGs by discretization, and obtain $99\%$ confidence intervals for the percolation thresholds by simulation. Based on these results, we derive conditions for the interdependent RGGs to be robust under random failures and geographical attacks. 
Random KNN (RKNN) 
Random KNN consists of an ensemble of base k-nearest neighbor models, each constructed from a random subset of the input variables. Random KNN can be used to select important features using the RKNN-FS algorithm. RKNN-FS is an innovative feature selection procedure for ‘small n, large p problems.’ Random KNN (no bootstrapping) is fast and stable compared with Random Forests. The rknn R package implements Random KNN classification, regression and variable selection algorithms. • KNN is stable, no hierarchical structure • Final model can be a single KNN (vs. many trees) • Local method: robust for complex data structure • Automatically retrain, incremental learning • Easy to implement rknn
Random KNN Feature Selection (RKNN-FS)
We present RKNN-FS, an innovative feature selection procedure for ‘small n, large p problems.’ RKNN-FS is based on Random KNN (RKNN), a novel generalization of traditional nearest-neighbor modeling. RKNN consists of an ensemble of base k-nearest neighbor models, each constructed from a random subset of the input variables. To rank the importance of the variables, we define a criterion on the RKNN framework, using the notion of support. A two-stage backward model selection method is then developed based on this criterion. Empirical results on microarray data sets with thousands of variables and relatively few samples show that RKNN-FS is an effective feature selection approach for high-dimensional data. RKNN is similar to Random Forests in terms of classification accuracy without feature selection. However, RKNN provides much better classification accuracy than RF when each method incorporates a feature-selection step. Our results show that RKNN is significantly more stable and more robust than Random Forests for feature selection when the input data are noisy and/or unbalanced. Further, RKNN-FS is much faster than the Random Forests feature selection method (RF-FS), especially for large scale problems, involving thousands of variables and multiple classes. rknn
Random Projection  Random projection is a foundational research topic that connects a number of machine learning algorithms under a similar mathematical basis. It is used to reduce the dimensionality of a dataset by efficiently projecting the data points onto a lower-dimensional space while approximately preserving the original relative distances between the data points. In this paper, we explain the random projection method by presenting its mathematical background and foundation, the applications that currently adopt it, and an overview of its current research perspective.
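One common construction, shown here as a sketch rather than as the method of any specific paper, draws the projection matrix from a Gaussian distribution and scales it by $1/\sqrt{k}$ so that squared distances are preserved in expectation. The function names and the dimensions used in the demo are assumptions for the example.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries (Gaussian random projection)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional vector x to k dimensions."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 500, 200
    matrix = random_projection_matrix(d, k, rng)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = [rng.gauss(0.0, 1.0) for _ in range(d)]
    before = euclidean(x, y)
    after = euclidean(project(matrix, x), project(matrix, y))
    print("distance ratio after/before:", round(after / before, 3))
```

The ratio printed by the demo is typically close to 1, and the approximation tightens as the target dimension `k` grows.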
Random Projection Ensemble Classification  The random projection ensemble classifier is a very general method for classification of high-dimensional data, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. The random projections are divided into non-overlapping blocks, and within each block the projection yielding the smallest estimate of the test error is selected. The random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment. http://…/randproj.pdf RPEnsemble
Random Regression Model (RRM) 
Random regressions are types of hierarchical models in which data are structured in groups and (regression) coefficients can vary by groups. MultiRR 
Random Sample Consensus (RANSAC) 
Random sample consensus (RANSAC) is a successful algorithm in model fitting applications. A strong exploration phase is vital when the dataset contains an enormous number of outliers. The pure exploration strategy of RANSAC guarantees that a proper model is found. However, finding the optimum result requires exploitation. GASAC is an evolutionary paradigm that adds exploitation capability to the algorithm. Although GASAC improves the results of RANSAC, it has a fixed strategy for balancing exploration and exploitation. In this paper, a new paradigm is proposed based on a genetic algorithm with an adaptive strategy. We utilize an adaptive genetic operator to select high-fitness individuals as parents and to mutate low-fitness ones. In the mutation phase, a training method is used to gradually learn which gene is the best replacement for the mutated gene. The proposed method adaptively balances exploration and exploitation by learning about genes. During the final iterations, the algorithm draws on this information to improve the final results. The proposed method is extensively evaluated on two sets of experiments. In all tests, our method outperformed the other methods in terms of both the number of inliers found and the speed of the algorithm.
Random Self-Ensemble (RSE)
Recent studies have revealed the vulnerability of deep neural networks: a small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defensive algorithm called Random Self-Ensemble (RSE) by combining two important concepts: randomness and ensemble. To protect a targeted model, RSE adds random noise layers to the neural network to prevent state-of-the-art gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models $f_\epsilon$ without any additional memory overhead, and that the proposed training procedure based on noisy stochastic gradient descent can ensure the ensemble model has good predictive capability. Our algorithm significantly outperforms previous defense techniques on real datasets. For instance, on CIFAR-10 with a VGG network (which has $92\%$ accuracy without any attack), under the state-of-the-art C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than $10\%$ and the best previous defense technique has $48\%$ accuracy, while our method still has $86\%$ prediction accuracy under the same level of attack. Finally, our method is simple and easy to integrate into any neural network.
Random Subsampling  Random subsampling, which is also known as Monte Carlo cross-validation, as multiple holdout or as repeated evaluation set, is based on randomly splitting the data into subsets, whereby the size of the subsets is defined by the user. The random partitioning of the data can be repeated arbitrarily often. In contrast to a full cross-validation procedure, random subsampling has been shown to be asymptotically consistent, resulting in more pessimistic predictions of the test data compared with cross-validation. The predictions of the test data give a realistic estimation of the predictions of external validation data.
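The repeated random splitting can be sketched as a generator of train/test index sets. The function name and the parameters `test_fraction`, `n_repeats` and `seed` are assumptions chosen for this example.

```python
import random


def random_subsampling_splits(n_samples, test_fraction=0.25, n_repeats=10,
                              seed=None):
    """Yield (train_indices, test_indices) for repeated random holdout splits.

    Unlike k-fold cross-validation, the test sets of different repeats may
    overlap, and the user chooses both the split size and the repeat count.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(n_samples * test_fraction)))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


if __name__ == "__main__":
    for train, test in random_subsampling_splits(12, test_fraction=0.25,
                                                 n_repeats=3, seed=0):
        print("test:", test, "| train size:", len(train))
```

In practice a model would be fitted on each training set and evaluated on the matching test set, with the scores averaged over the repeats.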
Random Variable  In probability and statistics, a random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability, in contrast to other mathematical variables. A random variable’s possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already-existing value is uncertain (for example, due to imprecise measurements or quantum uncertainty). They may also conceptually represent either the results of an ‘objectively’ random process (such as rolling a die) or the ‘subjective’ randomness that results from incomplete knowledge of a quantity. The meaning of the probabilities assigned to the potential values of a random variable is not part of probability theory itself but is instead related to philosophical arguments over the interpretation of probability. The mathematics works the same regardless of the particular interpretation in use. The mathematical function describing the possible values of a random variable and their associated probabilities is known as a probability distribution. Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function, characteristic of a probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of a probability distribution; or a mixture of both types. The realizations of a random variable, that is, the results of randomly choosing values according to the variable’s probability distribution function, are called random variates. The formal mathematical treatment of random variables is a topic in probability theory.
In that context, a random variable is understood as a function defined on a sample space whose outputs are numerical values. 
Random Vector Functional Link Network (RVFL+) 
In school, a teacher plays an important role in various classroom teaching patterns. Similar to this human learning activity, the learning using privileged information (LUPI) paradigm provides additional information generated by the teacher to ‘teach’ learning algorithms during the training stage. Therefore, this novel learning paradigm is a typical Teacher-Student Interaction mechanism. This paper is the first to present a random vector functional link network based on the LUPI paradigm, called RVFL+. Rather than simply combining two existing approaches, the newly derived RVFL+ fills the gap between neural networks and the LUPI paradigm, which offers an alternative way to train RVFL networks. Moreover, the proposed RVFL+ can be used in conjunction with the kernel trick for highly complicated nonlinear feature learning, which is termed KRVFL+. Furthermore, the statistical property of the proposed RVFL+ is investigated, and we derive a sharp and high-quality generalization error bound based on the Rademacher complexity. Competitive experimental results on 14 real-world datasets illustrate the great effectiveness and efficiency of the novel RVFL+ and KRVFL+, which can achieve better generalization performance than state-of-the-art algorithms.
Random Walk Covariance Model  rwc 
Randomized Canonical Correlation  Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance or the Kernel Canonical Correlation which has been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples: the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size, and they provide a controllable approximation of their kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error while running an order of magnitude faster than kernel-based measures.
Randomized Generalized Variance  Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance or the Kernel Canonical Correlation which has been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples: the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size, and they provide a controllable approximation of their kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error while running an order of magnitude faster than kernel-based measures.
Randomized Hierarchical Alternating Least Squares  Nonnegative matrix factorization (NMF) is a powerful tool for data mining. However, the emergence of ‘big data’ has severely challenged our ability to compute this fundamental decomposition using deterministic algorithms. This paper presents a randomized hierarchical alternating least squares (HALS) algorithm to compute the NMF. By deriving a smaller matrix from the nonnegative input data, a more efficient nonnegative decomposition can be computed. Our algorithm scales to big data applications while attaining a near-optimal factorization, i.e., the algorithm scales with the target rank of the data rather than the ambient dimension of measurement space. The proposed algorithm is evaluated using synthetic and real-world data and shows substantial speedups compared to deterministic HALS.
Randomized Independent Component Analysis (RICA) 
Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance or the Kernel Canonical Correlation which has been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples: the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size, and they provide a controllable approximation of their kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error while running an order of magnitude faster than kernel-based measures.
Randomized Response  Randomized response is a research method used in structured survey interviews. It was first proposed by S. L. Warner in 1965 and later modified by B. G. Greenberg in 1969. It allows respondents to respond to sensitive issues (such as criminal behavior or sexuality) while maintaining confidentiality. Chance decides, unknown to the interviewer, whether the question is to be answered truthfully or with “yes” regardless of the truth. For example, social scientists have used it to ask people whether they use drugs, whether they have illegally installed telephones, or whether they have evaded paying taxes. Before abortions were legal, social scientists used the method to ask women whether they had had abortions. rr 
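The forced-"yes" design described above can be sketched in a few lines. This is a minimal illustration, not the interface of the rr package: the function names and the parameter `p_truth` (the probability that chance tells the respondent to answer truthfully) are assumptions made for the example. If the true prevalence is pi, the expected share of "yes" answers is p_truth * pi + (1 - p_truth), which can be solved for pi.

```python
import random


def simulate_answers(true_statuses, p_truth, rng):
    """Simulate forced-response answers (hypothetical helper).

    With probability p_truth the respondent answers truthfully;
    otherwise the respondent says "yes" (True) regardless of the truth.
    """
    answers = []
    for status in true_statuses:
        if rng.random() < p_truth:
            answers.append(status)
        else:
            answers.append(True)
    return answers


def estimate_prevalence(answers, p_truth):
    """Invert E[yes rate] = p_truth * pi + (1 - p_truth) to estimate pi."""
    yes_rate = sum(answers) / len(answers)
    return (yes_rate - (1 - p_truth)) / p_truth


if __name__ == "__main__":
    rng = random.Random(42)
    # 30% of the simulated respondents truly have the sensitive attribute.
    truth = [i % 10 < 3 for i in range(20000)]
    answers = simulate_answers(truth, p_truth=0.5, rng=rng)
    print(estimate_prevalence(answers, p_truth=0.5))
```

No individual answer reveals a respondent's status, because a "yes" may have been forced by chance, yet the aggregate prevalence remains estimable.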
Randomized Weighted Majority Algorithm (RWMA) 
The randomized weighted majority algorithm is an algorithm in machine learning theory. It improves the mistake bound of the weighted majority algorithm. Imagine that every morning before the stock market opens, we get a prediction from each of our ‘experts’ about whether the stock market will go up or down. Our goal is to somehow combine this set of predictions into a single prediction that we then use to make a buy or sell decision for the day. The RWMA gives us a way to do this combination such that our prediction record will be nearly as good as that of the single best expert in hindsight. ➘ “Weighted Majority Algorithm” 
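The expert-combination idea can be sketched as follows. This is a minimal illustration under stated assumptions: every expert starts with weight 1, one expert is sampled each round with probability proportional to its weight, and each wrong expert's weight is multiplied by a penalty factor `beta`. The function name, the data layout, and the default `beta=0.5` are choices made for this example.

```python
import random


def randomized_weighted_majority(expert_predictions, outcomes, beta=0.5, rng=None):
    """Run a randomized weighted majority sketch.

    expert_predictions: one list per round with one prediction per expert.
    outcomes: the true outcome of each round.
    Returns (number of mistakes made, final expert weights).
    """
    if rng is None:
        rng = random.Random(0)
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for predictions, outcome in zip(expert_predictions, outcomes):
        # Follow one expert, sampled with probability proportional to its weight.
        chosen = rng.choices(range(n_experts), weights=weights, k=1)[0]
        if predictions[chosen] != outcome:
            mistakes += 1
        # Penalize every expert that was wrong this round.
        weights = [
            w * beta if p != outcome else w
            for w, p in zip(weights, predictions)
        ]
    return mistakes, weights


if __name__ == "__main__":
    # Expert 0 is always right ("up"), expert 1 is always wrong ("down").
    rounds = [["up", "down"]] * 20
    truth = ["up"] * 20
    print(randomized_weighted_majority(rounds, truth))
```

Because the weight of a consistently wrong expert shrinks geometrically, the algorithm quickly ends up following the better expert almost every round.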
RankLib  RankLib is a library of learning-to-rank algorithms. Currently eight popular algorithms have been implemented: • MART (Multiple Additive Regression Trees, a.k.a. gradient boosted regression trees) • RankNet • RankBoost • AdaRank • Coordinate Ascent • LambdaMART • ListNet • Random Forests. With appropriate parameters for Random Forests, it can also bag several MART/LambdaMART rankers. It also implements many retrieval metrics and provides many ways to carry out evaluation. 
RankPL  In this paper we introduce RankPL, a modeling language that can be thought of as a qualitative variant of a probabilistic programming language with a semantics based on Spohn’s ranking theory. Broadly speaking, RankPL can be used to represent and reason about processes that exhibit uncertainty expressible by distinguishing ‘normal’ from ‘surprising’ events. RankPL allows (iterated) revision of rankings over alternative program states and supports various types of reasoning, including abduction and causal inference. We present the language, its denotational semantics, and a number of practical examples. We also discuss an implementation of RankPL that is available for download. 
Rao-Scott Cochran-Armitage by Slices Trend Test (RSCABS) 
RSCABS 
rApache  rApache is a project supporting web application development using the R statistical language and environment and the Apache web server. The current software distribution runs on UNIX/Linux and Mac OS X operating systems. Apache servers with threaded Multi-Processing Modules are now supported, but the Apache Prefork Multi-Processing Module is still recommended (refer to the Multi-Processing Modules chapter from Apache for more about this). The rApache software distribution provides the Apache module named mod_R that embeds the R interpreter inside the web server. It also comes bundled with libapreq, an Apache module for manipulating client request data. Together, they provide the glue to transform R into a server-side scripting environment. Another important project that is not bundled with rApache, but plays an important role in server-side scripting, is the R package brew (also available on CRAN). It implements a templating framework for report generation and is well suited to generating HTML on the fly. Its syntax is similar to PHP, Ruby’s erb module, Java Server Pages, and Python’s psp module. brew can be used standalone as well, so it is not part of the distribution. http://…/rscriptasserviceapi 
Rapid Automatic Keyword Extraction (RAKE) 
Keywords are widely used to define queries within information retrieval (IR) systems as they are easy to define, revise, remember, and share. This chapter describes rapid automatic keyword extraction (RAKE), an unsupervised, domain-independent, and language-independent method for extracting keywords from individual documents. It provides details of the algorithm and its configuration parameters, and presents results on a benchmark dataset of technical abstracts, showing that RAKE is more computationally efficient than TextRank while achieving higher precision and comparable recall scores. The chapter then describes a novel method for generating stoplists, which is used to configure RAKE for specific domains and corpora. Finally, it applies RAKE to a corpus of news articles and defines metrics for evaluating the exclusivity, essentiality, and generality of extracted keywords, enabling a system to identify keywords that are essential or general to documents in the absence of manual annotations. rapidraker 
Rapid Orthogonal Approximate Slepian Transform (ROAST) 
In this paper, we provide a Rapid Orthogonal Approximate Slepian Transform (ROAST) for the discrete vector one obtains when collecting a finite set of uniform samples from a baseband analog signal. The ROAST offers an orthogonal projection which is an approximation to the orthogonal projection onto the leading discrete prolate spheroidal sequence (DPSS) vectors (also known as Slepian basis vectors). As such, the ROAST is guaranteed to accurately and compactly represent not only oversampled bandlimited signals but also the leading DPSS vectors themselves. Moreover, the subspace angle between the ROAST subspace and the corresponding DPSS subspace can be made arbitrarily small. The complexity of computing the representation of a signal using the ROAST is comparable to the FFT, which is much less than the complexity of using the DPSS basis vectors. We also give non-asymptotic results to guarantee that the proposed basis not only provides a very high degree of approximation accuracy in a mean-square error sense for bandlimited sample vectors, but also that it can provide high-quality approximations of all sampled sinusoids within the band of interest. 
Rasch Model  The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between (a) the respondent’s abilities, attitudes or personality traits and (b) the item difficulty. For example, it may be used to estimate a student’s reading ability, or the extremity of a person’s attitude to capital punishment from responses on a questionnaire. In addition to psychometrics and educational research, the Rasch model and its extensions are used in other areas, including the health profession and market research, because of their general applicability. The mathematical theory underlying Rasch models is a special case of item response theory and, more generally, a special case of a generalized linear model. However, there are important differences in the interpretation of the model parameters and its philosophical implications that separate proponents of the Rasch model from the item response modeling tradition. A central aspect of this divide relates to the role of specific objectivity, a defining property of the Rasch model according to Georg Rasch, as a requirement for successful measurement. 
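For dichotomous items, the trade-off between respondent ability and item difficulty is commonly written as a logistic function of their difference. The sketch below assumes that standard dichotomous form; the function and parameter names are illustrative and do not come from a particular package.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct (or affirmative) response under the
    dichotomous Rasch model: logistic(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # A respondent whose ability equals the item difficulty succeeds half the time.
    print(rasch_probability(0.0, 0.0))
    # The same respondent facing an easier and a harder item.
    print(rasch_probability(0.0, -1.0), rasch_probability(0.0, 1.0))
```

Only the difference between ability and difficulty enters the formula, which is why persons and items can be placed on one common scale.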
Rating Scale  A rating scale is a set of categories designed to elicit information about a quantitative or a qualitative attribute. In the social sciences, particularly psychology, common examples are the Likert scale and 1-10 rating scales in which a person selects the number which is considered to reflect the perceived quality of a product. 
Rationalization  We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had performed the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of the autonomous agent into natural language. We evaluate our technique in the Frogger game environment. The natural language is collected from human players thinking out loud as they play the game. We motivate the use of rationalization as an approach to explanation generation, show the results of experiments on the accuracy of our rationalization technique, and describe a future research agenda. 
Raw Data  Raw data (also known as primary data) is a term for data collected from a source. Raw data has not been subjected to processing or any other manipulation. Raw data is a relative term. Raw data can be input to a computer program or used in manual procedures such as analyzing statistics from a survey. 
Ray  The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray, a distributed system to address them. Ray implements a dynamic task graph computation model that supports both the task-parallel and the actor programming models. To meet the performance requirements of AI applications, we propose an architecture that logically centralizes the system’s control state using a sharded storage system and a novel bottom-up distributed scheduler. In our experiments, we demonstrate sub-millisecond remote task latencies and linear throughput scaling beyond 1.8 million tasks per second. We empirically validate that Ray speeds up challenging benchmarks and serves as both a natural and performant fit for an emerging class of reinforcement learning applications and algorithms. 
Ray RLLib  Reinforcement learning (RL) algorithms involve the deep nesting of distinct components, where each component typically exhibits opportunities for distributed computation. Current RL libraries offer parallelism at the level of the entire program, coupling all the components together and making existing implementations difficult to extend, combine, and reuse. We argue for building composable RL components by encapsulating parallelism and resource requirements within individual components, which can be achieved by building on top of a flexible task-based programming model. We demonstrate this principle by building Ray RLLib on top of Ray and show that we can implement a wide range of state-of-the-art algorithms by composing and reusing a handful of standard components. This composability does not come at the cost of performance: in our experiments, RLLib matches or exceeds the performance of highly optimized reference implementations. Ray RLLib is available as part of Ray at https://…/. 
RDeepSense  Recent advances in deep learning have led various applications to unprecedented achievements, which could potentially bring higher intelligence to a broad spectrum of mobile and ubiquitous applications. Although existing studies have demonstrated the effectiveness and feasibility of running deep neural network inference operations on mobile and embedded devices, they overlooked the reliability of mobile computing models. Reliability measurements such as predictive uncertainty estimations are key factors for improving the decision accuracy and user experience. In this work, we propose RDeepSense, the first deep learning model that provides well-calibrated uncertainty estimations for resource-constrained mobile and embedded devices. RDeepSense enables predictive uncertainty estimation by adopting a tunable proper scoring rule as the training criterion and dropout as the implicit Bayesian approximation, and its correctness is proved theoretically. To reduce the computational complexity, RDeepSense employs efficient dropout and predictive distribution estimation instead of model ensembles or sampling-based methods for inference operations. We evaluate RDeepSense with four mobile sensing applications using Intel Edison devices. Results show that RDeepSense can reduce energy consumption by around 90% while producing superior uncertainty estimations and preserving at least the same model accuracy compared with other state-of-the-art methods. 
ReabsNet  Though deep neural networks have achieved great success in recent studies and applications, they remain vulnerable to adversarial perturbations that are imperceptible to humans. To address this problem, we propose a novel network called ReabsNet to achieve high classification accuracy in the face of various attacks. The approach is to augment an existing classification network with a guardian network to detect if a sample is natural or has been adversarially perturbed. Critically, instead of simply rejecting adversarial examples, we revise them to get their true labels. We exploit the observation that a sample containing adversarial perturbations has a possibility of returning to its true class after revision. We demonstrate that our ReabsNet outperforms the state-of-the-art defense method under various adversarial attacks. 
React.js  React (sometimes styled React.js or ReactJS) is an open-source JavaScript library for creating user interfaces that aims to address challenges encountered in developing single-page applications. It is maintained by Facebook, Instagram and a community of individual developers and corporations. React is intended to help developers build large applications that use data that changes over time. Its goal is to be simple, declarative and composable. React only handles the user interface in an app; it is considered to be only the view in the model-view-controller (MVC) software pattern, and can be used in conjunction with other JavaScript libraries or larger MVC frameworks such as AngularJS. It can also be used with React-based add-ons that take care of the non-UI parts of building a web application. According to JavaScript analytics service Libscore, React is currently being used on the homepages of Imgur, Bleacher Report, Feedly, Airbnb, SeatGeek, HelloSign, and others. 
Reactive Application  A Reactive Application is an application that reacts to its changing environment by design. It’s constructed from the beginning to react to load, react to failure and react to users. This is achieved by the underlying notion of reacting to messages. 
Reactive Programming  In computing, reactive programming is a programming paradigm oriented around data flows and the propagation of change. This means that it should be possible to express static or dynamic data flows with ease in the programming languages used, and that the underlying execution model will automatically propagate changes through the data flow. For example, in an imperative programming setting, a:=b+c would mean that a is being assigned the result of b+c in the instant the expression is evaluated. Later, the values of b and c can be changed with no effect on the value of a. In reactive programming, the value of a would be automatically updated based on the new values. 
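The a := b + c contrast can be made concrete with a small sketch. The `Cell` class below is a hypothetical, deliberately naive illustration written for this entry, not the API of any reactive library: a cell either holds a plain value or a formula over other cells, and setting an input re-evaluates every cell that depends on it.

```python
class Cell:
    """A value that notifies dependent cells when it changes (illustrative only)."""

    def __init__(self, value=None, formula=None, inputs=()):
        self._formula = formula
        self._inputs = list(inputs)
        self._dependents = []
        # Register this cell with each of its inputs so changes reach it.
        for cell in self._inputs:
            cell._dependents.append(self)
        self.value = value if formula is None else self._compute()

    def _compute(self):
        return self._formula(*(cell.value for cell in self._inputs))

    def set(self, value):
        """Assign a new value and push the change through the data flow."""
        self.value = value
        self._propagate()

    def _propagate(self):
        for dependent in self._dependents:
            dependent.value = dependent._compute()
            dependent._propagate()


if __name__ == "__main__":
    # Imperative: a is computed once and never updated.
    b_plain, c_plain = 1, 2
    a_plain = b_plain + c_plain
    b_plain = 10
    print(a_plain)  # still 3

    # Reactive: a tracks b and c.
    b, c = Cell(1), Cell(2)
    a = Cell(formula=lambda x, y: x + y, inputs=(b, c))
    b.set(10)
    print(a.value)  # 12
```

This push-based propagation may recompute a cell more than once when dependencies form a diamond; real reactive systems typically order or batch updates to avoid that.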
Real log Canonical Threshold (RLCT) 
➘ “Widely Applicable Bayesian Information Criterion” 
Real Logic  We propose real logic: a uniform framework for integrating automatic learning and reasoning. Real logic is defined on a full first-order language where formulas have truth-value in the interval [0,1] and semantics defined concretely on the domain of real numbers. Logical constants are interpreted as (feature) vectors of real numbers. Real logic promotes a well-founded integration of deductive reasoning on knowledge-bases with efficient, data-driven relational machine learning. We show how Real Logic can be implemented in deep Tensor Neural Networks with the use of Google’s TensorFlow primitives. The paper concludes with experiments on a simple but representative example of knowledge completion. 
Real-Time Intelligent Computing  ➘ “Real-Time Intelligent Systems” 
Real-Time Intelligent Systems  Intelligent computing largely refers to artificial intelligence, with the aim of making computers act like humans. The newly developed area of real-time intelligent computing integrates the aspect of dynamic environments with human intelligence. Book: Lecture Notes in Real-Time Intelligent Systems 
Real-time IoT Benchmark for Distributed Stream Processing Platforms (RIoTBench) 
The Internet of Things (IoT) is an emerging technology paradigm where millions of sensors and actuators help monitor and manage physical, environmental and human systems in real time. The inherent closed-loop responsiveness and decision making of IoT applications make them ideal candidates for using low-latency and scalable stream processing platforms. Distributed Stream Processing Systems (DSPS) hosted on Cloud datacenters are becoming the vital engine for real-time data processing and analytics in any IoT software architecture. But the efficacy and performance of contemporary DSPS have not been rigorously studied for IoT applications and data streams. Here, we develop RIoTBench, a Real-time IoT Benchmark suite, along with performance metrics, to evaluate DSPS for streaming IoT applications. The benchmark includes 27 common IoT tasks classified across various functional categories and implemented as reusable micro-benchmarks. Further, we propose four IoT application benchmarks that are composed from these tasks and leverage various dataflow semantics of DSPS. The applications are based on common IoT patterns for data pre-processing, statistical summarization and predictive analytics. These are coupled with four stream workloads sourced from real IoT observations on smart cities and fitness, with peak stream rates that range from 500 to 10000 messages/sec and diverse frequency distributions. We validate the RIoTBench suite for the popular Apache Storm DSPS on the Microsoft Azure public Cloud, and present empirical observations. This suite can be used by DSPS researchers for performance analysis and resource scheduling, and by IoT practitioners to evaluate DSPS platforms. 
Real-Time Predictive Analytics  Real-time predictive analytics is the deployment of a predictive model (built/fitted on a set of aggregated data) to perform run-time prediction on a continuous stream of event data, enabling decision making in real time. Two aspects are involved. First, the predictive model built by a Data Scientist via a standalone tool (R, SAS, SPSS, etc.) has to be exported in a consumable format (PMML is a preferred method across machine learning environments these days; we have done this with PMML and also with other formats). Second, a streaming operational analytics platform has to consume the model (PMML or other format) and translate it into the necessary predictive function (via open-source jPMML, Cascading Pattern, Zementis’ commercially licensed UPPI, or other interfaces), and also feed the processed streaming event data (via a stream processing component in CEP or similar) to compute the predicted outcome. This deployment of a complex predictive model, from its parent machine learning environment to an operational analytics environment, is one possible route to achieving continuous run-time prediction on streaming event data in real time. 
Recall  In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. Suppose a program for recognizing dogs in scenes from a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program’s precision is 4/7 while its recall is 4/9. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positive) and maximum recall (no false negative). The above pattern recognition example contained 7 – 4 = 3 type I errors and 9 – 4 = 5 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant, while high recall means that an algorithm returned most of the relevant results. 
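The two worked examples above can be checked with a few lines of arithmetic. The function below is a minimal sketch with illustrative names; it uses exact fractions so the results match the ratios quoted in the text.

```python
from fractions import Fraction


def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) as exact fractions.

    precision = TP / (TP + FP): share of retrieved items that are relevant.
    recall    = TP / (TP + FN): share of relevant items that were retrieved.
    """
    precision = Fraction(true_positives, true_positives + false_positives)
    recall = Fraction(true_positives, true_positives + false_negatives)
    return precision, recall


if __name__ == "__main__":
    # Dog example: 7 identifications, 4 correct, 9 dogs in the scene.
    print(precision_recall(4, 3, 5))    # (Fraction(4, 7), Fraction(4, 9))
    # Search example: 30 pages returned, 20 relevant, 40 relevant pages missed.
    print(precision_recall(20, 10, 40))  # (Fraction(2, 3), Fraction(1, 3))
```

The false positives correspond to the type I errors and the false negatives to the type II errors counted in the text.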
Recall-Oriented Understudy for Gisting Evaluation (ROUGE) 
ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a reference (human-produced) summary or translation, or a set of such references. 
Receiver Operating Characteristic (ROC Curve) 
In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the total actual positives (TPR = true positive rate) vs. the fraction of false positives out of the total actual negatives (FPR = false positive rate), at various threshold settings. TPR is also known as sensitivity or recall in machine learning. The FPR is also known as the fall-out and can be calculated as one minus the better-known specificity. The ROC curve is then the sensitivity as a function of fall-out. In general, if both of the probability distributions for detection and false alarm are known, the ROC curve can be generated by plotting the Cumulative Distribution Function of the detection probability on the y-axis versus the Cumulative Distribution Function of the false alarm probability on the x-axis. https://rocr.bioinf.mpisb.mpg.de ROCR 
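The construction described above, one (FPR, TPR) point per threshold, can be sketched directly. This is a plain illustration with made-up scores and hypothetical function names; it is not the interface of the ROCR package, and the trapezoidal area under the curve is added only as a convenient summary.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), one per distinct threshold.

    scores: classifier scores, higher meaning "more likely positive".
    labels: 1 for actual positives, 0 for actual negatives.
    An item is predicted positive when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 0]
    curve = roc_points(scores, labels)
    print(curve)
    print(area_under_curve(curve))
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing and the curve runs from (0, 0) to (1, 1).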
Recommendation Engine of Multilayers (REM) 
Recommender systems have been widely adopted by electronic commerce and entertainment industries for individualized prediction and recommendation, which benefit consumers and improve business intelligence. In this article, we propose an innovative method, namely the recommendation engine of multilayers (REM), for tensor recommender systems. The proposed method utilizes the structure of a tensor response to integrate information from multiple modes, and creates an additional layer of nested latent factors to accommodate between-subjects dependency. One major advantage is that the proposed method is able to address the ‘cold-start’ issue in the absence of information from new customers, new products or new contexts. Specifically, it provides more effective recommendations through sub-group information. To achieve scalable computation, we develop a new algorithm for the proposed method, which incorporates a maximum block improvement strategy into the cyclic blockwise-coordinate-descent algorithm. In theory, we investigate both algorithmic properties for global and local convergence, along with the asymptotic consistency of estimated parameters. Finally, the proposed method is applied in simulations and IRI marketing data with 116 million observations of product sales. Numerical studies demonstrate that the proposed method outperforms existing competitors in the literature. 
Recommender System  Recommender systems or recommendation systems (sometimes replacing “system” with a synonym such as platform or engine) are a subclass of information filtering systems that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item. recosystem 
Record Linkage (RL) 
Record linkage (RL) refers to the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, databases). Record linkage is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), as may be the case due to differences in record shape, storage location, and/or curator style or preference. A data set that has undergone RL-oriented reconciliation may be referred to as being cross-linked. Record linkage is called data linkage in many jurisdictions, but is the same process. 
Rectified Factor Networks (RFN) 
We propose rectified factor networks (RFNs) to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events in the input, have a low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method which enforces non-negative and normalized posterior means. We prove convergence and correctness of the RFN learning algorithm. On benchmarks, RFNs are compared to other unsupervised methods like autoencoders, RBMs, factor analysis, ICA, and PCA. In contrast to previous sparse coding methods, RFNs yield sparser codes, capture the data’s covariance structure more precisely, and have a significantly smaller reconstruction error. We test RFNs as a pretraining technique for deep networks on different vision datasets, where RFNs were superior to RBMs and autoencoders. On gene expression data from two pharmaceutical drug discovery studies, RFNs detected small and rare gene modules that revealed highly relevant new biological insights which were so far missed by other unsupervised methods. 
Rectified Linear Unit (ReLU) 

Rectifier  In the context of artificial neural networks, the rectifier is an activation function defined as f(x) = max(0, x) where x is the input to a neuron. This activation function has been argued to be more biologically plausible (cortical neurons are rarely in their maximum saturation regime) than the widely used logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent. A unit employing the rectifier is also called a rectified linear unit (ReLU). 
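The definition f(x) = max(0, x) translates directly into code. The sketch below places the rectifier next to the logistic sigmoid and the hyperbolic tangent mentioned above; the function names are illustrative.

```python
import math


def relu(x):
    """Rectifier: passes positive inputs through, clips negatives to zero."""
    return max(0.0, x)


def logistic_sigmoid(x):
    """Logistic sigmoid, shown only for comparison; saturates towards 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.5):
        print(x, relu(x), logistic_sigmoid(x), math.tanh(x))
```

Unlike the sigmoid and tanh, the rectifier does not saturate for large positive inputs, and it outputs exactly zero for all negative inputs.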
Recurrent Additive Networks (RAN) 
We introduce recurrent additive networks (RANs), a new gated RNN which is distinguished by the use of purely additive latent state updates. At every time step, the new state is computed as a gated componentwise sum of the input and the previous state, without any of the nonlinearities commonly used in RNN transition dynamics. We formally show that RAN states are weighted sums of the input vectors, and that the gates only contribute to computing the weights of these sums. Despite this relatively simple functional form, experiments demonstrate that RANs outperform both LSTMs and GRUs on benchmark language modeling problems. This result shows that many of the nonlinear computations in LSTMs and related networks are not essential, at least for the problems we consider, and suggests that the gates are doing more of the computational work than previously understood. 
Recurrent Collective Classification (RCC) 
We propose a new method for training iterative collective classifiers for labeling nodes in network data. The iterative classification algorithm (ICA) is a canonical method for incorporating relational information into classification. Yet, existing methods for training ICA models rely on the assumption that relational features reflect the true labels of the nodes. This unrealistic assumption introduces a bias that is inconsistent with the actual prediction algorithm. In this paper, we introduce recurrent collective classification (RCC), a variant of ICA analogous to recurrent neural network prediction. RCC accommodates any differentiable local classifier and relational feature functions. We provide gradient-based strategies for optimizing over model parameters to more directly minimize the loss function. In our experiments, this direct loss minimization translates to improved accuracy and robustness on real network data. We demonstrate the robustness of RCC in settings where local classification is very noisy, settings that are particularly challenging for ICA. 
Recurrent Entity Network (EntNet) 
We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason on-the-fly as it reads text, not just when it is required to answer a question or respond, as is the case for a Memory Network (Sukhbaatar et al., 2015). Like a Neural Turing Machine or Differentiable Neural Computer (Graves et al., 2014; 2016), it maintains a fixed-size memory and can learn to perform location- and content-based read and write operations. However, unlike those models, it has a simple parallel architecture in which several memory locations can be updated simultaneously. The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting. We also demonstrate that it can solve a reasoning task which requires a large number of supporting facts, which other methods are not able to solve, and can generalize past its training horizon. It can also be practically used on large-scale datasets such as Children’s Book Test, where it obtains competitive performance, reading the story in a single pass. 
Recurrent Gaussian Processes (RGP) 
We define Recurrent Gaussian Processes (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states and distinct inference methods, and can be extended with deep structures. In this context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce an RGP extension where variational parameters are greatly reduced by being reparametrized through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising results obtained indicate that our RGP model maintains its high flexibility while being able to avoid overfitting and being applicable even when larger datasets are not available. 
Recurrent Ladder Network  In this paper we address the problem of electing a committee among a set of $m$ candidates and on the basis of the preferences of a set of $n$ voters. We consider the approval voting method in which each voter can approve as many candidates as she/he likes by expressing a preference profile (boolean $m$-vector). In order to elect a committee, a voting rule must be established to ‘transform’ the $n$ voters’ profiles into a winning committee. The problem is widely studied in voting theory; for a variety of voting rules the problem was shown to be computationally difficult and approximation algorithms and heuristic techniques were proposed in the literature. In this paper we follow an Ordered Weighted Averaging approach and study the $k$-sum approval voting (optimization) problem in the general case $1 \leq k < n$. For this problem we provide different mathematical programming formulations that allow us to solve it in an exact solution framework. We provide computational results showing that our approach is efficient for medium-size test problems ($n$ up to 200, $m$ up to 60) since in all tested cases it was able to find the exact optimal solution in very short computational times. Recurrent Ladder Networks 
Recurrent Memory Network  Recurrent Neural Networks (RNNs) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge. In this paper, we propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only amplifies the power of RNNs but also facilitates our understanding of their internal functioning and allows us to discover underlying patterns in data. We demonstrate the power of RMN on language modeling and sentence completion tasks. On language modeling, RMN outperforms the Long Short-Term Memory (LSTM) network on three large German, Italian, and English datasets. Additionally, we perform an in-depth analysis of various linguistic dimensions that RMN captures. On the Sentence Completion Challenge, for which it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy, surpassing the previous state of the art by a large margin. 
Recurrent Neural Network (RNN) 
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition, where they have achieved the best known results. rnn 
Recurrent Neural Network Language Model (RNNLM) 
A recurrent neural network based language model has been proposed to overcome certain limitations of the feedforward NNLM, such as the need to specify the context length (the order of the model N), and because theoretically RNNs can efficiently represent more complex patterns than shallow neural networks. The RNN model does not have a projection layer; only input, hidden and output layers. What is special about this type of model is the recurrent matrix that connects the hidden layer to itself, using time-delayed connections. This allows the recurrent model to form some kind of short-term memory, as information from the past can be represented by the hidden layer state that gets updated based on the current input and the state of the hidden layer in the previous time step. The complexity per training example of the RNN model is Q = H × H + H × V, where the word representations D have the same dimensionality as the hidden layer H. Again, the term H × V can be efficiently reduced to H × log2(V) by using hierarchical softmax. Most of the complexity then comes from H × H. http://rnnlm.org Gated Word-Character Recurrent Language Model 
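The complexity expression above can be checked with a little arithmetic. The sketch below assumes the expression is read as the hidden-to-hidden product plus the hidden-to-vocabulary product; the function name `rnnlm_complexity` and the sizes chosen for H and V are illustrative and do not come from the source.

```python
import math


def rnnlm_complexity(hidden: int, vocab: int, hierarchical: bool = False) -> float:
    """Per-example training complexity Q of the RNN language model.

    hidden: size H of the hidden layer (assumed equal to the word dimensionality D).
    vocab: vocabulary size V.
    hierarchical: if True, replace the H * V output term by H * log2(V).
    """
    recurrent_cost = hidden * hidden
    output_cost = hidden * math.log2(vocab) if hierarchical else hidden * vocab
    return recurrent_cost + output_cost


# Illustrative sizes only (assumptions, not values from the source).
H, V = 500, 2 ** 20
full = rnnlm_complexity(H, V)                      # full softmax
hs = rnnlm_complexity(H, V, hierarchical=True)     # hierarchical softmax
print(full, hs)
```

With these sizes the output term dominates under the full softmax, while under hierarchical softmax almost all of the remaining cost is the recurrent term, which matches the closing remark of the entry.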
Recurrent Neural Network With Residual Attention (RRA) 
In this paper, we propose a recurrent neural network (RNN) with residual attention (RRA) to learn long-range dependencies from sequential data. We propose to add residual connections across timesteps to the RNN, which explicitly enhances the interaction between the current state and hidden states that are several timesteps apart. This also allows training errors to be directly backpropagated through residual connections and effectively alleviates the vanishing gradient problem. We further reformulate an attention mechanism over residual connections. An attention gate is defined to summarize the individual contribution from multiple previous hidden states in computing the current state. We evaluate RRA on three tasks: the adding problem, pixel-by-pixel MNIST classification and sentiment analysis on the IMDB dataset. Our experiments demonstrate that RRA yields better performance, faster convergence and more stable training compared to a standard LSTM network. Furthermore, RRA shows highly competitive performance compared to state-of-the-art methods. 
Recurrent Neural Network with Tensor Train  Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems. However, most state-of-the-art RNNs have millions of parameters and require many computational resources for training and predicting new data. This paper proposes an alternative RNN model that reduces the number of parameters significantly by representing the weight parameters in the Tensor Train (TT) format. In this paper, we implement the TT-format representation for several RNN architectures such as the simple RNN and the Gated Recurrent Unit (GRU). We compare and evaluate our proposed RNN model against the uncompressed RNN model on sequence classification and sequence prediction tasks. Our proposed RNNs with TT-format are able to preserve the performance while reducing the number of RNN parameters significantly, up to 40 times smaller. 
Recurrent Relational Network  Humans possess an ability to abstractly reason about objects and their interactions, an ability not shared with state-of-the-art deep learning models. Relational networks, introduced by Santoro et al. (2017), add the capacity for relational reasoning to deep neural networks, but are limited in the complexity of the reasoning tasks they can address. We introduce recurrent relational networks which increase the suite of solvable tasks to those that require an order of magnitude more steps of relational reasoning. We use recurrent relational networks to solve Sudoku puzzles and achieve state-of-the-art results by solving 96.6% of the hardest Sudoku puzzles, where relational networks fail to solve any. We also apply our model to the BaBi textual QA dataset, solving 19/20 tasks, which is competitive with state-of-the-art sparse differentiable neural computers. The recurrent relational network is a general purpose module that can augment any neural network model with the capacity to do many-step relational reasoning. 
Recurrent Spatial Transformer Networks (RNN-SPN) 
We integrate the recently proposed spatial transformer network (SPN) into a recurrent neural network (RNN) to form an RNN-SPN model. We use the RNN-SPN to classify digits in cluttered MNIST sequences. The proposed model achieves a single-digit error of 1.5% compared to 2.9% for a convolutional network and 2.0% for convolutional networks with SPN layers. The SPN outputs a zoomed, rotated and skewed version of the input image. We investigate different downsampling factors (ratio of pixels in input and output) for the SPN and show that the RNN-SPN model is able to downsample the input images without deteriorating performance. The downsampling in RNN-SPN can be thought of as adaptive downsampling that minimizes the information loss in the regions of interest. We attribute the superior performance of the RNN-SPN to the fact that it can attend to a sequence of regions of interest. GitXiv 
Recursive Bayesian Estimation  Recursive Bayesian estimation, also known as a Bayes filter, is a general probabilistic approach for estimating an unknown probability density function recursively over time using incoming measurements and a mathematical process model. 
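As a rough illustration of the recursion described above, the following sketch runs one predict/update cycle of a Bayes filter over a finite set of states. The two-state transition table and the measurement likelihoods are made-up numbers, and the function names `predict` and `update` are hypothetical.

```python
def predict(belief, transition):
    """Process-model step: push the belief through the transition model.

    transition[i][j] is the probability of moving from state i to state j.
    """
    n = len(belief)
    return [sum(transition[i][j] * belief[i] for i in range(n)) for j in range(n)]


def update(belief, likelihood):
    """Measurement step: weight each state by the likelihood of the
    incoming measurement and renormalise (Bayes' rule)."""
    unnormalised = [l * b for l, b in zip(likelihood, belief)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Illustrative numbers only (assumptions, not from the source).
prior = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
likelihood = [0.8, 0.3]   # p(measurement | state) for the observed measurement

predicted = predict(prior, transition)
posterior = update(predicted, likelihood)
print(predicted, posterior)
```

Repeating the two calls for each new measurement gives the recursive estimate; continuous-state versions such as the Kalman filter follow the same predict/update pattern.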
Recursive Feature Elimination (RFE) 
Recursive feature elimination (RFE) is a feature-selection strategy. It operates within two nested levels of cross-validation. First, the training set is divided into N folds. RFE then sets one fold aside to test generalization and trains on the remaining data. http://…efeatureeliminationcoupledtosvminr http://…/recursivefeatureeliminationrfe http://…0Selection%20from%20Microarray%20Data.pdf pathClass 
Recursive Neural Network (RNN) 
A recursive neural network (RNN) is a kind of deep neural network created by applying the same set of weights recursively over a structure, to produce a structured prediction over variable-length input, or a scalar prediction on it, by traversing a given structure in topological order. RNNs have been successful in learning sequence and tree structures in natural language processing, mainly continuous representations of phrases and sentences based on word embeddings. RNNs were first introduced to learn distributed representations of structure, such as logical terms. 
Recursive Partitioning  Recursive partitioning is a statistical method for multivariable analysis. Recursive partitioning creates a decision tree that strives to correctly classify members of the population based on several dichotomous independent variables. A variation is ‘Cox linear recursive partitioning’. 
Recursively Decomposing the function into locally Independent Subspaces (RDIS) 
Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent subfunctions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. Experimentally, RDIS outperforms standard techniques on problems like structure from motion and protein folding. GitXiv 
Redis  Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs. You can run atomic operations on these types, like appending to a string; incrementing the value in a hash; pushing an element to a list; computing set intersection, union and difference; or getting the member with highest ranking in a sorted set. In order to achieve its outstanding performance, Redis works with an in-memory dataset. Depending on your use case, you can persist it either by dumping the dataset to disk every once in a while, or by appending each command to a log. Persistence can be optionally disabled, if you just need a feature-rich, networked, in-memory cache. Redis also supports trivial-to-setup master-slave asynchronous replication, with very fast non-blocking first synchronization and auto-reconnection with partial resynchronization on net split. RcppRedis 
Reduced-Rank Regression  The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating eigenvalues and eigenvectors. We give a number of different applications to regression and time series analysis, and show how the reduced rank regression estimator can be derived as a Gaussian maximum likelihood estimator. rrr 
Redundancy Analysis (RDA) 
Redundancy analysis (RDA) is a form of constrained ordination that examines how much of the variation in one set of variables explains the variation in another set of variables. It is the multivariate analog of simple linear regression. Redundancy analysis is based on similar principles as principal components analysis and thus makes similar assumptions about the data. It is appropriate when the expected relationship between dependent and independent variables is linear (e.g. climate and allele frequency). 
Reed’s Law  Reed’s law is the assertion of David P. Reed that the utility of large networks, particularly social networks, can scale exponentially with the size of the network. The reason for this is that the number of possible subgroups of network participants is $2^N - N - 1$, where $N$ is the number of participants. This grows much more rapidly than either • the number of participants, $N$, or • the number of possible pair connections, $N(N - 1)/2$ (which follows Metcalfe’s law), so that even if the utility of groups available to be joined is very small on a peer-group basis, eventually the network effect of potential group membership can dominate the overall economics of the system. 
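The three growth rates compared in this entry are easy to tabulate. The sketch below counts subgroups as subsets of at least two participants (all subsets, minus the empty set and the N singletons); the function names are hypothetical.

```python
def reed_subgroups(n: int) -> int:
    """Number of possible subgroups with at least two members."""
    return 2 ** n - n - 1


def metcalfe_pairs(n: int) -> int:
    """Number of possible pair connections."""
    return n * (n - 1) // 2


# Compare participants, pair connections and subgroups for a few sizes.
for n in (3, 10, 20, 30):
    print(n, metcalfe_pairs(n), reed_subgroups(n))
```

For three participants the count is four (three pairs plus the full triple), and by thirty participants the subgroup count already exceeds a billion while the pair count is still 435.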
Referenced Metric and Unreferenced Metric Blended Evaluation Routine (RUBER) 
Open-domain human-computer conversation has been attracting increasing attention over the past few years. However, there does not exist a standard automatic evaluation metric for open-domain dialog systems; researchers usually resort to human annotation for model evaluation, which is time- and labor-intensive. In this paper, we propose RUBER, a Referenced metric and Unreferenced metric Blended Evaluation Routine, which evaluates a reply by taking into consideration both a groundtruth reply and a query (previous user utterance). Our metric is learnable, but its training does not require labels of human satisfaction. Hence, RUBER is flexible and extensible to different datasets and languages. Experiments on both retrieval and generative dialog systems show that RUBER has high correlation with human annotation. 
Refinery  Refinery is an open source platform for the massive analysis of large unstructured document collections using the latest state-of-the-art topic models. The goal of Refinery is to simplify this process within an intuitive web-based interface. What makes Refinery unique is that it’s meant to be run locally, thus bypassing the need for securing document collections over the internet. Refinery was developed by myself and Ben Swanson at MIT Media Lab. It was also the recipient of the Knight Prototype Award in 2014. 
Reflective Oracles  Classical game theory treats players as special – a description of a game contains a full, explicit enumeration of all players – even though in the real world, ‘players’ are no more fundamentally special than rocks or clouds. It isn’t trivial to find a decision-theoretic foundation for game theory in which an agent’s coplayers are a non-distinguished part of the agent’s environment. Attempts to model both players and the environment as Turing machines, for example, fail for standard diagonalization reasons. In this paper, we introduce a ‘reflective’ type of oracle, which is able to answer questions about the outputs of oracle machines with access to the same oracle. These oracles avoid diagonalization by answering some queries randomly. We show that machines with access to a reflective oracle can be used to define rational agents using causal decision theory. These agents model their environment as a probabilistic oracle machine, which may contain other agents as a non-distinguished part. We show that if such agents interact, they will play a Nash equilibrium, with the randomization in mixed strategies coming from the randomization in the oracle’s answers. This can be seen as providing a foundation for classical game theory in which players aren’t special. 
Refutation Complexity  The sample complexity of learning a Boolean-valued function class is precisely characterized by its Rademacher complexity. This has little bearing, however, on the sample complexity of efficient agnostic learning. We introduce refutation complexity, a natural computational analog of Rademacher complexity of a Boolean concept class and show that it exactly characterizes the sample complexity of efficient agnostic learning. Informally, refutation complexity of a class $\mathcal{C}$ is the minimum number of example-label pairs required to efficiently distinguish between the case that the labels correlate with the evaluation of some member of $\mathcal{C}$ (structure) and the case where the labels are i.i.d. Rademacher random variables (noise). The easy direction of this relationship was implicitly used in the recent framework for improper PAC learning lower bounds of Daniely and co-authors via connections to the hardness of refuting random constraint satisfaction problems. Our work can be seen as making the relationship between agnostic learning and refutation implicit in their work into an explicit equivalence. In a recent, independent work, Salil Vadhan discovered a similar relationship between refutation and PAC-learning in the realizable (i.e. noiseless) case. 
Regression Analysis  In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or ‘criterion variable’) changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution. 
Regression Discontinuity Design (RDD) 
In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest-posttest design that elicits the causal effects of interventions by assigning a cutoff or threshold above or below which an intervention is assigned. By comparing observations lying closely on either side of the threshold, it is possible to estimate the local average treatment effect in environments in which randomization was unfeasible. First applied by Donald Thistlethwaite and Donald Campbell to the evaluation of scholarship programs, the RDD has become increasingly popular in recent years. rddtools 
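One common way to compare observations on either side of the threshold is to fit a separate straight line on each side within a bandwidth and take the gap between the two fits at the cutoff. The sketch below does this on synthetic, noise-free data with a built-in jump of 2; the data, the bandwidth and the function names are assumptions made for illustration, and real analyses would also need noise handling, bandwidth selection and standard errors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_estimate(xs, ys, cutoff=0.0, bandwidth=1.0):
    """Gap between the right-hand and left-hand linear fits at the cutoff."""
    left = [(x, y) for x, y in zip(xs, ys) if cutoff - bandwidth <= x < cutoff]
    right = [(x, y) for x, y in zip(xs, ys) if cutoff <= x <= cutoff + bandwidth]
    left_intercept, left_slope = fit_line(*zip(*left))
    right_intercept, right_slope = fit_line(*zip(*right))
    return (right_intercept + right_slope * cutoff) - (left_intercept + left_slope * cutoff)


# Synthetic running variable and outcome: treatment applies when x >= 0.
xs = [i / 10 for i in range(-10, 11)]
ys = [1.0 + 0.5 * x + (2.0 if x >= 0 else 0.0) for x in xs]

effect = rdd_estimate(xs, ys)
print(effect)
```

The estimate is local: it describes the effect for units near the cutoff only, which is why the entry speaks of a local average treatment effect.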
Regression Nomogram Plot  regplot 
Regression toward the mean / Regression to the mean  In statistics, regression toward (or to) the mean is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement and, paradoxically, if it is extreme on its second measurement, it will tend to have been closer to the average on its first. To avoid making incorrect inferences, regression toward the mean must be considered when designing scientific experiments and interpreting data. 
Regression Tree  A data-analysis method that recursively partitions data into sets, each of which is simply modeled using regression methods. 
Regret  Regret is the negative emotion experienced when learning that an alternative course of action would have resulted in a more favorable outcome. The theory of regret aversion or anticipated regret proposes that when facing a decision, individuals may anticipate the possibility of feeling regret after the uncertainty is resolved and thus incorporate in their choice their desire to eliminate or reduce this possibility. 
Regret Minimizing Set  A regret minimizing set Q is a small size representation of a much larger database P so that user queries executed on Q return answers whose scores are not much worse than those on the full dataset. In particular, a k-regret minimizing set has the property that the regret ratio between the score of the top-1 item in Q and the score of the top-k item in P is minimized, where the score of an item is the inner product of the item’s attributes with a user’s weight (preference) vector. The problem is challenging because we want to find a single representative set Q whose regret ratio is small with respect to all possible user weight vectors. We show that k-regret minimization is NP-Complete for all dimensions d >= 3. This settles an open problem from Chester et al. [VLDB 2014], and resolves the complexity status of the problem for all d: the problem is known to have a polynomial-time solution for d <= 2. In addition, we propose two new approximation schemes for regret minimization, both with provable guarantees, one based on coresets and another based on hitting sets. We also carry out extensive experimental evaluation, and show that our schemes compute regret-minimizing sets comparable in size to the greedy algorithm proposed in [VLDB 14] but our schemes are significantly faster and scalable to large data sets. 
Regularization  Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information is usually of the form of a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm. 
Regularization Methods  Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information is usually of the form of a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm. A theoretical justification for regularization is that it attempts to impose Occam’s razor on the solution. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters. 
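A penalty on the norm of the parameters can be seen at work in the smallest possible case: a single slope fitted without an intercept. The sketch below uses a squared-norm (ridge) penalty as one example of such a complexity penalty; the data and the function name `ridge_slope` are illustrative assumptions.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimising sum((y - w * x) ** 2) + lam * w ** 2.

    Setting the derivative to zero gives w = sum(x * y) / (sum(x * x) + lam),
    so a larger penalty weight lam shrinks the slope towards zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Made-up data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 1.0, 14.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam))
```

With no penalty the fit recovers the slope 2 exactly, and with a penalty weight equal to the sum of squared inputs the slope is halved. In the Bayesian reading mentioned in the entry, this particular penalty corresponds to a zero-mean Gaussian prior on the slope.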
Regularized Discriminant Analysis (RDA) 
The regularized discriminant analysis (RDA) is a generalization of the linear discriminant analysis (LDA) and the quadratic discriminant analysis (QDA). Both algorithms are special cases of this algorithm. If the alpha parameter is set to 1, this operator performs LDA. Similarly, if the alpha parameter is set to 0, this operator performs QDA. For more information about LDA and QDA please study the documentation of the corresponding operators. Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups. For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) NOT to go to college. For that purpose the researcher could collect data on numerous variables prior to students’ graduation. After graduation, most students will naturally fall into one of the two categories. Discriminant Analysis could then be used to determine which variable(s) are the best predictors of students’ subsequent educational choice. Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA). For example, suppose the same student graduation scenario. We could have measured students’ stated intention to continue on to college one year prior to graduation. If the means for the two groups (those who actually went to college and those who did not) are different, then we can say that intention to attend college as stated one year prior to graduation allows us to discriminate between those who are and are not college bound (and this information may be used by career counselors to provide the appropriate guidance to the respective students). The basic idea underlying discriminant analysis is to determine whether groups differ with regard to the mean of a variable, and then to use that variable to predict group membership (e.g., of new cases). 
Discriminant Analysis may be used for two objectives: either we want to assess the adequacy of classification, given the group memberships of the objects under study; or we wish to assign objects to one of a number of (known) groups of objects. Discriminant Analysis may thus have a descriptive or a predictive objective. In both cases, some group assignments must be known before carrying out the Discriminant Analysis. Such group assignments, or labeling, may be arrived at in any way. Hence Discriminant Analysis can be employed as a useful complement to Cluster Analysis (in order to judge the results of the latter) or Principal Components Analysis. http://…/ESLII_print10.pdf http://…/slacpub4389.pdf http://…/citation.cfm?id=1658388 rda 
Regularized Empirical Risk Minimization (RERM) 
Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family of learning algorithms and is used to give theoretical bounds on the performance of learning algorithms. Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction 
Regularized Optimal Scaling Regression (ROS Regression) 
In this paper we combine two important extensions of ordinary least squares regression: regularization and optimal scaling. Optimal scaling (sometimes also called optimal scoring) was originally developed for categorical data, and the process finds quantifications for the categories that are optimal for the regression model in the sense that they maximize the multiple correlation. Although the optimal scaling method was developed initially for variables with a limited number of categories, optimal transformations of continuous variables are a special case. We will consider a variety of transformation types; typically we use step functions for categorical variables, and smooth (spline) functions for continuous variables. Both types of functions can be restricted to be monotonic, preserving the ordinal information in the data. In addition to optimal scaling, three regularization methods will be considered: Ridge regression, the Lasso, and the Elastic Net. The resulting method will be called ROS Regression (Regularized Optimal Scaling Regression). We will show that the basic OS algorithm provides straightforward and efficient estimation of the regularized regression coefficients, automatically gives the Group Lasso and Blockwise Sparse Regression, and extends them with monotonicity properties. We will show that Optimal Scaling linearizes nonlinear relationships between predictors and outcome, and improves upon the condition of the predictor correlation matrix, increasing (on average) the conditional independence of the predictors. Alternative options for regularization of either regression coefficients or category quantifications are mentioned. Extended examples are provided. Keywords: Categorical Data, Optimal Scaling, Conditional Independence, Step Functions, Splines, Monotonic Transformations, Regularization, Lasso, Elastic Net, Group Lasso, Blockwise Sparse Regression. 
Reinforced Encoder-Decoder (RED) 
Action anticipation aims to detect an action before it happens. Many real world applications in robotics and surveillance are related to this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations to actions. However, anticipation is based on a single past frame’s representation, which ignores the history trend. Besides, it can only anticipate a fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on TVSeries, THUMOS14 and TVHumanInteraction datasets for action anticipation and achieve state-of-the-art performance on all datasets. 
REINFORCEjs  REINFORCEjs is a Reinforcement Learning library that implements several common RL algorithms, all with web demos. In particular, the library currently includes: • Dynamic Programming methods • (Tabular) Temporal Difference Learning (SARSA/Q-Learning) • Deep Q-Learning for Q-Learning with function approximation with Neural Networks • Stochastic/Deterministic Policy Gradients and Actor Critic architectures for dealing with continuous action spaces. (very alpha, likely buggy or at the very least finicky and inconsistent) GitHub REINFORCEjs 
Reinforcement Learning (RL) 
Reinforcement learning (RL) is learning by interacting with an environment. An RL agent learns from the consequences of its actions, rather than from being explicitly taught, and it selects its actions on the basis of its past experiences (exploitation) and also by new choices (exploration), which is essentially trial-and-error learning. The reinforcement signal that the RL agent receives is a numerical reward, which encodes the success of an action’s outcome, and the agent seeks to learn to select actions that maximize the accumulated reward over time. (The term reward is used here in a neutral fashion and does not imply any pleasure, hedonic impact or other psychological interpretations.) 
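The interplay of exploitation and exploration can be sketched in a very small setting: an agent that repeatedly picks one of a few actions and only sees the numerical reward of the action it picked. The example below uses an epsilon-greedy rule, which is one simple strategy among many; the fixed reward values, the function name `run_bandit` and the parameter defaults are assumptions for illustration.

```python
import random


def run_bandit(rewards, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit with one fixed reward per action.

    With probability epsilon the agent explores (random action); otherwise
    it exploits the action with the highest estimated value so far.
    """
    rng = random.Random(seed)
    n_actions = len(rewards)
    estimates = [0.0] * n_actions   # running mean reward per action
    counts = [0] * n_actions        # how often each action was chosen
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                           # exploration
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploitation
        reward = rewards[action]
        counts[action] += 1
        # Incremental update of the mean reward observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.1, 0.5, 0.9])
print(estimates, counts, total_reward)
```

Without the exploration branch the agent would keep choosing the first action it tried, since it never observes what the others pay; the occasional random choice is what lets it discover the better action and accumulate more reward over time.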
Reinforcement Learning with Parameterized Actions (Q-PAMDP) 
We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions, that is, discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with this action. This models domains where there are distinct actions which can be adjusted to a particular state. We introduce the Q-PAMDP algorithm for learning in these domains. We show that Q-PAMDP converges to a local optimum, and compare different approaches in a robot soccer goal-scoring domain and a platformer domain. 
Rejection Sampling  In mathematics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or “accept-reject algorithm” and is a type of Monte Carlo method. The method works for any distribution in $\mathbb{R}^m$ with a density. Rejection sampling is based on the observation that to sample a random variable one can sample uniformly from the region under the graph of its density function. AR 
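The idea of sampling uniformly from the region under the density can be sketched in one dimension. The example below draws points uniformly from a rectangle that covers the graph of an illustrative target density, f(x) = 6x(1 − x) on [0, 1], and keeps the x-coordinate of every point that falls under the curve. The target density, the uniform proposal and the function names are assumptions chosen for the example.

```python
import random


def target_density(x: float) -> float:
    """Illustrative target: f(x) = 6x(1 - x) on [0, 1], with maximum 1.5."""
    return 6.0 * x * (1.0 - x)


def rejection_sample(n: int, seed: int = 0):
    """Draw n samples from target_density using a uniform proposal on [0, 1]."""
    rng = random.Random(seed)
    bound = 1.5          # upper bound on the density over [0, 1]
    samples = []
    while len(samples) < n:
        x = rng.random()             # horizontal coordinate (proposal draw)
        u = rng.random() * bound     # vertical coordinate in the covering box
        if u <= target_density(x):   # point lies under the graph: accept
            samples.append(x)
    return samples


samples = rejection_sample(5000)
mean = sum(samples) / len(samples)
print(mean)
```

On average one proposal in 1.5 is accepted here, because the area under the density is 1 and the covering rectangle has area 1.5; a tighter envelope wastes fewer draws.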
ReKopedia  Very important breakthroughs in data-centric machine learning algorithms led to impressive performance in transactional point applications such as detecting anger in speech, alerts from a Face Recognition system, or EKG interpretation. Non-transactional applications, e.g. medical diagnosis beyond the EKG results, require AI algorithms that integrate deeper and broader knowledge in their problem-solving capabilities, e.g. integrating knowledge about anatomy and physiology of the heart with EKG results and additional patient findings. The same holds for military aerial interpretation, where knowledge about enemy doctrines on force composition and spread helps immensely in situation assessment beyond image recognition of individual objects. The Double Deep Learning approach advocates integrating data-centric machine self-learning techniques with machine-teaching techniques to leverage the power of both and overcome their corresponding limitations. To take AI to the next level, it is essential that we rebalance the roles of data and knowledge. Data is important, but knowledge, both deep and commonsense, is equally important. An initiative is proposed to build Wikipedia for Smart Machines, meaning target readers are not human, but rather smart machines. Named ReKopedia, the goal is to develop methodologies, tools, and automatic algorithms to convert the knowledge of humanity that we all learn in schools, universities and during our professional life into Reusable Knowledge structures that smart machines can use in their inference algorithms. Ideally, ReKopedia would be an open source shared knowledge repository similar to the well-known shared open source software code repositories. Examples in the article are based on or inspired by real-life non-transactional AI systems I deployed over decades of AI career that benefit hundreds of millions of people around the globe. 
RELARM  Following widely used in visual recognition concept of relative attributes, the article establishes definition of the relative PCA attributes for a class of objects defined by vectors of their parameters. A new rating model (RELARM) is built using relative PCA attribute ranking functions for rating object description and kmeans clustering algorithm. Rating assignment of each rating object to a rating category is derived as a result of cluster centers projection on the specially selected rating vector. Empirical study has shown a high level of approximation to the existing S & P, Moody’s and Fitch ratings. 
Relation Extraction (RE) 
With the advent of the Internet, a large amount of digital text is generated every day in the form of news articles, research publications, blogs, question answering forums and social media. It is important to develop techniques for extracting information automatically from these documents, as a lot of important information is hidden within them. This extracted information can be used to improve access and management of knowledge hidden in large text corpora. Several applications such as Question Answering and Information Retrieval would benefit from this information. Entities, like persons and organizations, form the most basic unit of the information. Occurrences of entities in a sentence are often linked through well-defined relations; e.g., occurrences of person and organization in a sentence may be linked through relations such as employed at. The task of Relation Extraction (RE) is to identify such relations automatically. In this paper, we survey several important supervised, semi-supervised and unsupervised RE techniques. We also cover the paradigms of Open Information Extraction (OIE) and Distant Supervision. Finally, we describe some of the recent trends in the RE techniques and possible future research directions. This survey would be useful for three kinds of readers: i) newcomers in the field who want to quickly learn about RE; ii) researchers who want to know how the various RE techniques evolved over time and what the possible future research directions are; and iii) practitioners who just need to know which RE technique works best in various settings. 
Relation Network (RN) 
We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only few examples from each. Our method, called the Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting. Once trained, an RN is able to classify images of new classes by computing relation scores between query images and the few examples of each new class without further updating the network. Besides providing improved performance on few-shot learning, our framework is easily extended to zero-shot learning. Extensive experiments on four datasets demonstrate that our simple approach provides a unified and effective approach for both of these two tasks. 
Relational Class Analysis  RCA 
Relational Event Models (REM) 
Sequences of relational events underlie much empirical research on organizational relations. Yet relational event data are typically aggregated and dichotomized to derive networks that can be analyzed with specialized statistical methods. Transforming sequences of relational events into binary network ties entails two main limitations: the loss of information about the order and number of events that compose each tie and the inability to account for compositional changes in the set of actors and/or recipients. rem, relevent 
Relational Network  Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, superhuman performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations. ➚ “Recurrent Relational Network” Recurrent Relational Networks for Complex Relational Reasoning 
Relational Similarity Machines (RSM) 
This paper proposes Relational Similarity Machines (RSM): a fast, accurate, and flexible relational learning framework for supervised and semi-supervised learning tasks. Despite the importance of relational learning, most existing methods are hard to adapt to different settings, due to issues with efficiency, scalability, accuracy, and flexibility for handling a wide variety of classification problems, data, constraints, and tasks. For instance, many existing methods perform poorly for multi-class classification problems, graphs that are sparsely labeled or network data with low relational autocorrelation. In contrast, the proposed relational learning framework is designed to be (i) fast for learning and inference at real-time interactive rates, and (ii) flexible for a variety of learning settings (multi-class problems), constraints (few labeled instances), and application domains. The experiments demonstrate the effectiveness of RSM for a variety of tasks and data. 
Relationship Extraction  A Relationship Extraction (Relation Extraction) task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships. http://…ingannlpproblemwithoutusingatonof 
Relative Likelihood  marl 
Relative Risk  In statistics and epidemiology, relative risk or risk ratio (RR) is the ratio of the probability of an event occurring (for example, developing a disease, being injured) in an exposed group to the probability of the event occurring in a comparison, non-exposed group. Relative risk includes two important features: (i) a comparison of risk between two ‘exposures’ puts risks in context, and (ii) ‘exposure’ is ensured by having proper denominators for each group representing the exposure. prop.comb.RR 
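The ratio described above can be computed directly from a 2×2 table of counts. The sketch below is only an illustration of the definition; the function name `relative_risk` and the example counts are hypothetical and are not taken from the prop.comb.RR package.

```python
def relative_risk(exposed_events, exposed_total, control_events, control_total):
    """Risk ratio: P(event | exposed) / P(event | non-exposed)."""
    risk_exposed = exposed_events / exposed_total
    risk_control = control_events / control_total
    return risk_exposed / risk_control


# Hypothetical counts: 20 of 100 exposed subjects and 10 of 100
# non-exposed subjects experience the event.
rr = relative_risk(20, 100, 10, 100)
```

A value above 1 indicates higher risk in the exposed group, a value below 1 indicates lower risk, and a value of 1 indicates no difference between the groups.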
Relative Survival  In survival analysis, relative survival of a disease is calculated by dividing the overall survival after diagnosis by the survival as observed in a similar population that was not diagnosed with that disease. A similar population is composed of individuals with at least age and gender similar to those diagnosed with the disease. When describing the survival experience of a group of people or patients, typically the method of overall survival is used, and it presents estimates of the proportion of people or patients alive at a certain point in time. The problem with measuring overall survival using Kaplan-Meier or actuarial survival methods is that the estimates include two causes of death: 1) deaths due to the disease of interest; and 2) deaths due to all other causes, which include old age, other cancers, trauma and any other possible cause of death. In general, survival analysis is interested in the deaths due to a disease rather than all causes, and therefore a ‘cause-specific survival analysis’ is employed to measure disease-specific survival. There are two ways of performing a cause-specific survival analysis: ‘competing risks survival analysis’ and ‘relative survival’. 
Relaxed Online Maximum Margin Algorithm (ROMMA) 
An incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms is similar to that of the perceptron algorithm, but their generalization is much better. We show that a batch algorithm based on aggressive ROMMA converges to the fixed threshold SVM hypothesis. 
Relevance Vector Machine (RVM) 
In mathematics, a relevance vector machine (RVM) is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and probabilistic classification. The RVM has an identical functional form to the support vector machine, but provides probabilistic classification. Compared to that of support vector machines (SVM), the Bayesian formulation of the RVM avoids the set of free parameters of the SVM (that usually require cross-validation-based post-optimizations). However, RVMs use an expectation maximization (EM)-like learning method and are therefore at risk of local minima. This is unlike the standard sequential minimal optimization (SMO)-based algorithms employed by SVMs, which are guaranteed to find a global optimum (of the convex problem). The relevance vector machine is patented in the United States by Microsoft. 
Relevant Component Analysis (RCA) 
Irrelevant data variability often causes difficulties in classification and clustering tasks. For example, when data variability is dominated by environment conditions, such as global illumination, nearest-neighbour classification in the original feature space may be very unreliable. The goal of Relevant Component Analysis (RCA) is to find a transformation that amplifies relevant variability and suppresses irrelevant variability. Relevant Component Analysis tries to find a linear transformation W of the feature space such that the effect of irrelevant variability is reduced in the transformed space. That is, we wish to rescale the feature space and reduce the weights of irrelevant directions. The main premise of RCA is that we can reduce irrelevant variability by reducing the within-class variability. Intuitively, a direction which exhibits high variability among samples of the same class is unlikely to be useful for classification or clustering. RECA 
Reliability Data Analysis  After you have obtained component or system reliability data, how do you fit life distribution models, reliability growth models, or acceleration models? How do you estimate failure rates or MTBF’s and project component or system reliability at use conditions? SPREDA 
Relief-Based Feature Selection  Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that strike an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability. 
Reluplex  Deep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counter-examples). The technique is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a prototype deep neural network implementation of the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods. 
REMIX  Outlier detection is the identification of points in a dataset that do not conform to the norm. Outlier detection is highly sensitive to the choice of the detection algorithm and the feature subspace used by the algorithm. Extracting domain-relevant insights from outliers needs systematic exploration of these choices since diverse outlier sets could lead to complementary insights. This challenge is especially acute in an interactive setting, where the choices must be explored in a time-constrained manner. In this work, we present REMIX, the first system to address the problem of outlier detection in an interactive setting. REMIX uses a novel mixed integer programming (MIP) formulation for automatically selecting and executing a diverse set of outlier detectors within a time limit. This formulation incorporates multiple aspects such as (i) an upper limit on the total execution time of detectors, (ii) diversity in the space of algorithms and features, and (iii) meta-learning for evaluating the cost and utility of detectors. REMIX provides two distinct ways for the analyst to consume its results: (i) a partitioning of the detectors explored by REMIX into perspectives through low-rank non-negative matrix factorization; each perspective can be easily visualized as an intuitive heatmap of experiments versus outliers, and (ii) an ensembled set of outliers which combines outlier scores from all detectors. We demonstrate the benefits of REMIX through extensive empirical validation on real-world data. 
Remove Unwanted Variation, 2-step (RUV2) 
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method “Remove Unwanted Variation, 2-step” (RUV2). ruv 
Remove Unwanted Variation, 4-step (RUV4) 
High dimensional data suffer from unwanted variation, such as the batch effects common in microarray data. Unwanted variation complicates the analysis of high dimensional data, leading to high rates of false discoveries, high rates of missed discoveries, or both. In many cases the factors causing the unwanted variation are unknown and must be inferred from the data. In such cases, negative controls may be used to identify the unwanted variation and separate it from the wanted variation. We present a new method, RUV4, to adjust for unwanted variation in high dimensional data with negative controls. RUV4 may be used when the goal of the analysis is to determine which of the features are truly associated with a given factor of interest. One nice property of RUV4 is that it is relatively insensitive to the number of unwanted factors included in the model; this makes estimating the number of factors less critical. We also present a novel method for estimating the features’ variances that may be used even when a large number of unwanted factors are included in the model and the design matrix is full rank. We name this the “inverse method for estimating variances.” By combining RUV4 with the inverse method, it is no longer necessary to estimate the number of unwanted factors at all. Using both real and simulated data we compare the performance of RUV4 with that of other adjustment methods such as SVA, LEAPP, ICE, and RUV2. We find that RUV4 and its variants perform as well as or better than other methods. ruv 
Renewal Hawkes Process  RHawkes 
Renyi Entropy  In information theory, the Rényi entropy generalizes the Hartley entropy, the Shannon entropy, the collision entropy and the min-entropy. Entropies quantify the diversity, uncertainty, or randomness of a system. The Rényi entropy is named after Alfréd Rényi. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions. The Rényi entropy is important in ecology and statistics as an index of diversity. The Rényi entropy is also important in quantum information, where it can be used as a measure of entanglement. In the Heisenberg XY spin chain model, the Rényi entropy as a function of α can be calculated explicitly by virtue of the fact that it is an automorphic function with respect to a particular subgroup of the modular group. In theoretical computer science, the min-entropy is used in the context of randomness extractors. 
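For a discrete distribution p, the Rényi entropy of order α (α ≥ 0, α ≠ 1) is H_α(p) = log(Σ p_i^α) / (1 − α). The special cases named above correspond to α = 0 (Hartley), α → 1 (Shannon), α = 2 (collision) and α → ∞ (min-entropy). The following sketch illustrates this; the function name `renyi_entropy` and the choice of base-2 logarithms (entropy in bits) are assumptions made for the example.

```python
import math


def renyi_entropy(probs, alpha):
    """Rényi entropy (in bits) of a discrete distribution, for alpha >= 0."""
    probs = [p for p in probs if p > 0]  # zero-probability outcomes contribute nothing
    if alpha == 1:
        # Limit alpha -> 1: Shannon entropy
        return -sum(p * math.log2(p) for p in probs)
    if math.isinf(alpha):
        # Limit alpha -> infinity: min-entropy
        return -math.log2(max(probs))
    # General case; alpha = 0 gives Hartley entropy, alpha = 2 collision entropy
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)
```

For a uniform distribution all orders coincide, while for a non-uniform distribution the entropy does not increase as α grows, so the min-entropy is the most conservative of these measures.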
REorders and/or REflects FACTors (REREFACT) 
Executes a post-rotation algorithm that REorders and/or REflects FACTors (REREFACT) for each replication of a simulation study with exploratory factor analysis. 
RE-PACRR  Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and n-grams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results. 
Repeated Measures  Repeated measures design uses the same subjects with every branch of research, including the control. For instance, repeated measurements are collected in a longitudinal study in which change over time is assessed. Other (non-repeated measures) studies compare the same measure under two or more different conditions. For instance, to test the effects of caffeine on cognitive function, a subject’s math ability might be tested once after they consume caffeine and another time when they consume a placebo. Book: Analysis of Repeated Measures Data 
Replacement AutoEncoder  An increasing number of sensors on mobile, Internet of things (IoT), and wearable devices generate time-series measurements of physical activities. Though access to the sensory data is critical to the success of many beneficial applications such as health monitoring or activity recognition, a wide range of potentially sensitive information about the individuals can also be discovered through these datasets and this cannot easily be protected using traditional privacy approaches. In this paper, we propose an integrated sensing framework for managing access to personal time-series data in order to provide utility while protecting individuals’ privacy. We introduce Replacement AutoEncoder, a novel feature-learning algorithm which learns how to transform discriminative features of multidimensional time-series that correspond to sensitive inferences, into some features that have been more observed in non-sensitive inferences, to protect users’ privacy. The main advantage of Replacement AutoEncoder is its ability to keep important features of desired inferences unchanged to preserve the utility of the data. We evaluate the efficacy of the algorithm with an activity recognition task in a multi-sensing environment using extensive experiments on three benchmark datasets. We show that it can retain the recognition accuracy of state-of-the-art techniques while simultaneously preserving the privacy of sensitive information. We use a Generative Adversarial Network to attempt to detect the replacement of sensitive data with fake non-sensitive data. We show that this approach does not detect the replacement unless the network can train using the users’ original unmodified data. 
Reporting  
Representation Learning  Feature learning or representation learning is a set of techniques that learn a transformation of raw data input to a representation that can be effectively exploited in machine learning tasks. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensor measurement is usually complex, redundant, and highly variable. Thus, it is necessary to discover useful features or representations from raw data. Traditional handcrafted features often require expensive human labor and often rely on expert knowledge. Also, they normally do not generalize well. This motivates the design of efficient feature learning techniques. Feature learning can be divided into two categories, supervised and unsupervised feature learning: • In supervised feature learning, features are learned with labeled input data. Examples include neural networks, multilayer perceptron, and (supervised) dictionary learning. • In unsupervised feature learning, features are learned with unlabeled input data. Examples include dictionary learning, independent component analysis, autoencoders, matrix factorization, and various forms of clustering. 
Representational Distance Learning (RDL) 
We propose representational distance learning (RDL), a technique that allows transferring knowledge from a model of arbitrary type to a deep neural network (DNN). This method seeks to maximize the similarity between the representational dissimilarity, or distance, matrices (RDMs) of a model with desired knowledge, the teacher, and a DNN currently being trained, the student. This knowledge transfer is performed using auxiliary error functions. This allows DNNs to simultaneously learn from a teacher model and learn to perform some task within the framework of backpropagation. We test the use of RDL for knowledge distillation, also known as model compression, from a large teacher DNN to a small student DNN using the MNIST and CIFAR-10 datasets. Also, we test the use of RDL for knowledge transfer between tasks using the CIFAR-10 and CIFAR-100 datasets. For each test, RDL significantly improves performance when compared to traditional backpropagation alone and performs similarly to, or better than, recently proposed methods for model compression and knowledge transfer. 
Repulsion Loss  Detecting individual pedestrians in a crowd remains a challenging problem since the pedestrians often gather together and occlude each other in real-world scenarios. In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: the attraction by target, and the repulsion by other surrounding objects. The repulsion term prevents the proposal from shifting to surrounding objects, thus leading to more crowd-robust localization. Our detector trained by repulsion loss outperforms all the state-of-the-art methods with a significant improvement in occlusion cases. 
Reputation System  A reputation system computes and publishes reputation scores for a set of objects (e.g. service providers, services, goods or entities) within a community or domain, based on a collection of opinions that other entities hold about the objects. The opinions are typically passed as ratings to a central place where all perceptions, opinions and ratings are accumulated. A reputation center uses a specific reputation algorithm to dynamically compute the reputation scores based on the received ratings. Reputation is a sign of trustworthiness manifested as testimony by other people. New expectations and realities about the transparency, availability, and privacy of people and institutions are emerging. Reputation management, the selective exposure of personal information and activities, is an important element of how people function in networks as they establish credentials, build trust with others, and gather information to deal with problems or make decisions. 
Resampling  In statistics, resampling is any of a variety of methods for doing one of the following: 1. Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping). 2. Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests). 3. Validating models by using random subsets (bootstrapping, cross validation). Common resampling techniques include bootstrapping, jackknifing and permutation tests. 
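As an illustration of the first use listed above, the following sketch estimates the standard error of a sample statistic by bootstrapping, i.e. by drawing with replacement from the observed data. The function name `bootstrap_se`, the number of replicates and the example data are assumptions made for this example only.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap estimate of the standard error of `statistic` on `data`."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # The spread of the replicates approximates the sampling variability
    return statistics.stdev(replicates)


# Hypothetical data: standard error of the mean and of the median
data = list(range(1, 101))
se_mean = bootstrap_se(data, statistics.mean)
se_median = bootstrap_se(data, statistics.median)
```

The same function works for any statistic that maps a list of observations to a number, which is the main appeal of the bootstrap when no closed-form standard error is available.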
ResBinNet  Recent efforts on training lightweight binary neural networks offer promising execution/memory efficiency. This paper introduces ResBinNet, which is a composition of two interlinked methodologies aiming to address the slow convergence speed and limited accuracy of binary convolutional neural networks. The first method, called residual binarization, learns a multi-level binary representation for the features within a certain neural network layer. The second method, called temperature adjustment, gradually binarizes the weights of a particular layer. The two methods jointly learn a set of soft-binarized parameters that improve the convergence rate and accuracy of binary neural networks. We corroborate the applicability and scalability of ResBinNet by implementing a prototype hardware accelerator. The accelerator is reconfigurable in terms of the numerical precision of the binarized features, offering a trade-off between runtime and inference accuracy. 
Reservoir Computing  Reservoir computing is a framework for computation like a neural network. Typically an input signal is fed into a fixed (random) dynamical system called a reservoir, and the dynamics of the reservoir map the input to a higher dimension. Then a simple readout mechanism is trained to read the state of the reservoir and map it to the desired output. The main benefit is that the training is performed only at the readout stage and the reservoir is fixed. Liquid-state machines and echo state networks are two major types of reservoir computing. 
Reservoir Sampling  Reservoir sampling is randomly pulling out a known number of examples from an unknown (or very large) pool of streaming items. 
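One common way to do this is the classic single-pass scheme often called Algorithm R, sketched below. The entry above does not name a specific algorithm, so this choice, along with the function name `reservoir_sample`, is an assumption made for illustration.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1)
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 10 items from a stream without storing it in memory
sample = reservoir_sample(range(1_000_000), 10, seed=42)
```

Only the k-item reservoir is kept in memory, and each item of the stream ends up in the final sample with the same probability. If the stream holds fewer than k items, all of them are returned.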
Residual Analysis  The analysis of residuals plays an important role in validating the regression model. If the error term in the regression model satisfies the four assumptions noted earlier, then the model is considered valid. Since the statistical tests for significance are also based on these assumptions, the conclusions resulting from these significance tests are called into question if the assumptions regarding epsilon are not satisfied. 
Residual Gated Graph ConvNet  Graph-structured data such as functional brain networks, social networks, gene regulatory networks, and communications networks have brought the interest in generalizing neural networks to graph domains. In this paper, we are interested in designing efficient neural network architectures for graphs with variable length. Several existing works such as Scarselli et al. (2009); Li et al. (2016) have focused on recurrent neural networks (RNNs) to solve this task. A recent different approach was proposed in Sukhbaatar et al. (2016), where a vanilla graph convolutional neural network (ConvNets) was introduced. We believe the latter approach to be a better paradigm to solve graph learning problems because ConvNets are more pruned to deep networks than RNNs. For this reason, we propose the most generic class of residual multi-layer graph ConvNets that make use of an edge gating mechanism, as proposed in Marcheggiani & Titov (2017). Gated edges appear to be a natural property in the context of graph learning tasks, as the system has the ability to learn which edges are important or not for the task to solve. We apply several graph neural models to two basic network science tasks: subgraph matching and semi-supervised clustering for graphs with variable length. Numerical results show the performances of the new model. 
Residual RNN (R2N2) 
Multivariate time-series modeling and forecasting is an important problem with numerous applications. Traditional approaches such as VAR (vector autoregressive) models and more recent approaches such as RNNs (recurrent neural networks) are indispensable tools in modeling time-series data. In many multivariate time series modeling problems, there is usually a significant linear dependency component, for which VARs are suitable, and a nonlinear component, for which RNNs are suitable. Modeling such time series with only VAR or only RNNs can lead to poor predictive performance or complex models with large training times. In this work, we propose a hybrid model called R2N2 (Residual RNN), which first models the time series with a simple linear model (like VAR) and then models its residual errors using RNNs. R2N2s can be trained using existing algorithms for VARs and RNNs. Through an extensive empirical evaluation on two real world datasets (aviation and climate domains), we show that R2N2 is competitive with, and usually better than, VAR or RNN used alone. We also show that R2N2 is faster to train than an RNN, while requiring fewer hidden units. 
Residual Sum of Squares (RSS, SSR, SSE) 
In statistics, the residual sum of squares (RSS) is the sum of squares of residuals. It is also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE). It is a measure of the discrepancy between the data and an estimation model. A small RSS indicates a tight fit of the model to the data. In general, total sum of squares = explained sum of squares + residual sum of squares. For a proof of this in the multivariate ordinary least squares (OLS) case, see partitioning in the general OLS model. 
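The decomposition above can be checked numerically. The following is a minimal sketch, assuming a one-predictor least-squares fit with an intercept; the helper names and the toy data are illustrative and not taken from any particular library.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sums_of_squares(x, y):
    """Return (RSS, ESS, TSS) for the fitted line."""
    intercept, slope = fit_simple_ols(x, y)
    fitted = [intercept + slope * xi for xi in x]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual
    ess = sum((fi - mean_y) ** 2 for fi in fitted)          # explained
    tss = sum((yi - mean_y) ** 2 for yi in y)               # total
    return rss, ess, tss


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
rss, ess, tss = sums_of_squares(x, y)
print(rss, ess, tss)
```

The identity TSS = ESS + RSS holds here because the model includes an intercept; without one it generally fails.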
Residual Transfer Network (RTN) 
The recent success of deep neural networks relies on massive amounts of labeled data. For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. In this paper, we propose a new approach to domain adaptation in deep networks that can jointly learn adaptive classifiers and transferable features from labeled data in the source domain and unlabeled data in the target domain. We relax a shared-classifier assumption made by previous methods and assume that the source classifier and target classifier differ by a residual function. We enable classifier adaptation by plugging several layers into the deep network to explicitly learn the residual function with reference to the target classifier. We fuse features of multiple layers with a tensor product and embed them into reproducing kernel Hilbert spaces to match distributions for feature adaptation. The adaptation can be achieved in most feed-forward models by extending them with new residual layers and loss functions, which can be trained efficiently via back-propagation. Empirical evidence shows that the new approach outperforms state-of-the-art methods on standard domain adaptation benchmarks. 
Residual-based Predictiveness Curve (RBP Curve) 
The RBP curve is a visual tool for assessing the performance of prediction models. RBPcurve 
Resilience  We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low-rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings, including the previously unstudied problem of robust mean estimation in $\ell_p$-norms. 
Resilient Distributed Dataset (RDD,RDDS) 
Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. Formally, an RDD is a read-only, partitioned collection of records. RDDs can only be created through deterministic operations on either (1) data in stable storage or (2) other RDDs. We call these operations transformations to differentiate them from other operations on RDDs. Examples of transformations include map, filter, and join. RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage. This is a powerful property: in essence, a program cannot reference an RDD that it cannot reconstruct after a failure. Finally, users can control two other aspects of RDDs: persistence and partitioning. Users can indicate which RDDs they will reuse and choose a storage strategy for them (e.g., in-memory storage). They can also ask that an RDD’s elements be partitioned across machines based on a key in each record. This is useful for placement optimizations, such as ensuring that two datasets that will be joined together are hash-partitioned in the same way. 
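The lineage idea can be illustrated with a toy, single-machine sketch. The class `ToyRDD` below is hypothetical and is not the Spark API: it has no partitioning, persistence or distribution. It only shows how a read-only dataset can record the transformations that produced it and recompute its records from the source on demand.

```python
class ToyRDD:
    """Hypothetical illustration of lineage; not a real RDD implementation."""

    def __init__(self, source, lineage=()):
        self._source = source    # stands in for data in stable storage
        self._lineage = lineage  # recorded (kind, function) transformations

    def map(self, fn):
        # A transformation returns a new dataset; nothing is computed yet.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Rebuild the records from the source by replaying the lineage,
        # which is also what recovery after a failure would do.
        records = list(self._source)
        for kind, fn in self._lineage:
            if kind == "map":
                records = [fn(r) for r in records]
            else:
                records = [r for r in records if fn(r)]
        return records


base = ToyRDD(range(1, 6))
even_squares = base.map(lambda v: v * v).filter(lambda v: v % 2 == 0)
print(even_squares.collect())
```

Because each transformation returns a new object and never mutates its parent, `base` can still be collected unchanged after `even_squares` has been derived from it.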
Resilient Linear Classification  Data-driven techniques are used in cyber-physical systems (CPS) for controlling autonomous vehicles, handling demand responses for energy management, and modeling human physiology for medical devices. These data-driven techniques extract models from training data, where their performance is often analyzed with respect to random errors in the training data. However, the effect of malicious alterations of the training data by attackers on the learning algorithms underpinning data-driven CPS has yet to be considered. In this paper, we analyze the resilience of classification algorithms to training data attacks. Specifically, a generic metric is proposed that is tailored to measure resilience of classification algorithms with respect to worst-case tampering of the training data. Using the metric, we show that traditional linear classification algorithms are resilient under restricted conditions. To overcome these limitations, we propose a linear classification algorithm with a majority constraint and prove that it is strictly more resilient than the traditional algorithms. Evaluations on both synthetic data and a real-world retrospective arrhythmia medical case study show that the traditional algorithms are vulnerable to tampered training data, whereas the proposed algorithm is more resilient (as measured by worst-case tampering). 
Resource Description Framework (RDF) 
RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications. This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations. 
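As a rough illustration of the triple model, the sketch below stores (subject, predicate, object) triples as plain Python tuples. It does not use an RDF library, and the `http://example.org/` namespace and all resource names are hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Each triple is a directed, labelled edge: subject --predicate--> object.
triples = {
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "bob", EX + "knows", EX + "carol"),
    (EX + "alice", EX + "worksFor", EX + "acme"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in graph
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


def merge(graph_a, graph_b):
    """Merging two graphs is a set union of their triples."""
    return graph_a | graph_b


more = {(EX + "carol", EX + "worksFor", EX + "acme")}
combined = merge(triples, more)
print(match(combined, predicate=EX + "worksFor"))
```

Since every statement has the same three-part shape, data from sources with different schemas can be combined by a simple union, which is the merging property the entry describes.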
Respondent Driven Sampling (RDS) 
Respondent-driven sampling (RDS) combines “snowball sampling” (getting individuals to refer those they know, these individuals in turn refer those they know and so on) with a mathematical model that weights the sample to compensate for the fact that the sample was collected in a non-random way. RDS represents an advance in sampling methodology because it resolves what had previously been an intractable dilemma, a dilemma that is especially severe when sampling hard-to-reach groups, that is, groups that are small relative to the general population, and for which no exhaustive list of population members is available. This includes groups relevant to public health, such as drug injectors, prostitutes, and gay men, groups relevant to public policy such as street youth and the homeless, and groups relevant to arts and culture such as jazz musicians and other performance and expressive artists. The dilemma is that if a study focuses only on the most accessible part of the target population, standard probability sampling methods can be used but coverage of the target population is limited. For example, drug injectors can be sampled from needle exchanges and from the streets on which drugs are sold, but this approach misses many women, youth, and those who only recently started injecting. Therefore, a statistically representative sample is drawn of an unrepresentative part of the target population, so conclusions cannot be validly made about the entirety of the target population… RDS 
Response Surface Method (RSM) 
In statistics, response surface methodology (RSM) explores the relationships between several explanatory variables and one or more response variables. The method was introduced by G. E. P. Box and K. B. Wilson in 1951. The main idea of RSM is to use a sequence of designed experiments to obtain an optimal response. Box and Wilson suggest using a second-degree polynomial model to do this. They acknowledge that this model is only an approximation, but use it because such a model is easy to estimate and apply, even when little is known about the process. https://…/NBradley_thesis.pdf http://…/9783662462133 
Restricted Boltzmann Machine (RBM) 
A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBMs were initially invented under the name Harmonium by Paul Smolensky in 1986, but only rose to prominence after Geoffrey Hinton and collaborators invented fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction, classification, collaborative filtering, feature learning and topic modelling. They can be trained in either supervised or unsupervised ways, depending on the task. 
Restricted Maximum Likelihood (REML) 
In statistics, the restricted (or residual, or reduced) maximum likelihood (REML) approach is a particular form of maximum likelihood estimation which does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that nuisance parameters have no effect. In the case of variance component estimation, the original data set is replaced by a set of contrasts calculated from the data, and the likelihood function is calculated from the probability distribution of these contrasts, according to the model for the complete data set. In particular, REML is used as a method for fitting linear mixed models. In contrast to the earlier maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters. The idea underlying REML estimation was put forward by M. S. Bartlett in 1937. The first description of the approach applied to estimating components of variance in unbalanced data was by Desmond Patterson and Robin Thompson of the University of Edinburgh, although they did not use the term REML. A review of the early literature was given by Harville. REML estimation is available in a number of general-purpose statistical software packages, including Genstat (the REML directive), SAS (the MIXED procedure), SPSS (the MIXED command), Stata (the mixed command), and R (the lme4 and older nlme packages), as well as in more specialist packages such as MLwiN, HLM, ASReml, Statistical Parametric Mapping and CropStat. 
Restricted Mean Survival Time (RMST) 
RMST = area under the survival curve up to t* • Can think of it as the ‘t*-year life expectancy’ • A patient might be told that ‘your life expectancy with Z disease on X treatment over the next 18 months is 9 months’ • Or, ‘treatment A increases your life expectancy during the next 18 months by 2 months, compared with treatment B’ http://…tomerchurnrestrictedmeansurvivaltime survRM2 
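The "area under the survival curve up to t*" can be sketched as follows. This is a minimal illustration, assuming the survival curve is a Kaplan-Meier step function; the function names and the toy follow-up times are hypothetical and are not the survRM2 interface.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def rmst(curve, t_star):
    """Area under the step function S(t) between 0 and t_star."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= t_star:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (t_star - prev_t)
    return area


# Illustrative follow-up times in months; all events observed.
curve = kaplan_meier([6, 12, 18, 24], [1, 1, 1, 1])
print(rmst(curve, 18))
```

With no censoring and t* at or beyond the last event time, the RMST reduces to the ordinary sample mean of the event times, which gives a quick sanity check.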
Restricted Recurrent Neural Tensor Networks (RNTN) 
Increasing the capacity of recurrent neural networks (RNN) usually involves augmenting the size of the hidden layer, resulting in a significant increase of computational cost. An alternative is the recurrent neural tensor network (RNTN), which increases capacity by employing distinct hidden layer weights for each vocabulary word. The disadvantage of RNTNs is that memory usage scales linearly with vocabulary size, which can reach millions for word-level language models. In this paper, we introduce restricted recurrent neural tensor networks (r-RNTN) which reserve distinct hidden layer weights for frequent vocabulary words while sharing a single set of weights for infrequent words. Perplexity evaluations using the Penn Treebank corpus show that r-RNTNs improve language model performance over standard RNNs using only a small fraction of the parameters of unrestricted RNTNs. 
Retainable Evaluator Execution Framework (REEF) 
REEF (Retainable Evaluator Execution Framework) is our approach to simplify and unify the lower layers of big data systems on modern resource managers. For managers like Apache YARN, Apache Mesos, Google Omega, and Facebook Corona, REEF provides a centralized control plane abstraction that can be used to build a decentralized data plane for supporting big data systems. Special consideration is given to graph computation and machine learning applications, both of which require data retention on allocated resources to execute multiple passes over the data. More broadly, applications that run on YARN will need a variety of data-processing tasks, e.g., data shuffle, group communication, aggregation, checkpointing, and many more. Rather than reimplement these for each application, REEF aims to provide them in library form, so that they can be reused by higher-level applications and tuned for a specific domain problem, e.g., machine learning. In that sense, our long-term vision is that REEF will mature into a Big Data Application Server that will host a variety of tool kits and applications on modern resource managers. 
RethinkDB  RethinkDB is an open source NoSQL database that stores JSON documents. This can be great for open-ended data analytics. The company officially provides drivers for Ruby, Python and NodeJS, and community-supported drivers and ORMs are available in around a dozen languages. The production-ready version 2.0 was released very recently on April 14, 2015 after 5 years of development. RethinkDB is a boon when it comes to writing real-time applications. Traditionally, applications had to poll databases to get the updated data, which made them slow and hard to maintain. RethinkDB’s architecture solves this problem by pushing the updated results of a query when they are available. Apart from solving the real-time data push problem, RethinkDB offers many advantages such as: • Its advanced query language, ReQL, supports table joins and subqueries. The monitoring API also integrates with the query language, which makes scaling distributed databases very easy. • Unlike some previous NoSQL systems, RethinkDB never acknowledges a write until it’s safely written to the disk. • Additionally, the database supports MapReduce functionality out of the box and does not need additional Hadoop-type software to run the analysis. http://…/rethinkdbforyouradvancedanalytics 
Return on Data Assets (RDA) 
The return on data assets is a measure of how efficiently an organization is able to generate profits from its inventory of data. Creating visual representations of the data is one emerging technique to help company owners make sense of the immense volumes of raw data within their organization. With data properly represented, company owners can make better business decisions about revenue lines that can be leveraged, costs that can be eliminated, or divisions that should be shut down – all of this creates value, and ultimately leads to higher returns and a higher sales multiple when selling a company. 
Reverse Cuthill-McKee (RCM) 
Ordering vertices of a graph is key to minimize fill-in and data structure size in sparse direct solvers, maximize locality in iterative solvers, and improve performance in graph algorithms. Except for naturally parallelizable ordering methods such as nested dissection, many important ordering methods have not been efficiently mapped to distributed-memory architectures. In this paper, we present the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix. Our parallelization uses a two-dimensional sparse matrix decomposition. We achieve high performance by decomposing the problem into a small number of primitives and utilizing optimized implementations of these primitives. Our implementation shows strong scaling up to 1024 cores for smaller matrices and up to 4096 cores for larger matrices. 
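For orientation, here is a serial sketch of the classic reverse Cuthill-McKee ordering. It is not the distributed-memory implementation the abstract describes. It assumes an undirected graph given as an adjacency dictionary with integer vertex labels, and it starts each component from a minimum-degree vertex, a common simplification of the pseudo-peripheral start vertex. The `bandwidth` helper is added only to show the effect of the reordering.

```python
from collections import deque


def reverse_cuthill_mckee(adjacency):
    """Return a vertex ordering; adjacency maps vertex -> neighbours."""
    degree = {v: len(set(nbrs)) for v, nbrs in adjacency.items()}
    visited = set()
    order = []
    # Process each connected component, lowest-degree vertices first.
    for start in sorted(adjacency, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting neighbours by increasing degree.
            for w in sorted(set(adjacency[v]), key=lambda u: (degree[u], u)):
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    order.reverse()  # the "reverse" in reverse Cuthill-McKee
    return order


def bandwidth(adjacency, order):
    """Largest index distance between adjacent vertices under an ordering."""
    position = {v: i for i, v in enumerate(order)}
    return max(
        (abs(position[v] - position[w]) for v in adjacency for w in adjacency[v]),
        default=0,
    )


# A path graph whose labels are scrambled: 0 - 3 - 1 - 4 - 2.
graph = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
rcm_order = reverse_cuthill_mckee(graph)
print(rcm_order, bandwidth(graph, rcm_order), bandwidth(graph, [0, 1, 2, 3, 4]))
```

On this path graph the reordering places neighbours next to each other, so the nonzeros of the permuted adjacency matrix sit next to the diagonal.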
revisit  In recent years there has been widespread concern in the scientific community over a reproducibility crisis. One of the major causes that have been identified is statistical: in much scientific research, the statistical analysis (including data preparation) suffers from a lack of transparency and from methodological problems, both major obstructions to reproducibility. The revisit package aims to remedy this problem by generating a ‘software paper trail’ of the statistical operations applied to a dataset. This record can be ‘replayed’ for verification purposes, as well as modified to enable alternative analyses. The software also issues warnings of certain kinds of potential errors in statistical methodology, again related to the reproducibility issue. 
RHIPE  RHIPE is an R package which provides an API to use Hadoop, similar to RHadoop. RHIPE 
RHub  The infrastructure available for developing, building, testing, and validating R packages is of critical importance to the R community. CRAN and R-Forge have traditionally met these needs; however, the maintenance and enhancement of R-Forge has significant costs in both money and time. This proposal outlines r-hub, a service that is complementary to CRAN and R-Forge, that would add capabilities, improve extensibility, and create a platform for community contributions to r-hub itself. 
Rich Component Analysis (RCA) 
In many settings, we have multiple data sets (also called views) that capture different and overlapping aspects of the same phenomenon. We are often interested in finding patterns that are unique to one or to a subset of the views. For example, we might have one set of molecular observations and one set of physiological observations on the same group of individuals, and we want to quantify molecular patterns that are uncorrelated with physiology. Despite being a common problem, this is highly challenging when the correlations come from complex distributions. In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution. We introduce algorithms based on cumulant extraction that provably learn each of the components without having to model the other components. We show how to integrate RCA with stochastic gradient descent into a meta-algorithm for learning general models, and demonstrate substantial improvement in accuracy on several synthetic and real datasets in both supervised and unsupervised tasks. Our method makes it possible to learn latent variable models when we don’t have samples from the true model but only samples after complex perturbations. 
Ridge Regression  Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, and, with multiple independent discoveries, it is also variously known as the Tikhonov-Miller method, the Phillips-Twomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the Levenberg-Marquardt algorithm for nonlinear least-squares problems. bigRR 
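To make the regularization concrete, the sketch below covers the simplest possible case: one predictor, no intercept, and a squared penalty on the single coefficient. The function name and the penalty symbol `lam` are illustrative assumptions. In this case the minimiser of sum((y_i - b*x_i)^2) + lam * b^2 has the closed form b = sum(x_i*y_i) / (sum(x_i^2) + lam).

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge coefficient for one predictor without intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # lam = 0 recovers ordinary least squares; larger lam shrinks toward 0.
    return sxy / (sxx + lam)


# Illustrative data lying exactly on y = 2x.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
for lam in (0.0, 1.0, 14.0):
    print(lam, ridge_slope(x, y, lam))
```

The penalty also keeps the denominator away from zero when the predictor carries almost no signal, which is the sense in which the method stabilises ill-posed problems.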
Ridge Regularized Linear Models (RRLM) 
Ridge regularized linear models (RRLMs), such as ridge regression and the SVM, are a popular group of methods that are used in conjunction with coefficient hypothesis testing to discover explanatory variables with a significant multivariate association to a response. 
Ridgeline Plot  Ridgeline plots provide a convenient way of visualizing changes in distributions over time or space. ggridges 
Riemann-Theta Boltzmann Machine  A general Boltzmann machine with continuous visible and discrete integer-valued hidden states is introduced. Under mild assumptions about the connection matrices, the probability density function of the visible units can be solved for analytically, yielding a novel parametric density function involving a ratio of Riemann-Theta functions. The conditional expectation of a hidden state for given visible states can also be calculated analytically, yielding a derivative of the logarithmic Riemann-Theta function. The conditional expectation can be used as an activation function in a feed-forward neural network, thereby increasing the modelling capacity of the network. Both the Boltzmann machine and the derived feed-forward neural network can be successfully trained via standard gradient-based and non-gradient-based optimization techniques. 
Risk-Averse Imitation Learning (RAIL) 
Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. We evaluate in terms of the expert’s cost function and observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in safety-critical applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail-risk within the GAIL framework. We quantify tail-risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in safety-critical applications. 
Ristretto  Ristretto is a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and outputs of resource-intense layers, which reduces the chip area for multiplication units significantly. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in an adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available. 
River Definition Language (RDL) 
The primary goal with the River development model (and language – RDL) is to significantly improve the development experience of business software applications. This includes writing the application code, testing it and evolving/maintaining it over time. We also target various development scenarios with a range of variables: on-demand/on-premise, one-off projects vs. products, mobile/web, extensions of core, etc. 
RMSProp  RMSProp is an adaptive learning rate method. Divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight. This is the mini-batch version of just using the sign of the gradient. http://…/lecture_slides_lec6.pdf https://…/neuralnets http://…/rmsprop.html#tieleman2012rmsprop 
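The update described above can be sketched for a single weight as follows. This is a minimal illustration under stated assumptions: the running average is taken over squared gradients and its square root is used as the magnitude, and the decay rate 0.9, the learning rate, the small constant `eps` and the toy objective are all illustrative choices.

```python
import math


def rmsprop_minimize(grad, w0, lr=0.01, decay=0.9, eps=1e-8, steps=1000):
    """Minimise a one-dimensional function given its gradient."""
    w = w0
    mean_square = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad(w)
        mean_square = decay * mean_square + (1.0 - decay) * g * g
        # Divide the step by the root of the running average, so the step
        # size depends mostly on the gradient's sign, not its scale.
        w -= lr * g / (math.sqrt(mean_square) + eps)
    return w


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

Because the step is normalised, rescaling the objective by a constant factor leaves the trajectory essentially unchanged, which is the link to sign-based updates mentioned in the entry.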
Robbins-Monro Algorithm  The Robbins-Monro algorithm, introduced in 1951 by Herbert Robbins and Sutton Monro, presented a methodology for solving a root-finding problem, where the function is represented as an expected value. Assume that we have a function $M(x)$, and a constant $\alpha$, such that the equation $M(x) = \alpha$ has a unique root at $x = \theta$. It is assumed that while we cannot directly observe the function $M(x)$, we can instead obtain measurements of the random variable $N(x)$ where $\mathbb{E}[N(x)] = M(x)$. 
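The entry stops before the iteration itself. A minimal sketch of the standard form, x_{n+1} = x_n - a_n (N(x_n) - alpha), is given below. It assumes M is increasing and uses step sizes a_n = a / n; the noisy test function, the seed and the parameter values are illustrative assumptions.

```python
import random


def robbins_monro(noisy_measurement, alpha, x0, n_steps=5000, a=1.0):
    """Estimate the root of M(x) = alpha from noisy measurements N(x)."""
    x = x0
    for n in range(1, n_steps + 1):
        step = a / n  # step sizes sum to infinity, their squares do not
        x -= step * (noisy_measurement(x) - alpha)
    return x


rng = random.Random(0)


def noisy_m(x):
    # Illustrative choice: M(x) = 2x + 1, observed with Gaussian noise.
    return 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)


# M(x) = 5 has the unique root theta = 2.
theta_hat = robbins_monro(noisy_m, alpha=5.0, x0=0.0)
print(theta_hat)
```

The shrinking step sizes average out the measurement noise over time, so the iterate settles near the root even though M is never observed directly.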
Robinsonian Matrix  A Robinson (dis)similarity matrix is a symmetric matrix whose entries (increase) decrease monotonically along rows and columns when moving away from the diagonal, and such matrices arise in the classical seriation problem. 
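A direct check of the dissimilarity case can be sketched as below. It assumes monotonicity is meant in the weak sense (entries may stay equal when moving away from the diagonal); the function name and the example matrices are illustrative.

```python
def is_robinson_dissimilarity(matrix):
    """True if the matrix is symmetric and its entries never decrease
    when moving away from the diagonal along rows and columns."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            if matrix[i][j] != matrix[j][i]:
                return False
    # By symmetry it is enough to inspect the upper triangle (i < j).
    for i in range(n):
        for j in range(i + 1, n):
            # Moving right along row i, away from the diagonal.
            if j + 1 < n and matrix[i][j] > matrix[i][j + 1]:
                return False
            # Moving up along column j, away from the diagonal.
            if i >= 1 and matrix[i][j] > matrix[i - 1][j]:
                return False
    return True


# Distances between the points 0, 1, 3, 6 on a line, listed in sorted order.
ordered = [[0, 1, 3, 6], [1, 0, 2, 5], [3, 2, 0, 3], [6, 5, 3, 0]]
# The same points listed in the order 0, 3, 1, 6.
shuffled = [[0, 3, 1, 6], [3, 0, 2, 3], [1, 2, 0, 5], [6, 3, 5, 0]]
print(is_robinson_dissimilarity(ordered), is_robinson_dissimilarity(shuffled))
```

The seriation problem mentioned in the entry asks for a simultaneous row and column permutation that turns a matrix like `shuffled` into one like `ordered`.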
Robotic Process Automation (RPA) 
Robotic process automation (or RPA) is an emerging form of clerical process automation technology based on the notion of software robots or artificial intelligence (AI) workers. A software ‘robot’ is a software application that replicates the actions of a human being interacting with the user interface of a computer system. For example, the execution of data entry into an ERP system – or indeed a full end-to-end business process – would be a typical activity for a software robot. The software robot operates on the user interface (UI) in the same way that a human would; this is a significant departure from traditional forms of IT integration which have historically been based on Application Programming Interfaces (or APIs) – that is to say, machine-to-machine forms of communication based on data layers which operate at an architectural layer beneath the UI. 
Robust Adversarial Reinforcement Learning (RARL) 
Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL). However, most current RL-based approaches fail to generalize since: (a) the gap between simulation and real world is so large that policy-learning approaches fail to transfer; (b) even if policy learning is done in the real world, data scarcity leads to failed generalization from training to test scenarios (e.g., due to different friction or object masses). Inspired by H-infinity control methods, we note that both modeling errors and differences in training and test scenarios can be viewed as extra forces/disturbances in the system. This paper proposes the idea of robust adversarial reinforcement learning (RARL), where we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy. We formulate the policy learning as a zero-sum, minimax objective function. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and (c) outperforms the baseline even in the absence of the adversary. 
Robust Anomaly Detection (RAD) 
Outlier detection can be a pain point for all data-driven companies, especially as data volumes grow. At Netflix we have multiple datasets growing by 10B+ records/day and so there’s a need for automated anomaly detection tools ensuring data quality and identifying suspicious anomalies. Today we are open-sourcing our outlier detection function, called Robust Anomaly Detection (RAD), as part of our Surus project. As we built RAD we identified four generic challenges that are ubiquitous in outlier detection on “big data.” • High cardinality dimensions: High cardinality data sets – especially those with large combinatorial permutations of column groupings – make human inspection impractical. • Minimizing False Positives: A successful anomaly detection tool must minimize false positives. In our experience there are many alerting platforms that “sound an alarm” that ultimately goes unresolved. The goal is to create alerting mechanisms that can be tuned to appropriately balance noise and information. • Seasonality: Hourly/Weekly/Biweekly/Monthly seasonal effects are common and can be misidentified as outliers deserving attention if not handled properly. Seasonal variability needs to be ignored. • Data is not always normally distributed: This has been a particular challenge since Netflix has been growing over the last 24 months. Generally though, an outlier tool must be robust so that it works on data that is not normally distributed. In addition to addressing the challenges above, we wanted a solution with a generic interface (supporting application development). We met these objectives with a novel algorithm encased in a wrapper for easy deployment in our ETL environment. 
Robust Compound Regression (RCR) 
The errors-in-variables (EIV) regression model, being more realistic by accounting for measurement errors in both the dependent and the independent variables, is widely adopted in applied sciences. The traditional EIV model estimators, however, can be highly biased by outliers and other departures from the underlying assumptions. In this paper, we develop a novel nonparametric regression approach – the robust compound regression (RCR) analysis method for the robust estimation of EIV models. We first introduce a robust and efficient estimator called least sine squares (LSS). Taking full advantage of both the new LSS method and the compound regression analysis method developed in our own group, we subsequently propose the RCR approach as a generalization of those two, which provides a robust counterpart of the entire class of the maximum likelihood estimation (MLE) solutions of the EIV model, in a 1-1 mapping. Technically, our approach gives users the flexibility to select from a class of RCR estimates the optimal one with a predefined regression efficiency criterion satisfied. Simulation studies and real-life examples are provided to illustrate the effectiveness of the RCR approach. 
Robust Decision Making (RDM) 
Robust decision-making is an iterative decision analytic framework that helps identify potential robust strategies, characterize the vulnerabilities of such strategies, and evaluate the trade-offs among them. RDM focuses on informing decisions under conditions of what is called ‘deep uncertainty,’ that is, conditions where the parties to a decision do not know or do not agree on the system model(s) relating actions to consequences or the prior probability distributions for the key input parameters to those model(s). 
Robust Elastic Net (REN) 
We construct rich vector spaces of continuous functions with prescribed curved or linear pathwise quadratic variations. We also construct a class of functions whose quadratic variation may depend in a local and nonlinear way on the function value. These functions can then be used as integrators in Föllmer’s pathwise Itō calculus. Our construction of the latter class of functions relies on an extension of the Doss–Sussman method to a class of nonlinear Itō differential equations for the Föllmer integral. As an application, we provide a deterministic variant of the support theorem for diffusions. We also establish that many of the constructed functions are nowhere differentiable. 
Robust Intelligence (RI) 

Robust Matrix Elastic net Based Canonical Correlation Analysis (RMENCCA) 
This paper presents a robust matrix elastic net based canonical correlation analysis (RMEN-CCA) for multiple view unsupervised learning problems, which emphasizes the combination of CCA and the robust matrix elastic net (RMEN) used as coupled feature selection. The RMEN-CCA leverages the strength of the RMEN to distill naturally meaningful features without any prior assumption and to measure effectively correlations between different ‘views’. We can further employ directly the kernel trick to extend the RMEN-CCA to the kernel scenario with theoretical guarantees, which takes advantage of the kernel trick for highly complicated nonlinear feature learning. Rather than simply incorporating existing regularization minimization terms into CCA, this paper provides a new learning paradigm for CCA and is the first to derive a coupled feature selection based CCA algorithm that guarantees convergence. More significantly, for CCA, the newly derived RMEN-CCA bridges the gap between measurement of relevance and coupled feature selection. Moreover, it is nontrivial to tackle directly the RMEN-CCA by previous optimization approaches derived from its sophisticated model architecture. Therefore, this paper further offers a bridge between a new optimization problem and an existing efficient iterative approach. As a consequence, the RMEN-CCA can overcome the limitation of CCA and address large-scale and streaming data problems. Experimental results on four popular competing datasets illustrate that the RMEN-CCA performs more effectively and efficiently than do state-of-the-art approaches. 
Robust Mixture Discriminant Analysis (RMDA) 
robustDA 
Robust Multiple Signal Classification (MUSIC) 
In this paper, we introduce a new framework for robust multiple signal classification (MUSIC). The proposed framework, called robust measure-transformed (MT) MUSIC, is based on applying a transform to the probability distribution of the received signals, i.e., transformation of the probability measure defined on the observation space. In robust MT-MUSIC, the sample covariance is replaced by the empirical MT-covariance. By judicious choice of the transform we show that: 1) the resulting empirical MT-covariance is B-robust, with bounded influence function that takes negligible values for large norm outliers, and 2) under the assumption of spherically contoured noise distribution, the noise subspace can be determined from the eigendecomposition of the MT-covariance. Furthermore, we derive a new robust measure-transformed minimum description length (MDL) criterion for estimating the number of signals, and extend the MT-MUSIC framework to the case of coherent signals. The proposed approach is illustrated in simulation examples that show its advantages as compared to other robust MUSIC and MDL generalizations. 
Robust Optimization  Robust optimization is a field of optimization theory that deals with optimization problems in which a certain measure of robustness is sought against uncertainty that can be represented as deterministic variability in the value of the parameters of the problem itself and/or its solution. 
Robust Principal Component Analysis (RPCA) 
Robust Principal Component Analysis (RPCA) is a modification of the widely used statistical procedure principal component analysis (PCA) that works well with grossly corrupted observations. A number of different approaches exist for Robust PCA, including an idealized version that aims to recover a low-rank matrix L0 from highly corrupted measurements M = L0 + S0. This decomposition into low-rank and sparse matrices can be achieved by techniques such as the Principal Component Pursuit method (PCP), Stable PCP, Quantized PCP, Block-based PCP, and Local PCP. Optimization methods such as the Augmented Lagrange Multiplier Method (ALM), Alternating Direction Method (ADM), Fast Alternating Minimization (FAM) or Iteratively Reweighted Least Squares are then used to solve the resulting problem. Bouwmans and Zahzah published a comprehensive survey in 2014. 
Robust Principal Component Analysis (ROBPCA) 
We introduce a new method for robust principal component analysis (PCA). Classical PCA is based on the empirical covariance matrix of the data and hence is highly sensitive to outlying observations. Two robust approaches have been developed to date. The first approach is based on the eigenvectors of a robust scatter matrix such as the minimum covariance determinant or an S-estimator and is limited to relatively low-dimensional data. The second approach is based on projection pursuit and can handle high-dimensional data. Here we propose the ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation. ROBPCA yields more accurate estimates at noncontaminated datasets and more robust estimates at contaminated data. ROBPCA can be computed rapidly, and is able to detect exact-fit situations. As a by-product, ROBPCA produces a diagnostic plot that displays and classifies the outliers. We apply the algorithm to several datasets from chemometrics and engineering. 
Robust Regression  In robust statistics, robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and nonparametric methods. Regression analysis seeks to find the relationship between one or more independent variables and a dependent variable. Certain widely used methods of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results if those assumptions are not true; thus ordinary least squares is said to be not robust to violations of its assumptions. Robust regression methods are designed to be not overly affected by violations of assumptions by the underlying data-generating process. In particular, least squares estimates for regression models are highly sensitive to (not robust against) outliers. While there is no precise definition of an outlier, outliers are observations which do not follow the pattern of the other observations. This is not normally a problem if the outlier is simply an extreme observation drawn from the tail of a normal distribution, but if the outlier results from non-normal measurement error or some other violation of standard ordinary least squares assumptions, then it compromises the validity of the regression results if a non-robust regression technique is used. 
Robust Representation Learning  Book: Robust Representation for Data Analytics 
Robust Sparse Principal Component Analysis (ROSPCA) 
A new sparse PCA algorithm is presented, which is robust against outliers. The approach is based on the ROBPCA algorithm that generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately finding the sparse structure of datasets, even when challenging outliers are present. In comparison with a projection pursuit-based algorithm, ROSPCA demonstrates superior robustness properties and comparable sparsity estimation capability, as well as significantly faster computation time. rospca 
Robust Statistics  Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from parametric distributions. For example, robust methods work well for mixtures of two normal distributions with different standard deviations (say, one and three); under this model, non-robust methods like a t-test work badly. 
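The sensitivity to outliers described above can be illustrated with a minimal sketch. The data values and the helper name `mad` are made up for illustration; the median and the median absolute deviation (MAD) are used here as common examples of robust location and scale estimates, not as the only choices.

```python
import statistics

def mad(values):
    """Median absolute deviation: a robust measure of scale."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)

# Hypothetical measurements, then the same data with one gross outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [100.0]

# How far does each location estimate move when the outlier is added?
mean_shift = abs(statistics.mean(contaminated) - statistics.mean(clean))
median_shift = abs(statistics.median(contaminated) - statistics.median(clean))

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
print(f"stdev with outlier: {statistics.stdev(contaminated):.2f}")
print(f"MAD with outlier:   {mad(contaminated):.2f}")
```

The mean and standard deviation are pulled strongly toward the single outlier, while the median and MAD barely move.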
Robust Trimmed Clustering (RTC) 
tclust 
Robust Variable Power Fractional LMS Algorithm (RVPFLMS) 
In this paper, we propose an adaptive framework for the variable power of the fractional least mean square (FLMS) algorithm. The proposed algorithm, named robust variable power FLMS (RVPFLMS), dynamically adapts the fractional power of the FLMS to achieve a high convergence rate with low steady-state error. For evaluation, the problems of system identification and channel equalization are considered. The experiments clearly show that the proposed approach achieves a better convergence rate and lower steady-state error compared to the FLMS. The MATLAB code for the related simulation is available online at https://goo.gl/dGTGmP. 
Robust Variable Step Size – Fractional Least Mean Square (RVSSFLMS) 
In this paper, we propose an adaptive framework for the variable step size of the fractional least mean square (FLMS) algorithm. The proposed algorithm, named the robust variable step size FLMS (RVSSFLMS), dynamically updates the step size of the FLMS to achieve a high convergence rate with low steady-state error. For evaluation, the problem of system identification is considered. The experiments clearly show that the proposed approach achieves a better convergence rate compared to the FLMS and the adaptive step-size modified FLMS (AMFLMS). 
Robustness  In computer science, robustness is the ability of a computer system to cope with errors during execution. Robustness can also be defined as the ability of an algorithm to continue operating despite abnormalities in input, calculations, etc. Robustness can encompass many areas of computer science, such as robust programming, robust machine learning, and Robust Security Network. Formal techniques, such as fuzz testing, are essential to showing robustness since this type of testing involves invalid or unexpected inputs. Alternatively, fault injection can be used to test robustness. Various commercial products perform robustness testing of software systems as part of failure assessment analysis. 
Rodeo  Rodeo is a data centric IDE for Python. You can think of it as an alternative UI to the notebook for the IPython Kernel. It’s heavily inspired by great projects like Sublime Text and Eclipse. http://…/introducingrodeo.html 
Rolling Entry Matching  rollmatch 
Rolling Forecast  With a rolling forecast, the number of periods in the forecast remains constant. For example, if the forecast covers 12 monthly periods, then as each month is traded it drops out of the forecast and another month is added onto the end, so you are always forecasting 12 monthly periods into the future. 
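A minimal sketch of the rolling window itself, assuming periods are represented as `(year, month)` tuples; the helper names `next_month`, `initial_window` and `roll` are hypothetical and only illustrate the bookkeeping, not any forecasting model.

```python
def next_month(year, month):
    """Return the (year, month) that follows the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)

def initial_window(year, month, horizon=12):
    """Build the first forecast window of `horizon` consecutive months."""
    window = []
    for _ in range(horizon):
        window.append((year, month))
        year, month = next_month(year, month)
    return window

def roll(window):
    """Drop the elapsed first period and append one new period at the end."""
    return window[1:] + [next_month(*window[-1])]

window = initial_window(2024, 1)   # Jan 2024 .. Dec 2024
rolled = roll(window)              # Feb 2024 .. Jan 2025
print(rolled[0], rolled[-1], len(rolled))
```

The window length never changes: each call to `roll` removes one period at the front and adds one at the back.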
Root Cause Analysis (RCA) 
RCA practice tries to solve problems by attempting to identify and correct the root causes of events, as opposed to simply addressing their symptoms. Focusing correction on root causes has the goal of preventing problem recurrence. RCFA (Root Cause Failure Analysis) recognizes that complete prevention of recurrence by one corrective action is not always possible. Conversely, there may be several effective measures (methods) that address the root causes of a problem. Thus, RCA is an iterative process and a tool of continuous improvement. RCA is typically used as a reactive method of identifying the causes of events, revealing problems and solving them. Analysis is done after an event has occurred. Insights from RCA may make it useful as a preemptive method. In that event, RCA can be used to forecast or predict probable events even before they occur. While one follows the other, RCA is a completely separate process from Incident Management. 
Root Cause Analysis Solver Engine (RCASE) 
Root Cause Analysis Solver Engine (informally RCASE) is a proprietary algorithm developed from research originally at the Warwick Manufacturing Group (WMG) at Warwick University. RCASE development commenced in 2003 to provide an automated version of root cause analysis, the method of problem solving that tries to identify the root causes of faults or problems. 
Root Mean Square Error (RMSE) 
Taking the square root of MSE yields the root-mean-square error or root-mean-square deviation (RMSE or RMSD), which has the same units as the quantity being estimated; for an unbiased estimator, the RMSE is the square root of the variance, known as the standard deviation. 
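As a small sketch of that definition, the function below computes the mean squared error over paired observations and takes its square root. The function name `rmse` and the sample values are illustrative only.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

# Errors of 3 and 4 give MSE = (9 + 16) / 2 = 12.5.
print(rmse([0.0, 0.0], [3.0, 4.0]))
```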
Root Mean Squared Logarithmic Error (RMSLE) 
The evaluation metric that Kaggle uses to rank competing algorithms is the Root Mean Squared Logarithmic Error (RMSLE). 
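The entry above does not spell out the formula. A minimal sketch, assuming the commonly used definition (the RMSE computed on `log(1 + x)` of the predicted and actual values, which requires non-negative inputs):

```python
import math

def rmsle(actual, predicted):
    """Root mean squared logarithmic error, assuming the usual log(1 + x) form."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(actual))

# Perfect predictions score 0; errors are measured on a relative (log) scale.
print(rmsle([10.0, 100.0], [10.0, 100.0]))
print(rmsle([10.0, 100.0], [20.0, 200.0]))
```

Because the error is taken on a log scale, predicting 20 for 10 and 200 for 100 are penalized similarly, unlike with plain RMSE.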
Rosette  Rosette is an API for multilingual text analysis and information extraction. rosetteApi 
Rosner’s Outlier Test  This test will detect outliers that are either much smaller or much larger than the rest of the data. Rosner’s approach is designed to avoid the problem of masking, where an outlier that is close in value to another outlier can go undetected. Rosner’s test is appropriate only when the data, excluding the suspected outliers, are approximately normally distributed, and when the sample size is greater than or equal to 25. Data should not be excluded from analysis solely on the basis of the results of this or any other statistical test. If any values are flagged as possible outliers, further investigation is recommended to determine whether there is a plausible explanation that justifies removing or replacing them. 
Rotation Equivariant Vector Field Networks  We propose a method to encode rotation equivariance or invariance into convolutional neural networks (CNNs). Each convolutional filter is applied with several orientations and returns a vector field that represents the magnitude and angle of the highest scoring rotation at the given spatial location. To propagate information about the main orientation of the different features to each layer in the network, we propose an enriched orientation pooling, i.e. max and argmax operators over the orientation space, allowing to keep the dimensionality of the feature maps low and to propagate only useful information. We name this approach RotEqNet. We apply RotEqNet to three datasets: first, a rotation invariant classification problem, the MNIST-rot benchmark, in which we improve over the state-of-the-art results. Then, a neuron membrane segmentation benchmark, where we show that RotEqNet can be applied successfully to obtain equivariance to rotation with a simple fully convolutional architecture. Finally, we improve significantly the state-of-the-art on the problem of estimating cars’ absolute orientation in aerial images, a problem where the output is required to be covariant with respect to the object’s orientation. 
Rotation Forest  Rotation forest is an ensemble method where each base classifier (tree) is fit on the principal components of the variables of random partitions of the feature set. It is a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and principal component analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name ‘forest’. Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Using WEKA, we examined the rotation forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with bagging, AdaBoost, and random forest. The results were favorable to rotation forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that rotation forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and random forest, and more diverse than those in bagging, sometimes more accurate as well. http://…/01677518.pdf?arnumber=1677518 http://…/Rotation%20Forest.ppt http://…/9K_ANovel.pdf Rotation Forest rotationForest 
Rotation Invariance Neural Network  Rotation invariance and translation invariance are of great value in image recognition tasks. In this paper, we introduce a new architecture for convolutional neural networks (CNNs), named the cyclic convolutional layer, to achieve rotation invariance in 2D symbol recognition. The network can also recover the position and orientation of the 2D symbol, which enables detection of multiple non-overlapping targets. Last but not least, this architecture can achieve one-shot learning in some cases by exploiting these invariances. 
Rough Set  In computer science, a rough set, first described by Polish computer scientist Zdzislaw I. Pawlak, is a formal approximation of a crisp set (i.e., conventional set) in terms of a pair of sets which give the lower and the upper approximation of the original set. In the standard version of rough set theory (Pawlak 1991), the lower- and upper-approximation sets are crisp sets, but in other variations, the approximating sets may be fuzzy sets. RoughSetKnowledgeReduction 
RQDA  RQDA is an R package for Qualitative Data Analysis, a free (free as freedom) qualitative analysis software application (BSD license). It works on Windows, Linux/FreeBSD and the Mac OSX platforms. RQDA is an easy to use tool to assist in the analysis of textual data. At the moment it only supports plain text formatted data. All the information is stored in a SQLite database via the R package RSQLite. The GUI is based on RGtk2, via the aid of gWidgetsRGtk2. It includes a number of standard Computer-Aided Qualitative Data Analysis features. In addition it seamlessly integrates with R, which means that a) statistical analysis on the coding is possible, and b) functions for data manipulation and analysis can be easily extended by writing R functions. To some extent, RQDA and R make an integrated platform for both quantitative and qualitative data analysis. 
R-Squared Value (RSQ) 
R-Squared Value (RSQ), described in the paper: Jialiang Li (2013) <doi:10.1093/biostatistics/kxs047>. mcca 
RStudio Connect  RStudio Connect is a new publishing platform for the work your teams create in R. Share Shiny applications, R Markdown reports, dashboards, plots, and more in one convenient place. Use push-button publishing from the RStudio IDE, scheduled execution of reports, and flexible security policies to bring the power of data science to your entire enterprise. 
R-Tree  R-trees are tree data structures used for spatial access methods, i.e., for indexing multidimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as ‘Find all museums within 2 km of my current location’, ‘retrieve all road segments within 2 km of my location’ (to display them in a navigation system) or ‘find the nearest gas station’ (although not taking roads into account). The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance. 
Rubner-Tavan Network  
Rug Plot  A rug plot is a compact way of illustrating the marginal distributions of a variable along x and y. Positions of the data points along x and y are denoted by tick marks, reminiscent of the tassels on a rug. Known Issues: Rug marks are overlaid onto the same axis as the original data. Changing the axis dimensions after calling rug will therefore cause the tick marks to become disassociated from the axes. http://…skerneldensityestimationandrugplots 
Rule of Five (RO5) 
There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population. 
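The 93.75% figure can be checked with a short calculation. This sketch assumes a continuous population, so that each sampled value independently falls above or below the median with probability 1/2; the function name is illustrative. The median lies outside the sample range only if all five values are above it or all five are below it.

```python
from fractions import Fraction

def prob_median_within_sample_range(n):
    """Chance that the population median lies between the min and max of n draws.

    Assumes independent draws from a continuous distribution, so each draw is
    above (or below) the median with probability exactly 1/2.
    """
    all_on_one_side = Fraction(1, 2) ** n      # e.g. all n draws above the median
    return 1 - 2 * all_on_one_side             # exclude "all above" and "all below"

p = prob_median_within_sample_range(5)
print(p, float(p))
```

For a sample of five this gives 1 - 2/32 = 15/16.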
RuleFit  The RuleFit algorithm from Friedman and Popescu is an interesting regression and classification approach that uses decision rules in a linear model. RuleFit: When disassembled trees meet Lasso Rule based Learning Ensembles 
Rule-Guided Embedding (RUGE) 
Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interactive nature between embedding learning and logical inference. And they focused only on hard rules, which always hold with no exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are going to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://…/RUGE. 
Rupture Detection  There are some graphs that you cannot forget. One graph that I found puzzling was mentioned on Andrew Gelman’s blog, a few years back, and was related to rupture detection. What I remember from this graph is that if you want to get a rupture, you can easily find one… 
ruptures  ruptures is a Python library for offline change point detection. This package provides methods for the analysis and segmentation of non-stationary signals. Implemented algorithms include exact and approximate detection for various parametric and non-parametric models. ruptures focuses on ease of use by providing a well-documented and consistent interface. In addition, thanks to its modular structure, different algorithms and models can be connected and extended within this package. 
Russell-Rao Distance  Russell-Rao dissimilarity between two Boolean vectors 
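The entry gives no formula. A minimal sketch, assuming the commonly used definition in which the dissimilarity is the fraction of positions where the two vectors are not both true, i.e. (n - c_TT) / n for vectors of length n with c_TT positions true in both. The function name is illustrative.

```python
def russell_rao(u, v):
    """Russell-Rao dissimilarity between two equal-length Boolean vectors.

    Assumed definition: (n - c_TT) / n, where c_TT counts the positions
    at which both vectors are true.
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    both_true = sum(1 for a, b in zip(u, v) if a and b)
    return (len(u) - both_true) / len(u)

# Two of the four positions are true in both vectors.
print(russell_rao([1, 0, 1, 1], [1, 1, 0, 1]))
```

Note that, under this definition, positions where both vectors are false count toward dissimilarity, so a vector containing any false entry has a nonzero dissimilarity with itself.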
r-Value  Given a large collection of measurement units, the r-value, r, of a particular unit is a reported percentile that may be interpreted as the smallest percentile at which the unit should be placed in the top r-fraction of units. rvalues 