Eager Execution  Eager execution is an imperative, definebyrun interface where operations are executed immediately as they are called from Python. This makes it easier to get started with TensorFlow, and can make research and development more intuitive. The benefits of eager execution include: • Fast debugging with immediate runtime errors and integration with Python tools • Support for dynamic models using easytouse Python control flow • Strong support for custom and higherorder gradients • Almost all of the available TensorFlow operations Eager execution is available now as an experimental feature, so we’re looking for feedback from the community to guide our direction. ➘ “Tensorflow” 
Eager Learning  In artificial intelligence, eager learning is a learning method in which the system tries to construct a general, input independent target function during training of the system, as opposed to lazy learning, where generalization beyond the training data is delayed until a query is made to the system. The main advantage gained in employing an eager learning method, such as an artificial neural network, is that the target function will be approximated globally during training, thus requiring much less space than a lazy learning system. Eager learning systems also deal much better with noise in the training data. Eager learning is an example of offline learning, in which posttraining queries to the system have no effect on the system itself, and thus the same query to the system will always produce the same result. The main disadvantage with eager learning is that it is generally unable to provide good local approximations in the target function. 
EARL  In order to answer natural language questions over knowledge graphs, most processing pipelines involve entity and relation linking. Traditionally, entity linking and relation linking has been performed either as dependent sequential tasks or independent parallel tasks. In this paper, we propose a framework, called EARL, which performs entity linking and relation linking as a joint single task. EARL is modeled on an optimised variation of GeneralisedTravelling Salesperson Problem. The system determines the best semantic connection between all keywords of the question by referring to the knowledge graph. This is achieved by exploiting the connection density between entity candidates and relation candidates. We have empirically evaluated the framework on a dataset with 3000 complex questions. Our system surpasses stateoftheart scores for entity linking task by reporting an accuracy of 0.67against 0.40 from the next best entity linker 
Early Stopping  In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner’s performance on data outside of the training set. Past that point, however, improving the learner’s fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to overfit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation. 
Earnings Before Interest, Taxes, Depreciation and Amortization (EBIDTA) 
A company’s earnings before interest, taxes, depreciation, and amortization (EBITDA) is an accounting metric computed by considering a company’s earnings before interest payments, tax, depreciation, and amortization are subtracted for any final accounting of its income and expenses. The EBITDA of a business gives an indication of its current operational profitability, i.e., how much profit it makes with its present assets and its operations on the products it produces and sells. 
Earth Mover’s Distance (EMD) 
In computer science, the earth mover’s distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved. The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions. ➘ “Wasserstein Metric” 
EC3  Classification and clustering algorithms have been proved to be successful individually in different contexts. Both of them have their own advantages and limitations. For instance, although classification algorithms are more powerful than clustering methods in predicting class labels of objects, they do not perform well when there is a lack of sufficient manually labeled reliable data. On the other hand, although clustering algorithms do not produce label information for objects, they provide supplementary constraints (e.g., if two objects are clustered together, it is more likely that the same label is assigned to both of them) that one can leverage for label prediction of a set of unknown objects. Therefore, systematic utilization of both these types of algorithms together can lead to better prediction performance. In this paper, We propose a novel algorithm, called EC3 that merges classification and clustering together in order to support both binary and multiclass classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using an optimization function. We theoretically show the convexity and optimality of the problem and solve it by block coordinate descent method. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental analysis by comparing EC3 and iEC3 with 14 baseline methods (7 wellknown standalone classifiers, 5 ensemble classifiers, and 2 existing methods that merge classification and clustering) on 13 standard benchmark datasets. We show that our methods outperform other baselines for every single dataset, achieving at most 10% higher AUC. Moreover our methods are faster (1.21 times faster than the best baseline), more resilient to noise and class imbalance than the best baseline method. 
Eclat Algorithm  The Eclat algorithm is used to perform itemset mining. Itemset mining let us find frequent patterns in data like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains. The basic idea for the eclat algorithm is use tidset intersections to compute the support of a candidate itemset avoiding the generation of subsets that does not exist in the prefix tree. 
Ecological Regression  Ecological regression is a statistical technique used especially in political science and history to estimate group voting behavior from aggregate data. For example, if counties have a known Democratic vote (in percentage) D, and a known percentage of Catholics, C, then run the linear regression of dependent variable D against independent variable C. This gives D = a + bC. When C = 1 (100% Catholic) this gives the estimated Democratic vote as a+b. When C = 0 (0% Catholic), this gives the estimated nonCatholic vote as a. For example, if the regression gives D = .22 + .45C, then the estimated Catholic vote is 67% Democratic and the nonCatholic vote is 22% Democratic. The technique has been often used in litigation brought under the Voting Rights Act of 1965 to see how blacks and whites voted. 
Econometrics  Econometrics is the application of mathematics, statistical methods, and, more recently, computer science, to economic data and is described as the branch of economics that aims to give empirical content to economic relations. More precisely, it is “the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference.” An introductory economics textbook describes econometrics as allowing economists “to sift through mountains of data to extract simple relationships.” The first known use of the term “econometrics” (in cognate form) was by Polish economist Pawel Ciompa in 1910. Ragnar Frisch is credited with coining the term in the sense in which it is used today. Econometrics is the intersection of economics, mathematics, and statistics. Econometrics adds empirical content to economic theory allowing theories to be tested and used for forecasting and policy evaluation. 
Edgeworth Series  The GramCharlier A series (named in honor of Jørgen Pedersen Gram and Carl Charlier), and the Edgeworth series (named in honor of Francis Ysidro Edgeworth) are series that approximate a probability distribution in terms of its cumulants. The series are the same; but, the arrangement of terms (and thus the accuracy of truncating the series) differ. http://…/EdgeworthSeries.html EW 
EDISON Data Science Framework (EDSF) 
The EDISON Data Science Framework is a collection of documents that define the Data Science profession. Freely available, these documents have been developed to guide educators and trainers, emplyers and managers, and Data Scientists themselves. This collection of documents collectively breakdown the complexity of the skills and competences need to define Data Science as a professional practice. 
EDivisive with Medians (EDM) 
EDivisive with Medians (EDM) – employs energy statistics to detect divergence in mean. Note that EDM can also be used detect change in distribution in a given time series. EDM uses robust statistical metrics, viz., median, and estimates the statistical significance of a breakout through a permutation test. In addition, EDM is nonparametric. This is important since the distribution of production data seldom (if at all) follows the commonly assumed normal distribution or any other widely accepted model. 
Educational Data Mining (EDM) 
Educational Data Mining (EDM) describes a research field concerned with the application of data mining, machine learning and statistics to information generated from educational settings (e.g., universities and intelligent tutoring systems). At a high level, the field seeks to develop and improve methods for exploring this data, which often has multiple levels of meaningful hierarchy, in order to discover new insights about how people learn in the context of such settings. In doing so, EDM has contributed to theories of learning investigated by researchers in educational psychology and the learning sciences. The field is closely tied to that of learning analytics, and the two have been compared and contrasted. 
Edward  Probabilistic modeling is a powerful approach for analyzing empirical information. We describe Edward, a library for probabilistic modeling. Edward’s design reflects an iterative process pioneered by George Box: build a model of a phenomenon, make inferences about the model given data, and criticize the model’s fit to the data. Edward supports a broad class of probabilistic models, efficient algorithms for inference, and many techniques for model criticism. The library builds on top of TensorFlow to support distributed training and hardware such as GPUs. Edward enables the development of complex probabilistic models and their algorithms at a massive scale. 
Eesen Framework (Eesen) 
The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically simplifies the existing pipeline to build stateoftheart ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting contextindependent targets (phonemes or characters). To remove the need for pregenerated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the alignments between speech and label sequences. A distinctive feature of Eesen is a generalized decoding approach based on weighted finitestate transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that compared with the standard hybrid DNN systems, Eesen achieves comparable word error rates (WERs), while at the same time speeding up decoding significantly. 
Effect Size  In statistics, an effect size is a quantitative measure of the strength of a phenomenon. Examples of effect sizes are the correlation between two variables, the regression coefficient, the mean difference, or even the risk with which something happens, such as how many people survive after a heart attack for every one person that does not survive. For each type of effectsize, a larger absolute value always indicates a stronger effect. Effect sizes complement statistical hypothesis testing, and play an important role in statistical power analyses, sample size planning, and in metaanalyses. Especially in metaanalysis, where the purpose is to combine multiple effectsizes, the standard error of effectsize is of critical importance. The S.E. of effectsize is used to weight effectsizes when combining studies, so that large studies are considered more important than small studies in the analysis. The S.E. of effectsize is calculated differently for each type of effectsize, but generally only requires knowing the study’s sample size (N), or the number of observations in each group (n’s). Reporting effect sizes is considered good practice when presenting empirical research findings in many fields. The reporting of effect sizes facilitates the interpretation of the substantive, as opposed to the statistical, significance of a research result. Effect sizes are particularly prominent in social and medical research. Relative and absolute measures of effect size convey different information, and can be used complementarily. 
Effective Applications of the R Language (EARL) 
EARL is a Conference for users and developers of the open source R programming language. The primary focus of the Conference will be the commercial usage of R across a range of industry sectors with the aim of sharing knowledge and applications of the language. The EARL Conference Team is located at Mango Solutions, a data analysis company headquartered in the UK. 
Ehlers’s Autocorrelation Periodogram  The point of the Ehlers Autocorrelation Periodogram is to dynamically set a period between a minimum and a maximum period length. While I leave the exact explanation of the mechanic to Dr. Ehlers’s book, for all practical intents and purposes, in my opinion, the punchline of this method is to attempt to remove a massive source of overfitting from trading system creationnamely specifying a lookback period. 
Eigen  Eigen is a highlevel C++ library of template headers for linear algebra, matrix and vector operations, numerical solvers and related algorithms. Eigen is an open source library licensed under MPL2 starting from version 3.1.1. Earlier versions were licensed under LGPL3+. Eigen is often noted for its elegant API, versatile fixed and dynamic matrix capabilities and a range of dense and sparse solvers. To achieve high performance, Eigen utilizes explicit vectorization for the SSE 2/3/4, ARM NEON, and AltiVec instruction sets. RcppEigen 
Eigenface  Eigenfaces is the name given to a set of eigenvectors when they are used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby (1987) and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the highdimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set. 
EigenRec  Sparsity presents one of the major challenges of Collaborative Filtering. Graphbased methods are known to alleviate its effects, however their use is often computationally prohibitive; LatentFactor methods, on the other hand, present a reasonable and viable alternative. In this paper, we introduce EigenRec; a versatile and efficient LatentFactor framework for TopN Recommendations, that generalizes the wellknown PureSVD algorithm (a) providing intuition about its inner structure, (b) paving the path towards improving its efficacy and, at the same time, (c) reducing its complexity. One of our central goals in this work is to ensure the applicability of our method in realistic bigdata scenarios. To this end, we propose building our model using a computationally efficient Lanczosbased procedure, we discuss its Parallel Implementation in distributed computing environments, and we verify its favourable performance using realworld datasets. Furthermore, from a qualitative point of view, a comprehensive set of experiments on the MovieLens and the Yahoo!R2Music datasets based on widely applied performance metrics, indicate that EigenRec outperforms several stateoftheart algorithms, in terms of Standard and LongTail recommendation accuracy, exhibiting low susceptibility to sparsity, even in its most extreme manifestations the ColdStart problems. 
Eigenvalues, Eigenvectors  An eigenvector of a square matrix is a nonzero vector that, when the matrix is multiplied by , yields a constant multiple of , the multiplier being commonly denoted by d. That is Av = dv. The number d is called the eigenvalue of A corresponding to v. 
Eikosogram  Eikosograms provide a nice visual representation of statistical correlation, because when the two variables are independent, then the value of one, say X, doesn’t affect the probability of the second, Y. This visually translates into a horizontal pattern, which easily contrasts with a staircase shape that occurs when the variables are dependent or correlated. http://…/eikosograms 
Elastic Net Regularization  In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. 
Elasticsearch  Elasticsearch is a search server based on Lucene. It provides a distributed, multitenantcapable fulltext search engine with a RESTful web interface and schemafree JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine. 
Elasticsearch, Logstash and Kibana (ELK Stack) 
ELK stands for Elasticsearch, Logstash and Kibana. Brief definitions: Logstash: It is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs. It is fully free and fully open source. Elasticsearch: Elasticsearch is a search server based on Lucene. It provides a distributed, multitenantcapable fulltext search engine with a RESTful web interface and schemafree JSON documents. Kibana: A nifty tool to visualize logs and timestamped data. 
Eligibility Traces  Eligibility traces are one of the basic mechanisms of reinforcement learning. For example, in the popular TD(lambda) algorithm, the lambda refers to the use of an eligibility trace. Almost any temporaldifference (TD) method, such as Qlearning or Sarsa, can be combined with eligibility traces to obtain a more general method that may learn more efficiently. There are two ways to view eligibility traces. The more theoretical view, which we emphasize here, is that they are a bridge from TD to Monte Carlo methods. When TD methods are augmented with eligibility traces, they produce a family of methods spanning a spectrum that has Monte Carlo methods at one end and onestep TD methods at the other. In between are intermediate methods that are often better than either extreme method. In this sense eligibility traces unify TD and Monte Carlo methods in a valuable and revealing way. The other way to view eligibility traces is more mechanistic. From this perspective, an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action. The trace marks the memory parameters associated with the event as eligible for undergoing learning changes. When a TD error occurs, only the eligible states or actions are assigned credit or blame for the error. Thus, eligibility traces help bridge the gap between events and training information. Like TD methods themselves, eligibility traces are a basic mechanism for temporal credit assignment. 
ELimination Et Choix Traduisant la REalité (ELECTRE) 
ELECTRE is a family of multicriteria decision analysis methods that originated in Europe in the mid1960s. The acronym ELECTRE stands for: ELimination Et Choix Traduisant la REalité (ELimination and Choice Expressing REality). The method was first proposed by Bernard Roy and his colleagues at SEMA consultancy company. A team at SEMA was working on the concrete, multiple criteria, realworld problem of how firms could decide on new activities and had encountered problems using a weighted sum technique. Bernard Roy was called in as a consultant and the group devised the ELECTRE method. As it was first applied in 1965, the ELECTRE method was to choose the best action(s) from a given set of actions, but it was soon applied to three main problems: choosing, ranking and sorting. The method became more widely known when a paper by B. Roy appeared in a French operations research journal. It evolved into ELECTRE I (electre one) and the evolutions have continued with ELECTRE II, ELECTRE III, ELECTRE IV, ELECTRE IS and ELECTRE TRI (electre tree), to mention a few. Bernard Roy is widely recognized as the father of the ELECTRE method, which was one of the earliest approaches in what is sometimes known as the French School of decision making. It is usually classified as an “outranking method” of decision making. There are two main parts to an ELECTRE application: first, the construction of one or several outranking relations, which aims at comparing in a comprehensive way each pair of actions; second, an exploitation procedure that elaborates on the recommendations obtained in the first phase. The nature of the recommendation depends on the problem being addressed: choosing, ranking or sorting. Usually the Electre Methods are used to discard some alternatives to the problem, which are unacceptable. After that we can use another MCDA to select the best one. The Advantage of using the Electre Methods before is that we can apply another MCDA with a restricted set of alternatives saving much time. Criteria in ELECTRE methods have two distinct sets of parameters: the importance coefficients and the veto thresholds. OutrankingTools 
Elite Based Guided Local Search (EBGLS) 
Local search is a basic building block in memetic algorithms. Guided Local Search (GLS) can improve the efficiency of local search. By changing the guide function, GLS guides a local search to escape from locally optimal solutions and find better solutions. The key component of GLS is its penalizing mechanism which determines which feature is selected to penalize when the search is trapped in a locally optimal solution. The original GLS penalizing mechanism only makes use of the cost and the current penalty value of each feature. It is well known that many combinatorial optimization problems have a big valley structure, i.e., the better a solution is, the more the chance it is closer to a globally optimal solution. This paper proposes to use big valley structure assumption to improve the GLS penalizing mechanism. An improved GLS algorithm called Elite Biased GLS (EBGLS) is proposed. EBGLS records and maintains an elite solution as an estimate of the globally optimal solutions, and reduces the chance of penalizing the features in this solution. We have systematically tested the proposed algorithm on the symmetric traveling salesman problem. Experimental results show that EBGLS is significantly better than GLS. ➘ “Guided Local Search” 
Ellipsoid Method for Linear Programming  In this paper, ellipsoid method for linear programming is derived using only minimal knowledge of algebra and matrices. Unfortunately, most authors first describe the algorithm, then later prove its correctness, which requires a good knowledge of linear algebra. 
Elo Rating System  The Elo rating system is a method for calculating the relative skill levels of players in competitorversuscompetitor games such as chess. It is named after its creator Arpad Elo, a Hungarianborn American physics professor. The Elo system was invented as an improved chess rating system and is also used in many other games. It has also been adapted for use as a rating system for multiplayer competition in a number of video games, and has been adapted to team sports including soccer (association football), American college football, basketball, Major League Baseball, competitive programming, and ESports. The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other multiple times are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent’s is expected to win 64% of the time; if the difference is 200 points, then the expected win proportion for the stronger player is 76%. 
EmbeddedGraph  In this paper, we propose a new type of graph, denoted as ’embeddedgraph’, and its theory, which employs a distributed representation to describe the relations on the graph edges. Embeddedgraphs can express linguistic and complicated relations, which cannot be expressed by the existing edgegraphs or weightedgraphs. We introduce the mathematical definition of embeddedgraph, translation, edge distance, and graph similarity. We can transform an embeddedgraph into a weightedgraph and a weightedgraph into an edgegraph by the translation method and by threshold calculation, respectively. The edge distance of an embeddedgraph is a distance based on the components of a target vector, and it is calculated through cosine similarity with the target vector. The graph similarity is obtained considering the relations with linguistic complexity. In addition, we provide some examples and data structures for embeddedgraphs in this paper. 
Emotional Chatting Machine (EMC) 
Emotional intelligence is one of the key factors to the success of dialogue systems or conversational agents. In this paper, we propose Emotional Chatting Machine (ECM) which generates responses that are appropriate not only at the content level (relevant and grammatical) but also at the emotion level (consistent emotional expression). To the best of our knowledge, this is the first work that addresses the emotion factor in largescale conversation generation. ECM addresses the factor in three ways: modeling highlevel abstraction of emotion expression by embedding emotion categories, changing of implicit internal emotion states, and using explicit emotion expressions with an external emotion vocabulary. Experiments show that our model can generate responses appropriate not only in content but also in emotion. 
Emphatic TemporalDifference Learning Algorithm (ETD) 
In this paper we present the first empirical study of the emphatic temporaldifference learning algorithm (ETD), comparing it with conventional temporaldifference learning, in particular, with linear TD(0), on onpolicy and offpolicy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under \emph{off}policy training (Sutton, Mahmood \& White 2016), but it is also a new algorithm for the \emph{on}policy case. In both our onpolicy and offpolicy experiments, we found that each method converged to a characteristic asymptotic level of error, with ETD better than TD(0). TD(0) achieved a still lower error level temporarily before falling back to its higher asymptote, whereas ETD never showed this kind of ‘bounce’. In the offpolicy case (in which TD(0) is not guaranteed to converge), ETD was significantly slower. 
Empirical Bayes Geometric Mean (EBGM) 
Adjusted estimate for the relative reporting ratio. Example: if EBGM=3.9 for acetaminophenhepatic failure, then this drugevent combination occurred in the data 3.9 times more frequently than expected under the assumption of no association between the drug and the event. openEBGM 
Empirical Likelihood (EL) 
Empirical likelihood (EL) is an estimation method in statistics. Empirical likelihood estimates require few assumptions about the error distribution compared to similar methods like maximum likelihood. EL can handle data well as long as it is independent and identically distributed (iid). EL performs well even when the distribution is asymmetric or censored. EL methods are also useful since they can easily incorporate constraints and prior information. Art Owen pioneered work in this area with his 1988 paper. 
Empirical Orthogonal Function Analysis (EOF) 
In statistics, EOF analysis is known as Principal Component Analysis (PCA). As such, EOF analysis is sometimes classified as a multivariate statistical technique. 
Empirical Orthogonal Teleconnections (EOT) 
Calculating functions empirically and orthogonally from a given spacetime dataset. The method is rooted in multiple linear regression and yields solutions that are orthogonal in one direction, either space or time. remote 
Enclosure Diagram  The enclosure diagram is also space filling, using containment rather than adjacency to represent the hierarchy. Introduced by Ben Shneiderman in 1991, a treemap recursively subdivides area into rectangles. As with adjacency diagrams, the size of any node in the tree is quickly revealed. 
Encoder Based Lifelong Learning  This paper introduces a new lifelong learning solution where a single model is trained for a sequence of tasks. The main challenge that vision systems face in this context is catastrophic forgetting: as they tend to adapt to the most recently seen task, they lose performance on the tasks that were learned previously. Our method aims at preserving the knowledge of the previous tasks while learning a new one by using autoencoders. For each task, an undercomplete autoencoder is learned, capturing the features that are crucial for its achievement. When a new task is presented to the system, we prevent the reconstructions of the features with these autoencoders from changing, which has the effect of preserving the information on which the previous tasks are mainly relying. At the same time, the features are given space to adjust to the most recent environment as only their projection into a low dimension submanifold is controlled. The proposed system is evaluated on image classification tasks and shows a reduction of forgetting over the stateoftheart 
Encog  Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models, Genetic Programming and Genetic Algorithms are supported. Most Encog training algoritms are multithreaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms. Encog has been in active development since 2008. Encog: Library of Interchangeable Machine Learning Models for Java and C# 
Endogenous Variable  In a statistical model, a parameter or variable is said to be endogenous when there is a correlation between the parameter or variable and the error term. Endogeneity can arise as a result of measurement error, autoregression with autocorrelated errors, simultaneity and omitted variables. Broadly, a loop of causality between the independent and dependent variables of a model leads to endogeneity. For example, in a simple supply and demand model, when predicting the quantity demanded in equilibrium, the price is endogenous because producers change their price in response to demand and consumers change their demand in response to price. In this case, the price variable is said to have total endogeneity once the demand and supply curves are known. In contrast, a change in consumer tastes or preferences would be an exogenous change on the demand curve. 
Energybased Exploration of Random Features (EERF) 
The randomizedfeature approach has been successfully employed in largescale kernel approximation and supervised learning. The distribution from which the random features are drawn impacts the number of features required to efficiently perform a learning task. Recently, it has been shown that employing datadependent randomization improves the performance in terms of the required number of random features. In this paper, we are concerned with the randomizedfeature approach in supervised learning for good generalizability. We propose the Energybased Exploration of Random Features (EERF) algorithm based on a datadependent score function that explores the set of possible features and exploits the promising regions. We prove that the proposed score function with high probability recovers the spectrum of the best fit within the model class. Our empirical results on several benchmark datasets further verify that our method requires smaller number of random features to achieve a certain generalization error compared to the stateoftheart while introducing negligible preprocessing overhead. EERF can be implemented in a few lines of code and requires no additional tuning parameters. 
EnergyNet  We present ENERGYNET , a new framework for analyzing and building artificial neural network architectures. Our approach adaptively learns the structure of the networks in an unsupervised manner. The methodology is based upon the theoretical guarantees of the energy function of restricted Boltzmann machines (RBM) of infinite number of nodes. We present experimental results to show that the final network adapts to the complexity of a given problem. 
ENet  The ability to perform pixelwise semantic segmentation in realtime is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long runtimes that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation. ENet is up to 18× faster, requires 75× less FLOPs, has 79× less parameters, and provides similar or better accuracy to existing models. We have tested it on CamVid, Cityscapes and SUN datasets and report on comparisons with existing stateoftheart methods, and the tradeoffs between accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster. 
Enhanced Least Absolute Shrinkage Operator (ELASSO) 
elasso 
Ensemble Bayesian Optimization (EBO) 
Bayesian Optimization (BO) has been shown to be a very effective paradigm for tackling hard blackbox and nonconvex optimization problems encountered in Machine Learning. Despite these successes, the computational complexity of the underlying function approximation has restricted the use of BO to problems that can be handled with less than a few thousand function evaluations. Harder problems like those involving functions operating in very high dimensional spaces may require hundreds of thousands or millions of evaluations or more and become computationally intractable to handle using standard Bayesian Optimization methods. In this paper, we propose Ensemble Bayesian Optimization (EBO) to overcome this problem. Unlike conventional BO methods that operate on a single posterior GP model, EBO works with an ensemble of posterior GP models. Further, we represent each GP model using tile coding random features and an additive function structure. Our approach generates speedups by parallelizing the time consuming hyperparameter posterior inference and functional evaluations on hundreds of cores and aggregating the models in every iteration of BO. Our extensive experimental evaluation shows that EBO can speed up the posterior inference between 23 orders of magnitude (400 times in one experiment) compared to the stateoftheart by putting data into Mondrian bins without sacrificing the sample quality. We demonstrate the ability of EBO to handle sampleintensive hard optimization problems by applying it to a rover navigation problem with tens of thousands of observations. 
Ensemble Empirical Mode Decomposition (EEMD) 
This approach consists of sifting an ensemble of white noiseadded signal (data) and treats the mean as the final true result. Finite, not infinitesimal, amplitude white noise is necessary to force the ensemble to exhaust all possible solutions in the sifting process, thus making the different scale signals to collate in the proper intrinsic mode functions (IMF) dictated by the dyadic filter banks. As EEMD is a timespace analysis method, the added white noise is averaged out with sufficient number of trials; the only persistent part that survives the averaging process is the component of the signal (original data), which is then treated as the true and more physical meaningful answer. The effect of the added white noise is to provide a uniform reference frame in the timefrequency space; therefore, the added noise collates the portion of the signal of comparable scale in one IMF. With this ensemble mean, one can separate scales naturally without any a priori subjective criterion selection as in the intermittence test for the original EMD algorithm. This new approach utilizes the full advantage of the statistical characteristics of white noise to perturb the signal in its true solution neighborhood, and to cancel itself out after serving its purpose; therefore, it represents a substantial improvement over the original EMD and is a truly noiseassisted data analysis (NADA) method. Rlibeemd 
Ensemble Methods  In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble refers only to a concrete finite set of alternative models, but typically allows for much more flexible structure to exist between those alternatives. 
Ensemble Partial Least Squares Regression (EnPLS) 
enpls 
Enterprise Control Language / DataCentric Programming Language (ECL) 
ECL is a declarative, data centric programming language designed in 2000 to allow a team of programmers to process big data across a high performance computing cluster without the programmer being involved in many of the lower level, imperative decisions. 
Enterprise Data Hub (EDH) 
Organizations everywhere are grappling with how to manage their growing big data sets from ERP and ecommerce systems, log files, sensor data, social media and more. Apache Hadoop provides a costeffective enterprise data hub (EDH) to store, transform, cleanse, filter, analyze and gain new value from all kinds of data. 
Enterprise Information Flow (EIF) 
What is Enterprise Information Flow? The concept is closely connected to its neighboring disciplines: Information Flow, Data Lineage Analysis and Metadata Management. But it’s not the same. This new buzzword is only beginning to be recognized, so let’s get a head start. Information Flow focuses on information processing when it comes to security, throughput optimization and transporters; Data Lineage Analysis studies the way data is transferred between systems; and Metadata Management is all about metadata structure and purpose. Why do we need a new concept then? • Big data is getting really big. From internal company systems, social networks and external data from partners to automatically collected data – a huge amount of information needs to be properly dealt with. The high volume of data is of course connected to the high volume of contributing sources: dozens of online channels, the previously mentioned social networks, portable devices, blogs, news and video content. And every source needs to be correctly described, attributed and integrated into the company’s Enterprise Information Flow. • Systems are getting more and more complicated. EIF needs to be ready not only for big data coming from a wide variety of channels, but also for the many different ways data is transformed and processed inside the system. Old school transformation methods like ETL and SQL scripts are easy and usually wellaccounted for, but cracks start to show when it comes to the semantic analysis of nonstructured data, Google’s search algorithms, Facebook’s preferential algorithms, automated quality assurance scripts or artificial intelligence methods used for predictive analysis. When it comes to transformations, it’s critical to know how security and other specific attributes change. Another key point is deciding if the information is created or just transformed. • New routes between systems. The number of different ways to transfer data between systems is rapidly growing. Classic ETL and extract transfers are joined by more complicated systems based on SOA, PBM and ESB. It’s also necessary to be ready for new approaches like Data Federation and Logical Data Warehouse, where data saving is not persistent. • Different data types. It’s not about relational data or text anymore. You need to be ready for NoSQL databases, hyperlinks, video, graphics, xml, semistructured data and other types of information. In complicated environments like these, current solutions fail. New approaches need to be more complex, as the new systems are. It’s necessary to follow data not only on a physical level, but also through more layers of logical abstraction. Let’s sum it up into two main angles of Enterprise Information Flow: 1) New information necessary for decision making appears. Where does it come from? When was it created? Who’s responsible for its quality? 2) Who uses my information and how? Those two sets of questions are vital to Enterprise Information Flow which is a standard part of Enterprise Information Management. Any organization who takes its data seriously is searching for answers anyway, but EIF can provide a more comprehensive overview and merge existing solutions from currently separated fields into one complex policy. A complex solution is precisely what you need, when you’re dealing with complex systems. 
Entity Resolution (ER) 
Entity Resolution (ER), the problem of extracting, matching and resolving entity mentions in structured and unstructured data, is a longstanding challenge in database management, information retrieval, machine learning, natural language processing and statistics. Ironically, different subdisciplines refer to it by a variety of names, including record linkage, deduplication, coreference resolution, reference reconciliation, object consolidation, identity uncertainty and database hardening. Accurate and fast ER has huge practical implications in a wide variety of commercial, scientific and security domains. Despite the long history of work on ER there is still a surprising diversity of approaches – including rule based methods, pairwise classification, clustering approaches, and richer forms of probabilistic inference – and a lack of guiding theory. Meanwhile, in the age of big data, the need for high quality entity resolution is only growing. We are inundated with more and more data that needs to be integrated, aligned and matched before further utility can be extracted. 
Entropy  In information theory, entropy is a measure of the uncertainty in a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message. Entropy is typically measured in bits, nats, or bans. Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content. Shannon entropy provides an absolute limit on the best possible lossless encoding or compression of any communication, assuming that the communication may be represented as a sequence of independent and identically distributed random variables. 
Entropy Agglomeration (EA) 
➘ “Entropy Agglomeration” 
EpsilonGreedy Algorithm  To get you started thinking algorithmically about the ExploreExploit dilemma, we’re going to teach you how to code up one of the simplest possible algorithms for trading off exploration and exploitation. This algorithm is called the epsilonGreedy algorithm. In computer science, a greedy algorithm is an algorithm that always takes whatever action seems best at the present moment, even when that decision might lead to bad long term consequences. The epsilonGreedy algorithm is almost a greedy algorithm because it generally exploits the best available option, but every once in a while the epsilonGreedy algorithm explores the other available options. As we’ll see, the term epsilon in the algorithm’s name refers to the odds that the algorithm explores instead of exploiting. Let’s be more specific. The epsilonGreedy algorithm works by randomly oscillating between Cynthia’s vision of purely randomized experimentation and Bob’s instinct to maximize profits. The epsilonGreedy algorithm is one of the easiest bandit algorithms to understand because it tries to be fair to the two opposite goals of exploration and exploitation by using a mechanism that even a little kid could understand: it just flips a coin. While there are a few details we’ll have to iron out to make that statement precise, the big idea behind the epsilonGreedy algorithm really is that simple: if you flip a coin and it comes up heads, you … 
Erase Rectified Linear Unit (EraseReLU) 
For most stateoftheart architectures, Rectified Linear Unit (ReLU) becomes a standard component accompanied by each layer. Although ReLU can ease the network training to an extent, the character of blocking negative values may suppress the propagation of useful information and leads to the difficulty of optimizing very deep Convolutional Neural Networks (CNNs). Moreover, stacking of layers with nonlinear activations is hard to approximate the intrinsic linear transformations between feature representations. In this paper, we investigate the effect of erasing ReLUs of certain layers and apply it to various representative architectures. We name our approach as ‘EraseReLU’. It can ease the optimization and improve the generalization performance for very deep CNN models. In experiments, this method successfully improves the performance of various representative architectures, and we report the improved results on SVHN, CIFAR10/100, and ImageNet1k. By using EraseReLU, we achieve stateoftheart singlemodel performance on CIFAR100 with 83.47% accuracy. Codes will be released soon. 
Error Correction Model (ECM) 
An error correction model belongs to a category of multiple time series models most commonly used for data where the underlying variables have a longrun stochastic trend, also known as cointegration. ECMs are a theoreticallydriven approach useful for estimating both shortterm and longterm effects of one time series on another. The term errorcorrection relates to the fact that lastperiods deviation from a longrun equilibrium, the error, influences its shortrun dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables. ecm 
Error Matrix  
ErrorRobust MultiView Clustering (EMVC) 
In the era of big data, data may come from multiple sources, known as multiview data. Multiview clustering aims at generating better clusters by exploiting complementary and consistent information from multiple views rather than relying on the individual view. Due to inevitable system errors caused by datacaptured sensors or others, the data in each view may be erroneous. Various types of errors behave differently and inconsistently in each view. More precisely, error could exhibit as noise and corruptions in reality. Unfortunately, none of the existing multiview clustering approaches handle all of these error types. Consequently, their clustering performance is dramatically degraded. In this paper, we propose a novel Markov chain method for ErrorRobust MultiView Clustering (EMVC). By decomposing each view into a shared transition probability matrix and error matrix and imposing structured sparsityinducing norms on error matrices, we characterize and handle typical types of errors explicitly. To solve the challenging optimization problem, we propose a new efficient algorithm based on Augmented Lagrangian Multipliers and prove its convergence rigorously. Experimental results on various synthetic and realworld datasets show the superiority of the proposed EMVC method over the baseline methods and its robustness against different types of errors. 
Espresso  There are many applications scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) needs to be optimized. Binary Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bitpacking and bitwise computations for efficient execution. These techniques provide a speedup of matrixmultiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks ($\approx$ 2 orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://…/espresso. 
Esri  Esri´s GIS (geographic information systems) mapping software helps you understand and visualize data to make decisions based on the best information and analysis. 
Essence Vector Model (EV) 
In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively less work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, those stop or function words that occur frequently may mislead the embedding learning process to produce a misty paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information to produce a more informative lowdimensional vector representation for the paragraph. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (DEV) model, is proposed. The DEV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition. 
Essential Histogram  The histogram is widely used as a simple, exploratory display of data, but it is usually not clear how to choose the number and size of bins for this purpose. We construct a confidence set of distribution functions that optimally address the two main tasks of the histogram: estimating probabilities and detecting features such as increases and (anti)modes in the distribution. We define the essential histogram as the histogram in the confidence set with the fewest bins. Thus the essential histogram is the simplest visualization of the data that optimally achieves the main tasks of the histogram. We provide a fast algorithm for computing a slightly relaxed version of the essential histogram, which still possesses most of its beneficial theoretical properties, and we illustrate our methodology with examples. An Rpackage is available online. 
Estimability  http://…/lenth.pdf http://…/ch06.pdf estimability 
Euclidean Distance  In mathematics, the Euclidean distance or Euclidean metric is the “ordinary” distance between two points that one would measure with a ruler, and is given by the Pythagorean formula. By using this formula as distance, Euclidean space (or even any inner product space) becomes a metric space. The associated norm is called the Euclidean norm. 
Event Driven Architeture (EDA) 
Eventdriven architecture (EDA) is a software architecture pattern promoting the production, detection, consumption of, and reaction to events. An event can be defined as “a significant change in state”. For example, when a consumer purchases a car, the car’s state changes from “for sale” to “sold”. A car dealer’s system architecture may treat this state change as an event whose occurrence can be made known to other applications within the architecture. From a formal perspective, what is produced, published, propagated, detected or consumed is a (typically asynchronous) message called the event notification, and not the event itself, which is the state change that triggered the message emission. 
Event History Analysis  Event history analysis deals with data obtained by observing individuals over time, focusing on events occurring for the individuals under observation. Important applications are to life events of humans in demography, life insurance mathematics, epidemiology, and sociology. The basic data are the times of occurrence of the events and the types of events that occur. The standard approach to the analysis of such data is to use multistate models; a basic example is finitestate Markov processes in continuous time. Censoring and truncation are defining features of the area. This review comments specifically on three areas that are current subjects of active development, all motivated by demands from applications: sampling patterns, the possibility of causal interpretation of the analyses, and the levels and interpretation of variability. eha 
Event Schema Induction (ESI) 

Event Stream Processing (ESP) 
Event stream processing, or ESP, is a set of technologies designed to assist the construction of eventdriven information systems. ESP technologies include event visualization, event databases, eventdriven middleware, and event processing languages, or complex event processing (CEP). In practice, the terms ESP and CEP are often used interchangeably. ESP deals with the task of processing streams of event data with the goal of identifying the meaningful pattern within those streams, employing techniques such as detection of relationships between multiple events, event correlation, event hierarchies, and other aspects such as causality, membership and timing. ESP enables many different applications such as algorithmic trading in financial services, RFID event processing applications, fraud detection, process monitoring, and locationbased services in telecommunications. 
Evidence based Data Analysis  What I think the statistical community needs to invest time and energy into is what I call “evidencebased data analysis”. What do I mean by this? Most data analyses are not the simple classroom exercises that we’ve all done involving linear regression or twosample ttests. Most of the time, you have to obtain the data, clean that data, remove outliers, impute missing values, transform variables and on and on, even before you fit any sort of model. Then there’s model selection, model fitting, diagnostics, sensitivity analysis, and more. So a data analysis is really pipeline of operations where the output of one stage becomes the input of another. The basic idea behind evidencebased data analysis is that for each stage of that pipeline, we should be using the best method, justified by appropriate statistical research that provides evidence favoring one method over another. If we cannot reasonable agree on a best method for a given stage in the pipeline, then we have a gap that needs to be filled. So we fill it! 
Evidence Lower Bound (ELBO) 
(Section: 2.2 The Evidence Lower Bound) Filtering Variational Objectives ➘ “Filtering Variational Objectives” 
EvidenceDriven StateMerging (EDSM) 
Human in the Loop: Interactive Passive Automata Learning via EvidenceDriven StateMerging Algorithms 
Evidential CMedoids (ECMdd) 
In this work, a new prototypebased clustering method named Evidential CMedoids (ECMdd), which belongs to the family of medoidbased clustering for proximity data, is proposed as an extension of Fuzzy CMedoids (FCMdd) on the theoretical framework of belief functions. In the application of FCMdd and original ECMdd, a single medoid (prototype), which is supposed to belong to the object set, is utilized to represent one class. For the sake of clarity, this kind of ECMdd using a single medoid is denoted by sECMdd. In real clustering applications, using only one pattern to capture or interpret a class may not adequately model different types of group structure and hence limits the clustering performance. In order to address this problem, a variation of ECMdd using multiple weighted medoids, denoted by wECMdd, is presented. Unlike sECMdd, in wECMdd objects in each cluster carry various weights describing their degree of representativeness for that class. This mechanism enables each class to be represented by more than one object. Experimental results in synthetic and real data sets clearly demonstrate the superiority of sECMdd and wECMdd. Moreover, the clustering results by wECMdd can provide richer information for the inner structure of the detected classes with the help of prototype weights. 
Evolution Strategy (ES) 
In computer science, an evolution strategy (ES) is an optimization technique based on ideas of adaptation and evolution. It belongs to the general class of evolutionary computation or artificial evolution methodologies. 
Evolutionary Algorithm (EA) 
In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic populationbased metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators. Artificial evolution (AE) describes a process involving individual evolutionary algorithms; EAs are individual components that participate in an AE. 
Evolutionary DEep Networks (EDEN) 
Deep neural networks continue to show improved performance with increasing depth, an encouraging trend that implies an explosion in the possible permutations of network architectures and hyperparameters for which there is little intuitive guidance. To address this increasing complexity, we propose Evolutionary DEep Networks (EDEN), a computationally efficient neuroevolutionary algorithm which interfaces to any deep neural network platform, such as TensorFlow. We show that EDEN evolves simple yet successful architectures built from embedding, 1D and 2D convolutional, max pooling and fully connected layers along with their hyperparameters. Evaluation of EDEN across seven image and sentiment classification datasets shows that it reliably finds good networks — and in three cases achieves stateoftheart results — even on a single GPU, in just 624 hours. Our study provides a first attempt at applying neuroevolution to the creation of 1D convolutional networks for sentiment analysis including the optimisation of the embedding layer. 
Evolving Fuzzy System (EFS) 
Evolving fuzzy systems (EFS) can be defined as selfdeveloping, selflearning fuzzy rulebased or neurofuzzy systems that have both their parameters but also (more importantly) their structure selfadapting online. They are usually associated with streaming data and online (often realtime) modes of operation. In a narrower sense they can be seen as adaptive fuzzy systems. The difference is that evolving fuzzy systems assume online adaptation of system structure in addition to the parameter adaptation which is usually associated with the term adaptive. They also allow for adaptation of the learning mechanism. Therefore, evolving assumes a higher level of adaptation. In this definition the English word evolving is used with its core meaning as described in the Oxford dictionary (Hornby, 1974; p.294), namely unfolding; developing; being developed, naturally and gradually. Often evolving is used in relation to so called evolutionary and genetic algorithms. The meaning of the term evolutionary is defined in the Oxford dictionary as development of more complicated forms of life (plants, animals) from earlier and simpler forms. EFS consider a gradual development of the underlying (fuzzy or neurofuzzy) system structure and do not deal with such phenomena specific for the evolutionary and genetic algorithms as chromosomes crossover, mutation, selection and reproduction, parents and offsprings. ➘ “Evolving Intelligent System” 
Evolving Intelligent System (EIS) 
The term Evolving was first used to describe an intelligent system in 1996 by B. Carse, T. Fogarty and A Munro for a fuzzy rulebased controller where its parameters and structure were learnt simultaneously using a Genetic Algorithm. Years later, alternative methods to learn an evolving intelligent system (EIS) via Incremental learning were suggested as a neurofuzzy algorithm by N. Kasabov in 1998 and a rulebased model by P. Angelov in 1999. EIS are usually associated with, streaming data and online (often realtime) modes of operation. They can be seen as adaptive intelligent systems. EIS assumes online adaptation of system structure in addition to the parameter adaptation which is usually associated with the term ‘incremental’ from Incremental learning. They have been studied as a methodological solution to learn from streaming data exhibiting nonstationary behaviours by M. SayedMouchaweh and E. Lughofer. An important subarea of EIS is represented by Evolving Fuzzy Systems (EFS) (a comprehensive survey written by E. Lughofer including realworld applications can be found in ), which rely on fuzzy systems architecture and incrementally update, evolve and prune fuzzy sets and fuzzy rules on demand and onthefly. One of the major strengths of EFS, compared to other forms of evolving system models, is that they are able to support some sort of interpretability and understandability for experts and users. This opens possibilities for enriched humanmachine interaction’s scenarios, where the users may ‘communicate’ with an online evolving system in form of knowledge exchange (active learning (machine learning) and teaching). This concept is currently motivated and discussed in the evolving systems community under the term HumanInspired Evolving Machines and respected as ‘one future’ generation of ‘EIS’. ➘ “PANFIS++” 
Exact Soft ConfidenceWeighted Learning  In this paper, we propose a new Soft ConfidenceWeighted (SCW) online learning scheme, which enables the conventional confidenceweighted learning method to handle nonseparable cases. Unlike the previous confidenceweighted learning algorithms, the proposed soft confidenceweighted learning method enjoys all the four salient properties: (i) large margin training, (ii) confidence weighting, (iii) capability to handle nonseparable data, and (iv) adaptive margin. Our experimental results show that the proposed SCW algorithms significantly outperform the original CW algorithm. When comparing with a variety of stateoftheart algorithms (including AROW, NAROW and NHERD), we found that SCW generally achieves better or at least comparable predictive accuracy, but enjoys significant advantage of computational efficiency (i.e., smaller number of updates and lower time cost). 
ExGUtils  The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are usually modeled through the exGaussian distribution, because it provides a good fit to multiple empirical data. The complexity of this distribution makes the use of computational tools an essential element in the field. Therefore, there is a strong need for efficient and versatile computational tools for the research in this area. In this manuscript we discuss some mathematical details of the exGaussian distribution and apply the ExGUtils package, a set of functions and numerical tools, programmed for python, developed for numerical analysis of data involving the exGaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages and differences between the least squares and maximum likelihood methods and quantitatively evaluate the goodness of the obtained fits (which is usually an overlooked point in most literature in the area). The analysis done allows one to identify outliers in the empirical datasets and criteriously determine if there is a need for data trimming and at which points it should be done. 
Exogenous Variable  Exogenous variables in causal modeling are the variables with no causal links (arrows) leading to them from other variables in the model. In other words, exogenous variables have no explicit causes within the model. The concept of exogenous variable is fundamental in path analysis and structural equation modeling. The complementary concept is endogenous variable. 
Expectation Maximization (EM) 
In statistics, an expectationmaximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the loglikelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected loglikelihood found on the E step. These parameterestimates are then used to determine the distribution of the latent variables in the next E step. 
Expectation Propagation (EP) 
Expectation Propagation (EP) is a technique in Bayesian machine learning. EP finds approximations to a probability distribution. It uses an iterative approach that leverages the factorization structure of the target distribution. It differs from other Bayesian approximation approaches such as Variational Bayesian methods. 
Expected Improvement (EI) 
Improving the Expected Improvement Algorithm 
Expected Policy Gradient (EPG) 
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadric critics and then extend it to an analytical method for the universal case, covering a broad class of actors and critics, including Gaussian, exponential families, and reparameterised policies with bounded support. For Gaussian policies, we show that it is optimal to explore using covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions. EPG also provides a general framework for reasoning about policy gradient methods, which we use to establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. Furthermore, we prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and with little computational overhead. Finally, we show that EPG outperforms existing approaches on six challenging domains involving the simulated control of physical systems. 
EXPected Similarity Estimation (EXPoSE) 
We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernelbased and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state of the art algorithms for anomaly detection while being significant faster than techniques with the same discriminant power. 
Expected Utility Hypothesis (EUH) 
In economics, game theory, and decision theory the expected utility hypothesis is a hypothesis concerning people’s preferences with regard to choices that have uncertain outcomes (gambles). This hypothesis states that if specific axioms are satisfied, the subjective value associated with an individual’s gamble is the statistical expectation of that individual’s valuations of the outcomes of that gamble. This hypothesis has proved useful to explain some popular choices that seem to contradict the expected value criterion (which takes into account only the sizes of the payouts and the probabilities of occurrence), such as occur in the contexts of gambling and insurance. Daniel Bernoulli initiated this hypothesis in 1738. Until the midtwentieth century, the standard term for the expected utility was the moral expectation, contrasted with ‘mathematical expectation’ for the expected value. The von NeumannMorgenstern utility theorem provides necessary and sufficient conditions under which the expected utility hypothesis holds. From relatively early on, it was accepted that some of these conditions would be violated by real decisionmakers in practice but that the conditions could be interpreted nonetheless as ‘axioms’ of rational choice. Work by Anand (1993) argues against this normative interpretation and shows that ‘rationality’ does not require transitivity, independence or completeness. This view is now referred to as the ‘modern view’ and Anand argues that despite the normative and evidential difficulties the general theory of decisionmaking based on expected utility is an insightful first order approximation that highlights some important fundamental principles of choice, even if it imposes conceptual and technical limits on analysis which need to be relaxed in real world settings where knowledge is less certain or preferences are more sophisticated. 
Expected Value of Partial Perfect Information (EVPPI) 
http://…/WP130003.pdf http://…/seqposterSMDMfina.pdf BCEA 
Expected Value of Perfect Information (EVPI) 
In decision theory, the expected value of perfect information (EVPI) is the price that one would be willing to pay in order to gain access to perfect information. http://…/seqposterSMDMfina.pdf 
Expert Iteration  Solving sequential decision making problems, such as text parsing, robotic control, and game playing, requires a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration, a novel algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. In contrast, standard Deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that our method substantially outperforms Policy Gradients in the board game Hex, winning 84.4% of games against it when trained for equal time. 
Explainable Artificial Intelligence (XAI) 
Dramatic success in machine learning has led to a torrent of Artificial Intelligence (AI) applications. Continued advances promise to produce autonomous systems that will perceive, learn, decide, and act on their own. However, the effectiveness of these systems is limited by the machine’s current inability to explain their decisions and actions to human users. The Department of Defense is facing challenges that demand more intelligent, autonomous, and symbiotic systems. Explainable AI—especially explainable machine learning—will be essential if future warfighters are to understand, appropriately trust, and effectively manage an emerging generation of artificially intelligent machine partners. The Explainable AI (XAI) program aims to create a suite of machine learning techniques that: • Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy); and • Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners. New machinelearning systems will have the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future. The strategy for achieving that goal is to develop new or modified machinelearning techniques that will produce more explainable models. These models will be combined with stateoftheart humancomputer interface techniques capable of translating models into understandable and useful explanation dialogues for the end user. Our strategy is to pursue a variety of techniques in order to generate a portfolio of methods that will provide future developers with a range of design options covering the performanceversusexplainability trade space. 
Explicit Semantic Analysis (ESA) 
In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tfidf matrix of the text corpus and a document (string of words) is represented as the centroid of the vectors representing its words. Typically, the text corpus is Wikipedia, though other corpora including the Open Directory Project have been used. ESA was designed by Evgeniy Gabrilovich and Shaul Markovitch as a means of improving text categorization and has been used by this pair of researchers to compute what they refer to as ‘semantic relatedness’ by means of cosine similarity between the aforementioned vectors, collectively interpreted as a space of ‘concepts explicitly defined and described by humans’, where Wikipedia articles (or ODP entries, or otherwise titles of documents in the knowledge base corpus) are equated with concepts. The name ‘explicit semantic analysis’ contrasts with latent semantic analysis (LSA), because the use of a knowledge base makes it possible to assign humanreadable labels to the concepts that make up the vector space. ESA, as originally posited by Gabrilovich and Markovitch, operates under the assumption that the knowledge base contains topically orthogonal concepts. However, it was later shown by Anderka and Stein that ESA also improves the performance of information retrieval systems when it is based not on Wikipedia, but on the Reuters corpus of newswire articles, which does not satisfy the orthogonality property; in their experiments, Anderka and Stein used newswire stories as ‘concepts’. To explain this observation, links have been shown between ESA and the generalized vector space model. Gabrilovich and Markovitch replied to Anderka and Stein by pointing out that their experimental result was achieved using ‘a single application of ESA (text similarity)’ and ‘just a single, extremely small and homogenous test collection of 50 news documents’. Crosslanguage explicit semantic analysis (CLESA) is a multilingual generalization of ESA. CLESA exploits a documentaligned multilingual reference collection (e.g., again, Wikipedia) to represent a document as a languageindependent concept vector. The relatedness of two documents in different languages is assessed by the cosine similarity between the corresponding vector representations. http://…explicitsemanticanalysisesaexplained 
Exploration Potential  We introduce exploration potential, a quantity for that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem’s reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class). Our experiments in multiarmed bandits use exploration potential to illustrate how different algorithms make the tradeoff between exploration and exploitation. 
Explorative Landscape Analysis (ELA) 
Exploratory Landscape Analysis subsumes a number of techniques employed to obtain knowledge about the properties of an unknown optimization problem, especially insofar as these properties are important for the performance of optimization algorithms. Wher flacco 
Exploratory Data Analysis (EDA) 
In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA. 
Exploratory Factor Analysis (EFA) 
In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a scale is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis. EFA is based on the common factor model. Within the common factor model, a function of common factors, unique factors, and errors of measurements expresses measured variables. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables. EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method. http://…/factornew.htm 
Exponential Moving Average (EMA) 
An exponential moving average (EMA), also known as an exponentially weighted moving average (EWMA), is a type of infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. The graph at right shows an example of the weight decrease. 
Exponential Random Graph Models (ERGM) 
Exponential random graph models (ERGMs) are a family of statistical models for analyzing data about social and other networks. Many metrics exist to describe the structural features of an observed network such as the density, centrality, or assortativity. However, these metrics describe the observed network which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network. However because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression. Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency about network substructures of theoretical interest, disambiguating the influence of confounding processes, efficiently representing complex structures, and linking locallevel processes to globallevel properties. Degree Preserving Randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks. 
Exponential Smoothing  Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed data for presentation, or to make forecasts. The time series data themselves are a sequence of observations. The observed phenomenon may be an essentially random process, or it may be an orderly, but noisy, process. Whereas in the simple moving average the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights over time. Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements. The simplest form of exponential smoothing should be used only for data without any systematic trend or seasonal components. 
Extended Bayesian Information Criterion (EBIC) 
The ordinary Bayes information criterion is too liberal for model selection when the model space is large. In this article, we reexamine the Bayesian paradigm for model selection and propose an extended family of Bayes information criteria. The new criteria take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to in nity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayes information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayes information criteria are extremely useful for variable selection in problems with a moderate sample size but a huge number of covariates, especially in genomewide association studies, which are now an active area in genetics research. Some keywords: Bayesian paradigm; Consistency; Genomewide association study; Tour nament approach; Variable selection. 
Extended Kalman Filter (EKF) 
In estimation theory, the extended Kalman filter (EKF) is the nonlinear version of the Kalman filter which linearizes about an estimate of the current mean and covariance. In the case of well defined transition models, the EKF has been considered the de facto standard in the theory of nonlinear state estimation, navigation systems and GPS. 
Extended Space Forest  The Extended Space Forest is a new method for decision tree construction in which training is done with input vectors including all the original features and their random combinations. The combinations are generated with a difference operator applied to random pairs of original features. The experimental results show that extended space versions of ensemble algorithms have better performance than the original ensemble algorithms. To investigate the success dynamics of the Extended Space Forest, the individual accuracy and diversity creation powers of ensemble algorithms are compared. The Extended Space Forest creates more diversity when it uses all the input features than Bagging and Rotation Forest. It also results in more individual accuracy when it uses random selection of the features than Random Subspace and Random Forest methods. It needs more training time because of using more features than the original algorithms. But its testing time is lower than the others because it generates less complex base learners. 
Exterior Distance Function (EDF) 
We introduce and study exterior distance function (EDF) and correspondent exterior point method (EPM) for convex optimization. The EDF is a classical Lagrangian for an equivalent problem obtained from the initial one by monotone transformation of both the objective function and the constraints. The constraints transformation is scaled by a positive scaling parameter. Thus, the EDF is a particular realization of the Nonlinear Rescaling (NR) principle. Along with the ‘center’, the EDF has two extra tools: the barrier (scaling) parameter and the vector of Lagrange multipliers. We show that EPM generates primal – dual sequence, which converges to the primal – dual solution in value under minimum assumption on the input data. Moreover, the convergence is taking place under any fixed interior point as a ‘center’ and any fixed positive scaling parameter, just due to the Lagrange multipliers update. If the second order sufficient optimality condition is satisfied, then the EPM converges with Qlinear rate under any fixed interior point as a ‘center’ and any fixed, but large enough positive scaling parameter. 
Exterior Point Method (EPM) 
➘ “Exterior Distance Function” 
Extract, Transform, Analyse and Load (ET(A)L) 
The ET(AL) is another form of reduction mechanism, which is why the analytics aspect is included to ensure that the data that gets through is the data that is needed, and that the junk and noise that has no recognisable value, gets cleaned out early and often. 
ExtractTransformLoad (ETL) 
In computing, extract, transform, and load (ETL) refers to a process in database usage and especially in data warehousing that: *Extracts data from outside sources *Transforms it to fit operational needs, which can include quality levels *Loads it into the end target (database, more specifically, operational data store, data mart, or data warehouse) ETL systems are commonly used to integrate data from multiple applications, typically developed and supported by different vendors or hosted on separate computer hardware. The disparate systems containing the original data are frequently managed and operated by different employees. For example a cost accounting system may combine data from payroll, sales and purchasing. 
Extremal Depth (ED) 
We propose a new notion called `extremal depth’ (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme `outlyingness’. ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: a) the central region achieves the nominal (desired) simultaneous coverage probability; b) there is a correspondence between EDbased (simultaneous) central regions and appropriate pointwise central regions; and c) the method is resistant to certain classes of functional outliers. The paper examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems. 
Extreme Bounds Analysis (EBA) 
The basic idea of extreme bounds analysis is quite simple. We are interested in finding out which variables from the set X are robustly associated with the dependent variable y. To do so, we run a large number of regression models. Each has y as the dependent variable and includes a set of standard explanatory variables F that are included in each regression model. In addition, each model includes a different subset D of the variables in X. Following the convention in the literature, we will refer to F as the free variables and to X as the doubtful variables. Some subset of the doubtful variables X might be socalled focus variables that are of particular interest to the researcher. The doubtful variables 4 ExtremeBounds: Extreme Bounds Analysis in R whose regression coefficients retain their statistical significance in a large enough proportion of estimated models are declared to be robust, whereas those that do not are labelled fragile. ExtremeBounds 
Extreme Function Theory  We introduce an extreme function theory as a novel method by which probabilistic novelty detection may be performed with functions, where the functions are represented by timeseries of (potentially multivariate) discrete observations. We set the method within the framework of Gaussian processes (GP), which offers a convenient means of constructing a distribution over functions. Whereas conventional novelty detection methods aim to identify individually extreme data points, with respect to a model of normality constructed using examples of ‘normal’ data points, the proposed method aims to identify extreme functions, with respect to a model of normality constructed using examples of ‘normal’ functions, where those functions are represented by timeseries of observations. The method is illustrated using synthetic data, physiological data acquired from a large clinical trial, and a benchmark timeseries dataset. 
Extreme Gradient Boosting  Extreme Gradient Boosting, which is an efficient implementation of gradient boosting framework. xgboost 
Extreme Learning Machine (ELM) 
Extreme learning machine (ELM) is a modification of single layer feedforward network (SLFN) where learning is quite similar to the reservoir computing. ELMR 
Extreme Machine Learning (ELM) 
➘ “Extreme Learning Machine” 
Extreme MultiLabel Learning using Distributional Semantics (ExMLDS) 
We present a novel and scalable label embedding framework for largescale multilabel learning a.k.a ExMLDS (Extreme MultiLabel Learning using Distributional Semantics). Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings for natural language processing tasks. Learning such embeddings can be reduced to a certain matrix factorization. Our approach is novel in that it highlights interesting connections between label embedding methods used for multilabel learning and paragraph/document embedding methods commonly used for learning representations of text data. The framework can also be easily extended to incorporate auxiliary information such as labellabel correlations; this is crucial especially when there are a lot of missing labels in the training data. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed learning methods perform favorably compared to several baselines and stateoftheart methods for largescale multilabel learning. 
Extreme Studentized Deviate (ESD) 
The generalized extreme Studentized deviate (ESD) test is used to detect one or more outliers in a univariate data set that follows an approximately normal distribution. The primary limitation of the Grubbs test and the TietjenMoore test is that the suspected number of outliers, k, must be specified exactly. If k is not specified correctly, this can distort the conclusions of these tests. On the other hand, the generalized ESD test only requires that an upper bound for the suspected number of outliers be specified. Given the upper bound, r, the generalized ESD test essentially performs r separate tests: a test for one outlier, a test for two outliers, and so on up to r outliers. 
Extreme Value Analysis (EVA) 
➘ “Extreme Value Theory” Introduction to Extreme Value Analysis https://…/Extremes.pdf hkevp,revdbayes,threshr 
Extreme Value Learning (EVL) 
The novel unseen classes can be formulated as the extreme values of known classes. This inspired the recent works on openset recognition \cite{Scheirer_2013_TPAMI,Scheirer_2014_TPAMIb,EVM}, which however can have no way of naming the novel unseen classes. To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual feature to semantic space. To model the margin and coverage distributions of each class, the Vocabularyinformed Learning (ViL) is adopted by using vast open vocabulary in the semantic space. Essentially, by incorporating the EVL and ViL, we for the first time propose a novel semantic embedding paradigm — Vocabularyinformed Extreme Value Learning (ViEVL), which embeds the visual features into semantic space in a probabilistic way. The learned embedding can be directly used to solve supervised learning, zeroshot and open set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of proposed frameworks. 
Extreme Value Theory (EVT) 
Extreme value theory (EVT) or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is widely used in many disciplines, such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event, such as the 100year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50year wave and design the structure accordingly. 
Advertisements