Eager Learning In artificial intelligence, eager learning is a learning method in which the system tries to construct a general, input independent target function during training of the system, as opposed to lazy learning, where generalization beyond the training data is delayed until a query is made to the system. The main advantage gained in employing an eager learning method, such as an artificial neural network, is that the target function will be approximated globally during training, thus requiring much less space than a lazy learning system. Eager learning systems also deal much better with noise in the training data. Eager learning is an example of offline learning, in which post-training queries to the system have no effect on the system itself, and thus the same query to the system will always produce the same result. The main disadvantage with eager learning is that it is generally unable to provide good local approximations in the target function.
Early Stopping In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner’s performance on data outside of the training set. Past that point, however, improving the learner’s fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.
Earnings Before Interest, Taxes, Depreciation and Amortization
A company’s earnings before interest, taxes, depreciation, and amortization (EBITDA) is an accounting metric computed by considering a company’s earnings before interest payments, tax, depreciation, and amortization are subtracted for any final accounting of its income and expenses. The EBITDA of a business gives an indication of its current operational profitability, i.e., how much profit it makes with its present assets and its operations on the products it produces and sells.
Earth Mover’s Distance
In computer science, the earth mover’s distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved. The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions.
“Wasserstein Metric”
Eclat Algorithm The Eclat algorithm is used to perform itemset mining. Itemset mining let us find frequent patterns in data like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains. The basic idea for the eclat algorithm is use tidset intersections to compute the support of a candidate itemset avoiding the generation of subsets that does not exist in the prefix tree.
Ecological Regression Ecological regression is a statistical technique used especially in political science and history to estimate group voting behavior from aggregate data. For example, if counties have a known Democratic vote (in percentage) D, and a known percentage of Catholics, C, then run the linear regression of dependent variable D against independent variable C. This gives D = a + bC. When C = 1 (100% Catholic) this gives the estimated Democratic vote as a+b. When C = 0 (0% Catholic), this gives the estimated non-Catholic vote as a. For example, if the regression gives D = .22 + .45C, then the estimated Catholic vote is 67% Democratic and the non-Catholic vote is 22% Democratic. The technique has been often used in litigation brought under the Voting Rights Act of 1965 to see how blacks and whites voted.
Econometrics Econometrics is the application of mathematics, statistical methods, and, more recently, computer science, to economic data and is described as the branch of economics that aims to give empirical content to economic relations. More precisely, it is “the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference.” An introductory economics textbook describes econometrics as allowing economists “to sift through mountains of data to extract simple relationships.” The first known use of the term “econometrics” (in cognate form) was by Polish economist Pawel Ciompa in 1910. Ragnar Frisch is credited with coining the term in the sense in which it is used today. Econometrics is the intersection of economics, mathematics, and statistics. Econometrics adds empirical content to economic theory allowing theories to be tested and used for forecasting and policy evaluation.
Edgeworth Series The Gram-Charlier A series (named in honor of Jørgen Pedersen Gram and Carl Charlier), and the Edgeworth series (named in honor of Francis Ysidro Edgeworth) are series that approximate a probability distribution in terms of its cumulants. The series are the same; but, the arrangement of terms (and thus the accuracy of truncating the series) differ.
E-Divisive with Medians
E-Divisive with Medians (EDM) – employs energy statistics to detect divergence in mean. Note that EDM can also be used detect change in distribution in a given time series. EDM uses robust statistical metrics, viz., median, and estimates the statistical significance of a breakout through a permutation test. In addition, EDM is non-parametric. This is important since the distribution of production data seldom (if at all) follows the commonly assumed normal distribution or any other widely accepted model.
Educational Data Mining
Educational Data Mining (EDM) describes a research field concerned with the application of data mining, machine learning and statistics to information generated from educational settings (e.g., universities and intelligent tutoring systems). At a high level, the field seeks to develop and improve methods for exploring this data, which often has multiple levels of meaningful hierarchy, in order to discover new insights about how people learn in the context of such settings. In doing so, EDM has contributed to theories of learning investigated by researchers in educational psychology and the learning sciences. The field is closely tied to that of learning analytics, and the two have been compared and contrasted.
Edward Probabilistic modeling is a powerful approach for analyzing empirical information. We describe Edward, a library for probabilistic modeling. Edward’s design reflects an iterative process pioneered by George Box: build a model of a phenomenon, make inferences about the model given data, and criticize the model’s fit to the data. Edward supports a broad class of probabilistic models, efficient algorithms for inference, and many techniques for model criticism. The library builds on top of TensorFlow to support distributed training and hardware such as GPUs. Edward enables the development of complex probabilistic models and their algorithms at a massive scale.
Eesen Framework
The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically simplifies the existing pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the alignments between speech and label sequences. A distinctive feature of Eesen is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that compared with the standard hybrid DNN systems, Eesen achieves comparable word error rates (WERs), while at the same time speeding up decoding significantly.
Effect Size In statistics, an effect size is a quantitative measure of the strength of a phenomenon. Examples of effect sizes are the correlation between two variables, the regression coefficient, the mean difference, or even the risk with which something happens, such as how many people survive after a heart attack for every one person that does not survive. For each type of effect-size, a larger absolute value always indicates a stronger effect. Effect sizes complement statistical hypothesis testing, and play an important role in statistical power analyses, sample size planning, and in meta-analyses. Especially in meta-analysis, where the purpose is to combine multiple effect-sizes, the standard error of effect-size is of critical importance. The S.E. of effect-size is used to weight effect-sizes when combining studies, so that large studies are considered more important than small studies in the analysis. The S.E. of effect-size is calculated differently for each type of effect-size, but generally only requires knowing the study’s sample size (N), or the number of observations in each group (n’s). Reporting effect sizes is considered good practice when presenting empirical research findings in many fields. The reporting of effect sizes facilitates the interpretation of the substantive, as opposed to the statistical, significance of a research result. Effect sizes are particularly prominent in social and medical research. Relative and absolute measures of effect size convey different information, and can be used complementarily.
Effective Applications of the R Language
EARL is a Conference for users and developers of the open source R programming language. The primary focus of the Conference will be the commercial usage of R across a range of industry sectors with the aim of sharing knowledge and applications of the language. The EARL Conference Team is located at Mango Solutions, a data analysis company headquartered in the UK.
Ehlers’s Autocorrelation Periodogram The point of the Ehlers Autocorrelation Periodogram is to dynamically set a period between a minimum and a maximum period length. While I leave the exact explanation of the mechanic to Dr. Ehlers’s book, for all practical intents and purposes, in my opinion, the punchline of this method is to attempt to remove a massive source of overfitting from trading system creation-namely specifying a lookback period.
Eigen Eigen is a high-level C++ library of template headers for linear algebra, matrix and vector operations, numerical solvers and related algorithms. Eigen is an open source library licensed under MPL2 starting from version 3.1.1. Earlier versions were licensed under LGPL3+. Eigen is often noted for its elegant API, versatile fixed and dynamic matrix capabilities and a range of dense and sparse solvers. To achieve high performance, Eigen utilizes explicit vectorization for the SSE 2/3/4, ARM NEON, and AltiVec instruction sets.
Eigenface Eigenfaces is the name given to a set of eigenvectors when they are used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby (1987) and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set.
EigenRec Sparsity presents one of the major challenges of Collaborative Filtering. Graph-based methods are known to alleviate its effects, however their use is often computationally prohibitive; Latent-Factor methods, on the other hand, present a reasonable and viable alternative. In this paper, we introduce EigenRec; a versatile and efficient Latent-Factor framework for Top-N Recommendations, that generalizes the well-known PureSVD algorithm (a) providing intuition about its inner structure, (b) paving the path towards improving its efficacy and, at the same time, (c) reducing its complexity. One of our central goals in this work is to ensure the applicability of our method in realistic big-data scenarios. To this end, we propose building our model using a computationally efficient Lanczos-based procedure, we discuss its Parallel Implementation in distributed computing environments, and we verify its favourable performance using real-world datasets. Furthermore, from a qualitative point of view, a comprehensive set of experiments on the MovieLens and the Yahoo!R2Music datasets based on widely applied performance metrics, indicate that EigenRec outperforms several state-of-the-art algorithms, in terms of Standard and Long-Tail recommendation accuracy, exhibiting low susceptibility to sparsity, even in its most extreme manifestations the Cold-Start problems.
Eigenvalues, Eigenvectors An eigenvector of a square matrix is a non-zero vector that, when the matrix is multiplied by , yields a constant multiple of , the multiplier being commonly denoted by d. That is Av = dv. The number d is called the eigenvalue of A corresponding to v.
Eikosogram Eikosograms provide a nice visual representation of statistical correlation, because when the two variables are independent, then the value of one, say X, doesn’t affect the probability of the second, Y. This visually translates into a horizontal pattern, which easily contrasts with a staircase shape that occurs when the variables are dependent or correlated.
Elastic Net Regularization In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.
Elasticsearch Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine.
Elasticsearch, Logstash and Kibana
(ELK Stack)
ELK stands for Elasticsearch, Logstash and Kibana.
Brief definitions:
Logstash: It is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs. It is fully free and fully open source.
Elasticsearch: Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents.
Kibana: A nifty tool to visualize logs and timestamped data.
Eligibility Traces Eligibility traces are one of the basic mechanisms of reinforcement learning. For example, in the popular TD(lambda) algorithm, the lambda refers to the use of an eligibility trace. Almost any temporal-difference (TD) method, such as Q-learning or Sarsa, can be combined with eligibility traces to obtain a more general method that may learn more efficiently.
There are two ways to view eligibility traces. The more theoretical view, which we emphasize here, is that they are a bridge from TD to Monte Carlo methods. When TD methods are augmented with eligibility traces, they produce a family of methods spanning a spectrum that has Monte Carlo methods at one end and one-step TD methods at the other. In between are intermediate methods that are often better than either extreme method. In this sense eligibility traces unify TD and Monte Carlo methods in a valuable and revealing way.
The other way to view eligibility traces is more mechanistic. From this perspective, an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action. The trace marks the memory parameters associated with the event as eligible for undergoing learning changes. When a TD error occurs, only the eligible states or actions are assigned credit or blame for the error. Thus, eligibility traces help bridge the gap between events and training information. Like TD methods themselves, eligibility traces are a basic mechanism for temporal credit assignment.
ELimination Et Choix Traduisant la REalité
ELECTRE is a family of multi-criteria decision analysis methods that originated in Europe in the mid-1960s. The acronym ELECTRE stands for: ELimination Et Choix Traduisant la REalité (ELimination and Choice Expressing REality).
The method was first proposed by Bernard Roy and his colleagues at SEMA consultancy company. A team at SEMA was working on the concrete, multiple criteria, real-world problem of how firms could decide on new activities and had encountered problems using a weighted sum technique. Bernard Roy was called in as a consultant and the group devised the ELECTRE method. As it was first applied in 1965, the ELECTRE method was to choose the best action(s) from a given set of actions, but it was soon applied to three main problems: choosing, ranking and sorting. The method became more widely known when a paper by B. Roy appeared in a French operations research journal. It evolved into ELECTRE I (electre one) and the evolutions have continued with ELECTRE II, ELECTRE III, ELECTRE IV, ELECTRE IS and ELECTRE TRI (electre tree), to mention a few.
Bernard Roy is widely recognized as the father of the ELECTRE method, which was one of the earliest approaches in what is sometimes known as the French School of decision making. It is usually classified as an “outranking method” of decision making.
There are two main parts to an ELECTRE application: first, the construction of one or several outranking relations, which aims at comparing in a comprehensive way each pair of actions; second, an exploitation procedure that elaborates on the recommendations obtained in the first phase. The nature of the recommendation depends on the problem being addressed: choosing, ranking or sorting.
Usually the Electre Methods are used to discard some alternatives to the problem, which are unacceptable. After that we can use another MCDA to select the best one. The Advantage of using the Electre Methods before is that we can apply another MCDA with a restricted set of alternatives saving much time.
Criteria in ELECTRE methods have two distinct sets of parameters: the importance coefficients and the veto thresholds.
Elo Rating System The Elo rating system is a method for calculating the relative skill levels of players in competitor-versus-competitor games such as chess. It is named after its creator Arpad Elo, a Hungarian-born American physics professor.
The Elo system was invented as an improved chess rating system and is also used in many other games. It has also been adapted for use as a rating system for multiplayer competition in a number of video games, and has been adapted to team sports including soccer (association football), American college football, basketball, Major League Baseball, competitive programming, and E-Sports.
The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other multiple times are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent’s is expected to win 64% of the time; if the difference is 200 points, then the expected win proportion for the stronger player is 76%.
Emotional Chatting Machine
Emotional intelligence is one of the key factors to the success of dialogue systems or conversational agents. In this paper, we propose Emotional Chatting Machine (ECM) which generates responses that are appropriate not only at the content level (relevant and grammatical) but also at the emotion level (consistent emotional expression). To the best of our knowledge, this is the first work that addresses the emotion factor in large-scale conversation generation. ECM addresses the factor in three ways: modeling high-level abstraction of emotion expression by embedding emotion categories, changing of implicit internal emotion states, and using explicit emotion expressions with an external emotion vocabulary. Experiments show that our model can generate responses appropriate not only in content but also in emotion.
Emphatic Temporal-Difference Learning Algorithm
In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under \emph{off}-policy training (Sutton, Mahmood \& White 2016), but it is also a new algorithm for the \emph{on}-policy case. In both our on-policy and off-policy experiments, we found that each method converged to a characteristic asymptotic level of error, with ETD better than TD(0). TD(0) achieved a still lower error level temporarily before falling back to its higher asymptote, whereas ETD never showed this kind of ‘bounce’. In the off-policy case (in which TD(0) is not guaranteed to converge), ETD was significantly slower.
Empirical Bayes Geometric Mean
Adjusted estimate for the relative reporting ratio. Example: if EBGM=3.9 for acetaminophen-hepatic failure, then this drug-event combination occurred in the data 3.9 times more frequently than expected under the assumption of no association between the drug and the event.
Empirical Likelihood
Empirical likelihood (EL) is an estimation method in statistics. Empirical likelihood estimates require few assumptions about the error distribution compared to similar methods like maximum likelihood. EL can handle data well as long as it is independent and identically distributed (iid). EL performs well even when the distribution is asymmetric or censored. EL methods are also useful since they can easily incorporate constraints and prior information. Art Owen pioneered work in this area with his 1988 paper.
Empirical Orthogonal Function Analysis
In statistics, EOF analysis is known as Principal Component Analysis (PCA). As such, EOF analysis is sometimes classified as a multivariate statistical technique.
Empirical Orthogonal Teleconnections
Calculating functions empirically and orthogonally from a given space-time dataset. The method is rooted in multiple linear regression and yields solutions that are orthogonal in one direction, either space or time.
Enclosure Diagram The enclosure diagram is also space filling, using containment rather than adjacency to represent the hierarchy. Introduced by Ben Shneiderman in 1991, a treemap recursively subdivides area into rectangles. As with adjacency diagrams, the size of any node in the tree is quickly revealed.
Encoder Based Lifelong Learning This paper introduces a new lifelong learning solution where a single model is trained for a sequence of tasks. The main challenge that vision systems face in this context is catastrophic forgetting: as they tend to adapt to the most recently seen task, they lose performance on the tasks that were learned previously. Our method aims at preserving the knowledge of the previous tasks while learning a new one by using autoencoders. For each task, an under-complete autoencoder is learned, capturing the features that are crucial for its achievement. When a new task is presented to the system, we prevent the reconstructions of the features with these autoencoders from changing, which has the effect of preserving the information on which the previous tasks are mainly relying. At the same time, the features are given space to adjust to the most recent environment as only their projection into a low dimension submanifold is controlled. The proposed system is evaluated on image classification tasks and shows a reduction of forgetting over the state-of-the-art
Encog Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models, Genetic Programming and Genetic Algorithms are supported. Most Encog training algoritms are multi-threaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms. Encog has been in active development since 2008.
Encog: Library of Interchangeable Machine Learning Models for Java and C#
Endogenous Variable In a statistical model, a parameter or variable is said to be endogenous when there is a correlation between the parameter or variable and the error term. Endogeneity can arise as a result of measurement error, autoregression with autocorrelated errors, simultaneity and omitted variables. Broadly, a loop of causality between the independent and dependent variables of a model leads to endogeneity. For example, in a simple supply and demand model, when predicting the quantity demanded in equilibrium, the price is endogenous because producers change their price in response to demand and consumers change their demand in response to price. In this case, the price variable is said to have total endogeneity once the demand and supply curves are known. In contrast, a change in consumer tastes or preferences would be an exogenous change on the demand curve.
ENet The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation. ENet is up to 18× faster, requires 75× less FLOPs, has 79× less parameters, and provides similar or better accuracy to existing models. We have tested it on CamVid, Cityscapes and SUN datasets and report on comparisons with existing state-of-the-art methods, and the trade-offs between accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster.
Enhanced Least Absolute Shrinkage Operator
Ensemble Bayesian Optimization
Bayesian Optimization (BO) has been shown to be a very effective paradigm for tackling hard black-box and non-convex optimization problems encountered in Machine Learning. Despite these successes, the computational complexity of the underlying function approximation has restricted the use of BO to problems that can be handled with less than a few thousand function evaluations. Harder problems like those involving functions operating in very high dimensional spaces may require hundreds of thousands or millions of evaluations or more and become computationally intractable to handle using standard Bayesian Optimization methods. In this paper, we propose Ensemble Bayesian Optimization (EBO) to overcome this problem. Unlike conventional BO methods that operate on a single posterior GP model, EBO works with an ensemble of posterior GP models. Further, we represent each GP model using tile coding random features and an additive function structure. Our approach generates speedups by parallelizing the time consuming hyper-parameter posterior inference and functional evaluations on hundreds of cores and aggregating the models in every iteration of BO. Our extensive experimental evaluation shows that EBO can speed up the posterior inference between 2-3 orders of magnitude (400 times in one experiment) compared to the state-of-the-art by putting data into Mondrian bins without sacrificing the sample quality. We demonstrate the ability of EBO to handle sample-intensive hard optimization problems by applying it to a rover navigation problem with tens of thousands of observations.
Ensemble Empirical Mode Decomposition
This approach consists of sifting an ensemble of white noise-added signal (data) and treats the mean as the final true result. Finite, not infinitesimal, amplitude white noise is necessary to force the ensemble to exhaust all possible solutions in the sifting process, thus making the different scale signals to collate in the proper intrinsic mode functions (IMF) dictated by the dyadic filter banks. As EEMD is a time-space analysis method, the added white noise is averaged out with sufficient number of trials; the only persistent part that survives the averaging process is the component of the signal (original data), which is then treated as the true and more physical meaningful answer. The effect of the added white noise is to provide a uniform reference frame in the time-frequency space; therefore, the added noise collates the portion of the signal of comparable scale in one IMF. With this ensemble mean, one can separate scales naturally without any a priori subjective criterion selection as in the intermittence test for the original EMD algorithm. This new approach utilizes the full advantage of the statistical characteristics of white noise to perturb the signal in its true solution neighborhood, and to cancel itself out after serving its purpose; therefore, it represents a substantial improvement over the original EMD and is a truly noise-assisted data analysis (NADA) method.
Ensemble Methods In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble refers only to a concrete finite set of alternative models, but typically allows for much more flexible structure to exist between those alternatives.
Ensemble Partial Least Squares Regression
Enterprise Control Language / Data-Centric Programming Language
ECL is a declarative, data centric programming language designed in 2000 to allow a team of programmers to process big data across a high performance computing cluster without the programmer being involved in many of the lower level, imperative decisions.
Enterprise Data Hub
Organizations everywhere are grappling with how to manage their growing big data sets from ERP and e-commerce systems, log files, sensor data, social media and more. Apache Hadoop provides a cost-effective enterprise data hub (EDH) to store, transform, cleanse, filter, analyze and gain new value from all kinds of data.
Enterprise Information Flow
What is Enterprise Information Flow? The concept is closely connected to its neighboring disciplines: Information Flow, Data Lineage Analysis and Metadata Management. But it’s not the same. This new buzzword is only beginning to be recognized, so let’s get a head start. Information Flow focuses on information processing when it comes to security, throughput optimization and transporters; Data Lineage Analysis studies the way data is transferred between systems; and Metadata Management is all about metadata structure and purpose. Why do we need a new concept then?
• Big data is getting really big. From internal company systems, social networks and external data from partners to automatically collected data – a huge amount of information needs to be properly dealt with. The high volume of data is of course connected to the high volume of contributing sources: dozens of online channels, the previously mentioned social networks, portable devices, blogs, news and video content. And every source needs to be correctly described, attributed and integrated into the company’s Enterprise Information Flow.
• Systems are getting more and more complicated. EIF needs to be ready not only for big data coming from a wide variety of channels, but also for the many different ways data is transformed and processed inside the system. Old school transformation methods like ETL and SQL scripts are easy and usually well-accounted for, but cracks start to show when it comes to the semantic analysis of non-structured data, Google’s search algorithms, Facebook’s preferential algorithms, automated quality assurance scripts or artificial intelligence methods used for predictive analysis. When it comes to transformations, it’s critical to know how security and other specific attributes change. Another key point is deciding if the information is created or just transformed.
• New routes between systems. The number of different ways to transfer data between systems is rapidly growing. Classic ETL and extract transfers are joined by more complicated systems based on SOA, PBM and ESB. It’s also necessary to be ready for new approaches like Data Federation and Logical Data Warehouse, where data saving is not persistent.
• Different data types. It’s not about relational data or text anymore. You need to be ready for NoSQL databases, hyperlinks, video, graphics, xml, semi-structured data and other types of information. In complicated environments like these, current solutions fail. New approaches need to be more complex, as the new systems are. It’s necessary to follow data not only on a physical level, but also through more layers of logical abstraction.
Let’s sum it up into two main angles of Enterprise Information Flow:
1) New information necessary for decision making appears. Where does it come from? When was it created? Who’s responsible for its quality?
2) Who uses my information and how?
Those two sets of questions are vital to Enterprise Information Flow which is a standard part of Enterprise Information Management. Any organization who takes its data seriously is searching for answers anyway, but EIF can provide a more comprehensive overview and merge existing solutions from currently separated fields into one complex policy. A complex solution is precisely what you need, when you’re dealing with complex systems.
Entity Resolution
Entity Resolution (ER), the problem of extracting, matching and resolving entity mentions in structured and unstructured data, is a long-standing challenge in database management, information retrieval, machine learning, natural language processing and statistics. Ironically, different subdisciplines refer to it by a variety of names, including record linkage, deduplication, co-reference resolution, reference reconciliation, object consolidation, identity uncertainty and database hardening. Accurate and fast ER has huge practical implications in a wide variety of commercial, scientific and security domains. Despite the long history of work on ER there is still a surprising diversity of approaches – including rule based methods, pair-wise classification, clustering approaches, and richer forms of probabilistic inference – and a lack of guiding theory. Meanwhile, in the age of big data, the need for high quality entity resolution is only growing. We are inundated with more and more data that needs to be integrated, aligned and matched before further utility can be extracted.
Entropy In information theory, entropy is a measure of the uncertainty in a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message. Entropy is typically measured in bits, nats, or bans. Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content. Shannon entropy provides an absolute limit on the best possible lossless encoding or compression of any communication, assuming that the communication may be represented as a sequence of independent and identically distributed random variables.
Epsilon-Greedy Algorithm To get you started thinking algorithmically about the Explore-Exploit dilemma, we’re going to teach you how to code up one of the simplest possible algorithms for trading off exploration and exploitation. This algorithm is called the epsilon-Greedy algorithm. In computer science, a greedy algorithm is an algorithm that always takes whatever action seems best at the present moment, even when that decision might lead to bad long term consequences. The epsilon-Greedy algorithm is almost a greedy algorithm because it generally exploits the best available option, but every once in a while the epsilon-Greedy algorithm explores the other available options. As we’ll see, the term epsilon in the algorithm’s name refers to the odds that the algorithm explores instead of exploiting.
Let’s be more specific. The epsilon-Greedy algorithm works by randomly oscillating between Cynthia’s vision of purely randomized experimentation and Bob’s instinct to maximize profits. The epsilon-Greedy algorithm is one of the easiest bandit algorithms to understand because it tries to be fair to the two opposite goals of exploration and exploitation by using a mechanism that even a little kid could understand: it just flips a coin. While there are a few details we’ll have to iron out to make that statement precise, the big idea behind the epsilon-Greedy algorithm really is that simple: if you flip a coin and it comes up heads, you …
Error Correction Model
An error correction model belongs to a category of multiple time series models most commonly used for data where the underlying variables have a long-run stochastic trend, also known as cointegration. ECMs are a theoretically-driven approach useful for estimating both short-term and long-term effects of one time series on another. The term error-correction relates to the fact that last-periods deviation from a long-run equilibrium, the error, influences its short-run dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.
Error Matrix
Espresso There are many applications scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) needs to be optimized. Binary Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bit-wise computations for efficient execution. These techniques provide a speed-up of matrix-multiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks ($\approx$ 2 orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://…/espresso.
Esri Esri´s GIS (geographic information systems) mapping software helps you understand and visualize data to make decisions based on the best information and analysis.
Essence Vector Model
In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively less work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, those stop or function words that occur frequently may mislead the embedding learning process to produce a misty paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information to produce a more informative low-dimensional vector representation for the paragraph. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (D-EV) model, is proposed. The D-EV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition.
Essential Histogram The histogram is widely used as a simple, exploratory display of data, but it is usually not clear how to choose the number and size of bins for this purpose. We construct a confidence set of distribution functions that optimally address the two main tasks of the histogram: estimating probabilities and detecting features such as increases and (anti)modes in the distribution. We define the essential histogram as the histogram in the confidence set with the fewest bins. Thus the essential histogram is the simplest visualization of the data that optimally achieves the main tasks of the histogram. We provide a fast algorithm for computing a slightly relaxed version of the essential histogram, which still possesses most of its beneficial theoretical properties, and we illustrate our methodology with examples. An R-package is available online.
Estimability http://…/lenth.pdf
Euclidean Distance In mathematics, the Euclidean distance or Euclidean metric is the “ordinary” distance between two points that one would measure with a ruler, and is given by the Pythagorean formula. By using this formula as distance, Euclidean space (or even any inner product space) becomes a metric space. The associated norm is called the Euclidean norm.
Event Driven Architeture
Event-driven architecture (EDA) is a software architecture pattern promoting the production, detection, consumption of, and reaction to events.
An event can be defined as “a significant change in state”. For example, when a consumer purchases a car, the car’s state changes from “for sale” to “sold”. A car dealer’s system architecture may treat this state change as an event whose occurrence can be made known to other applications within the architecture. From a formal perspective, what is produced, published, propagated, detected or consumed is a (typically asynchronous) message called the event notification, and not the event itself, which is the state change that triggered the message emission.
Event History Analysis Event history analysis deals with data obtained by observing individuals over time, focusing on events occurring for the individuals under observation. Important applications are to life events of humans in demography, life insurance mathematics, epidemiology, and sociology. The basic data are the times of occurrence of the events and the types of events that occur. The standard approach to the analysis of such data is to use multistate models; a basic example is finite-state Markov processes in continuous time. Censoring and truncation are defining features of the area. This review comments specifically on three areas that are current subjects of active development, all motivated by demands from applications: sampling patterns, the possibility of causal interpretation of the analyses, and the levels and interpretation of variability.
Event Schema Induction
Event Stream Processing
Event stream processing, or ESP, is a set of technologies designed to assist the construction of event-driven information systems. ESP technologies include event visualization, event databases, event-driven middleware, and event processing languages, or complex event processing (CEP). In practice, the terms ESP and CEP are often used interchangeably. ESP deals with the task of processing streams of event data with the goal of identifying the meaningful pattern within those streams, employing techniques such as detection of relationships between multiple events, event correlation, event hierarchies, and other aspects such as causality, membership and timing. ESP enables many different applications such as algorithmic trading in financial services, RFID event processing applications, fraud detection, process monitoring, and location-based services in telecommunications.
Evidence based Data Analysis What I think the statistical community needs to invest time and energy into is what I call “evidence-based data analysis”. What do I mean by this? Most data analyses are not the simple classroom exercises that we’ve all done involving linear regression or two-sample t-tests. Most of the time, you have to obtain the data, clean that data, remove outliers, impute missing values, transform variables and on and on, even before you fit any sort of model. Then there’s model selection, model fitting, diagnostics, sensitivity analysis, and more. So a data analysis is really pipeline of operations where the output of one stage becomes the input of another. The basic idea behind evidence-based data analysis is that for each stage of that pipeline, we should be using the best method, justified by appropriate statistical research that provides evidence favoring one method over another. If we cannot reasonable agree on a best method for a given stage in the pipeline, then we have a gap that needs to be filled. So we fill it!
Evidence Lower Bound
(Section: 2.2 The Evidence Lower Bound) Filtering Variational Objectives “Filtering Variational Objectives”
Evidential C-Medoids
In this work, a new prototype-based clustering method named Evidential C-Medoids (ECMdd), which belongs to the family of medoid-based clustering for proximity data, is proposed as an extension of Fuzzy C-Medoids (FCMdd) on the theoretical framework of belief functions. In the application of FCMdd and original ECMdd, a single medoid (prototype), which is supposed to belong to the object set, is utilized to represent one class. For the sake of clarity, this kind of ECMdd using a single medoid is denoted by sECMdd. In real clustering applications, using only one pattern to capture or interpret a class may not adequately model different types of group structure and hence limits the clustering performance. In order to address this problem, a variation of ECMdd using multiple weighted medoids, denoted by wECMdd, is presented. Unlike sECMdd, in wECMdd objects in each cluster carry various weights describing their degree of representativeness for that class. This mechanism enables each class to be represented by more than one object. Experimental results in synthetic and real data sets clearly demonstrate the superiority of sECMdd and wECMdd. Moreover, the clustering results by wECMdd can provide richer information for the inner structure of the detected classes with the help of prototype weights.
Evolution Strategy
In computer science, an evolution strategy (ES) is an optimization technique based on ideas of adaptation and evolution. It belongs to the general class of evolutionary computation or artificial evolution methodologies.
Evolutionary Algorithm
In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators. Artificial evolution (AE) describes a process involving individual evolutionary algorithms; EAs are individual components that participate in an AE.
Evolving Fuzzy System
Evolving fuzzy systems (EFS) can be defined as self-developing, self-learning fuzzy rule-based or neuro-fuzzy systems that have both their parameters but also (more importantly) their structure self-adapting on-line. They are usually associated with streaming data and on-line (often real-time) modes of operation. In a narrower sense they can be seen as adaptive fuzzy systems. The difference is that evolving fuzzy systems assume on-line adaptation of system structure in addition to the parameter adaptation which is usually associated with the term adaptive. They also allow for adaptation of the learning mechanism. Therefore, evolving assumes a higher level of adaptation. In this definition the English word evolving is used with its core meaning as described in the Oxford dictionary (Hornby, 1974; p.294), namely unfolding; developing; being developed, naturally and gradually. Often evolving is used in relation to so called evolutionary and genetic algorithms. The meaning of the term evolutionary is defined in the Oxford dictionary as development of more complicated forms of life (plants, animals) from earlier and simpler forms. EFS consider a gradual development of the underlying (fuzzy or neuro-fuzzy) system structure and do not deal with such phenomena specific for the evolutionary and genetic algorithms as chromosomes crossover, mutation, selection and reproduction, parents and off-springs.
“Evolving Intelligent System”
Evolving Intelligent System
The term Evolving was first used to describe an intelligent system in 1996 by B. Carse, T. Fogarty and A Munro for a fuzzy rule-based controller where its parameters and structure were learnt simultaneously using a Genetic Algorithm. Years later, alternative methods to learn an evolving intelligent system (EIS) via Incremental learning were suggested as a neuro-fuzzy algorithm by N. Kasabov in 1998 and a rule-based model by P. Angelov in 1999. EIS are usually associated with, streaming data and on-line (often real-time) modes of operation. They can be seen as adaptive intelligent systems. EIS assumes on-line adaptation of system structure in addition to the parameter adaptation which is usually associated with the term ‘incremental’ from Incremental learning. They have been studied as a methodological solution to learn from streaming data exhibiting non-stationary behaviours by M. Sayed-Mouchaweh and E. Lughofer. An important sub-area of EIS is represented by Evolving Fuzzy Systems (EFS) (a comprehensive survey written by E. Lughofer including real-world applications can be found in ), which rely on fuzzy systems architecture and incrementally update, evolve and prune fuzzy sets and fuzzy rules on demand and on-the-fly. One of the major strengths of EFS, compared to other forms of evolving system models, is that they are able to support some sort of interpretability and understandability for experts and users. This opens possibilities for enriched human-machine interaction’s scenarios, where the users may ‘communicate’ with an on-line evolving system in form of knowledge exchange (active learning (machine learning) and teaching). This concept is currently motivated and discussed in the evolving systems community under the term Human-Inspired Evolving Machines and respected as ‘one future’ generation of ‘EIS’.
Exact Soft Confidence-Weighted Learning In this paper, we propose a new Soft Confidence-Weighted (SCW) online learning scheme, which enables the conventional confidence-weighted learning method to handle non-separable cases. Unlike the previous confidence-weighted learning algorithms, the proposed soft confidence-weighted learning method enjoys all the four salient properties: (i) large margin training, (ii) confidence weighting, (iii) capability to handle non-separable data, and (iv) adaptive margin. Our experimental results show that the proposed SCW algorithms significantly outperform the original CW algorithm. When comparing with a variety of state-of-theart algorithms (including AROW, NAROW and NHERD), we found that SCW generally achieves better or at least comparable predictive accuracy, but enjoys significant advantage of computational efficiency (i.e., smaller number of updates and lower time cost).
ExGUtils The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are usually modeled through the ex-Gaussian distribution, because it provides a good fit to multiple empirical data. The complexity of this distribution makes the use of computational tools an essential element in the field. Therefore, there is a strong need for efficient and versatile computational tools for the research in this area. In this manuscript we discuss some mathematical details of the ex-Gaussian distribution and apply the ExGUtils package, a set of functions and numerical tools, programmed for python, developed for numerical analysis of data involving the ex-Gaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages and differences between the least squares and maximum likelihood methods and quantitatively evaluate the goodness of the obtained fits (which is usually an overlooked point in most literature in the area). The analysis done allows one to identify outliers in the empirical datasets and criteriously determine if there is a need for data trimming and at which points it should be done.
Exogenous Variable Exogenous variables in causal modeling are the variables with no causal links (arrows) leading to them from other variables in the model. In other words, exogenous variables have no explicit causes within the model. The concept of exogenous variable is fundamental in path analysis and structural equation modeling. The complementary concept is endogenous variable.
Expectation Maximization
In statistics, an expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.
Expectation Propagation
Expectation Propagation (EP) is a technique in Bayesian machine learning. EP finds approximations to a probability distribution. It uses an iterative approach that leverages the factorization structure of the target distribution. It differs from other Bayesian approximation approaches such as Variational Bayesian methods.
Expected Improvement
Improving the Expected Improvement Algorithm
EXPected Similarity Estimation
We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state of the art algorithms for anomaly detection while being significant faster than techniques with the same discriminant power.
Expected Utility Hypothesis
In economics, game theory, and decision theory the expected utility hypothesis is a hypothesis concerning people’s preferences with regard to choices that have uncertain outcomes (gambles). This hypothesis states that if specific axioms are satisfied, the subjective value associated with an individual’s gamble is the statistical expectation of that individual’s valuations of the outcomes of that gamble. This hypothesis has proved useful to explain some popular choices that seem to contradict the expected value criterion (which takes into account only the sizes of the payouts and the probabilities of occurrence), such as occur in the contexts of gambling and insurance. Daniel Bernoulli initiated this hypothesis in 1738. Until the mid-twentieth century, the standard term for the expected utility was the moral expectation, contrasted with ‘mathematical expectation’ for the expected value. The von Neumann-Morgenstern utility theorem provides necessary and sufficient conditions under which the expected utility hypothesis holds. From relatively early on, it was accepted that some of these conditions would be violated by real decision-makers in practice but that the conditions could be interpreted nonetheless as ‘axioms’ of rational choice. Work by Anand (1993) argues against this normative interpretation and shows that ‘rationality’ does not require transitivity, independence or completeness. This view is now referred to as the ‘modern view’ and Anand argues that despite the normative and evidential difficulties the general theory of decision-making based on expected utility is an insightful first order approximation that highlights some important fundamental principles of choice, even if it imposes conceptual and technical limits on analysis which need to be relaxed in real world settings where knowledge is less certain or preferences are more sophisticated.
Expected Value of Partial Perfect Information
Expected Value of Perfect Information
In decision theory, the expected value of perfect information (EVPI) is the price that one would be willing to pay in order to gain access to perfect information.
Expert Iteration Solving sequential decision making problems, such as text parsing, robotic control, and game playing, requires a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration, a novel algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. In contrast, standard Deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that our method substantially outperforms Policy Gradients in the board game Hex, winning 84.4% of games against it when trained for equal time.
Explicit Semantic Analysis
In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf-idf matrix of the text corpus and a document (string of words) is represented as the centroid of the vectors representing its words. Typically, the text corpus is Wikipedia, though other corpora including the Open Directory Project have been used. ESA was designed by Evgeniy Gabrilovich and Shaul Markovitch as a means of improving text categorization and has been used by this pair of researchers to compute what they refer to as ‘semantic relatedness’ by means of cosine similarity between the aforementioned vectors, collectively interpreted as a space of ‘concepts explicitly defined and described by humans’, where Wikipedia articles (or ODP entries, or otherwise titles of documents in the knowledge base corpus) are equated with concepts. The name ‘explicit semantic analysis’ contrasts with latent semantic analysis (LSA), because the use of a knowledge base makes it possible to assign human-readable labels to the concepts that make up the vector space. ESA, as originally posited by Gabrilovich and Markovitch, operates under the assumption that the knowledge base contains topically orthogonal concepts. However, it was later shown by Anderka and Stein that ESA also improves the performance of information retrieval systems when it is based not on Wikipedia, but on the Reuters corpus of newswire articles, which does not satisfy the orthogonality property; in their experiments, Anderka and Stein used newswire stories as ‘concepts’. To explain this observation, links have been shown between ESA and the generalized vector space model. Gabrilovich and Markovitch replied to Anderka and Stein by pointing out that their experimental result was achieved using ‘a single application of ESA (text similarity)’ and ‘just a single, extremely small and homogenous test collection of 50 news documents’. Cross-language explicit semantic analysis (CL-ESA) is a multilingual generalization of ESA. CL-ESA exploits a document-aligned multilingual reference collection (e.g., again, Wikipedia) to represent a document as a language-independent concept vector. The relatedness of two documents in different languages is assessed by the cosine similarity between the corresponding vector representations.
Exploration Potential We introduce exploration potential, a quantity for that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem’s reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class). Our experiments in multi-armed bandits use exploration potential to illustrate how different algorithms make the tradeoff between exploration and exploitation.
Explorative Landscape Analysis
Exploratory Landscape Analysis subsumes a number of techniques employed to obtain knowledge about the properties of an unknown optimization problem, especially insofar as these properties are important for the performance of optimization algorithms. Wher
Exploratory Data Analysis
In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA.
Exploratory Factor Analysis
In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a scale is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis. EFA is based on the common factor model. Within the common factor model, a function of common factors, unique factors, and errors of measurements expresses measured variables. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables. EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method.
Exponential Moving Average
An exponential moving average (EMA), also known as an exponentially weighted moving average (EWMA), is a type of infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. The graph at right shows an example of the weight decrease.
Exponential Random Graph Models
Exponential random graph models (ERGMs) are a family of statistical models for analyzing data about social and other networks. Many metrics exist to describe the structural features of an observed network such as the density, centrality, or assortativity. However, these metrics describe the observed network which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network. However because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression. Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency about network substructures of theoretical interest, disambiguating the influence of confounding processes, efficiently representing complex structures, and linking local-level processes to global-level properties. Degree Preserving Randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks.
Exponential Smoothing Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed data for presentation, or to make forecasts. The time series data themselves are a sequence of observations. The observed phenomenon may be an essentially random process, or it may be an orderly, but noisy, process. Whereas in the simple moving average the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights over time. Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements. The simplest form of exponential smoothing should be used only for data without any systematic trend or seasonal components.
Extended Bayesian Information Criterion
The ordinary Bayes information criterion is too liberal for model selection when the model space is large. In this article, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayes information criteria. The new criteria take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to in nity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayes information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayes information criteria are extremely useful for variable selection in problems with a moderate sample size but a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research. Some keywords: Bayesian paradigm; Consistency; Genome-wide association study; Tour- nament approach; Variable selection.
Extended Kalman Filter
In estimation theory, the extended Kalman filter (EKF) is the nonlinear version of the Kalman filter which linearizes about an estimate of the current mean and covariance. In the case of well defined transition models, the EKF has been considered the de facto standard in the theory of nonlinear state estimation, navigation systems and GPS.
Extended Space Forest The Extended Space Forest is a new method for decision tree construction in which training is done with input vectors including all the original features and their random combinations. The combinations are generated with a difference operator applied to random pairs of original features. The experimental results show that extended space versions of ensemble algorithms have better performance than the original ensemble algorithms. To investigate the success dynamics of the Extended Space Forest, the individual accuracy and diversity creation powers of ensemble algorithms are compared. The Extended Space Forest creates more diversity when it uses all the input features than Bagging and Rotation Forest. It also results in more individual accuracy when it uses random selection of the features than Random Subspace and Random Forest methods. It needs more training time because of using more features than the original algorithms. But its testing time is lower than the others because it generates less complex base learners.
Exterior Distance Function
We introduce and study exterior distance function (EDF) and correspondent exterior point method (EPM) for convex optimization. The EDF is a classical Lagrangian for an equivalent problem obtained from the initial one by monotone transformation of both the objective function and the constraints. The constraints transformation is scaled by a positive scaling parameter. Thus, the EDF is a particular realization of the Nonlinear Rescaling (NR) principle. Along with the ‘center’, the EDF has two extra tools: the barrier (scaling) parameter and the vector of Lagrange multipliers. We show that EPM generates primal – dual sequence, which converges to the primal – dual solution in value under minimum assumption on the input data. Moreover, the convergence is taking place under any fixed interior point as a ‘center’ and any fixed positive scaling parameter, just due to the Lagrange multipliers update. If the second order sufficient optimality condition is satisfied, then the EPM converges with Q-linear rate under any fixed interior point as a ‘center’ and any fixed, but large enough positive scaling parameter.
Exterior Point Method
“Exterior Distance Function”
Extract, Transform, Analyse and Load
The ET(AL) is another form of reduction mechanism, which is why the analytics aspect is included to ensure that the data that gets through is the data that is needed, and that the junk and noise that has no recognisable value, gets cleaned out early and often.
In computing, extract, transform, and load (ETL) refers to a process in database usage and especially in data warehousing that:
*Extracts data from outside sources
*Transforms it to fit operational needs, which can include quality levels
*Loads it into the end target (database, more specifically, operational data store, data mart, or data warehouse)
ETL systems are commonly used to integrate data from multiple applications, typically developed and supported by different vendors or hosted on separate computer hardware. The disparate systems containing the original data are frequently managed and operated by different employees. For example a cost accounting system may combine data from payroll, sales and purchasing.
Extremal Depth
We propose a new notion called `extremal depth’ (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme `outlyingness’. ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: a) the central region achieves the nominal (desired) simultaneous coverage probability; b) there is a correspondence between ED-based (simultaneous) central regions and appropriate point-wise central regions; and c) the method is resistant to certain classes of functional outliers. The paper examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems.
Extreme Bounds Analysis
The basic idea of extreme bounds analysis is quite simple. We are interested in finding out which variables from the set X are robustly associated with the dependent variable y. To do so, we run a large number of regression models. Each has y as the dependent variable and includes a set of standard explanatory variables F that are included in each regression model. In addition, each model includes a different subset D of the variables in X. Following the convention in the literature, we will refer to F as the free variables and to X as the doubtful variables. Some subset of the doubtful variables X might be socalled focus variables that are of particular interest to the researcher. The doubtful variables 4 ExtremeBounds: Extreme Bounds Analysis in R whose regression coefficients retain their statistical significance in a large enough proportion of estimated models are declared to be robust, whereas those that do not are labelled fragile.
Extreme Function Theory We introduce an extreme function theory as a novel method by which probabilistic novelty detection may be performed with functions, where the functions are represented by time-series of (potentially multivariate) discrete observations. We set the method within the framework of Gaussian processes (GP), which offers a convenient means of constructing a distribution over functions. Whereas conventional novelty detection methods aim to identify individually extreme data points, with respect to a model of normality constructed using examples of ‘normal’ data points, the proposed method aims to identify extreme functions, with respect to a model of normality constructed using examples of ‘normal’ functions, where those functions are represented by time-series of observations. The method is illustrated using synthetic data, physiological data acquired from a large clinical trial, and a benchmark time-series dataset.
Extreme Gradient Boosting Extreme Gradient Boosting, which is an efficient implementation of gradient boosting framework.
Extreme Learning Machine
Extreme learning machine (ELM) is a modification of single layer feedforward network (SLFN) where learning is quite similar to the reservoir computing.
Extreme Machine Learning
“Extreme Learning Machine”
Extreme Studentized Deviate
The generalized extreme Studentized deviate (ESD) test is used to detect one or more outliers in a univariate data set that follows an approximately normal distribution. The primary limitation of the Grubbs test and the Tietjen-Moore test is that the suspected number of outliers, k, must be specified exactly. If k is not specified correctly, this can distort the conclusions of these tests. On the other hand, the generalized ESD test only requires that an upper bound for the suspected number of outliers be specified. Given the upper bound, r, the generalized ESD test essentially performs r separate tests: a test for one outlier, a test for two outliers, and so on up to r outliers.
Extreme Value Analysis
Introduction to Extreme Value Analysis
Extreme Value Learning
The novel unseen classes can be formulated as the extreme values of known classes. This inspired the recent works on open-set recognition \cite{Scheirer_2013_TPAMI,Scheirer_2014_TPAMIb,EVM}, which however can have no way of naming the novel unseen classes. To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual feature to semantic space. To model the margin and coverage distributions of each class, the Vocabulary-informed Learning (ViL) is adopted by using vast open vocabulary in the semantic space. Essentially, by incorporating the EVL and ViL, we for the first time propose a novel semantic embedding paradigm — Vocabulary-informed Extreme Value Learning (ViEVL), which embeds the visual features into semantic space in a probabilistic way. The learned embedding can be directly used to solve supervised learning, zero-shot and open set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of proposed frameworks.
Extreme Value Theory
Extreme value theory (EVT) or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is widely used in many disciplines, such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event, such as the 100-year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly.