A/B Testing  In marketing, A/B testing is a simple randomized experiment with two variants, A and B, which are the control and treatment in the controlled experiment. It is a form of statistical hypothesis testing. Other names include randomized controlled experiments, online controlled experiments, and split testing. In online settings, such as web design (especially user experience design), the goal is to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement).
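A minimal sketch of how such a test is often evaluated: a two-proportion z-test comparing the click-through rates of the two variants. The counts below are made-up illustration values, and scipy is assumed to be available.

```python
# Sketch: two-proportion z-test for a hypothetical A/B test (illustrative counts only).
from math import sqrt
from scipy.stats import norm

clicks_a, views_a = 210, 5000   # control (variant A)
clicks_b, views_b = 260, 5000   # treatment (variant B)

p_a, p_b = clicks_a / views_a, clicks_b / views_b
p_pool = (clicks_a + clicks_b) / (views_a + views_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))      # two-sided test
print(f"CTR A={p_a:.3%}, CTR B={p_b:.3%}, z={z:.2f}, p={p_value:.4f}")
```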
ABC Analysis  In materials management, the ABC analysis (or Selective Inventory Control) is an inventory categorization technique. ABC analysis divides an inventory into three categories: “A items” with very tight control and accurate records, “B items” with less tightly controlled and good records, and “C items” with the simplest controls possible and minimal records. The ABC analysis provides a mechanism for identifying items that will have a significant impact on overall inventory cost, while also providing a mechanism for identifying different categories of stock that will require different management and controls. The ABC analysis suggests that inventories of an organization are not of equal value. Thus, the inventory is grouped into three categories (A, B, and C) in order of their estimated importance. ‘A’ items are very important for an organization. Because of the high value of these ‘A’ items, frequent value analysis is required. In addition to that, an organization needs to choose an appropriate order pattern (e.g. ‘Just in time’) to avoid excess capacity. ‘B’ items are important, but of course less important than ‘A’ items and more important than ‘C’ items. Therefore ‘B’ items are intergroup items. ‘C’ items are marginally important.
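A minimal sketch of one common way to derive the A/B/C classes: rank items by annual consumption value and cut at cumulative shares of roughly 80% and 95%. The item values and cut-offs below are illustrative assumptions, not part of the definition above.

```python
# Sketch: ABC classification by cumulative share of annual consumption value.
# The 80%/95% cut-offs are common conventions, not fixed rules.
items = {"P1": 30000, "P2": 24000, "P3": 9000, "P4": 4000, "P5": 3000}  # value per item

ranked = sorted(items.items(), key=lambda kv: kv[1], reverse=True)
total = sum(items.values())
cumulative, classes = 0.0, {}
for name, value in ranked:
    cumulative += value / total
    classes[name] = "A" if cumulative <= 0.80 else ("B" if cumulative <= 0.95 else "C")
print(classes)   # e.g. {'P1': 'A', 'P2': 'A', 'P3': 'B', 'P4': 'C', 'P5': 'C'}
```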
Abstract Meaning Representation (AMR) 
We describe Abstract Meaning Representation (AMR), a semantic representation language in which we are writing down the meanings of thousands of English sentences. We hope that a sembank of simple, whole-sentence semantic structures will spur new work in statistical natural language understanding and generation, like the Penn Treebank encouraged work on statistical parsing. This paper gives an overview of AMR and tools associated with it. Abstract Meaning Representation (AMR) Annotation Release 1.0, Linguistic Data Consortium (LDC) Catalog Number LDC2014T12 and ISBN 1585636770, was developed by LDC, SDL/Language Weaver, Inc., the University of Colorado’s Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 13,000 English natural language sentences from newswire, weblogs and web discussion forums. AMR captures “who is doing what to whom” in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax. Resources: AMR Specification; “Abstract Meaning Representation: A survey”.
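For illustration, a frequently cited example graph from the AMR literature for the sentence “The boy wants to go”, written in PENMAN notation and held in a Python string here; note how the variable b is reused to encode within-sentence coreference.

```python
# A classic AMR example ("The boy wants to go"), stored as a plain string for illustration.
amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
"""
print(amr)   # 'b' appears again under go-01: the boy is both the wanter and the goer
```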
ABtree  An Algorithm for Subgroup-Based Treatment Assignment. Given two possible treatments, there may exist subgroups who benefit more from one treatment than the other. This problem is relevant to the field of marketing, where treatments may correspond to different ways of selling a product. It is similarly relevant to the field of public policy, where treatments may correspond to specific government programs.
Accelerated Alternating Projections  We study robust PCA for the fully observed setting, which is about separating a low rank matrix $\boldsymbol{L}$ and a sparse matrix $\boldsymbol{S}$ from their sum $\boldsymbol{D}=\boldsymbol{L}+\boldsymbol{S}$. In this paper, a new algorithm, termed accelerated alternating projections, is introduced for robust PCA which accelerates the existing alternating projections proposed in [Netrapalli, Praneeth, et al., 2014]. Let $\boldsymbol{L}_k$ and $\boldsymbol{S}_k$ be the current estimates of the low rank matrix and the sparse matrix, respectively. The algorithm achieves significant acceleration by first projecting $\boldsymbol{D}-\boldsymbol{S}_k$ onto a low dimensional subspace before obtaining the new estimate of $\boldsymbol{L}$ via truncated SVD. An exact recovery guarantee has been established which shows linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other state-of-the-art algorithms for robust PCA.
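A toy sketch of the underlying alternating-projections idea (the plain scheme, not the accelerated variant described above), assuming numpy: alternate a rank-r truncated SVD for L with entrywise hard-thresholding for S.

```python
# Sketch: plain alternating projections for robust PCA on synthetic data.
import numpy as np

def robust_pca_ap(D, rank=2, thresh=1.0, iters=50):
    L, S = np.zeros_like(D), np.zeros_like(D)
    for _ in range(iters):
        # project D - S onto rank-`rank` matrices via truncated SVD
        U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # keep only large residual entries as the sparse component
        R = D - L
        S = np.where(np.abs(R) > thresh, R, 0.0)
    return L, S

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40))
sparse = np.zeros((50, 40)); sparse[rng.random((50, 40)) < 0.05] = 5.0
L_hat, S_hat = robust_pca_ap(low_rank + sparse, rank=2, thresh=1.0)
```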
Accelerated Destructive Degradation Test (ADDT) 
Degradation data analysis is a powerful tool for reliability assessment. Useful reliability information is available from degradation data when there are few or even no failures. For some applications the degradation measurement process destroys or changes the physical/mechanical characteristics of test units. In such applications, only one meaningful measurement can be taken on each test unit. This is known as ‘destructive degradation’. Degradation tests are often accelerated by testing at higher than usual levels of accelerating variables like temperature.
Accelerated Distributed Nesterov Gradient Descent (AccDNGD) 
This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. We develop an Accelerated Distributed Nesterov Gradient Descent (AccDNGD) method. When the objective function is convex and $L$-smooth, we show that it achieves a $O(\frac{1}{t^{1.4-\epsilon}})$ convergence rate for all $\epsilon\in(0,1.4)$. We also show the convergence rate can be improved to $O(\frac{1}{t^2})$ if the objective function is a composition of a linear map and a strongly convex and smooth function. When the objective function is $\mu$-strongly convex and $L$-smooth, we show that it achieves a linear convergence rate of $O([1 - O((\frac{\mu}{L})^{5/7})]^t)$, where $\frac{L}{\mu}$ is the condition number of the objective.
Accelerated Failure Time Model (AFT) 
In the statistical area of survival analysis, an accelerated failure time model (AFT model) is a parametric model that provides an alternative to the commonly used proportional hazards models. Whereas a proportional hazards model assumes that the effect of a covariate is to multiply the hazard by some constant, an AFT model assumes that the effect of a covariate is to accelerate or decelerate the life course of a disease by some constant. This is especially appealing in a technical context where the ‘disease’ is a result of some mechanical process with a known sequence of intermediary stages.
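A conventional way to write the model (sign conventions differ between texts): covariates act additively on log time, equivalently multiplicatively on the time scale.

```latex
% Standard log-linear AFT formulation; exp(x^T beta) stretches or compresses the time axis.
\[
  \log T_i \;=\; x_i^{\top}\beta + \sigma\,\varepsilon_i ,
  \qquad\text{equivalently}\qquad
  S(t \mid x) \;=\; S_0\!\bigl(t\, e^{-x^{\top}\beta}\bigr).
\]
```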
Accelerated Gradient Descent (AGD) 
In the world of optimization, we have a space and a convex objective function f we wish to minimize. We have seen that gradient descent is a simple greedy algorithm that works to minimize the objective function at some convergence rate (in this post we shall remain in discrete time). But the world is always stranger than we think. Indeed, there is a phenomenon of acceleration in convex optimization, in which we can boost the performance of some gradientbased algorithms by subtly modifying their implementation. In particular, we will discuss accelerated gradient descent, proposed by Yurii Nesterov in 1983, which achieves a faster – and optimal – convergence rate under the same assumption as gradient descent. Acceleration has received renewed research interests in recent years, leading to many proposed interpretations and further generalizations. Nevertheless, there is still a sense of mystery to what acceleration is doing and why it works; these are the questions that we want to understand better. 
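A minimal sketch of Nesterov's method on a toy quadratic, assuming numpy; it uses the standard step size 1/L and the usual momentum schedule t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.

```python
# Sketch: Nesterov accelerated gradient descent on f(x) = 0.5 x^T A x.
import numpy as np

A = np.diag([1.0, 10.0, 100.0])
grad = lambda x: A @ x
L = 100.0                                 # Lipschitz constant of the gradient

x = y = np.array([1.0, 1.0, 1.0])
t = 1.0
for _ in range(200):
    x_next = y - grad(y) / L              # gradient step from the lookahead point
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = x_next + (t - 1) / t_next * (x_next - x)   # momentum / extrapolation step
    x, t = x_next, t_next
print(x)                                   # approaches the minimizer at 0
```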
Accelerated Hierarchical Density Clustering  We present an accelerated algorithm for hierarchical density-based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable density clusters, and eliminating the need for the difficult-to-tune distance scale parameter. This makes accelerated HDBSCAN* the default choice for density-based clustering. Library available at: https://…/hdbscan
Accelerated Mini-Batch k-Means  We propose an accelerated Mini-Batch k-means algorithm which combines three key improvements. The first is a modified center update which results in convergence to a local minimum in fewer iterations. The second is an adaptive increase of batch size to meet an increasing requirement for centroid accuracy. The third is the inclusion of distance bounds based on the triangle inequality, which are used to eliminate distance calculations along the same lines as Elkan’s algorithm. The combination of the latter two constitutes a very powerful scheme to reuse computation already done over samples until statistical accuracy requires the use of additional data points.
Accelerated Proximal Gradient (APG) 
More Efficient Accelerated Proximal Algorithm for Nonconvex Problems
Accelerated Sparse Discriminant Analysis  accSDA 
Accelerated Sparse Subspace Clustering  State-of-the-art algorithms for sparse subspace clustering perform spectral clustering on a similarity matrix typically obtained by representing each data point as a sparse combination of other points using either basis pursuit (BP) or orthogonal matching pursuit (OMP). BP-based methods are often prohibitive in practice while the performance of OMP-based schemes is unsatisfactory, especially in settings where data points are highly similar. In this paper, we propose a novel algorithm that exploits an accelerated variant of orthogonal least-squares to efficiently find the underlying subspaces. We show that under certain conditions the proposed algorithm returns a subspace-preserving solution. Simulation results illustrate that the proposed method compares favorably with BP-based methods in terms of running time while being significantly more accurate than OMP-based schemes.
Acceptance-Rejection Method  ➘ “Rejection Sampling”
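A minimal sketch of the acceptance-rejection idea: draw from an easy proposal and accept with probability f(x)/(M g(x)). Here the target is a Beta(2, 2) density and the proposal is Uniform(0, 1); both choices are purely illustrative.

```python
# Sketch: acceptance-rejection sampling of a Beta(2,2) target with a uniform proposal.
import numpy as np

rng = np.random.default_rng(0)
target = lambda x: 6.0 * x * (1.0 - x)    # Beta(2,2) density on [0, 1]
M = 1.5                                    # max of the density, attained at x = 0.5

def sample(n):
    out = []
    while len(out) < n:
        x = rng.uniform()                  # proposal draw from g = Uniform(0,1)
        u = rng.uniform()
        if u <= target(x) / M:             # accept with probability f(x) / (M g(x))
            out.append(x)
    return np.array(out)

print(sample(5))
```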
Accumulated Gradient Normalization  This work addresses the instability in asynchronous data parallel optimization. It does so by introducing a novel distributed optimizer which is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. This implies that the magnitude of a worker delta is smaller compared to an accumulated gradient, and provides a better direction towards a minimum compared to first-order gradients, which in turn also forces possible implicit momentum fluctuations to be more aligned since we make the assumption that all workers contribute towards a single minimum. As a result, our approach mitigates the parameter staleness problem more effectively since staleness in asynchrony induces (implicit) momentum, and achieves a better convergence rate compared to other optimizers such as asynchronous EASGD and DynSGD, which we show empirically.
Accumulated Local Effects (ALE) 
ALEPlot 
Accuracy  In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value.
Accuracy and Precision  Accuracy and precision are defined in terms of systematic and random errors. The more common definition associates accuracy with systematic errors and precision with random errors. Another definition, advanced by ISO, associates trueness with systematic errors and precision with random errors, and defines accuracy as the combination of both trueness and precision.
Accuracy Paradox  The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. It may be better to avoid the accuracy metric in favor of other metrics such as precision and recall. Accuracy is often the starting point for analyzing the quality of a predictive model, as well as an obvious criterion for prediction. Accuracy measures the ratio of correct predictions to the total number of cases evaluated. It may seem obvious that the ratio of correct predictions to cases should be a key metric. A predictive model may have high accuracy, but be useless. 
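A tiny made-up illustration of the paradox: on a 99:1 imbalanced problem, a classifier that always predicts the majority class reaches 99% accuracy yet has zero recall, which precision and recall expose immediately.

```python
# Sketch: accuracy hides the uselessness of an "always negative" classifier (illustrative counts).
tp, fp, fn, tn = 0, 0, 10, 990

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp) if (tp + fp) else float("nan")
recall    = tp / (tp + fn)
print(accuracy, precision, recall)         # 0.99, nan, 0.0
```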
AccurateML  The growing demands of processing massive datasets have promoted irresistible trends of running machine learning applications on MapReduce. When processing large input data, it is often of greater value to produce fast and accurate enough approximate results than slow exact results. Existing techniques produce approximate results by processing parts of the input data, thus incurring large accuracy losses when using short job execution times, because all the skipped input data potentially contributes to result accuracy. We address this limitation by proposing AccurateML, which aggregates information of input data in each map task to create small aggregated data points. These aggregated points enable all map tasks to produce initial outputs quickly to save computation time and decrease the outputs’ size to reduce communication time. Our approach further identifies the parts of input data most related to result accuracy, thus first using these parts to improve the produced outputs to minimize accuracy losses. We evaluated AccurateML using real machine learning applications and datasets. The results show: (i) it reduces execution times by 30 times with small accuracy losses compared to exact results; (ii) when using the same execution times, it achieves a 2.71-fold reduction in accuracy losses compared to existing approximate processing techniques.
Action Schema Network (ASNet) 
In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weightsharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet’s learning capability allows it to significantly outperform traditional nonlearning planners in several challenging domains. 
Action-Attending Graphic Neural Network (A2GNN)
The motion analysis of human skeletons is crucial for human action recognition, which is one of the most active topics in computer vision. In this paper, we propose a fully end-to-end action-attending graphic neural network (A$^2$GNN) for skeleton-based action recognition, in which each irregular skeleton is structured as an undirected attribute graph. To extract high-level semantic representation from skeletons, we perform the local spectral graph filtering on the constructed attribute graphs like the standard image convolution operation. Considering not all joints are informative for action analysis, we design an action-attending layer to detect those salient action units (AUs) by adaptively weighting skeletal joints. Herein the filtering responses are parameterized into a weighting function irrelevant to the order of input nodes. To further encode continuous motion variations, the deep features learnt from skeletal graphs are gathered along consecutive temporal slices and then fed into a recurrent gated network. Finally, the spectral graph filtering, action-attending and recurrent temporal encoding are integrated together to jointly train for the sake of robust action recognition as well as the intelligibility of human actions. To evaluate our A$^2$GNN, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D dataset. The experimental results demonstrate that our network achieves state-of-the-art performance.
Activation Ensemble  Many activation functions have been proposed in the past, but selecting an adequate one requires trial and error. We propose a new methodology of designing activation functions within a neural network at each layer. We call this technique an ‘activation ensemble’ because it allows the use of multiple activation functions at each layer. This is done by introducing additional variables, $\alpha$, at each activation layer of a network to allow for multiple activation functions to be active at each neuron. By design, activations with larger $\alpha$ values at a neuron are equivalent to having the largest magnitude. Hence, those higher magnitude activations are ‘chosen’ by the network. We implement the activation ensembles on a variety of datasets using an array of feed-forward and convolutional neural networks. By using the activation ensemble, we achieve superior results compared to traditional techniques. In addition, because of the flexibility of this methodology, we more deeply explore activation functions and the features that they capture.
Active Betweenness Cardinality  Centrality rankings such as degree, closeness, betweenness, Katz, PageRank, etc. are commonly used to identify critical nodes in a graph. These methods are based on two assumptions that restrict their wider applicability. First, they assume the exact topology of the network is available. Secondly, they do not take into account the activity over the network and only rely on its topology. However, in many applications, the network is autonomous, vast, and distributed, and it is hard to collect the exact topology. At the same time, the underlying pairwise activity between node pairs is not uniform and node criticality strongly depends on the activity on the underlying network. In this paper, we propose active betweenness cardinality as a new measure, where the node criticalities are based not on the static structure, but on the activity of the network. We show how this metric can be computed efficiently by using only local information for a given node and how we can find the most critical nodes starting from only a few nodes. We also show how this metric can be used to monitor a network and identify failed nodes. We present experimental results to show its effectiveness by demonstrating how the failed nodes can be identified by measuring the active betweenness cardinality of a few nodes in the system.
Active Function Cross-Entropy Clustering (afCEC)
Active function cross-entropy clustering partitions the n-dimensional data into clusters by finding the parameters of the mixed generalized multivariate normal distribution that optimally approximates the scattering of the data in the n-dimensional space, whose density function is of the form: p_1*N(mi_1,^sigma_1,sigma_1,f_1)+…+p_k*N(mi_k,^sigma_k,sigma_k,f_k). The above-mentioned generalization is performed by introducing so-called ‘f-adapted Gaussian densities’ (i.e. the ordinary Gaussian densities adapted by the ‘active function’). Additionally, the active function cross-entropy clustering performs the automatic reduction of unnecessary clusters. For more information please refer to P. Spurek, J. Tabor, K. Byrski, ‘Active function Cross-Entropy Clustering’ (2017) <doi:10.1016/j.eswa.2016.12.011>. afCEC
Active Information Store (AIS) 
The goal of the AIS project is to provide a scalable information repository to support data-intensive information worker and decision support applications. AIS can help businesses to enhance, structure and navigate data to discover information and relationships that are in existing data stores but not readily available or obvious. (see also HANA Graph Engine)
Active Learning  Active learning is a special case of semi-supervised machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points. In the statistics literature it is sometimes also called optimal experimental design. There are situations in which unlabeled data is abundant but manual labeling is expensive. In such a scenario, learning algorithms can actively query the user/teacher for labels. This type of iterative supervised learning is called active learning.
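A minimal sketch of pool-based active learning with least-confident (uncertainty) sampling, using scikit-learn on a synthetic dataset; in practice the “oracle” supplying labels would be a human annotator.

```python
# Sketch: repeatedly label the unlabeled point the current model is least sure about.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labeled = list(range(10))                            # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):                                  # 20 label queries
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)              # least-confident sampling
    query = pool.pop(int(np.argmax(uncertainty)))
    labeled.append(query)                            # oracle provides y[query]
print(model.score(X, y))
```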
Active Long Term Memory Network (ALTM) 
Continual Learning in artificial neural networks suffers from interference and forgetting when different tasks are learned sequentially. This paper introduces the Active Long Term Memory Networks (ALTM), a model of sequential multi-task deep learning that is able to maintain previously learned associations between sensory input and behavioral output while acquiring new knowledge. ALTM exploits the non-convex nature of deep neural networks and actively maintains knowledge of previously learned, inactive tasks using a distillation loss. Distortions of the learned input-output map are penalized but hidden layers are free to traverse towards new local optima that are more favorable for the multi-task objective. We reframe McClelland’s seminal Hippocampal theory with respect to the Catastrophic Inference (CI) behavior exhibited by modern deep architectures trained with backpropagation and inhomogeneous sampling of latent factors across epochs. We present empirical results of non-trivial CI during continual learning in Deep Linear Networks trained on the same task, in Convolutional Neural Networks when the task shifts from predicting semantic to graphical factors, and during domain adaptation from simple to complex environments. We present results of the ALTM model’s ability to maintain viewpoint recognition learned in the highly controlled iLab20M dataset with 10 object categories and 88 camera viewpoints, while adapting to the unstructured domain of ImageNet with 1,000 object categories.
Active Rotating Filters (ARF) 
Deep Convolutional Neural Networks (DCNNs) are capable of learning unprecedentedly effective image representations. However, their ability in handling significant local and global image rotations remains limited. In this paper, we propose Active Rotating Filters (ARFs) that actively rotate during convolution and produce feature maps with location and orientation explicitly encoded. An ARF acts as a virtual filter bank containing the filter itself and its multiple unmaterialised rotated versions. During backpropagation, an ARF is collectively updated using errors from all its rotated versions. DCNNs using ARFs, referred to as Oriented Response Networks (ORNs), can produce within-class rotation-invariant deep features while maintaining inter-class discrimination for classification tasks. The oriented response produced by ORNs can also be used for image and object orientation estimation tasks. Over multiple state-of-the-art DCNN architectures, such as VGG, ResNet, and STN, we consistently observe that replacing regular filters with the proposed ARFs leads to significant reduction in the number of network parameters and improvement in classification performance. We report the best results on several commonly used benchmarks.
Active Sampler  Recent years have witnessed amazing outcomes from ‘Big Models’ trained by ‘Big Data’. Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more ‘learning value’ to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than evident cases, noisy data or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a lightweight vectorized implementation. Active Sampler is orthogonal to most approaches optimizing the efficiency of large-scale data analytics, and can be applied to most analytics models trained by the stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler can speed up the training procedure of SVM, feature selection and deep learning, for comparable training quality, by 1.6-2.2x.
ActiveClean  Data cleaning is often an important step to ensure that predictive models, such as regression and classification, are not affected by systematic errors such as inconsistent, out-of-date, or outlier data. Identifying dirty data is often a manual and iterative process, and can be challenging on large datasets. However, many data cleaning workflows can introduce subtle biases into the training processes due to violation of independence assumptions. We propose ActiveClean, a progressive cleaning approach where the model is updated incrementally instead of being retrained and can guarantee accuracy on partially cleaned data. ActiveClean supports a popular class of models called convex loss models (e.g., linear regression and SVMs). ActiveClean also leverages the structure of a user’s model to prioritize cleaning those records likely to affect the results. We evaluate ActiveClean on five real-world datasets (UCI Adult, UCI EEG, MNIST, Dollars For Docs, and WorldBank) with both real and synthetic errors. Our results suggest that our proposed optimizations can improve model accuracy by up to 2.5x for the same amount of data cleaned. Furthermore, for a fixed cleaning budget and on all real dirty datasets, ActiveClean returns more accurate models than uniform sampling and Active Learning.
Actuarial Science  Actuarial science is the discipline that applies mathematical and statistical methods to assess risk in insurance, finance and other industries and professions. Actuaries are professionals who are qualified in this field through education and experience. In many countries, actuaries must demonstrate their competence by passing a series of rigorous professional examinations. Actuarial science includes a number of interrelated subjects, including probability, mathematics, statistics, finance, economics, financial economics, and computer programming. Historically, actuarial science used deterministic models in the construction of tables and premiums. The science has gone through revolutionary changes during the last 30 years due to the proliferation of high-speed computers and the union of stochastic actuarial models with modern financial theory (Frees 1990). Many universities have undergraduate and graduate degree programs in actuarial science. In 2010, a study published by job search website CareerCast ranked actuary as the #1 job in the United States (Needleman 2010). The study used five key criteria to rank jobs: environment, income, employment outlook, physical demands, and stress. A similar study by U.S. News & World Report in 2006 included actuaries among the 25 Best Professions that it expects will be in great demand in the future (Nemko 2006).
AdaGAN  Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every step we add a new component into a mixture model by running a GAN algorithm on a reweighted sample. This is inspired by boosting algorithms, where many potentially weak individual predictors are greedily aggregated to form a strong composite predictor. We prove that such an incremental procedure leads to convergence to the true distribution in a finite number of steps if each step is optimal, and convergence at an exponential rate otherwise. We also illustrate experimentally that this procedure addresses the problem of missing modes. 
ADaPTION Toolbox  Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are useful for many practical tasks in machine learning. Synaptic weights, as well as neuron activation functions within the deep network, are typically stored with high-precision formats, e.g. 32-bit floating point. However, since storage capacity is limited and each memory access consumes power, both storage capacity and memory access are two crucial factors in these networks. Here we present a method and the accompanying ADaPTION toolbox to extend the popular deep learning library Caffe to support training of deep CNNs with reduced numerical precision of weights and activations using fixed point notation. ADaPTION includes tools to measure the dynamic range of weights and activations. Using the ADaPTION tools, we quantized several CNNs including VGG16 down to 16-bit weights and activations with only a 0.8% drop in Top-1 accuracy. The quantization, especially of the activations, leads to an increase of up to 50% in sparsity, especially in early and intermediate layers, which we exploit to skip multiplications with zero, thus performing faster and computationally cheaper inference.
ADaptive Algorithm for Betweenness via Random Approximation (KADABRA) 
We present KADABRA, a new algorithm to approximate betweenness centrality in directed and undirected graphs, which significantly outperforms all previous approaches on real-world complex networks. The efficiency of the new algorithm relies on two new theoretical contributions, of independent interest. The first contribution focuses on sampling shortest paths, a subroutine used by most algorithms that approximate betweenness centrality. We show that, on realistic random graph models, we can perform this task in time $E^{\frac{1}{2}+o(1)}$ with high probability, obtaining a significant speedup with respect to the $\Theta(E)$ worst-case performance. We experimentally show that this new technique achieves similar speedups on real-world complex networks, as well. The second contribution is a new rigorous application of the adaptive sampling technique. This approach decreases the total number of shortest paths that need to be sampled to compute all betweenness centralities with a given absolute error, and it also handles more general problems, such as computing the $k$ most central nodes. Furthermore, our analysis is general, and it might be extended to other settings, as well.
Adaptive BAll COver for Classification (ABACOC) 
Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments. Moreover, the learning system must be parameterless — traditional tuning methods are problematic in streaming settings — and avoid requiring prior knowledge of the number of distinct class labels occurring in the stream. In this paper, we introduce a new algorithmic approach for nonparametric learning in data streams. Our approach addresses all above mentioned challenges by learning a model that covers the input space using simple local classifiers. The distribution of these classifiers dynamically adapts to the local (unknown) complexity of the classification problem, thus achieving a good balance between model complexity and predictive accuracy. We design four variants of our approach of increasing adaptivity. By means of an extensive empirical evaluation against standard nonparametric baselines, we show state-of-the-art results in terms of accuracy versus model size. For the variant that imposes a strict bound on the model size, we show better performance against all other methods measured at the same model size value. Our empirical analysis is complemented by a theoretical performance guarantee which does not rely on any stochastic assumption on the source generating the stream.
Adaptive Batch Size (AdaBatch) 
Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size for all epochs, adaptively increases the batch size during the training process. Our method delivers the convergence rate of small batch sizes while achieving performance similar to large batch sizes. We analyse our approach using the standard AlexNet, ResNet, and VGG networks operating on the popular CIFAR10, CIFAR100, and ImageNet datasets. Our results demonstrate that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while changing accuracy by less than 1% relative to training with fixed batch sizes. 
Adaptive Boosting (AdaBoost) 
AdaBoost, short for “Adaptive Boosting”, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the prestigious “Gödel Prize” in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms (‘weak learners’) is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (i.e., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner. While every learning algorithm will tend to suit some problem types better than others, and will typically have many different parameters and configurations to be adjusted before achieving optimal performance on a dataset, AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier. When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative ‘hardness’ of each training sample is fed into the tree growing algorithm such that later trees tend to focus on harder to classify examples.
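A minimal sketch of the discrete AdaBoost reweighting loop with decision stumps (scikit-learn's AdaBoostClassifier packages the same idea); the dataset is synthetic and the number of rounds is arbitrary.

```python
# Sketch: discrete AdaBoost with stumps; misclassified samples get up-weighted each round.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
y_pm = np.where(y == 1, 1, -1)                       # labels in {-1, +1}
w = np.full(len(X), 1 / len(X))
F = np.zeros(len(X))                                 # running ensemble score

for _ in range(25):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w[pred != y_pm]) / np.sum(w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    w *= np.exp(-alpha * y_pm * pred)                # up-weight mistakes
    w /= w.sum()
    F += alpha * pred

print("training accuracy:", np.mean(np.sign(F) == y_pm))
```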
Adaptive Coordinate Descent  Adaptive coordinate descent is an extension of the coordinate descent algorithm to non-separable optimization. The adaptive coordinate descent approach gradually builds a transformation of the coordinate system such that the new coordinates are as decorrelated as possible with respect to the objective function. The adaptive coordinate descent was shown to be competitive with state-of-the-art evolutionary algorithms and has the following invariance properties: 1. Invariance with respect to monotonous transformations of the function (scaling); 2. Invariance with respect to orthogonal transformations of the search space (rotation). A CMA-like Adaptive Encoding Update, mostly based on principal component analysis, is used to extend the coordinate descent method to the optimization of non-separable problems. The adaptation of an appropriate coordinate system allows adaptive coordinate descent to outperform coordinate descent on non-separable functions.
Adaptive Density Peak Detection (ADPclust) 
ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work builds on and improves upon Rodriguez and Laio’s idea. ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid selection and parameter optimization algorithm, which finds the number of clusters and cluster centroids by comparing average silhouettes on a grid of testing clustering results; it also includes a user-interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional ‘density-distance plot’.
Adaptive Feeding (AF) 
Object detection aims at high speed and accuracy simultaneously. However, fast models are usually less accurate, while accurate models cannot satisfy our need for speed. A fast model can be 10 times faster but 50% less accurate than an accurate model. In this paper, we propose Adaptive Feeding (AF) to combine a fast (but less accurate) detector and an accurate (but slow) detector, by adaptively determining whether an image is easy or hard and choosing an appropriate detector for it. In practice, we build a cascade of detectors, including the AF classifier which makes the easy vs. hard decision and the two detectors. The AF classifier can be tuned to obtain different trade-offs between speed and accuracy, has negligible training time and requires no additional training data. Experimental results on the PASCAL VOC, MS COCO and Caltech Pedestrian datasets confirm that AF has the ability to achieve comparable speed as the fast detector and comparable accuracy as the accurate one at the same time. As an example, by combining the fast SSD300 with the accurate SSD500 detector, AF leads to a 50% speedup over SSD500 with the same precision on the VOC2007 test set.
Adaptive Generalized PCA (adaptive gPCA) 
When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results which are unstable and difficult to interpret. To mitigate these problems, we have developed a method which allows the analyst to incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. This leads to a lowdimensional representation of the samples which both describes the latent structure and which has axes which are interpretable in terms of groups of closely related variables. The method is derived by putting a prior encoding the relationships between the variables on the data and following through the analysis on the posterior distributions of the samples. We show that our method does well at reconstructing true latent structure in simulated data and we also demonstrate the method on a dataset investigating the effects of antibiotics on the composition of bacteria in the human gut. adaptiveGPCA 
Adaptive gPCA  ➘ “Adaptive Generalized PCA (adaptive gPCA)”
Adaptive Graph Convolutional Neural Network  Graph Convolutional Neural Networks (Graph CNNs) are generalizations of classical CNNs to handle graph data such as molecular data, point clouds and social networks. Current filters in graph CNNs are built for fixed and shared graph structure. However, for most real data, the graph structure varies in both size and connectivity. The paper proposes a generalized and flexible graph CNN taking data of arbitrary graph structure as input. In that way a task-driven adaptive graph is learned for each graph data while training. To efficiently learn the graph, a distance metric learning approach is proposed. Extensive experiments on nine graph-structured datasets have demonstrated the superior performance improvement on both convergence speed and predictive accuracy. ➘ “Graph Convolutional Neural Network”
Adaptive Huber Regression  Big data are often contaminated by outliers and heavy-tailed errors, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for an optimal tradeoff between bias and robustness. Our framework is able to handle heavy-tailed data with bounded $(1 \! + \! \delta)$-th moment for any $\delta\!>\!0$. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when $\delta \!\geq\! 1$, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime $0 \!<\! \delta \!<\! 1$ and the transition is smooth and optimal. Moreover, a non-asymptotic Bahadur representation for finite-sample inference is derived when the variance is finite. Numerical experiments lend further support to our obtained theory.
Adaptive Learning for Multi-Agent Navigation (ALAN)
In multi-agent navigation, agents need to move towards their goal locations while avoiding collisions with other agents and static obstacles, often without communication with each other. Existing methods compute motions that are optimal locally but do not account for the aggregated motions of all agents, producing inefficient global behavior especially when agents move in a crowded space. In this work, we develop methods to allow agents to dynamically adapt their behavior to their local conditions. We accomplish this by formulating the multi-agent navigation problem as an action-selection problem, and propose an approach, ALAN, that allows agents to compute time-efficient and collision-free motions. ALAN is highly scalable because each agent makes its own decisions on how to move, using a set of velocities optimized for a variety of navigation tasks. Experimental results show that the agents using ALAN, in general, reach their destinations faster than using ORCA, a state-of-the-art collision avoidance framework, the Social Forces model for pedestrian navigation, and a Predictive collision avoidance model.
Adaptive Learning Rate (ADADELTA) 
Automatically sets the learning rate for each neuron in a neural network based on its training history.
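A minimal sketch of the ADADELTA update rule (applied per parameter here), assuming numpy; rho and eps are the usual decay and stability constants, and no global learning rate is needed.

```python
# Sketch: ADADELTA keeps two running RMS accumulators, one for gradients and one for updates.
import numpy as np

def adadelta_step(grad, state, rho=0.95, eps=1e-6):
    state["Eg2"]  = rho * state["Eg2"]  + (1 - rho) * grad ** 2
    dx = -np.sqrt(state["Edx2"] + eps) / np.sqrt(state["Eg2"] + eps) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * dx ** 2
    return dx

x = np.array([5.0, -3.0])
state = {"Eg2": np.zeros_like(x), "Edx2": np.zeros_like(x)}
for _ in range(500):
    x += adadelta_step(2 * x, state)      # gradient of f(x) = ||x||^2 is 2x
print(x)                                   # moves toward the minimizer at 0
```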
Adaptive Linear Neuron (ADALINE) 
ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented this network. The network uses memistors. It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch–Pitts neuron. It consists of a weight, a bias and a summation function. The difference between Adaline and the standard (McCulloch–Pitts) perceptron is that in the learning phase the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function and the function’s output is used for adjusting the weights. There also exists an extension known as Madaline.
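A minimal sketch of ADALINE trained with the delta (LMS) rule on a toy linearly separable problem, assuming numpy; the key difference from the perceptron is that the update uses the raw linear net output rather than the thresholded output.

```python
# Sketch: ADALINE / delta-rule training on synthetic 2-D data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)   # toy linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.01
for _ in range(50):
    for xi, target in zip(X, y):
        net = w @ xi + b                  # linear activation (the "net")
        error = target - net              # delta rule uses the raw net, not sign(net)
        w += lr * error * xi
        b += lr * error

pred = np.sign(X @ w + b)
print("accuracy:", np.mean(pred == y))
```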
Adaptive Mixture Discriminant Analysis (AMDA) 
In supervised learning, an important issue usually not taken into account by classical methods is the possibility of having in the test set individuals belonging to a class which has not been observed during the learning phase. Classical supervised algorithms will automatically label such observations as belonging to one of the known classes in the training set and will not be able to detect new classes. This work introduces a model-based discriminant analysis method, called adaptive mixture discriminant analysis (AMDA), which is able to detect several unobserved groups of points and to adapt the learned classifier to the new situation. Two EM-based procedures are proposed for parameter estimation and model selection criteria are used for selecting the actual number of classes. Experiments on artificial and real data demonstrate the ability of the proposed method to deal with complex and real-world problems. The proposed approach is also applied to the detection of unobserved communities in social network analysis.
Adaptive Moving Average (AMA) 
The Adaptive Moving Average (AMA) study is similar to the exponential moving average (EMA), except the AMA uses a scalable constant instead of a fixed constant for smoothing the data.
Adaptive Moving Self-Organizing Map (AMSOM)
Self-Organizing Map (SOM) is a neural network model which is used to obtain a topology-preserving mapping from the (usually high dimensional) input/feature space to an output/map space of fewer dimensions (usually two or three in order to facilitate visualization). Neurons in the output space are connected with each other but this structure remains fixed throughout training, and learning is achieved through the updating of neuron reference vectors in feature space. Despite the fact that growing variants of SOM overcome the fixed structure limitation, they increase computational cost and also do not allow the removal of a neuron after its introduction. In this paper, a variant of SOM is proposed called AMSOM (Adaptive Moving Self-Organizing Map) that on the one hand creates a more flexible structure where neuron positions are dynamically altered during training and on the other hand tackles the drawback of having a predefined grid by allowing neuron addition and/or removal during training. Experiments using multiple literature datasets show that the proposed method improves training performance of SOM, leads to a better visualization of the input dataset, and provides a framework for determining the optimal number and structure of neurons.
Adaptive Nonparametric Clustering  ➘ “Adaptive Weights Clustering” 
Adaptive p-value Thresholding (AdaPT)
We consider the problem of multiple hypothesis testing with generic side information: for each hypothesis $H_i$ we observe both a p-value $p_i$ and some predictor $x_i$ encoding contextual information about the hypothesis. For large-scale problems, adaptively focusing power on the more promising hypotheses (those more likely to yield discoveries) can lead to much more powerful multiple testing procedures. We propose a general iterative framework for this problem, called the Adaptive p-value Thresholding (AdaPT) procedure, which adaptively estimates a Bayes-optimal p-value rejection threshold and controls the false discovery rate (FDR) in finite samples. At each iteration of the procedure, the analyst proposes a rejection threshold and observes partially censored p-values, estimates the false discovery proportion (FDP) below the threshold, and either stops to reject or proposes another threshold, until the estimated FDP is below $\alpha$. Our procedure is adaptive in an unusually strong sense, permitting the analyst to use any statistical or machine learning method she chooses to estimate the optimal threshold, and to switch between different models at each iteration as information accrues.
Adaptive Resonance Theory (ART) 
Adaptive resonance theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. It describes a number of neural network models which use supervised and unsupervised learning methods, and address problems such as pattern recognition and prediction.
Adaptive Robust Control  In this paper we propose a new methodology for solving an uncertain stochastic Markovian control problem in discrete time. We call the proposed methodology the adaptive robust control. We demonstrate that the uncertain control problem under consideration can be solved in terms of the associated adaptive robust Bellman equation. The success of our approach is to a great extent owed to the recursive methodology for the construction of relevant confidence regions. We illustrate our methodology by considering an optimal portfolio allocation problem, and we compare results obtained using the adaptive robust control method with some other existing methods.
Adaptive Scaling  Preprocessing data is an important step before any data analysis. In this paper, we focus on one particular aspect, namely scaling or normalization. We analyze various scaling methods in common use and study their effects on different statistical learning models. We propose a new two-stage scaling method. First, we use some training data to fit a linear regression model and then scale the whole data based on the coefficients of the regression. Simulations are conducted to illustrate the advantages of our new scaling method. Some real data analysis will also be given.
Adaptive Software  http://…/Adaptive_software_development
Adaptive SVM+  Incorporating additional knowledge in the learning process can be beneficial for several computer vision and machine learning tasks. Whether privileged information originates from a source domain that is adapted to a target domain, or as additional features available at training time only, using such privileged (i.e., auxiliary) information is of high importance as it improves the recognition performance and generalization. However, both primary and privileged information are rarely derived from the same distribution, which poses an additional challenge to the recognition task. To address these challenges, we present a novel learning paradigm that leverages privileged information in a domain adaptation setup to perform visual recognition tasks. The proposed framework, named Adaptive SVM+, combines the advantages of both the learning using privileged information (LUPI) paradigm and the domain adaptation framework, which are naturally embedded in the objective function of a regular SVM. We demonstrate the effectiveness of our approach on the publicly available Animals with Attributes and INTERACT datasets and report state-of-the-art results in both of them.
Adaptive System  The term adaptation is used in biology in relation to how living beings adapt to their environments, but with two different meanings. First, the continuous adaptation of an organism to its environment, so as to maintain itself in a viable state, through sensory feedback mechanisms. Second, the development (through evolutionary steps) of an adaptation (an anatomic structure, physiological process or behavior characteristic) that increases the probability of an organism reproducing itself (although sometimes not directly). Generally speaking, an adaptive system is a set of interacting or interdependent entities, real or abstract, forming an integrated whole that together are able to respond to environmental changes or changes in the interacting parts. Feedback loops represent a key feature of adaptive systems, allowing the response to changes; examples of adaptive systems include: natural ecosystems, individual organisms, human communities, human organizations, and human families. Some artificial systems can be adaptive as well; for instance, robots employ control systems that utilize feedback loops to sense new conditions in their environment and adapt accordingly.
Adaptive Thouless-Anderson-Palmer Mean Field Approach (ADATAP)
We develop an advanced mean field method for approximating averages in probabilistic data models that is based on the TAP approach of disorder physics. In contrast to conventional TAP, where the knowledge of the distribution of couplings between the random variables is required, our method adapts to the concrete couplings. We demonstrate the validity of our approach, which is so far restricted to models with non-glassy behaviour, by replica calculations for a wide class of models as well as by simulations for a real data set. http://…nergybeliefpropagationandsparsity.pdf
Adaptive Weights Clustering (AWC) 
This paper presents a new approach to nonparametric cluster analysis called Adaptive Weights Clustering (AWC). The idea is to identify the clustering structure by checking at different points and for different scales on departure from local homogeneity. The proposed procedure describes the clustering structure in terms of weights \( w_{ij} \), each of which measures the degree of local inhomogeneity for two neighbor local clusters using statistical tests of ‘no gap’ between them. The procedure starts from a very local scale, then the parameter of locality grows by some factor at each step. The method is fully adaptive and does not require specifying the number of clusters or their structure. The clustering results are not sensitive to noise and outliers, and the procedure is able to recover different clusters with sharp edges or manifold structure. The method is scalable and computationally feasible. An intensive numerical study shows state-of-the-art performance of the method in various artificial examples and applications to text data. Our theoretical study states optimal sensitivity of AWC to local inhomogeneity.
Adaptive Window-based Streaming Edge Partitioning (ADWISE)
In recent years, the graph partitioning problem gained importance as a mandatory preprocessing step for distributed graph processing on very large graphs. Existing graph partitioning algorithms minimize partitioning latency by assigning individual graph edges to partitions in a streaming manner — at the cost of reduced partitioning quality. However, we argue that the mere minimization of partitioning latency is not the optimal design choice in terms of minimizing total graph analysis latency, i.e., the sum of partitioning and processing latency. Instead, for complex and long-running graph processing algorithms that run on very large graphs, it is beneficial to invest more time into graph partitioning to reach a higher partitioning quality — which drastically reduces graph processing latency. In this paper, we propose ADWISE, a novel window-based streaming partitioning algorithm that increases the partitioning quality by always choosing the best edge from a set of edges for assignment to a partition. In doing so, ADWISE controls the partitioning latency by adapting the window size dynamically at runtime. Our evaluations show that ADWISE can reach the sweet spot between graph partitioning latency and graph processing latency, reducing the total latency of partitioning plus processing by up to 23-47 percent compared to the state-of-the-art.
Adaptive-Dynamic PCA  mvMonitoring
adaQN  Recurrent Neural Networks (RNNs) are powerful models that achieve unparalleled performance on several pattern recognition problems. However, training of RNNs is a computationally difficult task owing to the well-known ‘vanishing/exploding’ gradient problems. In recent years, several algorithms have been proposed for training RNNs. These algorithms either: exploit no (or limited) curvature information and have cheap per-iteration complexity; or attempt to gain significant curvature information at the cost of increased per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAM and ADAGRAD while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present a novel stochastic quasi-Newton algorithm (adaQN) for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method is judicious in storing and retaining L-BFGS curvature pairs, which is indirectly used as a means of controlling the quality of the steps. We present numerical experiments on two language modeling tasks and show that adaQN performs at par, if not better, than popular RNN training algorithms. These results suggest that quasi-Newton algorithms have the potential to be a viable alternative to first- and second-order methods for training RNNs.
Additive Noise Models (ANM) 
The core idea of these so-called Additive Noise Models (ANM) is that if X → Y, then the variability observed in Y will be either explained by X, or by some noise that is independent of X. BCEE 
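As a rough illustration of this idea, the sketch below (plain NumPy, not the BCEE package's API) fits a regression in each candidate direction and measures how strongly the residuals depend on the putative cause using distance correlation; under the ANM assumption, the true causal direction should show the weaker residual dependence. The polynomial regression and the toy data are illustrative assumptions; practical ANM methods typically use nonparametric regression and HSIC-style independence tests.

```python
import numpy as np

def distance_correlation(a, b):
    """Simple O(n^2) distance correlation: near zero when a and b are independent."""
    a, b = np.asarray(a, float)[:, None], np.asarray(b, float)[:, None]
    A, B = np.abs(a - a.T), np.abs(b - b.T)
    A = A - A.mean(0) - A.mean(1)[:, None] + A.mean()
    B = B - B.mean(0) - B.mean(1)[:, None] + B.mean()
    dcov2 = max((A * B).mean(), 0.0)
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

def anm_dependence(cause, effect, degree=3):
    """Regress effect on cause, return dependence of residuals on the cause."""
    residuals = effect - np.polyval(np.polyfit(cause, effect, degree), cause)
    return distance_correlation(cause, residuals)

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = x ** 3 + rng.normal(scale=0.5, size=300)   # ground truth: X -> Y with additive noise

print("X -> Y residual dependence:", anm_dependence(x, y))   # expected: smaller
print("Y -> X residual dependence:", anm_dependence(y, x))
```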
Additive Polynomial Design Matrix (AP) 
An implementation of the additive polynomial (AP) design matrix. It constructs and appends an AP design matrix to a data frame for use with longitudinal data subject to seasonality. apdesign 
Additive Principal Components (APC) 
Additive principal components are a nonlinear generalization of linear principal components. 
Additive Smoothing  In statistics, additive smoothing, also called Laplace smoothing (not to be confused with Laplacian smoothing), or Lidstone smoothing, is a technique used to smooth categorical data. bcpa,BreakoutDetection 
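A quick worked example of the standard smoothing formula, \( \hat{\theta}_i = (x_i + \alpha)/(N + \alpha d) \) for d categories with counts \( x_i \) summing to N; the counts below are made up purely for illustration.

```python
import numpy as np

def additive_smoothing(counts, alpha=1.0):
    """Laplace/Lidstone smoothing: (count + alpha) / (total + alpha * number_of_categories)."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

counts = [3, 0, 7]                              # observed category counts (toy data)
print(additive_smoothing(counts))               # no category gets probability 0
print(additive_smoothing(counts, alpha=0.5))    # Lidstone smoothing with alpha = 0.5
```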
Adjacency Diagram  The adjacency diagram is a space-filling variant of the node-link diagram; rather than drawing a link between parent and child in the hierarchy, nodes are drawn as solid areas (either arcs or bars), and their placement relative to adjacent nodes reveals their position in the hierarchy. The icicle layout in figure 4D is similar to the first node-link diagram in that the root node appears at the top, with child nodes underneath. Because the nodes are now space-filling, however, we can use a length encoding for the size of software classes and packages. This reveals an additional dimension that would be difficult to show in a node-link diagram. bdvis 
Adjacency Matrix  In mathematics and computer science, an adjacency matrix is a means of representing which vertices (or nodes) of a graph are adjacent to which other vertices. Another matrix representation for a graph is the incidence matrix. bentcableAR 
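A minimal sketch of the representation, using a made-up four-vertex undirected graph: the matrix has a 1 wherever two vertices are adjacent, and its powers count walks between vertices.

```python
import numpy as np

# Undirected graph on 4 vertices given as an edge list.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1   # symmetric because the graph is undirected

print(A)
# Powers of A count walks: (A @ A)[i, j] is the number of length-2 walks from i to j.
print(A @ A)
```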
Admixture Models  The core of the admixture model, proposed by Smith (1953) to deal with heterogeneous traits in genetic linkage analysis, is the hypothesis that a fraction lambda of pedigrees is linked to the marker locus being tested while a fraction 1 – lambda is unlinked. This model leads naturally to the vexing problem of testing for a mixture and has given rise to a large literature. bipartite,lpbrim 
Adometry  Adometry by Google solves the complex challenge of integrating, measuring, and optimizing marketing data across all channels — both online and offline — so you can generate actionable insights that improve ROI. BlandAltmanLeh 
Advanced Analysis of Variance (AANOVA) 
Book: Advanced Analysis of Variance 
Advanced Analytics  There is an increasing use of the term advanced analytics, typically used to describe the technical aspects of analytics, especially predictive modeling, machine learning techniques, and neural networks. BLCOP 
Advanced LSTM (A-LSTM) 
Long short-term memory (LSTM) is normally used in recurrent neural networks (RNN) as the basic recurrent unit. However, conventional LSTM assumes that the state at the current time step depends on the previous time step. This assumption constrains the time dependency modeling capability. In this study, we propose a new variation of LSTM, advanced LSTM (A-LSTM), for better temporal context modeling. We employ A-LSTM in a weighted pooling RNN for emotion recognition. The A-LSTM outperforms the conventional LSTM by 5.5% relatively. The A-LSTM based weighted pooling RNN can also complement the state-of-the-art emotion classification framework. This shows the advantage of A-LSTM. 
Adversarial Eliminating With GAN (AEGAN) 
Although neural networks can achieve state-of-the-art performance at recognizing images, they often suffer a tremendous defeat from adversarial examples – inputs generated by applying imperceptible but intentional perturbations to samples from the datasets. How to defend against adversarial examples is an important problem that is well worth researching. So far, only two well-known methods, adversarial training and defensive distillation, have provided a significant defense. In contrast to existing methods, which are mainly based on the model itself, we address the problem purely based on the adversarial examples themselves. In this paper, a novel idea and the first framework based on Generative Adversarial Nets, named AEGAN, capable of resisting adversarial examples, are proposed. Extensive experiments on benchmark datasets indicate that AEGAN is able to defend against adversarial examples effectively. 
Adversarial Feature Learning (AFL) 
The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing generators learn to ‘linearize semantics’ in the latent space of such models. Intuitively, such latent spaces may serve as useful feature representations for auxiliary problems where semantics are relevant. However, in their existing form, GANs have no means of learning the inverse mapping — projecting data back into the latent space. We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning. 
Adversarial Generator-Encoder Networks  We present a new autoencoder-type architecture that is trainable in an unsupervised mode, sustains both generation and inference, and has the quality of conditional and unconditional samples boosted by adversarial learning. Unlike previous hybrids of autoencoders and adversarial networks, the adversarial game in our approach is set up directly between the encoder and the generator, and no external mappings are trained in the process of learning. The game objective compares the divergences of each of the real and the generated data distributions with the canonical distribution in the latent space. We show that the direct generator-vs-encoder game leads to a tight coupling of the two components, resulting in samples and reconstructions of a comparable quality to some recently proposed more complex architectures. 
Adversarial Hashing Network (HashGAN) 
With the rapid growth of multimodal data, hashing methods for cross-modal retrieval have received considerable attention. Deep-network-based cross-modal hashing methods are appealing as they can integrate feature learning and hash coding into end-to-end trainable frameworks. However, it is still challenging to find content similarities between different modalities of data due to the heterogeneity gap. To further address this problem, we propose an adversarial hashing network with an attention mechanism to enhance the measurement of content similarities by selectively focusing on informative parts of multimodal data. The proposed adversarial network, HashGAN, consists of three building blocks: 1) the feature learning module to obtain feature representations, 2) the generative attention module to generate an attention mask, which is used to obtain the attended (foreground) and the unattended (background) feature representations, 3) the discriminative hash coding module to learn hash functions that preserve the similarities between different modalities. In our framework, the generative module and the discriminative module are trained in an adversarial way: the generator is trained so that the discriminator cannot preserve the similarities of multimodal data w.r.t. the background feature representations, while the discriminator aims to preserve the similarities of multimodal data w.r.t. both the foreground and the background feature representations. Extensive evaluations on several benchmark datasets demonstrate that the proposed HashGAN brings substantial improvements over other state-of-the-art cross-modal hashing methods. 
Adversarial Machine Learning (AML) 
Adversarial machine learning is the formal name for studying what happens when conceding even a slightly more realistic alternative to assumptions of these types (harmlessly called “relaxing assumptions” …. Adversarial Machine Learning 
Adversarial Network Embedding  Learning low-dimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities and other high-order proximities. However, apart from objectives that capture network structural properties, most of them lack additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate their contribution to learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure preserving component and an adversarial learning component. The former component aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to state-of-the-art approaches on benchmark network embedding tasks. 
Adversarial Transformation Networks  Multiple different approaches for generating adversarial examples have been proposed to attack deep neural networks. These approaches involve either directly computing gradients with respect to the image pixels, or directly solving an optimization on the image pixels. In this work, we present a fundamentally new method for generating adversarial examples that is fast to execute and provides exceptional diversity of output. We efficiently train feed-forward neural networks in a self-supervised manner to generate adversarial examples against a target network or set of networks. We call such a network an Adversarial Transformation Network (ATN). ATNs are trained to generate adversarial examples that minimally modify the classifier’s outputs given the original input, while constraining the new classification to match an adversarial target class. We present methods to train ATNs and analyze their effectiveness targeting a variety of MNIST classifiers as well as the latest state-of-the-art ImageNet classifier Inception ResNet v2. 
Adversary Model  In computer science, an online algorithm measures its competitiveness against different adversary models. For deterministic algorithms, the adversary is the same, the adaptive offline adversary. For randomized online algorithms competitiveness can depend upon the adversary model used. BMA,dga 
Affect-LM  Human verbal communication includes affective messages which are conveyed through the use of emotionally colored words. There has been a lot of research in this direction, but the problem of integrating state-of-the-art neural language models with affective information remains an area ripe for exploration. In this paper, we propose an extension to an LSTM (Long Short-Term Memory) language model for generating conversational text, conditioned on affect categories. Our proposed model, Affect-LM, enables us to customize the degree of emotional content in generated sentences through an additional design parameter. Perception studies conducted using Amazon Mechanical Turk show that Affect-LM generates naturally looking emotional sentences without sacrificing grammatical correctness. Affect-LM also learns affect-discriminative word representations, and perplexity experiments show that additional affective information in conversational text can improve language model prediction. 
Affinity Analysis  Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, this can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of cross-selling and up-selling, in addition to influencing sales promotions, loyalty programs, store design, and discount plans. ➘ “Market Basket Analysis” bnclassify 
agate  agate is a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that helps you solve real-world problems. CAM 
Age Period Cohort Model (APC) 
Age-Period-Cohort models are a class of models for demographic rates (mortality/morbidity/fertility/…) observed for a broad age range over a reasonably long time period, and classified by age and date of follow-up (period) and date of birth (cohort). This type of follow-up can be shown in a Lexis diagram: a coordinate system with date of follow-up along the x-axis, and age along the y-axis. A single person’s life-trajectory is therefore a straight line with slope 1 (as calendar time and age advance at the same pace). Tabulated data enumerate the number of events and the risk time (sum of lengths of life-trajectories) in some subsets of the Lexis diagram, usually subsets classified by age and period in equally long intervals. Individual life-lines can be shown with colouring according to states, or the diagram can just be shown to indicate what ages and periods are covered, and what subsets are used for classification of events and risk time. The Age-Period-Cohort model describes the (log) rates as a sum of (non-linear) age, period and cohort effects. The three variables age (at follow-up), a, period (i.e. date of follow-up), p, and cohort (date of birth), c, are related by a = p − c – any one person’s age is calculated by subtracting the date of birth from the current date. Hence the three variables used to describe rates are linearly related, and the model can therefore be parametrized in different ways, and still produce the same estimated rates. In popular terms you can say that it is possible to move two linear effects around between the three terms, because the age terms contain the linear effect of age, the period terms contain the linear effect of period and the cohort terms contain the linear effect of cohort. An illustration of this phenomenon is in this little “film” of APC effects on testis cancer rates in Denmark. All sets of estimates will yield the same set of fitted rates. CausalImpact,treatSens,MatchingFrontier 
Agent Based Model (ABM) 
An agent-based model (ABM) is a class of computational models for simulating the actions and interactions of autonomous agents (either individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole. It combines elements of game theory, complex systems, emergence, computational sociology, multi-agent systems, and evolutionary programming. Monte Carlo methods are used to introduce randomness. Particularly within ecology, ABMs are also called individual-based models (IBMs), and individuals within IBMs may be simpler than fully autonomous agents within ABMs. A review of recent literature on individual-based models, agent-based models, and multi-agent systems shows that ABMs are used in non-computing-related scientific domains including biology, ecology and social science. Agent-based modeling is related to, but distinct from, the concept of multi-agent systems or multi-agent simulation in that the goal of ABM is to search for explanatory insight into the collective behavior of agents obeying simple rules, typically in natural systems, rather than in designing agents or solving specific practical or engineering problems. CEC 
Agent-Based Computational Economics (ACE) 
Agent-based computational economics (ACE) is the area of computational economics that studies economic processes, including whole economies, as dynamic systems of interacting agents. As such, it falls in the paradigm of complex adaptive systems. In corresponding agent-based models, the ‘agents’ are ‘computational objects modeled as interacting according to rules’ over space and time, not real people. The rules are formulated to model behavior and social interactions based on incentives and information. Such rules could also be the result of optimization, realized through use of AI methods (such as Q-learning and other reinforcement learning techniques). The theoretical assumption of mathematical optimization by agents in equilibrium is replaced by the less restrictive postulate of agents with bounded rationality adapting to market forces. ACE models apply numerical methods of analysis to computer-based simulations of complex dynamic problems for which more conventional methods, such as theorem formulation, may not find ready use. Starting from initial conditions specified by the modeler, the computational economy evolves over time as its constituent agents repeatedly interact with each other, including learning from interactions. In these respects, ACE has been characterized as a bottom-up culture-dish approach to the study of economic systems. ACE has a similarity to, and overlap with, game theory as an agent-based method for modeling social interactions. But practitioners have also noted differences from standard methods, for example in ACE events modeled being driven solely by initial conditions, whether or not equilibria exist or are computationally tractable, and in the modeling facilitation of agent autonomy and learning. The method has benefited from continuing improvements in modeling techniques of computer science and increased computer capabilities. The ultimate scientific objective of the method is to ‘test theoretical findings against real-world data in ways that permit empirically supported theories to cumulate over time, with each researcher’s work building appropriately on the work that has gone before.’ The subject has been applied to research areas like asset pricing, competition and collaboration, transaction costs, market structure and industrial organization and dynamics, welfare economics, and mechanism design, information and uncertainty, macroeconomics, and Marxist economics. Book: Introduction to Agent-Based Economics 
Agent-Based Modeling and Simulation (ABMS) 
➘ “Agent Based Model” Agent-Based Modeling and Simulation rrepast 
Agglomerative Hierarchical Clustering (AHC) 
Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents. Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering or HAC. Top-down clustering requires a method for splitting a cluster. It proceeds by splitting clusters recursively until individual documents are reached. http://…tive_Hierarchical_Clustering_Overview.htm cents 
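A small bottom-up (agglomerative) clustering sketch using SciPy; the two Gaussian blobs are synthetic, and ‘average’ linkage is just one of several standard merge criteria.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two well-separated blobs of points in 2-D.
data = np.vstack([rng.normal(0, 0.3, (20, 2)),
                  rng.normal(3, 0.3, (20, 2))])

# Bottom-up clustering: start from singleton clusters and repeatedly
# merge the closest pair of clusters (average linkage).
Z = linkage(data, method="average")

# Cut the resulting dendrogram into two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```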
Agglomerative InfoClustering  An agglomerative clustering of random variables is proposed, where clusters of random variables sharing the maximum amount of multivariate mutual information are merged successively to form larger clusters. Compared to the previous infoclustering algorithms, the agglomerative approach allows the computation to stop earlier when clusters of desired size and accuracy are obtained. An efficient algorithm is also derived based on the submodularity of entropy and the duality between the principal sequence of partitions and the principal sequence for submodular functions. 
Aggregated Lexical Table (ALT) 
Correspondence Analysis on Generalised Aggregated Lexical Tables CGP 
Aggregated Wasserstein  We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time spot follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMMs (GMM-HMMs). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of the states. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and the difference between the two transition matrices. Our new distance is tested on the tasks of retrieval and classification of time series. Experiments on both synthetic data and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence. 
Aggregation Operator  Aggregation operators are mathematical functions that are used to combine information. That is, they are used to combine N data (for example, N numerical values) into a single datum. The arithmetic mean and the weighted mean are the most well-known aggregation operators. The median and the mode can also be seen as aggregation operators. The main difference between the arithmetic mean and the weighted mean is that the latter permits us to weight the different data according to their relevance. There exists a large number of different aggregation operators that differ in their assumptions about the data (data types) and about the type of information that we can incorporate in the model. For example, fuzzy integrals permit us to assign relevance to sets of information sources and not only to individual sources, as with the weighted mean. chopthin 
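For illustration, the snippet below aggregates four made-up scores with the plain mean, a weighted mean (the weights encode the assumed relevance of each source), and the median.

```python
import numpy as np

scores = np.array([4.0, 2.0, 5.0, 3.0])    # data from four sources (toy values)
weights = np.array([0.4, 0.1, 0.3, 0.2])   # assumed relevance of each source

print(np.mean(scores))                           # plain arithmetic mean
print(np.average(scores, weights=weights))       # weighted mean
print(np.median(scores))                         # median as an aggregation operator
```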
Agnostic Disambiguation of Named Entities Using Linked Open Data (AGDISTIS) 
AGDISTIS is an Open Source Named Entity Disambiguation Framework able to link entities against every Linked Data Knowledge Base. The ongoing transition from the current Web of unstructured data to the Data Web still requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework). One of the key steps towards extracting RDF from natural-language corpora is the disambiguation of named entities. AGDISTIS combines the HITS algorithm with label expansion strategies and string similarity measures. Based on this combination, it can efficiently detect the correct URIs for a given set of named entities within an input text. Furthermore, AGDISTIS is agnostic of the underlying knowledge base. AGDISTIS has been evaluated on different datasets against state-of-the-art named entity disambiguation frameworks. http://…/public.pdf clusterCrit 
AIDE  In this paper, we present two new communicationefficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach can be viewed as a robustification strategy since the method is substantially better behaved than DANE on data partition arising in practice. It is well known that DANE algorithm does not match the communication complexity lower bounds. To bridge this gap, we propose an accelerated variant of the first method, called AIDE, that not only matches the communication lower bounds but can also be implemented using a purely firstorder oracle. Our empirical results show that AIDE is superior to other communication efficient algorithms in settings that naturally arise in machine learning applications. 
AIXIjs  Reinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the subfield of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn’t explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropyseeking agents (Orseau, 2011), knowledgeseeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents. 
Akaike Information Criterion (AIC) 
The Akaike information criterion (AIC) is a measure of the relative quality of a statistical model for a given set of data. As such, AIC provides a means for model selection. AIC deals with the tradeoff between the goodness of fit of the model and the complexity of the model. It is founded on information theory: it offers a relative estimate of the information lost when a given model is used to represent the process that generates the data. AIC does not provide a test of a model in the sense of testing a null hypothesis; i.e. AIC can tell nothing about the quality of the model in an absolute sense. If all the candidate models fit poorly, AIC will not give any warning of that. cna 
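A small worked example of the definition AIC = 2k − 2 ln(L), assuming Gaussian least-squares fits so that the maximized log-likelihood can be written in terms of the residual sum of squares; the data and the two candidate polynomial models are synthetic.

```python
import numpy as np

def aic_gaussian(y, y_hat, k):
    """AIC = 2k - 2 ln(L) for a least-squares fit with Gaussian errors,
    where k counts the fitted parameters (including the error variance)."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return 2 * k - 2 * log_lik

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=50)   # truly linear data

for degree in (1, 5):
    y_hat = np.polyval(np.polyfit(x, y, degree), x)
    # degree + 1 polynomial coefficients plus the noise variance
    print(degree, aic_gaussian(y, y_hat, k=degree + 2))
# The lower AIC should favour the simpler, degree-1 model despite its slightly worse fit.
```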
Akid  Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production to be as small as possible. On the other hand, differing from traditional machine learning models, a neural network is not just yet another statistical model, but a model for the natural processing engine — the brain. In this work, we describe a neural network library named akid. It provides a higher level of abstraction for entities (abstracted as blocks) in nature upon the abstraction done on signals (abstracted as tensors) by Tensorflow, characterizing the dataism observation that all entities in nature process input and emit output in some way. It includes a full stack of software that provides abstraction to let researchers focus on research instead of implementation, while at the same time the developed program can also be put into production seamlessly in a distributed environment, and be production ready. At the top of the application stack, it provides out-of-the-box tools for neural network applications. Lower down, akid provides a programming paradigm that lets users easily build customized models. The distributed computing stack handles the concurrency and communication, thus letting models be trained or deployed on a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment from the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) can be separated. Please refer to http://…/latest for documentation. 
Akismet  Akismet or Automattic Kismet is a spam filtering service. It attempts to filter link spam from blog comments and spam TrackBack pings. The filter works by combining information about spam captured on all participating blogs, and then using those spam rules to block future spam. Akismet is offered by Automattic, the company behind WordPress.com. Launched on October 25, 2005, Akismet is said to have captured over 100 billion spam comments and pings as of October 2013. cometExactTest 
akka  Akka is a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM. CompareCausalNetworks 
ALAMO  ALAMO is a computational methodology for learning algebraic functions from data. Given a data set, the approach begins by building a low-complexity, linear model composed of explicit nonlinear transformations of the independent variables. Linear combinations of these nonlinear transformations allow a linear model to better approximate complex behavior observed in real processes. The model is refined, as additional data are obtained in an adaptive fashion, through error maximization sampling using derivative-free optimization. Models built using ALAMO can enforce constraints on the response variables to incorporate first-principles knowledge. The ability of ALAMO to generate simple and accurate models for a number of reaction problems is demonstrated. The error maximization sampling is compared with Latin hypercube designs to demonstrate its sampling efficiency. ALAMO’s constrained regression methodology is used to further refine concentration models, resulting in models that perform better on validation data and satisfy upper and lower bounds placed on model outputs. 
Algebraic Statistics  Algebraic statistics is the use of algebra to advance statistics. Algebra has been useful for experimental design, parameter estimation, and hypothesis testing. Traditionally, algebraic statistics has been associated with the design of experiments and multivariate analysis (especially time series). In recent years, the term “algebraic statistics” has been sometimes restricted, sometimes being used to label the use of algebraic geometry and commutative algebra in statistics. Compind 
Algebraic Subspace Clustering (ASC) 
Algebraic Subspace Clustering (ASC) is a simple and elegant method based on polynomial fitting and differentiation for clustering noiseless data drawn from an arbitrary union of subspaces. In practice, however, ASC is limited to equidimensional subspaces because the estimation of the subspace dimension via algebraic methods is sensitive to noise. This paper proposes a new ASC algorithm that can handle noisy data drawn from subspaces of arbitrary dimensions. The key ideas are (1) to construct, at each point, a decreasing sequence of subspaces containing the subspace passing through that point; (2) to use the distances from any other point to each subspace in the sequence to construct a subspace clustering affinity, which is superior to alternative affinities both in theory and in practice. Experiments on the Hopkins 155 dataset demonstrate the superiority of the proposed method with respect to sparse and low rank subspace clustering methods. compLasso 
Algorithm Quasi-Optimal Learning (AQ) 
The algorithm quasi-optimal (AQ) is a powerful machine learning methodology aimed at learning symbolic decision rules from a set of examples and counterexamples. It was first proposed in the late 1960s to solve the Boolean function satisfiability problem and further refined over the following decade to solve the general covering problem. In its newest implementations, it is a powerful but as yet little explored methodology for symbolic machine learning classification. It has been applied to solve several problems from different domains, including the generation of individuals within an evolutionary computation framework. The current article introduces the main concepts of the AQ methodology and describes AQ for source detection (AQ4SD), a tailored implementation of the AQ methodology to solve the problem of finding the sources of atmospheric releases using distributed sensor measurements. compositions 
Algorithmia  An Open Marketplace For Algorithms: We’re building a community around state-of-the-art algorithm development. Users can create, share, and build on other algorithms and then instantly make them available as a web service. http://…ldingwebsiteexplorer5easysteps.html conformal 
Algorithmic Complexity (AC) 
The information content or complexity of an object can be measured by the length of its shortest description. For instance the string “01010101010101010101010101010101” has the short description “16 repetitions of 01”, while “11001000011000011101111011101100” presumably has no simpler description other than writing down the string itself. More formally, the Algorithmic “Kolmogorov” Complexity (AC) of a string x is defined as the length of the shortest program that computes or outputs x , where the program is run on some fixed reference universal computer. conover.test 
Algorithmic Polynomial  The approximate degree of a Boolean function $f(x_{1},x_{2},\ldots,x_{n})$ is the minimum degree of a real polynomial that approximates $f$ pointwise within $1/3$. Upper bounds on approximate degree have a variety of applications in learning theory, differential privacy, and algorithm design in general. Nearly all known upper bounds on approximate degree arise in an existential manner from bounds on quantum query complexity. We develop a first-principles, classical approach to the polynomial approximation of Boolean functions. We use it to give the first constructive upper bounds on the approximate degree of several fundamental problems: $O\bigl(n^{\frac{3}{4}-\frac{1}{4(2^{k}-1)}}\bigr)$ for the $k$-element distinctness problem; $O(n^{1-\frac{1}{k+1}})$ for the $k$-subset sum problem; $O(n^{1-\frac{1}{k+1}})$ for any $k$-DNF or $k$-CNF formula; $O(n^{3/4})$ for the surjectivity problem. In all cases, we obtain explicit, closed-form approximating polynomials that are unrelated to the quantum arguments from previous work. Our first three results match the bounds from quantum query complexity. Our fourth result improves polynomially on the $\Theta(n)$ quantum query complexity of the problem and refutes the conjecture by several experts that surjectivity has approximate degree $\Omega(n)$. In particular, we exhibit the first natural problem with a polynomial gap between approximate degree and quantum query complexity. 
Algorithmic Transparency  What information is your black-box analytics system really relying on, and should it? copCAR 
Aligned Rank Transform (ART) 
Nonparametric data from multi-factor experiments arise often in human-computer interaction (HCI). Examples may include error counts, Likert responses, and preference tallies. But because multiple factors are involved, common nonparametric tests (e.g., Friedman) are inadequate, as they are unable to examine interaction effects. While some statistical techniques exist to handle such data, these techniques are not widely available and are complex. To address these concerns, we present the Aligned Rank Transform (ART) for nonparametric factorial data analysis in HCI. The ART relies on a preprocessing step that “aligns” data before applying averaged ranks, after which point common ANOVA procedures can be used, making the ART accessible to anyone familiar with the F-test. Unlike most articles on the ART, which only address two factors, we generalize the ART to N factors. We also provide ARTool and ARTweb, desktop and Web-based programs for aligning and ranking data. Our re-examination of some published HCI results exhibits advantages of the ART. cord 
ALINE Algorithm (ALINE) 
The ALINE algorithm (Kondrak, 2000) assigns a similarity score to pairs of phonetically transcribed words on the basis of the decomposition of phonemes into elementary phonetic features. The algorithm was originally designed to identify and align cognates in vocabularies of related languages. Nevertheless, thanks to its grounding in universal phonetic principles, the algorithm can be used for estimating the similarity of any pair of words. The principal component of ALINE is a function that calculates the similarity of two phonemes that are expressed in terms of about a dozen multi-valued phonetic features (Place, Manner, Voice, etc.). The phonetic features are assigned salience weights that express their relative importance. Feature values are encoded as floating-point numbers in the range [0,1]. For example, the feature Manner can take any of the following seven values: stop = 1.0, affricate = 0.9, fricative = 0.8, approximant = 0.6, high vowel = 0.4, mid vowel = 0.2, and low vowel = 0.0. The numerical values reflect the distances between vocal organs during speech production. The overall similarity score is the sum of individual similarity scores between pairs of phonemes in an optimal alignment of two words, which is computed by a dynamic programming algorithm (Wagner and Fischer, 1974). A constant insertion/deletion penalty is applied for each unaligned phoneme. Another constant penalty is set to reduce the relative importance of vowel as opposed to consonant phoneme matches. The similarity value is normalized by the length of the longer word. ALINE’s behavior is controlled by a number of parameters: the maximum phonemic score, the insertion/deletion penalty, the vowel penalty, and the feature salience weights. The parameters have default settings for the cognate matching task, but these settings can be optimized (tuned) on a development set that includes both positive and negative examples of similar words. Greg Kondrak: Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification. Coxnet 
Allan Variance (AVAR) 
The Allan variance (AVAR), also known as the two-sample variance, is a measure of frequency stability in clocks, oscillators and amplifiers. It is named after David W. Allan. The Allan variance is intended to estimate stability due to noise processes and not that of systematic errors or imperfections such as frequency drift or temperature effects. The Allan variance and Allan deviation describe frequency stability, i.e. the stability in frequency. There are also different adaptations or alterations of Allan variance, notably the modified Allan variance MAVAR or MVAR, the total variance, and the Hadamard variance. There also exist time stability variants such as time deviation TDEV or time variance TVAR. Allan variance and its variants have proven useful outside the scope of timekeeping and are a set of improved statistical tools to use whenever the noise processes are not unconditionally stable, thus a derivative exists. The general M-sample variance remains important since it allows dead time in measurements, and bias functions allow conversion into Allan variance values. Nevertheless, for most applications the special case of the two-sample, or ‘Allan variance’, with T = tau is of greatest interest. CP 
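A minimal sketch of the two-sample definition, \( \sigma_y^2(\tau) = \tfrac{1}{2}\langle (\bar{y}_{k+1} - \bar{y}_k)^2 \rangle \), applied to two synthetic noise processes. Real AVAR analyses sweep the averaging time tau and use overlapping estimators; both are omitted here for brevity.

```python
import numpy as np

def allan_variance(y):
    """Two-sample (non-overlapping) Allan variance of a sequence of
    fractional frequency averages y_k taken over equal intervals tau."""
    y = np.asarray(y, dtype=float)
    diffs = np.diff(y)
    return 0.5 * np.mean(diffs ** 2)

rng = np.random.default_rng(0)
white = rng.normal(size=10_000)              # white frequency noise
walk = np.cumsum(rng.normal(size=10_000))    # random-walk frequency noise

print(allan_variance(white))
print(allan_variance(walk))   # much larger: this noise process is not stable
```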
All-Pairs Testing  In computer science, all-pairs testing or pairwise testing is a combinatorial method of software testing that, for each pair of input parameters to a system (typically, a software algorithm), tests all possible discrete combinations of those parameters. Using carefully chosen test vectors, this can be done much faster than an exhaustive search of all combinations of all parameters, by “parallelizing” the tests of parameter pairs. cqrReg 
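The snippet below illustrates the combinatorics with a made-up configuration space: exhaustive testing needs the full Cartesian product, while pairwise testing only has to cover every parameter-value pair at least once. Constructing a minimal covering array from these pairs (not shown) is typically done with a greedy or heuristic algorithm.

```python
from itertools import combinations, product

# Three configuration parameters with their possible values (toy example).
params = {
    "browser": ["chrome", "firefox"],
    "os":      ["linux", "windows", "macos"],
    "locale":  ["en", "de"],
}

# Exhaustive testing needs 2 * 3 * 2 = 12 test cases; all-pairs testing only
# requires that every pair of parameter values appears in at least one case.
required_pairs = set()
for (p1, vals1), (p2, vals2) in combinations(params.items(), 2):
    for v1, v2 in product(vals1, vals2):
        required_pairs.add(((p1, v1), (p2, v2)))

print(len(list(product(*params.values()))))   # 12 exhaustive combinations
print(len(required_pairs))                    # 16 parameter-value pairs to cover
```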
Alluvial Diagram  Alluvial diagrams are a type of flow diagram originally developed to represent changes in network structure over time. In allusion to both their visual appearance and their emphasis on flow, alluvial diagrams are named after alluvial fans that are naturally formed by the soil deposited from streaming water. cqrReg,ggalluvial 
ALOJA  This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex runtime environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a testbed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA datasets and framework to improve the design and deployment of Big Data applications. 
alpha-Algorithm  The alpha-algorithm is an algorithm used in process mining, aimed at reconstructing causality from a set of sequences of events. It was first put forward by van der Aalst, Weijters and Maruster, and several extensions or modifications of it have since been presented. It constructs P/T nets with special properties (workflow nets) from event logs (as might be collected by an ERP system). Each transition in the net corresponds to an observed task. cqrReg,SpatPCA 
AlphaOutliers  Crossover,crossdes 
AlphaSutte Indicator  ➘ “Sutte Indicator” RcmdrPlugin.sutteForecastR,sutteForecastR 
Alternating Direction Method of Multipliers (ADMM) 
We present an efficient alternating direction method of multipliers (ADMM) algorithm for segmenting a multivariate nonstationary time series with structural breaks into stationary regions. We draw from recent work where the series is assumed to follow a vector autoregressive model within segments and a convex estimation procedure may be formulated using group fused lasso penalties. Our ADMM approach first splits the convex problem into a global quadratic program and a simple group lasso proximal update. We show that the global problem may be parallelized over rows of the time-dependent transition matrices and furthermore that each subproblem may be rewritten in a form identical to the log-likelihood of a Gaussian state space model. Consequently, we develop a Kalman smoothing algorithm to solve the global update in time linear in the length of the series. crrp 
Alternating Directions Dual Decomposition (AD3) 
We present AD3, a new algorithm for approximate maximum a posteriori (MAP) inference on factor graphs, based on the alternating directions method of multipliers. Like other dual decomposition algorithms, AD3 has a modular architecture, where local subproblems are solved independently, and their solutions are gathered to compute a global update. The key characteristic of AD3 is that each local subproblem has a quadratic regularizer, leading to faster convergence, both theoretically and in practice. We provide closed-form solutions for these AD3 subproblems for binary pairwise factors and factors imposing first-order logic constraints. For arbitrary factors (large or combinatorial), we introduce an active set method which requires only an oracle for computing a local MAP configuration, making AD3 applicable to a wide range of problems. Experiments on synthetic and real-world problems show that AD3 compares favorably with the state-of-the-art. csrplus 
Alternating Least Squares (ALS) 
In ALS you’re minimizing the entire loss function at once, but only twiddling half the parameters. That’s because the optimization has an easy algebraic solution – if half your parameters are fixed. So you fix half, recompute the other half, and repeat. There is no gradient in the optimization step since each optimization problem is convex and doesn’t need an approximate approach. But, each problem you’re solving is not the “real” optimization problem – you fixed half the parameters. http://…/netflix_aaim08%28submitted%29.pdf CUSUMdesign 
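A bare-bones sketch of the alternating scheme for a fully observed toy ratings matrix: with the item factors fixed, each user's factors solve a ridge-regression problem in closed form, and vice versa. The latent dimension, penalty and iteration count are arbitrary choices for illustration, and real recommender implementations handle missing entries.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.random((8, 6))                 # toy ratings matrix (users x items), fully observed
k, lam = 3, 0.1                        # latent dimension and ridge penalty

U = rng.normal(size=(8, k))            # user factors
V = rng.normal(size=(6, k))            # item factors

for _ in range(20):
    # Fix V, solve for all user factors at once: a ridge regression with a
    # closed-form solution, so no gradient steps are needed.
    U = np.linalg.solve(V.T @ V + lam * np.eye(k), V.T @ R.T).T
    # Fix U, solve for all item factors the same way.
    V = np.linalg.solve(U.T @ U + lam * np.eye(k), U.T @ R).T
    loss = (np.linalg.norm(R - U @ V.T) ** 2
            + lam * (np.linalg.norm(U) ** 2 + np.linalg.norm(V) ** 2))

print(round(loss, 4))   # the regularized loss decreases monotonically across sweeps
```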
Alternating Minimization Induced Active Set Algorithms (AMIAS) 
AMIAS 
Altham-Poisson Distribution  The multiplicative binomial model was introduced as a generalization of the binomial distribution for modelling correlated binomial data. This distribution has not been extensively explored and is revisited in the present study. Some properties of the multiplicative binomial distribution, such as expressions for the factorial moments and the information matrix, are investigated. The distribution is also extended to accommodate data arising from a Wadley’s problem setting, which is frequently encountered in dose-mortality studies and is one in which the number of organisms initially treated with a drug is unobserved. The Altham-Poisson distribution is introduced by modelling the unobserved initial number of organisms, as specified by Formula in the multiplicative binomial model, with a Poisson distribution, and its suitability for overdispersed data from a Wadley’s problem setting is explored. cvxbiclustr 
Amazon Machine Learning  Amazon Machine Learning is a new service that makes it easy for developers of all skill levels to use machine learning technology. Amazon Machine Learning provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology. Once your models are ready, Amazon Machine Learning makes it easy to get predictions for your application using simple APIs, without having to implement custom prediction generation code, or manage any infrastructure. Amazon Machine Learning is based on the same proven, highly scalable, ML technology used for years by Amazon’s internal data scientist community. The service uses powerful algorithms to create ML models by finding patterns in your existing data. Then, Amazon Machine Learning uses these models to process new data and generate predictions for your application. Amazon Machine Learning is highly scalable and can generate billions of predictions daily, and serve those predictions in realtime and at high throughput. With Amazon Machine Learning, there is no upfront hardware or software investment, and you pay as you go, so you can start small and scale as your application grows. https://…/introducingamazonmachinelearning https://…rningmakedatadrivendecisionsatscale EffectStars 
Ambient Diagnostics  People can usually sense trouble in a car from noises, vibrations, or smells. An experienced driver can even tell where the problem is. We call this kind of skill ‘Ambient Diagnostics’. Ambient Diagnostics is an emerging field that is aimed at detecting abnormalities from seemingly disconnected ambient data that we take for granted. For example, the human body is a rich ambient data source: temperature, pulses, gestures, sound, forces, moisture, and so on. Also, many electronic devices provide pervasive ambient data streams, such as mobile phones, surveillance cameras, satellite images, personal data assistants, and wireless networks. efflog 
Analysis of Covariance (ANCOVA) 
Covariance is a measure of how much two variables change together and how strong the relationship is between them. Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV), while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV). Therefore, when performing ANCOVA, the DV means are adjusted to what they would be if all groups were equal on the CV. FactoMineR 
Analysis of Means (ANOM) 
The analysis of means (ANOM) is a graphical procedure for comparing a collection of means, rates, or proportions to see if any of them differ significantly from the overall mean, rate, or proportion. The ANOM is a type of multiple comparison procedure. The results of the analysis are summarized in an ANOM decision chart. This chart is similar in appearance to a control chart. It has a centerline, located at the overall mean (rate, or proportion), and upper and lower decision limits. Group means (or rates, or proportions) are plotted on this chart, and if one falls beyond a decision limit then that group is said to be statistically different from the overall mean (rate or proportion). In many situations, you need to examine differences among more than two groups. When you are interested in comparing multiple group means, you can use the analysis of means (ANOM) as an alternative to the one-way analysis of variance F test. The ANOM provides a “confidence interval type of approach” that allows you to determine which, if any, of the c groups has a mean significantly different from the overall average of all the group means combined. Instead of looking at the lower and upper limits of a confidence interval, you will be studying which of the c group means are not contained in an interval formed between a lower decision line and an upper decision line. Any individual group mean not contained in this interval is deemed significantly higher than the overall average of all groups if it lies above the upper decision line. Similarly, any group mean that falls below the lower decision line is declared significantly lower than the overall group average. FactoMineR,BiplotGUI 
Analysis-of-Marginal-Tail-Means (ATM) 
This paper presents a novel method, called Analysis-of-marginal-Tail-Means (ATM), for parameter optimization over a large, discrete design space. The key advantage of ATM is that it offers effective and robust optimization performance for both smooth and rugged response surfaces, using only a small number of function evaluations. This method can therefore tackle a wide range of engineering problems, particularly in applications where the performance metric to optimize is ‘black-box’ and expensive to evaluate. The ATM framework unifies two parameter optimization methods in the literature: the Analysis-of-marginal-Means (AM) approach (Taguchi, 1986), and the Pick-the-Winner (PW) approach (Wu et al., 1990). In this paper, we show that by providing a continuum between AM and PW via the novel idea of marginal tail means, the proposed method offers a balance between three fundamental trade-offs. By adaptively tuning these trade-offs, ATM can then provide excellent optimization performance over a broad class of response surfaces using limited data. We illustrate the effectiveness of ATM using several numerical examples, and demonstrate how such a method can be used to solve two real-world engineering design problems. 
Analytic Hierarchy Process (AHP) 
The analytic hierarchy process (AHP) is a structured technique for organizing and analyzing complex decisions, based on mathematics and psychology. It was developed by Thomas L. Saaty in the 1970s and has been extensively studied and refined since then. It has particular application in group decision making, and is used around the world in a wide variety of decision situations, in fields such as government, business, industry, healthcare, shipbuilding and education. Rather than prescribing a ‘correct’ decision, the AHP helps decision makers find one that best suits their goal and their understanding of the problem. It provides a comprehensive and rational framework for structuring a decision problem, for representing and quantifying its elements, for relating those elements to overall goals, and for evaluating alternative solutions. Users of the AHP first decompose their decision problem into a hierarchy of more easily comprehended subproblems, each of which can be analyzed independently. The elements of the hierarchy can relate to any aspect of the decision problem—tangible or intangible, carefully measured or roughly estimated, well or poorly understood—anything at all that applies to the decision at hand. Once the hierarchy is built, the decision makers systematically evaluate its various elements by comparing them to each other two at a time, with respect to their impact on an element above them in the hierarchy. In making the comparisons, the decision makers can use concrete data about the elements, but they typically use their judgments about the elements’ relative meaning and importance. It is the essence of the AHP that human judgments, and not just the underlying information, can be used in performing the evaluations. The AHP converts these evaluations to numerical values that can be processed and compared over the entire range of the problem. A numerical weight or priority is derived for each element of the hierarchy, allowing diverse and often incommensurable elements to be compared to one another in a rational and consistent way. This capability distinguishes the AHP from other decision making techniques. In the final step of the process, numerical priorities are calculated for each of the decision alternatives. These numbers represent the alternatives’ relative ability to achieve the decision goal, so they allow a straightforward consideration of the various courses of action. FeaLect 
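A compact numerical sketch of the final aggregation step: given a made-up pairwise comparison matrix on Saaty's 1-9 scale, the priority weights are taken from its principal eigenvector, and the consistency index indicates how coherent the judgments are (in practice it is compared against a tabulated random index).

```python
import numpy as np

# Pairwise comparison matrix for three criteria (toy judgments):
# entry [i, j] says how much more important criterion i is than criterion j.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# Priorities come from the principal eigenvector, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
principal = np.argmax(eigvals.real)
weights = np.abs(eigvecs[:, principal].real)
weights /= weights.sum()
print(weights)

# Saaty's consistency index: 0 for perfectly consistent judgments.
n = A.shape[0]
ci = (eigvals.real[principal] - n) / (n - 1)
print(ci)
```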
Analytical Data Mart  Data Mart is a table or a collection of tables containing only the information which the analysts require to do their job. This data is pulled from multiple sources, processed in a uniform manner, documented and optimized. Data Marts frequently contain data aggregated on the customer level, such as the average number of transactions in last 6 months, number of cash loans drawn by the client during the last 12 months, etc. When the computed aggregate values are available, it is much easier to prepare reports. Data Marts are built only once, at the start of the analytical process, but they are cyclically and automatically updated, so as to contain all the relevant information pertaining to customers/products/transactions in a given period of time. ➘ “Analytics Data Store” GameTheory 
Analytics  Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight. hierband 
Analytics Data Store (ADS) 
The ADS (which can be a distributed data store on a Cloud somewhere) supports the data requirements of statistical analysis. Here the data is organised, structured, integrated and enriched to meet the ongoing and occasionally volatile needs of the statisticians and data scientists focusing on data mining. Data in the ADS can be accumulative or completely refreshed. It can have a short life span or have a significantly long lifetime. The ADS is the logistics centre for analytics data. Not only can it be used to provide data into the statistical analysis process, but it can also be used to provide persistent long term storage for analysis outcomes and scenarios, and for future analysis, hence the ability to ‘write back’. The data and information in the ADS may also be augmented with data derived from data stored in the data warehouse, it may also benefit from having its own dedicated Data Mart specifically designed for this purpose. Results of statistical analysis on the ADS data may also result in feedback being used to tune the data reduction, filtering and enrichment rules further downstream, either in smart data analytics, complex event and discrimination adapters or in ET(AL) job streams. hiertest 
AnchorNet  Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we propose a deep network, termed AnchorNet, that produces image representations that are well-suited for semantic matching. It relies on a set of filters whose response is geometrically consistent across different object instances, even in the presence of strong intra-class, scale, or viewpoint variations. Trained only with weak image-level labels, the final representation successfully captures information about the object structure and improves results of state-of-the-art semantic matching methods such as the deformable spatial pyramid or the proposal flow methods. We show positive results on the cross-instance matching task, where different instances of the same object category are matched, as well as on a new cross-category semantic matching task aligning pairs of instances each from a different object class. 
Ancillary Statistic  In statistics, an ancillary statistic is a statistic whose sampling distribution does not depend on the parameters of the model. An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals. This concept was introduced by the statistical geneticist Sir Ronald Fisher. https://… Lesson of the Day – Ancillary Statistics HighDimOut 
Angle-Based Outlier Detection (ABOD)
Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. All existing approaches, however, are based on an assessment of distances (sometimes indirectly by assuming certain distributions) in the full-dimensional Euclidean data space. In high-dimensional data, these approaches are bound to deteriorate due to the notorious ‘curse of dimensionality’. In this paper, we propose a novel approach named ABOD (Angle-Based Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points. This way, the effects of the ‘curse of dimensionality’ are alleviated compared to purely distance-based approaches. A main advantage of our new approach is that our method does not rely on any parameter selection influencing the quality of the achieved ranking. In a thorough experimental evaluation, we compare ABOD to the well-established distance-based method LOF for various artificial and a real-world data set and show ABOD to perform especially well on high-dimensional data. HMM
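The core of ABOD can be illustrated with a small, unoptimized sketch of the naive O(n^3) variant (the paper also proposes faster approximations such as FastABOD). The norm-weighted cosine follows the angle-based outlier factor described above; everything else (function names, the use of NumPy, the toy data) is illustrative.

```python
import numpy as np

def abod_scores(X):
    """Naive angle-based outlier factor: for each point, the variance over all
    pairs of other points of the norm-weighted cosine between difference vectors.
    A small variance means the point sees the rest of the data within a narrow
    angular range, suggesting it lies outside the bulk. O(n^3); illustration only."""
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        diffs = np.delete(X, i, axis=0) - X[i]
        terms = []
        for a in range(len(diffs)):
            for b in range(a + 1, len(diffs)):
                u, v = diffs[a], diffs[b]
                nu, nv = np.linalg.norm(u), np.linalg.norm(v)
                if nu == 0 or nv == 0:
                    continue
                terms.append(np.dot(u, v) / (nu ** 2 * nv ** 2))
        scores[i] = np.var(terms)
    return scores  # rank ascending: the smallest scores are the outlier candidates

X = np.vstack([np.random.default_rng(0).normal(size=(30, 5)), [[8, 8, 8, 8, 8]]])
print(abod_scores(X).argmin())  # typically 30, the index of the planted outlier
```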
Annabell  A cognitive neural architecture able to learn and communicate through natural language 
Annealed Generative Adversarial Networks  We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed ß-GAN, in corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space, combined with annealing, is key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, ß-GAN with a fixed annealing schedule was stable and did not suffer from mode collapse.
Annotation Chart  Annotation charts are interactive time series line charts that support annotations. kappalab 
Annotation Query Language (AQL) 
Annotation Query Language (AQL) is the language for developing text analytics extractors in the InfoSphere BigInsights Text Analytics system. An extractor is a program written in AQL that extracts structured information from unstructured or semistructured text. AQL is a declarative language. The syntax of AQL is similar to that of Structured Query Language (SQL), but with several important differences. lba 
Annoy  Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data. RcppAnnoy
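A minimal usage sketch of the Python bindings (assuming the `annoy` package is installed; the dimensionality, tree count and file name are arbitrary example choices, and the exact constructor signature can vary slightly between library versions):

```python
from annoy import AnnoyIndex
import random

f = 40                                   # vector dimensionality
index = AnnoyIndex(f, 'angular')         # 'angular' ~ cosine-style distance
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(f)])
index.build(10)                          # 10 trees: more trees -> better recall, bigger index
index.save('points.ann')                 # the saved file can be mmapped by many processes

query = AnnoyIndex(f, 'angular')
query.load('points.ann')                 # fast, memory-mapped load
print(query.get_nns_by_item(0, 10))      # 10 approximate nearest neighbours of item 0
```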
Anomaly Detection  In data mining, anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or finding errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions. multimark,dga 
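As a toy illustration (not a production detector), a simple univariate rule such as Tukey's IQR fence already captures the basic idea of flagging observations that do not conform to the expected pattern; the example values below are made up.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fence)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (x < lo) | (x > hi)

x = np.array([9.8, 10.1, 9.9, 10.0, 10.2, 25.0, 9.7])
print(iqr_outliers(x))   # only the 25.0 observation is flagged
```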
ANOVA Kernel Regression  High-order parametric models that include terms for feature interactions are applied to various data mining tasks, where ground truth depends on interactions of features. However, with sparse data, the high-dimensional parameters for feature interactions often face three issues: expensive computation, difficulty in parameter estimation, and lack of structure. Previous work has proposed approaches which can partially resolve the three issues. In particular, models with factorized parameters (e.g. Factorization Machines) and sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues but fail to address the third. Regarding unstructured parameters, constraints or complicated regularization terms are applied so that hierarchical structures can be imposed. However, these methods make the optimization problem more challenging. In this work, we propose Strongly Hierarchical Factorization Machines and ANOVA kernel regression, where all three issues can be addressed without making the optimization problem more difficult. Experimental results show the proposed models significantly outperform the state-of-the-art in two data mining tasks: cold-start user response time prediction and stock volatility prediction.
Anscombe’s Quartet  Anscombe’s quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. 
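The point is easy to verify numerically. The sketch below hard-codes two of the four published datasets (values as given by Anscombe, 1973) and shows that their summary statistics agree to two or three decimals even though their shapes differ completely.

```python
import numpy as np

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for name, y in [("I", y1), ("II", y2)]:
    slope, intercept = np.polyfit(x, y, 1)
    print(name, round(y.mean(), 2), round(y.var(ddof=1), 2),
          round(np.corrcoef(x, y)[0, 1], 3), round(slope, 2), round(intercept, 2))
# Both datasets give y mean ~7.50, y variance ~4.1, correlation ~0.816 and fitted
# line ~ y = 3.00 + 0.50x - yet dataset I is roughly linear while dataset II is a
# smooth curve; only a plot reveals the difference.
```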
Answer Set Programming – Reinforcement Learning (ASP(RL)) 
Nonstationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing nonstationary domains. 
Ant Colony Optimization (ACO) 
In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. This algorithm is a member of the ant colony algorithms family, within swarm intelligence methods, and it constitutes a family of metaheuristic optimizations. Initially proposed by Marco Dorigo in 1992 in his PhD thesis, the first algorithm aimed to search for an optimal path in a graph, based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since diversified to solve a wider class of numerical problems, and as a result, several problems have emerged, drawing on various aspects of the behavior of ants. nettools
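A compact Ant System-style sketch for the travelling salesman problem illustrates the mechanics: ants build tours probabilistically from pheromone (tau) and heuristic desirability (eta = 1/distance), pheromone evaporates, and good tours deposit more pheromone. Parameter names and defaults here are conventional illustrative choices, not Dorigo's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def aco_tsp(dist, n_ants=20, n_iter=100, alpha=1.0, beta=2.0, rho=0.5, Q=1.0):
    """Minimal Ant System for the TSP. dist is a symmetric NumPy distance matrix.
    alpha weights pheromone, beta weights heuristic attractiveness, rho is the
    evaporation rate, Q scales the pheromone deposit."""
    n = len(dist)
    tau = np.ones((n, n))                       # pheromone trails
    eta = 1.0 / (dist + np.eye(n))              # heuristic desirability (diagonal padded to avoid /0)
    best_tour, best_len = None, np.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.integers(n)]
            while len(tour) < n:
                i = tour[-1]
                mask = np.ones(n, bool); mask[tour] = False
                w = (tau[i] ** alpha) * (eta[i] ** beta) * mask
                tour.append(rng.choice(n, p=w / w.sum()))
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        tau *= (1 - rho)                        # evaporation
        for tour, length in tours:              # deposit proportional to tour quality
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a, b] += Q / length
                tau[b, a] += Q / length
    return best_tour, best_len

cities = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
dist = np.linalg.norm(cities[:, None] - cities[None, :], axis=-1)
print(aco_tsp(dist, n_iter=50))  # expected best length 4.0 (the square's perimeter)
```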
Anticipatory Analytics  In fact, the application of predictive analytics to intelligence often focuses on what is missing in the data or how certain observations diverge from the expected norm. Some experts insist on calling this process not predictive analytics but anticipatory analytics, in order to distinguish it from similar processes in other realms. ngspatial 
AOGParsing Operator  This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network. By interpretable models, we focus on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet-based detection systems. The AOGParsing operator aims to harness both the explainable rigor of top-down hierarchical and compositional grammar models and the discriminative power of bottom-up deep neural networks through end-to-end training. In detection, a bounding box is interpreted by the best parse tree derived from the AOG on-the-fly, which is treated as the extractive rationale generated for interpreting detection. In learning, we propose a folding-unfolding method to train the AOG and ConvNet end-to-end. In experiments, we build on top of the R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets with performance comparable to state-of-the-art methods.
Apache Accumulo  The Apache Accumulo sorted, distributed key/value store is based on Google’s BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. oaxaca,GeneralOaxaca
Apache Ambari  Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple. pcalg 
Apache Avro  Apache Avro is a data serialization system. 
Apache Beam  Apache Beam provides an advanced unified programming model, allowing you to implement batch and streaming data processing jobs that can run on any execution engine. Apache Beam is: • UNIFIED – Use a single programming model for both batch and streaming use cases. • PORTABLE – Execute pipelines on multiple execution environments, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. • EXTENSIBLE – Write and share new SDKs, IO connectors, and transformation libraries. 
Apache Bigtop  Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadooprelated projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc…) developed by a community with a focus on the system as a whole, rather than individual projects. In short we strive to be for Hadoop what Debian is to Linux. QFRM 
Apache Cassandra / Cassandra Query Language (CQL) 
Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra also places a high value on performance. In 2012, University of Toronto researchers studying NoSQL systems concluded that “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments.” QuACN 
Apache Chukwa  Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data. rbokeh 
Apache Clerezza  Clerezza allows to easily develop semantic web applications by providing tools to manipulate RDF data, create RESTful Web Services and Renderlets using ScalaServerPages. Contents are stored as triples based on W3C RDF specification. These triples are stored via Clerezza’s Smart Content Binding (SCB). SCB defines a technologyagnostic layer to access and modify triple stores. It provides a java implementation of the graph data model specified by W3C RDF and functionalities to operate on that data model. SCB offers a service interface to access multiple named graphs and it can use various providers to manage RDF graphs in a technology specific manner, e.g., using Jena or Sesame. It also provides for adaptors that allow an application to use various APIs (including the Jena api) to process RDF graphs. Furthermore, SCB offers a serialization and a parsing service to convert a graph into a certain representation (format) and vice versa. RcppAnnoy 
Apache CloudStack  Apache CloudStack is open source software designed to deploy and manage large networks of virtual machines, as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. CloudStack is used by a number of service providers to offer public cloud services, and by many companies to provide an onpremises (private) cloud offering, or as part of a hybrid cloud solution. CloudStack is a turnkey solution that includes the entire “stack” of features most organizations want with an IaaS cloud: compute orchestration, NetworkasaService, user and account management, a full and open native API, resource accounting, and a firstclass User Interface (UI). CloudStack currently supports the most popular hypervisors: VMware, KVM, XenServer and Xen Cloud Platform (XCP). Users can manage their cloud with an easy to use Web interface, command line tools, and / or a fullfeatured RESTful API. In addition, CloudStack provides an API that’s compatible with AWS EC2 and S3 for organizations that wish to deploy hybrid clouds. SACOBRA 
Apache Commons Mathematics Library  Commons Math is a library of lightweight, selfcontained mathematics and statistics components addressing the most common problems not available in the Java programming language or Commons Lang. commonsMath 
Apache Crunch  The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many userdefined functions simple to write, easy to test, and efficient to run. Running on top of Hadoop MapReduce and Apache Spark, the Apache Crunch library is a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce. The APIs are especially useful when processing data that does not fit naturally into relational model, such as time series, serialized object formats like protocol buffers or Avro records, and HBase rows and columns. For Scala users, there is the Scrunch API, which is built on top of the Java APIs and includes a REPL (readevalprint loop) for creating MapReduce pipelines. smacpod 
Apache Curator  New users of ZooKeeper are surprised to learn that a significant amount of connection management must be done manually. For example, when the ZooKeeper client connects to the ensemble it must negotiate a new session, etc. This takes some time. If you use a ZooKeeper client API before the connection process has completed, ZooKeeper will throw an exception. These types of exceptions are referred to as “recoverable” errors. Curator automatically handles connection management, greatly simplifying client code. Instead of directly using the ZooKeeper APIs you use Curator APIs that internally check for connection completion and wrap each ZooKeeper API in a retry loop. Curator uses a retry mechanism to handle recoverable errors and automatically retry operations. The method of retry is customizable. Curator comes bundled with several implementations (ExponentialBackoffRetry, etc.) or custom implementations can be written. smbinning 
Apache DataFu  Apache DataFu consists of two libraries: Apache DataFu Pig is a collection of useful user-defined functions for data analysis in Apache Pig. Apache DataFu Hourglass is a library for incrementally processing data using Apache Hadoop MapReduce. This library was inspired by the prevalence of sliding window computations over daily tracking data. Computations such as these typically happen at regular intervals (e.g. daily, weekly), and therefore the sliding nature of the computations means that much of the work is unnecessarily repeated. DataFu’s Hourglass was created to make these computations more efficient, yielding sometimes 50-95% reductions in computational resources. smoof
Apache Drill  Apache Drill is an open source, low latency SQL query engine for Hadoop and NoSQL. It is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google’s Dremel. snht 
Apache Falcon  Apache Falcon is a data processing and management solution for Hadoop designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon enables end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop clusters. stats 
Apache Flink  Apache Flink is an open source platform for scalable batch and stream data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs for creating applications that use the Flink engine: 1. DataSet API for static data embedded in Java, Scala, and Python, 2. DataStream API for unbounded streams embedded in Java and Scala, and 3. Table API with a SQLlike expression language embedded in Java and Scala. Flink also bundles libraries for domainspecific use cases: 1. Machine Learning library, and 2. Gelly, a graph processing API and library. You can integrate Flink easily with other wellknown open source systems both for data input and output as well as deployment. ➘ “Gelly” support.BWS 
Apache Flume  Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store VIFCP 
Apache Forrest  Apache Forrest software is a publishing framework that transforms input from various sources into a unified presentation in one or more output formats. The modular and extensible plugin architecture of Apache Forrest is based on Apache Cocoon and the relevant industry standards that separate presentation from content. Forrest can generate static documents, or be used as a dynamic server, or be deployed by its automated facility. WCE,joint.Cox 
Apache Giraph  Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. zoib 
Apache Hadoop  Apache Hadoop is an opensource software framework for storage and largescale processing of datasets on clusters of commodity hardware. Hadoop is an Apache toplevel project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0. The Apache Hadoop framework is composed of the following modules: • Hadoop Common – contains libraries and utilities needed by other Hadoop modules • Hadoop Distributed File System (HDFS) – a distributed filesystem that stores data on commodity machines, providing very high aggregate bandwidth across the cluster. • Hadoop YARN – a resourcemanagement platform responsible for managing compute resources in clusters and using them for scheduling of users’ applications. • Hadoop MapReduce – a programming model for large scale data processing. Hadoop is being regarded as one of the best platforms for storing and managing big data. It owes its success to its high data storage and processing scalability, low price/performance ratio, high performance, high availability, high schema flexibility, and its capability to handle all types of data. 
Apache Hama  The Apache Hama is an efficient and scalable generalpurpose BSP computing engine which can be used to speed up a large variety of computeintensive analytics applications. 
Apache HBase  Use Apache HBase software when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. HBase is an opensource, distributed, versioned, columnoriented store modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtablelike capabilities on top of Hadoop and HDFS. 
Apache Hive  The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides * tools to enable easy data extract/transform/load (ETL) * a mechanism to impose structure on a variety of data formats * access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM) * query execution via MapReduce Hive defines a simple SQLlike query language, called HiveQL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to be able to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the builtin capabilities of the language. HiveQL can also be extended with custom scalar functions (UDF’s), aggregations (UDAF’s), and table functions (UDTF’s). 
Apache Ignite  Apache Ignite InMemory Data Fabric is a highperformance, integrated and distributed inmemory platform for computing and transacting on largescale data sets in realtime, orders of magnitude faster than possible with traditional diskbased or flash technologies. Apache Ignite InMemory Data Fabric is designed to deliver uncompromised performance for a wide set of inmemory computing use cases from high performance computing, to the industry most advanced data grid, highly available service grid, and streaming. 
Apache Jena  Apache Jena provides a complete framework for building Semantic Web and Linked Data applications in Java, and provides: parsers for RDF/XML, Turtle and Ntriples; a Java programming API; a complete implementation of the SPARQL query language; a rulebased inference engine for RDFS and OWL entailments; TDB (a nonSQL persistent triple store); SDB (a persistent triples store built on a relational store) and Fuseki, an RDF server using web protocols. Jena complies with all relevant recommendations for RDF and related technologies from the W3C. 
Apache Kafka  Apache Kafka is an open-source message broker project developed by the Apache Software Foundation, written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. The design is heavily influenced by transaction logs.
Apache Lucene  The goal of Apache Lucene and Solr is to provide world class search capabilities. The Apache Lucene project develops opensource search software, including: • Lucene Core, our flagship subproject, provides Javabased indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities. • Solr is a high performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface. • Open Relevance Project is a subproject with the aim of collecting and distributing free materials for relevance testing and performance. • PyLucene is a Python port of the Core project. http://…/structureapachelucene 
Apache Lucy  The Apache Lucy search engine library provides fulltext search for dynamic programming languages. 
Apache Mahout  Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily in the areas of collaborative filtering, clustering and classification. Many of the implementations use the Apache Hadoop platform. Mahout also provides Java libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; the number of implemented algorithms has grown quickly, but various algorithms are still missing. 
Apache Object Oriented Data Technology (OODT) 
Metadata for middleware (and vice versa): • Transparent access to distributed resources • Data discovery and query optimization • Distributed processing and virtual archives But it’s not just for science! It’s also a software architecture: • Models for information representation • Solutions to knowledge capture problems • Unification of technology, data, and metadata 
Apache Oozie  Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java mapreduce, Streaming mapreduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts). 
Apache OpenNLP  Apache OpenNLP software supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
Apache Phoenix  Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the noSQL store enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL. Phoenix compiles queries and other statements into native noSQL store APIs rather than using MapReduce enabling the building of low latency applications on top of noSQL stores. 
Apache Pig  Pig is a highlevel platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMS systems. Pig Latin can be extended using UDF (User Defined Functions) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language. Pig was originally developed at Yahoo Research around 2006 for researchers to have an adhoc way of creating and executing mapreduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation. 
Apache REEF  REEF, (Retainable Evaluator Execution Framework), is our approach to simplify and unify the lower layers of big data systems on modern resource managers. For managers like Apache YARN, Apache Mesos, Google Omega, and Facebook Corona, REEF provides a centralized control plane abstraction that can be used to build a decentralized data plane for supporting big data systems. Special consideration is given to graph computation and machine learning applications, both of which require data retention on allocated resources to execute multiple passes over the data. More broadly, applications that run on YARN will have the need for a variety of dataprocessing tasks e.g., data shuffle, group communication, aggregation, checkpointing, and many more. Rather than reimplement these for each application, REEF aims to provide them in a library form, so that they can be reused by higherlevel applications and tuned for a specific domain problem e.g., Machine Learning. In that sense, our longterm vision is that REEF will mature into a Big Data Application Server, that will host a variety of tool kits and applications, on modern resource managers. 
Apache S4 (S4) 
S4 is a generalpurpose, distributed, scalable, faulttolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. S4 fills the gap between complex proprietary systems and batchoriented open source computing platforms. We aim to develop a high performance computing platform that hides the complexity inherent in parallel processing system from the application programmer. The core platform is written in Java. The implementation is modular and pluggable, and S4 applications can be easily and dynamically combined for creating more sophisticated stream processing systems. 
Apache SAMOA (SAMOA) 
Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. Apache SAMOA enables development of new ML algorithms without dealing with the complexity of the underlying stream processing engines (SPEs, such as Apache Storm and Apache S4). Apache SAMOA users can develop distributed streaming ML algorithms once and execute them in multiple SPEs.
Apache Samza  Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. 
Apache SINGA  SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feedforward models including convolutional neural networks (CNN), energy models like restricted Boltzmann machine (RBM), and recurrent neural networks (RNN). Many builtin layers are provided for users. SINGA architecture is sufficiently flexible to run synchronous, asynchronous and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning. 
Apache Spark (Spark) 
Apache Spark is an opensource data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop opensource community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the twostage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce, for certain applications. Spark provides primitives for inmemory cluster computing that allows user programs to load data into a cluster’s memory and query it repeatedly, making it well suited to machine learning algorithms. 
Apache Spatial Information System  Apache SIS provides data structures for geographic data and associated metadata along with methods to manipulate those data structures. The library is an implementation of GeoAPI interfaces and can be used for desktop or server applications. 
Apache Sqoop  Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. 
Apache Stanbol  Apache Stanbol provides a set of reusable components for semantic content management. Apache Stanbol’s intended use is to extend traditional content management systems with semantic services. Other feasible use cases include: direct usage from web applications (e.g. for tag extraction/suggestion; or text completion in search fields), ‘smart’ content workflows or email routing based on extracted entities, topics, etc Apache Stanbol – the Semantic Engine. In order to be used as a semantic engine via its services, all components offer their functionalities in terms of a RESTful web service API. Apache Stanbol’s main features are: • Content Enhancement: Services that add semantic information to ‘nonsemantic’ pieces of content. • Reasoning: Services that are able to retrieve additional semantic information about the content based on the semantic information retrieved via content enhancement. • Knowledge Models: Services that are used to define and manipulate the data models (e.g. ontologies) that are used to store the semantic information. • Persistence: Services that store (or cache) semantic information, i.e. enhanced content, entities, facts, and make it searchable. http://…/Apache_Stanbol 
Apache Storm (Storm) 
Storm is a distributed computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and his team at BackType, the project was open-sourced after being acquired by Twitter. It uses custom-created “spouts” and “bolts” to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on September 17, 2011. A Storm application is designed as a topology of interfaces which creates a “stream” of transformations. It provides functionality similar to a MapReduce job, with the exception that the topology will theoretically run indefinitely until it is manually terminated. In 2013, the Apache Software Foundation accepted Storm into its incubator program.
Apache Tajo  A big data warehouse system on Hadoop. Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities.
Apache Texen  Texen is a general purpose text generating utility. It is capable of producing almost any sort of text output. Driven by Ant, essentially an Ant Task, Texen uses a control template, an optional set of worker templates, and control context to govern the generated output. Although TexenTask can be used directly, it is usually subclassed to initialize your control context before generating any output. 
Apache Tez  Apache Tez is an extensible framework for building high performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third party data access applications developed for the broader Hadoop ecosystem. 
Apache Tika  The Apache Tika toolkit detects and extracts metadata and text content from various documents – from PPT to CSV to PDF – using existing parser libraries. Tika unifies these parsers under a single interface to allow you to easily parse over a thousand different file types. Tika is useful for search engine indexing, content analysis, translation, and much more. 
Apache Twill  Apache Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their application logic. Apache Twill allows you to use YARN’s distributed capabilities with a programming model that is similar to running threads. 
Apache UIMA  Unstructured Information Management Applications (UIMA) are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as worksfor or locatedat. UIMA is made of many things UIMA enables applications to be decomposed into components, for example ‘language identification’ => ‘language specific segmentation’ => ‘sentence boundary detection’ => ‘entity detection (person/place names etc.)’. Each component implements interfaces defined by the framework and provides selfdescribing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes. Apache UIMA is an Apachelicensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS , a standards organization). 
Apache Yarn (Yarn) 
We present the next generation of Hadoop compute platform known as YARN, which departs from its familiar, monolithic architecture. By separating resource management functions from the programming model, YARN delegates many schedulingrelated functions to perjob components. In this new context, MapReduce is just one of the applications running on top of YARN. This separation provides a great deal of flexibility in the choice of programming framework. Examples of alternative programming models that are becoming available on YARN are: Dryad, Giraph, Hoya, REEF, Spark, Storm and Tez. 
Apache Zeppelin  A webbased notebook that enables interactive data analytics. You can make beautiful datadriven, interactive and collaborative documents with SQL, Scala and more. 
Apache ZooKeeper  Apache ZooKeeper is an effort to develop and maintain an opensource server which enables highly reliable distributed coordination. 
Application Function Library (AFL) 
SAP HANA Library References: The SAP HANA Business Function Library (BFL) Reference describes the Business Function Library (BFL) delivered with SAP HANA. This application function library (AFL) contains prebuilt financial functions implemented in C++. The SAP HANA Predictive Analysis Library (PAL) Reference describes the Predictive Analysis Library (PAL) delivered with SAP HANA. This application function library (AFL) defines functions that can be called from within SAP HANA SQLScript procedures to perform analytic algorithms. 
Application Function Modeler (AFM) 
The Application Function Modeler 2.0 (AFM 2) is a graphical editor for complex data analysis pipelines in the HANA Studio. This tool is based on the HANA Data Scientist prototype developed at the HANA Platform Innovation Center in Potsdam, Germany. It is planned to be the next generation of the existing HANA Studio Application Function Modeler which was developed at the TIP CE&SP Algorithm Labs in Shanghai, China. The AFM 2 team consists of original and new developers from both locations. 
Applied Mathematics  Applied mathematics is a branch of mathematics that deals with mathematical methods that find use in science, engineering, business, computer science, and industry. Thus, ‘applied mathematics’ is a mathematical science with specialized knowledge. The term ‘applied mathematics’ also describes the professional specialty in which mathematicians work on practical problems by formulating and studying mathematical models. In the past, practical applications have motivated the development of mathematical theories, which then became the subject of study in pure mathematics where abstract concepts are studied for their own sake. The activity of applied mathematics is thus intimately connected with research in pure mathematics. 
Approximate Bayesian Computation (ABC) 
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all modelbased statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically wellfounded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences, e.g. in population genetics, ecology, epidemiology, and systems biology. 
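The simplest member of the class, rejection ABC, makes the idea concrete: draw parameters from the prior, run the simulator, and keep only the draws whose simulated summary statistic lands within a tolerance of the observed summary. The Gaussian toy model, uniform prior and tolerance below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed data: 100 draws whose mean we want to infer without writing down a likelihood.
observed = rng.normal(loc=3.0, scale=1.0, size=100)

def simulate(mu):
    """Simulator: the only access to the model is the ability to generate data."""
    return rng.normal(loc=mu, scale=1.0, size=100)

def abc_rejection(observed, n_draws=50_000, eps=0.05):
    """Rejection ABC: sample mu from the prior, simulate data, and keep mu whenever
    the summary statistic (here the sample mean) is within eps of the observed one."""
    s_obs = observed.mean()
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-10, 10)              # prior on mu
        if abs(simulate(mu).mean() - s_obs) < eps:
            accepted.append(mu)
    return np.array(accepted)

posterior = abc_rejection(observed)
print(len(posterior), posterior.mean(), posterior.std())  # approximate posterior for mu
```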
Approximate Leave-One-Out Cross Validation (ALOOCV)
We consider the parametric learning problem, where the objective of the learner is determined by a parametric loss function. Employing empirical risk minimization with possibly regularization, the inferred parameter vector will be biased toward the training samples. Such bias is measured by the cross validation procedure in practice, where the data set is partitioned into a training set used for training and a validation set, which is not used in training and is left to measure the out-of-sample performance. A classical cross validation strategy is the leave-one-out cross validation (LOOCV), where one sample is left out for validation and training is done on the rest of the samples that are presented to the learner, and this process is repeated on all of the samples. LOOCV is rarely used in practice due to the high computational complexity. In this paper, we first develop a computationally efficient approximate LOOCV (ALOOCV) and provide theoretical guarantees for its performance. Then we use ALOOCV to provide an optimization algorithm for finding the regularizer in the empirical risk minimization framework. In our numerical experiments, we illustrate the accuracy and efficiency of ALOOCV as well as our proposed framework for the optimization of the regularizer.
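The flavour of such approximations can be seen in a classical special case: for ridge regression (a linear smoother), the leave-one-out residuals are available in closed form from a single fit via the leverages of the hat matrix, so no refitting is needed. This is the textbook PRESS-style identity, not the ALOOCV estimator of the paper itself, which covers general regularized losses; all names and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 200, 10, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=0.5, size=n)

# Ridge fit on all data
A = X.T @ X + lam * np.eye(d)
beta = np.linalg.solve(A, X.T @ y)
resid = y - X @ beta

# Leverages h_ii of the ridge hat matrix H = X (X'X + lam I)^{-1} X'
H_diag = np.einsum('ij,ij->i', X @ np.linalg.inv(A), X)

# Closed-form leave-one-out residuals: e_i / (1 - h_ii); no refitting needed.
loo_mse_fast = np.mean((resid / (1 - H_diag)) ** 2)

# Brute-force LOOCV for comparison (n separate refits)
errs = []
for i in range(n):
    m = np.ones(n, bool); m[i] = False
    b = np.linalg.solve(X[m].T @ X[m] + lam * np.eye(d), X[m].T @ y[m])
    errs.append((y[i] - X[i] @ b) ** 2)
print(loo_mse_fast, np.mean(errs))   # the two agree up to floating-point error
```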
Approximate Message Passing (AMP) 
The standard linear regression (SLR) problem is to recover a vector $\mathbf{x}^0$ from noisy linear observations $\mathbf{y}=\mathbf{Ax}^0+\mathbf{w}$. The approximate message passing (AMP) algorithm recently proposed by Donoho, Maleki, and Montanari is a computationally efficient iterative approach to SLR that has a remarkable property: for large i.i.d. sub-Gaussian matrices $\mathbf{A}$, its per-iteration behavior is rigorously characterized by a scalar state evolution whose fixed points, when unique, are Bayes optimal. AMP, however, is fragile in that even small deviations from the i.i.d. sub-Gaussian model can cause the algorithm to diverge. This paper considers a ‘vector AMP’ (VAMP) algorithm and shows that VAMP has a rigorous scalar state evolution that holds under a much broader class of large random matrices $\mathbf{A}$: those that are right-rotationally invariant. After performing an initial singular value decomposition (SVD) of $\mathbf{A}$, the per-iteration complexity of VAMP can be made similar to that of AMP. In addition, the fixed points of VAMP’s state evolution are consistent with the replica prediction of the minimum mean-squared error recently derived by Tulino, Caire, Verdú, and Shamai. The effectiveness and state evolution predictions of VAMP are confirmed in numerical experiments.
Approximate Nearest Neighbors (ANN) 
In some applications it may be acceptable to retrieve a “good guess” of the nearest neighbor. In those cases, we can use an algorithm which doesn’t guarantee to return the actual nearest neighbor in every case, in return for improved speed or memory savings. Often such an algorithm will find the nearest neighbor in a majority of cases, but this depends strongly on the dataset being queried. Algorithms that support the approximate nearest neighbor search include localitysensitive hashing, best bin first and balanced boxdecomposition tree based search. 
Approximate Semantic Matching  Eventbased systems have loose coupling within space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, eventbased systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain eventbased systems. In order to address semantic coupling within eventbased systems, we propose vocabulary free subscriptions together with the use of approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and the consequences it implies for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauribased and distributional semanticsbased similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precisionrecall F1 score of 75.89% on average in all experiments with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach. Towards an Inexact Semantic Complex Event Processing Framework 
Apriori Algorithm  Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis. 
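A bare-bones sketch of the level-wise search (without the candidate-pruning refinements of the original algorithm) shows the core loop: start from frequent single items and repeatedly join and filter by minimum support. Function names and the toy baskets are illustrative.

```python
def apriori(transactions, min_support=0.5):
    """Bare-bones Apriori: return every itemset whose support (fraction of
    transactions containing it) is at least min_support."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    frequent = {s: sum(s <= t for t in transactions) / n for s in current}
    k = 2
    while current:
        # candidate generation: union pairs of frequent (k-1)-itemsets of size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = []
        for c in candidates:
            support = sum(c <= t for t in transactions) / n
            if support >= min_support:
                frequent[c] = support
                current.append(c)
        k += 1
    return frequent

baskets = [{'onions', 'potatoes', 'burger'},
           {'onions', 'potatoes', 'beer'},
           {'onions', 'burger'},
           {'potatoes', 'burger'}]
print(apriori(baskets, min_support=0.5))  # all frequent 1- and 2-itemsets with their support
```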
Arc Diagram  In graph drawing, an arc diagram is a style of graph drawing, in which the vertices of a graph are placed along a line in the Euclidean plane, with edges being drawn as semicircles in one of the two halfplanes bounded by the line, or as smooth curves formed by sequences of semicircles. In some cases, line segments of the line itself are also allowed as edges, as long as they connect only vertices that are consecutive along the line. 
ArcGIS  Esri’s ArcGIS is a geographic information system (GIS) for working with maps and geographic information. It is used for: creating and using maps; compiling geographic data; analyzing mapped information; sharing and discovering geographic information; using maps and geographic information in a range of applications; and managing geographic information in a database. The system provides an infrastructure for making maps and geographic information available throughout an organization, across a community, and openly on the Web. 
Archetypal Analysis  Archetypal analysis (Cutler and Breiman, 1994) has the aim to find a few, not necessarily observed, extremal observations (the archetypes) in a multivariate data set such that: 1. all the observations are approximated by convex combinations of the archetypes, and 2. all the archetypes are convex combinations of the observations. 
Arcsine Law  In probability theory, the arcsine laws are a collection of results for onedimensional random walks and Brownian motion (the Wiener process). The best known of these is attributed to Paul Lévy (1939). All three laws relate path properties of the Wiener process to the arcsine distribution. The Lévy Arcsine Law states that the proportion of time that the onedimensional Wiener process is positive follows an arcsine distribution. http://…/randomwalksandarcsinelaw 
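The first arcsine law is easy to check by simulation, using a simple ±1 random walk as a discrete stand-in for the Wiener process; the empirical distribution of the fraction of time spent positive should approximately track (2/π)·arcsin(√x).

```python
import numpy as np

rng = np.random.default_rng(3)
n_steps, n_walks = 500, 10_000

# Simulate +-1 random walks and record the fraction of time each walk spends positive.
steps = rng.choice([-1, 1], size=(n_walks, n_steps))
walks = steps.cumsum(axis=1)
frac_positive = (walks > 0).mean(axis=1)

# Under the arcsine law, P(fraction <= x) ~ (2/pi) * arcsin(sqrt(x)):
# fractions near 0 and 1 are the most likely, an even split near 1/2 the least likely.
for x in (0.1, 0.5, 0.9):
    empirical = (frac_positive <= x).mean()
    theoretical = 2 / np.pi * np.arcsin(np.sqrt(x))
    print(x, round(empirical, 3), round(theoretical, 3))
```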
Area under Curve (AUC) 
Sometimes, the ROC is used to generate a summary statistic. A common version is the area under the ROC curve, or “AUC” (“Area Under Curve”), or A’ (pronounced “a-prime”), or the “c-statistic”.
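AUC has an equivalent rank-based reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the normalized Mann-Whitney U statistic), which gives a simple way to compute it without tracing the ROC curve. The scores and labels below are made up for illustration.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the probability that a random positive is scored above a random
    negative (Mann-Whitney U / (n_pos * n_neg)), counting ties as one half."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0]))  # 5 of 6 pairs ranked correctly: 0.833...
```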
Argument Component Detection (ACD) 
Argument component detection (ACD) is an important subtask in argumentation mining. ACD aims at detecting and classifying different argument components in natural language texts. 
Argumentation Mining  The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people’s argumentation. 
artfima (ARTFIMA) 

Artificial Continuous Prediction Market (ACPM) 
We propose the Artificial Continuous Prediction Market (ACPM) as a means to predict a continuous real value, by integrating a range of data sources and aggregating the results of different machine learning (ML) algorithms. ACPM adapts the concept of the (physical) prediction market to address the prediction of real values instead of discrete events. Each ACPM participant has a data source, a ML algorithm and a local decisionmaking procedure that determines what to bid on what value. The contributions of ACPM are: (i) adaptation to changes in data quality by the use of learning in: (a) the market, which weights each market participant to adjust the influence of each on the market prediction and (b) the participants, which use a Qlearning based trading strategy to incorporate the market prediction into their subsequent predictions, (ii) resilience to a changing population of low and highperforming participants. We demonstrate the effectiveness of ACPM by application to an influenzalike illnesses data set, showing ACPM outperforms a range of wellknown regression models and is resilient to variation in data source quality. 
Artificial General Intelligence (AGI) 
Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can. It is a primary goal of artificial intelligence research and an important topic for science fiction writers and futurists. Artificial general intelligence is also referred to as “strong AI”, “full AI” or as the ability to perform “general intelligent action”. AGI is associated with traits such as consciousness, sentience, sapience, and selfawareness observed in living beings. 
Artificial Intelligence (AI) 
Artificial intelligence (AI) is the humanlike intelligence exhibited by machines or software. It is also an academic field of study. Major AI researchers and textbooks define the field as “the study and design of intelligent agents”, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1955, defines it as “the science and engineering of making intelligent machines”. 
AspectOriented Software Development (AOSD) 
In computing, Aspectoriented software development (AOSD) is an emerging software development technology that seeks new modularizations of software systems in order to isolate secondary or supporting functions from the main program’s business logic. AOSD allows multiple concerns to be expressed separately and automatically unified into working systems. AspectOriented Software Development focuses on the identification, specification and representation of crosscutting concerns and their modularization into separate functional units as well as their automated composition into a working system. 
ASReml  ASReml is a statistical software package for fitting linear mixed models using restricted maximum likelihood, a technique commonly used in plant and animal breeding and quantitative genetics, as well as other fields. It is notable for its ability to fit very large and complex data sets efficiently, due to its use of the average information algorithm and sparse matrix methods. It was originally developed by Arthur Gilmour. ASReml can be used on Windows and Linux, and as an add-on to S-PLUS and R. ASReml
Association Analysis  
Association for Uncertainty in Artificial Intelligence (AUAI) 
The Association for Uncertainty in Artificial Intelligence is a nonprofit organization focused on organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more generally, on promoting research in pursuit of advances in knowledge representation, learning and reasoning under uncertainty. Principles and applications developed within the UAI community have been at the forefront of research in Artificial Intelligence. The UAI community and annual meeting have been primary sources of advances in graphical models for representing and reasoning with uncertainty. 
Association Mapping  Association mapping (genetics), also known as “linkage disequilibrium mapping”, is a method of mapping quantitative trait loci (QTLs) that takes advantage of historic linkage disequilibrium to link phenotypes (observable characteristics) to genotypes (the genetic constitution of organisms). 
Association Rule Classification (ARC) 
arc 
Association Rule Learning  Association rule learning is a popular and well researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Rakesh Agrawal et al. introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule {onions, potatoes} → {burger} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements. In addition to the above example from market basket analysis, association rules are employed today in many application areas including Web usage mining, intrusion detection, continuous production, and bioinformatics. As opposed to sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.
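The interestingness measures mentioned above reduce to simple counting; the sketch below computes support, confidence, and lift for the onions-and-potatoes example on a made-up set of baskets.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    a = sum(antecedent <= t for t in transactions) / n          # P(antecedent)
    c = sum(consequent <= t for t in transactions) / n          # P(consequent)
    both = sum((antecedent | consequent) <= t for t in transactions) / n
    return {'support': both, 'confidence': both / a, 'lift': both / (a * c)}

baskets = [{'onions', 'potatoes', 'burger'},
           {'onions', 'potatoes', 'burger'},
           {'onions', 'potatoes'},
           {'milk', 'bread'}]
print(rule_metrics(baskets, frozenset({'onions', 'potatoes'}), frozenset({'burger'})))
# support 0.5, confidence 2/3, lift 1.33: buyers of onions and potatoes are more
# likely than average to also buy burgers.
```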
Association Rule Mining  
Associative Domain Adaptation  We propose associative domain adaptation, a novel technique for endtoend domain adaptation with neural networks, the task of inferring class labels for an unlabeled target domain based on the statistical properties of a labeled source domain. Our training scheme follows the paradigm that in order to effectively derive class labels for the target domain, a network should produce statistically domain invariant embeddings, while minimizing the classification error on the labeled source domain. We accomplish this by reinforcing associations between source and target data directly in embedding space. Our method can easily be added to any existing classification network with no structural and almost no computational overhead. We demonstrate the effectiveness of our approach on various benchmarks and achieve stateoftheart results across the board with a generic convolutional neural network architecture not specifically tuned to the respective tasks. Finally, we show that the proposed association loss produces embeddings that are more effective for domain adaptation compared to methods employing maximum mean discrepancy as a similarity measure in embedding space. 
Assumed Density Filtering Updating Beliefs on ActionValues (ADFQ) 
While offpolicy temporal difference methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation brings nonlinearity and inconsistent distributions over value function. In this paper, we introduce a new Bayesian approach to offpolicy TD methods using Assumed Density Filtering, called ADFQ, which updates beliefs on actionvalues (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs not only are used in exploration but they provide a natural regularization in the belief updates. We also present a connection between ADFQ and Qlearning. Our empirical results show the proposed ADFQ algorithms outperform comparing algorithms in several task domains. Moreover, our algorithms improve general drawbacks in BRL such as computational complexity, usage of uncertainty, and nonlinearity. 
Asymmetric Convolution Bidirectional Long Short-Term Memory Networks (AC-BLSTM)
Recently deep-learning models have been shown to be capable of achieving remarkable performance in sentence and document classification tasks. In this work, we propose a novel framework called AC-BLSTM for modeling sentences and documents, which combines the asymmetric convolution neural network (ACNN) with the Bidirectional Long Short-Term Memory network (BLSTM). Experiment results demonstrate that our model achieves state-of-the-art results on all six tasks, including sentiment analysis, question type classification, and subjectivity classification.
Asynchronous Advantage Actor-Critic (A3C) 

Asynchronous Distributed Gibbs (ADG) 
Gibbs sampling is a widely used Markov Chain Monte Carlo (MCMC) method for numerically approximating integrals of interest in Bayesian statistics and other mathematical sciences. It is widely believed that MCMC methods do not extend easily to parallel implementations, as their inherently sequential nature incurs a large synchronization cost. This means that new solutions are needed to bring Bayesian analysis fully into the era of large-scale computation. In this paper, we present a novel scheme – Asynchronous Distributed Gibbs (ADG) sampling – that allows us to perform MCMC in a parallel fashion with no synchronization or locking, avoiding the typical performance bottlenecks of parallel algorithms. Our method is especially attractive in settings, such as hierarchical random-effects modeling in which each observation has its own random effect, where the problem dimension grows with the sample size. We prove convergence under some basic regularity conditions, and discuss the proof for similar parallelization schemes for other iterative algorithms. We provide three examples that illustrate some of the algorithm’s properties with respect to scaling. Because our hardware resources are bounded, we have not yet found a limit to the algorithm’s scaling, and thus its true capabilities remain unknown. 
Asynchronous Parallel SAGA (ASAGA) 
We describe Asaga, an asynchronous parallel version of the incremental gradient algorithm Saga that enjoys fast linear convergence rates. We highlight a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently proposed ‘perturbed iterate’ framework that resolves it. We thereby prove that Asaga can obtain a theoretical linear speedup on multicore systems even without sparsity assumptions. We present results of an implementation on a 40-core architecture illustrating the practical speedup as well as the hardware overhead. 
Asynchronous Parallel Stochastic Coordinate Descent (ASYCD) 
An asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate (1/K) on general convex functions. Near-linear speedup on a multicore system can be expected if the number of processors is O(n^(1/2)) in unconstrained optimization and O(n^(1/4)) in the separable-constrained case, where n is the number of variables. 
Asynchronous Stochastic Variational Inference  Stochastic variational inference (SVI) employs stochastic optimization to scale up Bayesian computation to massive data. Since SVI is at its core a stochastic gradient-based algorithm, horizontal parallelism can be harnessed to allow larger scale inference. We propose a lock-free parallel implementation for SVI which allows distributed computations over multiple slaves in an asynchronous style. We show that our implementation leads to linear speedup while guaranteeing an asymptotic ergodic convergence rate $O(1/\sqrt{T})$ given that the number of slaves is bounded by $\sqrt{T}$ ($T$ is the total number of iterations). The implementation is done in a high-performance computing (HPC) environment using message passing interface (MPI) for python (MPI4py). The extensive empirical evaluation shows that our parallel SVI is lossless, performing comparably well to its counterpart serial SVI with linear speedup. 
Atomic Information Resource Data Model (AIR) 
Entities are simply formed by grouping Attributes together. One or more Attributes are shared between two or more Entities. According to the AtomicDB terminology, shared attributes are called bridge concepts; this is the equivalent of a relationship implemented with primary and foreign keys on two tables, but here we are completely free to mix and match them. For example, the attribute PartColor could be merged with another attribute from a different table in another relational database. 
Atomic Triangular Matrix  An atomic (upper or lower) triangular matrix is a special form of unitriangular matrix, where all of the off-diagonal entries are zero, except for the entries in a single column. Such a matrix is also called a Gauss matrix or a Gauss transformation matrix. 
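For concreteness (the notation below is illustrative, not from the source), a 4×4 atomic lower triangular matrix whose only nonzero off-diagonal entries lie in column 2 is $L_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & \ell_{32} & 1 & 0 \\ 0 & \ell_{42} & 0 & 1 \end{pmatrix}$, and its inverse is obtained simply by negating those off-diagonal entries: $L_2^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -\ell_{32} & 1 & 0 \\ 0 & -\ell_{42} & 0 & 1 \end{pmatrix}$. Such matrices arise as the elementary elimination steps in Gaussian elimination and LU factorization.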
Atrain Distributed System (ADS) 
A special type of distributed system called the ‘Atrain Distributed System’ (ADS), which is very suitable for processing big data using the heterogeneous data structure r-atrain or the homogeneous data structure r-train. A simple ‘Atrain Distributed System’ is called a uni-tier ADS. The ‘multi-tier Atrain Distributed System’ is an extension of the uni-tier ADS. The ADS is scalable up to any extent, as many times as required. Two new types of network topologies are defined for the ADS, called the ‘multi-horse cart’ topology and the ‘cycle’ topology, which can support an increasing volume of big data. Where the r-atrain and r-train data structures are introduced for the processing of big data, the data structures ‘heterogeneous data structure MA’ and ‘homogeneous data structure MT’ are introduced for the processing of big data, including temporal big data. Both MA and MT can be well implemented in a multi-tier ADS. We define the cyclic train and cyclic atrain, and then the doubly linked train/atrain. A method is proposed on how to implement solid matrices, n-dimensional arrays, n-dimensional larrays, etc. in computer memory using the data structures MT and MA. http://…/biswasIJCO1420142.pdf 
ATRank  A user can be represented as what he/she does along the history. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data itself due to limited human instinct. Recent works usually use RNN-based methods to give an overall embedding of a behavior sequence, which then could be exploited by the downstream applications. However, this can only preserve very limited information, or aggregated memories of a person. When a downstream application requires the modeled user features, it may lose the integrity of the specific highly correlated behavior of the user, and introduce noise derived from unrelated behaviors. This paper proposes an attention-based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Heterogeneous user behaviors are considered in our model in that we project all types of behaviors into multiple latent semantic spaces, where influence can be made among the behaviors via self-attention. Downstream applications then can use the user behavior vectors via vanilla attention. Experiments show that ATRank can achieve better performance and a faster training process. We further explore ATRank to use one unified model to predict different types of user behaviors at the same time, showing a comparable performance with the highly optimized individual models. 
Attention Economy  Attention economics is an approach to the management of information that treats human attention as a scarce commodity, and applies economic theory to solve various information management problems. 
Attentional Multi-agent Predictive Modeling (VAIN) 
Multi-agent predictive modeling is an essential step for understanding physical, social and team-play systems. Recently, Interaction Networks (INs) were proposed for the task of modeling multi-agent physical systems; however, INs scale with the number of interactions in the system (typically quadratic or higher order in the number of agents). In this paper we introduce VAIN, a novel attentional architecture for multi-agent predictive modeling that scales linearly with the number of agents. We show that VAIN is effective for multi-agent predictive modeling. Our method is evaluated on tasks from challenging multi-agent prediction domains: chess and soccer, and outperforms competing multi-agent approaches. 
Attentive Convolution Network (AttentiveConvNet) 
In NLP, convolution neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higher-level representation of a word by taking into account its local fixed-size context in input text $t^x$. In this work, we propose an attentive convolution network, AttentiveConvNet. It extends the context scope of the convolution operation, deriving higher-level features for a word not only from local context, but also from information extracted from non-local context by the attention mechanism commonly used in RNNs. This non-local context can come (i) from parts of the input text $t^x$ that are distant or (ii) from a second input text, the context text $t^y$. In an evaluation on sentence relation classification (textual entailment and answer sentence selection) and text classification, experiments demonstrate that AttentiveConvNet has state-of-the-art performance and outperforms RNN/CNN variants with and without attention. 
Attentive Memory Network  Recent advances in conversational systems have changed the search paradigm. Traditionally, a user poses a query to a search engine that returns an answer based on its index, possibly leveraging external knowledge bases and conditioning the response on earlier interactions in the search session. In a natural conversation, there is an additional source of information to take into account: utterances produced earlier in a conversation can also be referred to, and a conversational IR system has to keep track of information conveyed by the user during the conversation, even if it is implicit. We argue that the process of building a representation of the conversation can be framed as a machine reading task, where an automated system is presented with a number of statements about which it should answer questions. The questions should be answered solely by referring to the statements provided, without consulting external knowledge. The time is right for the information retrieval community to embrace this task, both as a standalone task and integrated in a broader conversational search setting. In this paper, we focus on machine reading as a standalone task and present the Attentive Memory Network (AMN), an end-to-end trainable machine reading algorithm. Its key contribution is in efficiency, achieved by having a hierarchical input encoder and iterating over the input only once. Speed is an important requirement in the setting of conversational search, as gaps between conversational turns have a detrimental effect on naturalness. On 20 datasets commonly used for evaluating machine reading algorithms we show that the AMN achieves performance comparable to the state-of-the-art models, while using considerably fewer computations. 
Attribution Modeling  Attribution modelling, in essence, means reporting on the impact of communication activity using metrics like:
• Turnover
• Profit
• Customer retention
• Volume of sales
Instead of metrics like:
• Share of voice
• Web visits
• Click-through rate (CTR)
• Impressions
There’s a big difference between these two lists. The second list contains important metrics, but businesses could survive without ever increasing them. The business metrics in the first list, however, are essential for all companies that want to survive and thrive. Understanding the impact of communications on business metrics is – rightly – more important to senior executives. This is the primary objective of attribution modelling: to provide holistic, accurate information about the financial return activities are delivering so you can refine them, adjust what you’re doing, and use the same budget to deliver more value to your business and your customers. 
Auer-Gervini Graphical Bayesian Approach  This article approaches the problem of selecting significant principal components from a Bayesian model selection perspective. The resulting Bayes rule provides a simple graphical technique that can be used instead of (or together with) the popular scree plot to determine the number of significant components to retain. We study the theoretical properties of the new method and show, by examples and simulation, that it provides more clear-cut answers than the scree plot in many interesting situations. PCDimension 
Augmented Backward Elimination (ABE) 
Augmented backward elimination combines significance or information based criteria with the change in estimate to either select the optimal model for prediction purposes or to serve as a tool to obtain a practically sound, highly interpretable model. More details can be found in Dunkler et al. (2014) <doi:10.1371/journal.pone.0113677>. abe 
Augmented Dickey-Fuller Test (ADF) 
In statistics and econometrics, an augmented Dickey-Fuller test (ADF) is a test for a unit root in a time series sample. It is an augmented version of the Dickey-Fuller test for a larger and more complicated set of time series models. The augmented Dickey-Fuller (ADF) statistic, used in the test, is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence. 
Augmented Intelligence  
Augmented Interval Markov Chains (AIMC) 
In this paper we propose augmented interval Markov chains (AIMCs): a generalisation of the familiar interval Markov chains (IMCs) where uncertain transition probabilities are in addition allowed to depend on one another. This new model preserves the flexibility afforded by IMCs for describing stochastic systems where the parameters are unclear, for example due to measurement error, but also allows us to specify transitions with probabilities known to be identical, thereby lending further expressivity. The focus of this paper is reachability in AIMCs. We study the qualitative, exact quantitative and approximate reachability problem, as well as natural subproblems thereof, and establish several upper and lower bounds for their complexity. We prove the exact reachability problem is at least as hard as the famous square-root sum problem, but, encouragingly, the approximate version lies in $\mathbf{NP}$ if the underlying graph is known, whilst the restriction of the exact problem to a constant number of uncertain edges is in $\mathbf{P}$. Finally, we show that uncertainty in the graph structure affects complexity by proving $\mathbf{NP}$-completeness for the qualitative subproblem, in contrast with an easily obtained upper bound of $\mathbf{P}$ for the same subproblem with known graph structure. 
Augmented Inverse Probability Weighting (AIPWT) 
In this paper, we discuss an estimator for average treatment effects (ATEs) known as the augmented inverse propensity weighted (AIPW) estimator. This estimator has attractive theoretical properties and only requires practitioners to do two things they are already comfortable with: (1) specify a binary regression model for the propensity score, and (2) specify a regression model for the outcome variable. Perhaps the most interesting property of this estimator is its so-called ‘double robustness.’ Put simply, the estimator remains consistent for the ATE if either the propensity score model or the outcome regression is misspecified but the other is properly specified. After explaining the AIPW estimator, we conduct a Monte Carlo experiment that compares the finite sample performance of the AIPW estimator to three common competitors: a regression estimator, an inverse propensity weighted (IPW) estimator, and a propensity score matching estimator. The Monte Carlo results show that the AIPW estimator has comparable or lower mean square error than the competing estimators when the propensity score and outcome models are both properly specified and, when one of the models is misspecified, the AIPW estimator is superior. ‘Robust-squared’ Imputation Models Using BART 
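A minimal sketch of the AIPW point estimate, assuming the practitioner has already fit a propensity model and an outcome model (the function and argument names below are illustrative, not from the paper):

import numpy as np

def aipw_ate(y, z, e_hat, mu1_hat, mu0_hat):
    """Augmented inverse probability weighted (doubly robust) ATE estimate.
    y: observed outcomes; z: binary treatment indicator (1 = treated);
    e_hat: estimated propensity scores P(Z=1 | X);
    mu1_hat, mu0_hat: outcome-model predictions E[Y | X, Z=1] and E[Y | X, Z=0]."""
    y, z = np.asarray(y, float), np.asarray(z, float)
    # Each arm combines the outcome-model prediction with an inverse-probability
    # correction; the bias vanishes if either model is correctly specified.
    treated = mu1_hat + z * (y - mu1_hat) / e_hat
    control = mu0_hat + (1 - z) * (y - mu0_hat) / (1 - e_hat)
    return np.mean(treated - control)

In practice e_hat would come from, e.g., a logistic regression of Z on covariates, and mu1_hat/mu0_hat from an outcome regression fit separately or with treatment as a regressor.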
Augmented k-means  Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, k-means is a very important algorithm for this. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the cluster belonging probabilities are used to control the subsequent re-estimation of cluster means. Observations which can’t be firmly identified into clusters are excluded from the re-estimation step. This can be valuable when the data exhibit many characteristics of real datasets such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by more accurately classifying observations into known clusters and/or converging in fewer iterations. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be available with this report. 
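The following is a rough sketch of that iteration using scikit-learn building blocks; the probability threshold, loop structure and other details are assumptions made for illustration, not the authors' exact algorithm:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def augmented_kmeans(X, k, n_iter=20, threshold=0.8, seed=0):
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    labels, centers = km.labels_, km.cluster_centers_.copy()
    for _ in range(n_iter):
        # Logistic regression predicts the current cluster labels ...
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        proba = clf.predict_proba(X)
        labels = clf.classes_[proba.argmax(axis=1)]
        # ... and only firmly identified observations update the cluster means.
        firm = proba.max(axis=1) >= threshold
        for j in range(k):
            members = firm & (labels == j)
            if members.any():
                centers[j] = X[members].mean(axis=0)
        # Standard k-means reassignment to the nearest updated center.
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return labels, centers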
Augmented Lagrangian Method (ALM) 
Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems; the difference is that the augmented Lagrangian method adds an additional term to the unconstrained objective. This additional term is designed to mimic a Lagrange multiplier. The augmented Lagrangian is not the same as the method of Lagrange multipliers. Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation). 
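A minimal sketch of the method on a toy equality-constrained problem, alternating an unconstrained minimization of the augmented Lagrangian with a first-order multiplier update (the penalty parameter, toy problem and names are illustrative assumptions):

import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize f(x) = x0^2 + x1^2 subject to c(x) = x0 + x1 - 1 = 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
c = lambda x: np.array([x[0] + x[1] - 1.0])

def augmented_lagrangian(f, c, x0, mu=10.0, n_outer=20, tol=1e-8):
    x, lam = np.asarray(x0, float), np.zeros(len(c(x0)))
    for _ in range(n_outer):
        # Augmented Lagrangian: objective + multiplier term + quadratic penalty.
        L = lambda x: f(x) + lam @ c(x) + 0.5 * mu * np.sum(c(x) ** 2)
        x = minimize(L, x, method="BFGS").x     # unconstrained subproblem
        lam = lam + mu * c(x)                   # first-order multiplier update
        if np.linalg.norm(c(x)) < tol:
            break
    return x, lam

x_opt, lam_opt = augmented_lagrangian(f, c, x0=[0.0, 0.0])
print(x_opt, lam_opt)   # x approaches [0.5, 0.5]; lam approaches the Lagrange multiplier -1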
Author Rank  Author Rank is a new aspect of Google’s search algorithm that will score online content creators. Similar to SEO rankings for sites and pages, authors will now have an associated ranking based on a few contributing factors, including, but not limited to:
• Social sharing of your Google+ posts
• Quality of backlinks to your content
• Interactions with your content (comments and shares)
• Timely and topical content
• Reputation and authority on other social networks
• PageRank
In short, Google will be assessing your reputation, authority, and the general reception of your content to determine just how valuable you are as an author. This ranking methodology provides writers with a greater incentive to not only build out their Google Plus profiles (smart move, Google), but also to ensure their online presence is streamlined and connected across all social networks and blogs to which they contribute. A user who is well-connected, well-informed, produces great content, and is seen by the larger community as valuable, will without question reap the rewards of a high author ranking. A Guide on Google’s Author Rank 
Auto Regressive TIme VArying (ARTIVA) 
Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions. To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provides a natural starting point to learn and investigate their dynamics in greater detail. 
Autocorrelation  Autocorrelation, also known as serial correlation or cross-autocorrelation, is the cross-correlation of a signal with itself at different points in time (that is what the cross stands for). Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals. 
AutoCorrelation  
AutoCorrelation Function (ACF) 
The autocorrelation function measures the correlation of a signal x(t) with itself shifted by some time delay tau. The autocorrelation function can be used to detect repeats or periodicity in a signal, e.g. to assess the effect of fluctuations (noise) on a periodic signal. 
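A simple sketch of the sample autocorrelation function; applied to a noisy sinusoid it shows peaks near multiples of the period (the example signal is invented for illustration):

import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of a 1-D signal at lags 0..max_lag."""
    x = np.asarray(x, float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:len(x) - k] * x[k:]) / denom for k in range(max_lag + 1)])

# Noisy periodic signal with period 50: the ACF peaks again near lag 50.
t = np.arange(500)
signal = np.sin(2 * np.pi * t / 50) + 0.5 * np.random.default_rng(0).normal(size=t.size)
print(np.round(acf(signal, 60), 2))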
Autocorrelogram  
Autodependogram  In this paper the serial dependences between the observed time series and the lagged series, taken into account one-by-one, are graphically analyzed by what we have chosen to call the ‘autodependogram’. This tool is a sort of natural nonlinear counterpart of the well-known autocorrelogram used in the linear context. The simple idea, instead of using autocorrelations at varying time lags, exploits the χ²-test statistics applied to convenient contingency tables. The usefulness of this graphical device is confirmed by simulations from certain classes of well-known models, characterized by randomness and also by different kinds of linear and nonlinear dependences. The autodependogram is also applied to both environmental and economic real data. In this way its ability to detect nonlinear features is highlighted. http://…/00463531472f604d52000000.pdf 
Autoencoded Variational Inference For Topic Model (AVITM) 
Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address this problem is autoencoding variational Bayes (AEVB), but it has proven difficult to apply to topic models in practice. We present what is to our knowledge the first effective AEVB based inference method for latent Dirichlet allocation (LDA), which we call Autoencoded Variational Inference For Topic Model (AVITM). This model tackles the problems caused for AEVB by the Dirichlet prior and by component collapsing. We find that AVITM matches traditional methods in accuracy with much better inference time. Indeed, because of the inference network, we find that it is unnecessary to pay the computational cost of running variational optimization on test data. Because AVITM is black box, it is readily applied to new topic models. As a dramatic illustration of this, we present a new topic model called ProdLDA, that replaces the mixture model in LDA with a product of experts. By changing only one line of code from LDA, we find that ProdLDA yields much more interpretable topics, even if LDA is trained via collapsed Gibbs sampling. 
Autoencoder  An autoencoder, auto-associator or Diabolo network is an artificial neural network used for learning efficient codings. The aim of an autoencoder is to learn a compressed, distributed representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. 
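A minimal sketch of a fully connected autoencoder in PyTorch, trained to reconstruct its input through a low-dimensional bottleneck; the layer sizes and the random stand-in batch are illustrative assumptions:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        # Encoder compresses the input to a 32-dimensional code ...
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        # ... and the decoder reconstructs the input from that code.
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                           # stand-in batch; real data would go here
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)    # reconstruction error
    loss.backward()
    opt.step()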
AutoEncoder Feature Selector (AEFS) 
High-dimensional data in many areas such as computer vision and machine learning brings in computational and analytical difficulty. Feature selection, which selects a subset of features from the original ones, has been proven to be effective and efficient for dealing with high-dimensional data. In this paper, we propose a novel AutoEncoder Feature Selector (AEFS) for unsupervised feature selection. AEFS is based on the autoencoder and the group lasso regularization. Compared to traditional feature selection methods, AEFS can select the most important features in spite of nonlinear and complex correlation among features. It can be viewed as a nonlinear extension of the linear method regularized self-representation (RSR) for unsupervised feature selection. In order to deal with noise and corruption, we also propose robust AEFS. An efficient iterative algorithm is designed for model optimization and experimental results verify the effectiveness and superiority of the proposed method. 
AutoEncoding Variational Bayes  How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is twofold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results. GitXiv 
Automated Lyric Annotation (ALA) 
Comprehending lyrics, as found in songs and poems, can pose a challenge to human and machine readers alike. This motivates the need for systems that can understand the ambiguity and jargon found in such creative texts, and provide commentary to aid readers in reaching the correct interpretation. We introduce the task of automated lyric annotation (ALA). Like text simplification, a goal of ALA is to rephrase the original text in a more easily understandable manner. However, in ALA the system must often include additional information to clarify niche terminology and abstract concepts. To stimulate research on this task, we release a large collection of crowdsourced annotations for song lyrics. We analyze the performance of translation and retrieval models on this task, measuring performance with both automated and human evaluation. We find that each model captures a unique type of information important to the task. 
Automated Machine Learning (AutoML) 
The Current State of Automated Machine Learning 
Automated Model Order Selection Algorithm (AMOS) 
One of the longstanding problems in spectral graph clustering (SGC) is the so-called model order selection problem: automated selection of the correct number of clusters. This is equivalent to the problem of finding the number of connected components or communities in an undirected graph. In this paper, we propose AMOS, an automated model order selection algorithm for SGC. Based on a recent analysis of clustering reliability for SGC under the random interconnection model, AMOS works by incrementally increasing the number of clusters, estimating the quality of identified clusters, and providing a series of clustering reliability tests. Consequently, AMOS outputs clusters of minimal model order with statistical clustering reliability guarantees. Compared to three other automated graph clustering methods on real-world datasets, AMOS shows superior performance in terms of multiple external and internal clustering metrics. 
Automated Predictive Library (APL) 
Automated Predictive Library (APL) is a HANA Application Function Library (AFL) which is meant to expose the features of InfiniteInsight (aka Kxen) inside HANA. 
Automatic Interactive Data Exploration (AIDE) 
In this paper, we argue that database systems be augmented with an automated data exploration service that methodically steers users through the data in a meaningful way. Such an automated system is crucial for deriving insights from complex datasets found in many big data applications such as scientific and healthcare applications as well as for reducing the human effort of data exploration. Towards this end, we present AIDE, an Automatic Interactive Data Exploration framework that assists users in discovering new interesting data patterns and eliminates expensive ad-hoc exploratory queries. AIDE relies on a seamless integration of classification algorithms and data management optimization techniques that collectively strive to accurately learn the user interests based on his relevance feedback on strategically collected samples. We present a number of exploration techniques as well as optimizations that minimize the number of samples presented to the user while offering interactive performance. AIDE can deliver highly accurate query predictions for very common conjunctive queries with small user effort while, given a reasonable number of samples, it can predict with high accuracy complex disjunctive queries. It provides interactive performance as it limits the user wait time per iteration of exploration to less than a few seconds. 
Automatic Machine Learning (AutoML) 
The availability of large data sets and computational resources has encouraged the development of machine learning and data-driven models which pose an interesting alternative to explicit and fully structured models of behaviour. A battery of tools is now available which can automatically learn interesting mappings and simulate complex phenomena. These include techniques such as Hidden Markov Models, Neural Networks, Support Vector Machines, Mixture Models, Decision Trees and Bayesian Network Inference. In general, these tools have fallen into two classes: discriminative and generative models. The first attempts to optimize the learning for a particular task while the second models a phenomenon in its entirety. This difference between the two approaches will be addressed in this thesis in particular detail and the above probabilistic formalisms will be employed in deriving a machine learning system for our purposes. ChaLearn Automatic Machine Learning Challenge 
Automatic Multiscale-Based Peak Detection (AMPD) 
A method for automatic detection of peaks in noisy periodic and quasi-periodic signals. This method, called automatic multiscale-based peak detection (AMPD), is based on the calculation and analysis of the local maxima scalogram, a matrix comprising the scale-dependent occurrences of local maxima. ampd 
Automatic Query Expansion (AQE) 
Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations. In the context of search engines, query expansion involves evaluating a user’s input (what words were typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as:
• Finding synonyms of words, and searching for the synonyms as well
• Finding all the various morphological forms of words by stemming each word in the search query
• Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
• Re-weighting the terms in the original query
Query expansion is a methodology studied in the field of computer science, particularly within the realm of natural language processing and information retrieval. 
Automatic Short Answer Grading (ASAG) 
Automatic short answer grading (ASAG) techniques are designed to automatically assess short answers to questions in natural language, having a length of a few words to a few sentences. Supervised ASAG techniques have been demonstrated to be effective but suffer from a couple of key practical limitations. They are greatly reliant on instructor-provided model answers and need labeled training data in the form of graded student answers for every assessment task. To overcome these, in this paper, we introduce an ASAG technique with two novel features. First, we propose an iterative technique on an ensemble of (a) a text classifier of student answers and (b) a classifier using numeric features derived from various similarity measures with respect to model answers. Second, we employ canonical correlation analysis based transfer learning on a common feature representation to build the classifier ensemble for questions having no labelled data. The proposed technique handsomely beats all winning supervised entries on the SCIENTSBANK dataset from the Student Response Analysis task of SemEval 2013. Additionally, we demonstrate generalizability and benefits of the proposed technique through evaluation on multiple ASAG datasets from different subject topics and standards. 
Automatic Speech Recognition (ASR) 
In computer science and electrical engineering, speech recognition (SR) is the translation of spoken words into text. It is also known as ‘automatic speech recognition’ (ASR), ‘computer speech recognition’, or just ‘speech to text’ (STT). Some SR systems use ‘training’ (also called ‘enrolment’) where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person’s specific voice and uses it to fine-tune the recognition of that person’s speech, resulting in increased accuracy. Systems that do not use training are called ‘speaker independent’ systems. Systems that use training are called ‘speaker dependent’. Speech recognition applications include voice user interfaces such as voice dialling (e.g. ‘Call home’), call routing (e.g. ‘I would like to make a collect call’), domotic appliance control, search (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input). The term voice recognition or speaker identification refers to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person’s voice or it can be used to authenticate or verify the identity of a speaker as part of a security process. From the technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in deep learning and big data. The advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the worldwide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems. These speech industry players include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance, IflyTek (China), many of which have publicized the core technology in their speech recognition systems being based on deep learning. GitXiv 
Automatic Summarization  Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. As the problem of information overload has grown, and as the quantity of data has increased, so has interest in automatic summarization. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. An example of the use of summarization technology is search engines such as Google. Document summarization is another. Generally, there are two approaches to automatic summarization: extraction and abstraction. Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary. In contrast, abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might generate. Such a summary might contain words not explicitly present in the original. Research into abstractive methods is an increasingly important and active research area, however due to complexity constraints, research to date has focused primarily on extractive methods. 
Autoregressive Conditional Duration (ACD) 
In financial econometrics, an autoregressive conditional duration (ACD, Engle and Russell (1998)) model considers irregularly spaced and autocorrelated inter-trade durations. ACD is analogous to GARCH. Indeed, in a continuous double auction (a common trading mechanism in many financial markets) waiting times between two consecutive trades vary at random. ➘ “Generalized Autoregressive Conditional Heteroscedasticity” 
Autoregressive Conditional Heteroskedasticity (ARCH) 
In econometrics, autoregressive conditional heteroskedasticity (ARCH) models are used to characterize and model observed time series. They are used whenever there is reason to believe that, at any point in a series, the error terms will have a characteristic size or variance. In particular ARCH models assume the variance of the current error term or innovation to be a function of the actual sizes of the previous time periods’ error terms: often the variance is related to the squares of the previous innovations. Such models are often called ARCH models (Engle, 1982), although a variety of other acronyms are applied to particular structures of model which have a similar basis. ARCH models are employed commonly in modeling financial time series that exhibit time-varying volatility clustering, i.e. periods of swings followed by periods of relative calm. ARCH-type models are sometimes considered to be part of the family of stochastic volatility models but strictly this is incorrect since at time t the volatility is completely predetermined (deterministic) given previous values. 
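A short simulation sketch of an ARCH(1) process, $\epsilon_t = \sigma_t z_t$ with $\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2$, which reproduces the volatility clustering described above (the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
omega, alpha, n = 0.2, 0.7, 1000
eps = np.zeros(n)
sigma2 = np.full(n, omega / (1 - alpha))     # start at the unconditional variance
for t in range(1, n):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2   # today's variance depends on yesterday's shock
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
# eps now shows quiet stretches interrupted by bursts of large swings.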
Autoregressive Conditional Poisson (ACP) 
When modelling count data that comes in the form of a time series, the static Poisson regression and standard time series models are often not appropriate. A current study therefore involves the evaluation of several observation-driven and parameter-driven time series models for count data. In the observation-driven class of models, a fairly simple model is the Autoregressive Conditional Poisson (ACP) model. 
Autoregressive Distributed Lag Model (ARDL) 
A distributed-lag model is a dynamic model in which the effect of a regressor x on y occurs over time rather than all at once. In the simple case of one explanatory variable and a linear relationship, the model takes the form shown below. This form is very similar to the infinite moving-average representation of an ARMA process, except that the lag polynomial on the right-hand side is applied to the explanatory variable x rather than to a white-noise process e. The individual coefficients β_s are called lag weights and they collectively comprise the lag distribution. They define the pattern of how x affects y over time. 
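In that simple case the distributed-lag model can be written as $y_t = \alpha + \sum_{s=0}^{\infty} \beta_s x_{t-s} + \varepsilon_t$, and its autoregressive extension ARDL(p, q), which adds lags of the dependent variable as regressors, as $y_t = \alpha + \sum_{i=1}^{p} \varphi_i y_{t-i} + \sum_{j=0}^{q} \beta_j x_{t-j} + \varepsilon_t$ (notation chosen here for illustration).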
Autoregressive Fractionally Integrated Moving Average (ARFIMA, FARIMA) 
In statistics, autoregressive fractionally integrated moving average models are time series models that generalize ARIMA (autoregressive integrated moving average) models by allowing non-integer values of the differencing parameter. These models are useful in modeling time series with long memory – that is, in which deviations from the long-run mean decay more slowly than an exponential decay. The acronyms “ARFIMA” or “FARIMA” are often used, although it is also conventional to simply extend the “ARIMA(p,d,q)” notation for models, by simply allowing the order of differencing, d, to take fractional values. 
Autoregressive Integrated Moving Average (ARIMA) 
In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). They are applied in some cases where data show evidence of non-stationarity, where an initial differencing step (corresponding to the “integrated” part of the model) can be applied to remove the non-stationarity. The model is generally referred to as an ARIMA(p,d,q) model where parameters p, d, and q are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model respectively. ARIMA models form an important part of the Box-Jenkins approach to time-series modelling. When one of the three terms is zero, it is usual to drop “AR”, “I” or “MA” from the acronym describing the model. For example, ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1). 
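As a small usage sketch (assuming the statsmodels package is available), an ARIMA(1,1,1) model fitted to a synthetic nonstationary series; the data here are random and purely illustrative:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))      # random-walk-like series: differencing (d=1) is appropriate

model = ARIMA(y, order=(1, 1, 1))        # (p, d, q)
result = model.fit()
print(result.summary())
print(result.forecast(steps=10))         # 10-step-ahead forecast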
Autoregressive Integrated Moving Average With Explanatory Variable (ARIMAX) 
Matlab: ARIMAX Model Specifications ARIMA vs. ARIMAX – which approach is better to analyze and forecast macroeconomic time series? 
Auto-Sklearn  auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning and ensemble construction. Learn more about the technology behind auto-sklearn by reading this paper published at NIPS 2015. 
Auxiliary Inference Divergence Estimator (AIDE) 
Approximate probabilistic inference algorithms are central to many fields. Examples include sequential Monte Carlo inference in robotics, variational inference in machine learning, and Markov chain Monte Carlo inference in statistics. A key problem faced by practitioners is measuring the accuracy of an approximate inference algorithm on a specific dataset. This paper introduces the auxiliary inference divergence estimator (AIDE), an algorithm for measuring the accuracy of approximate inference algorithms. AIDE is based on the observation that inference algorithms can be treated as probabilistic models and the random variables used within the inference algorithm can be viewed as auxiliary variables. This view leads to a new estimator for the symmetric KL divergence between the output distributions of two inference algorithms. The paper illustrates application of AIDE to algorithms for inference in regression, hidden Markov, and Dirichlet process mixture models. The experiments show that AIDE captures the qualitative behavior of a broad class of inference algorithms and can detect failure modes of inference algorithms that are missed by standard heuristics. 
Average Hazard Ratio (AHR) 
➘ “Hazard Ratio” Estimation of the Average Hazard Ratio 
Average Nearest Neighbor Degree (ANND) 

Average Nearest Neighbor Rank (ANNR) 
The average nearest neighbor degree (ANND) of a node of degree $k$, as a function of $k$, is often used to characterize dependencies between degrees of a node and its neighbors in a network. We study the limiting behavior of the ANND in undirected random graphs with general i.i.d. degree sequences and arbitrary joint degree distribution of neighbor nodes, when the graph size tends to infinity. When the degree distribution has finite variance, the ANND converges to a deterministic function and we prove that for the configuration model, where nodes are connected at random, this, naturally, is a constant. For degree distributions with infinite variance, the ANND in the configuration model scales with the size of the graph and we prove a central limit theorem that characterizes this behavior. As a result, the ANND is uninformative for graphs with infinite variance degree distributions. We propose an alternative measure, the average nearest neighbor rank (ANNR) and prove its convergence to a deterministic function whenever the degree distribution has finite mean. In addition to our theoretical results we provide numerical experiments to show the convergence of both functions in the configuration model and the erased configuration model, where self-loops and multiple edges are removed. These experiments also shed new light on the well-known ‘structural negative correlations’, or ‘finite-size effects’, that arise in simple graphs, because large nodes can only have a limited number of large neighbors. In particular we show that the majority of such effects for regularly varying distributions are due to a sampling bias. ➘ “Average Nearest Neighbor Degree” 
Average Shifted Histogram (ASH) 
A simple device has been proposed for eliminating the bin edge problem of the frequency polygon while retaining many of the computational advantages of a density estimate based on bin counts. Scott (1983, 1985b) considered the problem of choosing among the collection of multivariate frequency polygons, each with the same smoothing parameter but differing bin origins. Rather than choosing the ‘smoothest’ such curve or surface, he proposed averaging several of the shifted frequency polygons. As the average of piecewise linear curves is also piecewise linear, the resulting curve appears to be a frequency polygon as well. If the weights are nonnegative and sum to 1, the resulting ‘averaged shifted frequency polygon’ (ASFP) is nonnegative and integrates to 1. A nearly equivalent device is to average several shifted histograms, which is just as general but simpler to describe and analyze. The result is the ‘averaged shifted histogram’ (ASH). Since the average of piecewise constant functions such as the histogram is also piecewise constant, the ASH appears to be a histogram as well. In practice, the ASH is made continuous using either of the linear interpolation schemes described for the frequency polygon in Chapter 4 and will be referred to as the frequency polygon of the averaged shifted histogram (FP-ASH). The ASH is the practical choice for computationally and statistically efficient density estimation. 
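A naive sketch of the idea: m histograms with common bin width h but origins shifted by h/m are averaged on a fine grid. The implementation details below are assumptions for illustration, not Scott's original code:

import numpy as np

def ash(x, m=8, n_bins=50):
    """Naive averaged shifted histogram evaluated on a grid of width h/m."""
    x = np.asarray(x, float)
    lo, hi = x.min(), x.max()
    h = (hi - lo) / n_bins                  # bin width of each shifted histogram
    delta = h / m                           # fine grid spacing
    grid = np.arange(lo, hi, delta)
    density = np.zeros(len(grid))
    for i in range(m):
        # One histogram per shift of the bin origin by i * h/m.
        edges = np.arange(lo - h + i * delta, hi + h, h)
        counts, _ = np.histogram(x, bins=edges, density=True)
        # Look up the bin each grid point falls into and add that histogram's height.
        idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, len(counts) - 1)
        density += counts[idx]
    return grid, density / m

grid, dens = ash(np.random.default_rng(0).normal(size=1000))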
Average Shifted Visual Predictive Checks (asVPC) 
➘ “Visual Predictive Checks” 
Average Top-k Loss  In this work, we introduce the average top-$k$ (AT$_k$) loss as a new ensemble loss for supervised learning, which is the average over the $k$ largest individual losses over a training dataset. We show that the AT$_k$ loss is a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss, but can combine their advantages and mitigate their drawbacks to better adapt to different data distributions. Furthermore, it remains a convex function over all individual losses, which can lead to convex optimization problems that can be solved effectively with conventional gradient-based methods. We provide an intuitive interpretation of the AT$_k$ loss based on its equivalent effect on the continuous individual loss functions, suggesting that it can reduce the penalty on correctly classified data. We further give a learning theory analysis of MAT$_k$ learning on the classification calibration of the AT$_k$ loss and the error bounds of AT$_k$-SVM. We demonstrate the applicability of minimum average top-$k$ learning for binary classification and regression using synthetic and real datasets. 
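A minimal sketch of the AT$_k$ loss over a vector of individual losses; $k = 1$ recovers the maximum loss and $k = n$ the usual average loss. The hinge-loss example data are invented for illustration:

import torch

def average_top_k_loss(individual_losses, k):
    """Average of the k largest individual losses (the AT_k ensemble loss)."""
    topk, _ = torch.topk(individual_losses, k)
    return topk.mean()

# Example with hinge losses for a binary classifier (labels in {-1, +1}).
scores = torch.tensor([2.3, -0.4, 0.8, -1.5, 0.1])
labels = torch.tensor([1.0, 1.0, -1.0, -1.0, 1.0])
losses = torch.clamp(1 - labels * scores, min=0)    # individual hinge losses
print(average_top_k_loss(losses, k=2))               # mean of the 2 largest losses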
Average Treatment Effect (ATE) 
The average treatment effect (ATE) is a measure used to compare treatments (or interventions) in randomized experiments, evaluation of policy interventions, and medical trials. The ATE measures the difference in mean (average) outcomes between units assigned to the treatment and units assigned to the control. In a randomized trial (i.e., an experimental study), the average treatment effect can be estimated from a sample using a comparison in mean outcomes for treated and untreated units. However, the ATE is generally understood as a causal parameter (i.e., an estimand or property of a population) that a researcher desires to know, defined without reference to the study design or estimation procedure. Both observational and experimental study designs may enable one to estimate an ATE in a variety of ways. 
Averaged One-Dependence Estimators (AODE) 
Averaged one-dependence estimators (AODE) is a probabilistic classification learning technique. It was developed to address the attribute-independence problem of the popular naive Bayes classifier. It frequently develops substantially more accurate classifiers than naive Bayes at the cost of a modest increase in the amount of computation. 
Averaged Shifted Frequency Polygon (ASFP) 
http://…/Scott2015.pdf 
Awk  AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems. AWK was created at Bell Labs in the 1970s, and its name is derived from the family names of its authors – Alfred Aho, Peter Weinberger, and Brian Kernighan. The acronym is pronounced the same as the name of the bird, auk (which acts as an emblem of the language such as on The AWK Programming Language book cover – the book is often referred to by the abbreviation TAPL). When written in all lowercase letters, as awk, it refers to the Unix or Plan 9 program that runs scripts written in the AWK programming language. The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline – for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. While AWK has a limited intended application domain, and was especially designed to support one-liner programs, the language is Turing-complete, and even the early Bell Labs users of AWK often wrote well-structured large AWK programs. http://ferd.ca/awkin20minutes.html 
Azure Machine Learning  Azure Machine Learning is a fully managed service in the cloud that enables you to easily build, deploy and share advanced analytics solutions. No software to download, no servers to manage – all you need to start is a browser and internet connectivity. 