C Math Library (CML) 
➘ “C Numerical Library” 
C Numerical Library (CNL) 
The IMSL C Numerical Library provides advanced mathematical and statistical functionality for programmers to embed in their existing or new applications. Written in standard C, the IMSL C Library can be embedded into any C or C++ application as well as any existing application that can reference a C library. 
C4.5  C4.5 is an algorithm developed by Ross Quinlan for generating decision trees. C4.5 is an extension of Quinlan’s earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier.
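C4.5 chooses the attribute to split on by its gain ratio: the information gain of the split divided by the split information (the entropy of the attribute's own values). The sketch below computes this for a single categorical attribute. It is a minimal illustration, not a full C4.5 implementation: continuous thresholds, missing values and pruning are omitted, and the function names are our own.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting `labels` by `attribute`, normalized by split info."""
    if len(attribute) != len(labels) or not labels:
        raise ValueError("attribute and labels must be non-empty and equally long")
    n = len(labels)
    # Group the class labels by attribute value (one branch per distinct value).
    branches: Dict[Hashable, List[Hashable]] = {}
    for a, y in zip(attribute, labels):
        branches.setdefault(a, []).append(y)
    # Weighted entropy of the class labels remaining after the split.
    remainder = sum(len(ys) / n * entropy(ys) for ys in branches.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute values themselves.
    split_info = entropy(attribute)
    return info_gain / split_info if split_info > 0 else 0.0
```

An attribute that separates the classes perfectly, such as `gain_ratio(['a', 'a', 'b', 'b'], [0, 0, 1, 1])`, scores 1.0. The split-information denominator penalizes attributes that have many distinct values.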
Cabinet Tree  Treemaps are well known for visualizing hierarchical data. Most related approaches have been focused on layout algorithms and paid little attention to other display properties and interactions. Furthermore, the structural information in conventional Treemaps is too implicit for viewers to perceive. This paper presents Cabinet Tree, an approach that: i) draws branches explicitly to show relational structures, ii) adapts a space-optimized layout for leaves and maximizes the space utilization, iii) uses coloring and labeling strategies to clearly reveal patterns and contrast different attributes intuitively. We also apply the continuous node selection and detail window techniques to support user interaction with different levels of the hierarchies. Our quantitative evaluations demonstrate that Cabinet Tree achieves good scalability for increased resolutions and big datasets.
CacheDiff  We present a sampling method, called CacheDiff, that has both time and space complexity of O(k) for randomly selecting k items from a pool of N items, where N is known.
Caffe  Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license. http://…/neuralnetworkswithcaffeonthegpu Github
Canberra Distance  The Canberra distance is a numerical measure of the distance between pairs of points in a vector space, introduced in 1966 and refined in 1967 by G. N. Lance and W. T. Williams. It is a weighted version of L1 (Manhattan) distance. The Canberra distance has been used as a metric for comparing ranked lists and for intrusion detection in computer security. 
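As a weighted L1 distance, the Canberra distance between vectors p and q is d(p, q) = Σ |p_i − q_i| / (|p_i| + |q_i|). The minimal sketch below assumes that a coordinate where both values are zero contributes 0 (0/0 := 0), which is the convention SciPy's `scipy.spatial.distance.canberra` also uses.

```python
from typing import Sequence


def canberra_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Canberra distance: sum over coordinates of |x_i - y_i| / (|x_i| + |y_i|)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        # Convention: a coordinate where both values are 0 contributes 0 (0/0 := 0).
        if denom > 0:
            total += abs(a - b) / denom
    return total
```

Each term lies between 0 and 1, so differences between values close to zero weigh as much as large absolute differences. This makes the distance sensitive to small values, which can be useful for ranked lists.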
Cannistraci-Alanis-Ravasi Index (CAR)
Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed CAR (Cannistraci-Alanis-Ravasi) index shows the power of local link/triangle information in improving link-prediction accuracy.
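We understand the original formulation as follows; treat it as an assumption and check it against the paper before relying on it. The CAR score of a candidate pair (x, y) multiplies the number of common neighbours CN(x, y) by LCL(x, y), the number of links among those common neighbours ("local community links"). The minimal sketch below uses an adjacency-set dictionary as the graph representation.

```python
from itertools import combinations
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def car_index(graph: Graph, x: Hashable, y: Hashable) -> int:
    """CAR score for the candidate link (x, y), assuming CAR = CN * LCL."""
    common = graph[x] & graph[y]  # common neighbours of x and y
    # LCL: number of links among the common neighbours (local community links).
    lcl = sum(1 for u, v in combinations(common, 2) if v in graph[u])
    return len(common) * lcl
```

Candidate pairs are then ranked by decreasing score. Pairs whose common neighbours are themselves interlinked (forming triangles) rank above pairs with the same number of isolated common neighbours.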
Canonical Correlated AutoEncoder (C2AE) 
Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural network (DNN)-based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of a label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification.
Canonical Correlation Analysis (CCA,CANCOR) 
In statistics, canonical-correlation analysis (CCA) is a way of making sense of cross-covariance matrices. If we have two vectors X = (X1, …, Xn) and Y = (Y1, …, Ym) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of the Xi and Yj which have maximum correlation with each other. T. R. Knapp notes ‘virtually all of the commonly encountered parametric tests of significance can be treated as special cases of canonical-correlation analysis, which is the general procedure for investigating the relationships between two sets of variables.’ Stochastic Approximation for Canonical Correlation Analysis
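One standard way to compute the canonical correlations is to whiten both variable sets. After whitening, the canonical correlations are the singular values of Cxx^(-1/2) Cxy Cyy^(-1/2), where Cxx and Cyy are the within-set covariance matrices and Cxy is the cross-covariance matrix. The minimal sketch below uses NumPy and assumes full-rank sample covariances. The optional ridge term `reg` is a hypothetical convenience parameter, not part of the definition.

```python
import numpy as np


def canonical_correlations(X, Y, reg: float = 0.0) -> np.ndarray:
    """Canonical correlations between the columns of X and Y, in descending order."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y need the same number of observations (rows)")
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample (cross-)covariance matrices; `reg` adds an optional ridge term.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C: np.ndarray) -> np.ndarray:
        # Symmetric inverse square root via an eigendecomposition.
        w, V = np.linalg.eigh(C)
        return (V / np.sqrt(w)) @ V.T

    # Whitening both blocks turns CCA into an SVD of the whitened cross-covariance.
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)
```

If every column of Y is an exact linear combination of the columns of X, all canonical correlations equal 1. For independent X and Y they shrink toward 0 as the sample grows.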
Canonical Correspondence Analysis (CCA) 
In applied statistics, canonical correspondence analysis (CCA) is a multivariate constrained ordination technique that extracts major gradients among combinations of explanatory variables in a dataset. The requirements of a CCA are that the samples are random and independent and that the independent variables are consistent within the sample site and error-free.
Canonical Divergence Analysis (CDA) 
We aim to analyze the relation between two random vectors that may have different numbers of attributes and of realizations, and that may not even have a joint distribution. This problem arises in many practical domains, including biology and architecture. Existing techniques assume the vectors to have the same domain or to be jointly distributed, and hence are not applicable. To address this, we propose Canonical Divergence Analysis (CDA).
Canonical Variate Regression (CVR) 
CVR 
Canopy Clustering Algorithm  The canopy clustering algorithm is an unsupervised pre-clustering algorithm introduced by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000. It is often used as a preprocessing step for the k-means algorithm or for hierarchical clustering. It is intended to speed up clustering operations on large data sets, where using another algorithm directly may be impractical due to the size of the data set. The algorithm proceeds as follows, using two thresholds T_1 (the loose distance) and T_2 (the tight distance), where T_1 > T_2. 1. Begin with the set of data points to be clustered. 2. Remove a point from the set, beginning a new ‘canopy’ containing this point. 3. For each point left in the set, assign it to the new canopy if its distance to the first point of the canopy is less than the loose distance T_1. 4. If the distance of the point is additionally less than the tight distance T_2, remove it from the original set. 5. Repeat from step 2 until there are no more data points in the set to cluster. 6. These relatively cheaply clustered canopies can be sub-clustered using a more expensive but more accurate algorithm. Note that individual data points may be part of several canopies. As an additional speedup, an approximate and fast distance metric can be used for step 3, while a more accurate and slower distance metric can be used for step 4. Since the algorithm uses distance functions and requires the specification of distance thresholds, its applicability to high-dimensional data is limited by the curse of dimensionality. The produced canopies preserve the clusters produced by k-means only when a cheap, approximate, low-dimensional distance function is available.
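Steps 1 to 5 above can be sketched as follows. This is a minimal illustration under stated assumptions. The starting point of each canopy is drawn at random (the `seed` parameter is our own addition for reproducibility). A single distance function is used for both thresholds, Euclidean by default via `math.dist`, rather than the cheap/accurate pair mentioned above.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def canopy_clustering(points: Sequence[Point], t1: float, t2: float,
                      dist: Callable[[Point, Point], float] = math.dist,
                      seed: int = 0) -> List[List[Point]]:
    """Return canopies as lists of points; a point may appear in several canopies."""
    if t1 <= t2:
        raise ValueError("canopy clustering requires T1 > T2")
    rng = random.Random(seed)
    remaining = list(points)                      # step 1: the set to cluster
    canopies: List[List[Point]] = []
    while remaining:                              # step 5: repeat until empty
        # Step 2: remove a point (chosen at random here) and start a new canopy.
        center = remaining.pop(rng.randrange(len(remaining)))
        canopy = [center]
        still_remaining = []
        for p in remaining:
            d = dist(center, p)
            if d < t1:                            # step 3: within loose threshold
                canopy.append(p)
            if d >= t2:                           # step 4: only points within T2 leave
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append(canopy)
    return canopies
```

Points that fall between T_2 and T_1 of a center stay in the pool. They can therefore start or join later canopies, which is why canopies may overlap. Each canopy would then be passed to the more expensive clustering algorithm of step 6.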
CAP Theorem (Brewer’s theorem)
In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: • Consistency (all nodes see the same data at the same time) • Availability (a guarantee that every request receives a response about whether it was successful or failed) • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system) 
Capture-Mark-Recapture Analysis  Mark and recapture is a method commonly used in ecology to estimate an animal population’s size. A portion of the population is captured, marked, and released. Later, another portion is captured and the number of marked individuals within the sample is counted. Since the number of marked individuals within the second sample should be proportional to the number of marked individuals in the whole population, an estimate of the total population size can be obtained by dividing the number of marked individuals by the proportion of marked individuals in the second sample. The method is most useful when it is not practical to count all the individuals in the population. Other names for this method, or closely related methods, include capture-recapture, capture-mark-recapture, mark-recapture, sight-resight, mark-release-recapture, multiple systems estimation, band recovery, the Petersen method and the Lincoln method. Another major application for these methods is in epidemiology, where they are used to estimate the completeness of ascertainment of disease registers. Typical applications include estimating the number of people needing particular services (e.g. services for children with learning disabilities, services for medically frail elderly living in the community), or with particular conditions (e.g. illegal drug addicts, people infected with HIV).
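The estimate described above is the Lincoln-Petersen estimator N ≈ M·C / R. Here M is the number marked in the first sample, C the size of the second sample, and R the number of marked individuals found in it. The minimal sketch below assumes the usual idealizations: a closed population, no lost marks, and equal catchability.

```python
def lincoln_petersen(marked_first: int, caught_second: int, recaptured_marked: int) -> float:
    """Estimate population size N from a two-sample mark-recapture study.

    marked_first:       M, individuals captured, marked and released in sample 1
    caught_second:      C, individuals captured in sample 2
    recaptured_marked:  R, marked individuals found among the C in sample 2
    """
    if recaptured_marked <= 0:
        raise ValueError("need at least one recaptured marked individual")
    if recaptured_marked > min(marked_first, caught_second):
        raise ValueError("R cannot exceed M or C")
    # N is approximately M / (R / C): marked count divided by the marked proportion.
    return marked_first * caught_second / recaptured_marked
```

For example, if 50 animals are marked and a later catch of 40 contains 10 marked ones, `lincoln_petersen(50, 40, 10)` estimates a population of about 200.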
Cartogram  A cartogram is a map in which some thematic mapping variable – such as travel time, population, or Gross National Product – is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable. There are two main types of cartograms: area and distance cartograms. Cartograms have a fairly long history, with examples from the mid-1800s.
Case-Based Reasoning (CBR)
Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using case-based reasoning. A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who creates case law is using case-based reasoning. So, too, an engineer copying working elements of nature (practicing biomimicry) is treating nature as a database of solutions to problems. Case-based reasoning is a prominent kind of analogy-making.
Case-Control Study  A case-control study is a widely used type of study design, originally developed in epidemiology, although its use has also been advocated for the social sciences. It is a type of observational study in which two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute. Case-control studies are often used to identify factors that may contribute to a medical condition by comparing subjects who have that condition/disease (the “cases”) with patients who do not have the condition/disease but are otherwise similar (the “controls”). They require fewer resources but provide less evidence for causal inference than a randomized controlled trial.
Catalan Number  In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. They are named after the Belgian mathematician Eugène Charles Catalan (1814–1894). Modular Catalan Numbers
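The n-th Catalan number has the closed form C_n = binom(2n, n) / (n + 1). It also satisfies the recurrence C_0 = 1, C_{n+1} = Σ_{i=0..n} C_i · C_{n−i}. This recurrence mirrors how a recursively defined object such as a binary tree splits into a left and a right subtree, and C_n counts the binary trees with n nodes. The minimal sketch below computes the sequence both ways; the function names are our own.

```python
from math import comb
from typing import List


def catalan(n: int) -> int:
    """n-th Catalan number, C_n = binom(2n, n) / (n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return comb(2 * n, n) // (n + 1)


def catalan_by_recurrence(n_max: int) -> List[int]:
    """C_0..C_n_max via C_{n+1} = sum_{i=0..n} C_i * C_{n-i}."""
    c = [1]
    for n in range(n_max):
        c.append(sum(c[i] * c[n - i] for i in range(n + 1)))
    return c
```

Both functions produce 1, 1, 2, 5, 14, 42, … . For example, C_3 = 5 is the number of ways to correctly match three pairs of parentheses.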
Catastrophe Modeling  Catastrophe modeling (also known as cat modeling) is the process of using computer-assisted calculations to estimate the losses that could be sustained due to a catastrophic event such as a hurricane or earthquake. Cat modeling is especially applicable to analyzing risks in the insurance industry and is at the confluence of actuarial science, engineering, meteorology, and seismology.
CatBoost  CatBoost is a gradient boosting algorithm that its developers describe as delivering best-in-class accuracy. It is an out-of-the-box solution for creating predictive models from a variety of data sources, such as sensory, historical and transactional data. Whereas most competing gradient boosting algorithms need data descriptors to be converted to numerical form, CatBoost supports categorical data directly, which can save preprocessing time while improving accuracy and efficiency.
Categorical Cross Entropy  
Categorical Response Model  
Causal Additive Model (CAM) 
We develop estimation for potentially high-dimensional additive structural equation models. A key component of our approach is to decouple order search among the variables from feature or edge selection in a directed acyclic graph encoding the causal structure. We show that the former can be done with non-regularized (restricted) maximum likelihood estimation while the latter can be efficiently addressed using sparse regression techniques. Thus, we substantially simplify the problem of structure search and estimation for an important class of causal models. We establish consistency of the (restricted) maximum likelihood estimator for low- and high-dimensional scenarios, and we also allow for misspecification of the error distribution. Furthermore, we develop an efficient computational algorithm that can deal with many variables, and the new method’s accuracy and performance are illustrated on simulated and real data.
Causal Falling Rule List (CFRL) 
A causal falling rule list (CFRL) is a sequence of if-then rules that specifies heterogeneous treatment effects, where (i) the order of rules determines the treatment effect subgroup a subject belongs to, and (ii) the treatment effect decreases monotonically down the list. A given CFRL parameterizes a hierarchical Bayesian regression model in which the treatment effects are incorporated as parameters and assumed constant within model-specific subgroups.
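The rule-list structure itself can be sketched as a small data type. Rules are checked in order, the first matching rule fixes the subject's subgroup and treatment effect, and effects must not increase down the list. This illustrates only how a given CFRL assigns effects. It does not fit the hierarchical Bayesian model, and the example conditions and effect sizes below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Subject = Dict[str, Any]
Rule = Tuple[Callable[[Subject], bool], float]  # (condition, treatment effect)


@dataclass
class FallingRuleList:
    rules: List[Rule]        # ordered if-then rules
    default_effect: float    # effect for subjects that match no rule

    def __post_init__(self) -> None:
        effects = [e for _, e in self.rules] + [self.default_effect]
        # Falling constraint: treatment effects must not increase down the list.
        if any(a < b for a, b in zip(effects, effects[1:])):
            raise ValueError("treatment effects must decrease monotonically")

    def effect(self, subject: Subject) -> float:
        # The first rule whose condition holds determines the subject's subgroup.
        for condition, eff in self.rules:
            if condition(subject):
                return eff
        return self.default_effect


# Hypothetical example: two rules plus a default subgroup.
example = FallingRuleList(
    rules=[(lambda s: s["age"] > 60, 0.5), (lambda s: s["smoker"], 0.2)],
    default_effect=0.05,
)
```

Because rule order determines subgroup membership, a 70-year-old smoker falls under the first rule (effect 0.5), even though the second rule also matches.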
Causal Inference  Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed. The science of why things occur is called etiology. http://…mp;uid=2&uid=4&sid=21104618644387 
Causal Log-linear Model  ➘ “Log-Linear Model”
Causal Model  A causal model is an abstract model that describes the causal mechanisms of a system. The model must express more than correlation because correlation does not imply causation. Judea Pearl defines a causal model as an ordered triple ⟨U, V, E⟩, where U is a set of exogenous variables whose values are determined by factors outside the model; V is a set of endogenous variables whose values are determined by factors within the model; and E is a set of structural equations that express the value of each endogenous variable as a function of the values of the other variables in U and V.
Causal Network  A causal network is a Bayesian network with an explicit requirement that the relationships be causal. The additional semantics of the causal networks specify that if a node X is actively caused to be in a given state x (an action written as do(X=x)), then the probability density function changes to the one of the network obtained by cutting the links from the parents of X to X, and setting X to the caused value x. Using these semantics, one can predict the impact of external interventions from data obtained prior to intervention. ➚ “Bayesian Network” 
Causal Prediction  
Causal Transfer Learning  An important goal in both transfer learning and causal inference is to make accurate predictions when the distribution of the test set and the training set(s) differ. Such a distribution shift may happen as a result of an external intervention on the data generating process, causing certain aspects of the distribution to change, and others to remain invariant. We consider a class of causal transfer learning problems, where multiple training sets are given that correspond to different external interventions, and the task is to predict the distribution of a target variable given measurements of other variables for a new (yet unseen) intervention on the system. We propose a method for solving these problems that exploits causal reasoning but relies neither on prior knowledge of the causal graph nor on the type of interventions and their targets. We evaluate the method on simulated and real-world data and find that it outperforms a standard prediction method that ignores the distribution shift.
Cell Suppression Problem (CSP) 
Cell suppression is one of the most frequently used techniques to prevent the disclosure of sensitive data in statistical tables. Finding the minimum-cost set of non-sensitive entries to suppress, along with the sensitive ones, in order to make a table safe for publication is an NP-hard problem, denoted the cell suppression problem (CSP).
Censored Time Series Analysis  An imputation method for time series in the presence of censored data. The main message of the imputation method is that we should account for the variability of the censored part of the data by mimicking the complete data. That is, we impute the incomplete part with a conditional random sample rather than with the conditional expectation or certain constants. Simulation results suggest that the imputation method reduces possible biases and has standard errors similar to those from complete data.
Censoring  In statistics, engineering, economics, and medical research, censoring is a condition in which the value of a measurement or observation is only partially known. For example, suppose a study is conducted to measure the impact of a drug on mortality rate. In such a study, it may be known that an individual’s age at death is at least 75 years (but may be more). Such a situation could occur if the individual withdrew from the study at age 75, or if the individual is currently alive at the age of 75. Censoring also occurs when a value occurs outside the range of a measuring instrument. For example, a bathroom scale might only measure up to 300 pounds (140 kg). If a 350 lb (160 kg) individual is weighed using the scale, the observer would only know that the individual’s weight is at least 300 pounds (140 kg). The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown. Censoring should not be confused with the related idea of truncation. With censoring, observations result either in knowing the exact value that applies, or in knowing that the value lies within an interval. With truncation, observations never result in values outside a given range: values in the population outside the range are never seen or never recorded if they are seen. Note that in statistics, truncation is not the same as rounding.
Centered Autologistic Model  The traditional autologistic model was proposed by Besag (1972). The model is a Markov random field (MRF) model (Kindermann and Snell, 1980).
Cerioli Outlier Detection  “Cerioli Outlier Detection” is an iterated RMCD method of Cerioli (2010) for multivariate outlier detection via robust Mahalanobis distances.
Chan-Darwiche Distance  We propose a distance measure between two probability distributions, which allows one to bound the amount of belief change that occurs when moving from one distribution to another. We contrast the proposed measure with some well-known measures, including KL-divergence, showing some theoretical properties on its ability to bound belief changes. We then present two practical applications of the proposed distance measure: sensitivity analysis in belief networks and probabilistic belief revision. We show how the distance measure can be easily computed in these applications, and then use it to bound global belief changes that result from either the perturbation of local conditional beliefs or the accommodation of soft evidence. Finally, we show that two well-known techniques in sensitivity analysis and belief revision correspond to the minimization of our proposed distance measure and, hence, can be shown to be optimal from that viewpoint.
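We recall the definition from Chan and Darwiche's paper as follows; verify it against the paper before use. For distributions P and P′ over the same finite set of worlds ω, D(P, P′) = ln max_ω P′(ω)/P(ω) − ln min_ω P′(ω)/P(ω). The convention is 0/0 := 1, and the distance is infinite when the supports differ. The minimal sketch below represents each distribution as a list of world probabilities in a fixed order.

```python
import math
from typing import List, Sequence


def chan_darwiche_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """ln max_w q(w)/p(w) - ln min_w q(w)/p(w) over the worlds w of a finite space."""
    if len(p) != len(q):
        raise ValueError("distributions must be over the same worlds")
    ratios: List[float] = []
    for pw, qw in zip(p, q):
        if pw == 0 and qw == 0:
            ratios.append(1.0)          # convention 0/0 := 1
        elif pw == 0 or qw == 0:
            return math.inf             # supports differ: the distance is infinite
        else:
            ratios.append(qw / pw)
    return math.log(max(ratios)) - math.log(min(ratios))
```

Unlike KL-divergence, this quantity is symmetric in P and P′. It depends only on the most extreme likelihood ratios between the two distributions.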
Change Point Analysis (CPA) 
Change-point analysis is a powerful new tool for determining whether a change has taken place. It is capable of detecting subtle changes missed by control charts. Further, it better characterizes the changes detected by providing confidence levels and confidence intervals. When collecting online data, a change-point analysis is not a replacement for control charting. But, because a change-point analysis can provide further information, the two methods can be used in a complementary fashion. When analyzing historical data, especially when dealing with large data sets, change-point analysis is preferable to control charting. A change-point analysis is more powerful, better characterizes the changes, controls the overall error rate, is robust to outliers, is more flexible and is simpler to use. CPA aims at detecting any change in the mean of a process in historical data. Example questions to be answered by performing CPA: • Did a change occur? • Did more than one change occur? • When did the changes occur? • How confident are we that they are real changes? http://…/changepoint.html
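One common way to locate a single shift in the mean is a CUSUM chart. This approach is assumed here; the entry does not prescribe a method. It accumulates deviations from the overall mean, S_i = S_{i−1} + (x_i − x̄), and estimates the change at the point where |S_i| is largest. The minimal sketch below only estimates the location and the excursion size S_diff = max S − min S. Deciding whether the change is real, for example by comparing S_diff against bootstrap reorderings of the data, is not shown.

```python
from typing import List, Sequence, Tuple


def cusum_change_point(data: Sequence[float]) -> Tuple[int, float]:
    """Estimate the location of a single shift in mean with a CUSUM chart.

    Returns (0-based index of the first observation after the estimated change,
             S_diff = max(S) - min(S), the magnitude of the CUSUM excursion).
    """
    if len(data) < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / len(data)
    s = 0.0
    cusum: List[float] = [0.0]           # S_0 = 0
    for x in data:
        s += x - mean                    # S_i = S_{i-1} + (x_i - mean)
        cusum.append(s)
    # |S_i| peaks after the last observation before the shift (i observations in).
    n_before = max(range(1, len(cusum)), key=lambda i: abs(cusum[i]))
    return n_before, max(cusum) - min(cusum)
```

For ten observations at 0 followed by ten at 5, the cumulative sum falls to −25 after the tenth point and then climbs back to 0. The sketch therefore reports index 10, the first observation of the new level.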
Change Point Detection  In statistical analysis, change detection or change point detection tries to identify times when the probability distribution of a stochastic process or time series changes. In general, the problem concerns both detecting whether a change (or several changes) has occurred and identifying the times of any such changes. Specific applications, like step detection and edge detection, may be concerned with changes in the mean, variance, correlation, or spectral density of the process. More generally, change detection also includes the detection of anomalous behavior (anomaly detection).
Change-Point Detection Procedure via VIF Regression (VIFCP)

Channel-Recurrent Variational Autoencoders (CRVAE)
Variational Autoencoder (VAE) is an efficient framework for modeling natural images with probabilistic latent spaces. However, when the input spaces become complex, VAE becomes less effective, potentially due to the oversimplification of its latent space construction. In this paper, we propose to integrate recurrent connections across channels into both the inference and generation steps of VAE. Sequentially building up the complexity of high-level features in this way allows us to capture global-to-local and coarse-to-fine structures of the input data spaces. We show that our channel-recurrent VAE improves existing approaches in multiple aspects: (1) it attains lower negative log-likelihood than standard VAE on MNIST; when trained adversarially, (2) it generates face and bird images with substantially higher visual quality than the state-of-the-art VAE-GAN and (3) channel-recurrency allows learning more interpretable representations; finally (4) it achieves competitive classification results on STL-10 in a semi-supervised setup.
Chaos Monkey  Chaos Monkey is a service that runs in Amazon Web Services (AWS), seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group. The software design is flexible enough to work with other cloud providers or instance groupings and can be enhanced to add that support. The service has a configurable schedule that, by default, runs on non-holiday weekdays between 9am and 3pm. In most cases, we have designed our applications to continue working when an instance goes offline, but in those special cases where they don’t, we want to make sure there are people around to resolve and learn from any problems. With this in mind, Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond.
Charged String Tensor Networks  Tensor network methods provide an intuitive graphical language to describe quantum states, channels, open quantum systems and a class of numerical approximation methods that efficiently simulate certain many-body states in one spatial dimension. There are two fundamental types of tensor networks in wide use today. The most common is similar to quantum circuits. The second is the braided class of tensor networks, used in topological quantum computing. Recently a third class of tensor networks was discovered by Jaffe, Liu and Wozniakowski (the JLW-model); notably, its wires carry charge excitations. The rules by which network components can be moved, merged and manipulated in a graphical form of reasoning take an elegant form. For instance, the relative charge locations on wires carry precise meaning, and changing the ordering modifies a connected network specifically by a complex number. The type of isotopy discovered in the topological JLW-model provides an alternative means to reason about quantum information, computation and protocols. Here we recall the tensor-network building blocks used in a controlled-NOT gate. Some open problems related to the JLW-model are given.
Charikar’s Algorithm  To detect near-duplicates, this software uses Charikar’s fingerprinting technique, which characterizes each document by a 64-bit vector, like a fingerprint. To determine whether two documents are near-duplicates, we compare their fingerprints. To do this we use two algorithms: the algorithm developed by Moses Charikar and the Hamming distance algorithm, which measures the similarity between two vectors of n bits. What is Charikar’s algorithm? • Characterization of the document • Apply hash functions to the characteristics • Obtain fingerprint • Apply vector comparison function: are (doc1, doc2) near-duplicates? HammingDistance(fingerprint(doc1), fingerprint(doc2)) ≤ k GitXiv
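Charikar's technique is commonly implemented as "SimHash". Each feature of a document casts a weighted vote on every bit according to its own hash, and the fingerprint keeps the sign of each bit's total. The minimal sketch below follows the steps listed above under our own assumptions, which are not necessarily what this software uses. The features are lower-cased words weighted by count. The per-feature hash is the first 64 bits of MD5. The threshold k = 3 is an illustrative default.

```python
import hashlib
from collections import Counter

BITS = 64


def _hash64(token: str) -> int:
    # Assumption: 64 bits of an MD5 digest serve as the per-feature hash.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    """64-bit Charikar-style fingerprint of a document's word counts."""
    weights = Counter(text.lower().split())   # characterization: words and counts
    votes = [0] * BITS
    for token, w in weights.items():
        h = _hash64(token)                     # apply a hash function to each feature
        for i in range(BITS):
            # Each feature votes +w or -w on every bit, depending on its hash bit.
            votes[i] += w if (h >> i) & 1 else -w
    # Fingerprint bit i is 1 when the weighted vote for that bit is positive.
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def near_duplicates(doc1: str, doc2: str, k: int = 3) -> bool:
    """Treat documents as near-duplicates if their fingerprints differ in <= k bits."""
    return hamming_distance(simhash(doc1), simhash(doc2)) <= k
```

Similar documents share most features, so most bit votes keep their sign and the fingerprints differ in only a few bits. Unrelated documents differ in roughly half of the 64 bits.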
Chebyshev Distance  In mathematics, Chebyshev distance (or Tchebychev distance), maximum metric, or L∞ metric is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension. It is named after Pafnuty Chebyshev. It is also known as chessboard distance, since in the game of chess the minimum number of moves needed by a king to go from one square on a chessboard to another equals the Chebyshev distance between the centers of the squares, if the squares have side length one, as represented in 2D spatial coordinates with axes aligned to the edges of the board. For example, the Chebyshev distance between f6 and e2 equals 4.
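The definition and the chessboard example can be checked with a short sketch. The helper `king_moves` is our own illustration. It maps algebraic squares such as 'f6' to (file, rank) coordinates and takes the Chebyshev distance between them.

```python
from typing import Sequence, Tuple


def chebyshev_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L-infinity distance: the largest coordinate-wise absolute difference."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of the same length")
    return max(abs(a - b) for a, b in zip(x, y))


def king_moves(square_a: str, square_b: str) -> int:
    """Minimum number of king moves between two squares, e.g. 'f6' and 'e2'."""
    def coords(sq: str) -> Tuple[int, int]:
        # File letter a..h -> 0..7, rank number 1..8 -> 0..7.
        return ord(sq[0]) - ord("a"), int(sq[1:]) - 1
    return int(chebyshev_distance(coords(square_a), coords(square_b)))
```

From f6 to e2 the files differ by 1 and the ranks by 4, so `king_moves('f6', 'e2')` returns 4, matching the example above.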
Chernoff Faces  Chernoff faces, invented by Herman Chernoff, display multivariate data in the shape of a human face. The individual parts, such as eyes, ears, mouth and nose represent values of the variables by their shape, size, placement and orientation. The idea behind using faces is that humans easily recognize faces and notice small changes without difficulty. Chernoff faces handle each variable differently. Because the features of the faces vary in perceived importance, the way in which variables are mapped to the features should be carefully chosen (e.g. eye size and eyebrow slant have been found to carry significant weight).
Chinese Restaurant Process  In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 is seated at an unoccupied table with probability 1. At time n + 1, a new customer chooses uniformly at random to sit at one of the following n + 1 places: directly to the left of one of the n customers already sitting at an occupied table, or at a new, unoccupied table. David J. Aldous attributes the restaurant analogy to Jim Pitman and Lester Dubins in his 1983 book. At time n, the value of the process is a partition of the set of n customers, where the tables are the blocks of the partition. Mathematicians are interested in the probability distribution of this random partition.
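Under the seating rule above, customer n + 1 joins an occupied table with probability proportional to the number of people already there (n_k / (n + 1)) and opens a new table with probability 1 / (n + 1). The minimal simulation below implements this rule. Its concentration parameter `alpha` is the standard one-parameter generalization, with P(new table) = alpha / (n + alpha); `alpha = 1` reproduces the description above. The `seed` parameter is our own addition for reproducibility.

```python
import random
from typing import List, Optional, Tuple


def chinese_restaurant_process(num_customers: int, alpha: float = 1.0,
                               seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Seat customers one by one; return (table index per customer, table sizes)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    table_sizes: List[int] = []
    assignment: List[int] = []
    for n in range(num_customers):          # n customers are already seated
        # Occupied table k has weight n_k; a new table has weight alpha.
        r = rng.uniform(0, n + alpha)
        chosen = len(table_sizes)            # default: open a new table
        cumulative = 0.0
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                chosen = k
                break
        if chosen == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[chosen] += 1
        assignment.append(chosen)
    return assignment, table_sizes
```

The returned `assignment` encodes the random partition of the customers, with tables as blocks. Popular tables tend to grow further ("rich get richer"), and the expected number of tables grows roughly logarithmically in the number of customers.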
Chi-Square Test  A chi-squared test, also referred to as χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough. The chi-square (χ²) test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Does the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed due to sampling variation, or is it a real difference?
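For the goodness-of-fit case, Pearson's statistic is χ² = Σ (O_i − E_i)² / E_i over the categories. With k categories and no parameters estimated from the data, it is compared against a chi-squared distribution with k − 1 degrees of freedom. The minimal sketch below computes only the statistic and its degrees of freedom. SciPy's `scipy.stats.chisquare` computes the same statistic together with a p-value.

```python
from typing import Sequence, Tuple


def chi_square_gof(observed: Sequence[float], expected: Sequence[float]) -> Tuple[float, int]:
    """Pearson's chi-squared goodness-of-fit statistic and its degrees of freedom."""
    if len(observed) != len(expected) or len(observed) < 2:
        raise ValueError("need at least two categories with matching lengths")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # k categories with no parameters estimated from the data -> k - 1 degrees of freedom.
    return stat, len(observed) - 1
```

For example, suppose a die is rolled 60 times with counts 5, 8, 9, 8, 10, 20 against an expected 10 per face. This gives χ² = 13.4 with 5 degrees of freedom, which exceeds the 5% critical value of about 11.07, so fairness would be rejected at that level.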
CHi-squared Automatic Interaction Detection (CHAID)
CHAID is a type of decision tree technique based upon adjusted significance testing (Bonferroni testing). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to regression analysis; this version of CHAID was originally known as XAID) as well as classification, and for detection of interaction between variables. CHAID stands for CHi-squared Automatic Interaction Detection, based upon a formal extension of the US AID (Automatic Interaction Detection) and THAID (THeta Automatic Interaction Detection) procedures of the 1960s and 70s, which in turn were extensions of earlier research, including that performed in the UK in the 1950s. In practice, CHAID is often used in the context of direct marketing to select groups of consumers and predict how their responses to some variables affect other variables, although other early applications were in the field of medical and psychiatric research. Like other decision trees, CHAID has the advantage that its output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively, since with small sample sizes the respondent groups can quickly become too small for reliable analysis. One important advantage of CHAID over alternatives such as multiple regression is that it is nonparametric.
Choice Modeling  Choice modelling attempts to model the decision process of an individual or segment in a particular context. Choice modelling may be used to estimate non-market environmental benefits and costs. Many alternative models exist in econometrics, marketing, sociometrics and other fields, including utility maximization, optimization applied to consumer theory, and a plethora of other identification strategies which may be more or less accurate depending on the data, sample, hypothesis and the particular decision being modelled. In addition, choice modelling is regarded as the most suitable method for estimating consumers’ willingness to pay for quality improvements in multiple dimensions. Neuroscience Suggests Choice Model Misspecification
Cholesky Decomposition  In linear algebra, the Cholesky decomposition or Cholesky factorization is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, useful for efficient numerical solutions and Monte Carlo simulations. It was discovered by André-Louis Cholesky for real matrices. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.
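For a real symmetric positive-definite matrix, the factor L can be computed row by row (the Cholesky–Banachiewicz ordering). Solving A x = b then takes one forward substitution (L y = b) and one back substitution (Lᵀ x = y). The minimal pure-Python sketch below covers only the real case, not the general Hermitian one, and is meant to show the arithmetic rather than to be fast.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T, for a real symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)          # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]  # below-diagonal entry
    return L


def solve_spd(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b via L y = b (forward) and then L^T x = y (backward)."""
    L = cholesky(A)
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

For A = [[4, 12, −16], [12, 37, −43], [−16, −43, 98]] the factor is L = [[2, 0, 0], [6, 1, 0], [−8, 5, 3]]. The factorization costs about n³/3 multiply-adds, versus about 2n³/3 for LU, which is where the factor of two above comes from. In practice a library routine such as NumPy's `numpy.linalg.cholesky` would be used instead.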
Chopthin Resampler  Resampling is a standard step in particle filters and more generally sequential Monte Carlo methods. We present an algorithm, called chopthin, for resampling weighted particles. In contrast to standard resampling methods the algorithm does not produce a set of equally weighted particles; instead it merely enforces an upper bound on the ratio between the weights. A simulation study shows that the chopthin algorithm consistently outperforms standard resampling methods. The algorithm chops up particles with large weight and thins out particles with low weight, hence its name. It implicitly guarantees a lower bound on the effective sample size. The algorithm can be implemented very efficiently, making it practically useful. We show that the expected computational effort is linear in the number of particles. Implementations for C++, R (on CRAN) and Matlab are available. chopthin
Choquet Integral  A Choquet integral is a subadditive or superadditive integral created by the French mathematician Gustave Choquet in 1953. It was initially used in statistical mechanics and potential theory, but found its way into decision theory in the 1980s, where it is used as a way of measuring the expected utility of an uncertain event. It is applied specifically to membership functions and capacities. In imprecise probability theory, the Choquet integral is also used to calculate the lower expectation induced by a 2-monotone lower probability, or the upper expectation induced by a 2-alternating upper probability. Using the Choquet integral to denote the expected utility of belief functions measured with capacities is a way to reconcile the Ellsberg paradox and the Allais paradox. http://…/Ayub_Khan_2009.pdf
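For a finite set of elements, the Choquet integral has a simple discrete form. Sort the values so that f(x_(1)) <= ... <= f(x_(n)), and set f(x_(0)) = 0. The integral is the sum of (f(x_(i)) - f(x_(i-1))) * mu(A_(i)), where A_(i) is the set of elements whose value is at least f(x_(i)).

The sketch below makes these assumptions:
- the values are non-negative;
- the capacity mu is passed as a Python function on frozensets;
- mu is monotone, with mu of the empty set equal to 0 and mu of the full set equal to 1.

`choquet` is a hypothetical helper name.

```python
def choquet(values, capacity):
    """Discrete Choquet integral of non-negative values with respect to a capacity.

    values  : dict mapping each element x to f(x) >= 0
    capacity: function from a frozenset A of elements to mu(A), assumed monotone
              with mu(empty set) = 0 and mu(all elements) = 1
    """
    order = sorted(values, key=values.get)  # elements by increasing f(x)
    total, previous = 0.0, 0.0
    for i, x in enumerate(order):
        upper = frozenset(order[i:])        # A_(i): elements whose value is >= f(x_(i))
        total += (values[x] - previous) * capacity(upper)
        previous = values[x]
    return total

scores = {"a": 0.2, "b": 0.8, "c": 0.5}
weights = {"a": 0.5, "b": 0.3, "c": 0.2}
print(choquet(scores, lambda A: sum(weights[x] for x in A)))    # additive mu: weighted mean, about 0.44
print(choquet(scores, lambda A: 1.0 if A else 0.0))            # mu = 1 on any non-empty set: the max
print(choquet(scores, lambda A: 1.0 if len(A) == 3 else 0.0))  # mu = 1 only on the full set: the min
```

With an additive capacity the result reduces to an ordinary weighted average. Non-additive capacities behave differently: mu = 1 on every non-empty set reproduces the maximum, and mu = 1 only on the full set reproduces the minimum.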
Choropleth Map  A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. The choropleth map provides an easy way to visualize how a measurement varies across a geographic area, or the level of variability within a region. A special type of choropleth map is a prism map, a three-dimensional map in which a given region’s height on the map is proportional to the statistical variable’s value for that region.
Chow-Liu Tree  In probability theory and statistics, a Chow-Liu tree is an efficient method for constructing a second-order product approximation of a joint probability distribution, first described in a paper by Chow & Liu (1968). The goals of such a decomposition, as with Bayesian networks in general, may be either data compression or inference. Structure Learning in Bayesian Networks
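The Chow-Liu construction has two steps:
1. Estimate the mutual information between every pair of discrete variables.
2. Keep a maximum-weight spanning tree over those weights.

The sketch below is an illustrative pure-Python version. It uses plug-in empirical mutual information and Kruskal's algorithm with union-find. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Plug-in estimate of the mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def chow_liu_tree(samples):
    """Edges (i, j) of a maximum-weight spanning tree over the variables, weighted by
    pairwise mutual information (Kruskal's algorithm with a union-find structure)."""
    columns = list(zip(*samples))
    d = len(columns)
    candidates = sorted(((mutual_information(columns[i], columns[j]), i, j)
                         for i, j in combinations(range(d), 2)), reverse=True)
    parent = list(range(d))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges

x0 = [0, 0, 0, 0, 1, 1, 1, 1]
x1 = [1, 0, 0, 0, 1, 1, 1, 1]   # noisy copy of x0
x2 = [1, 0, 0, 0, 1, 1, 1, 0]   # noisy copy of x1
print(chow_liu_tree(list(zip(x0, x1, x2))))
```

In the toy data, X1 is a noisy copy of X0 and X2 is a noisy copy of X1. The tree therefore recovers the chain 0-1-2 and drops the weaker direct link between X0 and X2.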
Christoffel Function  
Chronohorogram  
Circular Plot / Circos  Circos is a software package for visualizing data and information. It visualizes data in a circular layout – this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive. 
Circular Statistics  ➘ “Directional Statistics” 
Classical Test Theory (CTT) 
Classical test theory is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological tests. Classical test theory may be regarded as roughly synonymous with true score theory. The term ‘classical’ refers not only to the chronology of these models but also contrasts with the more recent psychometric theories, generally referred to collectively as item response theory, which sometimes bear the appellation ‘modern’ as in ‘modern latent trait theory’. Classical test theory as we know it today was codified by Novick (1966) and described in classic texts such as Lord & Novick (1968) and Allen & Yen (1979/2002).
Classification Accuracy (CA) 
In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value. The precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results. Although the two words precision and accuracy can be synonymous in colloquial use, they are deliberately contrasted in the context of the scientific method. A measurement system can be accurate but not precise, precise but not accurate, neither, or both. For example, if an experiment contains a systematic error, then increasing the sample size generally increases precision but does not improve accuracy. The result would be a consistent yet inaccurate string of results from the flawed experiment. Eliminating the systematic error improves accuracy but does not change precision. A measurement system is considered valid if it is both accurate and precise. Related terms include bias (non-random or directed effects caused by a factor or factors unrelated to the independent variable) and error (random variability). The terminology is also applied to indirect measurements – that is, values obtained by a computational procedure from observed data. In addition to accuracy and precision, measurements may also have a measurement resolution, which is the smallest change in the underlying physical quantity that produces a response in the measurement. In numerical analysis, accuracy is also the nearness of a calculation to the true value, while precision is the resolution of the representation, typically defined by the number of decimal or binary digits. http://…/accuracy.htm
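The systematic-error example above can be simulated directly. In this sketch every reading carries the same +0.5 offset plus random noise; all constants and names are hypothetical. Taking more readings shrinks the standard error of the mean, so precision improves. The bias stays near 0.5, so accuracy does not improve.

```python
import random
import statistics

TRUE_VALUE = 10.0
SYSTEMATIC_ERROR = 0.5   # hypothetical miscalibration present in every reading
NOISE_SD = 0.2           # hypothetical random measurement error

def run_experiment(n, seed=0):
    """Return (bias, standard error of the mean) for n simulated readings."""
    rng = random.Random(seed)
    readings = [TRUE_VALUE + SYSTEMATIC_ERROR + rng.gauss(0.0, NOISE_SD) for _ in range(n)]
    bias = statistics.mean(readings) - TRUE_VALUE   # accuracy: distance from the true value
    sem = statistics.stdev(readings) / n ** 0.5     # precision of the estimated mean
    return bias, sem

for n in (10, 1000):
    bias, sem = run_experiment(n)
    print(f"n={n:5d}  bias={bias:+.3f}  standard error={sem:.4f}")
```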
Classification Based on Associations (CBA) 
Classification rule mining aims to discover a small set of rules in the database that forms an accurate classifier. Association rule mining finds all the rules existing in the database that satisfy some minimum support and minimum confidence constraints. For association rule mining, the target of discovery is not predetermined, while for classification rule mining there is one and only one predetermined target. In this paper, we propose to integrate these two mining techniques. The integration is done by focusing on mining a special subset of association rules, called class association rules (CARs). An efficient algorithm is also given for building a classifier based on the set of discovered CARs. Experimental results show that the classifier built this way is, in general, more accurate than that produced by the state-of-the-art classification system C4.5. In addition, this integration helps to solve a number of problems that exist in the current classification systems. rCBA
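The two measures behind class association rules are easy to state in code. The support of X -> c is the share of all cases that contain X and have class c. The confidence is that count divided by the number of cases that contain X.

This is a simplified sketch, not CBA's actual algorithms:
- Rules are found by brute-force enumeration instead of CBA's own rule generator.
- Rules are ordered by confidence, then support, following the precedence usually described for CBA.
- A case gets the class of the first matching rule.
- The rule pruning and coverage-based selection of the real classifier builder are omitted.

All names and data are hypothetical.

```python
from itertools import combinations

def rule_stats(transactions, antecedent, label):
    """Support and confidence of the class association rule `antecedent -> label`.

    transactions: list of (set_of_items, class_label) pairs.
    """
    covered = [cls for items, cls in transactions if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = hits / len(transactions)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence

def mine_cars(transactions, min_support, min_confidence, max_len=2):
    """Brute-force search for CARs with up to max_len antecedent items, returned in
    precedence order: higher confidence first, then higher support (ties keep
    generation order because Python's sort is stable)."""
    items = sorted(set().union(*(t for t, _ in transactions)))
    labels = sorted({cls for _, cls in transactions})
    found = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            for label in labels:
                support, confidence = rule_stats(transactions, antecedent, label)
                if support >= min_support and confidence >= min_confidence:
                    found.append((confidence, support, antecedent, label))
    found.sort(key=lambda r: (-r[0], -r[1]))
    return [(antecedent, label) for _, _, antecedent, label in found]

def classify(rules, items, default):
    """Predict with the first rule, in precedence order, whose antecedent the case contains."""
    for antecedent, label in rules:
        if antecedent <= items:
            return label
    return default

weather = [
    ({"sunny", "hot"}, "no"),
    ({"sunny", "mild"}, "no"),
    ({"rain", "mild"}, "yes"),
    ({"rain", "cool"}, "yes"),
    ({"overcast", "hot"}, "yes"),
    ({"sunny", "cool"}, "yes"),
]
rules = mine_cars(weather, min_support=0.15, min_confidence=0.6)
print(classify(rules, {"sunny", "hot"}, default="yes"))  # no
```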
Classification Based Preselection (CPS) 
In evolutionary algorithms, a preselection operator aims to select the promising offspring solutions from a candidate offspring set. It is usually based on the estimated or real objective values of the candidate offspring solutions. In a sense, the preselection can be treated as a classification procedure, which classifies the candidate offspring solutions into promising ones and unpromising ones. Following this idea, we propose a classification based preselection (CPS) strategy for evolutionary multi-objective optimization. When applying classification based preselection, an evolutionary algorithm maintains two external populations (the training data set) that consist of some selected good and bad solutions found so far; then it trains a classifier based on the training data set in each generation. Finally, it uses the classifier to filter out the unpromising candidate offspring solutions and choose a promising one from the generated candidate offspring set for each parent solution. In such cases, it is not necessary to estimate or evaluate the objective values of the candidate offspring solutions. The classification based preselection is applied to three state-of-the-art multi-objective evolutionary algorithms (MOEAs) and is empirically studied on two sets of test instances. The experimental results suggest that classification based preselection can successfully improve the performance of these MOEAs.
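The preselection step can be sketched as follows. The entry does not fix the classifier, so this sketch makes its own assumptions:
- a 1-nearest-neighbour rule over the stored good and bad solutions;
- solutions treated as plain decision vectors;
- a random candidate as fallback when nothing is classified as promising.

All names and data are hypothetical. Note that no objective function is called anywhere.

```python
import math
import random

def is_promising(x, good, bad):
    """1-nearest-neighbour classifier trained on the two external populations:
    a candidate is 'promising' if its closest stored solution is a good one."""
    nearest_good = min(math.dist(x, g) for g in good)
    nearest_bad = min(math.dist(x, b) for b in bad)
    return nearest_good <= nearest_bad

def preselect(candidates, good, bad, rng):
    """Choose one offspring for a parent without evaluating any objective values.

    Prefer candidates the classifier labels promising; if none are, fall back
    to a uniformly random candidate (an assumption of this sketch).
    """
    promising = [c for c in candidates if is_promising(c, good, bad)]
    return rng.choice(promising or candidates)

good = [(0.0, 0.0), (0.1, 0.2)]   # hypothetical good solutions found so far
bad = [(5.0, 5.0), (4.0, 6.0)]    # hypothetical bad solutions found so far
candidates = [(4.5, 5.2), (0.2, 0.1), (6.0, 6.0)]
print(preselect(candidates, good, bad, random.Random(1)))  # (0.2, 0.1)
```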
Classification Rule  Given a population whose members can potentially be separated into a number of different sets or classes, a classification rule is a procedure by which each element of the population is assigned to one of the classes. A perfect test is one in which every element in the population is assigned to the class it really belongs to. An imperfect test is one in which some errors appear, and then statistical analysis must be applied to analyse the classification.
Classification Without Labels  Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.
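The core of the CWoLa argument can be checked numerically on a toy problem. Assume, hypothetically:
- one feature x;
- a signal density S = N(1, 1) and a background density B = N(-1, 1);
- two mixed samples with signal fractions f1 = 0.7 and f2 = 0.3.

The ideal mixture-vs-mixture discriminant is M1/M2 = (f1 S + (1 - f1) B) / (f2 S + (1 - f2) B). Whenever f1 > f2, this is an increasing function of S/B. It therefore orders events exactly as the ideal fully supervised discriminant does. This is a sketch of that reasoning, not the paper's training procedure.

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def signal_pdf(x):       # hypothetical signal distribution S
    return gaussian_pdf(x, 1.0)

def background_pdf(x):   # hypothetical background distribution B
    return gaussian_pdf(x, -1.0)

def mixture_pdf(x, f):
    """Density of a mixed sample whose signal fraction is f."""
    return f * signal_pdf(x) + (1.0 - f) * background_pdf(x)

def cwola_discriminant(x, f1=0.7, f2=0.3):
    """Ideal classifier for mixture 1 vs mixture 2: the likelihood ratio M1/M2 (needs f1 > f2)."""
    return mixture_pdf(x, f1) / mixture_pdf(x, f2)

def supervised_discriminant(x):
    """Ideal fully supervised classifier: the likelihood ratio S/B."""
    return signal_pdf(x) / background_pdf(x)

events = [-3.0, -1.5, -0.2, 0.0, 0.4, 1.1, 2.5]
by_cwola = sorted(events, key=cwola_discriminant)
by_truth = sorted(events, key=supervised_discriminant)
print(by_cwola == by_truth)  # True: both rank the events identically
```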
Cleverhans  cleverhans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models’ performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure. 
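The fast gradient sign method (FGSM) illustrates the kind of adversarial-example construction the library standardizes. Here it is written by hand for a toy logistic-regression model. This is not cleverhans code and uses none of its API; the weights, input and step size `eps` are hypothetical.

For logistic regression with cross-entropy loss, the gradient of the loss with respect to the input is (p - y) * w. The attack therefore moves each feature by eps in the direction of that gradient's sign.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: x_adv = x + eps * sign(gradient of the loss w.r.t. x).

    For logistic regression with cross-entropy loss that gradient is (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w, b = [2.0, -1.0], 0.0   # hypothetical trained weights
x, y = [0.5, 0.2], 1      # an input the model classifies correctly
x_adv = fgsm(w, b, x, y, eps=0.4)
print(round(predict(w, b, x), 3), round(predict(w, b, x_adv), 3))  # 0.69 0.401
```

The model's probability for the true class drops from about 0.69 to about 0.40, which flips the prediction. A standardized implementation matters precisely because small differences in this procedure change such results.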
Clickstream Analytics  A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing or using another software application. As the user clicks anywhere in the webpage or application, the action is logged on a client or inside the web server, as well as possibly the web browser, router, proxy server or ad server. Clickstream analysis is useful for web activity analysis, software testing, market research, and for analyzing employee productivity. 
Click-Through Rate (CTR)
Click-through rate (CTR) is a way of measuring the success of an online advertising campaign for a particular website, as well as the effectiveness of an email campaign, by the number of users who clicked on a specific link relative to the number of times the link was shown.
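In practice, CTR is usually computed as clicks divided by impressions (the number of times the link or ad was shown), often expressed as a percentage. A minimal sketch; the function name and the campaign figures are hypothetical.

```python
def click_through_rate(clicks, impressions):
    """CTR in percent: clicks on the link or ad divided by the number of times it was shown."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 100.0 * clicks / impressions

print(f"{click_through_rate(37, 1500):.2f}%")  # 2.47%
```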
Clipper  Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment. In this paper, we introduce Clipper, the first general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the TensorFlow Serving system and demonstrate comparable prediction throughput and latency on a range of models while enabling new functionality, improved accuracy, and robustness.
Cloud Data  The Difference Between Big Data and Cloud Data: New technologies are required for the emergence and standardization of cloud data to take hold. Big data was meant as a holding cell for large amounts of data that could be sorted effectively only by specialized data scientists (this is becoming easier with OLAP on Hadoop type tools). The protocols for big data rely upon simple, standard protocols and can’t be adjusted easily to meet the demands of complex operations. Big data takes time to sort through and analyze, whereas cloud data is immediate and happens in the background using the tremendous resources of cloud servers. Cloud data requires a significantly higher number of resources since it must connect to databases in several geographically distributed services. Since cloud data must flexibly interact with several unique interfaces and security models, the mechanisms used for big data won’t work for cloud data. 
C-LSTM  Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural networks (CNN) and recurrent neural networks (RNN) are two mainstream architectures for such modeling tasks, which adopt totally different ways of understanding natural language. In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification. C-LSTM utilizes a CNN to extract a sequence of higher-level phrase representations, which are fed into a long short-term memory recurrent neural network (LSTM) to obtain the sentence representation. C-LSTM is able to capture both local features of phrases and global and temporal sentence semantics. We evaluate the proposed architecture on sentiment classification and question classification tasks. The experimental results show that C-LSTM outperforms both CNN and LSTM and can achieve excellent performance on these tasks.
Cluster Validation  There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; ‘relative cluster validation’ is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method run with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. There are various different characteristics of a clustering that can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation.
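To illustrate why a single number can hide relevant characteristics, the sketch below reports two criteria separately for a clustering:
- the mean within-cluster distance, which should be low;
- the smallest between-cluster distance, which should be high.

Comparing two candidate clusterings of the same points in this way is an instance of relative cluster validation. The names and data are hypothetical.

```python
import math
from itertools import combinations

def within_and_between(points, labels):
    """Return (mean within-cluster distance, minimum between-cluster distance).

    The first should be low and the second high for a good clustering; they
    are reported separately instead of being merged into one index.
    """
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        if lp == lq:
            within.append(math.dist(p, q))
        else:
            between.append(math.dist(p, q))
    return sum(within) / len(within), min(between)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clustering_a = [0, 0, 0, 1, 1, 1]
clustering_b = [0, 1, 0, 1, 0, 1]
print(within_and_between(points, clustering_a))
print(within_and_between(points, clustering_b))
```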
Clustered Latent Dirichlet Allocation (CLDA) 
The all-relevant problem of feature selection is the identification of all strongly and weakly relevant attributes. This problem is especially hard to solve for time series classification and regression in industrial applications such as predictive maintenance or production line optimization, for which each label or regression target is associated with several time series and meta-information simultaneously. Here, we propose an efficient, scalable feature extraction algorithm, which filters the available features in an early stage of the machine learning pipeline with respect to their significance for the classification or regression task, while controlling the expected percentage of selected but irrelevant features. The proposed algorithm combines established feature extraction methods with a feature importance filter. It has a low computational complexity, allows one to start on a problem with only limited domain knowledge available, can be trivially parallelized, is highly scalable and is based on well-studied non-parametric hypothesis tests. We benchmark our proposed algorithm on all binary classification problems of the UCR time series classification archive as well as time series from a production line optimization project and simulated stochastic processes with underlying qualitative change of dynamics.
Clustering / Cluster Analysis  Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
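k-means is one popular algorithm built on the notion of clusters as groups with small distances among their members. Note that it needs the number of clusters k as a parameter. Below is a minimal sketch of Lloyd's iteration in plain Python. The random initial centroids, the fixed seed and the iteration cap are assumptions of this sketch, not requirements of the method.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, move each
    centroid to the mean of its points, repeat until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = [tuple(map(float, p)) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [tuple(sum(coord) / len(members) for coord in zip(*members))
                   if members else centroids[c]  # keep an empty cluster's centroid in place
                   for c, members in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```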
Clustering Using REpresentatives (CURE) 
CURE (Clustering Using REpresentatives) is an efficient data clustering algorithm for large databases that, compared with k-means, is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size.
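The 'representatives' idea can be sketched following the usual description of CURE:
- Each cluster is summarized by a fixed number c of well-scattered points.
- These points are shrunk toward the cluster centroid by a fraction alpha.
- The distance between two clusters is the distance between their closest representatives.

The full algorithm also uses random sampling, partitioning and hierarchical merging, which this sketch omits. The parameter values and function names are illustrative, and the sketch assumes distinct points.

```python
import math

def cure_representatives(cluster, c=4, alpha=0.3):
    """Summarize a cluster by up to c well-scattered points shrunk toward its centroid.

    The first representative is the point farthest from the centroid; each next one
    is the point farthest from the representatives chosen so far. Every representative
    is then moved a fraction alpha of the way toward the centroid.
    """
    dims = len(cluster[0])
    centroid = tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))
    scattered = [max(cluster, key=lambda p: math.dist(p, centroid))]
    while len(scattered) < min(c, len(cluster)):
        scattered.append(max(cluster, key=lambda p: min(math.dist(p, s) for s in scattered)))
    return [tuple(s[d] + alpha * (centroid[d] - s[d]) for d in range(dims)) for s in scattered]

def cluster_distance(reps_a, reps_b):
    """Distance between two clusters = distance between their closest representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)

square = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]
shifted = [(x + 10, y + 10) for x, y in square]
reps_1 = cure_representatives(square, c=4, alpha=0.5)
reps_2 = cure_representatives(shifted, c=4, alpha=0.5)
print(sorted(reps_1))  # [(0.5, 0.5), (0.5, 1.5), (1.5, 0.5), (1.5, 1.5)]
print(cluster_distance(reps_1, reps_2))
```

Shrinking the scattered points toward the centroid dampens the influence of outliers on the boundary. Using several representatives instead of one centroid lets elongated or irregular clusters be captured.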
Clustering Validation Indices  The purpose of clustering is to determine the intrinsic grouping in a set of unlabeled data, where the objects in each group are indistinguishable under some criterion of similarity. Clustering is an unsupervised classification process fundamental to data mining (one of the most important tasks in data analysis). It has applications in several fields like bioinformatics, web data analysis, text mining and scientific data exploration. Clustering is a form of unsupervised learning and, for that reason, has no a priori information about the data set. However, to get good results, the clustering algorithm depends on input parameters. For instance, the k-means and CURE algorithms require the number of clusters (k) to be specified. In this sense, the question is: what is the optimal number of clusters? Research on cluster validity indexes has drawn attention as a means of answering it. Many different cluster validity methods have been proposed that require no a priori class information. Clustering validation is a technique to find a set of clusters that best fits natural partitions (number of clusters) without any class information. Generally speaking, there are two types of clustering validation techniques, based on external criteria and on internal criteria. • External validation: based on previous knowledge about the data. • Internal validation: based on the information intrinsic to the data alone. If we use these two types of cluster validation to determine the correct number of groups in a dataset, one option is external validation indexes, for which a priori knowledge of the dataset is required, but it is hard to say whether they can be used in real problems (usually, real problems do not have prior information about the dataset in question). Another option is internal validity indexes, which do not require a priori information about the dataset.
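The silhouette width is one widely used internal validity index; it needs no a priori class information. For each point:
- a is its mean distance to the rest of its own cluster;
- b is its smallest mean distance to another cluster;
- s = (b - a) / max(a, b).

The index is the mean of s over all points. Comparing the index across candidate clusterings, for example with different k, is one way to approach the optimal-number-of-clusters question. The function name and data are hypothetical.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width in [-1, 1]; higher means tighter, better-separated clusters."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # p's distance to itself is 0
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in groups.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k, labels in [(2, [0, 0, 0, 1, 1, 1]), (3, [0, 0, 0, 1, 1, 2])]:
    print(k, round(silhouette(points, labels), 3))
```

For these points the two-cluster labelling scores clearly higher than the three-cluster one, matching the structure of the data.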
ClusterWise Linear Regression (CLR) 
Clusterwise linear regression (CLR), a clustering problem intertwined with regression, is the problem of finding clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have a different variance.
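One simple heuristic for this problem is a k-means-like local search. The sketch below alternates between fitting one line per cluster and reassigning each point to the line that fits it best. Its assumptions:
- a single predictor;
- ordinary least squares in each cluster;
- plain (unweighted) squared errors, so it ignores the per-cluster variances mentioned above.

It can stop in a local optimum, so the starting labels matter. All names and data are hypothetical.

```python
def fit_line(points):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0  # all x equal: use a horizontal line through the mean
    return mean_y - slope * mean_x, slope

def fit_all(points, labels, k):
    """One regression line per cluster; an empty cluster gets the line y = 0."""
    lines = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        lines.append(fit_line(members) if members else (0.0, 0.0))
    return lines

def squared_residual(point, line):
    x, y = point
    a, b = line
    return (y - a - b * x) ** 2

def clusterwise_regression(points, initial_labels, k, max_iter=50):
    """Alternately fit one line per cluster and move each point to the line with the
    smallest squared residual, until the assignment stops changing."""
    labels = list(initial_labels)
    for _ in range(max_iter):
        lines = fit_all(points, labels, k)
        new_labels = [min(range(k), key=lambda c: squared_residual(p, lines[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    lines = fit_all(points, labels, k)
    sse = sum(squared_residual(p, lines[lab]) for p, lab in zip(points, labels))
    return lines, labels, sse

points = [(x, 2 * x) for x in range(5)] + [(x, 10 - x) for x in range(5)]
start = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]  # deliberately imperfect initial assignment
lines, labels, sse = clusterwise_regression(points, start, k=2)
print(lines, labels, sse)
```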
CN2 Induction Algorithm  The CN2 induction algorithm is a learning algorithm for rule induction. It is designed to work even when the training data is imperfect. It is based on ideas from the AQ algorithm and the ID3 algorithm. As a consequence it creates a rule set like that created by AQ but is able to handle noisy data like ID3. 
Cochran-Mantel-Haenszel Statistics  In statistics, the Cochran-Mantel-Haenszel statistics are a collection of test statistics used in the analysis of stratified categorical data. They are named after William G. Cochran, Nathan Mantel and William Haenszel. One of these test statistics is the Cochran-Mantel-Haenszel (CMH) test, which allows the comparison of two groups on a dichotomous/categorical response. It is used when the effect of the explanatory variable on the response variable is influenced by covariates that can be controlled. It is often used in observational studies where random assignment of subjects to different treatments cannot be controlled, but influencing covariates can. In the CMH test, the data are arranged in a series of associated 2 × 2 contingency tables; the null hypothesis is that the observed response is independent of the treatment used in any 2 × 2 contingency table. The CMH test’s use of associated 2 × 2 contingency tables increases the ability of the test to detect associations (the power of the test is increased). sensitivity2x2xk
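The CMH statistic for K stratified 2 × 2 tables can be computed directly. Each stratum has cells a, b, c, d and total n. The computation runs as follows:
1. Compare the observed a with its expected value under independence, (a + b)(a + c)/n.
2. Sum these differences over strata and square the result.
3. Divide by the summed hypergeometric variances, (a + b)(c + d)(a + c)(b + d) / (n^2 (n - 1)).

The sketch also returns the Mantel-Haenszel common odds ratio. It omits the continuity correction that some implementations apply by default. The function name and tables are hypothetical.

```python
def cmh_test(tables):
    """CMH statistic (without continuity correction) and Mantel-Haenszel common odds ratio.

    tables: list of 2x2 strata ((a, b), (c, d)), rows = groups, columns = response yes/no.
    Under the null hypothesis of conditional independence the statistic is
    approximately chi-squared with 1 degree of freedom.
    """
    deviation = variance = or_num = or_den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        deviation += a - (a + b) * (a + c) / n  # observed minus expected count in cell a
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    return deviation ** 2 / variance, or_num / or_den

strata = [((10, 20), (20, 10)), ((10, 20), (20, 10))]  # hypothetical 2 x 2 x 2 data
statistic, common_or = cmh_test(strata)
print(round(statistic, 3), common_or)  # 13.111 0.25
```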
Coded TeraSort  We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose structured redundancy in data, in order to enable in-network coding opportunities that overcome the data shuffling bottleneck of TeraSort. We empirically evaluate the performance of the Coded TeraSort algorithm on Amazon EC2 clusters, and demonstrate that it achieves 1.97x – 3.39x speedup, compared with TeraSort, for typical settings of interest.
CoDeepNEAT  The success of deep learning depends on finding an architecture to fit the task. As deep learning has scaled up to more challenging tasks, the architectures have become difficult to design by hand. This paper proposes an automated method, CoDeepNEAT, for optimizing deep learning architectures through evolution. By extending existing neuroevolution methods to topology, components, and hyperparameters, this method achieves results comparable to the best human designs on standard benchmarks in object recognition and language modeling. It also supports building a real-world application of automated image captioning on a magazine website. Given the anticipated increases in available computing power, evolution of deep networks is a promising approach to constructing deep learning applications in the future.
Coefficient of Variation  In probability theory and statistics, the coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution or frequency distribution. It is defined as the ratio of the standard deviation to the mean. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as relative standard deviation (RSD), which is expressed as a percentage. 
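A minimal sketch of the definition, using only the standard library: `statistics.stdev` for the sample SD and `statistics.pstdev` for the population SD. The helper names are hypothetical.

```python
import statistics

def coefficient_of_variation(data, sample=True):
    """CV = standard deviation / mean (sample SD by default, population SD otherwise)."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return sd / mean

def relative_standard_deviation(data, sample=True):
    """RSD: the absolute value of the CV, expressed as a percentage."""
    return abs(coefficient_of_variation(data, sample)) * 100.0

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(coefficient_of_variation(data, sample=False))  # 0.4 (population SD 2.0 / mean 5.0)
```

Because the CV divides by the mean, it is dimensionless. This makes it usable for comparing dispersion across variables measured on different scales, but unstable when the mean is close to zero.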
CoffeeScript  CoffeeScript is a little language that compiles into JavaScript. Underneath that awkward Java-esque patina, JavaScript has always had a gorgeous heart. CoffeeScript is an attempt to expose the good parts of JavaScript in a simple way. The golden rule of CoffeeScript is: “It’s just JavaScript”. The code compiles one-to-one into the equivalent JS, and there is no interpretation at runtime. You can use any existing JavaScript library seamlessly from CoffeeScript (and vice versa). The compiled output is readable and pretty-printed, will work in every JavaScript runtime, and tends to run as fast as or faster than the equivalent handwritten JavaScript.
Cognitive Analytics  Cognitive analytics is a hybrid of several disparate disciplines, methods, and practical technologies.
Cognitive Architecture  A cognitive architecture can refer to a theory about the structure of the human mind. One of the main goals of a cognitive architecture is to summarize the various results of cognitive psychology in a comprehensive computer model. However, the results need to be formalized to the extent that they can serve as the basis of a computer program. The formalized models can be used to further refine a comprehensive theory of cognition and, more immediately, as commercially usable models. Successful cognitive architectures include ACT-R (Adaptive Control of Thought-Rational), SOAR and OpenCog.
Cognitive Bias  Cognitive biases are tendencies to think in certain ways. Cognitive biases can lead to systematic deviations from a standard of rationality or good judgment, and are often studied in psychology and behavioral economics. 
Cognitive Computing  Cognitive computing refers to the development of computer systems modeled after the human brain. The field was originally referred to as artificial intelligence; researchers began to use the modern term instead in the 1990s to indicate that the science was designed to teach computers to think like a human mind, rather than to develop an artificial system. This type of computing integrates technology and biology in an attempt to re-engineer the brain, one of the most efficient and effective computers on Earth. Cognitive computing is a way of processing data that is neither linear nor deterministic. It uses the ideas behind neuroscience and psychology to augment human reasoning with better pattern matching while determining the optimal information a person needs to make decisions. Cognitive computing is different from other forms of software. Instead of shepherding data through predetermined pathways, it finds previously unknown paths and patterns in the data. This is ultimately a more scalable model than relying on experts to synthesize data, since there are too few experts of any sort available at any one time. Cognitive computing doesn’t try to fit data into an existing model; it looks at the data and figures out what the model is first. Cognitive Computing Cognitive Computing: Solving the Big Data Problem? Cognitive Computing Defined
Cohort Analysis  Cohort analysis is a subset of behavioral analytics that takes the data from a given eCommerce platform, web application, or online game and, rather than looking at all users as one unit, breaks them into related groups for analysis. These related groups, or cohorts, usually share common characteristics or experiences within a defined timespan. Cohort analysis allows a company to ‘see patterns clearly across the lifecycle of a customer (or user), rather than slicing across all customers blindly without accounting for the natural cycle that a customer undergoes.’ By seeing these patterns over time, a company can adapt and tailor its service to those specific cohorts. While cohort analysis is sometimes associated with a cohort study, they are different and should not be viewed as one and the same. Cohort analysis has come to describe specifically the analysis of cohorts with regard to big data and business analytics, while a cohort study is a more general umbrella term that describes a type of study in which data is broken down into similar groups.
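A typical cohort analysis groups users by the period in which they signed up. It then tracks what fraction of each cohort is still active k periods later. The sketch below assumes integer period indices (for example, month numbers) and an activity log of (user, period) events. The names and data are hypothetical.

```python
from collections import defaultdict

def retention_table(signups, activity):
    """For each signup cohort, the fraction of its users active k periods after signup.

    signups : dict mapping user -> signup period (an integer such as a month index)
    activity: iterable of (user, period) events; every user must appear in signups
    Returns {cohort: {k: fraction_active}}.
    """
    cohort_size = defaultdict(int)
    for period in signups.values():
        cohort_size[period] += 1
    active = defaultdict(set)  # (cohort, periods since signup) -> users seen active
    for user, period in activity:
        start = signups[user]
        if period >= start:
            active[(start, period - start)].add(user)
    table = {cohort: {} for cohort in sorted(cohort_size)}
    for (cohort, k), users in sorted(active.items()):
        table[cohort][k] = len(users) / cohort_size[cohort]
    return table

signups = {"ann": 0, "ben": 0, "cat": 1, "dan": 1}
activity = [("ann", 0), ("ben", 0), ("ann", 1), ("cat", 1),
            ("dan", 1), ("cat", 2), ("dan", 2), ("ann", 2)]
print(retention_table(signups, activity))
# {0: {0: 1.0, 1: 0.5, 2: 0.5}, 1: {0: 1.0, 1: 1.0}}
```

Indexing activity by periods since signup, rather than by calendar period, is what lets each cohort's lifecycle be compared side by side.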
Coincidence Analysis (CNA) 
CNA is a Boolean method of causal analysis presented in Baumgartner (2009a). CNA is a configurational comparative method for the identification of complex causal dependencies (in particular, causal chains and common cause structures) in configurational data. CNA is related to QCA (Ragin 2008), but, contrary to the latter, it does not minimize sufficient and necessary conditions by means of Quine-McCluskey optimization; instead, it relies on its own custom-built optimization algorithm. The latter greatly facilitates the analysis of data featuring chain-like causal dependencies among the conditions of an ultimate outcome. http://…/infer_c.pdf http://…/baumgartnerthiem.pdf cna
Cointegration  The term cointegration was defined by Granger (1983) as a formulation of the phenomenon that non-stationary processes can have linear combinations that are stationary. It was his investigations of the relation between cointegration and error correction that brought modelling of vector autoregressions with unit roots and cointegration to the center of attention in applied and theoretical econometrics; see Engle and Granger (1987). Cointegration is a statistical property of time series variables and has become an important property in contemporary time series analysis. Time series often have trends – either deterministic or stochastic. In a seminal paper, Charles Nelson and Charles Plosser (1982) showed that most macroeconomic time series have stochastic trends; these are also called unit root processes, or processes integrated of order 1, I(1). http://…/Cointegration
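The definition can be illustrated with the first step of the Engle-Granger approach:
1. Simulate a random walk x, which is I(1).
2. Build y = 2x plus stationary noise.
3. Regress y on x by ordinary least squares.
4. Inspect the residuals.

Both series wander, but the residual series y - alpha - beta x stays bounded, which is what cointegration means. A real analysis would follow this with a unit-root test on the residuals (for example, an augmented Dickey-Fuller-type test), which this sketch omits. The simulation settings and names are hypothetical.

```python
import random
import statistics

def simulate_pair(n=2000, beta=2.0, seed=42):
    """Hypothetical cointegrated pair: x is a Gaussian random walk (I(1)) and
    y = beta * x + independent N(0, 1) noise, so y - beta * x is stationary."""
    rng = random.Random(seed)
    x, y, level = [], [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        x.append(level)
        y.append(beta * level + rng.gauss(0.0, 1.0))
    return x, y

def ols(x, y):
    """Least-squares intercept and slope of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

x, y = simulate_pair()
alpha, beta_hat = ols(x, y)
residuals = [yi - alpha - beta_hat * xi for xi, yi in zip(x, y)]
print(f"beta_hat = {beta_hat:.3f}")
print(f"variance of y = {statistics.pvariance(y):.1f}, "
      f"of residuals = {statistics.pvariance(residuals):.2f}")
```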
coLaboratory Project  coLaboratory Project is a new tool for data science and analysis, designed to make collaborating on data easier. coLaboratory merges successful open source products with Google technologies, enabling multiple people to collaborate directly through simultaneous access and analysis of data. This provides a big improvement over ad hoc workflows involving emailing documents back and forth.
Collaborative Deep Learning (CDL) 
Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach, which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art. GitXiv
Collaborative Deep Reinforcement Learning (CDRL) 
Besides independent learning, the human learning process is greatly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound understanding. The idea of knowledge transfer has led to many advances in machine learning and data mining, but significant challenges remain, especially when it comes to reinforcement learning, heterogeneous model structures, and different learning tasks. Motivated by human collaborative learning, in this paper we propose a collaborative deep reinforcement learning (CDRL) framework that performs adaptive knowledge transfer among heterogeneous learning agents. Specifically, the proposed CDRL conducts a novel deep knowledge distillation method to address the heterogeneity among different learning tasks with a deep alignment network. Furthermore, we present an efficient collaborative Asynchronous Advantage Actor-Critic (cA3C) algorithm to incorporate deep knowledge distillation into the online training of agents, and demonstrate the effectiveness of the CDRL framework using extensive empirical evaluation on OpenAI Gym.
Collaborative Filtering (CF) 
Collaborative filtering (CF) is a technique used by some recommender systems. Collaborative filtering has two senses, a narrow one and a more general one. In general, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. In the newer, narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). (also called “people-to-people correlation”)
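User-based collaborative filtering in the narrow sense can be sketched as a similarity-weighted average. To predict a user's rating for an item, the ratings other users gave that item are weighted by how similar their tastes are to the user's. This sketch assumes cosine similarity over co-rated items, which is only one of several common choices; mean-centred or Pearson variants are also used. The names and ratings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two users' rating dicts, over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`; None if nobody rated it."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

ratings = {
    "alice": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 1},
}
print(predict_rating(ratings, "alice", "m3"))
```

Alice's predicted rating for m3 comes out at about 3.27. It leans toward Bob's 5 because Bob's ratings match hers exactly, while Carol, who is less similar but still positively correlated under cosine, pulls it down.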
Collaborative Filtering – Neural Autoregressive Distribution Estimator (CF-NADE)
This paper proposes CF-NADE, a neural autoregressive architecture for collaborative filtering (CF) tasks, which is inspired by the Restricted Boltzmann Machine (RBM)-based CF model and the Neural Autoregressive Distribution Estimator (NADE). We first describe the basic CF-NADE model for CF tasks. Then we propose to improve the model by sharing parameters between different ratings. A factored version of CF-NADE is also proposed for better scalability. Furthermore, we take the ordinal nature of the preferences into consideration and propose an ordinal cost to optimize CF-NADE, which shows superior performance. Finally, CF-NADE can be extended to a deep model, with only moderately increased computational complexity. Experimental results show that CF-NADE with a single hidden layer beats all previous state-of-the-art methods on MovieLens 1M, MovieLens 10M, and Netflix datasets, and adding more hidden layers can further improve the performance.
Collaborative Filtering with User-Item Co-Autoregressive Models (CF-UIcA)
Besides their success in object recognition, machine translation and system control in games, (deep) neural networks have recently achieved state-of-the-art results in collaborative filtering (CF). Previous neural approaches for CF are either user-based or item-based, and so cannot explicitly leverage all relevant information. We propose CF-UIcA, a neural co-autoregressive model for CF tasks, which exploits the structural autoregressiveness in the domains of both users and items. Furthermore, we separate the inherent dependence in this structure under a natural assumption and develop an efficient stochastic learning algorithm to handle large-scale datasets. We evaluate CF-UIcA on two popular benchmarks, MovieLens 1M and Netflix, and achieve state-of-the-art predictive performance, which demonstrates the effectiveness of CF-UIcA.
Collective Adaptive Resource-sharing Markovian Agents (CARMA)
In this paper we present CARMA, a language recently defined to support specification and analysis of collective adaptive systems. CARMA is a stochastic process algebra equipped with linguistic constructs specifically developed for modelling and programming systems that can operate in open-ended and unpredictable environments. This class of systems is typically composed of a huge number of interacting agents that dynamically adjust and combine their behaviour to achieve specific goals. A CARMA model, termed a collective, consists of a set of components, each of which exhibits a set of attributes. To model dynamic aggregations, which are sometimes referred to as ensembles, CARMA provides communication primitives that are based on predicates over the exhibited attributes. These predicates are used to select the participants in a communication. Two communication mechanisms are provided in the CARMA language: multicast-based and unicast-based.
Collective Intelligence (COIN) 
Collective Intelligence is shared or group intelligence that emerges from the collaboration, collective efforts, and competition of many individuals and appears in consensus decision making. The term appears in sociobiology, political science and in the context of mass peer review and crowdsourcing applications. It may involve consensus, social capital and formalisms such as voting systems, social media and other means of quantifying mass activity. Collective IQ is a measure of collective intelligence, although it is often used interchangeably with the term collective intelligence. (‘Building new conclusions from independent contributors is really what collective intelligence is all about.’)
Collocation  In corpus linguistics, a collocation is a sequence of words or terms that co-occur more often than would be expected by chance. In phraseology, collocation is a subtype of phraseme. An example of a phraseological collocation, as propounded by Michael Halliday, is the expression strong tea. While the same meaning could be conveyed by the roughly equivalent *powerful tea, this expression is considered incorrect by English speakers. Conversely, the corresponding expression for computer, powerful computers, is preferred over *strong computers. Phraseological collocations should not be confused with idioms, whose meaning cannot be derived from the meanings of their parts, whereas collocations are mostly compositional. There are about six main types of collocations: adjective+noun, noun+noun (such as collective nouns), verb+noun, adverb+adjective, verb+prepositional phrase (phrasal verbs), and verb+adverb. Collocation extraction is the task of extracting collocations automatically from a corpus, using computational linguistics.
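Collocation extraction is often done by counting n-grams and scoring them with an association measure. The sketch below scores adjacent word pairs with pointwise mutual information (PMI), log2 of P(w1 w2) / (P(w1) P(w2)); PMI is one common measure among several (t-score and log-likelihood ratio are others), and the `min_count` cutoff is an assumed filter because PMI overrates very rare pairs. The example sentence is made up.

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (log base 2)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = n - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # PMI is unreliable for very rare pairs
            continue
        p_pair = count / n_bigrams
        p_w1, p_w2 = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    # Highest-scoring (most strongly associated) pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

text = "he drank strong tea and she drank strong tea while the tea and the coffee were hot"
for pair, score in pmi_bigrams(text.split()):
    print(pair, round(score, 2))
```

On a real corpus the candidate list would usually also be filtered by part-of-speech pattern (e.g. adjective+noun) to match the collocation types listed above.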
Column-oriented DBMS  A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data. In comparison, most relational DBMSs store data in rows. This column-oriented layout has advantages for data warehouses, customer relationship management (CRM) systems, library card catalogs, and other ad hoc inquiry systems where aggregates are computed over large numbers of similar data items. It is possible to achieve some of the benefits of column-oriented and row-oriented organization with any DBMS. Denoting one as column-oriented refers to both the ease of expression of a column-oriented structure and the focus on optimizations for column-oriented workloads. This approach is in contrast to row-oriented or row store databases and to correlation databases, which use a value-based storage structure.
Combinations of Mutually Exclusive Alterations (CoMEt) 
Cancer is a heterogeneous disease with different combinations of genetic and epigenetic alterations driving the development of cancer in different individuals. While these alterations are believed to converge on genes in key cellular signaling and regulatory pathways, our knowledge of these pathways remains incomplete, making it difficult to identify driver alterations by their recurrence across genes or known pathways. We introduce Combinations of Mutually Exclusive Alterations (CoMEt), an algorithm to identify combinations of alterations de novo, without any prior biological knowledge (e.g. pathways or protein interactions). CoMEt searches for combinations of mutations that exhibit mutual exclusivity, a pattern expected for mutations in pathways. CoMEt has several important features that distinguish it from existing approaches to analyzing mutual exclusivity among alterations. These include: an exact statistical test for mutual exclusivity that is more sensitive in detecting combinations containing rare alterations; simultaneous identification of collections of one or more combinations of mutually exclusive alterations; simultaneous analysis of subtype-specific mutations; and summarization over an ensemble of collections of mutually exclusive alterations. These features enable CoMEt to robustly identify alterations affecting multiple pathways, or hallmarks of cancer.
Combinatorial Optimization  In applied mathematics and theoretical computer science, combinatorial optimization is a topic that consists of finding an optimal object from a finite set of objects. In many such problems, exhaustive search is not feasible. It operates on the domain of those optimization problems, in which the set of feasible solutions is discrete or can be reduced to discrete, and in which the goal is to find the best solution. Some common problems involving combinatorial optimization are the traveling salesman problem (“TSP”) and the minimum spanning tree problem (“MST”). 
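Exhaustive search is feasible only for tiny instances, which makes it a useful baseline for seeing what finding an optimal object from a finite set means in practice. The sketch below enumerates every tour of a small symmetric TSP given as a distance matrix (the matrix is a made-up example); the (n-1)! tours it visits are exactly why this approach stops being feasible as n grows.

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Return (cost, tour) of a cheapest closed tour by checking every permutation."""
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    # Fix city 0 as the start so rotations of the same tour are not re-checked.
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

# Four cities on the corners of a square: sides cost 1, diagonals cost 2.
square = [[0, 1, 2, 1],
          [1, 0, 1, 2],
          [2, 1, 0, 1],
          [1, 2, 1, 0]]
print(tsp_brute_force(square))
```

Practical solvers replace the enumeration with branch and bound, dynamic programming, or heuristics, which is the point the entry makes about exhaustive search.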
Common Cause Principle (CCP) 
It seems that a correlation between events A and B indicates either that A causes B, or that B causes A, or that A and B have a common cause. It also seems that causes always occur before their effects and, thus, that common causes always occur before the correlated events. Reichenbach was the first to formalize this idea rather precisely. 
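Reichenbach's formalization (in The Direction of Time, 1956) is usually stated as four conditions on an event $C$ with $0 < P(C) < 1$: $C$ screens off $A$ from $B$, and $C$ raises the probability of each. Stated compactly:

```latex
\begin{aligned}
P(A \wedge B \mid C) &= P(A \mid C)\, P(B \mid C) \\
P(A \wedge B \mid \neg C) &= P(A \mid \neg C)\, P(B \mid \neg C) \\
P(A \mid C) &> P(A \mid \neg C) \\
P(B \mid C) &> P(B \mid \neg C)
\end{aligned}
```

Together these conditions entail $P(A \wedge B) > P(A)\,P(B)$, so a common cause satisfying them accounts for the observed positive correlation between $A$ and $B$.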
Community Detection  Communities are often defined in terms of a partition of the set of vertices, that is, each node is put into one and only one community. This is a useful simplification and most community detection methods find this type of community structure. However, in some cases a better representation could be one where vertices are in more than one community. This might happen in a social network where each vertex represents a person, and the communities represent the different groups of friends: one community for family, another community for coworkers, one for friends in the same sports club, and so on. Clique-based community detection is one example of how such overlapping community structure can be found. ➘ “Complex Network” Community detection algorithms: a comparative analysis A Comparison of Community Detection Algorithms on Artificial Networks
Compact Trip Representation (CTR) 
We present a new Compact Trip Representation (CTR) that allows us to manage users’ trips (moving objects) over networks. These could be public transportation networks (buses, subway, trains, and so on) where nodes are stations or stops, or road networks where nodes are intersections. CTR represents the sequences of nodes and time instants in users’ trips. The spatial component is handled with a data structure based on the well-known Compressed Suffix Array (CSA), which provides both a compact representation and interesting indexing capabilities. We also represent the temporal component of the trips, that is, the time instants when users visit nodes in their trips. We create a sequence with these time instants, which are then self-indexed with a balanced Wavelet Matrix (WM). This gives us the ability to solve range-interval queries efficiently. We show how CTR can solve relevant spatial and spatio-temporal queries over large sets of trajectories. Finally, we also provide experimental results to show the space requirements and query efficiency of CTR.
Competing Risks  This form of analysis is known for its use of death certificates. In traditional overall survival analysis, the cause of death is irrelevant to the analysis. In a competing risks survival analysis, each death certificate is reviewed. If the disease of interest is cancer and the patient dies in a car accident, the patient is labelled as censored at death instead of being labelled as having died. Issues with this method arise because each hospital and/or registry may code causes of death differently. For example, there is variability in how a patient who has cancer and commits suicide is coded. In addition, a patient who had an eye removed because of an ocular cancer and who dies after being hit by a car he did not see while crossing the road would often be considered censored rather than as having died from the cancer or its subsequent effects. ➘ “Survival Analysis”
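The censoring rule described above can be sketched in code. This hypothetical example treats only deaths from the cause of interest as events, censors every other death and every patient still alive at the end of follow-up, and feeds the result into a plain Kaplan-Meier product-limit estimate. Treating competing deaths as censored is the modeling assumption the entry describes, not the only way to handle competing risks; the records are invented.

```python
def cause_specific_km(records, cause):
    """Kaplan-Meier curve where only deaths from `cause` count as events.

    records: list of (time, cause_of_death) pairs; cause_of_death is None for patients
    still alive at the end of follow-up. Deaths from any other cause are treated as
    censored at their time of death.
    """
    data = sorted((t, 1 if c == cause else 0) for t, c in records)
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, events, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group everyone leaving at time t
            events += data[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve

records = [(2, "cancer"), (3, "car accident"), (5, "cancer"), (7, None)]
print(cause_specific_km(records, "cancer"))
```

The car-accident death at time 3 does not lower the curve; it only shrinks the risk set, which is why the second cancer death at time 5 halves the estimate.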
Competitive Analysis  Competitive analysis is a method invented for analyzing online algorithms, in which the performance of an online algorithm (which must satisfy an unpredictable sequence of requests, completing each request without being able to see the future) is compared to the performance of an optimal offline algorithm that can view the sequence of requests in advance. An algorithm is competitive if its competitive ratio, the ratio between its performance and the offline algorithm’s performance, is bounded. Unlike traditional worst-case analysis, where the performance of an algorithm is measured only for ‘hard’ inputs, competitive analysis requires that an algorithm perform well both on hard and easy inputs, where ‘hard’ and ‘easy’ are defined by the performance of the optimal offline algorithm. For many algorithms, performance is dependent not only on the size of the inputs, but also on their values. One such example is the quicksort algorithm, which sorts an array of elements. Such data-dependent algorithms are analysed for average-case and worst-case data. Competitive analysis is a way of doing worst-case analysis for online and randomized algorithms, which are typically data-dependent. In competitive analysis, one imagines an ‘adversary’ that deliberately chooses difficult data, to maximize the ratio between the cost of the algorithm being studied and that of some optimal algorithm. Adversaries range in power from the oblivious adversary, which has no knowledge of the random choices made by the algorithm pitted against it, to the adaptive adversary, which has full knowledge of how an algorithm works and its internal state at any point during its execution. Note that this distinction is only meaningful for randomized algorithms. For a deterministic algorithm, either adversary can simply compute what state that algorithm must have at any time in the future, and choose difficult data accordingly.
For example, the quicksort algorithm chooses one element, called the ‘pivot’, that is, on average, not too far from the center value of the data being sorted. Quicksort then separates the data into two piles, one of which contains all elements with value less than the value of the pivot, while the other contains the rest of the elements. If quicksort chooses the pivot in some deterministic fashion (for instance, always choosing the first element in the list), then it is easy for an adversary to arrange the data beforehand so that quicksort will perform in worst-case time. If, however, quicksort chooses some random element to be the pivot, then an adversary without knowledge of what random numbers are coming up cannot arrange the data to guarantee worst-case execution time for quicksort. The classic online problem first analysed with competitive analysis (Sleator & Tarjan 1985) is the list update problem: given a list of items and a sequence of requests for the various items, minimize the cost of accessing the list, where elements closer to the front of the list cost less to access. (Typically, the cost of accessing an item is equal to its position in the list.) After an access, the list may be rearranged. Most rearrangements have a cost. The Move-To-Front algorithm simply moves the requested item to the front after the access, at no cost. The Transpose algorithm swaps the accessed item with the item immediately before it, also at no cost. Classical methods of analysis showed that Transpose is optimal in certain contexts. In practice, Move-To-Front performed much better. Competitive analysis was used to show that an adversary can make Transpose perform arbitrarily badly compared to an optimal algorithm, whereas Move-To-Front can never be made to incur more than twice the cost of an optimal algorithm. In the case of online requests from a server, competitive algorithms are used to overcome uncertainties about the future.
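The Transpose weakness can be reproduced with a small simulation. It assumes the cost model stated above (accessing position i costs i, rearranging the accessed item is free) and an adversarial sequence that alternates between the last two items of the list. Under Transpose those two items just swap places at the back, so every request costs n; Move-To-Front brings both to the front, after which each request costs 2. The helper names are illustrative, and the sketch compares the two heuristics with each other, not with the optimal offline algorithm.

```python
def move_to_front(items, requests):
    """Total access cost (1-indexed position) when each requested item is moved to the front."""
    lst, total = list(items), 0
    for r in requests:
        i = lst.index(r)
        total += i + 1
        lst.insert(0, lst.pop(i))  # free rearrangement, as in the model above
    return total

def transpose(items, requests):
    """Total access cost when each requested item is swapped with its predecessor."""
    lst, total = list(items), 0
    for r in requests:
        i = lst.index(r)
        total += i + 1
        if i > 0:
            lst[i - 1], lst[i] = lst[i], lst[i - 1]  # free swap with the item before it
    return total

items = list(range(10))
adversarial = [9, 8] * 50  # under Transpose, each request targets the current last item
print(transpose(items, adversarial), move_to_front(items, adversarial))
```

With 100 requests on a 10-item list, Transpose pays 10 per request throughout, while Move-To-Front pays 10 only for its first two accesses; lengthening the list makes the gap as large as the adversary wants.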
That is, the algorithm does not ‘know’ the future, while the imaginary adversary (the ‘competitor’) ‘knows’. Similarly, competitive algorithms were developed for distributed systems, where the algorithm has to react to a request arriving at one location, without ‘knowing’ what has just happened in a remote location. This setting was presented in (Awerbuch, Kutten & Peleg 1992). 
Competitive Intelligence (CI) 
Competitive intelligence is the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers making strategic decisions for an organization. Competitive intelligence essentially means understanding and learning what’s happening in the world outside your business so you can be as competitive as possible. It means learning as much as possible, as soon as possible, about your industry in general, your competitors, or even your county’s particular zoning rules. In short, it empowers you to anticipate and face challenges head on. A more focused definition of CI regards it as the organizational function responsible for the early identification of risks and opportunities in the market before they become obvious. Experts also call this process early signal analysis. This definition focuses attention on the difference between the dissemination of widely available factual information (such as market statistics, financial reports, newspaper clippings) performed by functions such as libraries and information centers, and competitive intelligence, which is a perspective on developments and events aimed at yielding a competitive edge. Competitive Intelligence and 6 Tips for Its Effective Use
Competitive Learning  Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data. Models and algorithms based on the principle of competitive learning include vector quantization and self-organising maps (Kohonen maps). https://…/handbookch7.html
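A minimal winner-take-all sketch shows the competition in code: for each input, only the unit whose weight vector is closest wins and moves toward that input, so units end up specializing on different clusters. The learning rate, epoch count, initialization from randomly chosen inputs, and the toy two-cluster data are all assumptions made for illustration.

```python
import numpy as np

def competitive_learning(X, n_units, lr=0.1, epochs=20, seed=0):
    """Winner-take-all learning: only the unit closest to each input is updated."""
    rng = np.random.default_rng(seed)
    # Initialize each unit's weight vector at a distinct randomly chosen input.
    W = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))  # the competition
            W[winner] += lr * (x - W[winner])                  # move only the winner toward x
    return W

X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2],
              [5.0, 5.0], [5.2, 5.0], [5.0, 5.2], [5.2, 5.2]])
W = competitive_learning(X, n_units=2)
print(W)  # one row near each cluster
```

This is essentially online vector quantization; a self-organising map adds a neighbourhood so that units near the winner are also updated.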
Complete Spatial Randomness (CSR) 
Complete spatial randomness (CSR) describes a point process whereby point events occur within a given study area in a completely random fashion. It is synonymous with a homogeneous spatial Poisson process. Such a process is modeled using only one parameter $\rho$, i.e. the density of points within the defined area. The term complete spatial randomness is commonly used in applied statistics in the context of examining certain point patterns, whereas in most other statistical contexts the concept is referred to as a spatial Poisson process.
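Because CSR has the single parameter $\rho$, simulating it takes two steps: draw the number of points from a Poisson distribution with mean $\rho$ times the area, then place that many points independently and uniformly. The sketch below assumes a rectangular study area; such simulations are a common way to build reference envelopes when testing an observed pattern against CSR.

```python
import numpy as np

def simulate_csr(rho, width, height, rng):
    """Homogeneous Poisson process on [0, width] x [0, height] with intensity rho."""
    n = rng.poisson(rho * width * height)  # number of points ~ Poisson(rho * area)
    # Given n, the points are independent and uniform over the study area.
    xs = rng.uniform(0.0, width, n)
    ys = rng.uniform(0.0, height, n)
    return np.column_stack([xs, ys])

rng = np.random.default_rng(42)
points = simulate_csr(rho=50.0, width=2.0, height=1.0, rng=rng)
print(points.shape)  # about 100 points expected on average
```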
Completed Partially Directed Acyclic Graph (CPDAG) 
➘ “Directed Acyclic Graph” 
Complete-Linkage Clustering  Complete-linkage clustering is one of several methods of agglomerative hierarchical clustering. At the beginning of the process, each element is in a cluster of its own. The clusters are then sequentially combined into larger clusters until all elements end up being in the same cluster. At each step, the two clusters separated by the shortest distance are combined. The definition of ‘shortest distance’ is what differentiates between the different agglomerative clustering methods. In complete-linkage clustering, the link between two clusters contains all element pairs, and the distance between clusters equals the distance between those two elements (one in each cluster) that are farthest away from each other. The shortest of these links that remains at any step causes the fusion of the two clusters whose elements are involved. The method is also known as farthest neighbour clustering. The result of the clustering can be visualized as a dendrogram, which shows the sequence of cluster fusion and the distance at which each fusion took place.
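The procedure can be written directly as a naive sketch: the cluster distance is the largest pairwise distance across the two clusters, and each step merges the pair of clusters with the smallest such distance. It runs in roughly cubic time and is meant only to mirror the description; the returned merge list (clusters and fusion distance) carries the information a dendrogram plots.

```python
from itertools import combinations

def complete_linkage(points, dist):
    """Return the merge sequence as (cluster_a, cluster_b, fusion_distance) triples."""
    clusters = [[p] for p in points]  # every element starts in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between the two farthest members.
            d = max(dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

for a, b, d in complete_linkage([0, 1, 5, 6], lambda x, y: abs(x - y)):
    print(a, b, d)
```

Replacing `max` with `min` in the cluster-distance line turns the same loop into single-linkage clustering, which illustrates how the choice of ‘shortest distance’ defines the method.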
Complex Adaptive System (CAS) 
Complexity theory is a relatively new field that began in the mid-1980s at the Santa Fe Institute in New Mexico. Work at the Santa Fe Institute is usually presented as the study of Complex Adaptive Systems (CAS). The CAS movement is predominantly American, as opposed to the European “natural science” tradition in the area of cybernetics and systems. Like cybernetics and systems theory, CAS studies general properties of complex systems across traditional disciplinary boundaries. However, CAS is distinguished by the extensive use of computer simulations as a research tool, and by an emphasis on systems, such as markets or ecologies, which are less integrated or “organized” than the ones studied by the older tradition (e.g., organisms, machines and companies).
Complex Event Processing (CEP) 
Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events), and deriving a conclusion from them. Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible. 
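A small sketch can make the idea of inferring a higher-level event from several streams concrete. The rule below is hypothetical: raise a "possible brute-force login" event when a user logs in successfully after at least `threshold` failed attempts within the previous `window` seconds, with events arriving from two separate time-sorted sources that are merged into one stream. The event names, thresholds, and logs are invented; production CEP engines express such rules in their own query languages.

```python
import heapq
from collections import defaultdict, deque

def detect_bruteforce(*streams, threshold=3, window=60):
    """Each stream is a time-sorted list of (timestamp, user, kind) events."""
    recent_failures = defaultdict(deque)
    alerts = []
    for ts, user, kind in heapq.merge(*streams):  # combine sources into one time-ordered stream
        q = recent_failures[user]
        while q and ts - q[0] > window:  # forget failures that fell out of the window
            q.popleft()
        if kind == "login_failed":
            q.append(ts)
        elif kind == "login_ok":
            if len(q) >= threshold:
                alerts.append((ts, user))  # the derived, higher-level event
            q.clear()
    return alerts

vpn_log = [(0, "alice", "login_failed"), (20, "alice", "login_failed"), (100, "bob", "login_failed")]
web_log = [(10, "alice", "login_failed"), (25, "alice", "login_ok"), (200, "bob", "login_ok")]
print(detect_bruteforce(vpn_log, web_log))
```

Neither log alone shows three failures for alice, so the pattern only becomes visible once the sources are combined, which is the point of complex event processing.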
Complex Network  In the context of network theory, a complex network is a graph (network) with non-trivial topological features, that is, features that do not occur in simple networks such as lattices or random graphs but often occur in graphs modelling real systems. The study of complex networks is a young and active area of scientific research inspired largely by the empirical study of real-world networks such as computer networks and social networks.
Complex Systems  Complex systems present problems both in mathematical modelling and philosophical foundations. The study of complex systems represents a new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment. The equations from which models of complex systems are developed generally derive from statistical physics, information theory and nonlinear dynamics, and represent organized but unpredictable behaviors of natural systems that are considered fundamentally complex. The physical manifestations of such systems are difficult to define, so a common choice is to identify ‘the system’ with the mathematical information model rather than referring to the undefined physical subject the model represents. Such systems are used to model processes in computer science, biology, economics, physics, chemistry, and many other fields. The field is also called complex systems theory, complexity science, study of complex systems, sciences of complexity, non-equilibrium physics, and historical physics. A variety of abstract theoretical complex systems is studied as a field of mathematics. The key problems of complex systems are difficulties with their formal modelling and simulation. From such a perspective, in different research contexts complex systems are defined on the basis of their different attributes. Since all complex systems have many interconnected components, the science of networks and network theory are important aspects of the study of complex systems. A consensus regarding a single universal definition of complex system does not yet exist. For systems that are less usefully represented with equations, various other kinds of narratives and methods for identifying, exploring, designing and interacting with complex systems are used.
Complex-Valued Neural Network (CVNN)
The complex-valued neural network is an extension of a (usual) real-valued neural network, whose input and output signals and parameters such as weights and thresholds are all complex numbers (the activation function is inevitably a complex-valued function). Neural networks have been applied to various fields such as communication systems, image processing and speech recognition, in which complex numbers are often used through the Fourier transform. This suggests that complex-valued neural networks are useful in these fields. In addition, in the human brain, an action potential may have different pulse patterns, and the distance between pulses may be different. This suggests that introducing complex numbers representing phase and amplitude into neural networks is appropriate. In recent years, complex-valued neural networks have expanded into application fields such as image processing, computer vision, optoelectronic imaging, and communication. This potentially wide applicability calls for new theory supporting novel or more effective functions and mechanisms.
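A single complex-valued neuron can be sketched with Python's built-in complex numbers: inputs, weights and bias are complex, the weighted sum is complex, and the activation maps complex to complex. The activation used here, tanh applied to the amplitude while the phase is kept, is one common amplitude-phase choice and is an assumption of this sketch; other CVNNs apply a real activation separately to the real and imaginary parts.

```python
import cmath
import math

def complex_neuron(x, w, b):
    """One complex-valued neuron: complex weighted sum, then an amplitude-phase activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Squash the amplitude with tanh but keep the phase unchanged.
    return cmath.rect(math.tanh(abs(z)), cmath.phase(z))

out = complex_neuron(x=[1 + 1j, 2 - 1j], w=[0.5j, 1 + 0j], b=0.1 + 0j)
print(abs(out), cmath.phase(out))
```

Keeping the phase is what lets such a neuron carry the phase and amplitude information the entry mentions, while the amplitude stays bounded below 1.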
Component Lasso Method  We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones. It then applies the lasso to each subproblem separately, obtaining a coefficient vector for each one. Finally, it uses non-negative least squares to recombine the different vectors into a single solution. This step is useful in selecting and reweighting components that are correlated with the response. Simulated and real data examples show that the component lasso can outperform standard regression methods such as the lasso and elastic net, achieving a lower mean squared error as well as better support recovery. The modular structure also lends itself naturally to parallel computation.
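The splitting step can be sketched on its own. This sketch assumes that the components come from a graph with an edge between variables i and j whenever the absolute sample covariance exceeds a threshold; the `threshold` parameter and the example matrix are assumptions, and the later lasso and non-negative least squares steps are not shown.

```python
import numpy as np

def covariance_components(S, threshold):
    """Group variables into connected components of the graph with an edge (i, j)
    whenever |S[i, j]| > threshold (threshold is an assumed tuning parameter)."""
    p = S.shape[0]
    adjacent = np.abs(S) > threshold
    label = [-1] * p
    components = []
    for start in range(p):
        if label[start] != -1:
            continue
        label[start] = len(components)
        stack, members = [start], []
        while stack:  # depth-first search from `start`
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(adjacent[i]):
                if label[j] == -1:
                    label[j] = label[start]
                    stack.append(int(j))
        components.append(sorted(members))
    return components

S = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])
print(covariance_components(S, threshold=0.1))
```

Each returned index group would then define a smaller regression on those columns only, which is what makes the subproblems independent and easy to run in parallel.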
Composite Gaussian Process Models (CGP) 
A new type of non-stationary Gaussian process model is developed for approximating computationally expensive functions. The new model is a composite of two Gaussian processes, where the first one captures the smooth global trend and the second one models local details. The new predictor also incorporates a flexible variance model, which makes it more capable of approximating surfaces with varying volatility. Compared to the commonly used stationary Gaussian process model, the new predictor is numerically more stable and can more accurately approximate complex surfaces when the experimental design is sparse. In addition, the new model can also improve the prediction intervals by quantifying the change of local variability associated with the response.
Composite Indicator (COIN) 
A composite indicator is formed when individual indicators are compiled into a single index, on the basis of an underlying model of the multidimensional concept that is being measured. A composite indicator measures multidimensional concepts (e.g. competitiveness, e-trade or environmental quality) which cannot be captured by a single indicator. Ideally, a composite indicator should be based on a theoretical framework / definition, which allows individual indicators / variables to be selected, combined and weighted in a manner which reflects the dimensions or structure of the phenomena being measured.
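One simple way to compile indicators into an index is to rescale each one to a common range and take a weighted mean. The sketch below uses min-max normalization and a weighted arithmetic mean, assumes every indicator is already oriented so that higher is better, and uses invented indicator names, values and weights; other normalizations (z-scores, ranks) and aggregations (geometric means) are common and can change the ranking.

```python
def composite_indicator(data, weights):
    """Min-max normalize each indicator to [0, 1], then take a weighted mean per unit.

    data: {indicator_name: [value for each unit]}, all oriented so that higher is better;
    weights: {indicator_name: non-negative weight}.
    """
    n_units = len(next(iter(data.values())))
    total_weight = sum(weights.values())
    scores = [0.0] * n_units
    for name, values in data.items():
        lo, hi = min(values), max(values)
        for i, v in enumerate(values):
            normalized = (v - lo) / (hi - lo) if hi > lo else 0.0
            scores[i] += weights[name] * normalized / total_weight
    return scores

# Hypothetical indicators for three countries, equally weighted.
data = {"innovation": [0, 5, 10], "infrastructure": [10, 0, 5]}
print(composite_indicator(data, {"innovation": 1, "infrastructure": 1}))
```

The weights are where the theoretical framework enters: they encode how much each dimension is assumed to contribute to the concept being measured.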
Composite Quantile Regression (CQR) 

Compositional Data  In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying exclusively relative information. This definition, given by John Aitchison (1986), has several consequences: • A compositional data point, or composition for short, can be represented by a positive real vector with as many parts as considered. Sometimes, if the total amount is fixed and known, one component of the vector can be omitted. • As compositions only carry relative information, the only information is given by the ratios between components. Consequently, a composition multiplied by any positive constant contains the same information as the former. Therefore, proportional positive vectors are equivalent when considered as compositions. • As usual in mathematics, equivalence classes are represented by some element of the class, called a representative. Thus, equivalent compositions can be represented by positive vectors whose components add to a given constant $\kappa$. The vector operation assigning the constant sum representative is called closure, $\mathcal{C}[x_1, x_2, \ldots, x_D] = \left[\frac{\kappa x_1}{\sum_{i=1}^{D} x_i}, \frac{\kappa x_2}{\sum_{i=1}^{D} x_i}, \ldots, \frac{\kappa x_D}{\sum_{i=1}^{D} x_i}\right]$, where $D$ is the number of parts (components) and $[\cdot]$ denotes a row vector. • Compositional data can be represented by constant-sum real vectors with positive components, and these vectors span a simplex.
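The closure operation is a one-line rescaling, sketched here for a list of positive parts. It shows the equivalence described above: proportional vectors have the same closed representative. The default $\kappa = 1$ (proportions) is an assumption; $\kappa = 100$ gives percentages.

```python
def closure(x, kappa=1.0):
    """Constant-sum representative C[x]: rescale positive parts so they sum to kappa."""
    if any(v <= 0 for v in x):
        raise ValueError("compositional parts must be positive")
    total = sum(x)
    return [kappa * v / total for v in x]

print(closure([1, 2, 7]))              # parts as proportions of the whole
print(closure([2, 4, 14], kappa=100))  # a proportional vector, expressed as percentages
```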
Compositional Data Analysis (CoDa) 
Compositional data analysis deals with situations where the relevant information is contained only in the ratios between the measured variables, not in their reported absolute values. Compositional data analysis usually deals with relative information between parts where the total (abundances, mass, amount, etc.) is unknown or uninformative. A Concise Guide to Compositional Data Analysis Compositional, compositions
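Because only ratios matter, compositional data are usually analysed after a log-ratio transform. The sketch below uses Aitchison's centred log-ratio (clr), the log of each part relative to the geometric mean of all parts; it is one standard choice among the log-ratio transforms, shown here to make the "ratios only" property concrete: rescaling the whole composition leaves the result unchanged.

```python
import math

def clr(x):
    """Centred log-ratio: log of each positive part minus the mean log of all parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

print(clr([1, 2, 7]))
print(clr([10, 20, 70]))  # same composition scaled by 10: same clr coordinates
```

The clr coordinates always sum to zero, and ordinary multivariate methods can then be applied to them.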
Compositional Pattern Producing Network (CPPN)
Compositional pattern-producing networks (CPPNs) are a variation of artificial neural networks (ANNs) that differ in their set of activation functions and how they are applied. While ANNs often contain only sigmoid functions and sometimes Gaussian functions, CPPNs can include both types of functions and many others. The choice of functions for the canonical set can be biased toward specific types of patterns and regularities. For example, periodic functions such as sine produce segmented patterns with repetitions, while symmetric functions such as Gaussian produce symmetric patterns. Linear functions can be employed to produce linear or fractal-like patterns. Thus, the architect of a CPPN-based genetic art system can bias the types of patterns it generates by deciding the set of canonical functions to include.
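The link between activation functions and regularities can be seen in a hand-built CPPN. In a genetic art system the topology and function choices would be found by the evolutionary search; here they are fixed by hand and are purely hypothetical. A sine node makes the pattern repeat along x, a Gaussian node makes it mirror-symmetric about y = 0, and the network is queried at every coordinate of a grid to produce an image.

```python
import math

def gaussian(z):
    return math.exp(-z * z)

def cppn(x, y):
    """Hypothetical fixed-topology CPPN: two hidden nodes with different activation functions."""
    h_periodic = math.sin(3.0 * x)   # sine node: pattern repeats along x
    h_symmetric = gaussian(2.0 * y)  # Gaussian node: pattern mirrored about y = 0
    return math.tanh(h_periodic + h_symmetric)  # output node combines both

def render(fn, size):
    """Sample the pattern on a size x size grid over [-1, 1]^2."""
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    return [[fn(x, y) for x in coords] for y in coords]

image = render(cppn, 64)
```

Since the CPPN is a function of coordinates, the same network can be rendered at any resolution by changing `size`.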
Comprehensive EVent Ontology (CEVO) 
While the general analysis of named entities has received substantial research attention, the analysis of relations over named entities has not. In fact, a review of the literature on unstructured as well as structured data revealed a deficiency in research on the abstract conceptualization required to organize relations. We believe that such an abstract conceptualization can benefit various communities and applications such as natural language processing, information extraction, machine learning and ontology engineering. In this paper, we present CEVO (i.e., a Comprehensive EVent Ontology) built on Levin’s conceptual hierarchy of English verbs, which categorizes verbs by shared meaning and syntactic behavior. We present the fundamental concepts and requirements for this ontology. Furthermore, we present three use cases for demonstrating the benefits of this ontology on annotation tasks: 1) annotating relations in plain text, 2) annotating ontological properties and 3) linking textual relations to ontological properties.
Compressed Learning (CL) 
In this paper, we provide theoretical results to show that compressed learning, learning directly in the compressed domain, is possible. In particular, we provide tight bounds demonstrating that the linear-kernel SVM classifier in the measurement domain, with high probability, has true accuracy close to the accuracy of the best linear threshold classifier in the data domain. We show that this is beneficial from both the compressed sensing and the machine learning points of view. Furthermore, we indicate that for a family of well-known compressed sensing matrices, compressed learning is universal, in the sense that learning and classification in the measurement domain work provided that the data are sparse in some, even unknown, basis. Moreover, we show that our results are also applicable to a family of smooth manifold-learning tasks. Finally, we support our claims with experimental results. Compressed Learning: A Deep Neural Network Approach
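The idea of training and classifying without ever reconstructing the signal can be sketched with synthetic data. Two classes of sparse vectors in 1000 dimensions are measured with a random Gaussian matrix down to 100 numbers each, and a linear classifier is fitted on the measurements alone. For brevity the sketch swaps in a nearest-centroid classifier (which has a linear decision boundary) instead of the SVM analysed in the paper; the data generator, sizes and seed are assumptions.

```python
import numpy as np

def make_sparse(rng, n_samples, dim, support, k):
    """k-sparse vectors whose nonzeros fall inside the index set `support`."""
    X = np.zeros((n_samples, dim))
    for row in X:
        idx = rng.choice(support, size=k, replace=False)
        row[idx] = rng.uniform(0.5, 1.5, size=k)
    return X

def nearest_centroid(Y_pos, Y_neg):
    """Linear classifier in the measurement domain: assign to the closer class mean."""
    c_pos, c_neg = Y_pos.mean(axis=0), Y_neg.mean(axis=0)
    def classify(Y):
        d_pos = np.linalg.norm(Y - c_pos, axis=1)
        d_neg = np.linalg.norm(Y - c_neg, axis=1)
        return np.where(d_pos < d_neg, 1, -1)
    return classify

rng = np.random.default_rng(0)
dim, m, k = 1000, 100, 10
pos = make_sparse(rng, 300, dim, np.arange(0, 50), k)
neg = make_sparse(rng, 300, dim, np.arange(50, 100), k)

A = rng.normal(size=(m, dim)) / np.sqrt(m)  # random Gaussian measurement matrix
Y_pos, Y_neg = pos @ A.T, neg @ A.T          # only these compressed measurements are used below

classify = nearest_centroid(Y_pos[:200], Y_neg[:200])
correct = np.sum(classify(Y_pos[200:]) == 1) + np.sum(classify(Y_neg[200:]) == -1)
accuracy = correct / 200
print(accuracy)
```

Random projections approximately preserve inner products between sparse vectors, which is why a linear rule learned on the 100 measurements can still separate the classes.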
Compressed, Complementary, Computationally-Efficient Adaptive Gradient Online Learning (CompAdaGrad)
The adaptive gradient online learning method known as AdaGrad has seen widespread use in the machine learning community in stochastic and adversarial online learning problems and more recently in deep learning methods. The method’s full-matrix incarnation offers much better theoretical guarantees and potentially better empirical performance than its diagonal version; however, this version is computationally prohibitive, so the simpler diagonal version is often used in practice. We introduce a new method, CompAdaGrad, that navigates the space between these two schemes and show that this method can yield results much better than diagonal AdaGrad while avoiding the (effectively intractable) $O(n^3)$ computational complexity of full-matrix AdaGrad for dimension $n$. CompAdaGrad essentially performs full-matrix regularization in a low-dimensional subspace while performing diagonal regularization in the complementary subspace. We derive CompAdaGrad’s updates for composite mirror descent in the case of the squared $\ell_2$ norm and the $\ell_1$ norm, demonstrate that its complexity per iteration is linear in the dimension, and establish guarantees for the method independent of the choice of composite regularizer. Finally, we show preliminary results on several datasets.
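For reference, the diagonal AdaGrad baseline that CompAdaGrad improves on fits in a few lines: each coordinate keeps a running sum of its squared gradients and gets its own step size lr / sqrt(sum). The full-matrix version instead accumulates the outer products of the gradients and multiplies by that matrix's inverse square root, which is where the $O(n^3)$ cost comes from. CompAdaGrad itself is not shown; the learning rate and the toy quadratic objective are assumptions.

```python
import numpy as np

def adagrad_diag(grad, x0, lr=0.5, steps=200, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate steps scaled by accumulated squared gradients."""
    x = np.asarray(x0, dtype=float).copy()
    G = np.zeros_like(x)  # running sum of squared gradients, one entry per coordinate
    for _ in range(steps):
        g = grad(x)
        G += g * g
        x -= lr * g / (np.sqrt(G) + eps)
    return x

# Badly scaled quadratic f(x) = 0.5 * (x1^2 + 100 * x2^2); gradient is curv * x.
curv = np.array([1.0, 100.0])
x = adagrad_diag(lambda x: curv * x, [1.0, 1.0])
print(x)  # both coordinates driven close to 0 despite the 100x curvature gap
```

On this separable objective the per-coordinate scaling cancels the curvature difference, but it cannot capture correlations between coordinates, which is the gap the full-matrix and CompAdaGrad variants address.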
Compressive K-means (CKM)
The Lloyd-Max algorithm is a classical approach to perform K-means clustering. Unfortunately, its cost becomes prohibitive as the training dataset grows large. We propose a compressive version of K-means (CKM) that estimates cluster centers from a sketch, i.e. from a drastically compressed representation of the training dataset. We demonstrate empirically that CKM performs similarly to Lloyd-Max for a sketch size proportional to the number of centroids times the ambient dimension, and independent of the size of the original dataset. Given the sketch, the computational complexity of CKM is also independent of the size of the dataset. Unlike Lloyd-Max, which requires several replicates, CKM is almost insensitive to initialization, as we further demonstrate. For a large dataset of $10^7$ data points, we show that CKM can run two orders of magnitude faster than five replicates of Lloyd-Max, with similar clustering performance on artificial data. Finally, CKM achieves lower classification errors on handwritten digits classification. ➘ “LloydMax”
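The sketching step can be illustrated on its own. Assuming the sketch is an empirical average of random Fourier features exp(i x·ω) over the data, with Gaussian frequencies ω, its length is set by the number of frequencies m rather than by the number of points N, and sketches of separate chunks combine by a weighted average, so the data can be streamed once. The frequency scale and data are assumptions, and the step that fits centroids to the sketch is not shown.

```python
import numpy as np

def compute_sketch(X, Omega):
    """Average of random Fourier features exp(1j * x @ omega_j) over all rows of X.

    X: (N, d) data matrix; Omega: (d, m) frequency matrix. Returns m complex values,
    so the sketch size depends on m, not on N.
    """
    return np.exp(1j * (X @ Omega)).mean(axis=0)

rng = np.random.default_rng(0)
d, m = 2, 50
Omega = rng.normal(scale=1.0, size=(d, m))  # assumed: Gaussian frequencies with unit scale
X1 = rng.normal(size=(1000, d))
X2 = rng.normal(loc=3.0, size=(500, d))

s1, s2 = compute_sketch(X1, Omega), compute_sketch(X2, Omega)
s_all = compute_sketch(np.vstack([X1, X2]), Omega)
# Sketches of separate chunks merge into the sketch of the whole dataset.
s_merged = (len(X1) * s1 + len(X2) * s2) / (len(X1) + len(X2))
```

Once the sketch is computed, the original points are no longer needed, which is why CKM's cost after sketching does not grow with the dataset size.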
Compressive Sampling (CS) 
Compressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions to underdetermined linear systems. It is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than the Shannon-Nyquist sampling theorem requires. There are two conditions under which recovery is possible. The first is sparsity, which requires the signal to be sparse in some domain. The second is incoherence, which is enforced through the restricted isometry property and is sufficient for sparse signals. MRI is a prominent application. A Mathematical Introduction to Compressive Sensing An Introduction To Compressive Sampling Compressive Sensing 
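One widely used way to write the recovery step as an optimization problem is basis pursuit. Here $y \in \mathbb{R}^m$ holds the measurements, $A$ is the $m \times n$ measurement matrix with $m \ll n$, and the $\ell_1$ norm serves as a convex stand-in for sparsity. The notation is ours, and this is only one of several formulations; $\varepsilon > 0$ gives the noisy variant.

```latex
\hat{x} \;=\; \arg\min_{x \in \mathbb{R}^{n}} \; \|x\|_{1}
\quad \text{subject to} \quad \|A x - y\|_{2} \le \varepsilon,
\qquad A \in \mathbb{R}^{m \times n},\; m \ll n
```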
Computational Intelligence (CI) 
Computational intelligence (CI) is a set of nature-inspired computational methodologies and approaches for addressing complex real-world problems for which traditional approaches, i.e., first-principles modeling or explicit statistical modeling, are ineffective or infeasible. Many such real-life problems are not considered to be well-posed problems mathematically, but nature provides many counterexamples of biological systems that exhibit the required function in practice. For instance, the human body has about 200 joints (degrees of freedom), yet humans have little trouble executing a target movement of the hand, specified in just three Cartesian dimensions. Even if the torso were mechanically fixed, there would be an excess of 7:3 parameters to be controlled for natural arm movement (seven arm degrees of freedom for three target coordinates). Traditional models also often fail to handle uncertainty, noise, and an ever-changing context. Computational intelligence provides solutions for these and other complicated problems, as well as inverse problems. It primarily includes artificial neural networks, evolutionary computation, and fuzzy logic. In addition, CI embraces biologically inspired algorithms such as swarm intelligence and artificial immune systems, which can be seen as part of evolutionary computation, and includes broader fields such as image processing, data mining, and natural language processing. Other formalisms, such as Dempster–Shafer theory, chaos theory, and many-valued logic, are also used in the construction of computational models. The characteristic of “intelligence” is usually attributed to humans; more recently, many products and items also claim to be “intelligent”. Intelligence is directly linked to reasoning and decision making. Fuzzy logic was introduced in 1965 as a tool to formalise and represent the reasoning process, and fuzzy logic systems, which are based on fuzzy logic, possess many characteristics attributed to intelligence. 
Fuzzy logic deals effectively with the uncertainty that is common in human reasoning, perception and inference and, contrary to some misconceptions, has a very formal and strict mathematical backbone (‘is quite deterministic in itself yet allowing uncertainties to be effectively represented and manipulated by it’, so to speak). Neural networks, introduced in the 1940s (and further developed in the 1980s), mimic the human brain and represent a computational mechanism based on a simplified mathematical model of perceptrons (neurons) and the signals they process. Evolutionary computation, introduced in the 1970s and more popular since the 1990s, mimics population-based sexual evolution through the reproduction of generations. It also mimics genetics in so-called genetic algorithms. 
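To make the "formal backbone" of fuzzy logic concrete, the sketch below grades membership in a fuzzy set on a scale from 0 to 1 and combines two memberships with Zadeh's min operator for fuzzy AND. The triangular set shape, the set names "warm" and "humid", and their breakpoints are hypothetical choices for illustration only.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

def fuzzy_and(mu1, mu2):
    """Zadeh's min t-norm for fuzzy conjunction."""
    return min(mu1, mu2)

# Hypothetical fuzzy sets: "warm" temperature (deg C) and "humid" relative humidity (%).
warm = triangular(22.0, 15.0, 25.0, 35.0)
humid = triangular(60.0, 40.0, 80.0, 100.0)
print(warm, humid, fuzzy_and(warm, humid))
```

The computation is fully deterministic. The uncertainty lives in the graded membership values rather than in any randomness.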
Computational Linguistics  Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective. Traditionally, computational linguistics was performed by computer scientists who had specialized in applying computers to the processing of natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, logicians, philosophers, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others. Computational linguistics has theoretical and applied components: theoretical computational linguistics takes up issues in theoretical linguistics and cognitive science, and applied computational linguistics focuses on the practical outcome of modeling human language use. 
Computational Network Toolkit (CNTK) 
CNTK (http://www.cntk.ai), the Computational Network Toolkit by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK makes it easy to realize and combine popular model types such as feedforward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open-source working code. 
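The idea of a network as a directed graph, with leaf nodes for inputs and parameters, operation nodes for everything else, and gradients obtained by automatic differentiation, can be sketched in plain Python. This is a toy reverse-mode differentiator on scalar nodes. It does not use CNTK's API, and the `Node` class and `backward` function are hypothetical names. CNTK's nodes operate on matrices; scalars keep the example short.

```python
class Node:
    """A node in a tiny directed computation graph (scalar values only)."""

    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.parents = parents      # input nodes of this operation (empty for leaves)
        self.grad_fns = grad_fns    # map upstream gradient to the gradient for each parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (lambda g: g * other.value, lambda g: g * self.value))

def backward(output):
    """Reverse-mode differentiation: process nodes in reverse topological order."""
    order, seen = [], set()

    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p in n.parents:
                visit(p)
            order.append(n)  # post-order: parents come before the nodes that use them

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, fn in zip(node.parents, node.grad_fns):
            parent.grad += fn(node.grad)

# Leaf nodes: an input x and parameters w, b. Inner nodes: a multiply and an add.
x, w, b = Node(2.0), Node(3.0), Node(0.5)
y = w * x + b
backward(y)
print(y.value, w.grad, x.grad, b.grad)  # 6.5 2.0 3.0 1.0
```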
Computational Theory of Mind  In philosophy, a computational theory of mind is the view that the human mind or the human brain (or both) is an information processing system and that thinking is a form of computing. The theory was proposed in its modern form by Hilary Putnam in 1961, and developed by the MIT philosopher and cognitive scientist (and Putnam’s PhD student) Jerry Fodor in the 1960s, 1970s and 1980s. Despite being vigorously disputed in analytic philosophy in the 1990s (due to work by Putnam himself, John Searle, and others), the view is common in modern cognitive psychology and is presumed by many theorists of evolutionary psychology; in the 2000s and 2010s the view has resurfaced in analytic philosophy (Scheutz 2003, Edelman 2008). The computational theory of mind holds that the mind is a computation that arises from the brain acting as a computing machine. The theory can be elaborated in many ways, the most popular of which is that the brain is a computer and the mind is the result of the program that the brain runs. A program is the finite description of an algorithm or effective procedure, which prescribes a deterministic sequence of discrete actions that produces outputs based only on inputs and the internal states (memory) of the computing machine. For any admissible input, algorithms terminate in a finite number of steps. So the computational theory of mind is the claim that the mind is a computation of a machine (the brain) that derives output representations of the world from input representations and internal memory in a deterministic (non-random) way that is consistent with the theory of computation. Computational theories of mind are often said to require mental representation because ‘input’ into a computation comes in the form of symbols or representations of other objects. A computer cannot compute an actual object, but must interpret and represent the object in some form and then compute the representation. 
The computational theory of mind is related to the representational theory of mind in that both require mental states to be representations. However, the two theories differ in that the representational theory claims that all mental states are representations, while the computational theory leaves open that certain mental states, such as pain or depression, may not be representational and therefore may not be suitable for a computational treatment. These non-representational mental states are known as qualia. In Fodor’s original views, the computational theory of mind is also related to the language of thought. The language of thought theory allows the mind to process more complex representations with the help of semantics. 
Computer Aided Diagnosis  In radiology, computer-aided detection (CADe), also called computer-aided diagnosis (CADx), refers to procedures in medicine that assist doctors in the interpretation of medical images. Imaging techniques in X-ray, MRI, and ultrasound diagnostics yield a great deal of information, which the radiologist has to analyze and evaluate comprehensively in a short time. CAD systems help scan digital images, e.g. from computed tomography, for typical appearances and highlight conspicuous sections, such as possible diseases. 
Computer Assisted/Aided Qualitative Data Analysis Software (CAQDAS) 
Computer Assisted/Aided Qualitative Data Analysis Software (CAQDAS) offers tools that assist with qualitative research such as transcription analysis, coding and text interpretation, recursive abstraction, content analysis, discourse analysis, grounded theory methodology, etc. 
Computer Science  Computer science is the scientific and practical approach to computation and its applications. It is the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures (or algorithms) that underlie the acquisition, representation, processing, storage, communication of, and access to information, whether such information is encoded as bits in a computer memory or transcribed in genes and protein structures in a biological cell. An alternative, more succinct definition of computer science is the study of automating algorithmic processes that scale. A computer scientist specializes in the theory of computation and the design of computational systems. The field’s subfields can be divided into a variety of theoretical and practical disciplines. Some fields, such as computational complexity theory (which explores the fundamental properties of computational and intractable problems), are highly abstract, while fields such as computer graphics emphasize real-world visual applications. Still other fields focus on the challenges in implementing computation. For example, programming language theory considers various approaches to the description of computation, while the study of computer programming itself investigates various aspects of the use of programming languages and complex systems. Human-computer interaction considers the challenges in making computers and computations useful, usable, and universally accessible to humans. 
Computer Vision (CV) 
Computer vision is a field that includes methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. A theme in the development of this field has been to duplicate the abilities of human vision by electronically perceiving and understanding an image. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. Computer vision has also been described as the enterprise of automating and integrating a wide range of processes and representations for vision perception. Computer vision is the automatic analysis of images and videos by computers in order to gain some understanding of the world. It is inspired by the capabilities of the human visual system and, when first addressed in the 1960s and 1970s, was thought to be a relatively straightforward problem to solve. However, vision only seems easy because our own visual system makes the task feel intuitive to our conscious minds. In fact, the human visual system is very complex, and estimates of how much of the brain is involved in visual processing range from 25% to more than 50%. 
Concept Mining  Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents. 
Conceptual Clustering  Conceptual clustering is a machine learning paradigm for unsupervised classification developed mainly during the 1980s. It is distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures; see Categorization for more information on hierarchy. Conceptual clustering is closely related to formal concept analysis, decision tree learning, and mixture model learning. http://…/eswc2008PAM.pdf 
Concordance Correlation Coefficient  In statistics, the concordance correlation coefficient measures the agreement between two variables, e.g., to evaluate reproducibility or for interrater reliability. 
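A common form of this coefficient is Lin's, $\rho_c = 2 s_{xy} / (s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2)$. It penalizes both poor correlation and any systematic shift or scale difference between the two variables. The sketch below assumes that definition with 1/n (population) moments. The function name and the example data are illustrative.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient, using 1/n (population) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)

x = [1.0, 2.0, 3.0, 4.0]
print(concordance_correlation(x, x))                   # 1.0: perfect agreement
print(concordance_correlation(x, [v + 1 for v in x]))  # about 0.714: perfectly correlated but shifted
```

Pearson's correlation is 1 in both cases. The concordance coefficient drops in the second case because a rater that is consistently one unit higher does not *agree* with the other.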
Condition Monitoring (CM) 
Condition monitoring (or, colloquially, CM) is the process of monitoring a parameter of condition in machinery (vibration, temperature, etc.) in order to identify a significant change that is indicative of a developing fault. It is a major component of predictive maintenance. The use of condition monitoring allows maintenance to be scheduled, or other actions to be taken, to prevent failure and avoid its consequences. Condition monitoring has a unique benefit in that conditions that would shorten normal lifespan can be addressed before they develop into a major failure. Condition monitoring techniques are normally used on rotating equipment and other machinery (pumps, electric motors, internal combustion engines, presses), while periodic inspection using nondestructive testing techniques and fit for service (FFS) evaluation are used for stationary plant equipment such as steam boilers, piping and heat exchangers. http://…/9781466584051 
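As a minimal illustration of "identifying a significant change", the sketch below learns a baseline from readings taken while the machine is assumed healthy. It then flags later readings that deviate from the baseline mean by more than k standard deviations. The fixed-threshold rule, the function name, the baseline length and the vibration readings are all hypothetical. Practical CM systems often use richer features such as vibration spectra.

```python
import statistics

def flag_anomalies(readings, baseline_n=20, k=3.0):
    """Return indices of readings that deviate from a healthy baseline by more than k sigma."""
    baseline = readings[:baseline_n]           # assumed fault-free period
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return [i for i, r in enumerate(readings[baseline_n:], start=baseline_n)
            if abs(r - mu) > k * sigma]

# Hypothetical vibration amplitudes: stable, then a developing fault.
readings = [1.0, 1.1] * 10 + [1.05, 1.1, 2.0, 2.4]
print(flag_anomalies(readings))  # [22, 23]
```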
Conditional Autoregressive Model (CAR) 
The essential idea here is that the probability of the values estimated at any given location is conditional on the level of neighboring values. mclcar 
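In a Gaussian CAR model, "conditional on neighboring values" is made precise through each location's full conditional distribution. The form below is one common parameterisation and is an assumption here, since other weightings exist. In it, $w_{ij} > 0$ are neighbour weights (zero for non-neighbours), $w_{i+}$ is their row sum, $\rho$ controls the strength of spatial dependence, and $\tau^2$ is a scale parameter.

```latex
x_i \mid x_{-i} \;\sim\; \mathcal{N}\!\left(
  \mu_i + \rho \sum_{j \neq i} \frac{w_{ij}}{w_{i+}}\,(x_j - \mu_j),\;
  \frac{\tau^{2}}{w_{i+}}
\right),
\qquad w_{i+} = \sum_{j} w_{ij}
```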
Conditional Extreme Value Models  Extreme value theory (EVT) is often used to model environmental, financial and internet traffic data. Multivariate EVT assumes a multivariate domain of attraction condition for the distribution of a random vector, which requires each component to satisfy a marginal domain of attraction condition. Heffernan and Tawn [2004] and Heffernan and Resnick [2007] developed an approximation to the joint distribution of the random vector by conditioning on one of the components being in an extreme value domain. The usual analysis based on multivariate extreme value theory is often unhelpful, either because of asymptotic independence or because one component of the observation vector is not in a domain of attraction. These defects can be addressed by using the conditional extreme value model. 
Conditional Power (CP) 
Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as the effect assumed in the original design, the effect estimated from the current data, or the null hypothesis. In many clinical trials, a CP computation at a prespecified point in the study, such as midway, is used as the basis for early termination for futility when there is little evidence of a beneficial effect. 
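A frequently used way to compute CP is the Brownian-motion ("B-value") approximation. At information fraction $t$ with interim statistic $Z_t$, set $B(t) = Z_t\sqrt{t}$ and assume that the remaining increment is normal with mean $\theta(1-t)$ and variance $1-t$. The sketch below follows that approximation for a one-sided test. The function name and the interim numbers are hypothetical. The three scenarios in the text correspond to different choices of the drift $\theta$:

- the original design, $\theta = z_{1-\alpha} + z_{1-\beta}$;
- the current trend, $\theta = Z_t / \sqrt{t}$;
- the null hypothesis, $\theta = 0$.

```python
from statistics import NormalDist

def conditional_power(z_t, t, theta, alpha=0.025):
    """Conditional power under the Brownian-motion approximation (one-sided test).

    z_t   : interim Z statistic observed at information fraction t (0 < t < 1)
    theta : assumed drift, i.e. the expected final Z statistic under the assumed effect
    alpha : one-sided significance level of the final analysis
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    b_t = z_t * t ** 0.5          # B-value observed so far
    mean_rest = theta * (1 - t)   # expected increment over the remaining information
    return 1 - std_normal.cdf((z_crit - b_t - mean_rest) / (1 - t) ** 0.5)

z_t, t = 1.0, 0.5   # hypothetical interim look halfway through the trial
print(conditional_power(z_t, t, theta=0.0))             # under the null hypothesis, about 0.04
print(conditional_power(z_t, t, theta=z_t / t ** 0.5))  # under the current trend, about 0.22
```

A low CP under the current trend is the kind of evidence that might support stopping for futility, subject to the trial's prespecified rules.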
Conditional Random Fields (CRF) 
Conditional random fields (CRFs) are a class of statistical modelling methods often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to ‘neighboring’ samples, a CRF can take context into account; e.g., the linear-chain CRF popular in natural language processing predicts sequences of labels for sequences of input samples. CRFs are a type of discriminative undirected probabilistic graphical model. They are used to encode known relationships between observations and construct consistent interpretations, and they are often used for labeling or parsing sequential data, such as natural language text or biological sequences, as well as in computer vision. Specifically, CRFs find applications in shallow parsing, named entity recognition and gene finding, among other tasks, being an alternative to the related hidden Markov models (HMMs). In computer vision, CRFs are often used for object recognition and image segmentation. 
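For a linear-chain CRF, "taking context into account" shows up at prediction time. The best label sequence maximizes the sum of per-position (unary) scores plus label-to-label transition scores, and it is found with Viterbi dynamic programming. The sketch assumes the scores (log-potentials) are already given. Training them is out of scope here. The labels and numbers are hypothetical.

```python
def viterbi(unary, transition, labels):
    """Highest-scoring label sequence for a linear-chain CRF given log-potentials.

    unary[t][y]        : score of label y at position t
    transition[y1][y2] : score of label y1 being followed by label y2
    """
    n = len(unary)
    score = [dict(unary[0])]  # best score of any path ending in each label at position 0
    back = [{}]
    for t in range(1, n):
        score.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda p: score[t - 1][p] + transition[p][y])
            score[t][y] = score[t - 1][prev] + transition[prev][y] + unary[t][y]
            back[t][y] = prev
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda y: score[n - 1][y])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

labels = ["N", "V"]
unary = [{"N": 2.0, "V": 0.5}, {"N": 1.0, "V": 0.8}]
transition = {"N": {"N": -1.0, "V": 1.0}, "V": {"N": 0.0, "V": -1.0}}
print(viterbi(unary, transition, labels))  # ['N', 'V']
```

A per-position classifier would choose ['N', 'N'] from the unary scores alone. The transition scores, which encode context, change the second label.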
Conditional Random Fields as Recurrent Neural Networks (CRF-RNN) 
Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Field (CRF)-based probabilistic graphical modelling. To this end, we formulate Conditional Random Fields as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual backpropagation algorithm, avoiding offline post-processing methods for object delineation. GitXiv 
Condition-Based Maintenance (CBM) 
Condition-based maintenance (CBM), put briefly, is maintenance performed when the need arises. This maintenance is performed after one or more indicators show that equipment is going to fail or that equipment performance is deteriorating. This concept is applicable to mission-critical systems that incorporate active redundancy and fault reporting. It is also applicable to non-mission-critical systems that lack redundancy and fault reporting. Condition-based maintenance was introduced to try to maintain the correct equipment at the right time. CBM is based on using real-time data to prioritize and optimize maintenance resources. Observing the state of the system is known as condition monitoring. Such a system will determine the equipment’s health and act only when maintenance is actually necessary. Developments in recent years have allowed extensive instrumentation of equipment, and together with better tools for analyzing condition data, today’s maintenance personnel are more able than ever to decide the right time to perform maintenance on a piece of equipment. Ideally, condition-based maintenance will allow maintenance personnel to do only the right things, minimizing spare parts cost, system downtime and time spent on maintenance. http://…/3313ijmnct03.pdf 
CONESTA (CONESTA) 
High-dimensional prediction models are increasingly used to analyze biological data such as neuroimaging or genetic data sets. However, classical penalized algorithms yield dense solutions that are difficult to interpret without arbitrary thresholding. Alternatives based on sparsity-inducing penalties suffer from coefficient instability. Complex structured sparsity-inducing penalties are a promising approach to force the solution to adhere to some domain-specific constraints, thus offering new perspectives in biomarker identification. We propose a generic optimization framework that can combine any smooth convex loss function with (i) penalties whose proximal operator is known and (ii) a large range of complex, non-smooth convex structured penalties such as total variation or overlapping group lasso. Although many papers have addressed a similar goal, few have tackled it in such a generic way and in the context of high-dimensional data. The proposed continuation algorithm, called CONESTA, dynamically smooths the complex penalties to avoid the computation of proximal operators that are either not known or expensive to compute. The decreasing sequence of smoothing parameters is dynamically adapted, using the duality gap, in order to maintain the optimal convergence speed towards any globally desired precision with a duality gap guarantee. First, we demonstrate, on both simulated data and experimental MRI data, that CONESTA outperforms the excessive gap method, ADMM, proximal gradient smoothing (without continuation) and inexact FISTA in terms of convergence speed and/or precision of the solution. Second, on the experimental MRI data set, we establish the superiority of structured sparsity-inducing penalties ($\ell_1$ and total variation) over non-structured methods in terms of the recovery of meaningful and stable groups of predictive variables. 
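Category (i) above, penalties whose proximal operator is known, is exemplified by the $\ell_1$ norm. Its proximal operator is coordinate-wise soft thresholding, and a proximal gradient step combines it with a gradient step on the smooth loss. The sketch shows only this building block, not CONESTA's smoothing or continuation scheme. The function names and the toy least-squares loss $f(x) = \tfrac{1}{2}\|x - b\|^2$ are illustrative assumptions.

```python
import math

def soft_threshold(v, lam):
    """Proximal operator of lam * ||x||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def prox_grad_step(x, grad, step, lam):
    """One proximal gradient (ISTA) step for: smooth loss + lam * ||x||_1."""
    return soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], step * lam)

# Toy smooth loss f(x) = 0.5 * ||x - b||^2, whose gradient is x - b.
b = [3.0, -0.5, 1.2]
x = [0.0, 0.0, 0.0]
x = prox_grad_step(x, [xi - bi for xi, bi in zip(x, b)], step=1.0, lam=1.0)
print(x)  # the small middle coordinate is set exactly to zero
```

Penalties such as total variation or overlapping group lasso have no comparably cheap closed-form operator. That is the situation CONESTA addresses by smoothing them instead.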
Confidence  Confidence is defined as the probability of seeing the rule’s consequent under the condition that the transactions also contain the antecedent. Confidence is directed and gives different values for the rules X→Y and Y→X. Association rules have to satisfy a minimum confidence constraint, conf(X→Y)≥γ. Confidence is not downward closed and was developed together with support by Agrawal et al. (the so-called support-confidence framework). Support is first used to find frequent (significant) itemsets, exploiting its downward closure property to prune the search space. Then confidence is used in a second step to produce rules from the frequent itemsets that exceed a minimum confidence threshold. A problem with confidence is that it is sensitive to the frequency of the consequent Y in the database. Because of the way confidence is calculated, consequents with higher support will automatically produce higher confidence values even if there is no association between the items. 
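As a probability, confidence is computed as $\mathrm{conf}(X \to Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$. The small sketch below computes it from a list of transactions and shows that it is directed. The transactions and function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = supp(X union Y) / supp(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
                {"milk"}, {"bread"}]
print(confidence({"bread"}, {"milk"}, transactions))  # 0.5
print(confidence({"milk"}, {"bread"}, transactions))  # about 0.667: X -> Y differs from Y -> X
```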
Confidence Interval  In statistics, a confidence interval (CI) is a type of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval (i.e. it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient. 
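The phrase "frequently includes the parameter if the experiment is repeated" can be checked by simulation. The sketch builds an approximate 95% interval for a mean, repeats the experiment many times, and counts how often the interval covers the true mean. It assumes a large-sample normal-quantile interval; for small samples a Student-t quantile would be more appropriate. The sample size, distribution and seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_ci(sample, level=0.95):
    """Approximate confidence interval for the mean using a normal quantile."""
    n = len(sample)
    m = mean(sample)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * stdev(sample) / n ** 0.5
    return m - half_width, m + half_width

# Repeat a hypothetical experiment many times and count how often the interval covers the truth.
rng = random.Random(1)
true_mean, reps, hits = 10.0, 2000, 0
for _ in range(reps):
    lo, hi = normal_ci([rng.gauss(true_mean, 2.0) for _ in range(50)])
    hits += lo <= true_mean <= hi
coverage = hits / reps
print(coverage)  # close to 0.95
```

Each individual interval either contains the parameter or does not. The confidence level describes the long-run fraction of intervals that do.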
Confidence Weighting (CW) 
Confidence weighting (CW) is concerned with measuring two variables: (1) what a respondent believes is a correct answer to a question and (2) what degree of certainty the respondent has toward the correctness of this belief. Confidence weighting, when applied to a specific answer selection for a particular test or exam question, is referred to in the cognitive psychology literature as item-specific confidence, a term typically used by researchers who investigate metamemory or metacognition, comprehension monitoring, or feeling-of-knowing. Item-specific confidence is defined as calibrating the relationship between an objective measure of accuracy (e.g., a test answer selection) and a subjective measure of confidence (e.g., a numeric value assigned to the selection). Studies on self-confidence and metacognition during test taking have used item-specific confidence as a way to assess the accuracy and confidence underlying knowledge judgments. Researchers outside the field of cognitive psychology have applied confidence weighting to item-specific judgments in assessing alternative conceptions of difficult concepts in high school biology and physics, developing and evaluating computerized adaptive testing, testing computerized assessments of learning and understanding, and developing and testing formative and summative classroom assessments. Confidence weighting is one of three components of the Risk Inclination Model. 
Confidence-Weighted Linear Classification  We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training. 
Confident Multiple Choice Learning (CMCL) 
Ensemble methods are arguably the most trustworthy techniques for boosting the performance of machine learning models. Popular independent ensembles (IE) relying on naive averaging/voting schemes have been the typical choice for most applications involving deep neural networks, but they do not consider advanced collaboration among ensemble models. In this paper, we propose new ensemble methods specialized for deep neural networks, called confident multiple choice learning (CMCL): a variant of multiple choice learning (MCL) that addresses its overconfidence issue. In particular, the major components of CMCL beyond the original MCL scheme are (i) a new loss, i.e., the confident oracle loss, (ii) a new architecture, i.e., feature sharing, and (iii) a new training method, i.e., stochastic labeling. We demonstrate the effect of CMCL via experiments on image classification on CIFAR and SVHN, and on foreground-background segmentation on iCoseg. In particular, CMCL using 5 residual networks provides 14.05% and 6.60% relative reductions in the top-1 error rates from the corresponding IE scheme for the classification task on CIFAR and SVHN, respectively. 
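The confident oracle loss can be sketched for a single example, based on our reading of the CMCL formulation (treat the exact form as an assumption):

- the k ensemble members chosen by the oracle pay the usual cross-entropy;
- every other member pays β · KL(U ‖ p), which pushes it towards a uniform, unconfident prediction.

The total equals Σ_m β·KL_m + Σ_{assigned} (CE_m − β·KL_m), so the oracle assigns the k members with the smallest CE_m − β·KL_m. Function names, the value of β and the probabilities are illustrative.

```python
import math

def cross_entropy(probs, label):
    return -math.log(probs[label])

def kl_uniform(probs):
    """KL(U || p) between the uniform distribution U and p."""
    k = len(probs)
    return sum((1 / k) * math.log((1 / k) / p) for p in probs)

def confident_oracle_loss(member_probs, label, k=1, beta=0.5):
    """Confident oracle loss for one example; returns (loss, set of assigned members)."""
    ce = [cross_entropy(p, label) for p in member_probs]
    kl = [kl_uniform(p) for p in member_probs]
    # Assign the k members whose (CE - beta * KL) is smallest; this minimises the total.
    order = sorted(range(len(member_probs)), key=lambda m: ce[m] - beta * kl[m])
    assigned = set(order[:k])
    loss = sum(ce[m] if m in assigned else beta * kl[m] for m in range(len(member_probs)))
    return loss, assigned

members = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3], [0.1, 0.8, 0.1]]
loss, assigned = confident_oracle_loss(members, label=0, k=1, beta=0.5)
print(assigned, loss)  # member 0 specialises on this example; member 2 is penalised for confidence
```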
Configural Frequency Analysis (CFA) 
Configural frequency analysis (CFA) is a method of exploratory data analysis, introduced by Gustav A. Lienert in 1969. The goal of a configural frequency analysis is to detect patterns in the data that occur significantly more often (such patterns are called Types) or significantly less often (such patterns are called Antitypes) than expected by chance. The identified types and antitypes thus provide some insight into the structure of the data. Types are interpreted as concepts constituted by a pattern of variable values. Antitypes are interpreted as patterns of variable values that, in general, do not occur together. cfa 
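A first-order CFA compares the observed count of each configuration with the count expected if the variables were independent. The sketch below uses a simple z approximation, (observed − expected)/√expected, with a 1.96 cutoff and no correction for testing many configurations. Lienert's CFA has several test variants and usually adjusts α (e.g. Bonferroni), so treat these choices as simplifying assumptions. The data are hypothetical.

```python
from itertools import product
from math import sqrt

def cfa(observations):
    """First-order CFA sketch: observed vs. independence-expected counts per configuration."""
    n = len(observations)
    n_vars = len(observations[0])
    # Marginal proportions of each category for each variable.
    margins = []
    for v in range(n_vars):
        counts = {}
        for obs in observations:
            counts[obs[v]] = counts.get(obs[v], 0) + 1
        margins.append({c: k / n for c, k in counts.items()})
    results = {}
    for config in product(*(sorted(m) for m in margins)):
        expected = n
        for v, c in enumerate(config):
            expected *= margins[v][c]
        observed = sum(1 for obs in observations if tuple(obs) == config)
        z = (observed - expected) / sqrt(expected)
        results[config] = (observed, expected, z)
    return results

def classify(z, crit=1.96):
    return "type" if z > crit else "antitype" if z < -crit else "-"

data = [(1, 1)] * 40 + [(0, 0)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10
for config, (obs, exp, z) in cfa(data).items():
    print(config, obs, exp, round(z, 2), classify(z))
```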
Configurational Comparative Methods (CCM) 
Configurational comparative methods (CCMs) subsume techniques for the identification of complex causal dependencies in configurational data using the theoretical framework of Boolean algebra and its various extensions (Rihoux and Ragin, 2009). For example, Qualitative Comparative Analysis (QCA; Ragin, 1987, 2000, 2008), hitherto the most prominent representative of CCMs, has been applied in areas as diverse as business administration (e.g., Chung, 2001), environmental science (van Vliet et al., 2013), evaluation (Cragun et al., 2014), political science (Thiem, 2011), public health (Longest and Thoits, 2012) and sociology (Crowley, 2013). Besides three standalone programs based on graphical user interfaces, three R packages for QCA are currently available, each with a different scope of functionality: QCA (Dușa and Thiem, 2014; Thiem and Dușa, 2013a,c), QCA3 (Huang, 2014) and SetMethods (Quaranta, 2013) (an add-on package to Schneider and Wagemann, 2012). 
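Two quantities that fuzzy-set QCA routinely reports when testing whether a condition X is sufficient for an outcome Y are consistency and coverage:

- consistency, Σ min(x, y) / Σ x, measures how far the cases' membership in X is a subset of their membership in Y;
- coverage, Σ min(x, y) / Σ y, measures how much of the outcome X accounts for.

This is a plain-Python sketch of those standard formulas. It does not reproduce any function from the QCA, QCA3 or SetMethods packages, and the membership scores are hypothetical.

```python
def consistency(x, y):
    """Fuzzy-set consistency of 'X is sufficient for Y': sum(min(x, y)) / sum(x)."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)

def coverage(x, y):
    """Fuzzy-set coverage of Y by X: sum(min(x, y)) / sum(y)."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(y)

# Hypothetical membership scores of five cases in condition X and outcome Y.
x = [0.9, 0.7, 0.2, 0.6, 0.1]
y = [1.0, 0.8, 0.6, 0.5, 0.3]
print(consistency(x, y), coverage(x, y))  # about 0.96 and 0.75
```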
Confirmatory Analysis  1) Inferential Statistics – Deductive Approach:
• Heavy reliance on probability models
• Must accept untestable assumptions
• Look for definite answers to specific questions
• Emphasis on numerical calculations
• Hypotheses determined at outset
• Hypothesis tests and formal confidence interval estimation.
2) Advantages:
• Provide precise information in the right circumstances
• Well-established theory and methods.
3) Disadvantages:
• Misleading impression of precision in less than ideal circumstances
• Analysis driven by preconceived ideas
• Difficult to notice unexpected results. 
Confirmatory Factor Analysis (CFA) 
In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social research. It is used to test whether measures of a construct are consistent with a researcher’s understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research. CFA was first developed by Jöreskog and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959). In confirmatory factor analysis, the researcher first develops a hypothesis about what factors s/he believes are underlying the measures s/he has used (e.g., “Depression” being the factor underlying the Beck Depression Inventory and the Hamilton Rating Scale for Depression) and may impose constraints on the model based on these a priori hypotheses. By imposing these constraints, the researcher is forcing the model to be consistent with his/her theory. For example, if it is posited that there are two factors accounting for the covariance in the measures, and that these factors are unrelated to one another, the researcher can create a model where the correlation between factor A and factor B is constrained to zero. Model fit measures could then be obtained to assess how well the proposed model captured the covariance between all the items or measures in the model. If the constraints the researcher has imposed on the model are inconsistent with the sample data, then the results of statistical tests of model fit will indicate a poor fit, and the model will be rejected. If the fit is poor, it may be due to some items measuring multiple factors. It might also be that some items within a factor are more related to each other than others. 
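The constraints described above act on the model-implied covariance matrix $\Sigma(\theta) = \Lambda \Phi \Lambda^{\top} + \Theta$. Here $\Lambda$ holds the factor loadings, $\Phi$ the factor covariances and $\Theta$ the unique (error) variances. As a hypothetical illustration, take four indicators, two per factor, with uncorrelated factors. The zeros in $\Lambda$ are the fixed "zero loadings", the zero off-diagonal entries in $\Phi$ encode the hypothesis that factors A and B are unrelated, and model fit compares $\Sigma(\theta)$ with the sample covariance matrix.

```latex
\Sigma(\theta) = \Lambda \Phi \Lambda^{\top} + \Theta, \qquad
\Lambda = \begin{pmatrix}
  \lambda_{11} & 0 \\
  \lambda_{21} & 0 \\
  0 & \lambda_{32} \\
  0 & \lambda_{42}
\end{pmatrix}, \qquad
\Phi = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\Theta = \operatorname{diag}(\theta_1, \theta_2, \theta_3, \theta_4)
```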
For some applications, the requirement of “zero loadings” (for indicators not supposed to load on a certain factor) has been regarded as too strict. A newly developed analysis method, “exploratory structural equation modeling”, specifies hypotheses about the relation between observed indicators and their supposed primary latent factors while allowing for estimation of loadings with other latent factors as well. relabeLoadings 
Conflict-Driven Clause Learning (CDCL) 
In computer science, Conflict-Driven Clause Learning (CDCL) is an algorithm for solving the Boolean satisfiability problem (SAT). Given a Boolean formula, the SAT problem asks for an assignment of variables so that the entire formula evaluates to true. The internal workings of CDCL SAT solvers were inspired by DPLL solvers. 
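Since CDCL builds on DPLL, a minimal DPLL solver shows the shared skeleton of unit propagation plus branching with backtracking. This sketch is DPLL only. A CDCL solver would additionally analyse each conflict, learn a new clause from it, and backjump non-chronologically, none of which is done here. Clauses use DIMACS-style integer literals, and the function name is hypothetical.

```python
def dpll(clauses, assignment=None):
    """Minimal DPLL SAT solver.

    clauses: list of clauses, each a list of non-zero ints (3 means x3, -3 means NOT x3).
    Returns a satisfying assignment {var: bool}, or None if the formula is unsatisfiable.
    """
    assignment = dict(assignment or {})
    while True:
        simplified, unit = [], None
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            rest = [l for l in clause if abs(l) not in assignment]
            if not rest:
                return None  # every literal is false: conflict, so backtrack
            if len(rest) == 1 and unit is None:
                unit = rest[0]
            simplified.append(rest)
        if not simplified:
            return assignment  # all clauses satisfied
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0  # unit propagation
    # Branch on an unassigned literal: try making it true, then false.
    lit = simplified[0][0]
    for value in (lit > 0, lit <= 0):
        result = dpll(clauses, {**assignment, abs(lit): value})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x3 or not x2) and (not x1 or not x3)
print(dpll([[1, 2], [-1, 3], [-3, -2], [-1, -3]]))  # {1: False, 2: True, 3: False}
print(dpll([[1], [-1]]))                            # None (unsatisfiable)
```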
Conflict-free Asynchronous Machine Learning (CYCLADES) 
We present CYCLADES, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting. CYCLADES is asynchronous during shared model updates and requires no memory locking mechanisms, similar to HOGWILD!-type algorithms. Unlike HOGWILD!, CYCLADES introduces no conflicts during the parallel execution, and offers a black-box analysis for provable speedups across a large family of algorithms. Due to its inherent conflict-free nature and cache locality, our multicore implementation of CYCLADES consistently outperforms HOGWILD!-type algorithms on sufficiently sparse datasets, leading to up to 40% speedup gains compared to the HOGWILD! implementation of SGD, and up to 5x gains over asynchronous implementations of variance reduction algorithms. 
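One way to run sparse updates in parallel "with no conflicts" follows our understanding of the CYCLADES approach, treated here as an assumption:

1. Sample a batch of updates.
2. Connect two updates in a conflict graph if they touch a shared model coordinate.
3. Hand whole connected components to different cores.

Updates in different components never touch the same coordinates, so no locks are needed. The sketch covers only the grouping step, using union-find; function and variable names are hypothetical.

```python
def conflict_free_groups(updates):
    """Group sampled updates into connected components of their conflict graph.

    updates: list of sets; updates[i] holds the model coordinates update i reads or writes.
    """
    parent = list(range(len(updates)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    owner = {}  # coordinate -> first update seen touching it
    for i, coords in enumerate(updates):
        for c in coords:
            if c in owner:
                parent[find(i)] = find(owner[c])  # shared coordinate: merge components
            else:
                owner[c] = i
    groups = {}
    for i in range(len(updates)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Hypothetical sparse updates, e.g. the non-zero features of five sampled training examples.
batch = [{0, 1}, {2}, {1, 3}, {4, 5}, {5}]
print(conflict_free_groups(batch))  # [[0, 2], [1], [3, 4]]
```

On sparse data these components tend to be small, which leaves many independent groups to spread across cores.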
Conformal Prediction  Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability ε, together with a method that makes a prediction ŷ of a label y, it produces a set of labels, typically containing ŷ, that also contains y with probability 1 − ε. Conformal prediction can be applied to any method for producing ŷ: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an online setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1 − ε of the time, even though they are based on an accumulating data set rather than on independent data sets. In addition to the model under which successive examples are sampled independently, other online compression models can also use conformal prediction. The widely used Gaussian linear model is one of these.
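The entry describes the online setting. This sketch instead uses split (inductive) conformal prediction, a simpler batch variant that is swapped in here. It makes two assumptions:

- Calibration data are held out from model fitting.
- The nonconformity score is the absolute residual |y − ŷ|.

The wrapped point predictor can be anything, as the entry says. Only its predictions are needed.

```python
import math

def split_conformal_interval(cal_pred, cal_true, new_pred, epsilon):
    """Prediction interval for a new point via split conformal prediction.

    cal_pred, cal_true: predictions and true labels on a calibration set that
    was NOT used to fit the model. Under exchangeability the interval contains
    the true label with probability at least 1 - epsilon.
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_pred, cal_true))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - epsilon))  # rank of the conformal quantile
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this epsilon
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)

# Hypothetical calibration data from any point predictor (e.g. ridge regression).
cal_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_true = [1.1, 1.8, 3.3, 4.0, 4.6, 6.2, 7.5, 7.9, 9.4]
print(split_conformal_interval(cal_pred, cal_true, new_pred=10.0, epsilon=0.2))
```

Smaller ε demands a higher-ranked residual. Once (n + 1)(1 − ε) exceeds n, the only valid answer is the trivial interval.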
Confounding  http://…/confounding.html 
Confounding Variable  In statistics, a confounding variable (also confounding factor, a confound, or confounder) is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable. A perceived relationship between an independent variable and a dependent variable that has been misestimated due to the failure to account for a confounding factor is termed a spurious relationship, and the presence of misestimation for this reason is termed omitted-variable bias. While specific definitions may vary, in essence a confounding variable fits the following four criteria, here given in a hypothetical situation with variable of interest ‘V’, confounding variable ‘C’ and outcome of interest ‘O’: 1. C is associated (inversely or directly) with O; 2. C is associated with O, independent of V; 3. C is associated (inversely or directly) with V; 4. C is not in the causal pathway of V to O (C is not a direct consequence of V, not a way by which V produces O). The above correlation-based definition, however, is metaphorical at best – a growing number of analysts agree that confounding is a causal concept, and as such, cannot be described in terms of correlations or associations.
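A small simulation makes omitted-variable bias visible. The data-generating process is hypothetical and linear: C drives both V and O, while V has no causal effect on O. Regressing O on V alone finds a spurious effect, and adding C to the regression removes it. The sketch uses ordinary least squares via NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical process: C causes both V and O; V does not cause O.
C = rng.normal(size=n)
V = 2.0 * C + rng.normal(size=n)
O = 3.0 * C + rng.normal(size=n)

def ols_coefs(y, *xs):
    """Least-squares coefficients of y on an intercept and the given columns."""
    X = np.column_stack([np.ones_like(y), *xs])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

naive = ols_coefs(O, V)[1]        # omits the confounder C
adjusted = ols_coefs(O, V, C)[1]  # conditions on C
print(f"naive effect of V on O:    {naive:.2f}")
print(f"adjusted effect of V on O: {adjusted:.2f}")
```

In this setup the naive slope should land near cov(O, V)/var(V) = 6/5 = 1.2, while the adjusted slope stays near zero.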
Confusion Matrix  In the field of machine learning, a confusion matrix, also known as a contingency table or an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
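Here is a minimal plain-Python sketch that builds the table with the row and column convention stated above: rows are actual classes and columns are predicted classes. The labels and predictions are hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

labels = ["cat", "dog", "rabbit"]
actual    = ["cat", "cat", "dog", "dog", "dog", "rabbit", "rabbit", "cat"]
predicted = ["cat", "dog", "dog", "dog", "cat", "rabbit", "dog",    "cat"]
for label, row in zip(labels, confusion_matrix(actual, predicted, labels)):
    print(f"{label:>6}: {row}")
```

Correct predictions sit on the diagonal. Here the off-diagonal counts in the cat and dog cells show which pair of classes the classifier confuses.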
Congruence Class Model (CCM) 
CCMnet 
Conjugate Gradient Method (CG) 
In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems. The conjugate gradient method can also be used to solve unconstrained optimization problems such as energy minimization. It was developed by Magnus Hestenes and Eduard Stiefel.
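A minimal NumPy sketch of the textbook iteration for a symmetric positive-definite A follows. It touches A only through matrix-vector products, which is why the method suits large sparse systems. The tolerance and iteration cap are arbitrary illustrative defaults.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                  # residual
    p = r.copy()                   # first search direction
    rs_old = r @ r
    max_iter = max_iter or 10 * len(b)
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)  # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))  # exact solution is [1/11, 7/11]
```

In exact arithmetic the method converges in at most n steps for an n × n system. In floating point it is run until the residual is small.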
Conjugate Prior  In Bayesian probability theory, if the posterior distributions p(θ | x) are in the same family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function. For example, the Gaussian family is conjugate to itself (or self-conjugate) with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian. This means that the Gaussian distribution is a conjugate prior for the likelihood which is also Gaussian.
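The Gaussian example above can be made concrete with a closed-form update. The sketch assumes the noise variance is known, which is the standard setting for this result. Because the prior is conjugate, updating reduces to adding precisions (inverse variances) and taking a precision-weighted mean.

```python
def gaussian_mean_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior N(mean, var) over mu, given prior mu ~ N(prior_mean, prior_var)
    and observations x_i ~ N(mu, noise_var) with noise_var known."""
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

print(gaussian_mean_posterior(prior_mean=0.0, prior_var=1.0, data=[2.0, 4.0], noise_var=1.0))
```

The posterior has the same functional form as the prior, so it can serve as the prior for the next batch of data.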
Connected Scatterplot  The connected scatterplot visualizes two related time series in a scatterplot and connects the points with a line in temporal sequence. News media are increasingly using this technique to present data under the intuition that it is understandable and engaging. To explore these intuitions, we (1) describe how paired time series relationships appear in a connected scatterplot, (2) qualitatively evaluate how well people understand trends depicted in this format, (3) quantitatively measure the types and frequency of misinterpretations, and (4) empirically evaluate whether viewers will preferentially view graphs in this format over the more traditional format. The results suggest that low-complexity connected scatterplots can be understood with little explanation, and that viewers are biased towards inspecting connected scatterplots over the more traditional format. We also describe misinterpretations of connected scatterplots and propose further research into mitigating these mistakes for viewers unfamiliar with the technique.
Connection Analytics  Connection Analytics is an emerging discipline that provides answers to persistent business questions such as identification and influence of thought leaders, impact of external events or players on financial risk, or analysis of network performance based on causal relationships between nodes. It provides a new way of looking at people, products, physical phenomena, or events. Enterprises are using Big Data analytics to complement traditional SQL queries in answering very familiar questions, such as customer retention, marketing attribution, risk mitigation, and operational efficiency, which until now required enormous compute power, time-consuming data management, and the need to learn highly specialized programming and query languages.
Connection Scan Algorithm (CSA) 
We introduce the Connection Scan Algorithm (CSA) to efficiently answer queries to timetable information systems. The input consists, in the simplest setting, of a source position and a desired target position. The output is a sequence of vehicles such as trains or buses that a traveler should take to get from the source to the target. We study several problem variations such as the earliest arrival and profile problems. We present algorithm variants that only optimize the arrival time or additionally optimize the number of transfers in the Pareto sense. An advantage of CSA is that it can easily adjust to changes in the timetable, allowing the easy incorporation of known vehicle delays. We additionally introduce the Minimum Expected Arrival Time (MEAT) problem to handle possible, uncertain, future vehicle delays. We present a solution to the MEAT problem that is based upon CSA. Finally, we extend CSA using the multi-level overlay paradigm to answer complex queries on nationwide integrated timetables with trains and buses.
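This is a minimal sketch of the basic earliest-arrival variant: connections sorted by departure time are scanned once, in order. It makes these simplifying assumptions:

- There are no footpaths and no minimum transfer times.
- There is no trip bookkeeping.
- A connection is usable whenever the traveler has reached its departure stop by its departure time.

The multi-level overlay, profile and MEAT extensions are not shown. The timetable is hypothetical.

```python
import math

def csa_earliest_arrival(connections, source, target, departure_time):
    """Earliest arrival time at target (math.inf if unreachable).

    connections: (dep_stop, arr_stop, dep_time, arr_time) tuples, one per
    elementary vehicle hop, sorted by dep_time.
    """
    earliest = {source: departure_time}
    for dep_stop, arr_stop, dep_time, arr_time in connections:
        # One linear pass suffices because connections are sorted by departure time.
        reachable = earliest.get(dep_stop, math.inf) <= dep_time
        if reachable and arr_time < earliest.get(arr_stop, math.inf):
            earliest[arr_stop] = arr_time
    return earliest.get(target, math.inf)

# Hypothetical timetable: (from, to, departure, arrival) in minutes.
timetable = sorted([
    ("A", "B", 0, 10),
    ("B", "C", 12, 20),
    ("A", "C", 5, 30),
    ("C", "D", 21, 25),
    ("C", "D", 31, 35),
], key=lambda c: c[2])
print(csa_earliest_arrival(timetable, "A", "D", departure_time=0))
```

Since the scan needs no priority queue, a delay can be handled by editing a connection's times and re-sorting. That matches the adaptability the entry highlights.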
Connectionist Temporal Classification (CTC) 
Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
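CTC can train on unsegmented data because it scores a target labelling by summing the probability of every frame-level path that collapses to it. A path collapses by first merging repeats and then deleting blanks. This is a minimal plain-Python sketch of that sum, computed with the forward (alpha) recursion. It assumes per-frame softmax outputs arrive as a T × K list and that symbol 0 is the blank. Training would minimize −log of the returned value. The names are illustrative.

```python
def ctc_probability(probs, label, blank=0):
    """p(label | input) under CTC, via the forward (alpha) recursion.

    probs: T x K per-frame output distributions; label: symbol ids without blanks.
    """
    ext = [blank]
    for s in label:
        ext += [s, blank]            # blank-interleaved label, length 2L + 1
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]      # start with a blank ...
    if S > 1:
        alpha[1] = probs[0][ext[1]]  # ... or with the first label symbol
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                         # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]                # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]                # skip a blank between distinct symbols
            new[s] = total * probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last symbol or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)

probs = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]
print(ctc_probability(probs, [1, 2]))
```

The recursion costs O(T · L) time, whereas enumerating all K^T paths is exponential.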
Conover-Iman Test  
Constrained Optimization By RAdial Basis Function Interpolation (COBRA) 

Constrained Policy Optimization (CPO) 
For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016, Schulman et al., 2015, Lillicrap et al., 2016, Levine et al., 2016) have enabled new capabilities in high-dimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration. Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training. Our guarantees are based on a new theoretical result, which is of independent interest: we prove a bound relating the expected returns of two policies to an average divergence between them. We demonstrate the effectiveness of our approach on simulated robot locomotion tasks where the agent must satisfy constraints motivated by safety.
Content Grouping  Content Grouping lets you group content into a logical structure that reflects how you think about your site or app, and then view and compare aggregated metrics by group name in addition to being able to drill down to the individual URL, page title, or screen name. For example, you can see the aggregated number of pageviews for all pages in a group like Men/Shirts, and then drill in to see each URL or page title. You start by creating a Content Group, a collection of content. For example, on an e-commerce site that sells clothing, you might create groups for Men, Women, and Children. Then, within each group, you might create content like Shirts, Pants, Outerwear. This would let you compare aggregated statistics for each type of clothing within a group (e.g., Men’s Shirts vs. Men’s Pants vs. Men’s Outerwear). It would also let you drill in to each group to see how individual Shirts pages compare to one another, for example, Men/Shirts/Tshirts/index.html vs. Men/Shirts/DressShirts/index.html.
Context Aware Bandits (CAB) 
In this paper, we present CAB (Context Aware Bandits). With CAB we attempt to craft a bandit algorithm that can exploit collaborative effects and that can be deployed in a practical recommendation system setting, where multi-armed bandits have been shown to perform well, in particular with respect to the cold-start problem. CAB exploits a context-aware clustering technique that augments exploration-exploitation strategies in a contextual multi-armed bandit setting. CAB dynamically clusters the users based on the content universe under consideration. We demonstrate the efficacy of our approach on extensive real-world datasets, showing its scalability and, more importantly, significantly increased prediction performance compared to related state-of-the-art methods.
Context Awareness  Context awareness is a property of mobile devices that is defined complementarily to location awareness. Whereas location may determine how certain processes in a device operate, context may be applied more flexibly with mobile users, especially with users of smart phones. Context awareness originated as a term from ubiquitous computing, or so-called pervasive computing, which sought to deal with linking changes in the environment with computer systems, which are otherwise static. The term has also been applied to business theory in relation to contextual application design and business process management issues.
Context-aware Sentiment Word Identification (sentiword2vec)
Traditional sentiment analysis often uses a sentiment dictionary to extract sentiment information from text and classify documents. However, emerging informal words and phrases in user-generated content call for context-aware analysis. Usually, they have special meanings in a particular context. Because sentiment word vectors perform well at representing inter-word relations, we use them to identify these special words. Based on the distributed language model word2vec, in this paper we present a novel method for the sentiment representation of words in a particular context; specifically, it identifies words with abnormal sentiment polarity in long answers. Results show that the improved model performs better at representing words with special meanings, while still doing well at representing special idiomatic patterns. Finally, we discuss the meaning of vectors that represent sentiment, which may differ from general object-based conditions.
Contextual / Common Query Language (CQL) 
Contextual Query Language (CQL), previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information. Based on the semantics of Z39.50, its design objective is that queries be human-readable and human-writable, and that the language be intuitive while maintaining the expressiveness of more complex query languages.
Contextual Bandit  The problem of matching ads to interests is a natural machine learning problem in some ways, since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised; in particular, a click-or-not on one ad doesn’t generally tell you whether a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “k-Armed Bandits”, often with various relevant embellishments. The k-Armed Bandit setting works on a round-by-round basis. On each round: 1. A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). 2. The world reveals the reward r_a of the chosen arm (i.e. whether the ad is clicked on). http://…/Multiarmed_bandit#Contextual_Bandit
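This sketch implements the round-by-round protocol as a loop, using epsilon-greedy, a simple strategy swapped in here for illustration. It adds the contextual twist of side information observed before choosing: the strategy keeps a separate running-mean reward for each (context, arm) pair. The click simulator, the contexts and ε are all hypothetical. Only the chosen arm's reward is ever revealed, matching the protocol above.

```python
import random

def run_epsilon_greedy(n_rounds, k, contexts, click_prob, epsilon=0.1, seed=0):
    """Contextual epsilon-greedy; returns running mean reward per (context, arm)."""
    rng = random.Random(seed)
    counts = {(x, a): 0 for x in contexts for a in range(k)}
    means = {(x, a): 0.0 for x in contexts for a in range(k)}
    for _ in range(n_rounds):
        x = rng.choice(contexts)                   # side information for this round
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda arm: means[(x, arm)])  # exploit
        r = 1.0 if rng.random() < click_prob(x, a) else 0.0     # only r_a is observed
        counts[(x, a)] += 1
        means[(x, a)] += (r - means[(x, a)]) / counts[(x, a)]   # incremental mean
    return means

def click_prob(context, arm):
    """Hypothetical world: sports readers click ad 0, news readers click ad 2."""
    best = {"sports": 0, "news": 2}[context]
    return 0.6 if arm == best else 0.1

means = run_epsilon_greedy(5000, k=3, contexts=["sports", "news"], click_prob=click_prob)
for ctx in ["sports", "news"]:
    print(ctx, "->", max(range(3), key=lambda a: means[(ctx, a)]))
```

Without the exploration step, a news reader could stay stuck on ad 0 forever, because a click-or-not on ad 0 says nothing about ad 2.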
Contextual Explanation Networks (CEN) 
We introduce contextual explanation networks (CENs), a class of models that learn to predict by generating and leveraging intermediate explanations. CENs combine deep networks with context-specific probabilistic models and construct explanations in the form of locally-correct hypotheses. Contrary to the existing post-hoc model-explanation tools, CENs learn to predict and to explain jointly. Our approach offers two major advantages: (i) for each prediction, valid instance-specific explanations are generated with no computational overhead and (ii) prediction via explanation acts as a regularization and boosts performance in low-resource settings. We prove that local approximations to the decision boundary of our networks are consistent with the generated explanations. Our results on image and text classification and survival analysis tasks demonstrate that CENs can easily match or outperform the state-of-the-art while offering additional insights behind each prediction, valuable for decision support.
Contextual Multi-Armed Bandits  Multi-Armed Bandits with side information.
Continuous Bag-of-Words (CBOW)
The ‘continuous bag-of-words model’ (CBOW) uses the words within a short window around the current word as input to predict the current word. http://…/1301.3781.pdf
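This is a minimal NumPy sketch of the two CBOW ingredients:

- building (context, target) training pairs from a window;
- a forward pass that averages the context words' input vectors and applies a softmax over the vocabulary.

The embedding size, the random initialization and the full softmax are illustrative assumptions. Practical trainers use approximations such as hierarchical softmax or negative sampling, and no training loop is shown.

```python
import numpy as np

def cbow_pairs(tokens, window):
    """Return (context words, target word) pairs, up to `window` words on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs

def cbow_predict(context_ids, W_in, W_out):
    """Probability of each vocabulary word given the averaged context embedding."""
    h = W_in[context_ids].mean(axis=0)    # shared projection: average of input vectors
    logits = W_out @ h
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = {w: i for i, w in enumerate(dict.fromkeys(tokens))}
rng = np.random.default_rng(0)
dim = 8                                   # hypothetical embedding size
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))

context, target = cbow_pairs(tokens, window=2)[2]
probs = cbow_predict([vocab[w] for w in context], W_in, W_out)
print(context, "->", target, "p =", probs[vocab[target]])
```

Averaging makes the input an unordered "bag" of context words, which is where the model gets its name.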
Continuous Computation Language (CCL) 
For Sybase Complex Event Processing (CEP), developers create CEP applications using the Continuous Computation Language (CCL). Introduced in 2005, CCL was the first commercial, declarative SQL-based CEP language and remains the most extensive SQL-based CEP language on the market. Because the Continuous Computation Language (CCL) is a SQL-based language, it gives programmers a huge head start in creating CEP applications. The Sybase CEP Studio helps manage all aspects of the application development process, further increasing programmer productivity.
Continuous Skip-gram (Skip-gram)
The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document. More formally, given a sequence of training words w_1, w_2, w_3, …, w_T, the objective of the Skip-gram model is to maximize the average log probability $\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$, where c is the size of the training context (which can be a function of the center word w_t). A larger c results in more training examples and thus can lead to higher accuracy, at the expense of training time. http://…/1301.3781.pdf
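The objective can be evaluated directly as a double sum over center positions t and offsets j. In this sketch the model's log p(w_O | w_I) is a caller-supplied stand-in. The uniform "untrained model" is hypothetical, and context positions outside the sequence are simply skipped.

```python
import math

def skipgram_objective(words, c, log_prob):
    """(1/T) * sum_t sum_{-c <= j <= c, j != 0} log p(w_{t+j} | w_t)."""
    T = len(words)
    total = 0.0
    for t in range(T):
        for j in range(-c, c + 1):
            if j == 0 or not 0 <= t + j < T:
                continue                  # skip the center word and out-of-range positions
            total += log_prob(words[t + j], words[t])
    return total / T

def uniform_log_prob(context_word, center_word, vocab_size=4):
    """Hypothetical untrained model: every word equally likely."""
    return math.log(1.0 / vocab_size)

words = ["a", "b", "c", "d"]
print(skipgram_objective(words, c=1, log_prob=uniform_log_prob))
print(skipgram_objective(words, c=2, log_prob=uniform_log_prob))
```

With T = 4, raising c from 1 to 2 increases the number of (center, context) terms from 6 to 10. This is the "more training examples, more training time" trade-off described above.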
Continuous Time Stochastic Modelling (CTSM) 
In probability theory and statistics, a continuous-time stochastic process, or a continuous-space-time stochastic process, is a stochastic process for which the index variable takes a continuous set of values, as contrasted with a discrete-time process for which the index variable takes only distinct values. An alternative terminology uses continuous parameter as being more inclusive. A more restricted class of processes are the continuous stochastic processes: here the term often (but not always) implies both that the index variable is continuous and that sample paths of the process are continuous. Given the possible confusion, caution is needed. Continuous-time stochastic processes that are constructed from discrete-time processes via a waiting time distribution are called continuous-time random walks. ctsmr
Contrast  In statistics, particularly analysis of variance and linear regression, a contrast is a linear combination of two or more factor level means (averages) whose coefficients add up to zero. Two contrasts are orthogonal if the sum of the products of their corresponding coefficients is zero (for equal group sizes). Contrasts should be constructed “to answer specific research questions”, and do not necessarily have to be orthogonal.
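This plain-Python sketch evaluates contrasts on group means and checks orthogonality. The orthogonality test assumes equal group sizes. With unequal sizes, the products would need to be weighted by 1/n_i. The group means and research questions are hypothetical.

```python
def contrast_estimate(coeffs, means):
    """Value of the contrast sum_i c_i * mean_i; coefficients must sum to zero."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * m for c, m in zip(coeffs, means))

def orthogonal(c1, c2):
    """Orthogonality for equal group sizes: sum of coefficient products is zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < 1e-12

means = [10.0, 12.0, 17.0]              # hypothetical means: control, drug A, drug B
control_vs_drugs = [-1.0, 0.5, 0.5]     # control vs. the average of the two drugs
drug_a_vs_b = [0.0, 1.0, -1.0]          # drug A vs. drug B
print(contrast_estimate(control_vs_drugs, means))
print(contrast_estimate(drug_a_vs_b, means))
print(orthogonal(control_vs_drugs, drug_a_vs_b))
```

Each contrast answers one specific question. The pair [-1, 1, 0] and [-1, 0, 1] (each drug vs. control) is equally valid even though it is not orthogonal.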
Contrast Analysis  ➚ “Contrast” 
Contrastive Divergence (CD) 
Contrastive Divergence (CD) is an approximate Maximum-Likelihood (ML) learning algorithm proposed by Geoffrey Hinton. Contrastive Divergence is basically a funky term for “approximate gradient descent”.
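A minimal NumPy sketch of one-step contrastive divergence (CD-1) for a binary restricted Boltzmann machine follows; the RBM is the setting where CD is usually introduced. The exact log-likelihood gradient needs expectations under the model, which are intractable. CD-1 replaces them with statistics from a single Gibbs step started at the data. The toy data, learning rate and sizes are hypothetical, and the reconstruction error is only a rough progress signal.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, b_vis, b_hid, v0, lr, rng):
    """One in-place CD-1 step on a mini-batch v0 (rows = binary examples)."""
    ph0 = sigmoid(v0 @ W + b_hid)                     # p(h = 1 | data)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
    pv1 = sigmoid(h0 @ W.T + b_vis)                   # one Gibbs step back to visibles
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_hid)
    n = v0.shape[0]
    # Positive (data) statistics minus negative (reconstruction) statistics.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return np.mean((v0 - pv1) ** 2)

rng = np.random.default_rng(0)
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 10, dtype=float)  # hypothetical patterns
W = 0.01 * rng.standard_normal((4, 2))
b_vis, b_hid = np.zeros(4), np.zeros(2)
for _ in range(200):
    err = cd1_update(W, b_vis, b_hid, data, lr=0.1, rng=rng)
print(round(err, 3))
```

The update has the same shape as a gradient step, but it uses a biased gradient estimate. This is why CD is described as approximate gradient descent.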
Contrastive-center Loss  The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers. Experiments on different datasets demonstrate the effectiveness of contrastive-center loss.
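Here is a NumPy sketch of the loss value. It assumes the "contrastive value" is a ratio: the squared distance to a sample's own center divided by the summed squared distances to the other centers, plus a small constant δ. This form follows the description above but should be checked against the original paper. The default δ = 1.0 is an assumption, and the learning of the centers themselves is omitted.

```python
import numpy as np

def contrastive_center_loss(features, labels, centers, delta=1.0):
    """0.5 * sum_i ||x_i - c_{y_i}||^2 / (sum_{j != y_i} ||x_i - c_j||^2 + delta).

    features: (m, d) array; labels: (m,) int array; centers: (k, d) array.
    """
    # Squared distance from every sample to every class center, shape (m, k).
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    m = features.shape[0]
    intra = d2[np.arange(m), labels]   # distance to the sample's own class center
    inter = d2.sum(axis=1) - intra     # summed distances to all other centers
    return 0.5 * np.sum(intra / (inter + delta))

centers = np.array([[0.0, 0.0], [2.0, 0.0]])
print(contrastive_center_loss(np.array([[1.0, 0.0]]), np.array([0]), centers))
```

The loss shrinks either by pulling a sample toward its own center (the numerator) or by pushing it away from the others (the denominator). This is the intra-class and inter-class trade-off the entry describes.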
CONvergence of iterated CORrelations (CONCOR) 
Given an adjacency matrix, or a set of adjacency matrices for different relations, a correlation matrix can be formed by the following procedure. Form a profile vector for a vertex i by concatenating the i-th row in every adjacency matrix; the (i, j)-th element of the correlation matrix is the Pearson correlation coefficient of the profile vectors of i and j. This (square, symmetric) matrix is called the first correlation matrix. The procedure can be performed iteratively on the correlation matrix until convergence, at which point each entry is either −1 or 1. This matrix is used to split the data into two blocks such that members of the same block are positively correlated and members of different blocks are negatively correlated. CONCOR uses the above technique to split the initial data into two blocks. Successive splits are then applied to the separate blocks. At each iteration all blocks are submitted for analysis; however, blocks containing two vertices are not split. Consequently, n rounds of partitioning in the binary tree can produce up to 2^n blocks. Note that any similarity matrix can be used as input. http://…/concorinr
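A minimal NumPy sketch of a single CONCOR split follows. It concatenates each vertex's rows across relations, repeatedly correlates the correlation matrix until the entries are ±1, and splits the vertices by the sign of their correlation with vertex 0. The iteration cap and tolerance are assumptions. The full procedure would apply the split recursively to each block, as described above.

```python
import numpy as np

def concor_split(adjacency_matrices, max_iter=100, tol=1e-8):
    """One CONCOR split; returns two lists of vertex indices."""
    profiles = np.hstack(adjacency_matrices).astype(float)  # row i = profile of vertex i
    M = np.corrcoef(profiles)               # first correlation matrix (rows as variables)
    for _ in range(max_iter):
        if np.allclose(np.abs(M), 1.0, atol=tol):
            break                           # converged: every entry is -1 or +1
        M = np.corrcoef(M)                  # correlate the correlation matrix again
    n = M.shape[0]
    block_a = [i for i in range(n) if M[0, i] > 0]
    block_b = [i for i in range(n) if M[0, i] <= 0]
    return block_a, block_b

# Hypothetical network: two disconnected triangles {0, 1, 2} and {3, 4, 5}.
A = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
print(concor_split([A]))
```

Passing several relations, as in `concor_split([A_friendship, A_advice])`, lengthens every profile vector without changing the rest of the procedure. The relation names in that call are hypothetical.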
Convergence of Random Variables  In probability theory, there exist several different notions of convergence of random variables. The convergence of sequences of random variables to some limit random variable is an important concept in probability theory, and its applications to statistics and stochastic processes. The same concepts are known in more general mathematics as stochastic convergence and they formalize the idea that a sequence of essentially random or unpredictable events can sometimes be expected to settle down into a behaviour that is essentially unchanging when items far enough into the sequence are studied. The different possible notions of convergence relate to how such a behaviour can be characterised: two readily understood behaviours are that the sequence eventually takes a constant value, and that values in the sequence continue to change but can be described by an unchanging probability distribution. http://…ty_theory#Convergence_of_random_variables http://…ofconvergenceinprobabilitytheory.jpg 
Convergent Cross Mapping (CCM) 
Convergent cross mapping (CCM) is a statistical test for a cause-and-effect relationship between two time series variables that, like the Granger causality test, seeks to resolve the problem that correlation does not imply causation. While Granger causality is best suited for purely stochastic systems where the influences of the causal variables are separable (independent of each other), CCM is based on the theory of dynamical systems and can be applied to systems where causal variables have synergistic effects. The test was developed in 2012 by the lab of George Sugihara of the Scripps Institution of Oceanography, La Jolla, California, USA.
Convex Banding of the Covariance Matrix  We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings.
Convex Function  In mathematics, a real-valued function f(x) defined on an interval is called convex (or convex downward or concave upward) if the line segment between any two points on the graph of the function lies above or on the graph, in a Euclidean space (or more generally a vector space) of at least two dimensions. Equivalently, a function is convex if its epigraph (the set of points on or above the graph of the function) is a convex set. Well-known examples of convex functions are the quadratic function f(x)=x^2 and the exponential function f(x)=e^x for any real number x. Convex functions play an important role in many areas of mathematics. They are especially important in the study of optimization problems where they are distinguished by a number of convenient properties. For instance, a (strictly) convex function on an open set has no more than one minimum. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and, as a result, they are the most well-understood functionals in the calculus of variations. In probability theory, a convex function applied to the expected value of a random variable is always less than or equal to the expected value of the convex function of the random variable. This result, known as Jensen’s inequality, underlies many important inequalities (including, for instance, the arithmetic–geometric mean inequality and Hölder’s inequality). Exponential growth is a special case of convexity. Exponential growth narrowly means “increasing at a rate proportional to the current value”, while convex growth generally means “increasing at an increasing rate (but not necessarily proportionally to current value)”.
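Both the chord definition and Jensen's inequality can be checked numerically on sample points. A sampled check can refute convexity but never prove it, since only finitely many pairs are tried. The sample points and the concave counterexample (sin on [0, π]) are illustrative choices.

```python
import math

def chord_above_graph(f, points, weights=(0.25, 0.5, 0.75)):
    """Sampled check of f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for all sampled x, y, t."""
    for x in points:
        for y in points:
            for t in weights:
                lhs = f(t * x + (1 - t) * y)
                rhs = t * f(x) + (1 - t) * f(y)
                if lhs > rhs + 1e-12 * (1 + abs(rhs)):  # small tolerance for rounding
                    return False
    return True

def jensen_gap(f, samples):
    """E[f(X)] - f(E[X]) for the empirical distribution; >= 0 when f is convex."""
    mean = sum(samples) / len(samples)
    return sum(f(s) for s in samples) / len(samples) - f(mean)

pts = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(chord_above_graph(lambda x: x * x, pts))
print(chord_above_graph(math.exp, pts))
print(chord_above_graph(math.sin, [0.0, 1.0, 2.0, 3.0]))  # sin is concave on [0, pi]
print(jensen_gap(math.exp, [-1.0, 0.0, 2.0]))
```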
Convex Hierarchical Testing (CHT) 
We consider the testing of all pairwise interactions in a two-class problem with many features. We devise a hierarchical testing framework that considers an interaction only when one or more of its constituent features has a nonzero main effect. The test is based on a convex optimization framework that seamlessly considers main effects and interactions together.
Convex Optimization  Convex minimization, a subfield of optimization, studies the problem of minimizing convex functions over convex sets. The convexity property can make optimization in some sense “easier” than the general case – for example, any local minimum must be a global minimum. 
Convexified Convolutional Neural Networks (CCNN) 
We describe the class of convexified convolutional neural networks (CCNNs), which capture the parameter sharing of convolutional neural networks in a convex manner. By representing the nonlinear convolutional filters as vectors in a reproducing kernel Hilbert space, the CNN parameters can be represented as a low-rank matrix, which can be relaxed to obtain a convex optimization problem. For learning two-layer convolutional neural networks, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN. For learning deeper networks, we train CCNNs in a layer-wise manner. Empirically, CCNNs achieve performance competitive with CNNs trained by backpropagation, SVMs, fully-connected neural networks, stacked denoising autoencoders, and other baseline methods.
ConvNetJS  ConvNetJS is a JavaScript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you’re training. No software requirements, no compilers, no installations, no GPUs, no sweat.
Convolution  In mathematics and, in particular, functional analysis, convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions, giving the area overlap between the two functions as a function of the amount that one of the original functions is translated. Convolution is similar to cross-correlation. It has applications that include probability, statistics, computer vision, image and signal processing, electrical engineering, and differential equations.
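The entry describes convolution of functions. Its discrete analogue for finite sequences is (f ∗ g)[n] = Σ_k f[k] g[n − k], and it can be sketched in a few lines of plain Python. NumPy's `numpy.convolve(a, v, mode="full")` computes the same full-length result. The probability example adds the distributions of two independent fair dice.

```python
def convolve(f, g):
    """Full discrete convolution: (f * g)[n] = sum_k f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj   # g shifted by i positions, weighted by f[i]
    return out

print(convolve([1, 2, 3], [0, 1, 0.5]))

# Probability application: the distribution of the sum of two fair dice.
die = [1 / 6] * 6                   # P(face = 1..6)
two_dice = convolve(die, die)       # index 0 corresponds to a total of 2
print(two_dice[5])                  # P(total = 7)
```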
Convolutional Neural Network  In computer science, a convolutional neural network is a type of feedforward artificial neural network where the individual neurons are tiled in such a way that they respond to overlapping regions in the visual field. Convolutional networks were inspired by biological processes and are variations of multilayer perceptrons which are designed to use minimal amounts of preprocessing. They are widely used models for image recognition. http://…oduction_to_Convolutional_Neural_Networks 
Convolutional Recurrent Neural Network (CRNN) 
This paper proposes a novel framework for detecting redundancy in supervised sentence categorisation. Unlike a traditional singleton neural network, our model combines a character-aware convolutional neural network (CharCNN) with a character-aware recurrent neural network (CharRNN) to form a convolutional recurrent neural network (CRNN). Our model benefits from CharCNN in that only salient features are selected and fed into the integrated CharRNN. CharRNN effectively learns long-sequence semantics via a sophisticated update mechanism. We compare our framework against state-of-the-art text classification algorithms on four popular benchmarking corpora. For instance, our model achieves competitive precision, recall, and F1 score on the Googlenews dataset. For the twenty-newsgroups data stream, our algorithm obtains the best precision, recall, and F1 score. For the Brown Corpus, our framework obtains the best F1 score and almost equivalent precision and recall compared with the top competitor. For the question classification collection, CRNN produces the best recall and F1 score and comparable precision. We also analyse the impact of three different RNN hidden recurrent cells on performance and their runtime efficiency. We observe that MGU achieves the best runtime and comparable performance against GRU and LSTM. For TF-IDF based algorithms, we experiment with word2vec, GloVe, and sent2vec embeddings and report their performance differences.
Conway-Maxwell Poisson (CMP)
Count data are a popular outcome in many empirical studies, especially as big data has become available on human and social behavior. The Conway-Maxwell Poisson (CMP) distribution is popularly used for modeling count data due to its ability to handle both over-dispersed and under-dispersed data. Yet, current methods for estimating CMP regression models are not efficient, especially with high-dimensional data. Extant methods use either nonlinear optimization or MCMC methods. We propose a flexible estimation framework for CMP regression based on iteratively reweighted least squares (IRLS). Because CMP belongs to the exponential family, convergence is guaranteed and is more efficient. We also extend this framework to allow estimation for additive models with smoothing splines. We illustrate the usefulness of this approach through a simulation study and an application to real data on speed dating.
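The IRLS estimator itself is beyond a short example. Instead, this sketch shows the CMP probability mass function that such regressions build on: P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)). The normalizing constant Z has no closed form in general. Here it is approximated by truncating its series once terms become negligible, which is an assumption of this sketch. The computation works in log space to avoid overflow.

```python
import math

def cmp_pmf(y, lam, nu, max_terms=10_000, rel_tol=1e-16):
    """Conway-Maxwell Poisson P(Y = y); requires lam > 0 and nu > 0.

    nu < 1 allows over-dispersion, nu > 1 under-dispersion, nu = 1 is Poisson.
    """
    def log_term(j):
        return j * math.log(lam) - nu * math.lgamma(j + 1)   # log(lam**j / (j!)**nu)

    log_terms, peak = [], -math.inf
    for j in range(max_terms):
        lt = log_term(j)
        log_terms.append(lt)
        peak = max(peak, lt)
        # Terms rise then fall in j, so a term far below the peak means the tail is negligible.
        if lt < peak + math.log(rel_tol):
            break
    log_z = peak + math.log(sum(math.exp(t - peak) for t in log_terms))  # log-sum-exp
    return math.exp(log_term(y) - log_z)

print(cmp_pmf(3, 2.0, 1.0))   # nu = 1: should match Poisson(2) at y = 3
print(cmp_pmf(3, 2.0, 2.0))   # nu > 1: mass pulled toward smaller counts
```

A quick sanity check is that ν = 1 recovers the Poisson pmf e^(−λ) λ^y / y!. With ν < 1 the variance exceeds the mean.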
Cook’s Distance  In statistics, Cook’s distance or Cook’s D is a commonly used estimate of the influence of a data point when performing least squares regression analysis. In a practical ordinary least squares analysis, Cook’s distance can be used in several ways: to indicate data points that are particularly worth checking for validity; to indicate regions of the design space where it would be good to be able to obtain more data points. It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977. 
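This NumPy sketch computes Cook's distance for every observation with the closed form D_i = e_i² / (p s²) · h_ii / (1 − h_ii)². Here e_i is the residual, h_ii the leverage (a diagonal entry of the hat matrix), p the number of coefficients and s² the residual mean square. The closed form is equivalent to refitting without each point and measuring how far all fitted values move. The data are hypothetical, with one deliberately corrupted response.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit.

    X: (n, p) design matrix including an intercept column; y: (n,) response.
    """
    n, p = X.shape
    H = X @ np.linalg.pinv(X)          # hat matrix X (X^T X)^{-1} X^T
    h = np.diag(H)                     # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)       # residual mean square
    return resid**2 / (p * s2) * h / (1 - h) ** 2

x = np.arange(8.0)
y = 2 * x + 1 + np.array([0.1, -0.2, 0.15, 0.0, -0.1, 0.2, -0.05, 10.0])  # last point corrupted
X = np.column_stack([np.ones_like(x), x])
D = cooks_distance(X, y)
print(np.round(D, 3))
print("most influential observation:", int(np.argmax(D)))
```

Points with large D are the ones "particularly worth checking for validity". A common rough screen flags D_i > 4/n, though thresholds vary.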
Cooperative Game Theory  In game theory, a cooperative game is a game where groups of players (‘coalitions’) may enforce cooperative behaviour, hence the game is a competition between coalitions of players, rather than between individual players. An example is a coordination game, when players choose the strategies by a consensus decision-making process. Recreational games are rarely cooperative, because they usually lack mechanisms by which coalitions may enforce coordinated behaviour on the members of the coalition. Such mechanisms, however, are abundant in real-life situations (e.g. contract law). Cooperative theory starts with a formalization of games that abstracts away altogether from procedures and … concentrates, instead, on the possibilities for agreement. … There are several reasons that explain why cooperative games came to be treated separately. One is that when one does build negotiation and enforcement procedures explicitly into the model, then the results of a noncooperative analysis depend very strongly on the precise form of the procedures, on the order of making offers and counteroffers and so on. This may be appropriate in voting situations in which precise rules of parliamentary order prevail, where a good strategist can indeed carry the day. But problems of negotiation are usually more amorphous; it is difficult to pin down just what the procedures are. More fundamentally, there is a feeling that procedures are not really all that relevant; that it is the possibilities for coalition forming, promising and threatening that are decisive, rather than whose turn it is to speak. … Detail distracts attention from essentials. Some things are seen better from a distance; the Roman camps around Metzada are indiscernible when one is in them, but easily visible from the top of the mountain.
Cooperative Inverse Reinforcement Learning (CIRL) 
For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human’s reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm. 
Cooperative Learning  Learning paradigms involving varying levels of supervision have received a lot of interest within the computer vision and machine learning communities. The supervisory information is typically considered to come from a human supervisor, a ‘teacher’ figure. In this paper, we consider an alternate source of supervision: a ‘peer’, i.e. a different machine. We introduce cooperative learning, where two agents trying to learn the same visual concepts, but in potentially different environments using different sources of data (sensors), communicate their current knowledge of these concepts to each other. Given the distinct sources of data in both agents, the mode of communication between the two agents is not obvious. We propose the use of visual attributes (semantic mid-level visual properties such as furry, wooden, etc.) as the mode of communication between the agents. Our experiments in three domains (objects, scenes, and animals) demonstrate that our proposed cooperative learning approach improves the performance of both agents as compared to their performance if they were to learn in isolation. Our approach is particularly applicable in scenarios where privacy, security and/or bandwidth constraints restrict the amount and type of information the two agents can exchange. 
Coordinate Descent (CD) 
Coordinate descent is a derivative-free optimization algorithm. To find a local minimum of a function, it performs a line search along one coordinate direction at the current point in each iteration, cycling through the coordinate directions over the course of the procedure. On non-separable functions the algorithm may fail to find the optimum in a reasonable number of function evaluations. To improve convergence, an appropriate coordinate system can be learned gradually, so that new search coordinates obtained using PCA are as decorrelated as possible with respect to the objective function. 
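A minimal sketch of cyclic coordinate descent, assuming a strictly convex quadratic objective f(x) = ½ xᵀAx − bᵀx with A symmetric positive definite. For this objective the line search along coordinate i has a closed-form minimizer, so the sketch uses it instead of a numerical line search. The function name and stopping rule are our own choices. Large off-diagonal entries in A (strongly coupled coordinates) make each sweep less effective, which is the non-separable case described above.

```python
import numpy as np


def coordinate_descent_quadratic(A, b, x0, max_sweeps=1000, tol=1e-12):
    """Cyclic coordinate descent on f(x) = 0.5 * x^T A x - b^T x (A symmetric positive definite)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.array(x0, dtype=float)
    for _ in range(max_sweeps):
        x_prev = x.copy()
        for i in range(x.size):
            # Exact line search along coordinate i: solve df/dx_i = 0 with the
            # other coordinates held fixed, i.e. A_ii x_i = b_i - sum_{j != i} A_ij x_j.
            rhs = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = rhs / A[i, i]
        if np.max(np.abs(x - x_prev)) < tol:
            break
    return x


if __name__ == "__main__":
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(coordinate_descent_quadratic(A, b, x0=[0.0, 0.0]))  # approaches [1/11, 7/11]
```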
Coordinate Descent Algorithms (CDA) 
This monograph presents a class of algorithms called coordinate descent algorithms for mathematicians, statisticians, and engineers outside the field of optimization. This particular class of algorithms has recently gained popularity due to their effectiveness in solving large-scale optimization problems in machine learning, compressed sensing, image processing, and computational statistics. Coordinate descent algorithms solve optimization problems by successively minimizing along each coordinate or coordinate hyperplane, which is ideal for parallelized and distributed computing. Avoiding detailed technicalities and proofs, this monograph gives relevant theory and examples for practitioners to effectively apply coordinate descent to modern problems in data science and engineering. To keep the primer up-to-date, we intend to publish this monograph only after no additional topics need to be added and we foresee no further major advances in the area. 
copCAR Regression Model (copCAR) 
Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. The new model is called copCAR because it is copula-based and employs the areal GLMM's conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by the R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online. 
Copula  In probability theory and statistics, a copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform. Copulas are used to describe the dependence between random variables. They are named for their resemblance to grammatical copulas in linguistics. 
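Pseudo-observations offer a practical way to see a copula at work. Replacing each value by its rank divided by n + 1 produces columns whose empirical marginals are (discretely) uniform on (0, 1), while the rank-based dependence between the columns is preserved. This is the sample on which the empirical copula is built. The NumPy sketch below assumes continuous data without ties (any ties are broken by position), and `pseudo_observations` is a name we chose for illustration.

```python
import numpy as np


def pseudo_observations(data):
    """Rank-transform each column of an (n, d) array into (0, 1).

    Assumes continuous data without ties; ties, if any, are broken by position.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # argsort applied twice yields 0-based ranks within each column.
    ranks = data.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable") + 1
    return ranks / (n + 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    # exp is strictly increasing, so (x, exp(x)) has the same copula as (x, x).
    u = pseudo_observations(np.column_stack([x, np.exp(x)]))
    print(np.allclose(u[:, 0], u[:, 1]))
```

The usage lines illustrate that a copula ignores the marginals. A strictly increasing transform of one variable leaves the pseudo-observations, and therefore the dependence structure, unchanged.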
Copula Statistic (CoS) 
A new index based on empirical copulas, termed the Copula Statistic (CoS), is introduced for assessing the strength of multivariate dependence and for testing statistical independence. New properties of the copulas are proved. They allow us to define the CoS in terms of a relative distance function between the empirical copula, the Fréchet-Hoeffding bounds and the independence copula. Monte Carlo simulations reveal that for large sample sizes, the CoS is approximately normal. This property is utilised to develop a CoS-based statistical test of independence against various noisy functional dependencies. It is shown that this test exhibits higher statistical power than the Total Information Coefficient (TICe), the Distance Correlation (dCor), the Randomized Dependence Coefficient (RDC), and the Copula Correlation (Ccor) for monotonic and circular functional dependencies. Furthermore, the R²-equitability of the CoS is investigated for estimating the strength of a collection of functional dependencies with additive Gaussian noise. Finally, the CoS is applied to a real stock market data set, from which we infer that a bivariate analysis is insufficient to unveil multivariate dependencies, and to two gene expression data sets of the Yeast and of the E. coli, which allow us to demonstrate the good performance of the CoS. 
Corpora Agnostic Word Vectorization Method (WordNet2Vec) 
The complex nature of big data resources demands new methods for structuring, especially for textual content. WordNet is a good knowledge source for comprehensive abstraction of natural language, as good implementations of it exist for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism, WordNet2Vec, is proposed in the paper. It creates vectors for each word from WordNet. These vectors encapsulate the general position (role) of a given word with respect to all other words in the natural language. Any list or set of such vectors contains knowledge about the context of its components within the whole language. Such word representations can be easily applied to many analytic tasks like classification or clustering. 
Corpus Linguistics  Corpus linguistics is the study of language as expressed in samples (corpora) of “real-world” text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely derived by an automated process. Corpus linguistics adherents believe that reliable language analysis best occurs on field-collected samples, in natural contexts and with minimal experimental interference. Within corpus linguistics there are divergent views as to the value of corpus annotation, from John Sinclair advocating minimal annotation and allowing texts to ‘speak for themselves’, to others, such as the Survey of English Usage team (based in University College, London) advocating annotation as a path to greater linguistic understanding and rigour. 
Correlated Topic Model (CTM) 
Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution. We derive a mean-field variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. The CTM gives a better fit than LDA on a collection of OCRed articles from the journal Science. Furthermore, the CTM provides a natural way of visualizing and exploring this and other unstructured data sets. 
CORrelation ALignment (CORAL) 
In this chapter, we present CORrelation ALignment (CORAL), a simple yet effective method for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lower-dimensional subspaces. It is also much simpler than other distribution matching methods. CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. We first describe a solution that applies a linear transformation to source features to align them with target features before classifier training. For linear classifiers, we propose to equivalently apply CORAL to the classifier weights, leading to added efficiency when the number of classifiers is small but the number and dimensionality of target examples are very high. The resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a large margin on standard domain adaptation benchmarks. Finally, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (DNNs). The resulting Deep CORAL approach works seamlessly with DNNs and achieves state-of-the-art performance on standard benchmark datasets. Our code is available at https://…/CORAL 
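The linear version of CORAL described above can be sketched in a few lines of NumPy. The source features are whitened with the inverse square root of their (regularized) covariance and then recolored with the square root of the target covariance. This is our own sketch, not the authors' released code. The function names are hypothetical, and the default `reg=1.0` (adding the identity to both covariances) is an assumption about regularization that the summary above does not specify.

```python
import numpy as np


def _sym_power(C, power):
    """C**power for a symmetric positive definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(C)
    return (eigvecs * eigvals**power) @ eigvecs.T


def coral(source, target, reg=1.0):
    """Linearly transform source features so their covariance matches the target's.

    source: (n_s, d) array, target: (n_t, d) array. `reg` adds reg * I to both
    covariance matrices to keep them well conditioned (an assumed default).
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + reg * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + reg * np.eye(d)
    whitened = source @ _sym_power(cov_s, -0.5)  # remove source correlations
    return whitened @ _sym_power(cov_t, 0.5)     # re-color with target correlations
```

In use, a classifier would be trained on `coral(X_source, X_target)` with the source labels and then applied unchanged to the target features. With `reg=0.0`, the transformed source covariance equals the target covariance exactly (up to floating point).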
CORrelation Differences (CORD) 
Given a zero-mean random vector X := (X1,…,Xp) ∈ R^p, we consider the problem of defining and estimating a partition G of {1,…,p} such that the components of X with indices in the same group of the partition have a similar, community-like behavior. We introduce a new model, the G-exchangeable model, to define group similarity. This model is a natural extension of the more commonly used G-latent model, for which the partition G is generally not identifiable without additional restrictions on X. In contrast, we show that for any random vector X there exists an identifiable partition G according to which X is G-exchangeable, thereby providing a clear target for community estimation. Moreover, we provide another model, the G-block covariance model, which generalizes the G-exchangeable model and can be of interest in its own right for defining group similarity. We discuss connections between the three types of G-models. We exploit the connection with G-block covariance models to develop a new metric, CORD, and a homonymous method for community estimation. We specialize and analyze our method for Gaussian copula data. We show that this method recovers the partition according to which X is G-exchangeable with a G-block copula correlation matrix. In the particular case of Gaussian distributions, this estimator, under mild assumptions, identifies the unique minimal partition according to the G-latent model. The CORD estimator is consistent as long as the communities are separated at a rate that we prove to be minimax optimal, via lower bound calculations. Our procedure is fast, and extensive numerical studies show that it recovers communities defined by our models, while existing variable clustering algorithms typically fail to do so. This is further supported by two real-data examples. 
Correntropy  Correntropy is a nonlinear similarity measure between two random variables. Related reading: Learning with the Maximum Correntropy Criterion Induced Losses for Regression. 
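Correntropy is commonly defined as V_σ(X, Y) = E[κ_σ(X − Y)] for a kernel κ_σ, most often a Gaussian of width σ. It is estimated from paired samples by averaging the kernel over the differences x_i − y_i. The stdlib-only sketch below uses a Gaussian kernel, and the function name and default width are illustrative choices. Each term is bounded by the kernel's peak, so a single large outlier can shift the estimate by at most 1/N of that peak. This bounded influence is what makes the correntropy criterion robust to outliers.

```python
import math


def correntropy(x, y, sigma=1.0):
    """Sample correntropy of paired observations with a Gaussian kernel of width sigma."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    peak = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)  # kernel value at zero error
    total = sum(peak * math.exp(-((a - b) ** 2) / (2.0 * sigma**2)) for a, b in zip(x, y))
    return total / len(x)


if __name__ == "__main__":
    clean = correntropy([0.0] * 10, [0.1] * 10)
    dirty = correntropy([0.0] * 10, [0.1] * 9 + [1000.0])  # one wild outlier
    print(clean, dirty)
```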
Correspondence Analysis (CA) 
Correspondence analysis (CA) is a multivariate statistical technique proposed by Hirschfeld and later developed by Jean-Paul Benzécri. It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data. In a similar manner to principal component analysis, it provides a means of displaying or summarising a set of data in two-dimensional graphical form. ➘ “Principal Component Analysis” 
Cortana Analytics  Cortana Analytics is a fully managed big data and advanced analytics suite that enables you to transform your data into intelligent action. 
CortexNet  In the past five years we have observed the rise of incredibly well-performing feed-forward neural networks trained in a supervised manner for vision-related tasks. These models have achieved superhuman performance on object recognition, localisation, and detection in still images. However, there is a need to identify the best strategy to employ these networks with temporal visual inputs and obtain a robust and stable representation of video data. Inspired by the human visual system, we propose a deep neural network family, CortexNet, which features not only bottom-up feed-forward connections, but also models the abundant top-down feedback and lateral connections present in our visual cortex. We introduce two training schemes, the unsupervised MatchNet and the weakly supervised TempoNet modes, in which a network learns how to correctly anticipate a subsequent frame in a video clip or the identity of its predominant subject, by learning egomotion clues and how to automatically track several objects in the current scene. Find the project website at https://…/. 
Cosine Distance  ➘ “Cosine Similarity” 
Cosine Similarity  Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two diametrically opposed vectors have a similarity of -1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0, 1]. Note that these bounds apply for any number of dimensions, and cosine similarity is most commonly used in high-dimensional positive spaces. For example, in information retrieval and text mining, each term is notionally assigned a different dimension, and a document is characterised by a vector where the value of each dimension corresponds to the number of times that term appears in the document. Cosine similarity then gives a useful measure of how similar two documents are likely to be in terms of their subject matter. The technique is also used to measure cohesion within clusters in the field of data mining. 
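The definition translates directly into code as the dot product divided by the product of the Euclidean norms. The plain-Python sketch below applies it to term-count vectors like those in the text-mining example. Raising an error for zero vectors is our own choice, since the cosine is undefined there. Cosine distance, listed above as a cross-reference, is commonly taken as 1 minus this value.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|) for two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Term-count vectors over a shared vocabulary ["data", "model", "cancer"].
doc_a = [3, 1, 0]
doc_b = [6, 2, 0]  # same proportions as doc_a, twice as long
doc_c = [0, 0, 4]  # no shared terms
print(cosine_similarity(doc_a, doc_b))  # 1.0 up to rounding
print(cosine_similarity(doc_a, doc_c))  # 0.0
```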
Cosinor Analysis  Cosinor analysis uses the least squares method to fit a sine wave to a time series. It is often used in the analysis of biological time series that show predictable rhythms. The method can be used with unequally spaced time series. 
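Once the period is fixed, single-component cosinor fitting reduces to ordinary linear least squares. The model y(t) = M + A cos(ωt + φ) expands to M + β cos(ωt) + γ sin(ωt), with β = A cos φ and γ = −A sin φ. The coefficients M (the MESOR), β and γ can therefore be estimated by regression, and the amplitude A and phase φ recovered from β and γ afterwards. The sketch below assumes a known period (for example 24 hours) and uses NumPy's `lstsq`, and nothing in it requires equally spaced time points. The names are ours, and the phase convention (φ inside the cosine) is one of several in use.

```python
import numpy as np


def cosinor(t, y, period):
    """Fit y ~ M + A*cos(2*pi*t/period + phi) by least squares; return (M, A, phi)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2.0 * np.pi / period
    # The model is linear in (M, beta, gamma): y = M + beta*cos(omega*t) + gamma*sin(omega*t)
    design = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(design, y, rcond=None)
    amplitude = np.hypot(beta, gamma)
    # A*cos(omega*t + phi) = A*cos(phi)*cos(omega*t) - A*sin(phi)*sin(omega*t)
    phase = np.arctan2(-gamma, beta)
    return mesor, amplitude, phase


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0.0, 72.0, size=40))  # irregular sampling times (hours)
    y = 10.0 + 3.0 * np.cos(2 * np.pi * t / 24.0 + 0.5) + rng.normal(0.0, 0.2, size=t.size)
    print(cosinor(t, y, period=24.0))  # roughly (10, 3, 0.5)
```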
Counterfactual Fairness  Machine learning has matured to the point where it is now being considered for automating decisions in loan lending, employee hiring, and predictive policing. In many of these scenarios, however, previous decisions have been made that are unfairly biased against certain subpopulations (e.g., those of a particular race, gender, or sexual orientation). Because this past data is often biased, machine learning predictors must account for this to avoid perpetuating discriminatory practices (or incidentally creating new ones). In this paper, we develop a framework for modeling fairness in any dataset using tools from counterfactual inference. We propose a definition called counterfactual fairness that captures the intuition that a decision is fair towards an individual if it gives the same predictions in (a) the observed world and (b) a world where the individual had always belonged to a different demographic group, other background causes of the outcome being equal. We demonstrate our framework on two real-world problems: fair prediction of law school success, and fair modeling of an individual’s criminality in policing data. 
Counterfactual Inference  
Count-Min Sketch  In computing, the count-min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table it uses only sublinear space, at the expense of overcounting some events due to collisions. The count-min sketch was invented in 2003 by Graham …. Count-min sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets. Spectral Bloom filters with multiset policy are conceptually isomorphic to the count-min sketch. 
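A count-min sketch is a depth × width table of counters with one hash function per row. Adding an item increments one counter in every row, and a query returns the minimum of the item's counters. Collisions can only add to a counter, so for non-negative updates the estimate never undercounts but may overcount. The sketch below derives the per-row hash functions by salting BLAKE2b from Python's `hashlib`. That hashing choice, the class name, and the table sizes are illustrative assumptions, not part of the original description.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency table: depth rows of width counters each."""

    def __init__(self, width, depth):
        if width < 1 or depth < 1:
            raise ValueError("width and depth must be positive")
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, item, row):
        # One hash function per row, obtained by prefixing the row number (a salt).
        digest = hashlib.blake2b(f"{row}:{item}".encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._column(item, row)] += count

    def estimate(self, item):
        # Collisions only ever add to a counter, so the smallest one is the best bound.
        return min(self.table[row][self._column(item, row)] for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(width=200, depth=4)
    for event in ["a", "b", "a", "c", "a", "b"]:
        cms.add(event)
    print(cms.estimate("a"))  # at least 3
```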
Coupled Sparse Asymmetric Least Squares (COSALES) 
SALES 
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) 
CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of nonlinear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration), new individuals (candidate solutions, denoted as x) are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value f(x). In this way, over the generation sequence, individuals with better and better f-values are generated. In an evolution strategy, new candidate solutions are sampled according to a multivariate normal distribution in R^n. Recombination amounts to selecting a new mean value for the distribution. Mutation amounts to adding a random vector, a perturbation with zero mean. Pairwise dependencies between the variables in the distribution are represented by a covariance matrix. The covariance matrix adaptation (CMA) is a method to update the covariance matrix of this distribution. This is particularly useful if the function f is ill-conditioned. Adaptation of the covariance matrix amounts to learning a second-order model of the underlying objective function, similar to the approximation of the inverse Hessian matrix in the quasi-Newton method in classical optimization. In contrast to most classical methods, fewer assumptions on the nature of the underlying objective function are made. Only the ranking between candidate solutions is exploited for learning the sample distribution, and neither derivatives nor even the function values themselves are required by the method. 
Covariate Balancing Propensity Score (CBPS) 
The propensity score plays a central role in a variety of causal inference settings. In particular, matching and weighting methods based on the estimated propensity score have become increasingly common in observational studies. Despite their popularity and theoretical appeal, the main practical difficulty of these methods is that the propensity score must be estimated. Researchers have found that slight misspecification of the propensity score model can result in substantial bias of estimated treatment effects. In this paper, we introduce covariate balancing propensity score (CBPS) methodology, which models treatment assignment while optimizing the covariate balance. This is done by exploiting the dual characteristics of the propensity score as a covariate balancing score and the conditional probability of treatment assignment. The estimation of the CBPS is done within the generalized method of moments or empirical likelihood framework. We find that the CBPS dramatically improves the poor empirical performance of propensity score matching and weighting methods reported in the literature. We also show that the CBPS can be extended to a number of other important settings, including the estimation of the generalized propensity score for non-binary treatments and the generalization of experimental estimates to a target population. Open-source software is available for implementing the proposed methods. 
Coverage Probability  In statistics, the coverage probability of a confidence interval is the proportion of the time that the interval contains the true value of interest. For example, suppose our interest is in the mean number of months that people with a particular type of cancer remain in remission following successful treatment with chemotherapy. The confidence interval aims to contain the unknown mean remission duration with a given probability. This is the “confidence level” or “confidence coefficient” of the constructed interval which is effectively the “nominal coverage probability” of the procedure for constructing confidence intervals. The “nominal coverage probability” is often set at 0.95. The coverage probability is the actual probability that the interval contains the true mean remission duration in this example. 
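Coverage probability can be checked by simulation. The idea is to draw many samples from a distribution whose mean is known, build the interval from each sample, and count how often it contains the true mean. The NumPy sketch below compares two 95% z-intervals: one uses the true standard deviation, and the other naively plugs the sample standard deviation into the same z quantile. The names, parameters, and the normal model are our assumptions. The first interval should come out close to the nominal 0.95. With small samples the second falls short of the nominal level, because it ignores the uncertainty in the estimated standard deviation (a t quantile would correct this).

```python
import numpy as np

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal


def simulated_coverage(n, reps=20_000, use_true_sigma=True, mu=10.0, sigma=2.0, seed=0):
    """Fraction of simulated 95% z-intervals for the mean that contain the true mean mu."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=(reps, n))
    means = samples.mean(axis=1)
    if use_true_sigma:
        se = sigma / np.sqrt(n)
    else:
        # Plug-in standard deviation with a z quantile: too narrow for small n.
        se = samples.std(axis=1, ddof=1) / np.sqrt(n)
    covered = (means - Z_975 * se <= mu) & (mu <= means + Z_975 * se)
    return covered.mean()


if __name__ == "__main__":
    print(simulated_coverage(n=5, use_true_sigma=True))   # close to 0.95
    print(simulated_coverage(n=5, use_true_sigma=False))  # noticeably below 0.95
```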
Cox Proportional-Hazards Regression  Cox proportional hazards regression is a semiparametric method for adjusting survival rate estimates to quantify the effect of predictor variables. The method represents the effects of explanatory variables as a multiplier of a common baseline hazard function, h0(t). The baseline hazard function is the nonparametric part of the Cox proportional hazards regression function, whereas the impact of the predictor variables is modeled by a log-linear regression. 
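In symbols, the model described above is usually written as follows, with baseline hazard h_0(t) and coefficient vector β. The ratio of the hazards of two subjects does not depend on t, which is the proportional-hazards assumption. β is typically estimated by maximizing the partial likelihood, in which h_0(t) cancels. The partial likelihood below assumes no tied event times. In it, δ_i = 1 marks an observed event and R(t_i) is the risk set of subjects whose observed time is at least t_i.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta^\top x)

\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\!\big(\beta^\top (x_1 - x_2)\big)

L(\beta) = \prod_{i:\,\delta_i = 1} \frac{\exp(\beta^\top x_i)}{\sum_{j \in R(t_i)} \exp(\beta^\top x_j)}
```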
Cox Regression  The term Cox regression model (omitting proportional hazards) is sometimes used to describe the extension of the Cox model to include timedependent factors. However, this usage is potentially ambiguous since the Cox proportional hazards model can itself be described as a regression model. 
Coxcomb Plot / Polar Area Diagram  The polar area diagram is similar to a usual pie chart, except that the sectors have equal angles and differ instead in how far each sector extends from the center of the circle. The polar area diagram is used to plot cyclic phenomena (e.g., count of deaths by month). For example, if the count of deaths in each month for a year is to be plotted, there will be 12 sectors (one per month), all with the same angle of 30 degrees. The radius of each sector is proportional to the square root of the death count for the month, so the area of a sector represents the number of deaths in that month. If the death count in each month is subdivided by cause of death, it is possible to make multiple comparisons on one diagram, as in the polar area diagram famously developed by Florence Nightingale. 
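The square-root scaling follows from the area of a circular sector, θr²/2. When all n sectors share the angle θ = 2π/n, choosing r = √(2c/θ) makes each sector's area equal to its count c. The Python sketch below computes these radii; the function name is ours, and the monthly counts are illustrative values, not real data.

```python
import math


def coxcomb_radii(counts):
    """Radius of each equal-angle sector so that sector area equals its count."""
    theta = 2.0 * math.pi / len(counts)  # every sector spans the same angle
    # Sector area = theta * r**2 / 2  =>  r = sqrt(2 * count / theta)
    return [math.sqrt(2.0 * c / theta) for c in counts]


monthly_deaths = [10, 40, 90, 160, 90, 40, 10, 5, 5, 20, 30, 10]  # illustrative counts
radii = coxcomb_radii(monthly_deaths)
# Quadrupling a count (40 -> 160) only doubles the radius.
print(radii[3] / radii[1])
```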
Credible Interval  In Bayesian statistics, a credible interval (or Bayesian confidence interval) is an interval in the domain of a posterior probability distribution used for interval estimation. The generalisation to multivariate problems is the credible region. Credible intervals are analogous to confidence intervals in frequentist statistics, although they differ on a philosophical basis; Bayesian intervals treat their bounds as fixed and the estimated parameter as a random variable, whereas frequentist confidence intervals treat their bounds as random variables and the parameter as a fixed value. For example, in an experiment that determines the uncertainty distribution of parameter t, if the probability that t lies between 35 and 45 is 0.95, then 35 <= t <= 45 is a 95% credible interval. 
Critical Line Algorithm (CLA) 
The critical line method developed by the Nobel Prize winner H. Markowitz is a classical technique for constructing a minimum-variance frontier within the ‘expected return-risk’ (mean-variance) paradigm and for finding minimum portfolios. Considerable interest has recently been attracted to the development of fast algorithms for constructing the minimum-variance frontier. In some works, such algorithms have been used to find statistically stable optimal portfolios. Related reading: An Open-Source Implementation of the Critical-Line Algorithm for Portfolio Optimization; The Constrained Critical Line Algorithm; The Critical Line Method; Applying Markowitz’s Critical Line Algorithm. 
Cross Entropy  In information theory, the cross entropy between two probability distributions over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an ‘unnatural’ probability distribution q, rather than the ‘true’ distribution p. 
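For discrete distributions, the quantity described above is H(p, q) = −Σ_x p(x) log₂ q(x) bits. It equals the entropy of p when q = p and is never smaller than that entropy otherwise (Gibbs' inequality). The plain-Python sketch below assumes p and q are given as aligned lists of probabilities over the same events. Returning infinity when q assigns zero probability to an event that p considers possible is our convention.

```python
import math


def cross_entropy_bits(p, q):
    """H(p, q) = -sum p(x) * log2 q(x) for aligned discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same events")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # events impossible under p contribute nothing
        if qx == 0:
            return math.inf  # p-possible event coded with zero probability
        total -= px * math.log2(qx)
    return total


print(cross_entropy_bits([0.5, 0.5], [0.5, 0.5]))    # 1.0: a fair coin needs 1 bit
print(cross_entropy_bits([0.5, 0.5], [0.25, 0.75]))  # > 1.0: cost of the mismatched code
```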
Cross Industry Standard Process for Data Mining (CRISP-DM) 
CRISP-DM stands for Cross Industry Standard Process for Data Mining. It is a data mining process model that describes commonly used approaches that expert data miners use to tackle problems. Polls conducted in 2002, 2004, and 2007 show that it is the leading methodology used by data miners. The only other data mining standard named in these polls was SEMMA. However, 3 to 4 times as many people reported using CRISP-DM. A review and critique of data mining process models in 2009 called CRISP-DM the “de facto standard for developing data mining and knowledge discovery projects.” Other reviews of CRISP-DM and data mining process models include Kurgan and Musilek’s 2006 review, and Azevedo and Santos’ 2008 comparison of CRISP-DM and SEMMA. 
Cross Validation  Cross-validation, sometimes called rotation estimation, is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown (first-seen) data against which the model is tested (the testing dataset). The goal of cross-validation is to define a dataset to “test” the model during the training phase (i.e., the validation dataset), in order to limit problems like overfitting and to give insight into how the model will generalize to an independent data set (i.e., an unknown dataset, for instance from a real problem). 
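The most common way to carve validation sets out of the training data is k-fold cross-validation. The indices are shuffled and split into k folds, and each fold serves once as the validation set while the other k − 1 folds are used for training. The generator below is a plain-Python sketch of that index bookkeeping. The name, the seed handling, and the policy of giving the first n mod k folds one extra element are our own choices. In use, one would fit the model on each `train` list, score it on the matching `validation` list, and average the k scores.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(10, 3):
    print(len(train), sorted(validation))
```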
CrossCat  CrossCat is a domain-general, Bayesian method for analyzing high-dimensional data tables. CrossCat estimates the full joint distribution over the variables in the table from the data, via approximate inference in a hierarchical, nonparametric Bayesian model, and provides efficient samplers for every conditional distribution. CrossCat combines strengths of nonparametric mixture modeling and Bayesian network structure learning: it can model any joint distribution given enough data by positing latent variables, but also discovers independencies between the observable variables. A range of exploratory analysis and predictive modeling tasks can be addressed via CrossCat, including detecting predictive relationships between variables, finding multiple overlapping clusterings, imputing missing values, and simultaneously selecting features and classifying rows. Research on CrossCat has shown that it is suitable for analysis of real-world tables of up to 10 million cells, including hospital cost and quality measures, voting records, handwritten digits, and state-level unemployment time series. 
Cross-Entropy Clustering  We build a general and easily applicable clustering theory, which we call cross-entropy clustering (CEC for short), which joins the advantages of classical k-means (easy implementation and speed) with those of EM (affine invariance and the ability to adapt to clusters of desired shapes). Moreover, contrary to k-means and EM, CEC finds the optimal number of clusters by automatically removing groups that have negative information cost. Although CEC, like EM, can be built on an arbitrary family of densities, in the most important case of Gaussian CEC the division into clusters is affine invariant. http://…/1508.04559v1 
Cross-Lingual Text Classification (CLTC) 
Cross-lingual text classification (CLTC) is the task of classifying documents written in different languages into the same taxonomy of categories. 
CrossNets  We propose a novel neural network structure called CrossNets, which considers architectures on directed acyclic graphs. This structure builds on previous generalizations of feed-forward models, such as ResNets, by allowing for all forward cross connections between layers (both adjacent and non-adjacent). Adding cross connections throughout the network increases information flow across the whole network, leading to better training and testing performance. The superior performance of the network is demonstrated on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN. We conclude with a proof of convergence of CrossNets to a local minimum of the error, where the connection weights are chosen through backpropagation with momentum. 
Crossover Design  In randomized trials, a crossover design is one in which each subject receives each treatment, in succession. For example, subject 1 first receives treatment A, then treatment B, then treatment C. Subject 2 might receive treatment B, then treatment A, then treatment C. A crossover design has the advantage of eliminating individual subject differences from the overall treatment effect, thus enhancing statistical power. On the other hand, it is important in a crossover study that the underlying condition (say, a disease) not change over time, and that the effects of one treatment disappear before the next is applied. 
Croston Method  It is easier to predict demand when there is a pattern, but for irregular or intermittent demand, simple smoothing does not work well. Croston’s forecasting method is a standard approach for intermittent demand. It splits the original series into two series: (1) the sizes of the non-zero demands and (2) the intervals between consecutive non-zero demands, which capture the runs of zero demand. Each series is smoothed separately (classically with simple exponential smoothing), and the forecast demand per period is the smoothed demand size divided by the smoothed interval. 
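The following is a minimal sketch of the classic Croston update. It assumes simple exponential smoothing with a single smoothing constant `alpha` for both component series, and it initializes from the first non-zero demand (initialization schemes vary in practice). It returns the forecast demand per period, that is, the smoothed demand size divided by the smoothed inter-demand interval. The function name and defaults are ours.

```python
def croston(demand, alpha=0.1):
    """One-step-ahead Croston forecast (demand per period) for an intermittent series."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    size = interval = None
    periods_since = 0  # periods elapsed since the last non-zero demand
    for d in demand:
        periods_since += 1
        if d > 0:
            if size is None:
                # Initialise both smoothed series from the first non-zero demand.
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 0
    if size is None:
        return 0.0  # no demand observed yet
    return size / interval


print(croston([0, 0, 5, 0, 0, 0, 5, 0, 5], alpha=0.5))  # 5 / 2.75, about 1.82
```

Zero-demand periods never update the smoothed size. They only lengthen the interval that is recorded at the next non-zero demand, which is how the method avoids dragging the demand-size estimate toward zero.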
Cubist  Cubist is a powerful tool for generating rule-based models that balance the need for accurate prediction against the requirements of intelligibility. Cubist models generally give better results than those produced by simple techniques such as multivariate linear regression, while also being easier to understand than neural networks. 
Cubist Model  Cubist is a rule-based model that extends Quinlan's M5 model tree. A tree is grown whose terminal leaves contain linear regression models. These models are based on the predictors used in previous splits. There are also intermediate linear models at each step of the tree. A prediction is made using the linear regression model at the terminal node of the tree, but is ‘smoothed’ by taking into account the prediction from the linear model in the previous node of the tree (which also occurs recursively up the tree). The tree is reduced to a set of rules, which initially are paths from the top of the tree to the bottom. Rules are eliminated via pruning and/or combined for simplification. This is explained in more detail in Quinlan (1992). Wang and Witten (1997) attempted to recreate this model using a ‘rational reconstruction’ of Quinlan (1992) that is the basis for the M5P model in Weka (and the R package RWeka). http://…i=10.1.1.34.885&rep=rep1&type=pdf https://…/ensemblelearningwithcubistmodel 
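The recursive smoothing step can be sketched with the M5-style formula described by Wang and Witten: at each step up the path, the smoothed value becomes (n·p + k·q)/(n + k), where p is the child's (already smoothed) prediction, q the parent model's prediction, n the number of training cases at the child, and k a smoothing constant (often 15). This is a sketch under that assumption; Cubist itself may weight the models differently, and the function name is hypothetical.

```python
def smoothed_prediction(path_predictions, path_counts, k=15.0):
    """M5-style smoothing along one root-to-leaf path.

    path_predictions[i]: linear-model prediction at depth i (0 = root).
    path_counts[i]: number of training cases that reached that node.
    """
    value = path_predictions[-1]  # start from the leaf model's prediction
    # Walk back up the tree, blending with each parent's model in turn.
    for i in range(len(path_predictions) - 2, -1, -1):
        n = path_counts[i + 1]  # cases at the child node
        value = (n * value + k * path_predictions[i]) / (n + k)
    return value

# Usage: a leaf with 15 cases predicting 20 and a root model predicting 10.
pred = smoothed_prediction([10.0, 20.0], [100, 15])
```

Leaves supported by few training cases are pulled more strongly toward their ancestors' models, which stabilizes predictions from sparsely populated regions of the tree.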
CUDA Deep Neural Network library (cuDNN) 
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. It emphasizes performance, ease of use, and low memory overhead. cuDNN is designed to be integrated into higher-level machine learning frameworks, such as the popular Caffe, Theano, or Torch software frameworks. The simple, drop-in design allows developers to focus on designing and implementing neural net models rather than tuning for performance, while still achieving the high performance delivered by modern parallel computing hardware. If you are a data scientist looking for an interactive deep learning training system, check out the new NVIDIA DIGITS solution, which automatically uses cuDNN. cuDNN is freely available to CUDA Registered Developers. As a registered developer, you can download the latest version of cuDNN, access the support forum, and file bug reports. 
CUDA RecurREnt Neural Network Toolkit (CURRENNT) 
In this article, we introduce CURRENNT, an open-source parallel implementation of deep recurrent neural networks (RNNs) supporting graphics processing units (GPUs) through NVIDIA's Compute Unified Device Architecture (CUDA). CURRENNT supports uni- and bidirectional RNNs with Long Short-Term Memory (LSTM) memory cells, which overcome the vanishing gradient problem. To our knowledge, CURRENNT is the first publicly available parallel implementation of deep LSTM-RNNs. Benchmarks are given on a noisy speech recognition task from the 2nd CHiME Speech Separation and Recognition Challenge (2013), where LSTM-RNNs have been shown to deliver the best performance. The results show double-digit speedups in bidirectional LSTM training relative to a reference single-threaded CPU implementation. CURRENNT is available under the GNU General Public License. http://…/currennt. 
Cumulative Distribution Function (CDF) 
In probability theory and statistics, the cumulative distribution function (CDF), or just distribution function, describes the probability that a real-valued random variable X with a given probability distribution will be found to have a value less than or equal to x. In the case of a continuous distribution, it gives the area under the probability density function from minus infinity to x. Cumulative distribution functions are also used to specify the distribution of multivariate random variables. 
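Two common ways to evaluate a CDF are sketched below: a parametric CDF (the normal distribution, using the identity F(x) = ½[1 + erf((x − μ)/(σ√2))] with the standard library's `math.erf`) and the empirical CDF of a sample, F_n(x) = (number of observations ≤ x)/n. The function names are illustrative choices, not a specific library's API.

```python
import bisect
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma^2), computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def empirical_cdf(sample):
    """Return F_n, where F_n(x) is the fraction of observations <= x."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect.bisect_right(data, x) / n

    return F

# Usage
F = empirical_cdf([3, 1, 2, 2])
print(normal_cdf(0.0), F(2))
```

The empirical CDF is a step function that jumps by 1/n at each observation, which makes the "probability of a value less than or equal to x" definition concrete.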
Cumulative Gains Model Quality Metric  In developing risk models, developers employ a number of graphical and numerical tools to evaluate the quality of candidate models. These traditionally include the KS statistic and the many Area Under the Curve (AUC) methodologies applied to ROC and cumulative gains charts. These methodologies are typically employed in one of two scenarios. The first is as a tool to evaluate one or more models and ascertain each model's effectiveness. The second is the inclusion of such a metric in the model-building process itself, as in Ferri et al.'s proposal to use the area under the ROC curve in the splitting criterion of a decision tree. However, these methods fail to address situations involving competing models where one model's curve is not strictly above the other's. Nor do they address differing end-point values: the magnitudes of these typical measures may vary with the target definition, which makes standardization difficult. Some of these problems are starting to be addressed. Marcade, Chief Technology Officer of the software vendor KXEN, gives an overview of several metric techniques and proposes a new solution to the problem in data mining; KXEN's software uses two statistics called KI and KR. We will examine the shortfalls he addresses more thoroughly and propose a new metric that can be used as an improvement on the KI and KR statistics. Although framed in terms of developing machine learning models, the same issues and solutions apply to evaluating a single model's performance, as described by Siddiqi and Mays with respect to risk scorecards. We will not give examples of each application of the new statistic, but rather claim that it is useful in most situations where an AUC or model-separation statistic (such as KS) is used. 
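For reference, a cumulative gains chart ranks cases by model score and plots the fraction of all positives captured in the top k% of cases; an AUC-style summary is the area under that curve. The sketch below computes both with decile bins and the trapezoidal rule. It illustrates the conventional gains chart only; it is not the KI/KR statistics or the new metric the entry proposes, and the function names are hypothetical.

```python
import math

def cumulative_gains(scores, labels, n_bins=10):
    """Fraction of all positives captured in the top k/n_bins of cases, k = 1..n_bins."""
    # Rank cases from highest to lowest model score.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    total_pos = sum(ranked)
    if total_pos == 0:
        raise ValueError("labels contain no positive cases")
    n = len(ranked)
    return [sum(ranked[:math.ceil(k * n / n_bins)]) / total_pos for k in range(1, n_bins + 1)]

def area_under_gains(gains):
    """Trapezoidal area under the gains curve, starting from the origin (0, 0)."""
    step = 1.0 / len(gains)
    points = [0.0] + list(gains)
    return sum(step * (a + b) / 2 for a, b in zip(points, points[1:]))

# Usage: 10 scored cases, 3 of which are positive (1 = event occurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
gains = cumulative_gains(scores, labels)
area = area_under_gains(gains)
```

A random model's gains curve is the diagonal (area 0.5), so larger areas indicate better separation. When two models' curves cross, a single area can hide which model is better at a particular cut-off, which is the shortfall the entry discusses.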
Cumulative Sum Control Chart (CUSUM) 
In statistical quality control, CUSUM (the cumulative sum control chart) is a sequential analysis technique developed by E. S. Page of the University of Cambridge. It is typically used for change detection. CUSUM was announced in Biometrika in 1954, a few years after the publication of Wald's SPRT algorithm. Page referred to a ‘quality number’ θ, by which he meant a parameter of the probability distribution, for example the mean. He devised CUSUM as a method to detect changes in θ, and proposed a criterion for deciding when to take corrective action. When the CUSUM method is applied to changes in the mean, it can be used for step detection in a time series. A few years later, George Alfred Barnard developed a visualization method, the V-mask chart, to detect both increases and decreases in θ. 
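A minimal sketch of a two-sided CUSUM for shifts in the mean, using the common "tabular" formulation (an assumption; it is not necessarily Page's original notation). Deviations from the in-control target beyond a reference value `k` are accumulated, and an alarm is raised when either sum exceeds the decision interval `h`. Both parameters and the reset-after-alarm policy are illustrative choices.

```python
def cusum(xs, target, k, h):
    """Two-sided tabular CUSUM; return the indices at which an alarm is raised."""
    s_hi = s_lo = 0.0
    alarms = []
    for i, x in enumerate(xs):
        s_hi = max(0.0, s_hi + (x - target - k))  # accumulates upward deviations
        s_lo = max(0.0, s_lo + (target - k - x))  # accumulates downward deviations
        if s_hi > h or s_lo > h:
            alarms.append(i)
            s_hi = s_lo = 0.0  # restart monitoring after signalling
    return alarms

# Usage: the mean steps from 0 to 2 at index 10.
alarms = cusum([0.0] * 10 + [2.0] * 5, target=0.0, k=0.5, h=4.0)
```

Small, persistent shifts add up over several observations, which is why CUSUM detects them faster than a chart that judges each point in isolation.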
Curated Taxonomy  A top-down hierarchy of topics, where each node is assigned specific positive and negative vocabulary rules. For example, the topic “RVs” would have positive vocabularies including ‘recreational vehicles’, ‘motor homes’ and ‘travel trailers’. The topic “Shampoo” would have negative vocabularies including ‘carpet’, ‘dogs’, and ‘Warren Beatty’. 
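A minimal sketch of how positive and negative vocabulary rules might be applied: a topic matches when at least one positive term appears in the text and no negative term does. It uses case-insensitive substring matching and a flat list of topics (the hierarchy is omitted). The `TAXONOMY` contents reuse the entry's examples, except that the positive term for “Shampoo” is a hypothetical addition.

```python
def match_topics(text, taxonomy):
    """Return topics with a positive-vocabulary hit and no negative-vocabulary hit."""
    lowered = text.lower()
    matched = []
    for topic, rules in taxonomy.items():
        has_pos = any(term in lowered for term in rules["positive"])
        has_neg = any(term in lowered for term in rules["negative"])
        if has_pos and not has_neg:
            matched.append(topic)
    return matched

# Rule terms are stored in lowercase so they compare against the lowered text.
TAXONOMY = {
    "RVs": {
        "positive": ["recreational vehicle", "motor home", "travel trailer"],
        "negative": [],
    },
    "Shampoo": {
        "positive": ["shampoo"],  # hypothetical positive term
        "negative": ["carpet", "dog", "warren beatty"],
    },
}

print(match_topics("Carpet shampoo on sale", TAXONOMY))
```

The negative rule keeps “Carpet shampoo on sale” out of the “Shampoo” topic even though the positive term matches.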
Curriculum Accelerated Self-Supervised Learning (CASSL) 
Recent self-supervised learning approaches focus on using a few thousand data points to learn policies for high-level, low-dimensional action spaces. However, scaling this framework to high-dimensional control requires either scaling up data collection efforts or using a clever sampling strategy for training. We present a novel approach, Curriculum Accelerated Self-Supervised Learning (CASSL), to train policies that map visual information to high-level, higher-dimensional action spaces. CASSL orders the sampling of training data based on control dimensions: learning and sampling focus on a few control parameters before the others. The curriculum ordering is determined by variance-based global sensitivity analysis of the control space. We apply our CASSL framework to learning how to grasp using an adaptive, underactuated multi-fingered gripper, a challenging system to control. Our experimental results on a set of novel objects indicate that CASSL provides significant improvement and better generalization compared to baseline methods such as staged curriculum learning (8% increase) and complete end-to-end learning with random exploration (14% improvement). 
Curse of Dimensionality  There are multiple phenomena referred to by this name in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. To obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient. 
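One concrete way to see how quickly volume "escapes" as dimensionality grows is the fraction of a unit hypercube occupied by its inscribed ball (radius 1/2), which is π^(d/2)·(1/2)^d / Γ(d/2 + 1). This is an illustrative calculation, not taken from the entry. The fraction is 1 in one dimension, π/4 in two, π/6 in three, and collapses toward zero after that, so almost all of a high-dimensional cube lies in its corners, far from the center.

```python
import math

def ball_to_cube_volume_ratio(d):
    """Fraction of the unit d-cube's volume inside its inscribed ball of radius 1/2."""
    r = 0.5
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

for d in (1, 2, 3, 5, 10, 20):
    # A grid with 10 cells per axis also needs 10**d cells to cover the space.
    print(f"d={d:2d}  ball/cube={ball_to_cube_volume_ratio(d):.6f}  grid cells={10 ** d}")
```

The grid column shows the companion effect from the entry: keeping the same data density per cell requires exponentially more data as d increases.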
Custom Grouper User Language (CGUL) 
Custom Grouper User Language (CGUL) is a sentence-based language that enables you to perform pattern matching using character- or token-based regular expressions combined with linguistic attributes to define custom entity types. Working with CGUL can be very challenging. http://…/hana_options_adp (SAP HANA Text Analysis Extraction Customization Guide) 
Customer Acquisition Cost (CAC) 
Customer Acquisition Cost is the cost associated with convincing a customer to buy a product or service. This cost is incurred by the organization to convince a potential customer, and it includes the product cost as well as research, marketing, and accessibility costs. This is an important business metric. It plays a major role in calculating the value of the customer to the company and the resulting return on investment (ROI) of acquisition. The calculation of customer valuation helps a company decide how much of its resources can be profitably spent on a particular customer; in general terms, it helps to decide the worth of the customer to the company. Customer Acquisition Cost (abbreviated to CAC) refers to the resources that a business must allocate (financial or otherwise) in order to acquire an additional customer. Numerically, customer acquisition cost is typically expressed as a ratio: the total acquisition spending divided by the number of additional customers acquired as a result of the customer acquisition strategy. 
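The ratio described above can be computed directly. In this minimal sketch, the spending categories and all figures are hypothetical, and `customer_acquisition_cost` is an illustrative name.

```python
def customer_acquisition_cost(acquisition_spend, new_customers):
    """CAC = total acquisition spending / number of customers acquired."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return sum(acquisition_spend.values()) / new_customers

# Hypothetical quarter: spending by category, and customers won by the campaign.
spend = {"marketing": 30_000, "sales": 15_000, "research": 5_000}
cac = customer_acquisition_cost(spend, 250)  # 50,000 / 250 = 200.0
```

Which costs belong in the numerator (for example, whether to include salaries or product cost) is a business decision, so CAC figures are only comparable when they are computed the same way.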
Customer Experience Analytics (CEA) 
➘ “Customer Experience Management” 
Customer Experience Management (CEM) 
Customer experience management (CEM or CXM) is the process that companies use to oversee and track all interactions with a customer over the course of their relationship. It involves building strategy around the needs of individual customers. According to Jeananne Rae, companies are realizing that ‘building great consumer experiences is a complex enterprise, involving strategy, integration of technology, orchestrating business models, brand management and CEO commitment.’ 
Customer Lifetime Value (CLTV, CLV) 
In marketing, customer lifetime value (CLV, or often CLTV), lifetime customer value (LCV), or user lifetime value (LTV) is a prediction of the net profit attributed to the entire future relationship with a customer. The prediction model can have varying levels of sophistication and accuracy, ranging from a crude heuristic to the use of complex predictive analytics techniques. Customer lifetime value can also be defined as the dollar value of a customer relationship, based on the present value of the projected future cash flows from the customer relationship. Customer lifetime value is an important concept because it encourages firms to shift their focus from quarterly profits to the long-term health of their customer relationships. It is also an important number because it represents an upper limit on spending to acquire new customers. For this reason it is an important element in calculating the payback of advertising spend in marketing mix modeling. One of the first accounts of the term Customer Lifetime Value is in the 1988 book Database Marketing, which includes detailed worked examples. Early adopters of Customer Lifetime Value models in the 1990s include Edge Consulting and BrandScience. 
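The present-value definition above can be sketched as CLV = Σ_t CF_t / (1 + d)^t, where CF_t is the projected net cash flow from the customer in period t and d is the discount rate per period. This is a minimal sketch that assumes the cash-flow projections are already given (richer models would also estimate retention and margins); the figures are hypothetical.

```python
def customer_lifetime_value(cash_flows, discount_rate):
    """Present value of projected per-period net cash flows from one customer.

    cash_flows[0] arrives at the end of period 1, cash_flows[1] at period 2, and so on.
    """
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical customer: 100 of net margin per year for 3 years, 10% discount rate.
clv = customer_lifetime_value([100, 100, 100], 0.10)  # about 248.69
```

Because of discounting, the value is less than the undiscounted 300. Under the upper-limit interpretation in the entry, spending more than this to acquire such a customer would not pay back.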
Customer Segmentation  The act of separating a group of clients into sets of similar individuals that are related from a marketing or demographic perspective. For example, a business that practices customer segmentation might group its current or potential customers according to their gender, buying tendencies, age group, and special interests. 
Cypher Query Language (CQL) 
Cypher is a declarative graph query language for the open-source graph database Neo4j that allows for expressive and efficient querying and updating of the graph store. Cypher is a relatively simple but still very powerful language: complicated database queries can be expressed concisely, which allows you to focus on your domain instead of getting lost in database access. 
Cython  The Cython programming language is a superset of Python with a foreign function interface for invoking C/C++ routines and the ability to declare the static types of subroutine parameters and results, local variables, and class attributes. It is essentially a Python-to-C source code translator that integrates with the CPython interpreter at a low level. http://cython.org http://…/9781491901557 runcython 