C Math Library(CML)
➘ “C Numerical Library”

C Numerical Library(CNL)
The IMSL C Numerical Library provides advanced mathematical and statistical functionality for programmers to embed in their existing or new applications. Written in standard C, the IMSL C Library can be embedded into any C or C++ application as well as any existing application that can reference a C library.

C4.5
C4.5 is an algorithm, developed by Ross Quinlan, that generates a decision tree. C4.5 is an extension of Quinlan’s earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier.

Cabinet Tree
Treemaps are well-known for visualizing hierarchical data. Most related approaches have been focused on layout algorithms and paid little attention to other display properties and interactions. Furthermore, the structural information in conventional Treemaps is too implicit for viewers to perceive. This paper presents Cabinet Tree, an approach that: i) draws branches explicitly to show relational structures, ii) adapts a space-optimized layout for leaves and maximizes the space utilization, iii) uses coloring and labeling strategies to clearly reveal patterns and contrast different attributes intuitively. We also apply the continuous node selection and detail window techniques to support user interaction with different levels of the hierarchies. Our quantitative evaluations demonstrate that Cabinet Tree achieves good scalability for increased resolutions and big datasets.

CacheDiff
We present a sampling method called CacheDiff that has both time and space complexity of O(k) to randomly select k items from a pool of N items, in which N is known.

Caffe
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley.
Caffe is released under the BSD 2-Clause license.
http://…/neural-networks-with-caffe-on-the-gpu

Cannistraci-Alanis-Ravasi Index(CAR)
Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed CAR (Cannistraci-Alanis-Ravasi) index shows the power of local link/triangle information in improving link-prediction accuracy.

Canonical Correlation Analysis(CCA,CANCOR)
In statistics, canonical-correlation analysis (CCA) is a way of making sense of cross-covariance matrices. If we have two vectors X = (X1, …, Xn) and Y = (Y1, …, Ym) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of the Xi and Yj which have maximum correlation with each other. T. R. Knapp notes ‘virtually all of the commonly encountered parametric tests of significance can be treated as special cases of canonical-correlation analysis, which is the general procedure for investigating the relationships between two sets of variables.’
Stochastic Approximation for Canonical Correlation Analysis

Canonical Correspondence Analysis(CCA)
In applied statistics, canonical correspondence analysis (CCA) is a multivariate constrained ordination technique that extracts major gradients among combinations of explanatory variables in a dataset. The requirements of a CCA are that the samples are random and independent and that the independent variables are consistent within the sample site and error-free.

Canonical Divergence Analysis(CDA)
We aim to analyze the relation between two random vectors that may have different numbers of attributes as well as realizations, and which may not even have a joint distribution. This problem arises in many practical domains, including biology and architecture. Existing techniques assume the vectors to have the same domain or to be jointly distributed, and hence are not applicable.
To address this, we propose Canonical Divergence Analysis (CDA).

Canonical Variate Regression(CVR)
CVR

Canopy Clustering Algorithm
The canopy clustering algorithm is an unsupervised pre-clustering algorithm introduced by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000. It is often used as a preprocessing step for the K-means algorithm or the hierarchical clustering algorithm. It is intended to speed up clustering operations on large data sets, where using another algorithm directly may be impractical due to the size of the data set. The algorithm proceeds as follows, using two thresholds T_1 (the loose distance) and T_2 (the tight distance), where T_1 > T_2.
1. Begin with the set of data points to be clustered.
2. Remove a point from the set, beginning a new ‘canopy’.
3. For each point left in the set, assign it to the new canopy if its distance to the removed point is less than the loose distance T_1.
4. If the distance of the point is additionally less than the tight distance T_2, remove it from the original set.
5. Repeat from step 2 until there are no more data points in the set to cluster.
6. These relatively cheaply clustered canopies can be sub-clustered using a more expensive but accurate algorithm.
An important note is that individual data points may be part of several canopies. As an additional speed-up, an approximate and fast distance metric can be used for step 3, while a more accurate and slow distance metric can be used for step 4. Since the algorithm uses distance functions and requires the specification of distance thresholds, its applicability for high-dimensional data is limited by the curse of dimensionality. Only when a cheap, approximate, low-dimensional distance function is available will the produced canopies preserve the clusters produced by K-means.
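The numbered steps of the canopy clustering algorithm can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `canopy_clustering`, the random choice of each canopy's starting point, and the use of Euclidean distance (`math.dist`) for both thresholds are assumptions made for the example.

```python
import math
import random


def canopy_clustering(points, t1, t2, distance=math.dist, seed=0):
    """Group points into (possibly overlapping) canopies.

    t1 is the loose distance and t2 the tight distance, with t1 > t2.
    Returns a list of (center, members) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 (loose) must be greater than t2 (tight)")
    rng = random.Random(seed)       # assumption: centers are picked at random
    remaining = list(points)        # step 1: the set of points to cluster
    canopies = []
    while remaining:                # step 5: repeat until the set is empty
        # Step 2: remove a point from the set, beginning a new canopy.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            # Step 3: closer than the loose distance -> joins this canopy.
            if d < t1:
                members.append(p)
            # Step 4: closer than the tight distance -> leaves the set;
            # every other point stays available for later canopies.
            if d >= t2:
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (10.0, 10.0), (10.4, 10.0)]
    for center, members in canopy_clustering(data, t1=2.0, t2=1.0):
        print(center, members)
```

A point that falls between the two thresholds joins the current canopy but stays in the candidate set, which is why a point can end up in several canopies. Step 6 (sub-clustering each canopy with a more expensive algorithm) is left out of the sketch.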

CAP-Theorem(Brewer’s theorem)
In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
• Consistency (all nodes see the same data at the same time)
• Availability (a guarantee that every request receives a response about whether it was successful or failed)
• Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

Capture-Mark-Recapture Analysis
Mark and recapture is a method commonly used in ecology to estimate an animal population’s size. A portion of the population is captured, marked, and released. Later, another portion is captured and the number of marked individuals within the sample is counted. Since the number of marked individuals within the second sample should be proportional to the number of marked individuals in the whole population, an estimate of the total population size can be obtained by dividing the number of marked individuals by the proportion of marked individuals in the second sample. The method is most useful when it is not practical to count all the individuals in the population. Other names for this method, or closely related methods, include capture-recapture, capture-mark-recapture, mark-recapture, sight-resight, mark-release-recapture, multiple systems estimation, band recovery, the Petersen method and the Lincoln method. Another major application for these methods is in epidemiology, where they are used to estimate the completeness of ascertainment of disease registers. Typical applications include estimating the number of people needing particular services (e.g. services for children with learning disabilities, services for medically frail elderly living in the community), or with particular conditions (e.g. illegal drug addicts, people infected with HIV, etc.).
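The mark-recapture estimate described above is commonly written as N ≈ M × C / R, where M is the number of individuals marked in the first sample, C the size of the second sample, and R the number of marked individuals found in the second sample. The sketch below uses hypothetical function names and made-up counts. The Chapman variant is a well-known bias-corrected form that the entry itself does not discuss and is included only for comparison.

```python
def _check_counts(marked_first, caught_second, recaptured):
    """Basic sanity checks shared by both estimators."""
    if marked_first <= 0 or caught_second <= 0:
        raise ValueError("both sample sizes must be positive")
    if not 0 <= recaptured <= min(marked_first, caught_second):
        raise ValueError("recaptured must lie between 0 and both sample sizes")


def lincoln_petersen(marked_first, caught_second, recaptured):
    """Marked individuals divided by the marked proportion of sample two."""
    _check_counts(marked_first, caught_second, recaptured)
    if recaptured == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    proportion_marked = recaptured / caught_second
    return marked_first / proportion_marked


def chapman(marked_first, caught_second, recaptured):
    """Bias-corrected variant; remains defined when nothing is recaptured."""
    _check_counts(marked_first, caught_second, recaptured)
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


if __name__ == "__main__":
    # Made-up example: 100 animals marked, 80 caught later, 20 of them marked.
    print(lincoln_petersen(100, 80, 20))  # 400.0
    print(round(chapman(100, 80, 20), 1))
```

In the made-up example a quarter of the second sample carries a mark, so the 100 marked animals are taken to be about a quarter of the population.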

Cartogram
A cartogram is a map in which some thematic mapping variable – such as travel time, population, or Gross National Product – is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable. There are two main types of cartograms: area and distance cartograms. Cartograms have a fairly long history, with examples from the mid-1800s.

Case-Based Reasoning(CBR)
Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using case-based reasoning. A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who creates case law is using case-based reasoning. So, too, an engineer copying working elements of nature (practicing biomimicry), is treating nature as a database of solutions to problems. Case-based reasoning is a prominent kind of analogy making.

Case-Control Study
A case-control study is a type of study design used widely, originally developed in epidemiology, although its use has also been advocated for the social sciences. It is a type of observational study in which two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute. Case-control studies are often used to identify factors that may contribute to a medical condition by comparing subjects who have that condition/disease (the “cases”) with patients who do not have the condition/disease but are otherwise similar (the “controls”). They require fewer resources but provide less evidence for causal inference than a randomized controlled trial.

Catalan Number
In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursively-defined objects.
They are named after the Belgian mathematician Eugène Charles Catalan (1814-1894).
Modular Catalan Numbers

Catastrophe Modeling
Catastrophe modeling (also known as cat modeling) is the process of using computer-assisted calculations to estimate the losses that could be sustained due to a catastrophic event such as a hurricane or earthquake. Cat modeling is especially applicable to analyzing risks in the insurance industry and is at the confluence of actuarial science, engineering, meteorology, and seismology.

Categorical Cross Entropy

Categorical Response Model

Causal Additive Model(CAM)
We develop estimation for potentially high-dimensional additive structural equation models. A key component of our approach is to decouple order search among the variables from feature or edge selection in a directed acyclic graph encoding the causal structure. We show that the former can be done with nonregularized (restricted) maximum likelihood estimation while the latter can be efficiently addressed using sparse regression techniques. Thus, we substantially simplify the problem of structure search and estimation for an important class of causal models. We establish consistency of the (restricted) maximum likelihood estimator for low- and high-dimensional scenarios, and we also allow for misspecification of the error distribution. Furthermore, we develop an efficient computational algorithm which can deal with many variables, and the new method’s accuracy and performance are illustrated on simulated and real data.

Causal Falling Rule List(CFRL)
A causal falling rule list (CFRL) is a sequence of if-then rules that specifies heterogeneous treatment effects, where (i) the order of rules determines the treatment effect subgroup a subject belongs to, and (ii) the treatment effect decreases monotonically down the list.
A given CFRL parameterizes a hierarchical Bayesian regression model in which the treatment effects are incorporated as parameters, and assumed constant within model-specific subgroups.

Causal Inference
Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed. The science of why things occur is called etiology.
http://…mp;uid=2&uid=4&sid=21104618644387

Causal Loglinear Model
➘ “Log-Linear Model”

Causal Model
A causal model is an abstract model that describes the causal mechanisms of a system. The model must express more than correlation because correlation does not imply causation. Judea Pearl defines a causal model as an ordered triple ⟨U, V, E⟩, where U is a set of exogenous variables whose values are determined by factors outside the model; V is a set of endogenous variables whose values are determined by factors within the model; and E is a set of structural equations that express the value of each endogenous variable as a function of the values of the other variables in U and V.

Causal Network
A causal network is a Bayesian network with an explicit requirement that the relationships be causal. The additional semantics of the causal networks specify that if a node X is actively caused to be in a given state x (an action written as do(X=x)), then the probability density function changes to the one of the network obtained by cutting the links from the parents of X to X, and setting X to the caused value x. Using these semantics, one can predict the impact of external interventions from data obtained prior to intervention.
➚ “Bayesian Network”

Causal Prediction

Cell Suppression Problem(CSP)
Cell suppression is one of the most frequently used techniques to prevent the disclosure of sensitive data in statistical tables.
Finding the minimum cost set of nonsensitive entries to suppress, along with the sensitive ones, in order to make a table safe for publication, is an NP-hard problem, denoted the cell suppression problem (CSP).

Censored Time Series Analysis
An imputation method for time series in the presence of censored data. The main message of the imputation method is that we should account for the variability of the censored part of the data by mimicking the complete data. That is, we impute the incomplete part with a conditional random sample rather than the conditional expectation or certain constants. Simulation results suggest that the imputation method reduces the possible biases and has standard errors similar to those from complete data.

Censoring
In statistics, engineering, economics, and medical research, censoring is a condition in which the value of a measurement or observation is only partially known. For example, suppose a study is conducted to measure the impact of a drug on mortality rate. In such a study, it may be known that an individual’s age at death is at least 75 years (but may be more). Such a situation could occur if the individual withdrew from the study at age 75, or if the individual is currently alive at the age of 75. Censoring also occurs when a value occurs outside the range of a measuring instrument. For example, a bathroom scale might only measure up to 300 pounds (140 kg). If a 350 lb (160 kg) individual is weighed using the scale, the observer would only know that the individual’s weight is at least 300 pounds (140 kg). The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown. Censoring should not be confused with the related idea of truncation. With censoring, observations result either in knowing the exact value that applies, or in knowing that the value lies within an interval.
With truncation, observations never result in values outside a given range: values in the population outside the range are never seen or never recorded if they are seen. Note that in statistics, truncation is not the same as rounding.

Centered Autologistic Model
The traditional autologistic model was proposed by Besag (1972). The model is a Markov random field (MRF) model (Kindermann and Snell, 1980).

Cerioli Outlier Detection
“Cerioli Outlier Detection” is an iterated RMCD method of Cerioli (2010) for multivariate outlier detection via robust Mahalanobis distances.

Chan-Darwiche Distance
We propose a distance measure between two probability distributions, which allows one to bound the amount of belief change that occurs when moving from one distribution to another. We contrast the proposed measure with some well known measures, including KL-divergence, showing some theoretical properties on its ability to bound belief changes. We then present two practical applications of the proposed distance measure: sensitivity analysis in belief networks and probabilistic belief revision. We show how the distance measure can be easily computed in these applications, and then use it to bound global belief changes that result from either the perturbation of local conditional beliefs or the accommodation of soft evidence. Finally, we show that two well known techniques in sensitivity analysis and belief revision correspond to the minimization of our proposed distance measure and, hence, can be shown to be optimal from that viewpoint.

Change Point Analysis(CPA)
Change-point analysis is a powerful new tool for determining whether a change has taken place. It is capable of detecting subtle changes missed by control charts. Further, it better characterizes the changes detected by providing confidence levels and confidence intervals. When collecting online data, a change-point analysis is not a replacement for control charting.
But, because a change-point analysis can provide further information, the two methods can be used in a complementary fashion. When analyzing historical data, especially when dealing with large data sets, change-point analysis is preferable to control charting. A change-point analysis is more powerful, better characterizes the changes, controls the overall error rate, is robust to outliers, is more flexible and is simpler to use. CPA aims at detecting any change in the mean of a process in historical data. Example questions to be answered by performing CPA:
• Did a change occur?
• Did more than one change occur?
• When did the changes occur?
• How confident are we that they are real changes?
http://…/changepoint.html

Change Point Detection
In statistical analysis, change detection or change point detection tries to identify times when the probability distribution of a stochastic process or time series changes. In general the problem concerns both detecting whether or not a change has occurred, or whether several changes might have occurred, and identifying the times of any such changes. Specific applications, like step detection and edge detection, may be concerned with changes in the mean, variance, correlation, or spectral density of the process. More generally change detection also includes the detection of anomalous behavior: anomaly detection.

Change-Point Detection Procedure via VIF Regression(VIFCP)

Chaos Monkey
Chaos Monkey is a service which runs in Amazon Web Services (AWS) that seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group. The software design is flexible enough to work with other cloud providers or instance groupings and can be enhanced to add that support. The service has a configurable schedule that, by default, runs on non-holiday weekdays between 9am and 3pm.
In most cases, we have designed our applications to continue working when an instance goes offline, but in those special cases that they don’t, we want to make sure there are people around to resolve and learn from any problems. With this in mind, Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond.

Charged String Tensor Networks
Tensor network methods provide an intuitive graphical language to describe quantum states, channels, open quantum systems and a class of numerical approximation methods that efficiently simulate certain many-body states in one spatial dimension. There are two fundamental types of tensor networks in wide use today. The most common is similar to quantum circuits. The second is the braided class of tensor networks, used in topological quantum computing. Recently a third class of tensor networks, the JLW-model, was discovered by Jaffe, Liu and Wozniakowski; notably, the wires carry charge excitations. The rules by which network components can be moved, merged and manipulated in a graphical form of reasoning take an elegant form. For instance, the relative charge locations on wires carry precise meaning, and changing the ordering modifies a connected network specifically by a complex number. The type of isotopy discovered in the topological JLW-model provides an alternative means to reason about quantum information, computation and protocols. Here we recall the tensor-network building blocks used in a controlled-NOT gate. Some open problems related to the JLW-model are given.

Charikar’s Algorithm
To detect near-duplicates, this software uses Charikar’s fingerprinting technique, which characterizes each document with a unique 64-bit vector, like a fingerprint. To determine whether two documents are near-duplicates, we have to compare their fingerprints.
To do this we use two algorithms: the algorithm developed by Moses Charikar and the Hamming distance algorithm, which allows us to measure the similarity between two vectors of n bits. What is Charikar’s algorithm?
• Characterization of the document
• Apply hash functions to the characteristics
• Obtain fingerprint
• Apply vector comparison function: are (doc1, doc2) near-duplicates? Hamming-distance(fingerprint(doc1), fingerprint(doc2)) = k

Chernoff Faces
Chernoff faces, invented by Herman Chernoff, display multivariate data in the shape of a human face. The individual parts, such as eyes, ears, mouth and nose represent values of the variables by their shape, size, placement and orientation. The idea behind using faces is that humans easily recognize faces and notice small changes without difficulty. Chernoff faces handle each variable differently. Because the features of the faces vary in perceived importance, the way in which variables are mapped to the features should be carefully chosen (e.g. eye size and eyebrow-slant have been found to carry significant weight).

Chinese Restaurant Process
In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 is seated at an unoccupied table with probability 1. At time n + 1, a new customer chooses uniformly at random to sit at one of the following n + 1 places: directly to the left of one of the n customers already sitting at an occupied table, or at a new, unoccupied table. David J. Aldous attributes the restaurant analogy to Jim Pitman and Lester Dubins in his 1983 book. At time n, the value of the process is a partition of the set of n customers, where the tables are the blocks of the partition. Mathematicians are interested in the probability distribution of this random partition.
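The seating rule of the Chinese restaurant process can be simulated directly. The sketch below is an illustration under stated assumptions: the function name `chinese_restaurant_process` and the `seed` parameter are made up for the example, and only the resulting partition is tracked, not the circular order of customers around each table.

```python
import random


def chinese_restaurant_process(num_customers, seed=None):
    """Simulate the seating rule and return the tables as lists of customers.

    Customers are numbered from 1. Each table is one block of the partition.
    """
    rng = random.Random(seed)
    tables = []     # tables[k] lists the customers seated at table k
    table_of = []   # table_of[i] is the table index of customer i + 1
    for customer in range(1, num_customers + 1):
        n = customer - 1                # customers already seated
        # Pick uniformly among n + 1 places: next to one of the n seated
        # customers (indices 0 .. n - 1) or a new table (index n).
        choice = rng.randrange(n + 1)
        if choice == n:
            tables.append([customer])
            table_of.append(len(tables) - 1)
        else:
            # Sitting next to an existing customer means joining their table.
            k = table_of[choice]
            tables[k].append(customer)
            table_of.append(k)
    return tables


if __name__ == "__main__":
    for table in chinese_restaurant_process(10, seed=42):
        print(table)
```

Because each seated customer offers exactly one place, a table holding m customers is joined with probability m / (n + 1), and a new table is opened with probability 1 / (n + 1).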

Chi-Square Test
A chi-squared test, also referred to as a χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough. The chi-square (χ²) test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Does the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed due to sampling variation, or is it a real difference?

CHi-squared Automatic Interaction Detection(CHAID)
CHAID is a type of decision tree technique, based upon adjusted significance testing (Bonferroni testing). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to regression analysis, this version of CHAID being originally known as XAID) as well as classification, and for detection of interaction between variables. CHAID stands for CHi-squared Automatic Interaction Detection, based upon a formal extension of the US AID (Automatic Interaction Detection) and THAID (THeta Automatic Interaction Detection) procedures of the 1960s and 70s, which in turn were extensions of earlier research, including that performed in the UK in the 1950s. In practice, CHAID is often used in the context of direct marketing to select groups of consumers and predict how their responses to some variables affect other variables, although other early applications were in the field of medical and psychiatric research.
Like other decision trees, CHAID’s advantages are that its output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively, since with small sample sizes the respondent groups can quickly become too small for reliable analysis. One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric.

Choice Modeling
Choice modelling attempts to model the decision process of an individual or segment in a particular context. Choice modelling may be used to estimate non-market environmental benefits and costs. Many alternative models exist in econometrics, marketing, sociometrics and other fields, including utility maximization, optimization applied to consumer theory, and a plethora of other identification strategies which may be more or less accurate depending on the data, sample, hypothesis and the particular decision being modelled. In addition, choice modelling is regarded as the most suitable method for estimating consumers’ willingness to pay for quality improvements in multiple dimensions.
Neuroscience Suggests Choice Model Misspecification

Cholesky Decomposition
In linear algebra, the Cholesky decomposition or Cholesky factorization is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, useful for efficient numerical solutions and Monte Carlo simulations. It was discovered by André-Louis Cholesky for real matrices. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.

Chopthin Resampler
Resampling is a standard step in particle filters and more generally sequential Monte Carlo methods. We present an algorithm, called chopthin, for resampling weighted particles.
In contrast to standard resampling methods the algorithm does not produce a set of equally weighted particles; instead it merely enforces an upper bound on the ratio between the weights. A simulation study shows that the chopthin algorithm consistently outperforms standard resampling methods. The algorithm chops up particles with large weight and thins out particles with low weight, hence its name. It implicitly guarantees a lower bound on the effective sample size. The algorithm can be implemented very efficiently, making it practically useful. We show that the expected computational effort is linear in the number of particles. Implementations for C++, R (on CRAN) and for Matlab are available.
chopthin

Choquet Integral
A Choquet integral is a subadditive or superadditive integral created by the French mathematician Gustave Choquet in 1953. It was initially used in statistical mechanics and potential theory, but found its way into decision theory in the 1980s, where it is used as a way of measuring the expected utility of an uncertain event. It is applied specifically to membership functions and capacities. In imprecise probability theory, the Choquet integral is also used to calculate the lower expectation induced by a 2-monotone lower probability, or the upper expectation induced by a 2-alternating upper probability. Using the Choquet integral to denote the expected utility of belief functions measured with capacities is a way to reconcile the Ellsberg paradox and the Allais paradox.
http://…/Ayub_Khan_2009.pdf

Choropleth Map
A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. The choropleth map provides an easy way to visualize how a measurement varies across a geographic area or it shows the level of variability within a region.
A special type of choropleth map is a prism map, a three-dimensional map in which a given region’s height on the map is proportional to the statistical variable’s value for that region.

Chow-Liu Tree
In probability theory and statistics, a Chow-Liu tree is an efficient method for constructing a second-order product approximation of a joint probability distribution, first described in a paper by Chow & Liu (1968). The goals of such a decomposition, as with such Bayesian networks in general, may be either data compression or inference.
Structure Learning in Bayesian Networks

Christoffel Function

Chronohorogram

Circular Plot / Circos
Circos is a software package for visualizing data and information. It visualizes data in a circular layout – this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.

Circular Statistics
➘ “Directional Statistics”

Classical Test Theory(CTT)
Classical test theory is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological tests. Classical test theory may be regarded as roughly synonymous with true score theory. The term ‘classical’ refers not only to the chronology of these models but also contrasts with the more recent psychometric theories, generally referred to collectively as item response theory, which sometimes bear the appellation ‘modern’ as in ‘modern latent trait theory’. Classical test theory as we know it today was codified by Novick (1966) and described in classic texts such as Lord & Novick (1968) and Allen & Yen (1979/2002). The description of classical test theory below follows these seminal publications.

Classification Accuracy(CA)
In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value. The precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results. Although the two words precision and accuracy can be synonymous in colloquial use, they are deliberately contrasted in the context of the scientific method. A measurement system can be accurate but not precise, precise but not accurate, neither, or both. For example, if an experiment contains a systematic error, then increasing the sample size generally increases precision but does not improve accuracy. The result would be a consistent yet inaccurate string of results from the flawed experiment. Eliminating the systematic error improves accuracy but does not change precision. A measurement system is considered valid if it is both accurate and precise. Related terms include bias (non-random or directed effects caused by a factor or factors unrelated to the independent variable) and error (random variability). The terminology is also applied to indirect measurements – that is, values obtained by a computational procedure from observed data. In addition to accuracy and precision, measurements may also have a measurement resolution, which is the smallest change in the underlying physical quantity that produces a response in the measurement. In numerical analysis, accuracy is also the nearness of a calculation to the true value; while precision is the resolution of the representation, typically defined by the number of decimal or binary digits.
http://…/accuracy.htm

Classification Based on Associations(CBA)
Classification rule mining aims to discover a small set of rules in the database that forms an accurate classifier.
Association rule mining finds all the rules existing in the database that satisfy some minimum support and minimum confidence constraints. For association rule mining, the target of discovery is not pre-determined, while for classification rule mining there is one and only one predetermined target. In this paper, we propose to integrate these two mining techniques. The integration is done by focusing on mining a special subset of association rules, called class association rules (CARs). An efficient algorithm is also given for building a classifier based on the set of discovered CARs. Experimental results show that the classifier built this way is, in general, more accurate than that produced by the state-of-the-art classification system C4.5. In addition, this integration helps to solve a number of problems that exist in the current classification systems. rCBA Classification Rule Given a population whose members can be potentially separated into a number of different sets or classes, a classification rule is a procedure in which the elements of the population set are each assigned to one of the classes. A perfect test is such that every element in the population is assigned to the class it really belongs to. An imperfect test is such that some errors appear, and then statistical analysis must be applied to analyse the classification. Cleverhans cleverhans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models’ performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure.
Clickstream Analytics A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing or using another software application. As the user clicks anywhere in the webpage or application, the action is logged on a client or inside the web server, as well as possibly the web browser, router, proxy server or ad server. Clickstream analysis is useful for web activity analysis, software testing, market research, and for analyzing employee productivity. Click-Through Rate(CTR) Click-through rate (CTR) is a way of measuring the success of an online advertising campaign for a particular website as well as the effectiveness of an email campaign by the number of users that clicked on a specific link. Clipper Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment. In this paper, we introduce Clipper, the first general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the TensorFlow Serving system and demonstrate comparable prediction throughput and latency on a range of models while enabling new functionality, improved accuracy, and robustness. 
Cloud Data The Difference Between Big Data and Cloud Data: New technologies are required for the emergence and standardization of cloud data to take hold. Big data was meant as a holding cell for large amounts of data that could be sorted effectively only by specialized data scientists (this is becoming easier with OLAP on Hadoop type tools). The protocols for big data rely upon simple, standard protocols and can’t be adjusted easily to meet the demands of complex operations. Big data takes time to sort through and analyze, whereas cloud data is immediate and happens in the background using the tremendous resources of cloud servers. Cloud data requires a significantly higher number of resources since it must connect to databases in several geographically distributed services. Since cloud data must flexibly interact with several unique interfaces and security models, the mechanisms used for big data won’t work for cloud data. C-LSTM Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural network (CNN) and recurrent neural network (RNN) are two mainstream architectures for such modeling tasks, which adopt totally different ways of understanding natural languages. In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification. C-LSTM utilizes CNN to extract a sequence of higher-level phrase representations, which are then fed into a long short-term memory recurrent neural network (LSTM) to obtain the sentence representation. C-LSTM is able to capture both local features of phrases as well as global and temporal sentence semantics. We evaluate the proposed architecture on sentiment classification and question classification tasks. The experimental results show that the C-LSTM outperforms both CNN and LSTM and can achieve excellent performance on these tasks.
Cluster Validation There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; ‘relative cluster validation’ is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method run with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. There are various different characteristics of a clustering that can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. Clustered Latent Dirichlet Allocation(CLDA) The all-relevant problem of feature selection is the identification of all strongly and weakly relevant attributes. This problem is especially hard to solve for time series classification and regression in industrial applications such as predictive maintenance or production line optimization, for which each label or regression target is associated with several time series and meta-information simultaneously. Here, we are proposing an efficient, scalable feature extraction algorithm, which filters the available features in an early stage of the machine learning pipeline with respect to their significance for the classification or regression task, while controlling the expected percentage of selected but irrelevant features. The proposed algorithm combines established feature extraction methods with a feature importance filter. It has a low computational complexity, allows work to start on a problem with only limited domain knowledge available, can be trivially parallelized, is highly scalable and is based on well-studied non-parametric hypothesis tests.
We benchmark our proposed algorithm on all binary classification problems of the UCR time series classification archive as well as time series from a production line optimization project and simulated stochastic processes with underlying qualitative change of dynamics. Clustering / Cluster Analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties. 
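One of the cluster notions mentioned above, groups with small distances among the cluster members, can be sketched with a plain k-means loop. This is only an illustration: the choice of k-means, the helper names `sq_dist`, `kmeans` and `within_ss`, and the toy points are assumptions made for this example and do not come from the entry itself.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def within_ss(points, labels, centers):
    """Total within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in (1, 2, 3):
    labels, centers = kmeans(points, k)
    print(k, round(within_ss(points, labels, centers), 3))
```

Rerunning the loop with different values of `k`, or with a different distance function, is a small instance of the trial-and-failure process the entry describes: the parameters are changed until the grouping has the desired properties.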
Clustering Using REpresentatives(CURE) CURE (Clustering Using REpresentatives) is an efficient data clustering algorithm for large databases that is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size. Clustering Validation Indices The purpose of clustering is to determine the intrinsic grouping in a set of unlabeled data, where the objects in each group are indistinguishable under some criterion of similarity. Clustering is an unsupervised classification process fundamental to data mining (one of the most important tasks in data analysis). It has applications in several fields like bioinformatics, web data analysis, text mining and scientific data exploration. Clustering refers to unsupervised learning and, for that reason, it has no a priori data set information. However, to get good results, the clustering algorithm depends on input parameters. For instance, k-means and CURE algorithms require a number of clusters (k) to be created. In this sense, the question is: What is the optimal number of clusters? Currently, research on cluster validity indexes has drawn attention as a means to answer this question. Many different cluster validity methods have been proposed without any a priori class information. Clustering validation is a technique to find a set of clusters that best fits natural partitions (number of clusters) without any class information. Generally speaking, there are two types of clustering validation techniques, which are based on external criteria and internal criteria.
• External validation: Based on previous knowledge about data.
• Internal validation: Based on the information intrinsic to the data alone.
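The two kinds of validation can be sketched with one index of each type. The entry does not name particular indices, so the Rand index (external) and the mean silhouette width (internal) are chosen here purely as illustrations, and the function names and toy data are assumptions.

```python
from itertools import combinations
from math import dist

def rand_index(labels_true, labels_pred):
    """External index: share of point pairs on which the two labelings agree
    (both put the pair in the same cluster, or both separate it)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def mean_silhouette(points, labels):
    """Internal index: average silhouette width, computed from the data and the
    clustering alone. Assumes at least two clusters and no duplicate points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (hypothetical): two obvious groups and two candidate labelings.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
reference = [0, 0, 0, 1, 1, 1]   # prior knowledge, needed only by the external index
candidate_a = [1, 1, 1, 0, 0, 0]  # same partition with swapped label names
candidate_b = [0, 1, 0, 1, 0, 1]  # a partition that mixes the two groups

for name, cand in (("a", candidate_a), ("b", candidate_b)):
    print(name, rand_index(reference, cand), round(mean_silhouette(points, cand), 3))
```

Note that `rand_index` needs the reference labels while `mean_silhouette` does not, which is exactly the practical difference between the two families.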
If we consider these two types of cluster validation to determine the correct number of groups from a dataset, one option is to use external validation indexes for which a priori knowledge of dataset information is required, but it is hard to say if they can be used in real problems (usually, real problems do not have prior information about the dataset in question). Another option is to use internal validity indexes, which do not require a priori information from the dataset. Cluster-Wise Linear Regression(CLR) Cluster-wise linear regression (CLR), a clustering problem intertwined with regression, is to find clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have different variances. CN2 Induction Algorithm The CN2 induction algorithm is a learning algorithm for rule induction. It is designed to work even when the training data is imperfect. It is based on ideas from the AQ algorithm and the ID3 algorithm. As a consequence, it creates a rule set like that created by AQ but is able to handle noisy data like ID3. Cochran-Mantel-Haenszel Statistics In statistics, the Cochran-Mantel-Haenszel statistics are a collection of test statistics used in the analysis of stratified categorical data. They are named after William G. Cochran, Nathan Mantel and William Haenszel. One of these test statistics is the Cochran-Mantel-Haenszel (CMH) test, which allows the comparison of two groups on a dichotomous/categorical response. It is used when the effect of the explanatory variable on the response variable is influenced by covariates that can be controlled. It is often used in observational studies where random assignment of subjects to different treatments cannot be controlled, but influencing covariates can.
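A minimal sketch of the CMH chi-square statistic for stratified data, with one 2 × 2 table per stratum (per level of the controlled covariate). The function name `cmh_test`, the table layout `((a, b), (c, d))` and the example counts are assumptions made for illustration; the formula is the standard one-degree-of-freedom statistic with an optional continuity correction.

```python
from math import erfc, sqrt

def cmh_test(tables, continuity=True):
    """CMH chi-square over K strata.

    Each table is ((a, b), (c, d)): rows are the two groups, columns are the
    two response levels. Returns (statistic, p_value) with 1 degree of freedom.
    """
    observed = expected = variance = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        # Mean and variance of the top-left cell under independence,
        # conditional on the table margins (hypergeometric distribution).
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    diff = abs(observed - expected)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    statistic = diff ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: the same association appears in both strata.
strata = [((10, 5), (5, 10)), ((10, 5), (5, 10))]
print(cmh_test(strata))
print(cmh_test(strata, continuity=False))
```

Summing the observed counts, expectations and variances across strata before forming the ratio is what lets the test pool evidence from all tables while still conditioning on the covariate.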
In the CMH test, the data are arranged in a series of associated 2 × 2 contingency tables; the null hypothesis is that the observed response is independent of the treatment used in any 2 × 2 contingency table. The CMH test’s use of associated 2 × 2 contingency tables increases the ability of the test to detect associations (the power of the test is increased). sensitivity2x2xk Coded TeraSort We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose structured redundancy in data, in order to enable in-network coding opportunities that overcome the data shuffling bottleneck of TeraSort. We empirically evaluate the performance of the Coded TeraSort algorithm on Amazon EC2 clusters, and demonstrate that it achieves 1.97x – 3.39x speedup, compared with TeraSort, for typical settings of interest. CoDeepNEAT The success of deep learning depends on finding an architecture to fit the task. As deep learning has scaled up to more challenging tasks, the architectures have become difficult to design by hand. This paper proposes an automated method, CoDeepNEAT, for optimizing deep learning architectures through evolution. By extending existing neuroevolution methods to topology, components, and hyperparameters, this method achieves results comparable to the best human designs in standard benchmarks in object recognition and language modeling. It also supports building a real-world application of automated image captioning on a magazine website. Given the anticipated increases in available computing power, evolution of deep networks is a promising approach to constructing deep learning applications in the future.
Coefficient of Variation In probability theory and statistics, the coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution or frequency distribution. It is defined as the ratio of the standard deviation to the mean. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as relative standard deviation (RSD), which is expressed as a percentage. CoffeeScript CoffeeScript is a little language that compiles into JavaScript. Underneath that awkward Java-esque patina, JavaScript has always had a gorgeous heart. CoffeeScript is an attempt to expose the good parts of JavaScript in a simple way. The golden rule of CoffeeScript is: “It’s just JavaScript”. The code compiles one-to-one into the equivalent JS, and there is no interpretation at runtime. You can use any existing JavaScript library seamlessly from CoffeeScript (and vice-versa). The compiled output is readable and pretty-printed, will work in every JavaScript runtime, and tends to run as fast or faster than the equivalent handwritten JavaScript. Cognitive Analytics Cognitive Analytics: A hybrid of several disparate disciplines, methods, and practical technologies. Cognitive Architecture A cognitive architecture can refer to a theory about the structure of the human mind. One of the main goals of a cognitive architecture is to summarize the various results of cognitive psychology in a comprehensive computer model. However, the results need to be in a formalized form so far that they can be the basis of a computer program. The formalized models can be used to further refine a comprehensive theory of cognition, and more immediately, as a commercially usable model. Successful cognitive architectures include ACT-R (Adaptive Control of Thought, ACT), SOAR and OpenCog. Cognitive Bias Cognitive biases are tendencies to think in certain ways. 
Cognitive biases can lead to systematic deviations from a standard of rationality or good judgment, and are often studied in psychology and behavioral economics. Cognitive Computing Cognitive computing refers to the development of computer systems modeled after the human brain. Originally referred to as artificial intelligence, researchers began to use the modern term instead in the 1990s, to indicate that the science was designed to teach computers to think like a human mind, rather than developing an artificial system. This type of computing integrates technology and biology in an attempt to re-engineer the brain, one of the most efficient and effective computers on Earth. Cognitive computing is a way of processing data that is neither linear nor deterministic. It uses the ideas behind neuroscience and psychology to augment human reasoning with better pattern matching while determining the optimal information a person needs to make decisions. Cognitive computing is different than other forms of software. Instead of shepherding data through pre-determined pathways, it finds the previously unknown paths and patterns through the data. This is ultimately a more scalable model than relying on experts to synthesize data since there are too few experts of any sort available at any one time. Cognitive computing doesn’t try to fit data into an existing model; it looks at the data and figures out what the model is first. Cognitive Computing Cognitive Computing: Solving the Big Data Problem? Cohort Analysis Cohort analysis is a subset of behavioral analytics that takes the data from a given eCommerce platform, web application, or online game and rather than looking at all users as one unit, it breaks them into related groups for analysis. These related groups, or cohorts, usually share common characteristics or experiences within a defined timespan. 
Cohort analysis allows a company to ‘see patterns clearly across the lifecycle of a customer (or user), rather than slicing across all customers blindly without accounting for the natural cycle that a customer undergoes.’ By seeing these patterns of time, a company can adapt and tailor its service to those specific cohorts. While cohort analysis is sometimes associated with a cohort study, they are different and should not be viewed as one and the same. Cohort analysis has come to describe specifically the analysis of cohorts with regard to big data and business analytics, while a cohort study is a more general umbrella term that describes a type of study in which data is broken down into similar groups. Coincidence Analysis(CNA) CNA is a Boolean method of causal analysis presented in Baumgartner (2009a). CNA is a configurational comparative method for the identification of complex causal dependencies (in particular, causal chains and common cause structures) in configurational data. CNA is related to QCA (Ragin 2008) but, contrary to the latter, minimizes sufficient and necessary conditions not by means of Quine-McCluskey optimization but with its own custom-built optimization algorithm. The latter greatly facilitates the analysis of data featuring chainlike causal dependencies among the conditions of an ultimate outcome. http://…/infer_c.pdf http://…/baumgartner-thiem.pdf cna Cointegration The term cointegration was defined by Granger (1983) as a formulation of the phenomenon that nonstationary processes can have linear combinations that are stationary. It was his investigations of the relation between cointegration and error correction that brought modelling of vector autoregressions with unit roots and cointegration to the center of attention in applied and theoretical econometrics; see Engle and Granger (1987). Cointegration is a statistical property of time series variables.
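The defining phenomenon, nonstationary series with a stationary linear combination, can be illustrated with a small simulation. Everything here is an assumption made for the example: the function name `simulate_cointegrated_pair`, the coefficient `beta = 2.0`, the Gaussian noise and the seed. The coefficient is treated as known, whereas in practice it is unknown and has to be estimated.

```python
import random
from statistics import pvariance

def simulate_cointegrated_pair(n=2000, beta=2.0, seed=1):
    """Return (x, y) where x is a random walk and y = beta * x + stationary noise."""
    rng = random.Random(seed)
    x, y, level = [], [], 0.0
    for _ in range(n):
        level += rng.gauss(0, 1)  # random walk: a unit-root process
        x.append(level)
        y.append(beta * level + rng.gauss(0, 1))  # y shares x's stochastic trend
    return x, y

x, y = simulate_cointegrated_pair()

# The combination y - beta * x removes the shared trend and leaves only noise.
spread = [yi - 2.0 * xi for xi, yi in zip(x, y)]

print("variance of x:     ", round(pvariance(x), 2))
print("variance of y:     ", round(pvariance(y), 2))
print("variance of spread:", round(pvariance(spread), 2))
```

Both series wander without settling around a fixed level, while the spread keeps fluctuating around zero with roughly constant variance; that contrast is what the definition describes.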
Cointegration has become an important property in contemporary time series analysis. Time series often have trends – either deterministic or stochastic. In a seminal paper, Charles Nelson and Charles Plosser (1982) showed that most time series have stochastic trends – these are also called unit root processes, or processes integrated of order 1, written I(1). http://…/Cointegration coLaboratory Project coLaboratory Project, a new tool for data science and analysis, designed to make collaborating on data easier. coLaboratory merges successful open source products with Google technologies, enabling multiple people to collaborate directly through simultaneous access and analysis of data. This provides a big improvement over ad-hoc workflows involving emailing documents back and forth. Collaborative Deep Learning(CDL) Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix.
Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art. GitXiv Collaborative Deep Reinforcement Learning(CDRL) Besides independent learning, human learning process is highly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound understanding. The idea of knowledge transfer has led to many advances in machine learning and data mining, but significant challenges remain, especially when it comes to reinforcement learning, heterogeneous model structures, and different learning tasks. Motivated by human collaborative learning, in this paper we propose a collaborative deep reinforcement learning (CDRL) framework that performs adaptive knowledge transfer among heterogeneous learning agents. Specifically, the proposed CDRL conducts a novel deep knowledge distillation method to address the heterogeneity among different learning tasks with a deep alignment network. Furthermore, we present an efficient collaborative Asynchronous Advantage Actor-Critic (cA3C) algorithm to incorporate deep knowledge distillation into the online training of agents, and demonstrate the effectiveness of the CDRL framework using extensive empirical evaluation on OpenAI gym. Collaborative Filtering(CF) Collaborative filtering (CF) is a technique used by some recommender systems. Collaborative filtering has two senses, a narrow one and a more general one. In general, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. 
In the newer, narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). (also called “people-to-people correlation”) Collaborative Filtering – Neural Autoregressive Distribution Estimator(CF-NADE) This paper proposes CF-NADE, a neural autoregressive architecture for collaborative filtering (CF) tasks, which is inspired by the Restricted Boltzmann Machine (RBM) based CF model and the Neural Autoregressive Distribution Estimator (NADE). We first describe the basic CF-NADE model for CF tasks. Then we propose to improve the model by sharing parameters between different ratings. A factored version of CF-NADE is also proposed for better scalability. Furthermore, we take the ordinal nature of the preferences into consideration and propose an ordinal cost to optimize CF-NADE, which shows superior performance. Finally, CF-NADE can be extended to a deep model, with only moderately increased computational complexity. Experimental results show that CF-NADE with a single hidden layer beats all previous state-of-the-art methods on MovieLens 1M, MovieLens 10M, and Netflix datasets, and adding more hidden layers can further improve the performance. Collaborative Filtering with User-Item Co-Autoregressive Models(CF-UIcA) Besides the success on object recognition, machine translation and system control in games, (deep) neural networks have achieved state-of-the-art results in collaborative filtering (CF) recently. Previous neural approaches for CF are either user-based or item-based, which cannot leverage all relevant information explicitly. We propose CF-UIcA, a neural co-autoregressive model for CF tasks, which exploits the structural autoregressiveness in the domains of both users and items.
Furthermore, we separate the inherent dependence in this structure under a natural assumption and develop an efficient stochastic learning algorithm to handle large scale datasets. We evaluate CF-UIcA on two popular benchmarks: MovieLens 1M and Netflix, and achieve state-of-the-art predictive performance, which demonstrates the effectiveness of CF-UIcA. Collective Adaptive Resource-sharing Markovian Agents(CARMA) In this paper we present CARMA, a language recently defined to support specification and analysis of collective adaptive systems. CARMA is a stochastic process algebra equipped with linguistic constructs specifically developed for modelling and programming systems that can operate in open-ended and unpredictable environments. This class of systems is typically composed of a huge number of interacting agents that dynamically adjust and combine their behaviour to achieve specific goals. A CARMA model, termed a collective, consists of a set of components, each of which exhibits a set of attributes. To model dynamic aggregations, which are sometimes referred to as ensembles, CARMA provides communication primitives that are based on predicates over the exhibited attributes. These predicates are used to select the participants in a communication. Two communication mechanisms are provided in the CARMA language: multicast-based and unicast-based. Collective Intelligence(COIN) Collective Intelligence is shared or group intelligence that emerges from the collaboration, collective efforts, and competition of many individuals and appears in consensus decision making. The term appears in sociobiology, political science and in context of mass peer review and crowdsourcing applications. It may involve consensus, social capital and formalisms such as voting systems, social media and other means of quantifying mass activity. Collective IQ is a measure of collective intelligence, although it is often used interchangeably with the term collective intelligence. 
(‘Building new conclusions from independent contributors is really what collective intelligence is all about.’) Collocation In corpus linguistics, a collocation is a sequence of words or terms that co-occur more often than would be expected by chance. In phraseology, collocation is a sub-type of phraseme. An example of a phraseological collocation, as propounded by Michael Halliday, is the expression strong tea. While the same meaning could be conveyed by the roughly equivalent *powerful tea, this expression is considered incorrect by English speakers. Conversely, the corresponding expression for computer, powerful computers is preferred over *strong computers. Phraseological collocations should not be confused with idioms, where meaning is derived, whereas collocations are mostly compositional. There are about six main types of collocations: adjective+noun, noun+noun (such as collective nouns), verb+noun, adverb+adjective, verbs+prepositional phrase (phrasal verbs), and verb+adverb. Collocation extraction is a task that extracts collocations automatically from a corpus, using computational linguistics. Column-oriented DBMS A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data. In comparison, most relational DBMSs store data in rows. This column-oriented DBMS has advantages for data warehouses, customer relationship management (CRM) systems, and library card catalogs, and other ad hoc inquiry systems where aggregates are computed over large numbers of similar data items. It is possible to achieve some of the benefits of column-oriented and row-oriented organization with any DBMSs. Denoting one as column-oriented refers to both the ease of expression of a column-oriented structure and the focus on optimizations for column-oriented workloads. 
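The difference between the two layouts can be sketched with ordinary Python containers. This is only a toy model of the storage idea, not how any particular DBMS is implemented; the table contents and the helper name `to_columns` are made up for the example.

```python
# Row-oriented layout: one record per entry, all fields of a row stored together.
rows = [
    {"id": 1, "region": "north", "amount": 120.0},
    {"id": 2, "region": "south", "amount": 80.5},
    {"id": 3, "region": "north", "amount": 42.0},
    {"id": 4, "region": "east", "amount": 310.25},
]

def to_columns(records):
    """Pivot a list of row dicts into one list per column (column-oriented layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Aggregate over a single attribute.
# Row layout: every full record is visited just to read one field.
total_from_rows = sum(record["amount"] for record in rows)
# Column layout: only the values of the one relevant column are read.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
print(columns["region"])
```

The aggregate gives the same result either way; the column layout simply reads only the attribute being aggregated, which is why it suits workloads that compute aggregates over large numbers of similar data items.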
This approach is in contrast to row-oriented or row store databases and with correlation databases, which use a value-based storage structure. Combinations of Mutually Exclusive Alterations(CoMEt) Cancer is a heterogeneous disease with different combinations of genetic and epigenetic alterations driving the development of cancer in different individuals. While these alterations are believed to converge on genes in key cellular signaling and regulatory pathways, our knowledge of these pathways remains incomplete, making it difficult to identify driver alterations by their recurrence across genes or known pathways. We introduce Combinations of Mutually Exclusive Alterations (CoMEt), an algorithm to identify combinations of alterations de novo, without any prior biological knowledge (e.g. pathways or protein interactions). CoMEt searches for combinations of mutations that exhibit mutual exclusivity, a pattern expected for mutations in pathways. CoMEt has several important features that distinguish it from existing approaches for analyzing mutual exclusivity among alterations. These include: an exact statistical test for mutual exclusivity that is more sensitive in detecting combinations containing rare alterations; simultaneous identification of collections of one or more combinations of mutually exclusive alterations; simultaneous analysis of subtype-specific mutations; and summarization over an ensemble of collections of mutually exclusive alterations. These features enable CoMEt to robustly identify alterations affecting multiple pathways, or hallmarks of cancer. Combinatorial Optimization In applied mathematics and theoretical computer science, combinatorial optimization is a topic that consists of finding an optimal object from a finite set of objects. In many such problems, exhaustive search is not feasible.
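To see why exhaustive search stops being feasible, consider a brute-force solver for a small traveling salesman instance. The function name `brute_force_tsp` and the four example cities are assumptions for illustration; the sketch enumerates every tour that starts at the first city, so its running time grows with (n − 1)! for n cities.

```python
from itertools import permutations
from math import dist, factorial

def brute_force_tsp(cities):
    """Return (best_tour, best_length) by checking every closed tour from city 0."""
    start, *rest = range(len(cities))
    best_tour, best_length = None, float("inf")
    for order in permutations(rest):
        tour = (start, *order)
        # Length of the closed tour, including the leg back to the start.
        length = sum(
            dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
            for i in range(len(tour))
        )
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length

# Hypothetical instance: the four corners of a unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour, length = brute_force_tsp(cities)
print(tour, length)

# Number of tours the same loop would have to examine as n grows.
for n in (5, 10, 15, 20):
    print(n, factorial(n - 1))
```

The search space is finite and discrete, so the optimum is well defined, but the factorial growth of the candidate set is what pushes practical methods toward smarter enumeration or approximation.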
It operates on the domain of those optimization problems, in which the set of feasible solutions is discrete or can be reduced to discrete, and in which the goal is to find the best solution. Some common problems involving combinatorial optimization are the traveling salesman problem (“TSP”) and the minimum spanning tree problem (“MST”). Common Cause Principle(CCP) It seems that a correlation between events A and B indicates either that A causes B, or that B causes A, or that A and B have a common cause. It also seems that causes always occur before their effects and, thus, that common causes always occur before the correlated events. Reichenbach was the first to formalize this idea rather precisely. Community Detection Communities are often defined in terms of the partition of the set of vertices, that is each node is put into one and only one community. This is a useful simplification and most community detection methods find this type of community structure. However in some cases a better representation could be one where vertices are in more than one community. This might happen in a social network where each vertex represents a person, and the communities represent the different groups of friends: one community for family, another community for co-workers, one for friends in the same sports club, and so on. The use of cliques for community detection discussed below is just one example of how such overlapping community structure can be found. ➘ “Complex Network” Community detection algorithms: a comparative analysis A Comparison of Community Detection Algorithms on Artificial Networks Compact Trip Representation(CTR) We present a new Compact Trip Representation (CTR) that allows us to manage users’ trips (moving objects) over networks. These could be public transportation networks (buses, subway, trains, and so on) where nodes are stations or stops, or road networks where nodes are intersections. 
CTR represents the sequences of nodes and time instants in users’ trips. The spatial component is handled with a data structure based on the well-known Compressed Suffix Array (CSA), which provides both a compact representation and interesting indexing capabilities. We also represent the temporal component of the trips, that is, the time instants when users visit nodes in their trips. We create a sequence with these time instants, which are then self-indexed with a balanced Wavelet Matrix (WM). This gives us the ability to solve range-interval queries efficiently. We show how CTR can solve relevant spatial and spatio-temporal queries over large sets of trajectories. Finally, we also provide experimental results to show the space requirements and query efficiency of CTR. Competing Risks This form of analysis is known by its use of death certificates. In traditional overall survival analysis the cause of death is irrelevant to the analysis. In a competing risks survival analysis, each death certificate is reviewed. If the disease of interest is cancer, and the person/patient dies in a car accident, the patient is labelled as censored at death, instead of being labelled as having died. Issues with this method arise as each hospital and/or registry may code for causes of death differently. For example, there exists variability in the way a patient who has cancer and commits suicide is coded/labelled. In addition, a patient who has an eye removed due to an ocular cancer and dies after being hit while crossing the road because he didn’t see the car would often be considered censored rather than having died due to the cancer or its subsequent effects.
➘ “Survival Analysis” Competitive Analysis Competitive analysis is a method invented for analyzing online algorithms, in which the performance of an online algorithm (which must satisfy an unpredictable sequence of requests, completing each request without being able to see the future) is compared to the performance of an optimal offline algorithm that can view the sequence of requests in advance. An algorithm is competitive if its competitive ratio – the ratio between its performance and the offline algorithm’s performance – is bounded. Unlike traditional worst-case analysis, where the performance of an algorithm is measured only for ‘hard’ inputs, competitive analysis requires that an algorithm perform well both on hard and easy inputs, where ‘hard’ and ‘easy’ are defined by the performance of the optimal offline algorithm. For many algorithms, performance is dependent not only on the size of the inputs, but also on their values. One such example is the quicksort algorithm, which sorts an array of elements. Such data-dependent algorithms are analysed for average-case and worst-case data. Competitive analysis is a way of doing worst case analysis for on-line and randomized algorithms, which are typically data dependent. In competitive analysis, one imagines an ‘adversary’ that deliberately chooses difficult data, to maximize the ratio of the cost of the algorithm being studied and some optimal algorithm. Adversaries range in power from the oblivious adversary, which has no knowledge of the random choices made by the algorithm pitted against it, to the adaptive adversary that has full knowledge of how an algorithm works and its internal state at any point during its execution. Note that this distinction is only meaningful for randomized algorithms. For a deterministic algorithm, either adversary can simply compute what state that algorithm must have at any time in the future, and choose difficult data accordingly. 
For example, the quicksort algorithm chooses one element, called the ‘pivot’, that is, on average, not too far from the center value of the data being sorted. Quicksort then separates the data into two piles, one of which contains all elements with value less than the value of the pivot, and the other containing the rest of the elements. If quicksort chooses the pivot in some deterministic fashion (for instance, always choosing the first element in the list), then it is easy for an adversary to arrange the data beforehand so that quicksort will perform in worst-case time. If, however, quicksort chooses some random element to be the pivot, then an adversary without knowledge of what random numbers are coming up cannot arrange the data to guarantee worst-case execution time for quicksort. The classic on-line problem first analysed with competitive analysis (Sleator & Tarjan 1985) is the list update problem: Given a list of items and a sequence of requests for the various items, minimize the cost of accessing the list where the elements closer to the front of the list cost less to access. (Typically, the cost of accessing an item is equal to its position in the list.) After an access, the list may be rearranged. Most rearrangements have a cost. The Move-To-Front algorithm simply moves the requested item to the front after the access, at no cost. The Transpose algorithm swaps the accessed item with the item immediately before it, also at no cost. Classical methods of analysis showed that Transpose is optimal in certain contexts. In practice, Move-To-Front performed much better. Competitive analysis was used to show that an adversary can make Transpose perform arbitrarily badly compared to an optimal algorithm, whereas Move-To-Front can never be made to incur more than twice the cost of an optimal algorithm. In the case of online requests from a server, competitive algorithms are used to overcome uncertainties about the future. 
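The list update problem described above can be simulated in a few lines. The following is a minimal sketch, assuming the cost of an access equals the item's 1-based position and that rearrangement is free; the function names and the request sequence are illustrative choices, not taken from Sleator and Tarjan. The requests alternate between the two items at the back of the list, which is the kind of adversarial input that keeps Transpose paying the maximum cost on every access.

```python
def serve(initial, requests, rearrange):
    """Total access cost; accessing an item costs its 1-based position."""
    items = list(initial)
    total = 0
    for r in requests:
        i = items.index(r)
        total += i + 1
        rearrange(items, i)  # free rearrangement after the access
    return total


def move_to_front(items, i):
    """Move the accessed item to the head of the list."""
    items.insert(0, items.pop(i))


def transpose(items, i):
    """Swap the accessed item with the item immediately before it."""
    if i > 0:
        items[i - 1], items[i] = items[i], items[i - 1]


n = 5
initial = list(range(n))           # [0, 1, 2, 3, 4]
requests = [n - 1, n - 2] * 10     # 4, 3, 4, 3, ... (20 requests)

cost_transpose = serve(initial, requests, transpose)
cost_mtf = serve(initial, requests, move_to_front)
print(cost_transpose, cost_mtf)    # 100 46
```

Under Transpose the two requested items keep swapping places at the end of the list, so every access costs n. Move-To-Front pays n twice and then 2 per access, and the gap widens as the list grows.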
That is, the algorithm does not ‘know’ the future, while the imaginary adversary (the ‘competitor’) ‘knows’. Similarly, competitive algorithms were developed for distributed systems, where the algorithm has to react to a request arriving at one location, without ‘knowing’ what has just happened in a remote location. This setting was presented in (Awerbuch, Kutten & Peleg 1992). Competitive Intelligence(CI) Competitive intelligence is the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers making strategic decisions for an organization. Competitive intelligence essentially means understanding and learning what’s happening in the world outside one’s business so one can be as competitive as possible. It means learning as much as possible, as soon as possible, about one’s industry in general, one’s competitors, or even one’s county’s particular zoning rules. In short, it empowers one to anticipate and face challenges head on. A more focused definition of CI regards it as the organizational function responsible for the early identification of risks and opportunities in the market before they become obvious. Experts also call this process early signal analysis. This definition focuses attention on the difference between dissemination of widely available factual information (such as market statistics, financial reports, newspaper clippings) performed by functions such as libraries and information centers, and competitive intelligence, which is a perspective on developments and events aimed at yielding a competitive edge. Competitive Intelligence and 6 Tips for Its Effective Use Competitive Learning Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data.
A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data. Models and algorithms based on the principle of competitive learning include vector quantization and self-organising maps (Kohonen maps). https://…/handbookch7.html Complete Spatial Randomness(CSR) Complete spatial randomness (CSR) describes a point process whereby point events occur within a given study area in a completely random fashion. It is synonymous with a homogeneous spatial Poisson process. Such a process is modeled using only one parameter ρ, i.e. the density of points within the defined area. The term complete spatial randomness is commonly used in applied statistics in the context of examining certain point patterns, whereas in most other statistical contexts it is referred to as a spatial Poisson process. Completed Partially Directed Acyclic Graph(CPDAG) ➘ “Directed Acyclic Graph” Complete-Linkage Clustering Complete-linkage clustering is one of several methods of agglomerative hierarchical clustering. At the beginning of the process, each element is in a cluster of its own. The clusters are then sequentially combined into larger clusters until all elements end up being in the same cluster. At each step, the two clusters separated by the shortest distance are combined. The definition of ‘shortest distance’ is what differentiates between the different agglomerative clustering methods. In complete-linkage clustering, the link between two clusters contains all element pairs, and the distance between clusters equals the distance between those two elements (one in each cluster) that are farthest away from each other. The shortest of these links that remains at any step causes the fusion of the two clusters whose elements are involved. The method is also known as farthest neighbour clustering.
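The complete-linkage procedure just described can be sketched with the Python standard library alone. This is a naive illustration, assuming Euclidean distance and a brute-force search over cluster pairs; the function name and the sample points are hypothetical, and it is not meant as an efficient implementation.

```python
import math
from itertools import combinations


def complete_linkage(points):
    """Merge clusters until one remains; return (cluster_a, cluster_b, distance) per merge."""
    clusters = [[i] for i in range(len(points))]  # each element starts in its own cluster
    merges = []

    def dist(a, b):
        # Complete linkage: distance between the two farthest elements, one from each cluster.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Combine the pair of clusters separated by the shortest complete-linkage distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[x], clusters[y], dist(clusters[x], clusters[y])))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


points = [(0.0,), (1.0,), (5.0,), (7.0,)]
merges = complete_linkage(points)
for a, b, d in merges:
    print(a, b, d)
# [0] [1] 1.0
# [2] [3] 2.0
# [0, 1] [2, 3] 7.0
```

The final merge happens at distance 7.0, the gap between the two farthest points (0 and 7). Single-linkage clustering on the same data would report 4.0 for that step, the gap between the two closest points (1 and 5).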
The result of the clustering can be visualized as a dendrogram, which shows the sequence of cluster fusion and the distance at which each fusion took place. Complex Adaptive System(CAS) Complexity theory is a relatively new field that began in the mid-1980s at the Santa Fe Institute in New Mexico. Work at the Santa Fe Institute is usually presented as the study of Complex Adaptive Systems (CAS). The CAS movement is predominantly American, as opposed to the European “natural science” tradition in the area of cybernetics and systems. Like in cybernetics and systems theory, CAS shares the subject of general properties of complex systems across traditional disciplinary boundaries. However, CAS is distinguished by the extensive use of computer simulations as a research tool, and an emphasis on systems, such as markets or ecologies, which are less integrated or “organized” than the ones studied by the older tradition (e.g., organisms, machines and companies). Complex Event Processing(CEP) Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events), and deriving a conclusion from them. Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible. Complex Network In the context of network theory, a complex network is a graph (network) with non-trivial topological features – features that do not occur in simple networks such as lattices or random graphs but often occur in graphs modelling real systems. The study of complex networks is a young and active area of scientific research inspired largely by the empirical study of real-world networks such as computer networks and social networks. 
Complex Systems Complex systems present problems both in mathematical modelling and philosophical foundations. The study of complex systems represents a new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment. The equations from which models of complex systems are developed generally derive from statistical physics, information theory and non-linear dynamics and represent organized but unpredictable behaviors of natural systems that are considered fundamentally complex. The physical manifestations of such systems are difficult to define, so a common choice is to identify ‘the system’ with the mathematical information model rather than referring to the undefined physical subject the model represents. Such systems are used to model processes in computer science, biology, economics, physics, chemistry, and many other fields. It is also called complex systems theory, complexity science, study of complex systems, sciences of complexity, non-equilibrium physics, and historical physics. A variety of abstract theoretical complex systems is studied as a field of mathematics. The key problems of complex systems are difficulties with their formal modelling and simulation. From such a perspective, in different research contexts complex systems are defined on the basis of their different attributes. Since all complex systems have many interconnected components, the science of networks and network theory are important aspects of the study of complex systems. A consensus regarding a single universal definition of complex system does not yet exist. For systems that are less usefully represented with equations various other kinds of narratives and methods for identifying, exploring, designing and interacting with complex systems are used. 
Complex-Valued Neural Network(CVNN) The complex-valued neural network is an extension of a (usual) real-valued neural network, whose input and output signals and parameters such as weights and thresholds are all complex numbers (the activation function is inevitably a complex-valued function). Neural networks have been applied to various fields such as communication systems, image processing and speech recognition, in which complex numbers are often used through the Fourier transformation. This indicates that complex-valued neural networks are useful. In addition, in the human brain, an action potential may have different pulse patterns, and the distance between pulses may be different. This suggests that introducing complex numbers representing phase and amplitude into neural networks is appropriate. In recent years, complex-valued neural networks have expanded their application fields to image processing, computer vision, optoelectronic imaging, communication, and so on. The potentially wide applicability yields new aspects of theories required for novel or more effective functions and mechanisms. Component Lasso Method We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones. It then applies the lasso to each subproblem separately, obtaining a coefficient vector for each one. Finally, it uses non-negative least squares to recombine the different vectors into a single solution. This step is useful in selecting and reweighting components that are correlated with the response. Simulated and real data examples show that the component lasso can outperform standard regression methods such as the lasso and elastic net, achieving a lower mean squared error as well as better support recovery. The modular structure also lends itself naturally to parallel computation.
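Only the first step of the component lasso, splitting the predictors by the connected components of the sample covariance matrix, is sketched here. The sketch assumes that two predictors are linked whenever the absolute value of their covariance entry exceeds a hypothetical threshold `tol`; how the covariance is estimated or thresholded in the original method is not specified in the entry above, and the per-block lasso fits and the non-negative least squares recombination are omitted.

```python
def covariance_components(cov, tol=0.0):
    """Group predictor indices into connected components of the covariance graph.

    Predictors i and j share an edge when |cov[i][j]| > tol (an assumed rule).
    """
    p = len(cov)
    seen = [False] * p
    components = []
    for start in range(p):
        if seen[start]:
            continue
        # Depth-first search from an unvisited predictor.
        stack, comp = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(p):
                if not seen[j] and abs(cov[i][j]) > tol:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(comp))
    return components


# Toy block-diagonal covariance: predictors {0, 1} and {2, 3} are unrelated.
cov = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
blocks = covariance_components(cov)
print(blocks)  # [[0, 1], [2, 3]]
```

Each block could then be handed to a separate lasso fit, which is where the parallelism mentioned above would come from.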
Composite Gaussian Process Models(CGP) A new type of nonstationary Gaussian process model is developed for approximating computationally expensive functions. The new model is a composite of two Gaussian processes, where the first one captures the smooth global trend and the second one models local details. The new predictor also incorporates a flexible variance model, which makes it more capable of approximating surfaces with varying volatility. Compared to the commonly used stationary Gaussian process model, the new predictor is numerically more stable and can more accurately approximate complex surfaces when the experimental design is sparse. In addition, the new model can also improve the prediction intervals by quantifying the change of local variability associated with the response. Composite Indicator(COIN) A composite indicator is formed when individual indicators are compiled into a single index, on the basis of an underlying model of the multi-dimensional concept that is being measured. A composite indicator measures multi-dimensional concepts (e.g. competitiveness, e-trade or environmental quality) which cannot be captured by a single indicator. Ideally, a composite indicator should be based on a theoretical framework / definition, which allows individual indicators / variables to be selected, combined and weighted in a manner which reflects the dimensions or structure of the phenomena being measured. Composite Quantile Regression(CQR) Compositional Data In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying exclusively relative information. This definition, given by John Aitchison (1986) has several consequences: • A compositional data point, or composition for short, can be represented by a positive real vector with as many parts as considered. Sometimes, if the total amount is fixed and known, one component of the vector can be omitted.
• As compositions only carry relative information, the only information is given by the ratios between components. Consequently, a composition multiplied by any positive constant contains the same information as the former. Therefore, proportional positive vectors are equivalent when considered as compositions. • As usual in mathematics, equivalence classes are represented by some element of the class, called a representative. Thus, equivalent compositions can be represented by positive vectors whose components add to a given constant $\kappa$. The vector operation assigning the constant sum representative is called closure, $\mathcal{C}[x_1, x_2, \dots, x_D] = \left[\frac{\kappa x_1}{\sum_{i=1}^{D} x_i}, \frac{\kappa x_2}{\sum_{i=1}^{D} x_i}, \dots, \frac{\kappa x_D}{\sum_{i=1}^{D} x_i}\right]$, where $D$ is the number of parts (components) and $[\cdot]$ denotes a row vector. • Compositional data can be represented by constant sum real vectors with positive components, and these vectors span a simplex. Compositional Data Analysis(CoDa) Compositional data analysis deals with situations where the relevant information is contained only in the ratios between the measured variables, and not in the reported values. Compositional data analysis usually deals with relative information between parts where the total (abundances, mass, amount, etc.) is unknown or uninformative. A Concise Guide to Compositional Data Analysis Compositional,compositions Compositional Pattern Producing Network(CPPN) Compositional pattern-producing networks (CPPNs) are a variation of artificial neural networks (ANNs) that differ in their set of activation functions and how they are applied. While ANNs often contain only sigmoid functions and sometimes Gaussian functions, CPPNs can include both types of functions and many others. The choice of functions for the canonical set can be biased toward specific types of patterns and regularities. For example, periodic functions such as sine produce segmented patterns with repetitions, while symmetric functions such as Gaussian produce symmetric patterns. Linear functions can be employed to produce linear or fractal-like patterns.
Thus, the architect of a CPPN-based genetic art system can bias the types of patterns it generates by deciding the set of canonical functions to include. Comprehensive EVent Ontology(CEVO) While the general analysis of named entities has received substantial research attention, the analysis of relations over named entities has not. In fact, a review of the literature on unstructured as well as structured data revealed a deficiency in research on the abstract conceptualization required to organize relations. We believe that such an abstract conceptualization can benefit various communities and applications such as natural language processing, information extraction, machine learning and ontology engineering. In this paper, we present CEVO (i.e., a Comprehensive EVent Ontology) built on Levin’s conceptual hierarchy of English verbs, which categorizes verbs with shared meaning and syntactic behavior. We present the fundamental concepts and requirements for this ontology. Furthermore, we present three use cases for demonstrating the benefits of this ontology on annotation tasks: 1) annotating relations in plain text, 2) annotating ontological properties and 3) linking textual relations to ontological properties. Compressed Learning(CL) In this paper, we provide theoretical results to show that compressed learning, learning directly in the compressed domain, is possible. In particular, we provide tight bounds demonstrating that the linear kernel SVM’s classifier in the measurement domain, with high probability, has true accuracy close to the accuracy of the best linear threshold classifier in the data domain. We show that this is beneficial both from the compressed sensing and the machine learning points of view.
Furthermore, we indicate that for a family of well-known compressed sensing matrices, compressed learning is universal, in the sense that learning and classification in the measurement domain works provided that the data are sparse in some, even unknown, basis. Moreover, we show that our results are also applicable to a family of smooth manifold-learning tasks. Finally, we support our claims with experimental results. Compressed Learning: A Deep Neural Network Approach Compressed, Complementary, Computationally-Efficient Adaptive Gradient Online Learning(CompAdaGrad) The adaptive gradient online learning method known as AdaGrad has seen widespread use in the machine learning community in stochastic and adversarial online learning problems and more recently in deep learning methods. The method’s full-matrix incarnation offers much better theoretical guarantees and potentially better empirical performance than its diagonal version; however, this version is computationally prohibitive and so the simpler diagonal version often is used in practice. We introduce a new method, CompAdaGrad, that navigates the space between these two schemes and show that this method can yield results much better than diagonal AdaGrad while avoiding the (effectively intractable) $O(n^3)$ computational complexity of full-matrix AdaGrad for dimension $n$. CompAdaGrad essentially performs full-matrix regularization in a low-dimensional subspace while performing diagonal regularization in the complementary subspace. We derive CompAdaGrad’s updates for composite mirror descent in case of the squared $\ell_2$ norm and the $\ell_1$ norm, demonstrate that its complexity per iteration is linear in the dimension, and establish guarantees for the method independent of the choice of composite regularizer. Finally, we show preliminary results on several datasets. Compressive K-means(CKM) The Lloyd-Max algorithm is a classical approach to perform K-means clustering. 
Unfortunately, its cost becomes prohibitive as the training dataset grows large. We propose a compressive version of K-means (CKM), that estimates cluster centers from a sketch, i.e. from a drastically compressed representation of the training dataset. We demonstrate empirically that CKM performs similarly to Lloyd-Max, for a sketch size proportional to the number of centroids times the ambient dimension, and independent of the size of the original dataset. Given the sketch, the computational complexity of CKM is also independent of the size of the dataset. Unlike Lloyd-Max, which requires several replicates, we further demonstrate that CKM is almost insensitive to initialization. For a large dataset of 10^7 data points, we show that CKM can run two orders of magnitude faster than five replicates of Lloyd-Max, with similar clustering performance on artificial data. Finally, CKM achieves lower classification errors on handwritten digits classification. ➘ “Lloyd-Max” Compressive Sampling(CS) Compressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal, by finding solutions to underdetermined linear systems. This is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the Shannon-Nyquist sampling theorem. There are two conditions under which recovery is possible. The first one is sparsity, which requires the signal to be sparse in some domain. The second one is incoherence, which is applied through the restricted isometry property, which is sufficient for sparse signals. MRI is a prominent application.
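To make the idea of recovering a sparse signal from an underdetermined system concrete, here is a small sketch in plain Python. Several choices are assumptions made for illustration only: the sensing matrix is the deterministic construction [I | H/√m] (identity next to a scaled Sylvester Hadamard matrix), and recovery uses orthogonal matching pursuit (OMP), a greedy method swapped in here in place of the optimization-based recovery mentioned above. For m = 16 this matrix has mutual coherence 0.25, which is low enough for OMP to recover any signal with at most two nonzero entries.

```python
def hadamard(n):
    """Sylvester Hadamard matrix; n must be a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sensing_matrix(m):
    """m x 2m matrix [I | H / sqrt(m)] with unit-norm columns."""
    h, s = hadamard(m), m ** 0.5
    return [[1.0 if i == j else 0.0 for j in range(m)] +
            [h[i][j] / s for j in range(m)] for i in range(m)]


def solve(a, b):
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick k columns, refit by least squares."""
    m, n = len(A), len(A[0])

    def col(j):
        return [A[i][j] for i in range(m)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    support, coef, residual = [], [], y[:]
    for _ in range(k):
        # Select the unused column most correlated with the current residual.
        j = max((c for c in range(n) if c not in support),
                key=lambda c: abs(dot(col(c), residual)))
        support.append(j)
        cols = [col(s) for s in support]
        # Least squares on the selected columns via the normal equations.
        coef = solve([[dot(u, v) for v in cols] for u in cols],
                     [dot(u, y) for u in cols])
        residual = [y[i] - sum(c * cols[t][i] for t, c in enumerate(coef))
                    for i in range(m)]
    x = [0.0] * n
    for s, c in zip(support, coef):
        x[s] = c
    return x


A = sensing_matrix(16)                       # 16 measurements, 32 unknowns
x_true = [0.0] * 32
x_true[3], x_true[20] = 2.0, -1.5            # a 2-sparse signal
y = [sum(a * v for a, v in zip(row, x_true)) for row in A]
x_hat = omp(A, y, k=2)
print([(i, v) for i, v in enumerate(x_hat) if v != 0.0])
```

The system has twice as many unknowns as equations, yet the two nonzero entries are recovered at indices 3 and 20. In practice random sensing matrices and convex solvers are more common than this toy construction.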
A Mathematical Introduction to Compressive Sensing An Introduction To Compressive Sampling Compressive Sensing Computational Intelligence(CI) Computational intelligence (CI) is a set of nature-inspired computational methodologies and approaches to address complex real-world problems to which traditional approaches, i.e., first principles modeling or explicit statistical modeling, are ineffective or infeasible. Many such real-life problems are not considered to be well-posed problems mathematically, but nature provides many counterexamples of biological systems exhibiting the required function, practically. For instance, the human body has about 200 joints (degrees of freedom), but humans have little problem in executing a target movement of the hand, specified in just three Cartesian dimensions. Even if the torso were mechanically fixed, there is an excess of 7:3 parameters to be controlled for natural arm movement. Traditional models also often fail to handle uncertainty, noise and the presence of an ever-changing context. Computational Intelligence provides solutions for such and other complicated problems and inverse problems. It primarily includes artificial neural networks, evolutionary computation and fuzzy logic. In addition, CI also embraces biologically inspired algorithms such as swarm intelligence and artificial immune systems, which can be seen as a part of evolutionary computation, and includes broader fields such as image processing, data mining, and natural language processing. Furthermore other formalisms: Dempster–Shafer theory, chaos theory and many-valued logic are used in the construction of computational models. The characteristic of “intelligence” is usually attributed to humans. More recently, many products and items also claim to be “intelligent”. Intelligence is directly linked to the reasoning and decision making. 
Fuzzy logic was introduced in 1965 as a tool to formalise and represent the reasoning process, and fuzzy logic systems, which are based on fuzzy logic, possess many characteristics attributed to intelligence. Fuzzy logic deals effectively with uncertainty that is common for human reasoning, perception and inference and, contrary to some misconceptions, has a very formal and strict mathematical backbone (‘is quite deterministic in itself yet allowing uncertainties to be effectively represented and manipulated by it’, so to speak). Neural networks, introduced in the 1940s (and further developed in the 1980s), mimic the human brain and represent a computational mechanism based on a simplified mathematical model of the perceptrons (neurons) and signals that they process. Evolutionary computation, introduced in the 1970s and more popular since the 1990s, mimics population-based sexual evolution through reproduction of generations. It also mimics genetics in so-called genetic algorithms. Computational Linguistics Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective. Traditionally, computational linguistics was usually performed by computer scientists who had specialized in the application of computers to the processing of a natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, logicians, philosophers, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others.
Computational linguistics has theoretical and applied components, where theoretical computational linguistics takes up issues in theoretical linguistics and cognitive science, and applied computational linguistics focuses on the practical outcome of modeling human language use. Computational Network Toolkit(CNTK) CNTK (http://www.cntk.ai), the Computational Network Toolkit by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK makes it easy to realize and combine popular model types such as feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code. Computational Theory of Mind In philosophy, a computational theory of mind names a view that the human mind or the human brain (or both) is an information processing system and that thinking is a form of computing. The theory was proposed in its modern form by Hilary Putnam in 1961, and developed by the MIT philosopher and cognitive scientist (and Putnam’s PhD student) Jerry Fodor in the 1960s, 1970s and 1980s. Despite being vigorously disputed in analytic philosophy in the 1990s (due to work by Putnam himself, John Searle, and others), the view is common in modern cognitive psychology and is presumed by many theorists of evolutionary psychology; in the 2000s and 2010s the view has resurfaced in analytic philosophy (Scheutz 2003, Edelman 2008).
The computational theory of mind holds that the mind is a computation that arises from the brain acting as a computing machine. The theory can be elaborated in many ways, the most popular of which is that the brain is a computer and the mind is the result of the program that the brain runs. A program is the finite description of an algorithm or effective procedure, which prescribes a deterministic sequence of discrete actions that produces outputs based only on inputs and the internal states (memory) of the computing machine. For any admissible input, algorithms terminate in a finite number of steps. So the computational theory of mind is the claim that the mind is a computation of a machine (the brain) that derives output representations of the world from input representations and internal memory in a deterministic (non-random) way that is consistent with the theory of computation. Computational theories of mind are often said to require mental representation because ‘input’ into a computation comes in the form of symbols or representations of other objects. A computer cannot compute an actual object, but must interpret and represent the object in some form and then compute the representation. The computational theory of mind is related to the representational theory of mind in that they both require that mental states are representations. However the two theories differ in that the representational theory claims that all mental states are representations while the computational theory leaves open that certain mental states, such as pain or depression, may not be representational and therefore may not be suitable for a computational treatment. These non-representational mental states are known as qualia. In Fodor’s original views, the computational theory of mind is also related to the language of thought. The language of thought theory allows the mind to process more complex representations with the help of semantics. 
Computer Aided Diagnosis In radiology, computer-aided detection (CADe), also called computer-aided diagnosis (CADx), are procedures in medicine that assist doctors in the interpretation of medical images. Imaging techniques in X-ray, MRI, and Ultrasound diagnostics yield a great deal of information, which the radiologist has to analyze and evaluate comprehensively in a short time. CAD systems help scan digital images, e.g. from computed tomography, for typical appearances and to highlight conspicuous sections, such as possible diseases. Computer Assisted/Aided Qualitative Data Analysis Software(CAQDAS) Computer Assisted/Aided Qualitative Data Analysis Software (CAQDAS) offers tools that assist with qualitative research such as transcription analysis, coding and text interpretation, recursive abstraction, content analysis, discourse analysis, grounded theory methodology, etc. Computer Science Computer science is the scientific and practical approach to computation and its applications. It is the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures (or algorithms) that underlie the acquisition, representation, processing, storage, communication of, and access to information, whether such information is encoded as bits in a computer memory or transcribed in genes and protein structures in a biological cell. An alternate, more succinct definition of computer science is the study of automating algorithmic processes that scale. A computer scientist specializes in the theory of computation and the design of computational systems. Its subfields can be divided into a variety of theoretical and practical disciplines. Some fields, such as computational complexity theory (which explores the fundamental properties of computational and intractable problems), are highly abstract, while fields such as computer graphics emphasize real-world visual applications. 
Still other fields focus on the challenges in implementing computation. For example, programming language theory considers various approaches to the description of computation, while the study of computer programming itself investigates various aspects of the use of programming language and complex systems. Human-computer interaction considers the challenges in making computers and computations useful, usable, and universally accessible to humans. Computer Vision(CV) Computer vision is a field that includes methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions. A theme in the development of this field has been to duplicate the abilities of human vision by electronically perceiving and understanding an image. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. Computer vision has also been described as the enterprise of automating and integrating a wide range of processes and representations for vision perception. Computer vision is the automatic analysis of images and videos by computers in order to gain some understanding of the world. Computer vision is inspired by the capabilities of the human vision system and, when initially addressed in the 1960s and 1970s, it was thought to be a relatively straightforward problem to solve. However, the reason we think/thought that vision is easy is that we have our own visual system which makes the task seem intuitive to our conscious minds. In fact, the human visual system is very complex and even the estimates of how much of the brain is involved with visual processing vary from 25% up to more than 50%. Concept Mining Concept mining is an activity that results in the extraction of concepts from artifacts. 
Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents. Conceptual Clustering Conceptual clustering is a machine learning paradigm for unsupervised classification developed mainly during the 1980s. It is distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures; see Categorization for more information on hierarchy. Conceptual clustering is closely related to formal concept analysis, decision tree learning, and mixture model learning. http://…/eswc2008-PAM.pdf Concordance Correlation Coefficient In statistics, the concordance correlation coefficient measures the agreement between two variables, e.g., to evaluate reproducibility or for inter-rater reliability. Condition Monitoring(CM) Condition monitoring (or, colloquially, CM) is the process of monitoring a parameter of condition in machinery (vibration, temperature etc.), in order to identify a significant change which is indicative of a developing fault. It is a major component of predictive maintenance. The use of condition monitoring allows maintenance to be scheduled, or other actions to be taken to prevent failure and avoid its consequences. Condition monitoring has a unique benefit in that conditions that would shorten normal lifespan can be addressed before they develop into a major failure.
Condition monitoring techniques are normally used on rotating equipment and other machinery (pumps, electric motors, internal combustion engines, presses), while periodic inspection using non-destructive testing techniques and fit for service (FFS) evaluation are used for stationary plant equipment such as steam boilers, piping and heat exchangers. http://…/9781466584051 Conditional Autoregressive Model(CAR) The essential idea here is that the probability of the value estimated at any given location is conditional on the level of neighboring values. mclcar Conditional Extreme Value Models Extreme value theory (EVT) is often used to model environmental, financial and internet traffic data. Multivariate EVT assumes a multivariate domain of attraction condition for the distribution of a random vector, which requires that each component satisfy a marginal domain of attraction condition. Heffernan and Tawn [2004] and Heffernan and Resnick [2007] developed an approximation to the joint distribution of the random vector by conditioning on one of the components being in an extreme value domain. The usual method of analysis using multivariate extreme value theory is often not helpful, either because of asymptotic independence or because one component of the observation vector is not in a domain of attraction. These defects can be addressed by using the conditional extreme value model. Conditional Power(CP) Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the current data, or under the null hypothesis. In many clinical trials, a CP computation at a pre-specified point in the study, such as mid-way, is used as the basis for early termination for futility when there is little evidence of a beneficial effect.
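The three assumptions named in the Conditional Power entry (design effect, current trend, null hypothesis) can be compared with a short calculation. The sketch below is a minimal illustration, assuming a one-sided test and the common B-value formulation, CP = 1 − Φ((z_{1−α} − B(t) − θ(1 − t)) / √(1 − t)) with B(t) = Z(t)√t, where t is the information fraction and θ the assumed drift. The function names, parameter names and example numbers are hypothetical and do not come from any particular package.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def conditional_power(z_interim, info_frac, drift, alpha=0.025):
    """Conditional power of a one-sided test at level alpha.

    z_interim : interim z-statistic Z(t)
    info_frac : information fraction t, strictly between 0 and 1
    drift     : assumed drift theta for the remainder of the study
    """
    if not 0.0 < info_frac < 1.0:
        raise ValueError("info_frac must lie strictly between 0 and 1")
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    b_value = z_interim * info_frac ** 0.5        # B(t) = Z(t) * sqrt(t)
    remaining = 1.0 - info_frac
    expected_final = b_value + drift * remaining  # E[B(1) | B(t), theta]
    return 1.0 - _STD_NORMAL.cdf((z_crit - expected_final) / remaining ** 0.5)


def drift_null():
    """Drift under the null hypothesis: no effect."""
    return 0.0


def drift_current_trend(z_interim, info_frac):
    """Drift estimated from the data so far: B(t) / t = Z(t) / sqrt(t)."""
    return z_interim / info_frac ** 0.5


def drift_design(alpha=0.025, power=0.9):
    """Drift implied by the original design: z_{1-alpha} + z_{power}."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha) + _STD_NORMAL.inv_cdf(power)


if __name__ == "__main__":
    z, t = 1.0, 0.5  # hypothetical interim look at the halfway point
    for label, theta in [
        ("null", drift_null()),
        ("current trend", drift_current_trend(z, t)),
        ("design", drift_design()),
    ]:
        print(f"{label:>13}: CP = {conditional_power(z, t, theta):.3f}")
```

A futility rule of the kind mentioned above would compare one of these values against a pre-specified threshold; which assumption to use is a design decision that this sketch leaves open.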
Conditional Random Fields(CRF) Conditional random fields (CRFs) are a class of statistical modelling methods often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to ‘neighboring’ samples, a CRF can take context into account; e.g., the linear chain CRF popular in natural language processing predicts sequences of labels for sequences of input samples. CRFs are a type of discriminative undirected probabilistic graphical model. They are used to encode known relationships between observations and construct consistent interpretations. They are often used for labeling or parsing of sequential data, such as natural language text or biological sequences, and in computer vision. Specifically, CRFs find applications in shallow parsing, named entity recognition and gene finding, among other tasks, being an alternative to the related hidden Markov models (HMMs). In computer vision, CRFs are often used for object recognition and image segmentation. Conditional Random Fields as Recurrent Neural Networks(CRF-RNN) Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. To this end, we formulate Conditional Random Fields as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs.
Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual back-propagation algorithm, avoiding offline post-processing methods for object delineation. GitXiv Condition-Based Maintenance(CBM) Condition-based maintenance (CBM), briefly described, is maintenance performed when the need arises. This maintenance is performed after one or more indicators show that equipment is going to fail or that equipment performance is deteriorating. This concept is applicable to mission critical systems that incorporate active redundancy and fault reporting. It is also applicable to non-mission critical systems that lack redundancy and fault reporting. Condition-based maintenance was introduced to try to maintain the correct equipment at the right time. CBM is based on using real-time data to prioritize and optimize maintenance resources. Observing the state of the system is known as condition monitoring. Such a system will determine the equipment’s health, and act only when maintenance is actually necessary. Developments in recent years have allowed extensive instrumentation of equipment, and together with better tools for analyzing condition data, the maintenance personnel of today are more than ever able to decide what is the right time to perform maintenance on some piece of equipment. Ideally condition-based maintenance will allow the maintenance personnel to do only the right things, minimizing spare parts cost, system downtime and time spent on maintenance. http://…/3313ijmnct03.pdf CONESTA(CONESTA) High-dimensional prediction models are increasingly used to analyze biological data such as neuroimaging or genetic data sets. However, classical penalized algorithms yield dense solutions that are difficult to interpret without arbitrary thresholding. Alternatives based on sparsity-inducing penalties suffer from coefficient instability.
Complex structured sparsity-inducing penalties are a promising approach to force the solution to adhere to some domain-specific constraints, thus offering new perspectives in biomarker identification. We propose a generic optimization framework that can combine any smooth convex loss function with (i) penalties whose proximal operator is known and (ii) a large range of complex, non-smooth convex structured penalties such as total variation or overlapping group lasso. Although many papers have addressed a similar goal, few have tackled it in such a generic way and in the context of high-dimensional data. The proposed continuation algorithm, called CONESTA, dynamically smooths the complex penalties to avoid the computation of proximal operators that are either not known or expensive to compute. The decreasing sequence of smoothing parameters is dynamically adapted, using the duality gap, in order to maintain the optimal convergence speed towards any globally desired precision with duality gap guarantee. First, we demonstrate, on both simulated data and on experimental MRI data, that CONESTA outperforms the excessive gap method, ADMM, proximal gradient smoothing (without continuation) and inexact FISTA in terms of convergence speed and/or precision of the solution. Second, on the experimental MRI data set, we establish the superiority of structured sparsity-inducing penalties (ℓ1 and total variation) over non-structured methods in terms of the recovery of meaningful and stable groups of predictive variables. Confidence Confidence is defined as the probability of seeing the rule’s consequent under the condition that the transactions also contain the antecedent. Confidence is directed and gives different values for the rules X→Y and Y→X. Association rules have to satisfy a minimum confidence constraint, conf(X→Y)≥γ. Confidence is not downward closed and was developed together with support by Agrawal et al.
(the so-called support-confidence framework). Support is first used to find frequent (significant) itemsets, exploiting its downward closure property to prune the search space. Then confidence is used in a second step to produce rules from the frequent itemsets that exceed a minimum confidence threshold. A problem with confidence is that it is sensitive to the frequency of the consequent Y in the database. Because of the way confidence is calculated, consequents with higher support will automatically produce higher confidence values even if there exists no association between the items. Confidence Interval In statistics, a confidence interval (CI) is a type of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval (i.e. it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient. Confidence Weighting(CW) Confidence weighting (CW) is concerned with measuring two variables: (1) what a respondent believes is a correct answer to a question and (2) what degree of certainty the respondent has toward the correctness of this belief. Confidence weighting when applied to a specific answer selection for a particular test or exam question is referred to in the literature from cognitive psychology as item-specific confidence, a term typically used by researchers who investigate metamemory or metacognition, comprehension monitoring, or feeling-of-knowing. Item-specific confidence is defined as calibrating the relationship between an objective performance of accuracy (e.g., a test answer selection) with the subjective measure of confidence (e.g., a numeric value assigned to the selection).
Studies on self-confidence and metacognition during test taking have used item-specific confidence as a way to assess the accuracy and confidence underlying knowledge judgments. Researchers outside of the field of cognitive psychology have used confidence weighting as applied to item-specific judgments in assessing alternative conceptions of difficult concepts in high school biology and physics, developing and evaluating computerized adaptive testing, testing computerized assessments of learning and understanding, and developing and testing formative and summative classroom assessments. Confidence weighting is one of three components of the Risk Inclination Model. Confidence-Weighted Linear Classification We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training. Configurational Comparative Methods(CCM) Configurational comparative methods (CCMs) subsume techniques for the identification of complex causal dependencies in configurational data using the theoretical framework of Boolean algebra and its various extensions (Rihoux and Ragin, 2009).
For example, Qualitative Comparative Analysis (QCA; Ragin, 1987, 2000, 2008), hitherto the most prominent representative of CCMs, has been applied in areas as diverse as business administration (e.g., Chung, 2001), environmental science (van Vliet et al., 2013), evaluation (Cragun et al., 2014), political science (Thiem, 2011), public health (Longest and Thoits, 2012) and sociology (Crowley, 2013). Besides three stand-alone programs based on graphical user interfaces, three R packages for QCA are currently available, each with a different scope of functionality: QCA (Dușa and Thiem, 2014; Thiem and Dușa, 2013a,c), QCA3 (Huang, 2014) and SetMethods (Quaranta, 2013) (an add-on package to Schneider and Wagemann, 2012). Confirmatory Analysis 1) Inferential Statistics – Deductive Approach: • Heavy reliance on probability models • Must accept untestable assumptions • Look for definite answers to specific questions • Emphasis on numerical calculations • Hypotheses determined at outset • Hypothesis tests and formal confidence interval estimation. 2) Advantages: • Provide precise information in the right circumstances • Well-established theory and methods. 3) Disadvantages: • Misleading impression of precision in less than ideal circumstances • Analysis driven by preconceived ideas • Difficult to notice unexpected results. Confirmatory Factor Analysis(CFA) In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social research. It is used to test whether measures of a construct are consistent with a researcher’s understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research.
CFA was first developed by Jöreskog and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959). In confirmatory factor analysis, the researcher first develops a hypothesis about what factors s/he believes are underlying the measures s/he has used (e.g., “Depression” being the factor underlying the Beck Depression Inventory and the Hamilton Rating Scale for Depression) and may impose constraints on the model based on these a priori hypotheses. By imposing these constraints, the researcher is forcing the model to be consistent with his/her theory. For example, if it is posited that there are two factors accounting for the covariance in the measures, and that these factors are unrelated to one another, the researcher can create a model where the correlation between factor A and factor B is constrained to zero. Model fit measures could then be obtained to assess how well the proposed model captured the covariance between all the items or measures in the model. If the constraints the researcher has imposed on the model are inconsistent with the sample data, then the results of statistical tests of model fit will indicate a poor fit, and the model will be rejected. If the fit is poor, it may be due to some items measuring multiple factors. It might also be that some items within a factor are more related to each other than others. For some applications, the requirement of “zero loadings” (for indicators not supposed to load on a certain factor) has been regarded as too strict. A newly developed analysis method, “exploratory structural equation modeling”, specifies hypotheses about the relation between observed indicators and their supposed primary latent factors while allowing for estimation of loadings with other latent factors as well. 
relabeLoadings Conflict-Driven Clause Learning(CDCL) In computer science, Conflict-Driven Clause Learning (CDCL) is an algorithm for solving the Boolean satisfiability problem (SAT). Given a Boolean formula, the SAT problem asks for an assignment of variables so that the entire formula evaluates to true. The internal workings of CDCL SAT solvers were inspired by DPLL solvers. Conflict-free Asynchronous Machine Learning(CYCLADES) We present CYCLADES, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting. CYCLADES is asynchronous during shared model updates, and requires no memory locking mechanisms, similar to HOGWILD!-type algorithms. Unlike HOGWILD!, CYCLADES introduces no conflicts during the parallel execution, and offers a black-box analysis for provable speedups across a large family of algorithms. Due to its inherent conflict-free nature and cache locality, our multi-core implementation of CYCLADES consistently outperforms HOGWILD!-type algorithms on sufficiently sparse datasets, leading to up to 40% speedup gains compared to the HOGWILD! implementation of SGD, and up to 5x gains over asynchronous implementations of variance reduction algorithms. Conformal Prediction Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability e, together with a method that makes a prediction ŷ of a label y, it produces a set of labels, typically containing ŷ, that also contains y with probability 1-e. Conformal prediction can be applied to any method for producing ŷ: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an on-line setting in which labels are predicted successively, each one being revealed before the next is predicted.
The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1-e of the time, even though they are based on an accumulating data set rather than on independent data sets. In addition to the model under which successive examples are sampled independently, other on-line compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. Confounding http://…/confounding.html Confounding Variable In statistics, a confounding variable (also confounding factor, a confound, or confounder) is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable. A perceived relationship between an independent variable and a dependent variable that has been misestimated due to the failure to account for a confounding factor is termed a spurious relationship, and the presence of misestimation for this reason is termed omitted-variable bias. While specific definitions may vary, in essence a confounding variable fits the following four criteria, here given in a hypothetical situation with variable of interest ‘V’, confounding variable ‘C’ and outcome of interest ‘O’: 1. C is associated (inversely or directly) with O 2. C is associated with O, independent of V 3. C is associated (inversely or directly) with V 4. C is not in the causal pathway of V to O (C is not a direct consequence of V, not a way by which V produces O) The above correlation-based definition, however, is metaphorical at best – a growing number of analysts agree that confounding is a causal concept, and as such, cannot be described in terms of correlations nor associations. 
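The spurious-relationship pattern in the Confounding Variable entry can be seen on a toy data set. The sketch below is only an illustration: the eight rows are hand-made, not taken from any study, and are constructed so that V has no relation to O within either level of C, while C shifts both V and O upward. The helper names are hypothetical. The final caveat of the entry still applies: an association pattern like this is a symptom that can accompany confounding, not a definition of it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hand-made rows (C, V, O). Within each level of C, every V value is paired
# with every O value, so V and O are uncorrelated once C is held fixed.
ROWS = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
    (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5),
]


def correlation_v_o(rows):
    """Correlation between V and O over the given rows."""
    return pearson([v for _, v, _ in rows], [o for _, _, o in rows])


def correlation_within(rows, c_level):
    """Correlation between V and O restricted to one level of C."""
    return correlation_v_o([row for row in rows if row[0] == c_level])


if __name__ == "__main__":
    print("V vs O, ignoring C:", round(correlation_v_o(ROWS), 3))
    for level in (0, 1):
        print(f"V vs O, within C={level}:", round(correlation_within(ROWS, level), 3))
```

Ignoring C, the pooled correlation between V and O is strongly positive; within each level of C it is zero, which is the misestimation that the entry attributes to failing to account for a confounder.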
Confusion Matrix In the field of machine learning, a confusion matrix, also known as a contingency table or an error matrix , is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another). Congruence Class Model(CCM) CCMnet Conjugate Gradient Method(CG) In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems. The conjugate gradient method can also be used to solve unconstrained optimization problems such as energy minimization. It was developed by Magnus Hestenes and Eduard Stiefel. Conjugate Prior In Bayesian probability theory, if the posterior distributions p(theta|x) are in the same family as the prior probability distribution p(theta), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function. For example, the Gaussian family is conjugate to itself (or self-conjugate) with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian. 
This means that the Gaussian distribution is a conjugate prior for the likelihood which is also Gaussian. Connected Scatterplot The connected scatterplot visualizes two related time series in a scatterplot and connects the points with a line in temporal sequence. News media are increasingly using this technique to present data under the intuition that it is understandable and engaging. To explore these intuitions, we (1) describe how paired time series relationships appear in a connected scatterplot, (2) qualitatively evaluate how well people understand trends depicted in this format, (3) quantitatively measure the types and frequency of misinterpretations, and (4) empirically evaluate whether viewers will preferentially view graphs in this format over the more traditional format. The results suggest that low-complexity connected scatterplots can be understood with little explanation, and that viewers are biased towards inspecting connected scatterplots over the more traditional format. We also describe misinterpretations of connected scatterplots and propose further research into mitigating these mistakes for viewers unfamiliar with the technique. Connection Analytics Connection Analytics – an emerging discipline that provides answers to persistent business questions such as identification and influence of thought leaders, impact of external events or players on financial risk, or analysis of network performance based on causal relationships between nodes. It provides a new way of looking at people, products, physical phenomena, or events. Enterprises are using Big Data analytics to complement traditional SQL queries in answering very familiar questions, such as customer retention, marketing attribution, risk mitigation, and operational efficiency which, until now, required enormous compute power, time-consuming data management and the need for learning highly specialized programming and query languages. 
Connection Scan Algorithm(CSA) We introduce the Connection Scan Algorithm (CSA) to efficiently answer queries to timetable information systems. The input consists, in the simplest setting, of a source position and a desired target position. The output is a sequence of vehicles such as trains or buses that a traveler should take to get from the source to the target. We study several problem variations such as the earliest arrival and profile problems. We present algorithm variants that only optimize the arrival time or additionally optimize the number of transfers in the Pareto sense. An advantage of CSA is that it can easily adjust to changes in the timetable, allowing the easy incorporation of known vehicle delays. We additionally introduce the Minimum Expected Arrival Time (MEAT) problem to handle possible, uncertain, future vehicle delays. We present a solution to the MEAT problem that is based upon CSA. Finally, we extend CSA using the multilevel overlay paradigm to answer complex queries on nation-wide integrated timetables with trains and buses. Connectionist Temporal Classification(CTC) Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
Conover-Iman Test Constrained Optimization By RAdial Basis Function Interpolation(COBRA) Content Grouping Content Grouping lets you group content into a logical structure that reflects how you think about your site or app, and then view and compare aggregated metrics by group name in addition to being able to drill down to the individual URL, page title, or screen name. For example, you can see the aggregated number of pageviews for all pages in a group like Men/Shirts, and then drill in to see each URL or page title. You start by creating a Content Group, a collection of content. For example, on an ecommerce site that sells clothing, you might create groups for Men, Women, and Children. Then, within each group, you might create content like Shirts, Pants, Outerwear. This would let you compare aggregated statistics for each type of clothing within a group (e.g., Men’s Shirts vs Men’s Pants vs. Men’s Outerwear). It would also let you drill in to each group to see how individual Shirts pages compare to one another, for example, Men/Shirts/T-shirts/index.html vs Men/Shirts/DressShirts/index.html. Context-Aware Bandits(CAB) In this paper, we present CAB (Context-Aware Bandits). With CAB we attempt to craft a bandit algorithm that can exploit collaborative effects and that can be deployed in a practical recommendation system setting, where multi-armed bandits have been shown to perform well, in particular with respect to the cold start problem. CAB exploits a context-aware clustering technique that augments exploration-exploitation strategies in a contextual multi-armed bandit setting. CAB dynamically clusters the users based on the content universe under consideration. We demonstrate the efficacy of our approach on extensive real-world datasets, showing the scalability and, more importantly, the significantly increased prediction performance compared to related state-of-the-art methods.
Context Awareness Context awareness is a property of mobile devices that is defined complementarily to location awareness. Whereas location may determine how certain processes in a device operate, context may be applied more flexibly with mobile users, especially with users of smart phones. Context awareness originated as a term from ubiquitous computing, or so-called pervasive computing, which sought to deal with linking changes in the environment with computer systems, which are otherwise static. The term has also been applied to business theory in relation to contextual application design and business process management issues. Context-aware Sentiment Word Identification(sentiword2vec) Traditional sentiment analysis often uses a sentiment dictionary to extract sentiment information from text and classify documents. However, emerging informal words and phrases in user-generated content call for context-aware analysis. Usually, they have special meanings in a particular context. Because of their great performance in representing inter-word relations, we use sentiment word vectors to identify these special words. Based on the distributed language model word2vec, in this paper we present a novel method for representing the sentiment of a word in a particular context, specifically to identify words with abnormal sentiment polarity in long answers. Results show that the improved model performs better at representing words with special meanings while still representing special idiomatic patterns well. Finally, we discuss the meaning of vector representations in the field of sentiment, which may be different from general object-based conditions. Contextual / Common Query Language(CQL) Contextual Query Language (CQL), previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.
Based on the semantics of Z39.50, its design objective is that queries be human readable and writable, and that the language be intuitive while maintaining the expressiveness of more complex query languages. Contextual Bandit The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised – in particular a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “k-Armed Bandits”, often with various relevant embellishments. The k-Armed Bandit setting works on a round-by-round basis. On each round: 1. A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). 2. The world reveals the reward r_a of the chosen arm (i.e. whether the ad is clicked on). http://…/Multi-armed_bandit#Contextual_Bandit Contextual Multi-Armed Bandits Multi-Armed Bandits with side information. Continuous Bag-of-Words(CBOW) The ‘continuous bag-of-words model’ (CBOW) adds inputs from words within a short window to predict the current word. http://…/1301.3781.pdf Continuous Computation Language(CCL) For Sybase Complex Event Processing (CEP), developers create CEP applications using the Continuous Computation Language (CCL). Introduced in 2005, CCL was the first commercial, declarative SQL-based CEP language and remains the most extensive SQL-based CEP language on the market. Because the Continuous Computation Language (CCL) is a SQL-based language, it gives programmers a huge head start in creating CEP applications. The Sybase CEP Studio helps manage all aspects of the application development process, further increasing programmer productivity.
Continuous Skip-gram(Skip-gram) The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document. More formally, given a sequence of training words w_1, w_2, w_3, …, w_T, the objective of the Skip-gram model is to maximize the average log probability $\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$, where c is the size of the training context (which can be a function of the center word w_t). Larger c results in more training examples and thus can lead to a higher accuracy, at the expense of the training time. http://…/1301.3781.pdf Continuous Time Stochastic Modelling(CTSM) In probability theory and statistics, a continuous-time stochastic process, or a continuous-space-time stochastic process is a stochastic process for which the index variable takes a continuous set of values, as contrasted with a discrete-time process for which the index variable takes only distinct values. An alternative terminology uses continuous parameter as being more inclusive. A more restricted class of processes are the continuous stochastic processes: here the term often (but not always) implies both that the index variable is continuous and that sample paths of the process are continuous. Given the possible confusion, caution is needed. Continuous-time stochastic processes that are constructed from discrete-time processes via a waiting time distribution are called continuous-time random walks. ctsmr Contrast In statistics, particularly analysis of variance and linear regression, an orthogonal contrast is a linear combination of two or more factor level means (averages) whose coefficients add up to zero. Non-orthogonal contrasts do not necessarily sum to 0. Contrasts should be constructed “to answer specific research questions”, and do not necessarily have to be orthogonal. Contrast Analysis ➚ “Contrast” Contrastive Divergence(CD) Contrastive Divergence (CD), an approximate Maximum-Likelihood (ML) learning algorithm proposed by Geoffrey Hinton.
Contrastive Divergence is basically a funky term for “approximate gradient descent”. CONvergence of iterated CORrelations(CONCOR) Given an adjacency matrix, or a set of adjacency matrices for different relations, a correlation matrix can be formed by the following procedure. Form a profile vector for a vertex i by concatenating the ith row in every adjacency matrix; the (i,j)th element of the correlation matrix is the Pearson correlation coefficient of the profile vectors of i and j. This (square, symmetric) matrix is called the first correlation matrix. The procedure can be performed iteratively on the correlation matrix until convergence. Each entry is now 1 or -1. This matrix is used to split the data into two blocks such that members of the same block are positively correlated and members of different blocks are negatively correlated. CONCOR uses the above technique to split the initial data into two blocks. Successive splits are then applied to the separate blocks. At each iteration all blocks are submitted for analysis; however, blocks containing two vertices are not split. Consequently n-partitions of the binary tree can produce up to 2^n blocks. Note that any similarity matrix can be used as input. http://…/concor-in-r Convergence of Random Variables In probability theory, there exist several different notions of convergence of random variables. The convergence of sequences of random variables to some limit random variable is an important concept in probability theory, and its applications to statistics and stochastic processes. The same concepts are known in more general mathematics as stochastic convergence and they formalize the idea that a sequence of essentially random or unpredictable events can sometimes be expected to settle down into a behaviour that is essentially unchanging when items far enough into the sequence are studied.
The different possible notions of convergence relate to how such a behaviour can be characterised: two readily understood behaviours are that the sequence eventually takes a constant value, and that values in the sequence continue to change but can be described by an unchanging probability distribution. http://…ty_theory#Convergence_of_random_variables http://…-of-convergence-in-probability-theory.jpg Convergent Cross Mapping(CCM) Convergent cross mapping (CCM) is a statistical test for a cause-and-effect relationship between two time series variables that, like the Granger causality test, seeks to resolve the problem that correlation does not imply causation. While Granger causality is best suited for purely stochastic systems where the influence of the causal variables are separable (independent of each other), CCM is based on the theory of Dynamical systems and can be applied to systems where causal variables have synergistic effects. The test was developed in 2012 by the lab of George Sugihara of the Scripps Institution of Oceanography, La Jolla, California, USA. Convex Banding of the Covariance Matrix We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. 
Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings. Convex Function In mathematics, a real-valued function f(x) defined on an interval is called convex (or convex downward or concave upward) if the line segment between any two points on the graph of the function lies above the graph, in a Euclidean space (or more generally a vector space) of at least two dimensions. Equivalently, a function is convex if its epigraph (the set of points on or above the graph of the function) is a convex set. Well-known examples of convex functions are the quadratic function f(x)=x^2 and the exponential function f(x)=e^x for any real number x. Convex functions play an important role in many areas of mathematics. They are especially important in the study of optimization problems where they are distinguished by a number of convenient properties. For instance, a (strictly) convex function on an open set has no more than one minimum. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and, as a result, they are the most well-understood functionals in the calculus of variations. In probability theory, a convex function applied to the expected value of a random variable is always less than or equal to the expected value of the convex function of the random variable. This result, known as Jensen’s inequality, underlies many important inequalities (including, for instance, the arithmetic–geometric mean inequality and Hölder’s inequality). 
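As a small numeric check of Jensen’s inequality as stated in the Convex Function entry, the sketch below compares f(E[X]) with E[f(X)] on a sample, treating the sample mean as the expected value. The sample size, the seed, and the helper name mean are arbitrary choices for this illustration.

```python
import math
import random


def mean(values):
    """Arithmetic mean, used here as the empirical expected value."""
    return sum(values) / len(values)


random.seed(0)  # arbitrary seed so the run is repeatable
xs = [random.uniform(-2.0, 2.0) for _ in range(1000)]

# Convex function f(x) = e^x
lhs_exp = math.exp(mean(xs))                 # f(E[X])
rhs_exp = mean([math.exp(x) for x in xs])    # E[f(X)]

# Convex function f(x) = x^2
lhs_sq = mean(xs) ** 2                       # f(E[X])
rhs_sq = mean([x * x for x in xs])           # E[f(X)]

jensen_gap_exp = rhs_exp - lhs_exp
jensen_gap_sq = rhs_sq - lhs_sq
print(jensen_gap_exp >= 0, jensen_gap_sq >= 0)
```

For f(x) = x^2 the gap E[X^2] - (E[X])^2 is the variance of the sample, which gives one familiar reading of the inequality.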
Exponential growth is a special case of convexity. Exponential growth narrowly means “increasing at a rate proportional to the current value”, while convex growth generally means “increasing at an increasing rate (but not necessarily proportionally to current value)”. Convex Hierarchical Testing(CHT) We consider the testing of all pairwise interactions in a two-class problem with many features. We devise a hierarchical testing framework that considers an interaction only when one or more of its constituent features has a nonzero main effect. The test is based on a convex optimization framework that seamlessly considers main effects and interactions together. Convex Optimization Convex minimization, a subfield of optimization, studies the problem of minimizing convex functions over convex sets. The convexity property can make optimization in some sense “easier” than the general case – for example, any local minimum must be a global minimum. Convexified Convolutional Neural Networks(CCNN) We describe the class of convexified convolutional neural networks (CCNNs), which capture the parameter sharing of convolutional neural networks in a convex manner. By representing the nonlinear convolutional filters as vectors in a reproducing kernel Hilbert space, the CNN parameters can be represented as a low-rank matrix, which can be relaxed to obtain a convex optimization problem. For learning two-layer convolutional neural networks, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN. For learning deeper networks, we train CCNNs in a layer-wise manner. Empirically, CCNNs achieve performance competitive with CNNs trained by backpropagation, SVMs, fully-connected neural networks, stacked denoising auto-encoders, and other baseline methods. ConvNetJS ConvNetJS is a Javascript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you’re training. 
No software requirements, no compilers, no installations, no GPUs, no sweat. Convolution In mathematics and, in particular, functional analysis, convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions, giving the area overlap between the two functions as a function of the amount that one of the original functions is translated. Convolution is similar to cross-correlation. It has applications that include probability, statistics, computer vision, image and signal processing, electrical engineering, and differential equations. Convolutional Neural Network In computer science, a convolutional neural network is a type of feed-forward artificial neural network where the individual neurons are tiled in such a way that they respond to overlapping regions in the visual field. Convolutional networks were inspired by biological processes and are variations of multilayer perceptrons which are designed to use minimal amounts of preprocessing. They are widely used models for image recognition. http://…oduction_to_Convolutional_Neural_Networks Conway-Maxwell Poisson(CMP) Count data are a popular outcome in many empirical studies, especially as big data has become available on human and social behavior. The Conway-Maxwell Poisson (CMP) distribution is popularly used for modeling count data due to its ability to handle both overdispersed and underdispersed data. Yet, current methods for estimating CMP regression models are not efficient, especially with high-dimensional data. Extant methods use either nonlinear optimization or MCMC methods. We propose a flexible estimation framework for CMP regression based on iteratively reweighted least squares (IRLS). Because CMP belongs to the exponential family, convergence is guaranteed and estimation is more efficient. We also extend this framework to allow estimation for additive models with smoothing splines.
We illustrate the usefulness of this approach through simulation study and application to real data on speed dating. Cook’s Distance In statistics, Cook’s distance or Cook’s D is a commonly used estimate of the influence of a data point when performing least squares regression analysis. In a practical ordinary least squares analysis, Cook’s distance can be used in several ways: to indicate data points that are particularly worth checking for validity; to indicate regions of the design space where it would be good to be able to obtain more data points. It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977. Cooperative Game Theory In game theory, a cooperative game is a game where groups of players (‘coalitions’) may enforce cooperative behaviour, hence the game is a competition between coalitions of players, rather than between individual players. An example is a coordination game, when players choose the strategies by a consensus decision-making process. Recreational games are rarely cooperative, because they usually lack mechanisms by which coalitions may enforce coordinated behaviour on the members of the coalition. Such mechanisms, however, are abundant in real life situations (e.g. contract law). Cooperative theory starts with a formalization of games that abstracts away altogether from procedures and … concentrates, instead, on the possibilities for agreement. … There are several reasons that explain why cooperative games came to be treated separately. One is that when one does build negotiation and enforcement procedures explicitly into the model, then the results of a non-cooperative analysis depend very strongly on the precise form of the procedures, on the order of making offers and counter-offers and so on. This may be appropriate in voting situations in which precise rules of parliamentary order prevail, where a good strategist can indeed carry the day. 
But problems of negotiation are usually more amorphous; it is difficult to pin down just what the procedures are. More fundamentally, there is a feeling that procedures are not really all that relevant; that it is the possibilities for coalition forming, promising and threatening that are decisive, rather than whose turn it is to speak. … Detail distracts attention from essentials. Some things are seen better from a distance; the Roman camps around Metzada are indiscernible when one is in them, but easily visible from the top of the mountain. Cooperative Inverse Reinforcement Learning(CIRL) For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human’s reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm. Coordinate Descent(CD) Coordinate descent is a non-derivative optimization algorithm. To find a local minimum of a function, one does line search along one coordinate direction at the current point in each iteration. One uses different coordinate directions cyclically throughout the procedure.
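The procedure just described can be sketched as follows. This is a minimal illustration, not a reference implementation: the line search is a golden-section search over a fixed window around the current coordinate value, and the names golden_section and coordinate_descent, the window radius, and the test objective are all assumptions made for the example.

```python
import math


def golden_section(phi, lo, hi, tol=1e-8):
    """Derivative-free 1-D minimization of a unimodal phi on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if phi(c) < phi(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


def coordinate_descent(f, x0, radius=10.0, sweeps=50):
    """Cycle through the coordinates, doing a line search along each one."""
    x = list(x0)
    for _ in range(sweeps):
        for i in range(len(x)):
            def along_axis(t, i=i):
                # f restricted to coordinate i, all other coordinates fixed
                return f(x[:i] + [t] + x[i + 1:])
            x[i] = golden_section(along_axis, x[i] - radius, x[i] + radius)
    return x


def objective(p):
    """Hypothetical convex test function with a mild coupling term."""
    x, y = p
    return (x - 1) ** 2 + (y + 2) ** 2 + 0.5 * x * y


minimum = coordinate_descent(objective, [0.0, 0.0])
print(minimum)
```

Only function values are used, so no gradient is needed. The coupling term 0.5*x*y makes the objective non-separable, which is why more than one sweep over the coordinates is required.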
On non-separable functions the algorithm may fail to find the optimum in a reasonable number of function evaluations. To improve the convergence an appropriate coordinate system can be gradually learned, such that new search coordinates obtained using PCA are as decorrelated as possible with respect to the objective function Coordinate Descent Algorithms(CDA) This monograph presents a class of algorithms called coordinate descent algorithms for mathematicians, statisticians, and engineers outside the field of optimization. This particular class of algorithms has recently gained popularity due to their effectiveness in solving large-scale optimization problems in machine learning, compressed sensing, image processing, and computational statistics. Coordinate descent algorithms solve optimization problems by successively minimizing along each coordinate or coordinate hyperplane, which is ideal for parallelized and distributed computing. Avoiding detailed technicalities and proofs, this monograph gives relevant theory and examples for practitioners to effectively apply coordinate descent to modern problems in data science and engineering. To keep the primer up-to-date, we intend to publish this monograph only after no additional topics need to be added and we foresee no further major advances in the area. copCAR Regression Model(copCAR) Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. 
The new model is called copCAR because it is copula-based and employs the areal GLMM’s conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online. copCAR Copula In probability theory and statistics, a copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform. Copulas are used to describe the dependence between random variables. They are named for their resemblance to grammatical copulas in linguistics. Copula Statistic(CoS) A new index based on empirical copulas, termed the Copula Statistic (CoS), is introduced for assessing the strength of multivariate dependence and for testing statistical independence. New properties of the copulas are proved. They allow us to define the CoS in terms of a relative distance function between the empirical copula, the Fréchet-Hoeffding bounds and the independence copula. Monte Carlo simulations reveal that for large sample sizes, the CoS is approximately normal. This property is utilised to develop a CoS-based statistical test of independence against various noisy functional dependencies. It is shown that this test exhibits higher statistical power than the Total Information Coefficient (TICe), the Distance Correlation (dCor), the Randomized Dependence Coefficient (RDC), and the Copula Correlation (Ccor) for monotonic and circular functional dependencies.
Furthermore, the R2-equitability of the CoS is investigated for estimating the strength of a collection of functional dependencies with additive Gaussian noise. Finally, the CoS is applied to a real stock market data set from which we infer that a bivariate analysis is insufficient to unveil multivariate dependencies and to two gene expression data sets of the Yeast and of the E. Coli, which allow us to demonstrate the good performance of the CoS. Corpora Agnostic Word Vectorization Method(WordNet2Vec) A complex nature of big data resources demands new methods for structuring especially for textual content. WordNet is a good knowledge source for comprehensive abstraction of natural language as its good implementations exist for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism WordNet2Vec is proposed in the paper. It creates vectors for each word from WordNet. These vectors encapsulate general position – role of a given word towards all other words in the natural language. Any list or set of such vectors contains knowledge about the context of its component within the whole language. Such word representation can be easily applied to many analytic tasks like classification or clustering. Corpus Linguistics Corpus linguistics is the study of language as expressed in samples (corpora) of “real world” text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely derived by an automated process. Corpus linguistics adherents believe that reliable language analysis best occurs on field-collected samples, in natural contexts and with minimal experimental interference. 
Within corpus linguistics there are divergent views as to the value of corpus annotation, from John Sinclair advocating minimal annotation and allowing texts to ‘speak for themselves’, to others, such as the Survey of English Usage team (based in University College, London) advocating annotation as a path to greater linguistic understanding and rigour. Correlated Topic Model(CTM) Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than x-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution. We derive a mean-field variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. The CTM gives a better fit than LDA on a collection of OCRed articles from the journal Science. Furthermore, the CTM provides a natural way of visualizing and exploring this and other unstructured data sets. CORrelation ALignment(CORAL) In this chapter, we present CORrelation ALignment (CORAL), a simple yet effective method for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lower-dimensional subspaces. 
It is also much simpler than other distribution matching methods. CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. We first describe a solution that applies a linear transformation to source features to align them with target features before classifier training. For linear classifiers, we propose to equivalently apply CORAL to the classifier weights, leading to added efficiency when the number of classifiers is small but the number and dimensionality of target examples are very high. The resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a large margin on standard domain adaptation benchmarks. Finally, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (DNNs). The resulting Deep CORAL approach works seamlessly with DNNs and achieves state-of-the-art performance on standard benchmark datasets. Our code is available at: https://…/CORAL CORrelation Differences(CORD) Given a zero mean random vector X=:(X1,…,Xp) ∈ R^p, we consider the problem of defining and estimating a partition G of {1,…,p} such that the components of X with indices in the same group of the partition have a similar, community-like behavior. We introduce a new model, the G-exchangeable model, to define group similarity. This model is a natural extension of the more commonly used G-latent model, for which the partition G is generally not identifiable, without additional restrictions on X. In contrast, we show that for any random vector X there exists an identifiable partition G according to which X is G-exchangeable, thereby providing a clear target for community estimation. Moreover, we provide another model, the G-block covariance model, which generalizes the G-exchangeable model, and can be of interest in its own right for defining group similarity. We discuss connections between the three types of G-models.
We exploit the connection with G-block covariance models to develop a new metric, CORD, and a homonymous method for community estimation. We specialize and analyze our method for Gaussian copula data. We show that this method recovers the partition according to which X is G-exchangeable with a G-block copula correlation matrix. In the particular case of Gaussian distributions, this estimator, under mild assumptions, identifies the unique minimal partition according to the G-latent model. The CORD estimator is consistent as long as the communities are separated at a rate that we prove to be minimax optimal, via lower bound calculations. Our procedure is fast and extensive numerical studies show that it recovers communities defined by our models, while existing variable clustering algorithms typically fail to do so. This is further supported by two real-data examples. Correntropy Correntropy is a nonlinear similarity measure between two random variables. Learning with the Maximum Correntropy Criterion Induced Losses for Regression Correspondence Analysis(CA) Correspondence analysis (CA) is a multivariate statistical technique proposed by Hirschfeld and later developed by Jean-Paul Benzécri. It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data. In a similar manner to principal component analysis, it provides a means of displaying or summarising a set of data in two-dimensional graphical form. ➘ “Principal Component Analysis” Cortana Analytics Cortana Analytics is a fully managed big data and advanced analytics suite that enables you to transform your data into intelligent action. Cosine Similarity Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. 
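A minimal sketch of the cosine similarity computation, using only the standard library: the dot product of the two vectors divided by the product of their Euclidean norms. The function name cosine_similarity and the example vectors are illustrative choices, and the zero-vector check is an added safeguard for a case in which the angle is undefined.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Same direction, different magnitude: angle of 0 degrees
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
# Perpendicular vectors
orthogonal = cosine_similarity([1, 0], [0, 1])
# Opposite directions
opposite = cosine_similarity([1, 1], [-1, -1])
print(same_direction, orthogonal, opposite)
```

Scaling either vector by a positive constant leaves the result unchanged, which is what makes the measure useful for comparing term-count vectors of documents with different lengths.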
It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0, 1]. Note that these bounds apply for any number of dimensions, and cosine similarity is most commonly used in high-dimensional positive spaces. For example, in information retrieval and text mining, each term is notionally assigned a different dimension and a document is characterised by a vector where the value of each dimension corresponds to the number of times that term appears in the document. Cosine similarity then gives a useful measure of how similar two documents are likely to be in terms of their subject matter. The technique is also used to measure cohesion within clusters in the field of data mining. Cosinor Analysis Cosinor analysis uses the least squares method to fit a sine wave to a time series. Cosinor analysis is often used in the analysis of biologic time series that demonstrate predictable rhythms. This method can be used with an unequally spaced time series. Counterfactual Fairness Machine learning has matured to the point where it is now being considered to automate decisions in loan lending, employee hiring, and predictive policing. In many of these scenarios, however, previous decisions have been made that are unfairly biased against certain subpopulations (e.g., those of a particular race, gender, or sexual orientation). Because this past data is often biased, machine learning predictors must account for this to avoid perpetuating discriminatory practices (or incidentally making new ones). In this paper, we develop a framework for modeling fairness in any dataset using tools from counterfactual inference. 
We propose a definition called counterfactual fairness that captures the intuition that a decision is fair towards an individual if it gives the same predictions in (a) the observed world and (b) a world where the individual had always belonged to a different demographic group, other background causes of the outcome being equal. We demonstrate our framework on two real-world problems: fair prediction of law school success, and fair modeling of an individual’s criminality in policing data. Counterfactual Inference Count-Min Sketch In computing, the count-min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions. The count-min sketch was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan. Count-min sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets. Spectral Bloom filters with multi-set policy are conceptually isomorphic to the count-min sketch. Coupled Sparse Asymmetric Least Squares(COSALES) SALES Covariance Matrix Adaptation Evolution Strategy(CMA-ES) CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration) new individuals (candidate solutions, denoted as x) are generated by variation, usually in a stochastic way, of the current parental individuals. 
Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value f(x). In this way, over the sequence of generations, individuals with better and better f-values are generated. In an evolution strategy, new candidate solutions are sampled according to a multivariate normal distribution in R^n. Recombination amounts to selecting a new mean value for the distribution. Mutation amounts to adding a random vector, a perturbation with zero mean. Pairwise dependencies between the variables in the distribution are represented by a covariance matrix. The covariance matrix adaptation (CMA) is a method to update the covariance matrix of this distribution. This is particularly useful if the function f is ill-conditioned. Adaptation of the covariance matrix amounts to learning a second order model of the underlying objective function similar to the approximation of the inverse Hessian matrix in the Quasi-Newton method in classical optimization. In contrast to most classical methods, fewer assumptions on the nature of the underlying objective function are made. Only the ranking between candidate solutions is exploited for learning the sample distribution and neither derivatives nor even the function values themselves are required by the method. Covariate Balancing Propensity Score(CBPS) The propensity score plays a central role in a variety of causal inference settings. In particular, matching and weighting methods based on the estimated propensity score have become increasingly common in observational studies. Despite their popularity and theoretical appeal, the main practical difficulty of these methods is that the propensity score must be estimated. Researchers have found that slight misspecification of the propensity score model can result in substantial bias of estimated treatment effects. 
In this paper, we introduce covariate balancing propensity score (CBPS) methodology, which models treatment assignment while optimizing the covariate balance. This is done by exploiting the dual characteristics of the propensity score as a covariate balancing score and the conditional probability of treatment assignment. The estimation of the CBPS is done within the generalized method of moments or empirical likelihood framework. We find that the CBPS dramatically improves the poor empirical performance of propensity score matching and weighting methods reported in the literature. We also show that the CBPS can be extended to a number of other important settings, including the estimation of the generalized propensity score for non-binary treatments and the generalization of experimental estimates to a target population. Open-source software is available for implementing the proposed methods. Coverage Probability In statistics, the coverage probability of a confidence interval is the proportion of the time that the interval contains the true value of interest. For example, suppose our interest is in the mean number of months that people with a particular type of cancer remain in remission following successful treatment with chemotherapy. The confidence interval aims to contain the unknown mean remission duration with a given probability. This is the “confidence level” or “confidence coefficient” of the constructed interval which is effectively the “nominal coverage probability” of the procedure for constructing confidence intervals. The “nominal coverage probability” is often set at 0.95. The coverage probability is the actual probability that the interval contains the true mean remission duration in this example. Cox Proportional-Hazards Regression Cox proportional hazards regression is a semiparametric method for adjusting survival rate estimates to quantify the effect of predictor variables. 
The method represents the effects of explanatory variables as a multiplier of a common baseline hazard function, h0(t). The hazard function is the nonparametric part of the Cox proportional hazards regression function, whereas the impact of the predictor variables is a loglinear regression. Cox Regression The term Cox regression model (omitting proportional hazards) is sometimes used to describe the extension of the Cox model to include time-dependent factors. However, this usage is potentially ambiguous since the Cox proportional hazards model can itself be described as a regression model. Coxcomb Plot / Polar Area Diagram The polar area diagram is similar to a usual pie chart, except that the sectors have equal angles and differ instead in how far each sector extends from the center of the circle. The polar area diagram is used to plot cyclic phenomena (e.g., count of deaths by month). For example, if the count of deaths in each month of a year is to be plotted, then there will be 12 sectors (one per month), each with the same angle of 30 degrees. The radius of each sector would be proportional to the square root of the death count for the month, so the area of a sector represents the number of deaths in a month. If the death count in each month is subdivided by cause of death, it is possible to make multiple comparisons on one diagram, as is seen in the polar area diagram famously developed by Florence Nightingale. Credible Interval In Bayesian statistics, a credible interval (or Bayesian confidence interval) is an interval in the domain of a posterior probability distribution used for interval estimation. The generalisation to multivariate problems is the credible region. 
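As a rough illustration of interval estimation from a posterior distribution, the sketch below computes an equal-tailed credible interval from posterior draws. The function name `equal_tailed_interval`, the choice of an equal-tailed (rather than highest-density) interval, and the Normal(40, 2.55) posterior are all assumptions made for this example and do not come from the entry itself.

```python
import random


def equal_tailed_interval(samples, level=0.95):
    """Return (lower, upper) sample quantiles that enclose `level` of the draws."""
    if not samples:
        raise ValueError("need at least one posterior draw")
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0          # probability mass left out on each side
    last = len(ordered) - 1
    lower_idx = round(tail * last)
    upper_idx = round((1.0 - tail) * last)
    return ordered[lower_idx], ordered[upper_idx]


# Hypothetical posterior for a parameter t: Normal(mean=40, sd=2.55),
# chosen only so that the 95% interval lands near 35 and 45.
rng = random.Random(0)
draws = [rng.gauss(40.0, 2.55) for _ in range(20000)]
lower, upper = equal_tailed_interval(draws, level=0.95)
print(f"95% credible interval for t: [{lower:.2f}, {upper:.2f}]")
```

With real models the draws would typically come from an MCMC sampler or a closed-form posterior rather than from a fixed normal distribution.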
Credible intervals are analogous to confidence intervals in frequentist statistics, although they differ on a philosophical basis; Bayesian intervals treat their bounds as fixed and the estimated parameter as a random variable, whereas frequentist confidence intervals treat their bounds as random variables and the parameter as a fixed value. For example, in an experiment that determines the uncertainty distribution of parameter t, if the probability that t lies between 35 and 45 is 0.95, then 35 <= t <= 45 is a 95% credible interval. Credible Interval / Credibility Interval In Bayesian statistics, a credible interval (or Bayesian confidence interval) is an interval in the domain of a posterior probability distribution used for interval estimation. The generalisation to multivariate problems is the credible region. Credible intervals are analogous to confidence intervals in frequentist statistics. For example, in an experiment that determines the uncertainty distribution of parameter t, if the probability that t lies between 35 and 45 is 0.95, then 35 <= t <= 45 is a 95% credible interval. Critical Line Algorithm(CLA) The critical line method developed by the Nobel Prize winner H. Markowitz is a classical technique for the construction of a minimum-variance frontier within the paradigm of ‘the expected return-risk’ (mean-variance) and finding minimum portfolios. Considerable interest has recently been attracted to the development of a fast algorithm for the construction of the minimum-variance frontier. 
In some works, such algorithms have been used to find statistically stable optimal portfolios. An Open-Source Implementation of the Critical-Line Algorithm for Portfolio Optimization The Constrained Critical Line Algorithm The Critical Line Method Applying Markowitz’s Critical Line Algorithm Cross Entropy In information theory, the cross entropy between two probability distributions over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an ‘unnatural’ probability distribution q, rather than the ‘true’ distribution p. Cross Industry Standard Process for Data Mining(CRISP-DM) CRISP-DM stands for Cross Industry Standard Process for Data Mining. It is a data mining process model that describes commonly used approaches that expert data miners use to tackle problems. Polls conducted in 2002, 2004, and 2007 show that it is the leading methodology used by data miners. The only other data mining standard named in these polls was SEMMA. However, 3-4 times as many people reported using CRISP-DM. A review and critique of data mining process models in 2009 called the CRISP-DM the “de facto standard for developing data mining and knowledge discovery projects.” Other reviews of CRISP-DM and data mining process models include Kurgan and Musilek’s 2006 review, and Azevedo and Santos’ 2008 comparison of CRISP-DM and SEMMA. Cross Validation Cross-validation, sometimes called rotation estimation, is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. 
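One common way to estimate predictive performance in this sense is k-fold cross-validation, sketched below with the standard library only. The k-fold variant, the helper names (`k_fold_indices`, `cross_validate`, `fit_slope`, `predict_slope`) and the toy data are assumptions made for illustration and are not prescribed by the entry.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training part, then score on the unseen fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores


# Hypothetical model: least-squares line through the origin, y ~ w * x.
def fit_slope(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def predict_slope(w, x):
    return w * x


xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]              # noiseless toy data
fold_errors = cross_validate(xs, ys, fit_slope, predict_slope, k=5)
print(fold_errors)
```

Averaging `fold_errors` gives a single performance estimate; every observation is used for testing exactly once and for training k - 1 times.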
It is worth highlighting that in a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (testing dataset). The goal of cross validation is to define a dataset to “test” the model in the training phase (i.e., the validation dataset), in order to limit problems like overfitting, give an insight on how the model will generalize to an independent data set (i.e., an unknown dataset, for instance from a real problem), etc. CrossCat CrossCat is a domain-general, Bayesian method for analyzing high-dimensional data tables. CrossCat estimates the full joint distribution over the variables in the table from the data, via approximate inference in a hierarchical, nonparametric Bayesian model, and provides efficient samplers for every conditional distribution. CrossCat combines strengths of nonparametric mixture modeling and Bayesian network structure learning: it can model any joint distribution given enough data by positing latent variables, but also discovers independencies between the observable variables. A range of exploratory analysis and predictive modeling tasks can be addressed via CrossCat, including detecting predictive relationships between variables, finding multiple overlapping clusterings, imputing missing values, and simultaneously selecting features and classifying rows. Research on CrossCat has shown that it is suitable for analysis of real-world tables of up to 10 million cells, including hospital cost and quality measures, voting records, handwritten digits, and state-level unemployment time series. Cross-Entropy Clustering We build a general and easily applicable clustering theory, which we call cross-entropy clustering (CEC for short), which joins the advantages of classical k-means (easy implementation and speed) with those of EM (affine invariance and ability to adapt to clusters of desired shapes). 
Moreover, contrary to k-means and EM, CEC finds the optimal number of clusters by automatically removing groups which have negative information cost. Although CEC, like EM, can be built on an arbitrary family of densities, in the most important case of Gaussian CEC the division into clusters is affine invariant.
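To make the cost being minimized more concrete, here is a minimal one-dimensional sketch. It assumes the Gaussian CEC energy has the form sum_i p_i * (-ln p_i + 0.5 * ln(2*pi*e*var_i)), where p_i is the fraction of points in cluster i and var_i its variance. This formula is recalled from the CEC literature and is not stated in the entry above, and the function name and toy data are hypothetical.

```python
import math


def gaussian_cec_cost(clusters):
    """Energy of a partition of 1-D data under the assumed Gaussian CEC cost.

    Each cluster must be non-empty and have non-zero variance.
    """
    total = sum(len(points) for points in clusters)
    energy = 0.0
    for points in clusters:
        p = len(points) / total                       # cluster weight
        mean = sum(points) / len(points)
        var = sum((x - mean) ** 2 for x in points) / len(points)
        # Differential entropy of the best-fitting 1-D Gaussian.
        cross_entropy = 0.5 * math.log(2.0 * math.pi * math.e * var)
        energy += p * (-math.log(p) + cross_entropy)
    return energy


left = [-1.0, -0.5, 0.0, 0.5, 1.0]
right = [9.0, 9.5, 10.0, 10.5, 11.0]
one_cluster = gaussian_cec_cost([left + right])
two_clusters = gaussian_cec_cost([left, right])
print(one_cluster, two_clusters)
```

For these two well-separated groups the two-cluster partition has the lower energy. The -ln p_i term penalizes each additional cluster, which is what allows redundant clusters to be dropped when keeping them does not reduce the total cost.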