Conceptual Statistics and Data Mining
With the advent of computers, very large datasets have become routine. Standard statistical methods lack the power and flexibility to analyse such datasets efficiently and to extract the required knowledge from them. An alternative approach is to summarize a large dataset so that the resulting summary is of manageable size yet retains as much of the knowledge in the original dataset as possible. One consequence is that the data may no longer be single values, but may instead be represented by lists, intervals, distributions, etc. Such summarized data, known as symbolic data, have their own internal structure, which must be taken into account in any analysis. This text presents a unified account of symbolic data, how they arise, and how they are structured. The reader is introduced to symbolic analytic methods, described within the consistent statistical framework required to carry out such a summary and the subsequent analysis.