Nikolaos Stratinakis, "Applied cluster analysis", Master Thesis, School of Production Engineering and Management, Technical University of Crete, Chania, Greece, 2018
https://doi.org/10.26233/heallink.tuc.79054
Cluster analysis is a method for classifying observations using the information contained in a set of variables. By examining how similar the observations are with respect to these variables, one forms groups of observations that resemble each other. A successful analysis yields groups in which the observations within each group are as homogeneous as possible, while observations in different groups differ as much as possible. Grouping relies on the concept of distance or similarity.

Cluster analysis is important not only in many sciences, such as sociology, biology and statistics, but also in many areas of information technology, such as pattern recognition, data mining, information retrieval, artificial intelligence and machine learning.

Chapter 1 of the thesis introduces cluster analysis (C.A.): its philosophy, its methods, its advantages and disadvantages, and finally the problems of its implementation. Chapter 2 describes the main distance and similarity measures used by C.A., according to the type of variables, with detailed examples. Chapter 3 lists the basic linkage methods and describes hierarchical classification methods, with examples. Chapter 4 describes the non-hierarchical k-means method. Finally, Chapter 5 presents an application of C.A. to the classification of atmospheric pollution measuring stations in the Attica region.