The work with title Bootstrap clustering approaches for organization of data: application in improving grade separability in cervical neoplasia by Vourlaki Ioanna-Theoni, Balas Costas, Livanos Georgios, Vardoulakis Emmanouil, Giakos George C., Zervakis Michail is licensed under Creative Commons Attribution 4.0 International
Bibliographic Citation
I. Vourlaki, C. Balas, G. Livanos, M. Vardoulakis, G. Giakos and M. Zervakis, "Bootstrap clustering approaches for organization of data: application in improving grade separability in cervical neoplasia," Biomed. Signal Process. Control, vol. 49, pp. 263-273, Mar. 2019. doi: 10.1016/j.bspc.2018.12.014
https://doi.org/10.1016/j.bspc.2018.12.014
This study introduces a novel technique for the self-organization of large datasets, without prior knowledge of the statistical distribution of the data. The particular application of interest concerns the self-organization of diffuse reflectance time-lapse curves, which express the biomarker uptake and wash-out kinetics in cervical epithelium. Dynamic spectral imaging generates one curve per pixel, resulting in about 700,000 curves per person examined. It is an established technology for the non-invasive diagnosis of precancerous lesions, since various curve profiles represent distinct neoplasia grades. The methodology developed in this study constitutes an effective automatic clustering approach that improves the ability to discriminate between a number of precancerous and non-precancerous abnormalities. The automatic clustering of such a large number of curves is expected to facilitate early diagnosis and prognosis, as well as to assist the management of cervical neoplasia. The effectiveness of the proposed approach stems from the incorporation of data bootstrapping within the clustering approach and the adoption of appropriate distance metrics for assessing class coherence. Each bootstrap step derives a set of cluster centroids, which are then regrouped into active class centers by a meta-data clustering step. The proposed methodology searches for hidden characteristics within the processed dataset and reveals additional data structures or subclasses that can be utilized for identifying irregular groups, which are of particular importance in disease modeling and management. More specifically, a hidden class was revealed in cervical neoplasia with significant confidence, as indicated by the Silhouette, Calinski-Harabasz and Dunn indices and by the standard deviation of minimum distance metrics.
The results of this study show that appropriate bootstrap extensions of simple clustering schemes can effectively organize large time-series data, giving rise to exploratory approaches for subclass identification that facilitate accurate and early disease diagnosis.
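The two-stage idea described in the abstract (cluster each bootstrap resample, then cluster the pooled centroids) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain k-means with squared Euclidean distance as a stand-in for the clustering scheme and distance metrics used in the paper, and the names `sq_dist`, `nearest`, `kmeans` and `bootstrap_cluster`, as well as the parameters `k`, `n_boot` and `seed`, are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means (an assumed stand-in); returns k centroids."""
    # Initialise centroids from k randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: group every point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: mean of each group; an empty group keeps its centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def bootstrap_cluster(points, k, n_boot=20, seed=0):
    """Cluster bootstrap resamples, then cluster the pooled centroids."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_boot):
        # Bootstrap step: resample the curves with replacement.
        resample = rng.choices(points, k=len(points))
        pooled.extend(kmeans(resample, k, rng))
    # Meta-clustering step: regroup the pooled centroids into class centers.
    centers = kmeans(pooled, k, rng)
    # Assign every original curve to its nearest class center.
    labels = [nearest(p, centers) for p in points]
    return centers, labels, pooled
```

Calling `bootstrap_cluster(curves, k=3)` on a list of equal-length curves returns the class centers, one label per curve, and the pooled bootstrap centroids. Repeating the call for several values of `k` and comparing a cluster validity index, such as the Silhouette index mentioned in the abstract, is one way such a sketch could be used to look for an additional subclass.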