Theodoros Papadomanolakis, "Head tumor diagnostics using MRI, patient’s pathology data and machine learning algorithms", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2020
https://doi.org/10.26233/heallink.tuc.85653
Background and study aims: Magnetic Resonance Imaging (MRI) of the brain, together with patients’ pathology data, can greatly assist radiologists and doctors in providing a more precise diagnosis and therapy. Because brain tumors are unpredictable in appearance and shape, segmenting them from multi-modal imaging data is one of the most challenging tasks in medical image analysis. Manual detection and classification of brain tumors by an expert is still considered the most acceptable method, but it is very time-consuming, especially given the large amount of data that has to be analyzed manually. The purpose of the present study is to train and validate AI algorithms, namely Machine Learning (ML) algorithms such as the Support Vector Machine (SVM) and deep learning algorithms such as Convolutional Neural Networks (CNN), to classify brain MRI images as tumorous or non-tumorous.

Materials and methods: The selected image dataset contains a total of 291 male and female adult cases, of which 210 are tumorous and 81 are non-tumorous, all visually segmented by a collaborating neurosurgeon. The healthy MRI scans were provided by the “St. George” General Hospital of Chania, Greece, and the unhealthy MRI scans come from the Multimodal Brain Tumor Segmentation Challenge (BraTS). All MRI images are T2-weighted and taken in the axial plane. The dataset was divided into two subsets: a training subset of 191 tumorous and 66 non-tumorous cases, and a validation subset of 19 tumorous and 15 non-tumorous cases (56%/44%). Many scenarios were implemented in this thesis, each combining AI algorithms with different kinds of input features, training sets of different size and composition, i.e. balanced or unbalanced (74% tumorous/26% non-tumorous), and different training techniques. Performance metrics such as accuracy, sensitivity and specificity were computed to evaluate the effectiveness of each implemented methodology.
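The evaluation metrics named above all follow from the four cells of a binary confusion matrix, with "tumorous" as the positive class. The sketch below is illustrative only: the function name `binary_metrics` and the example counts (one false positive on a validation set of 19 tumorous and 15 non-tumorous cases) are assumptions, not values taken from the thesis.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp: tumorous cases predicted tumorous      fn: tumorous cases predicted healthy
    tn: healthy cases predicted healthy        fp: healthy cases predicted tumorous
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
        "fnr": fn / (tp + fn),           # false negative rate = 1 - sensitivity
        "fpr": fp / (tn + fp),           # false positive rate = 1 - specificity
    }


# Hypothetical counts for a 19 tumorous / 15 non-tumorous validation set.
metrics = binary_metrics(tp=19, fn=0, tn=14, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a diagnostic setting the false negative rate is usually the most critical of these, since it counts tumorous cases that the classifier would miss.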
Standardization of the training set and a 10-fold split for grid search with cross-validation were applied. The balanced dataset consists of 66 tumorous and 66 non-tumorous cases; the unbalanced dataset consists of 191 tumorous and 66 non-tumorous cases. Training was carried out with three alternative feature sets: the grayscale pixel values of the raw whole images; the three-level discrete wavelet transform coefficients of the raw whole images; and the wavelet entropy calculated from the three-level discrete wavelet transform coefficients of each raw whole image (and of each of the quarters into which the image is divided). In all cases, training was carried out both with and without Principal Component Analysis (PCA), used to reduce the dimensionality of the coefficients to 15. For the balanced dataset, data augmentation was applied to generate 400 tumorous and 400 non-tumorous images for training the CNN.

Results: The CNN-based algorithm, trained on the balanced dataset using the discrete wavelet transform coefficients of the raw whole images, achieved the highest scores: 100% sensitivity, 97% accuracy, 93% specificity, 95% precision, 0% false negative rate (FNR) and 6% false positive rate (FPR). The SVM-based algorithm, trained on the balanced dataset using the pixel values of the raw whole images as features, achieved the second-highest scores: 100% sensitivity, 91% accuracy, 80% specificity, 86% precision, 0% FNR and 20% FPR. PCA was not applied in either of these two cases. Moreover, in the scenarios trained on the unbalanced dataset, features extracted from the image quarters gave better results than features extracted from the whole image.
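The wavelet-based features described in the methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract names neither the wavelet family nor the entropy formula, so the code assumes the Haar wavelet and the Shannon entropy of the normalized coefficient energies. All function names are hypothetical, and the image side lengths must be divisible by 8 (16 when the image is split into quarters).

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(values):
    """One 1-D Haar step: scaled pairwise sums (approximation) and differences (detail)."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def haar_dwt2(image):
    """One 2-D Haar level: returns the approximation band and three detail bands."""
    low, high = [], []
    for row in image:                      # filter along the rows first
        a, d = haar_step(row)
        low.append(a)
        high.append(d)
    bands = []
    for half in (low, high):               # then along the columns of each half
        lo_cols, hi_cols = [], []
        for col in transpose(half):
            a, d = haar_step(col)
            lo_cols.append(a)
            hi_cols.append(d)
        bands += [transpose(lo_cols), transpose(hi_cols)]
    approx, *details = bands
    return approx, details

def dwt_coefficients(image, levels=3):
    """Flatten all detail bands plus the final approximation into one feature vector."""
    coeffs, approx = [], image
    for _ in range(levels):
        approx, details = haar_dwt2(approx)
        for band in details:
            coeffs += [v for row in band for v in row]
    coeffs += [v for row in approx for v in row]
    return coeffs

def wavelet_entropy(image, levels=3):
    """Shannon entropy (bits) of the normalized coefficient energies (assumed definition)."""
    energies = [c * c for c in dwt_coefficients(image, levels)]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def quarters(image):
    """Split an image into its four quadrants."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[row[c:c + w] for row in image[r:r + h]] for r in (0, h) for c in (0, w)]

# Synthetic 16x16 "image" standing in for a grayscale MRI slice.
image = [[(r * 16 + c) % 7 for c in range(16)] for r in range(16)]
whole_feature = wavelet_entropy(image)
quarter_features = [wavelet_entropy(q) for q in quarters(image)]
print(whole_feature, quarter_features)
```

Under these assumptions, the whole-image variant yields one entropy value per image, while the quartered variant yields four, which gives the classifier some coarse spatial information about where the image energy is concentrated.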