The work titled "Evaluation of imputation methods in ovarian tumor diagnostic models using generalized linear models and support vector machines" by Michalis Zervakis, Ioannis Dimou, Sabine Van Huffel, and Dirk Timmerman is licensed under Creative Commons Attribution 4.0 International.
Bibliographic Citation
I. Dimou, B. Van Calster, S. Van Huffel, D. Timmerman, and M. Zervakis, "Evaluation of imputation methods in ovarian tumor diagnostic models using generalized linear models and support vector machines," Med. Decis. Making, vol. 30, no. 1, pp. 123-131, 2010, doi: 10.1177/0272989X09340579.
https://doi.org/10.1177/0272989X09340579
Neglecting missing values in diagnostic models can result in unreliable and suboptimal performance on new data. In this study, the authors imputed missing values of the tumor marker CA-125 in a large data set of ovarian tumors that was used to develop models for predicting malignancy. Four imputation techniques were applied: regression imputation, expectation-maximization (EM), data augmentation, and hot-deck imputation. Models using the imputed data sets were compared with two alternatives. Comparison with models that omit CA-125 addressed the clinically important question of whether CA-125 information is necessary for diagnostic models at all, and comparison with models fitted to complete cases only examined how imputation differs from a complete case strategy for handling missing values. The models are based on Bayesian generalized linear models (GLMs) and Bayesian least squares support vector machines (LS-SVMs). The results indicate that including CA-125 produced small, clinically nonsignificant increases in the AUC of the diagnostic models. Differences between imputation methods were minor, and imputing CA-125 changed the AUC only slightly compared with complete case analysis (CCA). However, the GLM parameter estimates of the predictor variables often differed between CCA and models based on imputed data. The authors conclude that CA-125 is not indispensable in diagnostic models for ovarian tumors and that missing value imputation is preferable to CCA.
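To make the contrast between complete case analysis and imputation concrete, here is a minimal sketch on a hypothetical toy data set. It is not the authors' implementation and uses none of their data. It assumes a single predictor (`age`) for the regression imputation model and draws hot-deck donors from cases with the same outcome class; both choices are illustrative assumptions. EM and data augmentation are not shown. The AUC is computed with the rank-based (Mann-Whitney) formulation, and CA-125 itself serves as the score in place of a fitted GLM or LS-SVM output.

```python
import random
from statistics import fmean


def complete_cases(rows):
    """Complete case analysis (CCA): keep only rows whose CA-125 value is observed."""
    return [r for r in rows if r["ca125"] is not None]


def regression_impute(rows):
    """Single regression imputation (illustrative): fit CA-125 ~ a + b * age by
    ordinary least squares on the complete cases, then replace each missing
    value with its prediction. Returns copies; the input is not modified."""
    cc = complete_cases(rows)
    xs = [r["age"] for r in cc]
    ys = [r["ca125"] for r in cc]
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [dict(r, ca125=a + b * r["age"]) if r["ca125"] is None else dict(r) for r in rows]


def hotdeck_impute(rows, seed=0):
    """Random hot-deck imputation (illustrative): copy the observed CA-125 value
    of a randomly chosen donor. The donor pool (same outcome class) is an
    assumption made for this sketch."""
    rng = random.Random(seed)
    donors = {}
    for r in complete_cases(rows):
        donors.setdefault(r["malignant"], []).append(r["ca125"])
    return [
        dict(r, ca125=rng.choice(donors[r["malignant"]])) if r["ca125"] is None else dict(r)
        for r in rows
    ]


def auc(scores, labels):
    """AUC as the probability that a random malignant case scores higher than
    a random benign case (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy records; None marks a missing CA-125 measurement.
data = [
    {"age": 35, "ca125": 20.0, "malignant": 0},
    {"age": 42, "ca125": None, "malignant": 0},
    {"age": 50, "ca125": 35.0, "malignant": 0},
    {"age": 58, "ca125": 150.0, "malignant": 1},
    {"age": 63, "ca125": None, "malignant": 1},
    {"age": 70, "ca125": 300.0, "malignant": 1},
]

if __name__ == "__main__":
    strategies = [
        ("CCA", complete_cases(data)),
        ("regression", regression_impute(data)),
        ("hot-deck", hotdeck_impute(data)),
    ]
    for name, rows in strategies:
        scores = [r["ca125"] for r in rows]
        labels = [r["malignant"] for r in rows]
        print(name, "n =", len(rows), "AUC =", auc(scores, labels))
```

The sketch highlights a design difference that matters for the comparison in the abstract: CCA discards the two incomplete records, whereas both imputation strategies keep every case. Single regression imputation also replaces each missing value with a fitted value and so ignores imputation uncertainty. Multiple-imputation approaches such as data augmentation are designed to account for it.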