Machine learning-based ovarian cancer prediction with XGboost and stochastic gradient boosting models


ÖZHAN O., TUNÇ Z., ÇİÇEK İ. B.

Medicine Science, vol.12, no.1, pp.231-237, 2023 (Peer-Reviewed Journal)

  • Publication Type: Article / Full Article
  • Volume: 12 Issue: 1
  • Publication Date: 2023
  • DOI: 10.5455/medscience.2022.09.207
  • Journal Name: Medicine Science
  • Journal Indexes: TR DİZİN (ULAKBİM)
  • Page Numbers: pp.231-237
  • İnönü University Affiliated: Yes

Abstract

Ovarian cancer is one of the most common gynecological malignancies, characterized by a high mortality rate, silent and occult tumor growth, late onset of symptoms, and diagnosis at advanced stages. The need for new diagnostic techniques to predict the course and prognosis of this malignancy has therefore increased. This study classifies ovarian cancer and benign ovarian tumor samples with the machine learning methods XGBoost and Stochastic Gradient Boosting in order to build an accurate diagnostic prediction model and to identify disease-related risk factors. An open-access data set of ovarian cancer and benign ovarian tumor samples, comprising data from 349 patients, was used. The data set was split 80:20 into training and test sets. XGBoost and Stochastic Gradient Boosting models were built for classification using five-fold cross-validation. Model performance was evaluated with accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. The XGBoost model gave the best classification result; in the test stage its accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were 89.5%, 88.7%, 85.7%, 91.7%, 85.7%, 91.7%, and 85.7%, respectively. According to the variable importance obtained from the model, the variables most associated with the diagnosis were, in order, CA72-4, HE4, LYM%, ALB, EO%, BUN, RBC, NEU, and MCV. The applied machine learning model successfully classified ovarian cancer and yielded a highly accurate diagnostic prediction model. The results revealed effective parameters that can diagnose ovarian cancer with high accuracy. With the parameters identified by the modeling, clinicians will be able to simplify the decision-making process for the diagnosis of ovarian cancer.
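
The performance metrics named in the abstract all follow from the four cells of a binary confusion matrix. The sketch below shows one way to compute them in plain Python. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; the study's actual confusion matrix is not given in the abstract, and cancer is assumed to be the positive class.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute the abstract's performance metrics from confusion-matrix counts.

    tp/fn: positive (assumed: cancer) cases predicted positive/negative
    fp/tn: negative (assumed: benign) cases predicted positive/negative
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical toy counts, NOT taken from the study.
metrics = binary_metrics(tp=6, fn=1, fp=1, tn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Balanced accuracy is the mean of sensitivity and specificity, which is why it can differ from plain accuracy when the two classes are of unequal size. When false positives and false negatives are equal in number, as in these toy counts, sensitivity, positive predictive value, and F1 score coincide.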