Estimation of risk factors associated with colorectal cancer: an application of knowledge discovery in databases


Firat F., ARSLAN A. K., ÇOLAK C., HARPUTLUOĞLU H.

KUWAIT JOURNAL OF SCIENCE, vol.43, no.2, pp.151-161, 2016 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 43 Issue: 2
  • Publication Date: 2016
  • Journal Name: KUWAIT JOURNAL OF SCIENCE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.151-161
  • Keywords: Artificial neural networks, colorectal cancer, knowledge discovery in databases, risk factors, NEURAL-NETWORKS, CONSUMPTION, POPULATION, MORTALITY, PLATELET, SODIUM
  • İnönü University Affiliated: Yes

Abstract

Colorectal cancer (CRC) is one of the leading causes of cancer death worldwide. The goal of this study is to identify important risk factors for CRC using knowledge discovery in databases (KDD) methods. The study used retrospective CRC data mined from large integrated health systems with electronic health records. Records dated between 1 January 2010 and 1 March 2014 were selected randomly from the Turgut Ozal Medical Centre databases. The study included 160 individuals: 80 patients admitted to the Department of Oncology and diagnosed with CRC, and 80 control subjects without CRC. The groups were matched for age and gender. In each group, 45 individuals (56.3%) were male and 35 (43.7%) were female. The mean age of the CRC patients and control subjects was 58.6 ± 13.0 years. Demographic and clinical variables, including calcium, hemoglobin, white blood cells, platelets, potassium, sodium, glucose, creatinine and total bilirubin, were used in multilayer perceptron (MLP) artificial neural network (ANN) modeling. Accuracy was 71.31% on the training dataset (n=122) and 81.82% on the testing dataset. Area under the curve (AUC) values for the training and testing datasets were 0.73 and 0.81, respectively. The suggested MLP ANN model identified calcium, creatinine, potassium, platelets, sodium, hemoglobin and total bilirubin as significant factors. Taken together, the suggested MLP ANN model might be used for the estimation of risk factors associated with CRC as an application of medical KDD.
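
For readers unfamiliar with the two performance measures reported in the abstract, the following is a minimal sketch of how accuracy and AUC can be computed for a binary case/control classifier. It is not the authors' code. The function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration. AUC is computed here by pairwise comparison (the Mann-Whitney formulation), which is one common way to obtain it.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    Over all (case, control) pairs, counts how often the case receives
    the higher score; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = CRC, 0 = control.
# Scores stand in for a model's predicted probability of CRC.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.5, 0.2, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = auc(y_true, scores)
print(f"accuracy = {toy_accuracy:.4f}, AUC = {toy_auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two measures are usually reported together.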