JOURNAL OF ISTANBUL FACULTY OF MEDICINE-ISTANBUL TIP FAKULTESI DERGISI, 2022 (ESCI)
Objective: Knowing the underlying cause of hepatocellular carcinoma (HCC) is crucial for optimal management. This study aims to classify open-access gene expression data from HCC patients with hepatitis B virus (HBV) or hepatitis C virus (HCV) infection using the XGBoost method.

Material and Methods: This case-control study used open-access gene expression data from patients with HBV-related HCC and HCV-related HCC. Data from 17 patients with HBV+HCC and 17 patients with HCV+HCC were included. An XGBoost classifier was built and evaluated with ten-fold cross-validation. Model performance was assessed using accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score.

Results: Seventeen genes were selected using a feature selection approach, and these genes served as the input variables for the model. The XGBoost model achieved an accuracy of 97.1%, a balanced accuracy of 97.1%, a sensitivity of 94.1%, a specificity of 100%, a positive predictive value of 100%, a negative predictive value of 94.4%, and an F1 score of 97%. Based on the XGBoost variable importance results, the ALDOC, GLUD2, TRAPPC10, FLJ12998, RPL39, KDELR2, and KIAA0446 genes may serve as potential biomarkers for HBV-related HCC.

Conclusion: This study used a machine learning-based prediction approach to distinguish HCC caused by two different etiological factors (HBV and HCV) and identified genes that could serve as biomarkers for HBV-related HCC. Once these genes have been clinically validated in future research, therapeutic procedures based on them can be developed and their utility in clinical practice documented.