Prediction of COVID-19 Based on Genomic Biomarkers of Metagenomic Next-Generation Sequencing Data Using Artificial Intelligence Technology

Akbulut, AHMET; Yağın, FATMA; Çolak, CEMİL

doi:10.14744/etd.2022.00868

Akbulut A. S., Yağın F. H., Çolak C.

ERCIYES MEDICAL JOURNAL, vol. 44, pp. 544-548, 2022 (ESCI, TRDizin)

Publication Type: Article / Full Article
Volume: 44
Publication Date: 2022
DOI: 10.14744/etd.2022.00868
Journal Name: ERCIYES MEDICAL JOURNAL
Journal Indexes: Emerging Sources Citation Index (ESCI), Academic Search Premier, CAB Abstracts, EMBASE, Veterinary Science Database, Directory of Open Access Journals, TR DİZİN (ULAKBİM)
Pages: pp. 544-548
Keywords: Artificial intelligence, Boruta, COVID-19 pandemic, feature selection, multi-layer perceptron, SARS-CoV-2 virus
Affiliated with İnönü University: Yes

Abstract

Objective: The primary aim of this study was to use metagenomic next-generation sequencing (mNGS) data to identify biomarker genes related to coronavirus disease 2019 (COVID-19) and to construct a machine learning model that could successfully differentiate patients with COVID-19 from healthy controls.

Materials and Methods: The mNGS dataset used in the study contained upper-airway expression data for 15,979 genes from 234 patients, comprising both COVID-19-negative and COVID-19-positive cases. The Boruta method was used to select qualitative biomarker genes associated with COVID-19. Random forest (RF), gradient boosting tree (GBT), and multi-layer perceptron (MLP) models were used to predict COVID-19 based on the selected biomarker genes.

Results: The MLP model (0.936) outperformed the GBT (0.851) and RF (0.809) models in predicting COVID-19. The three most important biomarker candidate genes associated with COVID-19 were IFI27, TPTI, and FAM83A.

Conclusion: The proposed model (MLP) was able to predict COVID-19 successfully. The results showed that the generated model can be used as a diagnostic model for clinical testing, and that the selected biomarker candidate genes can serve as potential therapeutic targets and inform vaccine design.
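
The feature-selection step named in Materials and Methods can be illustrated with a minimal sketch of the shadow-feature idea that Boruta is built on: each real feature competes against shuffled copies of the features, and only those that consistently beat the best shuffled copy are kept. This is not the authors' pipeline. It assumes a simple absolute correlation score in place of the random forest importance that Boruta normally uses, a fixed hit-rate cutoff in place of a statistical test, and synthetic data. All names (`shadow_select`, `gene_signal`, `gene_noise_a`, `gene_noise_b`) are hypothetical.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_select(columns, labels, n_rounds=30, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shadow feature in most rounds.

    columns: dict mapping a feature name to its list of values.
    labels:  list of 0/1 class labels, one per sample.
    Assumption: correlation stands in for random forest importance.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow features: each real column with its values shuffled,
        # which breaks any relationship with the labels.
        shadow_scores = []
        for values in columns.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(abs_correlation(shuffled, labels))
        threshold = max(shadow_scores)
        # A real feature scores a hit when it beats the best shadow.
        for name, values in columns.items():
            if abs_correlation(values, labels) > threshold:
                hits[name] += 1
    return [n for n, h in hits.items() if h / n_rounds >= min_hit_rate]


def make_toy_data(n_samples=200, seed=1):
    """Synthetic stand-in for expression data (not the study's dataset)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    columns = {
        # Hypothetical informative gene: mean shifts in the positive class.
        "gene_signal": [rng.gauss(2.0 * y, 1.0) for y in labels],
        # Hypothetical uninformative genes: pure noise.
        "gene_noise_a": [rng.gauss(0.0, 1.0) for _ in labels],
        "gene_noise_b": [rng.gauss(0.0, 1.0) for _ in labels],
    }
    return columns, labels


columns, labels = make_toy_data()
selected = shadow_select(columns, labels)
print(selected)
```

In a workflow like the one the abstract describes, the retained features would then be passed to the RF, GBT, and MLP classifiers for comparison on held-out data.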