Classification of healthy controls and Covid-19 cases established on transcriptomic analysis using proposed ensemble model


Küçükakçalı Z., Yaşar Ş., Çolak C.

Medicine Science, vol.10, no.4, pp.1524-1533, 2021 (Peer-Reviewed Journal)

  • Publication Type: Article
  • Volume: 10 Issue: 4
  • Publication Date: 2021
  • Doi Number: 10.5455/medscience.2021.09.284
  • Journal Name: Medicine Science
  • Journal Indexes: TR DİZİN (ULAKBİM)
  • Page Numbers: pp.1524-1533
  • Inonu University Affiliated: Yes

Abstract

COVID-19 is a highly contagious disease that presents with varying symptoms in humans. The scientific and genetic characteristics of the virus should therefore be clarified as soon as possible. This study aims to classify COVID-19 and to determine the important genes related to the disease by applying ensemble learning techniques to a public COVID-19 dataset. The dataset consists of 579 genes from 32 individuals, of whom 10 do not have COVID-19 and 22 do. Lasso was used as the feature selection method. The ensemble learning methods (Bagging, Boosting, and Stacking) were applied to the public dataset. Model performance was evaluated with accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. Of the constructed ensemble models, the Stacking technique produced the best classification performance compared with the Bagging and Boosting methods. The accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score obtained from the Stacking technique were 99.85%, 99.91%, 99.82%, 99.64%, 99.95%, and 99.89%, respectively. According to the Stacking method, CD22, CD19, C4BPA, ARHGDIB, AICDA, CCR5, CCL7, CCL26, CCL22, and CCL16 were the most important genes related to COVID-19. The genes identified by the model may serve as determinants for the early diagnosis and treatment of COVID-19.
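
As an illustration of how the evaluation measures named in the abstract are defined, the sketch below computes them from the four cells of a binary confusion matrix. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study; only the class sizes (22 cases, 10 controls) mirror the dataset described in the abstract.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary classification measures from confusion-matrix counts.

    tp: cases correctly predicted as cases
    fp: controls wrongly predicted as cases
    tn: controls correctly predicted as controls
    fn: cases wrongly predicted as controls
    """

    def ratio(numerator: float, denominator: float) -> float:
        # Return NaN instead of raising when a measure is undefined.
        return numerator / denominator if denominator else math.nan

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    ppv = ratio(tp, tp + fp)           # positive predictive value (precision)
    npv = ratio(tn, tn + fn)           # negative predictive value
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * ppv * sensitivity, ppv + sensitivity)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for 22 cases and 10 controls (NOT the study's results).
example = classification_metrics(tp=21, fp=1, tn=9, fn=1)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, one missed case and one misclassified control out of 32 individuals already bring accuracy down to 30/32, which shows how sensitive each measure is to single errors in a dataset of this size.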