Medicine Science, vol. 10, no. 4, pp. 1524-1533, 2021 (Peer-Reviewed Journal)
COVID-19, a highly contagious disease, presents with varied symptoms in humans. Therefore, the scientific and genetic status of the virus should be clarified as soon
as possible. This study aims to classify COVID-19 and determine the important genes related to the disease by applying ensemble learning techniques to a public
COVID-19 dataset. The dataset consists of 579 genes from 32 individuals, of whom 10 do not have COVID-19 and 22 do. Lasso, a feature selection
method, was used in this study. The ensemble learning methods (Bagging, Boosting, and Stacking) were applied to the public dataset. The
performance of the models was evaluated with accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. Among the constructed ensemble
models, the Stacking technique achieved better classification performance than the Bagging and Boosting methods. Accuracy, sensitivity, specificity, positive
predictive value, negative predictive value, and F1 score obtained from the Stacking technique were 99.85%, 99.91%, 99.82%, 99.64%, 99.95%, and 99.89%, respectively.
According to the Stacking method, the genes most strongly related to COVID-19 were CD22, CD19, C4BPA, ARHGDIB, AICDA, CCR5, CCL7, CCL26, CCL22, and CCL16.
The genes identified by the model may serve as determinants for early diagnosis and treatment of COVID-19.
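
The six performance measures reported above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' implementation. The function name binary_metrics and the counts passed to it are hypothetical: only the class sizes (22 COVID-19, 10 non-COVID-19) come from the dataset description, and the error counts are invented for illustration, so the printed values are not the study's results.

```python
def _ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Compute the six metrics as fractions in [0, 1].

    tp, fp, tn, fn are the confusion-matrix counts, with
    COVID-19 treated as the positive class.
    """
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    ppv = _ratio(tp, tp + fp)           # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    # F1 is the harmonic mean of PPV and sensitivity
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts: 22 positives (21 detected, 1 missed) and
# 10 negatives (9 correctly rejected, 1 false alarm).
metrics = binary_metrics(tp=21, fp=1, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With only 32 individuals, a single misclassified sample changes accuracy by about three percentage points, so figures of this kind are typically averaged over repeated resampling or cross-validation folds rather than read from one split.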