Karcı summarization: A simple and effective approach for automatic text summarization using Karcı entropy


Hark C., KARCI A.

INFORMATION PROCESSING & MANAGEMENT, vol.57, 2020 (Journal Indexed in SCI)

  • Volume: 57 Issue: 3
  • Publication Date: 2020
  • DOI: 10.1016/j.ipm.2019.102187
  • Journal Name: INFORMATION PROCESSING & MANAGEMENT

Abstract

The growing amount of text available via the Internet has amplified the need for automated document summarization tools. However, further effort is needed to improve the quality of existing summarization tools. The current study proposes Karcı Summarization, a novel methodology for extractive, generic summarization of text documents. Karcı Entropy was used for the first time in a document summarization method. An important feature of the proposed system is that it does not require any kind of information source or training data. At the input stage, a text processing tool named KUSH (after its authors: Karcı, Uckan, Seyyarer, and Hark) was introduced to preserve semantic consistency between sentences. The Karcı Entropy-based solution selects the most effective, generic, and informative sentences within a paragraph or unit of text. The Karcı Summarization approach was tested on open-access Document Understanding Conference datasets (DUC-2002, DUC-2004). Its performance was measured using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The experimental results showed that the proposed summarizer outperformed all current state-of-the-art methods for 200-word summaries on ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-W-1.2. In addition, the proposed summarizer outperformed the nearest competing summarizers by 6.4% in ROUGE-1 Recall on the DUC-2002 dataset. These results demonstrate that Karcı Summarization is a promising technique, and it is therefore expected to attract interest from researchers in the field. The approach was shown to have high potential for adoption. Moreover, the method was assessed as quite insensitive to disordered and missing text owing to its KUSH text processing module.
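
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how ROUGE-1 Recall is commonly computed: the clipped count of reference unigrams that also appear in the candidate summary, divided by the total number of unigrams in the reference. The function name `rouge1_recall` and the lowercase, whitespace-based tokenization are assumptions made for illustration; the official ROUGE toolkit offers further options (stemming, stopword removal, length limits), and this sketch is not the evaluation code used in the paper.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall (hypothetical helper, not the official toolkit).

    Tokenization here is plain lowercasing plus whitespace splitting,
    which is an assumption made to keep the sketch short.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total_ref = sum(ref_counts.values())
    if total_ref == 0:
        # An empty reference has nothing to recall.
        return 0.0
    # Clip each overlap by how often the token occurs in the candidate.
    overlap = sum(min(count, cand_counts[token])
                  for token, count in ref_counts.items())
    return overlap / total_ref


if __name__ == "__main__":
    print(rouge1_recall("the cat sat", "the cat sat on the mat"))
```

In this toy case the reference has six unigrams and three of them are matched by the candidate after clipping, so the recall is one half.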