Enhancing extractive multi-documents summarization with a novel dominating set model for semantic relationship detection


Yunus S., HARK C., OKUMUŞ F.

Engineering Science and Technology, an International Journal, vol.69, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 69
  • Publication Date: 2025
  • DOI Number: 10.1016/j.jestch.2025.102127
  • Journal Name: Engineering Science and Technology, an International Journal
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, INSPEC, Directory of Open Access Journals
  • Keywords: Automatic text summarization, Graph theory, Minimum dominating set, Natural language processing
  • Affiliated with İnönü University: Yes

Abstract

In this paper, the Dominant Set-Based Extractive Text Summarizing (DSETS) framework is proposed as a new approach to automatic text summarization. Using the minimum dominating set technique, the framework builds summaries from a word-level graph representation that minimizes information loss while preserving significant semantics. DSETS aims to offer an alternative perspective on computational text summarization. The segmentation it applies distributes the processing load and reduces time complexity, providing more scalable performance on large datasets. Empirical runtime and memory evaluations showed that the segmentation strategy reduced processing time by up to 24% with memory usage comparable to lighter baseline methods, demonstrating its practicality in resource-constrained environments. When compared with a series of text summarization techniques, DSETS offered significantly improved summarization performance. Experiments were conducted on four datasets (BBC News, XSum, CNN/Daily Mail, and MultiNews), and summaries of varying word lengths were generated. The framework achieved the highest ROUGE (1, 2, L, W) scores in most of the summary configurations across datasets and word counts. In particular, ROUGE-W F-scores improved by up to 15.8%, while ROUGE-1 and ROUGE-L increased by 3% to 8% across summary lengths. The evaluation results suggest that DSETS outperformed many state-of-the-art summarization methods, with improvements between 1.3% and 15.8% depending on the metric and dataset. To understand which parts of the system contributed most to this result, an ablation study was carried out. Its findings indicated that the segmentation mechanism and the semantic filtering process played a key role, particularly in enhancing recall-based performance. Taken together, these results indicate that DSETS is a strong and reliable framework for extractive summarization, especially for single-topic documents, and a promising option for building lightweight and interpretable summarization systems in future applications.
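
To make the central idea of the abstract concrete, the following is a minimal sketch of how a dominating set can be approximated on a word-level graph and then used to pick sentences. It is not the DSETS implementation: the co-occurrence window, the greedy approximation, the sentence-scoring rule, and all function names (`build_word_graph`, `greedy_dominating_set`, `select_sentences`) are assumptions introduced here for illustration. The abstract does not specify how the graph is built, how the set is computed, or how segmentation and semantic filtering are applied.

```python
import re
from collections import defaultdict


def build_word_graph(sentences, window=2):
    """Build an undirected co-occurrence graph (assumed construction).

    Words are nodes; an edge links two different words that occur within
    `window` positions of each other in the same sentence.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(words):
            graph[word]  # touch the key so isolated words are still nodes
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return dict(graph)


def greedy_dominating_set(graph):
    """Greedy approximation of a minimum dominating set.

    Repeatedly picks the node whose closed neighbourhood covers the most
    still-undominated nodes. The result dominates the graph but is not
    guaranteed to be of minimum size.
    """
    undominated = set(graph)
    dominating = []
    while undominated:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(
            sorted(graph),
            key=lambda n: len(({n} | graph[n]) & undominated),
        )
        dominating.append(best)
        undominated -= {best} | graph[best]
    return dominating


def select_sentences(sentences, dominating, k=1):
    """Keep the k sentences containing the most dominating words
    (a hypothetical scoring rule), returned in document order."""
    dom = set(dominating)
    scored = [
        (len(dom & set(re.findall(r"[a-z]+", s.lower()))), i)
        for i, s in enumerate(sentences)
    ]
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat",
        "The dog chased the cat",
        "Stock markets fell sharply today",
    ]
    g = build_word_graph(docs)
    ds = greedy_dominating_set(g)
    print(ds)
    print(select_sentences(docs, ds, k=2))
```

The greedy rule is a standard approximation, chosen here because finding a true minimum dominating set is NP-hard in general; an exact or otherwise different solver could be substituted without changing the rest of the sketch.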