Graph-based extractive summarisation for long documents

Gokhan, Tuba ORCID: 0009-0006-3341-7111 (2024). Graph-based extractive summarisation for long documents. University of Birmingham. Ph.D.

Gokhan2024PhD.pdf
Text - Accepted Version
Available under License All rights reserved.

Abstract

The ability to extract the most important information from a long document or a collection of documents quickly and accurately has always been essential for effective communication and decision-making. Text summarisation systems help streamline this process by extracting only the essential information from a document or series of documents. Despite significant progress in methods for text summarisation, challenges remain, particularly for unsupervised methods.

This thesis investigates novel unsupervised methods and models in Natural Language Processing (NLP) to improve the quality of text summarisation. Our research makes three major contributions. The first contribution involves improving the performance of sentence similarity detection by combining Deep Learning/Transformer-based models with cluster-based approaches. Our proposed approach improves upon state-of-the-art performance on the Financial News Summarisation (FNS) dataset, indicating its potential for improving the quality of text summarisation. The second contribution explores improving graph models by incorporating more features when calculating node weights. Our proposed approach achieves significant performance gains on four benchmark datasets, demonstrating the potential of incorporating additional features for improving text summarisation. Finally, we propose a novel ranking algorithm for unsupervised graph-based text summarisation. Our proposed algorithm is based on graph centrality measures and can be used to identify the most important nodes in a graph-based summary. We demonstrate the effectiveness of our algorithm through analysis and experiments on four benchmark datasets.
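As a rough illustration of what ranking sentences by graph centrality means in this setting, the sketch below builds a sentence-similarity graph and scores each node by its weighted degree. It is not the algorithm proposed in the thesis: the bag-of-words vectors stand in for Transformer-based sentence representations, weighted degree is only one simple centrality measure, and the function names (`bag_of_words`, `rank_sentences`, `summarise`) are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased word counts; an assumed stand-in for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences):
    """Score each sentence (node) by weighted degree centrality.

    Edge weights are pairwise cosine similarities; a node's score is the
    sum of the weights of its edges to every other node.
    """
    vectors = [bag_of_words(s) for s in sentences]
    return [
        sum(cosine(vi, vj) for j, vj in enumerate(vectors) if j != i)
        for i, vi in enumerate(vectors)
    ]


def summarise(sentences, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    document = [
        "Graph models rank sentences.",
        "Graph models score sentences by centrality.",
        "The weather was pleasant.",
    ]
    for sentence in summarise(document, k=2):
        print(sentence)
```

Sentences that share vocabulary with many others receive high scores, while an off-topic sentence with no overlap scores zero and is left out of the summary.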

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s):
Supervisor(s) | Email | ORCID
Lee, Mark | UNSPECIFIED | UNSPECIFIED
Smith, Phillip | UNSPECIFIED | UNSPECIFIED
Madabushi, Harish Tayyar | UNSPECIFIED | UNSPECIFIED
Licence: All rights reserved
College/Faculty: Colleges > College of Engineering & Physical Sciences
School or Department: School of Computer Science
Funders: None/not applicable
Subjects: Q Science > Q Science (General); T Technology > T Technology (General)
URI: http://etheses.bham.ac.uk/id/eprint/14859
