Automatic text summarisation using linguistic knowledge-based semantics

Mohamed, Muhidin Abdullahi (2016). Automatic text summarisation using linguistic knowledge-based semantics. University of Birmingham. Ph.D.

[img]
Preview
Mohamed16PhD.pdf
PDF - Accepted Version

Download (3MB)

Abstract

Text summarisation is reducing a text document to a short substitute summary. Since the commencement of the field, almost all summarisation research works implemented to this date involve identification and extraction of the most important document/cluster segments, called extraction. This typically involves scoring each document sentence according to a composite scoring function consisting of surface level and semantic features. Enabling machines to analyse text features and understand their meaning potentially requires both text semantic analysis and equipping computers with an external semantic knowledge. This thesis addresses extractive text summarisation by proposing a number of semantic and knowledge-based approaches. The work combines the high-quality semantic information in WordNet, the crowdsourced encyclopaedic knowledge in Wikipedia, and the manually crafted categorial variation in CatVar, to improve the summary quality. Such improvements are accomplished through sentence level morphological analysis and the incorporation of Wikipedia-based named-entity semantic relatedness while using heuristic algorithms. The study also investigates how sentence-level semantic analysis based on semantic role labelling (SRL), leveraged with a background world knowledge, influences sentence textual similarity and text summarisation. The proposed sentence similarity and summarisation methods were evaluated on standard publicly available datasets such as the Microsoft Research Paraphrase Corpus (MSRPC), TREC-9 Question Variants, and the Document Understanding Conference 2002, 2005, 2006 (DUC 2002, DUC 2005, DUC 2006) Corpora. The project also uses Recall-Oriented Understudy for Gisting Evaluation (ROUGE) for the quantitative assessment of the proposed summarisers’ performances. Results of our systems showed their effectiveness as compared to related state-of-the-art summarisation methods and baselines. Of the proposed summarisers, the SRL Wikipedia-based system demonstrated the best performance.

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s):
Supervisor(s)EmailORCID
Oussalah, MouradUNSPECIFIEDUNSPECIFIED
Russell, MartinUNSPECIFIEDUNSPECIFIED
Licence:
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Engineering, Department of Electronic, Electrical and Systems Engineering
Funders: None/not applicable
Subjects: P Language and Literature > P Philology. Linguistics
T Technology > TK Electrical engineering. Electronics Nuclear engineering
URI: http://etheses.bham.ac.uk/id/eprint/6659

Actions

Request a Correction Request a Correction
View Item View Item

Downloads

Downloads per month over past year