Mohamed, Muhidin Abdullahi (2016). Automatic text summarisation using linguistic knowledge-based semantics. University of Birmingham. Ph.D.
|
Mohamed16PhD.pdf
PDF - Accepted Version Download (3MB) |
Abstract
Text summarisation is reducing a text document to a short substitute summary. Since the commencement of the field, almost all summarisation research works implemented to this date involve identification and extraction of the most important document/cluster segments, called extraction. This typically involves scoring each document sentence according to a composite scoring function consisting of surface level and semantic features. Enabling machines to analyse text features and understand their meaning potentially requires both text semantic analysis and equipping computers with an external semantic knowledge. This thesis addresses extractive text summarisation by proposing a number of semantic and knowledge-based approaches. The work combines the high-quality semantic information in WordNet, the crowdsourced encyclopaedic knowledge in Wikipedia, and the manually crafted categorial variation in CatVar, to improve the summary quality. Such improvements are accomplished through sentence level morphological analysis and the incorporation of Wikipedia-based named-entity semantic relatedness while using heuristic algorithms. The study also investigates how sentence-level semantic analysis based on semantic role labelling (SRL), leveraged with a background world knowledge, influences sentence textual similarity and text summarisation. The proposed sentence similarity and summarisation methods were evaluated on standard publicly available datasets such as the Microsoft Research Paraphrase Corpus (MSRPC), TREC-9 Question Variants, and the Document Understanding Conference 2002, 2005, 2006 (DUC 2002, DUC 2005, DUC 2006) Corpora. The project also uses Recall-Oriented Understudy for Gisting Evaluation (ROUGE) for the quantitative assessment of the proposed summarisers’ performances. Results of our systems showed their effectiveness as compared to related state-of-the-art summarisation methods and baselines. Of the proposed summarisers, the SRL Wikipedia-based system demonstrated the best performance.
Type of Work: | Thesis (Doctorates > Ph.D.) | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Award Type: | Doctorates > Ph.D. | |||||||||
Supervisor(s): |
|
|||||||||
Licence: | ||||||||||
College/Faculty: | Colleges (2008 onwards) > College of Engineering & Physical Sciences | |||||||||
School or Department: | School of Engineering, Department of Electronic, Electrical and Systems Engineering | |||||||||
Funders: | None/not applicable | |||||||||
Subjects: | P Language and Literature > P Philology. Linguistics T Technology > TK Electrical engineering. Electronics Nuclear engineering |
|||||||||
URI: | http://etheses.bham.ac.uk/id/eprint/6659 |
Actions
Request a Correction | |
View Item |
Downloads
Downloads per month over past year