Bawakid, Abdullah (2011)
Ph.D. thesis, University of Birmingham.
When humans summarize a document, they usually read the text first, understand it, and then write a summary. These processes require at least a basic level of background knowledge on the reader's part, the least of which is the natural language the text is written in. In this thesis, an attempt is made to bridge this gap in machine understanding by proposing a framework backed by knowledge repositories constructed by humans and containing real human concepts. I use WordNet, a hierarchically structured repository that was created by linguistic experts and is rich in explicitly defined lexical relations. With WordNet, algorithms for computing the semantic similarity between terms were proposed and implemented. These algorithms proved especially useful when applied to Automatic Document Summarization, as the obtained evaluation results show.
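To illustrate the kind of WordNet-based measure the abstract refers to, the following is a minimal, self-contained sketch of classic path-based similarity over a toy hypernym hierarchy. The hierarchy and function names here are illustrative stand-ins, not the thesis's actual algorithms or WordNet itself:

```python
# Toy hypernym hierarchy (child -> parent), a tiny stand-in for
# WordNet's noun taxonomy.
HYPERNYMS = {
    "dog": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal",
    "mammal": "animal", "animal": "entity",
}

def path_to_root(term):
    """Chain of hypernyms from a term up to the taxonomy root."""
    path = [term]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path

def path_similarity(a, b):
    """1 / (1 + shortest path length via the lowest common hypernym),
    mirroring the classic WordNet path measure."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth_a = {t: i for i, t in enumerate(pa)}
    for j, t in enumerate(pb):
        if t in depth_a:  # first shared ancestor = lowest common hypernym
            return 1.0 / (1 + depth_a[t] + j)
    return 0.0  # no shared ancestor
```

For example, `path_similarity("dog", "cat")` traverses dog → canine → mammal ← feline ← cat, a path of length 4, giving 1/5 = 0.2; identical terms score 1.0.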
I also use Wikipedia, the largest encyclopedia to date. Because of its openness and structure, three problems had to be addressed in this thesis: extracting knowledge and features from Wikipedia, enriching the representation of text documents with the extracted features, and using them in Automatic Summarization. When the feature extractor was applied to a summarization system, competitive evaluation results were obtained.
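The enrichment step described above can be sketched as mapping document terms to encyclopedia concepts and appending those concepts to the document's feature set. The concept index below is a hypothetical three-entry stand-in for the extracted Wikipedia features, not the thesis's actual extractor:

```python
# Hypothetical mini concept index: surface terms -> article concepts.
# The real system extracts such features from the full Wikipedia.
CONCEPT_INDEX = {
    "summarization": ["Automatic_summarization"],
    "wordnet": ["WordNet", "Lexical_database"],
    "wikipedia": ["Wikipedia", "Online_encyclopedia"],
}

def enrich(tokens):
    """Augment a document's bag of words with any matched concept features."""
    features = list(tokens)
    for tok in tokens:
        features.extend(CONCEPT_INDEX.get(tok.lower(), []))
    return features
```

A call such as `enrich(["WordNet", "helps", "summarization"])` keeps the original tokens and adds the matched concepts, so downstream similarity or summarization components can compare documents at the concept level rather than on surface words alone.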