Automatic documents summarization using ontology based methodologies

Bawakid, Abdullah (2011). Automatic documents summarization using ontology based methodologies. University of Birmingham. Ph.D.

Abstract

When humans summarize a document, they usually read the text first, understand it, and then attempt to write a summary. These processes require at least some basic background knowledge on the part of the reader, the least of which is knowledge of the natural language in which the text is written. In this thesis, an attempt is made to bridge the gap in machine understanding by proposing a framework backed by knowledge repositories that were constructed by humans and contain real human concepts. I use WordNet, a hierarchically structured repository that was created by linguistic experts and is rich in explicitly defined lexical relations. Using WordNet, algorithms for computing the semantic similarity between terms were proposed and implemented. These algorithms proved especially useful when applied to Automatic Documents Summarization, as shown by the evaluation results obtained.

I also use Wikipedia, the largest encyclopedia to date. Because of its openness and structure, three problems had to be handled in this thesis: extracting knowledge and features from Wikipedia, enriching the representation of text documents with the extracted features, and using those features in Automatic Summarization. When the feature extractor was applied to a summarization system, competitive evaluation results were obtained.

Type of Work:

Thesis (Doctorates > Ph.D.)

Award Type:

Doctorates > Ph.D.

Supervisor(s):