Wan, Jizheng ORCID: https://orcid.org/0000-0002-1069-4582 (2022). Semantic impact - a novel approach for domain concept selection in ontology learning. University of Birmingham. Ph.D.
There is a more recent version of this item available.
Abstract
One of the remaining challenges of Ontology Learning (OL) is its significant dependence on human intervention to decide which of the “learnt” concepts from a training corpus are relevant and/or important to the domain of discourse. Although part of this challenge is deeply rooted in expert knowledge of the application domain, a good relevance/importance measure with which concepts can be semantically judged would be a valuable addition to the OL toolkit. A new measure called “Semantic Impact” (SI) is therefore proposed to bridge explicitly defined formal semantics (in the form of ontologies) and the distributional semantics learnt from large amounts of data.
SI aims to consistently and objectively quantify the semantic importance of a concept by aggregating two different measures: informativeness of a concept and its connectivity (or correlation) with the other concepts. Furthermore, it has been evaluated through two experiments.
The first experiment was conducted within the news domain, using 200 BBC News articles about Donald Trump (published between February 2017 and September 2017) to semantically assess the impact of the concepts identified from the corpus. This experiment learnt, for example, that Date is one of the most important concepts in the news domain, even though it is not included in the BBC Core Concept ontology.
The second experiment was conducted within the biological domain, using 2000 documents from PubMed on “Candida” to determine which diseases have greater semantic impact in the Candida domain knowledge. The results are promising: the proposed system identified that the concept most correlated (connected) with Disease_D003645 (Sudden Death) is Disease_D003643 (Death), without any pre-defined knowledge (or symbolic processing of such labels). Furthermore, a semantic analogy was identified between Disease_D008223 (Lymphoma) and Disease_D008228 (Non-Hodgkin Lymphoma), owing to the close SI values of the two concepts.
In addition, we have systematically evaluated the results from various angles and demonstrated that each component of SI produces good and consistent results. At the macro-level, the overall SI results show a strong clustering trend. At the micro-level, the SI results for both semantically important and non-important concepts are reasonable and reproducible. Moreover, we have compared SI with a contemporary mainstream method to show the advantages of the SI algorithm together with its reproducibility.
| Type of Work: | Thesis (Doctorates > Ph.D.) |
|---|---|
| Award Type: | Doctorates > Ph.D. |
| Supervisor(s): | |
| Licence: | All rights reserved |
| College/Faculty: | Colleges (2008 onwards) > College of Engineering & Physical Sciences |
| School or Department: | School of Computer Science |
| Funders: | None/not applicable |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| URI: | http://etheses.bham.ac.uk/id/eprint/12509 |
Available Versions of this Item
- Semantic impact - a novel approach for domain concept selection in ontology learning. (deposited 21 Feb 2023 12:17) [Currently Displayed]