Context classification for improved semantic understanding of mathematical formulae

Almomen, Randa (2018). Context classification for improved semantic understanding of mathematical formulae. University of Birmingham. Ph.D.

Almomen18PhD.pdf
PDF - Accepted Version


Abstract

The correct semantic interpretation of mathematical formulae in electronic mathematical documents is an important prerequisite for advanced tasks such as search, accessibility or computational processing. Especially in advanced maths, the meaning of characters and symbols is highly domain dependent, and only limited information can be gained from considering individual formulae and their structures. Although many approaches have been proposed for the semantic interpretation of mathematical formulae, most of them rely on the limited semantics available from maths representation languages, whereas very few use the maths context as a source of information. This thesis presents a novel approach for the principled extraction of semantic information about mathematical formulae from their context in documents. We utilised different supervised machine learning (SML) techniques (namely Linear-Chain Conditional Random Fields (CRF), Maximum Entropy (MaxEnt) and Maximum Entropy Markov Models (MEMM), combined with the Rprop- and Rprop+ optimisation algorithms) to detect definitions of simple and compound mathematical expressions, thereby deriving their meaning. The learning algorithms require an annotated corpus, the development of which is one of the contributions of this thesis. The corpus was developed using a novel approach to extracting the desired maths expressions and sub-formulae, and was manually annotated by two independent annotators, with agreement assessed using a standard measure of inter-annotator agreement. The thesis further developed a new approach to feature representation, based on definition templates extracted from maths documents, to overcome the limitations of conventional window-based features. All contributions were evaluated using various techniques, including the common metrics of recall, precision and F-measure (their harmonic mean).
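
As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall and F-measure could be computed for a single label over a sequence of token-level predictions. The label names ("DEF" for a token inside a definition, "O" for any other token), the function name and the toy data are assumptions made for this example only; they are not taken from the thesis.

```python
def precision_recall_f1(gold, predicted, positive="DEF"):
    """Return (precision, recall, F1) for one positive label.

    `gold` and `predicted` are equal-length sequences of labels.
    The label "DEF" is a hypothetical tag used only in this sketch.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted positive but actually something else.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: actually positive but predicted something else.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy data, invented for illustration.
gold = ["DEF", "O", "DEF", "O", "DEF"]
pred = ["DEF", "DEF", "O", "O", "DEF"]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case two of the three predicted "DEF" tokens are correct and two of the three gold "DEF" tokens are found, so precision, recall and F-measure all come out at two thirds.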

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s): Sorge, Volker (Email: UNSPECIFIED, ORCID: UNSPECIFIED)
Licence:
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Computer Science
Funders: None/not applicable
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
URI: http://etheses.bham.ac.uk/id/eprint/8611
