A linear grammar approach for the analysis of mathematical documents

Baker, Josef B. (2012). A linear grammar approach for the analysis of mathematical documents. University of Birmingham. Ph.D.




Many approaches have been proposed for the recognition of mathematical formulae, traditionally using the results of optical character recognition over scanned documents. However, optical character recognition generally performs poorly on mathematics, making it difficult to parse formulae accurately. With the rapidly increasing number of natively digital documents, an alternative to optical character recognition is now available: analysing the files directly rather than images of them.
In this thesis, we explore such a method, analysing files in the ubiquitous Portable Document Format (PDF) directly and combining this with image analysis to produce the information needed for the analysis of mathematical formulae and documents.
We also revisit a method proposed in the 1960s for parsing handwritten mathematics, an extremely efficient yet impractical approach because of its reliance on perfect input and precise character positioning. We heavily modify and extend this method, removing many of its restrictions, and use it in conjunction with the perfect input from the PDF analysis, yielding high-quality results that compare favourably with the leading scientific document analysis system.

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Computer Science
Funders: None/not applicable
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
T Technology > T Technology (General)
URI: http://etheses.bham.ac.uk/id/eprint/3377

