eTheses Repository

A linear grammar approach for the analysis of mathematical documents

Baker, Josef B. (2012)
Ph.D. thesis, University of Birmingham.

Abstract

Many approaches have been proposed for the recognition of mathematical formulae, traditionally using the results of optical character recognition over scanned documents. However, optical character recognition generally performs poorly when presented with mathematics, making it difficult to parse formulae accurately. With the rapidly increasing number of natively digital documents, an alternative to optical character recognition is now available: analysing the files directly rather than images of them.
In this thesis, we explore such a method, analysing files in the ubiquitous Portable Document Format (PDF) directly and combining this analysis with image analysis to produce the information necessary for the analysis of mathematical formulae and documents.
We also revisit a method proposed in the 1960s for parsing handwritten mathematics: an extremely efficient yet impractical approach, owing to its reliance on perfect input and precise character positioning. We heavily modify and extend this method, removing many of its restrictions, and use it in conjunction with the perfect input from the PDF analysis, yielding high-quality results that compare favourably with the leading scientific document analysis system.

Type of Work: Ph.D. thesis.
Supervisor(s): Sorge, Volker
School/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
Department: School of Computer Science
Subjects: QA Mathematics; QA75 Electronic computers. Computer science; QA76 Computer software; T Technology (General)
Institution: University of Birmingham
ID Code: 3377
This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.