eTheses Repository

The automatic extraction of linguistic information from text corpora

Mason, Oliver Jan (2006)
Ph.D. thesis, University of Birmingham.


This study explores the feasibility of fully automated analysis of linguistic data. It identifies a requirement for large-scale investigations, which cannot be carried out manually by a human researcher. Instead, methods from natural language processing are suggested as a way to analyse large amounts of corpus data without any human intervention. Human involvement hinders scalability and introduces a bias which prevents studies from being completely replicable. The fundamental assumption underlying this work is that linguistic analysis must be empirical, and that reliance on existing theories or even descriptive categories should be avoided as far as possible. In this thesis we report the results of a number of case studies investigating various areas of language description: lexis, grammar, and meaning. The aim of these case studies is to see how far we can automate the analysis of different aspects of language, in both data gathering and the subsequent processing of the data. The outcomes of the feasibility studies demonstrate the practicability of such automated analyses.

Type of Work: Ph.D. thesis
Supervisor(s): Barnbrook, Geoff
School/Faculty: Schools (1998 to 2008) > School of Humanities
Keywords: corpus linguistics, computational linguistics, natural language processing
Subjects: PE English; P Philology. Linguistics; QA76 Computer software
Institution: University of Birmingham
ID Code: 116
This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by the Copyright, Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.