The selective use of gaze in automatic speech recognition

Shen, Ao (2014). The selective use of gaze in automatic speech recognition. University of Birmingham. Ph.D.

[img]
Preview
Shen14PhD.pdf
PDF - Accepted Version

Download (3MB)

Abstract

The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared to in laboratory assessments. Being a major source of interference, acoustic noise affects speech intelligibility during the ASR process. There are two main problems caused by the acoustic noise. The first is the speech signal contamination. The second is the speakers' vocal and non-vocal behavioural changes. These phenomena elicit mismatch between the ASR training and recognition conditions, which leads to considerable performance degradation. To improve noise-robustness, exploiting prior knowledge of the acoustic noise in speech enhancement, feature extraction and recognition models are popular approaches. An alternative approach presented in this thesis is to introduce eye gaze as an extra modality. Eye gaze behaviours have roles in interaction and contain information about cognition and visual attention; not all behaviours are relevant to speech. Therefore, gaze behaviours are used selectively to improve ASR performance. This is achieved by inference procedures using noise-dependant models of gaze behaviours and their temporal and semantic relationship with speech. `Selective gaze-contingent ASR' systems are proposed and evaluated on a corpus of eye movement and related speech in different clean, noisy environments. The best performing systems utilise both acoustic and language model adaptation.

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s):
Supervisor(s)EmailORCID
Cooke, NeilUNSPECIFIEDUNSPECIFIED
Licence:
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Engineering, Department of Electronic, Electrical and Systems Engineering
Funders: None/not applicable
Subjects: P Language and Literature > P Philology. Linguistics
T Technology > TK Electrical engineering. Electronics Nuclear engineering
URI: http://etheses.bham.ac.uk/id/eprint/5202

Actions

Request a Correction Request a Correction
View Item View Item

Downloads

Downloads per month over past year