Eye-speech affect detection for automatic speech recognition

Alhargan, Ashwaq H (2019). Eye-speech affect detection for automatic speech recognition. University of Birmingham. Ph.D.

Alhargan2019PhD.pdf
Text - Accepted Version
Restricted to Repository staff only until 1 January 2025.
Available under License All rights reserved.


Human-computer interaction (HCI) is becoming increasingly natural. Machines are now able to recognise faces, to understand individual speech and to converse like a human would. However, they are still far from exhibiting humanlike intelligence. Affects play an important role in interaction, so understanding and responding to them are necessary steps towards more natural HCI.
This thesis reports the development and evaluation of affect detection systems suitable for use in real-life HCI applications (e.g. speech-enabled interfaces such as Alexa) using speech and eye movement modalities. A corpus of spontaneous affective responses in these modalities was collected within an interactive virtual gaming environment designed to elicit different affective states corresponding to the arousal and valence dimensions. A support vector machine was employed as a classifier to detect the affects elicited from both modalities. Several features of eye movement, namely pupillary response, fixation, saccade and blinking, are assessed for use in affect detection, and new pupil response features based on the Hilbert transform are proposed. Acoustic and lexical characteristics of speech are also investigated. The detection results suggest that eye movement is superior to speech, with pupillary response features based on the Hilbert transform yielding superior performance on the arousal dimension, whereas saccade and fixation features perform better on the valence dimension. The improvement made by combining information from the eye movement and speech modalities suggests that the two modalities carry complementary information for affect detection and that both warrant incorporation where feasible. An automatic speech recognition (ASR) application integrating affective information from both modalities for affect robustness was investigated. The best performing system uses affective information from eye movements, significantly reducing word error rates compared to the speech modality alone. This work highlights the potential of eye movements as an additional modality to speech to enhance the accuracy of affect detection and facilitate the development of robust affect-aware speech-enabled interfaces.
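As a rough illustration of what a Hilbert-transform-based pupil feature could look like, the sketch below computes the analytic signal of a pupil diameter trace and summarises its amplitude envelope. It is not the feature set proposed in the thesis, which this record does not specify. The function names, the choice of envelope mean and standard deviation as features, and the synthetic trace are all assumptions made for illustration only.

```python
import cmath
import math
from statistics import mean, pstdev


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short example signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Analytic signal x + j*H{x}, built by zeroing the negative-frequency bins."""
    n = len(x)
    spectrum = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0                      # keep the DC bin once
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin once
        for k in range(1, n // 2):
            gain[k] = 2.0              # double the positive frequencies
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def hilbert_envelope_features(pupil_diameter):
    """Hypothetical features: mean and spread of the amplitude envelope.

    The baseline (mean diameter) is removed first so that the envelope
    reflects fluctuations in pupil size rather than its absolute level.
    """
    baseline = mean(pupil_diameter)
    centred = [v - baseline for v in pupil_diameter]
    envelope = [abs(z) for z in analytic_signal(centred)]
    return {"env_mean": mean(envelope), "env_std": pstdev(envelope)}


# Synthetic trace (assumed values): 4.0 baseline with a 0.5 amplitude oscillation.
trace = [4.0 + 0.5 * math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
print(hilbert_envelope_features(trace))
```

In a pipeline like the one described above, a feature vector of this kind would be passed to the classifier together with fixation, saccade and blink features. A production version would more likely rely on an FFT-based routine such as `scipy.signal.hilbert` than on the naive transform used here.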

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Licence: All rights reserved
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Engineering, Department of Electronic, Electrical and Systems Engineering
Funders: None/not applicable
Subjects: Q Science > Q Science (General); T Technology > T Technology (General)
URI: http://etheses.bham.ac.uk/id/eprint/9405

