eTheses Repository

Speech recognition in programmable logic

Melnikoff, Stephen Jonathan (2003)
Ph.D. thesis, University of Birmingham.

Abstract

Speech recognition is a computationally demanding task, especially the decoding part, which converts pre-processed speech data into words or sub-word units, and which incorporates Viterbi decoding and Gaussian distribution calculations. In this thesis, this part of the recognition process is implemented in programmable logic, specifically, on a field-programmable gate array (FPGA). Relevant background material about speech recognition is presented, along with a critical review of previous hardware implementations. Designs for a decoder suitable for implementation in hardware are then described. These include details of how multiple speech files can be processed in parallel, and an original implementation of an algorithm for summing Gaussian mixture components in the log domain. These designs are then implemented on an FPGA. An assessment is made as to how appropriate it is to use hardware for speech recognition. It is concluded that while certain parts of the recognition algorithm are not well suited to this medium, much of it is, and so an efficient implementation is possible. Also presented is an original analysis of the requirements of speech recognition for hardware and software, which relates the parameters that dictate the complexity of the system to processing speed and bandwidth. The FPGA implementations are compared to equivalent software, written for that purpose. For a contemporary FPGA and processor, the FPGA outperforms the software by an order of magnitude.
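
The abstract mentions summing Gaussian mixture components in the log domain. The thesis's own algorithm is not described on this page, so the following is only a minimal software sketch of the standard identity log(a + b) = max(log a, log b) + log(1 + exp(-|log a - log b|)), assuming diagonal-covariance Gaussians. The function names (`log_add`, `log_gaussian_diag`, `log_mixture`) are hypothetical and chosen for illustration; they are not taken from the thesis.

```python
import math


def log_add(log_a, log_b):
    """Return log(exp(log_a) + exp(log_b)) while staying in the log domain."""
    # Keep the larger term in log_a so the exponent below is never positive.
    if log_a < log_b:
        log_a, log_b = log_b, log_a
    # Adding log(0) = -inf leaves the other operand unchanged.
    if log_b == -math.inf:
        return log_a
    return log_a + math.log1p(math.exp(log_b - log_a))


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def log_mixture(x, weights, means, variances):
    """Log likelihood of x under a Gaussian mixture, accumulated with log_add."""
    acc = -math.inf  # log(0): the empty sum
    for w, m, v in zip(weights, means, variances):
        acc = log_add(acc, math.log(w) + log_gaussian_diag(x, m, v))
    return acc
```

The correction term `log1p(exp(d))` depends only on the difference `d` between the two operands, so a fixed-point design could plausibly replace it with a small lookup table. Whether the thesis does so is an assumption not confirmed by this record.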

Type of Work: Ph.D. thesis
Supervisor(s): Quigley, Steven Francis
School/Faculty: Schools (1998 to 2008) > School of Engineering
Department: Electronic, Electrical and Computer Engineering
Additional Information: Publications in the Appendix are available at:
http://eprints.bham.ac.uk/23/
http://eprints.bham.ac.uk/24/
http://eprints.bham.ac.uk/25/
http://eprints.bham.ac.uk/26/
http://eprints.bham.ac.uk/27/
http://eprints.bham.ac.uk/28/
Keywords: Speech recognition, programmable logic, FPGA
Subjects: TK Electrical engineering. Electronics. Nuclear engineering; QA75 Electronic computers. Computer science
Institution: University of Birmingham
ID Code: 16
This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.