Automatic evaluation of speech quality

Award Information
Agency:
Department of Health and Human Services
Branch
n/a
Amount:
$99,973.00
Award Year:
2004
Program:
SBIR
Phase:
Phase I
Contract:
1R43DC007255-01
Award Id:
71467
Agency Tracking Number:
DC007255
Solicitation Year:
n/a
Solicitation Topic Code:
n/a
Solicitation Number:
n/a
Small Business Information
COMMUNICATION DISORDERS TECHNOLOGY (Currently COMMUNICATION DISORDERS TECHNOLOGY, INC)
COMMUNICATION DISORDERS TECHNLGY, 501 N MORTON ST, STE 215, BLOOMINGTON, IN, 47404
Hubzone Owned:
N
Minority Owned:
N
Woman Owned:
N
Duns:
n/a
Principal Investigator:
CHARLES WATSON
(812) 855-0710
WATSON@INDIANA.EDU
Business Contact:
(812) 336-1766
Research Institution:
n/a
Abstract
DESCRIPTION (provided by applicant): Tests of several different approaches to the automatic evaluation of the quality of speech segments are proposed. Previous systems for use in pronunciation training have typically employed either automatic speech-recognition (ASR) technology, or have used templates based on a limited number of utterances rated as excellent by L1 listeners (and sometimes also employing a second set of utterances containing a common pronunciation error). Here speech-processing technologies (HMM's and ANN's) will be developed specifically for use as evaluation systems (not recognition systems) to predict quality and locus-of-error judgments assigned by listeners. Termed the "evaluation-of-single-words" (ESW) approach, the special feature of these systems will derive from the training tokens employed in their development: multiple recordings of a single word made by groups of native and non-native talkers. Sixty talkers will be native speakers of Arabic, whose intelligibility in English ranges from poor to near-perfect, and 60 talkers will be native speakers of middle-American English. There will be twelve words divided between one, two, and three syllables. Ten productions of each word will be recorded by each talker, yielding 14,400 tokens. Each token will be rated by listening juries for pronunciation quality, and the tokens will also be categorized into perceptual clusters, using MDS and cluster-analysis techniques. At least two computer-based evaluation systems (HMM and ANN) will be trained for each individual word, with the goals of predicting overall pronunciation quality and identifying specific commonly occurring pronunciation errors. It is expected that these word-specific systems, each representing a discrete "evaluator" custom-built for an individual word, will approach the maximum accuracy that can be expected of this class of processors. If successful, the ESW approach may have a broad range of applications in pronunciation training, identification of a speaker's L1, foreign-language instruction, and other non-lexical applications. However, our specific goal is the development of systems that can provide informative feedback during automated pronunciation training. In ASR applications, the goal is to respond the same way to a word, no matter how it is pronounced. The goal of an ESW system is to respond differentially to pronunciation variants. This distinction between ASR and ESW is central to the development of successful evaluation systems as it dictates different modeling constraints.

* information listed above is at the time of submission.

Agency Micro-sites


SBA logo

Department of Agriculture logo

Department of Commerce logo

Department of Defense logo

Department of Education logo

Department of Energy logo

Department of Health and Human Services logo

Department of Homeland Security logo

Department of Transportation logo

Enviromental Protection Agency logo

National Aeronautics and Space Administration logo

National Science Foundation logo
US Flag An Official Website of the United States Government