Automatic evaluation of speech quality

Award Information

Agency:
Department of Health and Human Services
Branch:
N/A
Award ID:
71467
Program Year/Program:
2004 / SBIR
Agency Tracking Number:
DC007255
Solicitation Year:
N/A
Solicitation Topic Code:
N/A
Solicitation Number:
N/A
Small Business Information
COMMUNICATION DISORDERS TECHNOLOGY, INC
3100 John Hinkle Pl. BLOOMINGTON, IN 47408
Woman-Owned: No
Minority-Owned: No
HUBZone-Owned: No
 
Phase 1
Fiscal Year: 2004
Title: Automatic evaluation of speech quality
Agency: HHS
Contract: 1R43DC007255-01
Award Amount: $99,973.00
 

Abstract:

DESCRIPTION (provided by applicant): Tests of several different approaches to the automatic evaluation of the quality of speech segments are proposed. Previous systems for use in pronunciation training have typically employed either automatic speech-recognition (ASR) technology, or have used templates based on a limited number of utterances rated as excellent by L1 listeners (and sometimes also employing a second set of utterances containing a common pronunciation error). Here, speech-processing technologies (HMMs and ANNs) will be developed specifically for use as evaluation systems (not recognition systems) to predict quality and locus-of-error judgments assigned by listeners. Termed the "evaluation-of-single-words" (ESW) approach, the special feature of these systems will derive from the training tokens employed in their development: multiple recordings of a single word made by groups of native and non-native talkers. Sixty talkers will be native speakers of Arabic, whose intelligibility in English ranges from poor to near-perfect, and 60 talkers will be native speakers of middle-American English. There will be twelve words divided among one, two, and three syllables. Ten productions of each word will be recorded by each talker, yielding 14,400 tokens. Each token will be rated by listening juries for pronunciation quality, and the tokens will also be categorized into perceptual clusters, using MDS and cluster-analysis techniques. At least two computer-based evaluation systems (HMM and ANN) will be trained for each individual word, with the goals of predicting overall pronunciation quality and identifying specific commonly occurring pronunciation errors. It is expected that these word-specific systems, each representing a discrete "evaluator" custom-built for an individual word, will approach the maximum accuracy that can be expected of this class of processors.
If successful, the ESW approach may have a broad range of applications in pronunciation training, identification of a speaker's L1, foreign-language instruction, and other non-lexical applications. However, our specific goal is the development of systems that can provide informative feedback during automated pronunciation training. In ASR applications, the goal is to respond the same way to a word, no matter how it is pronounced. The goal of an ESW system is to respond differentially to pronunciation variants. This distinction between ASR and ESW is central to the development of successful evaluation systems as it dictates different modeling constraints.
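The corpus size quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the award record: the constant and function names are hypothetical, while the figures (60 + 60 talkers, 12 words, 10 productions per word) are the ones stated in the abstract.

```python
# Figures taken from the abstract; all names here are illustrative.
ARABIC_L1_TALKERS = 60
ENGLISH_L1_TALKERS = 60
WORDS = 12
PRODUCTIONS_PER_WORD = 10


def token_count(talkers: int, words: int, productions: int) -> int:
    """Total recordings when every talker produces every word `productions` times."""
    return talkers * words * productions


total_talkers = ARABIC_L1_TALKERS + ENGLISH_L1_TALKERS
total_tokens = token_count(total_talkers, WORDS, PRODUCTIONS_PER_WORD)

# Each word-specific evaluator sees only the tokens recorded for its own word.
tokens_per_word = total_tokens // WORDS

print(total_tokens)     # 14400
print(tokens_per_word)  # 1200
```

Under these figures, each word-specific evaluator would have 1,200 recorded tokens of its word available, before any split into training and test sets (a split the abstract does not specify).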

Principal Investigator:

Charles S. Watson
8128550710
WATSON@INDIANA.EDU

Business Contact:


8123361766
Small Business Information at Submission:

COMMUNICATION DISORDERS TECHNOLOGY
501 N MORTON ST, STE 215 BLOOMINGTON, IN 47404

EIN/Tax ID: 351785272
DUNS: N/A
Number of Employees: N/A
Woman-Owned: No
Minority-Owned: No
HUBZone-Owned: No