STTR Phase I: Small Footprint Speech Synthesis

Award Information

Agency:
National Science Foundation
Branch:
N/A
Award ID:
74536
Program Year/Program:
2005 / STTR
Agency Tracking Number:
0441125
Solicitation Year:
N/A
Solicitation Topic Code:
N/A
Solicitation Number:
N/A
Small Business Information
BIOSPEECH, INC.
940 UPPER DEVON LANE, LAKE OSWEGO, OR 97034
Woman-Owned: No
Minority-Owned: No
HUBZone-Owned: No
 
Phase 1
Fiscal Year: 2005
Title: STTR Phase I: Small Footprint Speech Synthesis
Agency: NSF
Contract: 041125
Award Amount: $99,751.00
 

Abstract:

This Small Business Technology Transfer Phase I project aims to develop and implement a new algorithm for text-to-speech synthesis (TTS) that will lead to (i) dramatic decreases in disk and memory requirements at a given speech quality level and (ii) minimization of the amount of voice recordings needed to create a new synthetic voice. Most current TTS systems operate by concatenating segments of recorded speech (acoustic units). A challenge for TTS is coarticulation: the dependency of the acoustic manifestations of a phoneme on its neighbors. Current TTS systems use multi-phone acoustic units such as diphones, which preserve coarticulatory patterns naturally present in speech. However, this approach requires a large amount of recordings and produces systems with large footprints. Biospeech proposes a uniphone approach that addresses coarticulation with an explicit model. The method uses complex spectral vectors (basis vectors) representing brief segments of speech inside single phonemes, and decomposes each into two components: a formant vector and a spectral balance vector. To generate speech, the formant and spectral balance vectors derived from the basis vectors of successive phonemes are subjected to separate (and hence generally asynchronous) interpolation operations using time-varying weights; the resulting formant and spectral balance trajectories are recombined into a trajectory in complex spectral space; finally, this trajectory is converted into output speech with the inverse Fourier transform. Asynchronicity is necessitated by the quasi-independence of the articulators underlying different spectral features (e.g., frication, formant frequencies).

The proposed work has several broader implications. First, it bears on other speech technologies, including Automatic Speech Recognition (ASR). Current ASR technologies address coarticulation by using multi-phone units, typically triphones. The number of triphones in English is over 70,000, so such systems require a large amount of training recordings. The proposed model could dramatically reduce the amount of recordings required for system training. Second, TTS has generally recognized societal benefits for universal access, education, and information access by voice. For example, TTS-based augmentative devices are available for individuals who have lost their voice, and reading machines for the blind have been available for several decades. Third, the approach will make higher-quality TTS more available on smaller devices. For example, voice-based caller ID on low-end mobile telephones is currently not possible due to memory limitations. Fourth, it enables voice adaptation with a minimum of recordings. This will enable building personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds, or for individuals who are about to undergo surgery that will irreversibly alter their speech. The method proposed by Biospeech requires recordings of valid samples of each of fewer than 50 phonemes instead of each of 2,000 or more diphones.
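
The generation steps described in the abstract (decompose, interpolate the two components on separate schedules, recombine, inverse transform) can be illustrated with a minimal sketch. It is not Biospeech's implementation, and the abstract does not specify how the decomposition or the weights are computed. Everything concrete here is an assumption made for illustration: the function names `decompose`, `sigmoid_weights`, and `transition` are hypothetical; each basis vector is simplified to a real log-magnitude spectrum rather than a complex spectral vector; the spectral balance vector is taken to be a straight-line fit across frequency and the formant vector the residual; the weights are sigmoids; and frames are resynthesized with zero phase.

```python
import numpy as np


def decompose(log_mag):
    """Hypothetical split of one log-magnitude spectrum into two components.

    Assumption: spectral balance = straight-line fit over frequency bins,
    formant vector = what remains after removing that line.
    """
    bins = np.arange(log_mag.size)
    slope, intercept = np.polyfit(bins, log_mag, 1)
    balance = slope * bins + intercept
    formant = log_mag - balance
    return formant, balance


def sigmoid_weights(n_frames, midpoint, steepness=0.5):
    """Time-varying weights rising from about 0 to about 1 around `midpoint`."""
    t = np.arange(n_frames)
    return 1.0 / (1.0 + np.exp(-steepness * (t - midpoint)))


def transition(log_mag_a, log_mag_b, n_frames, formant_mid, balance_mid):
    """Interpolate from phoneme A to phoneme B with separate schedules.

    Different midpoints for the two components make the interpolation
    asynchronous; equal midpoints reduce it to ordinary cross-fading.
    """
    formant_a, balance_a = decompose(log_mag_a)
    formant_b, balance_b = decompose(log_mag_b)

    w_formant = sigmoid_weights(n_frames, formant_mid)[:, None]
    w_balance = sigmoid_weights(n_frames, balance_mid)[:, None]

    formant_traj = (1.0 - w_formant) * formant_a + w_formant * formant_b
    balance_traj = (1.0 - w_balance) * balance_a + w_balance * balance_b

    # Recombine the two trajectories (addition in the log domain).
    log_traj = formant_traj + balance_traj

    # Assumption: zero phase. Each row becomes one time-domain frame.
    frames = np.fft.irfft(np.exp(log_traj), axis=1)
    return log_traj, frames
```

For example, with two 129-bin spectra `a` and `b`, `transition(a, b, 20, formant_mid=8.0, balance_mid=12.0)` returns a 20 x 129 spectral trajectory and 20 frames of 256 samples each. A working synthesizer would also need phase handling and overlap-add of the frames, which this sketch leaves out.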

Principal Investigator:

Alexander Kain
Dr.
5033299604
kain@biospeech.com

Business Contact:

Lois Black
5035342891
lmblack@cslu.ogi.edu
Small Business Information at Submission:

Biospeech Incorporated
940 Upper Devon Lane Lake Oswego, OR 97034

EIN/Tax ID:
DUNS: N/A
Number of Employees:
Woman-Owned: No
Minority-Owned: No
HUBZone-Owned: No
Research Institution Information:
Oregon Health & Science University
20000 N.W. Walker Road
Beaverton, OR 97006
Contact: Jan P. van Santen
Contact Phone: (503) 748-1138
RI Type: Nonprofit college or university