A Novel Speech Separation Approach for Enhanced Speaker Identification and Speech Recognition

Award Information
Agency: Department of Defense
Branch: Navy
Contract: N00014-08-C-0677
Agency Tracking Number: N074-039-0242
Amount: $500,000.00
Phase: Phase II
Program: STTR
Award Year: 2008
Solicitation Year: 2007
Solicitation Topic Code: N07-T039
Solicitation Number: N/A
Small Business Information
13619 Valley Oak Circle, ROCKVILLE, MD, 20850
DUNS: 620282256
HUBZone Owned: N
Woman Owned: Y
Socially and Economically Disadvantaged: Y
Principal Investigator
Name: Chiman Kwan
Title: Chief Technology Officer
Phone: (240) 505-2641
Business Contact
Name: Chihwa Yung
Title: President
Phone: (301) 315-2322
Email: chihwa.yung@signalpro.net
Research Institution
 Carol Espy-Wilson
 Department of Electrical & Com
2405 A.V. Williams Bldg.
College Park, MD, 20742
 (301) 405-7411
 Nonprofit college or university
Improving speaker identification, voiceprint matching, and speech recognition in noisy, cluttered environments (the multi-speaker "cocktail party" setting) requires an integrated approach. In this project, we propose a novel approach that addresses this challenging problem in a unified framework. First, we propose to acquire speech signals with one or more microphones. The single-microphone case is more challenging under noisy conditions. With multiple microphones, much better direction-of-arrival (DOA) estimation and background noise suppression are possible, so the collected speech will have a high signal-to-noise ratio (SNR). Second, we propose state-of-the-art speech separation techniques to separate voices in both the single-microphone and multiple-microphone cases. Third, we propose to apply the latest speech enhancement algorithms, including Minimum Mean Square Error (MMSE), Modified Phase Opponency (MPO), and possibly other methods, to remove any residual noise in the separated voice streams. Fourth, robust features based on Mel-frequency cepstral coefficients (MFCCs) will be extracted from the enhanced speech. Finally, Gaussian mixture models (GMMs) and hidden Markov models (HMMs) will be used to identify the speaker and recognize the speech. The dynamic time warping (DTW) technique will be used for voiceprint verification.
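
The voiceprint verification step named in the abstract relies on dynamic time warping. As a rough illustration only, the following is a minimal sketch of the textbook DTW recurrence, not the project's actual implementation. The function name `dtw_distance`, the Euclidean frame distance, and the toy one-dimensional "feature" frames are all assumptions made for this example; in practice the frames would presumably be MFCC vectors.

```python
def dtw_distance(a, b):
    """Return the cumulative DTW alignment cost between two sequences.

    Each sequence is a list of equal-length feature vectors (lists of floats).
    The Euclidean per-frame distance is an assumption for illustration.
    """
    def frame_dist(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical toy data: an enrolled template, a time-stretched copy of it,
# and an unrelated sequence standing in for a different speaker.
enrolled = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [2.0], [2.0]]
other = [[5.0], [5.0], [5.0]]

same_score = dtw_distance(enrolled, stretched)
other_score = dtw_distance(enrolled, other)
```

Because DTW allows frames to repeat, the time-stretched copy aligns to the template at a lower cost than the unrelated sequence does. A verification system would compare such a score against a threshold, whose value is not specified in the source.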

* Information listed above is at the time of submission. *
