A Novel Speech Separation Approach for Enhanced Speaker Identification and Speech Recognition

Award Information
Agency:
Department of Defense
Branch:
Navy
Amount:
$500,000.00
Award Year:
2008
Program:
STTR
Phase:
Phase II
Contract:
N00014-08-C-0677
Award Id:
83488
Agency Tracking Number:
N074-039-0242
Solicitation Year:
n/a
Solicitation Topic Code:
n/a
Solicitation Number:
n/a
Small Business Information
13619 Valley Oak Circle, ROCKVILLE, MD, 20850
Hubzone Owned:
N
Minority Owned:
N
Woman Owned:
N
Duns:
620282256
Principal Investigator:
Chiman Kwan
Chief Technology Officer
(240) 505-2641
chiman.kwan@signalpro.net
Business Contact:
Chihwa Yung
President
(301) 315-2322
chihwa.yung@signalpro.net
Research Institute:
U. MARYLAND
Carol Espy-Wilson
Department of Electrical & Com
2405 A.V. Williams Bldg.
College Park, MD, 20742
(301) 405-7411
Nonprofit college or university
Abstract
Improving speaker identification, voiceprint matching, and speech recognition in noisy, cluttered environments (such as a multiple-speaker cocktail party) requires an integrated approach. In this project, we propose a novel approach that addresses this challenging problem in a unified framework. First, we propose to acquire speech signals with one or more microphones. The single-microphone case is more challenging under noisy conditions; with multiple microphones, much better direction-of-arrival (DOA) estimation and background noise suppression are possible, so the collected speech will have a high signal-to-noise ratio (SNR). Second, we propose state-of-the-art speech separation techniques to separate voices in both the single-microphone and multiple-microphone cases. Third, we propose to apply the latest speech enhancement algorithms, including Minimum Mean Square Error (MMSE), Modified Phase Opponency (MPO), and possibly other methods, to remove any residual noise in the separated voice streams. Fourth, robust features based on Mel-frequency cepstral coefficients (MFCCs) will be extracted from the enhanced speech. Finally, Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs) will be used to identify the speaker and recognize the speech, and Dynamic Time Warping (DTW) will be used for voiceprint verification.
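
As an illustration of the final verification step, the following is a minimal sketch of a textbook DTW alignment cost between two sequences of feature vectors (for example, per-frame MFCC vectors). It is not the project's implementation: the function name `dtw_distance`, the Euclidean frame distance, and the unconstrained step pattern are all assumptions chosen for illustration.

```python
import math


def dtw_distance(a, b):
    """Return the DTW alignment cost between two sequences of feature vectors.

    `a` and `b` are sequences of equal-dimension numeric tuples/lists,
    e.g. one MFCC vector per frame (hypothetical input format).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance (assumed)
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

In a verification setting, a probe utterance would typically be accepted when its DTW cost against an enrolled template falls below a threshold tuned on held-out data; the threshold and any length normalization are left out of this sketch.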

* information listed above is at the time of submission.
