A Novel Unsupervised Audio Clustering Approach in Noisy Environments

Award Information
Agency: Department of Defense
Branch: Navy
Contract: N00014-12-M-0037
Agency Tracking Number: N112-163-0451
Amount: $80,000.00
Phase: Phase I
Program: SBIR
Awards Year: 2012
Solicitation Year: 2011
Solicitation Topic Code: N112-163
Solicitation Number: 2011.2
Small Business Information
SIGNAL PROCESSING, INC.
MD, Rockville, MD, 20850-3563
DUNS: 620282256
HUBZone Owned: N
Woman Owned: Y
Socially and Economically Disadvantaged: Y
Principal Investigator
Name: Chiman Kwan
Title: Chief Technology Officer
Phone: (240) 505-2641
Email: chiman.kwan@signalpro.net
Business Contact
Name: Chihwa Yung
Title: Chief Operations Officer
Phone: (301) 315-2322
Email: chihwa.yung@signalpro.net
Abstract
Detecting conversations in a noisy environment is challenging. We propose a novel three-stage framework for audio clustering. First, we apply computational auditory scene analysis (CASA) as a front end to separate speech signals from non-speech background noise. Inspired by auditory perception, CASA typically segregates speech from noise by producing a binary time-frequency (T-F) mask. The binary masks are then used to reconstruct clean speech. Second, because the reconstructed speech may contain more than one speaker's voice, we propose an unsupervised audio clustering approach to perform speaker separation. Unreliable T-F units in the simultaneous streams are reconstructed using a speech prior, and cepstral features are then derived for clustering. We search for the two clusters that exhibit the largest speaker difference, measured by the trace of the ratio of the between-cluster scatter matrix to the within-cluster scatter matrix. To speed up the search, a genetic algorithm (GA) is employed. Third, after extracting the audio stream of each speaker, we go one step further and apply the latest speaker identification algorithm developed by our team to each separated voice stream. A robust identification algorithm is needed because residual noise may remain in the separated voice streams.
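
The two-cluster search described in the abstract can be illustrated with a minimal sketch. The award record gives no implementation details, so everything below is an assumption made for illustration: the function names scatter_ratio and ga_two_cluster_search are hypothetical, the criterion is written as trace(S_w^{-1} S_b), a small ridge term is added to keep the within-cluster scatter matrix invertible, and the GA uses uniform crossover, bit-flip mutation, and elitism with arbitrary settings. The input rows would be cepstral feature vectors in the proposed system; here they are generic feature vectors.

```python
import numpy as np


def scatter_ratio(features, labels):
    """Return trace(S_w^{-1} S_b) for a 0/1 labeling of the rows of `features`.

    Returns -inf when either cluster has fewer than two members.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_w = np.zeros((dim, dim))  # within-cluster scatter
    s_b = np.zeros((dim, dim))  # between-cluster scatter
    for k in (0, 1):
        members = features[labels == k]
        if len(members) < 2:
            return -np.inf
        mean_k = members.mean(axis=0)
        centered = members - mean_k
        s_w += centered.T @ centered
        diff = (mean_k - overall_mean)[:, None]
        s_b += len(members) * (diff @ diff.T)
    s_w += 1e-6 * np.eye(dim)  # assumed ridge term for numerical stability
    return float(np.trace(np.linalg.solve(s_w, s_b)))


def ga_two_cluster_search(features, pop_size=40, generations=60,
                          mutation_rate=0.02, seed=0):
    """Search 0/1 labelings that maximize scatter_ratio with a simple GA."""
    rng = np.random.default_rng(seed)
    n = len(features)
    # Each individual is a binary vector assigning every frame to a cluster.
    population = rng.integers(0, 2, size=(pop_size, n))
    best_labels, best_score = population[0].copy(), -np.inf
    for _ in range(generations):
        scores = np.array([scatter_ratio(features, ind) for ind in population])
        order = np.argsort(scores)[::-1]
        if scores[order[0]] > best_score:
            best_score = float(scores[order[0]])
            best_labels = population[order[0]].copy()
        parents = population[order[: pop_size // 2]]  # keep the fitter half
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            take_a = rng.integers(0, 2, size=n).astype(bool)  # uniform crossover
            child = np.where(take_a, a, b)
            flips = rng.random(n) < mutation_rate  # bit-flip mutation
            children.append(np.where(flips, 1 - child, child))
        # Elitism: carry the best labeling found so far into the next generation.
        population = np.vstack([best_labels] + children)
    return best_labels, best_score
```

Calling ga_two_cluster_search(features) on an array with one row per frame returns a 0/1 label vector and its criterion value. A GA of this kind is a heuristic, so the returned labeling is the best one found within the given number of generations and is not guaranteed to be the global optimum.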

* information listed above is at the time of submission.
