Acoustic Source Separation and Localization
Title: Senior System Engineer
Phone: (407) 384-9956
This is a proposal to determine the technical feasibility of a system capable of separating and localizing intermixed sounds in an auditory scene. Our approach is to design and develop a computational auditory model that overcomes the inherent theoretical limits of the classic Fourier-based model. The Phase I effort will produce a requirements specification and design documentation for the lower two levels of a five-level computational auditory model, including source code for the system components we make operational. The objective five-level system is a real-time implementation of a model capable of waveform analysis analogous to that of the human ear. We call it the Waveform Information Vector (WIV) to Time-Space Translator, or “WIVEX,” processor.

Preliminary experiments have demonstrated the feasibility of parts of the model to encode and extract meaningful information, in real time, directly from the signal waveform. For example, it can separate environmental sounds of all kinds, including speech, while determining their individual directions of arrival. Our Phase I research objective is to formalize the requirements and design specifications for a real-time system through analysis, design, and prototyping of components. Phase I will conclude with the delivery of the necessary artifacts and demonstration of specific auditory functions (see Table 1) that are not now, and likely never will be, achievable with Fourier-based technology. All demonstrations are intended to run in real time, synchronously with the input signal.
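The proposal does not specify how binaural direction of arrival is computed, so as one plausible baseline the sketch below estimates the interaural time difference (ITD) by cross-correlating the two channels. The function name `estimate_itd` and the synthetic noise signal are our own illustrative assumptions, not part of the WIVEX design.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference in seconds.

    Positive values mean the right channel trails the left,
    i.e. the source is nearer the left ear.
    """
    corr = np.correlate(left, right, mode="full")
    # Lag (in samples) by which the right channel trails the left.
    delay = (len(right) - 1) - int(np.argmax(corr))
    return delay / fs

# Synthetic binaural pair: broadband noise, right channel delayed 8 samples.
fs = 16000
rng = np.random.default_rng(0)
sig = rng.standard_normal(800)
delay = 8                                    # ~0.5 ms, a source off to the left
left = sig
right = np.concatenate([np.zeros(delay), sig[:-delay]])

itd = estimate_itd(left, right, fs)          # ≈ 0.0005 s
```

The sign of the ITD alone already separates sources by hemifield, which is in the spirit of the binaural separation function described above.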
Table 1: Phase I Auditory Functions
1) Pitch detection in speech, music, or any other tonal acoustic source
2) Instantaneous direction of arrival, used to separate sound sources via binaural perception
3) Instantaneous monaural separation of sources by recognizing patterns in waveshape zero intervals
4) Phonetic segmentation of speech and environmental sources by pattern similarities
5) Meaningful labeling of waveshape components
6) Replication of psychoacoustic experiments in two-tone interference not explainable by current auditory theory
7) Demonstration of autonomic source selection, according to attention priority, from a background of mixed signal sources

Collectively, successful completion of these tasks should go a long way toward confirming the technical feasibility of the WIVEX as the basis of a more relevant auditory model. As such, it could produce a compelling theory for understanding the biophysical functions in the auditory pathways not only of humans but of the entire animal kingdom. We will show that the operational functions and processing components of this model are analogous to neurological capabilities and require only fundamental algorithms and mathematical computations.

The proposed model is analogous to the auditory systems of the animal kingdom in that it is built as an evolutionary hierarchy of processing levels. It begins at a low level by extracting primitive meaning such as direction of arrival, amplitude, and simple encoded waveshape features, then progresses upward through five stages of cognitive perception, culminating in complex aspects of human linguistic and emotional communication. These individual functions are carried out in real time, synchronized with the incoming signal waveform. It is thus possible to isolate and understand the basic auditory functions while at the same time peeling off highly useful applications.
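Items 1 and 3 of Table 1 both rest on information carried by waveshape zero intervals. To make that concrete, the sketch below extracts the intervals between successive zero crossings directly from the waveform, with no Fourier transform; for a pure tone there are two such intervals per period, so pitch follows immediately. The function `zero_intervals` and the 200 Hz test tone are our own illustrative assumptions, not taken from the proposal.

```python
import numpy as np

def zero_intervals(x):
    """Lengths (in samples) of the intervals between successive
    zero crossings of the waveform x."""
    signs = np.sign(x)
    signs[signs == 0] = 1                  # treat exact zeros as positive
    crossings = np.where(np.diff(signs) != 0)[0] + 1
    return np.diff(crossings)

# A 200 Hz tone sampled at 8 kHz crosses zero every 20 samples.
fs = 8000
t = np.arange(0, 0.05, 1 / fs)
tone = np.sin(2 * np.pi * 200 * t + 0.1)   # small phase offset keeps samples
                                           # from landing exactly on zero
intervals = zero_intervals(tone)

# Two zero intervals per period, so pitch = fs / (2 * mean interval).
pitch = fs / (2 * intervals.mean())        # ≈ 200.0 Hz
```

For mixed or non-tonal sources the interval sequence is no longer constant, and it is the patterns in that sequence that the monaural separation function of item 3 would have to recognize.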
* Information listed above is at the time of submission. *