Robust Speech Recognition for Virtual Realities using Multi-Stream Fusion
Small Business Information
Tanner Research, Inc.
180 North Vinedo Avenue, Pasadena, CA, 91107
Abstract
This Phase I project aims to develop a continuous-speech recognition system that works robustly in noisy environments. This goal will be achieved by fusing multiple processing streams. The high parallelism of this approach lends itself well to implementation in very large scale integration (VLSI) or on special-purpose speech boards. We will also train and evaluate a commercially available recognition system and compare the recognition results. Traditional speech-recognition front ends produce speech vectors using spectral, cepstral, or LPC coefficients. Biologically inspired models use filter banks and lateral inhibition for adaptive thresholding. Back ends use engineering approaches (hidden Markov models, HMMs) as well as neural approaches (time-delay neural networks and learning vector quantization). Combining five front ends with three back ends yields a total of 15 processing streams. All 15 streams perform phoneme recognition in parallel. The 15 phonemic estimates are fed to word-based HMMs and then to a neural network that arbitrates among the word estimates. In the end, the streams will be evaluated and the system will be pruned, retaining only the four or five most robust processing streams. Tanner Research has considerable experience with these front and back ends and has found that each has its own strengths and weaknesses. Combining them is expected to improve overall accuracy.
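The stream layout described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation. The front-end labels are assumed (the abstract names the techniques but does not list the five front ends explicitly), the function names `build_streams`, `arbitrate`, and `prune` are hypothetical, and a simple weighted vote stands in for the neural network that arbitrates among word estimates.

```python
from collections import defaultdict
from itertools import product

# Assumed labels: three traditional and two biologically inspired front ends.
FRONT_ENDS = ["spectral", "cepstral", "lpc", "filter_bank", "lateral_inhibition"]
# Back ends named in the abstract: HMM, time-delay neural network, LVQ.
BACK_ENDS = ["hmm", "tdnn", "lvq"]


def build_streams():
    """Pair every front end with every back end (5 x 3 = 15 streams)."""
    return list(product(FRONT_ENDS, BACK_ENDS))


def arbitrate(word_estimates, weights=None):
    """Pick one word from per-stream estimates.

    A weighted vote is used here as a stand-in for the arbitration
    network; streams without an explicit weight count as 1.0.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for stream, word in word_estimates.items():
        scores[word] += weights.get(stream, 1.0)
    # Iterating in sorted order makes ties resolve alphabetically.
    return max(sorted(scores), key=scores.get)


def prune(accuracy_by_stream, keep=5):
    """Retain the `keep` streams with the highest measured accuracy."""
    ranked = sorted(accuracy_by_stream, key=accuracy_by_stream.get, reverse=True)
    return ranked[:keep]
```

In this sketch, `prune` would be applied once per-stream accuracies have been measured under noise, which mirrors the plan to keep only the four or five most robust streams.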
* information listed above is at the time of submission.