Chris Pal, Brendan Frey and Trausti Kristjansson
One approach to achieving noise- and distortion-robust speech recognition is to remove noise and distortion with low-complexity algorithms before applying a much higher-complexity speech recognizer. This approach has been referred to as cleaning. In this paper we present an approach to speech cleaning that uses a time-varying, non-linear probabilistic model of a signal's log Mel-filter-bank representation. We then present a new non-linear probabilistic inference technique and show results obtained with this technique within the probabilistic cleaning model. In this approach we represent the distributions of the underlying noise, speech and channel characteristics as Gaussian mixtures, and we use Gaussian basis functions to model the non-linear likelihood function. This allows us to efficiently compute complex multi-modal probability distributions over the speech and noise components of the underlying signal. We show how this method can be used to clean speech features and present results using the Aurora 2 speech recognizer trained on clean speech data. We present competitive initial results from a minimum mean square error version of this approach on a subset of the Aurora 2 noisy digits recognition tasks.
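The combination of Gaussian mixture distributions with Gaussian basis functions for the likelihood can be illustrated with a minimal one-dimensional sketch. It is not the model or inference procedure of the paper: it assumes a single scalar clean-speech feature x, a Gaussian mixture prior over x, and a likelihood that, for a fixed observation, has already been approximated as a weighted sum of Gaussian basis functions in x. The function names and the (weight, mean, variance) tuple layout are hypothetical. Under these assumptions, each prior component multiplied by each basis function is again an unnormalized Gaussian, so the posterior is a multi-modal Gaussian mixture and the minimum mean square error estimate is its weighted mean.

```python
import math


def gauss(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_mixture(prior, basis):
    """Multiply a Gaussian mixture prior by a Gaussian-basis likelihood.

    prior: list of (weight, mean, var) components over the clean feature x.
    basis: list of (weight, mean, var) basis functions that approximate the
           likelihood as a function of x (illustrative assumption).
    Returns the normalized posterior as a list of (weight, mean, var).
    """
    comps = []
    for pw, pm, pv in prior:
        for bw, bm, bv in basis:
            # Product of two Gaussians in x: scale factor, mean and variance.
            weight = pw * bw * gauss(pm, bm, pv + bv)
            mean = (pm * bv + bm * pv) / (pv + bv)
            var = pv * bv / (pv + bv)
            comps.append((weight, mean, var))
    total = sum(w for w, _, _ in comps)
    return [(w / total, m, v) for w, m, v in comps]


def mmse_estimate(posterior):
    """Posterior mean of the mixture, i.e. the MMSE estimate of x."""
    return sum(w * m for w, m, _ in posterior)


# Hypothetical values, chosen only to exercise the functions.
prior = [(0.6, -1.0, 0.5), (0.4, 2.0, 1.0)]
basis = [(0.7, 0.0, 1.0), (0.3, 3.0, 0.5)]
post = posterior_mixture(prior, basis)
x_hat = mmse_estimate(post)
```

With two prior components and two basis functions the posterior has four components. The number of components grows as the product of the two mixture sizes, so a practical system would likely need some form of pruning or component selection.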