Trausti Kristjansson, John Hershey
We present a framework for speech enhancement and robust speech recognition that exploits the harmonic structure of speech. We achieve substantial gains in the signal-to-noise ratio (SNR) of enhanced speech, as well as considerable gains in the accuracy of automatic speech recognition in very noisy conditions. The method exploits the harmonic structure of speech by employing a high-frequency-resolution speech model in the log-spectrum domain. It reconstructs the signal from the estimated posteriors of the clean signal and the phases of the original noisy signal. We achieve an SNR gain of 8.38 dB for enhancement of speech at 0 dB. We also present recognition results on the Aurora 2 data set. At 0 dB SNR, we achieve a relative reduction in word error rate of 43.75% over the baseline, and 15.90% over the equivalent low-resolution algorithm.
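The reconstruction step and the SNR measure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `combine_with_noisy_phase` and `snr_db` are hypothetical, the log spectrum is assumed to be the natural log of the magnitude, and the estimate of the clean log spectrum is taken as given rather than inferred from a speech model.

```python
import cmath
import math


def combine_with_noisy_phase(est_log_mag, noisy_spectrum):
    """Pair estimated clean log-magnitudes with the phases of the noisy spectrum.

    est_log_mag: natural-log magnitudes per frequency bin (assumed convention).
    noisy_spectrum: complex spectrum of the noisy frame, one value per bin.
    """
    return [
        cmath.rect(math.exp(log_mag), cmath.phase(noisy_bin))
        for log_mag, noisy_bin in zip(est_log_mag, noisy_spectrum)
    ]


def snr_db(clean, estimate):
    """SNR in dB, treating (estimate - clean) as the residual noise."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / noise_power)
```

Under these assumptions, an SNR gain such as the one reported would be computed as `snr_db(clean, enhanced) - snr_db(clean, noisy)`, where `enhanced` is the time-domain signal obtained by inverting the recombined spectrum.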