T. Kristjansson, J. Hershey, P. Olsen, S. Rennie, R. Gopinath

Interspeech 2006, Winner of PASCAL Speech Separation Challenge

Abstract:

We describe a system for model-based speech separation which achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics; the first uses dynamics in the acoustic model space, the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in the separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.

Super Human Speech Separation Paper PDF

 
