J. Hershey, T. Kristjansson, P. Olsen, S. Rennie
We present a model-based system capable of separating and recognizing the speech of two speakers from a single-channel recording. The system uses models of speech that capture dynamics at two levels: an acoustic level that models the continuity of the spectrum, and a grammatical level that models linguistic constraints from the phone level up to phrases. We present speech recognition experiments demonstrating that the full system performs close to human level overall. Remarkably, the system exceeds human recognition performance in 0 dB conditions. Since the pattern of performance across conditions is quite different from that of human listeners, we hypothesize that the auditory system uses different strategies than our model.