S. Rennie, P. Olsen, J. Hershey, T. Kristjansson


We describe a system that can separate and recognize the simultaneous speech of two speakers from a single-channel recording, and compare the performance of the system to that of human subjects. The system, which we call Iroquois, uses models of dynamics to achieve performance near that of human listeners when averaged across all conditions. However, the system exhibits a pattern of performance across conditions that is different from that of human subjects. In conditions where the amplitude of the speakers is similar, our system surpasses human performance by over 50%. We hypothesize that the system accomplishes this remarkable feat by employing a different strategy from that of the human auditory system.

The Iroquois Model: Using Temporal Dynamics to Separate Speakers

