Music Models for Music-Speech Separation

Thad Hughes, Trausti Kristjansson


We consider the task of speech recognition in the presence of loud background music. We use model-based music-speech separation and train Gaussian mixture models (GMMs) for the music on the audio that precedes the speech. We show over 8% relative improvement in word error rate (WER) at 10 dB SNR for a real-world Voice Search ASR system.
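To make the "relative improvement" wording concrete, here is a minimal sketch of how a relative WER improvement is usually computed. The function name and the WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_improvement(baseline_wer, new_wer):
    """Fraction of the baseline WER that was removed (0.08 means 8% relative)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values in percent, NOT taken from the paper.
improvement = relative_wer_improvement(25.0, 23.0)
print(f"{improvement:.1%} relative improvement")  # prints "8.0% relative improvement"
```

In this made-up case the absolute reduction is 2 points of WER, while the relative reduction is 8%, which is the kind of figure the abstract reports.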

We investigate the relationship between ASR accuracy, the amount of background music used as prologue, and the size of the music models.

Our study shows that performance peaks when a music prologue of around 6 seconds is used to train the music model. We hypothesize that this is due to the dynamic nature of music and the structure of popular music. Adding more history beyond a certain point does not improve results. Additionally, we show that moderately sized 8-component music GMMs suffice to model this amount of music prologue.
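As a rough illustration of what training an 8-component music GMM on about 6 seconds of prologue could look like, the sketch below fits a diagonal-covariance GMM with plain EM. The abstract does not state the features, frame rate, covariance type, or training procedure, so everything here is an assumption: 100 frames per second, 4-dimensional synthetic features standing in for real spectral features, and the hypothetical helper names `fit_diag_gmm` and `avg_log_likelihood`. It covers only the music-model step, not the separation itself.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_log_scores(x, gmm):
    """log(weight * density) of frame x under each mixture component."""
    weights, means, variances = gmm
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_diag_gmm(frames, n_components=8, n_iter=20, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM with EM (assumed training procedure)."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialise means from random frames and variances from the global variance.
    means = [list(f) for f in rng.sample(frames, n_components)]
    g_mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    g_var = [
        max(sum((f[d] - g_mean[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    variances = [list(g_var) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        resp = []
        for x in frames:
            scores = component_log_scores(x, (weights, means, variances))
            norm = log_sum_exp(scores)
            resp.append([math.exp(s - norm) for s in scores])
        # M-step: re-estimate weights, means, and floored variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = [
                sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                for d in range(dim)
            ]
            variances[k] = [
                max(
                    sum(r[k] * (x[d] - means[k][d]) ** 2
                        for r, x in zip(resp, frames)) / nk,
                    var_floor,
                )
                for d in range(dim)
            ]
    return weights, means, variances


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under the GMM."""
    return sum(log_sum_exp(component_log_scores(x, gmm)) for x in frames) / len(frames)


FRAMES_PER_SECOND = 100  # assumption: 10 ms frame hop
FEATURE_DIM = 4          # assumption: toy feature size
data_rng = random.Random(1)


def synth(center, n_frames):
    """Synthetic stand-in for feature frames; real input would be audio features."""
    return [[data_rng.gauss(center, 1.0) for _ in range(FEATURE_DIM)]
            for _ in range(n_frames)]


# About 6 seconds of "music" prologue made of two synthetic sound textures.
prologue = synth(0.0, 3 * FRAMES_PER_SECOND) + synth(3.0, 3 * FRAMES_PER_SECOND)
music_gmm = fit_diag_gmm(prologue, n_components=8)
```

Calling `avg_log_likelihood(new_frames, music_gmm)` then scores how well later frames match the prologue. With a real front end, `prologue` would hold the feature frames extracted from the audio preceding the speech.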


Music Models for Music-Speech Separation (PDF)


