In this thesis I advocate a probabilistic view of robust speech recognition. I discuss the classification of distorted features using an optimal classifier, and I show how the generation of noisy speech can be represented as a generative graphical probability model. In doing so, I aim to build a conceptual framework that provides a unified understanding of robust speech recognition and, to some extent, bridges the gap between a purely signal processing viewpoint and the pattern classification or decoding viewpoint.
The most tangible contribution of this thesis is the introduction of the Algonquin method for robust speech recognition. It exemplifies the probabilistic method and encompasses a number of novel ideas. For example, it uses a probability distribution to describe the relationship between clean speech, noise, channel and the resultant noisy speech. It employs a variational approach to find an approximation to the joint posterior distribution which can be used for the purpose of restoring the distorted observations. It also allows us to estimate the parameters of the environment using a Generalized EM method.
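To make the idea of a probability distribution relating clean speech, noise, channel and noisy speech more concrete, here is a minimal sketch. It assumes a form that is common in the literature but is not stated in the text above: features are log-spectra, the channel adds to the speech in the log domain, speech and noise powers add, and the mismatch is modelled as Gaussian with a variance `psi`. The function names and the parameter `psi` are illustrative assumptions, not the thesis's notation.

```python
import math

def noisy_log_spectrum(x, n, h=0.0):
    """Assumed mixing function: y = ln(exp(x + h) + exp(n)).

    x, n, h are the log-spectral values of clean speech, noise and channel.
    Algebraically equal to x + h + ln(1 + exp(n - x - h)), computed in a
    numerically stable way.
    """
    a = x + h
    hi, lo = max(a, n), min(a, n)
    return hi + math.log1p(math.exp(lo - hi))

def log_p_y_given_xnh(y, x, n, h=0.0, psi=0.01):
    """Assumed Gaussian observation model: log N(y; g(x, n, h), psi)."""
    mean = noisy_log_spectrum(x, n, h)
    return -0.5 * (math.log(2.0 * math.pi * psi) + (y - mean) ** 2 / psi)

# Equal speech and noise energy: the noisy value sits ln(2) above either one.
y_equal = noisy_log_spectrum(0.0, 0.0)
# Speech far above the noise: the noisy value is almost the clean value.
y_clean = noisy_log_spectrum(10.0, -10.0)
```

Because the mixing function is nonlinear in `x` and `n`, the posterior over the hidden variables is not Gaussian even when the priors are, which is one reason an approximate inference scheme such as the variational approach mentioned above is needed.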
Another important contribution of this thesis is a new paradigm for robust speech recognition, which I call uncertainty decoding. This new paradigm follows naturally from the standard way of performing inference in the graphical probability model that describes noisy speech generation.
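One simple way to picture uncertainty decoding is sketched below. This is an assumed, simplified form and not necessarily the formulation used in the thesis: the front end passes the recognizer a Gaussian over the clean feature (mean `x_hat`, variance `var_hat`) instead of a point estimate, and each one-dimensional Gaussian state density is integrated against it, which amounts to adding the two variances. All names are hypothetical.

```python
import math

def log_gauss(v, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

def uncertain_state_loglik(x_hat, var_hat, mu_s, var_s):
    """Log of the integral of N(x; x_hat, var_hat) * N(x; mu_s, var_s) over x.

    The integral of a product of two Gaussians has the closed form
    N(x_hat; mu_s, var_s + var_hat), so feature uncertainty simply
    broadens the state density.
    """
    return log_gauss(x_hat, mu_s, var_s + var_hat)

# Two hypothetical states; the observation matches state A exactly.
gap_certain = (uncertain_state_loglik(0.0, 0.0, 0.0, 1.0)
               - uncertain_state_loglik(0.0, 0.0, 3.0, 1.0))
gap_uncertain = (uncertain_state_loglik(0.0, 3.0, 0.0, 1.0)
                 - uncertain_state_loglik(0.0, 3.0, 3.0, 1.0))
```

With zero uncertainty the score reduces to the ordinary state likelihood. As `var_hat` grows, the gap between competing states shrinks, so unreliable features have less influence on the decoding decision.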
Ph.D. Thesis: Speech Recognition in Adverse Environments: a Probabilistic Approach (PDF)