T. Kristjansson, T.S. Huang, P. Ramesh, B.H. Juang
A variety of media involve spoken interaction between people. For these media to be useful, indexing and browsing facilities must be provided to the user. In this paper we present a unified framework for indexing and gisting spoken interactions. We use speaker identification, prosody analysis and word spotting as preprocessing steps to find the structure of the meeting. The structure is modeled using a stochastic approach based on the Hidden Markov Model. The result of the analysis is an outline or table of contents, as well as a rich set of visual cues for navigating the media. In addition to the automatic analysis, we provide the user with tools for browsing the meeting, as well as tools for directing the analysis and editing the results. We present early results using the proposed framework.
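
To make the idea of HMM-based structure modeling concrete, the following minimal sketch decodes a sequence of per-segment observations into meeting states using the standard Viterbi algorithm. It is an illustration only, not the system described in this paper: the two states ("presentation", "discussion"), the observation symbols ("same" for no speaker change, "change" for a speaker change, as a speaker identification step might plausibly emit them) and all probabilities are assumptions chosen by hand.

```python
import math

# Hypothetical meeting states and hand-picked probabilities (assumptions for
# illustration only; none of these values come from the paper).
STATES = ["presentation", "discussion"]
START_P = {"presentation": 0.6, "discussion": 0.4}
TRANS_P = {
    "presentation": {"presentation": 0.9, "discussion": 0.1},
    "discussion": {"presentation": 0.1, "discussion": 0.9},
}
# Observation per segment: "same" = same speaker as the previous segment,
# "change" = a different speaker took the floor.
EMIT_P = {
    "presentation": {"same": 0.9, "change": 0.1},
    "discussion": {"same": 0.4, "change": 0.6},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a non-empty observation list."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Four segments without speaker changes followed by four with speaker changes.
observations = ["same"] * 4 + ["change"] * 4
path = viterbi(observations, STATES, START_P, TRANS_P, EMIT_P)
print(list(zip(observations, path)))
```

Under these assumed parameters, the points at which the decoded state changes would serve as candidate entries in an outline of the meeting. A real system would estimate the probabilities from data and would combine several observation streams, such as prosody and word spotting, rather than a single speaker-change symbol.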