Hierarchical nonstationarity in a class of doubly stochastic models with application to automatic speech recognition
Keywords:
speech recognition, stochastic processes, automatic speech recognition, hierarchical nonstationarity, speech signals, doubly stochastic process models, global nonstationarity, underlying Markov chain, statistics, output data-generation process models, speech data, standard HMM, single-level nonstationarity

Abstract
This paper introduces the concept of two-level (global and local) hierarchical nonstationarity for describing the complex, elastic, and highly dynamic nature of speech signals. A general class of doubly stochastic process models is developed to implement this concept. In this class of models, the global nonstationarity is embodied through an underlying Markov chain (or any other scheme capable of providing nonlinear time-warping mechanisms) which governs the evolution of the parameters of a set of output stochastic processes. The local nonstationarity is realized by assuming state-conditioned, time-varying first- and second-order statistics in the output data-generation process models. To provide practical algorithms for speech recognition which allow the model parameters to be reliably estimated, the local nonstationarity is represented in a parametric form. Simulation results demonstrate a close fit of the model to actual speech data. Results from speech recognition experiments provide evidence for the effectiveness of the model in comparison with the standard HMM, which is a degenerate case of the proposed model with only single-level nonstationarity.
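To make the two-level structure concrete, below is a minimal generative sketch in Python/NumPy. It rests on assumptions not stated in the abstract: the state-conditioned, time-varying means are taken to be polynomial trend functions of the time elapsed within a state, and the output process is Gaussian with a fixed per-state variance. All numerical values are illustrative placeholders, not estimates from speech data.

```python
# Minimal sketch of the two-level hierarchical nonstationarity described above.
# Global level: a Markov chain over states (the nonlinear time-warping mechanism).
# Local level: each state has a time-varying mean (polynomial trend in the
# within-state time) and a state-dependent output variance.
import numpy as np

rng = np.random.default_rng(0)

# Global level: state-transition matrix and initial-state distribution.
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]])
pi = np.array([1.0, 0.0, 0.0])

# Local level: per-state polynomial trend coefficients for the mean
# (constant, linear, quadratic terms) and per-state output standard deviation.
# These numbers are arbitrary illustrations only.
trend_coeffs = np.array([[0.0,  0.5, -0.02],
                         [3.0, -0.3,  0.01],
                         [1.0,  0.0,  0.00]])
sigma = np.array([0.3, 0.2, 0.4])

def sample(T):
    """Draw a length-T observation sequence from the two-level model."""
    states, obs = np.empty(T, dtype=int), np.empty(T)
    s = rng.choice(len(pi), p=pi)
    tau = 0                                  # time elapsed within the current state
    for t in range(T):
        # Local nonstationarity: the state-conditioned mean drifts with tau.
        mean = np.polyval(trend_coeffs[s][::-1], tau)
        obs[t] = rng.normal(mean, sigma[s])
        states[t] = s
        s_next = rng.choice(len(pi), p=A[s])
        tau = tau + 1 if s_next == s else 0  # reset the trend clock on a state change
        s = s_next
    return states, obs

states, obs = sample(200)
```

Setting the linear and quadratic trend coefficients to zero removes the local nonstationarity and leaves constant state-conditioned means, i.e. the standard-HMM behaviour that the abstract describes as the degenerate, single-level case.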
License
Copyright on articles is held by the author(s). The corresponding author has the right to grant on behalf of all authors, and does grant on behalf of all authors, a worldwide exclusive licence (or non-exclusive licence for government employees) to the Publishers and their licensees in perpetuity, in all forms, formats and media (whether known now or created in the future):
i) to publish, reproduce, distribute, display and store the Contribution;
ii) to translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts and/or abstracts of the Contribution;
iii) to exploit all subsidiary rights in the Contribution;
iv) to provide for the inclusion of electronic links from the Contribution to third party material wherever it may be located;
v) to license any third party to do any or all of the above.