A Spectral-Temporal Suppression Model for Speech Recognition


  • P. L. Divenyi


Speech recognition systems, however heterogeneous in their conceptions and schemes, share at least one basic feature: the inclusion of a vocoder-type front-end. While many of the early, and some of the contemporary, systems adopted a pragmatic design for their front-end filter bank, there were some efforts (e.g., Chistovich et al., 1975; Searle et al., 1979) toward providing the recognizer with an input stage modeled after the human ear. The motivation for such a design was the desire to optimize the recognition process from the very first stage on. In parallel, work by auditory physiologists on auditory nerve responses to speech (Young and Sachs, 1979; Delgutte, 1980) signaled a welcome convergence of interests by two groups of scientists on the problem of speech processing in the auditory system. More recent work by several investigators, some of which is included in the present symposium, has been directed toward designing recognizer front-ends that resemble the ear more and more closely, and toward examining the effects of model parameter modifications on recognition performance.

Computational models of the auditory system fall into two major classes, depending on whether the calculations are performed in the time domain or in the spectral domain. The advantage of time-domain algorithms lies mainly in their speed, whereas spectrally based algorithms may more closely approximate the actual auditory processes because they are able to deal more directly with non-linear filtering operations. The present model is spectral in the sense that the filtering computations are executed in the frequency domain.
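
As a generic illustration of what executing the filtering computations in the frequency domain can look like, the following minimal sketch weights the short-time power spectrum of one frame with a bank of band-pass shapes. It is not the model described in this paper: the Gaussian filter shapes, the relative bandwidth, the center frequencies, the sampling rate, and the function names are all assumptions chosen for illustration, and no suppression or other non-linear stage is included.

```python
import numpy as np

def gaussian_filterbank(n_fft, sample_rate, center_freqs, rel_bandwidth=0.2):
    """Return a (channels x bins) matrix of Gaussian band-pass weights.

    Hypothetical filter shapes: each channel is a Gaussian centered on its
    center frequency, with a width proportional to that frequency.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)        # bin frequencies in Hz
    centers = np.asarray(center_freqs, dtype=float)[:, None]   # column vector of centers
    sigma = rel_bandwidth * centers                            # assumed bandwidth rule
    return np.exp(-0.5 * ((freqs[None, :] - centers) / sigma) ** 2)

def channel_energies(frame, weights):
    """Filter one frame in the frequency domain and return per-channel energy."""
    windowed = frame * np.hanning(len(frame))    # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2   # short-time power spectrum
    return weights @ power                       # weighted sum over frequency bins

# Illustrative input: a 1 kHz tone sampled at an assumed 16 kHz.
sample_rate = 16000
n_fft = 512
t = np.arange(n_fft) / sample_rate
frame = np.sin(2 * np.pi * 1000.0 * t)

centers = [250.0, 500.0, 1000.0, 2000.0, 4000.0]  # assumed center frequencies (Hz)
weights = gaussian_filterbank(n_fft, sample_rate, centers)
energies = channel_energies(frame, weights)
```

Because the filtering here is a multiplication over frequency bins, level-dependent or cross-channel operations can be applied directly to the spectrum before the channels are summed, which is the kind of flexibility the paragraph above attributes to spectrally based algorithms.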




How to Cite

Divenyi PL. A Spectral-Temporal Suppression Model for Speech Recognition. Canadian Acoustics [Internet]. 2022 Dec. 3 [cited 2023 Feb. 8];14(3 bis):12-3. Available from: https://jcaa.caa-aca.ca/index.php/jcaa/article/view/3499



Proceedings of the Acoustics Week in Canada