The Auditory Processing of Speech


  • Shihab A. Shamma


The processing of speech in the mammalian auditory periphery is discussed in terms of the spatio-temporal nature of the distribution of the cochlear response and the novel encoding schemes this permits. Algorithms to detect specific morphological features of the response patterns are also considered for the extraction of stimulus spectral parameters.

The remarkable abilities of the human auditory system to detect, separate, and recognize speech and environmental sounds have been the subject of extensive physiological and psychological research for several decades. The results of this research have strongly influenced developments in various fields ranging from auditory prostheses to the encoding, analysis, and automatic recognition of speech. In recent years, improved experimental techniques have precipitated major advances in our understanding of sound processing in the auditory periphery. Most important among these is the introduction of nerve-fiber population recordings, which made possible the reconstruction of both the temporal and spatial distribution of activity on the auditory nerve in response to acoustic stimuli [1, 2]. Sachs et al. utilized such data to demonstrate the existence of a highly accurate temporal structure that is capable of providing a faithful and robust representation of speech spectra over a wide dynamic range and under relatively low signal-to-noise conditions [3, 4]. Their work has since motivated further research into the various algorithms that the central nervous system (CNS) might employ to detect and extract these and other response features, and the possible neural structures that underlie them [5, 6].

In pursuit of these goals, we have constructed and analyzed the spatio-temporal response patterns of the cat's auditory nerve to synthesized speech sounds [14, 5]. These patterns are formed by spatially organizing the temporal response waveforms (or PST histograms) of the auditory-nerve fibers according to their characteristic frequency (CF) [4]. The resulting display highlights the interplay of temporal and spatial cues across the fiber array and suggests novel ways of viewing cochlear processing and encoding of complex sounds [7, 5]. The availability of such experimental data, however, is at present limited by technical constraints and the massive amount of processing required to handle them. Thus, in order to analyze new speech tokens, and to facilitate the necessary manipulation of stimulus and/or processing conditions and parameters, we have developed detailed biophysical and computational models of the auditory periphery and used them to generate spatio-temporal response patterns to natural and synthesized speech stimuli. Various CNS schemes for the estimation of stimulus spectral parameters are then investigated based on these patterns.
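
As an illustration of how such a pattern might be assembled, the following minimal Python sketch bins each fiber's spike times into a PST histogram and stacks the histograms in order of increasing CF. It is not the author's implementation: the function names (`pst_histogram`, `spatiotemporal_pattern`), the bin width, and the toy spike times are all hypothetical and chosen only to show the ordering step.

```python
from typing import List, Sequence, Tuple


def pst_histogram(spike_times_ms: Sequence[float],
                  duration_ms: float,
                  bin_ms: float) -> List[int]:
    """Count spikes per time bin (a post-stimulus-time histogram)."""
    n_bins = int(round(duration_ms / bin_ms))
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = int(t // bin_ms)
        if 0 <= idx < n_bins:  # ignore spikes outside the analysis window
            counts[idx] += 1
    return counts


def spatiotemporal_pattern(fibers: Sequence[Tuple[float, Sequence[float]]],
                           duration_ms: float,
                           bin_ms: float) -> Tuple[List[float], List[List[int]]]:
    """Stack PST histograms as rows, ordered by characteristic frequency.

    `fibers` is a sequence of (cf_hz, spike_times_ms) pairs.
    Returns the sorted CFs and one histogram row per fiber.
    """
    ordered = sorted(fibers, key=lambda fiber: fiber[0])
    cfs = [cf for cf, _ in ordered]
    rows = [pst_histogram(spikes, duration_ms, bin_ms) for _, spikes in ordered]
    return cfs, rows


# Toy data (hypothetical): three fibers listed in arbitrary order.
fibers = [
    (2000.0, [0.5, 1.0, 1.5, 2.0]),
    (500.0, [0.2, 2.2]),
    (1000.0, [0.1, 1.1, 2.1]),
]
cfs, pattern = spatiotemporal_pattern(fibers, duration_ms=3.0, bin_ms=1.0)
for cf, row in zip(cfs, pattern):
    print(cf, row)
```

Each row of `pattern` is one fiber's temporal response and the row index follows CF, so the result can be read as a two-dimensional (CF by time) display of activity across the fiber array.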




How to Cite

Shamma SA. The Auditory Processing of Speech. Canadian Acoustics [Internet]. 2022 Dec. 3 [cited 2023 Feb. 8];14(3 bis):14-5. Available from:



Proceedings of the Acoustics Week in Canada