Speech Recognition Experiments with a Cochlear Model
Abstract
There are several ways that a computational model of auditory processing in the cochlea can be applied as the front end of a speech recognition system. For an initial round of experimentation, the fine time structure in the model's output has been used to do spectral sharpening, yielding a "cochleagram" representation analogous to a short-time spectral representation. In later experiments, fine time structure will be exploited for a more detailed characterization of sounds, and for sound separation. So far, experiments have been done with only two words ("one" and "nine") spoken by 112 talkers, to limit the range of phonetic variation to simple voiced sounds, while providing a good sample of inter-speaker variation. The structure of the vector space of "auditory spectra" has been examined through vector quantization experiments, which yield a measure of information content and local dimensionality.
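As an illustration of how vector quantization can yield a dimensionality measure, the following is a minimal sketch, not the procedure used in the paper. It assumes squared-error distortion, a generalized Lloyd (k-means) training loop, and the standard high-resolution approximation that mean distortion D falls roughly as N^(-2/d) for a codebook of N codewords on data of local dimension d. All function names here are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v (exhaustive search)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def train_codebook(vectors, size, iters=20, seed=0):
    """Generalized Lloyd iteration, initialized from randomly chosen vectors."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def mean_distortion(codebook, vectors):
    """Average squared error when each vector is replaced by its codeword."""
    total = sum(sq_dist(codebook[nearest(codebook, v)], v) for v in vectors)
    return total / len(vectors)


def dimension_estimate(d_small, n_small, d_large, n_large):
    """Solve D ~ N^(-2/d) for d from two (distortion, codebook size) points."""
    return -2.0 * math.log(n_large / n_small) / math.log(d_large / d_small)
```

In use, one would train codebooks of several sizes on the same set of spectra, compute `mean_distortion` for each, and pass two of the resulting points to `dimension_estimate`. The estimate is only meaningful when the codebooks are large enough for the approximation to hold and the training set is large relative to the codebook.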
The inclusion of more dimensions of perceptual variation, such as pitch and loudness, in a speech front end representation is both an opportunity and a problem. Much larger vector quantization codebooks and more training data may be needed to take advantage of the extra information dimensions. A product-code approach and an improved algorithm for finding the nearest neighbor codeword are suggested to help cope with the problem and take advantage of the opportunity.
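The two ideas in this paragraph can be sketched as follows. The abstract does not specify the improved nearest-neighbor algorithm, so the search shown here is partial distance search, a well-known technique substituted purely for illustration. The product code is likewise an assumed form: the vector is split into sub-vectors (for example, spectral shape and loudness), each quantized with its own small codebook. The function names and the `slices` argument are hypothetical.

```python
def nearest_partial(codebook, v):
    """Partial distance search over a codebook.

    The running squared distance to a codeword is abandoned as soon as it
    reaches the best distance found so far. Returns (index, distance).
    """
    best_i, best_d = -1, float("inf")
    for i, codeword in enumerate(codebook):
        d = 0.0
        for x, y in zip(codeword, v):
            d += (x - y) ** 2
            if d >= best_d:
                break  # this codeword cannot beat the current best
        else:
            best_i, best_d = i, d  # finished the sum and it is smaller
    return best_i, best_d


def product_encode(codebooks, slices, v):
    """Encode v as one index per sub-codebook.

    slices[k] selects the components of v handled by codebooks[k].
    """
    return tuple(
        nearest_partial(cb, v[s])[0] for cb, s in zip(codebooks, slices)
    )


def product_decode(codebooks, indices):
    """Rebuild the quantized vector by concatenating the chosen codewords."""
    out = []
    for cb, i in zip(codebooks, indices):
        out.extend(cb[i])
    return out
```

With sub-codebooks of sizes N1 and N2, the product code represents N1 × N2 distinct vectors while storing and searching only N1 + N2 codewords, which is what makes it attractive when extra dimensions such as pitch and loudness are added. The cost is that codewords cannot adapt to dependencies between the sub-vectors.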
Preliminary recognition experiments using a single codebook per word and no time sequence information have shown a performance of about 97% correct one/nine discrimination for talkers outside the training set, and 100% correct for second repetitions from talkers in the training set. Further experiments are currently underway.
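A recognizer with one codebook per word and no time sequence information could be sketched as below. The abstract does not state the decision rule, so this sketch assumes the common choice: each frame of the utterance is quantized against each word's codebook, and the word whose codebook gives the lowest average squared-error distortion is chosen. The names and the toy codebooks in the usage note are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def utterance_distortion(codebook, frames):
    """Mean distortion of the frames against one codebook.

    Each frame is scored independently, so frame order has no effect.
    """
    total = sum(min(sq_dist(c, f) for c in codebook) for f in frames)
    return total / len(frames)


def classify(codebooks_by_word, frames):
    """Return the word whose codebook fits the utterance best."""
    return min(
        codebooks_by_word,
        key=lambda word: utterance_distortion(codebooks_by_word[word], frames),
    )
```

For example, with `codebooks_by_word = {"one": ..., "nine": ...}` trained separately on frames of each word, `classify(codebooks_by_word, frames)` returns the label for a new utterance. Because frames are scored independently, shuffling them leaves the decision unchanged, which is what "no time sequence information" implies in this sketch.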
License
Copyright on articles is held by the author(s). The corresponding author has the right to grant on behalf of all authors and does grant on behalf of all authors, a worldwide exclusive licence (or non-exclusive license for government employees) to the Publishers and its licensees in perpetuity, in all forms, formats and media (whether known now or created in the future):
i) to publish, reproduce, distribute, display and store the Contribution;
ii) to translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts and/or abstracts of the Contribution;
iii) to exploit all subsidiary rights in the Contribution;
iv) to provide the inclusion of electronic links from the Contribution to third party material wherever it may be located;
v) to licence any third party to do any or all of the above.