Speech Recognition Experiments with a Cochlear Model
Abstract
There are several ways that a computational model of auditory processing in the cochlea can be applied as the front end of a speech recognition system. For an initial round of experimentation, the fine time structure in the model's output has been used to do spectral sharpening, yielding a "cochleagram" representation analogous to a short-time spectral representation. In later experiments, fine time structure will be exploited for a more detailed characterization of sounds, and for sound separation. So far, experiments have been done with only two words ("one" and "nine") spoken by 112 talkers, to limit the range of phonetic variation to simple voiced sounds while providing a good sample of inter-speaker variation. The structure of the vector space of "auditory spectra" has been examined through vector quantization experiments, which yield a measure of information content and local dimensionality.
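As a rough illustration of how a vector quantization experiment can yield a measure of information content, the sketch below trains a codebook on a set of spectral vectors and reports the entropy of codeword usage in bits, together with the mean quantization distortion. The Lloyd (k-means) iteration, the squared Euclidean distance, and all function names are assumptions made for illustration; the abstract does not state which training algorithm or distortion measure was used.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v, found by exhaustive search."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def train_codebook(vectors, size, iterations=20, seed=0):
    """Lloyd iteration (assumed here): assign to nearest codeword, then re-centre."""
    rng = random.Random(seed)
    # Initialise from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def usage_entropy_bits(codebook, vectors):
    """Entropy of codeword usage, in bits per vector; at most log2(len(codebook))."""
    counts = [0] * len(codebook)
    for v in vectors:
        counts[nearest(codebook, v)] += 1
    total = len(vectors)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def mean_distortion(codebook, vectors):
    """Average squared error between each vector and its nearest codeword."""
    return sum(sq_dist(codebook[nearest(codebook, v)], v) for v in vectors) / len(vectors)
```

Repeating the measurement for several codebook sizes and watching how the mean distortion falls is one common way to probe local dimensionality; whether that is the procedure used in these experiments is not stated.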
The inclusion of more dimensions of perceptual variation, such as pitch and loudness, in a speech front end representation is both an opportunity and a problem. Much larger vector quantization codebooks and more training data may be needed to take advantage of the extra information dimensions. A product-code approach and an improved algorithm for finding the nearest neighbor codeword are suggested to help cope with the problem and take advantage of the opportunity.
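The product-code idea can be sketched as follows: the feature vector is split into sub-vectors, each quantized with its own small codebook, so that the number of representable combinations is the product of the sub-codebook sizes while storage and training needs grow only with their sum. How the vector is partitioned (for example, spectral shape in one part and pitch or loudness in another) is a hypothetical choice here, not something the abstract specifies. The search routine uses partial-distance elimination, a standard speed-up for nearest-codeword search that is swapped in for illustration and is not necessarily the improved algorithm the author proposes.

```python
def nearest_partial(codebook, v):
    """Nearest-codeword search with partial-distance elimination.

    The running squared distance to a candidate is abandoned as soon as it
    reaches the best distance found so far.
    """
    best_i, best_d = -1, float("inf")
    for i, c in enumerate(codebook):
        d = 0.0
        for x, y in zip(c, v):
            d += (x - y) ** 2
            if d >= best_d:
                break  # this candidate cannot beat the current best
        else:
            best_i, best_d = i, d  # loop finished without abandoning
    return best_i


def product_encode(sub_codebooks, widths, v):
    """Quantize each sub-vector of v with its own codebook; return an index tuple."""
    indices = []
    start = 0
    for codebook, width in zip(sub_codebooks, widths):
        indices.append(nearest_partial(codebook, v[start:start + width]))
        start += width
    return tuple(indices)


def product_decode(sub_codebooks, indices):
    """Rebuild the full quantized vector by concatenating the chosen codewords."""
    out = []
    for codebook, i in zip(sub_codebooks, indices):
        out.extend(codebook[i])
    return out


def effective_size(sub_codebooks):
    """Number of distinct full vectors the product code can represent."""
    n = 1
    for codebook in sub_codebooks:
        n *= len(codebook)
    return n
```

With a two-codeword codebook for the first two dimensions and a three-codeword codebook for the third, only five codewords are stored, yet six distinct quantized vectors can be represented.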
Preliminary recognition experiments using a single codebook per word and no time sequence information have shown a performance of about 97% correct one/nine discrimination for talkers outside the training set, and 100% correct for second repetitions from talkers in the training set. Further experiments are currently underway.
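A recognizer of the kind described, with one codebook per word and no time sequence information, might be sketched as below: each frame of an utterance is quantized against every word's codebook, and the word whose codebook gives the lowest average distortion wins. The minimum-average-distortion decision rule, the squared Euclidean distance, and the function names are assumptions for illustration; the abstract does not give the actual decision rule.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def utterance_distortion(codebook, frames):
    """Mean distortion of the frames against one codebook; frame order is ignored."""
    return sum(min(sq_dist(c, f) for c in codebook) for f in frames) / len(frames)


def classify(codebooks, frames):
    """Return the word whose codebook fits the frames with the least distortion.

    codebooks maps a word label to its list of codewords.
    """
    return min(codebooks, key=lambda w: utterance_distortion(codebooks[w], frames))
```

Because the score is an average over frames, shuffling the frames of an utterance leaves the decision unchanged, which is what discarding time sequence information amounts to in this sketch.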
Licence
Author Licensing Addendum
This Licensing Addendum ("Addendum") is entered into between the undersigned Author(s) and Canadian Acoustics journal published by the Canadian Acoustical Association (hereinafter referred to as the "Publisher"). The Author(s) and the Publisher agree as follows:
- Retained Rights: The Author(s) retain(s) the following rights:
  - The right to reproduce, distribute, and publicly display the Work on the Author's personal website or the website of the Author's institution.
  - The right to use the Work in the Author's teaching activities and presentations.
  - The right to include the Work in a compilation for the Author's personal use, not for sale.
- Grant of License: The Author(s) grant(s) to the Publisher a worldwide exclusive license to publish, reproduce, distribute, and display the Work in Canadian Acoustics and any other formats and media deemed appropriate by the Publisher.
- Attribution: The Publisher agrees to include proper attribution to the Author(s) in all publications and reproductions of the Work.
- No Conflict: This Addendum is intended to be in harmony with, and not in conflict with, the terms and conditions of the original agreement entered into between the Author(s) and the Publisher.
- Copyright Clause: Copyright on articles is held by the Author(s). The corresponding Author has the right to grant on behalf of all Authors and does grant on behalf of all Authors, a worldwide exclusive license to the Publisher and its licensees in perpetuity, in all forms, formats, and media (whether known now or created in the future), including but not limited to the rights to publish, reproduce, distribute, display, store, translate, create adaptations, reprints, include within collections, and create summaries, extracts, and/or abstracts of the Contribution.