Using Auditory Models for Speaker Normalization in Speech Recognition

Authors

  • Anthony Bladon

Abstract

Auditorily-transformed versions of the speech spectrum may well be a useful way of reducing the apparently nonuniform physical differences between speakers. A speaker normalization technique of this kind is, however, justified to different degrees by different kinds of speech event. Does this presuppose a need for higher-level (phonetic class) information at the acoustic level in speaker-independent ASR?

"It is obvious from our experiment that the unqualified assumption does not hold - auditory models used as speech recognition front ends will not consistently improve performance."

Blomberg et al.'s (1984) ominous words are ones which this symposium ought to take seriously to heart. They conflict with our initial theoretical expectations. This paper will not attempt to investigate what reasons lie behind the inconsistent results which some authors have found. Rather, we will focus on an aspect of the speech recognition task where the prognosis for auditory modelling promises to bear some fruit, namely, speaker differences (in speaker-independent speech recognition).

Published

2022-12-03

How to Cite

1. Bladon A. Using Auditory Models for Speaker Normalization in Speech Recognition. Canadian Acoustics [Internet]. 2022 Dec. 3 [cited 2024 Nov. 21];14(3 bis):16-7. Available from: https://jcaa.caa-aca.ca/index.php/jcaa/article/view/3501

Section

Proceedings of the Acoustics Week in Canada