Enhancing Automatic Speech Recognition of a Regional Dialect: A Pilot Study with Québécois French

Authors

  • Xinyi Zhang, École de technologie supérieure
  • Lucia Eve Berger, MILA, Université de Montréal
  • Duc-Hoa Tran, École de technologie supérieure
  • Rachel Bouserhal, École de technologie supérieure

Abstract

Automatic speech recognition (ASR) technologies have advanced in recent years, but performance still varies under underrepresented, low-resource testing conditions. This proof-of-concept study examined the speech transcription performance of a state-of-the-art multilingual ASR system, Whisper, for Québécois French (QF). Because QF is a low-resource dialect, we developed a semi-automated pipeline for creating machine-learning-ready, well-aligned speech-text corpora from the Web. We created a data set of 2.5 hours of speech from 83 speakers in total, covering various common topics. Using this data set, we measured the zero-shot Word Error Rate (WER) of Whisper’s base-size model (74M parameters) and small-size model (244M parameters). We found substantial gaps between Whisper’s zero-shot WER on QF and its published French WER on standardized benchmarks, motivating us to fine-tune Whisper for QF. Whisper was trained with large-scale weak supervision and has been reported to be highly robust, which makes it suitable as a pre-trained model. We kept all pre-trained weights in the encoder blocks frozen and unfroze only the decoder blocks for retraining, resulting in 52M trainable parameters for the base-size model and 153M trainable parameters for the small-size model. WER decreased after fine-tuning for both sizes; the fine-tuned small-size model achieved an average WER of 19.5%, approaching Whisper’s performance on the standard, well-represented French dialect. Our study demonstrates a promising initial approach to leveraging Whisper as a pre-trained model for targeted adaptive applications, particularly in the context of regional dialects, even with limited resources.
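
For readers unfamiliar with the metric reported above, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the evaluation code used in the study. The function name `word_error_rate`, the lowercase and whitespace-split normalization, and the example sentences are all assumptions made for illustration; the abstract does not state which text normalization was applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Normalization here (lowercasing, whitespace split) is an assumption
    for illustration only; a real evaluation may normalize differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misspelled word out of five reference words.
example_wer = word_error_rate(
    "je m'en vais au dépanneur",
    "je m'en vais au dépaneur",
)
print(f"WER = {example_wer:.1%}")
```

Under this definition, a corpus-level figure such as the reported 19.5% could be an average of per-utterance values or a ratio of total errors to total reference words; the abstract describes it only as an average WER.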

Published

2023-10-09

How to Cite

1. Zhang X, Berger LE, Tran D-H, Bouserhal R. Enhancing Automatic Speech Recognition of a Regional Dialect: A Pilot Study with Québécois French. Canadian Acoustics [Internet]. 2023 Oct. 9 [cited 2024 Apr. 27];51(3):76-7. Available from: https://jcaa.caa-aca.ca/index.php/jcaa/article/view/4117

Section

Proceedings of the Acoustics Week in Canada