Enhancing Automatic Speech Recognition of a Regional Dialect: A Pilot Study with Québécois French

Xinyi Zhang; Lucia Eve Berger; Duc-Hoa Tran; Rachel Bouserhal

Authors

Xinyi Zhang, École de technologie supérieure
Lucia Eve Berger, MILA, Université de Montréal
Duc-Hoa Tran, École de technologie supérieure
Rachel Bouserhal, École de technologie supérieure

Abstract

Automatic speech recognition (ASR) technologies have advanced considerably in recent years, but performance remains uneven under underrepresented, low-resource testing conditions. This proof-of-concept study examined the transcription performance of Whisper, a state-of-the-art multilingual ASR system, on Québécois French (QF). Because QF is low-resource, we developed a semi-automated pipeline for building machine-learning-ready, well-aligned speech-text corpora from the Web. With it, we created a 2.5-hour dataset from 83 speakers in total, covering a range of common topics. Using this dataset, we measured the zero-shot word error rate (WER) of Whisper's base-size model (74M parameters) and small-size model (244M parameters). We found substantial gaps between its zero-shot WER on QF and its published French WER on standardized benchmarks, which motivated us to fine-tune Whisper for QF. Whisper was trained with large-scale weak supervision and has been reported to be highly robust, which makes it well suited as a pre-trained model. We kept all pre-trained weights in the encoder blocks frozen and unfroze only the decoder blocks for retraining, leaving 52M trainable parameters in the base-size model and 153M in the small-size model. Fine-tuning reduced WER for both sizes; the fine-tuned small-size model achieved an average WER of 19.5%, approaching Whisper's performance on standard, well-represented French. Our study showcases a promising initial approach to leveraging Whisper as a pre-trained model for targeted adaptation, particularly to regional dialects, even with limited resources.
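The WER figures above follow the standard definition: the word-level edit distance between reference and hypothesis (substitutions S, deletions D, insertions I) divided by the number of reference words N, i.e. WER = (S + D + I) / N. The following is a minimal Python sketch of that metric, not the authors' evaluation code. Assumptions: text is normalized only by lowercasing and splitting on whitespace (published Whisper results use a more elaborate text normalizer), and the helper names `normalize`, `word_edit_distance`, `wer`, and `corpus_wer` are hypothetical. Whether the reported 19.5% averages per-utterance scores or pools errors across the whole set is not stated here, so both a per-utterance score and a pooled corpus score are shown.

```python
from typing import Iterable, List, Tuple


def normalize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace tokenization only; no punctuation
    # stripping and no Whisper-style text normalizer.
    return text.lower().split()


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match if equal
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Per-utterance WER = (S + D + I) / N."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total edits over total reference words."""
    edits = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        edits += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return edits / words


if __name__ == "__main__":
    # Hypothetical QF-style example: one substitution (allé -> allée) and
    # one insertion (hier) against 5 reference words, so WER = 2 / 5.
    print(wer("Je suis allé au dépanneur", "je suis allée au dépanneur hier"))
```

Running the script prints 0.4. A pooled corpus score weights long utterances more heavily than a plain mean of per-utterance scores, which is why the two can differ on the same data.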

Published

2023-10-09

How to Cite

Zhang X, Berger LE, Tran D-H, Bouserhal R. Enhancing Automatic Speech Recognition of a Regional Dialect: A Pilot Study with Québécois French. Canadian Acoustics [Internet]. 2023 Oct. 9 [cited 2024 Nov. 21];51(3):76-7. Available from: https://jcaa.caa-aca.ca/index.php/jcaa/article/view/4117

Issue

Vol. 51 No. 3 (2023): Proceedings of Acoustics Week in Canada 2023 Conference

Section

Proceedings of the Acoustics Week in Canada

License

Author Licensing Addendum

This Licensing Addendum ("Addendum") is entered into between the undersigned Author(s) and Canadian Acoustics journal published by the Canadian Acoustical Association (hereinafter referred to as the "Publisher"). The Author(s) and the Publisher agree as follows:

Retained Rights: The Author(s) retain(s) the following rights:
- The right to reproduce, distribute, and publicly display the Work on the Author's personal website or the website of the Author's institution.
- The right to use the Work in the Author's teaching activities and presentations.
- The right to include the Work in a compilation for the Author's personal use, not for sale.
Grant of License: The Author(s) grant(s) to the Publisher a worldwide exclusive license to publish, reproduce, distribute, and display the Work in Canadian Acoustics and any other formats and media deemed appropriate by the Publisher.
Attribution: The Publisher agrees to include proper attribution to the Author(s) in all publications and reproductions of the Work.
No Conflict: This Addendum is intended to be in harmony with, and not in conflict with, the terms and conditions of the original agreement entered into between the Author(s) and the Publisher.
Copyright Clause: Copyright on articles is held by the Author(s). The corresponding Author has the right to grant on behalf of all Authors and does grant on behalf of all Authors, a worldwide exclusive license to the Publisher and its licensees in perpetuity, in all forms, formats, and media (whether known now or created in the future), including but not limited to the rights to publish, reproduce, distribute, display, store, translate, create adaptations, reprints, include within collections, and create summaries, extracts, and/or abstracts of the Contribution.
