What Do Forced Alignment Likelihood Scores Tell Us About the Aligned Speech?

Ayushi Mrigen; Daniel Brenner; Benjamin V. Tucker

Authors

Ayushi Mrigen, Indian Institute of Technology, Kharagpur
Daniel Brenner, University of Alberta
Benjamin V. Tucker, University of Alberta

Abstract

Standard forced alignment systems are widely used tools in phonetic research. Forced alignment uses Hidden Markov Models to align a sequence of phones to a sound recording. As a byproduct, it computes likelihood scores for every aligned phone and word. This study investigates the extent to which these likelihood scores can be (1) put to use in speech research and (2) interpreted as a measure of acoustic distance (of some variety) to the modeled phones, and so used to place individual aligned segments within their distribution of phonetic variation. The present study is a first step toward these goals. To this end, vowels in hold-out portions of the TIMIT (Zue & Seneff 1988) and Buckeye (Pitt et al. 2005) corpora were first cross-aligned with phone models trained on the remaining portions of those corpora (tokens of [i] were aligned with the [?] phone, the [e] phone, the [?] phone, etc.), and the resulting likelihood scores were compared to acoustic measures such as duration and formant frequencies to determine which acoustic properties are encapsulated in the scores. These scores were also compared with those provided by the freely available Penn Forced Aligner (Yuan & Liberman 2008). Preliminary analyses find a strong correlation between the cross-alignment scores and F1 × F2 geometric distance, as well as the duration of the phones. This establishes that these probability measures show a relationship with some acoustic characteristics of the segments. The results of this initial analysis are promising. Future evaluation is needed to explore the full scope and limitations of the application of these measures.
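The comparison between likelihood scores and F1 × F2 geometric distance can be illustrated with a minimal sketch. It assumes that "geometric distance" means the Euclidean distance in Hz between a token's (F1, F2) point and a reference point for the modeled phone, and that the relationship is summarized with a Pearson correlation; the abstract specifies neither. The function names and all numeric values below are hypothetical placeholders, not data from the study.

```python
import math


def f1f2_distance(token, reference):
    """Euclidean distance between two (F1, F2) points, both in Hz."""
    return math.hypot(token[0] - reference[0], token[1] - reference[1])


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical reference point for one phone model (F1, F2 in Hz).
reference = (300.0, 2300.0)

# Hypothetical vowel tokens, each with made-up formant values and a
# made-up log-likelihood score from aligning it against that phone model.
tokens = [(310.0, 2250.0), (400.0, 2000.0), (550.0, 1800.0), (700.0, 1300.0)]
scores = [-50.0, -62.0, -75.0, -98.0]

distances = [f1f2_distance(t, reference) for t in tokens]
r = pearson_r(distances, scores)
print(f"correlation between F1 x F2 distance and score: {r:.3f}")
```

In this toy setup a negative correlation would mean that tokens lying farther from the reference point receive lower likelihood scores. Duration could be compared with the scores in the same way by passing a list of durations to `pearson_r`.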

References:

[1] Pitt, M. A., Johnson, K., Hume, E., Kiesling, S., & Raymond, W. 2005. The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability. Speech Communication 45, 89-95.

[2] Zue, V. & Seneff, S. 1988. Transcription and alignment of the TIMIT database. Proceedings of the 2nd Meeting on Advanced Man-Machine Interface through Spoken Language, 11.1-11.10.

[3] Yuan, J. & Liberman, M. 2008. Speaker identification on the SCOTUS corpus. Proceedings of Acoustics 2008.

Author Biographies

Ayushi Mrigen, Indian Institute of Technology, Kharagpur

Department of Mathematics

Daniel Brenner, University of Alberta

Postdoctoral Researcher, Department of Linguistics

Benjamin V. Tucker, University of Alberta

Associate Professor, Department of Linguistics

Published

2016-08-24

How to Cite

Mrigen A, Brenner D, Tucker BV. What Do Forced Alignment Likelihood Scores Tell Us About the Aligned Speech? Canadian Acoustics [Internet]. 2016 Aug 24 [cited 2024 Aug 7];44(3). Available from: https://jcaa.caa-aca.ca/index.php/jcaa/article/view/2958

Issue

Vol. 44 No. 3 (2016)

Section

Proceedings of the Acoustics Week in Canada

License

Author Licensing Addendum

This Licensing Addendum ("Addendum") is entered into between the undersigned Author(s) and Canadian Acoustics journal published by the Canadian Acoustical Association (hereinafter referred to as the "Publisher"). The Author(s) and the Publisher agree as follows:

Retained Rights: The Author(s) retain(s) the following rights:
- The right to reproduce, distribute, and publicly display the Work on the Author's personal website or the website of the Author's institution.
- The right to use the Work in the Author's teaching activities and presentations.
- The right to include the Work in a compilation for the Author's personal use, not for sale.
Grant of License: The Author(s) grant(s) to the Publisher a worldwide exclusive license to publish, reproduce, distribute, and display the Work in Canadian Acoustics and any other formats and media deemed appropriate by the Publisher.
Attribution: The Publisher agrees to include proper attribution to the Author(s) in all publications and reproductions of the Work.
No Conflict: This Addendum is intended to be in harmony with, and not in conflict with, the terms and conditions of the original agreement entered into between the Author(s) and the Publisher.
Copyright Clause: Copyright on articles is held by the Author(s). The corresponding Author has the right to grant on behalf of all Authors and does grant on behalf of all Authors, a worldwide exclusive license to the Publisher and its licensees in perpetuity, in all forms, formats, and media (whether known now or created in the future), including but not limited to the rights to publish, reproduce, distribute, display, store, translate, create adaptations, reprints, include within collections, and create summaries, extracts, and/or abstracts of the Contribution.
