What Do Forced Alignment Likelihood Scores Tell Us About the Aligned Speech?

  • Ayushi Mrigen, Indian Institute of Technology, Kharagpur
  • Daniel Brenner, University of Alberta
  • Benjamin V. Tucker, University of Alberta

Abstract

Standard forced alignment systems are a widely used tool in phonetic research. Forced alignment uses Hidden Markov Models to align a sequence of phones to a sound recording. As a byproduct, it computes likelihood scores for every aligned phone and word. This study investigates the extent to which these likelihood scores can be (1) pressed into use in speech research and (2) interpreted as a measure of acoustic distance (of some variety) to the modeled phones, thereby placing individual aligned segments within their distribution of phonetic variation. The present study is a first step toward these goals. To this end, vowels in held-out portions of the TIMIT (Zue & Seneff 1988) and Buckeye (Pitt et al. 2005) corpora were first cross-aligned with phone models trained on the remaining portions of those corpora (tokens of [i] were aligned with the [?] phone, the [e] phone, the [?] phone, etc.). The resulting likelihood scores were then compared to acoustic measures such as duration and formant frequencies to determine which acoustic properties the scores encapsulate. The scores were also compared with those provided by the freely available Penn Forced Aligner (Yuan & Liberman 2008). Preliminary analyses find a strong correlation between the cross-alignment scores and F1 × F2 geometric distance, as well as phone duration. This indicates that these probability measures are related to at least some acoustic characteristics of the segments. The results of this initial analysis are promising; further evaluation is needed to explore the full scope and limitations of these measures.
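The comparison step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code. It assumes each cross-aligned token comes with a single likelihood score, midpoint F1 and F2 values in Hz, and a duration in seconds. It takes "F1 × F2 geometric distance" to mean the Euclidean distance from the token to a reference point, assumed here to be the mean F1/F2 of the model phone's category. It also uses Pearson's r as the correlation measure, which the abstract does not specify. The names `Token`, `f1f2_distance`, `pearson_r`, and `score_correlations` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical record for one vowel token cross-aligned against one phone model.
    score: float     # alignment log-likelihood for the token (units depend on the aligner)
    f1: float        # first formant at the token midpoint, in Hz
    f2: float        # second formant at the token midpoint, in Hz
    duration: float  # token duration, in seconds


def f1f2_distance(f1, f2, ref_f1, ref_f2):
    """Euclidean (geometric) distance in the F1 x F2 plane, in Hz."""
    return math.hypot(f1 - ref_f1, f2 - ref_f2)


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def score_correlations(tokens, ref_f1, ref_f2):
    """Correlate alignment scores with F1 x F2 distance and with duration.

    ref_f1/ref_f2 are an assumed reference point for the model phone,
    e.g. the mean formants of that phone's training tokens.
    """
    scores = [t.score for t in tokens]
    dists = [f1f2_distance(t.f1, t.f2, ref_f1, ref_f2) for t in tokens]
    durs = [t.duration for t in tokens]
    return {
        "distance": pearson_r(scores, dists),
        "duration": pearson_r(scores, durs),
    }
```

With the tokens of one vowel category aligned against one phone model, `score_correlations(tokens, ref_f1, ref_f2)` returns the two correlations for that vowel/model pairing; repeating it over every pairing gives a grid that can be compared across models. Whether the scores should first be normalized per frame is not addressed in the abstract and is left out of this sketch.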



References:

[1] Pitt, M. A., Johnson, K., Hume, E., Kiesling, S., & Raymond, W. 2005. The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability. Speech Communication 45, 89-95.

[2] Zue, V. & Seneff, S. 1988. Transcription and alignment of the TIMIT database. Proceedings of the 2nd Meeting on Advanced Man-Machine Interface through Spoken Language, 11.1-11.10.

[3] Yuan, J. & Liberman, M. 2008. Speaker identification on the SCOTUS corpus. Proceedings of Acoustics 2008.

Author Biographies

Ayushi Mrigen, Indian Institute of Technology, Kharagpur
Department of Mathematics
Daniel Brenner, University of Alberta
Postdoctoral Researcher, Department of Linguistics
Benjamin V. Tucker, University of Alberta
Associate Professor, Department of Linguistics
Published
2016-08-24
How to Cite
Mrigen A, Brenner D, Tucker BV. What Do Forced Alignment Likelihood Scores Tell Us About the Aligned Speech? Canadian Acoustics [Internet]. 2016 Aug 24 [cited 2019 Aug 23];44(3). Available from: https://jcaa.caa-aca.ca/index.php/jcaa/article/view/2958
Section
Proceedings of the Acoustics Week in Canada