Prosodylab-aligner: A tool for forced alignment of laboratory speech

Authors

  • Kyle Gorman Department of Linguistics, University of Pennsylvania, 619 Williams Hall, 255 S. 36th St., Philadelphia, PA 19104-6305, United States
  • Jonathan Howell Department of Linguistics, McGill University, 1085 Dr. Penfield, Montreal, QC H3A1A7, Canada
  • Michael Wagner Department of Linguistics, McGill University, 1085 Dr. Penfield, Montreal, QC H3A1A7, Canada

Keywords:

Computer operating systems, Acoustic model, Hidden Markov model toolkits, Mac OS X, Model estimation, Monophones, Open-source, Resampling, Television programming

Abstract

The Penn Forced Aligner automates the alignment process using the Hidden Markov Model Toolkit (HTK). The core of Prosodylab-Aligner is align.py, a script that performs acoustic model training and alignment. This script automates calls to HTK and to SoX, an open-source command-line tool capable of resampling audio. The included README file provides instructions for installing HTK and SoX on Linux and Mac OS X; the aligner can also be run on Windows. During training, the model is initialized with flat-start monophones, which are then submitted to a single round of model estimation. Then, a tied-state 'small pause' model is inserted and used in a second round of estimation. The data is then aligned once to choose the most likely pronunciation of all homonyms. Web audio is downloaded from Ramp, a company which indexes radio and television programming, including NBC, PBS, Fox and CBS Radio, and processed using standard UNIX tools.
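
As an illustration of the resampling step mentioned above, the following is a minimal sketch of how a Python script might call SoX. It is not taken from align.py: the function names, the file names and the 16000 Hz target rate are assumptions made for this example. Only the SoX convention that a -r option placed before the output file sets the output sample rate is relied on.

```python
import shutil
import subprocess


def build_resample_command(src, dst, rate=16000):
    """Return the SoX argument list that writes `src` to `dst` at `rate` Hz.

    Hypothetical helper for illustration; the default rate is an assumption,
    not a value stated in the abstract.
    """
    # In SoX, format options apply to the file that follows them, so `-r`
    # placed before the output file sets the output sample rate.
    return ["sox", str(src), "-r", str(rate), str(dst)]


def resample(src, dst, rate=16000):
    """Run SoX on one file, failing clearly if SoX is not installed."""
    if shutil.which("sox") is None:
        raise RuntimeError("SoX was not found on PATH")
    # check=True raises CalledProcessError if SoX exits with an error.
    subprocess.run(build_resample_command(src, dst, rate), check=True)
```

Separating command construction from execution keeps the sketch easy to inspect without SoX installed; a call such as resample("in.wav", "out.wav") would require SoX on the PATH.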

Published

2011-09-01

How to Cite

1.
Gorman K, Howell J, Wagner M. Prosodylab-aligner: A tool for forced alignment of laboratory speech. Canadian Acoustics [Internet]. 2011 Sep 1 [cited 2020 Oct 25];39(3):192-3. Available from: https://jcaa.caa-aca.ca/index.php/jcaa/article/view/2476

Section

Proceedings of the Acoustics Week in Canada