A comparison of pitch extraction methodologies for dolphin vocalization
Keywords:
Amplitude modulation, Hidden Markov models, Mammals, Markov processes, Mathematical programming, Signal to noise ratio, Signaling, Systems engineering, Wavelet transforms, Frequency ranging, Pitch extraction
Abstract
When collecting and analyzing marine mammal vocalizations, one of the most important goals is to automatically extract the pitch (fundamental frequency) of the collected calls. In dolphins, we can assume that there are two main pitched sounds: whistles, which can be described as tonal AM-FM signals, and bursts, which can be described as highly harmonic signals. Three main difficulties with pitch extraction on dolphin vocalizations arise from the nature of the data. First, most underwater recordings have a low signal-to-noise ratio due to reflections, hardware noise, and other interference, which is a major challenge for most existing pitch trackers. Second, the frequency range of bottlenose dolphin vocalizations differs significantly from that of humans. Finally, dolphin whistles and bursts are generally emitted in two distinct frequency ranges, which results in different modes in the analysis data. In this work we compare our novel pitch extraction approach with two widely used algorithms. Our approach uses hierarchy-based hidden Markov models (HMMs) with cepstral coefficients as features. We quantitatively compare the performance of our algorithm with Yin, which is based on a modified autocorrelation method, and get_f0, a popular off-the-shelf pitch tracker that uses linear predictive coefficients (LPC) and dynamic programming. Our approach outperforms both comparison methods by at least 10%.
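To make the "modified autocorrelation" idea behind Yin more concrete, the following is a minimal sketch of a Yin-style estimator on a single frame. It is not the authors' HMM approach, nor the reference Yin implementation: the function name `cmnd_pitch`, the 96 kHz sample rate, the 8 kHz test tone, the search range, and the 0.1 threshold are all illustrative assumptions, and the test signal is noise-free, so it does not reflect the low signal-to-noise conditions described above.

```python
import math


def cmnd_pitch(frame, sample_rate, f_min, f_max, threshold=0.1):
    """Estimate f0 of one frame using a Yin-style normalized difference function.

    Hypothetical helper for illustration; lag resolution is one sample
    (no parabolic interpolation).
    """
    tau_min = max(1, int(sample_rate / f_max))  # shortest period to consider
    tau_max = int(sample_rate / f_min)          # longest period to consider
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the requested f_min")

    # Difference function: d(tau) = sum_j (x[j] - x[j + tau])^2
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))

    # Cumulative mean normalized difference:
    # d'(tau) = d(tau) * tau / sum_{k=1..tau} d(k)
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Pick the first dip below the threshold and follow it to its local minimum;
    # fall back to the global minimum in the search range if no dip qualifies.
    chosen = None
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            chosen = tau
            break
        tau += 1
    if chosen is None:
        chosen = min(range(tau_min, tau_max + 1), key=lambda t: cmnd[t])
    return sample_rate / chosen


# Synthetic, noise-free tone standing in for a whistle (assumed values).
SAMPLE_RATE = 96_000
TONE_HZ = 8_000.0
frame = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(256)]

f0 = cmnd_pitch(frame, SAMPLE_RATE, f_min=4_000.0, f_max=20_000.0)
print(f"estimated f0: {f0:.1f} Hz")
```

The search range `f_min`..`f_max` is where a tracker tuned for human speech would need to change: whistle fundamentals sit far above the range such trackers typically search, so the lag bounds have to be set for the species and the sample rate of the recording.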
License
Copyright on articles is held by the author(s). The corresponding author has the right to grant on behalf of all authors and does grant on behalf of all authors, a worldwide exclusive licence (or non-exclusive licence for government employees) to the Publishers and its licensees in perpetuity, in all forms, formats and media (whether known now or created in the future):
i) to publish, reproduce, distribute, display and store the Contribution;
ii) to translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts and/or abstracts of the Contribution;
iii) to exploit all subsidiary rights in the Contribution;
iv) to provide the inclusion of electronic links from the Contribution to third party material wherever it may be located;
v) to licence any third party to do any or all of the above.