Strategies to Enhance Whispered Speech Speaker Verification: A Comparative Analysis
AbstractToday, automated speech-enabled tools are increasingly being used in everyday environments. This mobility has created new challenges for developers, who are now faced with input speech of varying styles (e.g. whispered) and corrupted by different noise sources. In this paper, special emphasis is placed on whispered speech, an underexplored yet burgeoning area due to the rapid proliferation of smartphones around the world. More specifically, this paper explores the performance boundaries achievable with whispered speech for a speaker verification task, both in matched and mismatched train/test conditions. Several strategies are investigated to improve the performance in the mismatched scenario, as well as in situations involving ambient noise. Our results agree with previously reported studies in adjacent areas, that significant gains could be obtained by training speaker models with both naturally voiced and whispered speech data. Moreover, additional gains could be achieved with speaking style and gender dependent systems. Overall, speaker verification performance inline with that obtained with naturally-voiced speech could be attained for whispered speech once specific strategies were put in place. Particularly, feature fusion showed to be an important strategy for practical applications in both clean and noisy conditions.
Copyright on articles is held by the author(s). The corresponding author has the right to grant on behalf of all authors and does grant on behalf of all authors, a worldwide exclusive licence (or non-exclusive license for government employees) to the Publishers and its licensees in perpetuity, in all forms, formats and media (whether known now or created in the future)
i) to publish, reproduce, distribute, display and store the Contribution;
ii) to translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts and/or, abstracts of the Contribution;
iii) to exploit all subsidiary rights in the Contribution,
iv) to provide the inclusion of electronic links from the Contribution to third party material where-ever it may be located;
v) to licence any third party to do any or all of the above.