Segmentation into audio document speakers: A new approach based on the one-class support vector methods

Belkacem Fergani; Manuel Davy; Amrane Houacine

Authors

Belkacem Fergani, LCPTS - USTHB, B.P. 32, El Alia, Bab Ezzouar, Alger, Algeria
Manuel Davy, LAGIS/CNRS, Cité Scientifique, BP 48, 59651 Villeneuve d'Ascq Cedex, France
Amrane Houacine, LCPTS - USTHB, B.P. 32, El Alia, Bab Ezzouar, Alger, Algeria

Keywords:

Acoustic signal processing, Audio recordings, Data mining, Database systems, Information retrieval, Support vector machines, Digital sound files, Speaker diarization, Text files

Abstract

With the recent and continuing growth in the number of available sound archives (radio, TV, Web, ...), effective methods are needed to facilitate searching for information within massive databases. Text files (index files) are linked to the digital sound files; they are less complex than the original sound file but nevertheless contain a summary of the important information in the signal. An example of relevant information found in such a text file is: 45 minutes of speech, 1 minute of music, 10 speakers (6 men and 4 women). These index files, stored with the original signal, contribute considerably to the information retrieval process by allowing immediate and direct access to the information sought. If one would like to know who speaks and when in a sound file, the index key is the speaker. A preliminary stage of a speaker indexing system is speaker diarization. State-of-the-art speaker diarization techniques require two main steps: speaker turn detection, which consists of detecting speaker turn times (that is, the boundaries of audio segments in which only one speaker is present), followed by a clustering step, which consists of labelling those segments by speaker. Both stages require a metric to be defined in order to compare and group speech segments. This paper presents a novel approach to the speaker diarization of audio recordings. The proposed approach uses a metric based on one-class Support Vector Machines (SVM-I), introduced recently by one of the authors, for the speaker change detection and clustering tasks. Through many experiments on two databases of broadcast recordings, we demonstrate the relevance and superiority of this approach compared to the traditional method based on the generalized likelihood ratio with the Bayesian information criterion (RVG-BIC).
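
To make the speaker turn detection step concrete, the following is a minimal sketch of the traditional baseline the abstract compares against, a generalized likelihood ratio with a BIC penalty. It is not the authors' one-class SVM metric, whose details are not given on this page. The sketch assumes each segment is modelled by a single Gaussian with diagonal covariance, and all function names, the synthetic two-dimensional "feature frames", and the `margin` and `penalty_weight` parameters are hypothetical choices made for illustration.

```python
import math
import random


def log_det_diag(frames):
    """Sum of per-dimension log variances (log-determinant of a diagonal covariance)."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for j in range(dim):
        column = [frame[j] for frame in frames]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        total += math.log(variance + 1e-12)  # small floor avoids log(0)
    return total


def delta_bic(left, right, penalty_weight=1.0):
    """GLR minus BIC penalty; a positive value favours a speaker change between the segments."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    dim = len(left[0])
    glr = 0.5 * (n * log_det_diag(left + right)
                 - n1 * log_det_diag(left)
                 - n2 * log_det_diag(right))
    # Assumption: diagonal covariance, so the two-model hypothesis adds
    # one extra mean vector and one extra variance vector (2 * dim parameters).
    extra_params = 2 * dim
    return glr - 0.5 * penalty_weight * extra_params * math.log(n)


def best_change_point(frames, margin=20):
    """Scan candidate split points and return (index, score) of the highest delta-BIC."""
    scores = [(delta_bic(frames[:t], frames[t:]), t)
              for t in range(margin, len(frames) - margin)]
    score, t = max(scores)
    return t, score


def make_two_speaker_frames(seed=0, n_per_speaker=100, shift=5.0):
    """Synthetic stand-in for acoustic features: two Gaussian 'speakers' back to back."""
    rng = random.Random(seed)
    first = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_per_speaker)]
    second = [(rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)) for _ in range(n_per_speaker)]
    return first + second


if __name__ == "__main__":
    frames = make_two_speaker_frames()
    index, score = best_change_point(frames)
    print("estimated change point:", index, "delta-BIC:", round(score, 1))
```

The same pairwise score could in principle be reused for the clustering step by merging the pair of segments with the lowest delta-BIC, which is why the abstract notes that both stages depend on the choice of metric.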

Additional Files

PDF (French (Canada))

Published

2007-12-01

How to Cite

Fergani B, Davy M, Houacine A. Segmentation into audio document speakers: A new approach based on the one-class support vector methods. Canadian Acoustics [Internet]. 2007 Dec. 1 [cited 2026 Aug. 2];35(4):3-10. Available from: https://jcaa.caa-aca.ca/index.php/jcaa/article/view/1974

Issue

Vol. 35 No. 4 (2007)

Section

Technical Articles

License

Author Licensing Addendum

This Licensing Addendum ("Addendum") is entered into between the undersigned Author(s) and Canadian Acoustics journal published by the Canadian Acoustical Association (hereinafter referred to as the "Publisher"). The Author(s) and the Publisher agree as follows:

Retained Rights: The Author(s) retain(s) the following rights:
- The right to reproduce, distribute, and publicly display the Work on the Author's personal website or the website of the Author's institution.
- The right to use the Work in the Author's teaching activities and presentations.
- The right to include the Work in a compilation for the Author's personal use, not for sale.
Grant of License: The Author(s) grant(s) to the Publisher a worldwide exclusive license to publish, reproduce, distribute, and display the Work in Canadian Acoustics and any other formats and media deemed appropriate by the Publisher.
Attribution: The Publisher agrees to include proper attribution to the Author(s) in all publications and reproductions of the Work.
No Conflict: This Addendum is intended to be in harmony with, and not in conflict with, the terms and conditions of the original agreement entered into between the Author(s) and the Publisher.
Copyright Clause: Copyright on articles is held by the Author(s). The corresponding Author has the right to grant on behalf of all Authors and does grant on behalf of all Authors, a worldwide exclusive license to the Publisher and its licensees in perpetuity, in all forms, formats, and media (whether known now or created in the future), including but not limited to the rights to publish, reproduce, distribute, display, store, translate, create adaptations, reprints, include within collections, and create summaries, extracts, and/or abstracts of the Contribution.
