Forensic Automatic Speaker Recognition based on Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network
DOI:
https://doi.org/10.17469/O2111AISV000026
Keywords:
forensic linguistics, voice comparison, speaker verification, speaker recognition
Abstract
In automatic forensic voice comparison (FVC), neural network models are increasingly used in the processing chain. Recently, Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) demonstrated high discrimination and accuracy in the non-forensic speaker verification task. In this contribution, after illustrating the fundamental differences between the forensic and non-forensic tasks of speaker verification from a linguistic, logical and methodological perspective, the performance of a software implementation of automatic FVC based on ECAPA-TDNN is evaluated under different simulated operating conditions (noise level, net speech duration). Preliminary results confirm excellent performance in non-critical operating conditions. The hypotheses on the performance trend are also preliminarily confirmed: as the duration of the speech samples increases and the noise level decreases, the evaluation metrics improve at a rate that depends on the combination of these two factors.
Published
29-12-2023
Issue
Section
Articles
License
Copyright (c) 2023 AISV - Associazione Italiana di Scienze della Voce [Italian Association for Speech Sciences]

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.