Forensic Automatic Speaker Recognition based on Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network
DOI:
https://doi.org/10.17469/O2111AISV000026
Keywords:
forensic linguistics, voice comparison, speaker verification, speaker recognition
Abstract
In automatic forensic voice comparison (FVC), neural network models are used in the processing chain with increasing frequency. Recently, the Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) architecture demonstrated high discrimination and accuracy in the non-forensic speaker verification task. This contribution first illustrates the fundamental differences between the forensic and non-forensic speaker verification tasks from a linguistic, logical and methodological perspective. It then evaluates the performance of a software implementation of automatic FVC based on ECAPA-TDNN under different simulated operating conditions (noise level, net speech duration). Preliminary results confirm excellent performance in non-critical operating conditions. They also provide preliminary support for the hypothesized performance trend: as the duration of the speech samples increases and the noise level decreases, the evaluation metrics improve at a rate that depends on the combination of these two factors.
Published
29-12-2023
Issue
Section
Articles
License
Copyright (c) 2023 AISV - Associazione Italiana di Scienze della Voce

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.