Forensic Automatic Speaker Recognition based on Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network

Authors

  • Francesco Sigona Laboratory CRIL (Centro di Ricerca Interdisciplinare sul Linguaggio) & DReaM, Department of Humanities, University of Salento, Lecce, Italy https://orcid.org/0000-0003-2939-0009
  • Giuseppe Vitolo Dipartimento di Ingegneria dell’Innovazione, Università del Salento, Italy
  • Mirko Grimaldi Centro di Ricerca Interdisciplinare sul Linguaggio (CRIL), Dipartimento di Studi Umanistici, Università del Salento, Italy https://orcid.org/0000-0002-0940-3645

DOI:

https://doi.org/10.17469/O2111AISV000026

Keywords:

forensic linguistics, voice comparison, speaker verification, speaker recognition

Abstract

In automatic forensic voice comparison (FVC), neural network models are increasingly used in the processing chain. Recently, Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) has demonstrated high discrimination and accuracy in the non-forensic speaker verification task. This contribution first illustrates the fundamental differences between the forensic and non-forensic speaker verification tasks from a linguistic, logical and methodological perspective. It then evaluates the performance of a software implementation of automatic FVC based on ECAPA-TDNN under different simulated operating conditions (noise level, net speech duration). Preliminary results confirm excellent performance in non-critical operating conditions. They also preliminarily confirm the hypothesized performance trend: as the duration of the speech samples increases and the noise level decreases, the evaluation metrics improve at a rate that depends on the combination of these two factors.
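
As an illustration of what simulated operating conditions of this kind can look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes that the noise level is expressed as a signal-to-noise ratio (SNR) in dB and that the duration is controlled by simple truncation. The function names (mean_power, mix_at_snr, trim_to_duration, condition_grid) are hypothetical and do not come from the article.

```python
import math
import random


def mean_power(samples):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    Assumption: noise level is modelled as additive noise at a target SNR.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Noise power needed to reach the requested SNR.
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_p_noise / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def trim_to_duration(samples, sample_rate, seconds):
    """Keep only the first `seconds` of audio (assumes silence is already removed)."""
    return samples[: int(round(sample_rate * seconds))]


def condition_grid(snr_levels_db, durations_s):
    """All (SNR, duration) pairs to be evaluated."""
    return [(snr, dur) for snr in snr_levels_db for dur in durations_s]


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 8000  # hypothetical sampling rate in Hz
    speech = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(sample_rate * 4)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(len(speech))]
    # The SNR and duration values below are arbitrary illustrative choices.
    for snr_db, seconds in condition_grid([20, 10, 0], [1, 2, 4]):
        degraded = mix_at_snr(trim_to_duration(speech, sample_rate, seconds), noise, snr_db)
        print(snr_db, seconds, len(degraded))
```

Each degraded sample produced this way could then be passed to a speaker-comparison system, with the evaluation metrics computed per (SNR, duration) cell to observe how the two factors interact.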

Published

29-12-2023
