Comparing formant extraction methods according to speaking style and added noise in Forensic Voice Comparison

Dávid Sztahó; Attila Fejes; György Szaszák

doi:10.15775/Besztud.2022.7-35

Dávid Sztahó, Budapest University of Technology and Economics
Attila Fejes, Special Service for National Security
György Szaszák, Budapest University of Technology and Economics


Abstract

In forensic voice comparison, formant measurements are a “traditional” way of comparing speaker identities. Deep learning may offer a new way of estimating formant values, so it is essential to compare its performance under forensic conditions of use. In this study, four formant estimation methods are compared: three based on linear predictive coding (LPC) and one based on deep learning. Several aspects of formant modelling in forensic voice comparison were investigated: the effect of utterance length, the effect of speaking style (same versus different styles in the compared samples), and the effect of samples corrupted with two types of noise, reverberation and white noise. Results are reported in terms of the log-likelihood-ratio cost (Cllr), the area under the ROC curve (AUC) and the equal error rate (EER). The length of the recordings used as suspect samples was found to influence performance to a large extent. Additionally, formant tracking based on deep learning lags behind the other methods in all metrics. Whether the compared samples share the same speaking style also has a measurable effect on performance. Reverberation does not degrade the results, but white noise does. The results do not establish which method is definitively better or which should be preferred in studies and casework. Cllr values show that the three LPC-based methods perform similarly, and that all of them make large errors when samples are corrupted with white noise. Although deepformants performs slightly worse than the other methods used in this study, it appears to be more resilient to white noise.
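The three evaluation metrics named above can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that each comparison yields a likelihood ratio (LR) and that the comparisons are already split into same-speaker and different-speaker sets. Cllr uses the standard definition on raw (non-log) LRs, AUC is computed as the probability that a same-speaker score exceeds a different-speaker score, and EER is approximated by sweeping a threshold over the observed scores. The function names are illustrative.

```python
import math
from typing import Sequence


def cllr(same_lrs: Sequence[float], diff_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost from raw likelihood ratios (not log LRs).

    same_lrs: LRs of same-speaker comparisons (ideally much greater than 1).
    diff_lrs: LRs of different-speaker comparisons (ideally much less than 1).
    An uninformative system (every LR equal to 1) gives Cllr = 1.
    """
    ss_cost = sum(math.log2(1 + 1 / lr) for lr in same_lrs) / len(same_lrs)
    ds_cost = sum(math.log2(1 + lr) for lr in diff_lrs) / len(diff_lrs)
    return 0.5 * (ss_cost + ds_cost)


def eer(same_scores: Sequence[float], diff_scores: Sequence[float]) -> float:
    """Equal error rate, approximated by trying every observed score as a
    threshold and returning the mean of the miss and false-alarm rates at
    the threshold where the two rates are closest."""
    best_gap, best_rate = float("inf"), 1.0
    for t in sorted(set(same_scores) | set(diff_scores)):
        miss = sum(s < t for s in same_scores) / len(same_scores)
        false_alarm = sum(d >= t for d in diff_scores) / len(diff_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, best_rate = gap, (miss + false_alarm) / 2
    return best_rate


def auc(same_scores: Sequence[float], diff_scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a random same-speaker
    score is higher than a random different-speaker score (ties count half)."""
    wins = 0.0
    for s in same_scores:
        for d in diff_scores:
            wins += 1.0 if s > d else 0.5 if s == d else 0.0
    return wins / (len(same_scores) * len(diff_scores))


if __name__ == "__main__":
    same = [10.0, 4.0, 0.5]  # toy same-speaker LRs
    diff = [0.1, 0.2, 2.0]   # toy different-speaker LRs
    print(f"Cllr={cllr(same, diff):.3f}  EER={eer(same, diff):.3f}  AUC={auc(same, diff):.3f}")
```

Cllr penalises poorly calibrated LRs as well as poor discrimination. EER and AUC depend only on how the scores are ranked, so all three can disagree about which method is better.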