Changes in the results of voice biometric systems using different technologies across different speech tasks and voice sample lengths

Keywords: forensic speaker comparison, voice biometrics, speech style, forensic science, Likelihood Ratio

Abstract

During forensic speaker comparison, the audio forensics expert appointed to perform the investigation works with audio recordings of different types and durations. Different speech styles and sample durations affect the resulting probability values. To evaluate biometric identification results, the probative value of the data obtained must be determined so that the expert's report is accurate and can be interpreted by the other actors in the legal proceedings. In the present study, the speech samples of 78 speakers from the forensic voice sample database were compared within the framework of the FORENSICSpeech research project (Beke et al. 2020). The samples include three types of speech: spontaneous speech, read speech, and narration. The recording of the samples was repeated after an average of two weeks, and the audio files were then cut automatically into segments of 20, 40, 60, 80, 100, and 120 seconds. The aim of this study is to show how different speech styles and durations affect voice biometric identification results.
Results show that EER², FRR³, Cllr⁴, and Cllr-min values decrease with increasing duration; however, within the 20–120-second range the decrease is not uniform. Similarly, the lowest EER, FRR, Cllr, and Cllr-min values occur for spontaneous speech, followed by narration, while the information-exchange speech samples give the highest Cllr values. Overall, the more advanced i-vector method tends to provide more efficient, lower-error speaker identification results.


1 Gaussian Mixture Model – Universal Background Model
2 Equal Error Rate
3 False Reject Rate
4 Log-likelihood-ratio cost
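As a rough illustration of the metrics listed above (not the study's own evaluation code), EER and Cllr can be computed from same-speaker and different-speaker comparison scores as follows; the Cllr formula assumes the scores are calibrated log-likelihood ratios:

```python
import numpy as np

def eer(target_scores, nontarget_scores):
    """Equal Error Rate: the operating point where the false-reject rate
    (targets scored below threshold) equals the false-accept rate
    (non-targets scored at or above threshold)."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    frr = np.array([np.mean(target_scores < t) for t in thresholds])
    far = np.array([np.mean(nontarget_scores >= t) for t in thresholds])
    i = np.argmin(np.abs(frr - far))       # threshold where the two rates cross
    return (frr[i] + far[i]) / 2

def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost: 0 for a perfect system,
    1 for an uninformative system that always outputs LLR = 0."""
    c_tar = np.mean(np.log2(1 + np.exp(-np.asarray(target_llrs))))
    c_non = np.mean(np.log2(1 + np.exp(np.asarray(nontarget_llrs))))
    return 0.5 * (c_tar + c_non)
```

For fully separated score distributions the EER is 0, and a system that always outputs an LLR of 0 yields a Cllr of exactly 1, which is why Cllr values below 1 indicate that the comparison scores carry usable evidential information.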

Author Biography

Attila Fejes, Nemzetbiztonsági Szakszolgálat Szakértői Intézet

national security major (nb. őrnagy),

Published
2025-10-08
Section
Full papers