The Development of the FilmHír database for researching the speech material in old sound newsreels

  • Ákos Gocsál

Abstract

This paper presents the first steps in the development of a speech database that supports research on the spoken material in the early sound newsreels produced in Hungary. Since there was no recording protocol, and the recordings were not made in a controlled environment, the development of this database differs in several ways from that of purpose-built databases such as the BEA Hungarian Spontaneous Speech Database. There is no balanced selection of speakers or speech styles, and the speech material may contain elements that the developer cannot predict. In the first phase of development, the speech material of 140 newsreels made in 1931 and 1932 was extracted and annotated (pause-to-pause sections only). A table with basic metadata was created. The newsreels were found to contain a variety of speech styles, such as conversations, public speeches, narrations, commands, and interviews. In the second phase, further metadata were added, including background noises, music, utterances in foreign languages, distortions, etc. Research experiences with the database and the possibilities it offers are discussed, along with directions for its further development.

Published
2023-05-10