End-to-End Recognition of Spontaneous Speech on the Hungarian BEA Database
Abstract
End-to-end speech recognition based on deep neural networks is increasingly popular because it is fully data-driven: no language-specific knowledge is needed beyond the transcribed speech data. However, most end-to-end speech recognition experiments are performed on read (planned) speech, and no Hungarian-language results are available to the speech community. In this paper, we make the first attempt to train and evaluate a Hungarian speech recognition system in an end-to-end neural manner, using the studio-quality Hungarian BEA (Spoken Language Speech Database). We present the challenge of recognising spontaneous speech: even without any significant background noise, the word error rate on spontaneous speech is an order of magnitude higher than on planned speech, although both were recorded with the same speakers in the same environment. This emphasises the need for more thorough studies of spontaneous speech and possibly for more data.