Experiments on Applying the WaveNet Method to Hungarian Speech Synthesis

  • Csaba Zainkó BME TMIT
  • Bálint Gyires-Tóth BME TMIT
  • Géza Németh BME TMIT
  • Gábor Olaszy BME TMIT


The WaveNet architecture is suitable for generating high-quality speech, as Google DeepMind demonstrated for English. In this paper we describe our experiments with using WaveNet for Hungarian speech generation. We investigated the effects of different control parameters and compared the quality of the generated speech across different hyper-parameter settings and different Hungarian speech databases. We examined the most influential control parameters and conducted a listening test to evaluate the Hungarian WaveNet models. Larger models and a modified approach both produced a significant increase in the quality of the synthesized speech.
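The abstract mentions larger models and different hyper-parameter settings but does not list them. In DeepMind's original WaveNet design, model size is largely set by how many dilated causal convolution layers are stacked. That number also fixes how many past audio samples each output sample can depend on, which is its receptive field. The sketch below is a minimal illustration under that assumption. It uses the doubling dilation pattern 1, 2, ..., 512 (ten layers) with filter width 2 from the original WaveNet description, for which one stack sees 1024 samples. The function names and the 16 kHz sampling rate are hypothetical and are not taken from this paper's setup.

```python
from typing import List


def dilation_schedule(num_stacks: int, layers_per_stack: int) -> List[int]:
    """Dilations 1, 2, 4, ..., 2**(layers_per_stack - 1), repeated num_stacks times."""
    return [2 ** i for i in range(layers_per_stack)] * num_stacks


def receptive_field(filter_width: int, dilations: List[int]) -> int:
    """Samples (past plus current) visible to a stack of dilated causal convolutions.

    A causal convolution with kernel size k and dilation d widens the
    context by (k - 1) * d samples; the trailing +1 is the current sample.
    """
    return (filter_width - 1) * sum(dilations) + 1


if __name__ == "__main__":
    sample_rate = 16000  # hypothetical sampling rate, not from the paper
    for stacks in (1, 2, 3):
        rf = receptive_field(2, dilation_schedule(stacks, 10))
        print(f"{stacks} stack(s): {rf} samples = {1000 * rf / sample_rate:.1f} ms")
```

Each extra stack adds parameters and roughly doubles the audio context for that model. This is one way a "larger model" can capture longer-range structure in speech. Which dimensions the authors actually scaled is not stated in the abstract.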