Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement

11/12/2020
by   Hamed Hemati, et al.
0

Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them towards a new speaker or a new language is not as straight-forward in a low-resource setup. In this paper, we show that by applying minor changes to a Tacotron model, one can transfer an existing TTS model for a new speaker with the same or a different language using only 20 minutes of data. For this purpose, we first introduce a baseline multi-lingual Tacotron with language-agnostic input, then show how transfer learning is done for different scenarios of speaker adaptation without exploiting any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model in both subjective and objective ways.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro