We introduce Matcha-TTS, a new encoder-decoder architecture for speedy T...
Self-supervised learning (SSL) speech representations learned from large...
With read-aloud speech synthesis achieving high naturalness scores, ther...
Turn-taking is a fundamental aspect of human communication where speaker...
Recent work has explored using self-supervised learning (SSL) speech rep...
Spontaneous speech has many affective and pragmatic functions that are i...
Neural HMMs are a type of neural transducer recently proposed for sequen...
Neural sequence-to-sequence TTS has achieved significantly better output...
Text-to-speech and co-speech gesture synthesis have until now been treat...
Embodied human communication encompasses both verbal (speech) and non-ve...