Direct speech-to-speech translation (S2ST) with discrete self-supervised...
Various applications of voice synthesis have been developed independentl...
Multi-modal Contrastive Representation (MCR) learning aims to encode dif...
Unconstrained lip-to-speech synthesis aims to generate corresponding spe...