MooseNet: A trainable metric for synthesized speech with plda backend

01/17/2023
by   Ondřej Plátek, et al.
2

We present MooseNet, a trainable speech metric that predicts listeners' Mean Opinion Score (MOS). We report improvements to the challenge baselines using easy-to-use modeling techniques, which also scales for larger self-supervised learning (SSL) model. We present two models. The first model is a Neural Network (NN). As a second model, we propose a PLDA generative model on the top layers of the first NN model, which improves the pure NN model. Ensembles from our two models achieve the top 3 or 4 VoiceMOS leaderboard places on all system and utterance level metrics.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro