Simulated Multiple Reference Training Improves Low-Resource Machine Translation

04/30/2020
by   Huda Khayrallah, et al.

Many valid translations exist for a given sentence, yet machine translation (MT) is typically trained with a single reference translation, which exacerbates data sparsity in low-resource settings. We introduce a novel MT training method that approximates the full space of possible translations by sampling a paraphrase of the reference sentence from a paraphraser and training the MT model to predict the paraphraser's distribution over possible tokens. Using an English paraphraser, we demonstrate the effectiveness of our method in low-resource settings, with gains of 1.2 to 7 BLEU.
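The training signal described above amounts to matching the MT model's next-token distribution against the paraphraser's, rather than against a single one-hot reference. A minimal sketch of such a distribution-matching objective, assuming a KL-divergence loss at one decoding step (the function and distribution names here are hypothetical, not from the paper):

```python
import math

def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student) over a token vocabulary at one decoding step."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs)
               if t > 0)

# Hypothetical distributions over a tiny 4-token vocabulary.
paraphraser_dist = [0.6, 0.2, 0.1, 0.1]  # "teacher": paraphraser's next-token distribution
mt_model_dist    = [0.4, 0.3, 0.2, 0.1]  # "student": MT model's next-token distribution

# Minimizing this loss pushes the MT model toward the paraphraser's
# full distribution, instead of a single one-hot reference token.
loss = kl_divergence(paraphraser_dist, mt_model_dist)
```

In contrast, standard cross-entropy training would place all probability mass on the single reference token; spreading it over the paraphraser's distribution lets multiple valid continuations share the training signal.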
