How do language models learn to make predictions during pre-training? To...
NLP models often degrade in performance when real world data distributio...
Transformer language models have received widespread public attention, y...
We assess how multilingual language models maintain a shared multilingua...
We investigate how neural language models acquire individual words durin...
In this paper, we detail the relationship between convolutions and
self-...
We train neural machine translation (NMT) models from English to six tar...