We investigate the optimal model size and number of tokens for training ...
Language Models (LMs) often cannot be deployed because of their potentia...
The performance of a language model has been shown to be effectively mod...
We enhance auto-regressive language models by conditioning on document c...