We investigate the optimal model size and number of tokens for training ...
The performance of a language model has been shown to be effectively mod...
We enhance auto-regressive language models by conditioning on document c...
The recently developed WaveNet architecture is the current state of the ...