Knowledge-intensive tasks such as question answering often require assim...
This paper proposes Omnidirectional Representations from Transformers (O...
Transformers do not scale very well to long sequence lengths largely bec...
Transformer-based models, such as BERT, have been one of the most succe...
As machine learning has become more prevalent, researchers have begun to...
Transformer-based models have pushed the state of the art in many natura...