09/26/2021 — On the Prunability of Attention Heads in Multilingual BERT
Large multilingual models, such as mBERT, have shown promise in crosslin...
01/22/2021 — The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT
Multi-headed attention heads are a mainstay in transformer-based models....