We expose a surprising failure of generalization in auto-regressive larg...
We aim to better understand the emergence of `situational awareness' in ...
Parameter-efficient fine-tuning methods (PEFTs) offer the promise of ada...
Adapter layers are lightweight, learnable units inserted between transfo...
The Transformer model has achieved state-of-the-art performance in many
...
Modern deep neural networks can produce badly calibrated predictions,
es...
There has been recent success in pre-training on monolingual data and
fi...
Multi-task learning allows the sharing of useful information between mul...