"Induction heads" are attention heads that implement a simple algorithm ...
We describe our early efforts to red team language models in order to
si...
We study whether language models can evaluate the validity of their own
...
Recent large language models have been trained on vast datasets, but als...
We apply preference modeling and reinforcement learning from human feedb...
Large-scale pre-training has recently emerged as a technique for creatin...
Given the broad capabilities of large language models, it should be poss...
Rapid progress in machine learning and artificial intelligence (AI) has
...
TensorFlow is an interface for expressing machine learning algorithms, a...