Matthew Rahtz | DeepAI

Featured Co-authors

Anca D. Dragan
63 publications
Shane Legg
41 publications
Dylan Hadfield-Menell
38 publications
Jan Leike
27 publications
Geoffrey Irving
18 publications
Rohin Shah
17 publications
David Lindner
15 publications
János Kramár
14 publications
Ramana Kumar
12 publications
Neel Nanda
11 publications
Vladimir Mikulik
10 publications

research

∙ 07/28/2023

The Hydra Effect: Emergent Self-repair in Language Model Computations

We investigate the internal structure of language model computations usi...

Thomas McGrath, et al.

research

∙ 07/18/2023

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Circuit analysis is a promising technique for understanding the internal...

Tom Lieberum, et al.

research

∙ 01/12/2023

Tracr: Compiled Transformers as a Laboratory for Interpretability

Interpretability research aims to build tools for understanding machine ...

David Lindner, et al.

research

∙ 01/20/2022

Safe Deep RL in 3D Environments using Human Feedback

Agents should avoid unsafe behaviour during both training and deployment...

Matthew Rahtz, et al.

research

∙ 06/06/2019

An Extensible Interactive Interface for Agent Design

In artificial intelligence, we often specify tasks through a reward func...

Matthew Rahtz, et al.
