Dylan Hadfield-Menell

research

∙ 07/27/2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) is a technique for tra...

0 Stephen Casper, et al. ∙

research

∙ 07/08/2023

Measuring the Success of Diffusion Models at Imitating Human Artists

Modern diffusion models have set the state-of-the-art in AI image genera...

0 Stephen Casper, et al. ∙

research

∙ 06/15/2023

Explore, Establish, Exploit: Red Teaming Language Models from Scratch

Deploying large language models (LLMs) can pose hazards from harmful out...

13 Stephen Casper, et al. ∙

research

∙ 02/13/2023

Recommending to Strategic Users

Recommendation systems are pervasive in the digital economy. An importan...

0 Andreas Haupt, et al. ∙

research

∙ 11/18/2022

Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks

Deep neural networks (DNNs) are powerful, but they can make mistakes tha...

8 Stephen Casper, et al. ∙

research

∙ 09/05/2022

White-Box Adversarial Policies in Deep Reinforcement Learning

Adversarial examples against AI systems pose both risks via malicious at...

11 Stephen Casper, et al. ∙

research

∙ 08/22/2022

Get It in Writing: Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL

Multi-agent reinforcement learning (MARL) is a powerful tool for trainin...

0 Phillip J. K. Christoffersen, et al. ∙

research

∙ 08/01/2022

Towards Psychologically-Grounded Dynamic Preference Models

Designing recommendation systems that serve content aligned with time va...

0 Mihaela Curmei, et al. ∙

research

∙ 07/27/2022

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

The last decade of machine learning has seen drastic increases in scale ...

10 Tilman Räuker, et al. ∙

research

∙ 07/20/2022

Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

Recommender systems are the algorithms which select, filter, and persona...

0 Jonathan Stray, et al. ∙

research

∙ 06/16/2022

How to talk so your robot will learn: Instructions, descriptions, and pragmatics

From the earliest years of our lives, humans use language to express our...

0 Theodore R. Sumers, et al. ∙

research

∙ 04/25/2022

Estimating and Penalizing Induced Preference Shifts in Recommender Systems

The content that a recommender system (RS) shows to users influences the...

4 Micah Carroll, et al. ∙

research

∙ 04/11/2022

Linguistic communication as (inverse) reward design

Natural language is an intuitive and expressive way to communicate rewar...

0 Theodore R. Sumers, et al. ∙

research

∙ 12/06/2021

Guided Imitation of Task and Motion Planning

While modern policy optimization methods can do complex manipulation fro...

0 Michael James McDonald, et al. ∙

research

∙ 07/22/2021

What are you optimizing for? Aligning Recommender Systems with Human Values

We describe cases where real recommender systems were modified in the se...

0 Jonathan Stray, et al. ∙

research

∙ 07/01/2021

When Curation Becomes Creation: Algorithms, Microcontent, and the Vanishing Distinction between Platforms and Creators

Ever since social activity on the Internet began migrating from the wild...

0 Liu Leqi, et al. ∙

research

∙ 02/07/2021

Consequences of Misaligned AI

AI systems often rely on two key components: a specified goal or reward ...

0 Simon Zhuang, et al. ∙

research

∙ 12/29/2020

Multi-Principal Assistance Games: Definition and Collegial Mechanisms

We introduce the concept of a multi-principal assistance game (MPAG), an...

3 Arnaud Fickinger, et al. ∙

research

∙ 07/19/2020

Multi-Principal Assistance Games

Assistance games (also known as cooperative inverse reinforcement learni...

0 Arnaud Fickinger, et al. ∙

research

∙ 01/25/2020

Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors

How can societies learn to enforce and comply with social norms? Here we...

0 Raphael Koster, et al. ∙

research

∙ 06/06/2019

An Extensible Interactive Interface for Agent Design

In artificial intelligence, we often specify tasks through a reward func...

0 Matthew Rahtz, et al. ∙

research

∙ 05/02/2019

Adversarial Training with Voronoi Constraints

Adversarial examples are a pervasive phenomenon of machine learning mode...

0 Marc Khoury, et al. ∙

research

∙ 02/26/2019

Conservative Agency via Attainable Utility Preservation

Reward functions are often misspecified. An agent optimizing an incorrec...

0 Alexander Matt Turner, et al. ∙

research

∙ 01/24/2019

The Assistive Multi-Armed Bandit

Learning preferences implicit in the choices humans make is a well studi...

0 Lawrence Chan, et al. ∙

research

∙ 01/04/2019

On the Utility of Model Learning in HRI

Fundamental to robotics is the debate between model-based and model-free...

0 Rohan Choudhury, et al. ∙

research

∙ 12/21/2018

Human-AI Learning Performance in Multi-Armed Bandits

People frequently face challenging decision-making problems in which out...

0 Ravi Pandya, et al. ∙

research

∙ 11/03/2018

Legible Normativity for AI Alignment: The Value of Silly Rules

It has become commonplace to assert that autonomous agents will have to ...

0 Dylan Hadfield-Menell, et al. ∙

research

∙ 11/01/2018

On the Geometry of Adversarial Examples

Adversarial examples are a pervasive phenomenon of machine learning mode...

50 Marc Khoury, et al. ∙

research

∙ 09/09/2018

Active Inverse Reward Design

Reward design, the problem of selecting an appropriate reward function f...

0 Sören Mindermann, et al. ∙

research

∙ 06/11/2018

An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

Our goal is for AI systems to correctly identify and act according to th...

0 Dhruv Malik, et al. ∙

research

∙ 06/07/2018

Simplifying Reward Design through Divide-and-Conquer

Designing a good reward function is essential to robot planning and rein...

0 Ellis Ratner, et al. ∙

research

∙ 04/12/2018

Incomplete Contracting and AI Alignment

We suggest that the analysis of incomplete contracting developed by law ...

0 Dylan Hadfield-Menell, et al. ∙

research

∙ 02/05/2018

Expressive Robot Motion Timing

Our goal is to enable robots to time their motion in a way that is purpo...

0 Allan Zhou, et al. ∙

research

∙ 11/08/2017

Inverse Reward Design

Autonomous agents optimize the reward function we give them. What they d...

0 Dylan Hadfield-Menell, et al. ∙

research

∙ 07/20/2017

Pragmatic-Pedagogic Value Alignment

For an autonomous system to provide value (e.g., to customers, designers...

0 Jaime F. Fisac, et al. ∙

research

∙ 05/28/2017

Should Robots be Obedient?

Intuitively, obedience -- following the order that a human gives -- seem...

0 Smitha Milli, et al. ∙

research

∙ 11/24/2016

The Off-Switch Game

It is clear that one of the primary tools we can use to mitigate the pot...

0 Dylan Hadfield-Menell, et al. ∙

research

∙ 06/09/2016

Cooperative Inverse Reinforcement Learning

For an autonomous system to be helpful to humans and to pose no unwarran...

0 Dylan Hadfield-Menell, et al. ∙
