Reinforcement learning from human feedback (RLHF) is a technique for tra...
Modern diffusion models have set the state-of-the-art in AI image genera...
Deploying Large language models (LLMs) can pose hazards from harmful out...
Recommendation systems are pervasive in the digital economy. An importan...
Deep neural networks (DNNs) are powerful, but they can make mistakes tha...
Adversarial examples against AI systems pose both risks via malicious at...
Multi-agent reinforcement learning (MARL) is a powerful tool for trainin...
Designing recommendation systems that serve content aligned with time va...
The last decade of machine learning has seen drastic increases in scale ...
Recommender systems are the algorithms which select, filter, and persona...
From the earliest years of our lives, humans use language to express our...
The content that a recommender system (RS) shows to users influences the...
Natural language is an intuitive and expressive way to communicate rewar...
While modern policy optimization methods can do complex manipulation fro...
We describe cases where real recommender systems were modified in the se...
Ever since social activity on the Internet began migrating from the wild...
AI systems often rely on two key components: a specified goal or reward
...
We introduce the concept of a multi-principal assistance game (MPAG), an...
Assistance games (also known as cooperative inverse reinforcement learni...
How can societies learn to enforce and comply with social norms? Here we...
In artificial intelligence, we often specify tasks through a reward func...
Adversarial examples are a pervasive phenomenon of machine learning mode...
Reward functions are often misspecified. An agent optimizing an incorrec...
Learning preferences implicit in the choices humans make is a well studi...
Fundamental to robotics is the debate between model-based and model-free...
People frequently face challenging decision-making problems in which out...
It has become commonplace to assert that autonomous agents will have to ...
Adversarial examples are a pervasive phenomenon of machine learning mode...
Reward design, the problem of selecting an appropriate reward function f...
Our goal is for AI systems to correctly identify and act according to th...
Designing a good reward function is essential to robot planning and
rein...
We suggest that the analysis of incomplete contracting developed by law ...
Our goal is to enable robots to time their motion in a way that is
purpo...
Autonomous agents optimize the reward function we give them. What they d...
For an autonomous system to provide value (e.g., to customers, designers...
Intuitively, obedience -- following the order that a human gives -- seem...
It is clear that one of the primary tools we can use to mitigate the
pot...
For an autonomous system to be helpful to humans and to pose no unwarran...