The massive successes of large language models (LLMs) encourage the emer...
Recent months have seen the emergence of a powerful new trend in which l...
In this paper, we propose an enhanced approach for Rapid Exploration and...
One of the grand enduring goals of AI is to create generalist agents tha...
Lifelong learning offers a promising paradigm for building a generalist a...
Aligning language models (LMs) with preferences is an important problem ...
Achieving machine autonomy and human control often represent divergent o...
Incorporating human feedback has been shown to be crucial to align text ...
When learning task-oriented dialogue (ToD) agents, reinforcement learnin...
In offline model-based reinforcement learning (offline MBRL), we learn a...
Goal-conditioned reinforcement learning (GCRL) has a wide range of poten...
Offline reinforcement learning (RL) extends the paradigm of classical RL...
Offline reinforcement learning enables learning from a fixed dataset, wi...
Reinforcement learning (RL) has drawn increasing interest in recent yea...
Deployed real-world machine learning applications are often subject to u...
Text classification is usually studied by labeling natural language text...
Off-policy evaluation (OPE) is the task of estimating the expected rewar...
Off-policy evaluation provides an essential tool for evaluating the effe...
We consider off-policy evaluation (OPE), which evaluates the performance...
Infinite horizon off-policy policy evaluation is a highly challenging ta...
Value function learning plays a central role in many state-of-the-art re...
It is very useful to integrate human knowledge and experience into tradi...
Policy gradient methods have achieved remarkable successes in solving ch...
We propose a simple algorithm to train stochastic neural networks to dra...