Offline reinforcement learning (RL) seeks to derive an effective control...
Efficient exploration is critical in cooperative deep Multi-Agent Reinfo...
Reinforcement learning (RL) agents can leverage batches of previously co...
Deep latent variable models have achieved significant empirical successe...
We study the effect of stochasticity in on-policy policy optimization, a...
We study the fundamental question of the sample complexity of learning a...
Batch policy optimization considers leveraging existing data for policy ...
We make three contributions toward better understanding policy gradient ...
Despite its potential to improve sample complexity versus model-free app...