Policy Certificates: Towards Accountable Reinforcement Learning

11/07/2018
by Christoph Dann et al.

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. In this paper, we address this lack of accountability by proposing that algorithms output policy certificates: bounds on the suboptimality of the policy in the next episode, which allow humans to intervene when the certified quality is not satisfactory. We further present a new learning framework (IPOC) for finite-sample analysis with policy certificates, and develop two IPOC algorithms that enjoy guarantees for the quality of both their policies and their certificates.
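To make the protocol concrete, the following is a minimal, hypothetical sketch (not the authors' code; all names and the form of the bound are illustrative assumptions): before each episode the learner announces a certificate, i.e. an upper bound on the suboptimality of the policy it is about to execute, and episodes whose bound exceeds a tolerance are flagged for human oversight.

```python
# Hypothetical sketch of the certificate-based interaction loop described in
# the abstract. Interface names and the placeholder bound are assumptions,
# not the paper's actual IPOC algorithms.

import math
import random


class ToyCertifiedLearner:
    """Illustrative stand-in for an algorithm that emits policy certificates."""

    def __init__(self) -> None:
        self.episodes_seen = 0

    def certificate(self) -> float:
        # Placeholder bound that shrinks roughly like 1/sqrt(t); a real IPOC
        # algorithm would derive this from confidence intervals on value estimates.
        return 1.0 / math.sqrt(1.0 + self.episodes_seen)

    def run_episode(self) -> float:
        # Execute the current policy for one episode (return is simulated here).
        self.episodes_seen += 1
        return random.random()


def accountable_loop(learner: ToyCertifiedLearner,
                     num_episodes: int,
                     tolerance: float) -> None:
    for _ in range(num_episodes):
        bound = learner.certificate()  # certified suboptimality bound for the next episode
        if bound > tolerance:
            # Certified quality is not satisfactory: flag the episode so a human
            # (or a safe fallback policy) can decide whether to intervene.
            print(f"episode {learner.episodes_seen}: certified bound {bound:.3f} "
                  f"exceeds tolerance {tolerance}; flagging for human review")
        learner.run_episode()


if __name__ == "__main__":
    accountable_loop(ToyCertifiedLearner(), num_episodes=10, tolerance=0.4)
```

The point of the sketch is only the ordering of events: the certificate is available before the episode is executed, which is what enables intervention in high-stakes settings.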
