On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm

05/11/2023
by Ukjo Hwang, et al.

We study robust reinforcement learning (RL) with model uncertainty. Given a nominal Markov decision process (N-MDP) that generates samples for training, an uncertainty set is defined that contains MDPs perturbed from the N-MDP, reflecting potential mismatches between the training environment (i.e., the N-MDP) and the testing environment. The objective of robust RL is to learn a robust policy that optimizes the worst-case performance over this uncertainty set. In this paper, we propose a new uncertainty set containing more realistic MDPs than the existing ones. For this uncertainty set, we present a robust RL algorithm (named ARQ-Learning) for the tabular case and characterize its finite-time error bound. We also prove that ARQ-Learning converges as fast as standard Q-Learning and the state-of-the-art robust Q-Learning while ensuring better robustness in real-world applications. Next, we propose a pessimistic agent that efficiently resolves the key bottleneck in extending ARQ-Learning to larger or continuous state spaces. By incorporating the pessimistic-agent idea into well-known RL algorithms such as Q-Learning, deep Q-network (DQN), and deep deterministic policy gradient (DDPG), we obtain PRQ-Learning, PR-DQN, and PR-DDPG, respectively. Notably, the proposed idea can be immediately applied to other model-free RL algorithms (e.g., soft actor-critic). Via experiments, we demonstrate the superiority of our algorithms in various RL applications with model uncertainty.
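For reference, the worst-case objective described in the abstract corresponds to the standard robust MDP formulation; the notation below is illustrative and not necessarily the paper's, with P^0 denoting the nominal transition kernel and 𝒫(P^0) the uncertainty set of perturbed kernels built around it:

\max_{\pi} \; \min_{P \in \mathcal{P}(P^{0})} \; \mathbb{E}_{P,\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]

In robust Q-Learning-type methods of this kind, the standard Bellman backup is replaced by a worst-case backup over the uncertainty set; how ARQ-Learning and the pessimistic agent realize this for the proposed uncertainty set is detailed in the full paper.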
