LTL2Action: Generalizing LTL Instructions for Multi-Task RL
We address the problem of teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments. The combinatorial task sets we target consist of up to 10^39 unique tasks. We employ a well-known formal language – linear temporal logic (LTL) – to specify instructions, using a domain-specific vocabulary. We propose a novel approach to learning that exploits the compositional syntax and the semantics of LTL, enabling our RL agent to learn task-conditioned policies that generalize to new instructions, not observed during training. The expressive power of LTL supports the specification of a diversity of complex temporally extended behaviours that include conditionals and alternative realizations. To reduce the overhead of learning LTL semantics, we introduce an environment-agnostic LTL pretraining scheme which improves sample-efficiency in downstream environments. Experiments on discrete and continuous domains demonstrate the strength of our approach in learning to solve (unseen) tasks, given LTL instructions.
READ FULL TEXT