Learning Task Specifications from Demonstrations via the Principle of Maximum Causal Entropy
In many settings (e.g., robotics) demonstrations provide a natural way to specify sub-tasks; however, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely composed and/or do not explicitly capture history dependencies. Motivated by this deficit, recent works have proposed specializing to task specifications, a class of Boolean non-Markovian rewards which admit well-defined composition and explicitly handle historical dependencies. This work continues this line of research by adapting maximum causal entropy inverse reinforcement learning to estimate the posteriori probability of a specification given a multi-set of demonstrations. The key algorithmic insight is to leverage the extensive literature and tooling on reduced ordered binary decision diagrams to efficiently encode a time unrolled Markov Decision Process.
READ FULL TEXT