Optimizing for long term value is desirable in many practical applicatio...
Robust Reinforcement Learning aims to derive an optimal behavior that
ac...
The slate recommendation problem aims to find the "optimal" ordering of ...
Robust reinforcement learning aims to produce policies that have strong
...
Temporal Difference learning or TD(λ) is a fundamental algorithm in
the ...
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework t...
For complex, high-dimensional Markov Decision Processes (MDPs), it may b...
The monolithic approach to policy representation in Markov Decision Proc...
Twitter, a popular social network, presents great opportunities for on-l...