We study how to learn ϵ-optimal strategies in zero-sum imperfect informa...
Multi-step learning applies lookahead over multiple time steps and has p...
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible...
Multi-robot navigation is the task of finding trajectories for a team of...
The hierarchy of global and local planners is one of the most commonly u...
This study presents a benchmark for evaluating action-constrained reinfo...
Robust Markov Decision Processes (MDPs) are getting more attention for l...
Imperfect information games (IIG) are games in which each player only pa...
We consider approximate dynamic programming in γ-discounted Markov decis...
In this work, we consider and analyze the sample complexity of model-fre...
The performance of reinforcement learning (RL) agents is sensitive to th...
Approximate Policy Iteration (API) algorithms alternate between (approxi...
Model-agnostic meta-reinforcement learning requires estimating the Hessi...
We study the problem of learning a Nash equilibrium (NE) in an imperfect...
Recently many algorithms were devised for reinforcement learning (RL) wi...
Progress in deep reinforcement learning (RL) research is largely enabled...
Off-policy multi-step reinforcement learning algorithms consist of conse...
Building upon the formalism of regularized Markov decision processes, we...
In real-world applications of reinforcement learning (RL), noise from in...
Approximate dynamic programming algorithms, such as approximate value it...