Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011...
We propose a first-order method for convex optimization, where instead o...
Methods for sequential decision-making are often built upon a foundation...
Recent research has shown that seemingly fair machine learning models, w...
Model-based reinforcement learning promises to learn an optimal policy f...
Most reinforcement learning (RL) recommendation systems designed for edg...
Many sequential decision-making problems are high-stakes and require off...
Historically, to bound the mean for small sample sizes, practitioners ha...
We study the problem of Safe Policy Improvement (SPI) under constraints ...
When faced with sequential decision-making problems, it is often useful ...
Many sequential decision-making systems leverage data collected using pr...
Many real-world sequential decision-making problems involve critical sys...
Strategic recommendations (SR) refer to the problem where an intelligent...
Performance evaluations are critical for quantifying algorithmic advance...
Most reinforcement learning methods are based upon the key assumption th...
Reinforcement learning (RL) has become an increasingly active area of re...
Neuroscientific theory suggests that dopaminergic neurons broadcast glob...
The policy gradient theorem describes the gradient of the expected disco...
We propose a new objective function for finite-horizon episodic Markov d...
The Markov decision process (MDP) formulation used to model many real-wo...
In many real-world sequential decision-making problems, the number of av...
We present a new method for constructing a confidence interval for the m...
In this paper we introduce a reinforcement learning (RL) approach for tr...
In this paper we introduce a reinforcement learning (RL) approach for tr...
In this paper we consider the problem of how a reinforcement learning ag...
Most model-free reinforcement learning methods leverage state representa...
Many reinforcement learning applications involve the use of data that is...
The recently proposed option-critic architecture (Bacon et al.) provides a ...
Machine learning algorithms are everywhere, ranging from simple data ana...
We show how an action-dependent baseline can be used by the policy gradi...
We consider the task of evaluating a policy for a Markov decision proces...
In the artificial intelligence field, learning often corresponds to chan...
Importance sampling is often used in machine learning when training and ...
In this paper we present a new way of predicting the performance of a re...
This paper specifies a notation for Markov decision processes....
This paper introduces new optimality-preserving operators on Q-functions...