Nino Vieillard

research

∙ 06/23/2023

GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

Knowledge distillation is commonly used for compressing neural networks ...

Rishabh Agarwal, et al.

research

∙ 05/31/2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

Despite the seeming success of contemporary grounded text generation sys...

Paul Roit, et al.

research

∙ 05/22/2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible...

Toshinori Kitamura, et al.

research

∙ 05/27/2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

In this work, we consider and analyze the sample complexity of model-fre...

Tadashi Kozuno, et al.

research

∙ 08/16/2021

Implicitly Regularized RL with Implicit Q-Values

The Q-function is a central quantity in many Reinforcement Learning (RL)...

Nino Vieillard, et al.

research

∙ 06/11/2021

Offline Reinforcement Learning as Anti-Exploration

Offline Reinforcement Learning (RL) aims at learning an optimal control ...

Shideh Rezaeifar, et al.

research

∙ 03/02/2021

Offline Reinforcement Learning with Pseudometric Learning

Offline Reinforcement Learning methods seek to learn a policy from logge...

Robert Dadashi, et al.

research

∙ 07/28/2020

Munchausen Reinforcement Learning

Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most a...

Nino Vieillard, et al.

research

∙ 03/31/2020

Leverage the Average: an Analysis of Regularization in RL

Building upon the formalism of regularized Markov decision processes, we...

Nino Vieillard, et al.

research

∙ 10/21/2019

Momentum in Reinforcement Learning

We adapt the optimization's concept of momentum to reinforcement learnin...

Nino Vieillard, et al.

research

∙ 10/18/2019

On Connections between Constrained Optimization and Reinforcement Learning

Dynamic Programming (DP) provides standard algorithms to solve Markov De...

Nino Vieillard, et al.

research

∙ 06/24/2019

Deep Conservative Policy Iteration

Conservative Policy Iteration (CPI) is a founding algorithm of Approxima...

Nino Vieillard, et al.
