Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game

09/23/2016

We present the first reinforcement-learning model that self-improves its reward-modulated training by means of a continuously improving "intuition" neural network. An agent was trained to play the arcade video game Pong under two reward-based alternatives: one in which the paddle was placed randomly during training, and a second in which the paddle was simultaneously trained on three additional neural networks so that it could develop a sense of "certainty" about how likely its own predicted paddle position was to return the ball. If the agent was less than 95% certain of returning the ball, the policy used an intuition neural network to place the paddle. We trained both architectures for an equivalent number of epochs and tested learning performance by letting the trained programs play against a near-perfect opponent. We found that the reinforcement-learning model that uses an intuition neural network to place the paddle during reward training quickly overtakes the simple architecture in its ability to outplay the near-perfect opponent, and that it outscores that opponent by an increasingly wide margin with additional epochs of training.
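The certainty gate described in the abstract can be illustrated with a minimal sketch. It reads the "95" in the abstract as a 95% certainty threshold, which is an assumption, and every name in it (`baseline_placement`, `gated_placement`, `intuition_net`, `ball_state`) is hypothetical rather than taken from the paper. The networks themselves are replaced by plain callables, so this shows only the decision rule, not the authors' training procedure.

```python
import random

# Assumed reading of the abstract: defer to the intuition network
# whenever estimated certainty falls below 95%.
CERTAINTY_THRESHOLD = 0.95


def baseline_placement(rng=random, field_height=1.0):
    """First alternative: place the paddle uniformly at random."""
    return rng.uniform(0.0, field_height)


def gated_placement(predicted_pos, certainty, intuition_net, ball_state,
                    threshold=CERTAINTY_THRESHOLD):
    """Second alternative: certainty-gated paddle placement.

    predicted_pos: paddle position proposed by the agent itself
    certainty:     estimated probability (0..1) that predicted_pos
                   returns the ball
    intuition_net: callable mapping ball_state to a paddle position
                   (a stand-in for a trained network)
    """
    if certainty < threshold:
        # Not certain enough: let the intuition network place the paddle.
        return intuition_net(ball_state)
    # Certain enough: keep the agent's own prediction.
    return predicted_pos


if __name__ == "__main__":
    toy_intuition = lambda ball_y: ball_y  # stand-in, not a trained network
    print(gated_placement(0.2, 0.99, toy_intuition, 0.7))  # keeps 0.2
    print(gated_placement(0.2, 0.50, toy_intuition, 0.7))  # defers, gives 0.7
```

Passing the intuition network in as a callable keeps the gate independent of any particular network library; how the certainty estimate is produced from the three additional networks is not specified in the abstract and is left out here.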

Related Research

Configurable Agent With Reward As Input: A Play-Style Continuum Generation

Lucy-SKG: Learning to Play Rocket League Efficiently Using Deep Reinforcement Learning

Self-Play Learning Without a Reward Metric

Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

Alpha-Mini: Minichess Agent with Deep Reinforcement Learning

Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments