AlphaZero-Inspired General Board Game Learning and Playing
Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at super human level - are truly impressive, these architectures have the drawback that they are very complex and require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents. We wrap MCTS for the first time around RL n-tuple networks to create versatile agents that keep at the same time the computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results that this AlphaZero-inspired agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other algorithms could only defeat Edax up to level 2).
READ FULL TEXT