Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent
Counterfactual Regret Minimization (CFR) is a regret minimization algorithm that minimizes the total regret by minimizing local counterfactual regrets. CFR variants converge quickly in practice and are widely used for solving large-scale imperfect-information extensive-form games (EFGs). However, because they operate locally, they are difficult to analyze and extend. Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) are regret minimization algorithms from Online Convex Optimization. They are mathematically elegant but less practical for solving EFGs. In this paper, we provide a new way to analyze and extend CFR by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD, respectively. From these equivalences, we derive two new algorithms from the perspective of FTRL and OMD, which can be regarded as extensions of vanilla CFR and CFR+. These two variants no longer need to maintain the local counterfactual regrets. Experiments show that the two variants converge faster than vanilla CFR and CFR+ in some EFGs.
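For readers unfamiliar with the two local update rules named above, the following is a minimal sketch of the standard Regret Matching and Regret Matching+ updates at a single decision point. It shows only these well-known rules, not the FTRL/OMD equivalence or the new algorithms of the paper. The function names and the example utilities are hypothetical, chosen for illustration.

```python
import numpy as np

def strategy_from_regrets(cum_regret):
    """Play each action in proportion to its positive cumulative regret.
    Falls back to the uniform strategy when no regret is positive."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))

def rm_update(cum_regret, utilities, strategy):
    """Regret Matching: add the instantaneous regret of each action."""
    instantaneous = utilities - strategy @ utilities
    return cum_regret + instantaneous

def rm_plus_update(cum_regret, utilities, strategy):
    """Regret Matching+: same update, then clip negative entries to zero."""
    instantaneous = utilities - strategy @ utilities
    return np.maximum(cum_regret + instantaneous, 0.0)

# Hypothetical one-step example with three actions.
regrets = np.zeros(3)
sigma = strategy_from_regrets(regrets)      # uniform on the first step
u = np.array([1.0, 0.0, -1.0])              # assumed action utilities
rm_regrets = rm_update(regrets, u, sigma)
rm_plus_regrets = rm_plus_update(regrets, u, sigma)
```

The only difference between the two rules is the clipping step: Regret Matching+ never stores negative cumulative regret, so an action that performed poorly in the past can regain probability mass as soon as it starts performing well.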