On Monte-Carlo tree search for deterministic games with alternate moves and complete information

Delattre, Sylvain; Fournier, Nicolas

doi:10.1051/ps/2018006

Delattre, Sylvain ¹ ; Fournier, Nicolas ¹

ESAIM: Probability and Statistics, Tome 23 (2019), pp. 176-216.

Suite au passage du modèle économique de la revue en S20, le texte intégral des articles des années 2019 et 2020 est accessible uniquement sur le site de l'éditeur et est réservé aux abonnés.

Résumé

We consider a deterministic game with alternate moves and complete information, of which the issue is always the victory of one of the two opponents. We assume that this game is the realization of a random model enjoying some independence properties. We consider algorithms in the spirit of Monte-Carlo Tree Search, to estimate at best the minimax value of a given position: it consists in simulating, successively, n well-chosen matches, starting from this position. We build an algorithm, which is optimal, step by step, in some sense: once the n first matches are simulated, the algorithm decides from the statistics furnished by the n first matches (and the a priori we have on the game) how to simulate the (n + 1)th match in such a way that the increase of information concerning the minimax value of the position under study is maximal. This algorithm is remarkably quick. We prove that our step by step optimal algorithm is not globally optimal and that it always converges in a finite number of steps, even if the a priori we have on the game is completely irrelevant. We finally test our algorithm, against MCTS, on Pearl’s game [Pearl, Artif. Intell. 14 (1980) 113–138] and, with a very simple and universal a priori, on the game Connect Four and some variants. The numerical results are rather disappointing. We however exhibit some situations in which our algorithm seems efficient.

MR Zbl

DOI : 10.1051/ps/2018006

Classification : 91A05, 68T20, 60J80
Mots-clés : Monte Carlo tree search, deterministic games with alternate moves and complete information, minimax values, finite random trees, branching property

Affiliations des auteurs :

Delattre, Sylvain ¹ ; Fournier, Nicolas ¹

@article{PS_2019__23__176_0,
     author = {Delattre, Sylvain and Fournier, Nicolas},
     title = {On {Monte-Carlo} tree search for deterministic games with alternate moves and complete information},
     journal = {ESAIM: Probability and Statistics},
     pages = {176--216},
     publisher = {EDP-Sciences},
     volume = {23},
     year = {2019},
     doi = {10.1051/ps/2018006},
     mrnumber = {3945581},
     zbl = {1417.91011},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/ps/2018006/}
}

TY  - JOUR
AU  - Delattre, Sylvain
AU  - Fournier, Nicolas
TI  - On Monte-Carlo tree search for deterministic games with alternate moves and complete information
JO  - ESAIM: Probability and Statistics
PY  - 2019
SP  - 176
EP  - 216
VL  - 23
PB  - EDP-Sciences
UR  - https://www.numdam.org/articles/10.1051/ps/2018006/
DO  - 10.1051/ps/2018006
LA  - en
ID  - PS_2019__23__176_0
ER  -

%0 Journal Article
%A Delattre, Sylvain
%A Fournier, Nicolas
%T On Monte-Carlo tree search for deterministic games with alternate moves and complete information
%J ESAIM: Probability and Statistics
%D 2019
%P 176-216
%V 23
%I EDP-Sciences
%U https://www.numdam.org/articles/10.1051/ps/2018006/
%R 10.1051/ps/2018006
%G en
%F PS_2019__23__176_0

Delattre, Sylvain; Fournier, Nicolas. On Monte-Carlo tree search for deterministic games with alternate moves and complete information. ESAIM: Probability and Statistics, Tome 23 (2019), pp. 176-216. doi : 10.1051/ps/2018006. https://www.numdam.org/articles/10.1051/ps/2018006/

Bibliographie
Cité par

[1] B. Abramson, Expected-outcome: a general model of static evaluation. IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 182–193. | DOI

[2] P. Auer, N. Cesa-Bianchi and P. Fischer, Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2002) 235–256. | DOI | Zbl

[3] E.B. Baum and W.D. Smith, A Bayesian approach to relevance in game playing. Artif. Intell. 97 (1997) 195–242. | DOI | MR | Zbl

[4] C. Browne, E. Powley, D. Whitehouse, S. Lucas, P.I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S. Colton, A survey monte carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4 (2012) 1–43. | DOI

[5] S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5 (2012) 1–122. | DOI | Zbl

[6] L. Buşoniu, R. Munos and E. Páll, An analysis of optimistic, best-first search for minimax sequential decision making. IEEE Int. Symp. Approx. Dyn. Program. Reinf. Learn. (2014).

[7] G.M.J.B. Chaslot, M.H.M. Winands, H.J. Van Den Herik, J.W.H.M. Uiterwijk and B. Bouzy, Progressive strategies for Monte-Carlo tree search. New Math. Nat. Comput. 4 (2008) 343–357. | DOI | MR | Zbl

[8] P.A. Coquelin and R. Munos, Bandit Algorithms for Tree Search. Technical report, INRIA RR-6141 (2007).

[9] R. Coulom, Efficient selectivity and backup operators in Monte-Carlo tree search, in Proc. of 5th Int. Conf. Comput. and Games, Turin, Italy (2006) 72–83.

[10] L. Devroye and O. Kamoun, Random minimax game trees, in Random Discrete Structures (Minneapolis, MN, 1993) Vol. 76 of The IMA Volumes in Mathematics and its Applications. Springer, New York (1996) 55–80. | DOI | MR | Zbl

[11] A. Garivier, E. Kaufmann and W.M. Koolen, Maximin action identification: a new bandit framework for games, in JMLR: Workshop and Conference Proceedings, Vol. 49 (2016) 1–23.

[12] S. Gelly, Y. Wang, R. Munos and O. Teytaud, Modification of UCT With Patterns in Monte-Carlo Go. Tech. Rep. Inst. Nat. Rech. Inform. Auto. (INRIA), Paris (2006).

[13] M.L. Ginsberg, GIB: imperfect information in a computationally challenging game. J. Artif. Intell. Res. 14 (2001) 303–358. | DOI | Zbl

[14] D. Golovin and A. Kraus, Adaptive submodularity: theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res. 42 (2011) 427–486. | MR | Zbl

[15] L. Kocsis and C. Szepesvári, Bandit based Monte-Carlo planning, in Machine Learning: ECML, Vol. 4212 of Lecture Notes in Comput. Sci. Springer, Berlin (2006) 282–293. | DOI | MR

[16] C.S. Lee, M.H. Wang, G.M.J.B. Chaslot, J.B. Hoock, A. Rimmel, O. Teytaud et al., The computational intelligence of MoGo revealed in taiwans computer go tournaments. IEEE Trans. Comput. Intell. AI Games 1 (2009) 73–89. | DOI

[17] R. Munos, From bandits to Monte-Carlo tree search: the optimistic principle applied to optimization and planning, in Foundations and Trends in Machine Learning (Book 21). Now Publishers Inc. (2014) 146. | Zbl

[18] J. Pearl, Asymptotic properties of minimax trees and game-searching procedures. Artif. Intell. 14 (1980) 113–138. | DOI | MR | Zbl

[19] B. Sheppard, World-championship-caliber srabble. Artif. Intell. 134 (2002) 241–275. | Zbl

[20] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis, Mastering the game of go with deep neural networks and tree search. Nature 529 (2016) 484–489. | DOI

[21] M. Tarsi, Optimal search on some game trees. J. Assoc. Comput. Mach. 30 (1983) 389–396. | DOI | MR | Zbl

[22] G. Tesauro, V.T. Rajan and R. Segal, Bayesian inference in Monte-Carlo tree search, in UAI’10 Proc. of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (2010) 580–588.

Cité par Sources :