We consider a deterministic game with alternate moves and complete information, of which the issue is always the victory of one of the two opponents. We assume that this game is the realization of a random model enjoying some independence properties. We consider algorithms in the spirit of Monte-Carlo Tree Search, to estimate at best the minimax value of a given position: it consists in simulating, successively, n well-chosen matches, starting from this position. We build an algorithm, which is optimal, step by step, in some sense: once the n first matches are simulated, the algorithm decides from the statistics furnished by the n first matches (and the a priori we have on the game) how to simulate the (n + 1)th match in such a way that the increase of information concerning the minimax value of the position under study is maximal. This algorithm is remarkably quick. We prove that our step by step optimal algorithm is not globally optimal and that it always converges in a finite number of steps, even if the a priori we have on the game is completely irrelevant. We finally test our algorithm, against MCTS, on Pearl’s game [Pearl, Artif. Intell. 14 (1980) 113–138] and, with a very simple and universal a priori, on the game Connect Four and some variants. The numerical results are rather disappointing. We however exhibit some situations in which our algorithm seems efficient.
Mots-clés : Monte Carlo tree search, deterministic games with alternate moves and complete information, minimax values, finite random trees, branching property
@article{PS_2019__23__176_0, author = {Delattre, Sylvain and Fournier, Nicolas}, title = {On {Monte-Carlo} tree search for deterministic games with alternate moves and complete information}, journal = {ESAIM: Probability and Statistics}, pages = {176--216}, publisher = {EDP-Sciences}, volume = {23}, year = {2019}, doi = {10.1051/ps/2018006}, mrnumber = {3945581}, zbl = {1417.91011}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ps/2018006/} }
TY - JOUR AU - Delattre, Sylvain AU - Fournier, Nicolas TI - On Monte-Carlo tree search for deterministic games with alternate moves and complete information JO - ESAIM: Probability and Statistics PY - 2019 SP - 176 EP - 216 VL - 23 PB - EDP-Sciences UR - http://archive.numdam.org/articles/10.1051/ps/2018006/ DO - 10.1051/ps/2018006 LA - en ID - PS_2019__23__176_0 ER -
%0 Journal Article %A Delattre, Sylvain %A Fournier, Nicolas %T On Monte-Carlo tree search for deterministic games with alternate moves and complete information %J ESAIM: Probability and Statistics %D 2019 %P 176-216 %V 23 %I EDP-Sciences %U http://archive.numdam.org/articles/10.1051/ps/2018006/ %R 10.1051/ps/2018006 %G en %F PS_2019__23__176_0
Delattre, Sylvain; Fournier, Nicolas. On Monte-Carlo tree search for deterministic games with alternate moves and complete information. ESAIM: Probability and Statistics, Tome 23 (2019), pp. 176-216. doi : 10.1051/ps/2018006. http://archive.numdam.org/articles/10.1051/ps/2018006/
[1] Expected-outcome: a general model of static evaluation. IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 182–193. | DOI
,[2] Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2002) 235–256. | DOI | Zbl
, and ,[3] A Bayesian approach to relevance in game playing. Artif. Intell. 97 (1997) 195–242. | DOI | MR | Zbl
and ,[4] A survey monte carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4 (2012) 1–43. | DOI
, , , , , , , , and ,[5] Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5 (2012) 1–122. | DOI | Zbl
and ,[6] An analysis of optimistic, best-first search for minimax sequential decision making. IEEE Int. Symp. Approx. Dyn. Program. Reinf. Learn. (2014).
, and ,[7] Progressive strategies for Monte-Carlo tree search. New Math. Nat. Comput. 4 (2008) 343–357. | DOI | MR | Zbl
, , , and ,[8] Bandit Algorithms for Tree Search. Technical report, INRIA RR-6141 (2007).
and ,[9] Efficient selectivity and backup operators in Monte-Carlo tree search, in Proc. of 5th Int. Conf. Comput. and Games, Turin, Italy (2006) 72–83.
,[10] Random minimax game trees, in Random Discrete Structures (Minneapolis, MN, 1993) Vol. 76 of The IMA Volumes in Mathematics and its Applications. Springer, New York (1996) 55–80. | DOI | MR | Zbl
and ,[11] Maximin action identification: a new bandit framework for games, in JMLR: Workshop and Conference Proceedings, Vol. 49 (2016) 1–23.
, and ,[12] Modification of UCT With Patterns in Monte-Carlo Go. Tech. Rep. Inst. Nat. Rech. Inform. Auto. (INRIA), Paris (2006).
, , and ,[13] GIB: imperfect information in a computationally challenging game. J. Artif. Intell. Res. 14 (2001) 303–358. | DOI | Zbl
,[14] Adaptive submodularity: theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res. 42 (2011) 427–486. | MR | Zbl
and ,[15] Bandit based Monte-Carlo planning, in Machine Learning: ECML, Vol. 4212 of Lecture Notes in Comput. Sci. Springer, Berlin (2006) 282–293. | DOI | MR
and ,[16] The computational intelligence of MoGo revealed in taiwans computer go tournaments. IEEE Trans. Comput. Intell. AI Games 1 (2009) 73–89. | DOI
, , , , , et al.,[17] From bandits to Monte-Carlo tree search: the optimistic principle applied to optimization and planning, in Foundations and Trends in Machine Learning (Book 21). Now Publishers Inc. (2014) 146. | Zbl
,[18] Asymptotic properties of minimax trees and game-searching procedures. Artif. Intell. 14 (1980) 113–138. | DOI | MR | Zbl
,[19] World-championship-caliber srabble. Artif. Intell. 134 (2002) 241–275. | Zbl
,[20] Mastering the game of go with deep neural networks and tree search. Nature 529 (2016) 484–489. | DOI
, , , , , , , , , , , , , , , , , , , ,[21] Optimal search on some game trees. J. Assoc. Comput. Mach. 30 (1983) 389–396. | DOI | MR | Zbl
,[22] Bayesian inference in Monte-Carlo tree search, in UAI’10 Proc. of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (2010) 580–588.
, and ,Cité par Sources :