Institutional Repository
Technical University of Crete
EN  |  EL



My Space

Monte Carlo tree search in the "Settlers of Catan" strategy game

Karamalegos Emmanouil

Full record

Year 2016
Type of Item Diploma Work
Bibliographic Citation Emmanouil Karamalegos, "Monte Carlo tree search in the "Settlers of Catan" strategy game", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2016
Appears in Collections


Classic approaches to game AI require either a high quality of domain knowledge, ora long time to generate effective AI behavior. Monte Carlo Tree Search (MCTS) is asearch method that combines the precision of tree search with the generality ofrandom sampling. The family of MCTS algorithms has achieved promising resultswith perfect-information games such as Go. In our work, we apply Monte-Carlo TreeSearch to the non-deterministic game "Settlers of Catan", a multi-player board-turnedweb-based game that necessitates strategic planning and negotiation skills. Weimplemented an agent which takes into consideration all the aspects of the game forthe first time, using no domain knowledge.In our work, we are experimenting using a reinforcement learning method Value ofPerfect Information (VPI) and two bandit methods, namely, the Upper CoefficientBound and Bayesian Upper Coefficient Bound methods. Such methods attempt tostrike a balance between exploitation and exploration when creating of the searchtree.For first time, we implemented an agent that takes into consideration the completerules-set of the game and makes it possible to negotiate trading between all players.Furthermore, we included in our agent an alternative initial placement found in theliterature, which is based on the analysis of human behavior in Settlers of Catangames.In our experiments we compare the performance of our methods against each otherand against appropriate benchmarks (e.g., JSettlers agents), and examine the effectthat the number of simulations and the simulation depth has on the algorithms’performance. Our results suggests that VPI scores significantly better than banditbased methods, even if these employ a much higher number of simulations. Inaddition to this, the simulation depth of the algorithm needs to be calculated so themethod will neither get lost in deep simulations of improbable scenarios neither endup shortly without given a proper estimation of the upcoming moves. Finally, ourresults indicate that our agent performance is improved when the alternative, humanbehavior-based, initial placement method.

Available Files