Abstract
A novel approach to adapting Upper Confidence Tree (UCT) to graph structures was investigated. It is sometimes used to search Markov Decision Processes (MDP) that are not trees, so we created an algorithm to perform a similar search without simplifying the structure of MDPs. It combines the sampling of UCT with the cyclic propagation of Deterministic Graph-Based Optimal Planning (GBOP-D) and is called Graph-Based Optimal Planning with Sparse Sampling (GBOP-SS). We tested GBOP-SS, GBOP-D, and Upper Confidence bound for rooted Directed acyclic graphs on three MDPs. Data was collected from 50 playthroughs of each algorithm in each MDP, and the results were mixed. GBOP-D performed the best at Nim and Vanderbei’s sailing game, and GBOP-SS performed the best at Solitaire. This suggests that there are other domains where a combination of sampling and cyclic propagation would provide a net benefit.