At EuroPython 2009 I met a guy from Bet Fair who told me about the UCT algorithm. I decided to implement the UCT algorithm for the sailing problem and compare my results to normal Monte Carlo rollout planning. The files are here, the main document that I wrote is mdp.pdf.
The original paper on UCT is Levente Kocsis and Csaba Szepesvári: Bandit based Monte-Carlo Planning. Improvements to UCT are given by Gelly, Silver: Combining Online and Offline Knowledge in UCT. Sparse sampling is another approach to large MDPs: Kearns, Mansour, Ng: A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. For a gentle introduction to reinforcement learning see Greenwald’s notes.