Markov Decision Processes and the UCT algorithm

At EuroPython 2009 I met someone from Betfair who told me about the UCT algorithm. I decided to implement UCT for the sailing problem and compare my results to plain Monte Carlo rollout planning. The files are here; the main document that I wrote is mdp.pdf.

The original paper on UCT is Levente Kocsis and Csaba Szepesvári: Bandit Based Monte-Carlo Planning. Improvements to UCT are given in Sylvain Gelly and David Silver: Combining Online and Offline Knowledge in UCT. Sparse sampling is another approach to large MDPs: Michael Kearns, Yishay Mansour, and Andrew Ng: A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. For a gentle introduction to reinforcement learning, see Greenwald's notes.
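The heart of UCT is that it treats action selection at each tree node as a multi-armed bandit and applies the UCB1 rule from the Kocsis-Szepesvári paper: pick the child maximizing average reward plus an exploration bonus. A minimal sketch of that selection step (the dictionary-based node representation and the exploration constant `c` are my own illustrative choices, not from the paper):

```python
import math

def ucb1_select(children, total_visits, c=math.sqrt(2)):
    """Pick the child with the highest UCB1 score: mean reward plus an
    exploration bonus that shrinks as the child accumulates visits."""
    def score(child):
        if child["visits"] == 0:
            return float("inf")  # always expand unvisited actions first
        exploit = child["reward"] / child["visits"]
        explore = c * math.sqrt(math.log(total_visits) / child["visits"])
        return exploit + explore

    return max(children, key=score)
```

With two children of equal mean reward, the less-visited one wins the tie because its exploration bonus is larger; this is what drives UCT to keep sampling plausible alternatives instead of committing to the current best rollout estimate.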
