TY - GEN
T1 - Bayesian real-time dynamic programming
AU - Sanner, Scott
AU - Goetschalckx, Robby
AU - Driessens, Kurt
AU - Shani, Guy
PY - 2009
Y1 - 2009
AB - Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted, by focusing dynamic programming on the envelope of states reachable from an initial state set. RTDP often provides performance guarantees without visiting the entire state space. Building on RTDP, recent work has sought to improve its efficiency through various optimizations, including maintaining upper and lower bounds to both govern trial termination and prioritize state exploration. In this work, we take a Bayesian perspective on these upper and lower bounds and use a value of perfect information (VPI) analysis to govern trial termination and exploration in a novel algorithm we call VPI-RTDP. VPI-RTDP leads to an improvement over state-of-the-art RTDP methods, empirically yielding up to a three-fold reduction in the amount of time and number of visited states required to achieve comparable policy performance.
UR - http://www.scopus.com/inward/record.url?scp=77958528355&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781577354260
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 1784
EP - 1789
BT - IJCAI-09 - Proceedings of the 21st International Joint Conference on Artificial Intelligence
PB - International Joint Conferences on Artificial Intelligence
T2 - 21st International Joint Conference on Artificial Intelligence, IJCAI 2009
Y2 - 11 July 2009 through 16 July 2009
ER -