TY - GEN
T1 - Symbolic bounded real-time dynamic programming
AU - Delgado, Karina Valdivia
AU - Fang, Cheng
AU - Sanner, Scott
AU - De Barros, Leliane Nunes
PY - 2010
Y1 - 2010
N2 - Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted. By visiting (and updating) only a fraction of the state space, this approach can be used to solve problems with intractably large state space. In order to improve the performance of RTDP, a variant based on symbolic representation was proposed, named sRTDP. Traditional RTDP approaches work best on problems with sparse transition matrices where they can often efficiently achieve ε-convergence without visiting all states; however, on problems with dense transition matrices where most states are reachable in one step, the sRTDP approach shows an advantage over traditional RTDP by up to three orders of magnitude, as we demonstrate in this paper. We also specify a new variant of sRTDP based on BRTDP, named sBRTDP, which converges quickly when compared to RTDP variants, since it does less updating by making a better choice of the next state to be visited.
AB - Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted. By visiting (and updating) only a fraction of the state space, this approach can be used to solve problems with intractably large state space. In order to improve the performance of RTDP, a variant based on symbolic representation was proposed, named sRTDP. Traditional RTDP approaches work best on problems with sparse transition matrices where they can often efficiently achieve ε-convergence without visiting all states; however, on problems with dense transition matrices where most states are reachable in one step, the sRTDP approach shows an advantage over traditional RTDP by up to three orders of magnitude, as we demonstrate in this paper. We also specify a new variant of sRTDP based on BRTDP, named sBRTDP, which converges quickly when compared to RTDP variants, since it does less updating by making a better choice of the next state to be visited.
UR - http://www.scopus.com/inward/record.url?scp=78649954435&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-16138-4_20
DO - 10.1007/978-3-642-16138-4_20
M3 - Conference contribution
SN - 3642161375
SN - 9783642161377
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 193
EP - 202
BT - Advances in Artificial Intelligence, SBIA 2010 - 20th Brazilian Symposium on Artificial Intelligence, Proceedings
T2 - 20th Brazilian Symposium on Artificial Intelligence, SBIA 2010
Y2 - 23 October 2010 through 28 October 2010
ER -