Symbolic bounded real-time dynamic programming

Karina Valdivia Delgado, Cheng Fang, Scott Sanner, Leliane Nunes De Barros

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Citations (Scopus)

Abstract

Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted. By visiting (and updating) only a fraction of the state space, this approach can be used to solve problems with intractably large state space. In order to improve the performance of RTDP, a variant based on symbolic representation was proposed, named sRTDP. Traditional RTDP approaches work best on problems with sparse transition matrices where they can often efficiently achieve ε-convergence without visiting all states; however, on problems with dense transition matrices where most states are reachable in one step, the sRTDP approach shows an advantage over traditional RTDP by up to three orders of magnitude, as we demonstrate in this paper. We also specify a new variant of sRTDP based on BRTDP, named sBRTDP, which converges quickly when compared to RTDP variants, since it does less updating by making a better choice of the next state to be visited.

Original languageEnglish
Title of host publicationAdvances in Artificial Intelligence, SBIA 2010 - 20th Brazilian Symposium on Artificial Intelligence, Proceedings
Pages193-202
Number of pages10
DOIs
Publication statusPublished - 2010
Externally publishedYes
Event20th Brazilian Symposium on Artificial Intelligence, SBIA 2010 - Sao Bernardo do Campo, Brazil
Duration: 23 Oct 201028 Oct 2010

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume6404 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference20th Brazilian Symposium on Artificial Intelligence, SBIA 2010
Country/TerritoryBrazil
CitySao Bernardo do Campo
Period23/10/1028/10/10

Fingerprint

Dive into the research topics of 'Symbolic bounded real-time dynamic programming'. Together they form a unique fingerprint.

Cite this