TY - JOUR
T1 - Efficient solutions to factored MDPs with imprecise transition probabilities
AU - Delgado, Karina Valdivia
AU - Sanner, Scott
AU - de Barros, Leliane Nunes
PY - 2011/6
N2 - When modeling real-world decision-theoretic planning problems in the Markov Decision Process (MDP) framework, it is often impossible to obtain a completely accurate estimate of transition probabilities. For example, uncertainty naturally arises in the transition specification when MDP transition models are elicited from an expert or estimated from data, or when transition distributions are non-stationary due to insufficient state knowledge. In the interest of obtaining the most robust policy under transition uncertainty, the Markov Decision Process with Imprecise Transition Probabilities (MDP-IP) has been introduced to model such scenarios. Unfortunately, while various solution algorithms exist for MDP-IPs, they often require external calls to optimization routines and thus can be extremely time-consuming in practice. To address this deficiency, we introduce the factored MDP-IP and propose efficient dynamic programming methods that exploit its structure. Noting that the key computational bottleneck in the solution of factored MDP-IPs is the need to repeatedly solve nonlinear constrained optimization problems, we show how to target approximation techniques to drastically reduce the computational overhead of the nonlinear solver while producing bounded, approximately optimal solutions. Our results show up to two orders of magnitude speedup in comparison to traditional "flat" dynamic programming approaches, and up to an order of magnitude speedup over the extension of factored MDP approximate value iteration techniques to MDP-IPs, while producing the lowest error of any approximation algorithm evaluated.
KW - Markov Decision Process
KW - Probabilistic planning
KW - Robust planning
UR - http://www.scopus.com/inward/record.url?scp=79953864311&partnerID=8YFLogxK
DO - 10.1016/j.artint.2011.01.001
M3 - Article
SN - 0004-3702
VL - 175
SP - 1498
EP - 1527
JO - Artificial Intelligence
IS - 9-10
ER -