TY - GEN
T1 - Symbolic dynamic programming for continuous state and action MDPs
AU - Zamani, Zahra
AU - Sanner, Scott
AU - Fang, Cheng
PY - 2012
Y1 - 2012
N2 - Many real-world decision-theoretic planning problems are naturally modeled using both continuous state and action (CSA) spaces, yet little work has provided exact solutions for the case of continuous actions. In this work, we propose a symbolic dynamic programming (SDP) solution to obtain the optimal closed-form value function and policy for CSA-MDPs with multivariate continuous state and actions, discrete noise, piecewise linear dynamics, and piecewise linear (or restricted piecewise quadratic) reward. Our key contribution over previous SDP work is to show how the continuous action maximization step in the dynamic programming backup can be evaluated optimally and symbolically - a task which amounts to symbolic constrained optimization subject to unknown state parameters; we further integrate this technique to work with an efficient and compact data structure for SDP - the extended algebraic decision diagram (XADD). We demonstrate empirical results on a didactic nonlinear planning example and two domains from operations research to show the first automated exact solution to these problems.
AB - Many real-world decision-theoretic planning problems are naturally modeled using both continuous state and action (CSA) spaces, yet little work has provided exact solutions for the case of continuous actions. In this work, we propose a symbolic dynamic programming (SDP) solution to obtain the optimal closed-form value function and policy for CSA-MDPs with multivariate continuous state and actions, discrete noise, piecewise linear dynamics, and piecewise linear (or restricted piecewise quadratic) reward. Our key contribution over previous SDP work is to show how the continuous action maximization step in the dynamic programming backup can be evaluated optimally and symbolically - a task which amounts to symbolic constrained optimization subject to unknown state parameters; we further integrate this technique to work with an efficient and compact data structure for SDP - the extended algebraic decision diagram (XADD). We demonstrate empirical results on a didactic nonlinear planning example and two domains from operations research to show the first automated exact solution to these problems.
UR - http://www.scopus.com/inward/record.url?scp=84868280144&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781577355687
T3 - Proceedings of the National Conference on Artificial Intelligence
SP - 1839
EP - 1845
BT - AAAI-12 / IAAI-12 - Proceedings of the 26th AAAI Conference on Artificial Intelligence and the 24th Innovative Applications of Artificial Intelligence Conference
T2 - 26th AAAI Conference on Artificial Intelligence and the 24th Innovative Applications of Artificial Intelligence Conference, AAAI-12 / IAAI-12
Y2 - 22 July 2012 through 26 July 2012
ER -