TY - GEN
T1 - Nonlinear optimization and symbolic dynamic programming for parameterized hybrid Markov decision processes
AU - Kinathil, Sharain
AU - Soh, Harold
AU - Sanner, Scott
N1 - Publisher Copyright:
© 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2017
Y1 - 2017
N2 - It is often critical in real-world applications to: (i) perform inverse learning of the cost parameters of a multi-objective reward based on observed agent behavior; (ii) perform sensitivity analyses of policies to various parameter settings; and (iii) analyze and optimize policy performance as a function of policy parameters. When such problems have mixed discrete and continuous state and/or action spaces, this leads to parameterized hybrid MDPs (PHMDPs) that are often approximately solved via discretization, sampling, and/or local gradient methods (when optimization is involved). In this paper we combine two recent advances that allow for the first exact solution and optimization of PHMDPs. We first show how each of the aforementioned use cases can be formalized as a PHMDP, which can then be solved via an extension of symbolic dynamic programming (SDP) even when the solution is piecewise nonlinear. Second, we leverage recent advances in non-convex solvers such as dReal and dOp (which offer δ-optimality guarantees for nonlinear problems given a symbolic function) for non-convex global optimization in (i), (ii), and (iii), using SDP to derive symbolic solutions to each PHMDP formalization. We demonstrate the efficacy and scalability of our framework by calculating the first known exact solutions to complex nonlinear examples of each of the aforementioned use cases.
UR - http://www.scopus.com/inward/record.url?scp=85046090009&partnerID=8YFLogxK
M3 - Conference contribution
T3 - AAAI Workshop - Technical Report
SP - 917
EP - 922
BT - WS-17-01
PB - AI Access Foundation
T2 - 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Y2 - 4 February 2017 through 10 February 2017
ER -