Nonlinear optimization and symbolic dynamic programming for parameterized hybrid Markov decision processes

Sharain Kinathil, Harold Soh, Scott Sanner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

It is often critical in real-world applications to: (i) perform inverse learning of the cost parameters of a multi-objective reward based on observed agent behavior, (ii) perform sensitivity analyses of policies to various parameter settings, and (iii) analyze and optimize policy performance as a function of policy parameters. When such problems have mixed discrete and continuous state and/or action spaces, they lead to parameterized hybrid MDPs (PHMDPs), which are often solved only approximately via discretization, sampling, and/or local gradient methods (when optimization is involved). In this paper we combine two recent advances that allow for the first exact solution and optimization of PHMDPs. First, we show how each of the aforementioned use cases can be formalized as a PHMDP, which can then be solved via an extension of symbolic dynamic programming (SDP) even when the solution is piecewise nonlinear. Second, we use SDP to derive symbolic solutions to each PHMDP formalization and leverage recent advances in non-convex solvers such as dReal and dOp, which offer δ-optimality guarantees for nonlinear problems given a symbolic function, to perform non-convex global optimization in (i), (ii), and (iii). We demonstrate the efficacy and scalability of our framework by calculating the first known exact solutions to complex nonlinear examples of each of the aforementioned use cases.
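To make use case (iii) more concrete, here is a minimal sketch of the general pipeline the abstract outlines: symbolic Bellman backups that keep a policy parameter θ free, followed by a global optimization of the resulting symbolic value function over θ. It rests on strong simplifying assumptions that are not from the paper: a hypothetical one-dimensional toy problem with piecewise linear rewards and a deterministic transition, plain lists of (condition, value) cases with no pruning, and exact breakpoint enumeration over a single parameter as a stand-in for a δ-optimal nonlinear solver such as dReal or dOp. All names (`sdp`, `cross_sum`, `maximize_over_theta`, `REWARD`, etc.) are illustrative and are not the authors' code.

```python
from fractions import Fraction as F

# Hypothetical toy representation (not the paper's data structure):
# a linear expression is a dict {variable: coefficient}; "const" is the constant term.
def lin(**coeffs):
    return {k: F(v) for k, v in coeffs.items()}

def add(e1, e2):
    out = dict(e1)
    for k, v in e2.items():
        out[k] = out.get(k, F(0)) + v
    return out

def subst(e, var, repl):
    """Replace variable `var` in expression e by the linear expression `repl`."""
    coef = e.get(var, F(0))
    rest = {k: v for k, v in e.items() if k != var}
    return add(rest, {k: coef * v for k, v in repl.items()})

def evaluate(e, env):
    return sum((v * (1 if k == "const" else env[k]) for k, v in e.items()), F(0))

# A case function is a list of (conditions, value). Each condition is (expr, strict),
# meaning expr > 0 if strict else expr >= 0; the cases partition the (x, theta) space.
def holds(cond, env):
    expr, strict = cond
    val = evaluate(expr, env)
    return val > 0 if strict else val >= 0

def case_eval(cases, env):
    for conds, value in cases:
        if all(holds(c, env) for c in conds):
            return evaluate(value, env)
    raise ValueError("no case matches")

def cross_sum(a, b):
    """Add two case functions: conjoin every pair of cases and sum their values.
    Infeasible (empty) cases are kept; a real implementation would prune them."""
    return [(ca + cb, add(va, vb)) for ca, va in a for cb, vb in b]

def case_subst(cases, var, repl):
    return [(tuple((subst(e, var, repl), s) for e, s in conds), subst(v, var, repl))
            for conds, v in cases]

# Hypothetical toy problem: state x, policy parameter theta = distance moved per step.
# Reward is 10 in the goal region x >= 5 (else 0), minus a movement cost of theta.
REWARD = [
    (((lin(x=1, const=-5), False),), lin(theta=-1, const=10)),  # x - 5 >= 0
    (((lin(x=-1, const=5), True),), lin(theta=-1)),             # 5 - x > 0
]
NEXT_X = lin(x=1, theta=1)  # deterministic transition x' = x + theta

def sdp(horizon):
    """Undiscounted symbolic backups V_h(x) = R(x) + V_{h-1}(x + theta)."""
    value = [((), lin())]  # V_0 = 0: one unconditional case
    for _ in range(horizon):
        value = cross_sum(REWARD, case_subst(value, "x", NEXT_X))
    return value

def maximize_over_theta(cases, x0, lo, hi):
    """Exact maximum over theta in [lo, hi] for a piecewise-linear case function.
    With x = x0 fixed, each condition is a*theta + b, so truth values change only at
    roots -b/a. Between breakpoints the function is linear, so a maximum (if one
    exists) is attained at a breakpoint, at a bound, or at the midpoint of a flat piece."""
    lo, hi = F(lo), F(hi)
    points = {lo, hi}
    for conds, _ in cases:
        for expr, _ in conds:
            e = subst(expr, "x", lin(const=x0))
            a, b = e.get("theta", F(0)), e.get("const", F(0))
            if a != 0 and lo <= -b / a <= hi:
                points.add(-b / a)
    pts = sorted(points)
    candidates = pts + [(p + q) / 2 for p, q in zip(pts, pts[1:])]
    return max((case_eval(cases, {"x": F(x0), "theta": t}), t) for t in candidates)

V3 = sdp(3)
best_value, best_theta = maximize_over_theta(V3, x0=0, lo=0, hi=10)
print(len(V3), best_value, best_theta)  # 8 5 5
```

With x0 = 0 and three steps, the symbolic value reduces to 10·[θ ≥ 5] + 10·[2θ ≥ 5] − 3θ, so the sketch reports θ* = 5 with value 5. Exact `Fraction` arithmetic keeps the breakpoints exact. The breakpoint argument only holds for a single parameter and piecewise linear pieces; piecewise nonlinear, multi-parameter value functions are the setting where solvers with δ-optimality guarantees are needed instead.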

Original language: English
Title of host publication: WS-17-01
Subtitle of host publication: Artificial Intelligence and Operations Research for Social Good; WS-17-02: Artificial Intelligence, Ethics, and Society; WS-17-03: Artificial Intelligence for Connected and Automated Vehicles; WS-17-04: Artificial Intelligence for Cyber Security; WS-17-05: Artificial Intelligence for Smart Grids and Buildings; WS-17-06: Computer Poker and Imperfect Information Games; WS-17-07: Crowdsourcing, Deep Learning and Artificial Intelligence Agents; WS-17-08: Distributed Machine Learning; WS-17-09: Joint Workshop on Health Intelligence; WS-17-10: Human-Aware Artificial Intelligence; WS-17-11: Human-Machine Collaborative Learning; WS-17-12: Knowledge-Based Techniques for Problem Solving and Reasoning; WS-17-13: Plan, Activity, and Intent Recognition; WS-17-14: Symbolic Inference and Optimization; WS-17-15: What's Next for AI in Games?
Publisher: AI Access Foundation
Pages: 917-922
Number of pages: 6
ISBN (Electronic): 9781577357865
Publication status: Published - 2017
Externally published: Yes
Event: 31st AAAI Conference on Artificial Intelligence, AAAI 2017 - San Francisco, United States
Duration: 4 Feb 2017 to 10 Feb 2017

Publication series

Name: AAAI Workshop - Technical Report
Volume: WS-17-01 - WS-17-15

Conference

Conference: 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Country/Territory: United States
City: San Francisco
Period: 4/02/17 to 10/02/17
