Symbolic dynamic programming for continuous state and observation POMDPs

Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

    9 Citations (Scopus)

    Abstract

    Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially-observable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces.
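    For readers new to PBVI, the following Python sketch shows a standard point-based backup for a small, fully discrete POMDP with finite states, actions and observations. It is not the authors' symbolic method and handles neither continuous states nor continuous observations. It does show the structure the abstract builds on: at a fixed belief, each observation selects exactly one projected alpha-vector. A finite belief set therefore induces only finitely many observation-to-vector choices. This is roughly analogous to the finite set of relevant observation partitions in the continuous case. The function names `point_based_backup` and `pbvi` and the two-state toy model are hypothetical, and the numbers are for illustration only.

    ```python
    from typing import List, Sequence, Tuple

    Vector = List[float]


    def dot(u: Sequence[float], v: Sequence[float]) -> float:
        """Inner product of a belief and an alpha-vector."""
        return sum(x * y for x, y in zip(u, v))


    def point_based_backup(
        belief: Vector,
        alphas: List[Vector],
        T: List[List[Vector]],  # T[a][s][s2] = P(s2 | s, a)
        Z: List[List[Vector]],  # Z[a][s2][o] = P(o | s2, a)
        R: List[Vector],        # R[a][s] = immediate reward
        gamma: float,
    ) -> Tuple[Vector, int]:
        """Return the backed-up alpha-vector at `belief` and its greedy action."""
        n = len(belief)
        best_vec: Vector = []
        best_act, best_val = -1, float("-inf")
        for a in range(len(R)):
            g_ab = list(R[a])  # start from the immediate reward R(., a)
            for o in range(len(Z[a][0])):
                # Project every alpha-vector one step back through (a, o).
                projections = [
                    [gamma * sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n))
                     for s in range(n)]
                    for alpha in alphas
                ]
                # For this observation, keep only the projection that is best at `belief`.
                best_proj = max(projections, key=lambda g: dot(belief, g))
                g_ab = [x + y for x, y in zip(g_ab, best_proj)]
            value = dot(belief, g_ab)
            if value > best_val:
                best_vec, best_act, best_val = g_ab, a, value
        return best_vec, best_act


    def pbvi(beliefs: List[Vector], T, Z, R, gamma: float, horizon: int) -> List[Vector]:
        """Run `horizon` rounds of point-based backups over a fixed belief set."""
        alphas: List[Vector] = [[0.0] * len(beliefs[0])]
        for _ in range(horizon):
            alphas = [point_based_backup(b, alphas, T, Z, R, gamma)[0] for b in beliefs]
        return alphas


    # Hypothetical two-state, two-action, two-observation toy model.
    # Action 0 ("listen"): state unchanged, observation matches state w.p. 0.85.
    # Action 1 ("act"): state resets uniformly, observation is uninformative.
    T = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
    Z = [[[0.85, 0.15], [0.15, 0.85]], [[0.5, 0.5], [0.5, 0.5]]]
    R = [[-1.0, -1.0], [10.0, -20.0]]
    ```

    With these numbers, a single backup from the all-zero vector picks action 0 at the uniform belief `[0.5, 0.5]`. At the belief `[0.9, 0.1]` it picks action 1. The sketch keeps one vector per belief point, as PBVI does, rather than enumerating every combination of per-observation choices as an exact full backup would.
    
    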

    Original language: English
    Title of host publication: Advances in Neural Information Processing Systems 25
    Subtitle of host publication: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
    Pages: 1394-1402
    Number of pages: 9
    Publication status: Published - 2012
    Event: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 - Lake Tahoe, NV, United States
    Duration: 3 Dec 2012 to 6 Dec 2012

    Publication series

    Name: Advances in Neural Information Processing Systems
    Volume: 2
    ISSN (Print): 1049-5258

    Conference

    Conference: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
    Country/Territory: United States
    City: Lake Tahoe, NV
    Period: 3/12/12 to 6/12/12
