Reference-Based POMDPs

Edward Kim, Yohan Karunanayake, Hanna Kurniawati

Research output: Contribution to conference › Paper › peer-review

Abstract

Making good decisions in partially observable and non-deterministic scenarios is a crucial capability for robots. The Partially Observable Markov Decision Process (POMDP) is a general framework for such problems. Despite advances in POMDP solving, problems with long planning horizons and evolving environments remain difficult even for today's best approximate solvers. To alleviate this difficulty, we propose a slightly modified POMDP problem, called a Reference-Based POMDP, whose objective balances maximizing the expected total reward with staying close to a given reference (stochastic) policy. The optimal policy of a Reference-Based POMDP can be computed via iterative expectations under the given reference policy, thereby avoiding exhaustive enumeration of actions at each belief node of the search tree. We show theoretically that the standard POMDP under stochastic policies is related to the Reference-Based POMDP. To demonstrate the feasibility of exploiting this formulation, we present a basic algorithm, RefSolver. Results from experiments on long-horizon navigation problems indicate that this basic algorithm substantially outperforms POMCP.
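The abstract does not spell out the backup equation, but the "iterative expectations" idea can be illustrated with a small sketch. The snippet below assumes a KL-regularized objective with temperature eta (a common way to penalize divergence from a reference policy; the paper's exact objective may differ). Under that assumption, the backup replaces the max over actions with a reference-weighted log-sum-exp, and the optimal stochastic policy tilts the reference policy by the exponentiated action values. All names here (soft_backup, tilted_policy, eta, pi_ref) are illustrative, not taken from the paper.

```python
# Hypothetical sketch of a reference-weighted soft backup at a single belief.
# Assumes a KL-regularized objective with temperature eta; not the paper's code.
import numpy as np

def soft_backup(Q, pi_ref, eta):
    """Soft value: eta * log E_{a ~ pi_ref}[ exp(Q(a) / eta) ].

    An expectation under the reference policy replaces the exhaustive
    max over actions used in the standard Bellman backup.
    """
    z = Q / eta + np.log(pi_ref)
    m = z.max()                      # log-sum-exp trick for numerical stability
    return eta * (m + np.log(np.exp(z - m).sum()))

def tilted_policy(Q, pi_ref, eta):
    """Stochastic policy proportional to pi_ref(a) * exp(Q(a) / eta)."""
    z = Q / eta + np.log(pi_ref)
    p = np.exp(z - z.max())
    return p / p.sum()

# Toy check: 3 actions, uniform reference policy.
Q = np.array([1.0, 0.5, -0.2])   # hypothetical action values at one belief
pi_ref = np.ones(3) / 3
for eta in (0.1, 1.0, 10.0):
    # Small eta -> backup approaches max(Q); large eta -> approaches E_pi_ref[Q].
    print(eta, soft_backup(Q, pi_ref, eta), tilted_policy(Q, pi_ref, eta))
```

As eta shrinks, the backup recovers greedy maximization; as it grows, the policy collapses onto the reference policy. This is the reward-versus-closeness trade-off the abstract describes.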
Original language: English
Number of pages: 17
Publication status: Published - 12 Dec 2023
Event: 37th Conference on Neural Information Processing Systems, Ernest N. Morial Convention Center, New Orleans, United States
Duration: 10 Dec 2023 – 16 Dec 2023
Conference number: 37
https://neurips.cc/Conferences/2023

Conference

Conference: 37th Conference on Neural Information Processing Systems
Abbreviated title: NeurIPS
Country/Territory: United States
City: New Orleans
Period: 10/12/23 – 16/12/23
Internet address: https://neurips.cc/Conferences/2023
