Reward Potentials for Planning with Learned Neural Network Transition Models

Buser Say*, Scott Sanner, Sylvie Thiébaux

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    2 Citations (Scopus)

    Abstract

    Optimal planning with respect to learned neural network (NN) models in continuous action and state spaces using mixed-integer linear programming (MILP) is a challenging task for branch-and-bound solvers due to the poor linear relaxation of the underlying MILP model. For a given set of features, potential heuristics provide an efficient framework for computing bounds on cost (reward) functions. In this paper, we model the problem of finding optimal potential bounds for learned NN models as a bilevel program, and solve it using a novel finite-time constraint generation algorithm. We then strengthen the linear relaxation of the underlying MILP model by introducing constraints to bound the reward function based on the precomputed reward potentials. Experimentally, we show that our algorithm efficiently computes reward potentials for learned NN models, and that the overhead of computing reward potentials is justified by the overall strengthening of the underlying MILP model for the task of planning over long horizons.

    Original language: English
    Title of host publication: Principles and Practice of Constraint Programming - 25th International Conference, CP 2019, Proceedings
    Editors: Thomas Schiex, Simon de Givry
    Publisher: Springer
    Pages: 674-689
    Number of pages: 16
    ISBN (Print): 9783030300470
    Publication status: Published - 2019
    Event: 25th International Conference on Principles and Practice of Constraint Programming, CP 2019 - Stamford, United States
    Duration: 30 Sept 2019 to 4 Oct 2019

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 11802 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 25th International Conference on Principles and Practice of Constraint Programming, CP 2019
    Country/Territory: United States
    City: Stamford
    Period: 30/09/19 to 4/10/19
