FF+FPG: Guiding a Policy-Gradient planner

Olivier Buffet*, Douglas Aberdeen

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > peer-review

    16 Citations (Scopus)

    Abstract

    The Factored Policy-Gradient planner (FPG) (Buffet & Aberdeen 2006) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large planning domains through the use of Reinforcement Learning. It essentially performs a stochastic local search in policy space. FPG's weakness is potentially long learning times, as it initially acts randomly and progressively improves its policy each time the goal is reached. This paper shows how to use an external teacher to guide FPG's exploration. While any teacher can be used, we concentrate on the actions suggested by FF's heuristic (Hoffmann 2001), as FF-replan has proved efficient for probabilistic re-planning. To achieve this, FPG must learn its own policy while following another. We thus extend FPG to off-policy learning using importance sampling (Glynn & Iglehart 1989; Peshkin & Shelton 2002). The resulting algorithm is presented and evaluated on IPC benchmarks.
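The off-policy idea in the abstract can be illustrated with a minimal sketch: actions are drawn from a teacher (behaviour) policy, and each sampled trajectory is reweighted by the ratio of the learner's action probabilities to the teacher's before it contributes to a policy-gradient estimate. This is not FPG's actual algorithm. It assumes a state-less softmax learner, a fixed teacher distribution standing in for FF's suggestions, and a plain REINFORCE-style estimator; the names `softmax`, `importance_weight`, `off_policy_gradient` and `reward_fn` are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def importance_weight(target_probs, behaviour_probs, actions):
    """Product over the trajectory of pi_target(a) / pi_behaviour(a)."""
    w = 1.0
    for a in actions:
        w *= target_probs[a] / behaviour_probs[a]
    return w


def off_policy_gradient(theta, behaviour_probs, episodes, horizon, reward_fn, rng):
    """Importance-weighted REINFORCE estimate for a state-less softmax policy.

    Assumption: the teacher is a fixed distribution `behaviour_probs` that
    gives every action non-zero probability.
    """
    n = len(theta)
    probs = softmax(theta)
    grad = [0.0] * n
    for _ in range(episodes):
        # Act according to the teacher, not the learner.
        actions = rng.choices(range(n), weights=behaviour_probs, k=horizon)
        ret = sum(reward_fn(a) for a in actions)
        w = importance_weight(probs, behaviour_probs, actions)
        for a in actions:
            for i in range(n):
                # d/d theta_i of log softmax(theta)[a]
                score = (1.0 if i == a else 0.0) - probs[i]
                grad[i] += w * ret * score
    return [g / episodes for g in grad]


# Toy problem (hypothetical): action 1 is rewarded, and the teacher favours it.
rng = random.Random(0)
theta = [0.0, 0.0]           # learner starts with a uniform policy
teacher = [0.2, 0.8]         # stand-in for an external teacher's preferences
grad = off_policy_gradient(theta, teacher, episodes=2000, horizon=1,
                           reward_fn=lambda a: 1.0 if a == 1 else 0.0, rng=rng)
theta = [t + 0.5 * g for t, g in zip(theta, grad)]  # one gradient-ascent step
```

In this toy setting the reweighted estimate is, in expectation, the same gradient the learner would obtain by sampling from its own policy, so following the teacher changes which trajectories are seen without biasing the update. The weights can have high variance when the two policies differ strongly, which is one reason this sketch keeps the horizon short.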

    Original language: English
    Title of host publication: ICAPS 2007, 17th International Conference on Automated Planning and Scheduling
    Publisher: Association for the Advancement of Artificial Intelligence, AAAI
    Pages: 42-48
    Number of pages: 7
    ISBN (Print): 9781577353447
    Publication status: Published - 2007
    Event: ICAPS 2007, 17th International Conference on Automated Planning and Scheduling - Providence, RI, United States
    Duration: 22 Sept 2007 to 26 Sept 2007

    Publication series

    Name: ICAPS 2007, 17th International Conference on Automated Planning and Scheduling

    Conference

    Conference: ICAPS 2007, 17th International Conference on Automated Planning and Scheduling
    Country/Territory: United States
    City: Providence, RI
    Period: 22/09/07 to 26/09/07
