Reinforcement learning with value advice

Mayank Daswani, Peter Sunehag, Marcus Hutter

    Research output: Contribution to journal › Conference article › peer-review

    2 Citations (Scopus)

    Abstract

    The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
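    The setting in the abstract lends itself to a short illustration. The following is a minimal, hypothetical Python sketch of a DAgger-style loop that learns from value advice; it is not the paper's RLAdvice implementation. The toy chain environment, the hand-coded oracle_q function (standing in for UCT value estimates), and the tabular learner are all assumptions made for this example.

    import numpy as np

    # Sketch of learning from value advice in a DAgger-style loop (an
    # illustration, not the paper's RLAdvice algorithm). The environment is a
    # toy 5-state chain, oracle_q stands in for UCT value estimates, and a
    # tabular average stands in for the learned function approximator.

    N_STATES, N_ACTIONS = 5, 2      # actions: 0 = left, 1 = right
    GAMMA = 0.9

    def step(s, a):
        """Toy chain MDP: taking 'right' in the last state yields reward 1."""
        if a == 1:
            return min(s + 1, N_STATES - 1), float(s == N_STATES - 1)
        return max(s - 1, 0), 0.0

    def optimal_value(s):
        """V* of the optimal (always-right) policy on the chain."""
        return GAMMA ** (N_STATES - 1 - s) / (1 - GAMMA)

    def oracle_q(s, a):
        """Oracle advice: expected return of (s, a) under the optimal policy."""
        s_next, r = step(s, a)
        return r + GAMMA * optimal_value(s_next)

    q_hat = np.zeros((N_STATES, N_ACTIONS))   # learned value estimates
    dataset = []                              # aggregated (s, a, advice) triples

    for _ in range(10):                       # DAgger-style iterations
        # Roll out the current greedy policy and query the oracle along the
        # way, so advice is gathered on states this policy actually visits.
        s = 0
        for _ in range(20):
            for a in range(N_ACTIONS):
                dataset.append((s, a, oracle_q(s, a)))
            s, _ = step(s, int(np.argmax(q_hat[s])))

        # Fit the learner to the aggregated advice (a tabular average here).
        sums = np.zeros_like(q_hat)
        counts = np.zeros_like(q_hat)
        for s_i, a_i, target in dataset:
            sums[s_i, a_i] += target
            counts[s_i, a_i] += 1
        q_hat = np.where(counts > 0, sums / np.maximum(counts, 1), q_hat)

    print("greedy policy per state:", np.argmax(q_hat, axis=1))  # expect all 1s

    Because advice is collected under the current policy's own state distribution and aggregated across iterations, the greedy policy extracted from q_hat converges to always moving right in this toy chain, which mirrors the dataset-aggregation idea that DAgger contributes to the paper's setting.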

    Original language: English
    Pages (from-to): 299-314
    Number of pages: 16
    Journal: Journal of Machine Learning Research
    Volume: 39
    Issue number: 2014
    Publication status: Published - 2014
    Event: 6th Asian Conference on Machine Learning, ACML 2014 - Nha Trang, Viet Nam
    Duration: 26 Nov 2014 - 28 Nov 2014
