Bayesian reinforcement learning with exploration

Tor Lattimore*, Marcus Hutter

*Corresponding author for this work

    Research output: Conference contribution (peer-reviewed), chapter in conference proceedings

    6 Citations (Scopus)

    Abstract

    We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
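
    The abstract's central idea, interleaving a Bayes-optimal policy with an exploring policy, can be illustrated with a toy sketch. The Python snippet below is not the paper's algorithm; it is a hypothetical illustration on a two-armed Bernoulli bandit with a finite class of candidate environments, where the agent acts Bayes-optimally (greedily with respect to the posterior-mean reward) unless its posterior is still too uncertain, in which case it runs a short exploration phase. All names and constants (ENVS, entropy_threshold, explore_horizon) are assumptions made for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical finite environment class: each row gives the Bernoulli
    # mean reward of the two arms in one candidate environment.
    ENVS = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
    true_env = ENVS[0]                        # environment the agent faces
    posterior = np.full(len(ENVS), 1.0 / len(ENVS))

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def bayes_optimal_arm(posterior):
        # Greedy w.r.t. the posterior-expected mean reward of each arm.
        return int(np.argmax(posterior @ ENVS))

    def exploring_arm(posterior):
        # Pull the arm whose mean is most uncertain under the posterior:
        # a crude stand-in for an information-seeking exploration policy.
        mean = posterior @ ENVS
        second = posterior @ (ENVS ** 2)
        return int(np.argmax(second - mean ** 2))

    def update(posterior, arm, reward):
        # Bayes rule: likelihood of the observed reward under each candidate.
        lik = np.where(reward == 1, ENVS[:, arm], 1.0 - ENVS[:, arm])
        post = posterior * lik
        return post / post.sum()

    entropy_threshold = 0.3   # explore while posterior entropy exceeds this
    explore_horizon = 5       # length of each exploration phase
    total_reward, t = 0, 0

    while t < 200:
        if entropy(posterior) > entropy_threshold:
            phase, policy = explore_horizon, exploring_arm    # explore
        else:
            phase, policy = 1, bayes_optimal_arm              # exploit
        for _ in range(phase):
            arm = policy(posterior)
            reward = int(rng.random() < true_env[arm])
            posterior = update(posterior, arm, reward)
            total_reward += reward
            t += 1

    print("posterior over environments:", np.round(posterior, 3))
    print("average reward:", total_reward / t)
    ```

    The entropy test is a rough proxy for the kind of uncertainty criterion that triggers exploration in this style of algorithm: once the posterior concentrates, exploration phases stop and the agent behaves Bayes-optimally, loosely mirroring the adaptive behaviour the abstract claims for easier-than-worst-case environments.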

    Original language: English
    Title of host publication: Algorithmic Learning Theory - 25th International Conference, ALT 2014, Proceedings
    Editors: Peter Auer, Alexander Clark, Thomas Zeugmann, Sandra Zilles
    Publisher: Springer Verlag
    Pages: 170-184
    Number of pages: 15
    ISBN (Electronic): 9783319116617
    DOIs
    Publication status: Published - 2014
    Event: 25th International Conference on Algorithmic Learning Theory, ALT 2014 - Bled, Slovenia
    Duration: 8 Oct 2014 – 10 Oct 2014

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 8776
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 25th International Conference on Algorithmic Learning Theory, ALT 2014
    Country/Territory: Slovenia
    City: Bled
    Period: 8/10/14 – 10/10/14
