Thompson Sampling is Asymptotically Optimal in General Environments

Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear.
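
    The abstract describes Thompson sampling over a countable class of environments: maintain a posterior over the class, sample one environment from the posterior, act optimally for the sampled environment, and update the posterior with the observed percepts. The sketch below illustrates this loop for a deliberately simple special case (a finite class of Bernoulli bandits standing in for the general stochastic environments of the paper); the candidate class, the uniform prior, and the omission of the paper's effective-horizon resampling schedule are simplifying assumptions, not the authors' implementation.

    ```python
    import numpy as np

    # Minimal sketch of Thompson sampling over a countable (here: finite)
    # class of candidate environments. The environments are Bernoulli
    # bandits, a degenerate special case of the general stochastic
    # environments treated in the paper.

    rng = np.random.default_rng(0)

    # Candidate environments: each is a vector of arm success probabilities.
    env_class = [np.array([0.2, 0.5]), np.array([0.8, 0.3]), np.array([0.4, 0.6])]
    true_env = env_class[1]                    # environment generating the data
    log_post = np.log(np.full(len(env_class), 1.0 / len(env_class)))  # uniform prior

    total_reward = 0.0
    for t in range(1000):
        # Normalise the posterior and sample one candidate environment.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        sampled = env_class[rng.choice(len(env_class), p=post)]

        # Act optimally for the sampled environment: pull its best arm.
        action = int(np.argmax(sampled))
        reward = float(rng.random() < true_env[action])
        total_reward += reward

        # Bayesian update: weight each candidate by the likelihood it
        # assigns to the observed reward under the chosen action.
        for i, nu in enumerate(env_class):
            p = nu[action] if reward == 1.0 else 1.0 - nu[action]
            log_post[i] += np.log(max(p, 1e-12))

    print("average reward:", total_reward / 1000)
    ```

    In this toy setting the posterior concentrates on the data-generating environment, so the sampled policy eventually pulls the truly optimal arm; the paper establishes analogous asymptotic-optimality and sublinear-regret guarantees in the far more general non-Markov, non-ergodic, partially observable setting.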
    Original language: English
    Title of host publication: Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence
    Editors: Alexander Ihler and Dominik Janzing
    Place of publication: Canada
    Publisher: AUAI Press
    Pages: 417-426
    Edition: Peer reviewed
    ISBN (Print): 9781510827806
    Publication status: Published - 2016
    Event: 32nd Conference on Uncertainty in Artificial Intelligence 2016 - Jersey City, New Jersey, USA
    Duration: 1 Jan 2016 → …

    Conference

    Conference: 32nd Conference on Uncertainty in Artificial Intelligence 2016
    Period: 1/01/16 → …
    Other: June 25-29, 2016
