On Thompson sampling and asymptotic optimality

Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    3 Citations (Scopus)

    Abstract

    We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically, its value converges in mean to the optimal value, and (2) given a recoverability assumption, its regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.
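    To make the idea of Thompson sampling over a class of environments concrete, here is a minimal illustrative sketch. It is not the authors' algorithm or code: it assumes a hypothetical finite class of two-armed Bernoulli bandits (far simpler than the non-Markovian, partially observable environments in the abstract) and a fixed resampling `horizon` chosen for illustration. Each round it samples an environment from the Bayesian posterior, follows that environment's optimal action for `horizon` steps, and updates the posterior from the observed rewards. The names `ENV_CLASS`, `optimal_action`, `normalize`, and `thompson_sampling` are hypothetical.

    ```python
    import math
    import random

    # Hypothetical finite environment class (illustration only): each environment
    # is a two-armed Bernoulli bandit given by the success probability of each arm.
    ENV_CLASS = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]


    def optimal_action(env):
        """Index of the arm with the highest success probability in `env`."""
        return max(range(len(env)), key=lambda a: env[a])


    def normalize(log_weights):
        """Turn log-weights into a probability vector (log-sum-exp for stability)."""
        m = max(log_weights)
        w = [math.exp(lw - m) for lw in log_weights]
        total = sum(w)
        return [x / total for x in w]


    def thompson_sampling(true_env, env_class, rounds, horizon, seed=0):
        """Sketch of Thompson sampling over a finite class; returns the final posterior."""
        rng = random.Random(seed)
        # Uniform prior over the environment class, stored as log-weights.
        log_post = [0.0] * len(env_class)
        for _ in range(rounds):
            # 1. Sample an environment from the current posterior.
            sampled = rng.choices(range(len(env_class)), weights=normalize(log_post))[0]
            # 2. Follow the sampled environment's optimal action for `horizon` steps
            #    (an assumed stand-in for following its optimal policy for a while).
            action = optimal_action(env_class[sampled])
            for _ in range(horizon):
                reward = 1 if rng.random() < true_env[action] else 0
                # 3. Bayesian update: add the log-likelihood of this observation
                #    under every environment in the class.
                for i, env in enumerate(env_class):
                    p = env[action]
                    log_post[i] += math.log(p if reward else 1.0 - p)
        return normalize(log_post)


    posterior = thompson_sampling(ENV_CLASS[0], ENV_CLASS, rounds=200, horizon=5)
    print(posterior)
    ```

    In this toy setting every arm's reward distribution differs across the three hypotheses, so the posterior concentrates on the true environment and the sampled policy becomes the optimal one. The abstract's recoverability assumption concerns the much harder case where early mistakes can be costly or irreversible.
    
    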

    Original language: English
    Title of host publication: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
    Editors: Carles Sierra
    Publisher: International Joint Conferences on Artificial Intelligence
    Pages: 4889-4893
    Number of pages: 5
    ISBN (Electronic): 9780999241103
    Publication status: Published, 2017
    Event: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia
    Duration: 19 Aug 2017 to 25 Aug 2017

    Publication series

    Name: IJCAI International Joint Conference on Artificial Intelligence
    Volume: 0
    ISSN (Print): 1045-0823

    Conference

    Conference: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
    Country/Territory: Australia
    City: Melbourne
    Period: 19/08/17 to 25/08/17
