Reinforcement learning via AIXI approximation

Joel Veness*, Kee Siong Ng, Marcus Hutter, David Silver

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    11 Citations (Scopus)

    Abstract

    This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.
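    The planning component described in the abstract pairs Monte Carlo Tree Search with a learned environment model. As a rough illustration only, the sketch below shows a generic UCT-style planning loop over a hypothetical `model` object with a `sample(action)` method returning a percept; the interface and names are assumptions for illustration, not the authors' implementation, which additionally conditions and reverts a Context Tree Weighting model during each simulation.

    import math
    import random

    class Node:
        """Search-tree node holding a visit count and a running value estimate."""
        def __init__(self):
            self.children = {}   # maps action -> child Node
            self.visits = 0
            self.value = 0.0

    def plan(model, actions, horizon, simulations, c=1.41):
        """Run MCTS simulations from a fresh root; return the best-valued action."""
        root = Node()
        for _ in range(simulations):
            simulate(model, root, actions, horizon, c)
        return max(root.children, key=lambda a: root.children[a].value)

    def simulate(model, node, actions, depth, c):
        """One simulation: expand or select down the tree, return the sampled return."""
        if depth == 0:
            return 0.0
        untried = [a for a in actions if a not in node.children]
        if untried:
            a = random.choice(untried)      # expansion: try each action once
            node.children[a] = Node()
        else:
            log_n = math.log(node.visits)   # UCB1 selection over children
            a = max(actions, key=lambda a: node.children[a].value
                    + c * math.sqrt(log_n / node.children[a].visits))
        child = node.children[a]
        _obs, reward = model.sample(a)      # hypothetical model interface
        ret = reward + simulate(model, child, actions, depth - 1, c)
        node.visits += 1
        child.visits += 1
        child.value += (ret - child.value) / child.visits   # incremental mean
        return ret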

    Original language: English
    Title of host publication: AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference
    Publisher: AI Access Foundation
    Pages: 605-611
    Number of pages: 7
    ISBN (Print): 9781577354642
    Publication status: Published - 2010
    Event: 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference, AAAI-10 / IAAI-10 - Atlanta, GA, United States
    Duration: 11 Jul 2010 – 15 Jul 2010

    Publication series

    Name: Proceedings of the National Conference on Artificial Intelligence
    Volume: 1

    Conference

    Conference: 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference, AAAI-10 / IAAI-10
    Country/Territory: United States
    City: Atlanta, GA
    Period: 11/07/10 – 15/07/10
