Universal reinforcement learning algorithms: Survey and experiments

John Aslanides, Jan Leike, Marcus Hutter

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    10 Citations (Scopus)

    Abstract

    Many state-of-the-art reinforcement learning (RL) algorithms assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

    Original language: English
    Title of host publication: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
    Editors: Carles Sierra
    Publisher: International Joint Conferences on Artificial Intelligence
    Pages: 1403-1410
    Number of pages: 8
    ISBN (Electronic): 9780999241103
    DOIs
    Publication status: Published - 2017
    Event: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017 - Melbourne, Australia
    Duration: 19 Aug 2017 - 25 Aug 2017

    Publication series

    Name: IJCAI International Joint Conference on Artificial Intelligence
    Volume: 0
    ISSN (Print): 1045-0823

    Conference

    Conference: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
    Country/Territory: Australia
    City: Melbourne
    Period: 19/08/17 - 25/08/17
