Compress and control

Joel Veness, Marc G. Bellemare, Marcus Hutter, Alvin Chua, Guillaume Desjardins

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    15 Citations (Scopus)

    Abstract

    This paper describes a new information-theoretic policy evaluation technique for reinforcement learning. This technique converts any compression or density model into a corresponding estimate of value. Under appropriate stationarity and ergodicity conditions, we show that the use of a sufficiently powerful model gives rise to a consistent value function estimator. We also study the behavior of this technique when applied to various Atari 2600 video games, where the use of suboptimal modeling techniques is unavoidable. We consider three fundamentally different models, all too limited to perfectly model the dynamics of the system. Remarkably, we find that our technique provides sufficiently accurate value estimates for effective on-policy control. We conclude with a suggestive study highlighting the potential of our technique to scale to large problems.

    Original language: English
    Title of host publication: Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015
    Publisher: AI Access Foundation
    Pages: 3016-3023
    Number of pages: 8
    ISBN (Electronic): 9781577357025
    Publication status: Published - 1 Jun 2015
    Event: 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015 - Austin, United States
    Duration: 25 Jan 2015 to 30 Jan 2015

    Publication series

    Name: Proceedings of the National Conference on Artificial Intelligence
    Volume: 4

    Conference

    Conference: 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015
    Country/Territory: United States
    City: Austin
    Period: 25/01/15 to 30/01/15
