Extreme state aggregation beyond MDPs

Marcus Hutter*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

    12 Citations (Scopus)

    Abstract

    We consider a Reinforcement Learning setup without any (esp. MDP) assumptions on the environment. State aggregation and more generally feature reinforcement learning is concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This implies an upper bound on the required state space size that holds uniformly for all RL problems. It may also explain why RL algorithms designed for MDPs sometimes perform well beyond MDPs.

    Original languageEnglish
    Title of host publicationAlgorithmic Learning Theory - 25th International Conference, ALT 2014, Proceedings
    EditorsPeter Auer, Alexander Clark, Thomas Zeugmann, Sandra Zilles
    PublisherSpringer Verlag
    Pages185-199
    Number of pages15
    ISBN (Electronic)9783319116617
    DOIs
    Publication statusPublished - 2014
    Event25th International Conference on Algorithmic Learning Theory, ALT 2014 - Bled, Slovenia
    Duration: 8 Oct 201410 Oct 2014

    Publication series

    NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume8776
    ISSN (Print)0302-9743
    ISSN (Electronic)1611-3349

    Conference

    Conference25th International Conference on Algorithmic Learning Theory, ALT 2014
    Country/TerritorySlovenia
    CityBled
    Period8/10/1410/10/14

    Fingerprint

    Dive into the research topics of 'Extreme state aggregation beyond MDPs'. Together they form a unique fingerprint.

    Cite this