Context tree maximizing reinforcement learning

Phuong Nguyen*, Peter Sunehag, Marcus Hutter

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    7 Citations (Scopus)

    Abstract

    Recent developments in reinforcement learning for non-Markovian problems have witnessed a surge of history-based methods, among which we are particularly interested in two frameworks, ΦMDP and MC-AIXI-CTW. ΦMDP attempts to reduce the general RL problem, in which the environment's states and dynamics are both unknown, to an MDP, while MC-AIXI-CTW incrementally learns a mixture of context trees as its environment model. The main idea of ΦMDP is to connect generic reinforcement learning with classical reinforcement learning. The first implementation of ΦMDP relies on a stochastic search procedure to find a tree that minimizes a certain cost function; with limited search time this does not guarantee finding the minimizing tree, or even a good one, and as a consequence the approach appears to have difficulties with large domains. MC-AIXI-CTW is attractive in that it can incrementally and analytically compute its internal model through interactions with the environment. Unfortunately, it is computationally demanding because it requires heavy planning simulations at every single time step. We devise a novel approach called CTMRL, which analytically and efficiently finds the cost-minimizing tree. Instead of the context-tree weighting method that MC-AIXI-CTW is based on, we use the closely related context-tree maximizing algorithm, which selects just one single tree. This approach falls under the ΦMDP framework, which allows the costly planning component of MC-AIXI-CTW to be replaced with simple Q-Learning. Our empirical investigation shows that CTMRL finds policies as good as MC-AIXI-CTW's on six domains, including a challenging Pacman domain, in an order of magnitude less time.
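
    As a concrete illustration of the planning-free component the abstract alludes to, the sketch below shows tabular Q-Learning running on the MDP induced by a fixed context tree. This is not the authors' code: the SuffixTreeNode class, the phi map, and the env interface (reset()/step()) are illustrative assumptions, and the CTM step that actually learns the tree is omitted.

    ```python
    import random
    from collections import defaultdict

    class SuffixTreeNode:
        """Hypothetical node of the suffix tree produced by the CTM step.
        Every node carries a state id so that histories whose recent
        symbols leave the tree early still map to a well-defined state."""
        def __init__(self, state_id):
            self.state_id = state_id
            self.children = {}  # symbol -> SuffixTreeNode

    def phi(tree, history):
        """The Phi map: walk the tree from the most recent symbol
        backwards and return the id of the deepest node reached."""
        node = tree
        for symbol in reversed(history):
            child = node.children.get(symbol)
            if child is None:
                break
            node = child
        return node.state_id

    def run_q_learning(env, tree, actions, episodes=100,
                       alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-Learning on the MDP induced by the Phi map.
        `env` is assumed (hypothetically) to expose reset() -> observation
        and step(action) -> (observation, reward, done)."""
        Q = defaultdict(float)  # (state, action) -> value, default 0
        for _ in range(episodes):
            history = [env.reset()]
            s = phi(tree, history)
            done = False
            while not done:
                # Epsilon-greedy action selection over the abstracted state.
                if random.random() < epsilon:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda b: Q[(s, b)])
                obs, reward, done = env.step(a)
                history += [a, obs]  # histories interleave actions and observations
                s_next = phi(tree, history)
                target = reward if done else reward + gamma * max(
                    Q[(s_next, b)] for b in actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s_next
        return Q
    ```

    Because the context tree collapses each unbounded history to a single leaf state, the ordinary tabular update applies unchanged; this is what lets a ΦMDP-style agent avoid MC-AIXI-CTW's per-step planning simulations.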

    Original language: English
    Title of host publication: AAAI-12 / IAAI-12 - Proceedings of the 26th AAAI Conference on Artificial Intelligence and the 24th Innovative Applications of Artificial Intelligence Conference
    Pages: 1075-1082
    Number of pages: 8
    Publication status: Published - 2012
    Event: 26th AAAI Conference on Artificial Intelligence and the 24th Innovative Applications of Artificial Intelligence Conference, AAAI-12 / IAAI-12 - Toronto, ON, Canada
    Duration: 22 Jul 2012 - 26 Jul 2012

    Publication series

    Name: Proceedings of the National Conference on Artificial Intelligence
    Volume: 2

    Conference

    Conference: 26th AAAI Conference on Artificial Intelligence and the 24th Innovative Applications of Artificial Intelligence Conference, AAAI-12 / IAAI-12
    Country/Territory: Canada
    City: Toronto, ON
    Period: 22/07/12 - 26/07/12
