Constructing states for reinforcement learning

M. M. Hassan Mahmud

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    16 Citations (Scopus)

    Abstract

    POMDPs are the models of choice for reinforcement learning (RL) tasks where the environment cannot be observed directly. In many applications we need to learn the POMDP structure and parameters from experience, and this is considered a difficult problem. In this paper we address this issue by modeling the hidden environment with a novel class of models that are less expressive, but easier to learn and plan with, than POMDPs. We call these models deterministic Markov models (DMMs), which are deterministic-probabilistic finite automata from learning theory, extended with actions to the sequential (rather than i.i.d.) setting. Conceptually, we extend the Utile Suffix Memory method of McCallum to handle long-term memory. We describe DMMs, give Bayesian algorithms for learning and planning with them, and present experimental results for some standard POMDP tasks and tasks that illustrate their efficacy.
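    As a rough illustration of the model class the abstract describes, the sketch below defines a minimal deterministic Markov model: transitions between hidden states are a deterministic function of the current state, action, and observation, while the observation (and reward) produced by an action is stochastic. All names and the interface are hypothetical, and the paper's Bayesian learning and planning algorithms are not reproduced here; this is only a plausible reading of the model structure.

```python
# Hypothetical sketch of a deterministic Markov model (DMM): deterministic
# state transitions in (state, action, observation), stochastic observations.
# Illustrative only; not the paper's actual implementation.
import random
from dataclasses import dataclass


@dataclass
class DMM:
    # delta[(state, action, observation)] -> next state (deterministic)
    delta: dict
    # obs_dist[(state, action)] -> {observation: probability} (stochastic)
    obs_dist: dict
    # reward[(state, action, observation)] -> scalar reward
    reward: dict
    state: str = "s0"

    def step(self, action, rng=random):
        """Sample an observation, collect reward, move deterministically."""
        dist = self.obs_dist[(self.state, action)]
        obs = rng.choices(list(dist), weights=list(dist.values()))[0]
        r = self.reward.get((self.state, action, obs), 0.0)
        self.state = self.delta[(self.state, action, obs)]  # deterministic update
        return obs, r


# Tiny two-state example: the observation hints at which hidden state we are in.
dmm = DMM(
    delta={("s0", "a", "left"): "s1", ("s0", "a", "right"): "s0",
           ("s1", "a", "left"): "s1", ("s1", "a", "right"): "s0"},
    obs_dist={("s0", "a"): {"left": 0.2, "right": 0.8},
              ("s1", "a"): {"left": 0.9, "right": 0.1}},
    reward={("s1", "a", "left"): 1.0},
)
for _ in range(3):
    print(dmm.step("a"))
```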

    Original language: English
    Title of host publication: ICML 2010 - Proceedings, 27th International Conference on Machine Learning
    Pages: 727-734
    Number of pages: 8
    Publication status: Published - 2010
    Event: 27th International Conference on Machine Learning, ICML 2010 - Haifa, Israel
    Duration: 21 Jun 2010 – 25 Jun 2010

    Publication series

    Name: ICML 2010 - Proceedings, 27th International Conference on Machine Learning

    Conference

    Conference: 27th International Conference on Machine Learning, ICML 2010
    Country/Territory: Israel
    City: Haifa
    Period: 21/06/10 – 25/06/10
