TY - GEN
T1 - Constructing states for reinforcement learning
AU - Mahmud, M. M. Hassan
PY - 2010
Y1 - 2010
N2 - POMDPs are the models of choice for reinforcement learning (RL) tasks where the environment cannot be observed directly. In many applications we need to learn the POMDP structure and parameters from experience, and this is considered to be a difficult problem. In this paper we address this issue by modeling the hidden environment with a novel class of models that are less expressive, but easier to learn and plan with, than POMDPs. We call these models deterministic Markov models (DMMs); they are deterministic-probabilistic finite automata from learning theory, extended with actions to the sequential (rather than i.i.d.) setting. Conceptually, we extend the Utile Suffix Memory method of McCallum to handle long-term memory. We describe DMMs, give Bayesian algorithms for learning and planning with them, and present experimental results on some standard POMDP tasks and on further tasks that illustrate the efficacy of the approach.
AB - POMDPs are the models of choice for reinforcement learning (RL) tasks where the environment cannot be observed directly. In many applications we need to learn the POMDP structure and parameters from experience, and this is considered to be a difficult problem. In this paper we address this issue by modeling the hidden environment with a novel class of models that are less expressive, but easier to learn and plan with, than POMDPs. We call these models deterministic Markov models (DMMs); they are deterministic-probabilistic finite automata from learning theory, extended with actions to the sequential (rather than i.i.d.) setting. Conceptually, we extend the Utile Suffix Memory method of McCallum to handle long-term memory. We describe DMMs, give Bayesian algorithms for learning and planning with them, and present experimental results on some standard POMDP tasks and on further tasks that illustrate the efficacy of the approach.
UR - http://www.scopus.com/inward/record.url?scp=77956529192&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781605589077
T3 - ICML 2010 - Proceedings, 27th International Conference on Machine Learning
SP - 727
EP - 734
BT - ICML 2010 - Proceedings, 27th International Conference on Machine Learning
T2 - 27th International Conference on Machine Learning, ICML 2010
Y2 - 21 June 2010 through 25 June 2010
ER -