TY - GEN
T1 - Compress and control
AU - Veness, Joel
AU - Bellemare, Marc G.
AU - Hutter, Marcus
AU - Chua, Alvin
AU - Desjardins, Guillaume
N1 - Publisher Copyright:
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2015/6/1
Y1 - 2015/6/1
N2 - This paper describes a new information-theoretic policy evaluation technique for reinforcement learning. This technique converts any compression or density model into a corresponding estimate of value. Under appropriate stationarity and ergodicity conditions, we show that the use of a sufficiently powerful model gives rise to a consistent value function estimator. We also study the behavior of this technique when applied to various Atari 2600 video games, where the use of suboptimal modeling techniques is unavoidable. We consider three fundamentally different models, all too limited to perfectly model the dynamics of the system. Remarkably, we find that our technique provides sufficiently accurate value estimates for effective on-policy control. We conclude with a suggestive study highlighting the potential of our technique to scale to large problems.
AB - This paper describes a new information-theoretic policy evaluation technique for reinforcement learning. This technique converts any compression or density model into a corresponding estimate of value. Under appropriate stationarity and ergodicity conditions, we show that the use of a sufficiently powerful model gives rise to a consistent value function estimator. We also study the behavior of this technique when applied to various Atari 2600 video games, where the use of suboptimal modeling techniques is unavoidable. We consider three fundamentally different models, all too limited to perfectly model the dynamics of the system. Remarkably, we find that our technique provides sufficiently accurate value estimates for effective on-policy control. We conclude with a suggestive study highlighting the potential of our technique to scale to large problems.
UR - http://www.scopus.com/inward/record.url?scp=84960105649&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84960105649
T3 - Proceedings of the National Conference on Artificial Intelligence
SP - 3016
EP - 3023
BT - Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015
PB - AI Access Foundation
T2 - 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015
Y2 - 25 January 2015 through 30 January 2015
ER -