TY - GEN
T1 - Universal reinforcement learning algorithms
T2 - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
AU - Aslanides, John
AU - Leike, Jan
AU - Hutter, Marcus
PY - 2017
Y1 - 2017
N2 - Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.
AB - Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.
UR - http://www.scopus.com/inward/record.url?scp=85031903578&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2017/194
DO - 10.24963/ijcai.2017/194
M3 - Conference contribution
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 1403
EP - 1410
BT - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
A2 - Sierra, Carles
PB - International Joint Conferences on Artificial Intelligence
Y2 - 19 August 2017 through 25 August 2017
ER -