TY - GEN
T1 - Classical planning algorithms on the Atari video games
AU - Lipovetzky, Nir
AU - Ramirez, Miquel
AU - Geffner, Hector
N1 - Publisher Copyright:
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2015
Y1 - 2015
N2 - The Atari 2600 games supported in the Arcade Learning Environment (Bellemare et al. 2013) all feature a known initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used for selecting actions for two reasons: first, no compact PDDL-model of the games is given, and more importantly, the action effects and goals are not known a priori. Moreover, in these games there is usually no set of goals to be achieved but rewards to be collected. These features do not preclude the use of classical algorithms like breadth-first search or Dijkstra's algorithm, but these methods are not effective over large state spaces. We thus turn to a different class of classical planning algorithms introduced recently that perform a structured exploration of the state space; namely, like breadth-first search and Dijkstra's algorithm they are "blind" and hence do not require prior knowledge of state transitions, costs (rewards) or goals, and yet, like heuristic search algorithms, they have been shown to be effective for solving problems over huge state spaces. The simplest such algorithm, called Iterated Width or IW, consists of a sequence of calls IW(1), IW(2), ..., IW(k), where IW(i) is a breadth-first search in which a state is pruned when it is not the first state in the search to make true some subset of i atoms. The empirical results over 54 games suggest that the performance of IW with the k parameter fixed to 1, i.e., IW(1), is at the level of the state of the art represented by UCT. A simple best-first variation of IW that combines exploration and exploitation proves to be very competitive as well.
AB - The Atari 2600 games supported in the Arcade Learning Environment (Bellemare et al. 2013) all feature a known initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used for selecting actions for two reasons: first, no compact PDDL-model of the games is given, and more importantly, the action effects and goals are not known a priori. Moreover, in these games there is usually no set of goals to be achieved but rewards to be collected. These features do not preclude the use of classical algorithms like breadth-first search or Dijkstra's algorithm, but these methods are not effective over large state spaces. We thus turn to a different class of classical planning algorithms introduced recently that perform a structured exploration of the state space; namely, like breadth-first search and Dijkstra's algorithm they are "blind" and hence do not require prior knowledge of state transitions, costs (rewards) or goals, and yet, like heuristic search algorithms, they have been shown to be effective for solving problems over huge state spaces. The simplest such algorithm, called Iterated Width or IW, consists of a sequence of calls IW(1), IW(2), ..., IW(k), where IW(i) is a breadth-first search in which a state is pruned when it is not the first state in the search to make true some subset of i atoms. The empirical results over 54 games suggest that the performance of IW with the k parameter fixed to 1, i.e., IW(1), is at the level of the state of the art represented by UCT. A simple best-first variation of IW that combines exploration and exploitation proves to be very competitive as well.
UR - http://www.scopus.com/inward/record.url?scp=84964618007&partnerID=8YFLogxK
M3 - Conference contribution
T3 - AAAI Workshop - Technical Report
SP - 21
EP - 28
BT - Learning for General Competency in Video Games - Papers Presented at the 29th AAAI Conference on Artificial Intelligence, Technical Report
PB - AI Access Foundation
T2 - 29th AAAI Conference on Artificial Intelligence, AAAI 2015
Y2 - 25 January 2015 through 30 January 2015
ER -