TY - GEN
T1 - Classical planning with simulators
T2 - 24th International Joint Conference on Artificial Intelligence, IJCAI 2015
AU - Lipovetzky, Nir
AU - Ramirez, Miquel
AU - Geffner, Hector
PY - 2015
Y1 - 2015
N2 - The Atari 2600 games supported in the Arcade Learning Environment [Bellemare et al., 2013] all feature a known initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used off-the-shelf as there is no compact PDDL model of the games, and action effects and goals are not known a priori. Indeed, there are no explicit goals, and the planner must select actions on-line while interacting with a simulator that returns successor states and rewards. None of this precludes the use of blind lookahead algorithms for action selection, such as breadth-first search or Dijkstra's algorithm, yet such methods are not effective over large state spaces. We thus turn to a recently introduced class of planning methods that have been shown to be effective for solving large planning problems but which do not require prior knowledge of state transitions, costs (rewards) or goals. The empirical results over 54 Atari games show that the simplest such algorithm performs at the level of UCT, the state-of-the-art planning method in this domain, and suggest the potential of width-based methods for planning with simulators when factored, compact action models are not available.
AB - The Atari 2600 games supported in the Arcade Learning Environment [Bellemare et al., 2013] all feature a known initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used off-the-shelf as there is no compact PDDL model of the games, and action effects and goals are not known a priori. Indeed, there are no explicit goals, and the planner must select actions on-line while interacting with a simulator that returns successor states and rewards. None of this precludes the use of blind lookahead algorithms for action selection, such as breadth-first search or Dijkstra's algorithm, yet such methods are not effective over large state spaces. We thus turn to a recently introduced class of planning methods that have been shown to be effective for solving large planning problems but which do not require prior knowledge of state transitions, costs (rewards) or goals. The empirical results over 54 Atari games show that the simplest such algorithm performs at the level of UCT, the state-of-the-art planning method in this domain, and suggest the potential of width-based methods for planning with simulators when factored, compact action models are not available.
UR - http://www.scopus.com/inward/record.url?scp=84949807166&partnerID=8YFLogxK
M3 - Conference contribution
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 1610
EP - 1616
BT - IJCAI 2015 - Proceedings of the 24th International Joint Conference on Artificial Intelligence
A2 - Wooldridge, Michael
A2 - Yang, Qiang
PB - International Joint Conferences on Artificial Intelligence
Y2 - 25 July 2015 through 31 July 2015
ER -