TY - GEN
T1 - Reward Potentials for Planning with Learned Neural Network Transition Models
AU - Say, Buser
AU - Sanner, Scott
AU - Thiébaux, Sylvie
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
AB - Optimal planning with respect to learned neural network (NN) models in continuous action and state spaces using mixed-integer linear programming (MILP) is a challenging task for branch-and-bound solvers due to the poor linear relaxation of the underlying MILP model. For a given set of features, potential heuristics provide an efficient framework for computing bounds on cost (reward) functions. In this paper, we model the problem of finding optimal potential bounds for learned NN models as a bilevel program, and solve it using a novel finite-time constraint generation algorithm. We then strengthen the linear relaxation of the underlying MILP model by introducing constraints to bound the reward function based on the precomputed reward potentials. Experimentally, we show that our algorithm efficiently computes reward potentials for learned NN models, and that the overhead of computing reward potentials is justified by the overall strengthening of the underlying MILP model for the task of planning over long horizons.
KW - Constraint generation
KW - Neural networks
KW - Planning
KW - Potential heuristics
UR - http://www.scopus.com/inward/record.url?scp=85075746345&partnerID=8YFLogxK
DO - 10.1007/978-3-030-30048-7_39
M3 - Conference contribution
AN - SCOPUS:85075746345
SN - 9783030300470
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 674
EP - 689
BT - Principles and Practice of Constraint Programming - 25th International Conference, CP 2019, Proceedings
A2 - Schiex, Thomas
A2 - de Givry, Simon
PB - Springer
T2 - 25th International Conference on Principles and Practice of Constraint Programming, CP 2019
Y2 - 30 September 2019 through 4 October 2019
ER -