TY - GEN
T1 - Conditions on features for temporal difference-like methods to converge
AU - Hutter, Marcus
AU - Yang-Zhao, Samuel
AU - Majeed, Sultan Javed
N1 - Publisher Copyright: © 2019 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2019
Y1 - 2019
AB - The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution. In this paper, we provide a complete characterization of non-uniqueness issues for a large class of reinforcement learning algorithms, simultaneously unifying many counter-examples to convergence in a theoretical framework. We achieve this by proving a new condition on features that can determine whether the convergence assumptions are valid or non-uniqueness holds. We consider a general class of RL methods, which we call natural algorithms, whose solutions are characterized as the fixed point of a projected Bellman equation. Our main result proves that natural algorithms converge to the correct solution if and only if all the value functions in the approximation space satisfy a certain shape. This implies that natural algorithms are, in general, inherently prone to converge to the wrong solution for most feature choices even if the value function can be represented exactly. Given our results, we show that state aggregation-based features are a safe choice for natural algorithms and also provide a condition for finding convergent algorithms under other feature constructions.
UR - http://www.scopus.com/inward/record.url?scp=85074919262&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2019/357
DO - 10.24963/ijcai.2019/357
M3 - Conference contribution
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 2570
EP - 2577
BT - Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
A2 - Kraus, Sarit
PB - International Joint Conferences on Artificial Intelligence
T2 - 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Y2 - 10 August 2019 through 16 August 2019
ER -