Conditions on features for temporal difference-like methods to converge

Marcus Hutter, Samuel Yang-Zhao, Sultan Javed Majeed

    Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

    1 Citation (Scopus)

    Abstract

    The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution. In this paper, we provide a complete characterization of non-uniqueness issues for a large class of reinforcement learning algorithms, simultaneously unifying many counter-examples to convergence in a theoretical framework. We achieve this by proving a new condition on features that can determine whether the convergence assumptions are valid or non-uniqueness holds. We consider a general class of RL methods, which we call natural algorithms, whose solutions are characterized as the fixed point of a projected Bellman equation. Our main result proves that natural algorithms converge to the correct solution if and only if all the value functions in the approximation space satisfy a certain shape. This implies that natural algorithms are, in general, inherently prone to converge to the wrong solution for most feature choices even if the value function can be represented exactly. Given our results, we show that state aggregation-based features are a safe choice for natural algorithms and also provide a condition for finding convergent algorithms under other feature constructions.

    Original languageEnglish
    Title of host publicationProceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
    EditorsSarit Kraus
    PublisherInternational Joint Conferences on Artificial Intelligence
    Pages2570-2577
    Number of pages8
    ISBN (Electronic)9780999241141
    DOIs
    Publication statusPublished - 2019
    Event28th International Joint Conference on Artificial Intelligence, IJCAI 2019 - Macao, China
    Duration: 10 Aug 201916 Aug 2019

    Publication series

    NameIJCAI International Joint Conference on Artificial Intelligence
    Volume2019-August
    ISSN (Print)1045-0823

    Conference

    Conference28th International Joint Conference on Artificial Intelligence, IJCAI 2019
    Country/TerritoryChina
    CityMacao
    Period10/08/1916/08/19

    Fingerprint

    Dive into the research topics of 'Conditions on features for temporal difference-like methods to converge'. Together they form a unique fingerprint.

    Cite this