Learning to play chess using temporal differences

Jonathan Baxter, Andrew Tridgell, Lex Weaver

    Research output: Contribution to journal › Article › peer-review

    93 Citations (Scopus)

    Abstract

    In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program 'KnightCap' used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a rating of 1650 to a rating of 2150 in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play, rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
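
The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea: the usual TD(λ) weight update, applied to the evaluations of the leaf positions that a game-tree search selects. It assumes a linear evaluation function, and every name in it (`tdleaf_update`, `leaf_features`, `outcome`, `alpha`, `lam`) is hypothetical rather than taken from KnightCap. The search that produces the leaf positions is assumed to happen elsewhere.

```python
from typing import List, Sequence


def tdleaf_update(
    weights: Sequence[float],
    leaf_features: Sequence[Sequence[float]],
    outcome: float,
    alpha: float = 0.01,
    lam: float = 0.7,
) -> List[float]:
    """Return updated weights after one game (illustrative sketch only).

    leaf_features[t] is the feature vector of the leaf reached by the
    search from the t-th position of the game (assumed to be given).
    outcome is the final result, expressed in the same units as the
    evaluation. A linear evaluation J(x, w) = w . phi(x) is assumed,
    so the gradient of J with respect to w is simply phi(x).
    """

    def evaluate(features: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(weights, features))

    # Evaluations of each leaf, followed by the true game outcome.
    values = [evaluate(f) for f in leaf_features] + [outcome]

    # Temporal differences d_t = J(x_{t+1}) - J(x_t).
    diffs = [values[t + 1] - values[t] for t in range(len(leaf_features))]

    new_weights = list(weights)
    for t, features in enumerate(leaf_features):
        # Lambda-discounted sum of all temporal differences from step t on.
        discounted = sum(
            lam ** (j - t) * diffs[j] for j in range(t, len(diffs))
        )
        for i, f in enumerate(features):
            new_weights[i] += alpha * discounted * f
    return new_weights


if __name__ == "__main__":
    updated = tdleaf_update(
        weights=[0.0, 0.0],
        leaf_features=[[1.0, 0.0], [0.0, 1.0]],
        outcome=1.0,
        alpha=0.5,
        lam=0.5,
    )
    print(updated)
```

With `lam` close to 1, every position in the game is credited with the final outcome; with `lam` equal to 0, each position is only moved toward the evaluation of its immediate successor.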

    Original language: English
    Pages (from-to): 243-263
    Number of pages: 21
    Journal: Machine Learning
    Volume: 40
    Issue number: 3
    Publication status: Published - Sept 2000
