General discounting versus average reward

Marcus Hutter*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Citations (Scopus)

Abstract

Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbitrary (non-geometric) discount sequences and arbitrary reward sequences (non-MDP environments). We show that asymptotically U for m → ∞ and V for k → ∞ are equal, provided both limits exist. Further, if the effective horizon grows linearly with k or faster, then the existence of the limit of U implies that the limit of V exists. Conversely, if the effective horizon grows linearly with k or slower, then existence of the limit of V implies that the limit of U exists.
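For concreteness, the two values compared in the abstract can be written out as follows. This is a sketch of the standard setup rather than a quotation from the paper: the notation U_{1m}, V_{kγ}, Γ_k, the bounded rewards r_i ∈ [0,1], and the normalisation of V by the remaining discount mass are assumptions.

\[
  U_{1m} \;:=\; \frac{1}{m}\sum_{i=1}^{m} r_i,
  \qquad
  V_{k\gamma} \;:=\; \frac{1}{\Gamma_k}\sum_{i=k}^{\infty} \gamma_i r_i,
  \qquad
  \Gamma_k \;:=\; \sum_{i=k}^{\infty} \gamma_i \;<\; \infty .
\]

In this notation, the main asymptotic claim reads \(\lim_{m\to\infty} U_{1m} = \lim_{k\to\infty} V_{k\gamma}\) whenever both limits exist.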

Original language: English
Title of host publication: Algorithmic Learning Theory - 17th International Conference, ALT 2006, Proceedings
Publisher: Springer Verlag
Pages: 244-258
Number of pages: 15
ISBN (Print): 3540466495, 9783540466499
Publication status: Published - 2006
Externally published: Yes
Event: 17th International Conference on Algorithmic Learning Theory, ALT 2006 - Barcelona, Spain
Duration: 7 Oct 2006 - 10 Oct 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4264 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Algorithmic Learning Theory, ALT 2006
Country/Territory: Spain
City: Barcelona
Period: 7/10/06 - 10/10/06
