TY - JOUR
T1 - Convergence and loss bounds for Bayesian sequence prediction
AU - Hutter, Marcus
PY - 2003/8
Y1 - 2003/8
N2 - The probability of observing xt at time t, given past observations x1 ⋯ xt-1, can be computed if the true generating distribution μ of the sequences x1x2x3 ⋯ is known. If μ is unknown but known to belong to a class M, one can base one's prediction on the Bayes mix ξ, defined as a weighted sum of the distributions ν ∈ M. Various convergence results of the mixture posterior ξt to the true posterior μt are presented. In particular, a new (elementary) derivation of the convergence ξt/μt → 1 is provided, which additionally gives the rate of convergence. A general sequence predictor is allowed to choose an action yt based on x1 ⋯ xt-1 and receives loss ℓxtyt if xt is the next symbol of the sequence. No assumptions are made on the structure of ℓ (apart from being bounded) and M. The Bayes-optimal prediction scheme Λξ based on the mixture ξ and the Bayes-optimal informed prediction scheme Λμ are defined, and the total loss Lξ of Λξ is bounded in terms of the total loss Lμ of Λμ. It is shown that Lξ is bounded for bounded Lμ, and that Lξ/Lμ → 1 for Lμ → ∞. Convergence of the instantaneous losses is also proven.
AB - The probability of observing xt at time t, given past observations x1 ⋯ xt-1, can be computed if the true generating distribution μ of the sequences x1x2x3 ⋯ is known. If μ is unknown but known to belong to a class M, one can base one's prediction on the Bayes mix ξ, defined as a weighted sum of the distributions ν ∈ M. Various convergence results of the mixture posterior ξt to the true posterior μt are presented. In particular, a new (elementary) derivation of the convergence ξt/μt → 1 is provided, which additionally gives the rate of convergence. A general sequence predictor is allowed to choose an action yt based on x1 ⋯ xt-1 and receives loss ℓxtyt if xt is the next symbol of the sequence. No assumptions are made on the structure of ℓ (apart from being bounded) and M. The Bayes-optimal prediction scheme Λξ based on the mixture ξ and the Bayes-optimal informed prediction scheme Λμ are defined, and the total loss Lξ of Λξ is bounded in terms of the total loss Lμ of Λμ. It is shown that Lξ is bounded for bounded Lμ, and that Lξ/Lμ → 1 for Lμ → ∞. Convergence of the instantaneous losses is also proven.
KW - Bayesian sequence prediction
KW - Convergence
KW - General loss function and bounds
KW - Mixture distributions
UR - http://www.scopus.com/inward/record.url?scp=0043268070&partnerID=8YFLogxK
U2 - 10.1109/TIT.2003.814488
DO - 10.1109/TIT.2003.814488
M3 - Letter
SN - 0018-9448
VL - 49
SP - 2061
EP - 2067
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 8
ER -