TY - JOUR

T1 - On the optimality of the aggregate with exponential weights for low temperatures

AU - Lecué, Guillaume

AU - Mendelson, Shahar

PY - 2013/5

Y1 - 2013/5

N2 - Given a finite class of functions F, the problem of aggregation is to construct a procedure with a risk as close as possible to the risk of the best element in the class. A classical procedure (PAC-Bayesian statistical learning theory (2004) Paris 6, Statistical Learning Theory and Stochastic Optimization (2001) Springer, Ann. Statist. 28 (2000) 75-87) is the aggregate with exponential weights (AEW), defined by f̃ AEW = σf σFθ̂(f )f, where θ̂ (f ) = exp(-(n/T )Rn(f ))/σgσF exp(-(n/T )Rn(g)), whereT <0 is called the temperature parameter and Rn(·) is an empirical risk. In this article, we study the optimality of the AEW in the regression model with random design and in the low-temperature regime. We prove three properties of AEW. First, we show that AEW is a suboptimal aggregation procedure in expectation with respect to the quadratic risk when T ≤ c1, where c1 is an absolute positive constant (the low-temperature regime), and that it is suboptimal in probability even for high temperatures. Second, we show that as the cardinality of the dictionary grows, the behavior of AEW might deteriorate, namely, that in the low-temperature regime it might concentrate with high probability around elements in the dictionary with risk greater than the risk of the best function in the dictionary by at least an order of 1/√n. Third, we prove that if a geometric condition on the dictionary (the so-called "Bernstein condition") is assumed, then AEW is indeed optimal both in high probability and in expectation in the low-temperature regime. Moreover, under that assumption, the complexity term is essentially the logarithm of the cardinality of the set of "almost minimizers" rather than the logarithm of the cardinality of the entire dictionary. This result holds for small values of the temperature parameter, thus complementing an analogous result for high temperatures.

AB - Given a finite class of functions F, the problem of aggregation is to construct a procedure with a risk as close as possible to the risk of the best element in the class. A classical procedure (PAC-Bayesian statistical learning theory (2004) Paris 6, Statistical Learning Theory and Stochastic Optimization (2001) Springer, Ann. Statist. 28 (2000) 75-87) is the aggregate with exponential weights (AEW), defined by f̃ AEW = σf σFθ̂(f )f, where θ̂ (f ) = exp(-(n/T )Rn(f ))/σgσF exp(-(n/T )Rn(g)), whereT <0 is called the temperature parameter and Rn(·) is an empirical risk. In this article, we study the optimality of the AEW in the regression model with random design and in the low-temperature regime. We prove three properties of AEW. First, we show that AEW is a suboptimal aggregation procedure in expectation with respect to the quadratic risk when T ≤ c1, where c1 is an absolute positive constant (the low-temperature regime), and that it is suboptimal in probability even for high temperatures. Second, we show that as the cardinality of the dictionary grows, the behavior of AEW might deteriorate, namely, that in the low-temperature regime it might concentrate with high probability around elements in the dictionary with risk greater than the risk of the best function in the dictionary by at least an order of 1/√n. Third, we prove that if a geometric condition on the dictionary (the so-called "Bernstein condition") is assumed, then AEW is indeed optimal both in high probability and in expectation in the low-temperature regime. Moreover, under that assumption, the complexity term is essentially the logarithm of the cardinality of the set of "almost minimizers" rather than the logarithm of the cardinality of the entire dictionary. This result holds for small values of the temperature parameter, thus complementing an analogous result for high temperatures.

KW - Aggregation

KW - Empirical process

KW - Gaussian approximation

KW - Gibbs estimators

UR - http://www.scopus.com/inward/record.url?scp=84884125671&partnerID=8YFLogxK

U2 - 10.3150/11-BEJ408

DO - 10.3150/11-BEJ408

M3 - Article

SN - 1350-7265

VL - 19

SP - 646

EP - 675

JO - Bernoulli

JF - Bernoulli

IS - 2

ER -