TY - JOUR
T1 - On the optimality of the aggregate with exponential weights for low temperatures
AU - Lecué, Guillaume
AU - Mendelson, Shahar
PY - 2013/5
Y1 - 2013/5
N2 - Given a finite class of functions F, the problem of aggregation is to construct a procedure with a risk as close as possible to the risk of the best element in the class. A classical procedure (PAC-Bayesian statistical learning theory (2004) Paris 6, Statistical Learning Theory and Stochastic Optimization (2001) Springer, Ann. Statist. 28 (2000) 75-87) is the aggregate with exponential weights (AEW), defined by f̃ AEW = σf σFθ̂(f )f, where θ̂ (f ) = exp(-(n/T )Rn(f ))/σgσF exp(-(n/T )Rn(g)), whereT <0 is called the temperature parameter and Rn(·) is an empirical risk. In this article, we study the optimality of the AEW in the regression model with random design and in the low-temperature regime. We prove three properties of AEW. First, we show that AEW is a suboptimal aggregation procedure in expectation with respect to the quadratic risk when T ≤ c1, where c1 is an absolute positive constant (the low-temperature regime), and that it is suboptimal in probability even for high temperatures. Second, we show that as the cardinality of the dictionary grows, the behavior of AEW might deteriorate, namely, that in the low-temperature regime it might concentrate with high probability around elements in the dictionary with risk greater than the risk of the best function in the dictionary by at least an order of 1/√n. Third, we prove that if a geometric condition on the dictionary (the so-called "Bernstein condition") is assumed, then AEW is indeed optimal both in high probability and in expectation in the low-temperature regime. Moreover, under that assumption, the complexity term is essentially the logarithm of the cardinality of the set of "almost minimizers" rather than the logarithm of the cardinality of the entire dictionary. This result holds for small values of the temperature parameter, thus complementing an analogous result for high temperatures.
AB - Given a finite class of functions F, the problem of aggregation is to construct a procedure with a risk as close as possible to the risk of the best element in the class. A classical procedure (PAC-Bayesian statistical learning theory (2004) Paris 6, Statistical Learning Theory and Stochastic Optimization (2001) Springer, Ann. Statist. 28 (2000) 75-87) is the aggregate with exponential weights (AEW), defined by f̃ AEW = σf σFθ̂(f )f, where θ̂ (f ) = exp(-(n/T )Rn(f ))/σgσF exp(-(n/T )Rn(g)), whereT <0 is called the temperature parameter and Rn(·) is an empirical risk. In this article, we study the optimality of the AEW in the regression model with random design and in the low-temperature regime. We prove three properties of AEW. First, we show that AEW is a suboptimal aggregation procedure in expectation with respect to the quadratic risk when T ≤ c1, where c1 is an absolute positive constant (the low-temperature regime), and that it is suboptimal in probability even for high temperatures. Second, we show that as the cardinality of the dictionary grows, the behavior of AEW might deteriorate, namely, that in the low-temperature regime it might concentrate with high probability around elements in the dictionary with risk greater than the risk of the best function in the dictionary by at least an order of 1/√n. Third, we prove that if a geometric condition on the dictionary (the so-called "Bernstein condition") is assumed, then AEW is indeed optimal both in high probability and in expectation in the low-temperature regime. Moreover, under that assumption, the complexity term is essentially the logarithm of the cardinality of the set of "almost minimizers" rather than the logarithm of the cardinality of the entire dictionary. This result holds for small values of the temperature parameter, thus complementing an analogous result for high temperatures.
KW - Aggregation
KW - Empirical process
KW - Gaussian approximation
KW - Gibbs estimators
UR - http://www.scopus.com/inward/record.url?scp=84884125671&partnerID=8YFLogxK
U2 - 10.3150/11-BEJ408
DO - 10.3150/11-BEJ408
M3 - Article
SN - 1350-7265
VL - 19
SP - 646
EP - 675
JO - Bernoulli
JF - Bernoulli
IS - 2
ER -