TY - JOUR
T1 - Effect of dependence on stochastic measures of accuracy of density estimators
AU - Claeskens, Gerda
AU - Hall, Peter
PY - 2002/4
Y1 - 2002/4
N2 - In kernel density estimation, those data values that make a nondegenerate contribution to the estimator (computed at a given point) tend to be spaced well apart. This property has the effect of suppressing many of the conventional consequences of long-range dependence, for example, slower rates of convergence, which might otherwise be revealed by a traditional loss- or risk-based assessment of performance. From that viewpoint, dependence has to be very long-range indeed before a density estimator experiences any first-order effects. However, an analysis in terms of the convergence rate for a particular realization, rather than the rate averaged over all realizations, reveals a very different picture. We show that from that viewpoint, and in the context of functions of Gaussian processes, effects on rates of convergence can become apparent as soon as the boundary between short- and long-range dependence is crossed. For example, the distance between ISE- and MISE-optimal bandwidths is generally of larger order for long-range dependent data. We shed new light on cross-validation, too. In particular we show that the variance of the cross-validation bandwidth is generally larger for long-range dependent data, and that the first-order properties of this bandwidth do not depend on how many data are left out when constructing the cross-validation criterion. Moreover, for long-range dependent data the cross-validation bandwidth is usually perfectly negatively correlated, in the limit, with the optimal stochastic bandwidth.
AB - In kernel density estimation, those data values that make a nondegenerate contribution to the estimator (computed at a given point) tend to be spaced well apart. This property has the effect of suppressing many of the conventional consequences of long-range dependence, for example, slower rates of convergence, which might otherwise be revealed by a traditional loss- or risk-based assessment of performance. From that viewpoint, dependence has to be very long-range indeed before a density estimator experiences any first-order effects. However, an analysis in terms of the convergence rate for a particular realization, rather than the rate averaged over all realizations, reveals a very different picture. We show that from that viewpoint, and in the context of functions of Gaussian processes, effects on rates of convergence can become apparent as soon as the boundary between short- and long-range dependence is crossed. For example, the distance between ISE- and MISE-optimal bandwidths is generally of larger order for long-range dependent data. We shed new light on cross-validation, too. In particular we show that the variance of the cross-validation bandwidth is generally larger for long-range dependent data, and that the first-order properties of this bandwidth do not depend on how many data are left out when constructing the cross-validation criterion. Moreover, for long-range dependent data the cross-validation bandwidth is usually perfectly negatively correlated, in the limit, with the optimal stochastic bandwidth.
KW - Bandwidth
KW - Cross-validation
KW - Gaussian process
KW - Integrated squared error
KW - Kernel methods
KW - Long-range dependence
KW - Nonparametric density estimator
KW - Risk-based analysis
UR - http://www.scopus.com/inward/record.url?scp=0036281479&partnerID=8YFLogxK
U2 - 10.1214/aos/1021379860
DO - 10.1214/aos/1021379860
M3 - Article
SN - 0090-5364
VL - 30
SP - 431
EP - 454
JO - Annals of Statistics
JF - Annals of Statistics
IS - 2
ER -