TY - JOUR
T1 - Biased bootstrap methods for reducing the effects of contamination
AU - Hall, Peter
AU - Presnell, Brett
PY - 1999
Y1 - 1999
N2 - Contamination of a sampled distribution, for example by a heavy-tailed distribution, can degrade the performance of a statistical estimator. We suggest a general approach to alleviating this problem, using a version of the weighted bootstrap. The idea is to 'tilt' away from the contaminated distribution by a given (but arbitrary) amount, in a direction that minimizes a measure of the new distribution's dispersion. This theoretical proposal has a simple empirical version, which results in each data value being assigned a weight according to an assessment of its influence on dispersion. Importantly, distance can be measured directly in terms of the likely level of contamination, without reference to an empirical measure of scale. This makes the procedure particularly attractive for use in multivariate problems. It has several forms, depending on the definitions taken for dispersion and for distance between distributions. Examples of dispersion measures include variance and generalizations based on high order moments. Practicable measures of the distance between distributions may be based on power divergence, which includes Hellinger and Kullback-Leibler distances. The resulting location estimator has a smooth, redescending influence curve and appears to avoid computational difficulties that are typically associated with redescending estimators. Its breakdown point can be located at any desired value ε ∈ (0, 1/2) simply by 'trimming' to a known distance (depending only on ε and the choice of distance measure) from the empirical distribution. The estimator has an affine equivariant multivariate form. Further, the general method is applicable to a range of statistical problems, including regression.
AB - Contamination of a sampled distribution, for example by a heavy-tailed distribution, can degrade the performance of a statistical estimator. We suggest a general approach to alleviating this problem, using a version of the weighted bootstrap. The idea is to 'tilt' away from the contaminated distribution by a given (but arbitrary) amount, in a direction that minimizes a measure of the new distribution's dispersion. This theoretical proposal has a simple empirical version, which results in each data value being assigned a weight according to an assessment of its influence on dispersion. Importantly, distance can be measured directly in terms of the likely level of contamination, without reference to an empirical measure of scale. This makes the procedure particularly attractive for use in multivariate problems. It has several forms, depending on the definitions taken for dispersion and for distance between distributions. Examples of dispersion measures include variance and generalizations based on high order moments. Practicable measures of the distance between distributions may be based on power divergence, which includes Hellinger and Kullback-Leibler distances. The resulting location estimator has a smooth, redescending influence curve and appears to avoid computational difficulties that are typically associated with redescending estimators. Its breakdown point can be located at any desired value ε ∈ (0, 1/2) simply by 'trimming' to a known distance (depending only on ε and the choice of distance measure) from the empirical distribution. The estimator has an affine equivariant multivariate form. Further, the general method is applicable to a range of statistical problems, including regression.
KW - Biased bootstrap
KW - Empirical likelihood
KW - Influence
KW - Inlier
KW - Local linear smoothing
KW - Multivariate analysis
KW - Nonparametric curve estimation
KW - Outlier
KW - Regression
KW - Robust statistical methods
KW - Trimming
KW - Weighted bootstrap
UR - http://www.scopus.com/inward/record.url?scp=0000238125&partnerID=8YFLogxK
U2 - 10.1111/1467-9868.00199
DO - 10.1111/1467-9868.00199
M3 - Article
SN - 1369-7412
VL - 61
SP - 661
EP - 680
JO - Journal of the Royal Statistical Society. Series B: Statistical Methodology
JF - Journal of the Royal Statistical Society. Series B: Statistical Methodology
IS - 3
ER -