TY - JOUR

T1 - Biased bootstrap methods for reducing the effects of contamination

AU - Hall, Peter

AU - Presnell, Brett

PY - 1999

Y1 - 1999

N2 - Contamination of a sampled distribution, for example by a heavy-tailed distribution, can degrade the performance of a statistical estimator. We suggest a general approach to alleviating this problem, using a version of the weighted bootstrap. The idea is to 'tilt' away from the contaminated distribution by a given (but arbitrary) amount, in a direction that minimizes a measure of the new distribution's dispersion. This theoretical proposal has a simple empirical version, which results in each data value being assigned a weight according to an assessment of its influence on dispersion. Importantly, distance can be measured directly in terms of the likely level of contamination, without reference to an empirical measure of scale. This makes the procedure particularly attractive for use in multivariate problems. It has several forms, depending on the definitions taken for dispersion and for distance between distributions. Examples of dispersion measures include variance and generalizations based on high order moments. Practicable measures of the distance between distributions may be based on power divergence, which includes Hellinger and Kullback-Leibler distances. The resulting location estimator has a smooth, redescending influence curve and appears to avoid computational difficulties that are typically associated with redescending estimators. Its breakdown point can be located at any desired value ε ∈ (0, 1/2) simply by 'trimming' to a known distance (depending only on ε and the choice of distance measure) from the empirical distribution. The estimator has an affine equivariant multivariate form. Further, the general method is applicable to a range of statistical problems, including regression.

KW - Biased bootstrap

KW - Empirical likelihood

KW - Influence

KW - Inlier

KW - Local linear smoothing

KW - Multivariate analysis

KW - Nonparametric curve estimation

KW - Outlier

KW - Regression

KW - Robust statistical methods

KW - Trimming

KW - Weighted bootstrap

UR - http://www.scopus.com/inward/record.url?scp=0000238125&partnerID=8YFLogxK

U2 - 10.1111/1467-9868.00199

DO - 10.1111/1467-9868.00199

M3 - Article

SN - 1369-7412

VL - 61

SP - 661

EP - 680

JO - Journal of the Royal Statistical Society. Series B: Statistical Methodology

JF - Journal of the Royal Statistical Society. Series B: Statistical Methodology

IS - 3

ER -