TY - JOUR
T1 - Order-Preserving Nonparametric Regression, with Applications to Conditional Distribution and Quantile Function Estimation
AU - Hall, Peter
AU - Müller, Hans Georg
PY - 2003/9
Y1 - 2003/9
N2 - In some regression problems we observe a "response" Yti to level t of a "treatment" applied to an individual with level Xi of a given characteristic, where it has been established that response is monotone increasing in the level of the treatment. A related problem arises when estimating conditional distributions, where the raw data are typically independent and identically distributed pairs (Xi, Zi), and Yti denotes the proportion of Zi's that do not exceed t. We expect the regression means gt(x) = E(Yti | Xi = x) to enjoy the same order relation as the responses, that is, gt ≤ gs whenever s ≤ t. This requirement is necessary to obtain bona fide conditional distribution functions, for example. If we estimate gt by passing a linear smoother through each dataset Χt = {(Xi, Yti) : 1 ≤ i ≤ n}, then the order-preserving property is guaranteed if and only if the smoother has nonnegative weights. However, in such cases the estimators generally have high levels of boundary bias. On the other hand, the order-preserving property usually fails for linear estimators with low boundary bias, such as local linear estimators, or kernel estimators employing boundary kernels. This failure is generally most serious at boundaries of the distribution of the explanatory variables, and ironically it is often in just those places that estimation is of greatest interest, because responses there imply constraints on the larger population. In this article we suggest nonlinear, order-invariant estimators for nonparametric regression, and discuss their properties. The resulting estimators are applied to the estimation of conditional distribution functions at endpoints and also changepoints. The availability of bona fide distribution function estimators at endpoints also enables the computation of changepoint diagnostics that are based on differences in a suitable norm between two estimated conditional distribution functions, obtained from data that fall into one-sided bins.
AB - In some regression problems we observe a "response" Yti to level t of a "treatment" applied to an individual with level Xi of a given characteristic, where it has been established that response is monotone increasing in the level of the treatment. A related problem arises when estimating conditional distributions, where the raw data are typically independent and identically distributed pairs (Xi, Zi), and Yti denotes the proportion of Zi's that do not exceed t. We expect the regression means gt(x) = E(Yti | Xi = x) to enjoy the same order relation as the responses, that is, gt ≤ gs whenever s ≤ t. This requirement is necessary to obtain bona fide conditional distribution functions, for example. If we estimate gt by passing a linear smoother through each dataset Χt = {(Xi, Yti) : 1 ≤ i ≤ n}, then the order-preserving property is guaranteed if and only if the smoother has nonnegative weights. However, in such cases the estimators generally have high levels of boundary bias. On the other hand, the order-preserving property usually fails for linear estimators with low boundary bias, such as local linear estimators, or kernel estimators employing boundary kernels. This failure is generally most serious at boundaries of the distribution of the explanatory variables, and ironically it is often in just those places that estimation is of greatest interest, because responses there imply constraints on the larger population. In this article we suggest nonlinear, order-invariant estimators for nonparametric regression, and discuss their properties. The resulting estimators are applied to the estimation of conditional distribution functions at endpoints and also changepoints. The availability of bona fide distribution function estimators at endpoints also enables the computation of changepoint diagnostics that are based on differences in a suitable norm between two estimated conditional distribution functions, obtained from data that fall into one-sided bins.
KW - Bias reduction
KW - Boundary effect
KW - Changepoint
KW - Endpoint
KW - Linear methods
KW - Local linear estimator
KW - Monotonicity
KW - Nadaraya-Watson estimator
KW - Prediction
UR - http://www.scopus.com/inward/record.url?scp=10744219922&partnerID=8YFLogxK
U2 - 10.1198/016214503000000512
DO - 10.1198/016214503000000512
M3 - Article
SN - 0162-1459
VL - 98
SP - 598
EP - 608
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 463
ER -