TY - JOUR

T1 - Cross-validation and the estimation of conditional probability densities

AU - Hall, Peter

AU - Racine, Jeff

AU - Li, Qi

PY - 2004/12

Y1 - 2004/12

N2 - Many practical problems, especially some connected with forecasting, require nonparametric estimation of conditional densities from mixed data. For example, given an explanatory data vector X for a prospective customer, with components that could include the customer's salary, occupation, age, sex, marital status, and address, a company might wish to estimate the density of the expenditure, Y, that could be made by that person, basing the inference on observations of (X,Y) for previous clients. Choosing appropriate smoothing parameters for this problem can be tricky, not least because plug-in rules take a particularly complex form in the case of mixed data. An obvious difficulty is that there exists no general formula for the optimal smoothing parameters. More insidiously, and more seriously, it can be difficult to determine which components of X are relevant to the problem of conditional inference. For example, if the jth component of X is independent of Y, then that component is irrelevant to estimating the density of Y given X, and ideally should be dropped before conducting inference. In this article we show that cross-validation overcomes these difficulties. It automatically determines which components are relevant and which are not, through assigning large smoothing parameters to the latter and consequently shrinking them toward the uniform distribution on the respective marginals. This effectively removes irrelevant components from contention, by suppressing their contribution to estimator variance; they already have very small bias, a consequence of their independence of Y. Cross-validation also yields important information about which components are relevant: the relevant components are precisely those that cross-validation has chosen to smooth in a traditional way, by assigning them smoothing parameters of conventional size. Indeed, cross-validation produces asymptotically optimal smoothing for relevant components, while eliminating irrelevant components by oversmoothing. In the problem of nonparametric estimation of a conditional density, cross-validation comes into its own as a method with no obvious peers.

AB - Many practical problems, especially some connected with forecasting, require nonparametric estimation of conditional densities from mixed data. For example, given an explanatory data vector X for a prospective customer, with components that could include the customer's salary, occupation, age, sex, marital status, and address, a company might wish to estimate the density of the expenditure, Y, that could be made by that person, basing the inference on observations of (X,Y) for previous clients. Choosing appropriate smoothing parameters for this problem can be tricky, not least because plug-in rules take a particularly complex form in the case of mixed data. An obvious difficulty is that there exists no general formula for the optimal smoothing parameters. More insidiously, and more seriously, it can be difficult to determine which components of X are relevant to the problem of conditional inference. For example, if the jth component of X is independent of Y, then that component is irrelevant to estimating the density of Y given X, and ideally should be dropped before conducting inference. In this article we show that cross-validation overcomes these difficulties. It automatically determines which components are relevant and which are not, through assigning large smoothing parameters to the latter and consequently shrinking them toward the uniform distribution on the respective marginals. This effectively removes irrelevant components from contention, by suppressing their contribution to estimator variance; they already have very small bias, a consequence of their independence of Y. Cross-validation also yields important information about which components are relevant: the relevant components are precisely those that cross-validation has chosen to smooth in a traditional way, by assigning them smoothing parameters of conventional size. Indeed, cross-validation produces asymptotically optimal smoothing for relevant components, while eliminating irrelevant components by oversmoothing. In the problem of nonparametric estimation of a conditional density, cross-validation comes into its own as a method with no obvious peers.

KW - Bandwidth choice

KW - Binary data

KW - Categorical data

KW - Continuous data

KW - Dimension reduction

KW - Discrete data

KW - Kernel methods

KW - Mixed data

KW - Nonparametric density estimation

KW - Relevant and irrelevant data

KW - Smoothing parameter choice

UR - http://www.scopus.com/inward/record.url?scp=10144254861&partnerID=8YFLogxK

U2 - 10.1198/016214504000000548

DO - 10.1198/016214504000000548

M3 - Article

SN - 0162-1459

VL - 99

SP - 1015

EP - 1026

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

IS - 468

ER -