TY - JOUR
T1 - GEE-Assisted Variable Selection for Latent Variable Models with Multivariate Binary Data
AU - Hui, Francis K.C.
AU - Müller, Samuel
AU - Welsh, A. H.
N1 - Publisher Copyright: © 2021 American Statistical Association.
PY - 2023
Y1 - 2023
AB - Multivariate data are commonly analyzed using one of two approaches: a conditional approach based on generalized linear latent variable models (GLLVMs) or some variation thereof, and a marginal approach based on generalized estimating equations (GEEs). With research on mixed models and GEEs having gone down separate paths, there is a common mindset to treat the two approaches as mutually exclusive, with which to use driven by the question of interest. In this article, focusing on multivariate binary responses, we study the connections between the parameters from conditional and marginal models, with the aim of using GEEs for fast variable selection in GLLVMs. This is accomplished through two main contributions. First, we show that GEEs are zero consistent for GLLVMs fitted to multivariate binary data. That is, if the true model is a GLLVM but we misspecify and fit GEEs, then the latter is able to asymptotically differentiate between truly zero versus nonzero coefficients in the former. Building on this result, we propose GEE-assisted variable selection for GLLVMs using score- and Wald-based information criteria to construct a fast forward selection path followed by pruning. We demonstrate GEE-assisted variable selection is selection consistent for the underlying GLLVM, with simulation studies demonstrating its strong finite sample performance and computational efficiency.
KW - Consistency
KW - Factor analysis
KW - Generalized estimating equations
KW - Information criterion
KW - Model selection
UR - http://www.scopus.com/inward/record.url?scp=85120745624&partnerID=8YFLogxK
U2 - 10.1080/01621459.2021.1987251
DO - 10.1080/01621459.2021.1987251
M3 - Article
SN - 0162-1459
VL - 118
SP - 1252
EP - 1263
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 542
ER -