TY - JOUR
T1 - A W-test collapsing method for rare-variant association testing in exome sequencing data
AU - Sun, Rui
AU - Weng, Haoyi
AU - Hu, Inchi
AU - Guo, Junfeng
AU - Wu, William K.K.
AU - Zee, Benny Chung Ying
AU - Wang, Maggie Haitian
N1 - Publisher Copyright: © 2016 WILEY PERIODICALS, INC.
PY - 2016/11/1
Y1 - 2016/11/1
N2 - Advances in sequencing technology enable the study of associations between complex disorder phenotypes and single-nucleotide polymorphisms with rare mutations. However, rare genetic variants have extremely small variance, which impairs the testing power of traditional statistical methods. We introduce a W-test collapsing method that evaluates rare-variant association by measuring the distributional differences between cases and controls through the combined log odds ratio within a genomic region. The method is model-free and inherits a chi-squared distribution with degrees of freedom estimated from bootstrapped samples of the data, allowing fast and accurate P-value calculation without the need for permutations. The proposed method was compared with the Weighted-Sum Statistic and the Sequence Kernel Association Test on simulated datasets, and showed good performance and significantly faster computing speed. When applied to a real next-generation sequencing dataset of hypertensive disorder, it identified genes with interesting biological functions associated with metabolic disorders and inflammation, including MACROD1, NLRP7, AGK, PAK6, and APBB1. The proposed method offers an efficient and effective way to test rare genetic variants in whole-exome sequencing datasets.
AB - Advances in sequencing technology enable the study of associations between complex disorder phenotypes and single-nucleotide polymorphisms with rare mutations. However, rare genetic variants have extremely small variance, which impairs the testing power of traditional statistical methods. We introduce a W-test collapsing method that evaluates rare-variant association by measuring the distributional differences between cases and controls through the combined log odds ratio within a genomic region. The method is model-free and inherits a chi-squared distribution with degrees of freedom estimated from bootstrapped samples of the data, allowing fast and accurate P-value calculation without the need for permutations. The proposed method was compared with the Weighted-Sum Statistic and the Sequence Kernel Association Test on simulated datasets, and showed good performance and significantly faster computing speed. When applied to a real next-generation sequencing dataset of hypertensive disorder, it identified genes with interesting biological functions associated with metabolic disorders and inflammation, including MACROD1, NLRP7, AGK, PAK6, and APBB1. The proposed method offers an efficient and effective way to test rare genetic variants in whole-exome sequencing datasets.
KW - exome sequencing
KW - genetic association study
KW - rare-variant testing
UR - http://www.scopus.com/inward/record.url?scp=84991661906&partnerID=8YFLogxK
U2 - 10.1002/gepi.22000
DO - 10.1002/gepi.22000
M3 - Article
SN - 0741-0395
VL - 40
SP - 591
EP - 596
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 7
ER -