Saddlepoint approximations in binary genome-wide association studies

10/08/2021
by Pål Vegard Johnsen, et al.

We investigate saddlepoint approximations applied to the score test statistic in genome-wide association studies with binary phenotypes. The normal approximation to the distribution of the score test statistic becomes less accurate as sample imbalance increases and as minor allele count decreases. Applying saddlepoint approximations to this distribution greatly improves the accuracy, even far out in the tail. Using exact results for an intercept model and a binary covariate model, as well as simulations for models with nuisance parameters, we emphasize the need for continuity corrections in order to achieve valid p-values. The performance of the saddlepoint approximations is evaluated by overall and conditional type I error rates on simulated data. We investigate the methods further using data from UK Biobank with skin and soft tissue infections as the phenotype, considering both common and rare variants. The analysis confirms that continuity correction is important, particularly for rare variants, and that the normal approximation gives a highly inflated type I error rate under case imbalance.
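
To make the comparison in the abstract concrete, here is a minimal Python sketch of a saddlepoint tail approximation for a score statistic in a toy intercept-only setting. It is not the authors' implementation. It assumes the case probability `mu` is known rather than estimated, that phenotypes are independent Bernoulli(`mu`) draws under the null, and that the score statistic is U = sum_i g_i (y_i - mu). It uses the Lugannani-Rice formula, with the standard first continuity correction for integer-lattice statistics as an option. The function names, the toy data (20 carriers of a rare variant, 3 of them cases, `mu = 0.01`), and the root-finding bracket are all illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom, norm


def cgf_terms(t, g, mu):
    """K(t), K'(t), K''(t) for U = sum_i g_i (y_i - mu), y_i ~ Bernoulli(mu)."""
    denom = 1.0 - mu + mu * np.exp(t * g)
    p_t = mu * np.exp(t * g) / denom          # tilted case probability
    k0 = np.sum(np.log(denom) - t * g * mu)
    k1 = np.sum(g * (p_t - mu))
    k2 = np.sum(g**2 * p_t * (1.0 - p_t))
    return k0, k1, k2


def upper_tail(u, g, mu, lattice=False):
    """Lugannani-Rice approximation to P(U >= u) for u > 0.

    lattice=True applies the first continuity correction, which assumes
    sum_i g_i y_i lives on the integers (span 1).
    Assumes u is strictly below the largest attainable value of U.
    """
    # Solve the saddlepoint equation K'(t) = u; the bracket [0, 50] is ad hoc.
    t_hat = brentq(lambda t: cgf_terms(t, g, mu)[1] - u, 0.0, 50.0)
    k0, _, k2 = cgf_terms(t_hat, g, mu)
    w = np.sqrt(2.0 * (t_hat * u - k0))
    if lattice:
        v = (1.0 - np.exp(-t_hat)) * np.sqrt(k2)
    else:
        v = t_hat * np.sqrt(k2)
    return norm.sf(w) + norm.pdf(w) * (1.0 / v - 1.0 / w)


# Toy data (hypothetical): 1000 individuals, 20 carriers, 3 carrier cases.
mu = 0.01
g = np.concatenate([np.ones(20), np.zeros(980)])
y = np.zeros(1000)
y[:3] = 1.0
u = np.sum(g * (y - mu))

var0 = cgf_terms(0.0, g, mu)[2]               # null variance of U
p_normal = norm.sf(u / np.sqrt(var0))         # normal approximation
p_plain = upper_tail(u, g, mu)                # saddlepoint, uncorrected
p_cc = upper_tail(u, g, mu, lattice=True)     # saddlepoint, continuity-corrected
p_exact = binom.sf(2, 20, mu)                 # exact P(at least 3 carrier cases)

print(p_normal, p_plain, p_cc, p_exact)
```

In this toy setting the normal approximation should understate the exact tail probability by many orders of magnitude, the uncorrected saddlepoint value should be much closer but still too small, and the continuity-corrected value should agree with the exact binomial tail to within a few percent. With nuisance parameters or an estimated `mu`, the null distribution is no longer this simple, and the sketch does not cover that case.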
