Statistical Inference on Explained Variation in High-dimensional Linear Model with Dense Effects
Statistical inference on the variation of an outcome explained by a set of covariates is of particular interest in practice. When the covariates are of moderate to high dimension and the effects are not sparse, several approaches have been proposed for estimation and inference. One major problem with the existing approaches is that their inference procedures are not robust to violations of the normality assumptions on the covariates and the residual errors. In this paper, we propose an estimating equation approach to estimation and inference on the explained variation in the high-dimensional linear model. Unlike the existing approaches, the proposed approach does not rely on restrictive normality assumptions for inference. The proposed estimator is shown to be consistent and asymptotically normally distributed under reasonable conditions. Simulation studies demonstrate better performance of the proposed inference procedure in comparison with the existing approaches. The proposed approach is applied to studying the variation of glycohemoglobin explained by environmental pollutants in a National Health and Nutrition Examination Survey data set.