Statistical inference with F-statistics when fitting simple models to high-dimensional data

02/12/2019
by Hannes Leeb, et al.

We study linear subset regression in the context of the high-dimensional overall model y = ϑ + θ'z + ϵ with univariate response y and a d-vector of random regressors z, independent of ϵ. Here, "high-dimensional" means that the number d of available explanatory variables is much larger than the number n of observations. We consider simple linear sub-models where y is regressed on a set of p regressors given by x = M'z, for some d × p matrix M of full rank p < n. The corresponding simple model, i.e., y = α + β'x + e, can be justified by imposing appropriate restrictions on the unknown parameter θ in the overall model; otherwise, this simple model can be grossly misspecified. In this paper, we establish asymptotic validity of the standard F-test on the surrogate parameter β, in an appropriate sense, even when the simple model is misspecified.
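To make the setup concrete, the following is a minimal sketch of the textbook F-statistic for H0: β = 0 in a sub-model fitted to simulated high-dimensional data. It is not the paper's procedure or proof. The Gaussian design, the sample sizes, the choice of M as a coordinate selector, and the helper name `f_statistic` are all assumptions made for illustration only.

```python
import numpy as np

def f_statistic(x, y):
    """Textbook F-statistic for H0: beta = 0 in y = alpha + beta'x + e.

    Hypothetical helper for illustration; returns the statistic and
    its numerator and denominator degrees of freedom.
    """
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])        # intercept plus p regressors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss_full = np.sum((y - design @ coef) ** 2)      # residual sum of squares, sub-model
    rss_null = np.sum((y - y.mean()) ** 2)           # residual sum of squares, intercept only
    df_num, df_den = p, n - p - 1
    f_stat = ((rss_null - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Assumed simulation settings: d much larger than n, and p < n.
rng = np.random.default_rng(0)
n, d, p = 50, 500, 3
z = rng.standard_normal((n, d))                      # rows are observations of z
theta = rng.standard_normal(d) / np.sqrt(d)          # dense theta, so the sub-model is misspecified
y = 1.0 + z @ theta + rng.standard_normal(n)         # overall model with intercept 1.0

M = np.eye(d)[:, :p]                                 # d x p selector of the first p regressors
x = z @ M                                            # row i holds x_i' = z_i' M, i.e. x = M'z

f_stat, df_num, df_den = f_statistic(x, y)
print(f"F = {f_stat:.3f} on ({df_num}, {df_den}) degrees of freedom")
```

In the classical, correctly specified Gaussian setting this statistic would be compared with a quantile of the F distribution with (p, n - p - 1) degrees of freedom. The abstract's claim is that the same comparison remains asymptotically valid, in an appropriate sense, for the surrogate parameter β even when θ is not restricted to make the sub-model correct.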
