Test and Measure for Partial Mean Dependence Based on Deep Neural Networks
It is of great importance to investigate the significance of a subset of covariates W for the response Y given covariates Z in regression modeling. To this end, we propose a new significance test for the partial mean independence problem based on deep neural networks and data splitting. The test statistic converges to a standard chi-squared distribution under the null hypothesis, while it converges to a normal distribution under the alternative hypothesis. We also suggest a powerful ensemble algorithm based on multiple data splitting to enhance the testing power. If the null hypothesis is rejected, we propose a new partial Generalized Measure of Correlation (pGMC) to measure the partial mean dependence of Y on W after controlling for the nonlinear effect of Z, which is an interesting extension of the GMC proposed by Zheng et al. (2012). We present the appealing theoretical properties of the pGMC and establish the asymptotic normality of its estimator with the optimal root-N convergence rate. Furthermore, a valid confidence interval for the pGMC is derived. As an important special case with no conditioning covariates Z, we also consider a new test of the overall significance of covariates for the response in a model-free setting. We further introduce a new estimator of the GMC and derive its asymptotic normality. Numerical studies and real data analysis are conducted to compare with existing approaches and to illustrate the validity and flexibility of our proposed procedures.
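For readers unfamiliar with the GMC of Zheng et al. (2012), the following LaTeX sketch records its definition and one natural way the partial extension described above could be formulated. The pGMC expression below is a plausible reduction-in-mean-squared-error form stated only as an illustrative assumption, not quoted from the paper.

```latex
% GMC of Zheng et al. (2012): the fraction of Var(Y) explained by E(Y|X),
% well defined whenever Var(Y) is finite and positive.
\[
  \mathrm{GMC}(Y \mid X)
  \;=\; 1 - \frac{E\bigl[\{Y - E(Y \mid X)\}^{2}\bigr]}{\mathrm{Var}(Y)}
  \;=\; \frac{\mathrm{Var}\{E(Y \mid X)\}}{\mathrm{Var}(Y)} .
\]
% A plausible form for the partial extension (an assumption, for
% illustration only): the relative reduction in mean-squared prediction
% error obtained by adding W on top of Z.
\[
  \mathrm{pGMC}(Y \mid W ; Z)
  \;=\; \frac{E\bigl[\{Y - E(Y \mid Z)\}^{2}\bigr]
            - E\bigl[\{Y - E(Y \mid W, Z)\}^{2}\bigr]}
             {E\bigl[\{Y - E(Y \mid Z)\}^{2}\bigr]} ,
\]
% which lies in [0, 1] and equals zero exactly under the partial mean
% independence null hypothesis E(Y | W, Z) = E(Y | Z) almost surely.
```

Under this sketched form, when there are no conditioning covariates Z the quantity reduces to GMC(Y | W), consistent with the special case mentioned in the abstract.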