Asymptotic Normality of Gini Correlation in High Dimension with Applications to the K-sample Problem
The categorical Gini correlation proposed by Dang et al. is a dependence measure between a categorical variable and a numerical variable, and it characterizes independence of the two variables. The asymptotic distributions of the sample correlation under dependence and under independence have been established for the case where the dimension of the numerical variable is fixed. Its asymptotic distribution for high-dimensional data, however, has not been explored. In this paper, we develop a central limit theorem for the Gini correlation in the more realistic setting where the dimensionality of the numerical variable diverges. Based on this asymptotic normality, we then construct a powerful and consistent test for the K-sample problem. The proposed test not only avoids the computational burden of the permutation procedure but also gains power over it. Simulation studies and real data illustrations show that the proposed test is competitive with existing methods across a broad range of realistic situations, especially in unbalanced cases.
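For orientation, here is a minimal sketch of a sample categorical Gini correlation. It assumes the usual form of the measure, one minus the ratio of the class-weighted within-group mean pairwise Euclidean distance to the overall mean pairwise distance. The abstract does not state this formula, and the function names `mean_pairwise_distance` and `gini_correlation` are hypothetical. The sketch computes only the statistic; it does not implement the high-dimensional normal approximation or the K-sample test developed in the paper.

```python
import numpy as np


def mean_pairwise_distance(x):
    """Average Euclidean distance over all pairs of distinct rows of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n observations of dimension 1
    n = x.shape[0]
    if n < 2:
        return 0.0  # no pairs to compare
    diffs = x[:, None, :] - x[None, :, :]       # shape (n, n, d)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # shape (n, n), zero diagonal
    return dists.sum() / (n * (n - 1))


def gini_correlation(x, labels):
    """Sample categorical Gini correlation (assumed form, see the lead-in)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n = labels.shape[0]
    overall = mean_pairwise_distance(x)
    if overall == 0.0:
        raise ValueError("all observations are identical")
    within = 0.0
    for k in np.unique(labels):
        mask = labels == k
        # weight each within-group mean distance by the group's sample proportion
        within += (mask.sum() / n) * mean_pairwise_distance(x[mask])
    return 1.0 - within / overall


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Unbalanced two-sample toy data in dimension 50 with a small mean shift.
    x = np.vstack([rng.normal(0.0, 1.0, size=(30, 50)),
                   rng.normal(0.3, 1.0, size=(10, 50))])
    labels = np.array([0] * 30 + [1] * 10)
    print(gini_correlation(x, labels))
```

The full distance matrix costs memory and time quadratic in the sample size, which is acceptable for a sketch but would need a blocked computation for large samples.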