Pearson's goodness-of-fit tests for sparse distributions
Pearson's chi-squared test is widely used to test the goodness of fit between categorical data and a given discrete distribution. When the number of categories, say k, is a fixed integer, Pearson's chi-squared test statistic converges in distribution to a chi-squared distribution with k-1 degrees of freedom as the sample size n goes to infinity. In real applications, the number k often changes with n and may even be much larger than n. Using martingale techniques, we prove that Pearson's chi-squared test statistic is asymptotically normal under quite general conditions. We also propose a new test statistic which, in our simulation study, is more powerful than the chi-squared test statistic. A real application to lottery data is provided to illustrate our methodology.
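As a rough illustration of the quantities mentioned above, the following minimal Python sketch computes Pearson's statistic, the sum over categories of (O_i - E_i)^2 / E_i with E_i = n p_i, for a small made-up sample. The function names `pearson_chi2` and `standardized` and the example counts are hypothetical. The centering and scaling use the mean k-1 and variance 2(k-1) of a chi-squared variable with k-1 degrees of freedom; this is an assumption chosen for illustration and is not necessarily the normalization used in the paper, nor is it the new test statistic the paper proposes.

```python
import math

def pearson_chi2(counts, probs):
    """Pearson's statistic: sum of (O_i - E_i)^2 / E_i with E_i = n * p_i."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))

def standardized(stat, k):
    """Centre and scale the statistic using the chi-squared(k - 1) moments.

    Assumption for illustration only: mean k - 1 and variance 2(k - 1).
    """
    return (stat - (k - 1)) / math.sqrt(2 * (k - 1))

# Made-up observed counts for k = 5 categories under a uniform null.
counts = [18, 22, 20, 25, 15]
probs = [0.2] * 5

stat = pearson_chi2(counts, probs)
z = standardized(stat, len(counts))
print(f"Pearson statistic: {stat:.4f}, standardized value: {z:.4f}")
```

For fixed k the statistic would be compared with a chi-squared distribution with k-1 degrees of freedom; when k grows with n, a standardized value such as `z` would instead be compared with a normal reference, under whatever conditions and normalization the paper establishes.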