Extreme eigenvalues of sample covariance matrices under generalized elliptical models with applications
We consider the extreme eigenvalues of the sample covariance matrix Q = YY^* under the generalized elliptical model Y = Σ^{1/2}XD. Here Σ is a bounded p × p positive definite deterministic matrix representing the population covariance structure, X is a p × n random matrix containing either independent columns sampled from the unit sphere in ℝ^p or i.i.d. centered entries with variance n^{-1}, and D is a diagonal random matrix with i.i.d. entries that is independent of X. Such a model finds important applications in statistics and machine learning. In this paper, assuming that p and n are comparably large, we prove that the extreme eigenvalues of Q can asymptotically follow several types of distributions, depending on Σ and D. These distributions include Gumbel, Fréchet, Weibull, Tracy-Widom, Gaussian and their mixtures. On the one hand, when the random variables in D have unbounded support, the edge eigenvalues of Q follow either a Gumbel or a Fréchet distribution, depending on the tail decay of D. On the other hand, when the random variables in D have bounded support, under some mild regularity assumptions on Σ, the edge eigenvalues of Q can exhibit Weibull, Tracy-Widom or Gaussian distributions, or their mixtures. Based on our theoretical results, we consider two important applications. First, we propose statistics and a procedure to detect and estimate possible spikes for elliptically distributed data. Second, in the context of a factor model, we use a multiplier bootstrap procedure, implemented by selecting the weights in D, to propose a new algorithm to infer and estimate the number of factors. Numerical simulations confirm the accuracy and power of our proposed methods and show better performance than some existing methods in the literature.
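The model in the abstract can be simulated directly, which may help make the roles of Σ, X and D concrete. The following is a minimal NumPy sketch, not the paper's procedure: the helper names `sample_y` and `extreme_eigenvalues`, the choice Σ = I, the dimensions, and the two example laws for D (a bounded uniform law and an unbounded Pareto law) are all illustrative assumptions.

```python
import numpy as np


def sample_y(p, n, sigma_sqrt, d_sampler, rng, spherical=True):
    """Draw Y = Sigma^{1/2} X D (hypothetical helper, not from the paper)."""
    if spherical:
        # Columns uniform on the unit sphere in R^p: normalize Gaussian columns.
        g = rng.standard_normal((p, n))
        x = g / np.linalg.norm(g, axis=0, keepdims=True)
    else:
        # i.i.d. centered entries with variance 1/n (Gaussian chosen for illustration).
        x = rng.standard_normal((p, n)) / np.sqrt(n)
    d = d_sampler(n, rng)  # diagonal of D, drawn independently of X
    # Right-multiplying by diag(d) scales column j by d[j].
    return (sigma_sqrt @ x) * d


def extreme_eigenvalues(y):
    """Return (smallest, largest) eigenvalue of Q = Y Y^T for real Y."""
    q = y @ y.T
    eigs = np.linalg.eigvalsh(q)  # ascending order
    return eigs[0], eigs[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n = 100, 200            # assumed sizes with p and n comparable
    sigma_sqrt = np.eye(p)     # assumed Sigma = I for simplicity

    # Two assumed laws for the entries of D: bounded vs. unbounded support.
    samplers = {
        "bounded (uniform on [0.5, 1.5])": lambda m, r: r.uniform(0.5, 1.5, size=m),
        "unbounded (Pareto, shape 2)": lambda m, r: r.pareto(2.0, size=m) + 1.0,
    }
    for name, sampler in samplers.items():
        tops = [
            extreme_eigenvalues(sample_y(p, n, sigma_sqrt, sampler, rng))[1]
            for _ in range(200)
        ]
        print(f"{name}: mean largest eigenvalue = {np.mean(tops):.3f}, "
              f"std = {np.std(tops):.3f}")
```

Repeating the draw many times and plotting a histogram of the largest eigenvalue gives an informal view of how the fluctuations change with the support and tail of D. The sketch makes no attempt to apply the centering and scaling needed to compare against a specific limiting law.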