Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

02/24/2021
by   Romain Couillet, et al.
0

The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists in randomly "puncturing" both the data matrix X∈ℂ^p× n (or ℝ^p× n) and its corresponding kernel (Gram) matrix K through Bernoulli masks: S∈{0,1}^p× n for X and B∈{0,1}^n× n for K. The resulting "two-way punctured" kernel is thus given by K=1/p[(X ⊙ S)^ H (X ⊙ S)] ⊙ B. We demonstrate that, for X composed of independent columns drawn from a Gaussian mixture model, as n,p→∞ with p/n→ c_0∈(0,∞), the spectral behavior of K – its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors – is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering of PCA) performance. This preliminary study opens as such the path towards rethinking, from a large dimensional standpoint, computational and storage costs in elementary machine learning models.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset