Hierarchically Clustered PCA and CCA via a Convex Clustering Penalty

11/29/2022
by Amanda M. Buch, et al.

We introduce an unsupervised learning approach that combines the truncated singular value decomposition with convex clustering to estimate within-cluster directions of maximum variance/covariance (in the variables) while simultaneously hierarchically clustering (on observations). In contrast to previous work on joint clustering and embedding, our approach has a straightforward formulation, is readily scalable via distributed optimization, and admits a direct interpretation as hierarchically clustered principal component analysis (PCA) or hierarchically clustered canonical correlation analysis (CCA). Through numerical experiments and real-world examples relevant to precision medicine, we show that our approach outperforms traditional and contemporary clustering methods on underdetermined problems (p ≫ N with tens of observations) and scales to large datasets (e.g., N=100,000; p=1,000) while yielding interpretable dendrograms of hierarchical per-cluster principal components or canonical variates.
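
The abstract names two building blocks, the truncated singular value decomposition and a convex clustering penalty, but does not spell out the combined objective or its solver. The sketch below illustrates only those two ingredients in their standard textbook forms, assuming NumPy; the function names `truncated_svd` and `convex_clustering_objective`, the uniform default weights, and the toy data are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def truncated_svd(X, k):
    """Rank-k truncated SVD of column-centered X (rows are observations).

    Returns PCA scores (N x k), the top-k singular values, and the
    top-k right singular vectors (k x p), i.e. the principal directions.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k], s[:k], Vt[:k]

def convex_clustering_objective(X, U, gamma, weights=None):
    """Standard convex clustering objective:
    0.5 * ||X - U||_F^2 + gamma * sum_{i<j} w_ij * ||U_i - U_j||_2,
    where row U_i is the centroid assigned to observation i.
    Weights default to 1 for every pair (an assumption of this sketch).
    """
    n = X.shape[0]
    fit = 0.5 * np.sum((X - U) ** 2)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = 1.0 if weights is None else weights[i, j]
            penalty += w * np.linalg.norm(U[i] - U[j])
    return fit + gamma * penalty

# Toy underdetermined data: N = 10 observations in two groups, p = 50 variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, (5, 50)), rng.normal(3.0, 1.0, (5, 50))])

scores, sing_vals, components = truncated_svd(X, k=2)

# Two extreme centroid assignments for the same gamma:
# every observation is its own cluster (U = X), or all are fused to the mean.
no_fusion = convex_clustering_objective(X, X, gamma=0.1)
all_fused = convex_clustering_objective(X, np.tile(X.mean(axis=0), (10, 1)), gamma=0.1)
```

In the standard convex clustering setting, increasing `gamma` makes fused centroids progressively cheaper relative to separate ones, and tracing the fusions along that path is what yields a dendrogram. How the paper couples this penalty with the per-cluster components is described in the full text, not here.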
