Principal Structure Identification: Fast Disentanglement of Multi-source Dataset
Analysis of multi-source data, where data on the same objects are collected from multiple sources, is of rising importance in many fields, e.g., multi-omics biology. Major challenges in multi-source data analysis include heterogeneity among the data sources and the entanglement of association structures among several groups of variables. Our goal is to disentangle the association structure by identifying score subspaces shared among all or some of the data blocks. We propose a sequential algorithm that gathers score subspaces of different data blocks within a certain angle threshold and identifies partially shared score components, using the concept of principal angles between subspaces of different dimensions. Our method shows better performance in identifying the linear association structure than competing methods in this field. In real data analysis, we apply our method to an oncological multi-omics dataset associated with drug responses. The proposed method is computationally fast, and the scores in the estimated shared component show strong correlations with well-known biological pathways.
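To make the notion of principal angles between subspaces of different dimensions concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm proposed in the paper. It only computes the angles with the standard QR-then-SVD recipe. The function name `principal_angles` and the toy bases are assumptions chosen for illustration, and both input matrices are assumed to have full column rank.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles (radians, ascending) between span(A) and span(B).

    A and B are (n, p) and (n, q) matrices whose columns span the two
    subspaces. The subspaces may have different dimensions; the number of
    angles returned is min(p, q). Full column rank is assumed.
    """
    # Orthonormal bases for the two column spaces (reduced QR).
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles,
    # returned in descending order, so the angles come out ascending.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing a cosine slightly above 1.
    return np.arccos(np.clip(cosines, -1.0, 1.0))


# Toy example in R^4: a 2-dimensional and a 3-dimensional subspace that
# share exactly one direction (the first coordinate axis).
e = np.eye(4)
A = e[:, [0, 1]]      # span{e1, e2}
B = e[:, [0, 2, 3]]   # span{e1, e3, e4}

angles = principal_angles(A, B)
print(np.degrees(angles))
```

In this toy case the smallest angle is zero (the shared direction) and the other is a right angle. A threshold on such angles is one way to decide whether two score subspaces are close enough to be treated as sharing a component; the threshold value itself would be a tuning choice.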