Learning Sparsity and Block Diagonal Structure in Multi-View Mixture Models
Scientific studies increasingly collect multiple modalities of data to investigate a phenomenon from several perspectives. In integrative data analysis it is important to understand how information is heterogeneously spread across these different data sources. To this end, we consider a parametric clustering model for the subjects in a multi-view data set (i.e. multiple sources of data from the same set of subjects) where each view marginally follows a mixture model. In the case of two views, the dependence between them is captured by a cluster membership matrix parameter and we aim to learn the structure of this matrix (e.g. the zero pattern). First, we develop a penalized likelihood approach to estimate the sparsity pattern of the cluster membership matrix. For the specific case of block diagonal structures, we develop a constrained likelihood formulation where this matrix is constrained to be block diagonal up to permutations of the rows and columns. To enforce block diagonal constraints we propose a novel optimization approach based on the symmetric graph Laplacian. We demonstrate the performance of these methods through both simulations and applications to data sets from cancer genetics and neuroscience. Both methods naturally extend to multiple views.
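The link between block diagonal structure and the symmetric graph Laplacian can be illustrated with a standard spectral fact: if the cluster membership matrix is read as the edge weights of a bipartite graph (view-one clusters on one side, view-two clusters on the other), then the number of zero eigenvalues of that graph's symmetric normalized Laplacian equals the number of connected components, that is, the number of blocks after a suitable permutation of rows and columns. The sketch below only checks this property for a given matrix; it is not the optimization procedure of the paper. The names `bipartite_sym_laplacian`, `count_blocks`, and `tol` are hypothetical, and the code assumes every row and column of the matrix carries positive mass.

```python
import numpy as np


def bipartite_sym_laplacian(Pi):
    """Symmetric normalized Laplacian of the bipartite graph weighted by Pi.

    Assumption (not from the paper): every row and column of Pi has a
    strictly positive sum, so no node in the graph is isolated.
    """
    Pi = np.asarray(Pi, dtype=float)
    n_rows, n_cols = Pi.shape
    n = n_rows + n_cols

    # Adjacency matrix: row clusters and column clusters are the nodes,
    # and Pi[k, l] is the weight of the edge between row k and column l.
    A = np.zeros((n, n))
    A[:n_rows, n_rows:] = Pi
    A[n_rows:, :n_rows] = Pi.T

    deg = A.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every row and column of Pi must have positive mass")

    # L_sym = I - D^{-1/2} A D^{-1/2}
    inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(n) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def count_blocks(Pi, tol=1e-8):
    """Count blocks of Pi (up to row/column permutations) as the number of
    numerically zero eigenvalues of the symmetric normalized Laplacian."""
    eigvals = np.linalg.eigvalsh(bipartite_sym_laplacian(Pi))
    return int(np.sum(np.abs(eigvals) < tol))


# Illustrative matrix (made up for this sketch): rows {0, 2} pair with
# columns {0, 2}, and row 1 pairs with column 1.
Pi_example = np.array([
    [0.2, 0.0, 0.1],
    [0.0, 0.3, 0.0],
    [0.1, 0.0, 0.3],
])
n_blocks = count_blocks(Pi_example)
```

Because the count depends only on the graph's connectivity, it is unaffected by how the rows and columns are ordered, which is why a Laplacian-based constraint can express "block diagonal up to permutations" without searching over permutations explicitly.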