Mixture Envelope Model for Heterogeneous Genomics Data Analysis

05/04/2018
by   Bochao Jia, et al.
The envelope model, also known as a multivariate regression model, was proposed to solve multiple-response regression problems. It measures the linear association between predictors and multiple responses by using the minimal reducing subspace of the covariance matrix that accommodates the mean function. In many real applications, however, data may contain unknown confounding factors or may come from different sources. As a result, the dependency structure may be heterogeneous across the population, dividing it into distinct groups. For example, the breast cancer population comprises several subtypes, each with its own gene interaction mechanisms. In this setting, a single model fitted to all observations ignores the differences between groups, while fitting a separate model for each group is infeasible because the group memberships are unknown. To deal with this problem, we propose a mixture envelope model, which constructs a groupwise model for heterogeneous data and simultaneously classifies the observations into groups using an Imputation-Conditional Consistency (ICC) algorithm. Simulation results show that the proposed method outperforms some existing methods in both classification and prediction. Finally, we apply the proposed method to breast cancer data to identify patients with the inflammatory breast cancer subtype and to evaluate the associations between microRNA and messenger RNA (mRNA) gene expression.
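
The idea of fitting groupwise models while the group labels are unknown can be illustrated with a much simpler stand-in. The sketch below is not the ICC algorithm and does not estimate an envelope subspace. It is a minimal hard-assignment analog that alternates between refitting ordinary least squares in each group and re-imputing each observation's label from its residuals. The function name `fit_groupwise_regressions`, the synthetic data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def fit_groupwise_regressions(X, Y, n_groups=2, n_iter=50, seed=0):
    """Alternate between refitting per-group least squares and imputing labels.

    Illustrative stand-in only: plain OLS per group with hard label
    assignment, not the envelope estimator or the ICC algorithm.
    X has shape (n, p); Y has shape (n, r) with r responses.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
    labels = rng.integers(n_groups, size=n)    # random initial group labels
    coefs = [None] * n_groups

    for _ in range(n_iter):
        # Refit step: ordinary least squares within each current group.
        for g in range(n_groups):
            idx = labels == g
            if idx.sum() < Xd.shape[1]:
                # Too few observations to fit this group: fall back to all data.
                idx = np.ones(n, dtype=bool)
            coefs[g] = np.linalg.lstsq(Xd[idx], Y[idx], rcond=None)[0]

        # Imputation step: assign each observation to the group whose
        # fitted model gives the smallest squared residual.
        resid = np.stack(
            [((Y - Xd @ B) ** 2).sum(axis=1) for B in coefs], axis=1
        )
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # labels stopped changing
        labels = new_labels

    return labels, coefs


# Synthetic two-group data (hypothetical, for demonstration only).
rng = np.random.default_rng(1)
n, p, r = 200, 3, 2
X = rng.normal(size=(n, p))
true_labels = rng.integers(2, size=n)
B0 = 3.0 * rng.normal(size=(p, r))
B1 = 3.0 * rng.normal(size=(p, r))
signal = np.where(true_labels[:, None] == 0, X @ B0, X @ B1)
Y = signal + 0.1 * rng.normal(size=(n, r))

labels, coefs = fit_groupwise_regressions(X, Y, n_groups=2)
```

Because group labels are identifiable only up to permutation, any comparison of `labels` with `true_labels` should allow for swapped group indices. A hard-assignment scheme like this one can also stall in a local optimum, so it is usually run from several random starts.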
