Best Subset Selection in Reduced Rank Regression
Reduced rank regression is widely used in genetics to model the relationship between multivariate responses and multivariate predictors and to uncover the structure of that relationship. The problem is especially challenging when the predictors are high-dimensional, in which case subset selection is used to reduce model complexity and enhance interpretability. We propose a novel selection scheme that directly identifies the best subset of predictors via a primal-dual formulation. Based on this formulation, we develop a computationally efficient algorithm that scales to high-dimensional data with guaranteed convergence. We show that the estimator produced by the proposed algorithm enjoys desirable sampling properties, including consistency in estimation, rank selection, and sparsity selection, under mild regularity conditions. In practice, the new estimator achieves competitive numerical performance under a variety of simulation settings while allowing fast computation. The effectiveness of the proposed method is also demonstrated on an ovarian cancer genetic dataset.
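To make the problem setting concrete, the following is a minimal sketch of best subset selection in reduced rank regression on a toy example. It is not the primal-dual algorithm proposed here: it uses the classical reduced rank estimator (least squares followed by projection onto the leading right singular vectors of the fitted values) together with an exhaustive search over subsets, which is feasible only for a small number of predictors. The function names `rrr_fit` and `best_subset_rrr`, the simulated data, and the assumption that the rank and subset size are known in advance are all illustrative choices, not taken from the paper.

```python
import itertools
import numpy as np

def rrr_fit(X, Y, rank):
    """Classical reduced rank regression with identity weighting (illustrative)."""
    # Ordinary least squares coefficient matrix, shape (p, q).
    C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Leading right singular vectors of the fitted values X @ C_ols.
    _, _, Vt = np.linalg.svd(X @ C_ols, full_matrices=False)
    V_r = Vt[:rank].T
    # Project the OLS solution onto the rank-`rank` response subspace.
    return C_ols @ V_r @ V_r.T

def best_subset_rrr(X, Y, rank, size):
    """Exhaustive search over predictor subsets of a fixed size (small p only)."""
    p, q = X.shape[1], Y.shape[1]
    best_rss, best_subset, best_C = np.inf, None, None
    for subset in itertools.combinations(range(p), size):
        cols = list(subset)
        C_sub = rrr_fit(X[:, cols], Y, rank)
        rss = np.sum((Y - X[:, cols] @ C_sub) ** 2)
        if rss < best_rss:
            # Embed the fitted rows into a row-sparse (p, q) coefficient matrix.
            C = np.zeros((p, q))
            C[cols] = C_sub
            best_rss, best_subset, best_C = rss, subset, C
    return best_rss, best_subset, best_C

# Simulated data (assumed setup): 3 of 8 predictors are active, true rank is 2.
rng = np.random.default_rng(0)
n, p, q, r = 200, 8, 5, 2
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0, -1.0, 1.0, -1.0, 1.0], [1.0, 1.0, -1.0, -1.0, 0.0]])
B = np.zeros((p, q))
B[:3] = U @ V
X = rng.standard_normal((n, p))
Y = X @ B + 0.1 * rng.standard_normal((n, q))

rss, subset, C_hat = best_subset_rrr(X, Y, rank=r, size=3)
```

The exhaustive search visits every subset of the given size, so its cost grows combinatorially with the number of predictors. This is the bottleneck that motivates a scalable algorithm for the high-dimensional case.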