Simultaneous estimation of normal means with side information

08/16/2019
by Sihai Dave Zhao, et al.

The integrative analysis of multiple datasets is an important strategy in data analysis. It is increasingly popular in genomics, which enjoys a wealth of publicly available datasets that can be compared, contrasted, and combined to extract novel scientific insights. This paper studies a stylized example of data integration for a classical statistical problem: leveraging side information to estimate a vector of normal means. The task is formulated as a compound decision problem, an oracle integrative decision rule is derived, and a data-driven estimate of this rule is proposed, obtained by minimizing an unbiased estimate of its risk. The data-driven rule is shown to asymptotically achieve the minimum possible risk among all separable decision rules, and it can outperform existing methods in numerical studies. The proposed procedure leads naturally to an integrative high-dimensional classification procedure, which is illustrated by combining data from two independent gene expression profiling studies.
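To make the idea of "minimizing an unbiased estimate of risk" concrete, the sketch below uses Stein's unbiased risk estimate (SURE) to tune a deliberately simple rule that shrinks the primary observations toward side information. This is an illustrative assumption, not the paper's estimator: the rule family (a single weight `w` blending `x` and `y`), the known noise level `sigma`, the assumption that the side information `y` is independent of `x` given the means, and the function names `sure_value` and `sure_linear_shrinkage` are all hypothetical choices made for this example.

```python
import random


def sure_value(x, y, w, sigma=1.0):
    """SURE for the hypothetical rule delta_i(w) = (1 - w) * x_i + w * y_i.

    Assumes x_i ~ N(theta_i, sigma^2) and y independent of x given theta.
    Writing delta = x + g with g_i = w * (y_i - x_i), SURE is
        n * sigma^2 + ||g||^2 + 2 * sigma^2 * div(g)
      = n * sigma^2 + w^2 * sum((y_i - x_i)^2) - 2 * n * w * sigma^2.
    """
    n = len(x)
    ss = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    return n * sigma ** 2 + w ** 2 * ss - 2 * n * w * sigma ** 2


def sure_linear_shrinkage(x, y, sigma=1.0):
    """Pick w in [0, 1] minimizing SURE, then return the estimates and w.

    SURE is a convex quadratic in w with minimizer n * sigma^2 / ss,
    which is always >= 0, so clipping at 1 gives the minimizer on [0, 1].
    """
    n = len(x)
    ss = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    # If x == y exactly, SURE decreases linearly in w, so w = 1 is optimal.
    w = min(1.0, n * sigma ** 2 / ss) if ss > 0 else 1.0
    est = [(1 - w) * xi + w * yi for xi, yi in zip(x, y)]
    return est, w


def mse(est, theta):
    """Average squared error against the true means (simulation only)."""
    return sum((e - t) ** 2 for e, t in zip(est, theta)) / len(theta)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    theta = [rng.gauss(0, 2) for _ in range(n)]
    x = [t + rng.gauss(0, 1) for t in theta]    # primary data, sigma = 1
    y = [t + rng.gauss(0, 0.5) for t in theta]  # hypothetical side information
    est, w = sure_linear_shrinkage(x, y)
    print(f"w = {w:.3f}, MSE(x) = {mse(x, theta):.3f}, MSE(est) = {mse(est, theta):.3f}")
```

In this simulated setting the risk-minimizing weight is 0.8 (it balances noise variances 1 and 0.25), so the SURE-selected `w` should land close to that value and the shrunken estimates should have clearly lower error than `x` alone. The paper's rule is richer than a single weight; this sketch only shows the mechanics of selecting a rule by minimizing an unbiased risk estimate.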
