An Empirical Bayes Approach for Constructing the Confidence Intervals of Clonality and Entropy
This paper is motivated by the need to quantify human immune responses to environmental challenges. Specifically, the genome of the selected cell population from a blood sample is amplified by the well-known polymerase chain reaction (PCR) process of successive heating and cooling, producing a large number of reads, roughly 30,000 to 300,000. Each read corresponds to a particular rearrangement of so-called V(D)J sequences. The observation is therefore a set of read counts for the different V(D)J sequences. The underlying relative frequencies of the distinct V(D)J sequences can be summarized by a probability vector whose cardinality is the number of distinct V(D)J rearrangements present in the blood. The statistical question is to make inferences on a summary parameter of this probability vector based on a single multinomial-type observation of large dimension. Popular summaries of the diversity of a cell population include clonality and entropy or, more generally, suitable functions of the probability vector. A point estimator of clonality based on multiple replicates from the same blood sample has been proposed previously. Given a point estimator of a particular function, the remaining challenge is to construct a confidence interval for the parameter that appropriately reflects its uncertainty. In this paper, we propose to couple the empirical Bayes method with a resampling-based calibration procedure to construct robust confidence intervals for different population diversity parameters. The method is illustrated via extensive numerical studies and real data examples.
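To make the setup concrete, the sketch below treats the read counts as a single multinomial observation, computes plug-in estimates of entropy and clonality, and wraps them in a naive parametric-bootstrap percentile interval. It is a baseline for orientation only and is not the empirical Bayes method with resampling-based calibration proposed in the paper. Several choices are assumptions not stated in the abstract: Shannon entropy with the natural log; clonality defined as 1 - H / log(K), with K the number of observed distinct sequences; and the hypothetical helper names `entropy`, `clonality`, and `bootstrap_ci`. Note that the plug-in entropy is known to be biased downward when many sequences are rare or unobserved, and resampling only from the observed sequences cannot account for unseen ones.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Plug-in Shannon entropy (natural log) of the empirical read frequencies."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def clonality(counts):
    """Assumed definition: 1 - H / log(K), K = number of observed distinct sequences."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 1.0  # a single observed sequence is treated as maximally clonal
    return 1.0 - entropy(counts) / math.log(k)


def bootstrap_ci(counts, stat, n_boot=200, level=0.95, seed=0):
    """Naive parametric-bootstrap percentile interval for stat(counts).

    Each replicate redraws n reads from the empirical frequencies (a multinomial
    draw), so sequences with zero observed reads can never reappear.
    """
    rng = random.Random(seed)
    n = sum(counts)
    labels = list(range(len(counts)))
    reps = []
    for _ in range(n_boot):
        # Draw n read labels with probabilities proportional to the observed counts.
        draws = Counter(rng.choices(labels, weights=counts, k=n))
        reps.append(stat(list(draws.values())))
    reps.sort()
    alpha = (1.0 - level) / 2.0
    lo = reps[int(math.floor(alpha * (n_boot - 1)))]
    hi = reps[int(math.ceil((1.0 - alpha) * (n_boot - 1)))]
    return lo, hi


if __name__ == "__main__":
    # Toy counts for six hypothetical V(D)J sequences (illustrative, not real data).
    counts = [500, 300, 120, 50, 20, 10]
    print("entropy:", entropy(counts))
    print("clonality:", clonality(counts))
    print("95% bootstrap CI for clonality:", bootstrap_ci(counts, clonality))
```

Because the normalized entropy H / log(K) is a ratio, the choice of log base does not affect the clonality value under this assumed definition.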