Bayesian Inference under Cluster Sampling with Probability Proportional to Size
Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider a two-stage cluster sampling design where the clusters are first selected with probability proportional to cluster size, and then units are randomly sampled inside selected clusters. Challenges arise when the sizes of nonsampled cluster are unknown. We propose nonparametric and parametric Bayesian approaches for predicting the unknown cluster sizes, with this inference performed simultaneously with the model for survey outcome. Simulation studies show that the integrated Bayesian approach outperforms classical methods with efficiency gains. We use Stan for computing and apply the proposal to the Fragile Families and Child Wellbeing study as an illustration of complex survey inference in health surveys.
READ FULL TEXT