Correcting Sociodemographic Selection Biases for Accurate Population Prediction from Social Media

11/10/2019
by Salvatore Giorgi, et al.

Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population, a problem known as "selection bias". Across five tasks for predicting US county population health statistics from Twitter, we explore standard restratification techniques: bias mitigation approaches that reweight people-specific variables according to how under-sampled their socio-demographic groups are. We found that standard restratification provided no improvement and often degraded population prediction accuracy. The core reason appeared to be that the estimates of each population's socio-demographics were both shrunken and sparse. We therefore develop and evaluate three methods to address these problems: predictive redistribution to account for shrinking, and adaptive binning and informed smoothing to handle sparse socio-demographic estimates. We show that each of our methods can significantly improve over the standard restratification approaches. Combining approaches, we find substantial improvements over non-restratified models as well, yielding a 35.4 life satisfaction, and a 10.0
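The reweighting idea behind standard restratification can be illustrated with a minimal sketch. This is not the authors' implementation: it shows plain post-stratification, where each user's weight is the ratio of their group's share in the target population to its share in the sample. The function names, the group labels, and the toy numbers are all assumptions made for illustration.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_share):
    """Return one weight per group: population share / sample share.

    sample_groups: list of group labels, one per sampled user (hypothetical).
    population_share: dict mapping group label -> share of the target population.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: population_share[group] / (count / n)
        for group, count in counts.items()
    }


def weighted_mean(values, groups, weights):
    """Aggregate user-level values to a community-level estimate."""
    total_weight = sum(weights[g] for g in groups)
    return sum(weights[g] * v for v, g in zip(values, groups)) / total_weight


# Toy county: "young" users are over-sampled relative to the population.
groups = ["young", "young", "young", "old"]
values = [1.0, 1.0, 1.0, 5.0]  # some user-level variable (illustrative)
population_share = {"young": 0.5, "old": 0.5}  # assumed census shares

weights = poststratification_weights(groups, population_share)
unweighted = sum(values) / len(values)
reweighted = weighted_mean(values, groups, weights)
```

In this toy case the under-sampled "old" group receives a larger weight, which pulls the county estimate toward that group's value. When a group has very few or no sampled users, the sample share becomes unstable or zero, which is the kind of sparsity the abstract says standard restratification struggles with.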
