BISG: When inferring race or ethnicity, does it matter that people often live near their relatives?
Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for predicting an individual's race and ethnicity from their geolocation and surname. BISG assumes that, in the United States population, surname and geolocation are independent given a particular race or ethnicity. This assumption appears to contradict conventional wisdom, including the observation that people often live near their relatives (who share the same surname and race). We demonstrate that this independence assumption results in systematic biases for minority subpopulations, and we introduce a simple alternative to BISG. Our raking-based prediction algorithm offers a significant improvement over BISG, and we validate it on states' voter registration lists that contain self-identified race/ethnicity. The proposed improvement and the inaccuracies of BISG generalize to applications in election law, health care, finance, tech, law enforcement, and many other fields.
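To make the independence assumption concrete, the following is a minimal sketch of the standard BISG update. It is not the authors' raking-based method. If surname s and geolocation g are independent given race r, Bayes' rule gives P(r | s, g) proportional to P(r | s) * P(g | r), normalized over all categories. The function name `bisg_posterior`, the category labels, and the probabilities in the usage example are hypothetical illustrations, not census figures.

```python
from typing import Dict


def bisg_posterior(
    p_race_given_surname: Dict[str, float],
    p_geo_given_race: Dict[str, float],
) -> Dict[str, float]:
    """Sketch of the BISG posterior P(race | surname, geolocation).

    Assumes (as BISG does) that surname and geolocation are conditionally
    independent given race, so the posterior is proportional to
    P(race | surname) * P(geolocation | race).
    """
    # Unnormalized posterior: the product that conditional independence licenses.
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race.get(race, 0.0)
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("surname and geolocation inputs give zero total probability")
    # Normalize so the posterior sums to 1 over the race/ethnicity categories.
    return {race: value / total for race, value in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical inputs for one surname and one geographic unit.
    surname_probs = {"white": 0.6, "black": 0.3, "hispanic": 0.1}
    geo_probs = {"white": 0.001, "black": 0.004, "hispanic": 0.002}
    print(bisg_posterior(surname_probs, geo_probs))
```

Because the two sources of evidence are simply multiplied, any dependence between surname and location within a race group (for example, relatives who share a surname clustering in the same neighborhood) is ignored. The abstract identifies this as the source of BISG's systematic biases.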