An Empirical Comparison of Bias Reduction Methods on Real-World Problems in High-Stakes Policy Settings

05/13/2021

Applications of machine learning (ML) to high-stakes policy settings, such as education, criminal justice, healthcare, and social service delivery, have grown rapidly in recent years, sparking important conversations about how to ensure fair outcomes from these systems. The machine learning research community has responded with a wide array of proposed fairness-enhancing strategies for ML models, but despite the large number of methods developed, little empirical work has evaluated them in real-world settings. Here, we seek to fill this gap by investigating the performance of several methods that operate at different points in the ML pipeline across four real-world public policy and social good problems. Across these problems, we find wide variability and inconsistency in the ability of many of these methods to improve model fairness. Post-processing by choosing group-specific score thresholds, however, consistently removes disparities, a finding with important implications for both the ML research community and practitioners deploying machine learning to inform consequential policy decisions.
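To make the idea of group-specific score thresholds concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that the fairness target is equal recall across groups and that each group has at least one positive example. The helper names `group_thresholds` and `recall_by_group` and the toy data are hypothetical and are not taken from the paper. Each group gets its own cutoff, chosen so that its recall reaches the same target.

```python
import math
from collections import defaultdict


def group_thresholds(scores, labels, groups, target_recall):
    """Hypothetical helper: pick one score threshold per group so that each
    group's recall (share of true positives flagged) reaches target_recall.
    Assumes 0/1 labels and at least one positive per group."""
    pos_scores = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        if y == 1:
            pos_scores[g].append(s)
    thresholds = {}
    for g, ps in pos_scores.items():
        ps.sort(reverse=True)
        # Number of positives that must be flagged; the small epsilon guards
        # against floating-point products like 0.7 * 10 = 7.000000000000001.
        k = max(1, math.ceil(target_recall * len(ps) - 1e-9))
        thresholds[g] = ps[k - 1]
    return thresholds


def recall_by_group(scores, labels, groups, thresholds):
    """Recall per group when flagging every score >= that group's threshold."""
    hits, totals = defaultdict(int), defaultdict(int)
    for s, y, g in zip(scores, labels, groups):
        if y == 1:
            totals[g] += 1
            if s >= thresholds[g]:
                hits[g] += 1
    return {g: hits[g] / totals[g] for g in totals}


# Toy data (invented for illustration): group "b" receives systematically lower scores.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
groups = ["a"] * 5 + ["b"] * 5

th = group_thresholds(scores, labels, groups, target_recall=0.5)
print(th)  # → {'a': 0.9, 'b': 0.4}
print(recall_by_group(scores, labels, groups, th))  # → {'a': 0.5, 'b': 0.5}
# A single shared threshold of 0.9 leaves group "b" with zero recall.
print(recall_by_group(scores, labels, groups, {"a": 0.9, "b": 0.9}))  # → {'a': 0.5, 'b': 0.0}
```

With one shared cutoff, the lower-scored group is flagged far less often. Per-group cutoffs equalize recall by construction. This simple mechanism illustrates why threshold post-processing can remove measured disparities regardless of how the underlying model was trained.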
