Principal Fairness: Removing Bias via Projections
Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis has recently received significant attention. We complement several recent papers in this line of research by introducing a general method to reduce bias in the data through random projections in a "fair" subspace. We apply this method to densest subgraph and k-means. For densest subgraph, our approach based on fair projections allows to recover both theoretically and empirically an almost optimal, fair, dense subgraph hidden in the input data. We also show that, under the small set expansion hypothesis, approximating this problem beyond a factor of 2 is NP-hard and we show a polynomial time algorithm with a matching approximation bound. We further apply our method to k-means. In a previous paper, Chierichetti et al. [NIPS 2017] showed that problems such as k-means can be approximated up to a constant factor while ensuring that none of two protected class (e.g., gender, ethnicity) is disparately impacted. We show that fair projections generalize the concept of fairlet introduced by Chierichietti et al. to any number of protected attributes and improve empirically the quality of the resulting clustering. We also present the first constant-factor approximation for an arbitrary number of protected attributes thus settling an open problem recently addressed in several works.
READ FULL TEXT