Tackling Racial Bias in Automated Online Hate Detection: Towards Fair and Accurate Classification of Hateful Online Users Using Geometric Deep Learning
Online hate is a growing concern on many social media platforms and other sites. To combat it, technology companies are increasingly identifying and sanctioning `hateful users' rather than simply moderating hateful content. Yet, most research in online hate detection to date has focused on hateful content. This paper examines how fairer and more accurate hateful user detection systems can be developed by incorporating social network information through geometric deep learning. Geometric deep learning dynamically learns information-rich network representations and can generalise to unseen nodes. This is essential for moving beyond manually engineered network features, which lack scalability and produce information-sparse network representations. This paper compares the accuracy of geometric deep learning with other techniques which either exclude network information or incorporate it through manual feature engineering (e.g., node2vec). It also evaluates the fairness of these techniques using the `predictive equality' criteria, comparing the false positive rates on a subset of 136 African-American users with 4836 other users. Geometric deep learning produces the most accurate and fairest classifier, with an AUC score of 90.8% on the entire dataset and a false positive rate of zero among the African-American subset for the best performing model. This highlights the benefits of more effectively incorporating social network features in automated hateful user detection. Such an approach is also easily operationalized for real-world content moderation as it has an efficient and scalable design.
READ FULL TEXT