The State of Profanity Obfuscation in Natural Language Processing
Work on hate speech has made the consideration of rude and harmful examples in scientific publications inevitable. This raises various problems, such as whether or not to obscure profanities. While science must accurately disclose what it does, the unwarranted spread of hate speech is harmful to readers, and increases its internet frequency. While maintaining publications' professional appearance, obfuscating profanities makes it challenging to evaluate the content, especially for non-native speakers. Surveying 150 ACL papers, we discovered that obfuscation is usually employed for English but not other languages, and even so quite uneven. We discuss the problems with obfuscation and suggest a multilingual community resource called PrOf that has a Python module to standardize profanity obfuscation processes. We believe PrOf can help scientific publication policies to make hate speech work accessible and comparable, irrespective of language.
READ FULL TEXT