Performance Disparities Between Accents in Automatic Speech Recognition
Automatic speech recognition (ASR) services are ubiquitous, transforming speech into text for systems like Amazon's Alexa, Google's Assistant, and Microsoft's Cortana. However, researchers have identified biases in ASR performance between particular English language accents by racial group and by nationality. In this paper, we expand this discussion both qualitatively, by relating it to historical precedent, and quantitatively, through a large-scale audit. Standardization of language and the use of language to maintain global and political power have played an important role in history; we review this history to show its parallels with the ways in which ASR services act on English language speakers today. Then, using a large, global data set of speech from The Speech Accent Archive, which includes over 2,700 speakers of English born in 171 different countries, we perform an international audit of some of the most popular English ASR services. We show that performance disparities exist as a function of whether or not a speaker's first language is English and that, even when controlling for multiple linguistic covariates, these disparities have a statistically significant relationship to the political alignment of the speaker's birth country with respect to the United States' geopolitical power.