Privacy Policies Across the Ages: Content and Readability of Privacy Policies 1996–2021
It is well-known that most users do not read privacy policies, but almost all users tick the box to agree with them. In this paper, we analyze the 25-year history of privacy policies using methods from transparency research, machine learning, and natural language processing. Specifically, we collect a large-scale longitudinal corpus of privacy policies from 1996 to 2021 and analyze the length and readability of privacy policies as well as their content in terms of the data practices they describe, the rights they grant to users, and the rights they reserve for their organizations. We pay particular attention to changes in response to recent privacy regulations such as the GDPR and CCPA. Our results show that policies are getting longer and harder to read, especially after new regulations take effect, and we find a range of concerning data practices. Our results allow us to speculate why privacy policies are rarely read and propose changes that would make privacy policies serve their readers instead of their writers.
READ FULL TEXT