Scalable Discovery and Continuous Inventory of Personal Data at Rest in Cloud Native Systems
Cloud native systems process large amounts of personal data across numerous and possibly multi-paradigmatic data stores (e.g., relational and non-relational databases). From a privacy engineering perspective, a core challenge is to keep track of the exact locations where personal data is stored, as required by regulatory frameworks such as the European General Data Protection Regulation. In this paper, we present Teiresias, comprising i) a workflow pattern for scalable discovery of personal data at rest, and ii) a cloud native system architecture and open source prototype implementation of said workflow pattern. We thereby enable a continuous inventory of personal data that provides transparency and accountability, following DevOps/DevPrivOps practices. In particular, we cover version-controlled Infrastructure as Code definitions, cloud-based storage, and the integration of the process into CI/CD pipelines. Finally, we provide iii) a comparative performance evaluation demonstrating both appropriate execution times for real-world settings and a promising personal data detection accuracy that outperforms existing proprietary tools in public clouds.
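To make the idea of a personal data inventory more concrete, the following is a minimal sketch in Python. It is not the Teiresias implementation: the detector patterns, the `Finding` record, and the functions `scan_store` and `build_inventory` are hypothetical names introduced purely for illustration, and the simple regular expressions merely stand in for whatever detection technique a real system would use. The sketch assumes each data store can be sampled as a list of flat records.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors (assumption): simple regexes standing in for a
# real personal data classifier.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


@dataclass(frozen=True)
class Finding:
    """One inventory entry: where a category of personal data was seen."""
    store: str
    field: str
    category: str


def scan_store(store, records):
    """Return the set of findings for the sampled records of one store."""
    findings = set()
    for record in records:
        for field, value in record.items():
            if not isinstance(value, str):
                continue  # this sketch only inspects string values
            for category, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.add(Finding(store, field, category))
    return findings


def build_inventory(stores):
    """Scan every store and return a deterministically sorted inventory."""
    inventory = []
    for name, records in stores.items():
        inventory.extend(scan_store(name, records))
    return sorted(inventory, key=lambda f: (f.store, f.field, f.category))


# Hypothetical sample data (assumption): two stores with one record each.
stores = {
    "users_db": [{"id": 1, "contact": "alice@example.org", "note": "vip"}],
    "access_logs": [{"line": "GET /index from 192.0.2.10"}],
}
inventory = build_inventory(stores)
for finding in inventory:
    print(finding.store, finding.field, finding.category)
```

Running such a scan on every pipeline execution and committing the sorted result to version control would be one way to approximate a continuous inventory, since any change in where personal data is found then shows up as a diff.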