Protecting Global Properties of Datasets with Distribution Privacy Mechanisms
Alongside the rapid development of data collection and analysis techniques in recent years, there is increasingly an emphasis on the need to address information leakage associated with such usage of data. To this end, much work in the privacy literature is devoted to the protection of individual users and contributors of data. However, some situations instead require a different notion of data confidentiality involving global properties aggregated over the records of a dataset. Such notions of information protection are particularly applicable for business and organization data, where global properties may reflect trade secrets, or demographic data, which can be harmful if mishandled. Recent work on property inference attacks furthermore shows how data analysis algorithms can be susceptible to leaking these global properties of data, highlighting the importance of developing mechanisms that can protect such information. In this work, we demonstrate how a distribution privacy framework can be applied to formalize the problem of protecting global properties of datasets. Given this framework, we investigate several mechanisms and their tradeoffs for providing this notion of data confidentiality. We analyze the theoretical protection guarantees offered by these mechanisms under various data assumptions, then implement and empirically evaluate these mechanisms for several data analysis tasks. The results of our experiments show that our mechanisms can indeed reduce the effectiveness of practical property inference attacks while providing utility substantially greater than a crude group differential privacy baseline. Our work thus provides groundwork for theoretically supported mechanisms for protecting global properties of datasets.
READ FULL TEXT