Fairness and Cost Constrained Privacy-Aware Record Linkage

06/30/2022
by   Nan Wu, et al.
0

Record linkage algorithms match and link records from different databases that refer to the same real-world entity based on direct and/or quasi-identifiers, such as name, address, age, and gender, available in the records. Since these identifiers generally contain personal identifiable information (PII) about the entities, record linkage algorithms need to be developed with privacy constraints. Known as privacy-preserving record linkage (PPRL), many research studies have been conducted to perform the linkage on encoded and/or encrypted identifiers. Differential privacy (DP) combined with computationally efficient encoding methods, e.g. Bloom filter encoding, has been used to develop PPRL with provable privacy guarantees. The standard DP notion does not however address other constraints, among which the most important ones are fairness-bias and cost of linkage in terms of number of record pairs to be compared. In this work, we propose new notions of fairness-constrained DP and fairness and cost-constrained DP for PPRL and develop a framework for PPRL with these new notions of DP combined with Bloom filter encoding. We provide theoretical proofs for the new DP notions for fairness and cost-constrained PPRL and experimentally evaluate them on two datasets containing person-specific data. Our experimental results show that with these new notions of DP, PPRL with better performance (compared to the standard DP notion for PPRL) can be achieved with regard to privacy, cost and fairness constraints.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset