Abstract:
Large corporations, government entities, and institutions such as hospitals and census bureaus routinely collect our personal and sensitive information to provide various services. A key technological challenge is designing algorithms for these services that provide useful results while simultaneously maintaining the privacy of the individuals whose data are being shared. Differential privacy (DP) is a cryptographically motivated and mathematically rigorous approach for addressing this challenge. Under DP, a randomized algorithm provides privacy guarantees by approximating the desired functionality, leading to a privacy–utility trade-off. Strong (pure DP) privacy guarantees are often costly in terms of utility. Motivated by the need for a more efficient mechanism with a better privacy–utility trade-off, we propose Gaussian FM, an improvement to the functional mechanism (FM) that offers higher utility at the expense of a weakened (approximate) DP guarantee. We show analytically and empirically that the proposed Gaussian FM algorithm can offer orders of magnitude smaller noise than the existing FM algorithms. For a feature vector of size 101, Gaussian FM yields only 1/10^3 of the noise standard deviation compared to the existing FM. Furthermore, we show how Gaussian FM can exploit a correlated noise generation protocol, CAPE, in decentralized-data settings to achieve the same noise variance as its centralized counterparts, and propose capeFM. As opposed to conventional decentralized differential privacy schemes, capeFM can offer the same level of utility as that of the centralized-data settings without compromising privacy for a range of parameter choices. We empirically show that for a privacy budget ε as small as 10^-1, with probability at least (1 − 10^-5), our proposed Gaussian FM and capeFM can achieve utility close to that of the non-private algorithms and outperform existing state-of-the-art approaches on synthetic and real datasets.