DPCrowd: Privacy-preserving and Communication-efficient Decentralized Statistical Estimation for Real-time Crowd-sourced Data


Abstract in English

In Internet of Things (IoT)-driven smart-world systems, real-time crowd-sourced databases held by multiple distributed servers can be aggregated to extract dynamic statistics over a larger population, yielding more reliable knowledge for society. In particular, the servers in a decentralized network can perform real-time collaborative statistical estimation by disseminating statistics computed from their separate databases. Although no raw data is shared, the real-time statistics can still expose the private data of crowd-sourcing participants. A traditional differential privacy (DP) mechanism can mitigate this concern by perturbing the statistics at each timestamp and independently for each dimension, but on real-time, multi-dimensional crowd-sourced data this may incur a large utility loss. Real-time broadcasting also imposes significant communication overhead on the whole network. To address these issues, we propose DPCrowd, a novel privacy-preserving and communication-efficient decentralized statistical estimation algorithm that exploits the temporal correlations in real-time crowd-sourced data and therefore only needs to share DP-protected parameters with one-hop neighbors intermittently. Then, by further considering spatial correlations, we develop an enhanced algorithm, DPCrowd+, to handle multi-dimensional infinite crowd-data streams. Extensive experiments on several datasets demonstrate that DPCrowd and DPCrowd+ significantly outperform existing schemes in providing accurate and consensus estimation with rigorous privacy protection and high communication efficiency.
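To make the utility loss of the traditional baseline concrete, the following is a minimal sketch of per-timestamp, per-dimension Laplace perturbation. It is not DPCrowd or DPCrowd+; it only illustrates the naive approach the abstract argues against. The function names, the finite horizon, the even budget split under sequential composition, and the `sensitivity` parameter are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def per_step_noise_scale(total_epsilon, num_timestamps, num_dims, sensitivity):
    # Assumption: the total budget is split evenly over every
    # (timestamp, dimension) release via sequential composition.
    eps_each = total_epsilon / (num_timestamps * num_dims)
    return sensitivity / eps_each


def perturb_stream(stats, total_epsilon, sensitivity, rng):
    # stats[t][k] is the true statistic at timestamp t for dimension k.
    num_timestamps = len(stats)
    num_dims = len(stats[0])
    scale = per_step_noise_scale(total_epsilon, num_timestamps, num_dims, sensitivity)
    return [[x + laplace_noise(scale, rng) for x in row] for row in stats]


if __name__ == "__main__":
    rng = random.Random(0)
    true_stats = [[0.5, 0.2], [0.4, 0.3], [0.6, 0.1]]  # hypothetical values
    print(perturb_stream(true_stats, total_epsilon=1.0, sensitivity=0.01, rng=rng))
```

Under these assumptions the noise scale grows linearly with both the number of timestamps and the number of dimensions, which is why independent perturbation becomes impractical for long, multi-dimensional streams.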
