DPCrowd: Privacy-preserving and Communication-efficient Decentralized Statistical Estimation for Real-time Crowd-sourced Data

188 0 0.0 ( 0 )

Download Cite

Added by Xuebin Ren Dr

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Xuebin Ren - Chia-Mu Yu - Wei Yu

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In Internet of Things (IoT) driven smart-world systems, real-time crowd-sourced databases from multiple distributed servers can be aggregated to extract dynamic statistics from a larger population, thus providing more reliable knowledge for our society. Particularly, multiple distributed servers in a decentralized network can realize real-time collaborative statistical estimation by disseminating statistics from their separate databases. Despite no raw data sharing, the real-time statistics could still expose the data privacy of crowd-sourcing participants. For mitigating the privacy concern, while traditional differential privacy (DP) mechanism can be simply implemented to perturb the statistics in each timestamp and independently for each dimension, this may suffer a great utility loss from the real-time and multi-dimensional crowd-sourced data. Also, the real-time broadcasting would bring significant overheads in the whole network. To tackle the issues, we propose a novel privacy-preserving and communication-efficient decentralized statistical estimation algorithm (DPCrowd), which only requires intermittently sharing the DP protected parameters with one-hop neighbors by exploiting the temporal correlations in real-time crowd-sourced data. Then, with further consideration of spatial correlations, we develop an enhanced algorithm, DPCrowd+, to deal with multi-dimensional infinite crowd-data streams. Extensive experiments on several datasets demonstrate that our proposed schemes DPCrowd and DPCrowd+ can significantly outperform existing schemes in providing accurate and consensus estimation with rigorous privacy protection and great communication efficiency.

rate research

Real-Time Privacy-Preserving Data Release for Smart Meters

93 - Mohammadhadi Shateri , Francisco Messina , Pablo Piantanida 2019

Smart Meters (SMs) are a fundamental component of smart grids, but they carry sensitive information about users such as occupancy status of houses and therefore, they have raised serious concerns about leakage of consumers private information. In particular, we focus on real-time privacy threats, i.e., potential attackers that try to infer sensitive data from SMs reported data in an online fashion. We adopt an information-theoretic privacy measure and show that it effectively limits the performance of any real-time attacker. Using this privacy measure, we propose a general formulation to design a privatization mechanism that can provide a target level of privacy by adding a minimal amount of distortion to the SMs measurements. On the other hand, to cope with different applications, a flexible distortion measure is considered. This formulation leads to a general loss function, which is optimized using a deep learning adversarial framework, where two neural networks $-$ referred to as the releaser and the adversary $-$ are trained with opposite goals. An exhaustive empirical study is then performed to validate the performances of the proposed approach for the occupancy detection privacy problem, assuming the attacker disposes of either limited or full access to the training dataset.

Signal Processing Cryptography and Security Information Theory

Road Grade Estimation Using Crowd-Sourced Smartphone Data

68 - Abhishek Gupta 2020

Estimates of road grade/slope can add another dimension of information to existing 2D digital road maps. Integration of road grade information will widen the scope of digital maps applications, which is primarily used for navigation, by enabling driving safety and efficiency applications such as Advanced Driver Assistance Systems (ADAS), eco-driving, etc. The huge scale and dynamic nature of road networks make sensing road grade a challenging task. Traditional methods oftentimes suffer from limited scalability and update frequency, as well as poor sensing accuracy. To overcome these problems, we propose a cost-effective and scalable road grade estimation framework using sensor data from smartphones. Based on our understanding of the error characteristics of smartphone sensors, we intelligently combine data from accelerometer, gyroscope and vehicle speed data from OBD-II/smartphones GPS to estimate road grade. To improve accuracy and robustness of the system, the estimations of road grade from multiple sources/vehicles are crowd-sourced to compensate for the effects of varying quality of sensor data from different sources. Extensive experimental evaluation on a test route of ~9km demonstrates the superior performance of our proposed method, achieving $5times$ improvement on road grade estimation accuracy over baselines, with 90% of errors below 0.3$^circ$.

Signal Processing

Communication-Efficient Accurate Statistical Estimation

152 - Jianqing Fan , Yongyi Guo , Kaizheng Wang 2019

When the data are stored in a distributed manner, direct application of traditional statistical inference procedures is often prohibitive due to communication cost and privacy concerns. This paper develops and investigates two Communication-Efficient Accurate Statistical Estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts aggregated information to node machines for new updates. The algorithms adapt to the similarity among loss functions on node machines, and converge rapidly when each node machine has large enough sample size. Moreover, they do not require good initialization and enjoy linear converge guarantees under general conditions. The contraction rate of optimization errors is presented explicitly, with dependence on the local sample size unveiled. In addition, the improved statistical accuracy per iteration is derived. By regarding the proposed method as a multi-step statistical estimator, we show that statistical efficiency can be achieved in finite steps in typical statistical applications. In addition, we give the conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.

Machine Learning Machine Learning Optimization and Control

Communication-efficient Decentralized Machine Learning over Heterogeneous Networks

100 - Pan Zhou , Qian Lin , Dumitrel Loghin 2020

In the last few years, distributed machine learning has been usually executed over heterogeneous networks such as a local area network within a multi-tenant cluster or a wide area network connecting data centers and edge clusters. In these heterogeneous networks, the link speeds among worker nodes vary significantly, making it challenging for state-of-the-art machine learning approaches to perform efficient training. Both centralized and decentralized training approaches suffer from low-speed links. In this paper, we propose a decentralized approach, namely NetMax, that enables worker nodes to communicate via high-speed links and, thus, significantly speed up the training process. NetMax possesses the following novel features. First, it consists of a novel consensus algorithm that allows worker nodes to train model copies on their local dataset asynchronously and exchange information via peer-to-peer communication to synchronize their local copies, instead of a central master node (i.e., parameter server). Second, each worker node selects one peer randomly with a fine-tuned probability to exchange information per iteration. In particular, peers with high-speed links are selected with high probability. Third, the probabilities of selecting peers are designed to minimize the total convergence time. Moreover, we mathematically prove the convergence of NetMax. We evaluate NetMax on heterogeneous cluster networks and show that it achieves speedups of 3.7X, 3.4X, and 1.9X in comparison with the state-of-the-art decentralized training approaches Prague, Allreduce-SGD, and AD-PSGD, respectively.

Distributed Parallel and Cluster Computing

Decentralized Privacy-Preserving Proximity Tracing

80 - Carmela Troncoso , Mathias Payer , Jean-Pierre Hubaux 2020

This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chain. The system aims to minimise privacy and security risks for individuals and communities and guarantee the highest level of data protection. The goal of our proximity tracing system is to determine who has been in close physical proximity to a COVID-19 positive person and thus exposed to the virus, without revealing the contacts identity or where the contact occurred. To achieve this goal, users run a smartphone app that continually broadcasts an ephemeral, pseudo-random ID representing the users phone and also records the pseudo-random IDs observed from smartphones in close proximity. When a patient is diagnosed with COVID-19, she can upload pseudo-random IDs previously broadcast from her phone to a central server. Prior to the upload, all data remains exclusively on the users phone. Other users apps can use data from the server to locally estimate whether the devices owner was exposed to the virus through close-range physical proximity to a COVID-19 positive person who has uploaded their data. In case the app detects a high risk, it will inform the user.

Cryptography and Security Computers and Society