
Privacy for Rescue: A New Testimony Why Privacy is Vulnerable In Deep Models

Published by Ruiyuan Gao
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





The huge computation demands of deep learning models and the limited computation resources of edge devices call for cooperation between the edge device and the cloud service by splitting a deep model into two halves. However, transferring the intermediate results of the partial models between the edge device and the cloud service makes user privacy vulnerable, since an attacker can intercept the intermediate results and extract private information from them. Existing research relies on metrics that are either impractical or insufficient to measure the effectiveness of privacy protection methods in this scenario, especially from the perspective of a single user. In this paper, we first present a formal definition of the privacy protection problem in an edge-cloud system running DNN models. Then, we analyze the state-of-the-art methods and point out their drawbacks, especially in the evaluation metrics such as Mutual Information (MI). In addition, we perform several experiments to demonstrate that although existing methods perform well under MI, they are not effective enough to protect the privacy of a single user. To address the drawbacks of the evaluation metrics, we propose two new metrics that measure the effectiveness of privacy protection methods more accurately. Finally, we highlight several potential research directions to encourage future efforts addressing the privacy protection problem.
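To make the threat model concrete, here is a minimal sketch (assuming PyTorch and an arbitrary split point; the layer sizes and split are illustrative, not the paper's setup) of the edge-cloud model split: the tensor handed across the network boundary is exactly the intermediate result an eavesdropper could intercept.

```python
# Minimal sketch (PyTorch assumed) of the edge-cloud split in the threat model.
import torch
import torch.nn as nn

# The first half of the model runs on the edge device, the second in the cloud.
edge_half = nn.Sequential(              # runs on the user's device
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
cloud_half = nn.Sequential(             # runs on the cloud service
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),
)

x = torch.randn(1, 3, 32, 32)           # private user input (e.g., an image)
intermediate = edge_half(x)             # this tensor is sent over the network
# An attacker who intercepts `intermediate` can try to reconstruct `x` from it;
# this is the leakage the paper formalizes and proposes new metrics to measure.
logits = cloud_half(intermediate)
print(intermediate.shape, logits.shape)
```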


Read also

In this work, we formally study the membership privacy risk of generative models and propose a membership privacy estimation framework. We formulate the membership privacy risk as a statistical divergence between training samples and hold-out samples, and propose sample-based methods to estimate this divergence. Unlike previous works, our proposed metric and estimators make realistic and flexible assumptions. First, we offer a generalizable metric as an alternative to accuracy for imbalanced datasets. Second, our estimators can estimate the membership privacy risk given any scalar- or vector-valued attribute of the learned model, whereas prior work requires access to specific attributes. This allows our framework to provide data-driven certificates for trained generative models in terms of membership privacy risk. Finally, we show a connection to differential privacy, which allows our proposed estimators to be used to understand the privacy budget $\epsilon$ needed for differentially private generative models. We demonstrate the utility of our framework through experiments on different generative models using various model attributes, yielding new insights about membership leakage and model vulnerabilities.
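As a rough illustration of the idea (a toy sketch with synthetic, assumed loss distributions, not the authors' estimator), the snippet below measures membership risk from a single scalar attribute by comparing its distribution on training versus hold-out samples, and derives a simple threshold attack from the same attribute.

```python
# Toy sketch (NumPy/SciPy assumed): membership privacy risk as a sample-based
# divergence between the member and non-member distributions of a scalar
# model attribute (here, per-sample loss). Distributions are synthetic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Hypothetical per-sample losses: members tend to have lower loss.
losses_train = rng.gamma(shape=2.0, scale=0.5, size=1000)    # members
losses_heldout = rng.gamma(shape=2.0, scale=0.8, size=1000)  # non-members

# Kolmogorov-Smirnov statistic as a sample-based divergence between the sets.
ks_stat = ks_2samp(losses_train, losses_heldout).statistic
print(f"KS divergence between member/non-member losses: {ks_stat:.3f}")

# A simple threshold membership attack built from the same attribute.
threshold = np.median(np.concatenate([losses_train, losses_heldout]))
tpr = np.mean(losses_train < threshold)      # members guessed as members
fpr = np.mean(losses_heldout < threshold)    # non-members guessed as members
print(f"threshold attack advantage (TPR - FPR): {tpr - fpr:.3f}")
```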
Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies vertical federated learning, which tackles the scenarios where (i) collaborating organizations own data of the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propose Pivot, a novel solution for privacy-preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than what the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise $m-1$ out of $m$ clients. We further identify two privacy leakages when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision tree (GBDT), by treating single decision trees as building blocks. Theoretical and experimental analysis suggests that Pivot is efficient for the privacy achieved.
In the shuffle model of differential privacy, data-holding users send randomized messages to a secure shuffler, the shuffler permutes the messages, and the resulting collection of messages must be differentially private with regard to user data. In the pan-private model, an algorithm processes a stream of data while maintaining an internal state that is differentially private with regard to the stream data. We give evidence connecting these two apparently different models. Our results focus on robustly shuffle private protocols, whose privacy guarantees are not greatly affected by malicious users. First, we give robustly shuffle private protocols and upper bounds for counting distinct elements and uniformity testing. Second, we use pan-private lower bounds to prove robustly shuffle private lower bounds for both problems. Focusing on the dependence on the domain size $k$, we find that robust approximate shuffle privacy and approximate pan-privacy have additive error $\Theta(\sqrt{k})$ for counting distinct elements. For uniformity testing, we give a robust approximate shuffle private protocol with sample complexity $\tilde{O}(k^{2/3})$ and show that an $\Omega(k^{2/3})$ dependence is necessary for any robust pure shuffle private tester. Finally, we show that this connection is useful in both directions: we give a pan-private adaptation of recent work on shuffle private histograms and use it to recover further separations between pan-privacy and interactive local privacy.
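For intuition about the shuffle model itself (not the robust protocols or bounds in the work above), here is a toy sketch under assumed parameters: each user applies randomized response to a private bit, the shuffler only permutes the messages, and the analyzer debiases the shuffled counts.

```python
# Toy sketch of the shuffle model of differential privacy (assumed parameters,
# binary data): local randomization, then a shuffler that permutes messages,
# then an analyzer that debiases the aggregate.
import random

P_KEEP = 0.75  # probability of reporting the true bit (randomized response)

def randomize(bit):
    """Randomized response: report the true bit with probability P_KEEP."""
    return bit if random.random() < P_KEEP else 1 - bit

n = 10_000
true_bits = [int(random.random() < 0.3) for _ in range(n)]  # private user data
messages = [randomize(b) for b in true_bits]                # local randomization
random.shuffle(messages)                                    # the shuffler permutes

# The analyzer sees only shuffled, randomized messages and debiases the mean.
observed = sum(messages) / n
estimate = (observed - (1 - P_KEEP)) / (2 * P_KEEP - 1)
print(f"true fraction ~0.300, estimate {estimate:.3f}")
```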
Embedded systems demand on-device processing of data using Neural Networks (NNs) while conforming to memory, power, and computation constraints, leading to an efficiency-accuracy tradeoff. To bring NNs to edge devices, several optimizations such as model compression through pruning, quantization, and off-the-shelf architectures with efficient design have been extensively adopted. When deployed in real-world privacy-sensitive applications, these algorithms are required to resist inference attacks in order to protect the privacy of users' training data. However, resistance against inference attacks is not accounted for when designing NN models for IoT. In this work, we analyse the three-dimensional privacy-accuracy-efficiency tradeoff in NNs for IoT devices and propose the Gecko training methodology, in which we explicitly add resistance to private inferences as a design objective. We optimize the inference-time memory, computation, and power constraints of embedded devices as a criterion for designing the NN architecture while also preserving privacy. We choose quantization as the design choice for highly efficient and private models. This choice is driven by the observation that compressed models leak more information compared to baseline models, while off-the-shelf efficient architectures exhibit a poor efficiency-privacy tradeoff. We show that models trained using the Gecko methodology are comparable to prior defences against black-box membership attacks in terms of accuracy and privacy while providing efficiency.
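To make the quantization design choice concrete, the sketch below shows the efficiency side of the tradeoff with a generic symmetric 8-bit post-training weight quantization in NumPy; this is an assumed, illustrative scheme, not the Gecko training pipeline, whose contribution is adding privacy resistance on top of such compression.

```python
# Minimal sketch (NumPy assumed) of symmetric 8-bit post-training weight
# quantization -- the kind of compression whose privacy side effects are
# studied above; a generic illustration, not Gecko itself.
import numpy as np

def quantize_int8(weights):
    """Map float32 weights to int8 with a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 128).astype(np.float32)   # hypothetical layer weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"storage: {w.nbytes} bytes -> {q.nbytes} bytes")           # 4x smaller
print(f"mean abs quantization error: {np.abs(w - w_hat).mean():.5f}")
```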
In a federated learning system, parameter gradients are shared among participants and the central modulator, while the original data never leave their protected source domain. However, the gradient itself might carry enough information for precise inference of the original data. By reporting their parameter gradients to the central server, clients expose their datasets to inference attacks from adversaries. In this paper, we propose a quantitative metric based on mutual information for clients to evaluate the potential risk of information leakage in their gradients. Mutual information has received increasing attention in the machine learning and data mining community over the past few years, but existing mutual information estimation methods cannot handle high-dimensional variables. In this paper, we propose a novel method to approximate the mutual information between the high-dimensional gradients and the batched input data. Experimental results show that the proposed metric reliably reflects the extent of information leakage in federated learning. In addition, using the proposed metric, we investigate the factors that influence the risk level. We show that the risk of information leakage is related to the status of the task model as well as the inherent data distribution.
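As a small illustration of the underlying idea (a toy sketch that uses an off-the-shelf k-NN mutual information estimator on a one-dimensional gradient projection, not the high-dimensional estimator proposed above; the model, batch sizes, and chosen attributes are assumptions), one can probe how much a reported gradient statistic reveals about an attribute of the private batch:

```python
# Toy sketch (PyTorch + scikit-learn assumed): estimate mutual information
# between a low-dimensional projection of a client's gradient and an attribute
# of the private input batch. Illustrative only; the work above targets the
# full high-dimensional gradient.
import torch
import torch.nn as nn
import numpy as np
from sklearn.feature_selection import mutual_info_regression

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

grad_feature, input_attribute = [], []
for _ in range(500):
    x = torch.randn(4, 10)                      # one private client batch
    y = torch.randn(4, 1)
    model.zero_grad()
    loss_fn(model(x), y).backward()
    g = model.weight.grad.flatten()
    grad_feature.append(g.norm().item())        # 1-D projection of the gradient
    input_attribute.append(x.mean().item())     # attribute of the private batch

mi = mutual_info_regression(
    np.array(grad_feature).reshape(-1, 1), np.array(input_attribute)
)[0]
print(f"estimated MI between gradient norm and batch mean: {mi:.3f} nats")
```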

