ترغب بنشر مسار تعليمي؟ اضغط هنا

Quantifying Privacy Leakage in Graph Embedding

111   0   0.0 ( 0 )
 نشر من قبل Antoine Boutet
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Graph embeddings have been proposed to map graph data to low dimensional space for downstream processing (e.g., node classification or link prediction). With the increasing collection of personal data, graph embeddings can be trained on private and sensitive data. For the first time, we quantify the privacy leakage in graph embeddings through three inference attacks targeting Graph Neural Networks. We propose a membership inference attack to infer whether a graph node corresponding to individual users data was member of the models training or not. We consider a blackbox setting where the adversary exploits the output prediction scores, and a whitebox setting where the adversary has also access to the released node embeddings. This attack provides an accuracy up to 28% (blackbox) 36% (whitebox) beyond random guess by exploiting the distinguishable footprint between train and test data records left by the graph embedding. We propose a Graph Reconstruction attack where the adversary aims to reconstruct the target graph given the corresponding graph embeddings. Here, the adversary can reconstruct the graph with more than 80% of accuracy and link inference between two nodes around 30% more confidence than a random guess. We then propose an attribute inference attack where the adversary aims to infer a sensitive attribute. We show that graph embeddings are strongly correlated to node attributes letting the adversary inferring sensitive information (e.g., gender or location).

قيم البحث

اقرأ أيضاً

154 - Sara Saeidian 2020
Machine learning models are known to memorize the unique properties of individual data points in a training set. This memorization capability can be exploited by several types of attacks to infer information about the training data, most notably, mem bership inference attacks. In this paper, we propose an approach based on information leakage for guaranteeing membership privacy. Specifically, we propose to use a conditional form of the notion of maximal leakage to quantify the information leaking about individual data entries in a dataset, i.e., the entrywise information leakage. We apply our privacy analysis to the Private Aggregation of Teacher Ensembles (PATE) framework for privacy-preserving classification of sensitive data and prove that the entrywise information leakage of its aggregation mechanism is Schur-concave when the injected noise has a log-concave probability density. The Schur-concavity of this leakage implies that increased consensus among teachers in labeling a query reduces its associated privacy cost. Finally, we derive upper bounds on the entrywise information leakage when the aggregation mechanism uses Laplace distributed noise.
Machine Learning models, extensively used for various multimedia applications, are offered to users as a blackbox service on the Cloud on a pay-per-query basis. Such blackbox models are commercially valuable to adversaries, making them vulnerable to extraction attacks to reverse engineer the proprietary model thereby violating the model privacy and Intellectual Property. Here, the adversary first extracts the model architecture or hyperparameters through side channel leakage, followed by stealing the functionality of the target model by training the reconstructed architecture on a synthetic dataset. While the attacks proposed in literature are empirical, there is a need for a theoretical framework to measure the information leaked under such extraction attacks. To this extent, in this work, we propose a novel probabilistic framework, Airavata, to estimate the information leakage in such model extraction attacks. This framework captures the fact that extracting the exact target model is difficult due to experimental uncertainty while inferring model hyperparameters and stochastic nature of training to steal the target model functionality. Specifically, we use Bayesian Networks to capture uncertainty in estimating the target model under various extraction attacks based on the subjective notion of probability. We validate the proposed framework under different adversary assumptions commonly adopted in literature to reason about the attack efficacy. This provides a practical tool to infer actionable details about extracting blackbox models and help identify the best attack combination which maximises the knowledge extracted (or information leaked) from the target model.
The number of smartphones, tablets, sensors, and connected wearable devices are rapidly increasing. Today, in many parts of the globe, the penetration of mobile computers has overtaken the number of traditional personal computers. This trend and the always-on nature of these devices have resulted in increasing concerns over the intrusive nature of these devices and the privacy risks that they impose on users or those associated with them. In this paper, we survey the current state of the art on mobile computing research, focusing on privacy risks and data leakage effects. We then discuss a number of methods, recommendations, and ongoing research in limiting the privacy leakages and associated risks by mobile computing.
In the federated learning system, parameter gradients are shared among participants and the central modulator, while the original data never leave their protected source domain. However, the gradient itself might carry enough information for precise inference of the original data. By reporting their parameter gradients to the central server, client datasets are exposed to inference attacks from adversaries. In this paper, we propose a quantitative metric based on mutual information for clients to evaluate the potential risk of information leakage in their gradients. Mutual information has received increasing attention in the machine learning and data mining community over the past few years. However, existing mutual information estimation methods cannot handle high-dimensional variables. In this paper, we propose a novel method to approximate the mutual information between the high-dimensional gradients and batched input data. Experimental results show that the proposed metric reliably reflect the extent of information leakage in federated learning. In addition, using the proposed metric, we investigate the influential factors of risk level. It is proven that, the risk of information leakage is related to the status of the task model, as well as the inherent data distribution.
Federated learning enables mutually distrusting participants to collaboratively learn a distributed machine learning model without revealing anything but the models output. Generic federated learning has been studied extensively, and several learning protocols, as well as open-source frameworks, have been developed. Yet, their over pursuit of computing efficiency and fast implementation might diminish the security and privacy guarantees of participants training data, about which little is known thus far. In this paper, we consider an honest-but-curious adversary who participants in training a distributed ML model, does not deviate from the defined learning protocol, but attempts to infer private training data from the legitimately received information. In this setting, we design and implement two practical attacks, reverse sum attack and reverse multiplication attack, neither of which will affect the accuracy of the learned model. By empirically studying the privacy leakage of two learning protocols, we show that our attacks are (1) effective - the adversary successfully steal the private training data, even when the intermediate outputs are encrypted to protect data privacy; (2) evasive - the adversarys malicious behavior does not deviate from the protocol specification and deteriorate any accuracy of the target model; and (3) easy - the adversary needs little prior knowledge about the data distribution of the target participant. We also experimentally show that the leaked information is as effective as the raw training data through training an alternative classifier on the leaked information. We further discuss potential countermeasures and their challenges, which we hope may lead to several promising research directions.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا