Privacy-Preserving Machine Learning: Methods, Challenges and Directions

259 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Runhua Xu

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Runhua Xu - Nathalie Baracaldo - James Joshi

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model, especially, emerging deep neural network model, relies on a large volume of training data and high-powered computational resources. The need for a vast volume of available data raises serious privacy concerns because of the risk of leakage of highly privacy-sensitive information and the evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data. Furthermore, a trained ML model may also be vulnerable to adversarial attacks such as membership/property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are crucial and have attracted increasing research interest from academia and industry. More and more efforts of PPML are proposed via integrating privacy-preserving techniques into ML algorithms, fusing privacy-preserving approaches into ML pipeline, or designing various privacy-preserving architectures for existing ML systems. In particular, existing PPML arts cross-cut ML, system, security, and privacy; hence, there is a critical need to understand state-of-art studies, related challenges, and a roadmap for future research. This paper systematically reviews and summarizes existing privacy-preserving approaches and proposes a PGU model to guide evaluation for various PPML solutions through elaborately decomposing their privacy-preserving functionalities. The PGU model is designed as the triad of Phase, Guarantee, and technical Utility. Furthermore, we also discuss the unique characteristics and challenges of PPML and outline possible directions of future work that benefit a wide range of research communities among ML, distributed systems, security, and privacy areas.

قيم البحث

143 - Runhua Xu , Nathalie Baracaldo , Yi Zhou 2021

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are only available to a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate the applicability for multiple types of ML models and show a reduction of 10%-70% of training time and 80% to 90% in data transfer with respect to the state-of-the-art approaches.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Federated Learning and Differential Privacy: Software tools analysis, the Sherpa.ai FL framework and methodological guidelines for preserving data privacy

159 - Nuria Rodriguez-Barroso , Goran Stipcich , Daniel Jimenez-Lopez 2020

The high demand of artificial intelligence services at the edges that also preserve data privacy has pushed the research on novel machine learning paradigms that fit those requirements. Federated learning has the ambition to protect data privacy thro ugh distributed learning methods that keep the data in their data silos. Likewise, differential privacy attains to improve the protection of data privacy by measuring the privacy loss in the communication among the elements of federated learning. The prospective matching of federated learning and differential privacy to the challenges of data privacy protection has caused the release of several software tools that support their functionalities, but they lack of the needed unified vision for those techniques, and a methodological workflow that support their use. Hence, we present the Sherpa.ai Federated Learning framework that is built upon an holistic view of federated learning and differential privacy. It results from the study of how to adapt the machine learning paradigm to federated learning, and the definition of methodological guidelines for developing artificial intelligence services based on federated learning and differential privacy. We show how to follow the methodological guidelines with the Sherpa.ai Federated Learning framework by means of a classification and a regression use cases.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

AirMixML: Over-the-Air Data Mixup for Inherently Privacy-Preserving Edge Machine Learning

135 - Yusuke Koda , Jihong Park , Mehdi Bennis 2021

Wireless channels can be inherently privacy-preserving by distorting the received signals due to channel noise, and superpositioning multiple signals over-the-air. By harnessing these natural distortions and superpositions by wireless channels, we pr opose a novel privacy-preserving machine learning (ML) framework at the network edge, coined over-the-air mixup ML (AirMixML). In AirMixML, multiple workers transmit analog-modulated signals of their private data samples to an edge server who trains an ML model using the received noisy-and superpositioned samples. AirMixML coincides with model training using mixup data augmentation achieving comparable accuracy to that with raw data samples. From a privacy perspective, AirMixML is a differentially private (DP) mechanism limiting the disclosure of each workers private sample information at the server, while the workers transmit power determines the privacy disclosure level. To this end, we develop a fractional channel-inversion power control (PC) method, {alpha}-Dirichlet mixup PC (DirMix({alpha})-PC), wherein for a given global power scaling factor after channel inversion, each workers local power contribution to the superpositioned signal is controlled by the Dirichlet dispersion ratio {alpha}. Mathematically, we derive a closed-form expression clarifying the relationship between the local and global PC factors to guarantee a target DP level. By simulations, we provide DirMix({alpha})-PC design guidelines to improve accuracy, privacy, and energy-efficiency. Finally, AirMixML with DirMix({alpha})-PC is shown to achieve reasonable accuracy compared to a privacy-violating baseline with neither superposition nor PC.

بنية الشبكات والإنترنت الذكاء الاصطناعي التشفير والأمن

Quantum machine learning with differential privacy

112 - William M Watkins , Samuel Yen-Chi Chen , Shinjae Yoo 2021

Quantum machine learning (QML) can complement the growing trend of using learned models for a myriad of classification tasks, from image recognition to natural speech processing. A quantum advantage arises due to the intractability of quantum operati ons on a classical computer. Many datasets used in machine learning are crowd sourced or contain some private information. To the best of our knowledge, no current QML models are equipped with privacy-preserving features, which raises concerns as it is paramount that models do not expose sensitive information. Thus, privacy-preserving algorithms need to be implemented with QML. One solution is to make the machine learning algorithm differentially private, meaning the effect of a single data point on the training dataset is minimized. Differentially private machine learning models have been investigated, but differential privacy has yet to be studied in the context of QML. In this study, we develop a hybrid quantum-classical model that is trained to preserve privacy using differentially private optimization algorithm. This marks the first proof-of-principle demonstration of privacy-preserving QML. The experiments demonstrate that differentially private QML can protect user-sensitive information without diminishing model accuracy. Although the quantum model is simulated and tested on a classical computer, it demonstrates potential to be efficiently implemented on near-term quantum devices (noisy intermediate-scale quantum [NISQ]). The approachs success is illustrated via the classification of spatially classed two-dimensional datasets and a binary MNIST classification. This implementation of privacy-preserving QML will ensure confidentiality and accurate learning on NISQ technology.

فيزياء الكم الذكاء الاصطناعي التشفير والأمن

Privacy-preserving Machine Learning for Medical Image Classification

150 - Shreyansh Singh , K.K. Shukla 2021

With the rising use of Machine Learning (ML) and Deep Learning (DL) in various industries, the medical industry is also not far behind. A very simple yet extremely important use case of ML in this industry is for image classification. This is importa nt for doctors to help them detect certain diseases timely, thereby acting as an aid to reduce chances of human judgement error. However, when using automated systems like these, there is a privacy concern as well. Attackers should not be able to get access to the medical records and images of the patients. It is also required that the model be secure, and that the data that is sent to the model and the predictions that are received both should not be revealed to the model in clear text. In this study, we aim to solve these problems in the context of a medical image classification problem of detection of pneumonia by examining chest x-ray images.

التعلم الآلي التشفير والأمن