
Remember What You Want to Forget: Algorithms for Machine Unlearning

Posted by Ayush Sekhari
Publication date: 2021
Research field: Informatics Engineering
Paper language: English

We study the problem of unlearning datapoints from a learnt model. The learner first receives a dataset $S$ drawn i.i.d. from an unknown distribution, and outputs a model $\widehat{w}$ that performs well on unseen samples from the same distribution. However, at some point in the future, any training datapoint $z \in S$ can request to be unlearned, thus prompting the learner to modify its output model while still ensuring the same accuracy guarantees. We initiate a rigorous study of generalization in machine unlearning, where the goal is to perform well on previously unseen datapoints. Our focus is on both computational and storage complexity. For the setting of convex losses, we provide an unlearning algorithm that can unlearn up to $O(n/d^{1/4})$ samples, where $d$ is the problem dimension. In comparison, in general, differentially private learning (which implies unlearning) only guarantees deletion of $O(n/d^{1/2})$ samples. This demonstrates a novel separation between differential privacy and machine unlearning.
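
The abstract does not spell out the algorithm itself. As a rough illustration of the second-order idea commonly used for unlearning under convex losses (a Newton step on the reduced objective, plus Gaussian noise to mask the approximation error), here is a minimal NumPy sketch for ridge regression; the function names, the ridge setting, and the noise scale sigma are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def train_ridge(X, y, lam=1.0):
    """Regularized least squares: minimize 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)        # Hessian of the training objective
    w = np.linalg.solve(H, X.T @ y)
    return w, H

def unlearn_point(w, H, x, y_val, sigma=0.0, rng=None):
    """One Newton step that removes a single point's influence.

    After dropping (x, y_val), the reduced objective's gradient at the old
    minimizer w is -x*(x@w - y_val); a Newton step with the reduced Hessian
    lands exactly on the retrained solution for this quadratic loss. For
    general convex losses the step is only approximate, which is why
    Gaussian noise (sigma > 0) would be added to mask the residual error.
    """
    H_new = H - np.outer(x, x)           # Hessian without the deleted point
    grad = -x * (x @ w - y_val)          # reduced-loss gradient at w
    w_new = w - np.linalg.solve(H_new, grad)
    if sigma > 0:
        rng = rng or np.random.default_rng(0)
        w_new = w_new + sigma * rng.standard_normal(w_new.shape)
    return w_new, H_new

# Sanity check: unlearning the last point matches retraining without it.
rng = np.random.default_rng(1)
X, y = rng.standard_normal((50, 5)), rng.standard_normal(50)
w, H = train_ridge(X, y)
w_unlearned, _ = unlearn_point(w, H, X[-1], y[-1])
w_retrained, _ = train_ridge(X[:-1], y[:-1])
assert np.allclose(w_unlearned, w_retrained)
```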


Read also

Yang Liu, Zhuo Ma, Ximeng Liu (2020)
Nowadays, machine learning models, especially neural networks, are prevalent in many real-world applications. These models are trained on a one-way trip from user data: once users contribute their data, there is no way to withdraw it, and it is well known that a neural network memorizes its training data. This contradicts the "right to be forgotten" clause of the GDPR, potentially leading to law violations. To this end, machine unlearning has become a popular research topic, which allows users to eliminate memorization of their private data from a trained machine learning model. In this paper, we propose the first uniform metric, called forgetting rate, to measure the effectiveness of a machine unlearning method. It is based on the concept of membership inference and describes the transformation rate of the eliminated data from memorized to unknown after conducting unlearning. We also propose a novel unlearning method called Forsaken. It is superior to previous work in either utility or efficiency (when achieving the same forgetting rate). We benchmark Forsaken on eight standard datasets to evaluate its performance. The experimental results show that it can achieve a forgetting rate of more than 90% on average and cause less than 5% accuracy loss.
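
As a loose sketch of how a membership-inference-based forgetting rate could be computed, the snippet below uses a toy confidence-threshold attack (a deliberately simple stand-in for a real membership-inference model) and reports the fraction of deleted points that flip from member to non-member after unlearning; the threshold and function names are assumptions, not Forsaken's actual implementation.

```python
import numpy as np

def member_guess(confidences, threshold=0.9):
    """Toy membership-inference attack: guess 'member' when the model's
    confidence on a point exceeds a fixed threshold."""
    return confidences > threshold

def forgetting_rate(conf_before, conf_after, threshold=0.9):
    """Fraction of deleted points that move from 'memorized' (member)
    before unlearning to 'unknown' (non-member) after unlearning."""
    before = member_guess(conf_before, threshold)
    after = member_guess(conf_after, threshold)
    memorized = before.sum()
    if memorized == 0:
        return 0.0
    return float(np.sum(before & ~after)) / float(memorized)

# Model confidences on 5 deleted points, before and after unlearning.
before = np.array([0.99, 0.97, 0.95, 0.60, 0.93])
after  = np.array([0.55, 0.91, 0.40, 0.50, 0.30])
print(forgetting_rate(before, after))  # 3 of 4 memorized points flip -> 0.75
```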
Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used; the EU's Right To Be Forgotten regulation is an example of this effort. In this paper we initiate a framework studying what to do when it is no longer permissible to deploy models derived from specific user data. In particular, we formulate the problem of efficiently deleting individual data points from trained machine learning models. For many standard ML models, the only way to completely remove an individual's data is to retrain the whole model from scratch on the remaining data, which is often not computationally practical. We investigate algorithmic principles that enable efficient data deletion in ML. For the specific setting of k-means clustering, we propose two provably efficient deletion algorithms which achieve an average of over 100x improvement in deletion efficiency across 6 datasets, while producing clusters of comparable statistical quality to a canonical k-means++ baseline.
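
In a similar spirit to (though much simpler than) the paper's provably efficient algorithms, k-means deletion can be made cheap by storing per-cluster running sums and counts, so removing one point is an O(d) centroid update instead of a full re-clustering. A minimal sketch, with the class and method names assumed:

```python
import numpy as np

class DeletableKMeans:
    """Centroids kept as running sums/counts so deleting one point is an
    O(d) update rather than a full re-clustering (illustrative sketch; the
    paper's provably efficient algorithms are more involved)."""

    def __init__(self, X, labels, k):
        X = np.asarray(X, dtype=float)
        self.labels = dict(enumerate(labels))     # point id -> cluster id
        self.sums = np.zeros((k, X.shape[1]))
        self.counts = np.zeros(k, dtype=int)
        for i, c in self.labels.items():
            self.sums[c] += X[i]
            self.counts[c] += 1

    def centroid(self, c):
        return self.sums[c] / max(self.counts[c], 1)

    def delete(self, pid, x):
        """Remove one point's influence from its cluster's centroid."""
        c = self.labels.pop(pid)
        self.sums[c] -= x
        self.counts[c] -= 1

# Deleting point 1 updates cluster 0 exactly as if it had never been added.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
km = DeletableKMeans(X, labels=[0, 0, 1, 1], k=2)
km.delete(1, X[1])
print(km.centroid(0))   # [0. 0.]
```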
This paper proposes the Differential-Critic Generative Adversarial Network (DiCGAN) to learn the distribution of user-desired data when only part, instead of the entire dataset, possesses the desired property; it generates desired data that meets users' expectations and can assist in designing biological products with desired properties. Existing approaches select the desired samples first and train regular GANs on the selected samples to derive the user-desired data distribution. However, the selection of the desired data relies on an expert criterion and supervision over the entire dataset. DiCGAN introduces a differential critic that can learn the preference direction from pairwise preferences, which are amateur knowledge and can be defined on part of the training data. The resulting critic guides the generation of the desired data instead of the whole data. Specifically, apart from the Wasserstein GAN loss, a ranking loss over the pairwise preferences is defined on the critic. It endows the difference of critic values between each pair of samples with the pairwise preference relation: a higher critic value indicates that the sample is preferred by the user. Thus, training the generative model for higher critic values encourages the generation of user-preferred samples. Extensive experiments show that DiCGAN achieves state-of-the-art performance in learning user-desired data distributions, especially in cases of insufficient desired data and limited supervision.
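
As an illustration of the pairwise idea described above, the following PyTorch sketch adds a hinge ranking term over critic values to a Wasserstein critic loss; the margin, the weighting lam, and the function names are assumptions rather than DiCGAN's exact objective.

```python
import torch

def ranking_loss(critic, preferred, dispreferred, margin=1.0):
    """Hinge ranking term: each user-preferred sample should receive a
    critic value at least `margin` higher than its dispreferred partner."""
    return torch.relu(margin - (critic(preferred) - critic(dispreferred))).mean()

def critic_loss(critic, real, fake, preferred, dispreferred, lam=1.0):
    """Wasserstein critic loss plus the pairwise ranking term (sketch)."""
    wgan = critic(fake).mean() - critic(real).mean()   # minimized by the critic
    return wgan + lam * ranking_loss(critic, preferred, dispreferred)

# Example with a linear critic on 8-dim samples (shapes only).
critic = torch.nn.Linear(8, 1)
batch = lambda: torch.randn(16, 8)
loss = critic_loss(critic, batch(), batch(), batch(), batch())
loss.backward()
```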
Data deletion algorithms aim to remove the influence of deleted data points from trained models at a cheaper computational cost than fully retraining those models. However, for sequences of deletions, most prior work in the non-convex setting gives valid guarantees only for sequences that are chosen independently of the models that are published. If people choose to delete their data as a function of the published models (because they don't like what the models reveal about them, for example), then the update sequence is adaptive. In this paper, we give a general reduction from deletion guarantees against adaptive sequences to deletion guarantees against non-adaptive sequences, using differential privacy and its connection to max information. Combined with ideas from prior work which give guarantees for non-adaptive deletion sequences, this leads to extremely flexible algorithms able to handle arbitrary model classes and training methodologies, giving strong provable deletion guarantees for adaptive deletion sequences. We show in theory how prior work for non-convex models fails against adaptive deletion sequences, and use this intuition to design a practical attack against the SISA algorithm of Bourtoule et al. [2021] on CIFAR-10, MNIST, and Fashion-MNIST.
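
The reduction hinges on publishing only a differentially private view of the model, so that (via the connection to max information) adaptive deletion requests computed from the published models reveal little about the learner's internal state. A toy sketch of the publication step alone, with an assumed noise scale:

```python
import numpy as np

def publish_privately(params, sigma, rng=None):
    """Release a Gaussian-noised view of the model parameters. With sigma
    calibrated to the parameters' sensitivity, the release is differentially
    private, so deletion requests chosen adaptively from published models
    carry little information about the learner's internal randomness
    (toy illustration of the publication step, not the paper's reduction)."""
    rng = rng or np.random.default_rng()
    return params + sigma * rng.standard_normal(params.shape)
```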
Being able to control the acoustic events (AEs) to which we want to listen would allow the development of more controllable hearable devices. This paper addresses the AE sound selection (or removal) problem, which we define as the extraction (or suppression) of all the sounds that belong to one or multiple desired AE classes. Although this problem could be addressed with a combination of source separation followed by AE classification, this is a sub-optimal way of solving it. Moreover, source separation usually requires knowing the maximum number of sources, which may not be practical when dealing with AEs. In this paper, we instead propose a universal sound selection neural network that directly selects AE sounds from a mixture given user-specified target AE classes. The proposed framework can be explicitly optimized to simultaneously select sounds from multiple desired AE classes, independently of the number of sources in the mixture. We experimentally show that the proposed method achieves promising AE sound selection performance and can generalize to mixtures with a number of sources unseen during training.
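
As a toy illustration of class-conditioned selection, the sketch below predicts a time-frequency mask for a mixture spectrogram conditioned on a multi-hot vector of desired AE classes; the architecture and all names are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class SoundSelector(nn.Module):
    """Toy selection network: predicts a time-frequency mask for a mixture
    spectrogram, conditioned on a multi-hot vector of desired AE classes
    (architecture is an assumption, not the paper's model)."""

    def __init__(self, n_freq=257, n_classes=10, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq + n_classes, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, target_classes):
        # mix_spec: (B, T, F) magnitude spectrogram
        # target_classes: (B, n_classes) multi-hot selection of AE classes
        B, T, _ = mix_spec.shape
        cond = target_classes.unsqueeze(1).expand(B, T, -1)
        h, _ = self.rnn(torch.cat([mix_spec, cond], dim=-1))
        return self.mask(h) * mix_spec      # masked spectrogram estimate

# Select only class 3 from a two-utterance batch (shapes only).
net = SoundSelector()
mix = torch.rand(2, 100, 257)
sel = torch.zeros(2, 10)
sel[:, 3] = 1.0
out = net(mix, sel)                         # (2, 100, 257)
```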
