Large Scale Private Learning via Low-rank Reparametrization

366 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Da Yu

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Da Yu - Huishuai Zhang - Wei Chen

التعلم الآلي التشفير والأمن

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks, which are 1) the huge memory cost of storing individual gradients, 2) the added noise suffering notorious dimensional dependence. Specifically, we reparametrize each weight matrix with two emph{gradient-carrier} matrices of small dimension and a emph{residual weight} matrix. We argue that such reparametrization keeps the forward/backward process unchanged while enabling us to compute the projected gradient without computing the gradient itself. To learn with differential privacy, we design emph{reparametrized gradient perturbation (RGP)} that perturbs the gradients on gradient-carrier matrices and reconstructs an update for the original weight from the noisy gradients. Importantly, we use historical updates to find the gradient-carrier matrices, whose optimality is rigorously justified under linear regression and empirically verified with deep learning tasks. RGP significantly reduces the memory cost and improves the utility. For example, we are the first able to apply differential privacy on the BERT model and achieve an average accuracy of $83.9%$ on four downstream tasks with $epsilon=8$, which is within $5%$ loss compared to the non-private baseline but enjoys much lower privacy leakage risk.

قيم البحث

60 - Seng Pei Liew , Tsubasa Takahashi , Michihiko Ueno 2021

We propose a new framework of synthesizing data using deep generative models in a differentially private manner. Within our framework, sensitive data are sanitized with rigorous privacy guarantees in a one-shot fashion, such that training deep genera tive models is possible without re-using the original data. Hence, no extra privacy costs or model constraints are incurred, in contrast to popular approaches such as Differentially Private Stochastic Gradient Descent (DP-SGD), which, among other issues, causes degradation in privacy guarantees as the training iteration increases. We demonstrate a realization of our framework by making use of the characteristic function and an adversarial re-weighting objective, which are of independent interest as well. Our proposal has theoretical guarantees of performance, and empirical evaluations on multiple datasets show that our approach outperforms other methods at reasonable levels of privacy.

التعلم الآلي التشفير والأمن

DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing

255 - Wenxiao Wang 2021

Deep learning techniques have achieved remarkable performance in wide-ranging tasks. However, when trained on privacy-sensitive datasets, the model parameters may expose private information in training data. Prior attempts for differentially private training, although offering rigorous privacy guarantees, lead to much lower model performance than the non-private ones. Besides, different runs of the same training algorithm produce models with large performance variance. To address these issues, we propose DPlis--Differentially Private Learning wIth Smoothing. The core idea of DPlis is to construct a smooth loss function that favors noise-resilient models lying in large flat regions of the loss landscape. We provide theoretical justification for the utility improvements of DPlis. Extensive experiments also demonstrate that DPlis can effectively boost model quality and training stability under a given privacy budget.

التعلم الآلي التشفير والأمن التعلم الالي

Large-Scale Differentially Private BERT

95 - Rohan Anil , Badih Ghazi , Vineet Gupta 2021

In this work, we study the large-scale pretraining of BERT-Large with differentially private SGD (DP-SGD). We show that combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP- SGD step for BERT; we also enhance its efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of [SVK20], who demonstrated that the overhead of a DP-SGD step is minimized with effective use of JAX [BFH+18, FJL18] primitives in conjunction with the XLA compiler [XLA17]. Our implementation achieves a masked language model accuracy of 60.5% at a batch size of 2M, for $epsilon = 5.36$. To put this number in perspective, non-private BERT models achieve an accuracy of $sim$70%.

التعلم الآلي الحساب واللغة التشفير والأمن

Generating Artificial Data for Private Deep Learning

119 - Aleksei Triastcyn , Boi Faltings 2018

In this paper, we propose generating artificial data that retain statistical properties of real data as the means of providing privacy with respect to the original dataset. We use generative adversarial network to draw privacy-preserving artificial d ata samples and derive an empirical method to assess the risk of information disclosure in a differential-privacy-like way. Our experiments show that we are able to generate artificial data of high quality and successfully train and validate machine learning models on this data while limiting potential privacy loss.

التعلم الآلي التشفير والأمن التعلم الالي

Revisiting Model-Agnostic Private Learning: Faster Rates and Active Learning

71 - Chong Liu , Yuqing Zhu , Kamalika Chaudhuri 2020

The Private Aggregation of Teacher Ensembles (PATE) framework is one of the most promising recent approaches in differentially private learning. Existing theoretical analysis shows that PATE consistently learns any VC-classes in the realizable settin g, but falls short in explaining its success in more general cases where the error rate of the optimal classifier is bounded away from zero. We fill in this gap by introducing the Tsybakov Noise Condition (TNC) and establish stronger and more interpretable learning bounds. These bounds provide new insights into when PATE works and improve over existing results even in the narrower realizable setting. We also investigate the compelling idea of using active learning for saving privacy budget. The novel components in the proofs include a more refined analysis of the majority voting classifier -- which could be of independent interest -- and an observation that the synthetic student learning problem is nearly realizable by construction under the Tsybakov noise condition.

التعلم الآلي التشفير والأمن