ترغب بنشر مسار تعليمي؟ اضغط هنا

Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification

73   0   0.0 ( 0 )
 نشر من قبل Yingxue Zhou
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM). Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model. Such dependence can be problematic for over-parameterized models where $p gg n$, the number of training samples. Existing lower bounds on private ERM show that such dependence on $p$ is inevitable in the worst case. In this paper, we circumvent the dependence on the ambient dimension by leveraging a low-dimensional structure of gradient space in deep networks -- that is, the stochastic gradients for deep nets usually stay in a low dimensional subspace in the training process. We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace, which is given by the top gradient eigenspace on a small public dataset. We provide a general sample complexity analysis on the public dataset for the gradient subspace identification problem and demonstrate that under certain low-dimensional assumptions the public sample complexity only grows logarithmically in $p$. Finally, we provide a theoretical analysis and empirical evaluations to show that our method can substantially improve the accuracy of DP-SGD in the high privacy regime (corresponding to low privacy loss $epsilon$).

قيم البحث

اقرأ أيضاً

As massive data are produced from small gadgets, federated learning on mobile devices has become an emerging trend. In the federated setting, Stochastic Gradient Descent (SGD) has been widely used in federated learning for various machine learning mo dels. To prevent privacy leakages from gradients that are calculated on users sensitive data, local differential privacy (LDP) has been considered as a privacy guarantee in federated SGD recently. However, the existing solutions have a dimension dependency problem: the injected noise is substantially proportional to the dimension $d$. In this work, we propose a two-stage framework FedSel for federated SGD under LDP to relieve this problem. Our key idea is that not all dimensions are equally important so that we privately select Top-k dimensions according to their contributions in each iteration of federated SGD. Specifically, we propose three private dimension selection mechanisms and adapt the gradient accumulation technique to stabilize the learning process with noisy updates. We also theoretically analyze privacy, accuracy and time complexity of FedSel, which outperforms the state-of-the-art solutions. Experiments on real-world and synthetic datasets verify the effectiveness and efficiency of our framework.
Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known algorithms for private training of large scale neural networks. This algorithm requires computation of per-sample gradients norms which is extremely slow and memory intensive in practice. In this paper, we present a new framework to design differentially private optimizers called DP-SGD-JL and DP-Adam-JL. Our approach uses Johnson-Lindenstrauss (JL) projections to quickly approximate the per-sample gradient norms without exactly computing them, thus making the training time and memory requirements of our optimizers closer to that of their non-
We study the problem of differentially private (DP) matrix completion under user-level privacy. We design a joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexit y for matrix completion (in terms of number of items, users), and ii) the best known privacy/utility trade-off both theoretically, as well as on benchmark data sets. In particular, we provide the first global convergence analysis of ALS with noise introduced to ensure DP, and show that, in comparison to the best known alternative (the Private Frank-Wolfe algorithm by Jain et al. (2018)), our error bounds scale significantly better with respect to the number of items and users, which is critical in practical problems. Extensive validation on standard benchmarks demonstrate that the algorithm, in combination with carefully designed sampling procedures, is significantly more accurate than existing techniques, thus promising to be the first practical DP embedding model.
We consider the problem of empirical risk minimization given a database, using the gradient descent algorithm. We note that the function to be optimized may be non-convex, consisting of saddle points which impede the convergence of the algorithm. A p erturbed gradient descent algorithm is typically employed to escape these saddle points. We show that this algorithm, that perturbs the gradient, inherently preserves the privacy of the data. We then employ the differential privacy framework to quantify the privacy hence achieved. We also analyze the change in privacy with varying parameters such as problem dimension and the distance between the databases.
Finding efficient, easily implementable differentially private (DP) algorithms that offer strong excess risk bounds is an important problem in modern machine learning. To date, most work has focused on private empirical risk minimization (ERM) or pri vate population loss minimization. However, there are often other objectives--such as fairness, adversarial robustness, or sensitivity to outliers--besides average performance that are not captured in the classical ERM setup. To this end, we study a completely general family of convex, Lipschitz loss functions and establish the first known DP excess risk and runtime bounds for optimizing this broad class. We provide similar bounds under additional assumptions of smoothness and/or strong convexity. We also address private stochastic convex optimization (SCO). While $(epsilon, delta)$-DP ($delta > 0$) has been the focus of much recent work in private SCO, proving tight population loss bounds and runtime bounds for $(epsilon, 0)$-DP remains a challenging open problem. We provide the tightest known $(epsilon, 0)$-DP population loss bounds and fastest runtimes under the presence of (or lack of) smoothness and strong convexity. Our methods extend to the $delta > 0$ setting, where we offer the unique benefit of ensuring differential privacy for arbitrary $epsilon > 0$ by incorporating a new form of Gaussian noise. Finally, we apply our theory to two learning frameworks: tilted ERM and adversarial learning. In particular, our theory quantifies tradeoffs between adversarial robustness, privacy, and runtime. Our results are achieved using perhaps the simplest DP algorithm: output perturbation. Although this method is not novel conceptually, our novel implementation scheme and analysis show that the power of this method to achieve strong privacy, utility, and runtime guarantees has not been fully appreciated in prior works.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا