We show that differentially private stochastic gradient descent (DP-SGD) can yield poorly calibrated, overconfident deep learning models. This is a serious issue for safety-critical applications, e.g., in medical diagnosis. We highlight and exploit parallels between stochastic gradient Langevin dynamics (SGLD), a scalable Bayesian inference technique for training deep neural networks, and DP-SGD in order to train differentially private Bayesian neural networks with only minor adjustments to the original DP-SGD algorithm. Our approach provides considerably more reliable uncertainty estimates than DP-SGD, as demonstrated empirically by a reduction in expected calibration error (MNIST: $\sim$5-fold; Pediatric Pneumonia Dataset: $\sim$2-fold).
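The parallel between the two update rules can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' algorithm: `dp_sgd_step` and `sgld_step` are hypothetical helper names, parameters are assumed to be a flat vector, and per-example gradients are assumed to be precomputed. Both updates take a gradient step and add isotropic Gaussian noise; DP-SGD additionally clips each per-example gradient before averaging.

```python
import numpy as np


def dp_sgd_step(theta, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on a flat parameter vector.

    per_example_grads has shape (batch_size, dim). Each row is clipped to
    L2 norm <= clip_norm, the rows are summed, Gaussian noise with standard
    deviation noise_multiplier * clip_norm is added, and the result is
    divided by the batch size before the gradient step.
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Per-example scale factor min(1, C / ||g_i||); the floor avoids 0/0.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_sum = (per_example_grads * scale).sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    batch_size = per_example_grads.shape[0]
    return theta - lr * (clipped_sum + noise) / batch_size


def sgld_step(theta, grad_potential, lr, rng, temperature=1.0):
    """One SGLD update: a gradient step on the potential U (negative log
    posterior) plus Gaussian noise with variance 2 * lr * temperature."""
    noise = rng.normal(0.0, np.sqrt(2.0 * lr * temperature), size=theta.shape)
    return theta - lr * grad_potential + noise


rng = np.random.default_rng(0)
theta = np.zeros(3)
grads = rng.normal(size=(8, 3))  # 8 synthetic per-example gradients
theta_dp = dp_sgd_step(theta, grads, lr=0.1, clip_norm=1.0,
                       noise_multiplier=1.1, rng=rng)
```

Because both steps have the form "gradient step plus Gaussian noise", the noise DP-SGD injects for privacy can be read as the Langevin noise of an SGLD sampler, which is the kind of parallel the approach above builds on.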
Differentially private stochastic gradient descent (DPSGD) is a variation of stochastic gradient descent based on the Differential Privacy (DP) paradigm, which can mitigate privacy threats arising from the presence of sensitive information in training data. One major drawback of training deep neural networks with DPSGD is a reduction in the model's accuracy. In this paper, we propose an alternative method for preserving data privacy based on introducing noise through learnable probability distributions, which leads to a significant improvement in the utility of the resulting private models. We also demonstrate that normalization layers have a large beneficial impact on the performance of deep neural networks with noisy parameters. In particular, we show that, contrary to general belief, a large amount of random noise can be added to the weights of neural networks without harming performance, once the networks are augmented with normalization layers. We hypothesize that this robustness is a consequence of the scale invariance property of normalization operators. Building on these observations, we propose a new algorithmic technique for training deep neural networks under very low privacy budgets by sampling weights from Gaussian distributions and utilizing batch or layer normalization to prevent performance degradation. Our method outperforms previous approaches, including DPSGD, by a substantial margin on a comprehensive set of experiments on Computer Vision and Natural Language Processing tasks. Notably, we obtain a 20 percent accuracy improvement over DPSGD on the MNIST and CIFAR10 datasets with privacy budgets of $\varepsilon = 0.05$ and $\varepsilon = 2.0$, respectively. Our code is available online: https://github.com/uds-lsv/SIDP.
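The scale invariance hypothesized above can be checked numerically: a normalization layer applied to pre-activations $xW$ gives (up to its small stabilizing constant) the same output when $W$ is multiplied by a positive constant. This NumPy sketch uses a hypothetical parameter-free `layer_norm` helper and random data; it only illustrates scale invariance and does not reproduce the paper's weight-sampling procedure.

```python
import numpy as np


def layer_norm(h, eps=1e-5):
    """Normalize each row of h to zero mean and unit variance (no learned gain/bias)."""
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))   # a batch of 4 synthetic inputs
W = rng.normal(size=(16, 8))   # weights of one linear layer

out = layer_norm(x @ W)
out_scaled = layer_norm(x @ (10.0 * W))  # same weights rescaled by c = 10

# Up to the small eps term, rescaling the weights leaves the output unchanged.
print(np.max(np.abs(out - out_scaled)))
```

Rescaling is only one kind of weight perturbation, so this shows why normalized networks are insensitive to the overall magnitude of their weights, not that they tolerate arbitrary additive noise.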
The application of differential privacy to the training of deep neural networks holds the promise of allowing large-scale (decentralized) use of sensitive data while providing rigorous privacy guarantees to the individual. The predominant approach to differentially private training of neural networks is DP-SGD, which relies on norm-based gradient clipping to bound sensitivity, followed by the addition of appropriately calibrated Gaussian noise. In this work we propose NeuralDP, a technique for privatising the activations of a layer within a neural network, which, by the post-processing property of differential privacy, yields a differentially private network. We experimentally demonstrate on two datasets (MNIST and the Pediatric Pneumonia Dataset (PPD)) that our method offers substantially improved privacy-utility trade-offs compared to DP-SGD.
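"Clipping to bound sensitivity, then adding calibrated Gaussian noise" can be illustrated with the classic single-release Gaussian mechanism, which sets $\sigma = \Delta_2 \sqrt{2 \ln(1.25/\delta)} / \varepsilon$ for $0 < \varepsilon < 1$. This sketch is an assumption-labeled illustration, not how DP-SGD or NeuralDP compute their budgets: DP-SGD composes many noisy steps and uses tighter privacy accountants. The helper names `gaussian_sigma` and `private_clipped_sum` are hypothetical.

```python
import math

import numpy as np


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise std of the classic Gaussian mechanism (valid for 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this bound requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_clipped_sum(vectors, clip_norm, epsilon, delta, rng):
    """Release the sum of per-example vectors, each clipped to L2 norm clip_norm.

    Adding or removing one example changes the clipped sum by at most
    clip_norm in L2 norm, so clip_norm is the sensitivity used here.
    """
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    clipped = vectors * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    sigma = gaussian_sigma(clip_norm, epsilon, delta)
    return clipped.sum(axis=0) + rng.normal(0.0, sigma, size=vectors.shape[1])


rng = np.random.default_rng(0)
release = private_clipped_sum(rng.normal(size=(32, 5)), clip_norm=1.0,
                              epsilon=0.5, delta=1e-5, rng=rng)
```

Clipping is what makes the noise scale finite: without a norm bound, one example could shift the sum arbitrarily far and no fixed $\sigma$ would suffice.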
A Bayesian neural network (BNN) allows for uncertainty quantification in prediction, an advantage over regular neural networks that has not yet been explored in the differential privacy (DP) framework. We fill this gap by leveraging recent developments in Bayesian deep learning and privacy accounting to offer a more precise analysis of the trade-off between privacy and accuracy in BNNs. We propose three DP-BNNs that characterize the weight uncertainty for the same network architecture in distinct ways, namely DP-SGLD (via the noisy gradient method), DP-BBP (via changing the parameters of interest) and DP-MC Dropout (via the model architecture). Interestingly, we show a new equivalence between DP-SGD and DP-SGLD, implying that some non-Bayesian DP training naturally allows for uncertainty quantification. However, hyperparameters such as learning rate and batch size can have different or even opposite effects in DP-SGD and DP-SGLD. We conduct extensive experiments comparing DP-BNNs in terms of privacy guarantee, prediction accuracy, uncertainty quantification, calibration, computation speed, and generalizability to network architecture. As a result, we observe a new trade-off between privacy and reliability. Compared to non-DP and non-Bayesian approaches, DP-SGLD is remarkably accurate under strong privacy guarantees, demonstrating the great potential of DP-BNNs in real-world tasks.
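One way to see why DP-SGD and DP-SGLD can coincide is to match the two update rules term by term. The derivation in the docstring is our own back-of-the-envelope sketch, not the paper's theorem. It assumes the potential is $U(\theta) = \sum_{i=1}^n \ell_i(\theta)$, estimated by $n$ times the mean clipped mini-batch gradient, and it ignores clipping bias and any prior term. `implied_sgld_temperature` is a hypothetical helper.

```python
def implied_sgld_temperature(lr, noise_multiplier, clip_norm, batch_size, n):
    """Temperature T at which one DP-SGD step matches one SGLD step.

    DP-SGD:  theta <- theta - lr * g_bar - (lr * sigma * C / B) * xi
    SGLD:    theta <- theta - lr_s * grad_U + sqrt(2 * lr_s * T) * xi
    With grad_U ~= n * g_bar, matching the drift gives lr_s = lr / n, and
    matching the noise variance gives T = n * lr * sigma**2 * C**2 / (2 * B**2).
    Clipping bias and any prior term are ignored.
    """
    return n * lr * (noise_multiplier * clip_norm) ** 2 / (2.0 * batch_size ** 2)


# Hypothetical MNIST-sized setting with n = 60000 training examples.
T = implied_sgld_temperature(lr=0.1, noise_multiplier=1.0, clip_norm=1.0,
                             batch_size=100, n=60000)  # approximately 0.3
```

Under this matching, doubling the batch size at fixed learning rate and noise multiplier divides the implied temperature by four, one illustration of how a hyperparameter can play a different role in the sampling view than in the optimization view.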
Neural architecture search, which aims to automatically search for architectures (e.g., convolution, max pooling) of neural networks that maximize validation performance, has made remarkable progress recently. In many application scenarios, several parties would like to collaboratively search for a shared neural architecture by leveraging data from all parties. However, due to privacy concerns, no party wants its data to be seen by other parties. To address this problem, we propose federated neural architecture search (FNAS), where different parties collectively search for a differentiable architecture by exchanging gradients of architecture variables without exposing their data to other parties. To further preserve privacy, we study differentially private FNAS (DP-FNAS), which adds random noise to the gradients of architecture variables. We provide theoretical guarantees of DP-FNAS in achieving differential privacy. Experiments show that DP-FNAS can search highly performant neural architectures while protecting the privacy of individual parties. The code is available at https://github.com/UCSD-AI4H/DP-FNAS
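A single round of gradient exchange on architecture variables can be sketched as follows. This is a hypothetical illustration, not the DP-FNAS code: it assumes DARTS-style architecture variables $\alpha$ (one row per edge, one column per candidate operation, relaxed with a softmax), per-party clipping, and locally added Gaussian noise whose scale `noise_std` is not calibrated to any specific $(\varepsilon, \delta)$. `dp_fnas_round` and `derive_architecture` are hypothetical names.

```python
import numpy as np


def dp_fnas_round(alpha, party_grads, lr, clip_norm, noise_std, rng):
    """One round: each party clips and noises its architecture gradient,
    the aggregator averages the noisy gradients and updates alpha."""
    noisy = []
    for g in party_grads:
        # Bound each party's contribution to L2 (Frobenius) norm <= clip_norm.
        g = g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
        # Noise is added locally, so the aggregator never sees the raw gradient.
        noisy.append(g + rng.normal(0.0, noise_std, size=g.shape))
    return alpha - lr * np.mean(noisy, axis=0)


def derive_architecture(alpha):
    """DARTS-style discretization: pick the highest-weight op on each edge."""
    weights = np.exp(alpha - alpha.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.argmax(axis=1)


rng = np.random.default_rng(0)
alpha = np.zeros((4, 3))  # 4 edges, 3 candidate ops (e.g., conv, max pool, skip)
grads = [rng.normal(size=alpha.shape) for _ in range(5)]  # 5 parties
alpha = dp_fnas_round(alpha, grads, lr=0.1, clip_norm=1.0, noise_std=0.5, rng=rng)
chosen_ops = derive_architecture(alpha)
```

Only the noisy architecture gradients leave each party; model weights and raw data stay local in this sketch.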
Finding efficient, easily implementable differentially private (DP) algorithms that offer strong excess risk bounds is an important problem in modern machine learning. To date, most work has focused on private empirical risk minimization (ERM) or private population loss minimization. However, there are often objectives other than average performance (such as fairness, adversarial robustness, or sensitivity to outliers) that are not captured in the classical ERM setup. To this end, we study a completely general family of convex, Lipschitz loss functions and establish the first known DP excess risk and runtime bounds for optimizing this broad class. We provide similar bounds under additional assumptions of smoothness and/or strong convexity. We also address private stochastic convex optimization (SCO). While $(\varepsilon, \delta)$-DP ($\delta > 0$) has been the focus of much recent work in private SCO, proving tight population loss bounds and runtime bounds for $(\varepsilon, 0)$-DP remains a challenging open problem. We provide the tightest known $(\varepsilon, 0)$-DP population loss bounds and fastest runtimes, both with and without smoothness and strong convexity. Our methods extend to the $\delta > 0$ setting, where we offer the unique benefit of ensuring differential privacy for arbitrary $\varepsilon > 0$ by incorporating a new form of Gaussian noise. Finally, we apply our theory to two learning frameworks: tilted ERM and adversarial learning. In particular, our theory quantifies trade-offs between adversarial robustness, privacy, and runtime. Our results are achieved using perhaps the simplest DP algorithm: output perturbation. Although this method is not conceptually novel, our new implementation scheme and analysis show that its power to achieve strong privacy, utility, and runtime guarantees has not been fully appreciated in prior work.
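Output perturbation itself is short enough to sketch. The version below is the classic $(\varepsilon, 0)$-DP scheme of Chaudhuri, Monteleoni and Sarwate (2011), not the paper's implementation or its new Gaussian noise. It assumes an L2-regularized objective $\frac{1}{n}\sum_i \ell(w; x_i, y_i) + \frac{\lambda}{2}\lVert w\rVert^2$ with an $L$-Lipschitz loss, whose minimizer has L2 sensitivity $2L/(n\lambda)$, and it adds noise with density proportional to $\exp(-\varepsilon \lVert b\rVert / \Delta)$. Logistic loss is 1-Lipschitz when $\lVert x_i\rVert \le 1$. The helper names are hypothetical. Gradient descent only approximates the minimizer, and the formal guarantee assumes an exact one.

```python
import numpy as np


def fit_regularized_logreg(X, y, lam, steps=2000, lr=0.5):
    """Minimize (1/n) sum log(1 + exp(-y_i w.x_i)) + (lam/2)||w||^2 by gradient descent.

    Assumes ||x_i|| <= 1 and labels in {-1, +1}; then ||w|| <= 1/lam, so the
    margins stay bounded and np.exp does not overflow for moderate lam.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
        coef = -y / (1.0 + np.exp(margins))
        w -= lr * (X.T @ coef / n + lam * w)
    return w


def output_perturbation(w, lipschitz, lam, n, epsilon, rng):
    """Add noise with density proportional to exp(-epsilon * ||b|| / sensitivity)."""
    sensitivity = 2.0 * lipschitz / (lam * n)
    d = w.shape[0]
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)  # uniform direction on the sphere
    radius = rng.gamma(shape=d, scale=sensitivity / epsilon)  # norm ~ Gamma(d, sens/eps)
    return w + radius * direction


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # enforce ||x_i|| <= 1
y = np.where(X[:, 0] + 0.1 >= 0, 1.0, -1.0)
w_hat = fit_regularized_logreg(X, y, lam=0.1)
w_private = output_perturbation(w_hat, lipschitz=1.0, lam=0.1, n=len(y),
                                epsilon=1.0, rng=rng)
```

The privacy cost is paid once, on the final model, so the optimizer can be run to convergence without any per-step accounting; the trade-off is that the noise scale grows as $1/(n\lambda)$, which ties utility to the regularization strength.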