No Arabic abstract
This paper presents the private-outsourced-Gaussian process-upper confidence bound (PO-GP-UCB) algorithm, which is the first algorithm for privacy-preserving Bayesian optimization (BO) in the outsourced setting with a provable performance guarantee. We consider the outsourced setting where the entity holding the dataset and the entity performing BO are represented by different parties, and the dataset cannot be released non-privately. For example, a hospital holds a dataset of sensitive medical records and outsources the BO task on this dataset to an industrial AI company. The key idea of our approach is to make the BO performance of our algorithm similar to that of non-private GP-UCB run using the original dataset, which is achieved by using a random projection-based transformation that preserves both privacy and the pairwise distances between inputs. Our main theoretical contribution is to show that a regret bound similar to that of the standard GP-UCB algorithm can be established for our PO-GP-UCB algorithm. We empirically evaluate the performance of our PO-GP-UCB algorithm with synthetic and real-world datasets.
Generalized linear models (GLMs) such as logistic regression are among the most widely used arms in data analysts repertoire and often used on sensitive datasets. A large body of prior works that investigate GLMs under differential privacy (DP) constraints provide only private point estimates of the regression coefficients, and are not able to quantify parameter uncertainty. In this work, with logistic and Poisson regression as running examples, we introduce a generic noise-aware DP Bayesian inference method for a GLM at hand, given a noisy sum of summary statistics. Quantifying uncertainty allows us to determine which of the regression coefficients are statistically significantly different from zero. We provide a previously unknown tight privacy analysis and experimentally demonstrate that the posteriors obtained from our model, while adhering to strong privacy guarantees, are close to the non-private posteriors.
Bayesian neural network (BNN) allows for uncertainty quantification in prediction, offering an advantage over regular neural networks that has not been explored in the differential privacy (DP) framework. We fill this important gap by leveraging recent development in Bayesian deep learning and privacy accounting to offer a more precise analysis of the trade-off between privacy and accuracy in BNN. We propose three DP-BNNs that characterize the weight uncertainty for the same network architecture in distinct ways, namely DP-SGLD (via the noisy gradient method), DP-BBP (via changing the parameters of interest) and DP-MC Dropout (via the model architecture). Interestingly, we show a new equivalence between DP-SGD and DP-SGLD, implying that some non-Bayesian DP training naturally allows for uncertainty quantification. However, the hyperparameters such as learning rate and batch size, can have different or even opposite effects in DP-SGD and DP-SGLD. Extensive experiments are conducted to compare DP-BNNs, in terms of privacy guarantee, prediction accuracy, uncertainty quantification, calibration, computation speed, and generalizability to network architecture. As a result, we observe a new tradeoff between the privacy and the reliability. When compared to non-DP and non-Bayesian approaches, DP-SGLD is remarkably accurate under strong privacy guarantee, demonstrating the great potential of DP-BNN in real-world tasks.
Through the lens of information-theoretic reductions, we examine a reductions approach to fair optimization and learning where a black-box optimizer is used to learn a fair model for classification or regression. Quantifying the complexity, both statistically and computationally, of making such models satisfy the rigorous definition of differential privacy is our end goal. We resolve a few open questions and show applicability to fair machine learning, hypothesis testing, and to optimizing non-standard measures of classification loss. Furthermore, our sample complexity bounds are tight amongst all strategies that jointly minimize a composition of functions. The reductions approach to fair optimization can be abstracted as the constrained group-objective optimization problem where we aim to optimize an objective that is a function of losses of individual groups, subject to some constraints. We give the first polynomial-time algorithms to solve the problem with $(epsilon, 0)$ or $(epsilon, delta)$ differential privacy guarantees when defined on a convex decision set (for example, the $ell_P$ unit ball) with convex constraints and losses. Accompanying information-theoretic lower bounds for the problem are presented. In addition, compared to a previous method for ensuring differential privacy subject to a relaxed form of the equalized odds fairness constraint, the $(epsilon, delta)$-differentially private algorithm we present provides asymptotically better sample complexity guarantees, resulting in an exponential improvement in certain parameter regimes. We introduce a class of bounded divergence linear optimizers, which could be of independent interest, and specialize to pure and approximate differential privacy.
In shuffle privacy, each user sends a collection of randomized messages to a trusted shuffler, the shuffler randomly permutes these messages, and the resulting shuffled collection of messages must satisfy differential privacy. Prior work in this model has largely focused on protocols that use a single round of communication to compute algorithmic primitives like means, histograms, and counts. In this work, we present interactive shuffle protocols for stochastic convex optimization. Our optimization protocols rely on a new noninteractive protocol for summing vectors of bounded $ell_2$ norm. By combining this sum subroutine with techniques including mini-batch stochastic gradient descent, accelerated gradient descent, and Nesterovs smoothing method, we obtain loss guarantees for a variety of convex loss functions that significantly improve on those of the local model and sometimes match those of the central model.
Finding efficient, easily implementable differentially private (DP) algorithms that offer strong excess risk bounds is an important problem in modern machine learning. To date, most work has focused on private empirical risk minimization (ERM) or private population loss minimization. However, there are often other objectives--such as fairness, adversarial robustness, or sensitivity to outliers--besides average performance that are not captured in the classical ERM setup. To this end, we study a completely general family of convex, Lipschitz loss functions and establish the first known DP excess risk and runtime bounds for optimizing this broad class. We provide similar bounds under additional assumptions of smoothness and/or strong convexity. We also address private stochastic convex optimization (SCO). While $(epsilon, delta)$-DP ($delta > 0$) has been the focus of much recent work in private SCO, proving tight population loss bounds and runtime bounds for $(epsilon, 0)$-DP remains a challenging open problem. We provide the tightest known $(epsilon, 0)$-DP population loss bounds and fastest runtimes under the presence of (or lack of) smoothness and strong convexity. Our methods extend to the $delta > 0$ setting, where we offer the unique benefit of ensuring differential privacy for arbitrary $epsilon > 0$ by incorporating a new form of Gaussian noise. Finally, we apply our theory to two learning frameworks: tilted ERM and adversarial learning. In particular, our theory quantifies tradeoffs between adversarial robustness, privacy, and runtime. Our results are achieved using perhaps the simplest DP algorithm: output perturbation. Although this method is not novel conceptually, our novel implementation scheme and analysis show that the power of this method to achieve strong privacy, utility, and runtime guarantees has not been fully appreciated in prior works.