Differentially Private Simple Linear Regression

Posted by Audra McMillan

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Daniel Alabi, Audra McMillan, Jayshree Sarathy

Abstract

Economics and social science research often requires analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint which guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression on small datasets, with tens to hundreds of datapoints, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a spectrum of algorithms that we adapt to this setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other more standard algorithms do better as the dataset size increases.
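The abstract singles out the Theil-Sen estimator (the median of the slopes through all pairs of points) as the strongest performer on the smallest datasets. The following is a minimal Python sketch of one way to make its slope differentially private, under assumptions of our own that are not necessarily the authors' construction: the dataset size n is public, neighbouring datasets differ by replacing one (x, y) record, slopes are clipped to a public range [lo, hi], and the median is released with the exponential mechanism. All function names are hypothetical, and only the slope is released; an intercept would need a further private median and its own share of the budget.

```python
import math
import random
from itertools import combinations


def pairwise_slopes(xs, ys):
    """Slopes of the lines through every pair of points (the Theil-Sen candidates)."""
    slopes = []
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        # Illustrative convention (an assumption): a pair with equal x gets slope 0.0,
        # so the number of slopes is always n(n-1)/2 regardless of the data.
        slopes.append((y2 - y1) / (x2 - x1) if x2 != x1 else 0.0)
    return slopes


def dp_median(values, lo, hi, epsilon, rng):
    """Exponential-mechanism median of `values` clipped to the public range [lo, hi].

    The output is drawn from the gap between consecutive sorted values, with
    probability proportional to (gap width) * exp(epsilon * score / 2), where
    score = -|rank - m/2|. Changing one value shifts every rank by at most 1.
    """
    m = len(values)
    edges = [lo] + sorted(min(max(v, lo), hi) for v in values) + [hi]
    log_weights = []
    for i in range(m + 1):
        width = edges[i + 1] - edges[i]
        score = -abs(i - m / 2)
        log_weights.append(math.log(width) + epsilon * score / 2 if width > 0 else -math.inf)
    top = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(w - top) for w in log_weights]
    i = rng.choices(range(m + 1), weights=weights)[0]
    return rng.uniform(edges[i], edges[i + 1])


def dp_theil_sen_slope(xs, ys, epsilon, lo, hi, rng=None):
    """Sketch of an epsilon-DP Theil-Sen slope (hypothetical helper, not the paper's code)."""
    n = len(xs)
    if n < 2 or not lo < hi:
        raise ValueError("need at least two points and lo < hi")
    rng = rng or random.Random()
    # One record appears in n - 1 pairs, so replacing it can move up to n - 1
    # slopes; scaling epsilon by 1/(n - 1) accounts for that sensitivity.
    return dp_median(pairwise_slopes(xs, ys), lo, hi, epsilon / (n - 1), rng)
```

Because every point enters n - 1 slopes, the budget available per slope shrinks as epsilon/(n - 1), which is one reason variants that pair each point with fewer partners can be attractive. The range [lo, hi] must be fixed without looking at the data.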

Related research

Tejas Kulkarni, Joonas Jalko, Antti Koskela (2020)

Generalized linear models (GLMs) such as logistic regression are among the most widely used arms in data analysts' repertoire and are often used on sensitive datasets. A large body of prior work investigating GLMs under differential privacy (DP) constraints provides only private point estimates of the regression coefficients and cannot quantify parameter uncertainty. In this work, with logistic and Poisson regression as running examples, we introduce a generic noise-aware DP Bayesian inference method for a GLM at hand, given a noisy sum of summary statistics. Quantifying uncertainty allows us to determine which of the regression coefficients are statistically significantly different from zero. We provide a previously unknown tight privacy analysis and experimentally demonstrate that the posteriors obtained from our model, while adhering to strong privacy guarantees, are close to the non-private posteriors.

Machine Learning, Cryptography and Security

Differentially Private Weighted Sampling

Edith Cohen, Ofir Geri, Tamas Sarlos (2020)

Common datasets have the form of elements with keys (e.g., transactions and products), and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluations of query statistics. We propose private weighted sampling (PWS): a method that ensures element-level differential privacy while retaining, to the extent possible, the utility of a respective non-private weighted sample. PWS maximizes the reporting probabilities of keys and the estimation quality of a broad family of statistics. PWS also improves over the state of the art for the well-studied special case of private histograms, when no sampling is performed. We empirically demonstrate significant performance gains compared with prior baselines: a 20%-300% increase in key reporting for common Zipfian frequency distributions, and accuracy for $\times 2$-$\times 8$ lower frequencies in estimation tasks. Moreover, PWS is applied as a simple post-processing of a non-private sample, without requiring the original data. This allows seamless integration with existing implementations of non-private schemes and retains the efficiency of schemes designed for resource-constrained settings such as massive distributed or streamed data. We believe that, due to its practicality and performance, PWS may become a method of choice in applications where privacy is desired.
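For the private-histogram special case mentioned above, a common baseline (not PWS itself) adds Laplace noise to each key's count and reports a key only if its noisy count clears a threshold, so that keys appearing in only one of two neighbouring datasets are rarely revealed. The sketch below assumes element-level neighbours (adding or removing one element) and 0 < delta < 1/2; the helper names are hypothetical.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def thresholded_histogram(elements, epsilon, delta, rng=None):
    """(epsilon, delta)-DP histogram under element-level privacy (illustrative baseline).

    Adding or removing one element changes one key's count by 1. Each present key
    gets Laplace(1/epsilon) noise and is reported only if its noisy count exceeds
    the threshold; a key with true count 1 crosses it with probability exactly
    delta (assuming delta < 1/2, so the threshold is at least 1).
    """
    rng = rng or random.Random()
    threshold = 1 + math.log(1 / (2 * delta)) / epsilon
    released = {}
    for key, count in Counter(elements).items():
        noisy = count + laplace_noise(1 / epsilon, rng)
        if noisy > threshold:
            released[key] = noisy
    return released
```

Keys with large counts are reported almost surely with roughly unbiased noisy counts, while rare keys are suppressed; improving on this trade-off for low-frequency keys is the regime the abstract's reported gains refer to.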

Machine Learning, Cryptography and Security, Data Structures and Algorithms

Differentially Private Correlation Clustering

Mark Bun, Marek Eliáš, Janardhan Kulkarni (2021)

Correlation clustering is a widely used technique in unsupervised machine learning. Motivated by applications where individual privacy is a concern, we initiate the study of differentially private correlation clustering. We propose an algorithm that achieves subquadratic additive error compared to the optimal cost. In contrast, straightforward adaptations of existing non-private algorithms all lead to a trivial quadratic error. Finally, we give a lower bound showing that any pure differentially private algorithm for correlation clustering requires additive error of $\Omega(n)$.
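To make "additive error compared to the optimal cost" concrete: in the standard unweighted formulation on a complete graph, each pair of vertices is labeled similar (+) or dissimilar (-), and a clustering pays one unit for every + pair it separates and every - pair it keeps together. The paper's exact model and privacy notion may differ; the sketch below only evaluates this objective (no privacy), and its names are hypothetical.

```python
from itertools import combinations


def disagreements(n, positive_edges, labels):
    """Correlation-clustering cost: '+' pairs split apart plus '-' pairs kept together.

    Vertices are 0..n-1; every pair not listed in `positive_edges` is a '-' pair.
    `labels[v]` is the cluster id of vertex v.
    """
    positive = {frozenset(e) for e in positive_edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        together = labels[u] == labels[v]
        # A pair is a disagreement when its label and the clustering disagree.
        if (frozenset((u, v)) in positive) != together:
            cost += 1
    return cost
```

Since there are only n(n-1)/2 pairs, every clustering has cost at most n(n-1)/2, so an additive error of order n^2 is achievable trivially; this is why a subquadratic bound is the meaningful target.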

Machine Learning, Cryptography and Security, Data Structures and Algorithms

Differentially Private Online Submodular Maximization

Sebastian Perez-Salazar, Rachel Cummings (2020)

In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of $T$ submodular functions over a common finite ground set $U$ arrives online, and at each time-step the decision maker must choose at most $k$ elements of $U$ before observing the function. The decision maker obtains a payoff equal to the function evaluated on the chosen set, and aims to learn a sequence of sets that achieves low expected regret. In the full-information setting, we develop an $(\varepsilon,\delta)$-DP algorithm with expected $(1-1/e)$-regret bound of $\mathcal{O}\left( \frac{k^2 \log |U| \sqrt{T \log k/\delta}}{\varepsilon} \right)$. This algorithm contains $k$ ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide an $(\varepsilon,\delta + O(e^{-T^{1/3}}))$-DP algorithm with expected $(1-1/e)$-regret bound of $\mathcal{O}\left( \frac{\sqrt{\log k/\delta}}{\varepsilon} \left(k (|U| \log |U|)^{1/3}\right)^2 T^{2/3} \right)$. Our algorithms contain $k$ ordered experts that learn the best marginal item to select given the items chosen by their predecessors, while maintaining privacy of the functions. One challenge for privacy in this setting is that the payoff and feedback of expert $i$ depend on the actions taken by its $i-1$ predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with feedforward may be of independent interest.

Machine Learning, Cryptography and Security, Data Structures and Algorithms

Differentially Private n-gram Extraction

Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni (2021)

We revisit the problem of $n$-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many $n$-grams as possible while preserving user-level privacy. Extracting $n$-grams is a fundamental subroutine in many NLP applications, such as sentence completion and response generation for emails. The problem also arises in other applications such as sequence mining, and is a generalization of the recently studied differentially private set union (DPSU) problem. In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state of the art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012).

Machine Learning, Cryptography and Security, Data Structures and Algorithms