Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

60 0 0.0 ( 0 )

Download Cite

Added by Shuang Song

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Ulfar Erlingsson - Vitaly Feldman - Ilya Mironov

Cryptography and Security

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in privacy guarantees without loss of utility by making reports anonymous. However, these results either comprise of systems with seemingly disparate mechanisms and attack models, or formal statements with little guidance to practitioners. Addressing this, we provide a formal treatment and offer prescriptive guidelines for privacy-preserving reporting with anonymity. We revisit the ESA framework with a simple, abstract model of attackers as well as assumptions covering it and other proposed systems of anonymity. In light of new formal privacy bounds, we examine the limitations of sketch-based encodings and ESA mechanisms such as data-dependent crowds. We also demonstrate how the ESA notion of fragmentation (reporting data aspects in separate, unlinkable messages) improves privacy/utility tradeoffs both in terms of local and central differential-privacy guarantees. Finally, to help practitioners understand the applicability and limitations of privacy-preserving reporting, we report on a large number of empirical experiments. We use real-world datasets with heavy-tailed or near-flat distributions, which pose the greatest difficulty for our techniques; in particular, we focus on data drawn from images that can be easily visualized in a way that highlights reconstruction errors. Showing the promise of the approach, and of independent interest, we also report on experiments using anonymous, privacy-preserving reporting to train high-accuracy deep neural networks on standard tasks---MNIST and CIFAR-10.

rate research

Connecting Robust Shuffle Privacy and Pan-Privacy

82 - Victor Balcer , Albert Cheu , Matthew Joseph 2020

In the emph{shuffle model} of differential privacy, data-holding users send randomized messages to a secure shuffler, the shuffler permutes the messages, and the resulting collection of messages must be differentially private with regard to user data. In the emph{pan-private} model, an algorithm processes a stream of data while maintaining an internal state that is differentially private with regard to the stream data. We give evidence connecting these two apparently different models. Our results focus on emph{robustly} shuffle private protocols, whose privacy guarantees are not greatly affected by malicious users. First, we give robustly shuffle private protocols and upper bounds for counting distinct elements and uniformity testing. Second, we use pan-private lower bounds to prove robustly shuffle private lower bounds for both problems. Focusing on the dependence on the domain size $k$, we find that robust approximate shuffle privacy and approximate pan-privacy have additive error $Theta(sqrt{k})$ for counting distinct elements. For uniformity testing, we give a robust approximate shuffle private protocol with sample complexity $tilde O(k^{2/3})$ and show that an $Omega(k^{2/3})$ dependence is necessary for any robust pure shuffle private tester. Finally, we show that this connection is useful in both directions: we give a pan-private adaptation of recent work on shuffle private histograms and use it to recover further separations between pan-privacy and interactive local privacy.

Cryptography and Security Machine Learning

An Evaluation of Cryptocurrency Payment Channel Networks and Their Privacy Implications

71 - Enes Erdin , Suat Mercan , 2021

Cryptocurrencies redefined how money can be stored and transferred among users. However, independent of the amount being sent, public blockchain-based cryptocurrencies suffer from high transaction waiting times and fees. These drawbacks hinder the wide use of cryptocurrencies by masses. To address these challenges, payment channel network concept is touted as the most viable solution to be used for micro-payments. The idea is exchanging the ownership of money by keeping the state of the accounts locally. The users inform the blockchain rarely, which decreases the load on the blockchain. Specifically, payment channel networks can provide transaction approvals in seconds by charging a nominal fee proportional to the payment amount. Such attraction on payment channel networks inspired many recent studies which focus on how to design them and allocate channels such that the transactions will be secure and efficient. However, as payment channel networks are emerging and reaching large number of users, privacy issues are becoming more relevant that raise concerns about exposing not only individual habits but also businesses revenues. In this paper, we first propose a categorization of the existing payment networks formed on top of blockchain-backed cryptocurrencies. After discussing several emerging attacks on user/business privacy in these payment channel networks, we qualitatively evaluate them based on a number of privacy metrics that relate to our case. Based on the discussions on the strengths and weaknesses of the approaches, we offer possible directions for research for the future of privacy based payment channel networks.

Cryptography and Security

An Empirical Cybersecurity Evaluation of GitHub Copilots Code Contributions

92 - Hammond Pearce , Baleegh Ahmad , Benjamin Tan 2021

There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described `AI pair programmer, GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs - and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilots code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those from MITREs Top 25 list). We explore Copilots performance on three distinct code generation axes -- examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,692 programs. Of these, we found approximately 40% to be vulnerable.

Cryptography and Security Artificial Intelligence

AMPPERE: A Universal Abstract Machine for Privacy-Preserving Entity Resolution Evaluation

136 - Yixiang Yao , Tanmay Ghai , Srivatsan Ravi 2021

Entity resolution is the task of identifying records in different datasets that refer to the same entity in the real world. In sensitive domains (e.g. financial accounts, hospital health records), entity resolution must meet privacy requirements to avoid revealing sensitive information such as personal identifiable information to untrusted parties. Existing solutions are either too algorithmically-specific or come with an implicit trade-off between accuracy of the computation, privacy, and run-time efficiency. We propose AMMPERE, an abstract computation model for performing universal privacy-preserving entity resolution. AMPPERE offers abstractions that encapsulate multiple algorithmic and platform-agnostic approaches using variants of Jaccard similarity to perform private data matching and entity resolution. Specifically, we show that two parties can perform entity resolution over their data, without leaking sensitive information. We rigorously compare and analyze the feasibility, performance overhead and privacy-preserving properties of these approaches on the Sharemind multi-party computation (MPC) platform as well as on PALISADE, a lattice-based homomorphic encryption library. The AMPPERE system demonstrates the efficacy of privacy-preserving entity resolution for real-world data while providing a precise characterization of the induced cost of preventing information leakage.

Cryptography and Security

Classification and Evaluation the Privacy Preserving Data Mining Techniques by using a Data Modification-based Framework

386 - MohammadReza Keyvanpour 2011

In recent years, the data mining techniques have met a serious challenge due to the increased concerning and worries of the privacy, that is, protecting the privacy of the critical and sensitive data. Different techniques and algorithms have been already presented for Privacy Preserving data mining, which could be classified in three common approaches: Data modification approach, Data sanitization approach and Secure Multi-party Computation approach. This paper presents a Data modification- based Framework for classification and evaluation of the privacy preserving data mining techniques. Based on our framework the techniques are divided into two major groups, namely perturbation approach and anonymization approach. Also in proposed framework, eight functional criteria will be used to analyze and analogically assessment of the techniques in these two major groups. The proposed framework provides a good basis for more accurate comparison of the given techniques to privacy preserving data mining. In addition, this framework allows recognizing the overlapping amount for different approaches and identifying modern approaches in this field.

Cryptography and Security