Many socially valuable activities depend on sensitive information, such as medical research, public health policies, political coordination, and personalized digital services. This is often posed as an inherent privacy trade-off: we can benefit from data analysis or retain data privacy, but not both. Across several disciplines, a vast amount of effort has been directed toward overcoming this trade-off to enable productive uses of information without also enabling undesired misuse, a goal we term 'structured transparency'. In this paper, we provide an overview of the frontier of research seeking to develop structured transparency. We offer a general theoretical framework and vocabulary, including characterizing the fundamental components -- input privacy, output privacy, input verification, output verification, and flow governance -- and fundamental problems of copying, bundling, and recursive oversight. We argue that these barriers are less fundamental than they often appear. Recent progress in developing 'privacy-enhancing technologies' (PETs), such as secure computation and federated learning, may substantially reduce lingering use-misuse trade-offs in a number of domains. We conclude with several illustrations of structured transparency -- in open research, energy management, and credit scoring systems -- and a discussion of the risks of misuse of these tools.
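As a concrete illustration of the input-privacy component mentioned above, the following minimal sketch shows additive secret sharing, one building block of the secure computation the abstract alludes to. The field modulus, party count, and two-hospital scenario are illustrative assumptions, not details taken from the paper.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an illustrative choice, not prescribed by the paper

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split `value` into n additive shares; any subset of n-1 shares reveals nothing."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the secret."""
    return sum(shares) % PRIME

# Hypothetical example: two hospitals jointly compute a sum of case counts
# without revealing either input; each party only ever sees one share of each value.
a_shares, b_shares = share(120), share(87)
sum_shares = [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 120 + 87
```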
Since the global spread of Covid-19 began to overwhelm the attempts of governments to conduct manual contact-tracing, there has been much interest in using the power of mobile phones to automate the contact-tracing process through the development of exposure notification applications. The rough idea is simple: use Bluetooth or other data-exchange technologies to record contacts between users, enable users to report positive diagnoses, and alert users who have been exposed to sick users. Of course, there are many privacy concerns associated with this idea. Much of the work in this area has been concerned with designing mechanisms for tracing contacts and alerting users that do not leak additional information about users beyond the existence of exposure events. However, although designing practical protocols is of crucial importance, it is essential to realize that notifying users about exposure events may itself leak confidential information (e.g., that a particular contact has been diagnosed). Luckily, while digital contact tracing is a relatively new task, the generic problem of privacy and data disclosure has been studied for decades. Indeed, the framework of differential privacy permits provable query privacy by adding random noise. In this article, we translate two results from statistical privacy and social recommendation algorithms to exposure notification. We thus prove some naive bounds on the degree to which accuracy must be sacrificed if exposure notification frameworks are to be made more private through the injection of noise.
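To make the noise-injection idea concrete, the following minimal sketch applies the standard Laplace mechanism from differential privacy to an exposure count. The function name, sensitivity, and choice of epsilon are illustrative assumptions; this is not the specific mechanism or bound analyzed in the article.

```python
import numpy as np

def noisy_exposure_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Return an exposure count perturbed with Laplace noise.

    Adding Laplace(sensitivity / epsilon) noise to a counting query with
    L1-sensitivity `sensitivity` yields epsilon-differential privacy; a smaller
    epsilon means stronger privacy but a less accurate reported count.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical example: a user was exposed to 3 diagnosed contacts; report a noisy count.
reported = noisy_exposure_count(true_count=3, epsilon=0.5)
```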
The open data movement is leading to the massive publishing of court records online, increasing transparency and accessibility of justice, and to the design of legal technologies building on the wealth of legal data available. However, the sensitive nature of legal decisions also raises important privacy issues. Current practices solve the resulting privacy versus transparency trade-off by combining access control with (manual or semi-manual) text redaction. In this work, we claim that current practices are insufficient for coping with massive access to legal data (restrictive access control policies are detrimental to openness and to utility, while text redaction is unable to provide sound privacy protection) and advocate for an integrative approach that could benefit from the latest developments of the privacy-preserving data publishing domain. We present a thorough analysis of the problem and of the current approaches, and propose a straw man multimodal architecture paving the way to a full-fledged privacy-preserving legal data publishing system.
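To illustrate the kind of rule-based text redaction that current practice relies on (and whose limits the paper criticizes), the following minimal sketch replaces matched identifiers in a court-record snippet with typed placeholders. The patterns, labels, and example sentence are illustrative assumptions and do not reflect the architecture proposed in the paper.

```python
import re

# Illustrative redaction rules; real pipelines combine NER models and manual review.
PATTERNS = {
    "PERSON": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "CASE_NO": re.compile(r"\b\d{2}-[A-Z]{2}-\d{4,6}\b"),
}

def redact(text: str) -> str:
    """Replace every match of each pattern with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Mr. Dupont appeared on 12/03/2021 in case 21-CV-00452."))
# -> "[PERSON] appeared on [DATE] in case [CASE_NO]."
```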
Contact tracing is an essential tool for public health officials and local communities to fight the spread of novel diseases, such as for the COVID-19 pandemic. The Singaporean government just released a mobile phone app, TraceTogether, that is designed to assist health officials in tracking down exposures after an infected individual is identified. However, there are important privacy implications of the existence of such tracking apps. Here, we analyze some of those implications and discuss ways of ameliorating the privacy concerns without decreasing usefulness to public health. We hope in writing this document to ensure that privacy is a central feature of conversations surrounding mobile contact tracing apps and to encourage community efforts to develop alternative effective solutions with stronger privacy protection for the users. Importantly, though we discuss potential modifications, this document is not meant as a formal research paper, but instead is a response to some of the privacy characteristics of direct contact tracing apps like TraceTogether and an early-stage Request for Comments to the community. Date written: 2020-03-24. Minor correction: 2020-03-30.
We study privacy-utility trade-offs where users share privacy-correlated useful information with a service provider to obtain some utility. The service provider is adversarial in the sense that it can infer the users' private information based on the shared useful information. To minimize the privacy leakage while maintaining a desired level of utility, the users carefully perturb the useful information via a probabilistic privacy mapping before sharing it. We focus on the setting in which the adversary attempting an inference attack on the users' privacy has potentially biased information about the statistical correlation between the private and useful variables. This information asymmetry between the users and the limited adversary leads to better privacy guarantees than the case of the omniscient adversary under the same utility requirement. We first identify assumptions on the adversary's information so that the inference costs are well-defined and finite. Then, we characterize the impact of the information asymmetry and show that it increases the inference costs for the adversary. We further formulate the design of the privacy mapping against a limited adversary as a difference-of-convex-functions program and solve it via the concave-convex procedure. When the adversary's information is not precisely available, we adopt a Bayesian view and represent the adversary's information by a probability distribution. In this case, the expected cost for the adversary does not admit a closed-form expression, and we establish and maximize a lower bound of the expected cost. We provide a numerical example regarding a census data set to illustrate the theoretical results.
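The following minimal sketch illustrates, on a toy binary example, how a probabilistic privacy mapping and an adversary with biased information interact: the adversary designs its guess of the private variable under a mistaken joint distribution and therefore incurs a higher inference error than an omniscient adversary. The specific distributions, the bit-flip channel, and the MAP-error cost are illustrative assumptions, not the difference-of-convex-functions formulation developed in the paper.

```python
import numpy as np

# Joint distribution of private X and useful Y (rows index x, columns index y); illustrative numbers.
P_TRUE = np.array([[0.35, 0.15],
                   [0.10, 0.40]])
# A biased belief the limited adversary might hold (here the correlation is reversed).
P_BELIEF = np.array([[0.15, 0.35],
                     [0.40, 0.10]])

def channel(delta: float) -> np.ndarray:
    """Probabilistic privacy mapping p(z|y): flip the useful bit with probability delta before sharing."""
    return np.array([[1 - delta, delta],
                     [delta, 1 - delta]])

def inference_error(p_true: np.ndarray, p_belief: np.ndarray, delta: float) -> float:
    """Error probability of the adversary's MAP guess of X from the released Z,
    designed under `p_belief` but evaluated under `p_true`."""
    C = channel(delta)
    joint_true = p_true @ C        # true p(x, z)
    joint_belief = p_belief @ C    # p(x, z) as the adversary believes it to be
    x_hat = joint_belief.argmax(axis=0)               # MAP decision for each z
    p_correct = sum(joint_true[x_hat[z], z] for z in range(2))
    return 1.0 - p_correct

# A limited (biased) adversary incurs at least as much inference error as an omniscient one.
for delta in (0.0, 0.1, 0.3):
    print(f"delta={delta}: omniscient={inference_error(P_TRUE, P_TRUE, delta):.3f}, "
          f"limited={inference_error(P_TRUE, P_BELIEF, delta):.3f}")
```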
How to contain the spread of the COVID-19 virus is a major concern for most countries. As the situation continues to change, various countries are making efforts to reopen their economies by lifting some restrictions and enforcing new measures to prevent the spread. In this work, we review some approaches that have been adopted to contain the COVID-19 virus, such as contact tracing, cluster identification, movement restrictions, and status validation. Specifically, we classify available techniques based on characteristics such as technology, architecture, trade-offs (privacy vs. utility), and the phase of adoption. We present a novel approach for evaluating privacy using both qualitative and quantitative measures in a privacy-utility assessment of contact tracing applications. In this new method, we assess utility at three distinct privacy levels: no privacy, 100% privacy, and a level k, where k is set by the system providing the utility or privacy.