No Arabic abstract
The open data movement is leading to the massive publishing of court records online, increasing transparency and accessibility of justice, and to the design of legal technologies building on the wealth of legal data available. However, the sensitive nature of legal decisions also raises important privacy issues. Current practices solve the resulting privacy versus transparency trade-off by combining access control with (manual or semi-manual) text redaction. In this work, we claim that current practices are insufficient for coping with massive access to legal data (restrictive access control policies is detrimental to openness and to utility while text redaction is unable to provide sound privacy protection) and advocate for a in-tegrative approach that could benefit from the latest developments of the privacy-preserving data publishing domain. We present a thorough analysis of the problem and of the current approaches, and propose a straw man multimodal architecture paving the way to a full-fledged privacy-preserving legal data publishing system.
Many socially valuable activities depend on sensitive information, such as medical research, public health policies, political coordination, and personalized digital services. This is often posed as an inherent privacy trade-off: we can benefit from data analysis or retain data privacy, but not both. Across several disciplines, a vast amount of effort has been directed toward overcoming this trade-off to enable productive uses of information without also enabling undesired misuse, a goal we term `structured transparency. In this paper, we provide an overview of the frontier of research seeking to develop structured transparency. We offer a general theoretical framework and vocabulary, including characterizing the fundamental components -- input privacy, output privacy, input verification, output verification, and flow governance -- and fundamental problems of copying, bundling, and recursive oversight. We argue that these barriers are less fundamental than they often appear. Recent progress in developing `privacy-enhancing technologies (PETs), such as secure computation and federated learning, may substantially reduce lingering use-misuse trade-offs in a number of domains. We conclude with several illustrations of structured transparency -- in open research, energy management, and credit scoring systems -- and a discussion of the risks of misuse of these tools.
The rapid growth in digital data forms the basis for a wide range of new services and research, e.g, large-scale medical studies. At the same time, increasingly restrictive privacy concerns and laws are leading to significant overhead in arranging for sharing or combining different data sets to obtain these benefits. For new applications, where the benefit of combined data is not yet clear, this overhead can inhibit organizations from even trying to determine whether they can mutually benefit from sharing their data. In this paper, we discuss techniques to overcome this difficulty by employing private information transfer to determine whether there is a benefit from sharing data, and whether there is room to negotiate acceptable prices. These techniques involve cryptographic protocols. While currently considered secure, these protocols are potentially vulnerable to the development of quantum technology, particularly for ensuring privacy over significant periods of time into the future. To mitigate this concern, we describe how developments in practical quantum technology can improve the security of these protocols.
Since the global spread of Covid-19 began to overwhelm the attempts of governments to conduct manual contact-tracing, there has been much interest in using the power of mobile phones to automate the contact-tracing process through the development of exposure notification applications. The rough idea is simple: use Bluetooth or other data-exchange technologies to record contacts between users, enable users to report positive diagnoses, and alert users who have been exposed to sick users. Of course, there are many privacy concerns associated with this idea. Much of the work in this area has been concerned with designing mechanisms for tracing contacts and alerting users that do not leak additional information about users beyond the existence of exposure events. However, although designing practical protocols is of crucial importance, it is essential to realize that notifying users about exposure events may itself leak confidential information (e.g. that a particular contact has been diagnosed). Luckily, while digital contact tracing is a relatively new task, the generic problem of privacy and data disclosure has been studied for decades. Indeed, the framework of differential privacy further permits provable query privacy by adding random noise. In this article, we translate two results from statistical privacy and social recommendation algorithms to exposure notification. We thus prove some naive bounds on the degree to which accuracy must be sacrificed if exposure notification frameworks are to be made more private through the injection of noise.
The purpose of Secure Multi-Party Computation is to enable protocol participants to compute a public function of their private inputs while keeping their inputs secret, without resorting to any trusted third party. However, opening the public output of such computations inevitably reveals some information about the private inputs. We propose a measure generalising both Renyi entropy and g-entropy so as to quantify this information leakage. In order to control and restrain such information flows, we introduce the notion of function substitution which replaces the computation of a function that reveals sensitive information with that of an approximate function. We exhibit theoretical bounds for the privacy gains that this approach provides and experimentally show that this enhances the confidentiality of the inputs while controlling the distortion of computed output values. Finally, we investigate the inherent compromise between accuracy of computation and privacy of inputs and we demonstrate how to realise such optimal trade-offs.
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chain. The system aims to minimise privacy and security risks for individuals and communities and guarantee the highest level of data protection. The goal of our proximity tracing system is to determine who has been in close physical proximity to a COVID-19 positive person and thus exposed to the virus, without revealing the contacts identity or where the contact occurred. To achieve this goal, users run a smartphone app that continually broadcasts an ephemeral, pseudo-random ID representing the users phone and also records the pseudo-random IDs observed from smartphones in close proximity. When a patient is diagnosed with COVID-19, she can upload pseudo-random IDs previously broadcast from her phone to a central server. Prior to the upload, all data remains exclusively on the users phone. Other users apps can use data from the server to locally estimate whether the devices owner was exposed to the virus through close-range physical proximity to a COVID-19 positive person who has uploaded their data. In case the app detects a high risk, it will inform the user.