We address the problem of how to obfuscate texts by removing stylistic clues that can identify authorship, whilst preserving (as much as possible) the content of the text. In this paper we combine ideas from generalised differential privacy and machine learning techniques for text processing to model privacy for text documents. We define a privacy mechanism that operates at the level of text documents represented as bags-of-words; these representations are typical in machine learning and contain sufficient information to carry out many kinds of classification tasks, including topic identification and authorship attribution (of the original documents). We show that our mechanism satisfies privacy with respect to a metric for semantic similarity, thereby providing a balance between utility, defined by the semantic content of texts, and the obfuscation of stylistic clues. We demonstrate our implementation on a fan-fiction dataset, confirming that it is indeed possible to disguise writing style effectively whilst preserving enough information and variation for accurate content-classification tasks.
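As a rough illustration of the kind of mechanism this abstract describes (a sketch of the general idea, not the authors' exact construction), the code below perturbs each word's embedding with planar Laplace noise, the standard noise distribution for metric differential privacy in the Euclidean plane, and snaps the result to the nearest vocabulary word. The two-dimensional toy embeddings, vocabulary, and function names are all hypothetical.

```python
import numpy as np

# Toy 2-d "embeddings" (hypothetical values, for illustration only).
VOCAB = {
    "wizard":   np.array([0.90, 0.10]),
    "sorcerer": np.array([0.85, 0.15]),
    "school":   np.array([0.10, 0.90]),
    "academy":  np.array([0.15, 0.85]),
}

def planar_laplace_noise(epsilon, rng):
    """Sample 2-d noise with density proportional to exp(-epsilon * ||z||),
    the usual construction for metric differential privacy in the plane."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    # In 2 dimensions the radius follows a Gamma(shape=2, scale=1/epsilon) law.
    r = rng.gamma(shape=2.0, scale=1.0 / epsilon)
    return r * np.array([np.cos(theta), np.sin(theta)])

def privatize_word(word, epsilon, rng):
    """Perturb a word's embedding, then snap to the nearest vocabulary word."""
    noisy = VOCAB[word] + planar_laplace_noise(epsilon, rng)
    return min(VOCAB, key=lambda w: np.linalg.norm(VOCAB[w] - noisy))

rng = np.random.default_rng(0)
print([privatize_word(w, epsilon=5.0, rng=rng) for w in ["wizard", "school"]])
```

Because the noise scale depends on distance in embedding space, a word is most likely to be replaced by a semantically similar one, which is how such a mechanism can preserve topical content while obscuring word-choice habits that betray authorship.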
The problem of obfuscating the authorship of a text document has received little attention in the literature to date. Current approaches are ad hoc and rely on assumptions about an adversary's auxiliary knowledge, which makes it difficult to reason about…
In this rejoinder, we aim to address two broad issues that cover most comments made in the discussion. First, we discuss some theoretical aspects of our work and comment on how this work might impact the theoretical foundation of privacy-preserving data…
The wide deployment of machine learning in recent years gives rise to a great demand for large-scale, high-dimensional data, for which privacy raises serious concerns. Differential privacy (DP) mechanisms are conventionally developed for scalar…
We develop two notions of instance optimality in differential privacy, inspired by classical statistical theory: one by defining a local minimax risk, and the other by considering unbiased mechanisms and analogizing the Cramér-Rao bound; and we show that…
Differential privacy mechanism design has traditionally been tailored to a scalar-valued query function. Although many mechanisms, such as the Laplace and Gaussian mechanisms, can be extended to a matrix-valued query function by adding i.i.d. noise to…
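To make the conventional i.i.d. extension this abstract refers to concrete, here is a minimal sketch of the element-wise Laplace mechanism applied to a matrix-valued query, under the assumption of a known per-entry L1 sensitivity; the function name and the example covariance query are hypothetical.

```python
import numpy as np

def laplace_matrix_mechanism(query_result, sensitivity, epsilon, rng):
    """Element-wise (i.i.d.) Laplace mechanism for a matrix-valued query:
    every entry receives independent Laplace(sensitivity / epsilon) noise.
    Note that this treatment ignores any correlation structure among the
    entries, which is exactly the limitation such abstracts point at."""
    scale = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=query_result.shape)
    return query_result + noise

rng = np.random.default_rng(42)
covariance = np.array([[2.0, 0.5],      # hypothetical matrix-valued
                       [0.5, 1.0]])     # query output (e.g. a covariance)
print(laplace_matrix_mechanism(covariance, sensitivity=1.0, epsilon=1.0, rng=rng))
```

Mechanisms designed directly for matrix-valued queries can instead shape the noise to the matrix structure, rather than treating each entry as an unrelated scalar as this baseline does.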