With the widespread use of location-based services (LBSs), synthesizing location traces plays an increasingly important role in analyzing spatial big data while protecting user privacy. In particular, a synthetic trace that preserves a feature specific to a cluster of users (e.g., those who commute by train, those who go shopping) is important for various geo-data analysis tasks and for providing a synthetic location dataset. Although location synthesizers have been widely studied, existing synthesizers do not provide sufficient utility, privacy, or scalability, and hence are not practical for large-scale location traces. To overcome this issue, we propose a novel location synthesizer called PPMTF (Privacy-Preserving Multiple Tensor Factorization). We model various statistical features of the original traces by a transition-count tensor and a visit-count tensor. We factorize these two tensors simultaneously via multiple tensor factorization and train the factor matrices via posterior sampling. Then we synthesize traces from the reconstructed tensors and perform a plausible deniability test on each synthetic trace. We comprehensively evaluate PPMTF using two datasets. Our experimental results show that PPMTF preserves various statistical features including cluster-specific features, protects user privacy, and synthesizes large-scale location traces in practical time. PPMTF also significantly outperforms state-of-the-art methods in terms of utility and scalability at the same level of privacy.
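To make the two-tensor idea concrete, below is a minimal sketch of jointly factorizing a transition-count tensor (user x location x location) and a visit-count tensor (user x location x time slot) with shared user and location factors. The squared-loss objective, gradient-descent updates, and all dimensions are illustrative assumptions; PPMTF itself trains the factor matrices via posterior sampling.

```python
# Minimal sketch: joint factorization of two count tensors with shared
# factors. Illustrative stand-in for PPMTF's multiple tensor factorization.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_locs, n_times, k = 20, 10, 6, 4

# Toy count tensors standing in for real trace statistics.
R = rng.poisson(1.0, (n_users, n_locs, n_locs)).astype(float)   # transitions
V = rng.poisson(1.0, (n_users, n_locs, n_times)).astype(float)  # visits

# Shared factors: A (users), B, C (locations), D (time slots).
A, B = rng.normal(0, 0.1, (n_users, k)), rng.normal(0, 0.1, (n_locs, k))
C, D = rng.normal(0, 0.1, (n_locs, k)), rng.normal(0, 0.1, (n_times, k))

lr = 0.005
for step in range(1000):
    dR = np.einsum('uk,ik,jk->uij', A, B, C) - R   # reconstruction residuals
    dV = np.einsum('uk,ik,tk->uit', A, B, D) - V
    # Gradient steps on the summed squared errors; A and B are shared,
    # which couples the two tensors.
    gA = np.einsum('uij,ik,jk->uk', dR, B, C) + np.einsum('uit,ik,tk->uk', dV, B, D)
    gB = np.einsum('uij,uk,jk->ik', dR, A, C) + np.einsum('uit,uk,tk->ik', dV, A, D)
    gC = np.einsum('uij,uk,ik->jk', dR, A, B)
    gD = np.einsum('uit,uk,ik->tk', dV, A, B)
    A, B, C, D = A - lr * gA, B - lr * gB, C - lr * gC, D - lr * gD

print('loss:', (dR ** 2).mean() + (dV ** 2).mean())
```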
Releasing full data records is one of the most challenging problems in data privacy. On the one hand, many popular techniques such as data de-identification are problematic because of their dependence on the background knowledge of adversaries. On the other hand, rigorous methods such as the exponential mechanism for differential privacy are often computationally impractical for releasing high-dimensional data, or cannot preserve high utility of the original data due to their extensive data perturbation. This paper presents a criterion called plausible deniability that provides a formal privacy guarantee, notably for releasing sensitive datasets: an output record can be released only if a certain number of input records are indistinguishable, up to a privacy parameter. This notion does not depend on the background knowledge of an adversary. Moreover, it can be checked efficiently by privacy tests. We present mechanisms to generate synthetic datasets with the same format and similar statistical properties as the input data. We study this technique both theoretically and experimentally. A key theoretical result shows that, with proper randomization, the plausible deniability mechanism generates differentially private synthetic data. We demonstrate the efficiency of this generative technique on a large dataset; it is shown to preserve the utility of the original data with respect to various statistical-analysis and machine-learning measures.
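The privacy test itself is easy to state. Below is a minimal sketch of a (k, gamma) plausible deniability check: a synthetic record passes only if at least k input records could have generated it with probability within a factor gamma of the seed's. The function name and the `gen_prob` callback (the generative model's probability of producing the output from a given input record) are hypothetical stand-ins.

```python
# Minimal sketch of a (k, gamma) plausible-deniability privacy test.
def plausibly_deniable(synthetic, seed, records, gen_prob, k=10, gamma=2.0):
    """Release `synthetic` (generated from `seed`) only if enough other
    input records could plausibly have generated it too."""
    p_seed = gen_prob(synthetic, seed)
    plausible = sum(
        1 for r in records
        if p_seed / gamma <= gen_prob(synthetic, r) <= p_seed * gamma
    )
    return plausible >= k  # the seed itself counts toward k
```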
Activity-tracking applications and location-based services using short-range communication (SRC) techniques have seen a surge in demand during the COVID-19 pandemic, especially for automated contact tracing. Attention from both the public and policymakers keeps rising on related practical problems, including \textit{1) how to protect data security and location privacy? 2) how to efficiently and dynamically deploy SRC Internet of Things (IoT) witnesses to monitor large areas?} To answer these questions, in this paper we propose a decentralized and permissionless blockchain protocol named \textit{Bychain}. Specifically, 1) a privacy-preserving SRC protocol for activity tracking and a corresponding generalized block structure are developed by combining an interactive zero-knowledge proof protocol with a key escrow mechanism. As a result, personal identity is decoupled from the ownership of on-chain location information, while the owner of the on-chain location data can still claim ownership without revealing the private key to anyone else. 2) An artificial potential field-based incentive allocation mechanism is proposed to incentivize IoT witnesses to pursue the maximum monitoring coverage. We implemented and evaluated the proposed blockchain protocol in the real world using Bluetooth 5.0, analyzing the storage, CPU utilization, power consumption, time delay, and security of each procedure. The experiments and security analysis provide a real-world performance evaluation.
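The ownership claim can be illustrated with a classic interactive proof of knowledge. The sketch below uses Schnorr identification over a toy prime-order group: the prover convinces the verifier that it knows the private key behind the public key bound to an on-chain record, revealing nothing else. The group parameters are toy values, and Bychain's actual zero-knowledge and key-escrow constructions are not detailed in the abstract, so this is only an assumed illustration of the building block.

```python
# Minimal sketch: Schnorr identification as a proof of ownership of the
# key bound to an on-chain location record (toy parameters, not Bychain's
# actual protocol).
import secrets

p, q, g = 2039, 1019, 4   # toy Schnorr group: p = 2q + 1, g of order q

sk = secrets.randbelow(q - 1) + 1   # private key, never leaves the owner
pk = pow(g, sk, p)                  # public key stored with the record

# One round of the interactive protocol (honest-verifier zero-knowledge).
r = secrets.randbelow(q - 1) + 1
commitment = pow(g, r, p)              # prover -> verifier
challenge = secrets.randbelow(q)       # verifier -> prover
response = (r + challenge * sk) % q    # prover -> verifier

# Verifier's check: g^response == commitment * pk^challenge (mod p).
assert pow(g, response, p) == (commitment * pow(pk, challenge, p)) % p
print('ownership proven without revealing sk')
```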
In Near-Neighbor Search (NNS), a client queries a database (held by a server) for the most similar data (near-neighbors) under a given similarity metric. The privacy-preserving variant (PP-NNS) requires that neither the server nor the client learn anything about the other party's data except what can be inferred from the outcome of the NNS. The overwhelming growth in the size of current datasets and the lack of a truly secure server in the online world render the existing solutions impractical, either due to their high computational requirements or due to unrealistic assumptions that potentially compromise privacy. PP-NNS with query time \textit{sub-linear} in the size of the database was suggested as an open research direction by Li et al. (CCSW '15). In this paper, we provide the first such algorithm, called Secure Locality Sensitive Indexing (SLSI), which has sub-linear query time and the ability to handle honest-but-curious parties. At the heart of our proposal lies a secure binary embedding scheme generated from a novel probabilistic transformation over a locality sensitive hashing family. We provide an information-theoretic bound for the privacy guarantees and support our theoretical claims with substantial empirical evidence on real-world datasets.
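As background, the binary embedding at the heart of such schemes builds on the random-hyperplane (sign) LSH family, sketched below: each vector maps to a bit string whose Hamming distance tracks angular similarity, which enables sub-linear search via bucketing. SLSI's secure probabilistic transformation on top of this family is not reproduced here; all parameters are illustrative.

```python
# Minimal sketch of a sign-LSH binary embedding (the non-secure building
# block underlying schemes like SLSI).
import numpy as np

rng = np.random.default_rng(1)
dim, n_bits = 64, 32
planes = rng.normal(size=(n_bits, dim))       # shared random hyperplanes

def embed(x):
    return (planes @ x > 0).astype(np.uint8)  # one bit per hyperplane

db = rng.normal(size=(1000, dim))
codes = np.stack([embed(x) for x in db])

query = db[42] + 0.1 * rng.normal(size=dim)   # noisy copy of item 42
hamming = (codes ^ embed(query)).sum(axis=1)
print('nearest by Hamming distance:', int(hamming.argmin()))  # likely 42
```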
In this paper, we study privacy-preserving task assignment in spatial crowdsourcing, where the locations of both workers and tasks, prior to their release to the server, are perturbed with Geo-Indistinguishability (a differential privacy notion for location-based systems). Unlike the previously studied online setting, where each task is assigned immediately upon arrival, we target the batch-based setting, where the server maximizes the number of successfully assigned tasks after a batch of tasks arrives. To achieve this goal, we propose the k-Switch solution, which first divides the workers into small groups based on the perturbed worker-task distances, and then uses Homomorphic Encryption (HE)-based secure computation to enhance the task assignment. Furthermore, we expedite the HE-based computation by limiting the group size to at most k. Extensive experiments demonstrate that, in terms of the number of successfully assigned tasks, k-Switch improves on batch-based baselines by 5.9X and on the existing online solution by 1.74X, with no privacy leakage.
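For reference, the Geo-Indistinguishability perturbation is typically implemented with the planar Laplace mechanism of Andrés et al. (CCS 2013), sketched below: draw a uniform angle and a radius from the inverse CDF of the planar Laplace distribution. The coordinates are treated as planar for simplicity, and k-Switch's grouping and HE steps are not shown.

```python
# Minimal sketch of the planar Laplace mechanism for Geo-Indistinguishability.
import math
import numpy as np
from scipy.special import lambertw

def planar_laplace(x, y, epsilon, rng):
    """Perturb a planar location (x, y) with eps per unit of distance."""
    theta = rng.uniform(0, 2 * math.pi)   # random direction
    p = rng.uniform(0, 1)
    # Inverse CDF of the radius: r = -(1/eps) * (W_{-1}((p - 1)/e) + 1).
    r = -(lambertw((p - 1) / math.e, k=-1).real + 1) / epsilon
    return x + r * math.cos(theta), y + r * math.sin(theta)

# Report a worker's location before sending it to the server.
print(planar_laplace(100.0, 200.0, epsilon=0.1, rng=np.random.default_rng(2)))
```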
The prevalence of e-commerce has made customers' detailed personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When personalized information is involved, how to protect its privacy becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over $T$ time periods with an \emph{unknown} demand function of the posted price and personalized information. At each time $t$, the retailer observes an arriving customer's personal information and offers a price. The customer then makes a purchase decision, which the retailer uses to learn the underlying demand function. There is a potentially serious privacy concern in this process: a third-party agent might infer the personalized information and purchase decisions from the price changes of the pricing system. Using the fundamental framework of differential privacy from computer science, we develop a privacy-preserving dynamic pricing policy that aims to maximize retailer revenue while avoiding leakage of individual customers' information and purchase decisions. To this end, we first introduce a notion of \emph{anticipating} $(\varepsilon, \delta)$-differential privacy that is tailored to the dynamic pricing problem. Our policy achieves both a privacy guarantee and a performance guarantee in terms of regret. Roughly speaking, for $d$-dimensional personalized information, our algorithm achieves an expected regret of order $\tilde{O}(\varepsilon^{-1} \sqrt{d^3 T})$ when the customers' information is adversarially chosen. For stochastic personalized information, the regret bound can be further improved to $\tilde{O}(\sqrt{d^2 T} + \varepsilon^{-2} d^2)$.
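At a schematic level, policies of this kind inject calibrated noise into the statistics used to learn demand, so that the posted prices reveal little about any single customer. The sketch below perturbs the sufficient statistics of a linear demand model with Gaussian noise; the demand model, pricing rule, and noise scale are illustrative assumptions, not the paper's anticipating $(\varepsilon, \delta)$-DP policy.

```python
# Schematic sketch: dynamic pricing from noisy sufficient statistics
# (illustrative; not the paper's anticipating (eps, delta)-DP policy).
import numpy as np

rng = np.random.default_rng(3)
d, T, sigma = 3, 500, 1.0          # sigma: illustrative noise scale

XtX = np.eye(d + 1)                # ridge-regularized Gram matrix
Xty = np.zeros(d + 1)

for t in range(T):
    z = rng.normal(size=d)                    # arriving customer's features
    theta = np.linalg.solve(XtX, Xty)         # demand estimate from noisy stats
    price = float(np.clip(1.0 + theta[1:] @ z, 0.1, 2.0))  # illustrative rule
    x = np.concatenate(([price], z))
    buy = float(rng.random() < np.clip(0.8 - 0.3 * price, 0.0, 1.0))
    # Gaussian perturbation of each statistic before it is stored, so no
    # exact individual record ever enters the pricing state.
    XtX += np.outer(x, x) + sigma * rng.normal(size=(d + 1, d + 1))
    Xty += buy * x + sigma * rng.normal(size=d + 1)

print('privately learned demand parameters:', np.linalg.solve(XtX, Xty))
```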