No Arabic abstract
The emerging public awareness and government regulations of data privacy motivate new paradigms of collecting and analyzing data transparent and acceptable to data owners. We present a new concept of privacy and corresponding data formats, mechanisms, and tradeoffs for privatizing data during data collection. The privacy, named Interval Privacy, enforces the raw data conditional distribution on the privatized data to be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism will record each data value as a random interval containing it. The proposed interval privacy mechanisms can be easily deployed through most existing survey-based data collection paradigms, e.g., by asking a respondent whether its data value is within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but not distort it. The way of using narrowed range to convey information is complementary to the popular paradigm of perturbing data. Also, the interval mechanisms can generate progressively refined information at the discretion of individual respondents. We study different theoretical aspects of the proposed privacy. In the context of supervised learning, we also offer a method such that existing supervised learning algorithms designed for point-valued data could be directly applied to learning from interval-valued data.
Privacy protection in electronic healthcare applications is an important consideration due to the sensitive nature of personal health data. Internet of Health Things (IoHT) networks have privacy requirements within a healthcare setting. However, these networks have unique challenges and security requirements (integrity, authentication, privacy and availability) must also be balanced with the need to maintain efficiency in order to conserve battery power, which can be a significant limitation in IoHT devices and networks. Data are usually transferred without undergoing filtering or optimization, and this traffic can overload sensors and cause rapid battery consumption when interacting with IoHT networks. This consequently poses restrictions on the practical implementation of these devices. As a solution to address the issues, this paper proposes a privacy-preserving two-tier data inference framework, this can conserve battery consumption by reducing the data size required to transmit through inferring the sensed data and can also protect the sensitive data from leakage to adversaries. Results from experimental evaluations on privacy show the validity of the proposed scheme as well as significant data savings without compromising the accuracy of the data transmission, which contributes to energy efficiency of IoHT sensor devices.
Local Differential Privacy (LDP) is popularly used in practice for privacy-preserving data collection. Although existing LDP protocols offer high utility for large user populations (100,000 or more users), they perform poorly in scenarios with small user populations (such as those in the cybersecurity domain) and lack perturbation mechanisms that are effective for both ordinal and non-ordinal item sequences while protecting sequence length and content simultaneously. In this paper, we address the small user population problem by introducing the concept of Condensed Local Differential Privacy (CLDP) as a specialization of LDP, and develop a suite of CLDP protocols that offer desirable statistical utility while preserving privacy. Our protocols support different types of client data, ranging from ordinal data types in finite metric spaces (numeric malware infection statistics), to non-ordinal items (O
In recent years, the data mining techniques have met a serious challenge due to the increased concerning and worries of the privacy, that is, protecting the privacy of the critical and sensitive data. Different techniques and algorithms have been already presented for Privacy Preserving data mining, which could be classified in three common approaches: Data modification approach, Data sanitization approach and Secure Multi-party Computation approach. This paper presents a Data modification- based Framework for classification and evaluation of the privacy preserving data mining techniques. Based on our framework the techniques are divided into two major groups, namely perturbation approach and anonymization approach. Also in proposed framework, eight functional criteria will be used to analyze and analogically assessment of the techniques in these two major groups. The proposed framework provides a good basis for more accurate comparison of the given techniques to privacy preserving data mining. In addition, this framework allows recognizing the overlapping amount for different approaches and identifying modern approaches in this field.
Differential privacy offers a formal framework for reasoning about privacy and accuracy of computations on private data. It also offers a rich set of building blocks for constructing data analyses. When carefully calibrated, these analyses simultaneously guarantee privacy of the individuals contributing their data, and accuracy of their results for inferring useful properties about the population. The compositional nature of differential privacy has motivated the design and implementation of several programming languages aimed at helping a data analyst in programming differentially private analyses. However, most of the programming languages for differential privacy proposed so far provide support for reasoning about privacy but not for reasoning about the accuracy of data analyses. To overcome this limitation, in this work we present DPella, a programming framework providing data analysts with support for reasoning about privacy, accuracy and their trade-offs. The distinguishing feature of DPella is a novel component which statically tracks the accuracy of different data analyses. In order to make tighter accuracy estimations, this component leverages taint analysis for automatically inferring statistical independence of the different noise quantities added for guaranteeing privacy. We show the flexibility of our approach by not only implementing classical counting queries (e.g., CDFs) but also by analyzing hierarchical counting queries (like those done by Census Bureaus), where accuracy have different constraints per level and data analysts should figure out the best manner to calibrate privacy to meet the accuracy requirements.
Our behavior (the way we talk, walk, or think) is unique and can be used as a biometric trait. It also correlates with sensitive attributes like emotions. Hence, techniques to protect individuals privacy against unwanted inferences are required. To consolidate knowledge in this area, we systematically reviewed applicable anonymization techniques. We taxonomize and compare existing solutions regarding privacy goals, conceptual operation, advantages, and limitations. Our analysis shows that some behavioral traits (e.g., voice) have received much attention, while others (e.g., eye-gaze, brainwaves) are mostly neglected. We also find that the evaluation methodology of behavioral anonymization techniques can be further improved.