In recent years, data mining techniques have faced a serious challenge: growing concern over privacy, that is, protecting critical and sensitive data. Various techniques and algorithms have been proposed for privacy-preserving data mining, and they can be classified into three common approaches: data modification, data sanitization, and secure multi-party computation. This paper presents a data-modification-based framework for classifying and evaluating privacy-preserving data mining techniques. Under our framework, the techniques are divided into two major groups, the perturbation approach and the anonymization approach. The framework also applies eight functional criteria to analyze and comparatively assess the techniques in these two groups. It provides a sound basis for more accurate comparison of privacy-preserving data mining techniques, makes it possible to recognize the degree of overlap between different approaches, and helps identify emerging approaches in this field.
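A minimal sketch of the two groups named above, on hypothetical toy data (the attribute, noise scale, and interval width are our own illustration, not the paper's): additive-noise perturbation, which masks individual values while roughly preserving aggregates, and a simple interval generalization in the spirit of anonymization.

```python
# Toy illustration of the two technique families: perturbation vs. anonymization.
import random

ages = [23, 25, 31, 34, 42, 47, 52, 58]

# Perturbation approach: release age + random noise so individual values
# are masked while aggregate statistics stay approximately correct.
perturbed = [a + random.gauss(0, 5) for a in ages]
print("original mean:", sum(ages) / len(ages))
print("perturbed mean:", sum(perturbed) / len(perturbed))

# Anonymization approach: generalize each age into a coarse interval so
# records become indistinguishable within their group.
def generalize(age, width=10):
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

print([generalize(a) for a in ages])
```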
With mobile phone penetration rates reaching 90%, Customer Proprietary Network Information (CPNI) can offer extremely valuable information to different sectors, including policymakers. Indeed, as part of CPNI, Call Detail Records have been successfully used to provide real-time traffic information, to improve our understanding of the dynamics of people's mobility and thereby support prevention and control measures against infectious diseases, and to produce population statistics. While there is no doubt about the usefulness of CPNI data, privacy concerns regarding the sharing of individuals' data have prevented it from being used to its full potential. Traditional anonymization measures, such as pseudonymization and standard de-identification, have been shown to be insufficient to protect privacy, particularly for mobile phone datasets. For example, researchers have shown that with only four data points of approximate place and time for a user, 95% of users could be re-identified in a dataset of 1.5 million mobile phone users. In this landscape paper, we discuss the state-of-the-art anonymization techniques and their shortcomings.
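The four-point re-identification result rests on the notion of unicity: how often a handful of (place, time) observations matches exactly one trace in the dataset. The sketch below estimates unicity on synthetic traces; the user count, antenna count, and trace model are our assumptions for illustration, not the original study's data.

```python
# Estimate unicity: the fraction of users uniquely pinned down by K points.
import random

random.seed(0)
N_USERS, N_POINTS, K = 1000, 20, 4
# Each trace is a set of (antenna_id, hour-of-week) observations.
traces = {u: {(random.randrange(200), random.randrange(24 * 7))
              for _ in range(N_POINTS)} for u in range(N_USERS)}

unique = 0
for u, trace in traces.items():
    sample = set(random.sample(sorted(trace), K))   # K known points about user u
    matches = [v for v, t in traces.items() if sample <= t]
    if matches == [u]:                               # no one else fits
        unique += 1
print(f"unicity with {K} points: {unique / N_USERS:.1%}")
```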
Trusted execution environments (TEEs) such as Intel's Software Guard Extensions (SGX) have been widely studied as a means of boosting security and privacy in computations over sensitive data such as human genomics. However, SGX often introduces a performance hurdle, especially because of the small enclave memory. In this paper, we propose a new Hybrid Secured Flow framework (called HySec-Flow) for large-scale genomic data analysis on SGX platforms. Data-intensive computing tasks are partitioned into independent subtasks deployed in distinct secured and non-secured containers, allowing parallel execution while alleviating the limited Enclave Page Cache (EPC) memory of each enclave. We illustrate our contributions with a workflow supporting indexing, alignment, dispatching, and merging across SGX-enabled containers. We detail the architecture of the trusted and untrusted components and the underlying SCONE and Graphene support, which serve as generic shielded-execution frameworks for porting legacy code. We thoroughly evaluate the performance of our privacy-preserving reads-mapping algorithm using real human genome sequencing data. The results demonstrate that partitioning the time-consuming genomic computation into subtasks improves performance compared with the conventional execution of the data-intensive reads-mapping algorithm inside a single enclave. The HySec-Flow framework is available as open source and can be adapted to data-parallel computation for other large-scale genomic tasks that require security and scalable computational resources.
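As a conceptual sketch of the partition-dispatch-merge pattern described above (not the HySec-Flow code itself), the following splits a read set into independent chunks and processes them in parallel; in the real framework each chunk would run inside a shielded, SGX-enabled container rather than a local worker process, and `align_chunk` is a hypothetical stand-in for the enclave-side alignment step.

```python
# Partition reads into chunks, dispatch to parallel workers, merge results.
from concurrent.futures import ProcessPoolExecutor

def align_chunk(chunk):
    # Placeholder for aligning one chunk of reads; in HySec-Flow this
    # work runs inside a shielded (SCONE/Graphene) SGX container.
    return [f"aligned:{read}" for read in chunk]

def partition(reads, n_chunks):
    size = (len(reads) + n_chunks - 1) // n_chunks
    return [reads[i:i + size] for i in range(0, len(reads), size)]

if __name__ == "__main__":
    reads = [f"read_{i}" for i in range(1000)]
    chunks = partition(reads, n_chunks=4)   # keep each chunk small enough for EPC
    with ProcessPoolExecutor(max_workers=4) as pool:
        merged = [r for result in pool.map(align_chunk, chunks) for r in result]
    print(len(merged), "reads aligned")
```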
Data markets have the potential to foster new data-driven applications and help grow data-driven businesses. When building and deploying such markets in practice, regulations such as the European Union's General Data Protection Regulation (GDPR) impose constraints and restrictions, especially when personal or privacy-sensitive data are involved. In this paper, we present a candidate architecture for a privacy-preserving personal data market that relies on cryptographic primitives such as multi-party computation (MPC) to perform privacy-preserving computations on the data. Besides specifying the architecture of such a data market, we also present a privacy-risk analysis of the market following the LINDDUN methodology.
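As a hint of the kind of MPC primitive such a market could build on, the sketch below shows additive secret sharing, where a sum over individuals' inputs is computed without any single party seeing an individual value; the inputs, party count, and modulus are illustrative assumptions, not details of the proposed architecture.

```python
# Additive secret sharing: each input is split into random shares that
# sum to the secret mod P; only the aggregate is ever reconstructed.
import random

P = 2**61 - 1  # prime modulus (toy choice)

def share(secret, n_parties=3):
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

incomes = [52000, 48000, 61000]              # three data subjects' inputs
all_shares = [share(x) for x in incomes]

# Each party locally sums the shares it holds; combining the party sums
# reveals only the aggregate, never an individual input.
party_sums = [sum(s[i] for s in all_shares) % P for i in range(3)]
total = sum(party_sums) % P
print("joint sum:", total, "== plain sum:", sum(incomes))
```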
Privacy protection in electronic healthcare applications is an important consideration due to the sensitive nature of personal health data. Internet of Health Things (IoHT) networks carry privacy requirements within a healthcare setting. These networks also pose unique challenges: their security requirements (integrity, authentication, privacy, and availability) must be balanced against the need for efficiency, since battery power is a significant limitation of IoHT devices and networks. Data are usually transferred without filtering or optimization, and this traffic can overload sensors and cause rapid battery drain, which in turn restricts the practical deployment of these devices. To address these issues, this paper proposes a privacy-preserving two-tier data inference framework that conserves battery power by inferring sensed data, thereby reducing the volume of data to transmit, while also protecting sensitive data from leakage to adversaries. Experimental evaluations of privacy show the validity of the proposed scheme and demonstrate significant data savings without compromising the accuracy of data transmission, contributing to the energy efficiency of IoHT sensor devices.
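One common way to realize inference-based transmission reduction is dual prediction, sketched below under our own simplifying assumptions (the paper's actual inference scheme may differ): sensor and gateway share the same predictor, and the sensor transmits a reading only when it deviates from the shared prediction beyond a threshold, so both sides stay synchronized while most samples never leave the device.

```python
# Dual-prediction sketch: transmit only when the model's guess is wrong.
def predict(history):
    return history[-1] if history else 0.0   # naive last-value model

readings = [72, 72, 73, 72, 90, 91, 90, 72, 72]  # e.g., heart-rate samples
THRESHOLD = 3.0

sensor_hist, gateway_hist, sent = [], [], 0
for r in readings:
    if abs(r - predict(sensor_hist)) > THRESHOLD:
        gateway_hist.append(r)                       # transmit actual value
        sent += 1
    else:
        gateway_hist.append(predict(gateway_hist))   # gateway infers locally
    sensor_hist.append(gateway_hist[-1])             # keep both models in sync
print(f"transmitted {sent}/{len(readings)} samples")
```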
In the big data era, more and more cloud-based data-driven applications leverage individual data to provide valuable services (the utilities). On the other hand, since the same data can be used to infer individuals' sensitive information, such applications open new channels for snooping on privacy. It is therefore important to develop techniques that enable data owners to release privatized data that can still serve an intended purpose. Existing data-releasing approaches, however, are either privacy-emphasized (with no consideration of utility) or utility-driven (with no guarantees on privacy). In this work, we propose a two-step, perturbation-based, utility-aware privacy-preserving data releasing framework. First, predefined privacy and utility problems are learned from public domain data (background knowledge). Our approach then leverages the learned knowledge to precisely perturb the data owner's data into privatized data that can be successfully used for the intended purpose (learning to succeed) without jeopardizing the predefined privacy (training to fail). Extensive experiments on the Human Activity Recognition, Census Income, and Bank Marketing datasets demonstrate the effectiveness and practicality of our framework.
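The following toy sketch illustrates the two-step idea under strong simplifying assumptions (it is not the paper's model): step one learns from "public" data which features carry the private attribute, and step two perturbs those features before release so that the intended task survives while the privacy task degrades.

```python
# Utility-aware perturbation on synthetic data with known utility and
# privacy labels; feature indices and noise scale are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 6
X = rng.normal(size=(n, d))
utility = (X[:, 0] + X[:, 1] > 0).astype(int)   # intended task's label
privacy = (X[:, 4] + X[:, 5] > 0).astype(int)   # sensitive attribute

# Step 1 (learn from background knowledge): rank features by how strongly
# they correlate with the privacy label versus the utility label.
def corr_with(y):
    return np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])

risky = corr_with(privacy) > corr_with(utility)

# Step 2 (perturb): drown the privacy-bearing features in noise before
# release, leaving utility-bearing features intact.
X_released = X.copy()
X_released[:, risky] += rng.normal(scale=5.0, size=(n, risky.sum()))
print("perturbed features:", np.where(risky)[0])
```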