Do you want to publish a course? Click here

Privacy-Preserving Clustering of Unstructured Big Data for Cloud-Based Enterprise Search Solutions

84   0   0.0 ( 0 )
 Added by Mohsen Amini Salehi
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

Cloud-based enterprise search services (e.g., Amazon Kendra) are enchanting to big data owners by providing them with convenient search solutions over their enterprise big datasets. However, individuals and businesses that deal with confidential big data (eg, credential documents) are reluctant to fully embrace such services, due to valid concerns about data privacy. Solutions based on client-side encryption have been explored to mitigate privacy concerns. Nonetheless, such solutions hinder data processing, specifically clustering, which is pivotal in dealing with different forms of big data. For instance, clustering is critical to limit the search space and perform real-time search operations on big datasets. To overcome the hindrance in clustering encrypted big data, we propose privacy-preserving clustering schemes for three forms of unstructured encrypted big datasets, namely static, semi-dynamic, and dynamic datasets. To preserve data privacy, the proposed clustering schemes function based on statistical characteristics of the data and determine (A) the suitable number of clusters and (B) appropriate content for each cluster. Experimental results obtained from evaluating the clustering schemes on three different datasets demonstrate between 30% to 60% improvement on the clusters coherency compared to other clustering schemes for encrypted data. Employing the clustering schemes in a privacy-preserving enterprise search system decreases its search time by up to 78%, while increases the search accuracy by up to 35%.



rate research

Read More

Security and confidentiality of big data stored in the cloud are important concerns for many organizations to adopt cloud services. One common approach to address the concerns is client-side encryption where data is encrypted on the client machine before being stored in the cloud. Having encrypted data in the cloud, however, limits the ability of data clustering, which is a crucial part of many data analytics applications, such as search systems. To overcome the limitation, in this paper, we present an approach named ClustCrypt for efficient topic-based clustering of encrypted unstructured big data in the cloud. ClustCrypt dynamically estimates the optimal number of clusters based on the statistical characteristics of encrypted data. It also provides clustering approach for encrypted data. We deploy ClustCrypt within the context of a secure cloud-based semantic search system (S3BD). Experimental results obtained from evaluating ClustCrypt on three datasets demonstrate on average 60% improvement on clusters coherency. ClustCrypt also decreases the search-time overhead by up to 78% and increases the accuracy of search results by up to 35%
Big Data is used by data miner for analysis purpose which may contain sensitive information. During the procedures it raises certain privacy challenges for researchers. The existing privacy preserving methods use different algorithms that results into limitation of data reconstruction while securing the sensitive data. This paper presents a clustering based privacy preservation probabilistic model of big data to secure sensitive information..model to attain minimum perturbation and maximum privacy. In our model, sensitive information is secured after identifying the sensitive data from data clusters to modify or generalize it.The resulting dataset is analysed to calculate the accuracy level of our model in terms of hidden data, lossed data as result of reconstruction. Extensive experiements are carried out in order to demonstrate the results of our proposed model. Clustering based Privacy preservation of individual data in big data with minimum perturbation and successful reconstruction highlights the significance of our model in addition to the use of standard performance evaluation measures.
185 - Rulin Shao , Hongyu He , Hui Liu 2019
Artificial neural network has achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns and people want to take control over their sensitive information during both training and using processes. To address this problem, we propose a privacy-preserving method for the distributed system, Stochastic Channel-Based Federated Learning (SCBF), which enables the participants to train a high-performance model cooperatively without sharing their inputs. Specifically, we design, implement and evaluate a channel-based update algorithm for the central server in a distributed system, which selects the channels with regard to the most active features in a training loop and uploads them as learned information from local datasets. A pruning process is applied to the algorithm based on the validation set, which serves as a model accelerator. In the experiment, our model presents better performances and higher saturating speed than the Federated Averaging method which reveals all the parameters of local models to the server when updating. We also demonstrate that the saturating rate of performance could be promoted by introducing a pruning process. And further improvement could be achieved by tuning the pruning rate. Our experiment shows that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUCROC performance and a reduction of 0.0068 in AUCPR.
66 - Shuo Chen , Rongxing Lu , 2017
The singular value decomposition (SVD) is a widely used matrix factorization tool which underlies plenty of useful applications, e.g. recommendation system, abnormal detection and data compression. Under the environment of emerging Internet of Things (IoT), there would be an increasing demand for data analysis to better humans lives and create new economic growth points. Moreover, due to the large scope of IoT, most of the data analysis work should be done in the network edge, i.e. handled by fog computing. However, the devices which provide fog computing may not be trustable while the data privacy is often the significant concern of the IoT application users. Thus, when performing SVD for data analysis purpose, the privacy of user data should be preserved. Based on the above reasons, in this paper, we propose a privacy-preserving fog computing framework for SVD computation. The security and performance analysis shows the practicability of the proposed framework. Furthermore, since different applications may utilize the result of SVD operation in different ways, three applications with different objectives are introduced to show how the framework could flexibly achieve the purposes of different applications, which indicates the flexibility of the design.
The outbreak of COVID-19 pandemic has exposed an urgent need for effective contact tracing solutions through mobile phone applications to prevent the infection from spreading further. However, due to the nature of contact tracing, public concern on privacy issues has been a bottleneck to the existing solutions, which is significantly affecting the uptake of contact tracing applications across the globe. In this paper, we present a blockchain-enabled privacy-preserving contact tracing scheme: BeepTrace, where we propose to adopt blockchain bridging the user/patient and the authorized solvers to desensitize the user ID and location information. Compared with recently proposed contract tracing solutions, our approach shows higher security and privacy with the additional advantages of being battery friendly and globally accessible. Results show viability in terms of the required resource at both server and mobile phone perspectives. Through breaking the privacy concerns of the public, the proposed BeepTrace solution can provide a timely framework for authorities, companies, software developers and researchers to fast develop and deploy effective digital contact tracing applications, to conquer COVID-19 pandemic soon. Meanwhile, the open initiative of BeepTrace allows worldwide collaborations, integrate existing tracing and positioning solutions with the help of blockchain technology.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا