Due to increasing concerns over data privacy, databases are often encrypted before being stored on an untrusted server. To enable search operations on the encrypted data, searchable encryption techniques have been proposed. Representative schemes use order-preserving encryption (OPE) to support efficient Boolean queries on encrypted databases. Yet recent works have shown that plaintext data can be inferred from OPE-encrypted databases, using only the order-preserving constraints or combining them with an auxiliary plaintext dataset that has a similar frequency distribution. So far, the effectiveness of such attacks has been limited to single-dimensional dense data (where most values from the domain are encrypted), and mounting them on high-dimensional datasets (e.g., spatial data), which are often sparse in nature, remains challenging. In this paper, for the first time, we study data inference attacks on multi-dimensional encrypted databases (with 2-D as a special case). We formulate the attack as a 2-D order-preserving matching problem and explore both unweighted and weighted cases, where the former maximizes the number of points matched using only order information and the latter further considers points with similar frequencies. We prove that the problem is NP-hard, and then propose a greedy algorithm, along with a polynomial-time algorithm with approximation guarantees. Experimental results on synthetic and real-world datasets show that the data recovery rate is significantly higher than that of the previous 1-D matching algorithm.
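The greedy idea can be illustrated compactly. The sketch below is hypothetical, not the paper's exact algorithm: `score` stands in for a frequency-similarity function (a constant score recovers the unweighted case), and candidate ciphertext-plaintext pairs are accepted in decreasing score order only if they keep the partial matching order-preserving in both dimensions.

```python
# Minimal sketch of greedy 2-D order-preserving matching (illustrative only).

def order_consistent(pair, accepted):
    """A new (cipher, plain) pair is admissible iff, against every accepted
    pair, the cipher points and the plain points compare the same way in
    each of the two dimensions."""
    c, p = pair
    for c2, p2 in accepted:
        for dim in (0, 1):
            if (c[dim] < c2[dim]) != (p[dim] < p2[dim]) or \
               (c[dim] > c2[dim]) != (p[dim] > p2[dim]):
                return False
    return True

def greedy_match(cipher_pts, plain_pts, score):
    """Consider candidate pairs in decreasing score order; accept a pair if
    it preserves relative order in both dimensions."""
    candidates = sorted(((c, p) for c in cipher_pts for p in plain_pts),
                        key=lambda cp: score(*cp), reverse=True)
    accepted, used_c, used_p = [], set(), set()
    for c, p in candidates:
        if c in used_c or p in used_p:
            continue
        if order_consistent((c, p), accepted):
            accepted.append((c, p))
            used_c.add(c)
            used_p.add(p)
    return accepted

# Toy example: the correspondence is recovered from order information alone.
cipher = [(5, 9), (8, 2), (1, 4)]
plain = [(50, 90), (80, 20), (10, 40)]
print(greedy_match(cipher, plain, lambda c, p: 0))
# [((5, 9), (50, 90)), ((8, 2), (80, 20)), ((1, 4), (10, 40))]
```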
Databases can leak confidential information when users combine query results with probabilistic data dependencies and prior knowledge. Current research offers mechanisms that either handle only a limited class of dependencies or lack tractable enforcement algorithms. We propose a foundation for Database Inference Control based on ProbLog, a probabilistic logic programming language. We leverage this foundation to develop Angerona, a provably secure enforcement mechanism that prevents information leakage in the presence of probabilistic dependencies. We then provide a tractable inference algorithm for a practically relevant fragment of ProbLog. We empirically evaluate Angerona's performance, showing that it scales to relevant security-critical problems.
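As a concrete illustration of the possible-world semantics underlying ProbLog-style reasoning, the following toy sketch (hypothetical, not Angerona's actual mechanism) enumerates worlds over independent probabilistic facts and refuses to release an answer whenever the attacker's posterior belief in a secret would exceed a policy threshold.

```python
# Toy threshold-based inference control under possible-world semantics.
from itertools import product

def posterior(prob_facts, secret, evidence):
    """P(secret | evidence) by enumerating possible worlds.
    prob_facts: {fact: probability that the fact holds}
    secret, evidence: boolean predicates over a world (dict fact -> bool)."""
    p_num = p_den = 0.0
    facts = list(prob_facts)
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        w = 1.0
        for f, v in world.items():
            w *= prob_facts[f] if v else 1.0 - prob_facts[f]
        if evidence(world):
            p_den += w
            if secret(world):
                p_num += w
    return p_num / p_den if p_den else 0.0

def release(prob_facts, secrets, evidence, threshold=0.5):
    """Release the answer only if no secret's posterior exceeds the threshold."""
    return all(posterior(prob_facts, s, evidence) <= threshold for s in secrets)

# Hypothetical example: revealing the disjunction pushes P(high(bob)) from
# 0.3 to 0.3/0.51 ~ 0.59 > 0.5, so the answer is withheld.
facts = {"high(alice)": 0.3, "high(bob)": 0.3}
print(release(facts,
              secrets=[lambda w: w["high(bob)"]],
              evidence=lambda w: w["high(alice)"] or w["high(bob)"]))  # False
```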
Data protection algorithms are becoming increasingly important to support modern business needs for data sharing and data monetization. Anonymization is an important step before data sharing. Many organizations rely on third parties to store and manage their data. However, third parties are often not trusted to store plaintext personal and sensitive data, so data encryption is widely adopted to protect against intentional and unintentional attempts to read personal/sensitive data. Traditional encryption schemes do not support operations over ciphertexts, and thus anonymizing encrypted datasets is not feasible with current approaches. This paper explores the feasibility and depth of implementing a privacy-preserving data publishing workflow over encrypted datasets leveraging homomorphic encryption. We demonstrate how to achieve uniqueness discovery, data masking, differential privacy, and k-anonymity over encrypted data with zero knowledge of the original values. We prove that the security protocols followed by our approach provide strong guarantees against inference attacks. Finally, we experimentally demonstrate the performance of our data publishing workflow components.
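To make the enabling primitive concrete, here is a toy Paillier cryptosystem (illustrative parameters only; a real deployment needs large keys and a vetted library) showing the additive homomorphism that lets an untrusted party aggregate counts, e.g., for frequency-based uniqueness discovery, without ever decrypting individual values.

```python
# Toy Paillier: additively homomorphic encryption (insecure key size).
import math, random

def keygen(p=293, q=433):                       # toy primes; use >= 2048-bit keys in practice
    n = p * q
    lam = math.lcm(p - 1, q - 1)                # Carmichael's function for n = p*q
    g = n + 1
    # mu = (L(g^lam mod n^2))^(-1) mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)

def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                  # r must be a unit mod n
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(sk, c):
    lam, mu, n = sk
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

def add(pk, c1, c2):
    """Additive homomorphism: Dec(c1 * c2 mod n^2) = m1 + m2 (mod n)."""
    n, _ = pk
    return (c1 * c2) % (n * n)

pk, sk = keygen()
total = add(pk, encrypt(pk, 17), encrypt(pk, 25))
assert decrypt(sk, total) == 42                 # aggregated without decrypting the addends
```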
As cloud computing has become prevalent in recent years, more and more enterprises and individuals outsource their data to cloud servers. To avoid privacy leaks, outsourced data is usually encrypted before being sent to cloud servers, which disables traditional search schemes designed for plaintext. To achieve both security and searchability, search-supported encryption has been proposed. However, many previous schemes suffer from severe vulnerabilities when typos and semantic diversity exist in query requests. To overcome this flaw, higher error tolerance is expected of search-supported encryption designs, sometimes defined as fuzzy search. In this paper, we propose a new scheme for multi-keyword fuzzy search over encrypted and outsourced data. Our approach introduces a new mechanism to map a natural language expression into a word-vector space. Compared with previous approaches, our design shows higher robustness when multiple kinds of typos are involved. In addition, our approach is enhanced with novel data structures that improve search efficiency. These two innovations work well together for both accuracy and efficiency, without hurting the fundamental security. Experiments on a real-world dataset demonstrate the effectiveness of our proposed approach, which outperforms currently popular approaches focused on similar tasks.
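The word-vector idea can be illustrated with character bigrams: a small edit perturbs only a few vector coordinates, so similarity search absorbs typos. The sketch below is a simplified stand-in; the paper's scheme additionally protects these vectors cryptographically (e.g., before they reach the server), a layer omitted here.

```python
# Typo-tolerant keyword matching via character-bigram similarity (plaintext sketch).

def bigrams(word):
    w = f"#{word.lower()}#"                    # boundary markers
    return {w[i:i + 2] for i in range(len(w) - 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def fuzzy_match(query, index_keywords, threshold=0.4):
    """Return indexed keywords whose bigram similarity to the (possibly
    misspelled) query reaches the threshold."""
    q = bigrams(query)
    return [k for k in index_keywords if jaccard(q, bigrams(k)) >= threshold]

print(fuzzy_match("netwrok", ["network", "security", "networks"]))
# ['network'] -- the transposed letters still match the intended keyword
```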
Traffic inspection is a fundamental building block of many security solutions today. For example, to prevent the leakage or exfiltration of confidential insider information, as well as to block malicious traffic from entering the network, most enterprises operate intrusion detection and prevention systems that inspect traffic. However, state-of-the-art inspection systems do not adequately reflect the interests of the different autonomous roles involved. For example, employees in an enterprise, or a company outsourcing its network management to a specialized third party, may require that their traffic remain confidential, even from the system administrator. Moreover, the rules used by the intrusion detection system, or more generally the configuration of an online or offline anomaly detection engine, may be provided by a third party, e.g., a security research firm, and can hence constitute a critical business asset that should be kept confidential. Today, it is often believed that accounting for these additional requirements is impossible, as they contradict efficiency and effectiveness. In this paper, we explore a novel approach, called Privacy Preserving Inspection (PRI), which solves this problem by preserving the privacy of traffic inspection and the confidentiality of inspection rules and configurations; it also supports, for example, the flexible installation of additional company-specific Data Leak Prevention (DLP) rules.
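While PRI's concrete protocol is beyond the scope of an abstract, one illustrative building block for this kind of rule-private, traffic-private matching is keyed tokenization, sketched below (a hypothetical simplification, not PRI's actual design): the middlebox compares keyed tags of traffic tokens against keyed tags of secret rule keywords, learning which rules fire but neither the traffic plaintext nor the rules.

```python
# Keyed-tag matching sketch: the middlebox sees only opaque HMAC tags.
import hmac, hashlib

def tag(key: bytes, token: bytes) -> bytes:
    return hmac.new(key, token, hashlib.sha256).digest()

key = b"per-session-key"                       # known to endpoints, not the middlebox

# Endpoint side: traffic is tokenized and only tags leave the host.
def tags_for(payload: bytes):
    return [tag(key, t) for t in payload.split()]   # toy tokenizer; real systems slide windows

# Rule-provider side: secret keywords are tagged under the same session key
# (in a real system this step is run obliviously so no party sees both sides).
rule_tags = {tag(key, kw) for kw in [b"confidential", b"exfil-c2.example"]}

# Middlebox side: compares opaque tags only.
def inspect(traffic_tags, rule_tags):
    return any(t in rule_tags for t in traffic_tags)

print(inspect(tags_for(b"quarterly report marked confidential attached"), rule_tags))  # True
print(inspect(tags_for(b"benign status update"), rule_tags))                           # False
```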
Adoption of artificial intelligence medical imaging applications is often impeded by barriers between healthcare systems and algorithm developers, since pre-deployment evaluation requires access to both private patient data and commercial model IP. This work investigates a framework for secure, privacy-preserving, AI-enabled medical imaging inference using CrypTFlow2, a state-of-the-art end-to-end compiler enabling cryptographically secure two-party computation (2PC) protocols between the machine learning model vendor and the patient data owner. A common DenseNet-121 chest x-ray diagnosis model was evaluated on multi-institutional chest radiographic imaging datasets, both with and without CrypTFlow2, on two test sets spanning seven sites across the US and India and comprising 1,149 chest x-ray images. We measure comparative AUROC performance between secure and insecure inference in multiple pathology classification tasks, and explore the model output distributional shifts and resource constraints introduced by secure model inference. Secure inference with CrypTFlow2 demonstrated no significant difference in AUROC for all diagnoses, and model outputs from secure and insecure inference methods were distributionally equivalent. CrypTFlow2 may thus allow off-the-shelf secure 2PC between healthcare systems and AI model vendors for medical imaging, without changes in performance, and can facilitate scalable pre-deployment infrastructure for real-world secure model evaluation without exposure to patient data or model IP.
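CrypTFlow2 itself compiles models into highly optimized OT- and homomorphic-encryption-based protocols; as a minimal illustration of the secret-sharing principle behind such 2PC inference, the sketch below (a toy in which a trusted dealer stands in for the offline phase, not CrypTFlow2's actual protocol) multiplies a private model weight by a private patient feature while each party holds only additive shares.

```python
# Beaver-triple multiplication on additive secret shares (toy 2PC sketch).
import random

P = 2**61 - 1                                   # all arithmetic is modulo a prime

def share(x):
    r = random.randrange(P)
    return r, (x - r) % P                       # (party0's share, party1's share)

def beaver_mul(x_sh, y_sh):
    """Multiply secret-shared x and y; the dealer-generated triple (a, b, c)
    with c = a*b stands in for the offline preprocessing phase."""
    a, b = random.randrange(P), random.randrange(P)
    a_sh, b_sh, c_sh = share(a), share(b), share((a * b) % P)
    # Each party publishes its masked shares; e = x - a and f = y - b open.
    e = ((x_sh[0] - a_sh[0]) + (x_sh[1] - a_sh[1])) % P
    f = ((y_sh[0] - b_sh[0]) + (y_sh[1] - b_sh[1])) % P
    # z_i = c_i + e*b_i + f*a_i, plus the public term e*f on one party only,
    # so that z0 + z1 = ab + eb + fa + ef = xy.
    z0 = (c_sh[0] + e * b_sh[0] + f * a_sh[0]) % P
    z1 = (c_sh[1] + e * b_sh[1] + f * a_sh[1] + e * f) % P
    return z0, z1

weight, feature = 37, 12                        # model IP vs. patient data
z = beaver_mul(share(weight), share(feature))
assert sum(z) % P == weight * feature           # 444, computed entirely on shares
```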