No Arabic abstract
In this work, we study how to securely evaluate the value of trading data without requiring a trusted third party. We focus on the important machine learning task of classification. This leads us to propose a provably secure four-round protocol that computes the value of the data to be traded without revealing the data to the potential acquirer. The theoretical results demonstrate a number of important properties of the proposed protocol. In particular, we prove the security of the proposed protocol in the honest-but-curious adversary model.
Elaborate protocols in Secure Multi-party Computation enable several participants to compute a public function of their own private inputs while ensuring that no undesired information leaks about the private inputs, and without resorting to any trusted third party. However, the public output of the computation inevitably leaks some information about the private inputs. Recent works have introduced a framework and proposed some techniques for quantifying such information flow. Yet, owing to their complexity, those methods do not scale to practical situations that may involve large input spaces. The main contribution of the work reported here is to formally investigate the information flow captured by the min-entropy in the particular case of secure three-party computations of affine functions in order to make its quantification scalable to realistic scenarios. To this end, we mathematically derive an explicit formula for this entropy under uniform prior beliefs about the inputs. We show that this closed-form expression can be computed in time constant in the inputs sizes and logarithmic in the coefficients of the affine function. Finally, we formulate some theoretical bounds for this privacy leak in the presence of non-uniform prior beliefs.
Privacy preserving multi-party computation has many applications in areas such as medicine and online advertisements. In this work, we propose a framework for distributed, secure machine learning among untrusted individuals. The framework consists of two parts: a two-step training protocol based on homomorphic addition and a zero knowledge proof for data validity. By combining these two techniques, our framework provides privacy of per-user data, prevents against a malicious user contributing corrupted data to the shared pool, enables each user to self-compute the results of the algorithm without relying on external trusted third parties, and requires no private channels between groups of users. We show how different ML algorithms such as Latent Dirichlet Allocation, Naive Bayes, Decision Trees etc. fit our framework for distributed, secure computing.
The purpose of Secure Multi-Party Computation is to enable protocol participants to compute a public function of their private inputs while keeping their inputs secret, without resorting to any trusted third party. However, opening the public output of such computations inevitably reveals some information about the private inputs. We propose a measure generalising both Renyi entropy and g-entropy so as to quantify this information leakage. In order to control and restrain such information flows, we introduce the notion of function substitution which replaces the computation of a function that reveals sensitive information with that of an approximate function. We exhibit theoretical bounds for the privacy gains that this approach provides and experimentally show that this enhances the confidentiality of the inputs while controlling the distortion of computed output values. Finally, we investigate the inherent compromise between accuracy of computation and privacy of inputs and we demonstrate how to realise such optimal trade-offs.
Hybrid Privacy-Preserving Neural Network (HPPNN) implementing linear layers by Homomorphic Encryption (HE) and nonlinear layers by Garbled Circuit (GC) is one of the most promising secure solutions to emerging Machine Learning as a Service (MLaaS). Unfortunately, a HPPNN suffers from long inference latency, e.g., $sim100$ seconds per image, which makes MLaaS unsatisfactory. Because HE-based linear layers of a HPPNN cost $93%$ inference latency, it is critical to select a set of HE parameters to minimize computational overhead of linear layers. Prior HPPNNs over-pessimistically select huge HE parameters to maintain large noise budgets, since they use the same set of HE parameters for an entire network and ignore the error tolerance capability of a network. In this paper, for fast and accurate secure neural network inference, we propose an automated layer-wise parameter selector, AutoPrivacy, that leverages deep reinforcement learning to automatically determine a set of HE parameters for each linear layer in a HPPNN. The learning-based HE parameter selection policy outperforms conventional rule-based HE parameter selection policy. Compared to prior HPPNNs, AutoPrivacy-optimized HPPNNs reduce inference latency by $53%sim70%$ with negligible loss of accuracy.
Federated machine learning systems have been widely used to facilitate the joint data analytics across the distributed datasets owned by the different parties that do not trust each others. In this paper, we proposed a novel Gradient Boosting Machines (GBM) framework SecureGBM built-up with a multi-party computation model based on semi-homomorphic encryption, where every involved party can jointly obtain a shared Gradient Boosting machines model while protecting their own data from the potential privacy leakage and inferential identification. More specific, our work focused on a specific dual--party secure learning scenario based on two parties -- both party own an unique view (i.e., attributes or features) to the sample group of samples while only one party owns the labels. In such scenario, feature and label data are not allowed to share with others. To achieve the above goal, we firstly extent -- LightGBM -- a well known implementation of tree-based GBM through covering its key operations for training and inference with SEAL homomorphic encryption schemes. However, the performance of such re-implementation is significantly bottle-necked by the explosive inflation of the communication payloads, based on ciphertexts subject to the increasing length of plaintexts. In this way, we then proposed to use stochastic approximation techniques to reduced the communication payloads while accelerating the overall training procedure in a statistical manner. Our experiments using the real-world data showed that SecureGBM can well secure the communication and computation of LightGBM training and inference procedures for the both parties while only losing less than 3% AUC, using the same number of iterations for gradient boosting, on a wide range of benchmark datasets.