Multi-Party Dual Learning

146 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yuan Gao

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Maoguo Gong - Yuan Gao - Yu Xie

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The performance of machine learning algorithms heavily relies on the availability of a large amount of training data. However, in reality, data usually reside in distributed parties such as different institutions and may not be directly gathered and integrated due to various data policy constraints. As a result, some parties may suffer from insufficient data available for training machine learning models. In this paper, we propose a multi-party dual learning (MPDL) framework to alleviate the problem of limited data with poor quality in an isolated party. Since the knowledge sharing processes for multiple parties always emerge in dual forms, we show that dual learning is naturally suitable to handle the challenge of missing data, and explicitly exploits the probabilistic correlation and structural relationship between dual tasks to regularize the training process. We introduce a feature-oriented differential privacy with mathematical proof, in order to avoid possible privacy leakage of raw features in the dual inference process. The approach requires minimal modifications to the existing multi-party learning structure, and each party can build flexible and powerful models separately, whose accuracy is no less than non-distributed self-learning approaches. The MPDL framework achieves significant improvement compared with state-of-the-art multi-party learning methods, as we demonstrated through simulations on real-world datasets.

قيم البحث

84 - Brian Knott , Shobha Venkataraman , Awni Hannun 2021

Secure multi-party computation (MPC) allows parties to perform computations on data while keeping that data private. This capability has great potential for machine-learning applications: it facilitates training of machine-learning models on private data sets owned by different parties, evaluation of one partys private model using another partys private data, etc. Although a range of studies implement machine-learning models via secure MPC, such implementations are not yet mainstream. Adoption of secure MPC is hampered by the absence of flexible software frameworks that speak the language of machine-learning researchers and engineers. To foster adoption of secure MPC in machine learning, we present CrypTen: a software framework that exposes popular secure MPC primitives via abstractions that are common in modern machine-learning frameworks, such as tensor computations, automatic differentiation, and modular neural networks. This paper describes the design of CrypTen and measure its performance on state-of-the-art models for text classification, speech recognition, and image classification. Our benchmarks show that CrypTens GPU support and high-performance communication between (an arbitrary number of) parties allows it to perform efficient private evaluation of modern machine-learning models under a semi-honest threat model. For example, two parties using CrypTen can securely predict phonemes in speech recordings using Wav2Letter faster than real-time. We hope that CrypTen will spur adoption of secure MPC in the machine-learning community.

التعلم الآلي التشفير والأمن

SecureGBM: Secure Multi-Party Gradient Boosting

164 - Zhi Fengy , Haoyi Xiong , Chuanyuan Song 2019

Federated machine learning systems have been widely used to facilitate the joint data analytics across the distributed datasets owned by the different parties that do not trust each others. In this paper, we proposed a novel Gradient Boosting Machine s (GBM) framework SecureGBM built-up with a multi-party computation model based on semi-homomorphic encryption, where every involved party can jointly obtain a shared Gradient Boosting machines model while protecting their own data from the potential privacy leakage and inferential identification. More specific, our work focused on a specific dual--party secure learning scenario based on two parties -- both party own an unique view (i.e., attributes or features) to the sample group of samples while only one party owns the labels. In such scenario, feature and label data are not allowed to share with others. To achieve the above goal, we firstly extent -- LightGBM -- a well known implementation of tree-based GBM through covering its key operations for training and inference with SEAL homomorphic encryption schemes. However, the performance of such re-implementation is significantly bottle-necked by the explosive inflation of the communication payloads, based on ciphertexts subject to the increasing length of plaintexts. In this way, we then proposed to use stochastic approximation techniques to reduced the communication payloads while accelerating the overall training procedure in a statistical manner. Our experiments using the real-world data showed that SecureGBM can well secure the communication and computation of LightGBM training and inference procedures for the both parties while only losing less than 3% AUC, using the same number of iterations for gradient boosting, on a wide range of benchmark datasets.

التعلم الآلي التعلم الالي

Privacy-Preserving Multi-Party Contextual Bandits

96 - Awni Hannun , Brian Knott , Shubho Sengupta 2019

Contextual bandits are online learners that, given an input, select an arm and receive a reward for that arm. They use the reward as a learning signal and aim to maximize the total reward over the inputs. Contextual bandits are commonly used to solve recommendation or ranking problems. This paper considers a learning setting in which multiple parties aim to train a contextual bandit together in a private way: the parties aim to maximize the total reward but do not want to share any of the relevant information they possess with the other parties. Specifically, multiple parties have access to (different) features that may benefit the learner but that cannot be shared with other parties. One of the parties pulls the arm but other parties may not learn which arm was pulled. One party receives the reward but the other parties may not learn the reward value. This paper develops a privacy-preserving multi-party contextual bandit for this learning setting by combining secure multi-party computation with a differentially private mechanism based on epsilon-greedy exploration.

التعلم الآلي التشفير والأمن التعلم الالي

Mechanism Design for Multi-Party Machine Learning

87 - Mengjing Chen , Yang Liu , Weiran Shen 2020

In a multi-party machine learning system, different parties cooperate on optimizing towards better models by sharing data in a privacy-preserving way. A major challenge in learning is the incentive issue. For example, if there is competition among th e parties, one may strategically hide his data to prevent other parties from getting better models. In this paper, we study the problem through the lens of mechanism design and incorporate the features of multi-party learning in our setting. First, each agents valuation has externalities that depend on others types and actions. Second, each agent can only misreport a type lower than his true type, but not the other way round. We call this setting interdependent value with type-dependent action spaces. We provide the optimal truthful mechanism in the quasi-monotone utility setting. We also provide necessary and sufficient conditions for truthful mechanisms in the most general case. Finally, we show the existence of such mechanisms is highly affected by the market growth rate and provide empirical analysis.

أنظمة متعددة العملاء

Label Leakage and Protection in Two-party Split Learning

129 - Oscar Li , Jiankai Sun , Xin Yang 2021

In vertical federated learning, two-party split learning has become an important topic and has found many applications in real business scenarios. However, how to prevent the participants ground-truth labels from possible leakage is not well studied. In this paper, we consider answering this question in an imbalanced binary classification setting, a common case in online business applications. We first show that, norm attack, a simple method that uses the norm of the communicated gradients between the parties, can largely reveal the ground-truth labels from the participants. We then discuss several protection techniques to mitigate this issue. Among them, we have designed a principled approach that directly maximizes the worst-case error of label detection. This is proved to be more effective in countering norm attack and beyond. We experimentally demonstrate the competitiveness of our proposed method compared to several other baselines.

التعلم الآلي التشفير والأمن