Distributed machine learning generally aims to train a global model on distributed data without collecting all the data at a centralized location. Two different approaches have been proposed: collecting and aggregating local models (federated learning) and collecting and training over representative data summaries (coresets). While each approach preserves data privacy to some extent by not sharing the raw data, the exact extent of protection is unclear under sophisticated attacks that try to infer the raw data from the shared information. We present the first comparison between the two approaches in terms of target model accuracy, communication cost, and data privacy, where privacy is measured by the accuracy of a state-of-the-art attack strategy called the membership inference attack. Our experiments quantify the accuracy-privacy-cost tradeoff of each approach and reveal a nontrivial comparison that can guide the design of model training processes.
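To make the two patterns concrete, here is a minimal sketch, assuming a simple linear least-squares setting and numpy only (none of this is the paper's code): federated averaging shares model parameters, while the coreset route shares a small sampled summary of each client's data. Uniform sampling stands in for a real coreset construction.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = rng.normal(size=5)

def make_client(n=500):
    # synthetic client data drawn from a shared linear model
    X = rng.normal(size=(n, 5))
    return X, X @ w_true + 0.1 * rng.normal(size=n)

def local_fit(w, X, y, lr=0.1, epochs=5):
    # full-batch gradient steps on one client's least-squares loss
    for _ in range(epochs):
        w = w - lr * (X.T @ (X @ w - y)) / len(y)
    return w

def fed_avg(clients, dim=5, rounds=10):
    # federated learning: clients share updated parameters, never raw data
    w = np.zeros(dim)
    for _ in range(rounds):
        w = np.mean([local_fit(w.copy(), X, y) for X, y in clients], axis=0)
    return w

def coreset_train(clients, dim=5, k=100):
    # coreset route: each client ships a small data summary to the server
    Xs = np.vstack([X[:k] for X, _ in clients])       # uniform sample as a
    ys = np.concatenate([y[:k] for _, y in clients])  # stand-in for a coreset
    return local_fit(np.zeros(dim), Xs, ys, epochs=100)

clients = [make_client() for _ in range(4)]
print("fedavg error:", np.linalg.norm(fed_avg(clients) - w_true))
print("coreset error:", np.linalg.norm(coreset_train(clients) - w_true))
```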
Pre-trained Transformer-based models have achieved state-of-the-art performance on various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters and are thus too resource-hungry and computation-intensive …
A membership inference attack aims to identify whether a data sample was used to train a machine learning model or not. It can raise severe privacy risks, as membership can reveal an individual's sensitive information. For example, identifying an individual …
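As a concrete illustration, here is a minimal sketch of the classic shadow-model attack in the style of Shokri et al., assuming scikit-learn and synthetic data; the models, sizes, and features are illustrative, not the attack evaluated in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

def make_data(n=600):
    # synthetic binary classification task
    X = rng.normal(size=(n, 10))
    return X, (X.sum(axis=1) > 0).astype(int)

def attack_features(model, X):
    # the attacker observes only the model's output probabilities
    return model.predict_proba(X)

# 1. Train shadow models on data whose membership the attacker controls,
#    recording (probability vector, member?) pairs.
feats, labels = [], []
for _ in range(5):
    X, y = make_data()
    shadow = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(X[:300], y[:300])
    feats.append(np.vstack([attack_features(shadow, X[:300]),    # members
                            attack_features(shadow, X[300:])]))  # non-members
    labels.append(np.concatenate([np.ones(300), np.zeros(300)]))

# 2. Fit the attack model mapping output probabilities -> membership.
attack = LogisticRegression().fit(np.vstack(feats), np.concatenate(labels))

# 3. Apply it to a separately trained target model.
X, y = make_data()
target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(X[:300], y[:300])
guess = attack.predict(np.vstack([attack_features(target, X[:300]),
                                  attack_features(target, X[300:])]))
truth = np.concatenate([np.ones(300), np.zeros(300)])
print("attack accuracy:", (guess == truth).mean())  # > 0.5 indicates leakage
```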
We design and mathematically analyze sampling-based algorithms for regularized loss minimization problems that are implementable in popular computational models for large data, in which access to the data is restricted in some way. Our main result …
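The setting can be illustrated, under strong simplifying assumptions (a ridge-regression objective, plain uniform sampling, numpy only; none of this is from the paper), by solving the same regularized objective on a small sample instead of the full data:

```python
import numpy as np

def ridge_solve(X, y, lam):
    # minimize (1/n)||Xw - y||^2 + lam * ||w||^2 in closed form
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

rng = np.random.default_rng(0)
n, d, lam = 100_000, 20, 0.1
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

w_full = ridge_solve(X, y, lam)                 # full-data minimizer
idx = rng.choice(n, size=2_000, replace=False)  # restricted access: a sample
w_samp = ridge_solve(X[idx], y[idx], lam)       # minimize over the sample only

print("relative error:", np.linalg.norm(w_samp - w_full) / np.linalg.norm(w_full))
```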
Membership inference (MI) attacks affect user privacy by inferring whether given data samples have been used to train a target learning model, e.g., a deep neural network. There are two types of MI attacks in the literature, i.e., those with and without …
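For contrast with the shadow-model sketch above, here is a minimal sketch of the other family, assuming the with/without split refers to shadow models (the usual dichotomy in this literature): a shadow-free attack that simply thresholds the target model's confidence on the true label, since members tend to receive higher confidence. The dataset, model, and threshold are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 10))
# noisy labels force the model to memorize, widening the member/non-member gap
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

target = RandomForestClassifier(n_estimators=50).fit(X[:500], y[:500])

def true_label_conf(model, X, y):
    # model's predicted probability for each sample's true label
    return model.predict_proba(X)[np.arange(len(y)), y]

conf = np.concatenate([true_label_conf(target, X[:500], y[:500]),   # members
                       true_label_conf(target, X[500:], y[500:])])  # non-members
is_member = conf > 0.8  # fixed confidence threshold, no shadow models needed
truth = np.concatenate([np.ones(500), np.zeros(500)])
print("threshold-attack accuracy:", (is_member == truth).mean())
```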
This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains …