Learning Federated Representations and Recommendations with Limited Negatives

Published by: Lin Ning

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Lin Ning, Karan Singhal, Ellie X. Zhou

Machine Learning; Distributed, Parallel, and Cluster Computing

Deep retrieval models are widely used for learning entity representations and recommendations. Federated learning provides a privacy-preserving way to train these models without requiring centralization of user data. However, federated deep retrieval models usually perform much worse than their centralized counterparts due to non-IID (not independent and identically distributed) training data on clients, an intrinsic property of federated learning that limits the negatives available for training. We demonstrate that this issue is distinct from the commonly studied client drift problem. This work proposes batch-insensitive losses as a way to alleviate the non-IID negatives issue for federated movie recommendation. We explore a variety of techniques and find that batch-insensitive losses can effectively improve the performance of federated deep retrieval models, increasing the relative recall of the federated model by up to 93.15% and reducing the relative gap in recall between it and a centralized model from 27.22%-43.14% to 0.53%-2.42%. We open-source our code framework to accelerate further research and applications of federated deep retrieval models.
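To make the limited-negatives problem concrete, the sketch below contrasts an in-batch softmax loss, whose negatives are the other items in the batch, with a loss that scores each positive pair on its own. It is a minimal illustration and not the paper's implementation: the abstract does not specify the losses, so the cosine-distance term, the function names, and the toy embeddings are all assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def batch_softmax_loss(user_embs, item_embs):
    """In-batch softmax: for example i, every other item in the batch is a negative."""
    total = 0.0
    for i, u in enumerate(user_embs):
        logits = [dot(u, v) for v in item_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(user_embs)


def cosine_pair_loss(user_embs, item_embs):
    """Assumed batch-insensitive loss: 1 - cosine similarity of each positive pair."""
    total = 0.0
    for u, v in zip(user_embs, item_embs):
        cos = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
        total += 1.0 - cos
    return total / len(user_embs)


# Toy embeddings (hypothetical values); entry i of each list is one positive pair.
users = [[1.0, 0.0], [0.0, 1.0]]
items = [[0.6, 0.8], [0.8, 0.6]]

# A client holding a single example has no in-batch negatives:
# the softmax loss is exactly zero and gives no training signal.
single_softmax = batch_softmax_loss(users[:1], items[:1])
single_cosine = cosine_pair_loss(users[:1], items[:1])

# The pairwise loss over a batch is just the mean of per-example losses,
# so it does not depend on which other examples share the batch.
batch_cosine = cosine_pair_loss(users, items)
```

A loss that only pulls positive pairs together could collapse all embeddings onto one vector, so a practical batch-insensitive loss would presumably also need a term that keeps different items apart, for example a regularizer computed over the item embedding table rather than over the batch. That term is omitted here.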

Loek Tonnaer, Luis A. Perez Rey, Vlado Menkovski (2020)

Learning low-dimensional representations that disentangle the underlying factors of variation in data has been posited as an important step towards interpretable machine learning with good generalization. To address the fact that there is no consensus on what disentanglement entails, Higgins et al. (2018) propose a formal definition for Linear Symmetry-Based Disentanglement, or LSBD, arguing that underlying real-world transformations give exploitable structure to data. Although several works focus on learning LSBD representations, such methods require supervision on the underlying transformations for the entire dataset, and cannot deal with unlabeled data. Moreover, none of these works provide a metric to quantify LSBD. We propose a metric to quantify LSBD representations that is easy to compute under certain well-defined assumptions. Furthermore, we present a method that can leverage unlabeled data, such that LSBD representations can be learned with limited supervision on transformations. Using our LSBD metric, our results show that limited supervision is indeed sufficient to learn LSBD representations.

Machine Learning

Federated Reconstruction: Partially Local Federated Learning

Karan Singhal, Hakim Sidahmed, Zachary Garrett (2021)

Personalization methods in federated learning aim to balance the benefits of federated and local training for data availability, communication cost, and robustness to client heterogeneity. Approaches that require clients to communicate all model parameters can be undesirable due to privacy and communication constraints. Other approaches require always-available or stateful clients, impractical in large-scale cross-device settings. We introduce Federated Reconstruction, the first model-agnostic framework for partially local federated learning suitable for training and inference at scale. We motivate the framework via a connection to model-agnostic meta learning, empirically demonstrate its performance over existing approaches for collaborative filtering and next word prediction, and release an open-source library for evaluating approaches in this setting. We also describe the successful deployment of this approach at scale for federated collaborative filtering in a mobile keyboard application.

Machine Learning; Distributed, Parallel, and Cluster Computing
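As an illustration of the partially local idea in the Federated Reconstruction abstract above, the following toy sketch splits a model into a shared global parameter and a private per-client parameter. Each client refits its private parameter from scratch every round, so it keeps no state between rounds, and only the change to the global parameter is sent to the server. The linear model, the function names, the step sizes, and the data are all assumptions made for this example; the released library may be organized quite differently.

```python
def reconstruct_local(g, data, steps=50, lr=0.1):
    """Fit the client's private offset from scratch with the global slope frozen."""
    b = 0.0
    for _ in range(steps):
        grad = sum(2 * (g * x + b - y) for x, y in data) / len(data)
        b -= lr * grad
    return b


def client_update(g, data, steps=5, lr=0.05):
    """Return only the change in the global slope; the offset never leaves the client."""
    b = reconstruct_local(g, data)
    g_local = g
    for _ in range(steps):
        grad = sum(2 * (g_local * x + b - y) * x for x, y in data) / len(data)
        g_local -= lr * grad
    return g_local - g


def train(clients, rounds=100):
    """Server loop: average the clients' global-parameter deltas each round."""
    g = 0.0
    for _ in range(rounds):
        deltas = [client_update(g, data) for data in clients]
        g += sum(deltas) / len(deltas)
    return g


# Toy clients (hypothetical data): a shared slope of 2.0 and a private offset each.
clients = [
    [(x, 2.0 * x + offset) for x in (-1.0, 0.0, 1.0, 2.0)]
    for offset in (-3.0, 0.5, 4.0)
]
g = train(clients)
```

In this toy setting the learned global slope approaches the shared value of 2.0 even though the server never sees any client's offset.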

FedProf: Selective Federated Learning with Representation Profiling

Wentai Wu, Ligang He, Weiwei Lin (2021)

Federated Learning (FL) has shown great potential as a privacy-preserving solution to learning from decentralized data that are only accessible to end devices (i.e., clients). In many scenarios, however, a large proportion of the clients are probably in possession of low-quality data that are biased, noisy, or even irrelevant. As a result, they could significantly slow down the convergence of the global model we aim to build and also compromise its quality. In light of this, we propose FedProf, a novel algorithm for optimizing FL under such circumstances without breaching data privacy. The key to our approach is a data representation profiling and matching scheme that uses the global model to dynamically profile data representations and allows for low-cost, lightweight representation matching. Based on this scheme, we adaptively score each client and adjust its participation probability so as to mitigate the impact of low-value clients on the training process. We have conducted extensive experiments on public datasets using various FL settings. The results show that FedProf effectively reduces the number of communication rounds and overall time (up to 4.5x speedup) for the global model to converge and provides accuracy gain.

Machine Learning; Distributed, Parallel, and Cluster Computing
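The FedProf abstract above says each client is scored and its participation probability adjusted, without giving the scoring rule. One plausible shape for that step is sketched below, under the assumption that each client has a scalar divergence between its representation profile and the global one, and that sampling probability decays exponentially with that divergence. The function name, the `alpha` parameter, and the scores are hypothetical.

```python
import math


def participation_probabilities(divergences, alpha=5.0):
    """Map per-client profile divergences to sampling probabilities.

    Assumption: a lower divergence from the global profile means higher-value
    data, and probabilities decay as exp(-alpha * divergence).
    """
    weights = [math.exp(-alpha * d) for d in divergences]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical divergence scores for four clients (smaller is better).
divergences = [0.05, 0.10, 0.80, 1.50]
probs = participation_probabilities(divergences)
```

Under this assumed rule, a larger `alpha` concentrates participation on the best-matching clients, while `alpha = 0` recovers uniform sampling.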

Federated Transfer Learning with Dynamic Gradient Aggregation

Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr (2020)

In this paper, a Federated Learning (FL) simulation platform is introduced. The target scenario is Acoustic Model training based on this platform. To our knowledge, this is the first attempt to apply FL techniques to Speech Recognition tasks due to the inherent complexity. The proposed FL platform can support different tasks based on the adopted modular design. As part of the platform, a novel hierarchical optimization scheme and two gradient aggregation methods are proposed, leading to almost an order of magnitude improvement in training convergence speed compared to other distributed or FL training algorithms like BMUF and FedAvg. The hierarchical optimization offers additional flexibility in the training pipeline besides the enhanced convergence speed. On top of the hierarchical optimization, a dynamic gradient aggregation algorithm is proposed, based on a data-driven weight inference. This aggregation algorithm acts as a regularizer of the gradient quality. Finally, an unsupervised training pipeline tailored to FL is presented as a separate training scenario. The experimental validation of the proposed system is based on two tasks: first, the LibriSpeech task showing a speed-up of 7x and 6% Word Error Rate reduction (WERR) compared to the baseline results. The second task is based on session adaptation providing an improvement of 20% WERR over a competitive production-ready LAS model. The proposed Federated Learning system is shown to outperform the gold standard of distributed training in both convergence speed and overall model performance.

Machine Learning; Distributed, Parallel, and Cluster Computing

Anarchic Federated Learning

Haibo Yang, Xin Zhang, Prashant Khanduri (2021)

Present-day federated learning (FL) systems deployed over edge networks have to consistently deal with a large number of workers with high degrees of heterogeneity in data and/or computing capabilities. This diverse set of workers necessitates the development of FL algorithms that allow: (1) flexible worker participation that grants the workers capability to engage in training at will, (2) varying number of local updates (based on computational resources) at each worker along with asynchronous communication with the server, and (3) heterogeneous data across workers. To address these challenges, in this work, we propose a new paradigm in FL called "Anarchic Federated Learning" (AFL). In stark contrast to conventional FL models, each worker in AFL has complete freedom to choose i) when to participate in FL, and ii) the number of local steps to perform in each round based on its current situation (e.g., battery level, communication channels, privacy concerns). However, AFL also introduces significant challenges in algorithmic design because the server needs to handle the chaotic worker behaviors. Toward this end, we propose two Anarchic FedAvg-like algorithms with two-sided learning rates for both cross-device and cross-silo settings, which are named AFedAvg-TSLR-CD and AFedAvg-TSLR-CS, respectively. For general worker information arrival processes, we show that both algorithms retain the highly desirable linear speedup effect in the new AFL paradigm. Moreover, we show that our AFedAvg-TSLR algorithmic framework can be viewed as a meta-algorithm for AFL in the sense that it can utilize advanced FL algorithms as worker- and/or server-side optimizers to achieve enhanced performance under AFL. We validate the proposed algorithms with extensive experiments on real-world datasets.

Machine Learning; Distributed, Parallel, and Cluster Computing
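The Anarchic Federated Learning abstract above describes workers that choose when to participate and how many local steps to run, combined with separate worker-side and server-side learning rates. The toy simulation below sketches that setting on a one-dimensional problem. It is not AFedAvg-TSLR-CD or AFedAvg-TSLR-CS, whose details the abstract does not give; the participation rule, the quadratic losses, the synchronous rounds, and all names and constants are assumptions.

```python
import random


def local_training(w, target, num_steps, local_lr):
    """Run a worker-chosen number of gradient steps on 0.5 * (w - target)**2."""
    for _ in range(num_steps):
        w -= local_lr * (w - target)
    return w


def anarchic_round(w, targets, rng, local_lr=0.1, server_lr=1.0):
    """One round in which workers decide whether and how long to train."""
    # Each worker joins with probability 0.5; force at least one participant.
    active = [t for t in targets if rng.random() < 0.5] or [rng.choice(targets)]
    deltas = []
    for target in active:
        num_steps = rng.randint(1, 5)  # the worker's own choice of local steps
        deltas.append(local_training(w, target, num_steps, local_lr) - w)
    # The server-side learning rate scales the averaged worker update.
    return w + server_lr * sum(deltas) / len(deltas)


rng = random.Random(0)
targets = [1.0, 2.0, 3.0]  # hypothetical per-worker optima (heterogeneous data)
w = 10.0
for _ in range(200):
    w = anarchic_round(w, targets, rng)
```

Even with irregular participation and uneven local work, the global parameter in this toy run moves from its starting value of 10.0 into the range spanned by the workers' optima. Where exactly it settles depends on who participates and for how long.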