We present Distributed Equivalent Substitution (DES) training, a novel distributed training framework for large-scale recommender systems with dynamic sparse features. By reducing communication, DES brings fully synchronous training to large-scale recommender systems for the first time, which makes the training of commercial recommender models converge faster and reach a higher click-through rate (CTR). DES requires far less communication because it substitutes weights-rich operators with computationally equivalent sub-operators and aggregates their partial results, instead of transmitting the huge sparse weights directly over the network. Thanks to synchronous training on large-scale Deep Learning Recommendation Models (DLRMs), DES achieves a higher AUC (Area Under the ROC Curve). We successfully apply DES training to multiple popular DLRMs from industrial scenarios. Experiments show that our implementation outperforms state-of-the-art PS-based training frameworks, saving up to 68.7% of communication and achieving higher throughput than other PS-based recommender systems.
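To make the substitution idea concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of how a weights-rich linear operator over sparse features can be split into computationally equivalent sub-operators. Each simulated worker owns a shard of the sparse weight table, computes a partial logit from only the feature IDs it owns, and only those small partial results are aggregated; the plain sum stands in for the all-reduce that replaces transmission of sparse weight rows. Names such as `num_workers`, `shard_for`, and `partial_logit` are illustrative assumptions, not identifiers from the paper.

```python
# Sketch: equivalent substitution of a weights-rich linear (LR-style) operator.
# Assumption: features are hashed to workers; real systems would aggregate the
# partial results with an all-reduce instead of the local sum used here.
import numpy as np

num_workers = 4                      # simulated workers, each owning one weight shard
rng = np.random.default_rng(0)
weights = {w: {} for w in range(num_workers)}   # per-worker sparse weight shards

def shard_for(feature_id):
    # Hash-partition the sparse feature space across workers.
    return feature_id % num_workers

def lookup(worker, feature_id):
    # Lazily initialize weights, mimicking dynamic sparse features.
    return weights[worker].setdefault(feature_id, float(rng.normal(scale=0.01)))

def partial_logit(worker, feature_ids):
    # Sub-operator: each worker reduces only the features it owns locally.
    return sum(lookup(worker, f) for f in feature_ids if shard_for(f) == worker)

def forward(feature_ids):
    # Aggregate the small partial results (scalars here), which is far cheaper
    # than shipping the corresponding sparse weight rows over the network.
    partials = [partial_logit(w, feature_ids) for w in range(num_workers)]
    return float(sum(partials))

sample = [17, 523001, 42, 999999]    # active feature IDs of one training example
print("logit =", forward(sample))
```

The same decomposition can, in principle, be applied to other weights-rich operators whose outputs are sums over per-feature contributions; there the aggregated partial results are small dense vectors rather than scalars, but they remain much smaller than the sparse weights themselves.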
In this paper, we consider hybrid parallelism -- a paradigm that employs both Data Parallelism (DP) and Model Parallelism (MP) -- to scale distributed training of large recommendation models. We propose a compression framework called Dynamic Communic…
Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), t…
The success of recommender systems in modern online platforms is inseparable from the accurate capture of users' personal tastes. In everyday life, large amounts of user feedback data are created along with user-item online interactions in a variety o…
Context-aware recommender systems (CARS) have gained increasing attention due to their ability to utilize contextual information. Compared to traditional recommender systems, CARS are, in general, able to generate more accurate recommendations. Laten…
Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we fo…