With the increasing scale of machine learning tasks, it has become essential to reduce the communication between computing nodes. Early work on gradient compression focused on the bottleneck between CPUs and GPUs, but communication efficiency is now needed in a variety of system architectures, from high-performance clusters to energy-constrained IoT devices. In current practice, compression levels are typically chosen before training, and settings that work well for one task may be vastly suboptimal for another dataset on another architecture. In this paper, we propose a flexible framework that adapts the compression level to the true gradient at each iteration, maximizing the improvement in the objective function that is achieved per communicated bit. The framework is easy to adapt from one technology to the next by modeling how the communication cost depends on the compression level for the system at hand. Theoretical results and practical experiments indicate that the automatic tuning strategies significantly increase communication efficiency on several state-of-the-art compression schemes.
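To illustrate the per-bit selection rule described above, the following is a minimal sketch, not the paper's actual algorithm. It assumes top-k sparsification as the compression scheme, approximates the improvement in the objective by the squared norm of the retained gradient entries (a standard proxy for progress along the compressed direction of a smooth objective), and uses a hypothetical affine communication-cost model (`overhead_bits` plus per-entry value and index bits); all of these choices are assumptions for illustration.

```python
import numpy as np

def choose_compression_level(grad, levels, bits_per_entry=32, overhead_bits=64):
    """Pick the top-k sparsification level maximizing progress per communicated bit.

    Illustrative sketch only: 'progress' is the squared norm of the k largest
    gradient entries, and 'cost' is an assumed affine model of the message size.
    """
    d = grad.size
    mags = np.sort(np.abs(grad))[::-1]        # entry magnitudes, largest first
    cumulative = np.cumsum(mags ** 2)         # ||top-k of grad||^2 for each k

    best_level, best_ratio = None, -np.inf
    for k in levels:
        progress = cumulative[k - 1]          # proxy for objective improvement
        # values plus index identifiers, on top of a fixed per-message overhead
        cost = overhead_bits + k * (bits_per_entry + np.ceil(np.log2(d)))
        ratio = progress / cost               # improvement per communicated bit
        if ratio > best_ratio:
            best_level, best_ratio = k, ratio
    return best_level

# Example: choose among candidate sparsity levels for a random gradient.
rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
print(choose_compression_level(g, levels=[10, 100, 1_000, 10_000]))
```

Adapting the sketch to a different system would amount to replacing the cost expression with one calibrated to that technology, which is the sense in which the selection rule is technology-agnostic.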