Optimizing Pipelined Computation and Communication for Latency-Constrained Edge Learning

68 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Nicolas Skatchkovsky

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Nicolas Skatchkovsky - Osvaldo Simeone

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية نظرية المعلومات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Consider a device that is connected to an edge processor via a communication channel. The device holds local data that is to be offloaded to the edge processor so as to train a machine learning model, e.g., for regression or classification. Transmission of the data to the learning processor, as well as training based on Stochastic Gradient Descent (SGD), must be both completed within a time limit. Assuming that communication and computation can be pipelined, this letter investigates the optimal choice for the packet payload size, given the overhead of each data packet transmission and the ratio between the computation and the communication rates. This amounts to a tradeoff between bias and variance, since communicating the entire data set first reduces the bias of the training process but it may not leave sufficient time for learning. Analytical bounds on the expected optimality gap are derived so as to enable an effective optimization, which is validated in numerical results.

قيم البحث

102 - Anis Elgabli , Jihong Park , Amrit S. Bedi 2019

When the data is distributed across multiple servers, lowering the communication cost between the servers (or workers) while solving the distributed learning problem is an important problem and is the focus of this paper. In particular, we propose a fast, and communication-efficient decentralized framework to solve the distributed machine learning (DML) problem. The proposed algorithm, Group Alternating Direction Method of Multipliers (GADMM) is based on the Alternating Direction Method of Multipliers (ADMM) framework. The key novelty in GADMM is that it solves the problem in a decentralized topology where at most half of the workers are competing for the limited communication resources at any given time. Moreover, each worker exchanges the locally trained model only with two neighboring workers, thereby training a global model with a lower amount of communication overhead in each exchange. We prove that GADMM converges to the optimal solution for convex loss functions, and numerically show that it converges faster and more communication-efficient than the state-of-the-art communication-efficient algorithms such as the Lazily Aggregated Gradient (LAG) and dual averaging, in linear and logistic regression tasks on synthetic and real datasets. Furthermore, we propose Dynamic GADMM (D-GADMM), a variant of GADMM, and prove its convergence under the time-varying network topology of the workers.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية نظرية المعلومات

Communication-Computation Trade-Off in Resource-Constrained Edge Inference

137 - Jiawei Shao , Jun Zhang 2020

The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. Particularly, edge AI has been envisioned as a major application scenario to provide DNN-based servi ces at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off among the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for the effective inference: (1) model split point selection to determine the on-device model, (2) communication-aware model compression to reduce the on-device computation and the resulting communication overhead simultaneously, and (3) task-oriented encoding of the intermediate feature to further reduce the communication overhead. Experiments demonstrate that our proposed framework achieves a better trade-off and significantly reduces the inference latency than baseline methods.

التعلم الآلي معالجة الإشارات التعلم الالي

Computation Peer Offloading for Energy-Constrained Mobile Edge Computing in Small-Cell Networks

139 - Lixing Chen , Sheng Zhou , Jie Xu 2017

The (ultra-)dense deployment of small-cell base stations (SBSs) endowed with cloud-like computing functionalities paves the way for pervasive mobile edge computing (MEC), enabling ultra-low latency and location-awareness for a variety of emerging mob ile applications and the Internet of Things. To handle spatially uneven computation workloads in the network, cooperation among SBSs via workload peer offloading is essential to avoid large computation latency at overloaded SBSs and provide high quality of service to end users. However, performing effective peer offloading faces many unique challenges in small cell networks due to limited energy resources committed by self-interested SBS owners, uncertainties in the system dynamics and co-provisioning of radio access and computing services. This paper develops a novel online SBS peer offloading framework, called OPEN, by leveraging the Lyapunov technique, in order to maximize the long-term system performance while keeping the energy consumption of SBSs below individual long-term constraints. OPEN works online without requiring information about future system dynamics, yet provides provably near-optimal performance compared to the oracle solution that has the complete future information. In addition, this paper formulates a novel peer offloading game among SBSs, analyzes its equilibrium and efficiency loss in terms of the price of anarchy in order to thoroughly understand SBSs strategic behaviors, thereby enabling decentralized and autonomous peer offloading decision making. Extensive simulations are carried out and show that peer offloading among SBSs dramatically improves the edge computing performance.

علوم الكمبيوتر ونظرية الألعاب النظم الموزعة والتوازية والحوسبة العنقودية

A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate

93 - Naeimeh Omidvar , Mohammad Ali Maddah-Ali , Hamed Mahdavi 2020

In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm, the worke r nodes calculate and communicate some scalers, that are the directional derivatives of the sample functions in some emph{pre-shared directions}. However, to maintain accuracy, after every specific number of iterations, they communicate the vectors of stochastic gradients. To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order stochastic gradient estimation, by performing just two function evaluations rather than computing a first-order gradient vector. The proposed method highly improves the convergence rate of the zeroth-order methods, guaranteeing order-wise faster convergence. Moreover, compared to the famous communication-efficient methods of model averaging (that perform local model updates and periodic communication of the gradients to synchronize the local models), we prove that for the general class of non-convex stochastic problems and with reasonable choice of parameters, the proposed method guarantees the same orders of communication load and convergence rate, while having order-wise less computational complexity. Experimental results on various learning problems in neural networks applications demonstrate the effectiveness of the proposed approach compared to various state-of-the-art distributed SGD methods.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية نظرية المعلومات

Device Scheduling and Update Aggregation Policies for Asynchronous Federated Learning

91 - Chung-Hsuan Hu , Zheng Chen , Erik G. Larsson 2021

Federated Learning (FL) is a newly emerged decentralized machine learning (ML) framework that combines on-device local training with server-based model synchronization to train a centralized ML model over distributed nodes. In this paper, we propose an asynchronous FL framework with periodic aggregation to eliminate the straggler issue in FL systems. For the proposed model, we investigate several device scheduling and update aggregation policies and compare their performances when the devices have heterogeneous computation capabilities and training data distributions. From the simulation results, we conclude that the scheduling and aggregation design for asynchronous FL can be rather different from the synchronous case. For example, a norm-based significance-aware scheduling policy might not be efficient in an asynchronous FL setting, and an appropriate age-aware weighting design for the model aggregation can greatly improve the learning performance of such systems.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية نظرية المعلومات