
Optimal Coding Scheme and Resource Allocation for Distributed Computation with Limited Resources

Published by Yi Lihui
Publication date: 2021
Research field: Informatics Engineering
Language: English





A central issue in distributed computing systems is how to optimally allocate computing and storage resources and design data shuffling strategies such that the total execution time for computing and data shuffling is minimized. This is especially critical when the computation, storage and communication resources are limited. In this paper, we study the resource allocation and coding scheme for the MapReduce-type framework with limited resources. In particular, we focus on the coded distributed computing (CDC) approach proposed by Li et al. We first extend the asymmetric CDC (ACDC) scheme proposed by Yu et al. to the cascade case where each output function is computed by multiple servers. Then we demonstrate that whether CDC or ACDC is better depends on system parameters (e.g., number of computing servers) and task parameters (e.g., number of input files), implying that neither CDC nor ACDC is optimal. By merging the ideas of CDC and ACDC, we propose a hybrid scheme and show that it can strictly outperform CDC and ACDC. Furthermore, we derive an information-theoretic converse showing that for the MapReduce task using a type of weakly symmetric Reduce assignment, which includes the Reduce assignments of CDC and ACDC as special cases, the hybrid scheme with a corresponding resource allocation strategy is optimal, i.e., achieves the minimum execution time, for an arbitrary number of computing servers and an arbitrary amount of storage memory.
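As a rough illustration of the computation-communication tradeoff that CDC exploits: in the original CDC analysis by Li et al. (a result not restated in this abstract), with K servers and each input file mapped at r of them, the normalized shuffle load is (1/r)(1 - r/K), compared with 1 - r/K without coding. The sketch below only evaluates these two expressions. The function names are hypothetical, and it does not model the execution time, the ACDC scheme, or the hybrid scheme studied in this paper.

```python
from fractions import Fraction


def uncoded_load(r: int, K: int) -> Fraction:
    """Normalized shuffle load without coding: 1 - r/K (assumed baseline)."""
    if not 1 <= r <= K:
        raise ValueError("computation load r must satisfy 1 <= r <= K")
    return 1 - Fraction(r, K)


def coded_load(r: int, K: int) -> Fraction:
    """Normalized shuffle load of CDC as given by Li et al.: (1/r)(1 - r/K)."""
    return uncoded_load(r, K) / r


if __name__ == "__main__":
    K = 10  # illustrative number of servers, not taken from the paper
    for r in range(1, K + 1):
        print(r, uncoded_load(r, K), coded_load(r, K))
```

Increasing r raises the Map workload and storage per server by a factor of r but shrinks the coded load by the same factor, which is why the allocation of limited computing and storage resources matters.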


Read also

This work proposes a new resource allocation optimization and network management framework for wireless networks using neighborhood-based optimization rather than fully centralized or fully decentralized methods. We propose hierarchical clustering with a minimax linkage criterion for the formation of the virtual cells. Once the virtual cells are formed, we consider two cooperation models: the interference coordination model and the coordinated multi-point decoding model. In the first model base stations in a virtual cell decode their signals independently, but allocate the communication resources cooperatively. In the second model base stations in the same virtual cell allocate the communication resources and decode their signals cooperatively. We address the resource allocation problem for each of these cooperation models. For the interference coordination model this problem is an NP-hard mixed-integer optimization problem whereas for the coordinated multi-point decoding model it is convex. Our numerical results indicate that proper design of the neighborhood-based optimization leads to significant gains in sum rate over fully decentralized optimization, yet may also have a significant sum rate penalty compared to fully centralized optimization. In particular, neighborhood-based optimization has a significant sum rate penalty compared to fully centralized optimization in the coordinated multi-point model, but not the interference coordination model.
Redundant storage maintains the performance of distributed systems under various forms of uncertainty. This paper considers the uncertainty in node access and download service. We consider two access models under two download service models. In one access model, a user can access each node with a fixed probability, and in the other, a user can access a random fixed-size subset of nodes. We consider two download service models. In the first (small file) model, the randomness associated with the file size is negligible. In the second (large file) model, randomness is associated with both the file size and the system's operations. We focus on the service rate of the system. For a fixed redundancy level, the system's service rate is determined by the allocation of coded chunks over the storage nodes. We consider quasi-uniform allocations, where coded content is uniformly spread among a subset of nodes. The question we address asks what the size of this subset (spreading) should be. We show that in the small file model, concentrating the coded content to a minimum-size subset is universally optimal. For the large file model, the optimal spreading depends on the system parameters. These conclusions hold for both access models.
In this article, we consider the problem of relay assisted computation offloading (RACO), in which user A aims to share the results of computational tasks with another user B through wireless exchange over a relay platform equipped with mobile edge computing capabilities, referred to as a mobile edge relay server (MERS). To support the computation offloading, we propose a hybrid relaying (HR) approach employing two orthogonal frequency bands, where the amplify-and-forward scheme is used in one band to exchange computational results, while the decode-and-forward scheme is used in the other band to transfer the unprocessed tasks. The motivation behind the proposed HR scheme for RACO is to adapt the allocation of computing and communication resources both to dynamic user requirements and to diverse computational tasks. Within this framework, we seek to minimize the weighted sum of the execution delay and the energy consumption in the RACO system by jointly optimizing the computation offloading ratio, the bandwidth allocation, the processor speeds, as well as the transmit power levels of both user A and the MERS, under practical constraints on the available computing and communication resources. The resultant problem is formulated as a non-differentiable and nonconvex optimization program with highly coupled constraints. By adopting a series of transformations and introducing auxiliary variables, we first convert this problem into a more tractable yet equivalent form. We then develop an efficient iterative algorithm for its solution based on the concave-convex procedure. By exploiting the special structure of this problem, we also propose a simplified algorithm based on the inexact block coordinate descent method, with reduced computational complexity. Finally, we present numerical results that illustrate the advantages of the proposed algorithms over state-of-the-art benchmark schemes.
Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by some worker nodes over the local data sets, with minimum communication cost, and in the presence of stragglers. In this paper, for gradient coding with linear encoding, we characterize the optimum communication cost for heterogeneous distributed systems with arbitrary data placement, with $s \in \mathbb{N}$ stragglers and $a \in \mathbb{N}$ adversarial nodes. In particular, we show that the optimum communication cost, normalized by the size of the gradient vectors, is equal to $(r-s-2a)^{-1}$, where $r \in \mathbb{N}$ is the minimum number that a data partition is replicated. In other words, the communication cost is determined by the data partition with the minimum replication, irrespective of the structure of the placement. The proposed achievable scheme also allows us to target the computation of a polynomial function of the aggregated gradient matrix. It also allows us to borrow some ideas from approximation computing and propose an approximate gradient coding scheme for the cases when the repetition in data placement is smaller than what is needed to meet the restriction imposed on communication cost or when the number of stragglers appears to be more than the presumed value in the system design.
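The cost formula in the gradient coding abstract above can be evaluated directly. The sketch below is only an illustration of the stated expression $(r-s-2a)^{-1}$, with r taken as the minimum replication over all data partitions. The function names and the dictionary representation of the data placement are hypothetical and not taken from the paper, and the requirement r - s - 2a > 0 is an assumption made here so that the expression is well defined.

```python
from fractions import Fraction
from typing import Dict, List


def min_replication(placement: Dict[str, List[int]]) -> int:
    """Smallest number of distinct workers holding any single data partition."""
    return min(len(set(workers)) for workers in placement.values())


def normalized_comm_cost(r: int, s: int, a: int) -> Fraction:
    """Evaluate 1 / (r - s - 2a) for s stragglers and a adversarial nodes."""
    margin = r - s - 2 * a
    if margin <= 0:
        # Assumption of this sketch: the formula is only used when r > s + 2a.
        raise ValueError("need r > s + 2a for the cost expression to apply")
    return Fraction(1, margin)


if __name__ == "__main__":
    # Hypothetical placement: partition name -> worker indices storing it.
    placement = {"D1": [0, 1, 2, 3, 4], "D2": [1, 2, 3, 4, 5, 6]}
    r = min_replication(placement)
    print(r, normalized_comm_cost(r, s=1, a=1))
```

Only the least-replicated partition enters the result, which mirrors the abstract's statement that the cost does not depend on the rest of the placement structure.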
Distributed implementations are crucial in speeding up large scale machine learning applications. Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers. A significant performance bottleneck for the per-iteration completion time in distributed synchronous GD is straggling workers. Coded distributed computation techniques have been introduced recently to mitigate stragglers and to speed up GD iterations by assigning redundant computations to workers. In this paper, we consider gradient coding (GC), and propose a novel dynamic GC scheme, which assigns redundant data to workers to acquire the flexibility to dynamically choose from among a set of possible codes depending on the past straggling behavior. In particular, we consider GC with clustering, and regulate the number of stragglers in each cluster by dynamically forming the clusters at each iteration; hence, the proposed scheme is called GC with dynamic clustering (GC-DC). Under a time-correlated straggling behavior, GC-DC gains from adapting to the straggling behavior over time such that, at each iteration, GC-DC aims at distributing the stragglers across clusters as uniformly as possible based on the past straggler behavior. For both homogeneous and heterogeneous worker models, we numerically show that GC-DC provides significant improvements in the average per-iteration completion time without an increase in the communication load compared to the original GC scheme.