Do you want to publish a course? Click here

Optimizing the Transition Waste in Coded Elastic Computing

115   0   0.0 ( 0 )
 Added by Hoang Dau
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

Distributed computing, in which a resource-intensive task is divided into subtasks and distributed among different machines, plays a key role in solving large-scale problems, e.g., machine learning for large datasets or massive computational problems arising in genomic research. Coded computing is a recently emerging paradigm where redundancy for distributed computing is introduced to alleviate the impact of slow machines, or stragglers, on the completion time. Motivated by recently available services in the cloud computing industry, e.g., EC2 Spot or Azure Batch, where spare/low-priority virtual machines are offered at a fraction of the price of the on-demand instances but can be preempted in a short notice, we investigate coded computing solutions over elastic resources, where the set of available machines may change in the middle of the computation. Our contributions are two-fold: We first propose an efficient method to minimize the transition waste, a newly introduced concept quantifying the total number of tasks that existing machines have to abandon or take on anew when a machine joins or leaves, for the cyclic elastic task allocation scheme recently proposed in the literature (Yang et al. ISIT19). We then proceed to generalize such a scheme and introduce new task allocation schemes based on finite geometry that achieve zero transition wastes as long as the number of active machines varies within a fixed range. The proposed solutions can be applied on top of every existing coded computing scheme tolerating stragglers.



rate research

Read More

Cloud providers have recently introduced new offerings whereby spare computing resources are accessible at discounts compared to on-demand computing. Exploiting such opportunity is challenging inasmuch as such resources are accessed with low-priority and therefore can elastically leave (through preemption) and join the computation at any time. In this paper, we design a new technique called coded elastic computing, enabling distributed computations over elastic resources. The proposed technique allows machines to leave the computation without sacrificing the algorithm-level performance, and, at the same time, adaptively reduce the workload at existing machines when new ones join the computation. Leveraging coded redundancy, our approach can achieve similar computational cost as the original (noiseless) method when all machines are present; the cost gracefully increases when machines are preempted and reduces when machines join. The performance of the proposed technique is evaluated on matrix-vector multiplication and linear regression tasks. In experimental validations, it can achieve exactly the same numerical result as the noiseless computation, while reducing the computation time by 46% when compared to non-adaptive coding schemes.
A distributed computing scenario is considered, where the computational power of a set of worker nodes is used to perform a certain computation task over a dataset that is dispersed among the workers. Lagrange coded computing (LCC), proposed by Yu et al., leverages the well-known Lagrange polynomial to perform polynomial evaluation of the dataset in such a scenario in an efficient parallel fashion while keeping the privacy of data amidst possible collusion of workers. This solution relies on quantizing the data into a finite field, so that Shamirs secret sharing, as one of its main building blocks, can be employed. Such a solution, however, is not properly scalable with the size of dataset, mainly due to computation overflows. To address such a critical issue, we propose a novel extension of LCC to the analog domain, referred to as analog LCC (ALCC). All the operations in the proposed ALCC protocol are done over the infinite fields of R/C but for practical implementations floating-point numbers are used. We characterize the privacy of data in ALCC, against any subset of colluding workers up to a certain size, in terms of the distinguishing security (DS) and the mutual information security (MIS) metrics. Also, the accuracy of outcome is characterized in a practical setting assuming operations are performed using floating-point numbers. Consequently, a fundamental trade-off between the accuracy of the outcome of ALCC and its privacy level is observed and is numerically evaluated. Moreover, we implement the proposed scheme to perform matrix-matrix multiplication over a batch of matrices. It is observed that ALCC is superior compared to the state-of-the-art LCC, implemented using fixed-point numbers, assuming both schemes use an equal number of bits to represent data symbols.
In caching system, it is desirable to design a coded caching scheme with the transmission load $R$ and subpacketization $F$ as small as possible, in order to improve efficiency of transmission in the peak traffic times and to decrease implementation complexity. Yan et al. reformulated the centralized coded caching scheme as designing a corresponding $Ftimes K$ array called placement delivery array (PDA), where $F$ is the subpacketization and $K$ is the number of users. Motivated by several constructions of PDAs, we introduce a framework for constructing PDAs, where each row is indexed by a row vector of some matrix called row index matrix and each columns index is labelled by an element of a direct product set. Using this framework, a new scheme is obtained, which can be regarded as a generalization of some previously known schemes. When $K$ is equal to ${mchoose t}q^t$ for positive integers $m$, $t$ with $t<m$ and $qgeq 2$, we show that the row index matrix must be an orthogonal array if all the users have the same memory size. Furthermore, the row index matrix must be a covering array if the coded gain is ${mchoose t}$, which is the maximal coded gain under our framework. Consequently the lower bounds on the transmission load and subpacketization of the schemes are derived under our framework. Finally, using orthogonal arrays as the row index matrix, we obtain two more explicit classes of schemes which have significantly advantages on the subpacketization while the transmission load is equal or close to that of the schemes constructed by Shangguan et al. (IEEE Trans. Inf. Theory, 64, 5755-5766, 2018) for the same number of users and memory size.
171 - Haoning Chen , Youlong Wu 2020
We consider a MapReduce-type task running in a distributed computing model which consists of ${K}$ edge computing nodes distributed across the edge of the network and a Master node that assists the edge nodes to compute output functions. The Master node and the edge nodes, both equipped with some storage memories and computing capabilities, are connected through a multicast network. We define the communication time spent during the transmission for the sequential implementation (all nodes send symbols sequentially) and parallel implementation (the Master node can send symbols during the edge nodes transmission), respectively. We propose a mixed coded distributed computing scheme that divides the system into two subsystems where the coded distributed computing (CDC) strategy proposed by Songze Li emph{et al.} is applied into the first subsystem and a novel master-aided CDC strategy is applied into the second subsystem. We prove that this scheme is optimal, i.e., achieves the minimum communication time for both the sequential and parallel implementation, and establish an {emph{optimal}} information-theoretic tradeoff between the overall communication time, computation load, and the Master nodes storage capacity. It demonstrates that incorporating a Master node with storage and computing capabilities can further reduce the communication time. For the sequential implementation, we deduce the approximately optimal file allocation between the two subsystems, which shows that the Master node should map as many files as possible in order to achieve smaller communication time. For the parallel implementation, if the Master nodes storage and computing capabilities are sufficiently large (not necessary to store and map all files), then the proposed scheme requires at most 1/2 of the minimum communication time of system without the help of the Master node.
One of the major challenges in using distributed learning to train complicated models with large data sets is to deal with stragglers effect. As a solution, coded computation has been recently proposed to efficiently add redundancy to the computation tasks. In this technique, coding is used across data sets, and computation is done over coded data, such that the results of an arbitrary subset of worker nodes with a certain size are enough to recover the final results. The major challenges with those approaches are (1) they are limited to polynomial function computations, (2) the size of the subset of servers that we need to wait for grows with the multiplication of the size of the data set and the model complexity (the degree of the polynomial), which can be prohibitively large, (3) they are not numerically stable for computation over real numbers. In this paper, we propose Berrut Approximated Coded Computing (BACC), as an alternative approach, which is not limited to polynomial function computation. In addition, the master node can approximately calculate the final results, using the outcomes of any arbitrary subset of available worker nodes. The approximation approach is proven to be numerically stable with low computational complexity. In addition, the accuracy of the approximation is established theoretically and verified by simulation results in different settings such as distributed learning problems. In particular, BACC is used to train a deep neural network on a cluster of servers, which outperforms repetitive computation (repetition coding) in terms of the rate of convergence.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا