No Arabic abstract
We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy-traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework directly resolves a recent open problem on how to design optimal load balancing schemes using delayed information.
Dynamic affinity load balancing of multi-type tasks on multi-skilled servers, when the service rate of each task type on each of the servers is known and can possibly be different from each other, is an open problem for over three decades. The goal is to do task assignment on servers in a real time manner so that the system becomes stable, which means that the queue lengths do not diverge to infinity in steady state (throughput optimality), and the mean task completion time is minimized (delay optimality). The fluid model planning, Max-Weight, and c-$mu$-rule algorithms have theoretical guarantees on optimality in some aspects for the affinity problem, but they consider a complicated queueing structure and either require the task arrival rates, the service rates of tasks on servers, or both. In many cases that are discussed in the introduction section, both task arrival rates and service rates of different task types on different servers are unknown. In this work, the Blind GB-PANDAS algorithm is proposed which is completely blind to task arrival rates and service rates. Blind GB-PANDAS uses an exploration-exploitation approach for load balancing. We prove that Blind GB-PANDAS is throughput optimal under arbitrary and unknown distributions for service times of different task types on different servers and unknown task arrival rates. Blind GB-PANDAS desires to route an incoming task to the server with the minimum weighted-workload, but since the service rates are unknown, such routing of incoming tasks is not guaranteed which makes the throughput optimality analysis more complicated than the case where service rates are known. Our extensive experimental results reveal that Blind GB-PANDAS significantly outperforms existing methods in terms of mean task completion time at high loads.
We present a SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs. This is made possible by 1) a novel, cache-aware spike transmission algorithm 2) a model parallel multi-GPU distribution scheme and 3) a static, yet very effective load balancing strategy. The simulator further features an easy to use API and the ability to create custom models. We compare the proposed simulator against two state of the art ones on a series of benchmarks using three well-established models. We find that our simulator is faster, consumes less memory, and scales linearly with the number of GPUs.
Recently, fog computing has been introduced as a modern distributed paradigm and complement to cloud computing to provide services. Fog system extends storing and computing to the edge of the network, which can solve the problem about service computing of the delay-sensitive applications remarkably besides enabling the location awareness and mobility support. Load balancing is an important aspect of fog networks that avoids a situation with some under-loaded or overloaded fog nodes. Quality of Service (QoS) parameters such as resource utilization, throughput, cost, response time, performance, and energy consumption can be improved with load balancing. In recent years, some researches in load balancing techniques in fog networks have been carried out, but there is no systematic review to consolidate these studies. This article reviews the load-balancing mechanisms systematically in fog computing in four classifications, including approximate, exact, fundamental, and hybrid methods (published between 2013 and August 2020). Also, this article investigates load balancing metrics with all advantages and disadvantages related to chosen load balancing mechanisms in fog networks. The evaluation techniques and tools applied for each reviewed study are explored as well. Additionally, the essential open challenges and future trends of these mechanisms are discussed.
In this note, we apply Steins method to analyze the performance of general load balancing schemes in the many-server heavy-traffic regime. In particular, consider a load balancing system of $N$ servers and the distance of arrival rate to the capacity region is given by $N^{1-alpha}$ with $alpha > 1$. We are interested in the performance as $N$ goes to infinity under a large class of policies. We establish different asymptotics under different scalings and conditions. Specifically, (i) If the second moments linearly increase with $N$ with coefficients $sigma_a^2$ and $ u_s^2$, then for any $alpha > 4$, the distribution of the sum queue length scaled by $N^{-alpha}$ converges to an exponential random variable with mean $frac{sigma_a^2 + u_s^2}{2}$. (3) If the second moments quadratically increase with $N$ with coefficients $tilde{sigma}_a^2$ and $tilde{ u}_s^2$, then for any $alpha > 3$, the distribution of the sum queue length scaled by $N^{-alpha-1}$ converges to an exponential random variable with mean $frac{tilde{sigma}_a^2 + tilde{ u}_s^2}{2}$. Both results are simple applications of our previously developed framework of Steins method for heavy-traffic analysis in cite{zhou2020note}.
Set function optimization is essential in AI and machine learning. We focus on a subadditive set function that generalizes submodularity, and examine the subadditivity of non-submodular functions. We also deal with a minimax subadditive load balancing problem, and present a modularization-minimization algorithm that theoretically guarantees a worst-case approximation factor. In addition, we give a lower bound computation technique for the problem. We apply these methods to the multi-robot routing problem for an empirical performance evaluation.