In this note, we apply Stein's method to analyze the performance of general load balancing schemes in the many-server heavy-traffic regime. In particular, we consider a load balancing system of $N$ servers in which the distance of the arrival rate to the capacity region is given by $N^{1-\alpha}$ with $\alpha > 1$. We are interested in the performance as $N$ goes to infinity under a large class of policies. We establish different asymptotics under different scalings and conditions. Specifically, (i) if the second moments increase linearly with $N$ with coefficients $\sigma_a^2$ and $\nu_s^2$, then for any $\alpha > 4$, the distribution of the sum queue length scaled by $N^{-\alpha}$ converges to an exponential random variable with mean $\frac{\sigma_a^2 + \nu_s^2}{2}$; (ii) if the second moments increase quadratically with $N$ with coefficients $\tilde{\sigma}_a^2$ and $\tilde{\nu}_s^2$, then for any $\alpha > 3$, the distribution of the sum queue length scaled by $N^{-\alpha-1}$ converges to an exponential random variable with mean $\frac{\tilde{\sigma}_a^2 + \tilde{\nu}_s^2}{2}$. Both results are simple applications of our previously developed framework of Stein's method for heavy-traffic analysis in \cite{zhou2020note}.
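As a rough consistency check on the two normalizations (a heuristic sketch, not the argument of the note), write $\epsilon = N^{1-\alpha}$ for the distance to capacity and $q_i$ for the queue length at server $i$; both symbols are notation introduced here for illustration. Assume, by analogy with the classical single-server heavy-traffic limit, that the mean sum queue length is approximately the total second-moment term divided by $2\epsilon$, and that "linear" and "quadratic" growth mean that this term equals $N(\sigma_a^2+\nu_s^2)$ and $N^2(\tilde{\sigma}_a^2+\tilde{\nu}_s^2)$, respectively. Then

\[
\text{(i)}\quad \mathrm{E}\Big[\sum_{i=1}^{N} q_i\Big] \approx \frac{N(\sigma_a^2+\nu_s^2)}{2N^{1-\alpha}} = N^{\alpha}\,\frac{\sigma_a^2+\nu_s^2}{2},
\qquad
\text{(ii)}\quad \mathrm{E}\Big[\sum_{i=1}^{N} q_i\Big] \approx \frac{N^{2}(\tilde{\sigma}_a^2+\tilde{\nu}_s^2)}{2N^{1-\alpha}} = N^{\alpha+1}\,\frac{\tilde{\sigma}_a^2+\tilde{\nu}_s^2}{2}.
\]

Under these assumptions, multiplying by $N^{-\alpha}$ in case (i) and by $N^{-\alpha-1}$ in case (ii) leaves exactly the limiting means stated in the abstract.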
In this note, we apply Stein's method to analyze the steady-state distribution of queueing systems in the traditional heavy-traffic regime. Compared to previous methods (e.g., the drift method and the transform method), Stein's method allows us to establish stronger results with simple, template proofs. In particular, we consider discrete-time systems in this note. We first introduce the key ideas of Stein's method for heavy-traffic analysis through a single-server system. Then, we apply the developed template to analyze both load balancing problems and scheduling problems. All three examples demonstrate the power and flexibility of Stein's method in heavy-traffic analysis. In particular, we can see that one appealing property of Stein's method is that it combines the advantages of both the drift method and the transform method.
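To make the single-server setting concrete, the following is a minimal simulation sketch, not the note's own model or proof. It assumes a discrete-time queue with the standard recursion q(t+1) = max(q(t) + a(t) - s(t), 0), Bernoulli arrivals and Bernoulli service (both distributional choices are assumptions made here), and a gap `eps` between the mean service and mean arrival rates. It compares the time-average of `eps * q` with the classical heavy-traffic mean, (Var(a) + Var(s)) / 2. Only the mean is checked, not the full exponential limit. The function names are hypothetical.

```python
import random


def simulate_scaled_queue(eps, steps, seed=0):
    """Time-average of eps * q(t) for q(t+1) = max(q(t) + a(t) - s(t), 0)."""
    rng = random.Random(seed)
    lam = 0.5 - eps / 2  # assumed Bernoulli arrival probability
    mu = 0.5 + eps / 2   # assumed Bernoulli service probability; mu - lam = eps
    q = 0
    total = 0
    for _ in range(steps):
        a = 1 if rng.random() < lam else 0  # arrivals in this slot
        s = 1 if rng.random() < mu else 0   # potential service in this slot
        q = max(q + a - s, 0)               # unused service is lost
        total += q
    return eps * total / steps


def heavy_traffic_mean(eps):
    """Classical heavy-traffic mean (Var(a) + Var(s)) / 2 for the Bernoulli model."""
    lam = 0.5 - eps / 2
    mu = 0.5 + eps / 2
    return (lam * (1 - lam) + mu * (1 - mu)) / 2


if __name__ == "__main__":
    eps = 0.05
    print("simulated mean of eps*q:", simulate_scaled_queue(eps, 400_000))
    print("heavy-traffic mean:     ", heavy_traffic_mean(eps))
```

For a fixed positive `eps` the two numbers differ by a finite-`eps` correction in addition to simulation noise; the gap should shrink as `eps` decreases, at the cost of longer runs.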
We study a many-server queueing model with server vacations, where the population size dynamics of servers and customers are coupled: a server may leave for vacation only when no customers await, and the capacity available to customers is directly affected by the number of servers on vacation. We focus on scaling regimes in which server dynamics and queue dynamics fluctuate at matching time scales, so that their limiting dynamics are coupled. Specifically, we argue that interesting coupled dynamics occur in (a) the Halfin-Whitt regime, (b) the nondegenerate slowdown regime, and (c) the intermediate, near Halfin-Whitt regime; whereas the dynamics asymptotically decouple in the other heavy traffic regimes. We characterize the limiting dynamics, which are different for each scaling regime. We consider relevant respective performance measures for regimes (a) and (b) --- namely, the probability of wait and the slowdown. While closed form formulas for these performance measures have been derived for models that do not accommodate server vacations, it is difficult to obtain closed form formulas for these performance measures in the setting with server vacations. Instead, we propose formulas that approximate these performance measures, and depend on the steady-state mean number of available servers and previously derived formulas for models without server vacations. We test the accuracy of these formulas numerically.
We extend the measure-valued fluid model, which tracks residuals of patience and service times, to allow for time-varying arrivals. The fluid model can be characterized by a one-dimensional convolution equation involving both the patience and service time distributions. We also make an interesting connection to the measure-valued fluid model tracking the elapsed waiting and service times. Our analysis shows that the two fluid models are actually characterized by the same one-dimensional convolution equation.
We consider a system of N queues with decentralized load balancing such as power-of-D strategies (where D may depend on N) and generic scheduling disciplines. To measure the dependence of the queues, we use the clan of ancestors, a technique coming from interacting particle systems. Relying on that analysis, we prove quantitative estimates on the correlations between queues, implying propagation of chaos for systems with Markovian arrivals and general service time distribution. This solves the conjecture posed by Bramson et al. in [*] concerning the asymptotic independence of the servers in the case of the processor sharing policy. We then proceed to prove asymptotic insensitivity in the stationary regime for a wide class of scheduling disciplines and obtain speed of convergence estimates for light-tailed service distributions. [*] M. BRAMSON, Y. LU AND B. PRABHAKAR, Asymptotic independence of queues under randomized load balancing, Queueing Syst., 71:247-292, 2012.
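The power-of-D dispatching rule mentioned in the abstract above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and sampling without replacement and breaking ties by sample order are assumptions made here, since variants of the rule differ on both points.

```python
import random


def power_of_d_dispatch(queue_lengths, d, rng=random):
    """Return the index of the shortest of d queues sampled uniformly at random.

    Assumptions: the d candidates are sampled without replacement, and ties
    are broken in favor of the candidate that was sampled first.
    """
    candidates = rng.sample(range(len(queue_lengths)), d)
    return min(candidates, key=lambda i: queue_lengths[i])


if __name__ == "__main__":
    queues = [3, 0, 5, 2, 4, 1]
    target = power_of_d_dispatch(queues, d=2, rng=random.Random(0))
    queues[target] += 1  # the arriving job joins the selected queue
    print(target, queues)
```

With `d` equal to the number of queues, the rule reduces to join-the-shortest-queue; with `d = 1` it is purely random routing.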
Popular dispatching policies such as join shortest queue (JSQ), join smallest work (JSW), and their power-of-two variants are used in load balancing systems where the instantaneous queue length or workload information at all queues or a subset of them can be queried. In situations where the dispatcher has an associated memory, one can minimize this query overhead by maintaining a list of idle servers to which jobs can be dispatched. Recent alternative approaches that do not require querying such information include the cancel-on-start and cancel-on-complete replication policies. The downside of such policies, however, is that the servers must communicate the start or completion of each service to the dispatcher and must allow cancellation of redundant copies. In this work, we consider a load balancing environment where the dispatcher cannot query load information, does not have a memory, and cannot cancel any replica that it may have created. In such a rigid environment, we allow the dispatcher to append a server-side cancellation criterion to each job or its replica. A job or a replica is served only if it satisfies the predefined criterion at the time of service. We focus on a criterion that is based on the waiting time experienced by a job or its replica and analyze several variants of this policy based on the assumption of asymptotic independence of queues. The proposed policies are novel and perform remarkably well in spite of the rigid operating constraints.