ترغب بنشر مسار تعليمي؟ اضغط هنا

A Survey of Stability Results for Redundancy Systems

84   0   0.0 ( 0 )
 نشر من قبل Elene Anton
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Redundancy mechanisms consist in sending several copies of a same job to a subset of servers. It constitutes one of the most promising ways to exploit diversity in multiservers applications. However, its pros and cons are still not sufficiently understood in the context of realistic models with generic statistical properties of service-times distributions and correlation structures of copies. We aim at giving a survey of recent results concerning the stability-arguably the first benchmark of performance-of systems with cancel-oncompletion redundancy. We also point out open questions and conjectures.


قيم البحث

اقرأ أيضاً

114 - Ji Zhu , Bruce Hajek 2011
This paper focuses on the stationary portion of file download in an unstructured peer-to-peer network, which typically follows for many hours after a flash crowd initiation. The model includes the case that peers can have some pieces at the time of a rrival. The contribution of the paper is to identify how much help is needed from the seeds, either fixed seeds or peer seeds (which are peers remaining in the system after obtaining a complete collection) to stabilize the system. The dominant cause for instability is the missing piece syndrome, whereby one piece becomes very rare in the network. It is shown that stability can be achieved with only a small amount of help from peer seeds--even with very little help from a fixed seed, peers need dwell as peer seeds on average only long enough to upload one additional piece. The region of stability is insensitive to the piece selection policy. Network coding can substantially increase the region of stability in case a portion of the new peers arrive with randomly coded pieces.
Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-job- per-server model: an arrival might not fit into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even understanding the exact stability region is a very hard problem. In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which can be applicable to a broader scope of problems in queueing systems.
We give simple and unified proofs of the known stability and rigidity results for Lie algebras, Lie subalgebras and Lie algebra homomorphisms. Moreover, we investigate when a Lie algebra homomorphism is stable under all automorphisms of the codomain (including outer automorphisms).
91 - Yige Hong , Weina Wang 2021
Multiserver jobs, which are jobs that occupy multiple servers simultaneously during service, are prevalent in todays computing clusters. But little is known about the delay performance of systems with multiserver jobs. We consider queueing models for multiserver jobs in a scaling regime where the total number of servers in the system becomes large and meanwhile both the system load and the number of servers that a job needs scale with the total number of servers. Prior work has derived upper bounds on the queueing probability in this scaling regime. However, without proper lower bounds, the existing results cannot be used to differentiate between policies. In this paper, we study the delay performance by establishing sharp bounds on the mean waiting time of multiserver jobs, where the waiting time of a job is the time spent in queueing rather than in service. We first consider the commonly used First-Come-First-Serve (FCFS) policy and characterize the exact order of its mean waiting time. We then prove a lower bound on the mean waiting time of all policies, and demonstrate that there is an order gap between this lower bound and the mean waiting time under FCFS. We finally complement the lower bound with an achievability result: we show that under a priority policy that we call P-Priority, the mean waiting time achieves the order of the lower bound. This achievability result implies the tightness of the lower bound, the asymptotic optimality of P-Priority, and the strict suboptimality of FCFS.
We investigate the stability condition of redundancy-$d$ multi-server systems. Each server has its own queue and implements popular scheduling disciplines such as First-Come-First-Serve (FCFS), Processor Sharing (PS), and Random Order of Service (ROS ). New jobs arrive according to a Poisson process and copies of each job are sent to $d$ servers chosen uniformly at random. The service times of jobs are assumed to be exponentially distributed. A job departs as soon as one of its copies finishes service. Under the assumption that all $d$ copies are i.i.d., we show that for PS and ROS (for FCFS it is already known) sending redundant copies does not reduce the stability region. Under the assumption that the $d$ copies are identical, we show that (i) ROS does not reduce the stability region, (ii) FCFS reduces the stability region, which can be characterized through an associated saturated system, and (iii) PS severely reduces the stability region, which coincides with the system where all copies have to be emph{fully} served. The proofs are based on careful characterizations of scaling limits of the underlying stochastic process. Through simulations we obtain interesting insights on the systems performance for non-exponential service time distributions and heterogeneous server speeds.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا