No Arabic abstract
In a continuous-time setting, Fill (2010) proved, for a large class of probabilistic sources, that the number of symbol comparisons used by QuickSort, when centered by subtracting the mean and scaled by dividing by time, has a limiting distribution, but proved little about that limiting random variable Y -- not even that it is nondegenerate. We establish the nondegeneracy of Y. The proof is perhaps surprisingly difficult.
Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (i.i.d.) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild tameness condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, that is, whenever each key is generated as an infinite string of i.i.d. symbols. This is somewhat surprising; even for the classical model that each key is an i.i.d. string of unbiased (fair) bits, the mean exhibits periodic fluctuations of order n.
When the search algorithm QuickSelect compares keys during its execution in order to find a key of target rank, it must operate on the keys representations or internal structures, which were ignored by the previous studies that quantified the execution cost for the algorithm in terms of the number of required key comparisons. In this paper, we analyze running costs for the algorithm that take into account not only the number of key comparisons but also the cost of each key comparison. We suppose that keys are represented as sequences of symbols generated by various probabilistic sources and that QuickSelect operates on individual symbols in order to find the target key. We identify limiting distributions for the costs and derive integral and series expressions for the expectations of the limiting distributions. These expressions are used to recapture previously obtained results on the number of key comparisons required by the algorithm.
The analyses of many algorithms and data structures (such as digital search trees) for searching and sorting are based on the representation of the keys involved as bit strings and so count the number of bit comparisons. On the other hand, the standard analyses of many other algorithms (such as Quicksort) are performed in terms of the number of key comparisons. We introduce the prospect of a fair comparison between algorithms of the two types by providing an average-case analysis of the number of bit comparisons required by Quicksort. Counting bit comparisons rather than key comparisons introduces an extra logarithmic factor to the asymptotic average total. We also provide a new algorithm, BitsQuick, that reduces this factor to constant order by eliminating needless bit comparisons.
Using a recursive approach, we obtain a simple exact expression for the L^2-distance from the limit in Regniers (1989) classical limit theorem for the number of key comparisons required by QuickSort. A previous study by Fill and Janson (2002) using a similar approach found that the d_2-distance is of order between n^{-1} log n and n^{-1/2}, and another by Neininger and Ruschendorf (2002) found that the Zolotarev zeta_3-distance is of exact order n^{-1} log n. Our expression reveals that the L^2-distance is asymptotically equivalent to (2 n^{-1} ln n)^{1/2}.
We substantially refine asymptotic logarithmic upper bounds produced by Svante Janson (2015) on the right tail of the limiting QuickSort distribution function $F$ and by Fill and Hung (2018) on the right tails of the corresponding density $f$ and of the absolute derivatives of $f$ of each order. For example, we establish an upper bound on $log[1 - F(x)]$ that matches conjectured asymptotics of Knessl and Szpankowski (1999) through terms of order $(log x)^2$; the corresponding order for the Janson (2015) bound is the lead order, $x log x$. Using the refined asymptotic bounds on $F$, we derive right-tail large deviation (LD) results for the distribution of the number of comparisons required by QuickSort that substantially sharpen the two-sided LD results of McDiarmid and Hayward (1996).