No Arabic abstract
This work shows that the following problems are equivalent, both in theory and in practice: - median filtering: given an $n$-element vector, compute the sliding window median with window size $k$, - piecewise sorting: given an $n$-element vector, divide it in $n/k$ blocks of length $k$ and sort each block. By prior work, median filtering is known to be at least as hard as piecewise sorting: with a single median filter operation we can sort $Theta(n/k)$ blocks of length $Theta(k)$. The present work shows that median filtering is also as easy as piecewise sorting: we can do median filtering with one piecewise sorting operation and linear-time postprocessing. In particular, median filtering can directly benefit from the vast literature on sorting algorithms---for example, adaptive sorting algorithms imply adaptive median filtering algorithms. The reduction is very efficient in practice---for random inputs the performance of the new sorting-based algorithm is on a par with the fastest heap-based algorithms, and for benign data distributions it typically outperforms prior algorithms. The key technical idea is that we can represent the sliding window with a pair of sorted doubly-linked lists: we delete items from one list and add items to the other list. Deletions are easy; additions can be done efficiently if we reverse the time twice: First we construct the full list and delete the items in the reverse order. Then we undo each deletion with Knuths dancing links technique.
Computing the convolution $Astar B$ of two length-$n$ vectors $A,B$ is an ubiquitous computational primitive. Applications range from string problems to Knapsack-type problems, and from 3SUM to All-Pairs Shortest Paths. These applications often come in the form of nonnegative convolution, where the entries of $A,B$ are nonnegative integers. The classical algorithm to compute $Astar B$ uses the Fast Fourier Transform and runs in time $O(nlog n)$. However, often $A$ and $B$ satisfy sparsity conditions, and hence one could hope for significant improvements. The ideal goal is an $O(klog k)$-time algorithm, where $k$ is the number of non-zero elements in the output, i.e., the size of the support of $Astar B$. This problem is referred to as sparse nonnegative convolution, and has received considerable attention in the literature; the fastest algorithms to date run in time $O(klog^2 n)$. The main result of this paper is the first $O(klog k)$-time algorithm for sparse nonnegative convolution. Our algorithm is randomized and assumes that the length $n$ and the largest entry of $A$ and $B$ are subexponential in $k$. Surprisingly, we can phrase our algorithm as a reduction from the sparse case to the dense case of nonnegative convolution, showing that, under some mild assumptions, sparse nonnegative convolution is equivalent to dense nonnegative convolution for constant-error randomized algorithms. Specifically, if $D(n)$ is the time to convolve two nonnegative length-$n$ vectors with success probability $2/3$, and $S(k)$ is the time to convolve two nonnegative vectors with output size $k$ with success probability $2/3$, then $S(k)=O(D(k)+k(loglog k)^2)$. Our approach uses a variety of new techniques in combination with some old machinery from linear sketching and structured linear algebra, as well as new insights on linear hashing, the most classical hash function.
The linear pivot selection algorithm, known as median-of-medians, makes the worst case complexity of quicksort be $mathrm{O}(nln n)$. Nevertheless, it has often been said that this algorithm is too expensive to use in quicksort. In this article, we show that we can make the quicksort with this kind of pivot selection approach be efficient.
This paper shows an application of the theory of sorting networks to facilitate the synthesis of optimized general purpose sorting libraries. Standard sorting libraries are often based on combinations of the classic Quicksort algorithm with insertion sort applied as the base case for small fixed numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching and results in code which is equivalent to a sorting network. This enables the application of further program transformations based on sorting network optimizations, and eventually the synthesis of code from sorting networks. We show that if considering the number of comparisons and swaps then theory predicts no real advantage of this approach. However, significant speed-ups are obtained when taking advantage of instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. We provide empirical evidence that using code synthesized from efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.
This paper studies new properties of the front and back ends of a sorting network, and illustrates the utility of these in the search for new bounds on optimal sorting networks. Search focuses first on the outsides of the network and then on the inner part. All previous works focus only on properties of the front end of networks and on how to apply these to break symmetries in the search. The new, out-side-in, properties help shed understanding on how sorting networks sort, and facilitate the computation of new bounds on optimal sorting networks. We present new parallel sorting networks for 17 to 20 inputs. For 17, 19, and 20 inputs these networks are faster than the previously known best networks. For 17 inputs, the new sorting network is shown optimal in the sense that no sorting network using less layers exists.
We empirically study sorting in the evolving data model. In this model, a sorting algorithm maintains an approximation to the sorted order of a list of data items while simultaneously, with each comparison made by the algorithm, an adversary randomly swaps the order of adjacent items in the true sorted order. Previous work studies only t