Do you want to publish a course? Click here

Applying Sorting Networks to Synthesize Optimized Sorting Libraries

129   0   0.0 ( 0 )
 Publication date 2015
and research's language is English




Ask ChatGPT about the research

This paper shows an application of the theory of sorting networks to facilitate the synthesis of optimized general purpose sorting libraries. Standard sorting libraries are often based on combinations of the classic Quicksort algorithm with insertion sort applied as the base case for small fixed numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching and results in code which is equivalent to a sorting network. This enables the application of further program transformations based on sorting network optimizations, and eventually the synthesis of code from sorting networks. We show that if considering the number of comparisons and swaps then theory predicts no real advantage of this approach. However, significant speed-ups are obtained when taking advantage of instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. We provide empirical evidence that using code synthesized from efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.



rate research

Read More

This paper studies new properties of the front and back ends of a sorting network, and illustrates the utility of these in the search for new bounds on optimal sorting networks. Search focuses first on the outsides of the network and then on the inner part. All previous works focus only on properties of the front end of networks and on how to apply these to break symmetries in the search. The new, out-side-in, properties help shed understanding on how sorting networks sort, and facilitate the computation of new bounds on optimal sorting networks. We present new parallel sorting networks for 17 to 20 inputs. For 17, 19, and 20 inputs these networks are faster than the previously known best networks. For 17 inputs, the new sorting network is shown optimal in the sense that no sorting network using less layers exists.
169 - Jukka Suomela 2014
This work shows that the following problems are equivalent, both in theory and in practice: - median filtering: given an $n$-element vector, compute the sliding window median with window size $k$, - piecewise sorting: given an $n$-element vector, divide it in $n/k$ blocks of length $k$ and sort each block. By prior work, median filtering is known to be at least as hard as piecewise sorting: with a single median filter operation we can sort $Theta(n/k)$ blocks of length $Theta(k)$. The present work shows that median filtering is also as easy as piecewise sorting: we can do median filtering with one piecewise sorting operation and linear-time postprocessing. In particular, median filtering can directly benefit from the vast literature on sorting algorithms---for example, adaptive sorting algorithms imply adaptive median filtering algorithms. The reduction is very efficient in practice---for random inputs the performance of the new sorting-based algorithm is on a par with the fastest heap-based algorithms, and for benign data distributions it typically outperforms prior algorithms. The key technical idea is that we can represent the sliding window with a pair of sorted doubly-linked lists: we delete items from one list and add items to the other list. Deletions are easy; additions can be done efficiently if we reverse the time twice: First we construct the full list and delete the items in the reverse order. Then we undo each deletion with Knuths dancing links technique.
Sorting and ranking supervision is a method for training neural networks end-to-end based on ordering constraints. That is, the ground truth order of sets of samples is known, while their absolute values remain unsupervised. For that, we propose differentiable sorting networks by relaxing their pairwise conditional swap operations. To address the problems of vanishing gradients and extensive blurring that arise with larger numbers of layers, we propose mapping activations to regions with moderate gradients. We consider odd-even as well as bitonic sorting networks, which outperform existing relaxations of the sorting operation. We show that bitonic sorting networks can achieve stable training on large input sets of up to 1024 elements.
We study the problem of sorting under incomplete information, when queries are used to resolve uncertainties. Each of $n$ data items has an unknown value, which is known to lie in a given interval. We can pay a query cost to learn the actual value, and we may allow an error threshold in the sorting. The goal is to find a nearly-sorted permutation by performing a minimum-cost set of queries. We show that an offline optimum query set can be found in poly time, and that both oblivious and adaptive problems have simple competitive algorithms. The competitive ratio for the oblivious problem is $n$ for uniform query costs, and unbounded for arbitrary costs; for the adaptive problem, the ratio is 2. We present a unified adaptive strategy for uniform costs that yields the following improved results: (1) a 3/2-competitive randomized algorithm; (2) a 5/3-competitive deterministic algorithm if the dependency graph has no 2-components after some preprocessing, which has competitive ratio $3/2+mathrm{O}(1/k)$ if the components obtained have size at least $k$; and (3) an exact algorithm for laminar families of intervals. The first two results have matching lower bounds, and we have a lower bound of 7/5 for large components. We also give a randomized adaptive algorithm with competitive ratio $1+frac{4}{3sqrt{3}}approx 1.7698$ for arbitrary query costs, and we show that the 2-competitive deterministic adaptive algorithm can be generalized for queries returning intervals and for a more general vertex cover problem, by using the local ratio technique. Moreover, we prove that the advice complexity of the adaptive problem is $lfloor n/2rfloor$ if no error threshold is allowed, and $lceil n/3cdotlg 3rceil$ for the general case. Finally, we present some graph-theoretical results on co-threshold tolerance graphs, and we discuss uncertainty variants of some classical interval problems.
Previous work identifying depth-optimal $n$-channel sorting networks for $9leq n leq 16$ is based on exploiting symmetries of the first two layers. However, the naive generate-and-test approach typically applied does not scale. This paper revisits the problem of generating two-layer prefixes modulo symmetries. An improved notion of symmetry is provided and a novel technique based on regular languages and graph isomorphism is shown to generate the set of non-symmetric representations. An empirical evaluation demonstrates that the new method outperforms the generate-and-test approach by orders of magnitude and easily scales until $n=40$.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا