Previous work identifying depth-optimal $n$-channel sorting networks for $9leq n leq 16$ is based on exploiting symmetries of the first two layers. However, the naive generate-and-test approach typically applied does not scale. This paper revisits the problem of generating two-layer prefixes modulo symmetries. An improved notion of symmetry is provided and a novel technique based on regular languages and graph isomorphism is shown to generate the set of non-symmetric representations. An empirical evaluation demonstrates that the new method outperforms the generate-and-test approach by orders of magnitude and easily scales until $n=40$.
This paper studies new properties of the front and back ends of a sorting network, and illustrates the utility of these in the search for new bounds on optimal sorting networks. Search focuses first on the outsides of the network and then on the inner part. All previous works focus only on properties of the front end of networks and on how to apply these to break symmetries in the search. The new, out-side-in, properties help shed understanding on how sorting networks sort, and facilitate the computation of new bounds on optimal sorting networks. We present new parallel sorting networks for 17 to 20 inputs. For 17, 19, and 20 inputs these networks are faster than the previously known best networks. For 17 inputs, the new sorting network is shown optimal in the sense that no sorting network using less layers exists.
This paper shows an application of the theory of sorting networks to facilitate the synthesis of optimized general purpose sorting libraries. Standard sorting libraries are often based on combinations of the classic Quicksort algorithm with insertion sort applied as the base case for small fixed numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching and results in code which is equivalent to a sorting network. This enables the application of further program transformations based on sorting network optimizations, and eventually the synthesis of code from sorting networks. We show that if considering the number of comparisons and swaps then theory predicts no real advantage of this approach. However, significant speed-ups are obtained when taking advantage of instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. We provide empirical evidence that using code synthesized from efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.
In the classical Subset Sum problem we are given a set $X$ and a target $t$, and the task is to decide whether there exists a subset of $X$ which sums to $t$. A recent line of research has resulted in $tilde{O}(t)$-time algorithms, which are (near-)optimal under popular complexity-theoretic assumptions. On the other hand, the standard dynamic programming algorithm runs in time $O(n cdot |mathcal{S}(X,t)|)$, where $mathcal{S}(X,t)$ is the set of all subset sums of $X$ that are smaller than $t$. Furthermore, all known pseudopolynomial algorithms actually solve a stronger task, since they actually compute the whole set $mathcal{S}(X,t)$. As the aforementioned two running times are incomparable, in this paper we ask whether one can achieve the best of both worlds: running time $tilde{O}(|mathcal{S}(X,t)|)$. In particular, we ask whether $mathcal{S}(X,t)$ can be computed in near-linear time in the output-size. Using a diverse toolkit containing techniques such as color coding, sparse recovery, and sumset estimates, we make considerable progress towards this question and design an algorithm running in time $tilde{O}(|mathcal{S}(X,t)|^{4/3})$. Central to our approach is the study of top-$k$-convolution, a natural problem of independent interest: given sparse polynomials with non-negative coefficients, compute the lowest $k$ non-zero monomials of their product. We design an algorithm running in time $tilde{O}(k^{4/3})$, by a combination of sparse convolution and sumset estimates considered in Additive Combinatorics. Moreover, we provide evidence that going beyond some of the barriers we have faced requires either an algorithmic breakthrough or possibly new techniques from Additive Combinatorics on how to pass from information on restricted sumsets to information on unrestricted sumsets.
Sorting a Permutation by Transpositions (SPbT) is an important problem in Bioinformtics. In this paper, we improve the running time of the best known approximation algorithm for SPbT. We use the permutation tree data structure of Feng and Zhu and improve the running time of the 1.375 Approximation Algorithm for SPbT of Elias and Hartman to $O(nlog n)$. The previous running time of EH algorithm was $O(n^2)$.
Motivated by the development of computer theory, the sorting algorithm is emerging in an endless stream. Inspired by decrease and conquer method, we propose a brand new sorting algorithmUltimately Heapsort. The algorithm consists of two parts: building a heap and adjusting a heap. Through the asymptotic analysis and experimental analysis of the algorithm, the time complexity of our algorithm can reach O(nlogn) under any condition. Moreover, its space complexity is only O(1). It can be seen that our algorithm is superior to all previous algorithms.
Michael Codish
,Luis Cruz-Filipe
,Peter Schneider-Kamp
.
(2014)
.
"The Quest for Optimal Sorting Networks: Efficient Generation of Two-Layer Prefixes"
.
Peter Schneider-Kamp
هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا