random_tree() is a linear-time, linear-space C++ implementation that can create trees of up to a billion nodes for genetic programming (GP) and genetic improvement experiments. On a 3.60 GHz CPU, it generates more than 18 million random nodes per second for GP program trees.
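The abstract does not describe how random_tree() builds its trees. As one illustration of linear-time, linear-space random tree generation for GP, the Python sketch below produces a uniformly random full binary tree in prefix (Polish) notation, assuming every function takes exactly two arguments. It shuffles the multiset of node arities and then applies the cycle lemma (Dvoretzky-Motzkin): exactly one rotation of the shuffled sequence is a valid prefix expression. This is a swapped-in textbook technique, not necessarily what random_tree() does; `random_prefix_tree`, `functions`, and `terminals` are hypothetical names.

```python
import random

def random_prefix_tree(n_internal, functions, terminals, rng=random):
    """Uniformly random full binary tree with n_internal binary nodes, in prefix order.

    Runs in O(n) time and space: shuffle an arity sequence, then rotate it
    (cycle lemma) so that it becomes a valid prefix expression.
    Assumes every function symbol has arity 2 and every terminal arity 0.
    """
    # n internal nodes (arity 2) and n + 1 leaves (arity 0).
    arities = [2] * n_internal + [0] * (n_internal + 1)
    rng.shuffle(arities)  # Fisher-Yates shuffle, uniform over arrangements

    # Partial sums of (arity - 1) end at -1. Starting just after the FIRST
    # position where the partial sum is minimal gives the unique rotation
    # whose proper prefixes all have sum >= 0, i.e. a valid prefix expression.
    s, best, best_i = 0, float("inf"), 0
    for i, a in enumerate(arities):
        s += a - 1
        if s < best:
            best, best_i = s, i
    start = best_i + 1
    arities = arities[start:] + arities[:start]

    # Label each node: a random function for arity 2, a random terminal for arity 0.
    return [rng.choice(functions) if a == 2 else rng.choice(terminals) for a in arities]
```

For example, `random_prefix_tree(3, ["+", "*"], ["x", "1"])` returns a list of 7 symbols forming one prefix expression. Because every rotation of a sequence with sum -1 is distinct, each tree shape is produced with equal probability.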
In simulations, probabilistic algorithms and statistical tests, we often generate random integers in an interval (e.g., [0,s)). For example, random integers in an interval are essential to the Fisher-Yates random shuffle. Consequently, popular languages like Java, Python, C++, Swift and Go include ranged random integer generation functions as part of their runtime libraries. Pseudo-random values are usually generated in words of a fixed number of bits (e.g., 32 bits, 64 bits) using algorithms such as a linear congruential generator. We need functions to convert such random words to random integers in an interval ([0,s)) without introducing statistical biases. The standard functions in programming languages such as Java involve integer divisions. Unfortunately, division instructions are relatively expensive. We review an unbiased function to generate ranged integers from a source of random words that avoids integer divisions with high probability. To establish the practical usefulness of the approach, we show that this algorithm can multiply the speed of unbiased random shuffling on x64 processors. Our proposed approach has been adopted by the Go language for its implementation of the shuffle function.
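The division-avoiding idea can be sketched as follows: multiply a random L-bit word by s, return the high L bits of the product, and use the low L bits to decide whether to reject. The modulo that computes the rejection threshold runs only when the low bits fall below s, which happens with probability less than s / 2^L. The Python sketch below uses arbitrary-precision integers as stand-ins for fixed-width unsigned words; `bounded_random`, `shuffle`, the `next_word` callback, and the default `bits=32` are hypothetical names and assumptions, and the function requires 1 <= s <= 2^bits.

```python
import random

def bounded_random(next_word, s, bits=32):
    """Unbiased integer in [0, s) from a source of uniform `bits`-bit words.

    The division (modulo) is evaluated only on the rare slow path.
    """
    mask = (1 << bits) - 1
    m = next_word() * s          # full 2*bits-wide product
    low = m & mask               # low `bits` bits decide acceptance
    if low < s:
        # Threshold = 2**bits mod s. With unsigned C words this is (-s) % s;
        # in Python, -s % s would be 0, so compute (2**bits - s) % s instead.
        threshold = ((1 << bits) - s) % s
        while low < threshold:   # reject and redraw to remove the bias
            m = next_word() * s
            low = m & mask
    return m >> bits             # high `bits` bits: the result in [0, s)

def shuffle(items, next_word, bits=32):
    """In-place Fisher-Yates shuffle driven by bounded_random."""
    for i in range(len(items) - 1, 0, -1):
        j = bounded_random(next_word, i + 1, bits)
        items[i], items[j] = items[j], items[i]
```

As a usage example, with `rng = random.Random(42)` one can shuffle a deck via `shuffle(deck, lambda: rng.getrandbits(32))`. Each output value in [0, s) has exactly floor(2^bits / s) accepted preimages, which is why the result is unbiased.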
We study multi-finger binary search trees (BSTs), a far-reaching extension of the classical BST model, with connections to the well-studied $k$-server problem. Finger search is a popular technique for speeding up BST operations when a query sequence has locality of reference. BSTs with multiple fingers can exploit more general regularities in the input. In this paper we consider the cost of serving a sequence of queries in an optimal (offline) BST with $k$ fingers, a powerful benchmark against which other algorithms can be measured. We show that the $k$-finger optimum can be matched by a standard dynamic BST (having a single root-finger) with an $O(\log k)$ factor overhead. This result is tight for all $k$, improving the $O(k)$ factor implicit in earlier work. Furthermore, we describe new online BSTs that match this bound up to a $(\log k)^{O(1)}$ factor. Previously only the one-finger special case was known to hold for an online BST (Iacono, Langerman, 2016; Cole et al., 2000). Splay trees, assuming their conjectured optimality (Sleator and Tarjan, 1983), would have to match our bounds for all $k$. Our online algorithms are randomized and combine techniques developed for the $k$-server problem with a multiplicative-weights scheme for learning tree metrics. To our knowledge, this is the first time that tools developed for the $k$-server problem have been used in BSTs. As an application of our $k$-finger results, we show that BSTs can efficiently serve queries that are close to some recently accessed item. This is a (restricted) form of the unified property (Iacono, 2001) that was previously not known to hold for any BST algorithm, online or offline.
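To make the locality idea concrete, the Python sketch below uses a sorted array as a stand-in for a BST (an assumption; the paper works in the BST model with rotations). A finger is an index, and galloping (exponential) search from a finger costs O(log d) probes, where d is the distance to the target. The helper `serve_with_fingers` charges each query to the finger that reaches it most cheaply and then moves that finger there; this greedy rule is chosen only for illustration and is neither the offline optimum nor the paper's randomized online algorithm. All names are hypothetical.

```python
def finger_search(keys, finger, target):
    """Galloping search for `target` in sorted `keys`, starting at index `finger`.

    Returns (index, probes); probes grow as O(log d) with d = |index - finger|.
    Raises ValueError if `target` is absent.
    """
    n = len(keys)
    probes = 1
    if keys[finger] == target:
        return finger, probes
    direction = 1 if keys[finger] < target else -1
    prev, offset = finger, 1
    # Phase 1: probe finger +/- 1, 2, 4, ... until we pass the target or hit an end.
    while True:
        i = min(max(finger + direction * offset, 0), n - 1)
        probes += 1
        if keys[i] == target:
            return i, probes
        overshot = keys[i] > target if direction > 0 else keys[i] < target
        if overshot or i in (0, n - 1):
            break
        prev, offset = i, offset * 2
    # Phase 2: binary search strictly between the last two probes.
    lo, hi = min(prev, i), max(prev, i)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            lo = mid
        else:
            hi = mid
    raise ValueError(f"{target!r} not in keys")

def serve_with_fingers(keys, queries, fingers):
    """Total probes when each query uses the cheapest finger, which then moves there."""
    fingers = list(fingers)
    total = 0
    for q in queries:
        results = [finger_search(keys, f, q) for f in fingers]
        j = min(range(len(fingers)), key=lambda t: results[t][1])
        fingers[j] = results[j][0]   # move the chosen finger to the accessed key
        total += results[j][1]
    return total
```

With queries alternating between two distant regions, two fingers each settle near one region and pay only a few probes per query, whereas a single finger pays roughly 2 log2 d probes per query.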
Best match graphs (BMGs) are a key intermediate in graph-based orthology detection and contain a large amount of information on the gene tree. We provide a near-cubic algorithm to determine whether a BMG is binary-explainable, i.e., whether it can be explained by a fully resolved gene tree, and, if so, to construct such a tree. Moreover, we show that all such binary trees are refinements of the unique binary-resolvable tree (BRT), which in general is a substantial refinement of the least resolved tree of a BMG (also unique). Finally, we show that the problem of editing an arbitrary vertex-colored graph into a binary-explainable BMG is NP-complete, and we provide an integer linear programming formulation for this task.
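What it means for a tree to explain a BMG can be made concrete by computing the BMG of a given leaf-colored gene tree. Under the standard definition, leaf y is a best match of leaf x if x and y have different colors and lca(x, y) is a descendant of (or equal to) lca(x, y') for every leaf y' of y's color. Because every lca(x, ·) lies on the path from x to the root, this comparison reduces to comparing depths. The brute-force Python sketch below lists the arcs; the `parent` and `color` dictionaries are an assumed input format and `best_match_graph` is a hypothetical name. The reverse task addressed in the abstract, deciding whether a graph is the BMG of some binary tree, is not shown.

```python
def best_match_graph(parent, color):
    """Arcs (x, y) of the best match graph of a leaf-colored rooted tree.

    `parent` maps each non-root vertex to its parent; `color` maps each leaf
    to its color (e.g. species). Brute force, meant for small examples only.
    """
    def path_to_root(v):
        path = [v]
        while v in parent:
            v = parent[v]
            path.append(v)
        return path

    def lca_depth(x, y):
        on_x = set(path_to_root(x))
        for v in path_to_root(y):          # first common vertex is the lca
            if v in on_x:
                return len(path_to_root(v)) - 1
        raise ValueError("vertices are not in the same tree")

    arcs = set()
    for x in color:
        for c in set(color.values()) - {color[x]}:
            ys = [y for y in color if color[y] == c]
            depths = {y: lca_depth(x, y) for y in ys}
            deepest = max(depths.values())
            # y is a best match of x if no y' of color c has a deeper lca with x.
            arcs.update((x, y) for y in ys if depths[y] == deepest)
    return arcs
```

For instance, in a tree where a1 and b1 are siblings, the only best match of a1 among the B-colored leaves is b1, while a leaf attached directly to the root has all leaves of the other color as best matches.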
We present an algorithm that, with high probability, generates a random spanning tree from an edge-weighted undirected graph in $\tilde{O}(n^{4/3}m^{1/2}+n^{2})$ time (the $\tilde{O}(\cdot)$ notation hides $\operatorname{polylog}(n)$ factors). The tree is sampled from a distribution where the probability of each tree is proportional to the product of its edge weights. This improves upon the previous best algorithm due to Colbourn et al. that runs in matrix multiplication time, $O(n^\omega)$. For the special case of unweighted graphs, this improves upon the best previously known running time of $\tilde{O}(\min\{n^{\omega}, m\sqrt{n}, m^{4/3}\})$ for $m \gg n^{5/3}$ (Colbourn et al. 96, Kelner-Madry 09, Madry et al. 15). The effective resistance metric is essential to our algorithm, as in the work of Madry et al., but we eschew the determinant-based and random walk-based techniques used by previous algorithms. Instead, our algorithm is based on Gaussian elimination and on the fact that effective resistance is preserved in the graph resulting from eliminating a subset of vertices (called a Schur complement). As part of our algorithm, we show how to compute $\epsilon$-approximate effective resistances for a set $S$ of vertex pairs via approximate Schur complements in $\tilde{O}(m+(n + |S|)\epsilon^{-2})$ time, without using the Johnson-Lindenstrauss lemma, which requires $\tilde{O}(\min\{(m + |S|)\epsilon^{-2}, m+n\epsilon^{-4} +|S|\epsilon^{-2}\})$ time. We combine this approximation procedure with an error-correction procedure for handling edges where our estimate is not sufficiently accurate.
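The two facts the algorithm relies on can be checked on small graphs with exact arithmetic. The Python sketch below (dense, cubic time, purely illustrative, and not the paper's algorithm) computes an effective resistance $R(u,v)$ by grounding $v$ and solving a Laplacian system, and eliminates one vertex by taking a Schur complement of the Laplacian; resistances between the remaining vertices do not change. A known link to sampling is Kirchhoff's theorem: under the weighted spanning-tree distribution described above, an edge $(u,v)$ of weight $w$ appears in the random tree with probability $w \cdot R(u,v)$. The function names are hypothetical.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Weighted graph Laplacian (list of lists of Fractions) from (u, v, w) edges."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        w = Fraction(w)
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L

def solve(A, b):
    """Exact Gauss-Jordan elimination; assumes A is nonsingular."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # nonzero pivot
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * bb for a, bb in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def effective_resistance(L, u, v):
    """R(u, v) in a connected graph: ground v, inject unit current at u, read u's potential."""
    keep = [i for i in range(len(L)) if i != v]
    A = [[L[i][j] for j in keep] for i in keep]
    b = [Fraction(1) if i == u else Fraction(0) for i in keep]
    return solve(A, b)[keep.index(u)]

def schur_complement(L, elim):
    """Eliminate vertex `elim`: one step of Gaussian elimination on the Laplacian.

    Vertices with index greater than `elim` shift down by one.
    """
    keep = [i for i in range(len(L)) if i != elim]
    d = L[elim][elim]
    return [[L[i][j] - L[i][elim] * L[elim][j] / d for j in keep] for i in keep]
```

For the unit-weight triangle, R(0, 1) = 2/3, matching the fact that each edge lies in two of its three spanning trees. Eliminating one vertex of a 4-cycle leaves a triangle with a new edge of weight 1/2, and R(0, 1) stays 3/4.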
Motivated by recent developments in optical switching and reconfigurable network design, we study dynamic binary search trees (BSTs) in the matching model. In the classical dynamic BST model, both a link traversal and a basic reconfiguration (rotation) cost $O(1)$. In the matching model, however, the BST is defined by two optical switches (which represent two matchings in an abstract way); reconfiguring a switch (or matching) costs $\alpha$, while a link traversal still costs $O(1)$. In this work, we propose Arithmetic BST (A-BST), a simple dynamic BST algorithm based on dynamic Shannon-Fano-Elias coding, and show that A-BST is statically optimal for sequences of length $\Omega(n \alpha \log \alpha)$, where $n$ is the number of nodes (keys) in the tree.
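Shannon-Fano-Elias coding, the static building block behind A-BST, assigns symbol $x$ (with probability $p(x)$ and cumulative distribution $F$) the first $\lceil \log_2(1/p(x)) \rceil + 1$ bits of $F(x-1) + p(x)/2$. The Python sketch below computes these codewords exactly with fractions. Because the codewords are prefix-free and appear in the same order as the symbols, they can be read as left/right paths in a binary tree that keeps keys sorted, with key $x$ at depth about $\log_2(1/p(x)) + 1$. How A-BST updates frequencies dynamically and maps codewords onto the two matchings is not shown; `sfe_codes` is a hypothetical name.

```python
from fractions import Fraction

def sfe_codes(freqs):
    """Shannon-Fano-Elias codewords for symbols 0..k-1 with positive frequencies.

    Symbol x gets the first ceil(log2(1/p(x))) + 1 bits of F(x-1) + p(x)/2.
    The codewords are prefix-free and lexicographically ordered like the symbols.
    """
    total = sum(freqs)
    codes, cum = [], Fraction(0)   # cum = F(x-1), the cumulative probability so far
    for f in freqs:
        p = Fraction(f, total)
        k = 0
        while p * 2**k < 1:        # k = ceil(log2(1/p)), computed without floats
            k += 1
        length = k + 1
        midpoint = cum + p / 2
        # floor(midpoint * 2^length) written as a zero-padded binary string
        codes.append(format(int(midpoint * 2**length), f"0{length}b"))
        cum += p
    return codes
```

For frequencies [1, 1, 2] (probabilities 1/4, 1/4, 1/2) the codewords are "001", "011", and "11": the most frequent key gets the shortest path.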