ترغب بنشر مسار تعليمي؟ اضغط هنا

RecSplit: Minimal Perfect Hashing via Recursive Splitting

55   0   0.0 ( 0 )
 نشر من قبل Sebastiano Vigna
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

A minimal perfect hash function bijectively maps a key set $S$ out of a universe $U$ into the first $|S|$ natural numbers. Minimal perfect hash functions are used, for example, to map irregularly-shaped keys, such as string, in a compact space so that metadata can then be simply stored in an array. While it is known that just $1.44$ bits per key are necessary to store a minimal perfect function, no published technique can go below $2$ bits per key in practice. We propose a new technique for storing minimal perfect hash functions with expected linear construction time and expected constant lookup time that makes it possible to build for the first time, for example, structures which need $1.56$ bits per key, that is, within $8.3$% of the lower bound, in less than $2$ ms per key. We show that instances of our construction are able to simultaneously beat the construction time, space usage and lookup time of the state-of-the-art data structure reaching $2$ bits per key. Moreover, we provide parameter choices giving structures which are competitive with alternative, larger-size data structures in terms of space and lookup time. The construction of our data structures can be easily parallelized or mapped on distributed computational units (e.g., within the MapReduce framework), and structures larger than the available RAM can be directly built in mass storage.

قيم البحث

اقرأ أيضاً

Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space with respect to existing techniques. The main obstruction for any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: despite they can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques to speed up the resolution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions. We formulate the value function estimation procedure in value iteration as an approximate maximum inner pro duct search problem and propose a locality sensitive hashing (LSH) [Indyk and Motwani STOC98, Andoni and Razenshteyn STOC15, Andoni, Laarhoven, Razenshteyn and Waingarten SODA17] type data structure to solve this problem with sublinear time complexity. Moreover, we build the connections between the theory of approximate maximum inner product search and the regret analysis of reinforcement learning. We prove that, with our choice of approximation factor, our Sublinear LSVI algorithms maintain the same regret as the original LSVI algorithms while reducing the runtime complexity to sublinear in the number of actions. To the best of our knowledge, this is the first work that combines LSH with reinforcement learning resulting in provable improvements. We hope that our novel way of combining data-structures and iterative algorithm will open the door for further study into cost reduction in optimization.
Motivated by adjacency in perfect matching polytopes, we study the shortest reconfiguration problem of perfect matchings via alternating cycles. Namely, we want to find a shortest sequence of perfect matchings which transforms one given perfect match ing to another given perfect matching such that the symmetric difference of each pair of consecutive perfect matchings is a single cycle. The problem is equivalent to the combinatorial shortest path problem in perfect matching polytopes. We prove that the problem is NP-hard even when a given graph is planar or bipartite, but it can be solved in polynomial time when the graph is outerplanar.
94 - Sungsoo Ahn 2015
Max-product Belief Propagation (BP) is a popular message-passing algorithm for computing a Maximum-A-Posteriori (MAP) assignment over a distribution represented by a Graphical Model (GM). It has been shown that BP can solve a number of combinatorial optimization problems including minimum weight matching, shortest path, network flow and vertex cover under the following common assumption: the respective Linear Programming (LP) relaxation is tight, i.e., no integrality gap is present. However, when LP shows an integrality gap, no model has been known which can be solved systematically via sequential applications of BP. In this paper, we develop the first such algorithm, coined Blossom-BP, for solving the minimum weight matching problem over arbitrary graphs. Each step of the sequential algorithm requires applying BP over a modified graph constructed by contractions and expansions of blossoms, i.e., odd sets of vertices. Our scheme guarantees termination in O(n^2) of BP runs, where n is the number of vertices in the original graph. In essence, the Blossom-BP offers a distributed version of the celebrated Edmonds Blossom algorithm by jumping at once over many sub-steps with a single BP. Moreover, our result provides an interpretation of the Edmonds algorithm as a sequence of LPs.
In this paper we further investigate the well-studied problem of finding a perfect matching in a regular bipartite graph. The first non-trivial algorithm, with running time $O(mn)$, dates back to K{o}nigs work in 1916 (here $m=nd$ is the number of ed ges in the graph, $2n$ is the number of vertices, and $d$ is the degree of each node). The currently most efficient algorithm takes time $O(m)$, and is due to Cole, Ost, and Schirra. We improve this running time to $O(min{m, frac{n^{2.5}ln n}{d}})$; this minimum can never be larger than $O(n^{1.75}sqrt{ln n})$. We obtain this improvement by proving a uniform sampling theorem: if we sample each edge in a $d$-regular bipartite graph independently with a probability $p = O(frac{nln n}{d^2})$ then the resulting graph has a perfect matching with high probability. The proof involves a decomposition of the graph into pieces which are guaranteed to have many perfect matchings but do not have any small cuts. We then establish a correspondence between potential witnesses to non-existence of a matching (after sampling) in any piece and cuts of comparable size in that same piece. Kargers sampling theorem for preserving cuts in a graph can now be adapted to prove our uniform sampling theorem for preserving perfect matchings. Using the $O(msqrt{n})$ algorithm (due to Hopcroft and Karp) for finding maximum matchings in bipartite graphs on the sampled graph then yields the stated running time. We also provide an infinite family of instances to show that our uniform sampling result is tight up to poly-logarithmic factors (in fact, up to $ln^2 n$).
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا