Fast Scalable Construction of (Minimal Perfect Hash) Functions

352 0 0.0 ( 0 )

Download Cite

Added by Sebastiano Vigna

Publication date 2016

fields Informatics Engineering

and research's language is English

Authors Marco Genuzio - Giuseppe Ottaviano - Sebastiano Vigna

Data Structures and Algorithms

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space with respect to existing techniques. The main obstruction for any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: despite they can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques to speed up the resolution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.

rate research

RecSplit: Minimal Perfect Hashing via Recursive Splitting

54 - Emmanuel Esposito , Thomas Mueller Graf , Sebastiano Vigna 2019

A minimal perfect hash function bijectively maps a key set $S$ out of a universe $U$ into the first $|S|$ natural numbers. Minimal perfect hash functions are used, for example, to map irregularly-shaped keys, such as string, in a compact space so that metadata can then be simply stored in an array. While it is known that just $1.44$ bits per key are necessary to store a minimal perfect function, no published technique can go below $2$ bits per key in practice. We propose a new technique for storing minimal perfect hash functions with expected linear construction time and expected constant lookup time that makes it possible to build for the first time, for example, structures which need $1.56$ bits per key, that is, within $8.3$% of the lower bound, in less than $2$ ms per key. We show that instances of our construction are able to simultaneously beat the construction time, space usage and lookup time of the state-of-the-art data structure reaching $2$ bits per key. Moreover, we provide parameter choices giving structures which are competitive with alternative, larger-size data structures in terms of space and lookup time. The construction of our data structures can be easily parallelized or mapped on distributed computational units (e.g., within the MapReduce framework), and structures larger than the available RAM can be directly built in mass storage.

Data Structures and Algorithms

Cache-Aware Lock-Free Concurrent Hash Tries

67 - Aleksandar Prokopec , Phil Bagwell , Martin Odersky 2017

This report describes an implementation of a non-blocking concurrent shared-memory hash trie based on single-word compare-and-swap instructions. Insert, lookup and remove operations modifying different parts of the hash trie can be run independent of each other and do not contend. Remove operations ensure that the unneeded memory is freed and that the trie is kept compact. A pseudocode for these operations is presented and a proof of correctness is given -- we show that the implementation is linearizable and lock-free. Finally, benchmarks are presented which compare concurrent hash trie operations against the corresponding operations on other concurrent data structures, showing their performance and scalability.

Data Structures and Algorithms

Analysis of Concurrent Lock-Free Hash Tries with Constant-Time Operations

101 - Aleksandar Prokopec 2017

Ctrie is a scalable concurrent non-blocking dictionary data structure, with good cache locality, and non-blocking linearizable iterators. However, operations on most existing concurrent hash tries run in O(log n) time. In this technical report, we extend the standard concurrent hash-tries with an auxiliary data structure called a cache. The cache is essentially an array that stores pointers to a specific level of the hash trie. We analyze the performance implications of adding a cache, and prove that the running time of the basic operations becomes O(1).

Data Structures and Algorithms

Quantum collision finding for homomorphic hash functions

72 - Juan Carlos Garcia-Escartin , Vicent Gimeno , Julio Josen Moyano-Fernandez 2021

Hash functions are a basic cryptographic primitive. Certain hash functions try to prove security against collision and preimage attacks by reductions to known hard problems. These hash functions usually have some additional properties that allow for that reduction. Hash functions which are additive or multiplicative are vulnerable to a quantum attack using the hidden subgroup problem algorithm for quantum computers. Using a quantum oracle to the hash, we can reconstruct the kernel of the hash function, which is enough to find collisions and second preimages. When the hash functions are additive with respect to the group operation in an Abelian group, there is always an efficient implementation of this attack. We present concrete attack examples to provable hash functions, including a preimage attack to $oplus$-linear hash functions and for certain multiplicative homomorphic hash schemes.

Cryptography and Security Commutative Algebra Quantum Physics

Scalable and Robust Construction of Topical Hierarchies

400 - Chi Wang , Xueqing Liu , Yanglei Song 2014

Automated generation of high-quality topical hierarchies for a text collection is a dream problem in knowledge engineering with many valuable applications. In this paper a scalable and robust algorithm is proposed for constructing a hierarchy of topics from a text collection. We divide and conquer the problem using a top-down recursive framework, based on a tensor orthogonal decomposition technique. We solve a critical challenge to perform scalable inference for our newly designed hierarchical topic model. Experiments with various real-world datasets illustrate its ability to generate robust, high-quality hierarchies efficiently. Our method reduces the time of construction by several orders of magnitude, and its robust feature renders it possible for users to interactively revise the hierarchy.

Machine Learning Computation and Language Databases