
Strongly universal string hashing is fast


Added by Daniel Lemire

Publication date 2012

Fields Informatics Engineering

Research language English

Authors Owen Kaser - Daniel Lemire

Databases Data Structures and Algorithms


We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycles per byte. Perhaps surprisingly, we find that these families, though they require a large buffer of random numbers, are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom holds that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster because of operation pipelining. We present experimental results on several processors, including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families.
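The abstract does not spell out the construction, so the following is only a minimal sketch of one well-known way to get strong universality with a buffer of random numbers: a multilinear family over 32-bit characters, computed in 64-bit arithmetic with the top 32 bits kept. The names `make_keys` and `multilinear_hash` are hypothetical, and the guarantee is assumed to apply to strings of a fixed length hashed with independently drawn 64-bit keys.

```python
import secrets

MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound arithmetic


def make_keys(max_len):
    """Draw max_len + 1 random 64-bit keys (hypothetical helper).

    keys[0] is the additive constant; keys[i + 1] multiplies character i.
    This is the "large buffer of random numbers" the abstract refers to.
    """
    return [secrets.randbits(64) for _ in range(max_len + 1)]


def multilinear_hash(chars, keys):
    """Hash a sequence of 32-bit integers to a 32-bit value.

    Computes (keys[0] + sum(keys[i + 1] * chars[i])) mod 2**64
    and returns the 32 most significant bits.
    """
    if len(chars) + 1 > len(keys):
        raise ValueError("not enough random keys for this input length")
    acc = keys[0]
    for key, char in zip(keys[1:], chars):
        acc = (acc + key * (char & 0xFFFFFFFF)) & MASK64
    return acc >> 32
```

Each character costs one multiplication and one addition, and the iterations do not depend on each other apart from the final sum, which is the kind of structure that pipelining can exploit.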


Recursive n-gram hashing is pairwise independent, at best

Daniel Lemire, Owen Kaser (2016)

Many applications use sequences of n consecutive symbols (n-grams). Hashing these n-grams can be a performance bottleneck. For more speed, recursive hash families compute hash values by updating previous values. We prove that recursive hash families cannot be more than pairwise independent. While hashing by irreducible polynomials is pairwise independent, our implementations either run in time O(n) or use an exponential amount of memory. As a more scalable alternative, we make hashing by cyclic polynomials pairwise independent by ignoring n-1 bits. Experimentally, we show that hashing by cyclic polynomials is twice as fast as hashing by irreducible polynomials. We also show that randomized Karp-Rabin hash families are not pairwise independent.
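To illustrate what "updating previous values" means, here is a minimal sketch of recursive n-gram hashing by cyclic polynomials over bytes. It assumes 32-bit hash words and a random lookup table; the names `make_table`, `direct_hash`, and `rolling_hashes` are hypothetical. The sketch returns the full 32-bit word and does not perform the step of ignoring n-1 bits that the abstract describes.

```python
import random

L = 32                 # word size in bits (an assumption of this sketch)
MASK = (1 << L) - 1


def rotl(x, r):
    """Rotate the L-bit word x left by r positions."""
    r %= L
    return ((x << r) | (x >> (L - r))) & MASK


def make_table(seed=None):
    """Map each byte value to a random L-bit word."""
    rng = random.Random(seed)
    return [rng.getrandbits(L) for _ in range(256)]


def direct_hash(gram, table):
    """Hash one n-gram from scratch in O(n) time."""
    h = 0
    for byte in gram:
        h = rotl(h, 1) ^ table[byte]
    return h


def rolling_hashes(data, n, table):
    """Hash every n-gram of data, updating in O(1) per position."""
    if n <= 0 or len(data) < n:
        return []
    h = direct_hash(data[:n], table)
    hashes = [h]
    for i in range(n, len(data)):
        # Rotate the old hash, cancel the byte leaving the window,
        # and mix in the byte entering it.
        h = rotl(h, 1) ^ rotl(table[data[i - n]], n) ^ table[data[i]]
        hashes.append(h)
    return hashes
```

The update uses only rotations, XORs, and table lookups, so its cost per n-gram does not grow with n.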

Databases Computation and Language

FRESH: Frechet Similarity with Hashing

Matteo Ceccarello, Anne Driemel, Francesco Silvestri (2018)

This paper studies the $r$-range search problem for curves under the continuous Frechet distance: given a dataset $S$ of $n$ polygonal curves and a threshold $r>0$, construct a data structure that, for any query curve $q$, efficiently returns all entries in $S$ with distance at most $r$ from $q$. We propose FRESH, an approximate and randomized approach for $r$-range search that relies on a locality-sensitive hashing scheme for detecting candidate near neighbors of the query curve, and on a subsequent pruning step based on a cascade of curve simplifications. We experimentally compare FRESH to exact and deterministic solutions, and we show that high performance can be reached by suitably relaxing precision and recall.

Computational Geometry Data Structures and Algorithms

Fast Class-wise Updating for Online Hashing

Mingbao Lin, Rongrong Ji, Xiaoshuai Sun (2020)

Online image hashing, which processes large-scale data in a streaming fashion to update the hash functions on the fly, has received increasing research attention recently. Most existing works address this problem in a supervised setting, i.e., they use class labels to boost hashing performance. These approaches suffer from defects in both adaptivity and efficiency. First, large numbers of training batches are required to learn up-to-date hash functions, which leads to poor online adaptivity. Second, the training is time-consuming, which contradicts the core need of online learning. In this paper, a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH), is proposed to address these two challenges by introducing a novel and efficient inner product operation. To achieve fast online adaptivity, a class-wise updating method is developed to decompose the binary code learning and alternately renew the hash functions in a class-wise fashion, which relieves the burden of large numbers of training batches. Quantitatively, such a decomposition further leads to at least 75% storage saving. To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently. Without additional constraints and variables, the time complexity is significantly reduced. Such a scheme is also quantitatively shown to preserve past information well while the hash functions are updated. Extensive experiments on three widely used datasets verify that the combination of class-wise updating and semi-relaxation optimization provides superior performance compared to various state-of-the-art methods.

Computer Vision and Pattern Recognition Information Retrieval

Where is String Theory?

Andrea Guerrieri, Joao Penedones, Pedro Vieira (2021)

We use the S-matrix bootstrap to carve out the space of unitary, crossing symmetric and supersymmetric graviton scattering amplitudes in ten dimensions. We focus on the leading Wilson coefficient $\alpha$ controlling the leading correction to maximal supergravity. The negative region $\alpha<0$ is excluded by a simple dual argument based on linearized unitarity (the desert). A whole semi-infinite region $\alpha \gtrsim 0.14$ is allowed by the primal bootstrap (the garden). A finite intermediate region is excluded by non-perturbative unitarity (the swamp). Remarkably, string theory seems to cover all (or at least almost all) of the garden, from very large positive $\alpha$ (at weak coupling) to the swamp boundary (at strong coupling).

High Energy Physics - Theory

Faster 64-bit universal hashing using carry-less multiplications

Daniel Lemire, Owen Kaser (2015)

Intel and AMD support the Carry-less Multiplication (CLMUL) instruction set in their x64 processors. We use CLMUL to implement an almost universal 64-bit hash family (CLHASH). We compare this new family with what might be the fastest almost universal family on x64 processors (VHASH). We find that CLHASH is at least 60% faster. We also compare CLHASH with a popular hash function designed for speed (Google's CityHash). We find that CLHASH is 40% faster than CityHash on inputs larger than 64 bytes and just as fast otherwise.
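For readers unfamiliar with the operation, carry-less multiplication is ordinary binary long multiplication with the additions replaced by XOR, which is the same as multiplying polynomials with coefficients in GF(2). The following is a plain software sketch of the operation itself, not of CLHASH or of the hardware instruction; the function name `clmul` is hypothetical, and non-negative integer inputs are assumed.

```python
def clmul(a, b):
    """Carry-less product of two non-negative integers.

    Bit i of each operand is the coefficient of x**i in a polynomial
    over GF(2); partial products are combined with XOR, not addition.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result
```

For example, `clmul(3, 3)` is 5 rather than 9, because (x + 1)(x + 1) = x^2 + 1 when coefficients are reduced modulo 2.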

Data Structures and Algorithms Performance
