Do you want to publish a course? Click here

Strongly universal string hashing is fast

238   0   0.0 ( 0 )
 Added by Daniel Lemire
 Publication date 2012
and research's language is English




Ask ChatGPT about the research

We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycle per byte. Maybe surprisingly, we find that these families---though they require a large buffer of random numbers---are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom is that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster due to operation pipelining. We present experimental results on several processors including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families.



rate research

Read More

195 - Daniel Lemire , Owen Kaser 2016
Many applications use sequences of n consecutive symbols (n-grams). Hashing these n-grams can be a performance bottleneck. For more speed, recursive hash families compute hash values by updating previous values. We prove that recursive hash families cannot be more than pairwise independent. While hashing by irreducible polynomials is pairwise independent, our implementations either run in time O(n) or use an exponential amount of memory. As a more scalable alternative, we make hashing by cyclic polynomials pairwise independent by ignoring n-1 bits. Experimentally, we show that hashing by cyclic polynomials is is twice as fast as hashing by irreducible polynomials. We also show that randomized Karp-Rabin hash families are not pairwise independent.
This paper studies the $r$-range search problem for curves under the continuous Frechet distance: given a dataset $S$ of $n$ polygonal curves and a threshold $r>0$, construct a data structure that, for any query curve $q$, efficiently returns all entries in $S$ with distance at most $r$ from $q$. We propose FRESH, an approximate and randomized approach for $r$-range search, that leverages on a locality sensitive hashing scheme for detecting candidate near neighbors of the query curve, and on a subsequent pruning step based on a cascade of curve simplifications. We experimentally compare fresh to exact and deterministic solutions, and we show that high performance can be reached by suitably relaxing precision and recall.
Online image hashing has received increasing research attention recently, which processes large-scale data in a streaming fashion to update the hash functions on-the-fly. To this end, most existing works exploit this problem under a supervised setting, i.e., using class labels to boost the hashing performance, which suffers from the defects in both adaptivity and efficiency: First, large amounts of training batches are required to learn up-to-date hash functions, which leads to poor online adaptivity. Second, the training is time-consuming, which contradicts with the core need of online learning. In this paper, a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH), is proposed to address the above two challenges by introducing a novel and efficient inner product operation. To achieve fast online adaptivity, a class-wise updating method is developed to decompose the binary code learning and alternatively renew the hash functions in a class-wise fashion, which well addresses the burden on large amounts of training batches. Quantitatively, such a decomposition further leads to at least 75% storage saving. To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently. Without additional constraints and variables, the time complexity is significantly reduced. Such a scheme is also quantitatively shown to well preserve past information during updating hashing functions. We have quantitatively demonstrated that the collective effort of class-wise updating and semi-relaxation optimization provides a superior performance comparing to various state-of-the-art methods, which is verified through extensive experiments on three widely-used datasets.
We use the S-matrix bootstrap to carve out the space of unitary, crossing symmetric and supersymmetric graviton scattering amplitudes in ten dimensions. We focus on the leading Wilson coefficient $alpha$ controlling the leading correction to maximal supergravity. The negative region $alpha<0$ is excluded by a simple dual argument based on linearized unitarity (the desert). A whole semi-infinite region $alpha gtrsim 0.14$ is allowed by the primal bootstrap (the garden). A finite intermediate region is excluded by non-perturbative unitarity (the swamp). Remarkably, string theory seems to cover all (or at least almost all) the garden from very large positive $alpha$ -- at weak coupling -- to the swamp boundary -- at strong coupling.
280 - Daniel Lemire , Owen Kaser 2015
Intel and AMD support the Carry-less Multiplication (CLMUL) instruction set in their x64 processors. We use CLMUL to implement an almost universal 64-bit hash family (CLHASH). We compare this new family with what might be the fastest almost universal family on x64 processors (VHASH). We find that CLHASH is at least 60% faster. We also compare CLHASH with a popular hash function designed for speed (Googles CityHash). We find that CLHASH is 40% faster than CityHash on inputs larger than 64 bytes and just as fast otherwise.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا