ترغب بنشر مسار تعليمي؟ اضغط هنا

Vectorized VByte Decoding

48   0   0.0 ( 0 )
 نشر من قبل Daniel Lemire
 تاريخ النشر 2015
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encountered. VByte decoding can be a performance bottleneck especially when the unpredictable lengths of the encoded integers cause frequent branch mispredictions. Previous attempts to accelerate VByte decoding using SIMD vector instructions have been disappointing, prodding search engines such as Google to use more complicated but faster-to-decode formats for performance-critical code. Our decoder (Masked VByte) is 2 to 4 times faster than a conventional scalar VByte decoder, making the format once again competitive with regard to speed.

قيم البحث

اقرأ أيضاً

Arrays of integers are often compressed in search engines. Though there are many ways to compress integers, we are interested in the popular byte-oriented integer compression techniques (e.g., VByte or Googles Varint-GB). They are appealing due to th eir simplicity and engineering convenience. Amazons varint-G8IU is one of the fastest byte-oriented compression technique published so far. It makes judicious use of the powerful single-instruction-multiple-data (SIMD) instructions available in commodity processors. To surpass varint-G8IU, we present Stream VByte, a novel byte-oriented compression technique that separates the control stream from the encoded data. Like varint-G8IU, Stream VByte is well suited for SIMD instructions. We show that Stream VByte decoding can be up to twice as fast as varint-G8IU decoding over real data sets. In this sense, Stream VByte establishes new speed records for byte-oriented integer compression, at times exceeding the speed of the memcpy function. On a 3.4GHz Haswell processor, it decodes more than 4 billion differentially-coded integers per second from RAM to L1 cache.
Subgraph counting aims to count occurrences of a template T in a given network G(V, E). It is a powerful graph analysis tool and has found real-world applications in diverse domains. Scaling subgraph counting problems is known to be memory bounded an d computationally challenging with exponential complexity. Although scalable parallel algorithms are known for several graph problems such as Triangle Counting and PageRank, this is not common for counting complex subgraphs. Here we address this challenge and study connected acyclic graphs or trees. We propose a novel vectorized subgraph counting algorithm, named Subgraph2Vec, as well as both shared memory and distributed implementations: 1) reducing algorithmic complexity by minimizing neighbor traversal; 2) achieving a highly-vectorized implementation upon linear algebra kernels to significantly improve performance and hardware utilization. 3) Subgraph2Vec improves the overall performance over the state-of-the-art work by orders of magnitude and up to 660x on a single node. 4) Subgraph2Vec in distributed mode can scale up the template size to 20 and maintain good strong scalability. 5) enabling portability to both CPU and GPU.
`Tree pruning (TP) is an algorithm for probabilistic inference on binary Markov random fields. It has been recently derived by Dror Weitz and used to construct the first fully polynomial approximation scheme for counting independent sets up to the `t ree uniqueness threshold. It can be regarded as a clever method for pruning the belief propagation computation tree, in such a way to exactly account for the effect of loops. In this paper we generalize the original algorithm to make it suitable for decoding linear codes, and discuss various schemes for pruning the computation tree. Further, we present the outcomes of numerical simulations on several linear codes, showing that tree pruning allows to interpolate continuously between belief propagation and maximum a posteriori decoding. Finally, we discuss theoretical implications of the new method.
We consider the problem of resolving $ r$ point sources from $n$ samples at the low end of the spectrum when point spread functions (PSFs) are not known. Assuming that the spectrum samples of the PSFs lie in low dimensional subspace (let $s$ denote t he dimension), this problem can be reformulated as a matrix recovery problem, followed by location estimation. By exploiting the low rank structure of the vectorized Hankel matrix associated with the target matrix, a convex approach called Vectorized Hankel Lift is proposed for the matrix recovery. It is shown that $ngtrsim rslog^4 n$ samples are sufficient for Vectorized Hankel Lift to achieve the exact recovery. For the location retrieval from the matrix, applying the single snapshot MUSIC method within the vectorized Hankel lift framework corresponds to the spatial smoothing technique proposed to improve the performance of the MMV MUSIC for the direction-of-arrival (DOA) estimation.
We introduce a class of sets of words which is a natural common generalization of Sturmian sets and of interval exchange sets. This class of sets consists of the uniformly recurrent tree sets, where the tree sets are defined by a condition on the pos sible extensions of bispecial factors. We prove that this class is closed under maximal bifix decoding. The proof uses the fact that the class is also closed under decoding with respect to return words.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا