SIMD Compression and the Intersection of Sorted Integers

Published by: Daniel Lemire

Publication date: 2014

Research field: Informatics Engineering

Paper language: English

Authors: Daniel Lemire, Leonid Boytsov, Nathan Kurz

Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our S4-BP128-D4 scheme uses as little as 0.7 CPU cycles per decoded integer while still providing state-of-the-art compression. However, if the subsequent processing of the integers is slow, the effort spent on optimizing decoding speed can be wasted. To show that it does not have to be so, we (1) vectorize and optimize the intersection of posting lists; (2) introduce the SIMD Galloping algorithm. We exploit the fact that one SIMD instruction can compare 4 pairs of integers at once. We experiment with two TREC text collections, GOV2 and ClueWeb09 (Category B), using logs from the TREC million-query track. We show that using only the SIMD instructions ubiquitous in all modern CPUs, our techniques for conjunctive queries can double the speed of a state-of-the-art approach.
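The galloping idea mentioned in the abstract can be illustrated with a minimal scalar sketch. This is not the paper's SIMD Galloping algorithm: it uses no SIMD instructions, and the function name `gallop_intersect` is an assumption made for this example. It only shows the underlying search pattern, where each element of the shorter list is located in the longer list by doubling a step size and then binary-searching the bracketed range.

```python
from bisect import bisect_left


def gallop_intersect(small, large):
    """Intersect two sorted lists of distinct integers.

    Scalar illustration only (hypothetical helper, not the paper's code).
    `small` is expected to be the shorter list.
    """
    out = []
    lo = 0  # every element of large before index lo is smaller than the current x
    n = len(large)
    for x in small:
        if lo >= n:
            break
        # Galloping phase: double the step until large[lo + step] >= x
        # or the probe runs past the end of the list.
        step = 1
        while lo + step < n and large[lo + step] < x:
            step *= 2
        # Binary search inside the bracketed range [lo, lo + step].
        hi = min(lo + step + 1, n)
        lo = bisect_left(large, x, lo, hi)
        if lo < n and large[lo] == x:
            out.append(x)
    return out
```

Because the search always resumes from the last position found, the cost per element of the short list grows with the logarithm of the gap to the next match rather than with the length of the long list, which is why galloping suits lists of very different sizes.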

Daniel Lemire, Nathan Kurz, Christoph Rupp, 2017

Arrays of integers are often compressed in search engines. Though there are many ways to compress integers, we are interested in the popular byte-oriented integer compression techniques (e.g., VByte or Google's Varint-GB). They are appealing due to their simplicity and engineering convenience. Amazon's varint-G8IU is one of the fastest byte-oriented compression techniques published so far. It makes judicious use of the powerful single-instruction-multiple-data (SIMD) instructions available in commodity processors. To surpass varint-G8IU, we present Stream VByte, a novel byte-oriented compression technique that separates the control stream from the encoded data. Like varint-G8IU, Stream VByte is well suited for SIMD instructions. We show that Stream VByte decoding can be up to twice as fast as varint-G8IU decoding over real data sets. In this sense, Stream VByte establishes new speed records for byte-oriented integer compression, at times exceeding the speed of the memcpy function. On a 3.4 GHz Haswell processor, it decodes more than 4 billion differentially-coded integers per second from RAM to L1 cache.
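The separation of a control stream from the encoded data can be sketched in scalar form. The layout below is an assumption for illustration: each 32-bit integer gets a 2-bit length code (byte count minus one), four codes are packed per control byte, and the value bytes are stored little-endian in a separate data stream. The names `encode` and `decode` are hypothetical, and no SIMD is used, so this shows only the format idea, not the speed.

```python
def encode(values):
    """Split 32-bit unsigned integers into a control stream and a data stream."""
    control = bytearray()
    data = bytearray()
    for i in range(0, len(values), 4):
        key = 0
        for j, v in enumerate(values[i:i + 4]):
            if not 0 <= v < 1 << 32:
                raise ValueError("value does not fit in 32 bits")
            # Number of bytes needed for v (at least one, even for zero).
            nbytes = max(1, (v.bit_length() + 7) // 8)
            key |= (nbytes - 1) << (2 * j)  # 2-bit length code for slot j
            data += v.to_bytes(nbytes, "little")
        control.append(key)
    return bytes(control), bytes(data)


def decode(control, data, count):
    """Rebuild `count` integers from the two streams produced by encode()."""
    out = []
    pos = 0
    for i in range(count):
        key = control[i // 4]
        nbytes = ((key >> (2 * (i % 4))) & 3) + 1
        out.append(int.from_bytes(data[pos:pos + nbytes], "little"))
        pos += nbytes
    return out
```

Keeping all length codes together means a decoder can read one control byte and know the sizes of the next four values before touching the data bytes, which is the property that makes table-driven SIMD shuffles practical.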

Information Retrieval, Databases

PageRank of integers

K. M. Frahm, A. D. Chepelianskii, D. L. Shepelyansky, 2012

We build up a directed network tracing links from a given integer to its divisors and analyze the properties of the Google matrix of this network. The PageRank vector of this matrix is computed numerically and it is shown that its probability is inversely proportional to the PageRank index, thus being similar to the Zipf law and the dependence established for the World Wide Web. The spectrum of the Google matrix of integers is characterized by a large gap and a relatively small number of nonzero eigenvalues. A simple semi-analytical expression for the PageRank of integers is derived that allows one to find this vector for matrices of billion size. This network provides a new PageRank order of integers.
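A small power-iteration sketch can make the construction concrete. Several details here are assumptions, since the abstract does not state them: each integer n links to every divisor d of n with d < n (including 1), links are unweighted, the damping factor is 0.85, and the only dangling node (the integer 1) spreads its rank uniformly. The function name `divisor_pagerank` is hypothetical.

```python
def divisor_pagerank(n_max, damping=0.85, iterations=100):
    """PageRank on the divisor network of 1..n_max (illustrative assumptions)."""
    nodes = range(1, n_max + 1)
    # Assumed link rule: n -> d for every divisor d of n with d < n.
    links = {n: [d for d in range(1, n) if n % d == 0] for n in nodes}
    rank = {n: 1.0 / n_max for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared by all nodes.
        dangling = sum(rank[n] for n in nodes if not links[n])
        base = (1.0 - damping) / n_max + damping * dangling / n_max
        new_rank = {n: base for n in nodes}
        for n in nodes:
            for d in links[n]:
                new_rank[d] += damping * rank[n] / len(links[n])
        rank = new_rank
    return rank
```

Sorting the integers by decreasing rank then gives an ordering of the kind the abstract calls the PageRank order, under the link rule assumed here.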

Information Retrieval, Statistical Mechanics, Number Theory

Generalizations of intersection homology and perverse sheaves with duality over the integers

Greg Friedman, 2016

We provide a generalization of the Deligne sheaf construction of intersection homology theory, and a corresponding generalization of Poincaré duality on pseudomanifolds, such that the Goresky-MacPherson, Goresky-Siegel, and Cappell-Shaneson duality theorems all arise as special cases. Unlike classical intersection homology theory, our duality theorem holds with ground coefficients in an arbitrary PID and with no local cohomology conditions on the underlying space. Self-duality does require local conditions, but our perspective leads to a new class of spaces more general than the Goresky-Siegel IP spaces on which upper-middle perversity intersection homology is self-dual. We also examine categories of perverse sheaves that contain our torsion-sensitive Deligne sheaves as intermediate extensions.

Geometric Topology

Generating Concise and Readable Summaries of XML Documents

Maya Ramanath, Kondreddi Sarath Kumar, Georgiana Ifrim, 2009

XML has become the de facto standard for data representation and exchange, resulting in large-scale repositories and warehouses of XML data. In order for users to understand and explore these large collections, a summarized, bird's-eye view of the available data is a necessity. In this paper, we are interested in semantic XML document summaries which present the important information available in an XML document to the user. In the best case, such a summary is a concise replacement for the original document itself. At the other extreme, it should at least help the user make an informed choice as to the relevance of the document to his needs. In this paper, we address the two main issues which arise in producing such meaningful and concise summaries: i) which tags or text units are important and should be included in the summary, ii) how to generate summaries of different sizes. We conduct user studies with different real-life datasets and show that our methods are useful and effective in practice.

Information Retrieval, Databases

Text Categorization using Association Rule and Naive Bayes Classifier

S M Kamruzzaman, Chowdhury Mofizur Rahman, 2010

As the amount of online text increases, the demand for text categorization to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic categorization of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. Text categorization using Association Rule and Naive Bayes Classifier is proposed here. Instead of using individual words, word relations, i.e., association rules derived from these words, are used to derive the feature set from pre-classified text documents. A Naive Bayes Classifier is then applied to the derived features for final categorization.

Information Retrieval, Databases
