The Graph Traversal Pattern

285 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Marko A. Rodriguez

تاريخ النشر 2010

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Marko A. Rodriguez - Peter Neubauer

بنى وهياكل البيانات والخوارزميات قواعد البيانات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

A graph is a structure composed of a set of vertices (i.e.nodes, dots) connected to one another by a set of edges (i.e.links, lines). The concept of a graph has been around since the late 19$^text{th}$ century, however, only in recent decades has there been a strong resurgence in both theoretical and applied graph research in mathematics, physics, and computer science. In applied computing, since the late 1960s, the interlinked table structure of the relational database has been the predominant information storage and retrieval model. With the growth of graph/network-based data and the need to efficiently process such data, new data management systems have been developed. In contrast to the index-intensive, set-theoretic operations of relational databases, graph databases make use of index-free, local traversals. This article discusses the graph traversal pattern and its use in computing.

قيم البحث

308 - Marko A. Rodriguez 2015

Gremlin is a graph traversal machine and language designed, developed, and distributed by the Apache TinkerPop project. Gremlin, as a graph traversal machine, is composed of three interacting components: a graph $G$, a traversal $Psi$, and a set of t raversers $T$. The traversers move about the graph according to the instructions specified in the traversal, where the result of the computation is the ultimate locations of all halted traversers. A Gremlin machine can be executed over any supporting graph computing system such as an OLTP graph database and/or an OLAP graph processor. Gremlin, as a graph traversal language, is a functional language implemented in the users native programming language and is used to define the $Psi$ of a Gremlin machine. This article provides a mathematical description of Gremlin and details its automaton and functional properties. These properties enable Gremlin to naturally support imperative and declarative querying, host language agnosticism, user-defined domain specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, hybrid depth- and breadth-first evaluation, as well as the existence of a Universal Gremlin Machine and its respective entailments.

قواعد البيانات الرياضيات المتقطعة

RULF: Rust Library Fuzzing via API Dependency Graph Traversal

131 - Jianfeng Jiang , Hui Xu , Yangfan Zhou 2021

Robustness is a key concern for Rust library development because Rust promises no risks of undefined behaviors if developers use safe APIs only. Fuzzing is a practical approach for examining the robustness of programs. However, existing fuzzing tools are not directly applicable to library APIs due to the absence of fuzz targets. It mainly relies on human efforts to design fuzz targets case by case which is labor-intensive. To address this problem, this paper proposes a novel automated fuzz target generation approach for fuzzing Rust libraries via API dependency graph traversal. We identify several essential requirements for library fuzzing, including validity and effectiveness of fuzz targets, high API coverage, and efficiency. To meet these requirements, we first employ breadth-first search with pruning to find API sequences under a length threshold, then we backward search longer sequences for uncovered APIs, and finally we optimize the sequence set as a set covering problem. We implement our fuzz target generator and conduct fuzzing experiments with AFL++ on several real-world popular Rust projects. Our tool finally generates 7 to 118 fuzz targets for each library with API coverage up to 0.92. We exercise each target with a threshold of 24 hours and finally find 30 previously-unknown bugs from seven libraries.

هندسة البرمجيات

Parameterized Complexity of the k-anonymity Problem

253 - Stefano Beretta , Paola Bonizzoni , Gianluca Della Vedova 2009

The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization that has been recently proposed is the $k$-anonymity. This approach requires that the rows of a table are partitioned i n clusters of size at least $k$ and that all the rows in a cluster become the same tuple, after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be APX-hard even when the records values are over a binary alphabet and $k=3$, and when the records have length at most 8 and $k=4$ . In this paper we study how the complexity of the problem is influenced by different parameters. In this paper we follow this direction of research, first showing that the problem is W[1]-hard when parameterized by the size of the solution (and the value $k$). Then we exhibit a fixed parameter algorithm, when the problem is parameterized by the size of the alphabet and the number of columns. Finally, we investigate the computational (and approximation) complexity of the $k$-anonymity problem, when restricting the instance to records having length bounded by 3 and $k=3$. We show that such a restriction is APX-hard.

بنى وهياكل البيانات والخوارزميات قواعد البيانات الرياضيات المتقطعة

Validating XML Documents in the Streaming Model with External Memory

122 - Christian Konrad , Frederic Magniez 2010

We study the problem of validating XML documents of size $N$ against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound. There are XML documents and DTDs for which $p$-pass streaming algorithms require $Omega(N/p)$ space. We show that when allowing access to external memory, there is a deterministic streaming algorithm that solves this problem with memory space $O(log^2 N)$, a constant number of auxiliary read/write streams, and $O(log N)$ total number of passes on the XML document and auxiliary streams. An important intermediate step of this algorithm is the computation of the First-Child-Next-Sibling (FCNS) encoding of the initial XML document in a streaming fashion. We study this problem independently, and we also provide memory efficient streaming algorithms for decoding an XML document given in its FCNS encoding. Furthermore, validating XML documents encoding binary trees in the usual streaming model without external memory can be done with sublinear memory. There is a one-pass algorithm using $O(sqrt{N log N})$ space, and a bidirectional two-pass algorithm using $O(log^2 N)$ space performing this task.

بنى وهياكل البيانات والخوارزميات قواعد البيانات

The PGM-index: a multicriteria, compressed and learned approach to data indexing

68 - Paolo Ferragina , Giorgio Vinciguerra 2019

The recent introduction of learned indexes has shaken the foundations of the decades-old field of indexing data structures. Combining, or even replacing, classic design elements such as B-tree nodes with machine learning models has proven to give out standing improvements in the space footprint and time efficiency of data systems. However, these novel approaches are based on heuristics, thus they lack any guarantees both in their time and space requirements. We propose the Piecewise Geometric Model index (shortly, PGM-index), which achieves guaranteed I/O-optimality in query operations, learns an optimal number of linear models, and its peculiar recursive construction makes it a purely learned data structure, rather than a hybrid of traditional and learned indexes (such as RMI and FITing-tree). We show that the PGM-index improves the space of the FITing-tree by 63.3% and of the B-tree by more than four orders of magnitude, while achieving their same or even better query time efficiency. We complement this result by proposing three variants of the PGM-index. First, we design a compressed PGM-index that further reduces its space footprint by exploiting the repetitiveness at the level of the learned linear models it is composed of. Second, we design a PGM-index that adapts itself to the distribution of the queries, thus resulting in the first known distribution-aware learned index to date. Finally, given its flexibility in the offered space-time trade-offs, we propose the multicriteria PGM-index that efficiently auto-tune itself in a few seconds over hundreds of millions of keys to the possibly evolving space-time constraints imposed by the application of use. We remark to the reader that this paper is an extended and improved version of our previous paper titled Superseding traditional indexes by orchestrating learning and geometry (arXiv:1903.00507).

بنى وهياكل البيانات والخوارزميات قواعد البيانات استرجاع المعلومات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة حلوان

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

The Graph Traversal Pattern

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً