بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: A Survey

99 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Waqas Ali Mr

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Waqas Ali - Muhammad Saleem - Bin Yao

قواعد البيانات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The recent advancements of the Semantic Web and Linked Data have changed the working of the traditional web. There is significant adoption of the Resource Description Framework (RDF) format for saving of web-based data. This massive adoption has paved the way for the development of various centralized and distributed RDF processing engines. These engines employ various mechanisms to implement critical components of the query processing engines such as data storage, indexing, language support, and query execution. All these components govern how queries are executed and can have a substantial effect on the query runtime. For example, the storage of RDF data in various ways significantly affects the data storage space required and the query runtime performance. The type of indexing approach used in RDF engines is critical for fast data lookup. The type of the underlying querying language (e.g., SPARQL or SQL) used for query execution is a crucial optimization component of the RDF storage solutions. Finally, query execution involving different join orders significantly affects the query response time. This paper provides a comprehensive review of centralized and distributed RDF engines in terms of storage, indexing, language support, and query execution.

قيم البحث

168 - Waqas Ali , Muhammad Saleem , Bin Yao 2021

The RDF graph-based data model has seen ever-broadening adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey pape r provides a comprehensive review of techniques, engines and benchmarks for querying RDF knowledge graphs. While other reviews on this topic tend to focus on the distributed setting, the main focus of the work is on providing a comprehensive survey of state-of-the-art storage, indexing and query processing techniques for efficiently evaluating SPARQL queries in a local setting (on one machine). To keep the survey self-contained, we also provide a short discussion on graph partitioning techniques used in the distributed setting. We conclude by discussing contemporary research challenges for further improving SPARQL query engines. An online extended version also provides a survey of over one hundred SPARQL query engines and the techniques they use, along with twelve benchmarks and their features.

قواعد البيانات

An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

97 - Umair Qudus , Muhammad Saleem , Axel-Cyrille Ngonga Ngomo 2021

Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation engines a cross different performance metrics, including query runtime, result set completeness and correctness, number of sources selected and number of requests sent. Albeit informative, these metrics are generic and unable to quantify and evaluate the accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the effect of estimated cardinality errors on the overall query runtime performance must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics by using LargeRDFBench queries. Our results provide a detailed analysis of the experimental outcomes that reveal novel insights, useful for the development of future cost-based federated SPARQL query processing engines.

قواعد البيانات التعلم الآلي الأداء

PRSP: A Plugin-based Framework for RDF Stream Processing

68 - Qiong Li , Xiaowang Zhang , Zhiyong Feng 2017

In this paper, we propose a plugin-based framework for RDF stream processing named PRSP. Within this framework, we can employ SPARQL query engines to process C-SPARQL queries with maintaining the high performance of those engines in a simple way. Tak ing advantage of PRSP, we can process large-scale RDF streams in a distributed context via distributed SPARQL engines. Besides, we can evaluate the performance and correctness of existing SPARQL query engines in handling RDF streams in a united way, which amends the evaluation of them ranging from static RDF (i.e., RDF graph) to dynamic RDF (i.e., RDF stream). Finally, within PRSP, we experimently evaluate the correctness and the performance on YABench. The experiments show that PRSP can still maintain the high performance of those engines in RDF stream processing although there are some slight differences among them.

قواعد البيانات

Q-Graph: Preserving Query Locality in Multi-Query Graph Processing

61 - Christian Mayer , Ruben Mayer , Jonas Grunert 2018

Arising user-centric graph applications such as route planning and personalized social network analysis have initiated a shift of paradigms in modern graph processing systems towards multi-query analysis, i.e., processing multiple graph queries in pa rallel on a shared graph. These applications generate a dynamic number of localized queries around query hotspots such as popular urban areas. However, existing graph processing systems are not yet tailored towards these properties: The employed methods for graph partitioning and synchronization management disregard query locality and dynamism which leads to high query latency. To this end, we propose the system Q-Graph for multi-query graph analysis that considers query locality on three levels. (i) The query-aware graph partitioning algorithm Q-cut maximizes query locality to reduce communication overhead. (ii) The method for synchronization management, called hybrid barrier synchronization, allows for full exploitation of local queries spanning only a subset of partitions. (iii) Both methods adapt at runtime to changing query workloads in order to maintain and exploit locality. Our experiments show that Q-cut reduces average query latency by up to 57 percent compared to static query-agnostic partitioning algorithms.

قواعد البيانات النظم الموزعة والتوازية والحوسبة العنقودية

UStore: A Distributed Storage With Rich Semantics

400 - Anh Dinh , Ji Wang , Sheng Wang 2017

Todays storage systems expose abstractions which are either too low-level (e.g., key-value store, raw-block store) that they require developers to re-invent the wheels, or too high-level (e.g., relational databases, Git) that they lack generality to support many classes of applications. In this work, we propose and implement a general distributed data storage system, called UStore, which has rich semantics. UStore delivers three key properties, namely immutability, sharing and security, which unify and add values to many classes of todays applications, and which also open the door for new applications. By keeping the core properties within the storage, UStore helps reduce application development efforts while offering high performance at hand. The storage embraces current hardware trends as key enablers. It is built around a data-structure similar to that of Git, a popular source code versioning system, but it also synthesizes many designs from distributed systems and databases. Our current implementation of UStore has better performance than general in-memory key-value storage systems, especially for version scan operations. We port and evaluate four applications on top of UStore: a Git-like application, a collaborative data science application, a transaction management application, and a blockchain application. We demonstrate that UStore enables faster development and the UStore-backed applications can have better performance than the existing implementations.

قواعد البيانات النظم الموزعة والتوازية والحوسبة العنقودية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد الوطني للإدارة العامة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: A Survey

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً