PRSP: A Plugin-based Framework for RDF Stream Processing

69 0 0.0 ( 0 )

Download Cite

Added by Xiaowang Zhang

Publication date 2017

fields Informatics Engineering

and research's language is English

Authors Qiong Li - Xiaowang Zhang - Zhiyong Feng

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we propose a plugin-based framework for RDF stream processing named PRSP. Within this framework, we can employ SPARQL query engines to process C-SPARQL queries with maintaining the high performance of those engines in a simple way. Taking advantage of PRSP, we can process large-scale RDF streams in a distributed context via distributed SPARQL engines. Besides, we can evaluate the performance and correctness of existing SPARQL query engines in handling RDF streams in a united way, which amends the evaluation of them ranging from static RDF (i.e., RDF graph) to dynamic RDF (i.e., RDF stream). Finally, within PRSP, we experimently evaluate the correctness and the performance on YABench. The experiments show that PRSP can still maintain the high performance of those engines in RDF stream processing although there are some slight differences among them.

rate research

PIWD: A Plugin-based Framework for Well-Designed SPARQL

265 - Xiaowang Zhang , Zhenyu Song , Zhiyong Feng 2016

In the real world datasets (e.g.,DBpedia query log), queries built on well-designed patterns containing only AND and OPT operators (for short, WDAO-patterns) account for a large proportion among all SPARQL queries. In this paper, we present a plugin-based framework for all SELECT queries built on WDAO-patterns, named PIWD. The framework is based on a parse tree called emph{well-designed AND-OPT tree} (for short, WDAO-tree) whose leaves are basic graph patterns (BGP) and inner nodes are the OPT operators. We prove that for any WDAO-pattern, its parse tree can be equivalently transformed into a WDAO-tree. Based on the proposed framework, we can employ any query engine to evaluate BGP for evaluating queries built on WDAO-patterns in a convenient way. Theoretically, we can reduce the query evaluation of WDAO-patterns to subgraph homomorphism as well as BGP since the query evaluation of BGP is equivalent to subgraph homomorphism. Finally, our preliminary experiments on gStore and RDF-3X show that PIWD can answer all queries built on WDAO-patterns effectively and efficiently.

Databases

Tempura: A General Cost Based Optimizer Framework for Incremental Data Processing (Extended Version)

70 - Zuozhi Wang , Kai Zeng , Botong Huang 2020

Incremental processing is widely-adopted in many applications, ranging from incremental view maintenance, stream computing, to recently emerging progressive data warehouse and intermittent query processing. Despite many algorithms developed on this topic, none of them can produce an incremental plan that always achieves the best performance, since the optimal plan is data dependent. In this paper, we develop a novel cost-based optimizer framework, called Tempura, for optimizing incremental data processing. We propose an incremental query planning model called TIP based on the concept of time-varying relations, which can formally model incremental processing in its most general form. We give a full specification of Tempura, which can not only unify various existing techniques to generate an optimal incremental plan, but also allow the developer to add their rewrite rules. We study how to explore the plan space and search for an optimal incremental plan. We conduct a thorough experimental evaluation of Tempura in various incremental processing scenarios to show its effectiveness and efficiency.

Databases

An analytical framework for data stream mining techniques based on challenges and requirements

384 - Mahnoosh Kholghi (Department of Electronic , Computer , IT 2011

A growing number of applications that generate massive streams of data need intelligent data processing and online analysis. Real-time surveillance systems, telecommunication systems, sensor networks and other dynamic environments are such examples. The imminent need for turning such data into useful information and knowledge augments the development of systems, algorithms and frameworks that address streaming challenges. The storage, querying and mining of such data sets are highly computationally challenging tasks. Mining data streams is concerned with extracting knowledge structures represented in models and patterns in non stopping streams of information. Generally, two main challenges are designing fast mining methods for data streams and need to promptly detect changing concepts and data distribution because of highly dynamic nature of data streams. The goal of this article is to analyze and classify the application of diverse data mining techniques in different challenges of data stream mining. In this paper, we present the theoretical foundations of data stream analysis and propose an analytical framework for data stream mining techniques.

Databases

Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: A Survey

98 - Waqas Ali , Muhammad Saleem , Bin Yao 2020

The recent advancements of the Semantic Web and Linked Data have changed the working of the traditional web. There is significant adoption of the Resource Description Framework (RDF) format for saving of web-based data. This massive adoption has paved the way for the development of various centralized and distributed RDF processing engines. These engines employ various mechanisms to implement critical components of the query processing engines such as data storage, indexing, language support, and query execution. All these components govern how queries are executed and can have a substantial effect on the query runtime. For example, the storage of RDF data in various ways significantly affects the data storage space required and the query runtime performance. The type of indexing approach used in RDF engines is critical for fast data lookup. The type of the underlying querying language (e.g., SPARQL or SQL) used for query execution is a crucial optimization component of the RDF storage solutions. Finally, query execution involving different join orders significantly affects the query response time. This paper provides a comprehensive review of centralized and distributed RDF engines in terms of storage, indexing, language support, and query execution.

Databases

PanJoin: A Partition-based Adaptive Stream Join

65 - Fei Pan , Hans-Arno Jacobsen 2018

In stream processing, stream join is one of the critical sources of performance bottlenecks. The sliding-window-based stream join provides a precise result but consumes considerable computational resources. The current solutions lack support for the join predicates on large windows. These algorithms and their hardware accelerators are either limited to equi-join or use a nested loop join to process all the requests. In this paper, we present a new algorithm called PanJoin which has high throughput on large windows and supports both equi-join and non-equi-join. PanJoin implements three new data structures to reduce computations during the probing phase of stream join. We also implement the most hardware-friendly data structure, called BI-Sort, on FPGA. Our evaluation shows that PanJoin outperforms several recently proposed stream join methods by more than 1000x, and it also adapts well to highly skewed data.

Databases