In stream processing, stream join is one of the critical sources of performance bottlenecks. Sliding-window-based stream join provides precise results but consumes considerable computational resources. Current solutions lack support for join predicates on large windows: the existing algorithms and their hardware accelerators are either limited to equi-join or use a nested-loop join to process all requests. In this paper, we present a new algorithm called PanJoin, which achieves high throughput on large windows and supports both equi-join and non-equi-join. PanJoin implements three new data structures to reduce computation during the probing phase of stream join. We also implement the most hardware-friendly of these data structures, called BI-Sort, on an FPGA. Our evaluation shows that PanJoin outperforms several recently proposed stream join methods by more than 1000x, and that it also adapts well to highly skewed data.
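To make the probing phase concrete, the following is a minimal Python sketch of a count-based sliding-window join with a band (non-equi) predicate. It is not PanJoin and does not reproduce any of its three data structures; it only illustrates the general idea of replacing a nested-loop scan of the opposite window with a range probe on a sorted structure. The names `SortedWindow`, `probe_band`, `band_join`, and the parameters `size` and `eps` are hypothetical and chosen for illustration.

```python
from bisect import bisect_left, bisect_right, insort
from collections import deque


class SortedWindow:
    """Count-based sliding window that keeps its values sorted for range probes."""

    def __init__(self, size):
        self.size = size
        self.arrival = deque()  # values in arrival order, used for expiry
        self.sorted_vals = []   # the same values, kept in sorted order

    def insert(self, value):
        # Evict the oldest value once the window is full.
        if len(self.arrival) == self.size:
            oldest = self.arrival.popleft()
            del self.sorted_vals[bisect_left(self.sorted_vals, oldest)]
        self.arrival.append(value)
        insort(self.sorted_vals, value)

    def probe_band(self, low, high):
        """Return all stored values v with low <= v <= high."""
        lo = bisect_left(self.sorted_vals, low)
        hi = bisect_right(self.sorted_vals, high)
        return self.sorted_vals[lo:hi]


def band_join(stream, size, eps):
    """Join two interleaved streams on |r - s| <= eps.

    `stream` yields (side, value) pairs with side in {"R", "S"}.
    Setting eps = 0 reduces the predicate to an equi-join.
    """
    windows = {"R": SortedWindow(size), "S": SortedWindow(size)}
    results = []
    for side, value in stream:
        other = "S" if side == "R" else "R"
        # Probe the opposite window first, then store the new tuple.
        for match in windows[other].probe_band(value - eps, value + eps):
            results.append((value, match) if side == "R" else (match, value))
        windows[side].insert(value)
    return results
```

In this sketch each probe costs a binary search plus the size of the matching range, whereas a nested-loop join would compare the new tuple against every entry in the opposite window. Insertion into a Python list is still linear in the window size, so this is only an illustration of the probing idea, not a high-throughput design.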
Resource Description Framework (RDF) has been widely used to represent information on the web, while SPARQL is a standard query language to manipulate RDF data. Given a SPARQL query, there often exist many joins which are the bottlenecks of efficienc
In this paper, we propose a plugin-based framework for RDF stream processing named PRSP. Within this framework, we can employ SPARQL query engines to process C-SPARQL queries in a simple way while maintaining the high performance of those engines. Tak
XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently offer limited performance, and it is necessary to research ways to optimiz
We introduce and study the problem of computing the similarity self-join in a streaming context (SSSJ), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose similarity is gr
Given two collections of set objects $R$ and $S$, the $R \bowtie_{\subseteq} S$ set containment join returns all object pairs $(r, s) \in R \times S$ such that $r \subseteq s$. Besides being a basic operator in all modern data management systems with a wi
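
The set containment join defined above can be illustrated with a naive nested-loop sketch in Python. This is only a direct transcription of the definition, not the method of the cited work; the function name `set_containment_join` and the choice to return index pairs are assumptions made for illustration.

```python
def set_containment_join(R, S):
    """Return index pairs (i, j) such that R[i] is a subset of S[j].

    Naive nested-loop evaluation of the definition; quadratic in the
    number of objects.
    """
    r_sets = [frozenset(r) for r in R]
    s_sets = [frozenset(s) for s in S]
    return [
        (i, j)
        for i, r in enumerate(r_sets)
        for j, s in enumerate(s_sets)
        if r <= s  # frozenset's <= operator tests the subset relation
    ]
```

For example, with `R = [{1}, {1, 2}, {3}]` and `S = [{1, 2}, {2, 3}]`, the first two objects of `R` are contained in the first object of `S`, and the third object of `R` is contained in the second object of `S`.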