
Tempura: A General Cost Based Optimizer Framework for Incremental Data Processing (Extended Version)

Posted by Zuozhi Wang
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Incremental processing is widely adopted in many applications, ranging from incremental view maintenance and stream computing to the recently emerging progressive data warehouses and intermittent query processing. Despite the many algorithms developed on this topic, none of them can produce an incremental plan that always achieves the best performance, since the optimal plan is data dependent. In this paper, we develop a novel cost-based optimizer framework, called Tempura, for optimizing incremental data processing. We propose an incremental query planning model called TIP based on the concept of time-varying relations, which can formally model incremental processing in its most general form. We give a full specification of Tempura, which not only unifies various existing techniques for generating an optimal incremental plan, but also allows developers to add their own rewrite rules. We study how to explore the plan space and search for an optimal incremental plan. We conduct a thorough experimental evaluation of Tempura in various incremental processing scenarios to show its effectiveness and efficiency.
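
The abstract's central notion is modeling incremental computation over time-varying relations (TVRs). The following is a minimal Python sketch of that idea under simplifying assumptions: a TVR is represented as a sequence of tuple deltas, and an equi-join is maintained by joining each new delta against the other side's accumulated state. The class and function names (TVR, incremental_join, snapshot) are illustrative and are not Tempura's actual API or plan representation.

```python
# A minimal, illustrative sketch of the time-varying relation (TVR) idea behind
# incremental planning. Names and structure are hypothetical, not Tempura's.
from collections import Counter

class TVR:
    """A time-varying relation: a sequence of deltas (multisets of tuples)."""
    def __init__(self, deltas):
        self.deltas = [Counter(d) for d in deltas]

    def snapshot(self, t):
        """The full relation contents accumulated up to time t."""
        snap = Counter()
        for d in self.deltas[: t + 1]:
            snap.update(d)
        return snap

def incremental_join(r: TVR, s: TVR, t):
    """Maintain R join S incrementally: at each step, join only the new deltas
    against the other side's accumulated state (the delta-join rule)."""
    result, r_state, s_state = Counter(), Counter(), Counter()
    for i in range(t + 1):
        dr, ds = r.deltas[i], s.deltas[i]
        for (k1, a), c1 in dr.items():            # new R tuples x old S state
            for (k2, b), c2 in s_state.items():
                if k1 == k2:
                    result[(k1, a, b)] += c1 * c2
        r_state.update(dr)                        # fold new R tuples into state
        for (k2, b), c2 in ds.items():            # new S tuples x updated R state
            for (k1, a), c1 in r_state.items():
                if k1 == k2:
                    result[(k1, a, b)] += c1 * c2
        s_state.update(ds)
    return result

# Tiny check: the incremental result equals recomputing the join on full snapshots.
r = TVR([[("k1", "a1")], [("k2", "a2")]])
s = TVR([[("k1", "b1")], [("k1", "b2"), ("k2", "b3")]])
full = Counter()
for (k1, a), c1 in r.snapshot(1).items():
    for (k2, b), c2 in s.snapshot(1).items():
        if k1 == k2:
            full[(k1, a, b)] += c1 * c2
assert incremental_join(r, s, 1) == full
```

The final assertion spells out the correctness condition an incremental plan must satisfy: accumulating the deltas of the incremental computation yields the same result as recomputing the query over full snapshots.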


Read also

Ontology-based data access (OBDA) is a popular paradigm for querying heterogeneous data sources by connecting them through mappings to an ontology. In OBDA, it is often difficult to reconstruct why a tuple occurs in the answer of a query. We address this challenge by enriching OBDA with provenance semirings, taking inspiration from database theory. In particular, we investigate the problems of (i) deciding whether a provenance-annotated OBDA instance entails a provenance-annotated conjunctive query, and (ii) computing a polynomial representing the provenance of a query entailed by a provenance-annotated OBDA instance. Unlike in plain databases, in our setting these polynomials may be infinite. To regain finiteness, we consider idempotent semirings and study the complexity in the case of DL-Lite ontologies. We implement task (ii) in a state-of-the-art OBDA system and show the practical feasibility of the approach through an extensive evaluation against two popular benchmarks.
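
The semiring machinery this abstract refers to comes from database provenance theory (provenance polynomials). Below is a small, self-contained Python sketch of that underlying idea only: source tuples carry polynomial annotations, joins multiply them, and alternative derivations add them. It does not implement the paper's OBDA setting, DL-Lite entailment, or the treatment of infinite polynomials; the relation names and annotations are made up for illustration.

```python
# Illustrative sketch of semiring provenance (the N[X] polynomial semiring):
# joins multiply annotations, unions/alternative derivations add them.
from collections import Counter

class Poly:
    """A provenance polynomial: a multiset of monomials (sorted variable tuples)."""
    def __init__(self, monomials=None):
        self.m = Counter(monomials or {})

    @staticmethod
    def var(x):
        return Poly({(x,): 1})

    def __add__(self, other):            # alternative derivations (union)
        return Poly(self.m + other.m)

    def __mul__(self, other):            # joint use of tuples (join)
        out = Counter()
        for m1, c1 in self.m.items():
            for m2, c2 in other.m.items():
                out[tuple(sorted(m1 + m2))] += c1 * c2
        return Poly(out)

    def __repr__(self):
        return " + ".join(f"{c}*{'*'.join(m)}" if c > 1 else "*".join(m)
                          for m, c in sorted(self.m.items())) or "0"

# Annotated relations: tuple -> provenance polynomial.
teaches = {("alice", "db"): Poly.var("p1"), ("bob", "db"): Poly.var("p2")}
offered = {("db",): Poly.var("p3")}

# Query: q(X) :- teaches(X, C), offered(C). Provenance of each answer:
answers = {}
for (x, c), ann1 in teaches.items():
    for (c2,), ann2 in offered.items():
        if c == c2:
            answers[x] = answers.get(x, Poly()) + (ann1 * ann2)

print(answers)   # {'alice': p1*p3, 'bob': p2*p3}
```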
In this paper, we propose a plugin-based framework for RDF stream processing named PRSP. Within this framework, we can employ SPARQL query engines to process C-SPARQL queries while maintaining the high performance of those engines in a simple way. Taking advantage of PRSP, we can process large-scale RDF streams in a distributed context via distributed SPARQL engines. Besides, we can evaluate the performance and correctness of existing SPARQL query engines in handling RDF streams in a unified way, which extends their evaluation from static RDF (i.e., RDF graphs) to dynamic RDF (i.e., RDF streams). Finally, within PRSP, we experimentally evaluate correctness and performance on YABench. The experiments show that PRSP can still maintain the high performance of those engines in RDF stream processing, although there are some slight differences among them.
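
To make the windowing idea concrete, here is a rough Python sketch of how a static SPARQL engine can be reused for a stream: triples falling in a time window are materialized as an ordinary graph and queried as usual. It uses rdflib as a stand-in engine; PRSP's actual plugin interface and full C-SPARQL window semantics (e.g., sliding windows, stream registration) are not shown, and the stream data is invented.

```python
# Slice an RDF stream into time windows, materialize each window as a static
# graph, and evaluate a standard SPARQL query over it with rdflib.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

# A toy RDF stream: (subject, predicate, object, timestamp-in-seconds).
stream = [
    (EX.sensor1, EX.reading, Literal(21.5), 0),
    (EX.sensor2, EX.reading, Literal(19.0), 5),
    (EX.sensor1, EX.reading, Literal(23.0), 12),
]

def query_window(stream, start, width, sparql):
    """Materialize the triples whose timestamp falls in [start, start+width)
    as a static graph, then run an ordinary SPARQL query over it."""
    g = Graph()
    for s, p, o, ts in stream:
        if start <= ts < start + width:
            g.add((s, p, o))
    return list(g.query(sparql))

q = """
PREFIX ex: <http://example.org/>
SELECT ?sensor ?value WHERE { ?sensor ex:reading ?value }
"""
for sensor, value in query_window(stream, start=0, width=10, sparql=q):
    print(sensor, value)   # sensor1 21.5, sensor2 19.0
```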
Explaining why an answer is (or is not) returned by a query is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for answering such questions for queries with negation (first-order queries). Specifically, we introduce a graph-based provenance model that, while syntactic in nature, supports reverse reasoning and is proven to encode a wide range of provenance models from the literature. The implementation of this model in our PUG (Provenance Unification through Graphs) system takes a provenance question and Datalog query as an input and generates a Datalog program that computes an explanation, i.e., the part of the provenance that is relevant to answer the question. Furthermore, we demonstrate how a desirable factorization of provenance can be achieved by rewriting an input query. We experimentally evaluate our approach demonstrating its efficiency.
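
As a toy illustration of explaining answers under negation (not PUG's provenance graph model or its Datalog rewriting), the sketch below evaluates a single non-recursive rule with a negated atom and records, per candidate derivation, which positive atoms held and which negated atoms blocked it. The rule, predicate names, and data are hypothetical.

```python
# Toy why/why-not explanation for the rule:
#   answer(X) :- hop(X, Y), not blocked(Y)
hop = {("a", "b"), ("a", "c")}
blocked = {("b",)}

def explain(x):
    """Collect, for each candidate binding of Y, whether the rule body succeeds
    or is blocked by the negated atom -- i.e. the part of the provenance
    relevant to the question 'why (not) answer(x)?'."""
    holds, reasons = False, []
    for (s, y) in sorted(hop):
        if s != x:
            continue
        if (y,) in blocked:
            reasons.append(f"hop({x},{y}) holds, but blocked({y}) holds -> derivation via {y} fails")
        else:
            reasons.append(f"hop({x},{y}) holds and blocked({y}) fails -> derivation via {y} succeeds")
            holds = True
    return holds, reasons

holds, reasons = explain("a")
print("answer(a) holds:", holds)   # True (via Y=c), despite the blocked derivation via Y=b
for r in reasons:
    print("  ", r)
```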
We propose hMDAP, a hybrid framework for large-scale data analytical processing on Spark, to support multi-paradigm processing (including OLAP, machine learning, and graph analysis) in distributed environments. The framework features a three-layer data processing module and a business process module that controls it. We will demonstrate the strength of hMDAP using real-world traffic scenarios.
Predicting the execution time of queries is an important problem with applications in scheduling, service level agreements and error detection. During query planning, a cost is associated with the chosen execution plan and used to rank competing plans. It would be convenient to use that cost to predict execution time, but it has been claimed in the literature that this is not possible. In this paper, we thoroughly investigate this claim, considering both linear and non-linear models. We find that the accuracy using more complex models with only the optimizer cost is comparable to the reported accuracy in the literature. The most accurate method in the literature is nearest-neighbour regression, which does not produce a model. The published results used a large feature set to identify nearest neighbours. We show that it is possible to achieve the same level of accuracy using only the cost to identify nearest neighbours. Using a smaller feature set brings the advantages of reduced overhead in terms of both storage space for the training data and the time to produce a prediction.
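
A minimal sketch of the prediction scheme described here, assuming scikit-learn is available: the optimizer's estimated cost is the only feature, and runtime is predicted as the average of the k nearest neighbours by cost. The training pairs below are synthetic placeholders; in practice they would be (plan cost, measured runtime) pairs collected from the DBMS.

```python
# Nearest-neighbour regression using only the optimizer cost as the feature.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# (optimizer cost, observed execution time in seconds) for past queries -- synthetic.
costs = np.array([[120.0], [450.0], [480.0], [2300.0], [2500.0], [9800.0]])
runtimes = np.array([0.08, 0.35, 0.33, 1.9, 2.1, 8.4])

# Predict the runtime of a new query as the mean runtime of the k past queries
# whose optimizer cost is closest.
model = KNeighborsRegressor(n_neighbors=2)
model.fit(costs, runtimes)

new_plan_cost = np.array([[2400.0]])
print(model.predict(new_plan_cost))   # roughly the mean of 1.9 and 2.1
```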