ﻻ يوجد ملخص باللغة العربية
Data loading has been one of the most common performance bottlenecks for many big data applications, especially when they are running on inefficient human-readable formats, such as JSON or CSV. Parsing, validating, integrity checking and data structure maintenance are all computationally expensive steps in loading these formats. Regardless of these costs, many records may be filtered later during query evaluation due to highly selective predicates -- resulting in wasted computation. Meanwhile, the computing power of client ends is typically not exploited. Here, we explore investing limited cycles of clients on prefiltering to accelerate data loading and enable data skipping for query execution. In this paper, we present CIAO, a tunable system to enable client cooperation with the server to enable efficient partial loading and data skipping for a given workload. We proposed an efficient algorithm that would select a near-optimal predicate set to push down within a given budget. Moreover, CIAO will address the trade-off between client cost and server savings by setting different budgets for different clients. We implemented CIAO and evaluated its performance on three real-world datasets. Our experimental results show that the system substantially accelerates data loading by up to 21x and query execution by up to 23x and improves end-to-end performance by up to 19x within a budget of 1.0 microseconds latency per record on clients.
A growing number of applications that generate massive streams of data need intelligent data processing and online analysis. Real-time surveillance systems, telecommunication systems, sensor networks and other dynamic environments are such examples.
Assessing and improving the quality of data are fundamental challenges for data-intensive systems that have given rise to applications targeting transformation and cleaning of data. However, while schema design, data cleaning, and data migration are
In this paper, we present an embedding-based framework (TrQuery) for recommending solutions of a SPARQL query, including approximate solutions when exact querying solutions are not available due to incompleteness or inconsistencies of real-world RDF
It is important for big data systems to identify their performance bottleneck. However, the popular indicators such as resource utilizations, are often misleading and incomparable with each other. In this paper, a novel indicator framework which can
Mobile apps that use location data are pervasive, spanning domains such as transportation, urban planning and healthcare. Important use cases for location data rely on statistical queries, e.g., identifying hotspots where users work and travel. Such