ﻻ يوجد ملخص باللغة العربية
Extract-Transform-Load (ETL) handles large amount of data and manages workload through dataflows. ETL dataflows are widely regarded as complex and expensive operations in terms of time and system resources. In order to minimize the time and the resources required by ETL dataflows, this paper presents a framework to optimize dataflows using shared cache and parallelization techniques. The framework classifies the components in an ETL dataflow into different categories based on their data operation properties. The framework then partitions the dataflow based on the classification at different granularities. Furthermore, the framework applies optimization techniques such as cache re-using, pipelining and multi-threading to the already-partitioned dataflows. The proposed techniques reduce system memory footprint and the frequency of copying data between different components, and also take full advantage of the computing power of multi-core processors. The experimental results show that the proposed optimization framework is 4.7 times faster than the ordinary ETL dataflows (without using the proposed optimization techniques), and outperforms the similar tool (Kettle).
A practical and promising approach to parallelizing XPath queries was proposed by Bordawekar et al. in 2009, which enables parallelization on top of existing XML database engines. Although they experimentally demonstrated the speedup by their approac
In data warehousing, Extract-Transform-Load (ETL) extracts the data from data sources into a central data warehouse regularly for the support of business decision-makings. The data from transaction processing systems are featured with the high freque
We present our experience with the modernization on the GR-MHD code BHAC, aimed at improving its novel hybrid (MPI+OpenMP) parallelization scheme. In doing so, we showcase the use of performance profiling tools usable on x86 (Intel-based) architectur
We consider the problem of emph{secretive coded caching} in a shared cache setup where the number of users accessing a particular emph{helper cache} is more than one, and every user can access exactly one helper cache. In secretive coded caching, the
Nowadays, tiered architectures are widely accepted for constructing large scale information systems. In this context application servers often form the bottleneck for a systems efficiency. An application server exposes an object oriented interface co