Partitioning SKA Dataflows for Optimal Graph Execution

118 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Chen Wu

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Chen Wu - Andreas Wicenec - Rodrigo Tobar

النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Optimizing data-intensive workflow execution is essential to many modern scientific projects such as the Square Kilometre Array (SKA), which will be the largest radio telescope in the world, collecting terabytes of data per second for the next few decades. At the core of the SKA Science Data Processor is the graph execution engine, scheduling tens of thousands of algorithmic components to ingest and transform millions of parallel data chunks in order to solve a series of large-scale inverse problems within the power budget. To tackle this challenge, we have developed the Data Activated Liu Graph Engine (DALiuGE) to manage data processing pipelines for several SKA pathfinder projects. In this paper, we discuss the DALiuGE graph scheduling sub-system. By extending previous studies on graph scheduling and partitioning, we lay the foundation on which we can develop polynomial time optimization methods that minimize both workflow execution time and resource footprint while satisfying resource constraints imposed by individual algorithms. We show preliminary results obtained from three radio astronomy data pipelines.

قيم البحث

162 - Masatoshi Hanai , Nikos Tziritas , Toyotaro Suzumura 2021

The dynamic scaling of distributed computations plays an important role in the utilization of elastic computational resources, such as the cloud. It enables the provisioning and de-provisioning of resources to match dynamic resource availability and demands. In the case of distributed graph processing, changing the number of the graph partitions while maintaining high partitioning quality imposes serious computational overheads as typically a time-consuming graph partitioning algorithm needs to execute each time repartitioning is required. In this paper, we propose a dynamic scaling method that can efficiently change the number of graph partitions while keeping its quality high. Our idea is based on two techniques: preprocessing and very fast edge partitioning, called graph edge ordering and chunk-based edge partitioning, respectively. The former converts the graph data into an ordered edge list in such a way that edges with high locality are closer to each other. The latter immediately divides the ordered edge list into an arbitrary number of high-quality partitions. The evaluation with the real-world billion-scale graphs demonstrates that our proposed approach significantly reduces the repartitioning time, while the partitioning quality it achieves is on par with that of the best existing static method.

النظم الموزعة والتوازية والحوسبة العنقودية قواعد البيانات الرياضيات المتقطعة

DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

74 - Chen Wu , Rodrigo Tobar , Kevin Vinsen 2017

The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for processing large astronomical datasets at a scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipe lines consisting of both data sets and algorithmic components and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. The execution in DALiuGE is data-activated, where each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from less than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry data sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph; and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide in-depth analysis of DALiuGEs scalability to very large numbers of tasks on two supercomputing facilities.

النظم الموزعة والتوازية والحوسبة العنقودية أجهزة الكشف الفيزيائية

xDGP: A Dynamic Graph Processing System with Adaptive Partitioning

304 - Luis Vaquero , Felix Cuadrado , Dionysios Logothetis 2013

Many real-world systems, such as social networks, rely on mining efficiently large graphs, with hundreds of millions of vertices and edges. This volume of information requires partitioning the graph across multiple nodes in a distributed system. This has a deep effect on performance, as traversing edges cut between partitions incurs a significant performance penalty due to the cost of communication. Thus, several systems in the literature have attempted to improve computational performance by enhancing graph partitioning, but they do not support another characteristic of real-world graphs: graphs are inherently dynamic, their topology evolves continuously, and subsequently the optimum partitioning also changes over time. In this work, we present the first system that dynamically repartitions massive graphs to adapt to structural changes. The system optimises graph partitioning to prevent performance degradation without using data replication. The system adopts an iterative vertex migration algorithm that relies on local information only, making complex coordination unnecessary. We show how the improvement in graph partitioning reduces execution time by over 50%, while adapting the partitioning to a large number of changes to the graph in three real-world scenarios.

النظم الموزعة والتوازية والحوسبة العنقودية

A Taxonomy for Classification and Comparison of Dataflows for GNN Accelerators

130 - Raveesh Garg , Eric Qin , Francisco Mu~noz Martinez 2021

Recently, Graph Neural Networks (GNNs) have received a lot of interest because of their success in learning representations from graph structured data. However, GNNs exhibit different compute and memory characteristics compared to traditional Deep Ne ural Networks (DNNs). Graph convolutions require feature aggregations from neighboring nodes (known as the aggregation phase), which leads to highly irregular data accesses. GNNs also have a very regular compute phase that can be broken down to matrix multiplications (known as the combination phase). All recently proposed GNN accelerators utilize different dataflows and microarchitecture optimizations for these two phases. Different communication strategies between the two phases have been also used. However, as more custom GNN accelerators are proposed, the harder it is to qualitatively classify them and quantitatively contrast them. In this work, we present a taxonomy to describe several diverse dataflows for running GNN inference on accelerators. This provides a structured way to describe and compare the design-space of GNN accelerators.

النظم الموزعة والتوازية والحوسبة العنقودية هندسة العتاد

Memory-Aware Partitioning of Machine Learning Applications for Optimal Energy Use in Batteryless Systems

125 - Andres Gomez , Andreas Tretter , Pascal Alexander Hager 2021

Sensing systems powered by energy harvesting have traditionally been designed to tolerate long periods without energy. As the Internet of Things (IoT) evolves towards a more transient and opportunistic execution paradigm, reducing energy storage cost s will be key for its economic and ecologic viability. However, decreasing energy storage in harvesting systems introduces reliability issues. Transducers only produce intermittent energy at low voltage and current levels, making guaranteed task completion a challenge. Existing ad hoc methods overcome this by buffering enough energy either for single tasks, incurring large data-retention overheads, or for one full application cycle, requiring a large energy buffer. We present Julienning: an automated method for optimizing the total energy cost of batteryless applications. Using a custom specification model, developers can describe transient applications as a set of atomically executed kernels with explicit data dependencies. Our optimization flow can partition data- and energy-intensive applications into multiple execution cycles with bounded energy consumption. By leveraging interkernel data dependencies, these energy-bounded execution cycles minimize the number of system activations and nonvolatile data transfers, and thus the total energy overhead. We validate our methodology with two batteryless cameras running energy-intensive machine learning applications. Results demonstrate that compared to ad hoc solutions, our method can reduce the required energy storage by over 94% while only incurring a 0.12% energy overhead.

النظم الموزعة والتوازية والحوسبة العنقودية هندسة العتاد التعلم الآلي

سجل دخول لتتمكن من نشر تعليقات