Recent improvements in programming languages, programming models, and frameworks have focused on abstracting users away from many programming issues. Among other features, recent programming frameworks provide simpler syntax; automatic memory management and garbage collection; simplified code reuse through library packages; and easily configurable deployment tools. For instance, Python has risen to the top of the list of programming languages thanks to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Moreover, the community has developed a large number of libraries and modules and tuned them to obtain excellent performance. However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelization of affine loop nests and executes them in parallel on a distributed computing infrastructure. The parallelization can also build data blocks to increase task granularity and thus achieve good execution performance. Moreover, AutoParallel is based on sequential programming and requires only a small annotation in the form of a Python decorator, so that anyone with modest programming skills can scale an application up to hundreds of cores.
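To make the programming model concrete, the sketch below shows what a sequential affine loop nest annotated with a single decorator could look like. The decorator name and import path are assumptions made for illustration (the abstract only states that the annotation is a Python decorator), so the actual AutoParallel interface may differ.

```python
# Illustrative sketch only: a plain sequential matrix multiplication whose
# affine loop nest is handed to the module for task-based parallelization.
# The decorator name and import path below are assumed, not confirmed.
from pycompss.api.parallel import parallel  # assumed import path

@parallel()  # single annotation; the loop nest itself stays sequential code
def matmul(a, b, c, n):
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
```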
Based on the two observations that diverse applications perform better on different multicore architectures, and that different phases of an application may have vastly different resource requirements, Pal et al. proposed a novel reconfigurable hardware approach for executing multithreaded programs. Instead of mapping a concurrent program to a fixed architecture, the architecture adaptively reconfigures itself to meet the application's concurrency and communication requirements, yielding significant improvements in performance. Based on our earlier abstract operational framework for multicore execution with hierarchical memory structures, we describe the execution of multithreaded programs on reconfigurable architectures that support a variety of clustered configurations. Such reconfiguration may not preserve the semantics of programs due to the possible introduction of race conditions arising from concurrent accesses to shared memory by threads running on the different cores. We present an intuitive partial ordering notion on the cluster configurations, and show that the semantics of multithreaded programs is always preserved for reconfigurations upward in that ordering, whereas semantics preservation for arbitrary reconfigurations can be guaranteed for well-synchronised programs. We further show that a simple approximate notion of efficiency of execution on the different configurations can be obtained using the notion of amortised bisimulations, and extend it to dynamic reconfiguration.
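As a generic illustration (not taken from the paper's formalism) of why unsynchronised shared-memory accesses can break semantics preservation while well-synchronised programs remain safe, consider two ways of incrementing a shared counter from several threads:

```python
# Generic illustration: unsynchronised read-modify-write on shared state can
# lose updates when threads interleave, whereas a lock restores determinism.
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1          # non-atomic read-modify-write: race condition

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:            # mutual exclusion: result is always the same
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; the unsafe variant may print less
```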
PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for generating or processing voluminous datasets. A workflow is created from user-written Python functions (nodes) connected by pipes (edges) into a directed acyclic graph. These functions are arbitrarily definable and can make use of any Python modules or external binaries. Given a user-defined topology and a collection of input data, the functions are composed into nested higher-order maps, which are transparently and robustly evaluated in parallel on a single computer or on remote hosts. Local and remote computational resources can be flexibly pooled and assigned to functional nodes, allowing facile load balancing and pipeline optimization to maximize computational throughput. Input items are processed by the nodes in parallel and traverse the graph in batches of adjustable size, a trade-off between lazy evaluation, parallelism, and memory consumption. The processing of a single item can be parallelized in a scatter/gather scheme. The simplicity and flexibility of distributed workflows built with PaPy bridge the gap between desktop and grid computing, enabling this new computing paradigm to be leveraged in the processing of large scientific datasets.
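The following sketch shows the nested higher-order map idea using only the standard library; it is a conceptual illustration, not PaPy's actual API (which is richer and not reproduced here). The function names and data are placeholders.

```python
# Conceptual sketch: a two-node pipeline (parse -> analyse) expressed as
# composed parallel maps over a stream of input items.
from multiprocessing import Pool

def parse(item):
    return item.strip().split(",")

def analyse(fields):
    return len(fields)

if __name__ == "__main__":
    inputs = ["a,b,c\n", "d,e\n", "f\n"]
    with Pool(processes=4) as pool:
        # imap keeps the stream lazy and hands items out in adjustable chunks,
        # mirroring the batch-size trade-off described in the abstract.
        parsed = pool.imap(parse, inputs, chunksize=2)
        results = list(pool.imap(analyse, parsed, chunksize=2))
    print(results)  # [3, 2, 1]
```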
Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. In this work, we present a workflow that retains Python's high productivity while achieving portable performance across different architectures. The workflow's key features are HPC-oriented language extensions and a set of automatic optimizations powered by a data-centric intermediate representation. We show performance results and scaling across CPU, GPU, FPGA, and the Piz Daint supercomputer (up to 23,328 cores), with 2.47x and 3.75x speedups over previous-best solutions, first-ever Xilinx and Intel FPGA results of annotated Python, and up to 93.16% scaling efficiency on 512 nodes.
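The kind of "annotated Python" such a workflow targets is plain NumPy code marked with a decorator. Since the abstract does not name the framework's API, the decorator, module, and symbolic-size mechanism below are hypothetical and shown only to convey the shape of the approach.

```python
# Illustrative only: hypothetical decorator and symbol names. The point is
# that the programmer writes ordinary NumPy code, annotates it once, and the
# data-centric pipeline lowers it and optimizes it per target (CPU/GPU/FPGA).
from hpc_framework import program, symbol  # hypothetical names

N = symbol("N")  # symbolic size, resolved when the kernel is called

@program
def kernel(A: "float64[N, N]", x: "float64[N]"):
    # plain NumPy semantics; parallelization, tiling, and data movement are
    # decided by the optimizing pipeline rather than by the programmer
    return A @ x
```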
Programming systems incorporating aspects of functional programming, e.g., higher-order functions, are becoming increasingly popular for large-scale distributed programming. New frameworks such as Apache Spark leverage functional techniques to provide high-level, declarative APIs for in-memory data analytics, often outperforming traditional big data frameworks like Hadoop MapReduce. However, widely used programming models remain rather ad hoc; aspects such as implementation trade-offs, static typing, and semantics are not yet well understood. We present a new asynchronous programming model that has at its core several principles facilitating functional processing of distributed data. The emphasis of our model is on simplicity, performance, and expressiveness. The primary means of communication is passing functions (closures) to distributed, immutable data. To ensure safe and efficient distribution of closures, our model leverages both syntactic and type-based restrictions. We report on a prototype implementation in Scala. Finally, we present preliminary experimental results evaluating the performance impact of a static, type-based optimization of serialization.
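The closure-passing style at the heart of this model can be sketched in a language-neutral way (the paper's prototype is in Scala); the Python snippet below is only an illustration of sending pure functions to immutable, partitioned data rather than moving the data, and all names in it are hypothetical.

```python
# Sketch of the closure-passing style: ship a pure function to each immutable
# data partition, then reduce the partial results locally.
from multiprocessing import Pool

partitions = [tuple(range(0, 5)), tuple(range(5, 10))]  # immutable "distributed" data

def shipped_closure(part):
    # the pure transformation sent to each partition
    return sum(x * x for x in part)

if __name__ == "__main__":
    with Pool(2) as pool:
        partial_sums = pool.map(shipped_closure, partitions)  # map over partitions
    print(sum(partial_sums))  # local reduce: 285
```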
We present an algorithm implemented in the astroalign Python module for image registration in astronomy. Our module does not rely on WCS information and instead matches 3-point asterisms (triangles) on the images to find the most accurate linear transformation between the two images. It is especially useful in the context of aligning images prior to stacking or performing difference image analysis. Astroalign can match images taken with different point-spread functions, seeing, and atmospheric conditions.
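A minimal usage sketch is shown below. The register and find_transform entry points follow astroalign's commonly documented interface, but the exact signatures should be confirmed against the installed version; the FITS file names are placeholders.

```python
# Minimal usage sketch of the module described above (file names hypothetical).
import astroalign as aa
from astropy.io import fits  # assumed available for reading the input frames

source = fits.getdata("new_frame.fits")        # frame to be aligned
target = fits.getdata("reference_frame.fits")  # reference frame

# Match 3-point asterisms between the frames, estimate the transformation,
# and apply it to `source`; `footprint` masks pixels with no valid data.
aligned, footprint = aa.register(source, target)

# The estimated transformation can also be inspected separately.
transform, (src_pts, dst_pts) = aa.find_transform(source, target)
print(transform.rotation, transform.translation)
```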