P = FS: Parallel is Just Fast Serial

Published by Neil J. Gunther

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Author: Neil J. Gunther

Performance; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture

Abstract

We prove that parallel processing with homogeneous processors is logically equivalent to fast serial processing. The reverse proposition can also be used to identify obscure opportunities for applying parallelism. To our knowledge, this theorem has not been previously reported in the queueing theory literature. A plausible explanation is offered for why this might be. The basic homogeneous theorem is also extended to optimizing the latency of heterogeneous parallel arrays.

181 - Neil J. Gunther 2020

This exposition presents a novel approach to solving an M/M/m queue for the waiting time and the residence time. The motivation comes from an algebraic solution for the residence time of the M/M/1 queue. The key idea is the introduction of an ansatz transformation, defined in terms of the Erlang B function, that avoids the more opaque derivation based on applied probability theory. The only prerequisite is an elementary knowledge of the Poisson distribution, which is already necessary for understanding the M/M/1 queue. The approach described here supersedes our earlier approximate morphing transformation.

Performance; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture

Design and optimisation of an efficient HDF5 I/O kernel for massive parallel fluid flow simulations

198 - Christoph Ertl, Technische Universität München 2018

More and more massive parallel codes running on several hundreds of thousands of cores enter the computational science and engineering domain, allowing high-fidelity computations on up to trillions of unknowns for very detailed analyses of the underlying problems. During such runs, typically gigabytes of data are being produced, hindering both efficient storage and (interactive) data exploration. Here, advanced approaches based on inherently distributed data formats such as HDF5 become necessary in order to avoid long latencies when storing the data and to support fast (random) access when retrieving the data for visual processing. Avoiding file locking and using collective buffering, write bandwidths to a single file close to the theoretical peak on a modern supercomputing cluster were achieved. The structure of the output file supports a very fast interactive visualisation and introduces additional steering functionality.

Performance; Distributed, Parallel, and Cluster Computing

Adapting the serial Alpgen event generator to simulate LHC collisions on millions of parallel threads

58 - J.T. Childers, T.D. Uram, T.J. LeCompte 2015

As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.

High Energy Physics - Phenomenology; Distributed, Parallel, and Cluster Computing; Computational Physics

Parallel Binary Code Analysis

66 - Xiaozhu Meng, Jonathon M. Anderson, John Mellor-Crummey 2020

Binary code analysis is widely used to assess a program's correctness, performance, and provenance. Binary analysis applications often construct control flow graphs, analyze data flow, and use debugging information to understand how machine code relates to source lines, inlined functions, and data types. To date, binary analysis has been single-threaded, which is too slow for applications such as performance analysis and software forensics, where it is becoming common to analyze binaries that are gigabytes in size and in large batches that contain thousands of binaries. This paper describes our design and implementation for accelerating the task of constructing control flow graphs (CFGs) from binaries with multithreading. Existing research focuses on addressing challenging code constructs encountered during constructing CFGs, including functions sharing code, jump table analysis, non-returning functions, and tail calls. However, existing analyses do not consider the complex interactions between concurrent analysis of shared code, making it difficult to extend existing serial algorithms to be parallel. A systematic methodology to guide the design of parallel algorithms is essential. We abstract the task of constructing CFGs as repeated applications of several core CFG operations for creating functions, basic blocks, and edges. We then derive properties among CFG operations, including operation dependency, commutativity, and monotonicity. These operation properties guide our design of a new parallel analysis for constructing CFGs. We achieved as much as 25$\times$ speedup for constructing CFGs on 64 hardware threads. Binary analysis applications are significantly accelerated with the new parallel analysis: we achieve 8$\times$ for a performance analysis tool and 7$\times$ for a software forensic tool with 16 hardware threads.

Performance

Finite Projective Geometry based Fast, Conflict-free Parallel Matrix Computations

146 - Shreeniwas Sapre, Hrishikesh Sharma, Abhishek Patil 2011

Matrix computations, especially iterative PDE solving (and the sparse matrix-vector multiplication subproblem within) using the conjugate gradient algorithm, and LU/Cholesky decomposition for solving systems of linear equations, form the kernel of many applications, such as circuit simulators, computational fluid dynamics, and structural analysis. The problem of designing approaches for parallelizing these computations, to get speedups as close as possible to the limit set by Amdahl's law, has been researched continuously. In this paper, we discuss approaches based on the use of finite projective geometry graphs for these two problems. For the conjugate gradient algorithm, the approach looks at an alternative data distribution based on projective-geometry concepts. It is proved that this data distribution is an optimal data distribution for scheduling the main problem of dense matrix-vector multiplication. For parallel LU/Cholesky decomposition of general matrices, the approach is motivated by the recently published scheme for interconnects of distributed systems, perfect difference networks. We find that projective-geometry based graphs indeed offer an exciting way of parallelizing these computations, and in fact many others. Moreover, their applications range from architectural ones (interconnect choice) to algorithmic ones (data distributions).

Numerical Analysis; Distributed, Parallel, and Cluster Computing
