
A continuous mathematical model of fault tolerance mechanisms for parallel applications


Publication date: 2016
Research language: Arabic





In this paper, we introduce a continuous mathematical model to optimize the trade-off between the overhead of a fault tolerance mechanism and the impact of faults in the execution environment. The fault tolerance mechanism considered in this research is a coordinated checkpoint/recovery mechanism, and the study is based on a stochastic model of different performance metrics of a parallel application running in a parallel and distributed environment.
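The abstract does not reproduce the paper's continuous model itself, so the following is only a point of reference for the compromise it describes, using notation introduced here (C, mu, tau) rather than the authors'. In the classical first-order view of coordinated checkpointing, if C is the cost of taking one checkpoint, \mu the mean time between failures, and \tau the checkpoint period, the expected overhead per unit of useful work is approximately

W(\tau) \approx \frac{C}{\tau} + \frac{\tau}{2\mu},

which is minimized at the Young/Daly-style period \tau_{opt} \approx \sqrt{2 C \mu}. A period that is too short inflates the checkpointing overhead (first term), while one that is too long inflates the expected re-execution after a failure (second term); the continuous optimization studied in the paper targets exactly this kind of compromise.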

Related research

In this paper, we present a study of the time cost added to grid computing as a result of using a coordinated checkpoint/recovery fault tolerance protocol. We aim to find a mathematical model that determines the suitable times at which to save the checkpoints of an application, so as to minimize the finish time of a parallel application running on a grid in the presence of faults and fault tolerance protocols. We obtain this model by successively modeling the faults, the execution environment, and the chosen fault tolerance protocol using Kolmogorov differential equations.
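The equations themselves are not given in this abstract; the sketch below only illustrates the technique it names, assuming a two-state continuous-time Markov chain (computing / recovering) with rates lam and rho that are invented for the example, not taken from the paper. The Kolmogorov forward (master) equations dP/dt = P(t) Q are integrated numerically with SciPy.

# Minimal sketch (not the paper's model): Kolmogorov forward equations
# dP/dt = P(t) Q for a two-state chain, state 0 = computing, state 1 = recovering.
# lam (failure rate) and rho (recovery rate) are assumed values.
import numpy as np
from scipy.integrate import solve_ivp

lam, rho = 1e-3, 1e-1                    # assumed rates, per second
Q = np.array([[-lam,  lam],
              [ rho, -rho]])             # generator matrix of the chain

def forward(t, p):
    return p @ Q                         # Kolmogorov forward equation

sol = solve_ivp(forward, (0.0, 3600.0), [1.0, 0.0])
print("P(computing) after one hour:", sol.y[0, -1])

A fuller model along the lines the abstract describes could add checkpointing and rollback states to the chain, so that suitable checkpoint instants can be read off the resulting transient probabilities.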
In this research, we introduce two probabilistic mechanisms to certify parallel applications on distributed architectures, assuming there are no oracles on which certification can rely; we also introduce a cost model for the two mechanisms and compare them. We are interested in parallel applications that are represented by a data-flow graph built dynamically during execution, that run in a wide, distributed, heterogeneous, and dynamic environment, and that use the principle of work stealing to distribute tasks among the processors.
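The two mechanisms themselves are not described in this abstract. As a rough illustration of why oracle-free probabilistic certification can work (the symbols q and k are introduced here, not taken from the paper), suppose a fraction q of the executed tasks produced incorrect results and k tasks are drawn uniformly at random and re-executed for comparison. The probability that the sample contains none of the incorrect tasks is at most

\Pr[\text{no incorrect task sampled}] \le (1 - q)^{k},

so the confidence delivered by such a certificate grows exponentially with the number of re-executed tasks.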
Overlay multicast (Application-Level Multicast, ALM) constructs a multicast delivery tree among end hosts. Unlike traditional IP multicast, where the internal tree nodes are dedicated routers that are relatively stable and do not leave the multicast tree voluntarily, the non-leaf nodes in the overlay tree are free end hosts that can join or leave the overlay at will, or even crash without notification. A node can therefore depart suddenly, leaving its descendants (and the Rendez-vous Point (RP)) no time to prepare the reconnection of the overlay tree; a rearrangement process must then be triggered in which each of its descendants rejoins the overlay tree. Until that happens, all of its downstream nodes are partitioned from the overlay tree and cannot receive the multicast data. These dynamic characteristics make the overlay tree unstable, which can significantly degrade the user experience. A key challenge in constructing an efficient and resilient ALM protocol is to provide fast data recovery when overlay node failures partition the data delivery paths. In this paper, we analyze the performance of ALM tree recovery solutions using different metrics.
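The paper analyzes and compares recovery solutions rather than prescribing a single one; the sketch below shows only one commonly discussed repair heuristic, in which the orphaned children of a departed non-leaf host reattach to that host's own parent (falling back to the RP at the root). All class, field, and function names here are assumptions for illustration, not the paper's protocol.

# Illustrative ALM repair heuristic: rejoin orphaned subtrees at the grandparent.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Host:
    name: str
    parent: Optional["Host"] = None
    children: List["Host"] = field(default_factory=list)

def attach(parent: Host, child: Host) -> None:
    child.parent = parent
    parent.children.append(child)

def handle_departure(node: Host) -> None:
    """Reattach every orphaned child to the departed node's parent (the RP at the root)."""
    new_parent = node.parent
    if new_parent is not None:
        new_parent.children.remove(node)
    for orphan in list(node.children):
        if new_parent is not None:
            attach(new_parent, orphan)
    node.children.clear()

rp = Host("RP")
a, b, c = Host("A"), Host("B"), Host("C")
attach(rp, a); attach(a, b); attach(a, c)
handle_departure(a)                          # A leaves; B and C rejoin under the RP
print([h.name for h in rp.children])         # ['B', 'C']

A real protocol must also decide how descendants discover a replacement parent (for example via precomputed backup parents or by contacting the RP); such choices are what different recovery solutions trade off.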
We introduce an auto-adaptive strategy that makes it possible to write a parallel algorithm that adapts to the number of resources available in the parallel environment allocated to execute the parallel program. The parallel applications we study are represented by a data-flow graph that is built dynamically during execution. The newly suggested strategy is based on coupling a sequential algorithm with a parallel one and relies on the principle of work stealing for task scheduling. We offer a study of the complexity of the adaptive algorithm, analyze its performance on processors, and compare it with the performance of a classic parallel algorithm.
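As context for the kind of performance analysis such a study involves (W, D, and p are standard notation introduced here, not symbols from the paper), the classical bound for randomized work stealing states that a computation with total work W and critical-path length D runs on p processors in expected time

T_p \le \frac{W}{p} + O(D),

which is one reason work-stealing schedulers adapt gracefully to however many processors the environment actually provides; coupling a sequential algorithm with a parallel one can be read as a way of keeping the extra work low when few processors are available while preserving parallelism when many are.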
Non-Uniform Memory Access (NUMA) machines are distributed shared memory systems. In this paper, we extend conventional virtual memory concepts to describe the status of memory in distributed NUMA machines. We present a mathematical model for virtual memory systems in centralized systems and in NUMA machines. The model shows the status of memory in response to memory references.
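The model's equations are not given in this abstract; as an illustrative sketch only (h, t_local, and t_remote are symbols introduced here), one basic quantity such a description of memory status can expose is the average cost of a memory reference on a NUMA machine when a fraction h of references hit node-local memory:

t_{avg} = h \, t_{local} + (1 - h) \, t_{remote},

making explicit how the placement of pages across nodes determines the cost of a stream of memory references.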