In this paper, we present a continuous mathematical model for obtaining the
optimal solution to the problem that arises from adding a fault tolerance
mechanism to high-performance parallel and distributed execution environments:
the trade-off between the overhead added by the fault tolerance mechanism and
the impact of faults on the execution environment, and therefore on the
completion time of the parallel application. The fault tolerance method studied
is a coordinated checkpoint/recovery mechanism, and the proposed study relies on
continuous stochastic modeling of the various performance constraints of a
parallel application executed on a distributed parallel architecture.
In this paper, we introduce a continuous mathematical model to
optimize the trade-off between the overhead of the fault tolerance
mechanism and the impact of faults on the execution
environment. The fault tolerance mechanism considered in this
research is a coordinated checkpoint/recovery mechanism, and the
study is based on a stochastic model of the different performance
constraints of a parallel application running in a parallel and
distributed environment.
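To make the trade-off concrete, the sketch below uses Young's classical first-order approximation rather than the model proposed in this paper, which the abstract does not specify. It assumes a fixed checkpoint cost, a known mean time between failures (MTBF), and that a failure loses on average half a checkpoint interval of work. The function names and parameters are hypothetical, chosen only for illustration.

```python
import math


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """First-order estimate of the fraction of time lost (illustrative assumption).

    checkpoint_cost / interval : cost of checkpointing, amortized over the interval
    interval / (2 * mtbf)      : expected rework, assuming half an interval is lost per failure
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf):
    """Young's approximation of the interval that minimizes overhead_fraction."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


if __name__ == "__main__":
    cost, mtbf = 60.0, 86400.0  # hypothetical values, in seconds
    best = young_interval(cost, mtbf)
    for interval in (best / 2, best, best * 2):
        print(f"interval={interval:9.1f} s  overhead={overhead_fraction(interval, cost, mtbf):.4f}")
```

Checkpointing too often inflates the first term, and checkpointing too rarely inflates the second. The minimum lies where the two terms are equal, which is the kind of compromise a continuous model can locate analytically.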
In this paper, we present a study of the time cost
added to grid computing by the use of a
coordinated checkpoint/recovery fault tolerance protocol. We aim
to find a mathematical model that determines the suitable time
to save t
In this research, we introduce two probabilistic mechanisms to
certify parallel applications on distributed architectures, assuming
that there are no oracles on which certification can depend, in
addition to introducing a cost model of the two mecha
Overlay multicast (Application-Level Multicast (ALM)) constructs a multicast delivery tree among end hosts. Unlike traditional IP multicast where the internal tree nodes are dedicated routers which are relatively stable and do not leave the multicast
We introduce an auto-adaptive strategy that makes it possible to write a
parallel algorithm that adapts to the number of resources available in the
parallel environment allocated to execute the parallel program. The parallel
applications we are studying which are represe
Non-Uniform Memory Access (NUMA) machines are distributed shared
memory systems. In this paper, we extend conventional virtual memory
concepts to describe the status of memory in distributed NUMA machines. We
present a mathematical model for v