A continuous mathematical model of fault tolerance mechanisms for parallel applications
Published by Al-Baath University in 2016
Language of the research: Arabic
Abstract in English
In this paper, we introduce a continuous mathematical model to
optimize the trade-off between the overhead of the fault tolerance
mechanism and the impact of faults in the execution environment.
The fault tolerance mechanism considered in this research is
coordinated checkpoint/recovery, and the study is based on a
stochastic model of different performance criteria of a parallel
application in a parallel and distributed environment.
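
The trade-off named in the abstract can be illustrated with a classical first-order approximation, often attributed to Young. This is not the model of the paper, whose equations are not reproduced here; it is a minimal sketch that assumes a fixed checkpoint cost C, a mean time between failures M, and a checkpoint interval T, with the lost time approximated as C/T + T/(2M). The function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """First-order optimal time between checkpoints: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def expected_waste(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Approximate fraction of time lost for a given checkpoint interval.

    C / T is the checkpointing overhead; T / (2 * M) is the expected
    work redone after a failure (half an interval on average).
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


# Hypothetical values, not taken from the paper.
C = 60.0          # cost of one coordinated checkpoint, in seconds
M = 24 * 3600.0   # mean time between failures, in seconds

T = young_interval(C, M)
print(f"interval: {T:.0f} s, waste: {expected_waste(T, C, M):.3%}")
```

Checkpointing more often than T raises the overhead term, and checkpointing less often raises the rework term, which is the compromise the abstract refers to.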