Using differential equations for modeling performance of fault tolerance in parallel applications


Abstract in English

In this paper we study the time overhead that a coordinated checkpoint/recovery fault tolerance protocol adds to parallel applications running in grid computing environments. We aim to find a mathematical model that determines a suitable time to save the application's checkpoints, so that a parallel application running on a grid subject to faults and fault tolerance protocols achieves the minimum finish time. We derived this model by successively modeling the targeted faults, the execution environment, and the chosen fault tolerance protocol, all using Kolmogorov differential equations.
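
The abstract does not give the model's equations. The following is therefore only a minimal Python sketch of the optimization problem it describes, not the authors' Kolmogorov-equation model. It swaps in a standard closed form under assumptions that are ours, not the paper's. Failures arrive as a Poisson process with rate λ (1/MTBF), and each failure rolls the application back to its last checkpoint. A checkpoint costs C seconds, and recovery costs R seconds and cannot itself fail. Under these assumptions, a first-step argument gives the expected time to finish a segment of T seconds of work plus its checkpoint as E(T) = (1/λ + R)(e^{λ(T+C)} - 1). The sketch then searches for the interval T that minimizes the expected completion time of the whole run. All function names and parameter values are hypothetical.

```python
import math


def expected_segment_time(work, checkpoint, rate, recovery):
    """Expected wall-clock time to complete `work` seconds of computation
    followed by a `checkpoint`-second checkpoint.

    Assumed model (not taken from the paper): failures form a Poisson
    process with the given `rate` (1 / MTBF); a failure rolls the
    application back to its last checkpoint; recovery takes `recovery`
    seconds and is itself failure-free.
    """
    tau = work + checkpoint
    # First-step argument: E = (1/rate + recovery) * (exp(rate * tau) - 1);
    # expm1 keeps precision when rate * tau is small.
    return (1.0 / rate + recovery) * math.expm1(rate * tau)


def expected_total_time(interval, total_work, checkpoint, rate, recovery):
    """Expected completion time when `total_work` is cut into segments of
    length `interval`, each ending with a checkpoint. The number of
    segments is treated as continuous (total_work / interval)."""
    segments = total_work / interval
    return segments * expected_segment_time(interval, checkpoint, rate, recovery)


def best_interval(total_work, checkpoint, rate, recovery, tol=1e-6):
    """Golden-section search for the checkpoint interval in (0, total_work]
    that minimizes expected_total_time. Under the model above the cost
    (e^{rate*(T+C)} - 1) / T decreases then increases in T, so it is
    unimodal and golden-section search applies."""
    def cost(t):
        return expected_total_time(t, total_work, checkpoint, rate, recovery)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio
    lo, hi = tol, total_work
    a = hi - inv_phi * (hi - lo)
    b = lo + inv_phi * (hi - lo)
    fa, fb = cost(a), cost(b)
    while hi - lo > tol:
        if fa < fb:  # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - inv_phi * (hi - lo)
            fa = cost(a)
        else:        # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + inv_phi * (hi - lo)
            fb = cost(b)
    return (lo + hi) / 2.0


if __name__ == "__main__":
    mtbf = 24 * 3600.0                          # hypothetical: one failure per day
    rate = 1.0 / mtbf
    work, ckpt, rec = 10 * 3600.0, 60.0, 120.0  # hypothetical costs in seconds
    t_opt = best_interval(work, ckpt, rate, rec)
    young = math.sqrt(2.0 * ckpt * mtbf)        # Young's first-order estimate
    print(f"optimal interval ~ {t_opt:.0f} s, Young's estimate ~ {young:.0f} s")
    total = expected_total_time(t_opt, work, ckpt, rate, rec)
    print(f"expected completion ~ {total:.0f} s")
```

Running the script prints the numerically optimal interval next to Young's first-order estimate sqrt(2·C·MTBF). For these hypothetical inputs the two agree to within a few percent, as expected when C is small relative to the MTBF. A fractional number of segments is allowed to keep the cost smooth; a real scheduler would round to whole segments.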
