Failure detector implementation using dual mode of heartbeat and interaction


Abstract in English

Failure detection plays a central role in the engineering of distributed systems. Many applications have timing constraints and require failure detectors that provide quality of service (QoS) with quantitative timeliness guarantees; that is, they need failure detectors that are both fast and accurate. Failure detectors are oracles that provide information about process crashes, and they are an important abstraction for fault tolerance in distributed systems. Although current failure detector theory provides great generality and expressiveness, it also poses significant challenges in developing a robust hierarchy of failure detectors. In this paper, we propose an implementation of failure detectors that uses a dual model of heartbeat and interaction. First, the heartbeat model is adopted to shorten the detection time. If the detecting process does not receive the heartbeat message within the expected time, the interaction model is then used to check the monitored process further.
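
The two-stage check described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `DualModeDetector`, `heartbeat_timeout`, and `probe` are hypothetical, the interaction step is assumed to be a callable that returns True when the monitored process answers a direct query, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class Status(Enum):
    TRUSTED = "trusted"
    SUSPECTED = "suspected"


@dataclass
class DualModeDetector:
    """Hypothetical dual-mode detector: heartbeat first, interaction second."""

    heartbeat_timeout: float            # assumed maximum gap between heartbeats
    probe: Callable[[str], bool]        # assumed interaction: True if the process replies
    last_heartbeat: Dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, pid: str, now: float) -> None:
        # Record the arrival time of a heartbeat from process `pid`.
        self.last_heartbeat[pid] = now

    def check(self, pid: str, now: float) -> Status:
        # Stage 1 (heartbeat model): a recent heartbeat is enough to trust.
        last = self.last_heartbeat.get(pid)
        if last is not None and now - last <= self.heartbeat_timeout:
            return Status.TRUSTED
        # Stage 2 (interaction model): the heartbeat is late or missing,
        # so query the process directly before suspecting it.
        if self.probe(pid):
            self.last_heartbeat[pid] = now  # a reply counts as fresh evidence of liveness
            return Status.TRUSTED
        return Status.SUSPECTED


if __name__ == "__main__":
    # "slow" answers direct queries; "dead" does not.
    detector = DualModeDetector(heartbeat_timeout=1.0, probe=lambda pid: pid == "slow")
    detector.on_heartbeat("slow", now=0.0)
    detector.on_heartbeat("dead", now=0.0)
    for pid in ("slow", "dead"):
        print(pid, detector.check(pid, now=2.0).value)
```

In this sketch a late heartbeat alone never marks a process as suspected; only a failed interaction does. That is one plausible reading of how the second stage can reduce false suspicions while the heartbeat stage keeps detection time short.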

References used

T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable distributed systems,” Journal of the ACM, vol. 43, no. 2, pp. 225-267, 1996
S. Bansal, S. Sharma, and I. Trivedi, “Adaptive staircase multiple failure detector for parallel and distributed image processing,” in Proceedings of the 1st International Conference on Recent Advances in Information Technology, Dhanbad, India, 2012, pp. 91-94
W. Chen, S. Toueg, and M. K. Aguilera, “On the quality of service of failure detectors,” IEEE Transactions on Computers, vol. 51, no. 5, pp. 561-580, 2002
