Failure detection plays a central role in the engineering of
distributed systems. Furthermore, many applications have timing
constraints and require failure detectors that provide quality of
service (QoS) with quantitative timeliness guarantees. Such
applications therefore need failure detectors that are both fast
(short detection time) and accurate (few false suspicions).
Failure detectors are oracles that provide information about process
crashes; they are an important abstraction for fault tolerance in
distributed systems. Although current failure detector theory
provides great generality and expressiveness, it also poses
significant challenges in developing a robust hierarchy of failure
detectors.
In this paper, we propose an implementation of failure detectors.
This implementation combines two models: heartbeat and interaction.
First, the heartbeat model is used to shorten the detection time.
If the detecting process does not receive a heartbeat message within
the expected time, it then uses the interaction model to check that
process further.
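
The dual heartbeat/interaction check can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's implementation: the class `DualModelDetector`, its `timeout` parameter, the `on_heartbeat`/`status` methods, and the `probe` callback (standing in for the interaction model, e.g. a request/reply exchange) are hypothetical names. Time is passed in explicitly as `now` so the logic can run without real clocks or networking.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DualModelDetector:
    """Hypothetical heartbeat-plus-interaction failure detector sketch."""
    timeout: float                  # assumed: expected heartbeat interval plus slack
    probe: Callable[[str], bool]    # assumed interaction check: True if the process answers
    last_heartbeat: Dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, pid: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from pid.
        self.last_heartbeat[pid] = now

    def status(self, pid: str, now: float) -> str:
        last = self.last_heartbeat.get(pid)
        # Fast path (heartbeat model): a heartbeat arrived within the expected time.
        if last is not None and now - last <= self.timeout:
            return "alive"
        # Slow path (interaction model): heartbeat missed, so query the process directly.
        return "alive" if self.probe(pid) else "crashed"


# Example: p1 still answers direct queries, p2 does not (simulated).
responsive = {"p1": True, "p2": False}
det = DualModelDetector(timeout=2.0, probe=lambda pid: responsive.get(pid, False))
det.on_heartbeat("p1", now=0.0)
det.on_heartbeat("p2", now=0.0)
print(det.status("p1", now=1.0))  # alive: heartbeat still fresh
print(det.status("p2", now=5.0))  # crashed: heartbeat missed and probe fails
print(det.status("p1", now=5.0))  # alive: heartbeat missed but probe succeeds
```

The cheap heartbeat test decides most cases quickly; the more expensive interaction step runs only for processes whose heartbeat is late, which is how it can reduce false suspicions caused by delayed messages.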
We also propose an implementation of hierarchical failure
detectors, which divides the processes into subgroups and elects
one leader, called the main process.
The main process then distributes the remaining processes into
groups and chooses one leader for each group.
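
The grouping step can be sketched as follows. The source does not specify the election or partitioning rules, so this sketch assumes hypothetical ones: the main process is the process with the lowest id, the remaining processes are dealt round-robin into `num_groups` groups, and each group's leader is its lowest id. The function name `build_hierarchy` is likewise illustrative.

```python
from typing import Dict, List, Tuple


def build_hierarchy(pids: List[int], num_groups: int) -> Tuple[int, Dict[int, List[int]]]:
    """Elect a main process and split the rest into groups keyed by leader.

    Assumed rules (not from the paper): main = lowest id, round-robin
    assignment into num_groups groups, leader = lowest id in each group.
    """
    if not pids or num_groups < 1:
        raise ValueError("need at least one process and one group")
    ordered = sorted(pids)
    main, rest = ordered[0], ordered[1:]
    buckets: List[List[int]] = [[] for _ in range(num_groups)]
    for i, pid in enumerate(rest):
        buckets[i % num_groups].append(pid)
    # Key each non-empty group by its leader (the smallest id in that group).
    groups = {min(b): b for b in buckets if b}
    return main, groups


main, groups = build_hierarchy([7, 3, 9, 1, 4, 8, 2], num_groups=2)
print(main)    # 1
print(groups)  # {2: [2, 4, 8], 3: [3, 7, 9]}
```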
Finally, failure detection is applied at the chosen group leaders,
which send their results to the central (main) process.
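
The final reporting step can be sketched as a simple aggregation. This is an illustration under assumptions: the group layout is given as a hard-coded mapping from leader to members, and `is_alive` is a hypothetical predicate standing in for whatever failure detector each leader runs (for instance, a heartbeat/interaction check). The names `leader_report` and `main_collect` are hypothetical.

```python
from typing import Callable, Dict, List, Set


def leader_report(leader: int, members: List[int], is_alive: Callable[[int], bool]) -> Set[int]:
    # A leader runs failure detection over its group members (excluding itself)
    # and returns the set of members it considers crashed.
    return {pid for pid in members if pid != leader and not is_alive(pid)}


def main_collect(groups: Dict[int, List[int]], is_alive: Callable[[int], bool]) -> Dict[int, Set[int]]:
    # The main process gathers one report per group leader.
    return {leader: leader_report(leader, members, is_alive) for leader, members in groups.items()}


groups = {2: [2, 4, 8], 3: [3, 7, 9]}  # example layout: leader -> members
crashed = {8, 9}                       # simulated crashes for the example
reports = main_collect(groups, is_alive=lambda pid: pid not in crashed)
print(reports)  # {2: {8}, 3: {9}}
```

Because each leader only monitors its own group, the main process receives one summary per group instead of monitoring every process directly. This sketch does not handle the failure of a leader or of the main process itself.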