Localizing Faults in Cloud Systems


Abstract in English

By leveraging large clusters of commodity hardware, the Cloud offers great opportunities to optimize the operative costs of software systems, but impacts significantly on the reliability of software applications. The lack of control of applications over Cloud execution environments largely limits the applicability of state-of-the-art approaches that address reliability issues by relying on heavyweight training with injected faults. In this paper, we propose emph(LOUD}, a lightweight fault localization approach that relies on positive training only, and can thus operate within the constraints of Cloud systems. emph{LOUD} relies on machine learning and graph theory. It trains machine learning models with correct executions only, and compensates the inaccuracy that derives from training with positive samples, by elaborating the outcome of machine learning techniques with graph theory algorithms. The experimental results reported in this paper confirm that emph{LOUD} can localize faults with high precision, by relying only on a lightweight positive training.

Download