Finding Needles in the Haystack: Harnessing Syslogs for Data Center Management


Abstract in English

Network device syslogs are ubiquitous and abundant in modern data centers with most large data centers producing millions of messages per day. Yet, the operational information reflected in syslogs and their implications on diagnosis or management tasks are poorly understood. Prevalent approaches to understanding syslogs focus on simple correlation and abnormality detection and are often limited to detection providing little insight towards diagnosis and resolution. Towards improving data center operations, we propose and implement Log-Prophet, a system that applies a toolbox of statistical techniques and domain-specific models to mine detailed diagnoses. Log-Prophet infers causal relationships between syslog lines and constructs succinct but valuable problem graphs, summarizing root causes and their locality, including cascading problems. We validate Log-Prophet using problem tickets and through operator interviews. To demonstrate the strength of Log-Prophet, we perform an initial longitudinal study of a large online service providers data center. Our study demonstrates that Log-Prophet significantly reduces the number of alerts while highlighting interesting operational issues.

Download