Tools and Benchmarks for Automated Log Parsing

161 0 0.0 ( 0 )

Download Cite

Added by Jieming Zhu

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Jieming Zhu - Shilin He - Jinyang Liu

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Logs are imperative in the development and maintenance process of many software systems. They record detailed runtime information that allows developers and support engineers to monitor their systems and dissect anomalous behaviors and errors. The increasing scale and complexity of modern software systems, however, make the volume of logs explodes. In many cases, the traditional way of manual log inspection becomes impractical. Many recent studies, as well as industrial tools, resort to powerful text search and machine learning-based analytics solutions. Due to the unstructured nature of logs, a first crucial step is to parse log messages into structured data for subsequent analysis. In recent years, automated log parsing has been widely studied in both academia and industry, producing a series of log parsers by different techniques. To better understand the characteristics of these log parsers, in this paper, we present a comprehensive evaluation study on automated log parsing and further release the tools and benchmarks for easy reuse. More specifically, we evaluate 13 log parsers on a total of 16 log datasets spanning distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. We report the benchmarking results in terms of accuracy, robustness, and efficiency, which are of practical importance when deploying automated log parsing in production. We also share the success stories and lessons learned in an industrial application at Huawei. We believe that our work could serve as the basis and provide valuable guidance to future research and deployment of automated log parsing.

rate research

A Survey on Automated Log Analysis for Reliability Engineering

336 - Shilin He , Pinjia He , Zhuangbin Chen 2020

Logs are semi-structured text generated by logging statements in software source code. In recent decades, software logs have become imperative in the reliability assurance mechanism of many software systems because they are often the only data available that record software runtime information. As modern software is evolving into a large scale, the volume of logs has increased rapidly. To enable effective and efficient usage of modern software logs in reliability engineering, a number of studies have been conducted on automated log analysis. This survey presents a detailed overview of automated log analysis research, including how to automate and assist the writing of logging statements, how to compress logs, how to parse logs into structured event templates, and how to employ logs to detect anomalies, predict failures, and facilitate diagnosis. Additionally, we survey work that releases open-source toolkits and datasets. Based on the discussion of the recent advances, we present several promising future directions toward real-world and next-generation automated log analysis.

Software Engineering

Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics

422 - Shilin He , Jieming Zhu , Pinjia He 2020

Logs have been widely adopted in software system development and maintenance because of the rich system runtime information they contain. In recent years, the increase of software size and complexity leads to the rapid growth of the volume of logs. To handle these large volumes of logs efficiently and effectively, a line of research focuses on intelligent log analytics powered by AI (artificial intelligence) techniques. However, only a small fraction of these techniques have reached successful deployment in industry because of the lack of public log datasets and necessary benchmarking upon them. To fill this significant gap between academia and industry and also facilitate more research on AI-powered log analytics, we have collected and organized loghub, a large collection of log datasets. In particular, loghub provides 17 real-world log datasets collected from a wide range of systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical log usage scenarios, and present a case study on anomaly detection to demonstrate how loghub facilitates the research and practice in this field. Up to the time of this paper writing, loghub datasets have been downloaded over 15,000 times by more than 380 organizations from both industry and academia.

Software Engineering

NLO automated tools for QCD and beyond

136 - Nikolas Kauer 2012

Theoretical predictions for scattering processes with multi-particle final states at next-to-leading order (NLO) in perturbative QCD are essential to fully exploit the physics potential of present and future high-energy colliders. The status of NLO QCD calculations and tools is reviewed.

High Energy Physics - Phenomenology High Energy Physics - Experiment

Physics data management tools: computational evolutions and benchmarks

176 - Mincheol Han 2010

The development of a package for the management of physics data is described: its design, implementation and computational benchmarks. This package improves the data management tools originally developed for Geant4 physics models based on the EADL, EEDL and EPDL97 data libraries. The implementation exploits recent evolutions of the C++ libraries appearing in the C++0x draft, which are intended for inclusion in the next C++ ISO Standard. The new tools improve the computational performance of physics data management.

Computational Physics

Physics Data Management Tools for Monte Carlo Transport: Computational Evolutions and Benchmarks