Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics

Published by: Shilin He

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Shilin He, Jieming Zhu, Pinjia He

Software Engineering

Abstract

Logs have been widely adopted in software system development and maintenance because of the rich system runtime information they contain. In recent years, the increase in software size and complexity has led to rapid growth in the volume of logs. To handle these large volumes of logs efficiently and effectively, a line of research focuses on intelligent log analytics powered by AI (artificial intelligence) techniques. However, only a small fraction of these techniques have reached successful deployment in industry because of the lack of public log datasets and the necessary benchmarking upon them. To fill this significant gap between academia and industry and also facilitate more research on AI-powered log analytics, we have collected and organized loghub, a large collection of log datasets. In particular, loghub provides 17 real-world log datasets collected from a wide range of systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical log usage scenarios, and present a case study on anomaly detection to demonstrate how loghub facilitates research and practice in this field. At the time of writing, loghub datasets have been downloaded over 15,000 times by more than 380 organizations from both industry and academia.

Jieming Zhu, Shilin He, Jinyang Liu, 2018

Logs are imperative in the development and maintenance process of many software systems. They record detailed runtime information that allows developers and support engineers to monitor their systems and dissect anomalous behaviors and errors. The increasing scale and complexity of modern software systems, however, make the volume of logs explode. In many cases, the traditional way of manual log inspection becomes impractical. Many recent studies, as well as industrial tools, resort to powerful text search and machine learning-based analytics solutions. Due to the unstructured nature of logs, a first crucial step is to parse log messages into structured data for subsequent analysis. In recent years, automated log parsing has been widely studied in both academia and industry, producing a series of log parsers based on different techniques. To better understand the characteristics of these log parsers, in this paper, we present a comprehensive evaluation study on automated log parsing and further release the tools and benchmarks for easy reuse. More specifically, we evaluate 13 log parsers on a total of 16 log datasets spanning distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. We report the benchmarking results in terms of accuracy, robustness, and efficiency, which are of practical importance when deploying automated log parsing in production. We also share the success stories and lessons learned in an industrial application at Huawei. We believe that our work could serve as the basis and provide valuable guidance to future research and deployment of automated log parsing.

Software Engineering

A Survey on Automated Log Analysis for Reliability Engineering

Shilin He, Pinjia He, Zhuangbin Chen, 2020

Logs are semi-structured text generated by logging statements in software source code. In recent decades, software logs have become imperative in the reliability assurance mechanism of many software systems because they are often the only data available that record software runtime information. As modern software is evolving into a large scale, the volume of logs has increased rapidly. To enable effective and efficient usage of modern software logs in reliability engineering, a number of studies have been conducted on automated log analysis. This survey presents a detailed overview of automated log analysis research, including how to automate and assist the writing of logging statements, how to compress logs, how to parse logs into structured event templates, and how to employ logs to detect anomalies, predict failures, and facilitate diagnosis. Additionally, we survey work that releases open-source toolkits and datasets. Based on the discussion of the recent advances, we present several promising future directions toward real-world and next-generation automated log analysis.

Software Engineering

Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection

Zhuangbin Chen, Jinyang Liu, Wenwei Gu, 2021

Logs have been an imperative resource to ensure the reliability and continuity of many software systems, especially large-scale distributed systems. They faithfully record runtime information to facilitate system troubleshooting and behavior understanding. Due to the large scale and complexity of modern software systems, the volume of logs has reached an unprecedented level. Consequently, for log-based anomaly detection, conventional methods of manual inspection or even traditional machine learning-based methods become impractical, which serves as a catalyst for the rapid development of deep learning-based solutions. However, there is currently a lack of rigorous comparison among the representative log-based anomaly detectors that resort to neural network models. Moreover, the re-implementation process demands non-trivial effort, and bias can be easily introduced. To better understand the characteristics of different anomaly detectors, in this paper, we provide a comprehensive review and evaluation of five popular models used by six state-of-the-art methods. In particular, four of the selected methods are unsupervised and the remaining two are supervised. These methods are evaluated with two publicly available log datasets, which contain nearly 16 million log messages and 0.4 million anomaly instances in total. We believe our work can serve as a basis in this field and contribute to future academic research and industrial applications.

Software Engineering, Machine Learning

A tomography of the $\log(\langle I\rangle_e)-\log(R_e)$ plane

Mauro D'Onofrio, Cesare Chiosi, 2020

Context. We present a reanalysis of the distribution of galaxies in the $\log(\langle I\rangle_e)-\log(R_e)$ plane under a new theoretical perspective. Aims. Using the data of the WINGS database and those of the Illustris simulation, we demonstrate that the origin of the observed distribution in this parameter space can be understood only by accepting a new interpretation of the $\log(L)$-$\log(\sigma)$ relation. Methods. We simulate the distribution of galaxies in the $\log(\langle I\rangle_e)-\log(R_e)$ plane starting from the new $L=L_0\sigma^\beta$ relation proposed by D'Onofrio et al. (2020) and we discuss the physical mechanisms that are hidden in this empirical law. Results. The artificial distribution obtained assuming that $\beta$ spans both positive and negative values and that $L_0$ changes with $\beta$ is perfectly superposed on the observational data, once it is postulated that the Zone of Exclusion (ZoE) is the limit of virialized and quenched objects. Conclusions. We have demonstrated that the distribution of galaxies in the $\log(\langle I\rangle_e)-\log(R_e)$ plane is not linked to the peculiar light profiles of the galaxies of different luminosity, but originates from the mass assembly history of galaxies, made of merging, star formation events, stellar evolution and quenching of the stellar population.

Astrophysics of Galaxies

Towards Log-Linear Logics with Concrete Domains

Melisachew Wudage Chekol, Jakob Huber, Heiner Stuckenschmidt, 2015

We present $\mathcal{MEL}^{++}$ (M denotes Markov logic networks), an extension of the log-linear description logic $\mathcal{EL}^{++}$-LL with concrete domains, nominals, and instances. We use Markov logic networks (MLNs) in order to find the most probable, classified and coherent $\mathcal{EL}^{++}$ ontology from an $\mathcal{MEL}^{++}$ knowledge base. In particular, we develop a novel way to deal with concrete domains (also known as datatypes) by extending MLN's cutting plane inference (CPI) algorithm.

Artificial Intelligence

Read also