
Effective Removal of Operational Log Messages: an Application to Model Inference

Added by Donghwan Shin
Publication date: 2020
Language: English





Model inference aims to extract accurate models from the execution logs of software systems. In practice, however, logs may contain noise that degrades the performance of model inference. One common form of noise arises in system logs that contain not only transactional messages---logging the functional behavior of the system---but also operational messages---recording the operational state of the system (e.g., a periodic heartbeat to keep track of memory usage). In low-quality logs, transactional and operational messages are randomly interleaved, leading to the erroneous inclusion of operational behaviors in a system model, which should ideally reflect only the functional behavior of the system. It is therefore important to remove operational messages from the logs before inferring models. In this paper, we propose LogCleaner, a novel technique for removing operational log messages. LogCleaner first performs a periodicity analysis to filter out periodic messages, and then performs a dependency analysis to compute the degree of dependency of each log message and remove operational messages based on those dependencies. Experimental results on two proprietary and 11 publicly available log datasets show that LogCleaner, on average, accurately removes 98% of the operational messages while preserving 81% of the transactional messages. Furthermore, using logs pre-processed with LogCleaner decreases the execution time of model inference (with a speed-up ranging from 1.5 to 946.7 depending on the characteristics of the system) and significantly improves the accuracy of the inferred models, by increasing their ability to accept correct system behaviors (+43.8 pp on average, where pp = percentage points) and to reject incorrect system behaviors (+15.0 pp on average).
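
To make the two-stage idea concrete, the sketch below mimics it in plain Python: a periodicity filter based on the coefficient of variation of inter-arrival times, followed by a successor-based dependency score. The thresholds (max_cv, min_dep) and the heuristics themselves are illustrative assumptions, not the actual LogCleaner algorithm.

```python
from collections import Counter, defaultdict

def filter_periodic(events, max_cv=0.1):
    """Drop message templates whose inter-arrival times are near-constant
    (low coefficient of variation), i.e. likely periodic/operational.
    `events` is a time-sorted list of (timestamp, template) pairs."""
    by_tpl = defaultdict(list)
    for ts, tpl in events:
        by_tpl[tpl].append(ts)
    periodic = set()
    for tpl, times in by_tpl.items():
        gaps = [b - a for a, b in zip(times, times[1:])]
        if len(gaps) < 2:
            continue
        mean = sum(gaps) / len(gaps)
        std = (sum((g - mean) ** 2 for g in gaps) / len(gaps)) ** 0.5
        if mean > 0 and std / mean <= max_cv:
            periodic.add(tpl)
    return [e for e in events if e[1] not in periodic]

def filter_low_dependency(events, min_dep=0.5):
    """Keep templates that have at least one strongly recurring successor;
    operational messages interleave at random, so their successor
    distribution is spread thin and their score stays low (a heuristic
    stand-in for the paper's dependency analysis)."""
    succ = defaultdict(Counter)
    templates = [tpl for _, tpl in events]
    for cur, nxt in zip(templates, templates[1:]):
        succ[cur][nxt] += 1
    dep = {t: max(c.values()) / sum(c.values()) for t, c in succ.items()}
    return [e for e in events if dep.get(e[1], 0.0) >= min_dep]

# Usage, assuming `log_events` is a parsed, time-sorted log:
#   cleaned = filter_low_dependency(filter_periodic(log_events))
```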



Related research

In the context of End-to-End testing of web applications, automated exploration techniques (a.k.a. crawling) are widely used to infer state-based models of the site under test. These models, in which states represent features of the web application and transitions represent reachability relationships, can be used for several model-based testing tasks, such as test case generation. However, current exploration techniques often lead to models containing many near-duplicate states, i.e., states representing slightly different pages that are in fact instances of the same feature. This has a negative impact on the subsequent model-based testing tasks, adversely affecting, for example, the size, running time, and achieved coverage of the generated test suites. As a web page is naturally represented by its tree-structured DOM, we propose a novel near-duplicate detection technique, based on Tree Kernel (TK) functions, to improve the model inference of web applications. TKs are a class of functions that compute the similarity between tree-structured objects; they have been extensively investigated and successfully applied in the Natural Language Processing domain. To evaluate the capability of the proposed approach in detecting near-duplicate web pages, we conducted preliminary classification experiments on a freely available, massive dataset of about 100k manually annotated web page pairs, and compared the classification performance of the proposed approach with that of other state-of-the-art near-duplicate detection techniques. Preliminary results show that our approach outperforms state-of-the-art techniques on the near-duplicate detection classification task. These promising results indicate that TKs can be applied to near-duplicate detection in the context of web application model inference, and they motivate further research in this direction.
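
As an illustration of the kind of similarity TKs compute, the following sketch implements a basic subtree kernel over DOM trees represented as (tag, children) tuples; the encoding, normalization, and threshold-based near-duplicate decision are simplified assumptions rather than the specific TK variant studied in the paper.

```python
from collections import Counter

def subtrees(node, bag):
    """Encode every complete subtree rooted at `node` as a canonical
    string and count it in `bag`; `node` is a (tag, children) tuple."""
    tag, children = node
    encoded = [subtrees(child, bag) for child in children]
    code = f"{tag}({','.join(encoded)})"
    bag[code] += 1
    return code

def subtree_kernel(t1, t2):
    """Count pairs of identical complete subtrees in two DOM trees
    (a basic tree-kernel instance)."""
    b1, b2 = Counter(), Counter()
    subtrees(t1, b1)
    subtrees(t2, b2)
    return sum(b1[c] * b2[c] for c in b1.keys() & b2.keys())

def similarity(t1, t2):
    """Cosine-style normalization so the score lies in [0, 1]; a
    threshold on this score flags candidate near-duplicate pages."""
    return subtree_kernel(t1, t2) / (
        subtree_kernel(t1, t1) * subtree_kernel(t2, t2)) ** 0.5

# Two tiny DOMs that differ in a single node:
page_a = ("html", [("body", [("div", [("p", [])]), ("div", [])])])
page_b = ("html", [("body", [("div", [("p", [])]), ("span", [])])])
print(similarity(page_a, page_b))   # 0.4 for this toy pair
```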
Space-based transit search missions such as Kepler are collecting large numbers of stellar light curves of unprecedented photometric precision and time coverage. However, before this scientific goldmine can be exploited fully, the data must be cleaned of instrumental artefacts. We present a new method to correct common-mode systematics in large ensembles of very high precision light curves. It is based on a Bayesian linear basis model and uses shrinkage priors for robustness, variational inference for speed, and a de-noising step based on empirical mode decomposition to prevent the introduction of spurious noise into the corrected light curves. After demonstrating the performance of our method on a synthetic dataset, we apply it to the first month of Kepler data. We compare the results to the output of the Kepler pipeline's Pre-search Data Conditioning, and show that the two generally give similar results, but that the light curves corrected with our approach have lower scatter, on average, on both long and short timescales. We finish by discussing some limitations of our method and outlining avenues for further development. The trend-corrected data produced by our approach are publicly available.
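
A much-simplified sketch of the linear-basis idea is shown below: the shared basis vectors are taken from an SVD of the ensemble, and each light curve is fit by ridge regression and de-trended. The shrinkage priors, variational inference, and EMD de-noising steps of the actual method are not modeled here; this is only an illustration of the common-mode correction concept.

```python
import numpy as np

def correct_common_mode(flux, n_basis=4, ridge=1e-3):
    """Remove common-mode trends from an ensemble of light curves.

    flux : (n_stars, n_times) array of relative fluxes.
    The basis trends are the leading temporal principal components of
    the ensemble; each star is fit by ridge regression (a crude
    stand-in for shrinkage priors + variational inference) and the
    fitted trend is subtracted."""
    resid = flux - flux.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    basis = vt[:n_basis]                                 # (n_basis, n_times)
    gram = basis @ basis.T + ridge * np.eye(n_basis)
    weights = np.linalg.solve(gram, basis @ resid.T)     # (n_basis, n_stars)
    trend = weights.T @ basis                            # (n_stars, n_times)
    return flux - trend

# Toy ensemble: a shared sinusoidal trend plus independent noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
shared = 0.01 * np.sin(2 * np.pi * 3 * t)
flux = 1.0 + shared + 0.001 * rng.standard_normal((20, t.size))
corrected = correct_common_mode(flux)
print(flux.std(), corrected.std())   # scatter drops after correction
```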
One of the main goals of the feasibility study MOSE (MOdelling ESO Sites) is to evaluate the performance of a method conceived to forecast the optical turbulence above the ESO sites of the Very Large Telescope and the European Extremely Large Telescope in Chile. The method relies on a dedicated code conceived for the optical turbulence (OT), called Astro-Meso-Nh. In this paper we present the results obtained at the conclusion of this project concerning the performance of this method in forecasting the most relevant parameters related to the optical turbulence (CN2, seeing, isoplanatic angle theta_0, and wavefront coherence time tau_0). Numerical predictions for a very rich statistical sample of nights, uniformly distributed along a solar year and belonging to different years, have been compared to observations, and different statistical operators have been analyzed: the classical bias and RMSE, as well as more sophisticated operators derived from contingency tables that quantify the score of success of a predictive method, such as the percentage of correct detection (PC) and the probability of detecting a parameter within a specific range of values (POD). The main conclusion of the study is that the Astro-Meso-Nh model already provides performances good enough to guarantee a non-negligible positive impact on the Service Mode of top-class telescopes and ELTs. A demonstrator for an automatic and operational version of the Astro-Meso-Nh model will soon be implemented on the sites of the VLT and E-ELT.
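
For readers unfamiliar with the contingency-table scores mentioned above, the sketch below computes PC and a per-class POD from forecast and observed class labels; the class definitions and score conventions used in the paper may differ in detail from these textbook ones.

```python
import numpy as np

def contingency_table(observed, forecast, n_classes):
    """table[i, j] counts nights observed in class i and forecast in class j."""
    table = np.zeros((n_classes, n_classes), dtype=int)
    for o, f in zip(observed, forecast):
        table[o, f] += 1
    return table

def percent_correct(table):
    """PC: fraction of nights for which forecast and observed classes agree."""
    return np.trace(table) / table.sum()

def prob_of_detection(table, cls):
    """POD for one class: fraction of observed occurrences of `cls`
    that were also forecast as `cls`."""
    observed_in_cls = table[cls].sum()
    return table[cls, cls] / observed_in_cls if observed_in_cls else float("nan")

# Toy example with 3 turbulence classes (e.g. good / median / bad nights):
obs = [0, 0, 1, 2, 1, 0, 2, 2, 1, 0]
fcst = [0, 1, 1, 2, 1, 0, 2, 1, 1, 0]
tab = contingency_table(obs, fcst, 3)
print(percent_correct(tab), prob_of_detection(tab, 2))   # 0.8, 0.666...
```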
A common task in physics, information theory, and other fields is the analysis of properties of subsystems of a given system. Given the covariance matrix $M$ of a system of $n$ coupled variables, the covariance matrices of the subsystems are principal submatrices of $M$. The rapid growth with $n$ of the set of principal submatrices makes it impractical to exhaustively study each submatrix for even modestly-sized systems. It is therefore of great interest to derive methods for approximating the distributions of important submatrix properties for a given matrix. Motivated by the importance of differential entropy as a systemic measure of disorder, we study the distribution of log-determinants of principal $k \times k$ submatrices when the covariance matrix has bounded condition number. We derive upper bounds for the right tail and the variance of the distribution of minors, and we use these in turn to derive upper bounds on the standard error of the sample mean of subsystem entropy. Our results demonstrate that, despite the rapid growth of the set of subsystems with $n$, the number of samples that are needed to bound the sampling error is asymptotically independent of $n$. Instead, it is sufficient to increase the number of samples in linear proportion to $k$ to achieve a desired sampling accuracy.
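
The following sketch illustrates the sampling scenario the bounds apply to: drawing random $k \times k$ principal submatrices of a well-conditioned covariance matrix, computing their log-determinants, and estimating the mean together with its standard error. It is a plain Monte Carlo illustration, not a derivation of the paper's bounds.

```python
import numpy as np

def sample_subsystem_entropy(M, k, n_samples=1000, rng=None):
    """Monte Carlo estimate of the mean log-determinant of k x k
    principal submatrices of the covariance matrix M (the Gaussian
    subsystem entropy equals this up to additive/multiplicative
    constants). Returns (sample mean, standard error)."""
    rng = np.random.default_rng(rng)
    n = M.shape[0]
    logdets = np.empty(n_samples)
    for i in range(n_samples):
        idx = rng.choice(n, size=k, replace=False)       # random subsystem
        _, logdets[i] = np.linalg.slogdet(M[np.ix_(idx, idx)])
    mean = logdets.mean()
    sem = logdets.std(ddof=1) / np.sqrt(n_samples)       # standard error
    return mean, sem

# Toy covariance with bounded condition number: a dominant diagonal
# plus a mild random symmetric perturbation (positive definite).
rng = np.random.default_rng(1)
A = 0.1 * rng.standard_normal((50, 50))
M = 1.5 * np.eye(50) + (A + A.T) / 2
print(sample_subsystem_entropy(M, k=10))
```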
This paper presents a new language called APSL for formally describing protocols to facilitate automated testing. Many real-world communication protocols exchange messages whose structures are not trivial; e.g., they may consist of multiple and nested fields, some of which may be optional, and some of which may have values that depend on other fields. To properly test implementations of such a protocol, it is not sufficient to only explore different orders of sending and receiving messages. We also need to investigate whether the implementation indeed produces correctly formatted messages, and whether it responds correctly when it receives different variations of every message type. APSL's main contribution is its sublanguage, which is expressive enough to describe complex message formats, both text-based and binary. As an example, this paper also presents a case study where APSL is used to model and test a subset of the Courier IMAP email server.
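
APSL's own syntax is not reproduced here; the sketch below merely illustrates, in plain Python, the kind of message structure the abstract describes (a length field whose value depends on the payload, plus a derived checksum) and the generation of well-formed and malformed variants an implementation under test should accept or reject. The format is invented for illustration.

```python
import struct

def encode_message(msg_type, payload, checksum=None):
    """Encode a toy binary message: a 1-byte type, a 2-byte big-endian
    length that depends on the payload, the payload itself, and a
    trailing checksum derived from the other fields (illustrative
    format only, not APSL)."""
    body = struct.pack(">BH", msg_type, len(payload)) + payload
    if checksum is None:
        checksum = sum(body) & 0xFF
    return body + bytes([checksum])

def variations(msg_type, payload):
    """Yield labelled variants an implementation under test should
    accept or reject: one well-formed message plus two malformed ones."""
    good = encode_message(msg_type, payload)
    yield "valid", good
    # Corrupt the dependent length field.
    yield "bad length", good[:1] + struct.pack(">H", len(payload) + 1) + good[3:]
    # Corrupt the derived checksum field.
    yield "bad checksum", good[:-1] + bytes([good[-1] ^ 0xFF])

for label, raw in variations(0x01, b"hello"):
    print(label, raw.hex())
```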