
The Connection between Process Complexity of Event Sequences and Models discovered by Process Mining

Added by Maxim Vidgof
Publication date: 2021
Research language: English





Process mining is a research area focusing on the design of algorithms that can automatically provide insights into business processes by analysing historic process execution data, known as event logs. Among the most popular algorithms are those for automated process discovery, whose ultimate goal is to generate the best process model that summarizes the behaviour recorded in the input event log. Over the past decade, several process discovery algorithms have been proposed but, until now, this research has been driven by the implicit assumption that a better algorithm would discover better process models, no matter the characteristics of the input event log. In this paper, we take a step back and question that assumption. Specifically, we investigate the relations between measures capturing characteristics of the input event log and the quality of the discovered process models. To this end, we review the state-of-the-art process complexity measures, propose a new process complexity measure based on graph entropy, and analyze this set of complexity measures on an extensive collection of event logs and corresponding automatically discovered process models. Our analysis shows that many process complexity measures correlate with the quality of the discovered process models, demonstrating the potential of using complexity measures as predictors for the quality of process models discovered with state-of-the-art process discovery algorithms. This finding is important for process mining research, as it highlights that not only algorithms, but also connections between input data and output quality should be studied.
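
As an illustration of what a graph-based complexity measure can look like, the sketch below builds a directly-follows graph from a toy event log and computes the Shannon entropy of its edge-frequency distribution. This is a generic graph-entropy variant for illustration only; it is not necessarily the exact measure proposed in the paper.

```python
import math
from collections import Counter

def directly_follows_graph(traces):
    """Build a directly-follows graph: an edge (a, b) with its frequency
    for every pair of consecutive activities in any trace."""
    edges = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges

def graph_entropy(edges):
    """Shannon entropy of the edge-frequency distribution of the graph.
    One common graph-entropy variant, used here purely as an example."""
    total = sum(edges.values())
    return -sum((f / total) * math.log2(f / total) for f in edges.values())

# Toy event log: each trace is a sequence of activity labels.
log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "b", "c"]]
print(round(graph_entropy(directly_follows_graph(log)), 3))
```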



Related research

Process mining studies ways to derive value from process executions recorded in event logs of IT systems, with process discovery being the task of inferring a process model for an event log emitted by some unknown system. One quality criterion for discovered process models is generalization. Generalization seeks to quantify how well the discovered model describes future executions of the system, and is perhaps the least understood quality criterion in process mining. The lack of understanding is primarily a consequence of generalization seeking to measure properties over the entire future behavior of the system, when the only available sample of behavior is that provided by the event log itself. In this paper, we draw inspiration from computational statistics, and employ a bootstrap approach to estimate properties of a population based on a sample. Specifically, we define an estimator of the model's generalization based on the event log it was discovered from, and then use bootstrapping to measure the generalization of the model with respect to the system, and its statistical significance. Experiments demonstrate the feasibility of the approach in industrial settings.
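
As a rough sketch of the bootstrap idea, the code below resamples a toy log with replacement to simulate future logs from the same system and scores each resample against the model. The predicate `model_accepts` is a hypothetical stand-in: a real evaluation would replay traces against the discovered model, for example via alignments.

```python
import random

def bootstrap_generalization(log, model_accepts, n_boot=1000, seed=42):
    """Bootstrap estimate of generalization: repeatedly resample the log
    with replacement and measure the fraction of resampled traces the
    model accepts; returns the mean score and a 95% percentile interval."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        sample = rng.choices(log, k=len(log))  # resample with replacement
        scores.append(sum(model_accepts(t) for t in sample) / len(sample))
    scores.sort()
    mean = sum(scores) / n_boot
    ci = (scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot)])
    return mean, ci

# Toy setup: the "model" is represented by the set of traces it allows.
log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c"), ("a", "b")]
model_language = {("a", "b", "c"), ("a", "c", "b")}
print(bootstrap_generalization(log, model_language.__contains__))
```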
We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters is unknown a priori. We present an easily computable, closed form parametric expression for the conditional likelihood, in which hyperparameters are recursively updated as a function of the streaming data, assuming conjugate priors. Motivated by large-sample asymptotics, we propose a novel adaptive low-complexity design for the Dirichlet process concentration parameter and show that the number of classes grows at most at a logarithmic rate. We further prove that in the large-sample limit, the conditional likelihood and data predictive distribution become asymptotically Gaussian. We demonstrate through experiments on synthetic and real data sets that our approach is superior to other online state-of-the-art methods.
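
A minimal sketch of this kind of sequential inference, simplified to one-dimensional data with known observation variance, a conjugate Normal prior on cluster means, and greedy MAP assignments in place of the paper's exact procedure:

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

class OnlineDPGMM:
    """Greedy sequential inference for a 1-D Dirichlet process mixture of
    Gaussians: each point joins the cluster maximizing the CRP-weighted
    predictive density, with conjugate Normal updates of cluster means."""
    def __init__(self, alpha=1.0, obs_var=1.0, prior_mean=0.0, prior_var=10.0):
        self.alpha, self.obs_var = alpha, obs_var
        self.prior = (prior_mean, prior_var)
        self.clusters = []  # per cluster: [count, posterior_mean, posterior_var]

    def _predictive(self, x, mean, var):
        # Posterior predictive under a Normal prior with known observation
        # variance: Gaussian with variance = posterior var + observation var.
        return gaussian_pdf(x, mean, var + self.obs_var)

    def add_point(self, x):
        m0, v0 = self.prior
        # CRP weights: existing clusters in proportion to their size,
        # a fresh cluster in proportion to alpha.
        scores = [c[0] * self._predictive(x, c[1], c[2]) for c in self.clusters]
        scores.append(self.alpha * self._predictive(x, m0, v0))
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == len(self.clusters):
            self.clusters.append([0, m0, v0])
        n, mean, var = self.clusters[k]
        # Conjugate Normal update of the cluster-mean posterior.
        new_var = 1.0 / (1.0 / var + 1.0 / self.obs_var)
        new_mean = new_var * (mean / var + x / self.obs_var)
        self.clusters[k] = [n + 1, new_mean, new_var]
        return k

model = OnlineDPGMM()
for x in [0.1, -0.2, 5.0, 5.3, 0.05, 5.1]:
    model.add_point(x)
print(len(model.clusters))  # expected: 2 clusters, near 0 and near 5
```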
The applicability of process mining techniques hinges on the availability of event logs capturing the execution of a business process. In some use cases, particularly those involving customer-facing processes, these event logs may contain private information. Data protection regulations restrict the use of such event logs for analysis purposes. One way of circumventing these restrictions is to anonymize the event log to the extent that no individual can be singled out using the anonymized log. This paper addresses the problem of anonymizing an event log in order to guarantee that, upon disclosure of the anonymized log, the probability that an attacker may single out any individual represented in the original log does not increase by more than a threshold. The paper proposes a differentially private disclosure mechanism, which oversamples the cases in the log and adds noise to the timestamps to the extent required to achieve the above privacy guarantee. The paper reports on an empirical evaluation of the proposed approach using 14 real-life event logs in terms of data utility loss and computational efficiency.
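
The sketch below illustrates the general shape of such a mechanism on a toy log: cases are oversampled with replacement and every timestamp is perturbed with Laplace noise. The unit sensitivity and noise scale 1/epsilon are simplifying assumptions; the paper calibrates the noise to the stated privacy guarantee.

```python
import random

def anonymize_log(cases, epsilon, oversample_factor=2, seed=0):
    """Toy differentially-private-style disclosure: oversample cases with
    replacement and add Laplace noise to each event timestamp.
    Each case is a list of (activity, timestamp) pairs."""
    rng = random.Random(seed)

    def laplace(scale):
        # Difference of two i.i.d. exponentials is Laplace(0, scale).
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    anonymized = []
    for _ in range(oversample_factor * len(cases)):
        case = rng.choice(cases)  # oversample cases with replacement
        anonymized.append([(act, ts + laplace(1.0 / epsilon))
                           for act, ts in case])
    return anonymized

log = [[("register", 0.0), ("approve", 3.5)],
       [("register", 1.0), ("reject", 2.0)]]
print(anonymize_log(log, epsilon=0.5)[0])
```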
Process models constitute crucial artifacts in modern information systems and, hence, the proper comprehension of these models is of utmost importance in the utilization of such systems. Generally, process models are considered from two different perspectives: process modelers and readers. Both perspectives share similarities and differences in the comprehension of process models (e.g., diverse experiences when working with process models). The literature has proposed many rules and guidelines to ensure a proper comprehension of process models for both perspectives. As a novel contribution in this context, this paper introduces the Process Model Comprehension Framework (PMCF) as a first step towards the measurement and quantification of the perspectives of process modelers and readers, as well as the interaction of both, regarding the comprehension of process models. The PMCF describes an Evaluation Theory Tree based on Communication Theory and the Conceptual Modeling Quality Framework, and considers a total of 96 quality metrics to quantify process model comprehension. Furthermore, the PMCF was evaluated in a survey with 131 participants and has been successfully implemented and applied in a practical case study involving 33 participants. To conclude, the PMCF allows for the identification of pitfalls and provides related information about how to assist process modelers as well as readers in order to foster and enable a proper comprehension of process models.
Point process models have been used to analyze interaction event times on a social network, in the hope of providing valuable insights for social science research. However, the diagnostics and visualization of the modeling results from such an analysis have received limited discussion in the literature. In this paper, we develop a systematic set of diagnostic tools and visualizations for point process models fitted to data from a network setting. We analyze the residual process and Pearson residuals on the network by inspecting their structure and clustering patterns. Equipped with these tools, we can validate whether a model adequately captures the temporal and/or network structures in the observed data. The utility of our approach is demonstrated using simulation studies and point process models applied to a study of animal social interactions.
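
As a toy illustration of one such diagnostic, the sketch below computes Pearson residuals of a fitted intensity over time bins, i.e. (observed count minus expected count) divided by the square root of the expected count; the paper's network-aware residual analysis is considerably richer.

```python
def pearson_residuals(event_times, intensity, bins):
    """Pearson residuals of a fitted point-process intensity over time
    bins. `intensity` is the fitted rate function; the expected count per
    bin is its integral, approximated here by the midpoint rule."""
    residuals = []
    for lo, hi in bins:
        observed = sum(lo <= t < hi for t in event_times)
        expected = intensity((lo + hi) / 2) * (hi - lo)
        residuals.append((observed - expected) / expected ** 0.5)
    return residuals

# Toy data with a constant fitted rate of 2.5 events per unit time.
times = [0.2, 0.5, 0.7, 1.1, 1.8, 2.4, 2.5, 2.9]
bins = [(0, 1), (1, 2), (2, 3)]
print([round(r, 2) for r in pearson_residuals(times, lambda t: 2.5, bins)])
```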
