
The Connection between Process Complexity of Event Sequences and Models discovered by Process Mining

Published by: Maxim Vidgof
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Process mining is a research area focusing on the design of algorithms that can automatically provide insights into business processes by analysing historic process execution data, known as event logs. Among the most popular algorithms are those for automated process discovery, whose ultimate goal is to generate the best process model that summarizes the behaviour recorded in the input event log. Over the past decade, several process discovery algorithms have been proposed, but until now this research was driven by the implicit assumption that a better algorithm would discover better process models, no matter the characteristics of the input event log. In this paper, we take a step back and question that assumption. Specifically, we investigate the relations between measures capturing characteristics of the input event log and the quality of the discovered process models. To this end, we review the state-of-the-art process complexity measures, propose a new process complexity measure based on graph entropy, and analyze this set of complexity measures on an extensive collection of event logs and corresponding automatically discovered process models. Our analysis shows that many process complexity measures correlate with the quality of the discovered process models, demonstrating the potential of using complexity measures as predictors for the quality of process models discovered with state-of-the-art process discovery algorithms. This finding is important for process mining research, as it highlights that not only algorithms, but also the connections between input data and output quality, should be studied.
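The abstract does not spell out the graph-entropy measure itself. As a rough, hypothetical illustration of the underlying idea (not the authors' definition), the sketch below builds a directly-follows graph from an event log and computes the Shannon entropy of its edge-frequency distribution, one simple way to turn recorded behaviour into a single complexity score.

import math
from collections import Counter

def directly_follows_graph(traces):
    """Count directly-follows pairs over a list of traces, where each
    trace is a sequence of activity labels."""
    edges = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges

def dfg_entropy(edges):
    """Shannon entropy (bits) of the edge-frequency distribution of the
    directly-follows graph; higher values suggest less predictable behaviour."""
    total = sum(edges.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in edges.values())

# Tiny example log with three trace variants.
log = [
    ["register", "check", "approve", "archive"],
    ["register", "check", "reject", "archive"],
    ["register", "check", "check", "approve", "archive"],
]
print(dfg_entropy(directly_follows_graph(log)))

In the paper's setting, such a complexity score would then be correlated against quality measures (e.g., fitness or precision) of the models automatically discovered from the same logs.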


Read also

Process mining studies ways to derive value from process executions recorded in event logs of IT systems, with process discovery being the task of inferring a process model for an event log emitted by some unknown system. One quality criterion for discovered process models is generalization. Generalization seeks to quantify how well the discovered model describes future executions of the system, and is perhaps the least understood quality criterion in process mining. The lack of understanding is primarily a consequence of generalization seeking to measure properties over the entire future behavior of the system, when the only available sample of behavior is that provided by the event log itself. In this paper, we draw inspiration from computational statistics and employ a bootstrap approach to estimate properties of a population based on a sample. Specifically, we define an estimator of the model's generalization based on the event log it was discovered from, and then use bootstrapping to measure the generalization of the model with respect to the system, and its statistical significance. Experiments demonstrate the feasibility of the approach in industrial settings.
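The abstract does not give the estimator's exact form. The sketch below is a hypothetical, simplified stand-in (the variant-coverage proxy and all names are assumptions, not the paper's estimator) that only illustrates the bootstrap mechanics: resample traces from the log with replacement, recompute a log-based generalization proxy on each resample, and summarize the spread of the estimates.

import random
import statistics

def generalization_proxy(sample, model_variants):
    """Hypothetical proxy: fraction of distinct trace variants in the
    (re)sampled log that the discovered model can replay (model_variants)."""
    variants = {tuple(trace) for trace in sample}
    if not variants:
        return 0.0
    return len(variants & model_variants) / len(variants)

def bootstrap_generalization(log, model_variants, n_resamples=1000, seed=0):
    """Resample traces with replacement and collect proxy estimates,
    returning their mean and standard deviation."""
    rng = random.Random(seed)
    estimates = [
        generalization_proxy([rng.choice(log) for _ in log], model_variants)
        for _ in range(n_resamples)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)

log = [["a", "b", "c"], ["a", "c"], ["a", "b", "b", "c"]]
model_variants = {("a", "b", "c"), ("a", "c")}
print(bootstrap_generalization(log, model_variants))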
We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters is unknown a priori. We present an easily computable, closed-form parametric expression for the conditional likelihood, in which hyperparameters are recursively updated as a function of the streaming data assuming conjugate priors. Motivated by large-sample asymptotics, we propose a novel adaptive low-complexity design for the Dirichlet process concentration parameter and show that the number of classes grows at most at a logarithmic rate. We further prove that in the large-sample limit, the conditional likelihood and data predictive distribution become asymptotically Gaussian. We demonstrate through experiments on synthetic and real data sets that our approach is superior to other online state-of-the-art methods.
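The paper's closed-form expression is not reproduced here. As a generic illustration of what a recursive conjugate update looks like for a univariate Gaussian component with a Normal-Inverse-Gamma prior (standard textbook recursions, not the authors' exact formulas), a single new observation $x$ updates the hyperparameters as

\begin{aligned}
\kappa_{t+1} &= \kappa_t + 1, \qquad
\mu_{t+1} = \frac{\kappa_t \mu_t + x}{\kappa_t + 1}, \\
\alpha_{t+1} &= \alpha_t + \tfrac{1}{2}, \qquad
\beta_{t+1} = \beta_t + \frac{\kappa_t (x - \mu_t)^2}{2(\kappa_t + 1)},
\end{aligned}

and the corresponding conditional (predictive) likelihood of the next observation is a Student-t density with $2\alpha_t$ degrees of freedom, location $\mu_t$, and scale $\sqrt{\beta_t(\kappa_t + 1)/(\alpha_t \kappa_t)}$.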
The applicability of process mining techniques hinges on the availability of event logs capturing the execution of a business process. In some use cases, particularly those involving customer-facing processes, these event logs may contain private information. Data protection regulations restrict the use of such event logs for analysis purposes. One way of circumventing these restrictions is to anonymize the event log to the extent that no individual can be singled out using the anonymized log. This paper addresses the problem of anonymizing an event log in order to guarantee that, upon disclosure of the anonymized log, the probability that an attacker may single out any individual represented in the original log does not increase by more than a threshold. The paper proposes a differentially private disclosure mechanism, which oversamples the cases in the log and adds noise to the timestamps to the extent required to achieve the above privacy guarantee. The paper reports on an empirical evaluation of the proposed approach using 14 real-life event logs in terms of data utility loss and computational efficiency.
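The concrete mechanism is defined in the paper itself. Purely as an illustration of one ingredient, timestamp perturbation with Laplace noise scaled by sensitivity/epsilon (all names and the event-tuple format below are assumptions, not the authors' implementation):

import random

def laplace_noise(scale, rng):
    """Zero-mean Laplace(scale) sample, drawn as the difference of two
    independent exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_timestamps(events, sensitivity, epsilon, seed=0):
    """Add Laplace noise of scale sensitivity/epsilon (in seconds) to each
    event timestamp; events are (case_id, activity, timestamp) tuples."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [(case, activity, ts + laplace_noise(scale, rng))
            for case, activity, ts in events]

events = [("case-1", "register", 0.0), ("case-1", "approve", 3600.0)]
print(perturb_timestamps(events, sensitivity=60.0, epsilon=1.0))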
Process models constitute crucial artifacts in modern information systems and, hence, the proper comprehension of these models is of utmost importance in the utilization of such systems. Generally, process models are considered from two different perspectives: process modelers and readers. Both perspectives share similarities and differences in the comprehension of process models (e.g., diverse experiences when working with process models). The literature proposed many rules and guidelines to ensure a proper comprehension of process models for both perspectives. As a novel contribution in this context, this paper introduces the Process Model Comprehension Framework (PMCF) as a first step towards the measurement and quantification of the perspectives of process modelers and readers as well as the interaction of both regarding the comprehension of process models. Therefore, the PMCF describes an Evaluation Theory Tree based on the Communication Theory as well as the Conceptual Modeling Quality Framework and considers a total of 96 quality metrics in order to quantify process model comprehension. Furthermore, the PMCF was evaluated in a survey with 131 participants and has been implemented as well as applied successfully in a practical case study including 33 participants. To conclude, the PMCF allows for the identification of pitfalls and provides related information about how to assist process modelers as well as readers in order to foster and enable a proper comprehension of process models.
Point process models have been used to analyze interaction event times on a social network, in the hope of providing valuable insights for social science research. However, the diagnostics and visualization of the modeling results from such an analysis have received limited discussion in the literature. In this paper, we develop a systematic set of diagnostic tools and visualizations for point process models fitted to data from a network setting. We analyze the residual process and the Pearson residual on the network by inspecting their structure and clustering structure. Equipped with these tools, we can validate whether a model adequately captures the temporal and/or network structures in the observed data. The utility of our approach is demonstrated using simulation studies and point process models applied to a study of animal social interactions.
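For context, the standard (non-network) definitions of these residuals for a temporal point process with fitted intensity $\hat\lambda(t)$, observed on a window $B$ containing event times $t_i$, are the raw residual and the Pearson residual (the network-adapted versions used by the authors may differ):

R(B) = N(B) - \int_B \hat\lambda(t)\,dt, \qquad
R_P(B) = \sum_{t_i \in B} \frac{1}{\sqrt{\hat\lambda(t_i)}} - \int_B \sqrt{\hat\lambda(t)}\,dt.

A well-specified model should yield residuals close to zero on every window, which is what the diagnostic plots and clustering checks inspect.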