The insights revealed by process mining rely heavily on the quality of event logs. Activities extracted from healthcare information systems are often recorded as free text, which may lead to inconsistent labels. Such inconsistency in turn produces redundant activity labels, that is, labels that differ in syntax but share the same behaviour. Identifying these labels through data-driven process discovery is difficult and relies heavily on resource-intensive human review. Existing work achieves low accuracy when redundant activity labels occur infrequently or when event logs contain numerical data values as attributes. Both phenomena, however, are commonly observed in healthcare information systems. In this paper, we propose an approach to detect redundant activity labels using control-flow relations and numerical data values from event logs. Natural Language Processing is also integrated into our method to assess the semantic similarity between labels, which provides users with additional insights. We have evaluated our approach on synthetic logs generated from the real-life Sepsis log and in a case study using the MIMIC-III data set. The results demonstrate that our approach can successfully detect redundant activity labels. The approach can add value to the preprocessing step by producing more representative event logs for process mining tasks in the healthcare domain.
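To make the idea of "different syntax, same behaviour" concrete, the following is a minimal sketch of a control-flow comparison between activity labels. It is not the method proposed in the paper: the function names, the Jaccard-based score, the `<start>`/`<end>` markers, and the 0.8 threshold are all assumptions chosen for illustration, and the sketch ignores the numerical data values and the semantic similarity that the abstract also mentions.

```python
from collections import defaultdict
from itertools import combinations


def control_flow_context(traces):
    """Map each activity label to its sets of direct predecessors and successors."""
    preds = defaultdict(set)
    succs = defaultdict(set)
    for trace in traces:
        # Hypothetical artificial markers so first/last activities have a context too.
        padded = ["<start>"] + list(trace) + ["<end>"]
        for prev, cur, nxt in zip(padded, padded[1:], padded[2:]):
            preds[cur].add(prev)
            succs[cur].add(nxt)
    return preds, succs


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_redundant_pairs(traces, threshold=0.8):
    """Return label pairs whose predecessor/successor sets largely coincide.

    The score (mean of two Jaccard similarities) and the threshold are
    illustrative assumptions, not values taken from the paper.
    """
    preds, succs = control_flow_context(traces)
    pairs = []
    for x, y in combinations(sorted(preds), 2):
        score = (jaccard(preds[x], preds[y]) + jaccard(succs[x], succs[y])) / 2
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs


# Toy event log: each trace is the ordered list of activity labels of one case.
traces = [
    ["Register", "Blood test", "Discharge"],
    ["Register", "blood-test", "Discharge"],
    ["Register", "X-ray", "Admit", "Discharge"],
]
pairs = candidate_redundant_pairs(traces)
print(pairs)
```

On this toy log, only "Blood test" and "blood-test" share both their predecessors and their successors, so they are the only pair flagged as candidates. A human reviewer, or a further check on data attributes, would still be needed to confirm that the two labels really denote the same activity.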
Providing appropriate structures around human resources can streamline operations and thus facilitate the competitiveness of an organization. To achieve this goal, modern organizations need to acquire an accurate and timely understanding of human res
Generalization is a central problem in Machine Learning. Indeed, most prediction methods require careful calibration of hyperparameters, usually carried out on a hold-out validation dataset, to achieve generalization. The main goal of this paper
In the current world of economic crises, cost control is one of the chief concerns for all types of industries, especially for small vendors. Small vendors are supposed to minimize their budget on Information Technology by reducing the ini
The query log of a DBMS is a powerful resource. It enables many practical applications, including query optimization and user experience enhancement. And yet, mining SQL queries is a difficult task. The fundamental problem is that queries are symboli
Selecting the best items in a dataset is a common task in data exploration. However, the concept of best lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Never