A Machine Learning Approach to Understand Business Processes

L. Maruster
PhD Thesis. Technische Universiteit Eindhoven, Eindhoven, The Netherlands, 2006.


Business processes (in industry, administration, hospitals, etc.) are becoming increasingly complex, and it is difficult to gain a complete understanding of them. The goal of this thesis is to show that machine learning techniques can be used successfully to understand a process on the basis of data, by clustering process-related measures, inducing predictive models, and discovering the process. This goal is achieved through two approaches: (i) classifying process cases (e.g. patients) into logistically homogeneous groups and inducing models that assign a new case to a logistic group, and (ii) discovering the underlying process. In this way, the process can be modelled, analysed and improved. A further benefit is that systems to support and control the processes can be designed more efficiently and operate more effectively.

We focus on the analysis of two kinds of data: aggregated data and sequence data.

Aggregated data result from transforming raw data to capture a specific concept that is not yet explicit in the raw data. This aggregation is similar to feature construction in machine learning. In this thesis, aggregated data are the variables that result from operationalizing the concept of process complexity. These aggregated data are used to develop logistically homogeneous clusters, meaning that elements in different clusters differ in terms of routing complexity. We show that developing homogeneous clusters for a given process is relevant for inducing predictive models: the routing in the process can be predicted using the logistic clusters. We do not aim to provide concrete directives for building control systems; rather, our models should be taken as indications of their potential.
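
The abstract does not state which clustering method or predictive model is used. As an illustration only, the sketch below aggregates hypothetical raw visit records into two per-case routing-complexity features (number of visits and number of distinct departments, both assumptions), groups the cases with plain k-means, and assigns a new case to its nearest cluster centroid. K-means and the nearest-centroid rule are swapped in here for simplicity; the models induced in the thesis may be of a different kind.

```python
import math

# Hypothetical raw data: one (case id, department visited) record per visit.
# The departments and the derived features are illustrative assumptions.
RAW_VISITS = [
    ("p1", "surgery"), ("p1", "radiology"), ("p1", "surgery"),
    ("p2", "cardiology"),
    ("p3", "surgery"), ("p3", "radiology"), ("p3", "lab"), ("p3", "cardiology"),
    ("p4", "lab"),
]


def aggregate(visits):
    """Aggregate raw records into per-case features:
    (number of visits, number of distinct departments)."""
    routes = {}
    for case, dept in visits:
        routes.setdefault(case, []).append(dept)
    return {case: (float(len(r)), float(len(set(r)))) for case, r in routes.items()}


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k distinct points (deterministic)."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep empty clusters in place.
        centroids = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids


features = aggregate(RAW_VISITS)
centroids = kmeans(list(features.values()), k=2)
labels = {case: nearest(f, centroids) for case, f in features.items()}

# A new case is assigned to the group of its nearest centroid.
new_case = aggregate([("p5", "lab"), ("p5", "surgery"), ("p5", "radiology")])["p5"]
print(labels)                        # {'p1': 0, 'p2': 1, 'p3': 0, 'p4': 1}
print(nearest(new_case, centroids))  # 0
```

Here cluster 0 collects the cases with longer, more varied routes (p1, p3) and cluster 1 the single-visit cases, so the cluster label acts as a coarse indicator of routing complexity for a new case.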

Sequence data describe the sequence of activities over time in a process execution. They are recorded in a process log during the execution of the process steps. Due to exceptions, missing or incomplete registration, and errors, the data can be noisy. Using sequence data, the goal is to derive a model that explains the recorded events. When the log is free of noise and contains sufficient information, we provide a method for building a process model from the process log. Moreover, we discuss the class of models that can be accurately rediscovered from the process log. Machine learning techniques are especially useful for discovering a process model from noisy sequence data. Such a model can be further analyzed and possibly improved, but these issues are beyond the scope of this thesis.
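
The abstract does not describe the discovery method itself. As a hedged illustration, the sketch below derives ordering relations from a process log in the style of the alpha algorithm: a is taken to cause b when a is directly followed by b in some trace but b is never directly followed by a. It then shows how noise breaks this strict rule, and swaps in the frequency-based dependency score (|a>b| - |b>a|) / (|a>b| + |b>a| + 1) known from heuristic mining as one noise-tolerant alternative, where |a>b| counts how often a is directly followed by b. The example logs and the threshold of 0.8 are assumptions, and this is not necessarily the technique used in the thesis.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def causal_relations(log):
    """Noise-free case: a -> b iff a>b is observed and b>a never is."""
    df = directly_follows(log)
    return {(a, b) for (a, b) in df if (b, a) not in df}


def dependency(df, a, b):
    """Frequency-based dependency score in (-1, 1), as in heuristic mining."""
    ab, ba = df[(a, b)], df[(b, a)]  # Counter returns 0 for unseen pairs
    return (ab - ba) / (ab + ba + 1)


def noisy_causal_relations(log, threshold=0.8):
    """Keep a -> b only if its dependency score reaches the (assumed) threshold."""
    df = directly_follows(log)
    return {(a, b) for (a, b) in df if dependency(df, a, b) >= threshold}


# Hypothetical log: A, then B and C in either order (or E alone), then D.
clean_log = [list("ABCD"), list("ACBD"), list("AED")]
# Same behaviour repeated, plus one erroneous trace with D and C swapped.
noisy_log = clean_log * 20 + [list("ABDC")]

print(sorted(causal_relations(clean_log)))
print(sorted(causal_relations(noisy_log)))        # C -> D is lost
print(sorted(noisy_causal_relations(noisy_log)))  # C -> D is recovered
```

In the noise-free log, B and C occur in both orders, so neither is causal for the other, which is how parallelism shows up in these relations. A single swapped trace is enough to make the strict rule drop C -> D, whereas the frequency-based score still keeps it (19/22 is above the assumed threshold).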

By applying our proposed methods to different data (e.g. hospital data, workflow data and administrative governmental data), we have shown that they result in useful models that can be used in practice. We applied our methods to data sets for which (i) it was possible to aggregate relevant information and (ii) sequence data were available.
