
Passages in Big Data: Partitioning Event Logs and Process Models to Speed Up Process Mining Algorithms

Wil van der Aalst

Petri Nets 2012, Hamburg, Germany, June 2012.



Process discovery (discovering a process model from example behavior recorded in an event log) is one of the most challenging tasks in process mining. Discovery approaches need to deal with competing quality criteria such as fitness, simplicity, precision, and generalization. Moreover, event logs may contain infrequent behavior and tend to be far from complete (i.e., typically only a fraction of the possible behavior is recorded). At the same time, models need to have formal semantics in order to reason about their quality. These complications explain why dozens of process discovery approaches have been proposed in recent years. Most of these approaches are time-consuming and/or produce poor-quality models. In fact, simply checking the quality of a model is already computationally challenging.

This talk shows that process mining problems can be decomposed into a set of smaller problems after determining the so-called causal structure. Given a causal structure, we partition the activities over a collection of passages. Conformance checking and discovery can then be done per passage. This decomposition has two advantages. First, the problem can be distributed over a network of computers. Second, due to the exponential nature of most process mining algorithms, decomposition can significantly reduce computation time, even on a single computer. As a result, conformance checking and process discovery can be done much more efficiently.
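
The abstract does not define a passage formally. The sketch below assumes the usual graph-based definition: a passage is a pair (X, Y) of activity sets in which Y is exactly the set of direct successors of X and X is exactly the set of direct predecessors of Y in the causal structure. Under that assumption, each causal edge falls into exactly one minimal passage, which can be found by a simple fixpoint. The function name `minimal_passages` and the example graph are illustrative only and do not come from the talk.

```python
from collections import defaultdict


def minimal_passages(edges):
    """Group the edges of a causal structure into minimal passages.

    `edges` is an iterable of (source, target) activity pairs.
    Returns a list of (X, Y) pairs, each given as sorted lists.
    Assumed definition: successors(X) == Y and predecessors(Y) == X.
    """
    edges = sorted(set(edges))
    succ = defaultdict(set)
    pred = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)

    seen = set()      # edges already assigned to a passage
    passages = []
    for a, b in edges:
        if (a, b) in seen:
            continue
        xs, ys = {a}, {b}
        # Grow X and Y until successors(X) == Y and predecessors(Y) == X.
        while True:
            new_ys = set().union(*(succ[x] for x in xs))
            new_xs = set().union(*(pred[y] for y in new_ys))
            if new_xs == xs and new_ys == ys:
                break
            xs, ys = new_xs, new_ys
        # Every edge leaving X belongs to this passage.
        seen.update((x, y) for x in xs for y in succ[x])
        passages.append((sorted(xs), sorted(ys)))
    return passages


# Hypothetical causal structure: a splits into b and c, which join in d.
example_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
for x_set, y_set in minimal_passages(example_edges):
    print(x_set, "->", y_set)
```

Each resulting passage involves only a few activities, so a conformance check or discovery run restricted to one passage works on a much smaller problem than the full model, and the passages can be handled independently on separate machines.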