Minimal Description Length

When trying to derive a “good” model for the behavior observed in an execution log, shared notions such as fitness, precision, and structure can be identified in the process mining literature. As a consequence, several metrics have been developed to measure the quality of a (mined) model according to these notions.

The Minimal Description Length analysis plug-in is an attempt to assess model quality in a unified way, based on the Minimum Description Length (MDL) principle known from the machine learning domain.

Prerequisites

This plug-in expects the process model (as a Petri net) and the log to be connected already. The connection is needed to establish a mapping between the logged events and the tasks in the model (see how to connect a model to a log).

The process model must be available either in a Petri net format (a .tpn or .pnml file) or in another modeling formalism that ProM can read (such as an EPC or a YAWL model), in which case it can be converted into a Petri net by dedicated conversion plug-ins within the framework.

Furthermore, the log should be preprocessed so that process instances exhibiting the same event sequence are aggregated into one instance (see how to group a log). This helps reduce calculation time, because every process instance contained in the log is replayed in the Petri net. The log replay respects the frequencies of the grouped instances.
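The grouping step can be illustrated outside of ProM with a minimal Python sketch. It is not the plug-in's implementation: the function name group_traces and the toy log are hypothetical, and a log is assumed to be a plain list of event-name sequences.

```python
from collections import Counter


def group_traces(log):
    """Collapse identical event sequences into (sequence, frequency) pairs.

    `log` is assumed to be a list of traces, each a list of event names.
    This is an illustrative stand-in, not ProM's grouping filter.
    """
    # Tuples are hashable, so each distinct sequence can be a Counter key.
    counts = Counter(tuple(trace) for trace in log)
    # Most frequent sequences first.
    return counts.most_common()


# Hypothetical toy log with four process instances.
log = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["A", "B", "C"],
    ["A", "B", "C"],
]

grouped = group_traces(log)
for sequence, frequency in grouped:
    print(frequency, "x", " -> ".join(sequence))
```

Only two distinct sequences remain to be replayed instead of four instances, while the stored frequencies keep the information about how often each sequence occurred.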

MDL Settings

When the Minimum Description Length plug-in is started, the desired log compression and model complexity encodings can be chosen first. The encodings discussed in [1] are selected by default and are recommended. Note that other encodings can easily be added to the plug-in.

Figure 1. Different encodings can be chosen both for measuring the log compression and the model complexity

MDL Results

Next, the encoding costs in bits are displayed for the evaluated model and for the log compression with respect to the evaluated model. Furthermore, the encoding costs for the reference models are calculated (these reference models are created automatically from the event log). Finally, the absolute complexity and compression measures and the averaged result (MDL metric) are displayed. The results can be exported as a CSV file.

Figure 2. The encoding costs with respect to the evaluated model are displayed, as well as the “worst” and “best” case reference model encoding costs, and the averaged result
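How raw encoding costs might be turned into comparable measures and then averaged can be sketched as follows. This is only an assumption made for illustration: the linear scaling between the best and worst case reference costs, the function names, and all numbers are hypothetical, and the measures actually used by the plug-in are those defined in [1].

```python
def normalise(cost_bits, best_bits, worst_bits):
    """Scale a cost in bits relative to two reference costs.

    Assumed scheme: 0.0 means as cheap as the best case reference,
    1.0 means as expensive as the worst case reference.
    """
    if worst_bits == best_bits:
        return 0.0  # degenerate case: both references cost the same
    return (cost_bits - best_bits) / (worst_bits - best_bits)


def mdl_summary(model_bits, model_best, model_worst,
                log_bits, log_best, log_worst):
    """Return (complexity, compression, average) under the assumed scaling."""
    complexity = normalise(model_bits, model_best, model_worst)
    compression = normalise(log_bits, log_best, log_worst)
    return complexity, compression, (complexity + compression) / 2


# Hypothetical encoding costs in bits, not taken from any real run.
complexity, compression, average = mdl_summary(
    model_bits=300, model_best=100, model_worst=500,
    log_bits=2000, log_best=1000, log_worst=5000,
)
print(complexity, compression, average)
```

Scaling both costs to the same range before averaging keeps the log cost, which is typically much larger in bits, from dominating the combined value.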

Publications

[1] T. Calders, C.W. Günther, M. Pechenizkiy, and A. Rozinat. Using Minimum Description Length for Process Mining. In Sung Y. Shin and Sascha Ossowski, editors, Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), Honolulu, Hawaii, USA, March 9-12, 2009, pages 1451–1455. ACM, 2009.