online:mdl [2009/08/16 15:05] (current)
 +====== Minimal Description Length ======
 +
 +In the effort to derive a “good” model for the behavior observed in the execution log, the process mining literature has identified shared notions of quality such as fitness, precision and structure. Consequently, several metrics have been developed to measure the quality of a (mined) model according to these notions.
 +
 +The Minimal Description Length analysis plug-in approaches this quality-measurement problem using the MDL principle known from the machine learning domain.
 +===== Prerequisites =====
 +
 +  * This plug-in expects the process model (as a Petri net) and the log to be already connected, which is needed to establish a mapping between the logged events and the tasks in the model (see [[connectmodellog|how to connect a model to a log]]). The process model must be available either in a Petri net format (a .tpn or .pnml file) or in another modeling paradigm that ProM can read (such as an EPC or a YAWL model), which can then be converted into a Petri net by dedicated conversion plug-ins within the framework.
 +  * Furthermore, the log should be preprocessed so that process instances exhibiting the same event sequence are aggregated into one instance (see [[grouplog|how to group a log]]). This helps to reduce calculation time, as every process instance contained in the log is replayed in the Petri net (the log replay takes the instance frequencies into account).
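 +
 +The grouping step described above can be pictured with a small Python sketch. It is only an illustration of the idea, not ProM's grouping code: the function name ''group_log'' and the in-memory list-of-lists log are assumptions made for this example.
 +

```python
from collections import Counter


def group_log(traces):
    """Aggregate process instances that share the same event sequence.

    `traces` is a hypothetical in-memory log: an iterable of event
    sequences. Returns (event_sequence, frequency) pairs in order of
    first appearance.
    """
    # Tuples are hashable, so identical sequences collapse into one key.
    counts = Counter(tuple(trace) for trace in traces)
    return list(counts.items())


log = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["A", "B", "C"],
    ["A", "B", "C"],
]

grouped = group_log(log)
for sequence, frequency in grouped:
    print(frequency, "x", " -> ".join(sequence))
```

 +
 +Only the distinct sequences then have to be replayed, while the stored frequency keeps the result weighted as if every original instance had been replayed.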
 +
 +===== MDL Settings =====
 +
 +When the Minimal Description Length plug-in is started, the desired log compression and model complexity encodings can be chosen first. The encodings discussed in [1] are selected by default and are recommended. Note that other encodings can easily be added to the plug-in.
 +
 +{{:documentation:conformance:mdl1.png?567|MDL Settings}}
 +**Figure 1.** Different encodings can be chosen for measuring both the log compression and the model complexity.
 +
 +===== MDL Results =====
 +
 +After the analysis, the encoding costs in bits are displayed for the evaluated model and for the log compression with respect to the evaluated model. Furthermore, the encoding costs for the reference models are calculated (these reference models are created automatically from the event log).
 +Finally, the absolute complexity and compression measures and the averaged result (MDL metric) are displayed. The results can be exported as a CSV file.
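 +
 +This page does not state how the measures are scaled or averaged. The sketch below is therefore purely an assumption, shown only to make the idea of comparing an encoding cost against the reference models concrete: each cost is mapped linearly onto [0, 1] between the worst and best reference cost, and the two resulting scores are averaged with equal weight. The function names and all bit values are hypothetical, and the plug-in may well compute the metric differently.
 +

```python
def normalized_score(cost, best_cost, worst_cost):
    """Map an encoding cost (in bits) onto [0, 1].

    ASSUMPTION: linear scaling between the reference models, with 1.0
    at the best reference cost and 0.0 at the worst one.
    """
    if worst_cost == best_cost:
        raise ValueError("reference costs must differ")
    score = (worst_cost - cost) / (worst_cost - best_cost)
    # Clamp in case the evaluated model falls outside the reference range.
    return min(1.0, max(0.0, score))


def averaged_result(compression_score, complexity_score):
    """ASSUMPTION: an unweighted mean of the two normalized scores."""
    return (compression_score + complexity_score) / 2


# Hypothetical encoding costs in bits.
compression = normalized_score(cost=1100, best_cost=800, worst_cost=2000)
complexity = normalized_score(cost=300, best_cost=100, worst_cost=500)
mdl = averaged_result(compression, complexity)
print(compression, complexity, mdl)
```

 +
 +Under these assumed numbers the compression score is 0.75, the complexity score is 0.5, and their mean is 0.625.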
 +
 +{{:documentation:conformance:mdl2.png?567|MDL Results}}
 +**Figure 2.** The encoding costs with respect to the evaluated model are displayed, as well as the “worst” and “best” case reference model encoding costs and the averaged result.
 +
 +===== Publications =====
 +
 +[1] T. Calders, C.W. Günther, M. Pechenizkiy, and A. Rozinat. //[[http://portal.acm.org/citation.cfm?id=1529606|Using Minimum Description Length for Process Mining]]//. In Sung Y. Shin and Sascha Ossowski, editors, Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), Honolulu, Hawaii, USA, March 9-12, 2009, pages 1451–1455. ACM, 2009.