====== Conformance Checker ======

Most information systems log events (e.g., in transaction logs or audit trails) to audit and monitor the processes they support. While process mining can be used to discover a process model based on a given event log, explicit process models describing how a business process should (or is expected to) be executed are frequently available. Together with the data recorded in the log, this situation raises the interesting question "Do the model and the log //conform// to each other?" [1,3,4]. Analyzing the gap between a model and the real world both helps to detect violations (i.e., the real world does not "behave properly") and to ensure transparency (as the model may be, e.g., outdated). The Conformance Checker has been applied to, for example, administrative processes of a municipality in the Netherlands. In [2], the question of conformance has been investigated in the context of web services.

===== Prerequisites =====

  * This plug-in expects the process model (as a Petri net) and the log to already be tied together, which is needed to establish a mapping between the logged events and the tasks in the model (see [[connectmodellog|how to connect a model to a log]]).
  * The process model must either be available in some Petri net format (a .tpn or .pnml file) or in another modeling paradigm that can be read by ProM (such as an EPC or a YAWL model) and then converted into a Petri net by dedicated conversion plug-ins within the framework.
  * Furthermore, the log should be preprocessed such that process instances exhibiting the same event sequence are aggregated into one instance (see [[grouplog|how to group a log]]). This helps to reduce calculation time, as every process instance contained in the log will be replayed in the Petri net (frequencies are respected by the log replay).

Note that there are example files you can use to play with the Conformance Checker. They are contained in your ProM folder at /examples/conformanceChecking/. Alternatively, they can be downloaded {{documentation:conformance:conformancecheckingexamples.zip|here}}.

===== Analysis Settings =====

Before the actual analysis is started, one can choose which kind of analysis should be performed. Whole categories or specific metrics can be selected and deselected, and a brief description explains each of these options. The Conformance Checker supports analysis of the (1) Fitness, (2) Precision (or Behavioral Appropriateness), and (3) Structure (or Structural Appropriateness) dimensions.

The following analysis methods are used to calculate the conformance metrics:

  * **Log Replay.** The log replay is carried out in a non-blocking way and from a log perspective, i.e., every log trace is replayed in the model, and if tokens are missing to fire the transition in question, they are created artificially and the replay proceeds. While doing so, diagnostic data is collected and can be accessed afterwards. Used for calculating the metrics f, pSE, pPC, saB, and aaB.
  * **State Space Analysis.** The coverability graph of the process model is traversed, whereby loops are followed at most twice. Used for calculating the metrics aaB and aaS.
  * **Structural Analysis.** The structure of the process model is analyzed. Used for the metrics saS and aaS.
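
The non-blocking replay described above can be sketched in plain Java (a minimal sketch with hypothetical data structures, not the actual ProM implementation): each transition consumes one token from each of its input places and produces one on each output place, missing tokens are created on the fly so the replay never blocks, and the four token counters are maintained along the way.

```java
import java.util.*;

// Sketch of non-blocking token replay (hypothetical structures, not the ProM
// API). Transitions consume one token per input place and produce one per
// output place; missing tokens are created artificially so replay never blocks.
public class TokenReplay {
    final Map<String, List<String>> inputs = new HashMap<>();
    final Map<String, List<String>> outputs = new HashMap<>();
    int produced, consumed, missing, remaining;

    void addTransition(String t, List<String> in, List<String> out) {
        inputs.put(t, in);
        outputs.put(t, out);
    }

    // Replays one trace from a single token on 'start' to a single token on 'end'.
    void replay(List<String> trace, String start, String end) {
        Map<String, Integer> marking = new HashMap<>();
        marking.put(start, 1);
        produced++;                               // the initial token counts as produced
        for (String t : trace) {
            for (String p : inputs.get(t)) {
                if (marking.getOrDefault(p, 0) == 0) {
                    missing++;                    // token is not there: create it
                    marking.merge(p, 1, Integer::sum);
                }
                marking.merge(p, -1, Integer::sum);
                consumed++;
            }
            for (String p : outputs.get(t)) {
                marking.merge(p, 1, Integer::sum);
                produced++;
            }
        }
        if (marking.getOrDefault(end, 0) == 0) {  // the final token must be consumable
            missing++;
            marking.merge(end, 1, Integer::sum);
        }
        marking.merge(end, -1, Integer::sum);
        consumed++;
        for (int tokens : marking.values())       // leftovers indicate improper completion
            remaining += tokens;
    }

    public static void main(String[] args) {
        TokenReplay net = new TokenReplay();      // start -> A -> p2 -> B -> end
        net.addTransition("A", List.of("start"), List.of("p2"));
        net.addTransition("B", List.of("p2"), List.of("end"));
        net.replay(List.of("A", "B"), "start", "end");
        System.out.println("missing=" + net.missing + " remaining=" + net.remaining);
        // prints "missing=0 remaining=0"
    }
}
```

A trace that skips a step would instead leave a token on an input place (counted as remaining) and require an artificial token (counted as missing); those counters feed the fitness metric f below.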

{{documentation:conformance:settings.png?567|Conformance Checker Analysis Settings}}

**Figure 1.** Analysis settings allow selecting the metrics to be calculated

Typically, the default settings should be just fine for the analysis (you can abort each type of analysis if it takes too long). However, here is some advice if you experience performance problems analyzing your log:

  * It may be advisable to calculate the metrics in separate sessions (one rather expensive technique might otherwise unnecessarily limit the calculation of other, better scaling methods).
  * You could further restrict the search depth for invisible tasks during log replay. This means that if the currently replayed task is not directly enabled, a partial state space is built from the current marking in order to find out whether the task can be enabled via some sequence of invisible tasks [1]. While the search depth needed to correctly replay your model is automatically determined by the Conformance Checker, you might want to restrict it further in order to get a result. For example, if you set the maximum search depth to 0, no state space is built at all. Note, however, that in this case the measurements are likely to be "worse" than in reality.
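
The bounded search for enabling sequences of invisible tasks can be sketched as a breadth-first exploration up to the maximum depth (hypothetical structures, not the actual ProM code):

```java
import java.util.*;

// Sketch of the bounded enabling search described above (hypothetical
// structures). From the current marking, fire invisible tasks breadth-first,
// at most maxDepth of them, and check whether the target becomes enabled.
public class InvisibleTaskSearch {
    final Map<String, List<String>> inputs = new HashMap<>();
    final Map<String, List<String>> outputs = new HashMap<>();

    void addTransition(String t, List<String> in, List<String> out) {
        inputs.put(t, in);
        outputs.put(t, out);
    }

    boolean enabled(Map<String, Integer> marking, String t) {
        for (String p : inputs.get(t))
            if (marking.getOrDefault(p, 0) == 0) return false;
        return true;
    }

    Map<String, Integer> fire(Map<String, Integer> marking, String t) {
        Map<String, Integer> next = new HashMap<>(marking);
        for (String p : inputs.get(t)) next.merge(p, -1, Integer::sum);
        for (String p : outputs.get(t)) next.merge(p, 1, Integer::sum);
        return next;
    }

    // True if 'target' is enabled now or after firing at most maxDepth
    // invisible tasks; maxDepth == 0 builds no state space at all.
    boolean enabledViaInvisibles(Map<String, Integer> marking, String target,
                                 Set<String> invisibles, int maxDepth) {
        List<Map<String, Integer>> frontier = List.of(marking);
        for (int depth = 0; depth <= maxDepth; depth++) {
            List<Map<String, Integer>> next = new ArrayList<>();
            for (Map<String, Integer> m : frontier) {
                if (enabled(m, target)) return true;
                if (depth < maxDepth)
                    for (String t : invisibles)
                        if (enabled(m, t)) next.add(fire(m, t));
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        InvisibleTaskSearch s = new InvisibleTaskSearch();
        s.addTransition("tau", List.of("p1"), List.of("p2"));   // invisible task
        s.addTransition("B", List.of("p2"), List.of("p3"));
        Map<String, Integer> m = new HashMap<>(Map.of("p1", 1));
        System.out.println(s.enabledViaInvisibles(m, "B", Set.of("tau"), 1)); // prints "true"
    }
}
```

With maxDepth 0, the same query returns false: "B" is not directly enabled and no invisible firings are explored, which is why a too-small depth can make measurements look worse than they really are.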

===== Conformance Analysis Results =====

Generally, the evaluation of conformance can take place in different, orthogonal dimensions. First, the behavior allowed by the process model can be assessed. It may be both "too much" and "too little" compared to the behavior recorded in the event log. Second, we can evaluate the structure of the process model.

==== Selecting Process Instances ====

By selecting a subset of the process instances in the log, one can restart the analysis for any selected subset of the event log. If the log has been preprocessed, then the cardinality column # indicates the number of process instances summarized by each trace. By default, the diagnostic results are given for all process instances contained in the log, and therefore all log traces are selected. However, it can be interesting to evaluate only, e.g., the 80% most frequent instances to exclude rare behavior and analyze the normal flow of the process. Decreasing the percentage of covered process instances keeps the most frequent traces selected. Alternatively, any subset of log traces can be selected manually, and after updating the results one can see what percentage of the whole log is covered by that selection.

Note that, in combination with the buttons //Select Fitting// and //Invert Selection//, one can automatically select the subset of fitting or non-fitting traces in the log in order to, for example, analyze them separately.

==== (1) Fitness Analysis ====

Fitness analysis investigates whether a process model is able to reproduce all execution sequences that are in the log, or, viewed from the other angle, whether the log traces comply with the description in the model. We call this the fitness dimension, i.e., the fitness is 100% if every trace in the log "fits" the model description. So, fitness analysis aims at the detection of mismatches between the process specification and the execution of particular process instances.

There are two perspectives on the fitness analysis results:

=== Model Perspective ===

The following metric is calculated from a model perspective in order to measure the degree of fitness:

  * **Fitness.** The token-based fitness measure f relates the amount of missing tokens to the amount of consumed ones, and the amount of remaining tokens to the amount of produced ones. So, if the log can be replayed correctly, i.e., no tokens were missing or remaining, it evaluates to 1. In the worst case, where every consumed token is missing and every produced token is remaining, the metric evaluates to 0.
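
Concretely, with m missing, c consumed, r remaining, and p produced tokens, the description above corresponds to averaging the two "clean replay" fractions (a sketch of the token-based formula from [1]):

```java
// Token-based fitness: averages the "no missing tokens" and "no remaining
// tokens" fractions over the token counts collected during log replay.
public class TokenFitness {
    static double fitness(int missing, int consumed, int remaining, int produced) {
        return 0.5 * (1.0 - (double) missing / consumed)
             + 0.5 * (1.0 - (double) remaining / produced);
    }

    public static void main(String[] args) {
        System.out.println(fitness(0, 10, 0, 10)); // perfect replay, prints "1.0"
        System.out.println(fitness(1, 2, 1, 2));   // prints "0.5"
    }
}
```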

There are a number of options that can be used to enhance the visualization of the process model by indicating:

  * **Token Counter.** Visualizes the missing and remaining tokens during log replay for each place. This allows localizing those parts of the model where a mismatch took place, if any.
  * **Failed Tasks.** Visualizes the transitions that were not enabled (that is, not ready) during log replay and therefore could not be executed successfully.
  * **Remaining Tasks.** Visualizes the transitions that remained enabled after log replay, which indicates non-proper completion of the specified process and hints that these tasks should have been executed.
  * **Path Coverage.** Visualizes the transitions that were executed during log replay (regardless of whether that happened successfully or had to be enforced by creating the missing tokens). This makes it possible to follow the path that a particular log trace, or a set of log traces, has taken within the model.
  * **Passed Edges.** Indicates at each edge how often it was followed during the replay of the given process instances.

{{documentation:conformance:fitness_modelview.png?567|Conformance Checker Fitness Analysis - Model View}}

**Figure 2.** Model view shows places in the model where problems occurred during the log replay (screenshot of analysis of the example files Log_L2.xml and M1_nonFitting.tpn)

=== Log Perspective ===

The diagnostic perspective can be changed to visualize the log file or a subset of log traces, respectively. The following metrics are calculated from a log perspective in order to measure the degree of fitness:

  * **Successful Execution.** The fraction pSE of successfully executed process instances (taking the number of occurrences per trace into account).
  * **Proper Completion.** The fraction pPC of properly completed process instances (taking the number of occurrences per trace into account).
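
Both fractions weight each (aggregated) trace by its number of occurrences; a minimal sketch, with a hypothetical per-trace result record rather than the ProM API:

```java
import java.util.*;

// Sketch of the two log-perspective fractions (hypothetical per-trace record).
public class LogPerspective {
    record TraceResult(int occurrences, boolean successful, boolean properlyCompleted) {}

    static double fraction(List<TraceResult> results, boolean useSuccessful) {
        int total = 0, ok = 0;
        for (TraceResult r : results) {
            total += r.occurrences();             // each trace counts once per occurrence
            boolean good = useSuccessful ? r.successful() : r.properlyCompleted();
            if (good) ok += r.occurrences();
        }
        return (double) ok / total;
    }

    public static void main(String[] args) {
        List<TraceResult> results = List.of(
            new TraceResult(8, true, true),       // frequent, fully fitting trace
            new TraceResult(2, false, false));    // rare, non-fitting trace
        System.out.println(fraction(results, true));  // prints "0.8"
    }
}
```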

There is one option that can be used to enhance the visualization of the event log by indicating:

  * **Failed Log Events.** Visualizes the log events that could not be replayed correctly, which corresponds to the failed tasks in the model view.

{{documentation:conformance:fitness_logview.png?564|Conformance Checker Fitness Analysis - Log View}}

**Figure 3.** Log view shows where replay problems occurred in the log (screenshot of analysis of the example files Log_L2.xml and M1_nonFitting.tpn)

==== (2) Behavioral Appropriateness ====

On the other hand, the process model may allow for more behavior than that recorded in the log. We call the analysis and detection of such "extra behavior" the behavioral appropriateness, or precision, dimension, i.e., the precision is 100% if the model "precisely" allows for the behavior observed in the log. This way one can, for example, discover alternative branches that were never used when executing the process.

The following metrics are available in order to measure the degree of behavioral appropriateness:

  * **Simple Behavioral Appropriateness.** The simple behavioral appropriateness measure saB is based on the mean number of enabled transitions during log replay (the greater the value, the less behavior is allowed by the process model and the more precisely the behavior observed in the log is captured). Note that this metric should only be used as a comparative means for models without alternative duplicate tasks.
  * **Advanced Behavioral Appropriateness.** The advanced behavioral appropriateness metric aaB is based on the detection of model flexibility (that is, alternative or parallel behavior) that was not used in the real executions observed in the log. It analyzes the difference between activity relations derived from the log (during log replay) and those derived from the model (via state space analysis) in order to find out where the log relations are more specific (i.e., indicate that two activities //always// or //never// followed/preceded each other) than the model relations (where these two activities could //sometimes// follow/precede each other). The measure is normalized by the **Degree of Model Flexibility** (0 for a model that only allows for one particular sequence of steps, 1 for the "flower" model allowing for arbitrary execution of the contained steps). Hence, if the model contains no flexibility at all, the value of the metric is trivially 1. On the other hand, if only one log trace is contained in the log (and the degree of model flexibility is greater than 0), the difference is maximal and the metric evaluates to 0.
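
The quantity that saB builds on, the mean number of enabled transitions during replay, weights each aggregated trace by its occurrence count; a minimal sketch of that weighted mean (the per-trace means are hypothetical inputs, and the exact normalization of saB from this mean is defined in [1]):

```java
// Sketch: weighted mean number of enabled transitions observed during replay
// (per-trace means and occurrence counts are hypothetical inputs; the exact
// normalization into saB is defined in [1]).
public class MeanEnabled {
    static double weightedMean(double[] meanEnabledPerTrace, int[] occurrences) {
        double sum = 0;
        int total = 0;
        for (int i = 0; i < occurrences.length; i++) {
            sum += meanEnabledPerTrace[i] * occurrences[i];
            total += occurrences[i];
        }
        return sum / total;   // the greater this mean, the more behavior the model allows
    }

    public static void main(String[] args) {
        // two aggregated traces: mean 2.0 enabled (8 occurrences), mean 4.0 (2 occurrences)
        System.out.println(weightedMean(new double[]{2.0, 4.0}, new int[]{8, 2})); // prints "2.4"
    }
}
```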

Note that pressing the buttons //Model Relations// and //Log Relations// will output the relations on which the metric is based to the Message console at the bottom of the ProM framework.

Furthermore, there are a number of options that can be used to enhance the visualization of the process model by indicating:

  * **Always Precedes.** Visualizes those activities that //always preceded// each other in the log, but only //sometimes precede// each other according to the model.
  * **Never Precedes.** Visualizes those activities that //never preceded// each other in the log, but //sometimes// do //precede// each other in the model (cf. Figure 4, where the execution sequence GG never happened although it would be allowed by the model).
  * **Always Follows.** Visualizes those activities that //always followed// each other in the log, but only //sometimes follow// each other according to the model.
  * **Never Follows.** Visualizes those activities that //never followed// each other in the log, but //sometimes// do //follow// each other in the model (cf. Figure 4, where the execution sequence GG never happened although it would be allowed by the model).

{{documentation:conformance:precision.png?567|Conformance Checker Precision Analysis}}

**Figure 4.** Analysis of the precision of a model allows detecting overly general parts (screenshot of analysis of the example files Log_L2.xml and M6_behaviouralInappropriate.tpn)

==== (3) Structural Appropriateness ====

In a process model, structure is the syntactic means by which behavior (i.e., the semantics) is specified, using the vocabulary of the modeling language (for example, routing nodes such as AND or XOR). However, there are often several syntactic ways to express the same behavior, and there may be "preferred" (for example, easier to understand) and "less suitable" representations. Clearly, this evaluation dimension highly depends on the process modeling formalism and is difficult to assess in an objective way (after all, there may be personal, or even corporate, preferences). However, it is possible to formulate and evaluate certain "design guidelines", such as calling for a minimal number of duplicate tasks in the model.

The following metrics are available in order to measure the degree of structural appropriateness:

  * **Simple Structural Appropriateness.** The simple structural appropriateness saS is a simple measure based on the size of the graph (the greater the value, the more compact the model). Note that this metric should only be used as a comparative means for models allowing for the same amount of behavior.
  * **Advanced Structural Appropriateness.** The advanced structural appropriateness metric aaS penalizes duplicate tasks that are used to list alternatives only (detected via state space analysis) and invisible tasks that can be removed without affecting the behavior (detected via analysis of the structure of the model). If there are no alternative duplicate tasks and no redundant invisible tasks, the metric evaluates to 1. If every task in the model is either a duplicate listing alternative behavior or redundant, the metric evaluates to 0.
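
Following the description above, the metric can be read as the fraction of tasks that are neither alternative duplicates nor redundant (a sketch consistent with the two boundary cases; the precise definition is given in [1]):

```java
// Sketch consistent with the description above: aaS penalizes alternative
// duplicate tasks (tDA) and redundant invisible tasks (tIR) relative to the
// total number of tasks in the model.
public class StructuralAppropriateness {
    static double aaS(int tDA, int tIR, int totalTasks) {
        return (double) (totalTasks - (tDA + tIR)) / totalTasks;
    }

    public static void main(String[] args) {
        System.out.println(aaS(0, 0, 8)); // no penalized tasks, prints "1.0"
        System.out.println(aaS(2, 2, 8)); // prints "0.5"
    }
}
```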

Furthermore, there are a number of options that can be used to enhance the visualization of the process model by indicating:

  * **Alternative Duplicate Tasks.** Visualizes those duplicate tasks that are never used together in one of the possible model paths. In general, duplicate tasks may well be desirable within a good process model, for example to express that one activity happens at the beginning and at the end of the process. Note that alternative duplicate tasks are not detected by the Conformance Checker if they are contained in a loop.
  * **Redundant Invisible Tasks.** Visualizes those invisible tasks that do not affect the behavior (based on [[http://www.win.tue.nl/~hverbeek/doku.php?id=projects:prom:plug-ins:conversion:pnred|these reduction rules]]) and can therefore be safely removed from the model, rendering it more compact. Note that the reduced net can be exported by the Conformance Checker.

{{documentation:conformance:structure.png?567|Conformance Checker Structure Analysis}}

**Figure 5.** Structural analysis detects duplicate tasks that list alternative behavior, as well as redundant tasks (screenshot of analysis of the example files Log_L2.xml and M5_structuralInappropriate.tpn)

===== Further Steps =====

The Conformance Checker provides a number of items to the ProM framework for further processing, that is, they can be used as input for other analysis methods or be exported to a file:

  * The //initial Petri net// that was used for the analysis
  * The complete //Log// that was used for the analysis
  * The currently //selected subset// of the //Log// that was used for the analysis
  * The current diagnostic visualization as a DOT file
  * The //covered Petri net// after removing all those tasks that were never executed during the last log replay (from the Fitness view). Note that these transitions and subsequently isolated places are simply removed, i.e., it cannot be guaranteed that the result is always a fully connected net
  * The //reduced Petri net// after removing all redundant invisible tasks (from the Structural Appropriateness view), if any were detected

===== Limitations =====

Note that in the presence of invisible and duplicate tasks the log replay is not always guaranteed to find the optimal solution. For example, we choose the shortest sequence of invisible tasks to enable the currently replayed task, if possible. However, from a global viewpoint it could be the case that firing some longer sequence would actually produce exactly those tokens that are needed in a later stage of the replay. Dealing with this issue in a global manner (i.e., minimizing the number of missing and remaining tokens during log replay) seems intractable for complexity reasons (see [1] for further details). However, our algorithms work very well in most cases, while keeping the metrics accessible for practical situations.

In short: if the fitness metric yields 1.0, then you can be sure that the trace is 100% fitting. However, if the metric is smaller than 1.0 (and your Petri net contains invisible or duplicate tasks), then this does not necessarily mean that the trace really does not fit the model (maybe there is some other way of replaying it without errors, but the log replay did not find it).

===== Using the Conformance Metrics from Your Own Code =====

If you plan to use one of the metrics calculated by the Conformance Checker in your own code, note that in the context of the Control Flow Benchmark plug-in (see [[http://tabu.tm.tue.nl/wiki/controlflowbenchmark|help page]]) we implemented the conformance measures as so-called 'benchmark metrics'. For example, you can find the fitness metric f in the class org.processmining.analysis.benchmark.metric.TokenFitnessMetric. The advantage is that only the particular metric you want to use is calculated (calculating all metrics at once may take too much time), and that it is easy to use.

From your code you could then simply call it like this:

<code java>
TokenFitnessMetric fitness = new TokenFitnessMetric();
double fitnessResult = fitness.measure(inputModel, inputLog, null, new Progress(""));
System.out.println("Fitness: " + fitnessResult);
</code>

===== Publications =====

{{page>blogs:pub2008:conformance_checking_of_processes_based_on_monitoring_real_behavior&noeditbtn&firstseconly}}

{{page>blogs:pub2005:choreography_conformance_checking_an_approach_based_on_bpel_and_petri_nets&noeditbtn&firstseconly}}

{{page>blogs:pub2005:conformance_testing_measuring_the_fit_and_appropriateness_of_event_logs_and_process_models&noeditbtn&firstseconly}}

{{page>blogs:pub2005:conformance_testing_measuring_the_alignment_between_event_logs_and_process_models&noeditbtn&firstseconly}}