online:fuzzyminer [2009/06/17 17:56] (current)
====== Fuzzy Miner ======

The Fuzzy Miner is part of the official distribution of the [[http://prom.sourceforge.net/|ProM]] toolkit for [[http://processmining.org|Process Mining]]. Its purpose is to let users explore processes interactively, based on event logs. In particular, the Fuzzy Miner is suited to mining less-structured processes that exhibit a large amount of unstructured and conflicting behavior.

{{online:fminer.png|Fuzzy Miner}}

===== User documentation =====

==== Preparation ====

  - Open an event log file in ProM.
  - Choose **Fuzzy Miner** from the **Mining** menu.

==== Measurement configuration ====

{{online:metricsconf.png|Metrics configuration}}

You are now presented with the Fuzzy Miner's measurement configuration panel. On the left is a scrollable **list of configurable metrics**. Above this list, a combobox lets you filter which subset of metrics is displayed (useful, e.g., if you are only concerned with optimizing a specific aspect of measurement).

=== Metrics configuration ===

Each metric has the same set of configuration options, which help you optimize the measurements for your specific situation:

  * **weight**: All metrics of a specific type (unary significance, binary significance, or binary correlation) are **aggregated** before being taken into account for mining. By modifying the weight of each metric, you specify how strongly it contributes to this aggregate. For example, to emphasize a specific metric, reduce the weights of all other metrics of the same type.

  * **invert**: If this checkbox is checked, all measurements of this metric are **inverted**. After measurement, all values gathered by a metric are normalized so that the highest measurement equals **1.0**. Inverting a metric means that **1.0 - original_value** is returned for every measurement. This is handy if, e.g., you want highly frequent events to be considered less significant.

  * **active**: If this checkbox is unchecked, the respective metric is **ignored** in the mining pass. Use this option when you think that a specific metric does not contribute to better results, or is even counter-productive. Note that deactivating a metric does **not improve performance**; the Fuzzy Miner is highly optimized for performance even when running with all metrics active.
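
The interplay of the three options can be sketched in a few lines of Python. This is a minimal illustration, not ProM's Java implementation: it assumes that the aggregate is a weighted average over the active metrics of one type, and the names ''MetricConfig'', ''normalize'' and ''aggregate'', as well as the metric names in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricConfig:
    """Hypothetical per-metric settings mirroring the weight/invert/active options."""
    name: str
    weight: float = 1.0
    invert: bool = False
    active: bool = True


def normalize(values):
    """Scale raw measurements so that the highest one becomes 1.0."""
    top = max(values.values(), default=0)
    if top == 0:
        return {key: 0.0 for key in values}
    return {key: value / top for key, value in values.items()}


def aggregate(measurements, configs):
    """Combine all metrics of one type into one value per item.

    measurements: {metric name: {item: raw value}}, raw values >= 0.
    Assumption: the aggregate is a weighted average over active metrics.
    """
    totals, weight_sum = {}, 0.0
    for cfg in configs:
        if not cfg.active:
            continue  # inactive metrics are ignored entirely
        values = normalize(measurements[cfg.name])
        if cfg.invert:
            values = {key: 1.0 - value for key, value in values.items()}
        for key, value in values.items():
            totals[key] = totals.get(key, 0.0) + cfg.weight * value
        weight_sum += cfg.weight
    if weight_sum == 0:
        return {key: 0.0 for key in totals}
    return {key: total / weight_sum for key, total in totals.items()}


measurements = {"frequency": {"A": 10, "B": 5}, "routing": {"A": 2, "B": 4}}
configs = [
    MetricConfig("frequency", weight=1.0, invert=True),  # frequent = less significant
    MetricConfig("routing", weight=3.0),
]
print(aggregate(measurements, configs))  # → {'A': 0.375, 'B': 0.875}
```

Inverting the frequency metric makes the frequent event class A the less significant one, while the heavier weight on the second metric dominates the result.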

=== Measure point configuration ===

On the right of the measurement configuration panel is the measure-point configuration area. The Fuzzy Miner is not limited to measuring the significance and correlation of two events that directly follow one another in the log; it can also measure longer-term relationships, which is what this configuration area is for.

In the top right, a histogram visualizes the number of measure points per event (i.e., the number of histogram bars) and their evaluation factor (the height of the bars). The histogram responds instantly to any configuration change you make in the lower right of the configuration area.

The lower right area holds the measure point and attenuation settings. The maximal event distance to measure sets the number of measure points, i.e. how far the measurer "looks back" from each event in the log when applying the metrics.
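
As an illustration of the look-back, the sketch below lists the measure points for a single trace: every event is paired with each of up to ''max_distance'' preceding events, together with their distance. The trace representation (a list of event class names) and the function name ''measure_points'' are assumptions made for this example.

```python
def measure_points(trace, max_distance):
    """Yield (earlier event, later event, distance) for every look-back step.

    trace: event class names in log order (assumed representation).
    max_distance: the maximal event distance to measure, at least 1.
    """
    for i, event in enumerate(trace):
        for distance in range(1, max_distance + 1):
            j = i - distance
            if j < 0:
                break  # the trace has no event this far back
            yield trace[j], event, distance


print(list(measure_points(["A", "B", "C"], max_distance=2)))
# → [('A', 'B', 1), ('B', 'C', 1), ('A', 'C', 2)]
```

With ''max_distance=1'', only directly following pairs are produced; each increment adds one more measure point per event, as long as enough earlier events exist.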

=== Attenuation ===

Naturally, you want longer-distance relationships to affect the measurement less than directly following relationships. This is what the attenuation settings are for. The simplest option is **Linear attenuation**, which attenuates measurements linearly as the distance between events grows.

The **Nth root attenuation** allows for negative exponential attenuation by configuring an Nth root function. A relatively high **radical** value progressively attenuates the longer-distance measure points, which is useful when you want to focus on short-term relationships. A relatively low radical helps when frequently interleaved noise events obscure the important relationships.
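
To compare attenuation settings, it helps to compute the factor applied at each distance, which is what the bar heights of the histogram show. The page does not state ProM's exact formulas, so both functions below are assumptions: ''linear_attenuation'' falls from 1.0 at distance 1 to ''1/max_distance'' at the maximal distance, and ''exponential_attenuation'' is a stand-in negative exponential decay that merely reproduces the behavior described above (a higher ''radical'' suppresses distant measure points more strongly). It is not ProM's Nth root function.

```python
def linear_attenuation(distance, max_distance):
    """Assumed linear decay: 1.0 at distance 1, 1/max_distance at the far end."""
    return (max_distance - distance + 1) / max_distance


def exponential_attenuation(distance, radical):
    """Stand-in negative exponential decay, NOT ProM's exact Nth root formula.

    Each extra step of distance divides the factor by `radical`, so a higher
    radical suppresses long-distance measure points more strongly.
    """
    return 1.0 / radical ** (distance - 1)


def histogram(attenuation, points, **params):
    """Evaluation factor of each measure point, i.e. the histogram bar heights."""
    return [attenuation(d, **params) for d in range(1, points + 1)]


print(histogram(linear_attenuation, 4, max_distance=4))  # → [1.0, 0.75, 0.5, 0.25]
print(histogram(exponential_attenuation, 4, radical=2))  # → [1.0, 0.5, 0.25, 0.125]
```

Raising ''radical'' from 2 to 3 in this stand-in would shrink the factor at distance 2 from 0.5 to about 0.33.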

==== Measurement phase ====

The measurement phase is straightforward: just click the **start mining** button on the lower right, and the measurer will apply all metrics to the log, using the number of measure points and the attenuation you specified. The aggregate metrics created in this step are then used in the final mining phase.
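
Combining look-back and attenuation, measuring a single binary metric could look like the following sketch: every measure point adds its attenuation factor to the score of its event class pair, and the scores are finally normalized so that the strongest pair equals 1.0. Attenuated frequency stands in for the actual metrics here; the log format (a list of traces, each a list of event class names), the use of linear attenuation, and the function name are all assumptions.

```python
from collections import defaultdict


def attenuated_binary_frequency(log, max_distance):
    """Score each (earlier, later) event class pair over all traces.

    Assumptions: linear attenuation, and plain frequency as the only metric.
    Scores are normalized so that the strongest pair equals 1.0.
    """
    scores = defaultdict(float)
    for trace in log:
        for i, later in enumerate(trace):
            # Look back at most max_distance events, but not past the trace start.
            for distance in range(1, min(i, max_distance) + 1):
                earlier = trace[i - distance]
                factor = (max_distance - distance + 1) / max_distance
                scores[(earlier, later)] += factor
    top = max(scores.values(), default=0.0)
    if top == 0.0:
        return {}
    return {pair: score / top for pair, score in scores.items()}


log = [["A", "B", "C"], ["A", "C"]]
print(attenuated_binary_frequency(log, max_distance=2))
```

For this two-trace log, the pair ''(A, C)'' scores 1.0, since it occurs once at distance 2 and once as a direct succession, while ''(A, B)'' and ''(B, C)'' each score about 0.67.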

==== Mining phase: Unary metrics view ====

{{online:result_unarymetrics.png|Unary metrics view}}

The rightmost tab in the result view shows the measurements of all metrics for all event classes as a multi-curve, two-dimensional graph. This gives you quick, intuitive feedback on whether your metrics configuration was appropriate. Hover the mouse over the graph to reveal the precise measurement values and event classes. Click the colored squares in the lower left to toggle the visibility of each metric in the graph.

==== Mining phase: Binary metrics view ====

{{online:result_binarymetrics.png|Binary metrics view}}

The binary metrics are shown in the form of a colored matrix. Hover the mouse over the matrix to reveal the precise measurement value for each combination of event classes. The combobox at the top lets you switch between all metrics applied to the log.

==== Mining phase: Graph view ====

{{online:resultview.png|Graph view}}

The ultimate goal of the Fuzzy Miner is to create an appropriate graph representation of the process expressed in the mined log. The graph view allows you to explore this graph and to fine-tune the way it is derived from the metrics measured in the first phase.

{{online:graphsample.png|Graph sample}}

The graph notation is fairly straightforward. **Yellow square nodes** represent event classes; each node shows its significance (maximal value 1.0) below the event class name.

Behavior that is both less significant and lowly correlated is discarded from the process model, i.e. the nodes and arcs in this category are removed from the graph. Coherent groups of less significant but highly correlated behavior are represented in aggregated form, as clusters. **Cluster nodes** are drawn as green octagons, displaying the mean significance of the clustered elements and their number.

You can explore the internal components of a cluster and their structure by clicking on its green cluster node. Note that clusters comprising a large number of primitive nodes take considerable time to render and lay out.

Links, or arcs, drawn between nodes are labeled with the significance and correlation of the relation each one represents. Additionally, arcs are drawn in a shade of grey: the lower the significance of the relation, the lighter the grey.

=== Node filter ===

{{online:conf_nodefilter.png|Node filter}}

The node filter controls how many event classes are included in the displayed graph. It features a single control, the **significance cutoff**. All event classes whose aggregate significance is lower than the significance cutoff are subject to filtering. Depending on their environment, they are either removed from the graph or aggregated into a cluster.
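
The cutoff test itself is a plain comparison, sketched below for a dict of aggregate node significances. The sketch stops at identifying filtering candidates: whether a candidate is removed or clustered depends on its environment, which is not modelled here, and ''split_by_cutoff'' is a hypothetical name.

```python
def split_by_cutoff(significance, cutoff):
    """Separate event classes that stay in the graph from filtering candidates.

    significance: {event class: aggregate significance in [0, 1]}
    Classes strictly below the cutoff become candidates for removal or clustering.
    """
    kept = {name for name, value in significance.items() if value >= cutoff}
    candidates = set(significance) - kept
    return kept, candidates


kept, candidates = split_by_cutoff({"A": 1.0, "B": 0.4, "C": 0.1}, cutoff=0.4)
print(sorted(kept), sorted(candidates))  # → ['A', 'B'] ['C']
```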

=== Edge filter ===

{{online:conf_edges.png|Edge filter}}

The edge filter determines how many edges, and which ones, are included in the displayed graph. Currently, two edge filter implementations are provided.

== Best edges filter ==

This filter preserves the best incoming and the best outgoing edge of each node, i.e. event class.
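
A minimal sketch of the best edges filter follows. It assumes that edges are stored as a dict from ''(source, target)'' pairs to a single score and that "best" means the highest score; the page does not say which measure ProM ranks by, and ''best_edges'' is a hypothetical name.

```python
def best_edges(edges):
    """Keep, for every node, its highest-scoring outgoing and incoming edge.

    edges: {(source, target): score}; ties go to the edge seen first.
    """
    best_out, best_in = {}, {}
    for (source, target), score in edges.items():
        if source not in best_out or score > edges[best_out[source]]:
            best_out[source] = (source, target)
        if target not in best_in or score > edges[best_in[target]]:
            best_in[target] = (source, target)
    return set(best_out.values()) | set(best_in.values())


edges = {("A", "B"): 0.9, ("A", "C"): 0.3, ("B", "C"): 0.5, ("C", "A"): 0.2}
print(sorted(best_edges(edges)))  # → [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

Here the edge ''(A, C)'' is dropped, because node A prefers ''(A, B)'' as its outgoing edge and node C prefers ''(B, C)'' as its incoming edge.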

== Fuzzy edges filter ==

The fuzzy edges filter ranks all edges connected to a node locally. The parameter **S/C ratio** configures the evaluation method for this ranking, as a ratio between significance and correlation. When the S/C ratio is set to 1.0, edges are ranked by significance alone, so only the most significant edges are preserved; conversely, a value of 0.0 takes only correlation into account.

The **cutoff** parameter configures how many edges are preserved within each local ranking. The lower the cutoff, the fewer edges are included in the graph.

If the checkbox **ignore self-loops** is checked, relations of nodes to themselves are excluded from the local ranking, and thus cannot influence the number of preserved links.

The checkbox **interpret absolute** changes the interpretation of the cutoff value. If it is unchecked (the default), the cutoff is interpreted relatively, i.e. the same percentage of links is preserved for each node. If it is checked, the cutoff is interpreted as the percentage of the value range that is preserved, leading to higher variation between nodes.
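
The fuzzy edges filter can be sketched as follows. Several details are assumptions rather than facts from this page: the utility of an edge is the blend ''ratio * significance + (1 - ratio) * correlation''; the incoming and outgoing edges of a node are ranked separately, and an edge survives if either endpoint keeps it; in relative mode each ranking keeps the top ''cutoff'' share of its edges (at least one), while in absolute mode it keeps the edges whose utility lies in the top ''cutoff'' share of the ranking's utility range; and ignored self-loops are passed through unchanged. The function name, the data layout and the default of ''ignore_self_loops'' are hypothetical.

```python
import math


def fuzzy_edges(edges, ratio, cutoff, ignore_self_loops=True, absolute=False):
    """Rank edges locally per node and keep the best share of each ranking.

    edges: {(source, target): (significance, correlation)}, values in [0, 1].
    ratio: S/C ratio; 1.0 ranks by significance only, 0.0 by correlation only.
    cutoff: share of each local ranking to keep, in [0, 1].
    """
    def utility(edge):
        significance, correlation = edges[edge]
        return ratio * significance + (1.0 - ratio) * correlation

    def skipped(edge):
        return ignore_self_loops and edge[0] == edge[1]

    kept = {edge for edge in edges if skipped(edge)}  # passed through unranked

    # Local rankings: one for each node's outgoing and one for its incoming edges.
    rankings = {}
    for edge in edges:
        if not skipped(edge):
            rankings.setdefault(("out", edge[0]), []).append(edge)
            rankings.setdefault(("in", edge[1]), []).append(edge)

    for ranking in rankings.values():
        ranking.sort(key=utility, reverse=True)
        if absolute:
            # Keep edges in the top `cutoff` share of this ranking's value range.
            high, low = utility(ranking[0]), utility(ranking[-1])
            threshold = high - cutoff * (high - low)
            kept.update(e for e in ranking if utility(e) >= threshold)
        else:
            # Keep the same share of edges for every node, at least one.
            count = max(1, math.ceil(cutoff * len(ranking)))
            kept.update(ranking[:count])
    return kept


edges = {
    ("A", "C"): (0.9, 0.9), ("B", "C"): (0.1, 0.1),
    ("A", "D"): (0.2, 0.2), ("B", "D"): (0.8, 0.8),
    ("C", "C"): (0.05, 0.05),  # weak self-loop
}
print(sorted(fuzzy_edges(edges, ratio=0.75, cutoff=0.5)))
# → [('A', 'C'), ('B', 'D'), ('C', 'C')]
```

Calling the same example with ''ignore_self_loops=False'' also keeps ''(B, C)'': the weak self-loop on C enlarges C's incoming ranking to three edges, so two of them survive.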

=== Concurrency filter ===

{{online:conf_concurrencyfilter.png|Concurrency Filter}}

The concurrency filter is used to resolve conflicts between two event classes. A conflict exists when two nodes are connected in both directions. This may represent either a length-two loop or two event classes that are actually in parallel, i.e. can be executed concurrently.

The **preserve** parameter specifies how significant two conflicting relations need to be for both to be preserved. This parameter, usually set to a low value, ensures that real length-two loops are not accidentally removed.

All conflicting relations that do not meet the preservation criterion are resolved. The **balance** parameter influences the method of resolution. If it is set to a high value, conflicts tend to be resolved by removing both relations from the graph. If it is set to a low value, conflicts tend to be resolved by removing only the weaker relation.
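
One reading of the two parameters is sketched below: a conflict is left intact when both directions reach the ''preserve'' threshold; otherwise, if the two significances differ by at least ''balance'', only the weaker direction is removed, and if they are closer than that, both are removed. Comparing raw edge significances, and using their difference as the balance test, are assumptions for illustration; ProM may normalize the values differently, and ''resolve_conflicts'' is a hypothetical name.

```python
def resolve_conflicts(significance, preserve, balance):
    """Resolve two-way conflicts between event classes.

    significance: {(source, target): edge significance in [0, 1]}
    Returns a new dict that contains only the surviving edges.
    """
    result = dict(significance)
    for (a, b), forward in significance.items():
        if a >= b or (b, a) not in significance:
            continue  # visit each conflicting pair once; skip self-loops
        backward = significance[(b, a)]
        if forward >= preserve and backward >= preserve:
            continue  # both strong enough: treat as a real length-two loop
        if abs(forward - backward) >= balance:
            # Clear difference: keep the stronger direction only.
            weaker = (a, b) if forward < backward else (b, a)
            del result[weaker]
        else:
            # Too balanced to decide: treat as concurrency, remove both.
            del result[(a, b)]
            del result[(b, a)]
    return result


edges = {
    ("A", "B"): 0.6, ("B", "A"): 0.5,   # both reach preserve: kept
    ("C", "D"): 0.9, ("D", "C"): 0.2,   # clear winner: weaker removed
    ("E", "F"): 0.4, ("F", "E"): 0.35,  # balanced: both removed
}
print(sorted(resolve_conflicts(edges, preserve=0.5, balance=0.3)))
# → [('A', 'B'), ('B', 'A'), ('C', 'D')]
```

Raising ''balance'' to 0.8 in this example would make the C/D conflict count as balanced, so both of its relations would be removed.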