Fuzzy Miner

The Fuzzy Miner is part of the official distribution of the ProM toolkit for Process Mining. Its purpose is to empower users to interactively explore processes from event logs. Most notably, the Fuzzy Miner is suitable for mining less-structured processes which exhibit a large amount of unstructured and conflicting behavior.

User documentation


  1. Open an event log file in ProM
  2. Choose Fuzzy Miner from the Mining menu

Measurement configuration

You will be presented with the Fuzzy Miner's measurement configuration panel. On the left is a scrollable list of configurable metrics. Above this list, a combobox lets you filter which metrics are displayed (useful, e.g., if you are only concerned with optimizing a specific aspect of measurement).

Metrics configuration

Each metric has the same set of configuration options, which help you to optimize the measurements taken with respect to your specific situation:

Measure point configuration

On the right of the measurement configuration panel is the measure-point configuration area. The Fuzzy Miner is not limited to measuring the significance and correlation of two events that directly follow one another in the log; it can also measure longer-term relationships, which is what this area configures.

On the top right, a histogram visualizes the number of measurement points per event (i.e., the number of histogram bars) and their evaluation factor (the height of the bars). The histogram responds instantly to any configuration changes you make on the lower right of the configuration area.

The bottom right area holds the measure point and attenuation settings. The maximal event distance to measure setting determines the number of measure points, i.e., how far the measurer will “look back” from each event in the log when applying the metrics.
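The look-back idea can be illustrated with a minimal Python sketch. It is not ProM code: the function name `look_back_pairs` and the representation of a trace as a list of event class names are assumptions made for illustration only.

```python
from typing import Iterator, List, Tuple


def look_back_pairs(trace: List[str], max_distance: int) -> Iterator[Tuple[str, str, int]]:
    """Yield (earlier_event, current_event, distance) for every measure point.

    Hypothetical helper: distance 1 is the direct predecessor, and larger
    distances reach further back, up to max_distance.
    """
    for i, current in enumerate(trace):
        for distance in range(1, max_distance + 1):
            j = i - distance
            if j < 0:
                break  # start of the trace reached, nothing further to look at
            yield trace[j], current, distance


# With a maximal event distance of 2, event C is related to B and to A.
pairs = list(look_back_pairs(["A", "B", "C"], max_distance=2))
```

Each yielded triple corresponds to one measure point to which a metric could be applied.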


Longer-distance relationships should usually affect the measurement less than directly-following relationships. This is what the attenuation settings are for. The simplest option is Linear attenuation, under which the weight of a measure point decreases linearly as the distance between events grows.

The Nth root attenuation allows for negative exponential attenuation by configuring an Nth root function. A relatively high radical value attenuates the longer-distance measure points more strongly, which is useful when you want to focus on short-term relationships. A relatively low radical helps when noise events are frequently interleaved with, and thus obscure, the important relationships.
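To make the two attenuation styles concrete, here is a minimal Python sketch. Both formulas are assumptions chosen to match the behavior described above (a linear decrease, and a decrease by a constant factor per step of distance); they are not taken from the ProM source, and the function names are hypothetical.

```python
def linear_attenuation(distance: int, max_distance: int) -> float:
    """Assumed linear form: weight falls in equal steps, from 1.0 at distance 1."""
    if not 1 <= distance <= max_distance:
        raise ValueError("distance must lie between 1 and max_distance")
    return (max_distance - distance + 1) / max_distance


def nth_root_attenuation(distance: int, n: float) -> float:
    """Assumed exponential form: each extra step divides the weight by n."""
    if distance < 1 or n < 1:
        raise ValueError("distance and n must both be at least 1")
    return 1.0 / n ** (distance - 1)


# A higher radical suppresses distant measure points more strongly.
weak = nth_root_attenuation(3, n=2.0)
strong = nth_root_attenuation(3, n=4.0)
```

Under these assumed formulas, a direct follower always keeps full weight, and only the rate at which more distant measure points lose influence differs.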

Measurement phase

The measurement phase is straightforward to use and understand: just click the start mining button on the lower right, and the measurer will apply all metrics to the log, using the number of measurement points and the attenuation you specified. The aggregate metrics created in this step will then be used for the final mining phase.

Mining phase: Unary metrics view

The rightmost tab in the result view shows the measurements of all metrics taken for all event classes in a multi-curve two-dimensional graph. This gives you quick, intuitive feedback on whether your metrics configuration was appropriate. Hover the mouse over the graph to reveal the precise measurement values and event classes. Click the colored squares in the lower left to toggle the visibility of each metric in the graph.

Mining phase: Binary metrics view

The binary metrics are shown in the form of a colored matrix. Hover the mouse over the matrix to reveal the precise measurement values for each combination of event classes. The combobox at the top allows you to switch between all metrics applied to the log.

Mining phase: Graph view

The ultimate goal of the Fuzzy Miner is to create an appropriate graph representation of the process expressed in the mined log. The graph view allows you to explore this graph and to fine-tune the way it is derived from the metrics measured in the first phase.

The graph notation used is fairly straightforward. Yellow square nodes represent event classes; the significance of each event class (maximal value 1.0) is shown below its name in the node.

Behavior that is both less significant and lowly correlated is discarded from the process model, i.e., nodes and arcs which fall into this category are removed from the graph. Coherent groups of behavior that is less significant but highly correlated are represented in aggregated form, as clusters. Cluster nodes are drawn as green octagons, which display the mean significance of the clustered elements and their number.

The internal components of clusters and their structure can be explored by clicking on the green cluster nodes. Note that clusters which comprise a large number of primitive nodes can take considerable time to lay out and render.

Links, or arcs, drawn between nodes are labeled with the significance and correlation of the relation they represent. Additionally, arcs are colored in a shade of grey: the lower the significance of the relation, the lighter the grey.

Node filter

The node filter controls the number of event classes which will be included in the displayed graph. It features a single control, the significance cutoff. All event classes with an aggregate significance lower than the significance cutoff are subject to filtering. Depending on their environment, they are either removed from the graph or aggregated into a cluster.
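As a minimal Python sketch of the cutoff step only: the function name and the dictionary of significance values are hypothetical, and the later decision between removal and clustering, which depends on each node's environment, is deliberately left out.

```python
from typing import Dict, Set, Tuple


def split_by_significance(
    significance: Dict[str, float], cutoff: float
) -> Tuple[Set[str], Set[str]]:
    """Separate event classes that stay as-is from those subject to filtering."""
    kept = {name for name, value in significance.items() if value >= cutoff}
    # Everything strictly below the cutoff becomes a candidate for
    # removal or clustering.
    candidates = set(significance) - kept
    return kept, candidates


kept, candidates = split_by_significance({"A": 1.0, "B": 0.4, "C": 0.1}, cutoff=0.4)
```

Raising the cutoff moves more event classes into the candidate set and therefore yields a simpler graph.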

Edge filter

The edge filter determines how many edges, and which ones, will be included in the displayed graph. Currently, two edge filter implementations are provided.

Best edges filter

This filter preserves the best incoming and outgoing edge for each node, i.e. event class.
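A minimal Python sketch of this idea follows. It assumes that "best" means the highest value of a single edge weight; which measure ProM actually ranks by is not stated here, so the weight and the function name are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source event class, target event class)


def best_edges(weights: Dict[Edge, float]) -> Set[Edge]:
    """Keep, for every node, its best outgoing and its best incoming edge."""
    best_out: Dict[str, Edge] = {}
    best_in: Dict[str, Edge] = {}
    for edge, weight in weights.items():
        source, target = edge
        if source not in best_out or weight > weights[best_out[source]]:
            best_out[source] = edge
        if target not in best_in or weight > weights[best_in[target]]:
            best_in[target] = edge
    return set(best_out.values()) | set(best_in.values())


# A->C is neither the best edge leaving A nor the best edge entering C.
result = best_edges({("A", "B"): 0.9, ("A", "C"): 0.3, ("B", "C"): 0.8})
```

An edge survives if it is the best choice for at least one of its two endpoints.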

Fuzzy edges filter

The fuzzy edges filter ranks all edges connected to a node locally. The S/C ratio parameter configures the evaluation method for this ranking, as a ratio between significance and correlation. When the S/C ratio is set to 1.0, edges are ranked by significance alone; conversely, a value of 0.0 takes only correlation into account.

The cutoff parameter configures the number of edges which will be preserved within each local ranking. The lower the cutoff, the fewer edges will be included in the graph.

If the ignore self-loops checkbox is active, relations of nodes to themselves are not included in the evaluation for local ranking and thus cannot influence the number of preserved links.

The interpret absolute checkbox changes the interpretation of the cutoff value. If it is not checked (the default), the cutoff value is interpreted relatively, i.e., the same percentage of links is preserved for each node. If it is checked, the cutoff is interpreted as the percentage of the value range which will be preserved, which leads to more variation in the number of links preserved per node.
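The following Python sketch shows one possible reading of the local ranking for a single node. The weighted-sum utility, the rounding in the relative mode, and the threshold in the absolute mode are all assumptions made for illustration; the names are hypothetical and this is not the ProM implementation.

```python
import math
from typing import Dict, List, Tuple


def utility(significance: float, correlation: float, sc_ratio: float) -> float:
    """Assumed ranking score: a weighted sum controlled by the S/C ratio."""
    return sc_ratio * significance + (1.0 - sc_ratio) * correlation


def preserved_edges(
    edges: Dict[str, Tuple[float, float]],
    sc_ratio: float,
    cutoff: float,
    absolute: bool = False,
) -> List[str]:
    """Rank one node's edges, given as label -> (significance, correlation)."""
    ranked = sorted(edges, key=lambda e: utility(*edges[e], sc_ratio), reverse=True)
    if not ranked:
        return []
    if not absolute:
        # Relative: keep the same share of edges for every node.
        return ranked[: math.ceil(cutoff * len(ranked))]
    # Absolute: keep edges whose score lies in the top share of the value range.
    scores = [utility(*edges[e], sc_ratio) for e in ranked]
    threshold = scores[0] - cutoff * (scores[0] - scores[-1])
    return [e for e, s in zip(ranked, scores) if s >= threshold]


node_edges = {"A->B": (0.9, 0.1), "A->C": (0.2, 0.8), "A->D": (0.5, 0.5), "A->E": (0.1, 0.1)}
by_significance = preserved_edges(node_edges, sc_ratio=1.0, cutoff=0.5)
by_correlation = preserved_edges(node_edges, sc_ratio=0.0, cutoff=0.5)
```

In this sketch the relative mode always keeps the same fraction of a node's edges, whereas the absolute mode can keep many edges for one node and few for another, depending on how the scores are spread.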

Concurrency filter

The concurrency filter is used to resolve conflicts between two event classes. A conflict is a pair of nodes which are connected in both directions. This may represent either a length-two loop or two event types which are actually parallel, i.e., can be executed concurrently.

The preserve parameter allows you to specify how significant two conflicting relations need to be in order for both to be preserved. This parameter, usually set to a low value, ensures that real length-two loops are not accidentally removed.

All conflicting relations which do not meet the preservation criterion are resolved. The balance parameter influences the method of resolution. If it is set to a high value, conflicts tend to be resolved by removing both relations from the graph. If it is set to a low value, conflicts tend to be resolved by removing only the weaker relation.
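The decision logic described above can be sketched in Python as follows. The inputs `rel_ab` and `rel_ba` stand for some measure of how significant each of the two conflicting relations is; how ProM computes that measure, and the exact comparisons used, are assumptions here, and the function name is hypothetical.

```python
from typing import Tuple


def resolve_conflict(
    rel_ab: float, rel_ba: float, preserve: float, balance: float
) -> Tuple[bool, bool]:
    """Return (keep A->B, keep B->A) for one pair of conflicting relations."""
    if rel_ab >= preserve and rel_ba >= preserve:
        # Both relations are significant enough: treat as a length-two loop.
        return True, True
    if abs(rel_ab - rel_ba) < balance:
        # Neither direction clearly dominates: treat as concurrency.
        return False, False
    # One direction clearly dominates: drop only the weaker relation.
    return (True, False) if rel_ab > rel_ba else (False, True)


loop = resolve_conflict(0.8, 0.7, preserve=0.6, balance=0.5)
concurrent = resolve_conflict(0.5, 0.4, preserve=0.6, balance=0.3)
one_way = resolve_conflict(0.5, 0.1, preserve=0.6, balance=0.3)
```

In this reading, a higher balance value widens the band in which the two relations count as equally strong, so more conflicts end with both relations removed.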