online:activityclusteringminer [2009/06/28 13:53] (current)
Line 1: Line 1:
 +====== Activity Clustering Miner ======
 +
 +The Activity Clustering Miner plugin for ProM mines higher-level, repetitive clusters, which are supposed to resemble activity executions, from a detailed log of low-level events. For that purpose, it employs a set of heuristics which are resilient towards minor differences and do not consider the ordering of low-level events.
 +
 +===== Input =====
 +
 +The Activity Clustering Miner takes a regular event log (loaded from an MXML file) as input.
 +
 +===== Configuration =====
 +
 +The configuration window of the plugin looks as follows:
 +
 +{{online:activityclusteringminer_config.png|}}
 +
 +Clusters are detected by sliding a scan window over the log. The first option sets the **maximum number of events** (a discrete count) in the scan window. The second option restricts the scan window size in terms of time, i.e. the **maximal time spent between two events in one cluster**.
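
The two scan window limits can be illustrated with a minimal sketch. It is not the plugin's actual code: the function name `scan_windows`, the tuple layout of the log, and the reading of the time limit as the distance from the first event in the window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def scan_windows(events, max_events, max_span):
    """Return one candidate window per start position in the log.

    events: list of (timestamp, event_class) tuples, sorted by timestamp.
    A window stops growing at max_events entries, or as soon as an event
    lies more than max_span after the first event of the window
    (assumed interpretation of the time limit).
    """
    windows = []
    for start in range(len(events)):
        first_ts = events[start][0]
        window = []
        for ts, event_class in events[start:start + max_events]:
            if ts - first_ts > max_span:
                break
            window.append(event_class)
        windows.append(window)
    return windows


# Hypothetical low-level log: three events close together, one far away.
t0 = datetime(2009, 6, 28, 13, 0)
log = [
    (t0, "open form"),
    (t0 + timedelta(minutes=1), "edit field"),
    (t0 + timedelta(minutes=2), "save form"),
    (t0 + timedelta(minutes=30), "print"),
]
windows = scan_windows(log, max_events=3, max_span=timedelta(minutes=5))
```

With these settings the first window holds the three neighbouring events, while the late "print" event only ever appears in a window of its own.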
 +
 +There are a number of options you can employ to improve clustering results:
 +
 +  * **Interpret proximity as split-on-break threshold:** By default, the scan window time bounds the time between events within one cluster, as described above. If this option is checked, the time span is interpreted the other way round: the algorithm breaks a cluster wherever two events are farther apart than the given time span.
 +
 +  * **Enforce uniform originator in cluster:** If this option is checked, only events from one originator, i.e. resource or person, can be contained in one cluster.
 +
 +  * **Enforce uniform event type in cluster:** If this option is checked, only events of the same event type can be contained in one cluster.
 +
 +  * **Monitor initial cluster set consistency continuously:** This is a debug option, used to reveal problems appearing during the clustering pass. You may safely ignore it for normal operation.
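
As a rough illustration of the split-on-break and uniform originator options, the sketch below cuts a sequence of events into clusters. The function `split_on_break`, the dictionary keys, and the choice to enforce a uniform originator by starting a new cluster whenever the originator changes are assumptions, not the plugin's documented behavior.

```python
from datetime import datetime, timedelta


def split_on_break(events, max_gap, uniform_originator=False):
    """Cut a time-ordered event list into clusters.

    A new cluster starts when the gap to the previous event exceeds
    max_gap, or (optionally) when the originator changes.
    """
    clusters = []
    current = []
    for event in events:
        if current:
            prev = current[-1]
            too_far = event["time"] - prev["time"] > max_gap
            other_originator = (
                uniform_originator
                and event["originator"] != prev["originator"]
            )
            if too_far or other_originator:
                clusters.append(current)
                current = []
        current.append(event)
    if current:
        clusters.append(current)
    return clusters


def names(clusters):
    """Helper for display: reduce clusters to their event names."""
    return [[e["name"] for e in cluster] for cluster in clusters]


# Hypothetical events; names and originators are made up.
t0 = datetime(2009, 6, 28, 13, 0)
events = [
    {"name": "A", "originator": "alice", "time": t0},
    {"name": "B", "originator": "alice", "time": t0 + timedelta(minutes=1)},
    {"name": "C", "originator": "bob", "time": t0 + timedelta(minutes=2)},
    {"name": "D", "originator": "bob", "time": t0 + timedelta(minutes=20)},
]
by_gap = names(split_on_break(events, timedelta(minutes=5)))
by_gap_and_person = names(
    split_on_break(events, timedelta(minutes=5), uniform_originator=True)
)
```

Splitting on the time gap alone separates only the late event "D"; additionally enforcing a uniform originator also separates "C" from the events recorded by the other person.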
 +
 +By choosing an **object equivalence relation** you can influence the way clusters are aggregated in a major way. Each equivalence relation has its own way of analyzing two clusters and assessing whether they belong to the same type and should thus be aggregated.
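
To make the idea of comparing two clusters concrete, here are two hypothetical relations on cluster footprints (sets of event classes). Neither is taken from the plugin: `same_footprint` demands identical event sets, while `overlapping_footprint` tolerates minor differences through an assumed overlap threshold. Both ignore event ordering, in line with the heuristics described at the top of this page. Note that the tolerant variant is not transitive, so it is an equivalence relation only in a loose sense.

```python
def same_footprint(a, b):
    """Strict relation: both clusters contain exactly the same event classes."""
    return set(a) == set(b)


def overlapping_footprint(a, b, min_overlap=0.75):
    """Tolerant relation: shared event classes divided by all event classes
    (Jaccard index) must reach min_overlap. The threshold is an assumption."""
    a, b = set(a), set(b)
    if not a and not b:
        return True
    return len(a & b) / len(a | b) >= min_overlap


# Ordering is ignored by both relations.
strict = same_footprint(["open", "edit", "save"], ["save", "open", "edit"])
# One extra event class is tolerated: 3 shared out of 4 in total.
tolerant = overlapping_footprint(
    ["open", "edit", "save", "print"], ["open", "edit", "save"]
)
```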
 +
 +Choosing to **consolidate the aggregated set before deriving the minimal, conflict-free set** may improve the performance when a large number of initial clusters are scanned. The final result is not affected qualitatively by this option.
 +
 +Picking a **method for cluster aggregation** also influences the way higher-level clusters are built from initial clusters. This mainly affects the **footprint**, or event set, of higher-level clusters (e.g., the union of the aggregated sets).
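
How the aggregation method shapes the footprint can be sketched as follows. The union variant is the one mentioned above; the intersection variant and the function name `aggregate_footprint` are assumptions added for contrast.

```python
def aggregate_footprint(initial_clusters, method="union"):
    """Derive the footprint of a higher-level cluster from the event
    classes of the initial clusters aggregated into it."""
    footprints = [set(cluster) for cluster in initial_clusters]
    if not footprints:
        return set()
    if method == "union":
        # Every event class seen in any initial cluster.
        return set().union(*footprints)
    if method == "intersection":
        # Only event classes shared by all initial clusters (assumed variant).
        return set.intersection(*footprints)
    raise ValueError("unknown aggregation method: " + method)


# Hypothetical initial clusters of the same type.
initial = [["open", "edit", "save"], ["open", "save"], ["open", "save", "print"]]
union_footprint = aggregate_footprint(initial, "union")
shared_footprint = aggregate_footprint(initial, "intersection")
```

The union yields the larger footprint and absorbs occasional extra events, whereas the intersection keeps only the core that every initial cluster exhibits.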
 +
 +The **minimal conflict-free set decision balance** controls how the final set of clusters is derived from the aggregated clusters. You can tilt this balance towards the **footprint size**, i.e. the number of event classes contained in one cluster, which generally yields larger clusters. Alternatively, you can skew it towards the **number of aggregated clusters**, which tends to pick final clusters according to their **support**, i.e. the number of initial clusters supposed to implement that type.
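
The effect of the decision balance can be illustrated with a simple greedy selection. This is a sketch under several assumptions that the page does not confirm: the balance is modelled as a weight between 0 (support only) and 1 (footprint size only), two aggregated clusters are taken to conflict when they claim the same log events, and the names `score` and `conflict_free` are hypothetical.

```python
def score(cluster, balance, max_footprint, max_support):
    """Weighted mix of normalized footprint size and normalized support."""
    size_part = len(cluster["footprint"]) / max_footprint
    support_part = cluster["support"] / max_support
    return balance * size_part + (1.0 - balance) * support_part


def conflict_free(clusters, balance):
    """Greedily keep the best-scoring clusters that do not claim log
    events already claimed by a previously chosen cluster."""
    if not clusters:
        return []
    max_footprint = max(len(c["footprint"]) for c in clusters)
    max_support = max(c["support"] for c in clusters)
    ranked = sorted(
        clusters,
        key=lambda c: score(c, balance, max_footprint, max_support),
        reverse=True,
    )
    chosen, used_events = [], set()
    for cluster in ranked:
        if used_events.isdisjoint(cluster["events"]):
            chosen.append(cluster["name"])
            used_events |= cluster["events"]
    return chosen


# Hypothetical aggregated clusters; "events" holds claimed log positions.
candidates = [
    {"name": "big", "footprint": {"a", "b", "c", "d"}, "support": 2, "events": {0, 1, 2, 3}},
    {"name": "small", "footprint": {"c", "d"}, "support": 5, "events": {2, 3}},
    {"name": "other", "footprint": {"e"}, "support": 1, "events": {9}},
]
favour_size = conflict_free(candidates, balance=1.0)
favour_support = conflict_free(candidates, balance=0.0)
```

Tilting towards footprint size keeps the large cluster and drops the smaller one it conflicts with; tilting towards support does the opposite. The non-conflicting cluster survives in both cases.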
 +
 +Finally, you can configure the algorithm to use multiple **iterations** in deriving the minimal, conflict-free set of clusters. This usually does not affect the final result significantly, but may improve the fidelity of the algorithm in complicated cases, i.e. where a high number of aggregated clusters survive the initial conflict decision pass. In most cases, this option can safely be left at 1.
 +
 +
 +===== Result View =====
 +
 +The result view of the Activity Clustering Miner looks as follows:
 +
 +{{online:activityclusteringminer_result.png|}}
 +
 +The top pane shows a temporal view of the log, which stretches from left to right, with the selected clusters marking the time frames they occupy. At the right of the log pane, you can adjust the zoom factor of the log view and the way the cluster markers are rendered.
 +
 +The bottom of the view features the **cluster browser**. You can choose any selection of aggregated clusters (i.e., cluster types) from either the aggregated or the minimal conflict-free set. Initial clusters which belong to the selected types are listed in the middle part of the lower display. Selecting any homogeneous set of clusters reveals their **footprint elements** in the right pane. Note that the log view updates to show only the current selection of clusters, i.e. if you want to see all clusters in the log, you need to select all cluster types in the leftmost list.
 +