Event logs

To apply process mining techniques, event logs must first be extracted from data sources (e.g., databases, transaction logs, and audit trails). XES (eXtensible Event Stream) is the standard format for process mining and is supported by the majority of process mining tools. It was adopted in 2010 by the IEEE Task Force on Process Mining as the standard format for logging events, and it is now in the process of becoming an official IEEE standard. Besides XES, ProM also supports MXML (Mining eXtensible Markup Language) and CSV files as target formats.
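
To give a feel for what an XES log looks like, the following is a minimal sketch that writes a few events as XES-style XML using only the Python standard library. The helper name to_xes and the sample case and activities are made up for illustration. The attribute keys concept:name and time:timestamp follow the standard XES extensions, but the output omits the extension and classifier declarations that a complete log would normally carry, so treat it as an outline of the structure rather than a fully conformant file.

```python
import xml.etree.ElementTree as ET


def to_xes(traces):
    """Serialize {case_id: [(activity, iso_timestamp), ...]} to an XES-style XML string.

    Hypothetical helper for illustration; not part of ProM or any XES library.
    """
    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        # The trace-level concept:name attribute identifies the case.
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for activity, timestamp in events:
            event = ET.SubElement(trace, "event")
            # The event-level concept:name attribute names the activity.
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": timestamp})
    return ET.tostring(log, encoding="unicode")


# Made-up sample data: one case with two events.
xes_text = to_xes({
    "case-1": [
        ("register request", "2010-12-30T11:02:00+01:00"),
        ("decide", "2011-01-06T11:18:00+01:00"),
    ]
})
print(xes_text)
```

For real projects, a dedicated tool such as those listed below is usually a better choice than hand-written serialization.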

Several tools can extract XES logs from various data sources. Besides ProM itself, one can use XESame, ProMimport, or commercial tools such as Disco.

What kind of data does process mining require?

Process mining assumes the existence of an event log where each event refers to a case, an activity, and a point in time. An event log can be seen as a collection of cases and a case can be seen as a trace/sequence of events.
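
The structure described above can be sketched in a few lines of Python: a flat list of events, each carrying a case identifier, an activity, and a timestamp, is grouped by case and ordered by time to obtain one trace per case. The function name build_traces and the sample events are assumptions made for this illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Made-up events in arbitrary order: (case id, activity, timestamp).
events = [
    ("case-2", "register request", "2010-12-30T11:32:00"),
    ("case-1", "register request", "2010-12-30T11:02:00"),
    ("case-1", "decide", "2011-01-06T11:18:00"),
    ("case-2", "check ticket", "2010-12-30T12:12:00"),
    ("case-1", "examine thoroughly", "2010-12-31T10:06:00"),
]


def build_traces(events):
    """Group events by case and return {case_id: [activity, ...]} ordered by time."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    return {
        case_id: [activity for _, activity in sorted(case_events, key=lambda e: e[0])]
        for case_id, case_events in by_case.items()
    }


traces = build_traces(events)
for case_id, trace in traces.items():
    print(case_id, trace)
```

An event that lacks any of the three ingredients (case, activity, time) cannot be placed in a trace this way, which is why they are treated as the minimal requirements.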

Event data may come from a wide variety of sources:

  • a database system (e.g., patient data in a hospital),
  • a comma-separated values (CSV) file or spreadsheet,
  • a transaction log (e.g., a trading system),
  • a business suite/ERP system (SAP, Oracle, etc.),
  • a message log (e.g., from IBM middleware),
  • an open API providing data from websites or social media.
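
As a small illustration of the CSV case, the sketch below reads rows into events using Python's standard csv module. The column names (case_id, activity, timestamp), the timestamp format, and the function read_events are assumptions for this example; real exports will use their own column names and formats, so these are exposed as parameters.

```python
import csv
import io
from datetime import datetime

# Made-up CSV content standing in for a file exported from a source system.
csv_text = """case_id,activity,timestamp
1,register request,2010-12-30 11:02
1,examine thoroughly,2010-12-31 10:06
2,register request,2010-12-30 11:32
"""


def read_events(handle, case_col="case_id", activity_col="activity",
                time_col="timestamp", time_format="%Y-%m-%d %H:%M"):
    """Read CSV rows into a list of event dicts with a parsed timestamp."""
    events = []
    for row in csv.DictReader(handle):
        events.append({
            "case": row[case_col],
            "activity": row[activity_col],
            "time": datetime.strptime(row[time_col], time_format),
        })
    return events


csv_events = read_events(io.StringIO(csv_text))
for event in csv_events:
    print(event["case"], event["activity"], event["time"].isoformat())
```

To read from an actual file, pass a handle opened with open(path, newline="") instead of the in-memory string.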

The presentation "What kind of data does process mining require?" illustrates these requirements using several concrete examples.

Available data sets

For people new to the field, it is useful to experiment with various data sets. Therefore, this website contains pointers to several example data sets: