Event data are everywhere!
Data are collected about anything, at any time, and at any place. Operational processes in finance, insurance, government, healthcare, production, logistics, education, and maintenance are no exception. A starting point for process mining is the event data collected by the information systems supporting such processes.
Data for process mining
Event data
Process mining assumes the existence of an event log where each event refers to a case, an activity, and a point in time. An event log can be seen as a collection of cases and a case can be seen as a trace/sequence of events.
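To make the notion of an event log concrete, the following minimal Python sketch groups events into cases and orders each case by time. The event tuples, case identifiers, and the helper name `build_log` are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (case identifier, activity, timestamp).
# They are deliberately not in chronological order.
events = [
    ("order-1", "register", "2024-01-01T09:00:00"),
    ("order-2", "register", "2024-01-01T09:05:00"),
    ("order-1", "ship", "2024-01-02T14:00:00"),
    ("order-1", "pay", "2024-01-01T10:30:00"),
    ("order-2", "cancel", "2024-01-01T11:00:00"),
]


def build_log(events):
    """Group events by case and return one time-ordered trace per case."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in events:
        cases[case_id].append((datetime.fromisoformat(timestamp), activity))
    # Sorting the (timestamp, activity) pairs orders each case chronologically.
    return {
        case_id: [activity for _, activity in sorted(case_events)]
        for case_id, case_events in cases.items()
    }


log = build_log(events)
for case_id, trace in log.items():
    print(case_id, trace)
```

Each key of `log` is a case and each value is the trace of that case, i.e., the sequence of activities ordered by timestamp.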
Event data may come from a wide variety of sources:
  • a database system (e.g., patient data in a hospital),
  • a comma-separated values (CSV) file or spreadsheet,
  • a transaction log (e.g., a trading system),
  • a business suite/ERP system (SAP, Oracle, etc.),
  • a message log (e.g., from IBM middleware),
  • an open API providing data from websites or social media,
  • ...
Formats of event data
XES
eXtensible Event Stream (XES) is the standard format for process mining and is supported by the majority of process mining tools. XES was adopted in 2010 by the IEEE Task Force on Process Mining as the standard format for logging events. It became an official IEEE standard in 2016.
Currently, there are over 25 commercial process mining tools. The adoption of process mining has been accelerating in recent years. Tools like Disco (Fluxicon), Celonis Process Mining, ProcessGold Enterprise Platform, Minit, myInvenio, Signavio Process Intelligence, QPR ProcessAnalyzer, LANA Process Mining, Rialto Process, Icris Process Mining Factory, Worksoft Analyze & Process Mining for SAP, SNP Business Process Analysis, webMethods Process Performance Manager, and Perceptive Process Mining are now available. Moreover, open-source tools like ProM, ProM Lite, and RapidProM are widely used. It is vital that event data can be exchanged between these tools. Several of these tools already support XES. For example, it is easy to exchange XES data between Disco, Celonis, ProM, Rialto Process, Minit, and SNP.
Purpose: The purpose of this standard is to provide a generally acknowledged XML format for the interchange of event data between information systems in many application domains on the one hand and analysis tools for such data on the other hand. As such, this standard aims to fix the syntax and the semantics of the event data which, for example, is being transferred from the site generating this data to the site analyzing this data. As a result of this standard, if the event data is transferred using the syntax as described by this standard, its semantics will be well understood and clear at both sites.
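As a rough illustration of what such an XML document looks like, the sketch below parses a small hand-written XES-style fragment with Python's standard library. The fragment is a simplified assumption: it uses the standard attribute keys `concept:name` and `time:timestamp`, but omits the XML namespace, extension declarations, and global definitions that real XES files typically contain. The helper names `attribute` and `read_traces` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment in the style of XES (not a complete XES file).
XES_SNIPPET = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="register request"/>
      <date key="time:timestamp" value="2024-01-01T09:00:00+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="decide"/>
      <date key="time:timestamp" value="2024-01-03T16:30:00+00:00"/>
    </event>
  </trace>
</log>"""


def attribute(element, key):
    """Return the value of the direct child attribute with the given key, if any."""
    for child in element:
        if child.get("key") == key:
            return child.get("value")
    return None


def read_traces(xml_text):
    """Map each case identifier to its list of (activity, timestamp) pairs."""
    root = ET.fromstring(xml_text)
    traces = {}
    for trace in root.findall("trace"):
        case_id = attribute(trace, "concept:name")
        traces[case_id] = [
            (attribute(event, "concept:name"), attribute(event, "time:timestamp"))
            for event in trace.findall("event")
        ]
    return traces


traces = read_traces(XES_SNIPPET)
print(traces)
```

For real logs, a dedicated XES importer from a process mining tool or library is the safer choice; this sketch only shows how cases, activities, and timestamps are nested in the XML structure.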
Available data sets in XES:
Object-Centric Event Logs (OCEL)

Input for process mining is an event log. A traditional event log views a process from a particular angle provided by the case notion that is used to correlate events. Each event in such an event log refers to (1) a particular process instance (called a case), (2) an activity, and (3) a timestamp. There may be additional event attributes referring to resources, people, costs, etc., but these are optional. With some effort, such data can be extracted from any information system supporting operational processes. Process mining uses these event data to answer a variety of process-related questions.

The assumption that there is just one case notion and that each event refers to precisely one case is problematic in real-life processes. Therefore, we drop the case notion and assume that an event can be related to any number of objects. In such an object-centric event log, we distinguish different object types (e.g., orders, items, packages, customers, and products). Each event has three types of attributes:
  • Mandatory attributes like activity and timestamp.
  • Per object type, a set of object references (zero or more per object type).
  • Additional attributes (e.g., costs).
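The three kinds of attributes can be sketched as a small Python data structure. This is an illustrative in-memory model, not the OCEL serialization itself; the class `Event`, the sample events, and the helper `flatten` are assumptions made for this example. The helper shows how picking one object type as the case notion yields a traditional trace per object.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    activity: str                                    # mandatory
    timestamp: str                                   # mandatory, ISO 8601 string
    objects: dict = field(default_factory=dict)      # object type -> list of object ids
    attributes: dict = field(default_factory=dict)   # optional extras, e.g. costs


# Hypothetical object-centric events for an order-handling process.
events = [
    Event("place order", "2024-01-01T09:00:00",
          {"order": ["o1"], "item": ["i1", "i2"]}, {"costs": 120.0}),
    Event("pick item", "2024-01-01T10:00:00", {"item": ["i1"]}),
    Event("pick item", "2024-01-01T10:30:00", {"item": ["i2"]}),
    Event("ship package", "2024-01-02T08:00:00",
          {"order": ["o1"], "item": ["i1", "i2"], "package": ["p1"]}),
]


def flatten(events, object_type):
    """Use one object type as the case notion: one trace per object of that type."""
    traces = defaultdict(list)
    # ISO 8601 strings in the same format sort chronologically.
    for event in sorted(events, key=lambda e: e.timestamp):
        for obj in event.objects.get(object_type, []):
            traces[obj].append(event.activity)
    return dict(traces)


by_order = flatten(events, "order")
by_item = flatten(events, "item")
print(by_order)
print(by_item)
```

Note that the same event ("place order") appears in the traces of both items when flattening on `item`, which is one reason a single case notion can distort real-life processes.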
Purpose: The purpose of the OCEL standard is to provide a general standard to interchange object-centric event data with multiple case notions. We set the following goals for the standard:
  • Interoperability: with the provision of the OCEL standard and JSON/XML serializations of OCEL, we want to support a widespread collection of languages and systems.
  • Generalization: the standard supports the storage of events, objects, and their attributes. Furthermore, the standard can be extended.
  • Provision of a collection of examples: example logs, extracted from information systems supporting some widespread business processes, are provided for the OCEL standard.
  • Tool/Library Support: to support the implementation of OCEL in custom applications, tool/library support shall be provided.
CSV
Ideally, event logs are stored in XES, the standard format for process mining. However, the native format of the data is seldom an event log. Often, Comma-Separated Values (CSV) files are used as an intermediate format. The rows in a CSV file correspond to events and the columns to attributes of events. There should be columns for the case identifier, the activity name, and the timestamp of an event, but there may be many more attributes.
ProM and most other process mining tools can convert a CSV file into an event log by assigning columns to process mining concepts.
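The idea of assigning columns to process mining concepts can be sketched in a few lines of Python using only the standard library. The CSV content, the column names (`order_id`, `step`, `completed_at`, `clerk`), the timestamp format, and the helper `csv_to_log` are hypothetical; actual tools offer their own import dialogs for this mapping.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical CSV export: one row per event, rows not in chronological order.
CSV_TEXT = """order_id,step,completed_at,clerk
1001,register,2024-01-01 09:00,Ann
1002,register,2024-01-01 09:05,Bob
1001,ship,2024-01-02 14:00,Dee
1001,pay,2024-01-01 10:30,Ann
1002,cancel,2024-01-01 11:00,Cem
"""

# Assignment of CSV columns to process mining concepts.
COLUMN_MAPPING = {"case": "order_id", "activity": "step", "timestamp": "completed_at"}


def csv_to_log(text, mapping, time_format="%Y-%m-%d %H:%M"):
    """Convert CSV rows into a dict mapping each case to its time-ordered trace."""
    rows = list(csv.DictReader(io.StringIO(text)))
    rows.sort(key=lambda row: datetime.strptime(row[mapping["timestamp"]], time_format))
    log = defaultdict(list)
    for row in rows:
        log[row[mapping["case"]]].append(row[mapping["activity"]])
    return dict(log)


log = csv_to_log(CSV_TEXT, COLUMN_MAPPING)
print(log)
```

Columns that are not mapped (here `clerk`) are simply ignored in this sketch; in practice they would be kept as additional event attributes.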
Available data sets in CSV: