====== Event logs ======

To be able to apply process mining techniques, it is essential to extract //event logs// from data sources (e.g., databases, transaction logs, audit trails). XES is the standard format for process mining and is supported by the majority of process mining tools. XES was adopted in 2010 by the [[http://www.win.tue.nl/ieeetfpm|IEEE Task Force on Process Mining]] as the standard format for logging events. It is now in the process of becoming an official IEEE standard. Besides [[XES]], ProM also supports [[MXML]] and [[CSV]] as target formats.

There are several tools to extract XES logs from various data sources. Besides ProM itself, one can use [[:xesame:|XESame]], [[:promimport:|ProMimport]], or commercial tools like [[http://www.fluxicon.com/|Disco]]. Once the relevant data has been located, the extraction and conversion are fairly straightforward. The challenge is to select the event data related to the questions an organization has.

====== What kind of data does process mining require? ======

Process mining assumes the existence of an event log where each event refers to a case, an activity, and a point in time. An event log can be seen as a collection of cases, and a case can be seen as a trace, i.e., a sequence of events.

Event data may come from a wide variety of sources:
  * a database system (e.g., patient data in a hospital),
  * a comma-separated values (CSV) file or spreadsheet,
  * a transaction log (e.g., a trading system),
  * a business suite/ERP system (SAP, Oracle, etc.),
  * a message log (e.g., from IBM middleware),
  * an open API providing data from websites or social media,
  * …

The presentation {{presentations:event_logs_the_input_for_process_mining.pdf|What kind of data does process mining require?}} illustrates these requirements using several concrete examples.

====== Available data sets ======

For people new to the field, it is useful to experiment with various data sets.
This website therefore contains pointers to several example data sets:
  * There is a set of [[:event_logs_and_models_used_in_book:|event logs]] used in the [[:book:|process mining book]]. This [[:event_logs_and_models_used_in_book:|set]] is used to illustrate the various process mining techniques. See, for example, the event log [[:event_logs_and_models_used_in_book:|reviewing.xes]].
  * There is a set of event logs used in the [[:courses:processmining|course on process mining]] given at TU/e in 2007-2008.
  * Also see the [[:prom:tutorials|ProM Tutorials]], which provide various example data sets. See, for example, the event log in {{tutorial:repairExample.zip|repairExample.zip}}.
  * The [[http://data.4tu.nl/repository/|4TU.Datacentrum]] also collects [[http://data.4tu.nl/repository/collection:event_logs|event logs]], partitioned into two categories: [[http://data.4tu.nl/repository/collection:event_logs_real|real-life event logs]] and [[http://data.4tu.nl/repository/collection:event_logs_synthetic|synthetic event logs]]. The [[http://data.4tu.nl/repository/collection:event_logs|repository]] contains many benchmark data sets, including event data from hospitals, government agencies, and banks.
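When experimenting with raw data rather than ready-made XES files, it helps to recall the structure process mining expects: each event refers to a case, an activity, and a point in time, and a case is a trace of time-ordered events. The Python sketch below illustrates this under stated assumptions. It reads a CSV export with hypothetical columns ''case_id'', ''activity'' and ''timestamp'' (ISO 8601), groups the rows into time-ordered traces, and writes a minimal XES document using the standard ''concept:name'' and ''time:timestamp'' keys. It is only an illustration, not how ProM, XESame, or Disco perform the conversion. Real XES files usually carry more information (e.g., global attributes, classifiers, lifecycle transitions) that some tools may expect.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Hypothetical column names; real exports use their own headers.
CASE_COL, ACTIVITY_COL, TIME_COL = "case_id", "activity", "timestamp"
XES_NS = "http://www.xes-standard.org/"


def read_traces(csv_text):
    """Group CSV rows into traces: one list of (timestamp, activity) per case."""
    traces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row[TIME_COL])
        traces[row[CASE_COL]].append((ts, row[ACTIVITY_COL]))
    for events in traces.values():
        events.sort()  # order each trace by timestamp (ties broken by activity name)
    return dict(traces)


def to_xes(traces):
    """Serialize traces as a minimal XES document (no globals or classifiers)."""
    log = ET.Element("log", {"xes.version": "1.0", "xmlns": XES_NS})
    # Declare the Concept and Time extensions whose keys are used below.
    ET.SubElement(log, "extension", {"name": "Concept", "prefix": "concept",
                                     "uri": XES_NS + "concept.xesext"})
    ET.SubElement(log, "extension", {"name": "Time", "prefix": "time",
                                     "uri": XES_NS + "time.xesext"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for ts, activity in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            # Naive timestamps are written without a time zone offset.
            ET.SubElement(event, "date", {"key": "time:timestamp",
                                          "value": ts.isoformat()})
    return ET.tostring(log, encoding="unicode")


# Made-up sample data; rows of case c1 are deliberately out of order.
SAMPLE = """case_id,activity,timestamp
c1,register request,2024-01-05T09:00:00
c1,decide,2024-01-06T08:00:00
c2,register request,2024-01-05T09:30:00
c1,check ticket,2024-01-05T10:15:00
"""

if __name__ == "__main__":
    traces = read_traces(SAMPLE)
    for case_id, events in traces.items():
        # c1 ['register request', 'check ticket', 'decide']
        # c2 ['register request']
        print(case_id, [activity for _, activity in events])
    print(to_xes(traces))
```

Grouping by case and sorting by timestamp is the essential step: once every event has a case identifier, an activity name, and a timestamp, the trace view of the log follows directly. Choosing which column acts as the case identifier is exactly the kind of selection decision mentioned above.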