## Create event log object

The function read_xes reads XES-files and turns the data into an event log object in R. The function needs only one argument, called xesfile. This can be a local path to a file with a .xes extension or a URL. An example XES-file can be found at the following link: https://bupar.net/eventdata/exercise1.xes. When opening this file in a browser, you will see that it is an XML-file. More information on the notation can be found here.

Importing an XES-file is easily done as follows:

data <- read_xes("https://bupar.net/eventdata/exercise1.xes")
## Warning in read_xes("https://bupar.net/eventdata/exercise1.xes"): No
## activity instance identifier specified in xes-file. By default considered
## each event as a different activity instance. Please check!
data
## Log of 11 events consisting of:
## 3 traces
## 3 cases
## 11 instances of 5 activities
## 1 resource
## Events occurred from 2008-12-09 07:20:01 until 2008-12-09 07:23:01
##
## Variables were mapped as follows:
## Case identifier:     CASE_concept_name
## Activity identifier:     activity_id
## Resource identifier:     resource_id
## Activity instance identifier:    activity_instance_id
## Timestamp:           timestamp
## Lifecycle transition:        lifecycle_id
##
## # A tibble: 11 x 7
##    CASE_concept_na~ activity_id lifecycle_id resource_id
##    <chr>            <fct>       <fct>        <fct>
##  1 Case3.0          A           complete     UNDEFINED
##  2 Case3.0          E           complete     UNDEFINED
##  3 Case3.0          D           complete     UNDEFINED
##  4 Case2.0          A           complete     UNDEFINED
##  5 Case2.0          C           complete     UNDEFINED
##  6 Case2.0          B           complete     UNDEFINED
##  7 Case2.0          D           complete     UNDEFINED
##  8 Case1.0          A           complete     UNDEFINED
##  9 Case1.0          B           complete     UNDEFINED
## 10 Case1.0          C           complete     UNDEFINED
## 11 Case1.0          D           complete     UNDEFINED
## # ... with 3 more variables: timestamp <dttm>, activity_instance_id <chr>,
## #   .order <int>

Note that in the example above, the read_xes function emits a warning that no activity instance identifier was found. Recall that an event log object in R needs certain data fields to be present. However, not all of these fields may be available, in which case read_xes will throw a warning or an error. Ideally, the XES-file should contain at least the following elements:

<trace>
  <string key="concept:name" value="Case3.0"/>
  <event>
    <string key="concept:name" value="A"/>
    <int key="concept:instance" value="1"/>
    <string key="org:resource" value="UNDEFINED"/>
    <date key="time:timestamp" value="2008-12-09T08:20:01.527+01:00"/>
    <string key="lifecycle:transition" value="complete"/>
    ...
  </event>
  ...
</trace>

These elements are translated as follows to the terminology used by bupaR.

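| XES level | XES key | bupaR |
|---|---|---|
| trace | concept:name | case_id |
| event | concept:name | activity_id |
| event | concept:instance | activity_instance_id |
| event | org:resource | resource_id |
| event | time:timestamp | timestamp |
| event | lifecycle:transition | lifecycle_id |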
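The mapping above can also be written down as a lookup table in R. The sketch below is illustrative only: xes_to_bupar and rename_xes_columns are hypothetical names that are not part of xesreadR, and the level prefixes (trace:, event:) are an assumption made here to tell the two concept:name attributes apart.

```r
# Hypothetical lookup table: XES attribute (level:key) -> bupaR field name.
xes_to_bupar <- c(
  "trace:concept:name"         = "case_id",
  "event:concept:name"         = "activity_id",
  "event:concept:instance"     = "activity_instance_id",
  "event:org:resource"         = "resource_id",
  "event:time:timestamp"       = "timestamp",
  "event:lifecycle:transition" = "lifecycle_id"
)

# Rename the columns of a data.frame whose names follow the level:key scheme.
# Columns without an entry in the lookup table keep their original name.
rename_xes_columns <- function(df, mapping = xes_to_bupar) {
  known <- names(df) %in% names(mapping)
  names(df)[known] <- unname(mapping[names(df)[known]])
  df
}

# Made-up single event, used only to exercise the helper.
raw <- data.frame(
  "trace:concept:name"   = "Case3.0",
  "event:concept:name"   = "A",
  "event:time:timestamp" = "2008-12-09T08:20:01.527+01:00",
  extra                  = 1,
  check.names            = FALSE,
  stringsAsFactors       = FALSE
)

renamed <- rename_xes_columns(raw)
names(renamed)
```

The extra column is left untouched, which mirrors how additional XES attributes end up as extra variables in the event log.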

When there is no case identifier, an artificial case identifier with the name CASE_ID will be created based on the hierarchy of the XES-file. If any other element is missing, read_xes either throws an error or emits a warning, as described below.

### Errors

An error will be thrown if the XES-file does not contain an activity identifier or a timestamp. As such, these are the minimum requirements to create an event log object from an XES-file.

### Warnings

In case the lifecycle transition identifier or the resource identifier is missing, an empty placeholder variable will be created and a warning will be emitted.

In case the activity instance identifier is missing, a default activity instance identifier column will be added. This column will regard every event in the log as a distinct activity instance. A warning will be emitted noting that you should check whether this is a justified assumption.
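Because an import can end in either a warning or an error, it can be convenient to capture both, for example when importing many files in a loop. The following is a minimal base R sketch: collect_conditions is a hypothetical helper, not part of xesreadR, and fake_reader is a stand-in that only mimics the warning shown earlier.

```r
# Run a reader function and collect its warnings instead of printing them.
# An error is caught and returned as the result, flagged by `failed`.
collect_conditions <- function(reader, ...) {
  msgs <- character(0)
  result <- tryCatch(
    withCallingHandlers(
      reader(...),
      warning = function(w) {
        # Store the message, then continue evaluating the reader.
        msgs <<- c(msgs, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) e
  )
  list(
    result   = result,
    failed   = inherits(result, "error"),
    warnings = msgs
  )
}

# Stand-in reader (assumption): warns like an import without activity instances.
fake_reader <- function(path) {
  warning("No activity instance identifier specified in xes-file.")
  data.frame(activity_id = "A")
}

out <- collect_conditions(fake_reader, "example.xes")
out$failed
out$warnings
```

With xesreadR installed, the same helper could be called as collect_conditions(xesreadR::read_xes, "path/to/file.xes"), where the path is a placeholder.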

If available, missing information can be added manually to the event log object in R by overwriting the variables, e.g. with mutate.
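As a sketch of such a manual fix, assuming the dplyr package is installed: the tibble below is a made-up stand-in for an imported log, and resource_lookup is hypothetical external information about which resource performs which activity.

```r
library(dplyr)

# Stand-in for an imported log in which the resource is only a placeholder.
events <- tibble(
  CASE_concept_name = c("Case1.0", "Case1.0", "Case2.0"),
  activity_id       = c("A", "B", "A"),
  resource_id       = "UNDEFINED"
)

# Hypothetical external knowledge: activity -> resource.
resource_lookup <- c(A = "clerk", B = "manager")

# Overwrite the placeholder column with the looked-up values.
events <- events %>%
  mutate(resource_id = unname(resource_lookup[activity_id]))

events
```

The same kind of mutate call would be applied to the event log object returned by read_xes, using whatever source of truth is available for the missing field.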

Note that both traces and events can have additional elements in the XES-file. These will be added as extra variables in the resulting event log. Attributes at the level of traces will get the prefix CASE_ in their name (see note 1).

## Create list of cases

In certain circumstances, it might be useful to have a separate list of cases with case attributes. This can be obtained using the function read_xes_cases. The argument for this function is the same, i.e. xesfile. The result is a data.frame with one row for each case and one column for each attribute. Attributes that do not exist for a specific case are filled in with NA. Below, this function is illustrated using the repairExample event log, which has one case attribute called description. For the sake of illustration, the entire event log is also imported.

read_xes_cases("https://bupar.net/eventdata/repairExample.xes")
## # A tibble: 1,104 x 2
##    CASE_concept_name CASE_description
##    <chr>             <chr>
##  1 1                 Simulated process instance
##  2 10                Simulated process instance
##  3 100               Simulated process instance
##  4 1000              Simulated process instance
##  5 1001              Simulated process instance
##  6 1002              Simulated process instance
##  7 1003              Simulated process instance
##  8 1004              Simulated process instance
##  9 1005              Simulated process instance
## 10 1006              Simulated process instance
## # ... with 1,094 more rows
read_xes("https://bupar.net/eventdata/repairExample.xes")
## Warning in read_xes("https://bupar.net/eventdata/repairExample.xes"): No
## activity instance identifier specified in xes-file. By default considered
## each event as a different activity instance. Please check!
## Log of 11855 events consisting of:
## 77 traces
## 1104 cases
## 11855 instances of 8 activities
## 13 resources
## Events occurred from 1970-01-01 05:36:00 until 1970-01-24 08:16:00
##
## Variables were mapped as follows:
## Case identifier:     CASE_concept_name
## Activity identifier:     activity_id
## Resource identifier:     resource_id
## Activity instance identifier:    activity_instance_id
## Timestamp:           timestamp
## Lifecycle transition:        lifecycle_id
##
## # A tibble: 11,855 x 12
##    CASE_concept_na~ CASE_description activity_id defectFixed defectType
##    <chr>            <chr>            <fct>       <chr>       <chr>
##  1 1                Simulated proce~ Register    <NA>        <NA>
##  2 1                Simulated proce~ Analyze De~ <NA>        <NA>
##  3 1                Simulated proce~ Analyze De~ <NA>        6
##  4 1                Simulated proce~ Repair (Co~ <NA>        <NA>
##  5 1                Simulated proce~ Repair (Co~ <NA>        <NA>
##  6 1                Simulated proce~ Test Repair <NA>        <NA>
##  7 1                Simulated proce~ Test Repair true        <NA>
##  8 1                Simulated proce~ Inform User <NA>        <NA>
##  9 1                Simulated proce~ Archive Re~ true        <NA>
## 10 10               Simulated proce~ Register    <NA>        <NA>
## # ... with 11,845 more rows, and 7 more variables: lifecycle_id <fct>,
## #   numberRepairs <chr>, resource_id <fct>, phoneType <chr>,
## #   timestamp <dttm>, activity_instance_id <chr>, .order <int>
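The two outputs above are related: the columns with the CASE_ prefix in the event log repeat the case attributes on every event. As a rough sketch only (this is not how read_xes_cases is implemented; case_table is a hypothetical helper and the data are made up), a similar case list can be derived from an event-level data.frame in base R:

```r
# Keep only the CASE_-prefixed columns and drop duplicated rows,
# which leaves one row per distinct combination of case attributes.
case_table <- function(events) {
  case_cols <- startsWith(names(events), "CASE_")
  cases <- unique(events[, case_cols, drop = FALSE])
  rownames(cases) <- NULL
  cases
}

# Made-up events for two cases.
events <- data.frame(
  CASE_concept_name = c("1", "1", "10"),
  CASE_description  = "Simulated process instance",
  activity_id       = c("Register", "Analyze Defect", "Register"),
  stringsAsFactors  = FALSE
)

cases <- case_table(events)
cases
```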

## Write XES-files

XES-files can be written using the function write_xes.

args(write_xes)
## function (eventlog, xesfile = file.choose(), case_attributes = NULL)
## NULL

It takes two main arguments:

• an event log object
• a file name/path where to store the file (if not specified, a file chooser window will open to select the file)

Additionally, one can specify which of the variables in the event log should be regarded as case attributes by supplying a character vector of variable names to the case_attributes argument. If this argument is not specified, all variables whose names start with the prefix CASE_ will be considered case attributes.

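# Take the example event log from the eventdataR package
patients <- eventdataR::patients

# Write it to a file in the working directory
write_xes(patients, "patients.xes")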

1. On terminology: what in XES is called a trace (i.e. the content between <trace> tags) is called a case or process instance in bupaR. In the context of bupaR, the concept trace is reserved for an activity sequence and is not tied to a specific process instance. Many process instances can share the same trace of activities. The terminology used by bupaR corresponds to that of the current literature. For more information about the data model used, look here.