How to extract edges values as a data frame? #37

Open
lejarx opened this issue Nov 18, 2020 · 5 comments

@lejarx

lejarx commented Nov 18, 2020

Hi,

First of all, thank you so much for this wonderful package, I really hope to use only this package for process mining analysis, even though there're other tools out there. But as an R user, I definitely want to give bupaR a try first.

[Screenshot: process map with durations shown on the edges]

I would need some help with extracting values from different edges into a tibble.

For example, in the screenshot, the edge from Open to Pending User Info is 46.44 hours, and the edge from Work in Progress to Closed is 1.28 hours.

Output would be:
From | To | Value
Open | Pending User Info | 46.44
Work in Progress | Closed | 1.28

Thank you

@lejarx
Author

lejarx commented Nov 23, 2020

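library(bupaR)
library(edeaR)        # filter_activity_presence()
library(processmapR)  # process_map()
library(eventdataR)   # patients event log
library(dplyr)        # glimpse()

res <- patients %>%
  filter_activity_presence(c("X-Ray", "MRI SCAN"), method = "none") %>%
  process_map()

glimpse(res)

# The precedence data and the edge values are stored as attributes of the map
attr(res, "base_precedence")
attr(res, "edges")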

I can extract it this way

@gertjanssenswillen
Member

Hi

You can indeed use the attributes. Alternatively, you can use the function get_flows to extract the edge information (instead of getting the edges attribute); this will probably be more stable going forward. The base_precedence data is more detailed, so with it you can still change the way the performance metrics are computed (mean, median, etc.).

It's on the to-do list to make these functions clearer and better documented.
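
To illustrate the point about recomputing the metrics from the more detailed data, here is a minimal base R sketch. It does not use the real base_precedence attribute: the flows table, its column names (from, to, hours), its values, and the helper summarise_edges are all hypothetical and made up for illustration. The idea is only that, once there is one row per observed transition, the per-edge statistic can be swapped freely.

```r
# Toy table: one row per observed transition between two activities.
# Column names and values are hypothetical, chosen for this sketch only.
flows <- data.frame(
  from  = c("Open", "Open", "Open", "Work in Progress", "Work in Progress"),
  to    = c("Pending User Info", "Pending User Info", "Pending User Info",
            "Closed", "Closed"),
  hours = c(10, 20, 90, 1, 2)
)

# Summarise every edge with a statistic of choice (mean, median, ...).
summarise_edges <- function(flows, FUN = mean) {
  out <- aggregate(hours ~ from + to, data = flows, FUN = FUN)
  names(out) <- c("From", "To", "Value")
  out
}

edge_means   <- summarise_edges(flows, mean)
edge_medians <- summarise_edges(flows, median)

print(edge_means)
print(edge_medians)
```

The result has the From | To | Value layout asked for at the top of the thread, and the long 90-hour transition pulls the mean well above the median for the Open edge.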

@lejarx
Author

lejarx commented Nov 23, 2020

@gertjanssenswillen thanks for the information on get_flows. I notice that the performance metrics are calculated on a calendar-time basis, meaning they are simply derived from the differences between timestamps.

Are there any examples of how I can write the FUN function so that the performance metrics are calculated based on business hours in a day (8 hours) instead of the default 24-hour window?

@gertjanssenswillen
Member

@lejarx not straightforwardly at the moment. How would such a calculation work for you, ideally?
It is "easy" to not count weekend days, because they are known. How would you suggest counting only business hours?

  • Take fixed business hours into account, like from 9 to 5? Or only count a maximum of 8 hours a day?
  • What if something happens outside business hours (or do we assume that is not possible)?

Certainly something we can add, but let's take a minute to get the idea right.

@lejarx
Author

lejarx commented Nov 26, 2020

@gertjanssenswillen, let me know if we can create a new thread on this time calculation. It's good that you ask.

It's probably more complex because a task can be paused (e.g. pending external information) and because of different business time zones, among other things.

But yes, I think for a start, we can assume a blanket rule of business hours from 9 to 5 in a single timezone.
I think limiting to 9 to 5 already avoids overcounting and will be closer to the actual performance than taking the difference between the end and start of activities.
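
A minimal base R sketch of that blanket rule follows. It is not part of bupaR or processmapR: the function name business_hours_between and its arguments are hypothetical. It assumes a single timezone, opening hours from 9 to 17, weekends excluded, no holidays and no paused tasks.

```r
# Hours between two POSIXct timestamps that fall inside business hours.
# Assumptions: one timezone, Monday to Friday, open:00 to close:00,
# no holidays, and end is not before start.
business_hours_between <- function(start, end, open = 9, close = 17, tz = "UTC") {
  stopifnot(end >= start)
  days <- seq(as.Date(start, tz = tz), as.Date(end, tz = tz), by = "day")
  total <- 0
  for (i in seq_along(days)) {
    d <- days[i]
    # wday is 0 for Sunday and 6 for Saturday: skip weekends
    if (as.POSIXlt(d)$wday %in% c(0, 6)) next
    day_open  <- as.POSIXct(paste(d, sprintf("%02d:00:00", open)),  tz = tz)
    day_close <- as.POSIXct(paste(d, sprintf("%02d:00:00", close)), tz = tz)
    # Overlap between [start, end] and this day's business window
    overlap_start <- max(start, day_open)
    overlap_end   <- min(end, day_close)
    if (overlap_end > overlap_start) {
      total <- total +
        as.numeric(difftime(overlap_end, overlap_start, units = "hours"))
    }
  }
  total
}

# Friday 16:00 to Monday 10:00: 66 calendar hours, but only 2 business hours
fri <- as.POSIXct("2020-11-20 16:00:00", tz = "UTC")
mon <- as.POSIXct("2020-11-23 10:00:00", tz = "UTC")
print(business_hours_between(fri, mon))
```

Events that happen outside the window simply contribute nothing for that part of the interval, which is one possible answer to the question about out-of-hours activity raised above.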

gertjanssenswillen transferred this issue from bupaverse/bupaR on Jan 3, 2022