Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting dataframe into pre-post is inconsistent with covid example #273

Open
rlaker opened this issue Nov 23, 2023 · 4 comments
Open

Splitting dataframe into pre-post is inconsistent with covid example #273

rlaker opened this issue Nov 23, 2023 · 4 comments

Comments

@rlaker
Copy link

rlaker commented Nov 23, 2023

In the COVID excess deaths example, the data is split into pre and post treatment with

self.datapre = data[data.index <= self.treatment_time]
self.datapost = data[data.index > self.treatment_time]

However, when inspecting result.datapre we see that 2020-01-01 is included with the label pre=False

image

If 2020-01-01 should be in the post set, as the df says, then the splitting should be changed to:

self.datapre = data[data.index < self.treatment_time]
self.datapost = data[data.index >= self.treatment_time]

If it should be in the pre set, then deaths_and_temps_england_wales.csv needs to be updated so that pre=True for this date

@drbenvincent
Copy link
Collaborator

Thanks for this @rlaker.

I agree with the second code snippet - date < treatment_time should be classed as pre, and date >= treatment_time should be classed as post.

It could even be worth adding some input validation and add include that under test coverage. Did you want to have a go at this and submit a PR?

@drbenvincent
Copy link
Collaborator

Though it could be worth thinking if we want treatment always defined with >=, or if there will be some occasions where a user might want > instead.

@rlaker
Copy link
Author

rlaker commented Nov 24, 2023

Happy to have a go!

@rlaker
Copy link
Author

rlaker commented Nov 24, 2023

Perhaps the confusion is that the example dataframe defined its own pre column, which was then ignored by the PrePostFit class which used treatment_time instead. Maybe the user should create the pre column and the class should do

self.datapre = data[data['pre'] == True]
self.datapost = data[data['pre'] == False]

?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants