Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a temporal validation period to synthetic control and interrupted time series experiments #367

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

drbenvincent
Copy link
Collaborator

@drbenvincent drbenvincent commented Jun 21, 2024

TODO

  • Implement for synthetic control
  • Implement for interrupted time series
  • Calculate Bayesian $R^2$ value separately for training period and validation period
  • Report Bayesian $R^2$ (including validation period where relevant) in the summary method instead of the plot title
  • Streamline the implementation - do we really need separate classes, or can we just add a intervention_time kwarg and add in additional logic to the existing classes?
  • Ensure class diagram is updated before requesting a review
  • Check docstrings are ok and doctests pass
  • Add test coverage
  • Test for ValueError when validation_time >= treatment_time

📚 Documentation preview 📚: https://causalpy--367.org.readthedocs.build/en/367/

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@drbenvincent drbenvincent marked this pull request as draft June 21, 2024 13:50
@drbenvincent
Copy link
Collaborator Author

Good progress so far. But we are missing separate Bayesian $R^2$ for training and validation phases.

Synthetic control figure:
Screenshot 2024-06-21 at 14 51 07

Interrupted time series figure:
Screenshot 2024-06-21 at 14 50 53

Copy link

codecov bot commented Jun 21, 2024

Codecov Report

Attention: Patch coverage is 90.47619% with 4 lines in your changes missing coverage. Please review.

Project coverage is 85.81%. Comparing base (f6fd97c) to head (444e363).

Files Patch % Lines
causalpy/pymc_experiments.py 83.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #367      +/-   ##
==========================================
+ Coverage   85.60%   85.81%   +0.20%     
==========================================
  Files          22       22              
  Lines        1716     1748      +32     
==========================================
+ Hits         1469     1500      +31     
- Misses        247      248       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@drbenvincent
Copy link
Collaborator Author

drbenvincent commented Jun 24, 2024

Based on one of @cetagostini 's PR's (#368), I'm wondering if we should add a small feature to calculate a ROPE based on the validation period. Something a bit like this:

Screenshot 2024-06-24 at 14 36 21

Any thoughts/comments welcome. I'm not convinced this is a good idea yet - especially because once we add in actual time series models then the credible interval will increase as we forecast further into the future.

@drbenvincent drbenvincent marked this pull request as ready for review June 24, 2024 21:47
self.score = self.model.score(X=self.pre_X, y=self.pre_y)
if self.validation_time is None:
# We just have pre and post data, no validation data. So we can score the pre intervention data
self.score = self.model.score(X=self.pre_X, y=self.pre_y)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could replace on validation score by bayesian tail prob instead of R2? So, the interpretation here is about how much the real mean during the validation diverge from the posterior mean.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Just looking into it so that I get it right - I just need the high level algorithm because I've not heard that much about it. Google search for "bayesian tail probability" shows very few hits. Is this a widely used approach? Doesn't matter if not, as long as it does what we want :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add validation period to synthetic control and interrupted time series
2 participants