
Bayesian power analysis #276 #277

Closed · wants to merge 10 commits

Conversation

cetagostini

@cetagostini cetagostini commented Dec 5, 2023

Hey everyone!

As we discussed in issue #276, I've added a new function to CausalPy that performs power analysis. This is based on my internal work on estimating effects and conducting sensitivity analysis within Bayesian frameworks.

What will you find?

  1. Notebook
  2. Code example

I hope this explanation makes sense. Ideally, I'd like to have a few rounds of feedback and answer any questions you may have before moving on to unit testing.

Looking forward to hearing your thoughts!

cc: @drbenvincent

Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@drbenvincent drbenvincent self-requested a review December 12, 2023 11:03
@drbenvincent
Collaborator

FYI: I changed the filename of the notebook to snake case, and I also updated the pre-commits to use the --fix flag for nbqa-ruff. So this has modified the notebook a little, mainly just removing redundant imports, I believe.

More from me soon...

docs/source/notebooks/power_analysis.ipynb: 6 review comments (outdated, resolved)
@@ -0,0 +1,1412 @@
{
Collaborator

@drbenvincent drbenvincent Dec 12, 2023

Can you use the following syntax for the note, and it will render as a nice admonition box:

:::{note}

MY TEXT HERE

:::



Author

Done, it should look better now. I think it just doesn't render well in my local Jupyter.

Collaborator

Looks good to me
[Screenshot of the rendered admonition box]

docs/source/notebooks/power_analysis.ipynb: 3 review comments (outdated, resolved)
Collaborator

@drbenvincent drbenvincent left a comment

Thanks @cetagostini! This looks great. So far I've just done a relatively quick pass, mostly focusing on some aspects of the example notebook. I will certainly have a few more comments (e.g. tests), but I'm submitting these review comments now so that I'm not blocking progress toward getting this merged :)

@drbenvincent
Collaborator

In terms of the failing doctests: I ran them locally and they failed. Then I rebuilt the environment and they passed. So perhaps there's just a version issue here. Will let you know if I figure out something more definitive.

@drbenvincent
Collaborator

The failing remote tests should hopefully be fixed whenever pymc-devs/pytensor#550 is fixed and makes its way into a new pytensor release, or when #279 is resolved. So don't sink any effort into trying to fix the failing remote tests right now. You can of course ensure that all local tests pass; they are not affected by the issue (not on my machine, anyway).

@drbenvincent drbenvincent added the enhancement New feature or request label Dec 22, 2023

codecov bot commented Dec 22, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (70de921) 76.27% compared to head (c534c55) 77.40%.

Files Patch % Lines
causalpy/pymc_experiments.py 89.04% 8 Missing ⚠️
causalpy/utils.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #277      +/-   ##
==========================================
+ Coverage   76.27%   77.40%   +1.12%     
==========================================
  Files          20       20              
  Lines        1332     1425      +93     
==========================================
+ Hits         1016     1103      +87     
- Misses        316      322       +6     


@drbenvincent
Collaborator

Remote tests seem to be passing now :) Give me a shout whenever you manage to work on updates @cetagostini :)

@cetagostini
Author

Hey hey @drbenvincent, all the pre-commits are working on my side, except one related to `pkg_resources`.

Let me know if extra changes are needed.

Collaborator

@drbenvincent drbenvincent left a comment

Great improvements. I didn't get time to properly review the example notebook, but I've left a bunch of change requests for the code.

If there are changes when you address the point about where these methods sit in the class hierarchy, can you run `make uml`? That will update the UML class diagram which we display in CONTRIBUTING.md.

If you're ok then we'll just iterate a bit more. But like I said before, this is great so far.

"""
return pd.DataFrame(self._power_estimation(alpha=alpha, correction=correction))

def power_plot(self, alpha: float = 0.05, correction: bool = False) -> plt.Figure:
Collaborator

Can I request this is called `plot_power` instead?


# Posterior Mean
results["posterior_estimation"] = {
"cumulative": round(np.mean(_mu_samples_cumulative.values), 2),
Collaborator

Can we change all instances of `round(VALUE, 2)` to `round_num(VALUE, round_to)`, where `round_to` is a user-provided kwarg?

Collaborator

Oh, also, I think we should only do this rounding at the point we display the results. So where we calculate results, let's remove this rounding, and instead only have it when results are outputted.

If that's done by displaying pandas dataframes, then it's also possible to use `pd.set_option('display.precision', 2)`, for example, in the example notebook.
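A minimal sketch of the display-time rounding idea; the `results` frame here is a made-up stand-in, not CausalPy's actual output:

```python
import pandas as pd

# Hypothetical results frame standing in for the power-analysis output
results = pd.DataFrame({"posterior_estimation": [0.123456, 2.718281]})

# Round only at display time, leaving the stored values at full precision
print(results.round(2))

# Or set the notebook-wide display precision instead
pd.set_option("display.precision", 2)
print(results)
```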

}
return results

def power_summary(
Collaborator

The user-facing methods `power_summary` and `plot_power` are currently methods of the `PrePostFit` class. You can see from the UML class diagram that these methods will be available to both the `SyntheticControl` experiment class and the Interrupted Time Series class.

Clearly from the example you are using Bayesian power analysis in the context of synthetic control, but can I ask... does it make sense that these methods are also available in interrupted time series situations? I don't know the answer; I've not thought about it. But we want to make sure that these methods are placed at the right point in the class hierarchy.

If the Bayesian power analysis is also appropriate for the interrupted time series context, then the best possible outcome would be to also create an example notebook for that. But I'd be fine if you instead created a separate issue to deal with that in the near future.

But if this new functionality only makes sense in the context of synthetic control, then can I ask you to either a) move these new methods into the `SyntheticControl` class, or b) create a Bayesian power analysis mix-in class instead. I don't mind which, but I think CausalPy will probably transition from the current class hierarchy to using mix-ins.
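Option (b) could look something like this minimal sketch; the method bodies and the `_power_estimation` placeholder are hypothetical simplifications, not the real CausalPy implementation:

```python
# Hypothetical sketch of a power-analysis mix-in; the real classes also
# inherit from the experiment base classes and carry fitting logic.
class BayesianPowerAnalysisMixin:
    """Provides the user-facing power-analysis API."""

    def power_summary(self, alpha: float = 0.05, correction: bool = False) -> dict:
        # Delegate to the experiment's own estimation logic
        return self._power_estimation(alpha=alpha, correction=correction)


class SyntheticControl(BayesianPowerAnalysisMixin):
    def _power_estimation(self, alpha: float, correction: bool) -> dict:
        # Placeholder for the real computation
        return {"alpha": alpha, "correction": correction}


print(SyntheticControl().power_summary())
```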

"""
Summarize the power estimation results in a DataFrame format.

This function calls '_power_estimation' to perform power estimation calculations and then
Collaborator

Right now, the docs are set up to not render docstrings for private methods. At the moment I think this is fine because the API docs will focus only on what users are intended to use. If we keep it like that, then we might want to remove references to private methods in the public docstrings.

minimum detectable effects (MDE) for both cumulative and mean measures. It can apply corrections to
account for systematic differences in the data.

Parameters:
Collaborator

At the moment the parameters and returns parts of the docstrings are not rendering properly when you build the docs. Can you check the notation used in the other docstrings and just double check this is rendering ok?
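For reference, a hypothetical sketch of one Sphinx-friendly notation; check which convention the other CausalPy docstrings actually use and mirror that:

```python
# Hypothetical signature and field-list docstring; names are illustrative only
def power_summary(self, alpha: float = 0.05, correction: bool = False):
    """Summarize the power estimation results in a DataFrame format.

    :param alpha: significance level used when computing the intervals
    :param correction: whether to correct for systematic differences
    :returns: a pandas DataFrame of power-estimation results
    """
```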

This function calls '_power_estimation' to perform power estimation calculations and then
converts the resulting dictionary into a pandas DataFrame for easier analysis and visualization.

Parameters:
Collaborator

Same comment as above about the parameters and returns blocks not rendering properly when you build and check the docs locally.

along with the confidence intervals, real mean, and posterior mean values. It allows for adjustments based on
systematic differences if the correction is applied.

Parameters:
Collaborator

Same comment as above about the parameters and returns blocks not rendering properly when you build and check the docs locally.

"""
Calculate the probability of a given value being in a distribution defined by the posterior,

Args:
Collaborator

Same comment as above about the parameters and returns blocks not rendering properly when you build and check the docs locally.

if cdf_x <= 0.5:
probability = 2 * (cdf_x - cdf_lower) / (1 - cdf_lower - cdf_upper)
else:
probability = 2 * (1 - cdf_x + cdf_lower) / (1 - cdf_lower - cdf_upper)
Collaborator

I think it would be safe, and beneficial, to make sure this line is covered by tests

else:
probability = 2 * (1 - cdf_x + cdf_lower) / (1 - cdf_lower - cdf_upper)

return abs(round(probability, 2))
Collaborator

My feeling is that we should not round the result at this point, only when displaying the results.

@cetagostini
Author

@drbenvincent Starting this week to add the changes requested 🚀
