Major code refactor to unify quasi experiment classes #381

Open
wants to merge 40 commits into
base: main
@drbenvincent drbenvincent commented Jul 2, 2024

This is a fairly major code refactor with minor breaking changes to the API. Its main purpose is to eliminate the parallel class hierarchy we had: two sets of virtually identical experiment classes, one for PyMC models and one for scikit-learn models. They differed only slightly, to handle the fact that PyMC models produce InferenceData objects whereas scikit-learn models produce numpy arrays.

We have no plans to expand beyond PyMC and scikit-learn models, but the new code structure would make it much easier to support other kinds of models. The main appeal is that the experiment classes can focus on a high-level description of quasi-experimental methods and abstract away model-related implementation details. So you could add non-PyMC Bayesian models (see #116), or use statsmodels (see #8) to run OLS and also get confidence intervals (which you don't get from scikit-learn models).

We should have 100% passing doctests and tests, and I re-ran all the notebooks to check that we have stable performance.

Before

[Image: class diagram before the refactor]

After

[Image: class diagram after the refactor]

So now we have a single set of quasi-experiment classes, all inheriting from ExperimentalDesign.
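To illustrate the idea of one experiment class serving both kinds of model, here is a minimal sketch. It does not use CausalPy: `BayesianModelStub`, `PointModelStub` and `ExperimentSketch` are hypothetical stand-ins, and plain lists stand in for InferenceData objects and numpy arrays. The actual implementation in this PR may be organised differently.

```python
from statistics import mean


class BayesianModelStub:
    """Hypothetical stand-in for a PyMC-style model: returns several draws."""

    def predict(self, x):
        # One list of predictions per (pretend) posterior draw.
        return [[xi + offset for xi in x] for offset in (-1.0, 0.0, 1.0)]


class PointModelStub:
    """Hypothetical stand-in for a scikit-learn-style model: one value per row."""

    def predict(self, x):
        return [float(xi) for xi in x]


class ExperimentSketch:
    """A single experiment class that accepts either kind of model."""

    def __init__(self, x, model):
        self.x = x
        self.model = model
        self.prediction = self._point_prediction(model.predict(x))

    @staticmethod
    def _point_prediction(raw):
        # Draws arrive as a list of lists: average over draws, per observation.
        if raw and isinstance(raw[0], list):
            return [mean(column) for column in zip(*raw)]
        # Point predictions are already one value per observation.
        return list(raw)


bayesian_result = ExperimentSketch([1.0, 2.0], BayesianModelStub())
point_result = ExperimentSketch([1.0, 2.0], PointModelStub())
```

Both results expose the same `prediction` attribute, so downstream code does not need to know which kind of model was supplied.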

Other changes

  • I renamed ModelBuilder to PyMCModel. This seems to make more sense as it contrasts better with a new ScikitLearnModel class/mixin which gives some extra functionality to scikit-learn models.
  • I increased test coverage
  • Plotting is largely removed from the experiment classes and is now handled by BayesianPlotComponent or OLSPlotComponent. The experiment class automatically knows which of the two to use based on the type of model it has been given. There is still some code duplication between the plotting of Bayesian and OLS results, but right now that is not a big deal. The exceptions are the experiment classes that are only implemented with Bayesian models in mind, namely Instrumental Variables and Inverse Propensity Weighting.
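As a rough sketch of how choosing a plot component from the model type could look, the following uses an `isinstance` check. All names here (`PyMCModelStub`, `ScikitLearnModelStub`, `BayesianPlotSketch`, `OLSPlotSketch`, `select_plot_component`) are hypothetical placeholders, not the classes in this PR.

```python
class PyMCModelStub:
    """Hypothetical placeholder for a Bayesian model class."""


class ScikitLearnModelStub:
    """Hypothetical placeholder for a scikit-learn-style model class."""


class BayesianPlotSketch:
    """Hypothetical plot component for models that return posterior draws."""

    kind = "bayesian"


class OLSPlotSketch:
    """Hypothetical plot component for models that return point estimates."""

    kind = "ols"


def select_plot_component(model):
    """Pick a plot component based on the type of the supplied model."""
    if isinstance(model, PyMCModelStub):
        return BayesianPlotSketch()
    if isinstance(model, ScikitLearnModelStub):
        return OLSPlotSketch()
    # Fail loudly rather than silently picking a default.
    raise TypeError(f"Unsupported model type: {type(model).__name__}")


bayesian_component = select_plot_component(PyMCModelStub())
ols_component = select_plot_component(ScikitLearnModelStub())
```

Keeping the type check in one function means the experiment classes themselves would not need to branch on the model type when plotting.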

API changes

The change in API for the user is relatively small. The only change should really be how the experiment classes are imported. For example:

Before

import causalpy as cp

seed = 42  # example value; any integer works
df = cp.load_data("did")
result = cp.pymc_experiments.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(sample_kwargs={"random_seed": seed}),
)

After

The import changes from cp.pymc_experiments.DifferenceInDifferences to cp.DifferenceInDifferences.

import causalpy as cp

seed = 42  # example value; any integer works
df = cp.load_data("did")
result = cp.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(sample_kwargs={"random_seed": seed}),
)

Using arbitrary scikit-learn models

One of the other changes is that scikit-learn models need to be modified to have additional methods. For example, rather than importing from sklearn (from sklearn.linear_model import LinearRegression) we now import the pre-modified version (causalpy.skl_models.LinearRegression).

Another example is provided in the scikit-learn regression discontinuity docs page. Rather than just passing in GaussianProcessRegressor (from sklearn.gaussian_process import GaussianProcessRegressor), you just have to add the ScikitLearnModel mixin to give additional functionality/methods. Like this:

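import causalpy as cp
from causalpy.skl_models import ScikitLearnModel
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel


# The ScikitLearnModel mixin gives the regressor the additional methods CausalPy needs
class CustomGaussianProcessRegressor(GaussianProcessRegressor, ScikitLearnModel):
    pass

# `data` is assumed to be a DataFrame with columns y, x and treated
kernel = 1.0 * ExpSineSquared(1.0, 5.0) + WhiteKernel(1e-1)
result = cp.RegressionDiscontinuity(
    data,
    formula="y ~ 1 + x + treated",
    model=CustomGaussianProcessRegressor(kernel=kernel),
    treatment_threshold=0.5,
)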
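For readers unfamiliar with the mixin pattern used above, here is a self-contained toy version. Nothing in it comes from CausalPy or scikit-learn: `ExtraMethodsMixin`, its `calculate_impact` method and `MeanRegressor` are hypothetical names chosen for illustration, and the methods that ScikitLearnModel actually provides may differ.

```python
class ExtraMethodsMixin:
    """Hypothetical mixin that adds a helper on top of fit/predict."""

    def calculate_impact(self, y_true, y_pred):
        # Impact here is simply observed minus predicted, per observation.
        return [t - p for t, p in zip(y_true, y_pred)]


class MeanRegressor:
    """Toy estimator with a scikit-learn-like fit/predict interface."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the training mean for every row.
        return [self.mean_ for _ in X]


class CustomMeanRegressor(MeanRegressor, ExtraMethodsMixin):
    """The estimator gains the mixin's methods without any further code."""


model = CustomMeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
impact = model.calculate_impact([4.0, 5.0], model.predict([[3], [4]]))
```

The estimator's own `fit` and `predict` are untouched; the mixin only contributes extra methods, which is why an empty class body is enough.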

Feedback and further changes

I'm very open to feedback. There is still room to improve how I've handled some of the changes.

TODOs

  • Fix up the not-quite-perfect use of if isinstance in the experiment classes
  • Add missing module level docstrings to improve the auto generated API docs

📚 Documentation preview 📚: https://causalpy--381.org.readthedocs.build/en/381/


codecov bot commented Jul 2, 2024

Codecov Report

Attention: Patch coverage is 97.71372% with 23 lines in your changes missing coverage. Please review.

Project coverage is 94.38%. Comparing base (287f6e1) to head (1b26499).
Report is 6 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| causalpy/exp_inverse_propensity_weighting.py | 95.50% | 9 Missing ⚠️ |
| causalpy/expt_prepostnegd.py | 92.98% | 4 Missing ⚠️ |
| causalpy/expt_diff_in_diff.py | 96.87% | 2 Missing ⚠️ |
| causalpy/expt_instrumental_variable.py | 96.72% | 2 Missing ⚠️ |
| causalpy/expt_regression_discontinuity.py | 97.01% | 2 Missing ⚠️ |
| causalpy/experiments.py | 92.30% | 1 Missing ⚠️ |
| causalpy/expt_prepostfit.py | 98.24% | 1 Missing ⚠️ |
| causalpy/expt_regression_kink.py | 98.46% | 1 Missing ⚠️ |
| causalpy/utils.py | 90.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #381      +/-   ##
==========================================
+ Coverage   85.60%   94.38%   +8.78%     
==========================================
  Files          22       29       +7     
  Lines        1716     1800      +84     
==========================================
+ Hits         1469     1699     +230     
+ Misses        247      101     -146     

Development

Successfully merging this pull request may close these issues.

Have more coherent testing of the summary method