Major code refactor to unify quasi experiment classes #381
Codecov report — coverage diff:

             main     #381     +/-
Coverage   85.60%   94.38%   +8.78%
Files          22       29       +7
Lines        1716     1800      +84
Hits         1469     1699     +230
Misses        247      101     -146
This is a relatively major code refactor with minor breaking changes to the API. The main purpose is to eliminate the parallel class hierarchy we had: virtually identical experiment classes which worked with either the PyMC or scikit-learn models. The only slight differences existed to deal with the fact that PyMC models produce `InferenceData` objects while the scikit-learn models produce numpy arrays.

We don't intend to expand beyond PyMC or scikit-learn models, but the new code structure would make it much easier to expand the kinds of models used. The main appeal of this is to focus on a high-level description of quasi-experimental methods and to abstract away from model-related implementation issues. So you could add non-PyMC Bayesian models (see #116), or use `statsmodels` (see #8) to use OLS but also get confidence intervals (which you don't get from scikit-learn models).

We have 100% passing doctests and tests, and I re-ran all the notebooks to check that we have stable performance.
This PR also brings more coherent testing of the `summary` method, so closes #305 ("Have more coherent testing of the `summary` method").
method #305Before
After
So now we just have a single set of quasi experiment classes, all inheriting from `ExperimentalDesign`.
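As a rough sketch of the idea (class names match this PR, but the bodies here are illustrative stand-ins, not CausalPy's actual implementation), a single experiment base class can pick its plotting behaviour from the model type it is given:

```python
# Illustrative sketch only: minimal stand-ins for the classes named in this PR.

class PyMCModel:
    """Stand-in for a Bayesian (PyMC) model wrapper."""

class ScikitLearnModel:
    """Stand-in for the scikit-learn model class/mixin."""

class BayesianPlotComponent:
    def plot(self):
        return "bayesian plot"

class OLSPlotComponent:
    def plot(self):
        return "ols plot"

class ExperimentalDesign:
    """Single experiment hierarchy; the plot component is chosen from the model type."""

    def __init__(self, model):
        self.model = model
        # Dispatch on model type instead of maintaining parallel experiment classes
        if isinstance(model, PyMCModel):
            self.plot_component = BayesianPlotComponent()
        else:
            self.plot_component = OLSPlotComponent()

    def plot(self):
        return self.plot_component.plot()
```

The `if isinstance` dispatch here mirrors the TODO noted at the end of this PR; it could later be replaced by a registry or `functools.singledispatch`.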
Other changes
- `ModelBuilder` has been renamed to `PyMCModel`. This seems to make more sense as it contrasts better with a new `ScikitLearnModel` class/mixin which gives some extra functionality to scikit-learn models.
- Plotting is now handled by either a `BayesianPlotComponent` or an `OLSPlotComponent`. The experiment class automatically knows which one to use based on the model type it has been provided. There is still some code duplication in the plotting of Bayesian vs OLS results, but right now that is not a big deal. The exception is the experiment classes which are only implemented with Bayesian models in mind, namely Instrumental Variables and Inverse Propensity Weighting.

API changes
The change in API for the user is relatively small. The only change should really be how the experiment classes are imported. For example:
Before
After
The import changes from `cp.pymc_experiments.DifferenceInDifferences` to `cp.DifferenceInDifferences`.
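As a sketch of what this looks like at a call site (shown as comments, since the constructor arguments depend on the experiment being run and are elided here):

```python
# Before this PR (old import path):
#   import causalpy as cp
#   result = cp.pymc_experiments.DifferenceInDifferences(...)

# After this PR (experiment classes live in the top-level namespace):
#   import causalpy as cp
#   result = cp.DifferenceInDifferences(...)
```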
Using arbitrary scikit-learn models
One of the other changes is that scikit-learn models need to be modified to have additional methods. For example, rather than importing from sklearn (`from sklearn.linear_model import LinearRegression`), we now import the pre-modified version (`causalpy.skl_models.LinearRegression`).

Another example is provided in the scikit-learn regression discontinuity docs page. Rather than just passing in `GaussianProcessRegressor` (`from sklearn.gaussian_process import GaussianProcessRegressor`), you just have to add the `ScikitLearnModel` mixin to give it the additional functionality/methods.

Feedback and further changes
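For reference, the mixin pattern described in the previous section might be sketched like this (a guess at the shape: both classes below are self-contained stand-ins, not CausalPy's or scikit-learn's actual implementations, and `calculate_impact` is one hypothetical example of an added method):

```python
class ScikitLearnModel:
    """Stand-in sketch of CausalPy's mixin: adds the extra methods that the
    experiment classes expect, on top of any scikit-learn estimator."""

    def calculate_impact(self, y_true, y_pred):
        # Hypothetical added method: impact as observed minus model-predicted
        return [yt - yp for yt, yp in zip(y_true, y_pred)]


class GaussianProcessRegressor:
    """Stand-in for sklearn.gaussian_process.GaussianProcessRegressor,
    stubbed so this sketch has no external dependencies."""


# The pattern from the docs page: mix ScikitLearnModel into the estimator
class MyGP(ScikitLearnModel, GaussianProcessRegressor):
    pass
```

An instance of `MyGP` is still a `GaussianProcessRegressor`, but also carries the extra methods the experiment classes need.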
I'm very open to feedback. There is still room to improve how I've handled some of these changes.
TODO's
- `if isinstance` checks in the experiment classes

📚 Documentation preview 📚: https://causalpy--381.org.readthedocs.build/en/381/