
Memory Leaks in Detector1Pipeline #8668

Closed
TheSkyentist opened this issue Jul 23, 2024 · 7 comments

@TheSkyentist

Summary:

Repeated runs of the Detector 1 Pipeline show a gradual increase in memory footprint. This appears to be related to the PipelineStep clone method. The leak corresponds to approximately 5x the size of the input data file per pipeline step that is run.

Context:

When testing #8667, I ran the Detector1Pipeline in a for loop to see the variance in memory usage. Instead of a stable footprint, I saw a consistent rise in memory over time:

NIRISS-WFSS_long.pdf

Discussion with @jdavies-st leads us to believe that this is likely a memory leak of approximately 5x the size of the input data file per pipeline step.

Here is a .zip file containing all of the plots: long.zip

Possible Cause:

I have also attempted to profile this with https://github.com/bloomberg/memray?tab=readme-ov-file, but I am much less familiar with this software, so my interpretation is likely less informed than an expert's. However, when filtering specifically for leaks, it shows that the objects that are never dereferenced are produced by the stdatamodels DataModel clone method. I am unsure whether this is therefore an issue with the stdatamodels package or with its use in the jwst package.
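For anyone who wants to reproduce the profiling, memray can be driven roughly like this (a sketch, not necessarily my exact invocation; the script name and output file name are placeholders for the script further below):

# Trace allocations while running the reproduction script
memray run -o detector1.bin run_detector1.py my_uncal_file.fits

# Report only allocations that were never deallocated ("leaks")
memray flamegraph --leaks detector1.bin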

Additional Thoughts:

It is unclear to me whether this issue is related to #8667 or whether they require two separate fixes, but both may be due to parts of the pipeline sticking around longer than they are needed. This would be consistent with the fact that the memory footprint of the Detector1Pipeline grows over time regardless of the type of step being run. I understand that it is important to copy the input model so as not to corrupt the original data in case of a failed run, but I wonder if there are strategies to mitigate this somewhat.

Code:

#! /usr/bin/env python

import argparse

from jwst.pipeline import Detector1Pipeline

# Parse the command line arguments
parser = argparse.ArgumentParser(description='Run the JWST Detector 1 pipeline on a FITS file')
parser.add_argument('file', type=str, help='Path to the input FITS file')
args = parser.parse_args()
file = args.file

# Define Detector 1 step parameters
steps = dict(
    persistence=dict(
        save_trapsfilled=False  # Don't save the trapsfilled file
    ),
)

# Run the pipeline repeatedly and discard each result
for i in range(10):
    _ = Detector1Pipeline.call(file, steps=steps)
    del _
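To get per-iteration numbers rather than relying on an external memory monitor, a readout can be added with psutil (a minimal sketch assuming psutil, which is in my environment; this is not exactly how the attached plots were generated, and the input path is a placeholder):

#! /usr/bin/env python

import gc

import psutil
from jwst.pipeline import Detector1Pipeline

file = 'path/to/my_uncal_file.fits'  # placeholder input file
steps = dict(persistence=dict(save_trapsfilled=False))
process = psutil.Process()  # the current Python process

for i in range(10):
    result = Detector1Pipeline.call(file, steps=steps)
    del result
    gc.collect()  # collect now so only memory that is truly still referenced remains
    rss_mb = process.memory_info().rss / 1024 ** 2
    print(f'Iteration {i}: RSS = {rss_mb:.1f} MB')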
@braingram
Collaborator

braingram commented Jul 24, 2024

Thanks for the detailed issue. The plots, examples and description are very helpful.

If possible, would you try your test with a candidate change for the stpipe dependency by installing the source branch for this PR:
spacetelescope/stpipe#171
One option to install it would be to run:

pip install git+https://github.com/braingram/stpipe.git@log_records

We are currently testing the changes in the PR and are hopeful they will address some of the memory issues. However, I think (given the example you provided) that there may be more than one issue at play.

@TheSkyentist
Author

Unfortunately, the upgrades to stpipe do not solve the memory leak issue. When pulling jwst and stpipe from HEAD, I still see the same behavior:
MIRI-IFU_long.pdf

However, I have a hint as to where this problem was introduced. I installed jwst==1.14.1 and stpipe==0.5.2, ran some quick tests, and the leaks disappear! Of course, the runtime of the Detector1Pipeline is much longer as well:
NIRISS-IMAGE_long.pdf
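(For completeness, the downgrade was a plain pin along these lines; the exact command may have differed:)

pip install 'jwst==1.14.1' 'stpipe==0.5.2'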

@braingram
Collaborator

Thanks for doing the extra tests. Would you share:

  • the input file you are processing
  • the output of pip freeze for the environment that shows the leak after installing stpipe from main

@TheSkyentist
Author

I'm happy to! All of the files I am processing are listed in #8667, broken down by JWST instrument/mode, and each plot also includes the input file in its title.

Here is the full pip freeze (I am on Python 3.12.4):

asdf==3.3.0
asdf-astropy==0.6.1
asdf_coordinates_schemas==0.3.0
asdf_standard==1.1.1
asdf_transform_schemas==0.5.0
asdf_wcs_schemas==0.4.0
astropy==6.1.2
astropy-iers-data==0.2024.7.22.0.34.13
attrs==23.2.0
BayesicFitting==3.2.1
certifi==2024.7.4
charset-normalizer==3.3.2
contourpy==1.2.1
crds==11.18.0
cycler==0.12.1
drizzle==1.15.2
filelock==3.15.4
fonttools==4.53.1
future==1.0.0
gwcs==0.21.0
idna==3.7
imageio==2.34.2
importlib_metadata==8.2.0
jmespath==1.0.1
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jwst @ git+https://github.com/spacetelescope/jwst@7d97170f080771ef1f7a6276052f250089adf3bb
kiwisolver==1.4.5
lazy_loader==0.4
matplotlib==3.9.1
networkx==3.3
numpy==1.26.4
opencv-python-headless==4.10.0.84
packaging==24.1
Parsley==1.3
photutils==1.13.0
pillow==10.4.0
poppy==1.1.1
psutil==6.0.0
pyerfa==2.0.1.4
pyparsing==3.1.2
python-dateutil==2.9.0.post0
PyYAML==6.0.1
referencing==0.35.1
requests==2.32.3
rpds-py==0.19.1
scikit-image==0.24.0
scipy==1.14.0
semantic-version==2.10.0
setuptools==71.0.4
six==1.16.0
spherical_geometry==1.3.2
stcal @ git+https://github.com/spacetelescope/stcal@5fbabf672bc4f9a5c1eb522bc7b8896a6a5dfcc8
stdatamodels==2.0.0
stpipe @ git+https://github.com/spacetelescope/stpipe@ab747f93b0478a0d6400498dd8b9089113e49cb5
stsci.image==2.3.9
stsci.imagestats==1.8.3
stsci.stimage==0.2.9
synphot==1.4.0
tifffile==2024.7.24
tweakwcs==0.8.8
urllib3==2.2.2
wheel==0.43.0
wiimatch==0.3.2
zipp==3.19.2

@braingram
Collaborator

braingram commented Jul 28, 2024

Thanks again for the extra testing and information.

Would you also post the pip freeze for the jwst==1.14.1 environment that doesn't show the leak? Specifically, I'm interested in the stcal version.

If you're willing to run another test, would you try the current main versions of:

  • jwst
  • stpipe

and the latest released version of stcal, with the ramp fitting algorithm set to OLS instead of OLS_C? The 1.15.x releases use new C code for ramp fitting, and I think the above test setup should tell us whether the memory leak is due to that code. See #8607 for the PR that changed the setting.
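For example, something along these lines should switch ramp fitting to the Python implementation (a sketch; it assumes the same Detector1Pipeline.call pattern as your script, with the parameter name matching the ramp_fit step, and a placeholder input path):

from jwst.pipeline import Detector1Pipeline

# Use the Python OLS ramp fitter instead of the default OLS_C implementation
steps = dict(
    persistence=dict(save_trapsfilled=False),
    ramp_fit=dict(algorithm='OLS'),
)

result = Detector1Pipeline.call('path/to/my_uncal_file.fits', steps=steps)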

I'm working on replicating your test setup. Thank you again for all your work debugging this.

@braingram
Collaborator

I think that's the remaining issue. Here is the memory usage of 5 runs of Detector 1 with jw04681444001_02201_00002_nis (the NIRISS-IMAGE file). Aside from reducing the number of runs (for my impatience), I set ramp_fit.algorithm to OLS (instead of the default OLS_C).
[Screenshot: memory usage over 5 runs with ramp_fit.algorithm=OLS]
As a comparison, here is the same setup using OLS_C:
[Screenshot: memory usage over 5 runs with the default OLS_C]

I would greatly appreciate it if you would still try the above change with your setup (to make sure I'm not on a wild goose chase) :)

My pip freeze is the following. I mostly reused an existing environment, so it warrants some scrutiny.

alabaster==0.7.16
asdf==3.2.0
asdf-astropy==0.6.0
asdf_coordinates_schemas==0.3.0
asdf_standard==1.1.1
asdf_transform_schemas==0.5.0
asdf_unit_schemas==0.2.0
asdf_wcs_schemas==0.4.0
astropy==6.0.0
astropy-iers-data==0.2024.3.18.0.29.47
astropy-sphinx-theme==1.1
asttokens==2.4.1
attrs==23.2.0
Babel==2.15.0
basemap==1.4.1
basemap-data==1.3.2
BayesicFitting==3.2.0
blosc==1.11.1
cachetools==5.3.3
certifi==2024.2.2
chardet==5.2.0
charset-normalizer==3.3.2
ci-watson==0.6.2
colorama==0.4.6
contourpy==1.2.0
coverage==7.4.4
crds==11.17.19
cycler==0.12.1
decorator==5.1.1
distlib==0.3.8
docutils==0.19
drizzle==1.15.1
exceptiongroup==1.2.0
execnet==2.0.2
executing==2.0.1
filelock==3.13.1
fonttools==4.50.0
future==1.0.0
gwcs==0.21.0
idna==3.6
imageio==2.34.0
imagesize==1.4.1
importlib_metadata==7.1.0
iniconfig==2.0.0
ipython==8.22.2
jedi==0.19.1
Jinja2==3.1.3
jmespath==1.0.1
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
-e git+ssh://[email protected]/braingram/jwst@7198d892ddece590371b41d0a83b12656560325b#egg=jwst
kiwisolver==1.4.5
lazy_loader==0.3
linkify-it-py==2.0.3
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mdit-py-plugins==0.4.0
mdurl==0.1.2
memray==1.11.0
mistune==3.0.2
networkx==3.2.1
numpy==1.26.4
numpydoc==1.7.0
objgraph==3.6.1
opencv-python-headless==4.9.0.80
packaging==23.2
Parsley==1.3
parso==0.8.3
pexpect==4.9.0
photutils==1.11.0
pillow==10.2.0
platformdirs==4.2.0
pluggy==1.4.0
poppy==1.1.1
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyerfa==2.0.1.1
Pygments==2.17.2
Pympler==1.0.1
pyparsing==3.1.2
pyproj==3.6.1
pyproject-api==1.6.1
pyshp==2.3.1
pytest==8.1.1
pytest-cov==4.1.0
pytest-doctestplus==1.2.1
pytest-xdist==3.5.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
readchar==4.0.6
referencing==0.34.0
requests==2.31.0
requests-mock==1.11.0
rich==13.7.1
rpds-py==0.18.0
ruff==0.3.4
scikit-image==0.22.0
scipy==1.14.0
semantic-version==2.10.0
six==1.16.0
snakeviz==2.2.0
snowballstemmer==2.2.0
spherical-geometry==1.3.1
Sphinx==6.2.1
sphinx-asdf==0.2.4
sphinx-astropy==1.9.1
sphinx-automodapi==0.17.0
sphinx-gallery==0.16.0
sphinx-rtd-theme==2.0.0
sphinxcontrib-applehelp==1.0.8
sphinxcontrib-devhelp==1.0.6
sphinxcontrib-htmlhelp==2.0.5
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.7
sphinxcontrib-serializinghtml==1.1.10
stack-data==0.6.3
-e git+ssh://[email protected]/braingram/stcal@2112773e603fac985c28f603eb6cc1bdd168adf3#egg=stcal
-e git+ssh://[email protected]/braingram/stdatamodels@0326fd4aa771c42a731eb0e5e0bd9bb2fc267789#egg=stdatamodels
-e git+ssh://[email protected]/braingram/stpipe@8be0711eee0b3a7d2e4d28c4c0d807fa3804d25f#egg=stpipe
stsci-rtd-theme==1.0.1
stsci.image==2.3.5
stsci.imagestats==1.8.1
stsci.stimage==0.2.6
synphot==1.3.post0
tabulate==0.9.0
textual==0.56.4
tifffile==2024.2.12
toml==0.10.2
tomli==2.0.1
tornado==6.4
tox==4.14.2
traitlets==5.14.2
tweakwcs==0.8.6
typing_extensions==4.11.0
uc-micro-py==1.0.3
urllib3==2.2.1
virtualenv==20.25.1
wcwidth==0.2.13
wiimatch==0.3.2
zipp==3.18.1

@TheSkyentist
Author

This is interesting: I see the same behavior! It appears the leak is indeed in OLS_C. I'm happy to close this issue and move the discussion to STCAL. Thanks for the investigation!
