Memory Leaks in Detector1Pipeline #8668
Comments
Thanks for the detailed issue. The plots, examples, and description are very helpful. If possible, would you try your test with a candidate change we are currently testing in a PR? We are hopeful it will address some of the memory issues. However, I think (given the example you provided) that there may be more than one.
Unfortunately, the upgrades to stpipe do not solve the memory leak issue. However, I have a hint as to where this problem was introduced.
Thanks for doing the extra tests. Would you share the list of files you are processing and a pip freeze of your environment?
I'm happy to! All files I am processing are listed in #8667, broken down by JWST instrument/mode, and each plot also includes the input file in its title. Here is the full pip freeze; I am on Python 3.12.4.
Thanks again for the extra testing and information. If you're willing to run another test, would you try the current main versions of the relevant packages, and the latest released version of stcal with the ramp fitting algorithm set to OLS? I'm working on replicating your test setup. Thank you again for all your work debugging this.
I think that's the remaining issue. Here is the memory usage for 5 runs of Detector 1 with the jw04681444001_02201_00002_nis (NIRISS-IMAGE) file. Aside from reducing the number of runs (for my impatience), I set the ramp fitting algorithm to OLS. I would be greatly appreciative if you would still try the above change with your setup (to make sure I'm not on a goose chase). :) My pip freeze is the following; I mostly reused an existing environment, so this warrants some scrutiny.
This is interesting: I see the same behavior! It appears the leak is indeed in OLS_C. I'm happy to close this issue and move the discussion to STCAL. Thanks for the investigation!
Summary:
Repeated runs of the Detector 1 Pipeline show a gradual increase in memory footprint. This appears to be related to the PipelineStep clone method. The leak corresponds to approximately 5x the size of the input data file per step of the pipeline that is run.

Context:
When testing #8667, I attempted to run for loops over the Detector1Pipeline to see the memory usage variance. Instead, I saw a consistent rise in the memory footprint over time:
NIRISS-WFSS_long.pdf
Discussing with @jdavies-st leads us to believe that this is likely a memory leak that is approximately 5x the size of the input data file per step of the pipeline.
Here is a .zip file containing all of the plots: long.zip
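The for-loop test above can be sketched generically with only the stdlib tracemalloc module. Here, `run_pipeline` is a hypothetical stand-in for `Detector1Pipeline.call()` that leaks on purpose, so the growth pattern described above is reproducible without JWST data:

```python
# Minimal sketch of the repeated-run memory test (stdlib only).
# `run_pipeline` is a hypothetical stand-in, not the jwst API.
import tracemalloc

_leaked = []  # simulates references that are never released

def run_pipeline(data):
    # Stand-in for a pipeline step that clones its input and
    # accidentally keeps the clone alive between runs.
    clone = list(data)
    _leaked.append(clone)
    return clone

def measure_growth(n_runs, data):
    """Return the traced memory footprint after each run."""
    tracemalloc.start()
    usage = []
    for _ in range(n_runs):
        run_pipeline(data)
        current, _peak = tracemalloc.get_traced_memory()
        usage.append(current)
    tracemalloc.stop()
    return usage

usage = measure_growth(5, [0.0] * 100_000)
# A leak shows up as strictly increasing usage between runs.
print(all(b > a for a, b in zip(usage, usage[1:])))  # → True
```

With the real pipeline, the same loop-and-measure shape produces plots like the ones attached; a healthy pipeline would show the footprint returning to roughly the same baseline after each run.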
Possible Cause:
I have also attempted to profile this with https://github.com/bloomberg/memray?tab=readme-ov-file, but I am a lot less familiar with this software, so my interpretation is likely less informed than an expert's. However, when filtering specifically for leaks, it shows that the objects that are never de-referenced are produced by the stdatamodels DataModel clone method. I am unsure whether this is therefore an issue with the stdatamodels package or with its use in the jwst package.

Additional Thoughts:
It is unclear to me whether this issue is related to #8667 or whether the two require different fixes, but both may be due to parts of the pipeline sticking around for longer than they are needed, as perhaps evidenced by the fact that the memory footprint of the Detector1Pipeline grows over time regardless of the type of step being run. I understand that it is important to copy the input model so as not to corrupt the original data in case of a failed run, but I wonder if there are strategies to mitigate this cost.
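One such mitigation strategy, sketched under the assumption that the cloned model is only needed to restore state after a failed run (the function and parameter names here are illustrative, not the actual jwst/stpipe API): drop the defensive copy as soon as the step succeeds, so each step's clone does not outlive the step.

```python
# Sketch of a copy-on-rollback pattern; names are illustrative,
# not the actual jwst/stpipe API.
import copy
import gc

def run_step_with_rollback(model, step):
    backup = copy.deepcopy(model)  # protect the caller's data
    try:
        result = step(model)
    except Exception:
        return backup              # failed run: hand back the pristine copy
    else:
        del backup                 # success: release the copy promptly
        gc.collect()               # optional; helps break reference cycles
        return result

ok = run_step_with_rollback(
    {"data": [1, 2, 3]},
    lambda m: {"data": [x * 2 for x in m["data"]]},
)
print(ok["data"])  # → [2, 4, 6]
```

Releasing the backup eagerly means each clone lives for at most one step, so the footprint should not accumulate roughly 5x the input size per step across the pipeline.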
Code: