Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put context around model opening. #97

Closed
wants to merge 1 commit into from

Conversation

schlafly
Copy link

stpipe currently leaves some files open due to this block, which trigger ResourceWarning: unclosed file in the Roman regression tests. This PR adds a context manager around the model opening to close these files. This eliminates the ResourceWarnings in romancal.

I have not run any detailed tests against stpipe, but wanted to be able to point people to this PR, thus the draft status.

In the context of Roman I am told that opening a model with units triggers also opening all of the associated binary extensions (lazy loading isn't used there?). So pulling these parameters ends up being pretty expensive. But we need to address that on the Roman side.

Copy link
Collaborator

@stscieisenhamer stscieisenhamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module changelog and other administrivia, LGTM.

@braingram
Copy link
Collaborator

braingram commented Jul 20, 2023

In the context of Roman I am told that opening a model with units triggers also opening all of the associated binary extensions (lazy loading isn't used there?). So pulling these parameters ends up being pretty expensive. But we need to address that on the Roman side.

Sorry for any confusion. I should clarify that the blocks containing data for non-quantity/unit arrays are still lazy loaded. For example, if I make a level 2 image with:

roman_datamodels.maker_utils.mk_level2_image(filepath='foo.asdf')

Then open the file (with current roman_datamodels main) and look at the blocks I see several loaded and several unloaded:

m = rdd.open('foo.asdf')
print(m._asdf._blocks._internal_blocks)

outputs

[<Block int off: 11137 alc: 524288 size: 524288>,
 <Block int off: 535479 alc: 524288 size: 524288>,
 <Block int off: 1059821 alc: 524288 size: 524288>,
 <Block int off: 1584163 alc: 524288 size: 524288>,
 <asdf.block.UnloadedBlock at 0x1275f9840>,
 <asdf.block.UnloadedBlock at 0x1275f99c0>,
 <asdf.block.UnloadedBlock at 0x1275f9930>,
 <asdf.block.UnloadedBlock at 0x1275f98a0>,
 <Block int off: 2370865 alc: 8388608 size: 8388608>,
 <Block int off: 10759527 alc: 66846976 size: 66846976>,
 <asdf.block.UnloadedBlock at 0x1275f9630>,
 <Block int off: 144453587 alc: 66846976 size: 66846976>,
 <Block int off: 211300617 alc: 66846976 size: 66846976>,
 <Block int off: 278147647 alc: 66846976 size: 66846976>,
 <Block int off: 344994677 alc: 66846976 size: 66846976>]

I see identical output when using asdf.open directly:

af = asdf.open('foo.asdf')
print(af._blocks._internal_blocks)

Examining the file I see block 9 is for 'data' which is a quantity (and not lazy loaded) whereas block 10 is for 'dq' which is not a quantity (and is lazy loaded). Let me know what I can do to help clarify this situation. It seems to be a result of asdf-astropy forcing the creation of an array before creating the quantity and astropy converting to an array if the provided value is not a subclass of ndarray (which asdf's NDArrayType is not currently).

@schlafly
Copy link
Author

Thanks Brett. Thanks for the clarification; yes, I meant the ndarrays associated with the units. I do think we want to support lazy-loading quantities with units so that things like this metadata query can happen without reading large chunks of the file. That will be a separate ticket, though. It sounds like asdf-astropy is the relevant repository? Would you mind filing the ticket? I just don't want to forget that this is something we should address at some point.

@braingram
Copy link
Collaborator

Thanks for the reply. Happy to open an issue: astropy/asdf-astropy#190

Copy link
Collaborator

@stscieisenhamer stscieisenhamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides all the other administrivia, the change itself LGTM.

@jdavies-st
Copy link
Contributor

Would be nice to have this PR finished and merged in the next release of stpipe.

@braingram
Copy link
Collaborator

I think the issue being resolved here was fixed in
#142
specifically stpipe main has the following (using a with context) for the code modified in this PR:

stpipe/src/stpipe/step.py

Lines 844 to 852 in 824863e

with cls._datamodels_open(dataset, asn_n_members=1) as model:
if isinstance(model, Sequence):
# Pull out first model in ModelContainer
model = model[0]
crds_parameters = model.get_crds_parameters()
crds_observatory = model.crds_observatory
except (OSError, TypeError, ValueError):
logger.warning("Input dataset is not an instance of AbstractDataModel.")
disable = True

The romancal regtest run for #142
https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-Developers-Pull-Requests/608/
also contains no ResourceWarnings. @schlafly would you take a look at the above PR and regtest run to see if it addressed the issuse? Thanks!

@schlafly
Copy link
Author

schlafly commented Mar 6, 2024

Wonderful, thanks Brett, happy to close this.

@schlafly schlafly closed this Mar 6, 2024
@schlafly schlafly deleted the close-model branch March 6, 2024 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants