memoized_path_copy helper to complement @memoize_path #57

Open
yarikoptic opened this issue Dec 7, 2021 · 5 comments

@yarikoptic
Member

In light of the discussion in dandi/dandi-cli#848 about allowing more efficient caching of digests, I wondered whether it would be feasible to provide something like memoized_path_copy, which would copy all (?) memoized invocations of a specific decorated function so that they appear as if they had been invoked for another, "new" path.

ATM, looking at the code, since we rely on joblib memoization and do not otherwise track which specific parametrizations of the function were used, I really do not see how we could even do that. But maybe you, @jwodder, see some way to provide such functionality?

@jwodder
Member

jwodder commented Dec 7, 2021

@yarikoptic I'm not entirely clear on the behavior you're describing. Do you mean that a thus-decorated function should detect copies (How?) and memoize them as though they were the original path?

@yarikoptic
Member Author

I was thinking of something like this: if we have

@cache.memoize_path
def decorated_func(path, *args, **kwargs):
    ...  # whatever

and decide to copy a file from src_path to dest_path (it could also be a "move" instead of a "copy"), we could do

copy(src_path, dest_path)
cache.memoized_path_copy(decorated_func, src_path, dest_path)

which would then copy all memoized/cached invocations of decorated_func for src_path so that they would also be known for dest_path.
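To make the intended semantics concrete, here is a minimal in-memory sketch. It is not fscacher's or joblib's implementation: ToyPathCache is a hypothetical stand-in that records every parametrization per path (which, as noted above, the joblib-based cache does not do), and it ignores the file fingerprinting that a real path cache would need.

```python
import functools
from collections import defaultdict


class ToyPathCache:
    """Hypothetical in-memory cache that remembers every call per path."""

    def __init__(self):
        # Layout: {function name: {path: {extra-arguments key: result}}}
        self._store = defaultdict(lambda: defaultdict(dict))

    def memoize_path(self, func):
        @functools.wraps(func)
        def wrapper(path, *args, **kwargs):
            # Key on the non-path arguments; the path is tracked separately
            # so that entries can later be looked up (and copied) by path.
            key = (args, tuple(sorted(kwargs.items())))
            per_path = self._store[func.__name__][path]
            if key not in per_path:
                per_path[key] = func(path, *args, **kwargs)
            return per_path[key]

        return wrapper

    def memoized_path_copy(self, func, src_path, dest_path):
        """Make every cached call for src_path also known for dest_path.

        Returns the number of entries copied.
        """
        per_func = self._store[func.__name__]
        copied = dict(per_func.get(src_path, {}))
        per_func[dest_path].update(copied)
        return len(copied)
```

With such a structure, calling cache.memoized_path_copy(decorated_func, src_path, dest_path) after the copy would let a later decorated_func(dest_path, ...) call be served from the cache without recomputation. The hard part discussed in this thread is that the real backend keeps no such per-path index.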

@jwodder
Member

jwodder commented Dec 7, 2021

@yarikoptic This might be possible depending on the underlying functionalities of joblib; I've brought this possibility up in a related issue there.

@yarikoptic
Member Author

Not sure we would see the desired development in joblib done/accepted in the near future... maybe only if we send a PR for some alternative backend (probably based on FileSystemStoreBackend) which would provide the desired interfaces/functionality.
Meanwhile, I tried the already existing interface to get information about all entries in the cache:

(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets$ time python3 -c 'from dandi.support.digests import checksums; c = checksums._memory.store_backend.get_items(); print(len(c)); print(c[0]);'
55341
CacheItemInfo(path='/home/dandi/.cache/fscacher/dandi-checksums/joblib/dandi/support/digests/get_dandietag/75ce6b526d6e61faac02b4164ac645c5', size=641, last_access=datetime.datetime(2021, 6, 29, 21, 55, 31, 925287))

real    0m3.325s
user    0m2.399s
sys     0m1.141s

and that was a "warm" run; the original one probably took twice as long. But that is on drogon, which "saw too much" (over 50k entries); for a typical user, and with mv probably not being that common, this should be OK. So we can identify cache entries associated with a path easily and through an existing interface. The question is whether it would be possible to copy them into a new entry (with adjusted path and last_access).
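As a rough illustration of the "identify entries for a path" step, the sketch below scans entry directories such as those listed by get_items() above. It rests on an assumption about joblib's on-disk layout that is not part of any documented interface: that each entry directory holds a metadata.json file with an "input_args" mapping of argument names to repr strings. The helper name entries_for_path is hypothetical.

```python
import json
from pathlib import Path


def entries_for_path(item_dirs, target_path):
    """Return the entry directories whose recorded arguments mention target_path.

    ASSUMPTION: each entry directory contains a metadata.json file with an
    "input_args" mapping of argument names to repr() strings. This is an
    on-disk implementation detail, not a documented interface.
    """
    # repr() of a str path is quoted, and it also appears inside the repr of
    # a pathlib path such as PosixPath('/data/a.nwb').
    needle = repr(str(target_path))
    matches = []
    for item_dir in item_dirs:
        meta_file = Path(item_dir) / "metadata.json"
        try:
            meta = json.loads(meta_file.read_text())
        except (OSError, ValueError):
            continue  # missing or unreadable metadata: skip this entry
        input_args = meta.get("input_args") if isinstance(meta, dict) else None
        if not isinstance(input_args, dict):
            continue
        if any(needle in str(value) for value in input_args.values()):
            matches.append(str(item_dir))
    return matches
```

The item_dirs argument could be built from the listing shown above, for example [item.path for item in checksums._memory.store_backend.get_items()]. Finding the entries this way is only the lookup half; writing adjusted copies of them is a separate problem.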

@jwodder
Member

jwodder commented Jan 7, 2022

@yarikoptic Copying modified entries depends on too many implementation details of joblib.Memory which, at best, are managed via functions with no public documentation whose names start with underscores. If we want to be able to do this reliably, we need cooperation with joblib; see joblib/joblib#1237 or start a new issue.
