[BUG] Investigate another way to map functions and objects from the pipeline. #13

Open
jcfaracco opened this issue Jun 28, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@jcfaracco
Collaborator

Describe the bug
Apparently, there is a high probability of key collisions when Python's built-in hash function is used to map functions and objects from the pipeline.

To Reproduce
This is hard to reproduce, but it happens.

Expected behavior
If you pass the pipeline two objects that come from the same class but are different instances, they should be treated as separate instances.
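
One way such a collision can arise is sketched below. This is only an illustration, not the project's code: the `Operator` class and the `registry` dict are hypothetical, and the sketch assumes the class defines value-based hashing (here through a frozen dataclass). Under that assumption, two different instances with equal fields produce the same `hash()` value, so the second one overwrites the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical pipeline operator; a frozen dataclass hashes by field values."""
    name: str
    factor: int


# Two different instances of the same class with equal field values.
a = Operator("scale", 2)
b = Operator("scale", 2)

# Hypothetical registry keyed by hash(obj), mirroring the pattern in the issue.
registry = {}
for obj in (a, b):
    key = hash(obj)
    registry[key] = obj  # the second instance silently replaces the first

distinct_instances = a is not b
same_key = hash(a) == hash(b)
entries = len(registry)
```

Here `distinct_instances` and `same_key` are both true, and the registry ends up with a single entry instead of two.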

Additional context
One possible solution is to use the uuid module and its functions instead.
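
A minimal sketch of the uuid idea, assuming a hypothetical `register` helper and `registry` dict (neither is taken from the project): each registration draws a fresh random key with `uuid.uuid4()`, so two instances never share a key, regardless of how the class implements `__hash__`.

```python
import uuid


class Operator:
    """Hypothetical pipeline operator used only for illustration."""

    def __init__(self, name):
        self.name = name


def register(registry, obj):
    """Hypothetical helper: store obj under a fresh random UUID and return the key."""
    key = uuid.uuid4().hex
    registry[key] = obj
    return key


registry = {}
a = Operator("scale")
b = Operator("scale")

key_a = register(registry, a)
key_b = register(registry, b)
```

With this approach `key_a` and `key_b` differ and both instances stay in the registry. The keys are random, so they change on every run.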

@jcfaracco jcfaracco self-assigned this Jun 28, 2023
@otavioon
Contributor

Could you please add the file location where the hash is calculated? Would it be here?

key = hash(obj)

@otavioon
Contributor

otavioon commented Jun 28, 2023

uuid is a good alternative, but we will lose determinism: the same object instance will get two different ids if it is registered twice. Maybe we can look at dask's tokenize function, which is used for hashing and creating task names, and implement something similar...?
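
To make the "something similar" idea concrete, here is a rough standard-library sketch. It is not dask's tokenize and does not reproduce its behavior; `deterministic_token` and `Operator` are hypothetical names, and the sketch assumes every attribute of the object has a stable `repr`.

```python
import hashlib


class Operator:
    """Hypothetical pipeline operator used only for illustration."""

    def __init__(self, name, factor):
        self.name = name
        self.factor = factor


def deterministic_token(obj):
    """Hypothetical helper: digest of the class name plus the sorted attributes.

    Assumes every attribute value has a repr that is stable across calls.
    """
    cls = type(obj)
    state = sorted(vars(obj).items())
    payload = repr((cls.__module__, cls.__qualname__, state)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = Operator("scale", 2)
token_first = deterministic_token(a)
token_second = deterministic_token(a)
token_other = deterministic_token(Operator("scale", 3))
```

The same instance always yields the same token, and a different state yields a different one. One trade-off to keep in mind: a content-based token like this gives two instances with identical state the same key, so it would probably need an extra discriminator (for example, the object's position in the pipeline) to keep them separate.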

@jcfaracco
Collaborator Author

I merged the latest changes and it generated a new task for each hierarchy. This needs to be rebased if we are planning to support operators with UUIDs.

@jcfaracco jcfaracco added the bug Something isn't working label Jan 9, 2024
2 participants