
[Bug] Grafting can skip copying data sources #5465

Open
lutter opened this issue Jun 4, 2024 · 6 comments · May be fixed by #5464
Labels: bug (Something isn't working)

Comments

lutter (Collaborator) commented Jun 4, 2024

Bug report

The following sequence of steps will lead to a grafted (or copied) subgraph with missing dynamic data sources:

  1. Start a graft
  2. Interrupt it and rewind the subgraph
  3. Resume the graft

If, at step (2), the graft had already copied data sources and the rewind doesn't remove all of them, then upon resuming we will skip copying data sources because of this code.

The result is a graft that appears to be fine but is missing dynamic data sources.

@lutter lutter added the bug Something isn't working label Jun 4, 2024
lutter (Collaborator, Author) commented Jun 4, 2024

The fix might be simple: the copying of private data sources currently runs entirely outside of a transaction. We should just wrap all of it in a txn.

avandras commented Jun 5, 2024

@lutter those transactions could run for days, couldn't they?

lutter (Collaborator, Author) commented Jun 5, 2024

Copying private data sources should be very quick; at the very worst, it's a table with 100k rows, but usually much less. The actual data copying, which can take days, is already broken into txns that should take about 3 minutes each.

avandras commented Jun 5, 2024

Ah, understood, thanks!

paymog commented Jun 5, 2024

We've seen subgraphs with up to 1.9 million data sources. Not sure if that matters at all for how this should be handled.

lutter (Collaborator, Author) commented Jun 5, 2024

True; I just looked through the code again, and it does one insert statement per row, which will be very slow for that many data sources. We'll also need to improve the code to do the copying in one query, or break it up into multiple txns, though that requires more bookkeeping.

But we need to address correctness first; to me that is more important than the performance issues, though those can be devastating, too.
