
Documents in journal and realtime collection have different _id #219

Closed
yahor-filipchyk opened this issue Feb 13, 2019 · 5 comments


yahor-filipchyk commented Feb 13, 2019

My understanding is that _id gets assigned to a document when the insert is executed by the MongoDB client. Since the write to the realtime collection happens asynchronously, it seems possible that some ids end up out of order between the batches. Moreover, _id is used as an offset on the query side: when eventsByTag is used, the saved offset (retrieved from the realtime collection) won't actually exist in the journal, which is at least surprising even if the ordering of ids still makes sense. I think setting the _id field explicitly when serializing events makes sense. Id generation happens on the client side anyway, so we keep the same uniqueness and ordering guarantees, but it ensures the same document gets the same _id in both collections.
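For example, a minimal sketch of what assigning _id on the client could look like (the field names and the plain BSON Document type here are just for illustration, not the plugin's actual serializer):

import org.bson.Document
import org.bson.types.ObjectId

// Generate the id on the client, before any insert is submitted, so the exact
// same document (same _id) can be appended to both the journal and the
// realtime collection. Field names are illustrative only.
def serializeAtom(persistenceId: String, payload: Array[Byte]): Document =
  new Document("_id", new ObjectId())
    .append("pid", persistenceId)
    .append("events", payload)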

I also wanted to point out that the write to the realtime collection happens asynchronously with no error handling:

batchFuture.andThen { case _ => doBatchAppend(writes, realtime) }

I think at least some error logging would be useful, but it could also be a good idea to let the user of the library choose whether an error writing to the realtime collection should be propagated. Failing all realtime subscriptions can be useful: the stream can then restart, pick up missed events from the journal, and continue listening to realtime events.
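Something along these lines is what I have in mind - a rough sketch only, where doBatchAppend, writes and realtime mirror the names above, and everything else (logger, parameter types) is a stand-in rather than the plugin's actual code:

import org.slf4j.{Logger, LoggerFactory}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

// Rough sketch: log realtime-write failures instead of silently dropping them.
def appendToRealtimeWithLogging[D](
    batchFuture: Future[Unit],
    doBatchAppend: (Seq[D], String) => Future[Unit],
    writes: Seq[D],
    realtime: String)(implicit ec: ExecutionContext): Future[Unit] = {
  val logger: Logger = LoggerFactory.getLogger("RealtimeAppend")
  batchFuture.andThen { case _ =>
    doBatchAppend(writes, realtime).onComplete {
      case Failure(t) => logger.error("Write to realtime collection failed", t)
      case Success(_) => () // both collections now hold the atom
    }
  }
}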

@scullxbones (Owner)

Hi @yahor-filipchyk -

I've read and re-read this a few times. It seems like there are many different things going on - can you re-summarize? Just trying to narrow this down to something actionable.

I think we can start with:

  • log errors on failure to batchAppend to realtime


yahor-filipchyk commented Feb 17, 2019

Hi @scullxbones,

Sorry for shoveling all this into one issue. I think I can identify 3 distinct issues here:

  1. log errors on failure to batchAppend to realtime (just as you said)
  2. propagate errors from writes to realtime collection to all active realtime subscriptions if possible (an enhancement to 1)
  3. make an atom written to both the journal and the realtime collection have the same _id

I wrote this issue up primarily with 3) in mind, because I think it's causing (or can potentially cause) some problems.

To elaborate on 3) a little: when an atom gets serialized to BSON, _id is not assigned by the serializer, so the mongo client creates the id. This happens asynchronously as writes are submitted to the mongo client, and the result is that the same atom ends up with a different _id value in the journal and in the realtime collection. If a realtime listener is listening to events by tag, it receives the _id from the realtime collection as its offset. When the listener restarts, it then uses that realtime offset to read events from the journal, where no document with that _id exists. Isn't that weird?
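To make the scenario concrete, here is a rough sketch of such a listener (the read journal plugin id string and the EventsByTagQuery usage are assumptions for illustration; Akka 2.5-style with an explicit materializer):

import akka.actor.ActorSystem
import akka.persistence.query.{NoOffset, Offset, PersistenceQuery}
import akka.persistence.query.scaladsl.EventsByTagQuery
import akka.stream.ActorMaterializer

object RealtimeOffsetExample extends App {
  implicit val system: ActorSystem = ActorSystem("example")
  implicit val mat: ActorMaterializer = ActorMaterializer()

  // The plugin id string below is an assumption; check the plugin docs for the exact value.
  val queries = PersistenceQuery(system)
    .readJournalFor[EventsByTagQuery]("akka-contrib-mongodb-persistence-readjournal")

  @volatile var lastSeenOffset: Offset = NoOffset

  // Live events are emitted from the realtime collection, so the offsets
  // observed here are realtime _id values...
  queries.eventsByTag("my-tag", lastSeenOffset).runForeach { envelope =>
    lastSeenOffset = envelope.offset
  }

  // ...but after a restart the same stored offset is used to resume against the
  // journal, where (currently) no document with that _id exists.
}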

@scullxbones (Owner)

Ah ok, yep, I get it now - thanks for summarizing. Now I totally understand why the _id is a problem; I was struggling a bit with that. This further convinces me that #214 / #95 are needed to fix this in the ideal way. The _id fix would be good for consistency, and would certainly help in the meantime.

Bullet 2 will probably be a stretch, so I'd say 1 & 3 should take priority. I'm good with this ticket covering both. I'll pull bullet 2 into a separate ticket.

@scullxbones scullxbones self-assigned this Mar 2, 2019
scullxbones added a commit that referenced this issue Mar 3, 2019
* Generate IDs before sending to mongo, reuse serialized documents
* Log error if realtime write fails
* Clean up some deprecation warnings and other code warnings
scullxbones added a commit that referenced this issue Mar 4, 2019
Issue #219 - IDs should match between journal & realtime
@scullxbones (Owner)

Fixed by #222 ... will update with a release version

@scullxbones (Owner)

Released with 2.2.3

pepite pushed a commit to pepite/akka-persistence-mongo that referenced this issue Mar 27, 2019