
Journal Issues on EventsByTag #370

Open
bensavage opened this issue Aug 6, 2020 · 7 comments

Comments

@bensavage

I've been attempting to get eventsByTag working correctly, but the data being returned is not what I would expect. I have ten transactions in my journal, each containing 7 events (so 70 journal records in total) in the store. When starting the read side, however, I only receive 63 of these messages, with odd ones being skipped (the skipped events appear anywhere in the sequence, but the count is always 63, no matter how many times I remove the collection and restart the read-side application). My read-side code is simply:

public void runQueryStream(SharedKillSwitch killSwitch) {
    RestartSource.withBackoff(Duration.ofMillis(500), Duration.ofSeconds(20), 0.1, () ->
            Source.completionStageSource(
                    readOffset().thenApply(offset -> {
                        log.info("Starting stream for tag [{}] from offset [{}]", tag, offset);
                        return PersistenceQuery.get(Adapter.toClassic(system)).getReadJournalFor(JavaDslMongoReadJournal.class, MongoReadJournal.Identifier())
                                .eventsByTag(tag, Offset.noOffset())
                                .map(eventEnvelope -> storeRecord((T) eventEnvelope.event(), PersistenceId.ofUniqueId(eventEnvelope.persistenceId()), eventEnvelope.sequenceNr()));
                    })))
            .via(killSwitch.flow())
            .runWith(Sink.ignore(), system);
}

where storeRecord simply writes a document containing the id, event, and sequence number to a second collection.
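For reference, a minimal sketch of what a storeRecord like this might look like, with an in-memory map standing in for the second Mongo collection (the class, field names, and key scheme here are assumptions for illustration, not the reporter's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: stores one "document" per (persistenceId, sequenceNr)
// in an in-memory map; a real implementation would insert into a second
// Mongo collection instead.
public class ReadSideStore<T> {
    private final Map<String, Map<String, Object>> collection = new ConcurrentHashMap<>();

    public T storeRecord(T event, String persistenceId, long sequenceNr) {
        // Keying by (pid, sn) makes duplicate deliveries overwrite
        // rather than double-count.
        String key = persistenceId + "#" + sequenceNr;
        collection.put(key, Map.of(
                "pid", persistenceId,
                "sn", sequenceNr,
                "event", event));
        return event; // pass the event through, as the .map(...) stage expects
    }

    public int size() {
        return collection.size();
    }
}
```

Keying the projection by persistence id and sequence number also makes it idempotent, which matters because eventsByTag can redeliver events after a restart.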

This is using version 3.0.3 (the same behaviour occurs on all v3 releases). Any thoughts would be much appreciated.

@bensavage changed the title from "Ben" to "Journal Issues on EventByTag" on Aug 7, 2020
@bensavage
Author

After some further testing, I think the problem is that if two events for the same persistence id are stored too close together, the earlier of the two is dropped. Is there some kind of process that removes what it considers duplicate events based on a combination of persistence id and timestamp?

@bensavage changed the title from "Journal Issues on EventByTag" to "Journal Issues on EventsByTag" on Aug 7, 2020
@scullxbones
Owner

Hmm. I wonder if there is a bug with the batch handling on that query.

Can you look in Mongo and see whether the events that do not show up are stored in a document containing multiple subdocuments in the events field? You can search by pid: <PROCESSOR_ID> and sn: <SEQUENCE_NUMBER>.

Also, can you confirm which driver you're using?
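That lookup can be done from the mongo shell along these lines (the collection name and the nesting of pid/sn under the events array are assumptions based on the plugin's defaults; the placeholders are left as-is):

```js
// mongo shell sketch: find the journal batch document(s) holding a given event
db.akka_persistence_journal.find({
  "events.pid": "<PROCESSOR_ID>",
  "events.sn": NumberLong(<SEQUENCE_NUMBER>)
})
```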

@bensavage
Author

Thanks for coming back to me. We're using the rxmongo driver. Based on your comment above I've just tried the standard Scala one, and it seems much better: with that driver I had to increase my test to 175,000 events and only saw 3 missing in that run, so a big improvement.

All the records have just a single subdocument as far as I can tell, so that looks OK.

I still suspect this is some kind of filtering based on timestamp; if I insert an artificial delay before every event persist, the problem does appear to go away.

@scullxbones
Owner

Can you share a test case that reproduces this?

Do you know if all of the events end up in the journal in the first place? I want to confirm whether this is a read or a write problem.

Does currentEventsByTag give back 100% of the events?

Also, if you use the eventsByPersistenceId query for the pids you are using, do you see something similar, or do you get 100% of the events?

@bensavage
Author

Sorry for the couple of days' delay; I got stuck doing some other things. I've put together a somewhat hacked-together, simplified setup here:

https://github.com/bensavage/mongo-read-test

There are tests for both Mongo and Cassandra (both embedded and standalone). The Cassandra ones pass every time, but the Mongo ones fail pretty much every time (occasionally I've seen them pass...).

As mentioned, everything works much better if you slow it down, so I added the ability to set delays at the top of the test. Changing them to 1000ms or more, for example, pretty much always works.

Any help much appreciated!

@scullxbones
Owner

Sorry it took a few days to get to looking at your repo. I think I see what you're being bitten by: the offset. I had missed the readOffset() function — do events still "disappear" when the full set is read? I think the better behaviour of one driver vs. the other is just an artifact of a sample size of one; given enough race attempts, they would even out.

The problem is that there isn't a journal-global counter. Instead, the _id field (which by convention is an ObjectId in MongoDB) has a timestamp portion and a random portion, so ordering is only guaranteed at the resolution of the timestamp (I want to say 1ms): an id with a later timestamp will sort greater than one from a previous tick, but ids generated within the same tick can sort in any order.
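To make the ordering problem concrete, here is a small stand-alone sketch of the ObjectId timestamp portion (per the MongoDB ObjectId spec, the leading 4 bytes are a big-endian Unix timestamp in seconds; the example id values below are arbitrary):

```java
// Decodes the timestamp portion of a MongoDB ObjectId hex string.
// The first 4 bytes (8 hex chars) are a seconds-resolution creation time;
// the remaining bytes carry no ordering guarantee, so two ids created
// within the same timestamp tick can sort in either order.
public class ObjectIdTimestamp {
    static long timestampSeconds(String hexId) {
        return Long.parseLong(hexId.substring(0, 8), 16);
    }

    public static void main(String[] args) {
        String earlierWrite = "507f1f77bcf86cd799439011"; // arbitrary example id
        String laterWrite   = "507f1f77aaaaaaaaaaaaaaaa"; // same second, written later
        // Same timestamp tick...
        System.out.println(timestampSeconds(earlierWrite) == timestampSeconds(laterWrite)); // true
        // ...but the later write sorts BEFORE the earlier one, so an offset
        // taken at earlierWrite's _id would skip laterWrite entirely.
        System.out.println(laterWrite.compareTo(earlierWrite) < 0); // true
    }
}
```

This is why resuming an eventsByTag stream from a stored _id-based offset can silently skip events written within the same timestamp tick.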

More info in #37, #214 and #219, with the idea for the fix in #95.

@jzelayeta

Has anyone found a workaround for this issue?
