Randomkinesispartitionkey #88
base: development
Instead of doing a random hash, this should use the queue name.
What happens when one queue gets throttled, then?
The reason to base it on the queue is that it maintains event order. With random shards there is no guarantee that events stay in order, but pushing all events of the same type to the same shard means they are processed in order. Also, if one shard gets ahead of the other, events for a queue may be created after the downstream bot has already moved past that eid.
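To make the ordering argument concrete: Kinesis hashes each record's partition key with MD5 into a 128-bit integer, and the shard whose hash key range contains that integer receives the record. Ordering is only guaranteed within a shard, so all records that share a partition key (for example, a queue name) stay in order relative to each other. The sketch below assumes the shards split the hash key space into equal contiguous ranges (as on a freshly created stream, boundaries approximate); `hash_key` and `shard_for` are hypothetical helpers, not part of the SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # Kinesis hash keys are unsigned 128-bit integers


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as a big-endian 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range holds the key's hash.

    Assumes shard_count equal, contiguous ranges over [0, 2**128)."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


# Every record keyed by the same queue name lands on the same shard,
# so those records keep their relative order.
events = ["queue-a", "queue-a", "queue-b", "queue-a"]
shards = [shard_for(queue, 4) for queue in events]
```

The flip side is the throttling concern raised above: one queue name means one shard, so that queue's write rate is capped by a single shard no matter how many shards the stream has.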
From the PR #87 conversation: this line is a problem when "random" is allowed for opts.partitionHashKey, because it sets ExplicitHashKey to the literal value "random". @Handyman @mrn3 @burleyb
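Kinesis expects ExplicitHashKey to be the decimal string of a 128-bit integer (it overrides the MD5 of the partition key), so a value like "random" is not usable there. A minimal sketch of guarding against that, assuming a hypothetical `build_record` helper that shapes one PutRecords entry; the default partition key of "0" mirrors the current behavior described in this thread.

```python
import re
from typing import Optional

MAX_HASH_KEY = 2 ** 128 - 1
# Decimal integer without leading zeros, at most 39 digits
_DECIMAL = re.compile(r"0|[1-9][0-9]{0,38}")


def build_record(data: bytes, partition_key: str = "0",
                 explicit_hash_key: Optional[str] = None) -> dict:
    """Hypothetical helper: build one PutRecords entry.

    ExplicitHashKey is included only when it is a decimal 128-bit integer;
    anything else (such as "random") raises instead of being forwarded."""
    record = {"Data": data, "PartitionKey": partition_key}
    if explicit_hash_key is not None:
        if (not _DECIMAL.fullmatch(explicit_hash_key)
                or int(explicit_hash_key) > MAX_HASH_KEY):
            raise ValueError(f"invalid ExplicitHashKey: {explicit_hash_key!r}")
        record["ExplicitHashKey"] = explicit_hash_key
    return record
```

Failing fast on the client makes the misconfiguration visible immediately rather than surfacing as a rejected request later.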
This makes sense regarding message ordering. I saw it happen in my testing before I read your comments, but didn't realize that was what was going on. I do have a use case, though, where this approach could hit a throttling limit on one queue. That particular consumer doesn't care about message order, so being able to scale it just by adding more shards, without changing anything else, would be nice. What if we leave the default as it is, with everything going to partitionKey = 0, and put it back in the hands of the developer to decide how to set it? Basically, just pass opts through the way you would call the AWS API. Something like this:
And call it like this:
I think what @burleyb proposed makes sense: you decide what you want the partitionKey to be. In most cases it should probably be the eventName, but this gives the flexibility to use whatever you need as needs arise. Is anyone opposed to this approach?
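One way to read "decide what you want the partitionKey to be" is an option that is either a fixed string or a function of the event. The sketch below is a hypothetical illustration of that idea, not the SDK's actual opts handling; `to_entries` and its `partition_key` parameter are names invented here, and the default of "0" keeps every record on one shard as today.

```python
import json
from typing import Any, Callable, Dict, List, Union

PartitionKeyOpt = Union[str, Callable[[Dict[str, Any]], str]]


def to_entries(events: List[Dict[str, Any]],
               partition_key: PartitionKeyOpt = "0") -> List[Dict[str, Any]]:
    """Hypothetical: turn events into PutRecords entries.

    partition_key is a fixed string (default "0": all records on one shard)
    or a function of the event, e.g. one returning the event name."""
    entries = []
    for event in events:
        # Resolve the key from the event object before serializing it.
        key = partition_key(event) if callable(partition_key) else partition_key
        entries.append({"Data": json.dumps(event).encode("utf-8"),
                        "PartitionKey": key})
    return entries


events = [{"event": "orders", "id": 1}, {"event": "users", "id": 2}]
by_name = to_entries(events, partition_key=lambda e: e["event"])
```

Computing the key from the event object before it is serialized avoids having to inspect an already-encoded buffer to find the event name.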
That looks better. The only thing I would change is
I was going to set the default to r.event, but found that it was already a buffer at that point in execution, and I didn't want to add any overhead to the operation.
Use random partition key to allow for scaling of sharding.
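For the order-insensitive case described in this thread, a random partition key per record spreads writes across all shards, so adding shards raises the write ceiling (Kinesis documents a per-shard write limit of 1,000 records or 1 MiB per second). The cost is that records no longer share a shard, so cross-record order is lost. A minimal sketch, assuming evenly split shards; `random_partition_key`, `shard_for`, and `distribution` are hypothetical helpers.

```python
import hashlib
import uuid
from collections import Counter


def random_partition_key() -> str:
    """A fresh random key per record (no ordering across records)."""
    return uuid.uuid4().hex


def shard_for(partition_key: str, shard_count: int) -> int:
    """MD5 of the key as a 128-bit int, mapped onto evenly split shards."""
    h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    return h * shard_count // 2 ** 128


def distribution(records: int, shard_count: int) -> Counter:
    """How many of `records` random-keyed writes land on each shard."""
    return Counter(shard_for(random_partition_key(), shard_count)
                   for _ in range(records))


counts = distribution(10_000, 4)
```

With four shards the writes split roughly evenly, whereas a single fixed key would send all 10,000 to one shard.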