Randomkinesispartitionkey #88

Open
wants to merge 4 commits into development

Conversation

@burleyb (Contributor) commented Oct 16, 2019

Use a random partition key to allow the number of shards to scale.

@Handyman (Contributor) left a comment

Instead of doing a random hash, this should use the queue name.

@Handyman changed the base branch from master to development on October 17, 2019, 21:15
@burleyb (Contributor, Author) commented Oct 21, 2019

What happens when you get throttling on one queue then?

@zirkerc (Contributor) commented Oct 21, 2019

The reason to do it based on the queue is that it maintains event order. With random shards there is no guarantee the events will be kept in order, but by pushing all events of the same type to the same shard they will be processed in order.

Also, if one shard gets ahead of the others, events for a queue may be created after the downstream bot has already moved past that eid.

@mrn3 (Contributor) commented Oct 21, 2019

From PR 87 convo:

This line is an issue when allowing "random" for opts.partitionHashKey: it will set the ExplicitHashKey to the literal string "random".

@Handyman @mrn3 @burleyb
Using random partitions is problematic and will not guarantee that events will be received in the correct order. To allow scaling and maintain ordering, we need to set the PartitionKey field to the target queue and set ExplicitHashKey to undefined.
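
For reference, a minimal sketch of what that would look like for the put call (not the library's actual code; records, configure.stream, and the queueName variable are assumed for illustration):

// Sketch only: use the target queue name as the PartitionKey and leave
// ExplicitHashKey unset, so Kinesis hashes the key itself. All events for a
// queue land on the same shard, preserving their order, while different
// queues can spread across shards as the stream scales.
kinesis.putRecords({
    Records: records.map((r) => ({
        Data: r,
        PartitionKey: queueName // assumed to hold the target queue's name
    })),
    StreamName: configure.stream
}, function(err, data) {
    if (err) console.error(err);
});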

@burleyb (Contributor, Author) commented Oct 22, 2019

This makes sense regarding the ordering of messages. I saw it in my testing before I read your comments, but didn't realize that was what was happening.

I do have a use case, though, where one queue could hit a throttling limit with this approach. That particular queue doesn't care about message order downstream, so being able to scale it by just adding more shards, without changing anything else, would be nice.

What if we just leave the default as it is, with everything going to partitionKey = 0, and put it back in the developer's hands to decide how to set it? Basically, pass opts through the way you would call the AWS API. Something like this:

kinesis.putRecords({
    Records: records.map((r) => {
        let hashkey = 0;
        if (opts.partitionKey) {
            hashkey = opts.partitionKey;
        }
        let ret = {
            Data: r,
            PartitionKey: hashkey.toString()
        };
        if (opts.explicitHashKey) {
            ret.ExplicitHashKey = opts.explicitHashKey.toString();
        }
        return ret;
    }),
    StreamName: configure.stream
}, function(err, data) {

And call it like this:

let stream = leo.load(botID, eventName, { partitionKey: eventName })

@mrn3 (Contributor) commented Oct 26, 2019

I think what @burleyb proposed makes sense: you decide what you want the partitionKey to be. In most cases it should probably be the eventName, but it gives the flexibility to use whatever you need as needs arise. Is anyone opposed to this approach?

@zirkerc (Contributor) commented Oct 26, 2019

That looks better. The only thing I would change is the default: use let hashkey = r.event so that it partitions by queue by default. With one shard it won't matter, but then nothing needs to change if you increase the shard count.
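
As a sketch, that change against the snippet above would look something like the mapper below; it assumes each record still exposes its queue name as r.event at that point, which the next comment notes is not the case:

// Hypothetical mapper, not the library's actual code: default the partition
// key to the record's queue name so events for the same queue stay ordered,
// while still letting callers override it via opts.
function toKinesisRecord(r, opts) {
    let hashkey = opts.partitionKey || r.event || 0; // default to the queue name
    let ret = {
        Data: r,
        PartitionKey: hashkey.toString()
    };
    if (opts.explicitHashKey) {
        ret.ExplicitHashKey = opts.explicitHashKey.toString();
    }
    return ret;
}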

@burleyb (Contributor, Author) commented Oct 30, 2019

I was going to set the default to r.event, but found that r is already a Buffer at that point in execution, and I didn't want to introduce any overhead to the operation.
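
To illustrate the overhead concern only: recovering the queue name once r is a Buffer would mean deserializing every record again on the put path. The helper below is purely hypothetical and assumes each Buffer holds a JSON event with an event field, which is an assumption, not something confirmed in this thread:

// Hypothetical helper: re-parsing each record just to read its queue name
// adds a JSON.parse per event, which is the overhead being avoided.
function queueNameFromBuffer(r) {
    try {
        return JSON.parse(r.toString("utf8")).event || "0";
    } catch (e) {
        return "0"; // fall back to the current default on unparsable payloads
    }
}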
