Cannot use Nostrum.Api functions in a multi-host configuration due to the request Ratelimiter not running #620
Also, have you tried what's suggested in our multi-node guide?
Thanks for the extensive bug report.
About the consumer group:
For proper multi-node support here, I believe we would need to
distribute the consumers such that only a single "primary" consumer
runs for any relevant shard - `pg:get_members` would still work,
including awaiting events; it just wouldn't route events to duplicate
"normal" consumers. Given the complexity of this topic, with all the
different distribution strategies, it might be best to let users opt
out of automatic consumer management and start their own consumers,
with documentation showing how to do this across multiple nodes.
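A minimal sketch of what that might look like from the user's side (the opt-out does not exist yet; the node name and consumer module are placeholders):

```elixir
# Minimal sketch, assuming a (not yet existing) way to disable Nostrum's
# automatic consumer management. The application then starts its own
# consumer, and only on the designated "primary" node.
defmodule MyBot.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children =
      if node() == :"bot@primary" do
        # MyBot.Consumer is a placeholder for a module using Nostrum.Consumer
        [MyBot.Consumer]
      else
        []
      end

    Supervisor.start_link(children, strategy: :one_for_one, name: MyBot.Supervisor)
  end
end
```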
About the ratelimiter:
I thought about this a while ago, and I believe the best way to solve it
would be to run a ratelimiter on each node and determine the correct
ratelimiter to use via `erlang:phash2` of the ratelimit bucket.
We already have the `get_endpoint/2` function in the ratelimiter, which
is used to figure out the correct ratelimit bucket for a request.
Instead of obtaining that (only) in the ratelimiter itself, the
top-level `request` function should obtain the bucket on its own,
determine which ratelimiters exist in the cluster, and route the
request accordingly.
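A minimal sketch of that routing scheme (group name and module are illustrative, not Nostrum's current API):

```elixir
defmodule RatelimiterRouting do
  # One ratelimiter runs per node, each joined to this :pg group.
  @group :nostrum_ratelimiters

  # Picks the ratelimiter responsible for a given ratelimit bucket.
  # Assumes at least one ratelimiter is registered in the group.
  def ratelimiter_for(bucket) do
    # Sort for a stable ordering; :pg makes no ordering guarantee, and
    # every node must map the same bucket to the same ratelimiter.
    members = Enum.sort(:pg.get_members(@group))
    # :erlang.phash2/2 returns an integer in 0..length(members)-1.
    Enum.at(members, :erlang.phash2(bucket, length(members)))
  end
end
```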
The alternative, of course, would be to allow users to supply their own
way of handling this. For the standard use case you describe, though, I
think the phash approach should be sufficient.
I will try to make a patch for the ratelimiter phash approach described
above, together with documentation amendments, this weekend. I will get
in touch with the other maintainers regarding the best approach for the
consumers.
jchristgit added a commit that referenced this issue on Aug 10, 2024:
As described in #620, the ratelimiter is currently not usable across multiple nodes. Lay the groundwork for fixing this by registering ratelimiters in a process group and selecting the matching ratelimiter based on the request route. Note that some further minor adjustments are needed to make the ratelimiter fully functional over multiple nodes, primarily around process registration; some documentation is also still missing and will be amended once manual consumer startup is prepared.
jchristgit added a second commit that referenced this issue on Aug 10, 2024, with the same commit message.
jchristgit added another commit that referenced this issue on Aug 16, 2024, again with the same commit message.
SCENARIO:
Nostrum is running in a multi-node environment, with `Nostrum.Application` and the consumer processes running on just one node and serving bot requests, as described in the documentation. We'd like the other nodes in the cluster to be able to use `Nostrum.Api` functions to perform web requests, run background jobs, etc. Currently, these calls fail because the Ratelimiter instance is not running on those nodes.
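For example, a call like the following fails when issued from one of the other nodes (`Nostrum.Api.create_message/2` stands in for any API function; the channel ID is a placeholder):

```elixir
# On a node that is not running Nostrum.Application, the registered
# Ratelimiter process does not exist, so the request cannot be made.
channel_id = 123_456_789_012_345_678
Nostrum.Api.create_message(channel_id, "hello from a worker node")
```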
I do have a workaround, but it's probably not the right long-term solution. I'm happy to submit a fix, but want to create this issue to figure out the correct approach.
SOLUTIONS TRIED:
Start just the Ratelimiter on the non-main nodes: this fails when a failover occurs – the Nostrum application fails to restart on the new host due to the Ratelimiter already running.
Run the Nostrum application on multiple nodes, but run the main consumer on only a single node: this almost works, but results in duplicate events, because every node's connection dispatches to the consumer, which is globally registered with `:pg` in `ConsumerGroup`.
CURRENT WORKAROUND:
I'm running the Nostrum application on multiple nodes as described above, but have forked the library and updated the `ConsumerGroup` to use `:pg.get_local_members` instead of `:pg.get_members`. Even though running the connection on multiple hosts is not the right approach in general, I do feel like `get_local_members` is more correct here - based on the current multi-node support, I'm not sure what the justification would be for dispatching to consumers on other nodes.
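The gist of the forked change, paraphrased (not the library's exact code):

```elixir
defmodule LocalDispatch do
  # With the gateway connection running on every node, a cluster-wide
  # lookup delivers each event once per node. Restricting dispatch to
  # members registered on the local node avoids the duplicates.
  def consumers(scope, group) do
    # before: :pg.get_members(scope, group) - every consumer in the cluster
    :pg.get_local_members(scope, group)
  end
end
```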
OTHER POSSIBLE SOLUTIONS:
Make it possible to run `Nostrum.Api.Ratelimiter` independent of `Nostrum.Application`: if `Nostrum.Application` allowed an option to not start the Ratelimiter, applications could start it themselves separately on every node (see the sketch after this list).
Register the Ratelimiter as a global process (some tradeoffs are discussed in the Ratelimiter docs)
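For the first option, usage might look roughly like this; the `:start_ratelimiter` flag and the bare child spec are hypothetical, not an existing Nostrum API:

```elixir
# Hypothetical sketch only.
# In config/runtime.exs, tell Nostrum not to start its own Ratelimiter:
#
#     config :nostrum, start_ratelimiter: false
#
# Then supervise one instance per node in the application's own tree:
children = [
  Nostrum.Api.Ratelimiter
]

Supervisor.start_link(children, strategy: :one_for_one)
```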
Any thoughts on the preferred approach, especially taking into consideration possible future distributed multi-node consumer support?