Cannot use Nostrum.Api functions in a multi-host configuration due to the request Ratelimiter not running #620
Also, have you tried what's suggested in our multi-node guide?
Thanks for the extensive bug report.
About the consumer group:
For proper multi-node support here, I believe we would need to
distribute the consumers such that only a single "primary" consumer
runs for any relevant shard - `pg:get_members` would still work,
including awaiting events; it just wouldn't route events to duplicate
"normal" consumers. Given the complexity of this topic, with all the
different distribution strategies, it might be best to let users opt
out of automatic consumer management and start their own consumers,
with documentation showing how to do this across multiple nodes.
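A minimal sketch of what that might look like from the user's side (the opt-out does not exist yet; the node name and consumer module are placeholders):

```elixir
# Minimal sketch, assuming a (not yet existing) way to disable Nostrum's
# automatic consumer management. The application then starts its own
# consumer, and only on the designated "primary" node.
defmodule MyBot.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children =
      if node() == :"bot@primary" do
        # MyBot.Consumer is a placeholder for a module using Nostrum.Consumer
        [MyBot.Consumer]
      else
        []
      end

    Supervisor.start_link(children, strategy: :one_for_one, name: MyBot.Supervisor)
  end
end
```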
About the ratelimiter:
I thought about this a while ago, and I believe the best way to solve it
would be to run a ratelimiter on each node and determine the correct
ratelimiter to use via `erlang:phash2` of the ratelimit bucket.
We already have the `get_endpoint/2` function in the ratelimiter, which
is used to figure out the correct ratelimit bucket for a request.
Instead of obtaining that (only) in the ratelimiter itself, the
top-level `request` function should obtain the bucket on its own,
determine which ratelimiters exist in the cluster, and route the
request accordingly.
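A minimal sketch of that routing scheme (group name and module are illustrative, not Nostrum's current API):

```elixir
defmodule RatelimiterRouting do
  # One ratelimiter runs per node, each joined to this :pg group.
  @group :nostrum_ratelimiters

  # Picks the ratelimiter responsible for a given ratelimit bucket.
  # Assumes at least one ratelimiter is registered in the group.
  def ratelimiter_for(bucket) do
    # Sort for a stable ordering; :pg makes no ordering guarantee, and
    # every node must map the same bucket to the same ratelimiter.
    members = Enum.sort(:pg.get_members(@group))
    # :erlang.phash2/2 returns an integer in 0..length(members)-1.
    Enum.at(members, :erlang.phash2(bucket, length(members)))
  end
end
```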
The alternative, of course, would be to allow users to supply their own
way of handling this. For the standard use case you describe, though, I
think the phash approach should be sufficient.
I will try to make a patch for the ratelimiter phash approach described
above, together with documentation amendments, this weekend. I will get
in touch with the other maintainers regarding the best approach for the
consumers.
jchristgit added a commit that referenced this issue on Aug 10, 2024:
As described in #620, the ratelimiter is currently not usable across multiple nodes. Lay the groundwork for fixing this by registering ratelimiters in a process group and selecting the matching ratelimiter based on the request route. Note that some further minor adjustments are needed to make the ratelimiter fully functional over multiple nodes, primarily around process registration; some documentation is also still missing and will be amended once manual consumer startup is prepared.
jchristgit added a second commit that referenced this issue on Aug 10, 2024, with the same commit message.
jchristgit added another commit that referenced this issue on Aug 16, 2024, again with the same commit message.
SCENARIO:
Nostrum is running in a multi-node environment, with `Nostrum.Application` and the consumer processes running on just one node and serving bot requests, as described in the documentation. We'd like the other nodes in the cluster to be able to use `Nostrum.Api` functions to perform web requests, run background jobs, etc. Currently, these calls fail because the Ratelimiter instance is not running on those nodes.
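For example, a call like the following fails when issued from one of the other nodes (`Nostrum.Api.create_message/2` stands in for any API function; the channel ID is a placeholder):

```elixir
# On a node that is not running Nostrum.Application, the registered
# Ratelimiter process does not exist, so the request cannot be made.
channel_id = 123_456_789_012_345_678
Nostrum.Api.create_message(channel_id, "hello from a worker node")
```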
I do have a workaround, but it's probably not the right long-term solution. I'm happy to submit a fix, but want to create this issue to figure out the correct approach.
SOLUTIONS TRIED:
Start just the Ratelimiter on the non-main nodes: this fails when a failover occurs – the Nostrum application fails to restart on the new host due to the Ratelimiter already running.
Run the Nostrum application on multiple nodes, but run the main consumer on only a single node: this almost works, but results in duplicate events, because every node's connection dispatches to the consumer, which is globally registered with `:pg` in `ConsumerGroup`.
CURRENT WORKAROUND:
I'm running the Nostrum application on multiple nodes as described above, but have forked the library and updated the `ConsumerGroup` to use `:pg.get_local_members` instead of `:pg.get_members`. Even though running the connection on multiple hosts is not the right approach in general, I do feel like `get_local_members` is more correct here - based on the current multi-node support, I'm not sure what the justification would be for dispatching to consumers on other nodes.
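The gist of the forked change, paraphrased (not the library's exact code):

```elixir
defmodule LocalDispatch do
  # With the gateway connection running on every node, a cluster-wide
  # lookup delivers each event once per node. Restricting dispatch to
  # members registered on the local node avoids the duplicates.
  def consumers(scope, group) do
    # before: :pg.get_members(scope, group) - every consumer in the cluster
    :pg.get_local_members(scope, group)
  end
end
```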
OTHER POSSIBLE SOLUTIONS:
Make it possible to run `Nostrum.Api.Ratelimiter` independent of `Nostrum.Application`: if `Nostrum.Application` allowed an option to not start the Ratelimiter, applications could start it themselves separately on every node (see the sketch after this list).
Register the Ratelimiter as a global process (some tradeoffs are discussed in the Ratelimiter docs)
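For the first option, usage might look roughly like this; the `:start_ratelimiter` flag and the bare child spec are hypothetical, not an existing Nostrum API:

```elixir
# Hypothetical sketch only.
# In config/runtime.exs, tell Nostrum not to start its own Ratelimiter:
#
#     config :nostrum, start_ratelimiter: false
#
# Then supervise one instance per node in the application's own tree:
children = [
  Nostrum.Api.Ratelimiter
]

Supervisor.start_link(children, strategy: :one_for_one)
```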
Any thoughts on the preferred approach, especially taking into consideration possible future distributed multi-node consumer support?