DNS Poll - max stable Cluster Size = max DNS Entry Response Count #190

Open
bmalum opened this issue Mar 22, 2023 · 1 comment

Comments

bmalum commented Mar 22, 2023

Steps to reproduce

  • Configuration Used
    config :libcluster,
      debug: true,
      topologies: [
        dns: [
          strategy: Cluster.Strategy.DNSPoll,
          config: [
            poll_interval: 10_000,
            query: "appname.something",
            node_basename: "some-container"
          ]
        ]
      ]
  • Strategy Used
    Cluster.Strategy.DNSPoll
  • Errors/Incorrect Behaviour Encountered
    The maximum stable cluster size is capped at the number of records returned in a single DNS response.

Description of issue

  • What are the expected results?
    With a DNS query, I would not expect nodes to be removed just because they are missing from the DNS response. I would expect libcluster to rely on the normal disconnect that happens when a node times out via net_ticktime, rather than actively removing nodes. For example, if you have 15 nodes and each DNS response contains 5 random node IPs, the cluster becomes unstable.

  • Is the documentation incorrect?
    The documentation does not mention that nodes are removed once they no longer appear in DNS. It only says:

this strategy will periodically poll DNS and connect all nodes it finds.

Should we introduce a config flag to turn off removing nodes?
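
To illustrate the 15-node example above, here is a rough sketch of a poll-and-prune step. It is a simplified model and not libcluster's actual code: the module `PruneSketch`, its functions, and the node names are all made up for illustration, and the DNS response is modeled as a fixed subset of 5 nodes.

```elixir
# Simplified model of one poll cycle. Hypothetical names throughout;
# this is not how Cluster.Strategy.DNSPoll is implemented internally.
defmodule PruneSketch do
  # Returns {to_connect, to_disconnect} for a single poll.
  def diff(known, resolved, prune?) do
    to_connect = MapSet.difference(resolved, known)

    to_disconnect =
      if prune?, do: MapSet.difference(known, resolved), else: MapSet.new()

    {to_connect, to_disconnect}
  end

  # Returns the set of nodes still considered connected after the poll.
  def poll(known, resolved, prune? \\ true) do
    {to_connect, to_disconnect} = diff(known, resolved, prune?)

    known
    |> MapSet.union(to_connect)
    |> MapSet.difference(to_disconnect)
  end
end

# 15 nodes are currently connected (made-up addresses).
all_nodes = MapSet.new(1..15, fn i -> :"some-container@10.0.0.#{i}" end)

# DNS only returns 5 of them in this response.
resolved = all_nodes |> Enum.take(5) |> MapSet.new()

IO.inspect(MapSet.size(PruneSketch.poll(all_nodes, resolved, true)), label: "with pruning")
IO.inspect(MapSet.size(PruneSketch.poll(all_nodes, resolved, false)), label: "without pruning")
# with pruning: 5
# without pruning: 15
```

With pruning, every poll shrinks the connected set to whatever the latest response contained, so a different random 5 on the next poll would trigger another round of disconnects and reconnects.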

@bitwalker
Owner

I'd be open to accepting a PR that makes removing nodes in this strategy optional based on a flag, something like prune: false to disable pruning the node list. I believe there was a reason we actively prune nodes when the source of data for the strategy (e.g. DNS in this case, but it could be any system providing service discovery) no longer reports a node as being part of the cluster. I can't recall the specifics at the moment, but it was a deliberate choice. libcluster largely defers to the source registry to tell us which nodes belong in the cluster. In the case of DNS, it is unusual for a node to disappear from DNS unless it is being permanently removed, but I can imagine scenarios where this might happen, such as under k8s or some other orchestrator that uses DNS for service discovery.
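
For anyone picking this up, the flag could look roughly like the following in the reporter's config. This is only a sketch of the proposal: `prune` is a hypothetical option named after the suggestion above, and an eventual PR may choose a different name, default, or shape.

```elixir
import Config

config :libcluster,
  debug: true,
  topologies: [
    dns: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        poll_interval: 10_000,
        query: "appname.something",
        node_basename: "some-container",
        # Hypothetical option from this discussion: when false, nodes missing
        # from a DNS response would be left connected instead of being removed.
        prune: false
      ]
    ]
  ]
```

Defaulting such a flag to true would keep the current pruning behaviour for existing users.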
