I've tried with 127.0.0.1, the pod id, hardcoding the basename, making it from the environment, using the release.name from elixir. Nothing has worked or has given me any other output other than "unable to connect". The K8_POD_IP is retrieved from the kubernetes environment status.podIP, it's valid. I've connected to the box and verified every single environment variable is correct. I've verified every single sourced environment works as expected.
Description of issue
All my pod manages to echo out is that it cannot connect, and that's it. It's effectively a dead pod which starts no other processes or services and never pings or reports another connection error or warning. I have zero idea what it's doing or why it's failing. This is a built OTP release from a regular Phoenix/Elixir application.
04:42:21.386 [warn] Description: 'Authenticity is not established by certificate path validation'
Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
04:42:28.520 [warn] [libcluster:myapp] unable to connect to :"[email protected]"
04:42:35.521 [warn] [libcluster:myapp] unable to connect to :"[email protected]"
I'm at a complete loss as to how to debug this since there is zero output when I do anything. The IPs differ between the logs and the endpoint list because of when I ran the commands; they are not incorrect, and that is not the issue. If I kill the pod and it restarts, it gives me the IPs within the endpoint list plus the IP of the pod which I just terminated.
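For anyone hitting the same silence, one way to get more signal is to attach a remote shell to the running release and try the connection by hand. This is only a sketch: the release name `myapp` and the peer address are placeholders, not values from my cluster, and the peer name should be copied from the libcluster warning.

```elixir
# Run inside a remote shell on the pod, e.g. `bin/myapp remote`
# (the release name "myapp" is an assumption).

# The name this node actually started with; compare it to what libcluster dials.
Node.self()

# false here means distribution never started on this node.
Node.alive?()

# The cookie must be identical on every node that should cluster.
Node.get_cookie()

# Hypothetical peer name; substitute one taken from the "unable to connect" log line.
Node.connect(:"myapp@10.0.0.1")

# Lists the peers that are currently connected.
Node.list()
```

If `Node.self()` shows a short name or a host part that differs from what libcluster is trying to reach, the mismatch is in how the release names the node rather than in libcluster itself.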
Edit:
I've done about 40-50 deployments trying to get some combination of these fields to work, and the only thing I've managed to get other than no logs is when I use the vm.args.eex with the following.
#vm.args.eex
-name ${BASENAME_GROUP}@${K8_POD_ID}
This gives me a nice constant stream of logs of the following, endlessly flowing.
05:59:59.917 [warn] [libcluster:myapp] unable to connect to :"[email protected]"
05:59:59.917 [error] ** System NOT running to use fully qualified hostnames **
** Hostname 10.7.1.13 is illegal **
So I went on to find this post, and I switched everything to use this config instead.
Why is there such a horrific lack of documentation on this process, or even a basic example that actually works? Why are the configurations mixed between code blocks and paragraphs? I'm having to find my solution in GitHub issues rather than in the docs.
Surprise surprise, the docs are wrong: you end up with a KeyError for namespace, which is included in the comment but not in the docs. More specifically: 'Elixir.KeyError',key => namespace
I'm not sure if this is just an overlooked element, but I've never had a more frustrating experience trying to get something to work than having to deal with this package. It seems to be such a common issue, judging by the sheer volume of solutions I've found within GitHub issues across a variety of projects. So I hope that by writing this out it'll get picked up in a search and save someone the hours of increased blood pressure I've been through.
Unsurprisingly, after adding the namespace I get back to my original issue, where it logs twice saying it cannot connect, then is completely dead and stops logging. I'm going to stop trying to debug this for now. Hopefully someone, anyone, knows what I'm doing wrong.
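For anyone landing here from a search, a topology with the namespace set explicitly might look roughly like the sketch below. The option names are the ones I understand `Cluster.Strategy.Kubernetes` to accept; the topology name, basename, selector, namespace and polling interval are placeholder values, so check them against the strategy docs for your libcluster version.

```elixir
# config/runtime.exs (a sketch; every value below is a placeholder)
import Config

config :libcluster,
  topologies: [
    myapp: [
      strategy: Elixir.Cluster.Strategy.Kubernetes,
      config: [
        # Build node names as basename@<pod ip>.
        mode: :ip,
        # Look the IPs up from pods rather than endpoints.
        kubernetes_ip_lookup_mode: :pods,
        # Must match the part before the @ in the node name.
        kubernetes_node_basename: "myapp",
        # Label selector that picks out the pods to cluster.
        kubernetes_selector: "app=myapp",
        # Leaving this out is what appeared to trigger the KeyError for me.
        kubernetes_namespace: "my-namespace",
        polling_interval: 10_000
      ]
    ]
  ]
```

With `mode: :ip`, each node has to be started with a long name whose host part is the pod IP, otherwise the names libcluster builds will not match the running nodes.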
06:49:14.669 [warn] [libcluster:myapp] unable to connect to :"[email protected]"
06:49:14.704 [warn] [libcluster:myapp] unable to connect to :"[email protected]"
Edit2: After more time had passed, I ended up deleting the deployments and resetting every single component. Magically, all of a sudden, the issue has gone away.
Sieabah changed the title from "Unable to get libcluster to connect over Kubernetes" to "Unable to get libcluster to connect over Kubernetes :pods :ip" on Aug 13, 2021
Steps to reproduce
libcluster ~3.3
It uses a shared service specifically to group the pods and make them searchable
Strategy Used:
Elixir.Cluster.Strategy.Kubernetes
Errors/Incorrect Behaviour Encountered: "unable to connect" with no logging; literally nothing else is done.
The endpoints requested properly show up when I run
k get endpoints