haproxy ingress 1.11.2 reloads when number of backend pods scales #646

Open
nosmicek opened this issue Apr 18, 2024 · 10 comments
nosmicek commented Apr 18, 2024

We are experiencing an issue with the HAProxy ingress controller v1.11.2: it reloads every time our Kubernetes service is scaled up or down. The controller logs the following on every scaling event:

2024/04/18 06:43:59 INFO    instance/configuration.go:21 [transactionID=f698d432-6bcc-41d6-8bc3-5a00d487c5ec] reload required : creation/modification of prometheus endpoint
2024/04/18 07:01:14 INFO    instance/configuration.go:21 [transactionID=4198c61d-ba12-4ecf-bc6a-507567f0be04] reload required : creation/modification of prometheus endpoint
2024/04/18 07:17:15 INFO    instance/configuration.go:21 [transactionID=a091cf59-b6ff-4bc2-985f-fd14b9acaa6e] reload required : creation/modification of prometheus endpoint
2024/04/18 07:51:05 INFO    instance/configuration.go:21 [transactionID=6e918e42-4159-4629-8742-5cf036a6b9f6] reload required : creation/modification of prometheus endpoint
2024/04/18 08:24:06 INFO    instance/configuration.go:21 [transactionID=89ffca87-b7bb-437c-874d-74e90ba5ab5b] reload required : creation/modification of prometheus endpoint
2024/04/18 09:23:47 INFO    instance/configuration.go:21 [transactionID=1ab46c13-7393-48f5-a698-681d97fde75e] reload required : creation/modification of prometheus endpoint
2024/04/18 09:41:17 INFO    instance/configuration.go:21 [transactionID=687e9767-19ad-4e3d-a062-87bdaef9d003] reload required : creation/modification of prometheus endpoint
2024/04/18 09:41:22 INFO    instance/configuration.go:21 [transactionID=93f5b770-30c2-4feb-b34f-348ecde90e32] reload required : creation/modification of prometheus endpoint
2024/04/18 09:41:32 INFO    instance/configuration.go:21 [transactionID=88a874d6-7673-4ab3-821e-f2c87d20114e] reload required : creation/modification of prometheus endpoint
2024/04/18 09:41:57 INFO    instance/configuration.go:21 [transactionID=8f4dbcb5-8796-4cf2-b405-632caffdd989] reload required : creation/modification of prometheus endpoint
2024/04/18 09:51:53 INFO    instance/configuration.go:21 [transactionID=9ff3b889-e321-42ad-88b2-649917e8598a] reload required : creation/modification of prometheus endpoint
2024/04/18 10:01:48 INFO    instance/configuration.go:21 [transactionID=51ed4529-04e4-46b8-aad1-5aa309c2ec61] reload required : creation/modification of prometheus endpoint
2024/04/18 10:15:58 INFO    instance/configuration.go:21 [transactionID=147e6e1a-a36b-4aaa-890f-888b383a1157] reload required : creation/modification of prometheus endpoint
2024/04/18 10:16:03 INFO    instance/configuration.go:21 [transactionID=347d53fb-5dd0-4d07-8c49-36a53398a365] reload required : creation/modification of prometheus endpoint
2024/04/18 10:16:18 INFO    instance/configuration.go:21 [transactionID=aba1b42c-4e32-4bda-bfb8-6e138705d5b6] reload required : creation/modification of prometheus endpoint
2024/04/18 10:16:33 INFO    instance/configuration.go:21 [transactionID=2f25e08b-5d6f-4cf1-9404-098bcd8c7992] reload required : creation/modification of prometheus endpoint
2024/04/18 10:16:38 INFO    instance/configuration.go:21 [transactionID=b9163701-7225-4a71-8066-42c9af94bbee] reload required : creation/modification of prometheus endpoint
2024/04/18 10:17:08 INFO    instance/configuration.go:21 [transactionID=e300e03b-d496-4c94-ae60-5f1ec85ab3e7] reload required : creation/modification of prometheus endpoint
2024/04/18 10:27:03 INFO    instance/configuration.go:21 [transactionID=3d18e641-35bf-4335-8070-c70b02613111] reload required : creation/modification of prometheus endpoint
2024/04/18 10:27:28 INFO    instance/configuration.go:21 [transactionID=affbb16c-d752-44c7-bbde-9bfa58e95cde] reload required : creation/modification of prometheus endpoint
2024/04/18 10:37:23 INFO    instance/configuration.go:21 [transactionID=00a3d9d4-97e3-4093-b4c6-cf6256c26870] reload required : creation/modification of prometheus endpoint
2024/04/18 10:58:39 INFO    instance/configuration.go:21 [transactionID=698836dd-5bb9-4cee-9d7c-37bcfda29b4d] reload required : creation/modification of prometheus endpoint

The number of backend server slots does not change; only one server is enabled or disabled, yet it always triggers a reload.

A similar problem was solved in #638 and #634, so maybe this case was missed.

We don't run the controller with the --prometheus CLI flag; we just scrape metrics via a ServiceMonitor bound to the stats port, i.e. requests go through the frontend that has http-request use-service prometheus-exporter if { path /metrics } (see the sketch below).
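
For reference, a minimal sketch of the kind of ServiceMonitor used for this: the name, namespace, labels, and the stats port name are assumptions based on a typical haproxytech chart install and may differ in your setup.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: haproxy-ingress-metrics            # hypothetical name
  namespace: haproxy-controller            # assumes the default controller namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kubernetes-ingress   # assumes the chart's default Service labels
  endpoints:
    - port: stat          # the controller's stats port; the port name depends on your Service
      path: /metrics
      interval: 30s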

nosmicek (Author) commented:

Solved by downgrading the helm chart from 1.39.1 to 1.37.0 (HAProxy ingress controller 1.11.2 -> 1.10.11) with exactly the same configuration.
After the downgrade we no longer see any sign of this erroneous behaviour, so this is clearly a new problem introduced in version 1.11 of the controller. With it, the controller is practically unusable in dynamic environments where pods scale up/down often.

ivanmatmati self-assigned this Apr 22, 2024
ivanmatmati (Collaborator) commented:

Hi @nosmicek, thanks for reporting. I'll check what happened.

nosmicek (Author) commented Apr 23, 2024

Hi, thanks. If you need any info about our setup or configuration, don't hesitate to ask; I have some limited time dedicated to solving this.

Also, another issue I noticed while working on this: when I configured hard-stop-after and close-spread-time to 300000 (5 minutes), the HAProxy ingress controller on 1.11.2 ended up reloading periodically every 10 minutes regardless of our service scaling (which was constant for the whole test period), again because of this prometheus endpoint. We tried this as a workaround for the reloads, but it ended up being a much worse scenario. (A sketch of the configuration we used is below.)
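
Roughly how those timers were set, as a sketch under assumptions: hard-stop-after is a controller ConfigMap key, while close-spread-time (an HAProxy 2.6+ global directive) is assumed here to be injected through the global-config-snippet ConfigMap key; the ConfigMap name, namespace, and exact keys may differ from your installation.

apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-kubernetes-ingress        # name/namespace depend on how the chart was installed
  namespace: haproxy-controller
data:
  # 300000 ms = 5 minutes
  hard-stop-after: "5m"
  # raw lines appended to HAProxy's global section; close-spread-time is an HAProxy 2.6+ keyword
  global-config-snippet: |
    close-spread-time 5m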

ivanmatmati (Collaborator) commented:

Hi @nosmicek, can you try with the nightly build? There's a change that could solve the issue. To use it, just replace the image tag in your yaml: switch from latest to nightly (sketched below).
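
To make that concrete, a sketch of the tag change in the controller Deployment; the container name and surrounding structure assume the default haproxytech manifests and may differ in yours.

# excerpt of the controller Deployment
spec:
  template:
    spec:
      containers:
        - name: kubernetes-ingress-controller            # container name may differ in your chart
          image: haproxytech/kubernetes-ingress:nightly  # previously ...:latest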

nosmicek (Author) commented:

Hi @ivanmatmati, with the latest, the issue is still present. I tried it today.

ivanmatmati (Collaborator) commented:

Latest or nightly?

nosmicek (Author) commented May 21, 2024

I have now tried with nightly and the problem is even worse: I get lots of reloads due to prometheus even when our backend service pods are not scaling. It's really hard to tell whether a given reload is caused by scaling or not, but it does seem to happen when scaling.

ivanmatmati (Collaborator) commented:

I don't see these reloads due to prometheus. Can you check which commit you're on? On my side it's c3cd22c.

TubbyStubby commented:

The same happened to me on chart version 1.39.4. It got resolved after downgrading to 1.37.0 as suggested by @nosmicek.
This was happening whenever pods were being added or removed by the autoscaler; I got the logs below. I also noticed that the status was showing nocheck in the stats page for all backends, which was fixed by the downgrade without any config change.

[NOTICE]   (153) : haproxy version is 2.8.9-1842fd0
[WARNING]  (153) : Former worker (8354) exited with code 0 (Exit)
[WARNING]  (8375) : Server ada_switch-service_wshttp/SRV_1 is UP/READY (leaving forced maintenance).
[WARNING]  (8392) : Server ada_switch-service_wshttp/SRV_1 is UP/READY (leaving forced maintenance).
[WARNING]  (153) : Former worker (8350) exited with code 0 (Exit)
[WARNING]  (8392) : Server ada_switch-service_wshttp/SRV_2 is going DOWN for maintenance. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8375) : Server ada_switch-service_wshttp/SRV_2 is going DOWN for maintenance. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024/05/30 07:08:03 INFO    service/service.go:157 [transactionID=5e49ea04-738f-4723-9ead-08c52417cf12] reload required : Service 'ada/switch-service': backend 'ada_switch-service_wshttp' updated: map[HTTPConnectionMode:[ http-server-close]]
2024/05/30 07:08:04 INFO    controller/controller.go:196 [transactionID=5e49ea04-738f-4723-9ead-08c52417cf12] HAProxy reloaded
[NOTICE]   (153) : Reloading HAProxy
[WARNING]  (8392) : Proxy healthz stopped (cumulated conns: FE: 86, BE: 0).
[WARNING]  (8392) : Proxy http stopped (cumulated conns: FE: 5, BE: 0).
[WARNING]  (8392) : Proxy https stopped (cumulated conns: FE: 233, BE: 0).
[WARNING]  (8392) : Proxy stats stopped (cumulated conns: FE: 3, BE: 0).
[WARNING]  (8392) : Proxy ada_switch-service_wshttp stopped (cumulated conns: FE: 0, BE: 5).
[WARNING]  (8392) : Proxy haproxy-controller_default-local-service_http stopped (cumulated conns: FE: 0, BE: 0).
[WARNING]  (8392) : Proxy haproxy-controller_prometheus_http stopped (cumulated conns: FE: 0, BE: 0).
[NOTICE]   (153) : New worker (8413) forked
[NOTICE]   (153) : Loading success.
[NOTICE]   (153) : haproxy version is 2.8.9-1842fd0
[WARNING]  (153) : Former worker (8392) exited with code 0 (Exit)
2024/05/30 07:08:06 INFO    service/service.go:157 [transactionID=3e1b981d-33a0-4672-89a9-aaae0888008b] reload required : Service 'ada/switch-service': backend 'ada_switch-service_wshttp' updated: map[HTTPConnectionMode:[ http-server-close]]
2024/05/30 07:08:06 INFO    controller/controller.go:196 [transactionID=3e1b981d-33a0-4672-89a9-aaae0888008b] HAProxy reloaded
[NOTICE]   (153) : Reloading HAProxy
[WARNING]  (8375) : Proxy healthz stopped (cumulated conns: FE: 38, BE: 0).
[WARNING]  (8375) : Proxy http stopped (cumulated conns: FE: 1, BE: 0).
[WARNING]  (8375) : Proxy https stopped (cumulated conns: FE: 101, BE: 0).
[WARNING]  (8375) : Proxy stats stopped (cumulated conns: FE: 2, BE: 0).
[WARNING]  (8375) : Proxy ada_switch-service_wshttp stopped (cumulated conns: FE: 0, BE: 1).
[WARNING]  (8375) : Proxy haproxy-controller_default-local-service_http stopped (cumulated conns: FE: 0, BE: 0).
[WARNING]  (8375) : Proxy haproxy-controller_prometheus_http stopped (cumulated conns: FE: 0, BE: 0).
[NOTICE]   (153) : New worker (8396) forked
[NOTICE]   (153) : Loading success.

stale bot commented Jun 29, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jun 29, 2024