Reverse proxy startup health check behavior results in 503 errors #6410
(As clarified in Slack, this is about active health checks.) My vote is currently for no. 1. Option 2 is a no from me because blocking provisioning can make config reloads slow, and we strive to keep them fast and lightweight. Option 3 is a no from me because if the proxy is started before the backends, we can't assume the backends are healthy right away. IMO, active health checks should assume a backend is unhealthy unless proven otherwise by a passing health check (compared to passive health checks, which assume healthy until proven otherwise). Option 1 is nice because it allows the server/config to start quickly, and requests don't have to fail (even if they are briefly delayed). We also don't serve from bad status information. I imagine health checks, especially passing ones, complete very quickly, so the blocking will be nearly instantaneous, probably under a quarter of a second.
Note that I'm okay with 1 as long as the current behavior remains where a health check is fired immediately and the block is near-instantaneous. I believe other load balancers, like Nginx (paid), assume that all listed upstreams are healthy after a reload/restart and don't take them out of the mix until the health checks fail.
Basically correct. During investigation I found that nginx Plus and Traefik set the initial state of a backend, when no health checks have been made to it, as healthy. However, they do preserve history across restarts to the same hosts (as does Caddy, I believe).
I didn't think about what other servers do when we implemented health checks, but this is surprising to me... it feels wrong for active health checks to assume a backend is healthy without checking first. Marking them as healthy when you don't actually know seems... misleading?
Caddy preserves the health status across config reloads, but if the process quits, then the memory is cleared. We don't persist it to storage yet.
I think the argument can be made that marking them unhealthy is equally misleading. The remote is in superposition: since it has not been observed, it's in a third distinct state that is currently handled as the unhealthy case. It seems existing implementations tip the scale slightly in favor of the healthy interpretation; my guess is that's in order to have a faster time to first response.
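The "third distinct state" idea could be modeled explicitly, so the unobserved case is no longer forced into the unhealthy bucket. A minimal sketch (the `HealthStatus` type, `eligible` function, and `optimisticUnknown` flag are hypothetical names for this example, not Caddy's API):

```go
package main

import "fmt"

// HealthStatus distinguishes "never checked" from a failed check, so
// the proxy can treat an unobserved backend differently from one that
// is known to be down.
type HealthStatus int

const (
	StatusUnknown   HealthStatus = iota // no active check has run yet
	StatusHealthy                       // last active check passed
	StatusUnhealthy                     // last active check failed
)

// eligible decides whether a backend may receive traffic. The
// optimisticUnknown flag mirrors the nginx Plus / Traefik behavior of
// treating never-checked backends as healthy; false mirrors the
// current behavior described in this issue.
func eligible(s HealthStatus, optimisticUnknown bool) bool {
	switch s {
	case StatusHealthy:
		return true
	case StatusUnknown:
		return optimisticUnknown
	default:
		return false
	}
}

func main() {
	fmt.Println(eligible(StatusUnknown, true))  // true: optimistic, like nginx Plus
	fmt.Println(eligible(StatusUnknown, false)) // false: current behavior, yields 503s
}
```

Separating the states also keeps options open: the unknown state can be routed optimistically, pessimistically, or blocked on briefly, without conflating it with a backend that has actually failed a check.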
Currently, a remote is marked unhealthy if no active health checks against it have completed yet.
This causes the reverse proxy to return 503 before the first health check completes, even if the remote is truly healthy, in the window between the config load completing and the first health check.
There are a few possible solutions to the problem, but we have not decided which is correct.
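For context, active health checks are configured per `reverse_proxy` in the Caddyfile roughly like below; the directive names come from Caddy 2's `reverse_proxy` docs, but the values and the backend address are illustrative only. The 503 window discussed here falls between config load and the first `health_interval` probe.

```
example.com {
	reverse_proxy backend:8080 {
		health_uri /healthz
		health_interval 10s
		health_timeout 5s
	}
}
```

Whichever solution is chosen would change how upstreams configured this way are treated before that first probe returns.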