
Startup Probe kills "/bin/opm serve" process and prevents operatorhubio pod to start #3269

Open
fjammes opened this issue May 21, 2024 · 1 comment

Comments


fjammes commented May 21, 2024

Type of question

General context and help around the operator-sdk

Question

What did you do?

Installed operator-sdk v0.28.0

What did you expect to see?

Operator startup

What did you see instead? Under which circumstances?

Operatorhubio pod does not start:

runner@arc-runners-x2src-runner-mxhq2:~$ kubectl get pods -A | grep operatorhubio
olm                  operatorhubio-catalog-gqxnw                  0/1     CrashLoopBackOff   15 (4m6s ago)   55m
runner@arc-runners-x2src-runner-mxhq2:~$ kubectl describe pods -n olm operatorhubio-catalog-gqxnw | tail -n 5
  Normal   Pulled     52m                    kubelet            Successfully pulled image "quay.io/operatorhubio/catalog:latest" in 16.469534578s
  Normal   Created    52m (x2 over 54m)      kubelet            Created container registry-server
  Normal   Started    52m (x2 over 54m)      kubelet            Started container registry-server
  Warning  Unhealthy  5m47s (x150 over 54m)  kubelet            Startup probe failed: timeout: failed to connect service ":50051" within 1s
  Warning  BackOff    42s (x132 over 42m)    kubelet            Back-off restarting failed container
runner@arc-runners-x2src-runner-mxhq2:~$ kubectl logs  -n olm operatorhubio-catalog-gqxnw
time="2024-05-21T10:21:02Z" level=info msg="starting pprof endpoint" address="localhost:6060"
time="2024-05-21T10:21:02Z" level=info msg="found existing cache contents" backend=pogreb.v1 cache=/tmp/cache configs=/configs

The process seems to hang for 2-3 minutes at the step logged above.
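
How much time the probe allows before the kubelet restarts the container can be read straight from the pod spec. A quick way to check (a sketch using the pod name from above; it assumes registry-server is the first container in the pod):

# print the startup probe configuration of the catalog pod
kubectl get pod -n olm operatorhubio-catalog-gqxnw \
  -o jsonpath='{.spec.containers[0].startupProbe}'

The total startup budget is roughly failureThreshold × periodSeconds, which shows how far short of the 2-3 minute opm startup the probe falls.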

Environment

  • operator-lifecycle-manager version: v0.28.0

  • Kubernetes version information:

kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-09-01T23:30:43Z", GoVersion:"go1.19", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
  • Kubernetes cluster kind:

ARC and kind based:

kind version
kind v0.15.0 go1.19 linux/amd64

Additional context

The command /bin/opm serve /configs --cache-dir=/tmp/cache takes roughly 2-3 minutes to start in this container, which trips the startup probe. This occurs on only one of our infrastructures. Is there a way to increase the probe timeout, or to debug what the opm process is doing?
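
One way to see where the time goes, without the kubelet killing the process, is to run the same image by hand outside the cluster and watch when the gRPC endpoint comes up. A rough sketch, assuming podman (or docker) and grpcurl are available on the workstation:

# serve the catalog manually and time how long it takes before port 50051 answers
podman run --rm -p 50051:50051 --entrypoint /bin/opm \
  quay.io/operatorhubio/catalog:latest serve /configs --cache-dir=/tmp/cache

# in a second terminal, poll the registry gRPC API until it responds
grpcurl -plaintext localhost:50051 list
grpcurl -plaintext localhost:50051 api.Registry/ListPackages | head

If the cache rebuild is the slow part, the manual run should show the same pause after "found existing cache contents" as in the pod logs.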

@fjammes fjammes changed the title Startup Probe kill "/bin/opm serve" process and prevent operatorhubio pod to start Startup Probe kills "/bin/opm serve" process and prevents operatorhubio pod to start May 21, 2024
@jkranner

I am also facing this issue.
Pod: operatorhubio-catalog-ql6bs
Startup probe failed: timeout: failed to connect service ":50051" within 1s
Then keeps crash-looping.
