Improve the tester handling when a cluster is upgraded #2130

Conversation

johscheuer
Member

Description

A cluster that contains tester processes can block an upgrade. This PR changes the pending upgrade check to ignore tester processes, because those processes report to the cluster but cannot be restarted with fdbcli.
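
The idea can be sketched in a few lines of Go. This is only an illustration and not the operator's actual code: the `processInfo` type, the `pendingUpgrade` function, and the use of `"test"` as the tester process class value are assumptions made for this example. The versions are the ones used in the e2e upgrade test.

```go
package main

import "fmt"

// processInfo is a hypothetical, simplified view of a process as it is
// reported in the cluster status. It is not the operator's real type.
type processInfo struct {
	Address      string
	ProcessClass string
	Version      string
}

// testerClass is assumed here to be the process class reported by tester processes.
const testerClass = "test"

// pendingUpgrade returns the addresses of all processes that still run a
// version other than desiredVersion. Tester processes are skipped, because
// they report to the cluster but cannot be restarted with fdbcli.
func pendingUpgrade(processes []processInfo, desiredVersion string) []string {
	pending := make([]string, 0, len(processes))
	for _, process := range processes {
		if process.ProcessClass == testerClass {
			continue
		}
		if process.Version != desiredVersion {
			pending = append(pending, process.Address)
		}
	}
	return pending
}

func main() {
	processes := []processInfo{
		{Address: "10.0.0.1:4501", ProcessClass: "storage", Version: "7.1.63"},
		{Address: "10.0.0.2:4501", ProcessClass: "log", Version: "7.3.43"},
		{Address: "10.0.0.3:4501", ProcessClass: "test", Version: "7.1.63"},
	}

	// Only the storage process is reported, the tester process is ignored.
	fmt.Println(pendingUpgrade(processes, "7.3.43")) // prints [10.0.0.1:4501]
}
```

Without the tester check, the process at 10.0.0.3:4501 would stay in the pending list indefinitely and the upgrade would never be considered complete.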

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Discussion

I added a new e2e test for this setup.

Testing

Ran the test manually.

Documentation

Follow-up

@johscheuer johscheuer added the bug Something isn't working label Sep 11, 2024
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 4cf542d
  • Duration 2:59:27
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 8659e33
  • Duration 2:58:22
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 973e6ae
  • Duration 2:56:55
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this Sep 12, 2024
@johscheuer johscheuer reopened this Sep 12, 2024
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 973e6ae
  • Duration 3:02:46
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this Sep 12, 2024
@johscheuer johscheuer reopened this Sep 12, 2024
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 973e6ae
  • Duration 3:17:57
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this Sep 12, 2024
@johscheuer johscheuer reopened this Sep 12, 2024
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 973e6ae
  • Duration 3:02:09
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this Sep 13, 2024
@johscheuer johscheuer reopened this Sep 13, 2024
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 973e6ae
  • Duration 2:43:53
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer
Member Author

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 973e6ae
  • Duration 2:43:53
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
Summarizing 1 Failure:
  [FAIL] Operator HA Upgrades when no remote storage processes are restarted [It] Upgrade from 7.1.63 to 7.3.43 [e2e, pr]
  /codebuild/output/src714216153/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/ha_fdb_cluster.go:314

Ran 8 of 10 Specs in 5763.040 seconds
FAIL! -- 7 Passed | 1 Failed | 2 Pending | 0 Skipped
--- FAIL: TestOperatorHaUpgrade (5851.08s)
FAIL
FAIL	github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator_ha_upgrades	5851.102s
FAIL

That's another test that failed. I'll spend some time next week looking into the test stability.

Contributor

@nicmorales9 nicmorales9 left a comment

usual nits, lgtm!

controllers/bounce_processes_test.go (resolved)
controllers/bounce_processes_test.go (resolved)
@johscheuer johscheuer merged commit c8e5665 into FoundationDB:main Sep 17, 2024
35 of 36 checks passed
@johscheuer johscheuer deleted the improve-tester-process-handling-upgrades branch September 17, 2024 07:57