This repository has been archived by the owner on Jan 18, 2024. It is now read-only.

pgbackrest_restore.sh exits despite backup being enabled #631

Open
theelderbeever opened this issue Nov 27, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@theelderbeever

Cross-posting from a Slack message.

What happened?

We have backup enabled for pgBackRest in our self-hosted chart. The corresponding PGBACKREST_BACKUP_ENABLED environment variable is set to true, as seen when exec-ing into the pod. Upon running reinit or adding a replica to our HA cluster, we see a message in our logs indicating that the pgbackrest restore has exited with code 1.

pgbackrest_restore.sh should only exit this way when the environment variable is not true.
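
The behavior expected here can be sketched as a guard like the one below. This is a hypothetical illustration, not the contents of the chart's pgbackrest_restore.sh: the function name `backup_enabled_guard` and the log message are assumptions, and only the PGBACKREST_BACKUP_ENABLED variable name comes from the pod environment described above.

```shell
#!/bin/sh
# Hypothetical sketch of an "exit unless backups are enabled" guard.
# Not copied from pgbackrest_restore.sh; only the variable name
# PGBACKREST_BACKUP_ENABLED is taken from the pod environment.

backup_enabled_guard() {
    # Succeed only when the variable is exactly the string "true".
    if [ "${PGBACKREST_BACKUP_ENABLED:-}" = "true" ]; then
        return 0
    fi
    # Unset, empty, or any other value is treated as "not enabled".
    echo "pgBackRest backup is not enabled, skipping restore" >&2
    return 1
}
```

If the real script gates on the variable in a similar way, a guard like this would pass when the variable is true, so the exit code 1 would presumably come from a later step rather than from the check itself.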

ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1

This seemingly also has the effect that, if we are writing to the primary while the replica is initializing, the replica starts failing to find WAL files and never manages to switch over to streaming replication from the primary. The only workaround thus far is to stop all writes to the primary while the replica is being created.
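
For context on why missing WAL matters here, a rough back-of-the-envelope calculation follows. With use_slots: false, the primary is only guaranteed to keep about wal_keep_size of past WAL for standbys, and anything older has to come from the pgBackRest archive. The numbers are taken from the values.yaml in this issue; the shell variable names are made up for this sketch, it assumes the configured wal_segment_size is actually in effect, and it is not a diagnosis.

```shell
#!/bin/sh
# Rough arithmetic only; values copied from the values.yaml in this issue.
wal_keep_size_mb=1024             # wal_keep_size (a bare number is read as MB)
wal_segment_size_bytes=67108864   # wal_segment_size, 64MB

# Convert the segment size to MB, then count whole segments that fit.
wal_segment_size_mb=$((wal_segment_size_bytes / 1024 / 1024))
retained_segments=$((wal_keep_size_mb / wal_segment_size_mb))

echo "segment size: ${wal_segment_size_mb}MB"
echo "segments retained by wal_keep_size: ${retained_segments}"
```

Under these assumptions the primary is guaranteed to retain only about 1GB of past WAL, so a replica that takes long enough to initialize under write load would depend on archive-get from the repository for the rest.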

Did you expect to see something different?
The backup from pgbackrest should succeed.

How to reproduce it (as minimally and precisely as possible):

Environment

  • Which helm chart and what version are you using?

  • What is in your values.yaml ?

timescaledb-single:
  replicaCount: 3
  image:
    tag: pg15.4-ts2.12.2-all
  secrets:
    credentialsSecretName: "billing-platform-timescaledb-patroni"
    pgbackrestSecretName: "billing-platform-timescaledb-pgbackrest"

  podManagementPolicy: Parallel

  backup:
    enabled: true
    pgBackRest:
      compress-type: lz4
      process-max: 4
      start-fast: "y"
      repo1-retention-diff: 2
      repo1-retention-full: 2
      repo1-cipher-type: "none"
      repo1-type: s3
      repo1-s3-region: us-east-1
      repo1-s3-endpoint: s3.amazonaws.com

    pgBackRest:archive-push:
      process-max: 4
      archive-async: "y"

    pgBackRest:archive-get:
      process-max: 4
      archive-async: "y"
      archive-get-queue-max: 2GB

  patroni:
    log:
      level: WARNING
    # https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#bootstrap
    bootstrap:
      dcs:
        synchronous_mode: true
        synchronous_node_count: 1
        master_start_timeout: 0
        postgresql:
          use_slots: false # https://github.com/timescale/helm-charts/blob/timescaledb-single-0.33.1/charts/timescaledb-single/examples/high_throughput.example.yaml-values.yaml
          parameters:
            max_wal_size: 16384
            wal_keep_size: 1024
            wal_segment_size: 67108864 # 64MB
            checkpoint_timeout: 300s
            temp_file_limit: '1024GB'
            max_connections: 1000
            synchronous_commit: remote_apply

  # Values for defining the primary & replica Kubernetes Services.
  service:
    primary:
      type: LoadBalancer
      port: 5432

    replica:
      type: LoadBalancer
      port: 5432

  persistentVolumes:
    data:
      enabled: true
      size: 3Ti
      storageClass: gp3-iops16k
    wal:
      enabled: false
      size: 100Gi
      storageClass: gp3-iops16k
  resources:
    limits:
      cpu: 16000m
      memory: 128Gi
    requests:
      cpu: 16000m
      memory: 128Gi

  sharedMemory:
    useMount: true

  pgBouncer:
    enabled: true
    port: 6432

  prometheus:
    enabled: true
  • Kubernetes version information:
kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:38Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.17-eks-4f4795d", GitCommit:"af19e454a15b5eb16d9f29d4d2361b3050ac78a6", GitTreeState:"clean", BuildDate:"2023-10-20T23:22:36Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.24) exceeds the supported minor version skew of +/-1
  • Kubernetes cluster kind:

AWS EKS

Anything else we need to know?:

theelderbeever added the bug label on Nov 27, 2023