Can't clone from WAL (backed by the Azure storage container) #2851

Open
l-maciej opened this issue Jan 21, 2025 · 0 comments

l-maciej commented Jan 21, 2025

  • Which image of the operator are you using? e.g. ghcr.io/zalando/postgres-operator:v1.14.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? - AKS
  • Are you running Postgres Operator in production? - no
  • Type of issue? - question

I have an issue setting up the WAL restore process using data stored in an Azure Blob Storage container.

The operator was configured using the following values.yaml:

configKubernetes:
  enable_cross_namespace_secret: true
configLogicalBackup:
  logical_backup_docker_image: "ghcr.io/zalando/postgres-operator/logical-backup:v1.14.0" # PG17 support
  logical_backup_provider: "az"
  logical_backup_azure_storage_account_name: "XXXXXXXXXXXXXx"
  logical_backup_azure_storage_container: "XXXXXXXXXXXXXXXXxxxxx"
  logical_backup_azure_storage_account_key: "XXXXXXXXXXXXXXXXXXXXXXXXX"
  logical_backup_job_prefix: "logical-backup-"
  logical_backup_schedule: "37 21 * * *"
  # Fake catalog structure 
  logical_backup_s3_bucket: "XXXXXXXXXXXXX"
  logical_backup_s3_bucket_prefix: "XXXXXXXXXXXX"
  logical_backup_s3_bucket_scope_suffix: "XXXXXXXXXXXXXXXXX"
configAwsOrGcp:
  wal_az_storage_account: "XXXXXXXXXXXXXXXXX"
configGeneral:
  docker_image: ghcr.io/zalando/spilo-17:4.0-p2

First, the "base" CRD was deployed:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  labels:
    team: test-team
  name: testingdb-1
  namespace: postgress-db-1
spec:
  enableLogicalBackup: true
  teamId: "test-team"
  volume:
    storageClass: "zone-redundant"
    size: 2Gi
  env:
  - name: USE_WALG_BACKUP
    value:  "true"
  - name: USE_WALG_RESTORE
    value:  "true"
  - name: CLONE_USE_WALG_RESTORE
    value:  "true"
  - name: WALG_AZ_PREFIX
    value: "azure://XXXXX/$(SCOPE)/$(PGVERSION)"
  - name: AZURE_STORAGE_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: wal-creds
        key: AZ_KEY
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    pg-cross.bot: [] #role with NS level secret
    test: []
    foo_user: []  # role for application foo
  databases:
    testdb: zalando  # dbname: owner
  postgresql:
    version: "17"

The database is operational. I am able to connect, create additional DBs, and clone it using the direct connection (just by the name of the DB).
The WAL backup is visible from inside the pod (wal-g backup-list).
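
For reference, the backup check mentioned above can be run roughly as follows. This is only a sketch: the pod name `testingdb-1-0` is an assumption based on the usual `<cluster>-<ordinal>` naming, the namespace comes from the manifest above, and the envdir path is the one that appears in the error log further down.

```shell
# Assumed values: adjust to the actual namespace and pod.
NAMESPACE="postgress-db-1"      # namespace from the manifest above
POD="testingdb-1-0"             # assumed pod name (<cluster>-<ordinal>)
ENV_DIR="/run/etc/wal-e.d/env"  # envdir path as shown in the error log

# List the base backups WAL-G can see with the pod's backup environment.
kubectl exec -n "$NAMESPACE" "$POD" -- envdir "$ENV_DIR" wal-g backup-list
```

If this lists backups for the source cluster, the storage account, container and key are at least readable from inside the source pod.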

When I try to clone using the UID with the CRD listed below, the operation fails.

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  labels:
    team: test-team
  name: wal-r-b
  namespace: postgress-db-1
spec:
  enableLogicalBackup: true
  teamId: "test-team"
  volume:
    storageClass: "zone-redundant"
    size: 1Gi
  clone: 
    cluster: testingdb-1
    uid: xxxxxxxxxxxxxxxxxxxxxxxxxxx
    timestamp: "2025-01-15T16:00:00.000+00:00"
  env:
  - name: BACKUP_SCHEDULE  # WAL backup schedule for testing
    value: "2 */1 * * *"
  - name: WALG_AZ_PREFIX
    value: "azure://xxxxxxxxxxxxx/$(SCOPE)/$(PGVERSION)"
  - name: CLONE_USE_WALG_RESTORE
    value: "true"
  - name: USE_WALG_BACKUP
    value:  "true"
  - name: USE_WALG_RESTORE
    value:  "true"
  - name: AZURE_STORAGE_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: wal-creds
        key: AZ_KEY
  numberOfInstances: 2
  postgresql:
    version: "17"

How can I investigate what causes the errors below?

2025-01-21 10:43:02,816 ERROR: Error creating replica using method wal_e: envdir /run/etc/wal-e.d/env bash /scripts/wale_restore.sh exited with code=1
2025-01-21 10:43:32,939 ERROR: Error creating replica using method basebackup_fast_xlog: /scripts/basebackup.sh exited with code=1
2025-01-21 10:43:32,939 ERROR: failed to bootstrap from leader 'wal-r-b-0'
2025-01-21 10:43:41,335 ERROR: Error creating replica using method wal_e: envdir /run/etc/wal-e.d/env bash /scripts/wale_restore.sh exited with code=1
2025-01-21 10:44:11,453 ERROR: Error creating replica using method basebackup_fast_xlog: /scripts/basebackup.sh exited with code=1
2025-01-21 10:44:11,453 ERROR: failed to bootstrap from leader 'wal-r-b-0'