velero-plugin-for-aws v1.9.x no longer works with S3-compatible BackupStorageLocation #7828

Closed
losil opened this issue May 27, 2024 · 8 comments

@losil
losil commented May 27, 2024

What steps did you take and what happened:
We updated our Velero deployment to the latest Helm chart, 6.4.0, which installs Velero 1.13.2. The upgrade also moved velero-plugin-for-aws to v1.9.0 (and, during troubleshooting, to v1.9.2). The upgrade itself went through smoothly, and the BackupStorageLocation, an S3-compatible NetApp StorageGRID backend, was in the Available state after Velero was initialized.
After that we tested some backups, all of which were unsuccessful and ended in the Failed state. We noticed that during the backup run the BackupStorageLocation went to Unavailable with the following log message:

BackupStorageLocation "netapp-s3" is unavailable: rpc error: code = Unknown desc = operation error S3: ListObjectsV2, https response error StatusCode: 403, RequestID: 1716281754869367, HostID: 12783833, api error AccessDenied: V4 authentication signed header not found: accept-encoding

The BackupStorageLocation, which as mentioned is an S3-compatible NetApp StorageGRID system, is configured like this:

configuration:
  backupStorageLocation:
    - name: netapp-s3
      provider: aws
      bucket: mycluster
      prefix: velero
      default: true
      accessMode: ReadWrite
      credential:
        name: velero-s3-credentials
        key: cloud
      config:
        region: myregion
        s3ForcePathStyle: true
        s3Url: https://objectstore.localdomain.local:10443/
        signatureVersion: "4"

After the backup run ended, Velero marked the BackupStorageLocation as Available again during its regular validation schedule.

Downgrading velero-plugin-for-aws to v1.8.2 solves the issue, and backups succeed again.

What did you expect to happen:

We expect the same behavior when using the current version of the velero-plugin-for-aws initContainer. Velero should be able to use the S3-compatible backend provided by NetApp StorageGRID.

The following information will help us better understand what's going on:

bundle-2024-05-27-09-20-52.tar.gz

Environment:

  • Velero version (use velero version): v1.13.2
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version): v1.26.15+rke2r1
  • Kubernetes installer & version: Rancher RKE2 - v1.26.15+rke2r1
  • Cloud provider or hardware configuration: VMware ESXi, 7.0.3, 20328353
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.4 LTS

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@losil changed the title from "velero-plugin-for-aws v1.9.x is no more working with S3-compatible BackupStorageLocation" to "velero-plugin-for-aws v1.9.x no longer works with S3-compatible BackupStorageLocation" on May 27, 2024
@blackpiglet
Contributor

I didn't find useful information in the bundle other than V4 authentication signed header not found: accept-encoding.

This should be related to the AWS SDK version bump to v2 in plugin v1.9.x.
Per my understanding, the AWS SDK v2 uses the v4 accept-encoding by default, so the signatureVersion was deleted from the AWS plugin configuration.

After the SDK version bump, we have already seen some errors caused by S3-compatible backends that are not fully compatible with the S3 spec.
I'm not sure whether this one is caused by an inconsistency between NetApp StorageGRID and S3.
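
For readers unfamiliar with the error above: in Signature Version 4 the Authorization header carries a SignedHeaders list, and the server expects every header named there to be present in the request it verifies. The following Python sketch shows that check in isolation. The function name, credential, and header values are made up for illustration, and this is only one plausible reading of how a backend could arrive at "signed header not found: accept-encoding", not StorageGRID's actual logic.

```python
import re


def missing_signed_headers(authorization: str, received: dict) -> list:
    """Return the signed header names that are absent from the received request."""
    # SigV4 Authorization headers contain "SignedHeaders=a;b;c" (lowercase names).
    match = re.search(r"SignedHeaders=([^,\s]+)", authorization)
    if match is None:
        raise ValueError("no SignedHeaders component in Authorization header")
    signed = match.group(1).split(";")
    present = {name.lower() for name in received}
    return [name for name in signed if name not in present]


# Hypothetical request: the client signed accept-encoding, but that header
# is not among the headers the server sees.
auth = (
    "AWS4-HMAC-SHA256 "
    "Credential=EXAMPLEKEY/20240527/myregion/s3/aws4_request, "
    "SignedHeaders=accept-encoding;host;x-amz-content-sha256;x-amz-date, "
    "Signature=0000"
)
seen = {
    "Host": "objectstore.localdomain.local:10443",
    "X-Amz-Content-Sha256": "UNSIGNED-PAYLOAD",
    "X-Amz-Date": "20240527T092052Z",
}
print(missing_signed_headers(auth, seen))
```

If the header is dropped or not recognized anywhere between client and verifier, a check like this fails even though the credentials are valid, which would match the 403 AccessDenied in the log.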

@losil
Author

losil commented May 29, 2024

After checking the official documentation it seems that they did not implement/support the Accept-Encoding header.

@blackpiglet do you see any problems when using velero 1.13.x in combination with velero-plugin-for-aws 1.8.x?

@blackpiglet
Contributor

I haven't tried that, but if your scenario doesn't require the new parameters (tagging and checksumAlgorithm) added in release 1.9, then it should work.

@scaleoutsean

"After checking the official documentation it seems that they did not implement/support the Accept-Encoding header."

I think that doesn't mean it must. From [3]:

"As long as the identity;q=0 or *;q=0 directives do not explicitly forbid the identity value that means no encoding, the server must never return a 406 Not Acceptable error."

Also from [3], on why the preferred Accept-Encoding may not be acceptable to the server:

Two common cases lead to this:
The data to be sent is already compressed.
The server is overloaded and cannot allocate computing resources.

It seems the server should respond rather than silently handle it. Does it return 406 Not Acceptable [1], 415 [2], or the content itself in response? If not, then I'd create an issue with the S3 vendor. If yes, then the Velero client shouldn't fail.

IMO the Velero client may prefer whatever it likes, but it should accept any encoding (*) with a preference value > 0 (*;q=0.001) to cater to the common cases above. Is "any" (the original representation) accepted by Velero now, i.e. is the qvalue weighting for * > 0?

I don't know if that would help if the server doesn't handle that header (in which case it may be responding with the original content, which Velero perhaps does not accept).
In that case Velero could still work around it by trying whichever way works (with or without a specific encoding), but arguably the S3 vendor should fix their code to handle the header better, and Velero should accept the original representation (if it does not already), as mentioned in [3].
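
To make the qvalue question concrete, here is a minimal Python sketch of the rule from [3] and [4]: the unencoded (identity) representation stays acceptable unless the header excludes it with identity;q=0, or with *;q=0 and no more specific identity entry. The function names are hypothetical and the parser is deliberately simplistic (it does not validate malformed qvalues); it is not what Velero or the AWS SDK does.

```python
from typing import Optional


def parse_qvalues(header: str) -> dict:
    """Map each content-coding in an Accept-Encoding value to its q weight."""
    weights = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        q = 1.0  # a coding without an explicit q parameter defaults to 1
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        weights[coding.strip().lower()] = q
    return weights


def identity_acceptable(header: Optional[str]) -> bool:
    """True if a response without any content-coding is acceptable."""
    if header is None:
        return True  # no Accept-Encoding header: any coding is acceptable
    weights = parse_qvalues(header)
    if "identity" in weights:
        return weights["identity"] > 0
    if "*" in weights:
        return weights["*"] > 0
    return True  # identity is acceptable by default


print(identity_acceptable("gzip, *;q=0.001"))
print(identity_acceptable("gzip, *;q=0"))
```

Under this rule, a client that sends "*;q=0.001" still accepts the original representation, while "*;q=0" with no identity entry rules it out.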

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406
[2] https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/415
[3] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding
[4] https://datatracker.ietf.org/doc/html/rfc7231#section-5.3.4

@reasonerjt added the Area/Storage/Minio and Area/storage/netapp labels and removed the Area/Cloud/AWS and Area/Storage/Minio labels on Aug 20, 2024
@reasonerjt
Contributor

After reading the comments, it seems to me the gap is that the NetApp S3 service does not work with aws-sdk-v2.

@losil You could tweak the code and see whether it works with some parameter changes when calling the SDK, but I don't think we can guarantee that the plugin works with EVERY storage that declares itself S3-compatible but may in fact differ from AWS S3 in the details.

@kaovilai
Member

kaovilai commented Oct 4, 2024

duped by #8152

Long term we should add another plugin that uses an SDK that has the ability to ignore that accept-encoding header like https://github.com/minio/minio-go/blob/99336902dd57f3760e272caf6550e6791eabe0af/pkg/signer/request-signature-v4.go#L60
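
As a rough illustration of that idea, a signer can simply leave certain headers out of the SigV4 signed-header list, so their presence or absence on the wire no longer affects verification. The Python sketch below is not the minio-go code; the ignored set is an assumption chosen for illustration (see the linked file for the actual list), and the function names and header values are hypothetical.

```python
# Assumed set of headers to leave unsigned; illustrative only.
IGNORED_HEADERS = frozenset({"accept-encoding", "authorization", "user-agent"})


def signed_header_names(headers: dict, ignored=IGNORED_HEADERS) -> str:
    """Build the SigV4 SignedHeaders value: sorted lowercase names joined by ';'."""
    names = sorted(name.lower() for name in headers if name.lower() not in ignored)
    return ";".join(names)


def canonical_headers(headers: dict, ignored=IGNORED_HEADERS) -> str:
    """Build the SigV4 canonical headers block, skipping ignored headers."""
    lines = []
    for name in sorted(headers, key=str.lower):
        if name.lower() in ignored:
            continue
        value = " ".join(headers[name].split())  # trim and collapse whitespace
        lines.append(f"{name.lower()}:{value}\n")
    return "".join(lines)


request_headers = {
    "Host": "objectstore.localdomain.local:10443",
    "Accept-Encoding": "identity",
    "User-Agent": "example-client",
    "X-Amz-Date": "20240527T092052Z",
}
print(signed_header_names(request_headers))
```

Because accept-encoding never enters the signature here, a backend that mishandles that header would have nothing to reject during signature verification.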

@kaovilai
Member

kaovilai commented Oct 4, 2024

doc'ing in vmware-tanzu/velero-plugin-for-aws#219

@losil
Author

losil commented Nov 29, 2024

With StorageGRID® version 11.8.0.7 our issues have been fixed, and we can use velero-plugin-for-aws according to the compatibility matrix.
For this reason I will close this issue. Thank you for your cooperation.

@losil losil closed this as completed Nov 29, 2024