GCSHook's functions for list and download do not work with Requester Pays buckets. #31137

Closed
2 tasks done
ABoothInTheWild opened this issue May 8, 2023 · 5 comments
Labels
area:providers good first issue kind:bug This is a clearly a bug provider:google Google (including GCP) related issues

Comments

@ABoothInTheWild

ABoothInTheWild commented May 8, 2023

Apache Airflow version

2.6.0

What happened

When accessing data in a "Requester Pays" bucket, the project to bill must be supplied when the storage client creates the bucket object, or set on the ACL. The "list" and "download" functions of the GCSHook have no parameter for a user project ID, so calling them on such a bucket fails with the following error: Bucket is a requester pays bucket but no user project provided.

This is explicit in the GCP documentation.

What you think should happen instead

The "insert_bucket_acl" function in the GCSHook already accepts an optional user_project for Requester Pays buckets. The relevant docstring and code look like this:

""":param user_project: (Optional) The project to be billed for this request.
            Required for Requester Pays buckets."""

if user_project:
    bucket.acl.user_project = user_project
bucket.acl.save()

I believe the same optional user_project handling should be added to the list and download functions as well. This should also fix any transfer operators (GCS to GCS, S3, or Azure) that read data from a "Requester Pays" bucket.
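
As a rough illustration of the proposal, here is a minimal sketch of list and download helpers that forward an optional user_project to the storage client. It is not the GCSHook code and not necessarily how the eventual fix was written. The helper names list_object_names and download_object are hypothetical, and client is assumed to be a google.cloud.storage.Client (or anything with the same bucket() method).

```python
from typing import Any, List, Optional


def list_object_names(
    client: Any,  # assumed: google.cloud.storage.Client or a compatible object
    bucket_name: str,
    prefix: Optional[str] = None,
    delimiter: Optional[str] = None,
    user_project: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: list object names, billing user_project if given."""
    # Client.bucket() takes user_project; passing None keeps today's behaviour.
    bucket = client.bucket(bucket_name, user_project=user_project)
    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    return [blob.name for blob in blobs]


def download_object(
    client: Any,  # assumed: google.cloud.storage.Client or a compatible object
    bucket_name: str,
    object_name: str,
    user_project: Optional[str] = None,
) -> bytes:
    """Hypothetical helper: download one object, billing user_project if given."""
    bucket = client.bucket(bucket_name, user_project=user_project)
    # Blobs created from the bucket reuse the bucket's user_project.
    return bucket.blob(object_name).download_as_bytes()
```

With a real client this would be called as list_object_names(storage.Client(), "my-bucket", user_project="my-billing-project"), where both names are placeholders. Defaulting user_project to None means existing callers that do not use Requester Pays buckets are unaffected.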

How to reproduce

Call hook.list() on any GCS bucket with Requester Pays enabled, for example from operator code such as:

hook = GCSHook(
    gcp_conn_id=self.gcp_conn_id,
    delegate_to=self.delegate_to,
    impersonation_chain=self.google_impersonation_chain,
)

self.log.info(
    'Getting list of the files. Bucket: %s; Delimiter: %s; Prefix: %s',
    self.bucket,
    self.delimiter,
    self.prefix,
)

files = hook.list(bucket_name=self.bucket,
                  prefix=self.prefix,
                  delimiter=self.delimiter)

Operating System

Debian 11

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@ABoothInTheWild ABoothInTheWild added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels May 8, 2023
@boring-cyborg

boring-cyborg bot commented May 8, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@pankajastro pankajastro added provider:google Google (including GCP) related issues good first issue and removed area:core needs-triage label for new issues that we didn't triage yet labels May 8, 2023
@eladkal
Contributor

eladkal commented May 9, 2023

cc @shahar1

@hankehly
Contributor

If someone could please assign me, I will implement this.

@potiuk
Member

potiuk commented Jun 13, 2023

Done.

@eladkal
Contributor

eladkal commented Aug 23, 2023

fixed in #32760

@eladkal eladkal closed this as completed Aug 23, 2023
5 participants