-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GCSToBigQueryOperator and BigQueryToGCSOperator do not respect their project_id arguments #32106
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. |
I should mention that I am about to go on a 2-week vacation, so while I am willing to discuss and submit a PR, my responses will be delayed. I also think this is a serious enough bug for someone else to take on if they have the time. |
Also, #32093 is related to the inconsistent handling of project_ids caused by the PR I linked in the description. I filed this ticket because I felt the need to call attention to the broader range of issues |
@irvifa @nathadfield @avinashpandeshwar I'm a bit lost here. There are 4 open PRs for this issue Kindly asking to take under advisment @hussein-awala comment in #32095 (review) and raise 1 PR to address all occurrences (or any way you decide to sort this which makes it easier to review) cc @potiuk |
@eladkal Yeah, it's definitely something that needs addressing more broadly. Will bear all of these in mind. |
@eladkal Yeah, your'e right. I will be merging these fixes into my PR. |
I'm still on vacation, but popping in here to add my 2 cents regarding your PR and comment here @avinashpandeshwar -- I don't think these PR's are mutually compatible, with regard to how they treat @Yaro1's changes. I see that your PR currently keeps the default to the storage project if I agree with @hussein-awala's comment on #32095, linked above, and would prefer the approach taken in #32143, which keeps the |
@bhagany I have corrected my PR #32232 so that hook's project_id is the fallback for both storage and compute of the bq_to_gcs and gcs_to_bq jobs. The project_id param is the first preference for compute, if provided. Similarly, the table's project_id is the first preference for storage, if specified. |
@avinashpandeshwar May I know which version of |
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
We experienced this issue Airflow 2.6.1, but the problem exists in the Google provider rather than core Airflow, and were introduced with these changes. We are using version 10.0.0 of the provider.
The issue that resulted in these changes seems to be based on an incorrect understanding of how projects interact in BigQuery -- namely that the project used for storage and the project used for compute can be separate. The user reporting the issue appears to mistake an error about compute (
User does not have bigquery.jobs.create permission in project {project-A}.
for an error about storage, and this incorrect diagnosis resulted in a fix that inappropriately defaults the compute project to the project named in destination/source (depending on the operator) table.The change attempts to allow users to override this (imo incorrect) default, but unfortunately this does not currently work because
self.project_id
gets overridden with the named table's project here and here.What you think should happen instead
I think that the easiest fix would be to revert the change, and return to defaulting the compute project to the one specified in the default google cloud connection. However, since I can understand the desire to override the
project_id
, I think handling it correctly, and clearly distinguishing between the concepts of storage and compute w/r/t projects would also work.How to reproduce
Attempt to use any other project for running the job, besides the one named in the source/destination table
Operating System
debian
Versions of Apache Airflow Providers
apache-airflow-providers-google==10.0.0
Deployment
Other 3rd-party Helm chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: