Fix BigQuery transfer operators to respect project_id arguments #32232

Merged


avinashpandeshwar
Contributor

@avinashpandeshwar avinashpandeshwar commented Jun 28, 2023

When submitting a BigQuery to GCS job or a GCS to BigQuery job, the current code uses only the project_id of the BigQuery table and ignores the project_id argument passed to the BigQueryToGCSOperator and GCSToBigQueryOperator operators (as pointed out by @bhagany in #32106). This is not the expected behaviour: the provided project_id must be used to submit the job to BigQuery, so that the job runs on that project's compute resources.

This change corrects the behaviour by making the hook's project_id the fallback project for both BigQuery storage and compute.

  • Storage: Prefer the project_id in the table name. If it is not provided, fall back to the hook's project_id.
  • Compute: Prefer the operator's project_id parameter. If it is not provided, fall back to the hook's project_id.
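A minimal sketch of the two fallback rules above, in plain Python with no Airflow imports. `resolve_projects` and `ResolvedProjects` are hypothetical names used only for illustration, not the provider's actual helpers, and the table-name parsing is simplified: it assumes `[project.]dataset.table` or the legacy `project:dataset.table` form and does not handle domain-scoped project IDs.

```python
from typing import NamedTuple, Optional


class ResolvedProjects(NamedTuple):
    storage_project: str  # project that owns the table
    compute_project: str  # project the BigQuery job is submitted to


def resolve_projects(
    table: str,
    operator_project_id: Optional[str],
    hook_project_id: str,
) -> ResolvedProjects:
    """Hypothetical helper illustrating the storage/compute fallback rules."""
    # Simplification: treat the legacy "project:dataset.table" form like "project.dataset.table".
    parts = table.replace(":", ".").split(".")
    if len(parts) == 3:
        storage_project = parts[0]  # storage: the project in the table name wins
    elif len(parts) == 2:
        storage_project = hook_project_id  # no project in the name: fall back to the hook
    else:
        raise ValueError(f"Expected [project.]dataset.table, got {table!r}")
    # compute: the operator's project_id wins, otherwise fall back to the hook
    compute_project = operator_project_id or hook_project_id
    return ResolvedProjects(storage_project, compute_project)


print(resolve_projects("data-proj.sales.orders", "billing-proj", "default-proj"))
# ResolvedProjects(storage_project='data-proj', compute_project='billing-proj')
```

The point of keeping the two lookups separate is that a table can live in one project while the job that reads or writes it is billed to another.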

Note: This change replaces @Yaro1's previous change in #30053, which used the BQ table's project_id for computation.

closes: #32106

suppress: #32144
suppress: #32143
suppress: #32095

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Jun 28, 2023
@boring-cyborg

boring-cyborg bot commented Jun 28, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack

@potiuk
Member

potiuk commented Jun 29, 2023

LGTM. Is it possible to add a test? And @Yaro1, are you OK with it?

@potiuk
Member

potiuk commented Jul 3, 2023

Static checks need fixing + rebase.

@eladkal
Contributor

eladkal commented Jul 3, 2023

Let's please fix all occurrences, as mentioned in #32106 (comment).

@avinashpandeshwar
Contributor Author

@eladkal Sure, that makes sense. Working on merging all the fixes into this PR.

@avinashpandeshwar
Contributor Author

@eladkal The PR now includes fixes to both bq_to_gcs and gcs_to_bq (fixed in multiple places within the operator: regular tables, external tables, deferred runs, and the max_id_key calculation), and incorporates inputs from @bhagany's comment and @hussein-awala's comment.
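To illustrate why every job-submission path needs the same treatment, here is a hypothetical sketch (plain Python, no Airflow imports) of two request bodies a GCS to BigQuery transfer might send: a load job and the MAX(max_id_key) query. The dict shapes follow the BigQuery v2 REST `jobs.insert` body (`jobReference`, `configuration.load.destinationTable`, `configuration.query`); the function names and sample values are assumptions, not the operator's actual code. In both cases the job is inserted into the compute project, while the storage project appears only in the table reference.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical helpers: each returns (project to insert the job into, jobs.insert body).


def table_ref(storage_project: str, dataset: str, table: str) -> Dict[str, str]:
    # TableReference shape from the BigQuery REST API: this is where storage is chosen.
    return {"projectId": storage_project, "datasetId": dataset, "tableId": table}


def load_job_request(
    compute_project: str, storage_project: str, dataset: str, table: str, source_uris: List[str]
) -> Tuple[str, Dict[str, Any]]:
    body = {
        # The job itself belongs to (and runs in) the compute project.
        "jobReference": {"projectId": compute_project},
        "configuration": {
            "load": {
                "sourceUris": source_uris,
                "destinationTable": table_ref(storage_project, dataset, table),
            }
        },
    }
    return compute_project, body


def max_id_query_request(
    compute_project: str, storage_project: str, dataset: str, table: str, max_id_key: str
) -> Tuple[str, Dict[str, Any]]:
    # The query reads from the storage project but is still submitted to the compute project.
    sql = f"SELECT MAX({max_id_key}) FROM `{storage_project}.{dataset}.{table}`"
    body = {
        "jobReference": {"projectId": compute_project},
        "configuration": {"query": {"query": sql, "useLegacySql": False}},
    }
    return compute_project, body


project, body = max_id_query_request("billing-proj", "data-proj", "sales", "orders", "id")
print(project)  # billing-proj
print(body["configuration"]["query"]["query"])  # SELECT MAX(id) FROM `data-proj.sales.orders`
```

If any one of these paths (load, external table, deferred run, max_id_key query) still derived the job project from the table name, the job would silently run in the wrong project, which is the behaviour #32106 reported.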

cc @potiuk

@potiuk
Member

potiuk commented Jul 5, 2023

Tests are failing.

@eladkal eladkal changed the title Fixing Issue - Provided project_id parameter not getting used to subm… Fix BigQuery transfer operators to respect project_id arguments Jul 5, 2023
@potiuk potiuk merged commit 2d690de into apache:main Jul 6, 2023
@boring-cyborg

boring-cyborg bot commented Jul 6, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@bhagany
Contributor

bhagany commented Jul 11, 2023

Congrats on this, and my thanks!

@nathadfield
Collaborator

Yes, many thanks from my side too!

Successfully merging this pull request may close these issues.

GCSToBigQueryOperator and BigQueryToGCSOperator do not respect their project_id arguments
5 participants