[HUDI-7242] Avoid unnecessary bigquery table update when using sync tool #10374

Merged (2 commits) on Dec 22, 2023

@jp0317 (Contributor) commented Dec 20, 2023

Change Logs

An earlier PR added a table schema update step to the BigQuery sync tool. That step appears to have two issues (this PR addresses only the first):

  1. When the sync tool reforms the schema for comparison with the BigQuery table schema, the reformed schema puts the partition fields at the beginning, whereas the BigQuery table schema by default has the partition fields at the end. The order difference alone therefore triggers an unnecessary schema update.

  2. Although the sync tool in 0.14.0 does not support a BigLake connection ID (a PR from last month adds this support), a user can still recreate their table manually with a connection ID. The table update adds the new schema to the external table definition. This does not work for BigLake tables and causes the error: "Schema can be specified only on the Table.Schema field for external tables with an associated connection_id but schema was provided on Table.Externaldataconfig.Schema".
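
To illustrate the first issue, here is a minimal sketch of one way to avoid the spurious difference: move the partition fields to the end of the reformed field list before comparing it with the existing table's field list. This is not the code from this PR. The class `SchemaOrderSketch`, its methods, and the field names used below (`dt`, `id`, `name`) are all hypothetical, and plain field-name lists stand in for real schema objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Hudi or the BigQuery client.
final class SchemaOrderSketch {
    private SchemaOrderSketch() {}

    /**
     * Returns the field names with all partition fields moved to the end.
     * The relative order of data fields and of partition fields is preserved.
     */
    static List<String> partitionFieldsLast(List<String> fields, Set<String> partitionFields) {
        List<String> ordered = new ArrayList<>();
        List<String> trailing = new ArrayList<>();
        for (String field : fields) {
            if (partitionFields.contains(field)) {
                trailing.add(field);
            } else {
                ordered.add(field);
            }
        }
        ordered.addAll(trailing);
        return ordered;
    }

    /**
     * Reports whether an update is needed after normalizing the position of the
     * partition fields, so that an order-only difference does not count as a change.
     */
    static boolean needsUpdate(List<String> reformed, List<String> existing, Set<String> partitionFields) {
        return !partitionFieldsLast(reformed, partitionFields).equals(existing);
    }
}
```

With a reformed list of `dt, id, name`, an existing table list of `id, name, dt`, and `dt` as the only partition field, `needsUpdate` returns `false`, so no table update would be issued. Adding a genuinely new field to the reformed list still yields `true`.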

Impact

Avoid the unnecessary table update.

Risk level (write none, low, medium or high below)

low. It only corrects the order of fields.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default values of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@danny0405
Contributor

cc @the-other-tim-brown Looks like a reasonable fix.

@the-other-tim-brown
Contributor

LGTM! Thanks for catching this @jp0317

@danny0405 danny0405 merged commit f0356de into apache:master Dec 22, 2023
21 of 32 checks passed
yihua pushed a commit that referenced this pull request Feb 27, 2024