Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark] Relax type check in SchemaUtils.normalizeColumnNamesInDataType #4143

Merged
merged 1 commit into from
Feb 13, 2025

Conversation

ala
Copy link
Contributor

@ala ala commented Feb 11, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR modifies the nested filed name normalization code. This code is used when we write to a Delta table. It makes sure that the names of all the columns in the source data (meaning, the data that we are writing to the table) match the target table column names in terms of case. For example, if the source contains column address.CITY, while the table contains column address.city, the source column is renamed to match the table.

If the schema of the source data and the table data differs enough to interfere with the name normalization, we log delta.assertions.schemaNormalization.nonNestedTypeMismatch. However, this check is too restrictive, and as a result, we log this assertion too often. This PR relaxes the check. We now assume that any two atomic columns can be matched to each other.

How was this patch tested?

New tests in SchemaUtilsSuite.

Does this PR introduce any user-facing changes?

No.

Copy link
Contributor

@c27kwan c27kwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Don't forget to tag the PR title with [Spark] prefix. I would also remove the whitespace after SchemaUtils. in the PR title. :)

@ala ala changed the title Relax type check in SchemaUtils. normalizeColumnNamesInDataType [Spark] Relax type check in SchemaUtils.normalizeColumnNamesInDataType Feb 12, 2025
Copy link
Contributor

@c27kwan c27kwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still lgtm. The whitespace changes improve readability.

@ala ala force-pushed the permissive_type_check branch from 682f931 to 39c79ef Compare February 12, 2025 12:57
@c27kwan
Copy link
Contributor

c27kwan commented Feb 12, 2025

actually, maybe we can leave out the whitespace for another time. :)

@ala ala force-pushed the permissive_type_check branch from cfd1f75 to 11e11c3 Compare February 13, 2025 09:49
@ala
Copy link
Contributor Author

ala commented Feb 13, 2025

All right. Back to the original minimal change then.

@allisonport-db allisonport-db merged commit 0b818bf into delta-io:master Feb 13, 2025
30 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants