Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Structs evolution in MapType keys and values #1868

Conversation

johanl-db
Copy link
Collaborator

@johanl-db johanl-db commented Jun 27, 2023

Description

This PR resolves issue #1641 to allow automatic schema evolution in structs that are inside maps.

Assuming the target and source tables have the following schemas:
target: id string, map map<int, struct<a: int, b: int>>
source: id string, map map<int, struct<a: int, b: int, c: int>>

SET spark.databricks.delta.schema.autoMerge.enabled = true;

MERGE INTO target t
USING source s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *

returns an analysis error today:

AnalysisException: cannot resolve 's.map' due to data type mismatch: cannot cast map<string,struct<a:int,b:int>> to map<string,struct<a:int,b:int,c:string>>;

With this change, the merge command succeeds and the target table schema evolves to include field c inside the map value. The same also works for map keys.

How was this patch tested?

  • Tests are added to MergeIntoSuiteBase and MergeIntoSQLSuite to cover struct evolution inside of maps values and keys.

Does this PR introduce any user-facing changes?

Yes, struct evolution inside of maps now succeeds instead of failing with an analysis error, see previous example.

Copy link
Contributor

@fred-db fred-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix! I had some minor comments. From what I can tell, this also fixes schema evolution in arrays nested in maps. Could we have one additional test for this for completeness?

@@ -123,6 +123,56 @@ class MergeIntoSQLSuite extends MergeIntoSuiteBase with MergeIntoNotMatchedBySo
update = "value.a.x = 2" :: "value.a.x = 3" :: Nil,
errorStrs = "There is a conflict from these SET columns" :: Nil)

test("Complex Data Type - Map with Struct Values") {
withTable("source") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: merge onto one line
withTable("source", "target") {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both tests in MergeIntoSQLSuite are removed after I also implemented struct evolution inside of map keys and not just in map values. There actually was not good reason to limit it to map values only.

@johanl-db johanl-db changed the title Add support for Structs evolution in MapType value Add support for Structs evolution in MapType keys and values Jul 4, 2023
@johanl-db
Copy link
Collaborator Author

Nice fix! I had some minor comments. From what I can tell, this also fixes schema evolution in arrays nested in maps. Could we have one additional test for this for completeness?

I added a number of tests to cover map keys evolution and two tests to cover arrays nested in maps.

@johanl-db johanl-db requested a review from fred-db July 4, 2023 11:35
// scalastyle:on line.size.limit

// Struct evolution inside of map keys.
testEvolution("new source column in map struct key")(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the testNestedStructsEvolution helper method? It makes the tests super readable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, testNestedStructsEvolution relies on parsing JSON data which is what makes it more readable but JSON requires the keys in maps to be strings which is too limiting for these tests.

@johanl-db johanl-db requested a review from fred-db July 6, 2023 16:46
Copy link
Contributor

@fred-db fred-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants