[BUG] - Automatic schema evolution does not work in map value structs #1641

Closed
1 of 3 tasks
Orpheuz opened this issue Mar 9, 2023 · 3 comments
Labels
bug Something isn't working

Orpheuz commented Mar 9, 2023

Bug

Describe the problem

Automatic schema evolution in Delta does not support evolving structs nested inside maps.

Steps to reproduce

This can be reproduced with the following code:

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import io.delta.tables._

spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

val schema = StructType(
    StructField("id", IntegerType) ::
    StructField(
      "map", MapType(
        StringType, StructType(
          StructField("a", IntegerType) ::
            StructField("b", IntegerType) ::
            StructField("c", StringType) ::
            Nil
        )
      )) ::
    Nil
)

val sourceDataFrame = spark.createDataFrame(
  Seq(
    Row.fromSeq(
      Seq(0, Map("key" -> Tuple3(0, 1, "a"), "key1" -> Tuple3(2, 3, "b")))
    )).asJava, schema)

sourceDataFrame.write.format("delta").mode("append").save("/delta/test")

val updatedSchema = StructType(
    StructField("id", IntegerType) ::
    StructField(
      "map", MapType(
        StringType, StructType(
          StructField("a", IntegerType) ::
            StructField("b", IntegerType) ::
            Nil
        )
      )) ::
    Nil
)

val updatedSourceDataFrame = spark.createDataFrame(
  Seq(
    Row.fromSeq(
      Seq(0, Map("key" -> Tuple2(0, 1), "key1" -> Tuple2(2, 3)))
    )).asJava, updatedSchema)

val targetDeltaTable = DeltaTable.forPath(spark, "/delta/test")

targetDeltaTable.alias("t").merge(
    updatedSourceDataFrame.alias("s"),
    "t.id = s.id")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()

Observed results

AnalysisException: cannot resolve 's.map' due to data type mismatch: cannot cast map<string,struct<a:int,b:int>> to map<string,struct<a:int,b:int,c:string>>;

Expected results

No exception is thrown, and the map values are cast correctly so that schema evolution can proceed.

Further details

Environment information

DBR 12.2 LTS

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.

Orpheuz added the bug label Mar 9, 2023

@scottsand-db
Collaborator

Hi @Orpheuz - which version of Delta are you using?

Also - thanks for offering to contribute the fix!

@Orpheuz
Author

Orpheuz commented Mar 9, 2023

Hey @scottsand-db, the code snippet in the description was tested on DBR 12.2 LTS. I have also reproduced the bug in unit tests on master. I'll try to push a fix soon.

Update: I may have underestimated the effort by comparing this fix to the array-of-structs case. Maps are much trickier 😞
I'll be off for two weeks and will try to pick it up after that, but I will definitely need some guidance.

Update 2: Managed to solve the issue; a typo had been throwing me off.

allisonport-db pushed a commit that referenced this issue Jul 21, 2023
## Description

This PR resolves issue #1641 to allow automatic schema evolution in structs that are inside maps.

Assuming the target and source tables have the following schemas:
target: `id string, map map<int, struct<a: int, b: int>>`
source: `id string, map map<int, struct<a: int, b: int, c: int>>`
```
SET spark.databricks.delta.schema.autoMerge.enabled = true;

MERGE INTO target t
USING source s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
```
returns an analysis error today:
```
AnalysisException: cannot resolve 's.map' due to data type mismatch: cannot cast map<string,struct<a:int,b:int>> to map<string,struct<a:int,b:int,c:string>>;
```

With this change, the merge command succeeds and the target table schema evolves to include field `c` inside the map value. The same also works for map keys.
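
To illustrate what evolving a struct inside a map key or value amounts to, here is a minimal sketch in plain Scala. It is not the code from this PR and does not use Spark or Delta classes: `DType`, `Field`, `StructT`, `MapT`, and `SchemaMergeSketch.merge` are hypothetical names made up for this example. The sketch assumes that merging keeps the target's field order, appends fields that exist only in the source, and recurses into both the key and the value type of a map.

```scala
// Hypothetical, simplified type model (NOT Spark's or Delta's classes).
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class Field(name: String, dataType: DType)
final case class StructT(fields: List[Field]) extends DType
final case class MapT(key: DType, value: DType) extends DType

object SchemaMergeSketch {
  // Merge `source` into `target`: target fields keep their order,
  // fields found only in `source` are appended at the end.
  def merge(target: DType, source: DType): DType = (target, source) match {
    case (StructT(tf), StructT(sf)) =>
      val sourceByName = sf.map(f => f.name -> f).toMap
      // Fields present on both sides are merged recursively.
      val merged = tf.map { f =>
        sourceByName.get(f.name).fold(f)(s => Field(f.name, merge(f.dataType, s.dataType)))
      }
      val targetNames = tf.map(_.name).toSet
      StructT(merged ++ sf.filterNot(f => targetNames(f.name)))
    // The step this issue is about: recurse into map keys and values.
    case (MapT(tk, tv), MapT(sk, sv)) => MapT(merge(tk, sk), merge(tv, sv))
    case (t, s) if t == s => t
    case (t, s) => throw new IllegalArgumentException(s"Cannot merge $t with $s")
  }
}
```

With the target and source schemas from the description above modeled in these types, `SchemaMergeSketch.merge(target, source)` yields a schema whose map value struct contains `a`, `b`, and `c`.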

- Tests are added to `MergeIntoSuiteBase` and `MergeIntoSQLSuite` to cover struct evolution inside map values and keys.

## Does this PR introduce _any_ user-facing changes?
Yes. Struct evolution inside maps now succeeds instead of failing with an analysis error (see the previous example).

Closes #1868

GitOrigin-RevId: 07ce2531e03c4e2fa69e8a34f33ba8d2dc3a0228
@johanl-db
Collaborator

Merged #1868 implementing struct evolution for map keys and values.
