druid-deltalake-extensions support for StructType #16782

Closed
Donutellko opened this issue Jul 23, 2024 · 4 comments · Fixed by #16884

Comments

Donutellko commented Jul 23, 2024

Description

Hello, we are trying to load data from a DeltaTable and are hitting an issue with StructType (the error message below is manually formatted; the full stack trace is in the attachment druid-delta-unsupported-StructType.log):

Failed to sample data: Unsupported data type[
  struct(
    StructField(name=FieldOne,type=string,nullable=true,metadata={}), 
    StructField(name=FieldTwo,type=string,nullable=true,metadata={})
  )
] for fieldName[MetaData].
        at org.apache.druid.error.DruidException$DruidExceptionBuilder.build(DruidException.java:460)
        at ...
        at org.apache.druid.error.InvalidInput.exception(InvalidInput.java:30)
        at org.apache.druid.delta.input.DeltaInputRow.getValue(DeltaInputRow.java:201)
        at org.apache.druid.delta.input.DeltaInputRow._getRaw(DeltaInputRow.java:163)
        at org.apache.druid.delta.input.DeltaInputRow.<init>(DeltaInputRow.java:74)
        at org.apache.druid.delta.input.DeltaInputSourceReader$DeltaInputSourceIterator.next(DeltaInputSourceReader.java:140)
        at ...

Using apache/druid:30.0.0

Expected behavior:

  • StructType's StructFields are loaded as a set of columns with a common prefix: MetaData.FieldOne, MetaData.FieldTwo, ...;
  • or (at least) StructType is loaded as a JSON string.
  • Additionally, I would like to discuss a possibility of loading delta ArrayType as a JSON string.
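
The two options above can be illustrated with a minimal Python sketch. It is not Druid or Delta Kernel code: the helper names `flatten_struct` and `struct_as_json` and the sample row are hypothetical, and a struct is assumed to arrive as a nested dict and an array as a list.

```python
import json


def flatten_struct(row, sep="."):
    """Option 1: expand nested structs into top-level columns with a common prefix."""
    flat = {}
    for name, value in row.items():
        if isinstance(value, dict):
            # Recurse so that structs nested inside structs are flattened too.
            for child, child_value in flatten_struct(value, sep).items():
                flat[f"{name}{sep}{child}"] = child_value
        else:
            flat[name] = value
    return flat


def struct_as_json(row):
    """Option 2: keep one column per struct or array, serialized as a JSON string."""
    return {
        name: json.dumps(value) if isinstance(value, (dict, list)) else value
        for name, value in row.items()
    }


# Hypothetical row shaped like the table in this report, plus an array column.
row = {"id": 1, "MetaData": {"FieldOne": "a", "FieldTwo": "b"}, "Tags": ["x", "y"]}
flattened = flatten_struct(row)
as_json = struct_as_json(row)
```

Here `flattened` carries the columns `MetaData.FieldOne` and `MetaData.FieldTwo`, while `as_json` keeps a single `MetaData` column holding a JSON string.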

Motivation

  • StructType columns are common in DeltaTables; with Spark they are widely used to group related fields, which are then accessed like this: .select(col("MetaData.FieldOne")). Support for loading this data seems indispensable for common use of druid-deltalake-extensions.
@Donutellko
Author

Feature initially introduced in #15755.

@abhishekrb19, I appreciate your work a lot and would appreciate your response even more.

@abhishekrb19
Contributor

Hi @Donutellko, thanks for reporting. IIRC struct and array types weren't fully supported with the upstream Delta Kernel library in 3.0.0 when the extension was originally written. Now that we use Kernel 3.2.0, it seems that support has been added. I will look into adding it in the Druid connector.

Re the expected behavior you note:

Expected behavior:

  • StructType's StructFields are loaded as a set of columns with a common prefix: MetaData.FieldOne, MetaData.FieldTwo, ...;
  • or (at least) StructType is loaded as a JSON string.
  • Additionally, I would like to discuss a possibility of loading delta ArrayType as a JSON string.

I think the Delta input source should just write structs as json and arrays as arrays. For structs, if a user wants to flatten the fields or extract/transform specific fields present in the struct, it should still be possible to do so using the SQL JSON functions, which can be used at ingest and/or query time. For example, JSON_VALUE("MetaData", '$.FieldOne') AS "Metadata.FieldOne" will extract FieldOne as a separate column, which you can name however you'd like.
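
To make the extraction step concrete, here is a rough Python emulation of what a call like JSON_VALUE("MetaData", '$.FieldOne') is described as doing above. It is only a sketch and not Druid's implementation: the `json_value` helper is hypothetical, it handles only simple `$.a.b` paths, and returning None for missing or non-scalar values is an assumption of this example.

```python
import json


def json_value(document, path):
    """Return the scalar at a simple '$.a.b' path, or None if absent or not a scalar."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    # Accept either a JSON string or an already-parsed object.
    current = json.loads(document) if isinstance(document, str) else document
    # "$.FieldOne" -> ["FieldOne"]; the empty piece before the first dot is dropped.
    for key in filter(None, path[1:].split(".")):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    # Assumption of this sketch: only scalar values are returned.
    return None if isinstance(current, (dict, list)) else current


# Hypothetical row with the struct ingested as a nested JSON value.
row = {"MetaData": {"FieldOne": "a", "FieldTwo": "b"}}
extracted = {"MetaData.FieldOne": json_value(row["MetaData"], "$.FieldOne")}
```

The result is the same prefixed column the original request asked for, produced per field on demand rather than by flattening every struct at load time.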

Does that sound good to you?

@Donutellko
Author

Thank you for your response @abhishekrb19.

I think the Delta input source should just write structs as json and arrays as arrays. <...>
Does that sound good to you?

Yes, that sounds good. Could you provide any ETA for the implementation?

@abhishekrb19
Contributor

@Donutellko the fix was merged in #16884. It will be available in the next release, Druid 31.0.0.
