-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
handle nested struct in schemaString #257
handle nested struct in schemaString #257
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never like these catch all serialization things, but seems like in this case it's somewhat of a necessary thing.
@@ -62,9 +62,7 @@ pub fn sort_record_batch(batch: RecordBatch) -> DeltaResult<RecordBatch> { | |||
Ok(RecordBatch::try_new(batch.schema(), columns)?) | |||
} | |||
|
|||
static SKIPPED_TESTS: &[&str; 2] = &[ | |||
// iceberg compat requires column mapping | |||
"iceberg_compat_v1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yay!
kernel/src/schema.rs
Outdated
// } | ||
// TODO: Perhaps make this more restrictive, the PROTOCOL seems to indicate that this should | ||
// always be a `String->Int` map | ||
Object(serde_json::Map<String, serde_json::Value>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then IIUC this is letting column/structfield metadata also include arbitrary string->json? Is there any API here other than the ability to just ask for the metadata back?
Not that i'm against it, just thinking of some implications: this makes serde_json::Map/Value part of the public API right?
We could also take the extreme approach and do Json(serde_json::Value)
? or do the most specific of Map<Int, String>. or keep the in-between currently in the PR.
This comment is mostly questions/ideas I've thought of - I think most would be reasonable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does indeed make it part of the public API. I don't think this is exactly correct, but I think there's some work to be done to spec out in the protocol exactly what can go here.
This basically allows:
{
"key": [number],
"key": [string],
"key": [bool],
"key": {
[pretty much any object]
}
}
Allowing serde_json::Map<String, serde_json::Value>
is basically equivalent to "anything in a json object" because json objects are just string -> Value
maps. So what we're disallowing here is:
{
"key": [1,2,3], // array
"key": null
}
and that's it. I can't find use for that in the spec so I slightly prefer the more restrictive form. We can relax this later without breaking things, but it's harder to go from more loose to tighter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we disallowing array/null? Isn't there serde_json::Value::Null
and serde_json::Value::Array
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it just means the top level of enum there has to be an object, what is contained in the map can technically be anything serde_json::Value
can take
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we disallowing array/null? Isn't there
serde_json::Value::Null
andserde_json::Value::Array
?
We are only disallowing it at the top level. Inside the Object
variant you could have anything, but at the top we only deser into the types I listed above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec only says metadata
is a "JSON map" and AFAIK delta-spark (spark, really) represents it as a String -> Any map (see Metadata.scala). So technically even Null and Array values would be allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I've updated this to now just have an Other
variant that will catch anything we don't understand, plus a comment explaining why that's okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i misunderstood yay thanks lgtm :)
633ba5c
to
4f43d3b
Compare
Handle parsing nested structs in
schemaString
. Setting iceberg compat causes this. Also enable the iceberg_compat_v1 test for dat, since we do now support column mapping.Closes #253 .