-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add utility to filter for columns in a schema #4151
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a suggestion that I think would simplify the test logic. Besides that, LGTM!
assert(results.size() === expectedColumns.size) | ||
|
||
// Helper function to get the expected `StructField` based on a column path | ||
def expectedStructField(columnPath: String): StructField = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I did find expectedStructField
and findStructFieldInResults
to be a bit confusing.
I have a different idea: either in SchemaUtils or just in this test suite, it seems like we could create a function that takes a schema and flattens it to a list of path -> structFIeld. I'm flexible if this is a path of a list of strings, or just the column names esacped with backticks as needed and joined by .
.
this would be useful for doing lookups in a schema. we would create this by calling recurseIntoComplexTypes
with recurseIntoMapOrArrayElements = true and stopOnFirstMatch = false and f
always returns true.
so flattenedSchema = <invoke this method>
and then we have our val results =filterRecursively
and then for each expected column, we can look up the expected column in the flattenedSchema
, and assert that the actual results has the expected column, and compare them.
I think this would really simplify the test code. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Simplified even further. Also changed the flattenNestedColumnNames
to use the new generic method.
@@ -191,36 +216,89 @@ public static int findColIndex(StructType schema, String colName) { | |||
* will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3" | |||
* </pre> | |||
*/ | |||
private static List<String> flattenNestedFieldNames(StructType schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will now use the generic method added.
.asScala.map(f => (f._1.asScala.mkString("."), f._2)).toMap | ||
|
||
// Assert that the number of results matches the expected columns | ||
assert(results.size === expectedColumns.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
This utility method can be used to filter for columns with invariants or a certain data type (timestamp_ntz, variant, etc.) to implicitly identify the table features that should be enabled.
How was this patch tested?
Unit tests.