
[Kernel] Add utility to filter for columns in a schema #4151

Merged: 4 commits into delta-io:master on Feb 14, 2025

Conversation

vkorukanti (Collaborator)

Description

This utility method can be used to filter for columns with invariants or a certain data type (timestamp_ntz, variant, etc.) to implicitly identify the table features that should be enabled.
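For illustration, a minimal sketch of how a caller might use this (the filterRecursively name, parameter order, and package path here are inferred from the review thread below; treat them as assumptions rather than the confirmed API):

import io.delta.kernel.internal.util.SchemaUtils;
import io.delta.kernel.types.StructType;
import io.delta.kernel.types.TimestampNTZType;

// Sketch: decide whether the timestampNtz table feature is needed by
// checking for a timestamp_ntz column at any nesting level.
static boolean needsTimestampNtzFeature(StructType schema) {
  return !SchemaUtils.filterRecursively(
          schema,
          true /* recurseIntoMapOrArrayElements */,
          false /* stopOnFirstMatch */,
          field -> field.getDataType() instanceof TimestampNTZType)
      .isEmpty();
}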

How was this patch tested?

Unit tests.

@scottsand-db (Collaborator) left a comment:


I left a suggestion that I think would simplify the test logic. Besides that, LGTM!

assert(results.size() === expectedColumns.size)

// Helper function to get the expected `StructField` based on a column path
def expectedStructField(columnPath: String): StructField = {

scottsand-db (Collaborator):

So, I did find expectedStructField and findStructFieldInResults to be a bit confusing.

I have a different idea: either in SchemaUtils or just in this test suite, it seems like we could create a function that takes a schema and flattens it to a list of path -> StructField. I'm flexible on whether the path is a list of strings, or just the column names escaped with backticks as needed and joined by ".".

This would be useful for doing lookups in a schema. We would create it by calling recurseIntoComplexTypes with recurseIntoMapOrArrayElements = true and stopOnFirstMatch = false, and an f that always returns true.

So flattenedSchema = <invoke this method>, and then we have our val results = filterRecursively. Then, for each expected column, we can look it up in flattenedSchema, assert that the actual results contain it, and compare the two.

I think this would really simplify the test code. wdyt?
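
A rough sketch of the helper this comment proposes (flattenSchema is a hypothetical name; the filterRecursively parameters and the Tuple2 result shape are taken from this thread, not from the confirmed API):

import java.util.Map;
import java.util.stream.Collectors;
import io.delta.kernel.internal.util.SchemaUtils;
import io.delta.kernel.types.StructField;
import io.delta.kernel.types.StructType;

// Hypothetical helper: flatten a schema into dotted-path -> StructField
// entries by accepting every field during the recursive walk.
static Map<String, StructField> flattenSchema(StructType schema) {
  return SchemaUtils.filterRecursively(
          schema,
          true /* recurseIntoMapOrArrayElements */,
          false /* stopOnFirstMatch */,
          field -> true) // f always returns true, so every field is kept
      .stream()
      .collect(Collectors.toMap(
          entry -> String.join(".", entry._1), // path segments joined by "."
          entry -> entry._2));
}

Each expected column in the test could then be looked up by its dotted path and compared against the corresponding entry in the actual results.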

vkorukanti (Collaborator Author):

Good idea. Simplified even further. Also changed the flattenNestedColumnNames to use the new generic method.

@@ -191,36 +216,89 @@ public static int findColIndex(StructType schema, String colName) {
* will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
* </pre>
*/
private static List<String> flattenNestedFieldNames(StructType schema) {
vkorukanti (Collaborator Author):

This will now use the newly added generic method.
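
Roughly, the delegation could look like this (a sketch under the same inferred filterRecursively signature; the real method may additionally escape field names, which is omitted here):

import java.util.List;
import java.util.stream.Collectors;

// Sketch: express flattenNestedFieldNames via the generic recursive filter,
// producing paths like "a", "a.1", "c.nest.3" as in the Javadoc example above.
private static List<String> flattenNestedFieldNames(StructType schema) {
  return SchemaUtils.filterRecursively(
          schema,
          true /* recurseIntoMapOrArrayElements */,
          false /* stopOnFirstMatch */,
          field -> true)
      .stream()
      .map(entry -> String.join(".", entry._1))
      .collect(Collectors.toList());
}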

.asScala.map(f => (f._1.asScala.mkString("."), f._2)).toMap

// Assert that the number of results matches the expected columns
assert(results.size === expectedColumns.size)
scottsand-db (Collaborator):

awesome :)

@scottsand-db (Collaborator) left a comment:

LGTM!

vkorukanti merged commit dfc50d6 into delta-io:master on Feb 14, 2025. 19 checks passed.