-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GpuMapConcat
support for nested type keys.
#6290
Add GpuMapConcat
support for nested type keys.
#6290
Conversation
Signed-off-by: remzi <[email protected]>
build |
frame => { | ||
import frame.sparkSession.implicits._ | ||
frame.select(map_concat($"col1", $"col2")) | ||
} | ||
} | ||
|
||
testGpuFallback( | ||
"MapConcat with Struct keys fall back", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a corresponding test for MapType
keys as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark doesn't support MapType
as a key type. If you use Map
as a key, the Spark catalyst will throw an error.
The changes look good, but I think we should also add integration tests so that we test this on all supported platforms. |
execsAllowedNonGpu = Seq("ProjectExec", "ShuffleExchangeExec")) { | ||
testSparkResultsAreEqual( | ||
"MapConcat with Array keys", | ||
ArrayKeyMapDF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add in some integration tests as well or instead of these? It would also be good to try and add/update the data gen for the tests so it is much more likely to have collisions. This is especially true for nested types like arrays or structs where the probability of a collision goes way down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add in some integration tests as well or instead of these? It would also be good to try and add/update the data gen for the tests so it is much more likely to have collisions.
If what you mean is adding tests in Python Integration test, unfortunately, it is difficult. Because List
or Struct
is not hashable in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> schema = StructType([
... StructField('name', StringType(), True),
... StructField('properties', MapType(ArrayType(StringType()),StringType()),True)])
>>> dataDictionary = [
... ('James',{['hair', 'hair']:'black',['eye', 'eye']:'brown'})]
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
TypeError: unhashable type: 'list'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @revans2, is there any workaround way that can generate random MapType
data in IT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @revans2, I find it is a little difficult to update the MapGen
to support the nested keys because currently the MapGen
uses dict
however, list
and struct
are unhashable in dict.
A workaround way is firstly generating lists of structs using ListGen
and StructGen
and then convert the data to map type. But I am not sure whether this is a good workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fine. Sorry I didn't respond sooner. I have been stuck on something else.
Signed-off-by: remzi [email protected]
This is a subtask of #5559.
The
GpuMaxConcat
could support array typed keys and struct typed keys.