[SPARK-25832][SQL] Remove newly added map related functions #22828
Conversation
```diff
@@ -313,7 +313,6 @@ exportMethods("%<=>%",
               "lower",
               "lpad",
               "ltrim",
-              "map_entries",
```
Could you review this, @felixcheung ?
```python
sc = SparkContext._active_spark_context
return Column(sc._jvm.functions.map_entries(_to_java_column(col)))
```
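For context, the removed Python wrapper above is a thin shim over the JVM-side `functions.map_entries`. Below is a minimal Scala sketch of the equivalent DataFrame call, assuming a build where the function is still exposed; the object and column names are illustrative and not part of this PR.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, map_entries}

// Illustrative only: the JVM-side map_entries call that the removed
// Python wrapper delegates to, assuming it is still exposed in this build.
object MapEntriesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-entries-sketch")
      .getOrCreate()
    import spark.implicits._

    // Single map-typed column.
    val df = Seq(Map("a" -> 1, "b" -> 2)).toDF("m")

    // map_entries converts a map column into an array of key/value structs.
    df.select(map_entries(col("m")).as("entries")).show(truncate = false)

    spark.stop()
  }
}
```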
Could you review this, @HyukjinKwon and @BryanCutler ?
```scala
expression[ArrayFilter]("filter"),
expression[ArrayExists]("exists"),
expression[ArrayAggregate]("aggregate"),
expression[TransformValues]("transform_values"),
expression[TransformKeys]("transform_keys"),
expression[MapZipWith]("map_zip_with"),
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you review this, @gatorsmile and @cloud-fan ?
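For reference, the `FunctionRegistry` entries above register these expressions as SQL functions, so they are reached through SQL text rather than the Scala DSL. A minimal sketch, assuming a build where the registrations are still in place; the object name and literals are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: invoking the SQL-registered higher-order functions.
object HigherOrderFunctionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hof-sketch")
      .getOrCreate()

    // transform_keys rewrites every key with the given lambda.
    spark.sql("SELECT transform_keys(map(1, 10, 2, 20), (k, v) -> k + 1) AS shifted")
      .show(false)

    // map_zip_with merges two maps by key, combining the values per key.
    spark.sql("SELECT map_zip_with(map(1, 'a'), map(1, 'b'), (k, v1, v2) -> concat(v1, v2)) AS zipped")
      .show(false)

    spark.stop()
  }
}
```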
```diff
@@ -61,8 +61,6 @@
  */
 public final class UnsafeRow extends InternalRow implements Externalizable, KryoSerializable {

-  public static final int WORD_SIZE = 8;
-
```
This was added by map_entries.
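For readers unfamiliar with the constant: a hedged, non-Spark sketch of why an 8-byte `WORD_SIZE` is useful in `UnsafeRow`-style layouts, where variable-length regions are rounded up so offsets stay word aligned. The object and method names below are purely illustrative.

```scala
// Illustrative only, not Spark code: rounding byte lengths up to a multiple
// of the word size keeps every offset in the row buffer 8-byte aligned.
object WordAlignmentSketch {
  val WordSize = 8 // mirrors the removed UnsafeRow.WORD_SIZE constant

  // Round a byte count up to the next multiple of the word size.
  def roundUpToWord(numBytes: Int): Int =
    ((numBytes + WordSize - 1) / WordSize) * WordSize

  def main(args: Array[String]): Unit =
    println(roundUpToWord(13)) // a 13-byte payload occupies 16 bytes (two words)
}
```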
To run Jenkins faster, I created a standalone PR instead of making a PR against #22821.
Oh, I'm closing this in favor of #22827.
Test build #98030 has finished for PR 22828 at commit
What changes were proposed in this pull request?
This PR aims to supersede #22821. The main author is @cloud-fan.

According to the discussion in the RC4 voting thread and SPARK-25829, Spark currently has very weird behavior when a map contains duplicate keys, and the map-related functions newly added in Apache Spark 2.4.0 make it worse. For instance, `map_filter` may return an incorrect result, as reported in SPARK-25823. Until the map behavior is fully fixed, we should not expose these functions to users.
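To make the duplicate-key problem concrete, here is a minimal sketch that builds a map literal containing the same key twice and runs a lookup plus `map_filter` over it. The exact output depends on the Spark build, which is precisely the inconsistency being described; the object name and queries are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: map(1, 2, 1, 3) contains the key 1 with two values.
// Lookups and map_filter may disagree on which entry "wins", which is the
// inconsistency described in SPARK-25823 / SPARK-25829.
object DuplicateKeySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dup-key-sketch")
      .getOrCreate()

    spark.sql("SELECT map(1, 2, 1, 3)[1] AS lookup").show()
    spark.sql("SELECT map_filter(map(1, 2, 1, 3), (k, v) -> v > 2) AS filtered").show(false)

    spark.stop()
  }
}
```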
How was this patch tested?
Pass the Jenkins tests.