Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25832][SQL] Remove newly added map related functions #22828

Closed
wants to merge 3 commits into from
Closed

[SPARK-25832][SQL] Remove newly added map related functions #22828

wants to merge 3 commits into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Oct 25, 2018

What changes were proposed in this pull request?

This PR aims to supercede #22821 . The main author is @cloud-fan .

According to the discussion in RC4 voting thread and SPARK-25829, Spark current has a very weird behavior when we have duplicated keys in map. The newly added map-related functions in Apache Spark 2.4.0 make it worse. For instance, map_filter may return incorrect result like SPARK-25823

Before we entire fix the map behavior, we should not expose these functions to users.

  • map_entries
  • map_filter
  • map_zip_with
  • transform_values
  • transform_keys

How was this patch tested?

Pass the Jenkins.

@@ -313,7 +313,6 @@ exportMethods("%<=>%",
"lower",
"lpad",
"ltrim",
"map_entries",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you review this, @felixcheung ?

sc = SparkContext._active_spark_context
return Column(sc._jvm.functions.map_entries(_to_java_column(col)))


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you review this, @HyukjinKwon and @BryanCutler ?

expression[ArrayFilter]("filter"),
expression[ArrayExists]("exists"),
expression[ArrayAggregate]("aggregate"),
expression[TransformValues]("transform_values"),
expression[TransformKeys]("transform_keys"),
expression[MapZipWith]("map_zip_with"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you review this, @gatorsmile and @cloud-fan ?

@@ -61,8 +61,6 @@
*/
public final class UnsafeRow extends InternalRow implements Externalizable, KryoSerializable {

public static final int WORD_SIZE = 8;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is added by map_entries

@dongjoon-hyun
Copy link
Member Author

To run the Jenkins faster, I create a standalone PR instead of making a PR to #22821 .

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-25832][SQL] remove newly added map related functions [SPARK-25832][SQL] Remove newly added map related functions Oct 25, 2018
@dongjoon-hyun
Copy link
Member Author

Oh, I'm closing this in favor of #22827.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-25832 branch October 25, 2018 18:39
@SparkQA
Copy link

SparkQA commented Oct 25, 2018

Test build #98030 has finished for PR 22828 at commit 71d3b3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants