[SPARK-25832][SQL] Remove newly added map related functions #22828

dongjoon-hyun · 2018-10-25T17:03:57Z

What changes were proposed in this pull request?

This PR aims to supersede #22821. The main author is @cloud-fan.

According to the discussion in the RC4 voting thread and SPARK-25829, Spark currently has very weird behavior when a map has duplicated keys. The newly added map-related functions in Apache Spark 2.4.0 make it worse. For instance, map_filter may return incorrect results, as in SPARK-25823.
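To make the duplicated-key problem concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the helper names (`lookup_first`, `to_dict_last_wins`, `filter_entries`) are hypothetical, and the assumption that lookup sees the first entry while collection keeps the last one is for illustration only. It shows how a filter over raw entries can return a pair that the collected view of the same map never contained.

```python
# Toy model only: a "map" stored as a list of (key, value) entries, so
# duplicated keys can coexist. This is NOT Spark's implementation.

def lookup_first(entries, key):
    """Return the value of the first entry whose key matches (linear scan)."""
    for k, v in entries:
        if k == key:
            return v
    return None


def to_dict_last_wins(entries):
    """Materialize entries into a dict; later duplicates overwrite earlier ones."""
    return dict(entries)


def filter_entries(entries, predicate):
    """Keep every raw entry whose (key, value) pair satisfies the predicate."""
    return [(k, v) for k, v in entries if predicate(k, v)]


# A map with the key 1 duplicated (assumed input for illustration).
entries = [(1, 2), (1, 3)]

first = lookup_first(entries, 1)        # sees the first entry
collected = to_dict_last_wins(entries)  # sees the last entry
filtered = to_dict_last_wins(
    filter_entries(entries, lambda k, v: v == 2)
)                                       # sees an entry hidden from `collected`
```

In this model the three views disagree about what key 1 maps to, which is the kind of inconsistency that makes entry-level functions risky to expose before map semantics are settled.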

Until we fully fix the map behavior, we should not expose these functions to users:

map_entries
map_filter
map_zip_with
transform_values
transform_keys

How was this patch tested?

Pass the Jenkins.

dongjoon-hyun · 2018-10-25T17:05:01Z

R/pkg/NAMESPACE

@@ -313,7 +313,6 @@ exportMethods("%<=>%",
              "lower",
              "lpad",
              "ltrim",
-              "map_entries",


Could you review this, @felixcheung ?

dongjoon-hyun · 2018-10-25T17:05:37Z

python/pyspark/sql/functions.py

-    sc = SparkContext._active_spark_context
-    return Column(sc._jvm.functions.map_entries(_to_java_column(col)))
-
-


Could you review this, @HyukjinKwon and @BryanCutler ?

dongjoon-hyun · 2018-10-25T17:06:13Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

    expression[ArrayFilter]("filter"),
    expression[ArrayExists]("exists"),
    expression[ArrayAggregate]("aggregate"),
-    expression[TransformValues]("transform_values"),
-    expression[TransformKeys]("transform_keys"),
-    expression[MapZipWith]("map_zip_with"),


Could you review this, @gatorsmile and @cloud-fan ?

dongjoon-hyun · 2018-10-25T17:10:07Z

sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

@@ -61,8 +61,6 @@
 */
 public final class UnsafeRow extends InternalRow implements Externalizable, KryoSerializable {

-  public static final int WORD_SIZE = 8;
-


This was added by the map_entries change.

dongjoon-hyun · 2018-10-25T17:11:08Z

To get Jenkins results faster, I created a standalone PR instead of opening a PR against #22821.

dongjoon-hyun · 2018-10-25T17:18:38Z

Oh, I'm closing this in favor of #22827.

SparkQA · 2018-10-25T21:22:51Z

Test build #98030 has finished for PR 22828 at commit 71d3b3c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan and others added 3 commits October 25, 2018 12:42

remove newly added map related functions from FunctionRegistry

7e919e3

fix tests

726fc30

Remove

71d3b3c

dongjoon-hyun changed the title ~~[SPARK-25832][SQL] remove newly added map related functions~~ [SPARK-25832][SQL] Remove newly added map related functions Oct 25, 2018

dongjoon-hyun closed this Oct 25, 2018

dongjoon-hyun deleted the SPARK-25832 branch October 25, 2018 18:39
