[SPARK-17849][SQL] Fix NPE problem when using grouping sets
## What changes were proposed in this pull request?

Prior to this PR, the following code would cause a NullPointerException (NPE):
`case class point(a:String, b:String, c:String, d: Int)`

`val data = Seq(
point("1","2","3", 1),
point("4","5","6", 1),
point("7","8","9", 1)
)`
`sc.parallelize(data).toDF().registerTempTable("table")`
`spark.sql("select a, b, c, count(d) from table group by a, b, c GROUPING SETS ((a)) ").show()`

The cause is that when the behavior of grouping_id() was changed in #10677, some dependent code that should have been updated along with it was left unchanged.

Taking the code above as an example: before #10677, the bit mask for the set "(a)" was `001`; after #10677 it became `011`. However, `nonNullBitmask` was not updated accordingly.
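
To make the mismatch concrete, here is a minimal sketch of the post-#10677 encoding and of what the old nullability check reports for it. The object and helper names (`GroupingSetBitmask`, `bitmask`, `oldNullability`) are hypothetical and are not part of Spark; only the `reduce(_ & _)` check is taken from the pre-fix Analyzer code. Under these assumptions, for the set "(a)" the old check marks only `c` as nullable, even though `b` is also absent from the set and comes out as NULL.

```scala
// Illustrative sketch only; these helpers are hypothetical and not Spark APIs.
object GroupingSetBitmask {
  // Encoding after #10677 (as described in this PR): the leftmost bit
  // corresponds to the first GROUP BY expression, and a 1 bit means the
  // expression is absent from the grouping set.
  def bitmask(groupBy: Seq[String], set: Set[String]): Int =
    groupBy.foldLeft(0) { (acc, col) =>
      (acc << 1) | (if (set.contains(col)) 0 else 1)
    }

  // The pre-fix check, reproduced to show what it computes on the new encoding.
  // Element idx is the nullability assigned to the idx-th GROUP BY expression.
  def oldNullability(bitmasks: Seq[Int], n: Int): Seq[Boolean] = {
    val nonNullBitmask = bitmasks.reduce(_ & _)
    (0 until n).map(idx => (nonNullBitmask & 1 << idx) == 0)
  }

  def main(args: Array[String]): Unit = {
    val groupBy = Seq("a", "b", "c")
    val mask = bitmask(groupBy, Set("a")) // binary 011
    println(s"bitmask for (a): $mask")
    // Reports a = false, b = false, c = true, so b is wrongly non-nullable.
    println(oldNullability(Seq(mask), groupBy.length))
  }
}
```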

This PR fixes the problem.
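
As a rough sketch of the corrected logic, the snippet below isolates the expression this commit adds to the Analyzer and runs it on plain integers. The wrapper object `FixedNullability` and the method `nullable` are hypothetical names introduced for illustration; the bitmasks are assumed to use the post-#10677 encoding (leftmost bit is the first GROUP BY expression, 1 means absent).

```scala
// Illustrative sketch only; the wrapper object and method are hypothetical.
object FixedNullability {
  // OR-ing all bitmasks yields a 1 for every expression that is absent from
  // at least one grouping set; exactly those attributes must be nullable.
  def nullable(bitmasks: Seq[Int], attrLength: Int): Seq[Boolean] = {
    val nullBitmask = bitmasks.reduce(_ | _)
    (0 until attrLength).map { idx =>
      // Shift so that the bit for expression idx becomes the lowest bit.
      ((nullBitmask >> (attrLength - idx - 1)) & 1) == 1
    }
  }

  def main(args: Array[String]): Unit = {
    // GROUP BY a, b, c GROUPING SETS ((a)): one bitmask, binary 011.
    println(nullable(Seq(3), 3))
    // GROUPING SETS ((a), (c)): bitmasks 011 and 110, so every column
    // is absent from at least one set.
    println(nullable(Seq(3, 6), 3))
  }
}
```

With the single set "(a)", only `a` stays non-nullable, which agrees with the expected output of query 2 in the new test file, where `b` and `c` are NULL.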
## How was this patch tested?

Added integration tests.

Author: wangyang <[email protected]>

Closes #15416 from yangw1234/groupingid.

(cherry picked from commit fb0d608)
Signed-off-by: Herman van Hovell <[email protected]>
wangyang authored and hvanhovell committed Nov 5, 2016
1 parent 3071d87 commit 446d72c
Showing 3 changed files with 66 additions and 2 deletions.
@@ -299,10 +299,15 @@ class Analyzer(
  case other => Alias(other, other.toString)()
  }

- val nonNullBitmask = x.bitmasks.reduce(_ & _)
+ // The rightmost bit in the bitmasks corresponds to the last expression in groupByAliases
+ // with 0 indicating this expression is in the grouping set. The following line of code
+ // calculates the bitmask representing the expressions that absent in at least one grouping
+ // set (indicated by 1).
+ val nullBitmask = x.bitmasks.reduce(_ | _)

+ val attrLength = groupByAliases.length
  val expandedAttributes = groupByAliases.zipWithIndex.map { case (a, idx) =>
- a.toAttribute.withNullability((nonNullBitmask & 1 << idx) == 0)
+ a.toAttribute.withNullability(((nullBitmask >> (attrLength - idx - 1)) & 1) == 1)
  }

  val expand = Expand(x.bitmasks, groupByAliases, expandedAttributes, gid, x.child)
17 changes: 17 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
@@ -0,0 +1,17 @@
CREATE TEMPORARY VIEW grouping AS SELECT * FROM VALUES
("1", "2", "3", 1),
("4", "5", "6", 1),
("7", "8", "9", 1)
as grouping(a, b, c, d);

-- SPARK-17849: grouping set throws NPE #1
SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS (());

-- SPARK-17849: grouping set throws NPE #2
SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((a));

-- SPARK-17849: grouping set throws NPE #3
SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((c));



42 changes: 42 additions & 0 deletions sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
@@ -0,0 +1,42 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 4


-- !query 0
CREATE TEMPORARY VIEW grouping AS SELECT * FROM VALUES
("1", "2", "3", 1),
("4", "5", "6", 1),
("7", "8", "9", 1)
as grouping(a, b, c, d)
-- !query 0 schema
struct<>
-- !query 0 output



-- !query 1
SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS (())
-- !query 1 schema
struct<a:string,b:string,c:string,count(d):bigint>
-- !query 1 output
NULL NULL NULL 3


-- !query 2
SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((a))
-- !query 2 schema
struct<a:string,b:string,c:string,count(d):bigint>
-- !query 2 output
1 NULL NULL 1
4 NULL NULL 1
7 NULL NULL 1


-- !query 3
SELECT a, b, c, count(d) FROM grouping GROUP BY a, b, c GROUPING SETS ((c))
-- !query 3 schema
struct<a:string,b:string,c:string,count(d):bigint>
-- !query 3 output
NULL NULL 3 1
NULL NULL 6 1
NULL NULL 9 1
