-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23280][SQL] add map type support to ColumnVector #20450
Conversation
Test build #86866 has finished for PR 20450 at commit
|
We should also enable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for one comment.
column.putArray(0, 0, 1) | ||
column.putArray(1, 1, 2) | ||
column.putNull(2) | ||
column.putArray(3, 2, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
column.putArray(3, 3, 0)
?
Seems like line 670 is the same mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the key array is: 0, 1, 2, 3, 4, 5
the value array is: 0, 2, 4, 6, 8, 10
Note that, map [1->2, 2->4]
contributes 2 keys and values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so the offset of the next array should be 3
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see, you are right!
} | ||
} | ||
|
||
// Populate it with maps [0->0], [1->2, 2->4], [], [3->6, 4->8, 5->10] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Populate it with maps [0->0], [1->2, 2->4], null, [], [3->6, 4->8, 5->10]
?
Test build #86868 has finished for PR 20450 at commit
|
* To support map type, implementations must construct an {@link ColumnarMap} and return it in | ||
* this method. {@link ColumnarMap} requires a {@link ColumnVector} that stores the data of all | ||
* the keys of all the maps in this vector, and another {@link ColumnVector} that stores the data | ||
* of all the values of all the maps in this vector, and an offset and length which specifies the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: an offset and length
-> a pair of offset and length
? Or specifies
-> specify
?
Test build #86872 has finished for PR 20450 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
Test build #86883 has finished for PR 20450 at commit
|
retest this please |
@@ -530,7 +530,7 @@ public int putByteArray(int rowId, byte[] value, int offset, int length) { | |||
@Override | |||
protected void reserveInternal(int newCapacity) { | |||
int oldCapacity = (nulls == 0L) ? 0 : capacity; | |||
if (isArray()) { | |||
if (isArray() || type instanceof MapType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we may also have a method isMap()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be an overkill, isArray
needs to take care of many types, but isMap
we only accept one type: MapType.
* | ||
* To support map type, implementations must construct an {@link ColumnarMap} and return it in | ||
* this method. {@link ColumnarMap} requires a {@link ColumnVector} that stores the data of all | ||
* the keys of all the maps in this vector, and another {@link ColumnVector} that stores the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maps
-> map entries
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keys of map entries
sounds weird...
} | ||
|
||
@Override | ||
public int numElements() { return length; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numElements
or length
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a API from parent, we can't change it.
LGTM only some nits and naming issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* key array with a index and a value from the value array with the same index contribute to | ||
* an entry of this map type value. | ||
* | ||
* To support map type, implementations must construct an {@link ColumnarMap} and return it in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
construct an -> construct a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very minor, may not worth to wait for another QA round. Maybe we can fix it in your "return null" PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. LGTM.
LGTM |
Test build #86896 has finished for PR 20450 at commit
|
thanks, merging to master/2.3! |
## What changes were proposed in this pull request? Fill the last missing piece of `ColumnVector`: the map type support. The idea is similar to the array type support. A map is basically 2 arrays: keys and values. We ask the implementations to provide a key array, a value array, and an offset and length to specify the range of this map in the key/value array. In `WritableColumnVector`, we put the key array in first child vector, and value array in second child vector, and offsets and lengths in the current vector, which is very similar to how array type is implemented here. ## How was this patch tested? a new test Author: Wenchen Fan <[email protected]> Closes #20450 from cloud-fan/map. (cherry picked from commit 52e00f7) Signed-off-by: Wenchen Fan <[email protected]>
I found that we don't enable |
@viirya Thanks! but I'm working on it. I'll do it soon. |
@ueshin Ok. No problem. :) |
## What changes were proposed in this pull request? This is a follow-up of #20450 which broke lint-java checks. This pr fixes the lint-java issues. ``` [ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java:[20,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData. [ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java:[21,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData. [ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java:[22,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData. ``` ## How was this patch tested? Checked manually in my local environment. Author: Takuya UESHIN <[email protected]> Closes #20468 from ueshin/issues/SPARK-23280/fup1. (cherry picked from commit 8bb70b0) Signed-off-by: Takuya UESHIN <[email protected]>
## What changes were proposed in this pull request? This is a follow-up of apache#20450 which broke lint-java checks. This pr fixes the lint-java issues. ``` [ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java:[20,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData. [ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java:[21,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData. [ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java:[22,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData. ``` ## How was this patch tested? Checked manually in my local environment. Author: Takuya UESHIN <[email protected]> Closes apache#20468 from ueshin/issues/SPARK-23280/fup1.
## What changes were proposed in this pull request? This is a followup pr of #20450. We should've enabled `MutableColumnarRow.getMap()` as well. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <[email protected]> Closes #20471 from ueshin/issues/SPARK-23280/fup2. (cherry picked from commit 89e8d55) Signed-off-by: Takuya UESHIN <[email protected]>
## What changes were proposed in this pull request? This is a followup pr of apache#20450. We should've enabled `MutableColumnarRow.getMap()` as well. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <[email protected]> Closes apache#20471 from ueshin/issues/SPARK-23280/fup2.
What changes were proposed in this pull request?
Fill the last missing piece of
ColumnVector
: the map type support.The idea is similar to the array type support. A map is basically 2 arrays: keys and values. We ask the implementations to provide a key array, a value array, and an offset and length to specify the range of this map in the key/value array.
In
WritableColumnVector
, we put the key array in first child vector, and value array in second child vector, and offsets and lengths in the current vector, which is very similar to how array type is implemented here.How was this patch tested?
a new test