-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23272][SQL] add calendar interval type support to ColumnVector #20438
Conversation
@@ -195,6 +196,7 @@ | |||
* struct field. | |||
*/ | |||
public final ColumnarRow getStruct(int rowId) { | |||
if (isNullAt(rowId)) return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
int month = data.getChild(0).getInt(offset + ordinal); | ||
long microseconds = data.getChild(1).getLong(offset + ordinal); | ||
return new CalendarInterval(month, microseconds); | ||
return data.getInterval(offset + ordinal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should insert if (data.isNullAt(offset + ordinal)) return null;
to be consistent with other ColumnarXxx
s?
Or I guess we can remove these null-checks from all other ColumnarXxx
s and leave it to ColumnVector.getInterval()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I'm going to require ColumnVector.getXXX
to return null if the value is null, but I'll do it in another PR, to update all the documents and define the behavior of batched getXXX
methods.
@@ -87,7 +87,7 @@ public static CalendarInterval fromString(String s) { | |||
} | |||
} | |||
|
|||
public static long toLongWithRange(String fieldName, | |||
private static long toLongWithRange(String fieldName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?! It's much harder (if at all possible) to test private
methods (been bitten few times this week and remember the pain).
* | ||
* In Spark, calendar interval type value is basically an integer value representing the number of | ||
* months in this interval, and a long value representing the number of microseconds in this | ||
* interval. A interval type vector is same as a struct type vector with 2 fields: `months` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"An interval" + "is the same as"
* `microseconds`. | ||
* | ||
* To support interval type, implementations must implement {@link #getChild(int)} and define 2 | ||
* child vectors: the first child vector is a int type vector, containing all the month values of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is an int type vector
*/ | ||
public final CalendarInterval getInterval(int rowId) { | ||
if (isNullAt(rowId)) return null; | ||
final int months = getChild(0).getInt(rowId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of final
keyword here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's from the previous code, probably it tries to make the compiler happy and run the code faster.
final long microseconds = getChild(1).getLong(rowId); | ||
return new CalendarInterval(months, microseconds); | ||
} | ||
|
||
/** | ||
* Returns the ordinal's child column vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@return [[ColumnVector]] at the ordinal
* interval. An interval type vector is the same as a struct type vector with 2 fields: `months` | ||
* and `microseconds`. | ||
* | ||
* To support interval type, implementations must implement {@link #getChild(int)} and define 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: interval type
-> calendar interval type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little annoying to type calendar interval type
all the time...
Test build #86819 has finished for PR 20438 at commit
|
Test build #86825 has finished for PR 20438 at commit
|
*/ | ||
public abstract ColumnVector getChild(int ordinal); | ||
protected abstract ColumnVector getChild(int ordinal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Now, it became protected
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ColumnVector
is public, could you add some description in PR description for this kind of visibility change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
LGTM. |
@@ -146,9 +146,7 @@ public UTF8String getUTF8String(int ordinal) { | |||
@Override | |||
public CalendarInterval getInterval(int ordinal) { | |||
if (columns[ordinal].isNullAt(rowId)) return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this null check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #20438 (comment) . In this PR I just fixed the returning null issue for getStruct
and getInterval
, because they are non-abstract. There should be a follow up to clearly document that ColumnVector.getBinary/getUTF8String/...
should return null if this slot is null. Then we can remove these null checks here. I appreciate it if you have time to take this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to prepare a PR tonight.
@@ -139,9 +139,7 @@ public UTF8String getUTF8String(int ordinal) { | |||
@Override | |||
public CalendarInterval getInterval(int ordinal) { | |||
if (data.getChild(ordinal).isNullAt(rowId)) return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this null check now?
thanks, merging to master/2.3! |
## What changes were proposed in this pull request? `ColumnVector` is aimed to support all the data types, but `CalendarIntervalType` is missing. Actually we do support interval type for inner fields, e.g. `ColumnarRow`, `ColumnarArray` both support interval type. It's weird if we don't support interval type at the top level. This PR adds the interval type support. This PR also makes `ColumnVector.getChild` protect. We need it public because `MutableColumnaRow.getInterval` needs it. Now the interval implementation is in `ColumnVector.getInterval`. ## How was this patch tested? a new test. Author: Wenchen Fan <[email protected]> Closes #20438 from cloud-fan/interval. (cherry picked from commit 695f714) Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
ColumnVector
is aimed to support all the data types, butCalendarIntervalType
is missing. Actually we do support interval type for inner fields, e.g.ColumnarRow
,ColumnarArray
both support interval type. It's weird if we don't support interval type at the top level.This PR adds the interval type support.
This PR also makes
ColumnVector.getChild
protect. We need it public becauseMutableColumnaRow.getInterval
needs it. Now the interval implementation is inColumnVector.getInterval
.How was this patch tested?
a new test.