-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23272][SQL] add calendar interval type support to ColumnVector #20438
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,7 @@ | |
import org.apache.spark.sql.catalyst.util.MapData; | ||
import org.apache.spark.sql.types.DataType; | ||
import org.apache.spark.sql.types.Decimal; | ||
import org.apache.spark.unsafe.types.CalendarInterval; | ||
import org.apache.spark.unsafe.types.UTF8String; | ||
|
||
/** | ||
|
@@ -195,6 +196,7 @@ public double[] getDoubles(int rowId, int count) { | |
* struct field. | ||
*/ | ||
public final ColumnarRow getStruct(int rowId) { | ||
if (isNullAt(rowId)) return null; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch! |
||
return new ColumnarRow(this, rowId); | ||
} | ||
|
||
|
@@ -236,9 +238,29 @@ public MapData getMap(int ordinal) { | |
public abstract byte[] getBinary(int rowId); | ||
|
||
/** | ||
* Returns the ordinal's child column vector. | ||
* Returns the calendar interval type value for rowId. | ||
* | ||
* In Spark, calendar interval type value is basically an integer value representing the number of | ||
* months in this interval, and a long value representing the number of microseconds in this | ||
* interval. An interval type vector is the same as a struct type vector with 2 fields: `months` | ||
* and `microseconds`. | ||
* | ||
* To support interval type, implementations must implement {@link #getChild(int)} and define 2 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a little annoying to type |
||
* child vectors: the first child vector is an int type vector, containing all the month values of | ||
* all the interval values in this vector. The second child vector is a long type vector, | ||
* containing all the microsecond values of all the interval values in this vector. | ||
*/ | ||
public final CalendarInterval getInterval(int rowId) { | ||
if (isNullAt(rowId)) return null; | ||
final int months = getChild(0).getInt(rowId); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the purpose of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's from the previous code, probably it tries to make the compiler happy and run the code faster. |
||
final long microseconds = getChild(1).getLong(rowId); | ||
return new CalendarInterval(months, microseconds); | ||
} | ||
|
||
/** | ||
* @return child [[ColumnVector]] at the given ordinal. | ||
*/ | ||
public abstract ColumnVector getChild(int ordinal); | ||
protected abstract ColumnVector getChild(int ordinal); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, I see. Now, it became There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added |
||
|
||
/** | ||
* Data type for this column. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -135,9 +135,7 @@ public byte[] getBinary(int ordinal) { | |
|
||
@Override | ||
public CalendarInterval getInterval(int ordinal) { | ||
int month = data.getChild(0).getInt(offset + ordinal); | ||
long microseconds = data.getChild(1).getLong(offset + ordinal); | ||
return new CalendarInterval(month, microseconds); | ||
return data.getInterval(offset + ordinal); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should insert There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch! I'm going to require |
||
} | ||
|
||
@Override | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -139,9 +139,7 @@ public byte[] getBinary(int ordinal) { | |
@Override | ||
public CalendarInterval getInterval(int ordinal) { | ||
if (data.getChild(ordinal).isNullAt(rowId)) return null; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we remove this null check now? |
||
final int months = data.getChild(ordinal).getChild(0).getInt(rowId); | ||
final long microseconds = data.getChild(ordinal).getChild(1).getLong(rowId); | ||
return new CalendarInterval(months, microseconds); | ||
return data.getChild(ordinal).getInterval(rowId); | ||
} | ||
|
||
@Override | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this null check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #20438 (comment) . In this PR I just fixed the returning null issue for
getStruct
andgetInterval
, because they are non-abstract. There should be a follow up to clearly document thatColumnVector.getBinary/getUTF8String/...
should return null if this slot is null. Then we can remove these null checks here. I appreciate it if you have time to take this :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to prepare a PR tonight.