-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][SPARK-29221][SQL] LocalTableScanExec: handle the case where executors are accessing "null" rows #25913
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,7 +37,7 @@ case class LocalTableScanExec( | |
"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows")) | ||
|
||
@transient private lazy val unsafeRows: Array[InternalRow] = { | ||
if (rows.isEmpty) { | ||
if (rows == null || rows.isEmpty) { | ||
Array.empty | ||
} else { | ||
val proj = UnsafeProjection.create(output, output) | ||
|
@@ -59,7 +59,9 @@ case class LocalTableScanExec( | |
} | ||
|
||
override protected def stringArgs: Iterator[Any] = { | ||
if (rows.isEmpty) { | ||
if (rows == null) { | ||
Iterator("<unknown>", output) | ||
} else if (rows.isEmpty) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it ok to return different values in a driver and executors? How about computing this value in a driver then passing into executors? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The value except |
||
Iterator("<empty>", output) | ||
} else { | ||
Iterator(output) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot remove
@transient
instead of this?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding transient is to avoid serializing and copying whole rows, so I guess it's preferred one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related commits are here:
256358f
f70f46d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I'm a bit worried that this value could be different between a driver and executors. Which code touches this variable in executors? What's the stacktrace for
unsafeRows
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, that's actually a kind of defensive programming, as it would bring NPE if there's no check for null. I haven't seen any actual stack trace to refer to
unsafeRows
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think
unsafeRows
is not expected to be called in executor sides, so we need to checkassert(rows != null)
instead of the current approach. If it fails then, we need to fix some code not to call this variable in executors.