-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4109][CORE] Correctly deserialize Task.stageId #2971
Conversation
Can one of the admins verify this patch? |
@@ -89,13 +89,13 @@ private[spark] object ResultTask { | |||
* input RDD's partitions). | |||
*/ | |||
private[spark] class ResultTask[T, U]( | |||
stageId: Int, | |||
_stageId: Int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question, does this need to be renamed? it's just a constructor parameter. Later the assignment to stageId
is already setting the parent value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't actually need to rename this. scala will handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renaming is better, otherwise simply by accessing stageId you could upgrade it to a field accidentally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scala is smart enough to not to do that actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually hold on - I tested it again and the compiler is not doing the right thing. Let me verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at here: https://gist.github.com/rxin/6be132f46b72c27d8f89
I think if the parent constructor declares the field as val or var, then it is fine. Otherwise it is not. So technically in this case it should be fine to not rename, but it is better to rename it explicitly just to avoid the compiler behavior change (since this might be a corner case) in the future.
add to whitelist |
Test build #22370 has started for PR 2971 at commit
|
Test build #22370 has finished for PR 2971 at commit
|
Test FAILed. |
@@ -128,7 +128,7 @@ private[spark] class ShuffleMapTask( | |||
} | |||
|
|||
override def readExternal(in: ObjectInput) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as you're modifying this code, mind tossing a Utils.tryOrIOException
here so that any errors that occur here are reported properly? See #2932 for explanation / context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I missed this case in my PR.
Jenkins, retest this please. |
Test build #22689 has started for PR 2971 at commit
|
Test build #22689 has finished for PR 2971 at commit
|
Test FAILed. |
Jenkins, retest this please. |
Test build #22702 has started for PR 2971 at commit
|
Test build #22702 has finished for PR 2971 at commit
|
Test FAILed. |
Jenkins, retest this please. |
Test build #22740 has started for PR 2971 at commit
|
Test build #22740 has finished for PR 2971 at commit
|
Test FAILed. |
Jenkins, retest this please. |
Test build #22751 has started for PR 2971 at commit
|
Test build #22751 has finished for PR 2971 at commit
|
Test FAILed. |
cc @marmbrus @liancheng any idea why the unit test for Python is failing? Are there some unit tests testing the stageId? |
Hmmm, not sure what is going on here, especially since other SQL PRs seem to be passing at the moment. @davies any idea? |
@marmbrus This PR is targeted for 1.0, but PySpark SQL 1.0 can not passed the tests in Python 2.6 (Pyrolite can not deserialize of array.array from Python 2.6). We should change the default Python to 2.7 for testing spark 1.0 branch. It's not related to this PR. |
Ok I'm going to merge this and patch it for master as well. |
The two subclasses of Task, ShuffleMapTask and ResultTask, do not correctly deserialize stageId. Therefore, the accessing of TaskContext.stageId always returns zero value to the user. Author: luluorta <[email protected]> Closes #2971 from luluorta/fix-task-ser and squashes the following commits: ff35ee6 [luluorta] correctly deserialize Task.stageId
Actually this is only relevant for the old branches. In master we no longer do special serialization. |
The two subclasses of Task, ShuffleMapTask and ResultTask, do not correctly deserialize stageId. Therefore, the accessing of TaskContext.stageId always returns zero value to the user.