-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5231][WebUI] History Server shows wrong job submission time. #4029
Conversation
Test build #25490 has finished for PR 4029 at commit
|
It looks like this failed due to a test compilation error:
To prevent these sorts of issues, you can run |
Thanks, I just did |
Test build #25516 has finished for PR 4029 at commit
|
Test build #25527 has finished for PR 4029 at commit
|
This is a nice patch, but I wonder whether there's a smaller fix that doesn't require changing SparkListener events; that will make it easier to backport that patch to So, I guess the approach here seems like the right fix. I'd guess we might be able to do a separate fix in branch-1.2 to use the first/last stage time approximations. I have a couple of comments on the code here, so I'll comment on those inline. |
@@ -58,6 +58,7 @@ case class SparkListenerTaskEnd( | |||
@DeveloperApi | |||
case class SparkListenerJobStart( | |||
jobId: Int, | |||
time: Option[Long], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is an option for backwards-compatibility reasons? We definitely know the time when posting this event to the listener bus, so I think the right approach is to have time just be a regular Long
and pass a dummy value (-1
) when replaying JSON that's missing that field.
…SparkListenerJobEnd as regular Long Added a test case for checking backward compatibility
@JoshRosen Thanks for your advises. I reflected your comment and added a test case. |
Test build #25569 has finished for PR 4029 at commit
|
Test build #25627 has finished for PR 4029 at commit
|
@sarutak @JoshRosen I would prefer to use the same solution for both master and 1.2 if possible, so if something goes wrong we only need to reason about one fix. Since we should not push this PR's current changes to branch-1.2 (it will break upgrades for 1.2.0 users), I would actually rather not fix this in branch-1.2 if we're not using the approximation solution. In general it would be good to minimize the complexity that arises from the divergence of behavior across branches. |
The changes here actually LGTM, so I will merge this in master. Thanks @sarutak! |
If the only backwards incompatibility is via pattern matching on listener events, can we maybe just make the new field a mutable var that's set outside of the constructor? This should avoid that incompatibility even if it's a little messy. Sent from my phone
|
@JoshRosen that could work. If we do decide to do it we must absolutely make sure that we don't break anything for 1.2.0 users, since compatibility in a patch release even for a developer API is expected by most users. |
@@ -32,6 +32,7 @@ import org.apache.spark.executor._ | |||
import org.apache.spark.scheduler._ | |||
import org.apache.spark.storage._ | |||
import org.apache.spark._ | |||
import org.apache.hadoop.hdfs.web.JsonUtil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this breaks Mapr3 builds. <hadoop.version>1.0.3-mapr-3.0.3</hadoop.version>
The version of mapr3 should be upgraded then to 1.0.3-mapr-3.1.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like this import is even used anywhere, so I think we can probably remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoshRosen you're right. Is this something you're looking after or do you want me to raise a PR? Should I make a JIRA? Its breaking my mapr3 build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file a JIRA, since that helps us to track and prioritize bugs. Feel free to open a pull request if removing this import fixes your build.
Simple PR to Remove unused import JsonUtil from from org.apache.spark.util.JsonProtocol.scala which fails builds with older versions of hadoop-core This import is unused. It was introduced in PR #4029 #4029 as a part of JIRA SPARK-5231 Author: nemccarthy <[email protected]> Closes #4320 from nemccarthy/master and squashes the following commits: 8e34a11 [nemccarthy] [SPARK-5543][WebUI] Remove unused import JsonUtil from from org.apache.spark.util.JsonProtocol.scala which fails builds with older versions of hadoop-core
History Server doesn't show collect job submission time.
It's because
JobProgressListener
updates job submission time every timeonJobStart
method is invoked fromReplayListenerBus
.