-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage. #4043
SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage. #4043
Conversation
Test build #25544 has started for PR 4043 at commit
|
d335f8f
to
f6dc584
Compare
Test build #25545 has started for PR 4043 at commit
|
Test build #25544 has finished for PR 4043 at commit
|
Test PASSed. |
Test build #25545 has finished for PR 4043 at commit
|
Test PASSed. |
lgtm. I was going to suggest that pending stages should be sorted with oldest submission time first, not reversed ... but I guess we want the completed stages sorted with oldest last, and probably makes sense to keep those tables consistent with each other. |
Hi Imran, Thanks for taking a look. @pwendell please take a look ? |
I'm not sure this can be merged as-is. The state clean-up here is based on the assumption that every stage that is pending will at some later time be submitted. Is that definitely true? What happens if a stage is aborted or failed, won't its dependent stages remain pending indefinitely for the entire history of the application? I think at a minimum you need to make sure you remove all associated stages with a given job if the job ends. There may also be other corner cases I'm not thinking of. A second assumption this makes is that the job start event will always occur before the stage is submitted. Is that definitely true? It would be good to dig through the reporting API and make sure that is a safe assumption. I think the guarantees of the listener around event ordering are pretty minimal. Finally, what about putting pending stages after active stages in the display page? My concern is you may have dozens or more pending stages in production jobs, I think people will be frustrated if they open the UI and they have to scroll way down every time they need to e.g. refresh the page. It's a big odd to go (active -> pending -> completed), but I think better for usability. |
Thanks Patrick !
If I am understanding this correctly, "All stages for a particular job Id on jobEnd event - should be cleaned up". I am certainly missing something, is this not already achieved by this ?
This can lead to an anomaly that a stage will appear in progress in both active stages section and pending section. But it will still be obvious that - that stage is in progress. Is this safe to ignore, or is it a concern that should be addressed at the cost of minute complexity ?
This is thoughtful, I am going to incorporate this asap. [EDIT] |
Ah I see - so on the first point, the issue may be covered by the code you referenced. For some reason the diff originally rendered in a way where I didn't notice that. For the second issue. My concern is that this line could throw an exception: I.e. if you had a stage submitted, but you never received the job started event. It could be that you can assume it, but it would be good to just note the ordering assumptions around the listener events on that line, so that if it gets changed later and someone gets a no-such-elemnt-exception. It's possible to debug it. |
Good point, in map in scala remove returns an Option and does not throw exception. However in lists what you said holds. See https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/mutable/LinkedHashMap.scala#L76 |
Test build #25644 has started for PR 4043 at commit
|
@ScrapCodes Rather than change this to +LinkedHashMap+ can you just check if it contains it before removing it? It might not be obvious to developers that +remove+ has that specific behavior. I think it's better to just be explicit. |
LinkedHashMap was introduced to maintain the insertion order of stagesIds(Hasmap will lead to arbitrary order), all map(s) in scala has return an |
ah I see - if the existing remove call is safe, then I think it's fine. |
val now = System.currentTimeMillis | ||
|
||
val activeStagesTable = | ||
new StageTableBase(activeStages.sortBy(_.submissionTime).reverse, | ||
parent.basePath, parent.listener, isFairScheduler = parent.isFairScheduler, | ||
killEnabled = parent.killEnabled) | ||
val pendingStagesTable = | ||
new StageTableBase(pendingStages.sortBy(_.submissionTime), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwendell Can I use Sorting.stableSort here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just keep the sorting the same as it was? I think the submission time is unlikely to be tied in most cases. It would be good to just make it consistent with the existing ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about usability, like the most interesting stage which is next to be executed will appear at last. Anyway will change it.
Test build #25644 has finished for PR 4043 at commit
|
Test PASSed. |
@@ -37,12 +37,18 @@ private[ui] class AllStagesPage(parent: StagesTab) extends WebUIPage("") { | |||
val numCompletedStages = listener.numCompletedStages | |||
val failedStages = listener.failedStages.reverse.toSeq | |||
val numFailedStages = listener.numFailedStages | |||
val pendingStages = listener.pendingStages.values.toSeq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this up with activeStages
to make the declarations grouped properly?
Test build #25656 has started for PR 4043 at commit
|
Test build #25656 has finished for PR 4043 at commit
|
Test PASSed. |
@ScrapCodes mind bringing up to date? The current form LGTM |
… on AllStagesPage.
…pleted->failed. And changed pending stages to not reverse sort.
15cdda4
to
3b11803
Compare
Test build #25748 has started for PR 4043 at commit
|
@pwendell - patch updated to latest master. |
Test build #25748 has finished for PR 4043 at commit
|
Test PASSed. |
This is a first step towards having time remaining estimates for queued and running jobs. See SPARK-5216