Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4598] use pagination to show tasktable #3456

Closed
wants to merge 1 commit into from
Closed

[SPARK-4598] use pagination to show tasktable #3456

wants to merge 1 commit into from

Conversation

XuTingjun
Copy link
Contributor

When the application has too many tasks, tasktable with all tasks costs a lot of memory. If using pagination, every time tasktable shows some tasks. So this can reduce the memory usage

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@pwendell
Copy link
Contributor

Can you profile what is causing a lot of memory? Pagination will break all of our sorting capabilities in the UI, so I'd like to avoid doing that as long as possilbe.

@XuTingjun
Copy link
Contributor Author

First, I found showing tasks on web causing a lot of memory. No pagination, 50000 tasks will occurs OOM. With pagination, 200000 tasks can successfully show in web. I have tested it.
Second, sotring is before pagination, so pagination will not break all of our sorting capabilities in the UI.
stage

@srowen
Copy link
Member

srowen commented Nov 28, 2014

Patrick's point is that you can no longer simply sort on the client browser side, but must reload the page to sort. I think it's a fair question still -- why would 200K smallish objects run out of memory? it might be worth a very quick heap dump to understand what takes all of that memory. There may be a quite simple solution to reduce it.

@pwendell
Copy link
Contributor

Sorry I still don't see why this doesn't break sorting. Let's say I want to answer "which task as the longest GC time in this page?" If there is pagination there is no way to globally sort this unless we push ordering logic into the server side rendering of the page (which this patch doesn't do). With this patch I can only sort locally within the sub-range defined by the page boundaries.

The second thing is, it would be good to run a heap dump and specifically see why we OOM when rendering a large page like this. We already have to keep around O(tasks) state in memory, so I don't see why rendering should require excessively more memory.

@tsudukim
Copy link
Contributor

I don't think it's a good idea that we lose the way to sort tasks globally by other than launch time.
About OOM, I wrote something to the JIRA ticket.

@JoshRosen
Copy link
Contributor

I commented over on JIRA, but just to recap here: the high memory usage is due to the scala.xml NodeSeq corresponding to the Stage page and the string containing the generated markup. The stage page, uncompressed, was over 75MB for a stage with tens of thousands of tasks and the NodeSeq was ~200MB.

@XuTingjun
Copy link
Contributor Author

val tasks = stageData.taskData.values.toSeq.sortBy(_.taskInfo.launchTime)
val showTasks = tasks.slice(actualFirst, Math.min(actualFirst + pageSize, tasks.size))

From the code above, we can see "tasks" will sort all tasks of the application, and the "showTasks" is the sub-range of "tasks". So the "showTasks" has sorted all tasks not sub-range.

As JoshRosen says, the large NodeSql takes most memory, so we just need to reduce the node number of html. With pagination, we can reduce the node to a low number. So I still think pagination is a reasonable solution.

@pwendell
Copy link
Contributor

@XuTingjun currently the UI supports sorting globally by any field. This would break with the current patch.

@XuTingjun
Copy link
Contributor Author

I am sorry I did not take this into consideration. Of this, I think the application table in HistoryServer web also doesn't support sorting globally by any field. Is it a bug?

@pwendell
Copy link
Contributor

pwendell commented Dec 1, 2014

Are you sure it doesn't support this? It is supported by clicking headers in rendered table.

@XuTingjun
Copy link
Contributor Author

It supports clicking headers in rendered table, but just order one page applications, not global.
I think it's the feature of HTML "th" label.

@pwendell
Copy link
Contributor

Let's close this issue. This breaks global pagination which means it can't be merged.

@asfgit asfgit closed this in 534f24b Dec 27, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants