[SPARK-4598] use pagination to show tasktable #3456
Can one of the admins verify this patch?
Can you profile what is using so much memory? Pagination will break all of our sorting capabilities in the UI, so I'd like to avoid doing that as long as possible.
Patrick's point is that you can no longer simply sort on the client side in the browser, but must reload the page to sort. I think it's a fair question still -- why would 200K smallish objects run out of memory? It might be worth a very quick heap dump to understand what takes all of that memory. There may be a quite simple solution to reduce it.
Sorry, I still don't see why this doesn't break sorting. Let's say I want to answer "which task has the longest GC time in this page?" If there is pagination, there is no way to globally sort this unless we push ordering logic into the server-side rendering of the page (which this patch doesn't do). With this patch I can only sort locally within the sub-range defined by the page boundaries. The second thing is, it would be good to run a heap dump and specifically see why we OOM when rendering a large page like this. We already have to keep around O(tasks) state in memory, so I don't see why rendering should require excessively more memory.
I don't think it's a good idea to lose the ability to sort tasks globally by anything other than launch time.
I commented over on JIRA, but just to recap here: the high memory usage is due to the |
val tasks = stageData.taskData.values.toSeq.sortBy(_.taskInfo.launchTime)
From the code above, we can see that "tasks" sorts all tasks of the stage, and "showTasks" is a sub-range of "tasks". So "showTasks" is taken from the globally sorted list, not sorted only within its sub-range. As JoshRosen says, the large NodeSeq takes most of the memory, so we just need to reduce the number of HTML nodes. With pagination, we can reduce the node count to a low number. So I still think pagination is a reasonable solution.
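To make the disagreement concrete, here is a minimal sketch, not the patch's actual code. `TaskRow`, `TaskPaging`, `pageOf`, and the sample values are hypothetical stand-ins for the real Spark UI classes. It shows that a page cut from the launch-time ordering only ever contains the earliest-launched tasks, so re-sorting that page in the browser cannot find the task with the longest GC time overall, whereas sorting by the requested key on the server before slicing can.

```scala
// Hypothetical stand-in for one row of the task table; the real UI data has many more fields.
final case class TaskRow(id: Long, launchTime: Long, gcTime: Long)

object TaskPaging {
  // Sort the FULL task list by the requested key first, then cut out one page.
  // `page` is 1-based; a page past the end yields an empty Seq.
  def pageOf[K: Ordering](tasks: Seq[TaskRow], page: Int, pageSize: Int)(key: TaskRow => K): Seq[TaskRow] = {
    require(page >= 1 && pageSize >= 1, "page and pageSize must be positive")
    val from = (page - 1) * pageSize
    tasks.sortBy(key).slice(from, from + pageSize)
  }

  // Made-up sample data: the task with the longest GC time (id 3) launches late.
  val sample: Seq[TaskRow] = Seq(
    TaskRow(1, launchTime = 10, gcTime = 5),
    TaskRow(2, launchTime = 20, gcTime = 1),
    TaskRow(3, launchTime = 30, gcTime = 90),
    TaskRow(4, launchTime = 40, gcTime = 7)
  )

  def main(args: Array[String]): Unit = {
    // Page 1 in launch-time order: contains tasks 1 and 2 only.
    val firstByLaunch = pageOf(sample, page = 1, pageSize = 2)(_.launchTime)
    // Re-sorting that page locally by GC time can at best surface task 1.
    println(s"longest GC within the page: task ${firstByLaunch.maxBy(_.gcTime).id}")

    // Ordering by descending GC time before paging puts task 3 on page 1.
    val firstByGc = pageOf(sample, page = 1, pageSize = 2)(t => -t.gcTime)
    println(s"longest GC globally: task ${firstByGc.head.id}")
  }
}
```

Under this sketch, keeping global sorting with pagination would mean the sort column travels with the page request and the server re-renders the page on every sort, which is the reload cost raised earlier in the thread.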
@XuTingjun currently the UI supports sorting globally by any field. This would break with the current patch.
I am sorry, I did not take this into consideration. On a related note, I think the application table in the HistoryServer web UI also doesn't support sorting globally by any field. Is that a bug?
Are you sure it doesn't support this? It is supported by clicking the headers in the rendered table.
It supports clicking the headers in the rendered table, but that only orders the applications on one page, not globally.
Let's close this issue. This breaks global sorting, which means it can't be merged.
When an application has too many tasks, rendering the task table with all of them costs a lot of memory. With pagination, the task table shows only a subset of the tasks at a time, which reduces memory usage.