-
Notifications
You must be signed in to change notification settings - Fork 327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Ray] Implement gc for ray task executor context #3061
[Ray] Implement gc for ray task executor context #3061
Conversation
Co-authored-by: Liu Bao <[email protected]>
…into feat-ray-executor-cancel
Could you explain a bit about |
Currently, Ray executor holds all the object refs of a stage. This PR is to free the unused object refs as soon as possible to reduce the memory peak. We will update the description of this PR. |
total = len(subtask_graph) | ||
completed_subtasks = OrderedSet() | ||
|
||
deletion_seq = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the deletion_seq
, a mock _task_context
can be used for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reminder. I'll fix it.
|
||
@require_ray | ||
@pytest.mark.asyncio | ||
async def test_detail_context_gc(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case can be moved to mars/services/task/execution/ray/tests/test_ray_execution_backend.py because no deployment is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
may be deleted. | ||
""" | ||
i = 0 | ||
garbaged_subtasks = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
garbage
is a noun, please rename it to gc_subtasks
or collected_subtasks
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reminder.
): | ||
yield | ||
if pred in garbaged_subtasks: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking subtask is collected should be moved to the beginning of the loop for pred in subtask_graph.iter_predecessors(subtask)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks.
e139f12
to
3f89bd0
Compare
3f89bd0
to
b6e7b69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(cherry picked from commit 505361b)
…3116) Co-authored-by: bermaker <[email protected]>
What do these changes do?
Ray executor holds all the object refs of a stage which is unnecessary. This PR is to free the unused object refs as soon as possible to reduce the memory peak and object store stress.
This PR includes the following parts:
subtask_monitor_interval
forRayExecutionConfig
which can be set as:Related issue number
#2893
Check code requirements