Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ray] Implement gc for ray task executor context #3061

Merged
merged 41 commits into from
May 30, 2022

Conversation

zhongchun
Copy link
Contributor

@zhongchun zhongchun commented May 23, 2022

What do these changes do?

Ray executor holds all the object refs of a stage which is unnecessary. This PR is to free the unused object refs as soon as possible to reduce the memory peak and object store stress.

This PR includes the following parts:

  • Implements Ray task executor context gc.
  • Optimize task progress implementation.
  • Add exception and error handling
  • Add an option subtask_monitor_interval for RayExecutionConfig which can be set as:
task:
  execution_config:
    backend: ray
    ray:
      subtask_monitor_interval: 1.0

Related issue number

#2893

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@qinxuye qinxuye added the type: enhancement request label May 24, 2022
@qinxuye
Copy link
Collaborator

qinxuye commented May 25, 2022

Could you explain a bit about ray task executor context gc, I am not aware of what problem you are solving.

@fyrestone
Copy link
Contributor

Could you explain a bit about ray task executor context gc, I am not aware of what problem you are solving.

Currently, Ray executor holds all the object refs of a stage. This PR is to free the unused object refs as soon as possible to reduce the memory peak. We will update the description of this PR.

total = len(subtask_graph)
completed_subtasks = OrderedSet()

deletion_seq = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the deletion_seq, a mock _task_context can be used for testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your reminder. I'll fix it.


@require_ray
@pytest.mark.asyncio
async def test_detail_context_gc():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case can be moved to mars/services/task/execution/ray/tests/test_ray_execution_backend.py because no deployment is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

may be deleted.
"""
i = 0
garbaged_subtasks = set()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

garbage is a noun, please rename it to gc_subtasks or collected_subtasks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your reminder.

):
yield
if pred in garbaged_subtasks:
continue
Copy link
Contributor

@fyrestone fyrestone May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking subtask is collected should be moved to the beginning of the loop for pred in subtask_graph.iter_predecessors(subtask).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks.

@zhongchun zhongchun force-pushed the feat-ray-executor-context-gc branch from e139f12 to 3f89bd0 Compare May 27, 2022 06:49
@zhongchun zhongchun force-pushed the feat-ray-executor-context-gc branch from 3f89bd0 to b6e7b69 Compare May 27, 2022 09:13
@qinxuye qinxuye changed the title [Ray] Implement ray task executor context gc [Ray] Implement gc for ray task executor context May 27, 2022
Copy link
Contributor

@fyrestone fyrestone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@qinxuye qinxuye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@qinxuye qinxuye merged commit 505361b into mars-project:master May 30, 2022
qinxuye pushed a commit to qinxuye/mars that referenced this pull request Jun 6, 2022
wjsi pushed a commit that referenced this pull request Jun 6, 2022
@wjsi wjsi added backported already PR has been backported and removed to be backported Indicate that the PR need to be backported to stable branch labels Jun 6, 2022
This was referenced Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants