[Ray] Optimize ray executor submit subtask #3271

Merged

@fyrestone fyrestone commented Sep 27, 2022

What do these changes do?

The Ray executor uses a set intersection to determine whether a subtask's result chunks are stage results. This can be very slow when a stage has many chunks and subtasks, e.g. a stage with 100000 result chunks and 50000 subtasks. This PR reuses the information generated while constructing a subtask to avoid the set intersection.
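A minimal sketch of the idea, not the actual Mars code: `Subtask`, `stage_output_keys`, `build_subtask`, and the two `outputs_*` helpers are hypothetical names made up for illustration. The baseline builds a set per subtask and intersects it with the stage result keys at submit time. The alternative records which result chunks are stage results once, while the subtask is constructed, so submission only reads a stored list.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Subtask:
    # Hypothetical stand-in for a Mars subtask; not the real class.
    result_keys: List[str]
    # Filled in at construction time in the precomputed variant.
    stage_output_keys: List[str] = field(default_factory=list)


def outputs_by_intersection(
    subtasks: List[Subtask], stage_result_keys: Set[str]
) -> List[List[str]]:
    """Baseline: build a set per subtask and intersect at submit time."""
    return [sorted(set(s.result_keys) & stage_result_keys) for s in subtasks]


def build_subtask(result_keys: List[str], stage_result_keys: Set[str]) -> Subtask:
    """Tag stage outputs once, while the subtask is being constructed."""
    tagged = [k for k in result_keys if k in stage_result_keys]
    return Subtask(result_keys=list(result_keys), stage_output_keys=tagged)


def outputs_precomputed(subtasks: List[Subtask]) -> List[List[str]]:
    """Submit time only reads the list stored on each subtask."""
    return [sorted(s.stage_output_keys) for s in subtasks]
```

Both helpers return the same keys for the same input; the difference is where the work happens. In the real executor the construction step may already have this information on hand, which is the saving the PR describes.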

It also fixes a Ray executor tracking bug.

Test code

import time

import mars
import mars.dataframe as md
import mars.tensor as mt

join_key_range = 100
data_size = 5000000
chunk_size = 100

mars.new_session(backend="ray")

df1 = md.DataFrame(
    mt.random.randint(
        1, join_key_range + 1, size=(data_size, 10), chunk_size=(chunk_size, 5)
    ),
    columns=list("ABCDEFGHIJ"),
)

start_time = time.time()
result = df1.execute()
end_time = time.time()
print(f"Finished in {end_time - start_time} seconds and execution result: {result}")

This case submits 50000 subtasks in a single stage.

  • total time, before: 211.34s, after: 87.67s
  • submit subtask time, before: 153s, after: 35s
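The speedup implied by these timings can be computed directly. The snippet below only restates the numbers reported above; the `speedup` helper is a name introduced here for illustration.

```python
# Timings reported in this PR, in seconds: (before, after).
timings = {
    "total": (211.34, 87.67),
    "submit subtask": (153.0, 35.0),
}


def speedup(before: float, after: float) -> float:
    """Ratio of the old time to the new time."""
    return before / after


for name, (before, after) in timings.items():
    saved = before - after
    print(f"{name}: {speedup(before, after):.2f}x faster, {saved:.2f}s saved")
```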

Related issue number

Fixes #xxxx

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@fyrestone fyrestone self-assigned this Sep 27, 2022
@fyrestone fyrestone changed the title Optimize ray executor submit subtask [Ray] Optimize ray executor submit subtask Sep 27, 2022
@fyrestone fyrestone marked this pull request as ready for review September 30, 2022 08:26
@fyrestone fyrestone requested a review from a team as a code owner September 30, 2022 08:26

@qinxuye qinxuye left a comment

LGTM


@chaokunyang chaokunyang left a comment

LGTM

@chaokunyang chaokunyang merged commit 6e2f7c9 into mars-project:master Oct 11, 2022
qianduoduo0904 pushed a commit to qianduoduo0904/mars that referenced this pull request Oct 13, 2022
* Optimize Ray executor submit subtask

* Pin pandas<1.5.0

* Try to fix CI

* Try to fix CI

* Print stderr of asv benchmark

* Fix

* Fix Ray executor track bug

* Fix

* Fix

* Fix

* Remove asv benchmark

* Improve coverage

* Refine comments

* Improve coverage

Co-authored-by: 刘宝 <[email protected]>
(cherry picked from commit 6e2f7c9)
aresnow1 pushed a commit to xorbitsai/mars that referenced this pull request Oct 13, 2022
qianduoduo0904 pushed a commit to qianduoduo0904/mars that referenced this pull request Oct 24, 2022