optimize trace hang && fix event leak #58707
Conversation
Some comments.
// convert vector to string, concatenate continuous intervals with `:`,
// concatenate discontinuous intervals with `#`, e.g. [1,2,3,4,5,7,8,9] =>
// 1:5#7:9
inline std::string VectorToString(const std::vector<int>& vec) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a unit test for it.
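For reference, a self-contained sketch of what that unit test could cover, assuming the rule stated in the code comment; the inline `VectorToString` here is a stand-in re-implementation, not the one from this diff:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in implementation of the rule described in the code comment:
// collapse each consecutive run to "start:end" and join runs with '#'.
inline std::string VectorToString(const std::vector<int>& vec) {
  std::string result;
  size_t i = 0;
  while (i < vec.size()) {
    size_t j = i;
    while (j + 1 < vec.size() && vec[j + 1] == vec[j] + 1) ++j;  // end of run
    if (!result.empty()) result += "#";
    result += std::to_string(vec[i]);
    if (j > i) result += ":" + std::to_string(vec[j]);
    i = j + 1;
  }
  return result;
}

int main() {
  assert(VectorToString({}) == "");            // empty input
  assert(VectorToString({5}) == "5");          // single element
  assert(VectorToString({1, 2, 3}) == "1:3");  // one continuous run
  assert(VectorToString({1, 3}) == "1#3");     // two single-element runs
  assert(VectorToString({1, 2, 3, 4, 5, 7, 8, 9}) == "1:5#7:9");
  return 0;
}
```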
",seq:" + std::to_string(seq_) + | ||
",started:" + std::to_string(IsStarted()) + | ||
",completed:" + std::to_string(IsCompleted()) + | ||
auto global_ranks = |
Add a unit test for it, and test the limit on the message length.
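A sketch of such a test; `BuildStatusMsg` is a hypothetical stand-in for the string assembled above (its parameters correspond to `seq_`, `IsStarted()` and `IsCompleted()`), and the 64-byte bound is only an illustrative limit:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper mirroring the status string built in the diff.
std::string BuildStatusMsg(int64_t seq, bool started, bool completed) {
  return ",seq:" + std::to_string(seq) +
         ",started:" + std::to_string(started) +
         ",completed:" + std::to_string(completed);
}

int main() {
  assert(BuildStatusMsg(42, true, false) == ",seq:42,started:1,completed:0");
  // Length-limit check: even the largest seq keeps the message bounded.
  assert(BuildStatusMsg(INT64_MAX, true, true).size() < 64);
  return 0;
}
```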
@@ -484,6 +484,7 @@ class ProcessGroup {
  }

 protected:
  int global_rank_;
int global_rank_{-1};
fixed
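For context, the suggestion relies on a C++11 default member initializer, which guarantees a defined "unset" value on every constructor path; a minimal illustration with a simplified stand-in type:

```cpp
#include <cassert>

struct ProcessGroupLike {  // simplified stand-in for ProcessGroup
  int global_rank_{-1};    // defined "unset" value on every constructor path
};

int main() {
  ProcessGroupLike pg;            // no assignment anywhere...
  assert(pg.global_rank_ == -1);  // ...yet the read is well-defined
  return 0;
}
```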
@@ -860,6 +862,44 @@ void ProcessGroupNCCL::CreateNCCLEnvCache(const Place& place,
  auto comm_ctx = std::make_unique<phi::GPUContext>(place);
  comm_ctx->set_nccl_comm(nccl_comm);

  // gather global ranks in current group
  int* gpu_global_rank = nullptr;
- Use a tensor instead of raw data, since the CUDA memory APIs are not efficient?
- If raw data is used, check the result of every CUDA API call.
fixed
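For the second point, the usual pattern is to wrap every runtime call in a status check; the `CUDA_CHECK` macro below is a common convention sketched for illustration (not Paddle's actual helper), applied to a buffer like the `gpu_global_rank` from this diff:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Common CUDA error-checking macro (illustrative, not Paddle's helper):
// abort with file/line context if any runtime call fails.
#define CUDA_CHECK(expr)                                          \
  do {                                                            \
    cudaError_t err_ = (expr);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                   cudaGetErrorString(err_), __FILE__, __LINE__); \
      std::abort();                                               \
    }                                                             \
  } while (0)

int main() {
  int global_rank = 3;  // placeholder value for this process's rank
  int* gpu_global_rank = nullptr;
  // Every call is checked, so a failed allocation or copy cannot
  // pass silently and corrupt the gathered ranks later.
  CUDA_CHECK(cudaMalloc(&gpu_global_rank, sizeof(int)));
  CUDA_CHECK(cudaMemcpy(gpu_global_rank, &global_rank, sizeof(int),
                        cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaFree(gpu_global_rank));
  return 0;
}
```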
remove useless log
Force-pushed from d96f29b to 0d8bf53
Force-pushed from 0a49939 to 5171f79
Force-pushed from 5171f79 to a3a9e47
LGTM
* add comm async trace module (#56916)
* Fix trace hang (#57536)
  * fix trace hang
  * fix compile error
  * fix code style
  * tinyfix
  * tiny update
  * fix code style
  Co-authored-by: ForFishes <[email protected]>
* Fix nccl trace (#58338)
  * fix nccl_async_trace destruct problem when train finished
  * update
  * format code style
* optimize trace hang && fix event leak (#58707)
  * update
  * fix compile problems
  * fix code style
  * fix logging
  * fix code style
  * remove useless
  * add ut && tinyfix
  * opt cudaMalloc and cudaMemcpy update
  * tinyfix
  Co-authored-by: ForFishes <[email protected]>
PR types: Others
PR changes: Others
Description