[Fleet Executor] Construct runtime graph #37158
Thanks for your contribution!
Please add some VLOG(3) output at the key places for debugging, for example where the dependencies are derived and where interceptor_id is mapped to task_id, rank, and so on.
python/paddle/fluid/tests/unittests/test_fleet_executor_multi_devices.py
@@ -24,4 +24,7 @@ message FleetExecutorDesc {
optional string grain = 1 [ default = "coarse" ];
optional int64 cur_rank = 2 [ default = 0 ]; // Rank id of current processor
repeated RankInfo cluster_info = 3;
optional int32 dp_degree = 4 [ default = 1 ];
Wouldn't it be better to reuse distributed_strategy later on? There may also be a sharding_degree.
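distributed_strategy.proto lives under the framework directory, which is not the folder this proto is in. The CMakeLists in the current folder calls the proto_library function defined in generic.cmake, which sets the protobuf search path to the current folder, and protobuf's import does not support relative paths. So for now I have not found a way to reference the definitions in distributed_strategy.proto directly.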
int32_t pp_indice = rank % pp_degree;
rank /= mp_degree;
int32_t dp_indice = rank % dp_degree;
return {dp_indice, pp_indice, mp_indice};
The order of dp, pp, and mp may change in the future.
return {dp_indice, pp_indice, mp_indice};
}

int64_t PPUpstreamRank(int64_t dp_degree, int64_t pp_degree, int64_t mp_degree,
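I suggest wrapping dp_degree, pp_degree, and mp_degree in a struct and treating it as a Cartesian coordinate system, then adding conversions between the process rank and the Cartesian coordinates. That might be a little more concise, and the ordering problem is then easy to solve by adding a mapping.
{x, y, z} = rank2coord(pid);
left_x = (x - 1 + xranks) % xranks;
left_rank = coord2rank({left_x, y, z})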
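The suggestion above could be sketched roughly as follows. This is only an illustration, not the code that was merged: `ProcessGrid`, `RankToCoord`, `CoordToRank`, and `Neighbor` are hypothetical names, and the sketch assumes a row-major layout in which the last axis varies fastest, plus wrap-around neighbors as in the `left_x` expression. Changing the order of dp, pp, and mp then only means changing the order of `dims`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Paddle): a 3-D process grid.
// dims[0] is the slowest-varying axis and dims[2] the fastest,
// e.g. {dp_degree, pp_degree, mp_degree}.
struct ProcessGrid {
  std::array<int64_t, 3> dims;

  // Convert a flat rank to grid coordinates (last axis varies fastest).
  std::array<int64_t, 3> RankToCoord(int64_t rank) const {
    assert(rank >= 0 && rank < dims[0] * dims[1] * dims[2]);
    std::array<int64_t, 3> coord{};
    for (int axis = 2; axis >= 0; --axis) {
      coord[axis] = rank % dims[axis];
      rank /= dims[axis];  // divide by the axis that was just extracted
    }
    return coord;
  }

  // Convert grid coordinates back to a flat rank.
  int64_t CoordToRank(const std::array<int64_t, 3>& coord) const {
    int64_t rank = 0;
    for (int axis = 0; axis < 3; ++axis) {
      rank = rank * dims[axis] + coord[axis];
    }
    return rank;
  }

  // Rank of the neighbor `step` positions away along `axis`, with wrap-around.
  // step = -1 gives the upstream neighbor, step = +1 the downstream one.
  int64_t Neighbor(int64_t rank, int axis, int64_t step) const {
    std::array<int64_t, 3> coord = RankToCoord(rank);
    const int64_t n = dims[axis];
    coord[axis] = ((coord[axis] + step) % n + n) % n;  // keep result in [0, n)
    return CoordToRank(coord);
  }
};
```

With `ProcessGrid grid{{dp_degree, pp_degree, mp_degree}}`, a pipeline-upstream lookup would reduce to `grid.Neighbor(rank, /*axis=*/1, /*step=*/-1)`. Whether the first and last pipeline stages should wrap around is a design decision this sketch does not settle.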
LGTM
PR types
New Features
PR changes
Others
Describe
Construct the runtime graph.