[Fleet Executor] Construct runtime graph #37158

Merged
merged 2 commits into from
Nov 17, 2021

Conversation

LiYuRio
Contributor

@LiYuRio LiYuRio commented Nov 12, 2021

PR types

New Features

PR changes

Others

Describe

Construct the runtime graph.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@FeixLiu
Contributor

FeixLiu commented Nov 15, 2021

Please add some VLOG(3) output at the key points for debugging, for example where the dependencies are derived and where interceptor_id is mapped to task_id, rank, and so on.

@@ -24,4 +24,7 @@ message FleetExecutorDesc {
optional string grain = 1 [ default = "coarse" ];
optional int64 cur_rank = 2 [ default = 0 ]; // Rank id of current processor
repeated RankInfo cluster_info = 3;
optional int32 dp_degree = 4 [ default = 1 ];
Contributor

Would it be better to reuse distributed_strategy later on? There may also be a sharding_degree.

Contributor Author

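distributed_strategy.proto lives in the framework directory, which is not the folder this proto is in. The CMakeLists in the current folder calls the proto_library function defined in generic.cmake, and that function sets the protobuf search path to the current folder. In addition, protobuf's import does not support relative paths, so for now I have not found a way to reference the definitions in distributed_strategy.proto directly.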

int32_t pp_indice = rank % pp_degree;
rank /= mp_degree;
int32_t dp_indice = rank % dp_degree;
return {dp_indice, pp_indice, mp_indice};
Contributor

The order of dp, pp, and mp may change in the future.

return {dp_indice, pp_indice, mp_indice};
}

int64_t PPUpstreamRank(int64_t dp_degree, int64_t pp_degree, int64_t mp_degree,
Contributor

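I suggest wrapping dp_degree, pp_degree, and mp_degree in a struct and treating it as a Cartesian coordinate system, then adding conversions in both directions between the process rank and the Cartesian coordinates. That may be a little more concise, and the ordering issue is then easy to solve by adding a mapping: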
{x, y, z} = rank2coord(pid);
left_x = (x - 1 + xranks) % xranks; left_rank = coord2rank({left_x, y, z})
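
The suggestion above could be sketched roughly as follows. This is only an illustration, not the code merged in this PR: the `ProcessGrid` struct and its `RankToCoord`, `CoordToRank`, and `Neighbor` members are hypothetical names, and the axis order {dp, pp, mp} with mp varying fastest is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper illustrating the reviewer's idea; not part of Paddle.
// Assumed axis order: {dp, pp, mp}, with the last axis (mp) varying fastest.
struct ProcessGrid {
  std::array<int64_t, 3> degrees;  // {dp_degree, pp_degree, mp_degree}

  int64_t Size() const { return degrees[0] * degrees[1] * degrees[2]; }

  // Decompose a flat rank into per-axis coordinates.
  std::array<int64_t, 3> RankToCoord(int64_t rank) const {
    assert(rank >= 0 && rank < Size());
    std::array<int64_t, 3> coord{};
    for (int axis = 2; axis >= 0; --axis) {
      coord[axis] = rank % degrees[axis];
      rank /= degrees[axis];  // divide by the degree of the same axis
    }
    return coord;
  }

  // Inverse of RankToCoord.
  int64_t CoordToRank(const std::array<int64_t, 3>& coord) const {
    int64_t rank = 0;
    for (int axis = 0; axis < 3; ++axis) {
      assert(coord[axis] >= 0 && coord[axis] < degrees[axis]);
      rank = rank * degrees[axis] + coord[axis];
    }
    return rank;
  }

  // Rank of the process `offset` steps away along `axis`, wrapping around.
  int64_t Neighbor(int64_t rank, int axis, int64_t offset) const {
    std::array<int64_t, 3> coord = RankToCoord(rank);
    const int64_t n = degrees[axis];
    coord[axis] = ((coord[axis] + offset) % n + n) % n;
    return CoordToRank(coord);
  }
};
```

With this layout, a pipeline upstream rank would be `grid.Neighbor(rank, 1, -1)` and the downstream rank `grid.Neighbor(rank, 1, 1)`. If the axis order changes later, only the meaning assigned to each index of `degrees` needs to change, not the callers.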

paddle/fluid/distributed/fleet_executor/runtime_graph.cc (outdated, resolved)
@FeixLiu FeixLiu requested a review from wangxicoding November 16, 2021 10:53
Contributor

@wangxicoding wangxicoding left a comment

LGTM

@FeixLiu FeixLiu merged commit 0daa69d into PaddlePaddle:develop Nov 17, 2021
@LiYuRio LiYuRio deleted the runtime_graph branch November 17, 2021 08:18