[Semi AutoParall] Support Partial Semantic I #55508
Conversation
… semi-auto/rule-base
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
// partial map would be small (less than mesh.size)
// iterate operations (copy and comparison) would be more frequent than random
// element access. <key: dim on mesh, value: partial object>
paddle::flat_hash_map<int64_t, Partial> partial_status_;
Why use a map structure here? If the "dim_" in Partial indicates the mesh dim, it seems unnecessary to store the mesh dim again as the map key. In addition, if one tensor has only one reduce type, would it be better to use a data structure like:
Partial {
  vector<int64_t> mesh_dims;
  ReduceType type_;
}
Correct!
At first I thought we would use a set for partial_status_, which is why I built the Partial struct.
Then I found a map would be better for partial_status_, since in most use cases we use the dim as the key to retrieve the Partial (a sketch of that pattern follows below).
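As an illustration of that retrieval pattern, here is a minimal sketch. It uses std::unordered_map in place of paddle::flat_hash_map, and a simplified Partial; the names are assumptions for illustration, not the PR's actual API.

#include <cstdint>
#include <iostream>
#include <unordered_map>

// Simplified stand-ins for the PR's types (assumed, illustration only).
enum class ReduceType { SUM, MAX, MIN, PRODUCT, ANY, ALL };

struct Partial {
  ReduceType type;
};

int main() {
  // key: dim on mesh, value: partial object -- mirrors partial_status_.
  std::unordered_map<int64_t, Partial> partial_status;
  partial_status[0] = Partial{ReduceType::SUM};

  // The common access pattern: look up the Partial state by mesh dim.
  if (auto it = partial_status.find(0); it != partial_status.end()) {
    std::cout << "mesh dim 0 is partial, reduce type SUM\n";
  }
}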
LGTM
MIN,
PRODUCT,
ANY,
ALL
What does ALL mean?
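For context (an assumption on my part, not confirmed in this PR): ANY and ALL are presumably the boolean reductions, logical OR and logical AND across ranks. A minimal sketch of those assumed semantics:

#include <iostream>
#include <vector>

int main() {
  // Local boolean values held by three hypothetical ranks.
  std::vector<bool> locals = {true, false, true};

  bool any = false, all = true;
  for (bool v : locals) {
    any = any || v;  // ANY: logical OR across ranks
    all = all && v;  // ALL: logical AND across ranks
  }
  std::cout << "ANY = " << any << ", ALL = " << all << "\n";  // ANY = 1, ALL = 0
}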
PR types
New features
PR changes
Others
Description
Pcard-70448
Background
"Partial" is a data distribution type for tensor like "Replicated" and "Sharded".
A tensor is Partial means that: its shape is the same among ranks, but the local element value in each rank is only a partial of the global value, and reduced (sum/max/min/all/any) op over ranks is need to rebuild the global tensor from the locals.
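A minimal single-process sketch of that semantic, with ranks simulated as plain vectors and a sum-type Partial rebuilt by an elementwise reduce (illustration only, not the PR's implementation):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
  // Each rank holds a local tensor of the same shape; the global tensor is
  // the elementwise sum of the locals (a sum-type Partial).
  std::vector<std::vector<int>> rank_locals = {
      {1, 2, 3},  // rank 0
      {4, 5, 6},  // rank 1
  };

  // "Reduce over ranks" rebuilds the global tensor from the locals.
  std::vector<int> global_tensor(rank_locals[0].size(), 0);
  for (const auto& local : rank_locals) {
    for (std::size_t i = 0; i < local.size(); ++i) global_tensor[i] += local[i];
  }
  for (int v : global_tensor) std::cout << v << ' ';  // prints: 5 7 9
  std::cout << '\n';
}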
Motivation
Before this PR, dist_tensor in auto parallel had only two distribution types, "Replicated" and "Sharded", which worked quite well for most hybrid parallelism scenarios. Introducing a third distribution type, "Partial", is therefore a deliberate design choice.
How
The introduction of Partial is divided into two stages.
First stage (this PR):
Second stage (future PR):