[Semi-auto] add elementwise spmd rule for auto parallel #54373

Merged
merged 14 commits into PaddlePaddle:develop from elementwise_rule on Jul 7, 2023

Conversation

pkuzyc
Contributor

@pkuzyc pkuzyc commented Jun 6, 2023

PR types

New features

PR changes

Others

Description

Pcard-70448
Add the elementwise ops' SPMD rule for inferring distributed attributes. This PR implements the InferForward function for elementwise ops, i.e. it infers the output tensor's distributed attributes from the input tensors' distributed attributes.
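
For context, a minimal plain-Python sketch of what the forward inference does for elementwise ops (illustrative only: `infer_elementwise_forward` is a made-up name and the conflict handling is simplified compared with the real C++ rule):

```python
def infer_elementwise_forward(input_dims_mappings):
    max_ndim = max(len(m) for m in input_dims_mappings)
    # Right-align every mapping to the broadcasted rank; padded axes are replicated (-1).
    aligned = [[-1] * (max_ndim - len(m)) + list(m) for m in input_dims_mappings]

    output_dims_mapping = []
    used_mesh_dims = set()
    for axis in range(max_ndim):
        # Merge the shardings proposed for this aligned axis; -1 means "replicated".
        candidates = {m[axis] for m in aligned if m[axis] != -1}
        merged = candidates.pop() if len(candidates) == 1 else -1
        # One mesh dimension may shard at most one tensor axis.
        if merged in used_mesh_dims:
            merged = -1
        if merged != -1:
            used_mesh_dims.add(merged)
        output_dims_mapping.append(merged)

    # Inputs whose mapping disagrees with the merged result would be resharded.
    inferred_input_dims_mappings = [
        output_dims_mapping[max_ndim - len(m):] for m in input_dims_mappings
    ]
    return inferred_input_dims_mappings, output_dims_mapping


# Same case as in the unit test below: [0, 1, -1] and [0] merge to output [0, 1, -1],
# while the second input is resharded to [-1] because mesh dim 0 is already used.
print(infer_elementwise_forward([[0, 1, -1], [0]]))
```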

@paddle-bot

paddle-bot bot commented Jun 6, 2023

Your PR has been submitted successfully. Thanks for your contribution to the open source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

op_desc = dist_op.serial_op.desc
input_name_list = []
output_name_list = []
input_name_list.append(op_desc.input('X')[0]) # 'X' is the arg name for op
Contributor

It would be better if the wrapper could take the op as its only argument, so the user does not need to bother with constructing input_name_list/output_name_list.

To achieve that, the wrapper needs to maintain the order of the op's argument slots from the Phi API.
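
A possible shape for such a wrapper (an illustrative sketch, not existing code; it assumes `op_desc.input_names()`/`output_names()` return slot names in the Phi API order, which is exactly the ordering guarantee mentioned above):

```python
def build_io_arg_name_lists(dist_op):
    # Callers pass only the dist_op; the slot names are read from the op desc.
    # Only the first argument of each slot is taken, mirroring the snippet above.
    op_desc = dist_op.serial_op.desc
    input_name_list = [op_desc.input(name)[0] for name in op_desc.input_names()]
    output_name_list = [op_desc.output(name)[0] for name in op_desc.output_names()]
    return input_name_list, output_name_list
```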


# Construct each input tensor's DistTensorSpec with shape and dist_attr
for name in input_names:
tensor_dist_attr = dist_op.dist_attr.get_input_dist_attr(name)
Contributor

In static mode, the dist attr of the op is the destination dist attr.
The source dist attr, which is held by the tensor, should be used here.

@paddle-ci-bot

paddle-ci-bot bot commented Jun 14, 2023

Sorry to inform you that the CIs for 325a998 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@pkuzyc pkuzyc force-pushed the elementwise_rule branch 2 times, most recently from 61cb076 to b97a6a1 Compare June 29, 2023 13:59
@@ -39,7 +39,7 @@ SPMDRuleBase::InferBackward(const std::vector<DistTensorSpec>& output_specs,
}

std::unordered_map<std::string, int64_t> ShardingMergeForTensors(
const std::vector<std::pair<const std::string, const std::vector<int64_t>>>&
Contributor

Why remove the "const" if you don't modify the input argument?

Contributor Author

The "const" decorating vector forbids the modification of the input argument, so no need to add "const" inside the "pair".

@@ -24,6 +25,7 @@ namespace auto_parallel {

// matmul rule
REGISTER_SPMD_RULE(matmul, MatmulSPMDRule);
REGISTER_SPMD_RULE(elementwise, ElementwiseSPMDRule);
Contributor

The registered name should be the op_name, e.g. elementwise_add, elementwise_div, elementwise_max, etc., to show the mapping from op_name to spmd_rule explicitly.

Contributor Author

Done


// Get dimsmapping for the given tensors. Return the pair of each
// tensor's einsum notation and the corresponding dimsmapping.
std::vector<std::pair<std::string, std::vector<int64_t>>> GetAxesShardingInfo(
Contributor

"GetAxesShardingInfo" is somewhat ambiguous; what about something like "GetAxesMappingsPair"?

Contributor Author

Modified to "GetAxesDimsMappingPair".


// step2.4: handle partial
// Step2.3.1 Output Partial
std::vector<int64_t> partial_on_dims =
Contributor

Elementwise logic would not generate partial, but it is OK here.

Contributor Author

Deleted.


// Step2.3.2 handle input tensor partial (TODO)
VLOG(4) << "ElementwiseSPMDRule InferForward: "
<< " Output dims_mapping: [" << str_join(output_dims_mapping)
Contributor

The input tensors might be resharded, so the src_dims_mapping and dst_dims_mapping of the input tensors should also be logged.

const std::vector<DistTensorSpec>& output_specs,
const paddle::framework::AttributeMap& attrs) {
PADDLE_THROW(phi::errors::Unimplemented(
"InferBackward of MatmulSPMDRule is NOT implemented yet."));
Contributor

MatmulSPMDRule --> ElementwiseSPMDRule

Contributor Author

Done.

} else if (shape[idim - start_dim] == 1) {
broadcast_axis_count[idim] += 1;
// mark the broadcast axis to a special "1"
axes_notation[idim - start_dim] = '1';
Contributor

The "1" concept is not needed for the sharding merge.

Since we assume the incoming mapping is correct for the SPMD rule, any tensor axis whose size is 1 should correspondingly have a dim_mapping of -1; and if all inputs' dim_mappings for an axis are -1, they merge to -1 without doubt.

But it is OK here.

Contributor Author

Here "1" is a special label for broadcasting dim, with this label broadcasting case can handled with the same function as common cases.

cc_test_old(spmd_rule_test SRCS spmd_rule_test.cc DEPS spmd_rule)
cc_test_old(spmd_rule_test SRCS spmd_rule_test.cc DEPS spmd_rules)

cc_test_old(elementwise_spmd_rule_test SRCS ./elementwise_spmd_rule_test.cc
Contributor

Where is elementwise_spmd_rule_test?

Contributor Author

Added elementwise_spmd_rule_test.cc back; it has fewer test cases than test_elementwise_rule.py.

Contributor

Sorry, what I meant is to remove this line rather than bring elementwise_spmd_rule_test.cc back.

The Python unit test is preferred.

Contributor Author

Done

self.assertEqual(infered_input_dist_attrs[1].dims_mapping, [1])
self.assertEqual(infered_output_dist_attrs[0].dims_mapping, [0, -1, 1])

# [0, 1, -1], [0] --> [0, 1, -1], [-1], [0, 1, -1]
Contributor

The conflict-fixing logic might change in the future, but it is OK here.

@pkuzyc pkuzyc force-pushed the elementwise_rule branch 2 times, most recently from 5aecc33 to ffec781 Compare July 4, 2023 12:02
@pkuzyc pkuzyc force-pushed the elementwise_rule branch from ffec781 to 2ff031c Compare July 6, 2023 06:39

// step2.4: handle partial
// Step2.3.2 handle input tensor partial (TODO)
std::string log_str =
Contributor

Add more detailed info to the log to help debugging.

Contributor Author

Done

Contributor

@JZ-LIANG JZ-LIANG left a comment

LGTM

@JZ-LIANG JZ-LIANG merged commit 8e5b0af into PaddlePaddle:develop Jul 7, 2023
@pkuzyc pkuzyc deleted the elementwise_rule branch July 12, 2023 03:55
cqulilujia pushed a commit to cqulilujia/Paddle that referenced this pull request Jul 24, 2023
…#54373)

* add some basic functions

* add elementwise rule for auto parallel

* add unit test for elementwise rule

* fix the lib name in spmd rule test cmake file

* fix some bugs

* add unit tests for elementwise spmd rule in python

* bug fix

* delete cpp unit test for elementwise spmd rule (use python ut now)

* add cpp unit test for elementwise rule

* use concrete op name in unit test

* fix typo

* fix code style

* delete cpp unit test

* add more details in log