[Auto Parallel] Sharding Pass #38502
Conversation
Thanks for your contribution!
def _apply_post_optimization_passed(self, main_program, startup_program,
                                    rank, params_grads):

    # apply amp forward pass
What is this comment for: a TODO, or just a wrong comment? The code below is sharding, not amp. If it is a TODO, please add TODO(owner), i.e. who is responsible for it, at the beginning of the comment.
It is kind of a TODO that marks where the amp pass will be placed in the future. Our final goal is that all optimization passes are applied within this function, after the auto-parallel graph partition, and we will need several updates to get there.
The final order will be: graph_partition - amp - recompute - sharding - gradient_merge.
At the moment, however, it is implemented as: amp - recompute - graph_partition - sharding - gradient_merge.
Fixed~
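For illustration, a rough sketch of the intended final structure. Only the sharding `apply()` call mirrors this diff; the pass name, config keys, and TODO placeholders are assumptions, not the code in this PR:

```python
from paddle.distributed.passes import new_pass

# Sketch of a method on the parallelizer; `self` provides _pass_context here.
def _apply_post_optimization_passed(self, main_program, startup_program,
                                    rank, params_grads):
    # Intended final order once everything moves behind the graph partition:
    #   graph_partition -> amp -> recompute -> sharding -> gradient_merge

    # TODO(owner): amp pass (to be moved here in a later PR)
    # TODO(owner): recompute pass (to be moved here in a later PR)

    # Sharding (added by this PR); the registered name and config keys below
    # are assumptions for illustration.
    auto_parallel_sharding_pass = new_pass(
        "auto_parallel_sharding",
        {"global_rank": rank, "params_grads": params_grads})
    auto_parallel_sharding_pass.apply(
        [main_program], [startup_program], self._pass_context)

    # TODO(owner): gradient_merge pass (not implemented yet)
```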
auto_parallel_sharding_pass.apply(
    [main_program], [startup_program], self._pass_context)

# apply recompute forward pass
Same as above.
Same as above reply.
# apply recompute forward pass
if self._dist_strategy.gradient_merge:
    pass
If this code is not implemented yet, please remove it for now.
fixed
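As an aside, if a placeholder branch were kept for a while, one alternative to a silent `pass` would be to fail loudly. A sketch only, not what this PR does:

```python
if self._dist_strategy.gradient_merge:
    # Not implemented yet: raise instead of silently doing nothing.
    raise NotImplementedError(
        "gradient_merge is not supported by the auto parallel passes yet")
```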
@@ -63,12 +63,12 @@ def apply_no_passes(self):

def check_main(self, gpus=None, **kwargs):
I do not see why we really need a class that duplicates so much code from DistPassTestBase.
We need to override the member function `_run_gpu_main`: since we want to check how this pass cooperates with the rest of the auto_parallel logic (like graph partition), we have to trigger it from fleet_base, where both auto parallel and this pass are applied, so we cannot reuse the `_run_gpu_main` from DistPassTestBase.
I will think about a better plan for this in the next PR.
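A rough sketch of that intent; the class name, the base-class import, the method signature, and the `semi_auto` flag are assumptions, not the actual test API:

```python
import paddle.distributed.fleet as fleet
# DistPassTestBase lives in Paddle's distributed_passes unittest directory.
from dist_pass_test_base import DistPassTestBase

class AutoParallelPassTestBase(DistPassTestBase):
    # Assumed signature: override the GPU runner so programs are built through
    # fleet, which triggers the auto-parallel graph partition before the pass
    # under test; DistPassTestBase's own runner compiles the program directly.
    def _run_gpu_main(self, apply_pass, dump_file, **kwargs):
        strategy = fleet.DistributedStrategy()
        strategy.semi_auto = True    # assumed flag enabling auto parallel
        fleet.init(is_collective=True, strategy=strategy)
        # ... build the model via fleet so partition + this pass are applied,
        # run it, and dump outputs for comparison with the baseline run ...
```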
process_mesh = dist_attr.process_mesh
input_dim_mapping = dist_attr.get_input_dims_mapping(input_name)
mesh_shape = process_mesh.topology
# TODO replace with specific batch size dimension
I suggest that each TODO comment should name who is responsible for it.
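For example (the username is just a placeholder):

```python
# TODO(your-github-id): replace with the specific batch size dimension
input_dim_mapping = dist_attr.get_input_dims_mapping(input_name)
```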
fixed~
def _check_conflict(self, other_pass):
    return True

def _apply_single_impl(self, main_program, startup_program, context):
I remember that our sharding pass does not support multiple blocks. How about adding an assertion here?
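For instance, something like this at the top of `_apply_single_impl` (a sketch; `num_blocks` is a `Program` property, but the exact check and message are up to the author):

```python
def _apply_single_impl(self, main_program, startup_program, context):
    # The sharding pass currently assumes a single-block program.
    assert main_program.num_blocks == 1, \
        "The auto parallel sharding pass only supports single-block programs."
```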
else:
    op._set_attr("ring_id", self.outer_dp_group.id)

main_block._sync_with_cpp
main_block._sync_with_cpp()
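For context: without the parentheses the statement only references the bound method and discards it, so the Python block is never synced with its underlying C++ desc.

```python
main_block._sync_with_cpp    # no-op: references the method without calling it
main_block._sync_with_cpp()  # actually syncs the Python Block with its C++ desc
```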
thx~
fixed
LGTM for set_tests_properties(${TEST_OP} PROPERTIES LABELS "RUN_TYPE=DIST")
PR types
New features
PR changes
Others
Describe
Sharding optimization pass for auto parallel: the base framework for stages 1, 2, and 3. More functionality and examples will follow in the next PR.
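For readers wondering how the sharding stages might eventually be selected, a heavily hedged sketch: the flag and config-key names below are borrowed from fleet's existing sharding optimizer and are only assumptions about how this auto-parallel pass could be configured; this PR does not confirm them.

```python
import paddle.distributed.fleet as fleet

# Assumption: stage selection mirrors fleet's existing sharding_configs.
# Stage 1 shards optimizer states, stage 2 also shards gradients,
# stage 3 additionally shards the parameters themselves.
strategy = fleet.DistributedStrategy()
strategy.semi_auto = True            # assumed: enable (semi-)auto parallel
strategy.sharding = True             # assumed: enable the sharding pass
strategy.sharding_configs = {
    "stage": 2,                      # assumed key: which sharding stage to use
    "sharding_degree": 8,            # assumed key: number of ranks to shard across
}
fleet.init(is_collective=True, strategy=strategy)
```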