
[AutoParallel] Generate spmd rule and reshard impl in phi api #56831

Merged
merged 15 commits into PaddlePaddle:develop on Sep 6, 2023

Conversation

@chenwhql (Contributor) commented Aug 31, 2023

PR types

New features

PR changes

Others

Description

Pcard-73145

[AutoParallel] Generate spmd rule and reshard impl in phi api

This PR generates the logic for sharding propagation (SPMD inference) and sharding transformation (reshard) in the PHI forward API.

  1. Sharding propagation (SPMD rule inference)

Specifically, taking matmul as an example, the function implementing its sharding propagation rule is added under the infer_meta field in the YAML:

- op : matmul
  args : (Tensor x, Tensor y, bool transpose_x = false, bool transpose_y = false)
  output : Tensor
  infer_meta :
    func : MatmulInferMeta
    spmd_rule : MatmulSpmdInferForward
  kernel :
    func : matmul
  backward : matmul_grad

Considerations:

  • Sharding propagation still belongs to the category of Tensor meta-information inference, and it is expected to remain an optional field for quite some time, so it is added as a sub-field of infer_meta rather than as a new top-level field.
  • The sharding propagation rule currently reuses infer_meta's param information for its input arguments; if the two ever diverge, a param sub-field may also need to be added under spmd_rule (a hypothetical sketch follows this list).
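
For illustration, such an extension might look like the following, where spmd_rule becomes a mapping with its own func and param entries (a hypothetical sketch, not something this PR adds):

- op : matmul
  args : (Tensor x, Tensor y, bool transpose_x = false, bool transpose_y = false)
  output : Tensor
  infer_meta :
    func : MatmulInferMeta
    spmd_rule :
      func : MatmulSpmdInferForward
      param : [x, y, transpose_x, transpose_y]  # hypothetical param sub-field
  kernel :
    func : matmul
  backward : matmul_grad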

A call to MatmulSpmdInferForward is then generated in the forward API implementation:

    // 1. InferSpmd (Infer DistAttr of Inputs&Outputs)
    auto meta_dist_x = MakeDistMetaTensor(*x.impl());
    auto meta_dist_y = MakeDistMetaTensor(*y.impl());
    auto spmd_info = phi::distributed::MatmulSpmdInferForward(meta_dist_x, meta_dist_y, transpose_x, transpose_y);
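
The generated code indexes the result as spmd_info.first[i] for inputs and spmd_info.second[j] for outputs, so the rule's return type behaves like a pair of DistAttr vectors. A minimal illustrative declaration, assumed from this usage rather than copied from the PHI headers:

    // Assumed shape of the SPMD rule's result, inferred from how the
    // generated code indexes it (spmd_info.first[i] / spmd_info.second[j]).
    using SpmdInfo =
        std::pair<std::vector<phi::distributed::TensorDistAttr>,   // input DistAttrs
                  std::vector<phi::distributed::TensorDistAttr>>;  // output DistAttrs
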
  2. Sharding transformation (reshard)

The forward API only needs to reshard its inputs, so the originally assumed flow was adjusted slightly. The generated code is as follows:

    // 5. Reshard Input
    auto dist_input_x = ReshardDistTensor(dev_ctx, x, spmd_info.first[0]);
    auto dist_input_y = ReshardDistTensor(dev_ctx, y, spmd_info.first[1]);
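
Conceptually, this step converts each input DistTensor from its current placement to the placement inferred by the SPMD rule. A minimal sketch of the idea, with hypothetical helper names rather than Paddle's actual ReshardDistTensor implementation:

    // Hypothetical illustration of the input reshard step: if the tensor's
    // current DistAttr already matches the target inferred by the SPMD rule,
    // it is passed through unchanged; otherwise a conversion is performed
    // (e.g. communication such as all-gather, or a local slice).
    std::shared_ptr<phi::distributed::DistTensor> ReshardInputSketch(
        phi::DeviceContext* dev_ctx,
        const std::shared_ptr<phi::distributed::DistTensor>& input,
        const phi::distributed::TensorDistAttr& target_dist_attr) {
      if (input->dist_attr() == target_dist_attr) {
        return input;  // placements already agree, nothing to do
      }
      // ConvertPlacement is a hypothetical stand-in for the real reshard logic.
      return ConvertPlacement(dev_ctx, input, target_dist_attr);
    }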

The generated dynamic semi-auto-parallel branch of the forward API after this adjustment:

  // Auto Parallel condition
  if (AllInputsAreDistTensor(x, y)) {
    // 1. InferSpmd (Infer DistAttr of Inputs&Outputs)
    auto meta_dist_x = MakeDistMetaTensor(*x.impl());
    auto meta_dist_y = MakeDistMetaTensor(*y.impl());
    auto spmd_info = phi::distributed::MatmulSpmdInferForward(meta_dist_x, meta_dist_y, transpose_x, transpose_y);

    // 2. Create API Output & Prepare Dist and Dense Output
    Tensor api_output;

    auto dist_out = SetKernelDistOutput(&api_output, spmd_info.second[0]);
    auto dense_out = dist_out->unsafe_mutable_value();

    // 3. Infer DistTensor's Global Shape
    phi::MetaTensor meta_dist_out(dist_out);
    phi::MatmulInferMeta(meta_dist_x, meta_dist_y, transpose_x, transpose_y, &meta_dist_out);

    // 4. Select Kernel
    VLOG(6) << "matmul API dist branch: kernel key: [" << kernel_backend << ", " << kernel_layout << ", "<< kernel_data_type << "]";
    auto kernel_result = phi::KernelFactory::Instance().SelectKernelOrThrowError(
        "matmul", {kernel_backend, kernel_layout, kernel_data_type});
    const auto& kernel = kernel_result.kernel;
    VLOG(6) << "matmul kernel: " << kernel;
    auto* dev_ctx = GetDeviceContextByBackend(kernel_result.has_fallback_cpu ? Backend::CPU : kernel_backend);

    // 5. Reshard Input
    auto dist_input_x = ReshardDistTensor(dev_ctx, x, spmd_info.first[0]);
    auto dist_input_y = ReshardDistTensor(dev_ctx, y, spmd_info.first[1]);

    // 6. PrepareData (DataTransform & Prepare Dense Input)
    dist_input_x = PrepareDataForDistTensor(dist_input_x, GetKernelInputArgDef(kernel.InputAt(0), kernel_backend), {}, kernel_result.is_stride_kernel);
    auto input_x = &dist_input_x->value();

    dist_input_y = PrepareDataForDistTensor(dist_input_y, GetKernelInputArgDef(kernel.InputAt(1), kernel_backend), {}, kernel_result.is_stride_kernel);
    auto input_y = &dist_input_y->value();

    // 7. Infer Local DenseTensor Meta
    phi::MetaTensor meta_dense_out(dense_out);
    phi::MatmulInferMeta(MakeMetaTensor(*input_x), MakeMetaTensor(*input_y), transpose_x, transpose_y, &meta_dense_out);

    // 8. DenseTensor Kernel Call
    using kernel_signature = void(*)(const phi::DeviceContext&, const phi::DenseTensor&, const phi::DenseTensor&, bool, bool, phi::DenseTensor*);
    auto* kernel_fn = kernel.GetVariadicKernelFn<kernel_signature>();
    (*kernel_fn)(*dev_ctx, *input_x, *input_y, transpose_x, transpose_y, dense_out);

    // 9. Return
    return api_output;
  }
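
Note that InferMeta runs twice in this branch: step 3 infers the DistTensor's global shape from the distributed meta tensors, while step 7 infers the local DenseTensor's meta from the (possibly resharded) local inputs, so the dense output buffer is sized for the current rank's shard rather than for the global tensor.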

TODO (to be done in the next PR):

  1. The backward flow needs to be reorganized and refined.
  2. Update the naming of the InferSpmd functions.
  3. Special case: if a forward output is partial, it needs to be resharded to replicated (a hypothetical sketch follows this list).
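
For the third item, a hypothetical sketch of the intended handling (the accessor, mutator, and reuse of ReshardDistTensor below are assumptions, not code from this PR): after the kernel call, an output marked partial would get one extra reshard that reduces it to a replicated placement before the API returns.

    // Hypothetical sketch for TODO 3: a partial output holds per-rank partial
    // results, so one extra reshard/reduction would make it replicated.
    if (dist_out->dist_attr().is_partial()) {        // assumed accessor
      auto replicated_attr = dist_out->dist_attr();
      replicated_attr.clean_partial_status();        // assumed mutator
      // Assumed reuse of the reshard helper, for illustration only.
      auto replicated_out =
          ReshardDistTensor(dev_ctx, api_output, replicated_attr);
    }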

@paddle-bot commented Aug 31, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@chenwhql changed the title from "[AutoParallel] Adapt general spmd rule for static and dynamic mode" to "[AutoParallel] Generate spmd rule and reshard impl in phi api" on Aug 31, 2023
@LiYuRio (Contributor) left a comment:

LGTM

@GhostScreaming (Contributor) left a comment:

LGTM

@chenwhql chenwhql closed this Sep 6, 2023
@chenwhql chenwhql reopened this Sep 6, 2023
@chenwhql chenwhql merged commit e9364a3 into PaddlePaddle:develop Sep 6, 2023
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
…Paddle#56831)

* add spmd and reshard code gen

* add backward reshard code gen

* test matmul forward success

* polish test impl

* add unsafe mutable value

* polish details and add test

* fix unittest time out

* fix typo

* refactor reshard input generate impl

* resolve conflict with develop

* fix compile error