
cherry-pick fused_rope from develop #55931

Merged

8 commits merged into PaddlePaddle:incubate/new_frl on Aug 7, 2023

Conversation

AnnaTrainingG (Contributor)

PR types: Others

PR changes: Others

Description: Others

* style

* more

* update ctest

* Update legacy_backward.yaml

* Update legacy_ops.yaml

* Update legacy_ops.yaml

* update

* update

* update for move
@paddle-bot commented Aug 3, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@Xreki requested a review from sneaxiy on August 3, 2023 02:29
        out_q, out_k, out_v = fused_rotary_position_embedding(q, k, v)
    """
    if in_dynamic_mode():
        return _C_ops.fused_rotary_position_embedding(q, k, v)
Collaborator:

Static graph is not supported? Add an assert?

Contributor Author:

done

    MPType div_c) {
  int index = (blockIdx.x * blockDim.x + threadIdx.x) * VecSize;
  int stride = gridDim.x * blockDim.x * VecSize;
  int size = batch_size * seq_len * num_heads * head_dim;
Collaborator:

Use int64_t to prevent overflow.

Contributor Author:

done
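For concreteness, a minimal sketch of the requested widening, matching the pattern that shows up in a later hunk of this PR (not the exact merged code):

// Widen index, stride and size to int64_t so they can hold products larger
// than 2^31 - 1. The right-hand sides are still evaluated in 32-bit
// arithmetic; a later comment in this review tightens that with an
// explicit static_cast.
int64_t index = (blockIdx.x * blockDim.x + threadIdx.x) * VecSize;
int64_t stride = gridDim.x * blockDim.x * VecSize;
int64_t size = batch_size * seq_len * num_heads * head_dim;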

for (int nx = 0; nx < VecSize; ++nx) {
  // get sin_index and cos_index
  int index_wc = (index + nx) % (seq_len * num_heads * head_dim);
  int pos_seq = index_wc / (num_heads * head_dim);
Collaborator:

Use int64_t.

Contributor Author:

done
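The same widening applies inside the per-element loop; a hedged sketch (assumed context, not the exact merged lines), with an explicit cast so the modulus operand is not first computed as a 32-bit product:

for (int nx = 0; nx < VecSize; ++nx) {
  // get sin_index and cos_index; index is already int64_t, and the cast
  // keeps the seq_len * num_heads * head_dim product in 64 bits as well
  int64_t index_wc =
      (index + nx) % (static_cast<int64_t>(seq_len) * num_heads * head_dim);
  int64_t pos_seq = index_wc / (num_heads * head_dim);
}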

    DenseTensor* dq,
    DenseTensor* dk,
    DenseTensor* dv) {
  int numel = dout_q.numel();
Collaborator:

Use int64_t.

Contributor Author:

done
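Here the widening is a one-line change; numel() on a phi::DenseTensor returns int64_t, so storing it in an int silently truncates once a tensor exceeds 2^31 - 1 elements. A hedged sketch:

int64_t numel = dout_q.numel();  // keep the 64-bit count, no narrowing to int
if (numel <= 0) return;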

    MPType div_c) {
  int index = (blockIdx.x * blockDim.x + threadIdx.x) * VecSize;
  int stride = gridDim.x * blockDim.x * VecSize;
  int size = batch_size * seq_len * num_heads * head_dim;
Collaborator:

Use int64_t.

Contributor Author:

done

for (int nx = 0; nx < VecSize; ++nx) {
  // get sin_index and cos_index
  int index_wc = (index + nx) % (seq_len * num_heads * head_dim);
  int pos_seq = index_wc / (num_heads * head_dim);
Collaborator:

Use int64_t.

Contributor Author:

done

    DenseTensor* out_q,
    DenseTensor* out_k,
    DenseTensor* out_v) {
  int numel = q.numel();
Collaborator:

Use int64_t.

Contributor Author:

done

int numel = q.numel();
if (numel <= 0) return;
dev_ctx.template Alloc<T>(out_q);
out_q->Resize(q.dims());
Collaborator:

Isn't this Resize redundant? The shape was already set during InferMeta.

Contributor Author:

done
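A sketch of the body with the redundant call dropped, assuming (as the comment says) that InferMeta already set the output shape:

int64_t numel = q.numel();
if (numel <= 0) return;
// out_q's dims were set during InferMeta, so allocating is enough here;
// the Resize after Alloc was a no-op and is removed.
dev_ctx.template Alloc<T>(out_q);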


if (k.get_ptr()) {
  dev_ctx.template Alloc<T>(out_k);
  out_k->Resize(q.dims());
Collaborator:

Isn't this Resize redundant? The shape was already set during InferMeta.

Contributor Author:

done


if (v.get_ptr()) {
  dev_ctx.template Alloc<T>(out_v);
  out_v->Resize(q.dims());
Collaborator:

Isn't this Resize redundant? The shape was already set during InferMeta.

Contributor Author:

done

    phi::Array<T*, 3> outs_data,
    int num_inputs,
    MPType div_c) {
  int64_t index = (blockIdx.x * blockDim.x + threadIdx.x) * VecSize;
Collaborator:

We have previously seen blockIdx.x * blockDim.x + threadIdx.x itself overflow and come out negative, so it is recommended to static_cast<int64_t>(blockDim.x) before doing the arithmetic; same below.

Contributor Author:

done
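Concretely, the pattern the reviewer asks for looks like the following sketch (assumed context, not the exact merged lines): widening one operand before the multiply keeps the whole expression in 64-bit arithmetic, instead of widening a 32-bit product that may already have wrapped.

// blockIdx.x, blockDim.x and threadIdx.x are 32-bit values; casting the
// first operand makes the multiply-add execute in int64_t, so very large
// grids can no longer wrap the intermediate result.
int64_t index =
    (static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x) * VecSize;
int64_t stride = static_cast<int64_t>(gridDim.x) * blockDim.x * VecSize;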

@sneaxiy merged commit 8d3a988 into PaddlePaddle:incubate/new_frl on Aug 7, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 20, 2023
…addlePaddle#55931)

* Add fused_rope forward op (PaddlePaddle#54351)

* style

* more

* update ctest

* Update legacy_backward.yaml

* Update legacy_ops.yaml

* Update legacy_ops.yaml

* update

* update

* update for move

* Update the rope op according to the comments (PaddlePaddle#54985)

* Update multiary.cc

* Update __init__.py

* for int64_t and assert

* more

* remove useless assert first

---------

Co-authored-by: sneaxiy <[email protected]>
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 22, 2023

…addlePaddle#55931)

(Same commit message as the Nov 20 push above.)