
Add auto growth allocator for CUDA pinned allocator #57625

Merged: 2 commits into PaddlePaddle:incubate/new_frl on Sep 25, 2023

Conversation

@sneaxiy (Collaborator) commented on Sep 22, 2023

PR types

Performance optimization

PR changes

Others

Description

Add auto growth allocator for CUDA pinned allocator.
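
For context, a pinned-memory allocator that only grows (and reuses freed blocks) avoids paying for cudaHostAlloc/cudaFreeHost on every host-to-device copy, which is the bandwidth cost this PR targets. The snippet below is a minimal, hypothetical C++ sketch of that caching idea; it is not Paddle's actual auto-growth allocator implementation, and the names PinnedCachingAllocator, Allocate, and Free are illustrative only.

```cpp
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdio>
#include <map>
#include <mutex>

// Hypothetical sketch: a caching ("auto growth") allocator for CUDA pinned
// (page-locked) host memory. Freed blocks are kept in a pool and reused, so
// cudaHostAlloc is only called when the pool cannot satisfy a request.
class PinnedCachingAllocator {
 public:
  void* Allocate(size_t size) {
    std::lock_guard<std::mutex> lock(mu_);
    // Best-fit lookup: the smallest cached block that is at least `size`.
    auto it = free_blocks_.lower_bound(size);
    if (it != free_blocks_.end()) {
      void* ptr = it->second;
      allocated_[ptr] = it->first;
      free_blocks_.erase(it);
      return ptr;
    }
    // Pool miss: grow by allocating a new pinned block.
    void* ptr = nullptr;
    if (cudaHostAlloc(&ptr, size, cudaHostAllocDefault) != cudaSuccess) {
      return nullptr;
    }
    allocated_[ptr] = size;
    return ptr;
  }

  void Free(void* ptr) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = allocated_.find(ptr);
    if (it == allocated_.end()) return;
    // Return the block to the pool instead of calling cudaFreeHost.
    free_blocks_.emplace(it->second, ptr);
    allocated_.erase(it);
  }

  ~PinnedCachingAllocator() {
    for (auto& kv : free_blocks_) cudaFreeHost(kv.second);
    for (auto& kv : allocated_) cudaFreeHost(kv.first);
  }

 private:
  std::mutex mu_;
  std::multimap<size_t, void*> free_blocks_;  // pooled blocks, keyed by size
  std::map<void*, size_t> allocated_;         // live allocations
};

int main() {
  PinnedCachingAllocator alloc;
  void* a = alloc.Allocate(1 << 20);  // first request: calls cudaHostAlloc
  alloc.Free(a);                      // block goes back into the pool
  void* b = alloc.Allocate(1 << 20);  // served from the pool, no new alloc
  std::printf("reused: %s\n", a == b ? "yes" : "no");
  alloc.Free(b);
  return 0;
}
```

Reusing a cached block skips the page-locking and driver bookkeeping that a fresh cudaHostAlloc would incur, which is what typically restores H2D copy bandwidth when staging buffers are allocated repeatedly.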

@sneaxiy changed the title from "fix h2d bandwidth" to "Add auto growth allocator for CUDA pinned allocator" on Sep 22, 2023
@paddle-bot commented on Sep 22, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@ForFishes (Member) left a comment

LGTM

@ForFishes merged commit 7aa22d2 into PaddlePaddle:incubate/new_frl on Sep 25, 2023
@sneaxiy deleted the fix_h2d_copy_bw branch on September 28, 2023
hitywt pushed commits referencing this pull request to hitywt/Paddle several times between Oct 17 and Dec 5, 2023, each with the same message:

Add auto growth allocator for CUDA pinned allocator

* fix h2d bandwidth

* remove useless flags
zhiqiu pushed a commit that referenced this pull request Dec 6, 2023
* part-3 cherry from: add check for cembedding (#55621)

* part-3 fix cherry from: add check for cembedding

* part-3 fix c_embedding

* fix test_gpt_with_pir caused by pir

* part-3 cherry from: [Distributed] Support dp/sharding overlap in virtual pp (#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log

* part-3 cherry from: [cherry-pick] Integration flash attention 2 (#56015)

* [FlashAttn] add flash randomness control (#52902)

* add flash randomness control

* fix VLOG undefined

* [WIP] Integration flash attention 2 (#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Anyway I pass paddle's utest, except return softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <[email protected]>

* part-4 cherry from: fix codestyle (#56066)

* part-4 cherry from (no change): Add assert for static and other platforms (#56044)

* part-4 cherry-pick from: dp and sharding coexist (#56096)

* dp and sharding coexist

* dp

* part-4 cherry from: [Distributed] Add debug information for processgroupnccl (#56441)

* add debug information

* fix log

* fix log

* add detach for pp

* part-4 cherry from: [BugFix] Fix bug in paddle.device.cuda.synchronize() (#56451)

* fix bug in synchronize

* fix bug in synchronize

* part-4 cherry from: add fused gradient (#57048)

* part-4 cherry from: [Distributed] add eager_communication_connection for eager mode in nccl (#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection

* part-4 cherry from: Add auto growth allocator for CUDA pinned allocator (#57625)

* fix h2d bandwidth

* remove useless flags

* fix cherry pick #56066

* part-5 cherry from: Add allocation debug FLAGS (#57797)

* Add allocation debug FLAGS

* add sync after value set

* refine flags

* part-5 cherry from: fix softmax backward (#57971)

* part-5 cherry from: [Distributed]Optimize memory in processgroup (#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* part-5 cherry from: [Distributed]Add unbalance batch for virtual pp (#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* fix

* fix comments

* fix kunlun compatibility issues

* fix test_fused_rotary_position_embedding.py

* fix allocator.h

* tinyfix

* fix conflicts

* fix new ir translator c_embedding failure

---------

Co-authored-by: ShenLiang <[email protected]>
Co-authored-by: umiswing <[email protected]>
Co-authored-by: Chitsing KUI <[email protected]>
Co-authored-by: niuliling123 <[email protected]>
Co-authored-by: liuzhenhai93 <[email protected]>
Co-authored-by: sneaxiy <[email protected]>