
Add auto growth allocator for CUDA pinned allocator #57625

Merged: 2 commits into PaddlePaddle:incubate/new_frl on Sep 25, 2023

Conversation

@sneaxiy (Collaborator) commented on Sep 22, 2023

PR types

Performance optimization

PR changes

Others

Description

Add auto growth allocator for CUDA pinned allocator.
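
For context, a pinned-memory allocator that only grows (and reuses freed blocks) avoids paying for cudaHostAlloc/cudaFreeHost on every host-to-device copy, which is the bandwidth cost this PR targets. The snippet below is a minimal, hypothetical C++ sketch of that caching idea; it is not Paddle's actual auto-growth allocator implementation, and the names PinnedCachingAllocator, Allocate, and Free are illustrative only.

```cpp
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdio>
#include <map>
#include <mutex>

// Hypothetical sketch: a caching ("auto growth") allocator for CUDA pinned
// (page-locked) host memory. Freed blocks are kept in a pool and reused, so
// cudaHostAlloc is only called when the pool cannot satisfy a request.
class PinnedCachingAllocator {
 public:
  void* Allocate(size_t size) {
    std::lock_guard<std::mutex> lock(mu_);
    // Best-fit lookup: the smallest cached block that is at least `size`.
    auto it = free_blocks_.lower_bound(size);
    if (it != free_blocks_.end()) {
      void* ptr = it->second;
      allocated_[ptr] = it->first;
      free_blocks_.erase(it);
      return ptr;
    }
    // Pool miss: grow by allocating a new pinned block.
    void* ptr = nullptr;
    if (cudaHostAlloc(&ptr, size, cudaHostAllocDefault) != cudaSuccess) {
      return nullptr;
    }
    allocated_[ptr] = size;
    return ptr;
  }

  void Free(void* ptr) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = allocated_.find(ptr);
    if (it == allocated_.end()) return;
    // Return the block to the pool instead of calling cudaFreeHost.
    free_blocks_.emplace(it->second, ptr);
    allocated_.erase(it);
  }

  ~PinnedCachingAllocator() {
    for (auto& kv : free_blocks_) cudaFreeHost(kv.second);
    for (auto& kv : allocated_) cudaFreeHost(kv.first);
  }

 private:
  std::mutex mu_;
  std::multimap<size_t, void*> free_blocks_;  // pooled blocks, keyed by size
  std::map<void*, size_t> allocated_;         // live allocations
};

int main() {
  PinnedCachingAllocator alloc;
  void* a = alloc.Allocate(1 << 20);  // first request: calls cudaHostAlloc
  alloc.Free(a);                      // block goes back into the pool
  void* b = alloc.Allocate(1 << 20);  // served from the pool, no new alloc
  std::printf("reused: %s\n", a == b ? "yes" : "no");
  alloc.Free(b);
  return 0;
}
```

Reusing a cached block skips the page-locking and driver bookkeeping that a fresh cudaHostAlloc would incur, which is what typically restores H2D copy bandwidth when staging buffers are allocated repeatedly.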

@sneaxiy changed the title from "fix h2d bandwidth" to "Add auto growth allocator for CUDA pinned allocator" on Sep 22, 2023
@paddle-bot commented on Sep 22, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@ForFishes (Member) left a comment

LGTM

@ForFishes merged commit 7aa22d2 into PaddlePaddle:incubate/new_frl on Sep 25, 2023
@sneaxiy deleted the fix_h2d_copy_bw branch on September 28, 2023
hitywt pushed commits referencing this pull request to hitywt/Paddle several times between Oct 17 and Dec 5, 2023, each with the same message:

Add auto growth allocator for CUDA pinned allocator

* fix h2d bandwidth

* remove useless flags
zhiqiu pushed a commit that referenced this pull request Dec 6, 2023
* part-3 cherry from: add check for cembedding (#55621)

* part-3 fix cherry from: add check for cembedding

* part-3 fix c_embedding

* fix test_gpt_with_pir caused by pir

* part-3 cherry from: [Distributed] Support dp/sharding overlap in virtual pp (#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log

* part-3 cherry from: [cherry-pick] Integration flash attention 2 (#56015)

* [FlashAttn] add flash randomness control (#52902)

* add flash randomness control

* fix VLOG undefined

* [WIP] Integration flash attention 2 (#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Anyway I pass paddle's utest, except return softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <[email protected]>

* part-4 cherry from: fix codestyle (#56066)

* part-4 cherry from (no change): Add assert for static and other platforms (#56044)

* part-4 cherry-pick from: dp and sharding coexist (#56096)

* dp and sharding coexist

* dp

* part-4 cherry from: [Distributed] Add debug information for processgroupnccl (#56441)

* add debug information

* fix log

* fix log

* add detach for pp

* part-4 cherry from: [BugFix] Fix bug in paddle.device.cuda.synchronize() (#56451)

* fix bug in synchronize

* fix bug in synchronize

* part-4 cherry from: add fused gradient (#57048)

* part-4 cherry from: [Distributed] add eager_communication_connection for eager mode in nccl (#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection

* part-4 cherry from: Add auto growth allocator for CUDA pinned allocator (#57625)

* fix h2d bandwidth

* remove useless flags

* fix cherry pick #56066

* part-5 cherry from: Add allocation debug FLAGS (#57797)

* Add allocation debug FLAGS

* add sync after value set

* refine flags

* part-5 cherry from: fix softmax backward (#57971)

* part-5 cherry from: [Distributed]Optimize memory in processgroup (#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* part-5 cherry from: [Distributed]Add unbalance batch for virtual pp (#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* fix

* fix comments

* fix kunlun compatibility issues

* fix test_fused_rotary_position_embedding.py

* fix allocator.h

* tinyfix

* fix conflicts

* fix new ir translator c_embedding failure

---------

Co-authored-by: ShenLiang <[email protected]>
Co-authored-by: umiswing <[email protected]>
Co-authored-by: Chitsing KUI <[email protected]>
Co-authored-by: niuliling123 <[email protected]>
Co-authored-by: liuzhenhai93 <[email protected]>
Co-authored-by: sneaxiy <[email protected]>