Set expected place in child thread for dataloader to avoid costing cuda memory on other card #30338
Conversation
Thanks for your contribution!
LGTM
LGTM for multi-process DataLoader
LGTM for the change of framework.py
LGTM
Set expected place in child thread for dataloader to avoid costing cuda memory on other card (PaddlePaddle#30338)
* set expected place in child thread for dataloader
* set device id when set tensor from numpy
* revert tensor_py change
* add compile guard
* fix ci
* fix bug
PR types
Bug fixes
PR changes
Others
Describe
Set expected place in child thread for dataloader to avoid costing cuda memory on other card
cudaSetDevice()
is thread-local, i.e. only effective within the calling host thread. A child thread spawned by the DataLoader therefore falls back to the default device id (0) unless the expected place is set explicitly in that thread, and the CUDA context it then creates costs about 500MB of memory on card 0. This is worse for multi-card training in dygraph mode, since each worker process wastes about 500MB on card 0.
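A minimal CUDA sketch of the mechanism (not the PR's actual code, which sets the expected place in the Python DataLoader thread and the device id in the tensor-from-numpy path): the current device is per-thread and defaults to 0 in a newly spawned thread, so the thread must call cudaSetDevice() itself before its first CUDA call. The `worker` function name and the hard-coded device ids below are illustrative assumptions.

```cpp
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

void worker(int expected_device) {
  int dev = -1;
  cudaGetDevice(&dev);            // fresh thread: reports device 0, not the parent's device
  std::printf("child thread starts on device %d\n", dev);

  // Without this call, the first CUDA call below would create a context
  // (~500MB) on card 0 -- the waste this PR avoids for DataLoader threads.
  cudaSetDevice(expected_device); // analogous to setting the expected place

  void* buf = nullptr;
  cudaMalloc(&buf, 1 << 20);      // context is now created on expected_device
  cudaFree(buf);
}

int main() {
  cudaSetDevice(1);               // parent thread uses card 1 (assumes >= 2 GPUs)
  std::thread t(worker, /*expected_device=*/1);
  t.join();
  return 0;
}
```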