Support checkpoint method #2162

Closed
jihochu opened this issue Mar 31, 2023 · 3 comments

jihochu commented Mar 31, 2023

The checkpoint method (activation checkpointing) is available in PyTorch and is generally used to reduce runtime memory usage.
It introduces redundant computation in exchange for lower memory consumption, and this trade-off gives each application more options.
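
To make the trade-off concrete, here is a minimal sketch in plain Python (standard library only, not PyTorch or nntrainer code). The layer function `tanh(w * x)` and the names `grad_plain`, `grad_checkpointed`, and `segment` are assumptions chosen for illustration. The plain version stores every layer input for the backward pass; the checkpointed version stores only the input of each segment and re-runs the forward pass of that segment during backward.

```python
import math


def forward_layer(w, x):
    # Toy layer: y = tanh(w * x)
    return math.tanh(w * x)


def backward_layer(w, x_in, grad_out):
    # dy/dx for y = tanh(w * x) is w * (1 - y^2)
    y = math.tanh(w * x_in)
    return grad_out * w * (1.0 - y * y)


def grad_plain(weights, x0):
    """Gradient of the chain output w.r.t. x0, storing every layer input."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = forward_layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad = backward_layer(w, x_in, grad)
    return grad, len(inputs)


def grad_checkpointed(weights, x0, segment):
    """Same gradient, but only the input of each segment is stored."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # keep only segment boundaries
        x = forward_layer(w, x)

    grad = 1.0
    peak_stored = len(checkpoints)
    extra_forward = 0
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        # Recompute the forward pass of this segment from its checkpoint.
        inputs = []
        x = checkpoints[start]
        for w in seg_weights:
            inputs.append(x)
            x = forward_layer(w, x)
            extra_forward += 1
        # Stored values: all checkpoints plus this segment's recomputed inputs.
        peak_stored = max(peak_stored, len(checkpoints) + len(inputs))
        for w, x_in in zip(reversed(seg_weights), reversed(inputs)):
            grad = backward_layer(w, x_in, grad)
    return grad, peak_stored, extra_forward
```

With 16 layers and `segment=4`, this sketch holds at most 8 stored values instead of 16 and produces the same gradient, at the cost of 16 additional `forward_layer` calls.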

To implement the method in nntrainer, several points need to be considered.
The biggest issue is that nntrainer currently supports only pre-calculated memory planning. The managed memory area must be planned before the training phase. Temporary memory can be allocated inside layers or computation routines, but the checkpoint method needs to manage this temporary memory more carefully (it needs to decide when the memory is allocated and freed).
The other issue is policy: we need to decide which layers should be checkpointed. More checkpoints mean a larger memory reduction, but also higher CPU usage.
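
One possible shape for such a policy, sketched in plain Python under stated assumptions: each layer is described by a hypothetical `LayerCost` record (none of these names exist in nntrainer), and a greedy rule checkpoints the layers that save the most memory per unit of recomputation until a memory budget is met. Because the decision is made from static per-layer estimates, it could in principle run before training, alongside the existing memory planning; that fit is an assumption, not something verified against the planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerCost:
    # All fields are illustrative estimates, not nntrainer data structures.
    name: str
    activation_bytes: int  # kept for backward if the layer is NOT checkpointed
    boundary_bytes: int    # kept if the layer IS checkpointed (its input only)
    forward_cost: float    # relative cost of re-running forward (must be > 0)


def choose_checkpoints(layers, budget_bytes):
    """Greedily pick layers to checkpoint until the estimate fits the budget.

    Returns (chosen layer names, estimated stored bytes, extra forward cost).
    The estimate may still exceed the budget if checkpointing every
    candidate is not enough; the caller has to check.
    """
    total = sum(layer.activation_bytes for layer in layers)
    # Best candidates first: most bytes saved per unit of recomputation.
    candidates = sorted(
        (l for l in layers if l.activation_bytes > l.boundary_bytes),
        key=lambda l: (l.activation_bytes - l.boundary_bytes) / l.forward_cost,
        reverse=True,
    )
    chosen = []
    extra_cost = 0.0
    for layer in candidates:
        if total <= budget_bytes:
            break
        total -= layer.activation_bytes - layer.boundary_bytes
        extra_cost += layer.forward_cost
        chosen.append(layer.name)
    return chosen, total, extra_cost


# Example with made-up numbers.
layers = [
    LayerCost("conv1", activation_bytes=400, boundary_bytes=100, forward_cost=4.0),
    LayerCost("conv2", activation_bytes=800, boundary_bytes=100, forward_cost=4.0),
    LayerCost("fc", activation_bytes=100, boundary_bytes=50, forward_cost=1.0),
]
print(choose_checkpoints(layers, budget_bytes=700))
```

A tighter budget makes the rule checkpoint more layers, which lowers the stored-bytes estimate and raises the extra forward cost, matching the memory versus CPU trade-off described above.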

taos-ci commented Mar 31, 2023

:octocat: cibot: Thank you for posting issue #2162. The person in charge will reply soon.

github-actions bot commented Feb 3, 2025

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 3 days.

github-actions bot added the Stale label Feb 3, 2025

github-actions bot commented Feb 6, 2025

This issue was closed because it has been stalled for 3 days with no activity.

github-actions bot closed this as not planned Feb 6, 2025