fix(pt): improve out-of-memory capture #3857
I just received another error message that reports out of memory. It's a bad design of PyTorch that all errors use a general `RuntimeError`.
Signed-off-by: Jinzhe Zeng <[email protected]>
Walkthrough
Sequence Diagram(s) (Beta)
sequenceDiagram
    participant User
    participant auto_batch_size.py
    participant CUDA Driver
    User->>auto_batch_size.py: Call is_oom_error(error_message)
    auto_batch_size.py->>auto_batch_size.py: Check for existing OOM conditions
    auto_batch_size.py-->>auto_batch_size.py: Check for new CUDA OOM condition
    auto_batch_size.py-->>User: Return True/False based on OOM condition
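The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the marker strings in `OOM_MARKERS` are assumptions chosen for the example, and the function takes the exception object rather than a bare message so it can also check for `RuntimeError`.

```python
# Substrings assumed (for illustration only) to signal an out-of-memory
# condition inside a generic RuntimeError raised by PyTorch.
OOM_MARKERS = (
    "CUDA out of memory",                # assumed "existing" condition
    "CUDA driver error: out of memory",  # assumed "new" driver-level condition
)


def is_oom_error(e: Exception) -> bool:
    """Return True if the exception looks like an out-of-memory error."""
    # Errors of interest arrive as a general RuntimeError, so the type
    # alone is not enough; the message text has to be inspected too.
    if not isinstance(e, RuntimeError):
        return False
    message = str(e)
    return any(marker in message for marker in OOM_MARKERS)


def run_with_smaller_batches(fn, batch_size: int):
    """Hypothetical caller: halve the batch size until fn stops hitting OOM."""
    while batch_size >= 1:
        try:
            return fn(batch_size)
        except RuntimeError as e:
            if not is_oom_error(e):
                raise  # unrelated RuntimeError, do not swallow it
            batch_size //= 2
    raise RuntimeError("out of memory even at batch size 1")
```

Keeping the markers in one tuple means that a newly observed out-of-memory message only requires adding one string, which is the kind of change this PR describes.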
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #3857   +/-   ##
=======================================
  Coverage   82.66%   82.66%
=======================================
  Files         517      517
  Lines       49724    49724
  Branches     2984     2984
=======================================
  Hits        41105    41105
  Misses       7709     7709
  Partials      910      910
## Summary by CodeRabbit

- **Bug Fixes**
  - Improved out-of-memory error detection for CUDA driver issues.