CUDA Graph Error - CUDA failure 900: operation not permitted when stream is capturing #15002
Comments
Hi @feihugis - I recall you saying that the model your team flighted also used CUDA Graph. Did you run into issues like the above while trying to capture the graph? AFAIK, CUDA stream synchronize has always existed in the code. I wonder why we didn't see something like this while testing your model.
@tianleiwu - Could it be that the "large" unet model uses a kernel that internally uses stream synchronization, while for the "small" model the op/kernel that synchronizes the stream doesn't kick in? If you look at the CUDA EP setup that captures the graph, we first finish capturing the graph in […]. Unfortunately, if one of the intermediate kernels it encounters between graph capture begin and graph capture end contains synchronization logic, it cannot be captured.
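(Side note for readers: this restriction comes from the CUDA runtime itself, not from ORT. Below is a minimal standalone sketch, not taken from the ORT code base, that reproduces the same failure mode; the buffer size and the use of cudaMemsetAsync as the "captured work" are arbitrary choices for illustration.)

```cpp
// Minimal sketch: synchronizing a stream while it is capturing fails.
// Build with: nvcc -o capture_sync capture_sync.cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  float* buf = nullptr;
  cudaMalloc(&buf, 1024 * sizeof(float));  // allocate BEFORE capture begins

  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);

  // Async work issued on the capturing stream is recorded into the graph,
  // not executed immediately.
  cudaMemsetAsync(buf, 0, 1024 * sizeof(float), stream);

  // Synchronizing the capturing stream is exactly what breaks the capture:
  cudaError_t err = cudaStreamSynchronize(stream);
  std::printf("cudaStreamSynchronize during capture -> %d (%s)\n",
              static_cast<int>(err), cudaGetErrorString(err));
  // Expected, based on the log in this issue: error 900,
  // cudaErrorStreamCaptureUnsupported, "operation not permitted when
  // stream is capturing". The capture is invalidated at this point.

  cudaGraph_t graph = nullptr;
  err = cudaStreamEndCapture(stream, &graph);
  std::printf("cudaStreamEndCapture after the failed sync -> %d (%s)\n",
              static_cast<int>(err), cudaGetErrorString(err));

  cudaFree(buf);
  cudaStreamDestroy(stream);
  return 0;
}
```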
Hi @hariharans29 and @tianleiwu, sorry for the late response. I did not see this message and only came across it when I searched my email for something else. Yes, the model we had mainstreamed around one year ago did not hit any issues when capturing the CUDA graph. Recently, when I tried GPT2 + Beam Search, I ran into similar issues. After making some code changes (feihugis@de67b88), CUDA graph capturing works, but since some of the ops are not on GPU, the outputs are not correct. Please feel free to ping me on Teams if I miss your comments.
Fix two issues related to CUDA graph capture: #14942 and #15002.
Issue 1: Previously, graph capture started at the second run. However, memory pattern optimization will still allocate memory during the second run, and cudaMalloc is not allowed during graph capture. In this PR, graph capture starts after two runs to avoid the issue.
Issue 2: #13495 introduced multiple stream support, but stream cleanup calls cudaStreamSynchronize, which is not allowed during CUDA graph capture. In this PR, we move stream cleanup to after CUDA graph capture.
Also: update the SqueezeNet test model with a dynamic axis so that we can test with a larger batch size, and add a test that can reproduce the bug (when changing the minimum number of runs from 2 back to 1).
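The ordering constraint behind both fixes is easier to see in isolation. Here is a rough sketch in plain CUDA runtime code, not the actual CUDA EP implementation; the function name and parameters are made up for illustration, and the buffer is assumed to have been allocated with cudaMalloc before the call:

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper illustrating the order the fix enforces:
// allocate before capture, defer synchronization/cleanup until after capture.
void capture_then_cleanup(cudaStream_t stream, float* preallocated, std::size_t n) {
  // 1. All cudaMalloc calls happen before capture. (The extra warm-up run in
  //    the PR serves a similar purpose: by the capture run, memory-pattern
  //    optimization has already allocated its buffers.)

  cudaGraph_t graph = nullptr;
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);

  // 2. Only asynchronous, capturable work goes inside the capture window.
  cudaMemsetAsync(preallocated, 0, n * sizeof(float), stream);

  cudaStreamEndCapture(stream, &graph);

  // 3. Stream synchronization, and any cleanup that relies on it, is legal
  //    again only after EndCapture; this mirrors moving the multi-stream
  //    cleanup (which calls cudaStreamSynchronize) to after graph capture.
  cudaStreamSynchronize(stream);

  // The captured graph could now be instantiated and replayed with
  // cudaGraphInstantiate / cudaGraphLaunch; omitted here for brevity.
  cudaGraphDestroy(graph);
}
```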
I still see this error when running multiple models in parallel. You can reproduce the error by running:
The folder /data/onnx holds test models and their input/output data from https://github.com/onnx/onnx
2024-07-23 16:30:08.420038342 [E:onnxruntime:Default, dataitem_request.cc:32 operator()] argmin_default_axis_random:Non-zero status code returned while running ArgMin node. Name:'' Status Message: CUDA error cudaErrorStreamCaptureUnsupported:operation not permitted when stream is capturing
@snnn, this issue is about a CUDA graph error in a single thread. The error you reported is a separate multi-threading issue. A stream-capture error should not appear when CUDA graph is not enabled. If you see that error in the onnx test runner, it basically means some ORT code is not thread-safe, which causes a buffer overrun and corrupts the call stack.
Is there any update on this issue? I have the same problem when starting Triton server.
Hi, did you solve it? I ran into the same problem.
@tham-tran-ts, @fclearner, please provide a repro (test script and model) if you need help.
Thanks, I have solved it. The cause was setting `export CUDA_LAUNCH_BLOCKING=1`, which makes ONNX Runtime not thread-safe on the GPU.
Describe the issue
During CUDA graph capture, ORT triggers cudaStreamSynchronize, which is not allowed during CUDA graph capture. The call stack looks like the following:
The error looks like the following (I added the file and line):
To reproduce
The error is not always triggered with a small model, but with a larger model like unet it can always be reproduced.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.14.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response