This repository has been archived by the owner on Oct 11, 2024. It is now read-only.
forked from vllm-project/vllm
[Bug]: When running repo hello world: RuntimeError: CUDA error: an illegal instruction was encountered #187
Labels
bug
Something isn't working
Comments
Thanks for reporting. We will look into the issue.
@remiconnesson thanks for posting this issue. We were able to reproduce it on an H100 and found the cause: a problem in one of our PTX assembly snippets. The fix is simple. Here is the in-progress PR: vllm-project#4218.
pcmoritz referenced this issue in vllm-project/vllm on Apr 24, 2024:
This PR addresses the Marlin kernel H100 crash that was reported here: neuralmagic#187. The reason for the crash was the inline PTX assembly that introduced the async_copy with streaming behavior. The solution is to use the more standard PTX for async_copy (without the fractional L2 policy for "evict_first"). There is no performance difference between standard async_copy PTX and the previous one.
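The "more standard" form described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the helper names (`cp_async16`, `cp_async_fence`, `cp_async_wait_all`) and the test kernel are hypothetical. It assumes that the removed variant built an L2 policy with `createpolicy.fractional.L2::evict_first` and passed it to `cp.async` as an `L2::cache_hint` operand, and that the fix simply drops that operand. It needs a GPU of compute capability 8.0 or newer.

```cuda
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: 16-byte asynchronous global->shared copy using plain
// cp.async, with no createpolicy / L2::cache_hint operand (assumed to be the
// part that was removed). Requires sm_80 or newer.
__device__ inline void cp_async16(void* smem_ptr, const void* glob_ptr) {
  const int BYTES = 16;
  uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
  asm volatile("cp.async.cg.shared.global [%0], [%1], %2;\n"
               :: "r"(smem), "l"(glob_ptr), "n"(BYTES));
}

// Commit all cp.async operations issued so far into one group.
__device__ inline void cp_async_fence() {
  asm volatile("cp.async.commit_group;\n" ::: "memory");
}

// Block until every outstanding cp.async of this thread has completed.
__device__ inline void cp_async_wait_all() {
  asm volatile("cp.async.wait_all;\n" ::: "memory");
}

// Each thread stages one int4 through shared memory and writes it back out.
__global__ void copy_kernel(const int4* in, int4* out) {
  __shared__ int4 tile[32];
  cp_async16(&tile[threadIdx.x], &in[threadIdx.x]);
  cp_async_fence();
  cp_async_wait_all();
  __syncthreads();
  out[threadIdx.x] = tile[threadIdx.x];
}

int main() {
  const int n = 32;
  int4 h_in[n], h_out[n];
  for (int i = 0; i < n; ++i) h_in[i] = make_int4(i, 2 * i, 3 * i, 4 * i);

  int4 *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_out));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

  copy_kernel<<<1, n>>>(d_in, d_out);
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }

  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  bool same = std::memcmp(h_in, h_out, sizeof(h_in)) == 0;
  printf("%s\n", same ? "copy matched" : "copy mismatch");

  cudaFree(d_in);
  cudaFree(d_out);
  return same ? 0 : 1;
}
```

Compiling with something like `nvcc -arch=sm_80 example.cu` and running on the affected GPU is one way to check that the plain `cp.async` path executes without the illegal-instruction error.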
Fix has landed, thanks for reporting!
xjpang referenced this issue in xjpang/vllm on Apr 25, 2024
robertgshaw2-redhat referenced this issue on Apr 26, 2024
alexeykondrat referenced this issue in alexeykondrat/ci-vllm on May 1, 2024
z103cb referenced this issue in z103cb/opendatahub_vllm on May 7, 2024
Your current environment
🐛 Describe the bug
While running the repo's hello world example, I encountered an error.