[Bug][MetaSchedule] Failed to tune fp16 dense_add workload of some shapes on cuda #14137
Comments
Can you examine what the function
No, this function seems to be invoked in Relay's Legalize pass, while this input is a prim_func.
cc @vinx13
I'll take a look. There was a previous attempt, #14030, to solve this; it alleviates most cases but is not a full solution. The problem is that the current arithmetic analysis cannot handle arbitrary block predicates; it can only handle simple bounds like
I hit the following error when trying to reproduce:
Environment: tvm commit id: ccc0b91
@lileidev, it looks like you are trying to build an ARM module on an x86 CPU, which does not work.
I compiled tvm with the default cmake/config.cmake file and did not specify an ARM platform. You can also see that the package path "tvm-0.12.dev387+gccc0b9162-py3.9-linux-x86_64.egg" is x86_64. This error can be reproduced just by running "import tvm.tir.tensor_intrin".
Both Module1 and Module0 run successfully on my machine. Module1:
I found that when tuning the fp16 tensor core dense_add kernel, the tuning fails on some shapes and the reported error is non-deterministic.
For example, when the workload is N=1, M=1000, K=512, the tuning fails.
There are two kinds of reported errors. From my observation, the following error is reported more frequently:
[collapsed error log not captured]
and this error is reported less frequently:
[collapsed error log not captured]
I tried different values of N and found that when N=2, 4, 8, 12, 17, 18, 24 the tuning still fails, but when N=16, 32 it succeeds. I guess this may be because of the alignment requirement of the m16n16k16 tensor core.
Expected behavior
The tuning succeeds
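The alignment guess in the description can be checked against the reported values of N with a small sketch. The names TILE, is_aligned, and padded_size are hypothetical and not part of TVM; the tile size of 16 is taken from the m16n16k16 shape mentioned in the description. This only shows that the reported outcomes are consistent with the guess; it does not establish the root cause.

```python
# Hypothetical helpers (not TVM APIs) for checking the alignment guess.
TILE = 16  # assumed tile extent along N, from the m16n16k16 shape in the report


def is_aligned(n: int, tile: int = TILE) -> bool:
    """Return True if n is a whole multiple of the tile extent."""
    return n % tile == 0


def padded_size(n: int, tile: int = TILE) -> int:
    """Round n up to the next multiple of the tile extent."""
    return -(-n // tile) * tile


# Values of N taken from the report above.
failing = [1, 2, 4, 8, 12, 17, 18, 24]
succeeding = [16, 32]

for n in failing + succeeding:
    status = "aligned" if is_aligned(n) else f"unaligned, pads to {padded_size(n)}"
    print(f"N={n}: {status}")
```

Every failing N in the report is not a multiple of 16 and every succeeding N is, which fits the guess. Whether padding N to the next multiple would be an acceptable workaround is a separate question that this sketch does not answer.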
Environment
Steps to reproduce
Triage
cc @ibsidorenko