Support for Ada Lovelace & Blackwell #6
Comments
I'm not sure how hard it is to add support for other architectures and maximize performance on them. We may release a new version if we get support for other architectures; PRs from the open-source community are also welcome.
When OpenCL + RISC-V? :P
I'm working on adapting this to Ada Lovelace.
An update on my efforts: the first step is to build cutlass within the third_party folder for the intended architecture:
I am using 18 cores since I have an Intel i9, but be sure to specify a number here! An unbounded -j flag will go nuts and consume all RAM during the tests if you don't tailor it to your system. Next, change references from SM90 & sm_90 to SM89 & sm_89 in the following files:
Then install. The recommended python setup.py install didn't work for me, but the much simpler pip install did, though this may be connected to the later problems with test_core.py:
Next, run tests. Right now test_jit.py works:
But test_core.py fails:
This appears to be because cutlass doesn't provide a cluster_sm89.hpp file analogous to cluster_sm90.hpp, presumably because these GPUs aren't intended to be server workhorses. So it is possible that clusters won't work. It might still be possible to split the test file to see whether it runs on a single GPU; I'm working on that next. For now, I'm just encouraged that JIT still works. My hope, which I'm pursuing right now, is that I only need to rebuild their version of cutlass for sm_89 and that these files just need to be compiled. I'm exploring ways to do that and will report progress, if any.
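For anyone repeating the SM90 to SM89 renaming step, it can be scripted instead of edited by hand. Below is a minimal sketch, not something from the repo: the names `retarget_text` and `retarget_files` are made up for illustration, and the file paths are left to the caller. It deliberately leaves `sm_90a` and lowercase `sm90` (as in cluster_sm90.hpp) alone, on the assumption that those have no sm_89 counterpart.

```python
import re
from pathlib import Path

# Matches "SM90" or "sm_90" as a standalone arch token:
#  - not preceded by a letter or digit
#  - not followed by a digit or lowercase letter (so "sm_900" and "sm_90a" are skipped)
_ARCH_TOKEN = re.compile(r"(?<![A-Za-z0-9])(SM|sm_)90(?![0-9a-z])")


def retarget_text(text, new_arch="89"):
    """Return (rewritten_text, replacement_count) for one string."""
    return _ARCH_TOKEN.subn(lambda m: m.group(1) + new_arch, text)


def retarget_files(paths, new_arch="89", dry_run=True):
    """Rewrite arch tokens in each file; with dry_run=True only count matches.

    `paths` is whatever list of source files you decide to touch
    (hypothetical input; nothing here knows the repo layout).
    """
    counts = {}
    for path in map(Path, paths):
        new_text, n = retarget_text(path.read_text(), new_arch)
        counts[str(path)] = n
        if n and not dry_run:
            path.write_text(new_text)
    return counts
```

Running `retarget_files(my_paths)` first in the default dry-run mode shows how many tokens each file would change before anything is written. Note that a blind rename can still produce identifiers that cutlass does not define for sm_89, so the counts are a starting point for review, not a guarantee that the result compiles.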
An update on my end, and a possibly fatal blocker. In fp8_gemm.cuh, the following libraries must be included:
Now, for Ada Lovelace in particular, one can link the sm80 versions of the files. But after recompiling cutlass, I don't think there are instructions to build sm_80/sm_89 versions; they are specific to sm_90 and sm_100. So unless a way can be found around this, I think we're stuck.
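To make that constraint concrete, here is a small sketch that turns a compute capability into an `sm_XY` name and an nvcc `-gencode` flag, and checks it against the architectures mentioned above. The `TARGETED_ARCHS` set is taken from this discussion (sm_90 and sm_100), not from the library's source, and the function names are illustrative only.

```python
# Architectures this thread describes as having dedicated builds.
# Assumption drawn from the discussion, not read from any library source.
TARGETED_ARCHS = {"sm_90", "sm_100"}


def sm_name(major, minor):
    """(8, 9) -> 'sm_89', (10, 0) -> 'sm_100'."""
    return f"sm_{major}{minor}"


def gencode_flag(major, minor):
    """Build the nvcc -gencode option for one real architecture."""
    cc = f"{major}{minor}"
    return f"-gencode=arch=compute_{cc},code=sm_{cc}"


def is_targeted(major, minor):
    """True if the architecture is in the set assumed above."""
    return sm_name(major, minor) in TARGETED_ARCHS
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair to feed in; an RTX 4090 reports `(8, 9)`, for which `is_targeted` is false under the assumption above.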
This is perhaps better filed as a feature request, but I can see wide interest in a proper FP8 implementation for the 4090/5090 series of GPUs. Is there any interest in adapting this for sm_89 & sm_100/a? Is it a steep challenge, or is it feasible?
Many thanks for providing this repo. It is a remarkable contribution to the field.