[auto_scheduler] Part.1 metal default hardware params #7022
@antinucleon I heard from @merrymercy that you are trying the auto-scheduler on the new Apple Silicon Mac (CPU / GPU). For the CPU, I heard LLVM cannot recognize the target parameter. I experimented a little and referred to https://reviews.llvm.org/D82699. We could set it |
@FrozenGene I tried, and it is tricky lol. I need to cross-compile an ARM64 LLVM. I did it once, but it seems the result was still an x86-64 binary, which then makes the ARM conda segfault. If I don't work with conda, the tricky part is scipy, which requires gcc. I would probably need to cross-compile an ARM gcc, then compile LAPACK + scipy. Or I can eliminate the scipy dependency in TVM (which I have done in one version). Last time I didn't have enough time to figure out how to correctly cross-compile an ARM64 LLVM; maybe I can try again this week. A few tips for people who want to compile LLVM on a Mac mini directly:
|
When you build it on the Apple Silicon machine, does LLVM fail to recognize the correct host target and still produce an x86 binary? For the GCC part, could we build LLVM together with Clang and then make a symlink named gcc that points to clang, as Apple does on the Mac? |
My guess is: we don't know whether the M1 uses some new vector / FMA instructions. If it does, we can only obtain good perf once Apple contributes back to upstream LLVM. But I hope we will have a better sense of this later this week. |
Right. That was the point of my previous reply. I referred to https://reviews.llvm.org/D82699 and we could set it |
FYI: for a 2k by 2k matmul, the Apple Accelerate library achieves around 600 GFLOPS, which is 1/3 of the peak FLOPS of the M1 GPU. While running Ansor + llvm11 with |
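For readers who want to reproduce the arithmetic behind the figures quoted above, here is a minimal sketch. It assumes that "2k" means 2048 and that a matmul is counted as 2·N³ floating-point operations (one multiply and one add per inner-product term). Both are assumptions, not statements from the thread, and the function names are hypothetical.

```python
def matmul_flops(n: int) -> int:
    """FLOP count for an n x n by n x n matmul, assuming the common 2*n^3 convention."""
    return 2 * n ** 3


def achieved_gflops(n: int, seconds: float) -> float:
    """Throughput in GFLOPS for one n x n matmul that took `seconds`."""
    return matmul_flops(n) / seconds / 1e9


N = 2048                    # assumption: "2k" read as 2048 (it could also mean 2000)
ACCELERATE_GFLOPS = 600.0   # figure quoted in the comment above

# The comment says 600 GFLOPS is about 1/3 of the GPU peak, so the implied peak is:
implied_peak_gflops = ACCELERATE_GFLOPS * 3

# Time one matmul would take at the quoted Accelerate throughput, in milliseconds.
time_ms = matmul_flops(N) / (ACCELERATE_GFLOPS * 1e9) * 1e3

print(f"FLOPs per matmul: {matmul_flops(N):,}")
print(f"Implied peak: {implied_peak_gflops:.0f} GFLOPS")
print(f"Time at 600 GFLOPS: {time_ms:.2f} ms")
```

Passing a measured wall-clock time for a tuned kernel to `achieved_gflops` gives a number that can be compared directly with the Accelerate figure.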
@antinucleon Could you use |
This is the first PR to enable the auto-scheduler to work on Metal desktop GPUs (AMD / M1).
To fully enable auto-scheduler on Metal, the following PRs are required: