CUDA: update compilation flags for improved performance #1099
base: master
src/ggml-cuda/CMakeLists.txt
Outdated
@@ -96,7 +96,7 @@ if (CUDAToolkit_FOUND)

     set(CUDA_CXX_FLAGS "")

-    set(CUDA_FLAGS -use_fast_math)
+    set(CUDA_FLAGS -use_fast_math --threads=0 --split-compile=0)
Should we instead create a new CMake option, e.g. CUDA_COMPILE_THREADS?
Yeah, that's an easy fix. Let me do that.
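The suggestion in this review thread could look roughly like the sketch below. It assumes the option name CUDA_COMPILE_THREADS proposed above (hypothetical, not an existing ggml option) and a default of 0, which nvcc documents as "use as many threads as there are CPUs on the machine". It is a fragment meant to replace the changed line inside the `if (CUDAToolkit_FOUND)` block, not the change that was actually merged.

```cmake
# Hypothetical cache option, named after the review suggestion.
# 0 = let nvcc use one thread per CPU; an empty string disables the flag.
set(CUDA_COMPILE_THREADS "0" CACHE STRING
    "Value passed to nvcc --threads (0 = number of CPUs, empty = do not pass the flag)")

set(CUDA_FLAGS -use_fast_math)

# Append --threads only when a value was provided, so users can opt out
# with -DCUDA_COMPILE_THREADS="" if it conflicts with their build parallelism.
if (NOT "${CUDA_COMPILE_THREADS}" STREQUAL "")
    list(APPEND CUDA_FLAGS --threads=${CUDA_COMPILE_THREADS})
endif()

message(STATUS "CUDA_FLAGS: ${CUDA_FLAGS}")
```

With this in place, a user could configure with, for example, `cmake -DGGML_CUDA=ON -DCUDA_COMPILE_THREADS=4 ..` to cap nvcc at four threads per invocation. Keeping the value user-controlled matters because nvcc's own threads multiply with whatever job count the build tool is already running.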
What is the advantage vs. specifying the number of threads via CMake? For example, this is the command that I use locally: cmake -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON .. && time cmake --build . -j 32 -- --quiet
@JohannesGaessler this is for the
BTW, using this option I was able to cut from-scratch compile time by 50%, from ~4 hrs to 2 hrs.
CMake
@JohannesGaessler on Windows it does not.
How are you testing?
Even at 2 hours, that's much higher than expected, even when building for all the supported architectures. Can you share more details about the setup that you are using to build? Hardware, MSVC and CUDA toolkit versions, and anything else that you think may be relevant.
@slaren if you look at recent Windows cuBLAS builds for whisper.cpp, e.g. https://github.com/ggerganov/whisper.cpp/actions/runs/13115822916/job/36589762164, you'll notice it takes roughly 4 hours to complete.
Um yeah, the whisper CI does not even use
@slaren so I just tried cmake
Yes absolutely, if it improves performance we should add it. But it may also cause thread contention if used together with
Here are my results overall
So we can conclude either
This adds CUDA nvcc compile parallelization to speed up compilation of .cu files (which takes >3 hours today).
Setting --threads=0 lets nvcc use as many threads as there are CPUs on the machine.
Per the NVIDIA documentation: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#threads-number-t