Support for multiple GPU SM versions #32
In CMake 3.18, |
Has anyone done this successfully and does anyone have insights to share? We have dockerized pbrt-v4 for GPUs, and made it possible for our students and researchers to remotely render scenes on our servers that have GPUs. Which, by the way, is awesome, and puts us miles ahead of where we were with v3. However, some of our servers have GPUs of several different architectures, and we've been unable to build a binary (and in turn a Docker image, since we Dockerize everything) that runs on any GPU architecture other than that of the primary GPU. I've tried fiddling with CMAKE_CUDA_ARCHITECTURE and some other tweaks, but haven't gotten anything to work. Thanks! -- David Cardinal, Vistalab, Stanford
Cool! If I do something like this:

% cmake -G Ninja -DPBRT_OPTIX7_PATH=~/optix-7.4.0 ~/pbrt-v4 -DPBRT_GPU_SHADER_MODEL=sm_60

I am able to build a binary that is, as far as I can tell, compiled with the flags to specify shader model 6.0. Is your issue being unable to compile with a specified shader model, being unable to compile a single binary with multiple shader models, or finding that the binary is invalid in spite of the above? FWIW I haven't been able to figure out how to compile a single binary that supports multiple shader models.
Matt -- Thanks!! It seems to be working. I can build our Docker image on the same Linux server for both our 3070 and the 2080 Tis that we were lucky enough to have donated :) That means we have at least 3 GPUs live that people can render on, even if their personal machine is a low-end box. I'm especially happy, as my major area of interest is computational photography, so bursts of images are needed. I haven't found any further info on how to make a single binary for multiple architectures. Do you think any of the new -arch flags could help with that, or are they not relevant to the pbrt compilation pipeline? In any case, this is great progress. Thanks! -- David
Great! As far as I can tell, a single binary for multiple architectures should be possible via the "fat binary" functionality of nvcc, but I'm not sure how to wire that up with the CMake build. Another issue is that pbrt's OptiX kernels would need to be handled similarly, and I'm not sure how to do that either. Anyway, it's something to hopefully be fixed someday, but I'm glad you're set for now.
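For reference, nvcc produces a fat binary when it is given several `-gencode arch=compute_XX,code=sm_XX` pairs. Adding `code=compute_XX` for the newest architecture also embeds PTX that the driver can JIT-compile on future GPUs. The helper below is a hypothetical sketch (`MakeGencodeFlags` is not part of pbrt). It only builds that flag string from a list of SM versions. How to feed the result into pbrt's CMake build, for example via `CMAKE_CUDA_FLAGS`, is left open, as it is in the thread.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: builds nvcc "-gencode" flags for a fat binary.
// Each SM version (e.g. 75 for sm_75) gets native SASS; the highest one
// additionally gets PTX (code=compute_XX) so newer GPUs can JIT-compile it.
std::string MakeGencodeFlags(std::vector<int> smVersions) {
    // Sort and deduplicate so each architecture appears once, oldest first.
    std::sort(smVersions.begin(), smVersions.end());
    smVersions.erase(std::unique(smVersions.begin(), smVersions.end()),
                     smVersions.end());
    std::string flags;
    for (int sm : smVersions) {
        std::string v = std::to_string(sm);
        if (!flags.empty()) flags += ' ';
        flags += "-gencode arch=compute_" + v + ",code=sm_" + v;
    }
    if (!smVersions.empty()) {
        // Embed PTX for the newest architecture as a forward-compatible fallback.
        std::string v = std::to_string(smVersions.back());
        flags += " -gencode arch=compute_" + v + ",code=compute_" + v;
    }
    return flags;
}
```

For example, `MakeGencodeFlags({75, 86})` would cover both a 2080 Ti (compute capability 7.5) and a 3070 (8.6). With CMake 3.18 or later, setting the `CUDA_ARCHITECTURES` target property (or `CMAKE_CUDA_ARCHITECTURES`) to `75;86` asks CMake to generate similar flags itself.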
From what I understand from the doc, specifying … Now for the OptiX kernels: compiling to multiple SM versions using CMake should not be too hard, but I am not sure how the resulting binaries would be specified so that the application loads the right one.
Ah, that's helpful. It looks like CMake 3.18 was released in 2020, though, so requiring it would force many folks to upgrade manually, which is somewhat undesirable for everyone who doesn't need this functionality. The OptiX kernels are basically compiled to PTX and then encoded as a big string that's stored in a global variable that's linked into the executable:

extern const unsigned char PBRT_EMBEDDED_PTX[];

That string is passed in to the OptiX API. So "all" that would be necessary there, I think, would be to do that step multiple times, with different …
How would the unique naming work? Would it be something like

extern "C" {
extern const unsigned char PBRT_EMBEDDED_PTX_SM30[];
extern const unsigned char PBRT_EMBEDDED_PTX_SM50[];
extern const unsigned char PBRT_EMBEDDED_PTX_SM60[];
extern const unsigned char PBRT_EMBEDDED_PTX_SM70[];
extern const unsigned char PBRT_EMBEDDED_PTX_SM71[];
extern const unsigned char PBRT_EMBEDDED_PTX_SM72[];
}

And then when calling …
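With a naming scheme like the one above, the embedding step would have to emit one uniquely named array per architecture. The sketch below assumes the hypothetical names `PBRT_EMBEDDED_PTX_SM<NN>`; pbrt itself only defines `PBRT_EMBEDDED_PTX`. It turns PTX text into a NUL-terminated C array definition, roughly what a bin2c-style embedding step produces. `EmbedPTX` is an illustrative name, not an existing tool.

```cpp
#include <cstdio>
#include <string>

// Hypothetical: returns C++ source that defines `ptx` as a byte array named
// PBRT_EMBEDDED_PTX_SM<sm>, with C linkage to match the extern "C"
// declarations. The trailing 0x00 lets the array be used as a C string.
std::string EmbedPTX(const std::string &ptx, int sm) {
    std::string out = "extern \"C\" const unsigned char PBRT_EMBEDDED_PTX_SM" +
                      std::to_string(sm) + "[] = {";
    char buf[8];
    for (unsigned char c : ptx) {
        // Each byte becomes a hex literal such as 0x2e,
        std::snprintf(buf, sizeof(buf), "0x%02x,", static_cast<unsigned>(c));
        out += buf;
    }
    out += "0x00};\n";
    return out;
}
```

A build step would run this once per target architecture, on the PTX compiled for that architecture, and write each result to its own source file that is linked into the executable.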
Something like that. Come to think of it, another option might be for aggregate.cpp to have something like:

#include <map>
#include <string>

extern "C" const unsigned char PBRT_EMBEDDED_PTX_SM80[];
// ...

// SM version -> PTX embedded for that architecture.
std::map<std::string, const unsigned char *> archPTX {
    { "sm80", PBRT_EMBEDDED_PTX_SM80 },
    // ...
};
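Given such a table, the runtime lookup could follow CUDA's forward-compatibility rule for PTX. PTX generated for compute capability X can be JIT-compiled by the driver for any device of capability X or higher, so the best choice is the newest embedded PTX that does not exceed the device's capability. The sketch assumes the table is keyed by an integer SM version (80 for sm_80) rather than a string, so ordering comes for free. `SelectPTX` is a hypothetical name.

```cpp
#include <iterator>
#include <map>

// Hypothetical: archPTX maps an SM version (e.g. 80 for sm_80) to the PTX
// embedded for it. Returns the PTX built for the highest SM version that does
// not exceed deviceSM, or nullptr if every embedded version is too new.
const unsigned char *SelectPTX(const std::map<int, const unsigned char *> &archPTX,
                               int deviceSM) {
    // upper_bound finds the first entry whose key is > deviceSM; the entry
    // just before it (if any) is the newest one the device can run.
    auto it = archPTX.upper_bound(deviceSM);
    if (it == archPTX.begin())
        return nullptr;
    return std::prev(it)->second;
}
```

A nullptr result is the natural place to report an unsupported GPU instead of failing later inside CUDA or OptiX.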
pbrt currently reports "FATAL CUDA error: invalid device symbol" and dumps a stack trace if its GPU path is run on a GPU that doesn't support the SM version it was compiled for. If nothing else, that's a pretty obscure error message; it would be nice to print something more descriptive.
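A more descriptive error could come from a check that runs before any kernels launch and compares the device's compute capability against what the binary contains. Under CUDA's compatibility rules, SASS built for sm_XY runs on devices with the same major version and an equal or higher minor version. PTX can be JIT-compiled for any device at least as new. In real code the device version would come from the `major` and `minor` fields filled in by `cudaGetDeviceProperties()`. The function name, the `SMVersion` struct, and the message wording below are all hypothetical.

```cpp
#include <string>
#include <vector>

struct SMVersion {
    int major, minor;
};

// Hypothetical check: returns an empty string if the device can run one of
// the compiled SASS versions or JIT-compile one of the embedded PTX versions;
// otherwise returns a human-readable error message.
std::string CheckGPUSupported(SMVersion device, const std::vector<SMVersion> &sass,
                              const std::vector<SMVersion> &ptx) {
    for (SMVersion s : sass)
        // SASS is compatible within a major version, forward in minor version.
        if (s.major == device.major && s.minor <= device.minor)
            return "";
    for (SMVersion p : ptx)
        // PTX can be JIT-compiled for any device at least as new.
        if (p.major < device.major ||
            (p.major == device.major && p.minor <= device.minor))
            return "";
    std::string msg = "GPU has compute capability " + std::to_string(device.major) +
                      "." + std::to_string(device.minor) +
                      ", but pbrt was compiled for:";
    for (SMVersion s : sass)
        msg += " sm_" + std::to_string(s.major) + std::to_string(s.minor);
    for (SMVersion p : ptx)
        msg += " compute_" + std::to_string(p.major) + std::to_string(p.minor);
    return msg + ". Rebuild with a matching PBRT_GPU_SHADER_MODEL.";
}
```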
More generally, there's the question of whether the build should be improved so that this doesn't happen. One option would be to just compile to PTX, which the CUDA driver can JIT-compile for whichever GPU is present at runtime. Alternatively: the cmake/checkcuda.cu program currently reports a single SM version, that of the first GPU it detects. If multiple GPUs were installed, we might compile for each of them. Or perhaps we should allow the user to specify one or more SM versions, so that they could build for multiple SM versions even if they didn't have the corresponding GPUs in their system at the moment. Building the GPU part of the system is fairly slow already, however, so it's not attractive to add more work to that phase of compilation.
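Compiling for every installed GPU, or for a user-supplied list, mostly comes down to turning a set of compute capabilities into a deduplicated architecture list. The sketch below assumes a hypothetical variant of checkcuda.cu. That variant would gather major/minor for every device using `cudaGetDeviceCount()` and `cudaGetDeviceProperties()`, then print a list in the semicolon-separated form that CMake's `CUDA_ARCHITECTURES` accepts. Only the formatting step is shown, so it compiles without CUDA, and `ArchitectureList` is an illustrative name.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: converts per-device (major, minor) compute capabilities into
// a sorted, deduplicated CMake architecture list such as "75;86".
std::string ArchitectureList(const std::vector<std::pair<int, int>> &devices) {
    std::set<int> archs;  // std::set keeps the versions sorted and unique
    for (const auto &d : devices)
        archs.insert(10 * d.first + d.second);
    std::string list;
    for (int a : archs) {
        if (!list.empty()) list += ';';
        list += std::to_string(a);
    }
    return list;
}
```

For a server with a 3070 (8.6) and two 2080 Tis (7.5), this would yield 75;86, so each architecture is compiled once regardless of how many cards share it.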