Windows GPU performance much worse than linux #128
Comments
The 7.3s with the 3090 was on Linux, I'm guessing? Just tested with 06000be on a 3080 on Windows for the crown scene, and it took ~1138s. The stats for it can be found in this text file.
I have found that in CPU mode, performance is better when compiling with MinGW instead of MSVC. Unfortunately, CUDA does not support MinGW, so this is not a 'fix' for GPU mode, although it might be possible to use Clang.
@pierremoreau those times were on Linux (3090) and Windows (2080), indeed. I should add that I was rendering with 64spp for those, which probably explains your timings! Nevertheless, your stats seem to exhibit the same issue of the intersection kernels taking way too much time. @pbrt4bounty in this case, the issue is almost certainly code that's running on the GPU, so it's likely that wouldn't make a difference (though that isn't yet certain!).
I meant CPU vs. CPU: the MinGW build is the fastest. On the same scene, MSVC takes 140 sec and MinGW takes 86 sec.
I can confirm that this improves performance significantly, going from 1138s to 259s on my 3080 when running without Did the OptiX validation have no impact on the Linux numbers? Since the validation seemed to be enabled on both OSes, I am surprised it would be responsible for most of the performance difference between the two.
Hm, interesting... For crown with Here is what I get from Linux (3090)
Windows (2080Ti)
Things are mostly proportional, though on Windows the OptiX kernels and the queue resets seem disproportionately slow. If you can capture these on your system, that'd be interesting, since it'd be the same GPU for both, which would make any issues clearer...
Also, OptiX validation only has about a 5% perf. impact on Linux, which presumably explains why I didn't notice any issues when I enabled it (when the Windows GPU path was broken...)
Regarding Windows (will reboot later to try it out on Linux), I am getting some variation but roughly in line with what you have as well; the following are all with From this morning:
same binary as run this morning, but run anew:
I'll try testing with only the 3080 plugged in and see if it makes any difference, though PBRT has the
This is "interesting" in your numbers:
(And 7.2% in your second run.) That's way higher than I'm seeing on either Linux or Windows, and it really should be in the noise as far as runtime.
With the latest version of Nsight Systems (catching up to my driver version, which it was complaining about being too recent for it), I am no longer seeing the GPU going idle during rendering on Windows, which is good news. However, I am still troubled by those long "reset queues" times you're seeing, @pierremoreau...
Here are the results from the same system, also running CUDA 11.2 and OptiX 7.2, but on Linux (and it rendered in 7-8s):
I ran with the 1080 Ti unplugged for this run. I also did a run on Windows without the 1080 Ti, but the numbers were quite close to the ones I got this morning.
It looks like your Linux 3080 numbers are generally 10-15% slower than my Linux 3090 numbers, so that's good as far as that being roughly the difference I'd expect. So it seems we are left with just Windows still being off.
Would an Nsight Systems trace on Windows help, or something else?
Sure, that'd be interesting to take a look at.
I'll try to gather one over the weekend.
How do you make an Nsight Systems trace?
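One possible way to capture such a trace, offered as a sketch and not as the maintainers' procedure: the Nsight Systems command-line tool `nsys` can wrap a normal pbrt run via `nsys profile`. The snippet below only builds and prints the command line. The executable name `pbrt`, the scene file `crown.pbrt`, and the report name `pbrt_trace` are placeholders, not paths taken from this thread.

```python
import shlex


def nsys_command(pbrt_exe, scene, report="pbrt_trace"):
    """Build an `nsys profile` command line that traces one pbrt GPU render.

    All three arguments are hypothetical placeholders; adjust them to
    match the local build and scene layout.
    """
    return [
        "nsys", "profile",
        "-o", report,              # base name of the report file nsys writes
        pbrt_exe, "--gpu", scene,  # the application to trace and its arguments
    ]


# Placeholder names, for illustration only.
cmd = nsys_command("pbrt", "crown.pbrt")
print(shlex.join(cmd))
```

Running the printed command from a shell should produce a report file that can then be opened in the Nsight Systems GUI.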
e.g. crown is 7.3s on a 3090 and 97.6s on a 2080 (though the 3090 is faster, it's not that much faster!).
Looking at the --stats output, the issue seems to be in the OptiX launches; hopefully it's an issue of compiler flags being wrong, inlining not happening as expected, etc.
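To put the gap in perspective, here is a small sketch that computes the ratios implied by the wall-clock times quoted in this thread. The helper name `slowdown` and the labels are illustrative only, and the first pair compares different GPUs as well as different operating systems, so it is an upper bound on the Windows penalty and not a measurement of it.

```python
def slowdown(slow_s, fast_s):
    """Return how many times longer the slow run took than the fast one."""
    return slow_s / fast_s


# (label, slower time in seconds, faster time in seconds), all from the thread.
pairs = [
    ("crown: Windows 2080 vs. Linux 3090", 97.6, 7.3),
    ("crown on Windows 3080: before vs. after", 1138.0, 259.0),
    ("CPU build: MSVC vs. MinGW", 140.0, 86.0),
]

for label, slow_s, fast_s in pairs:
    print(f"{label}: {slowdown(slow_s, fast_s):.1f}x")
```

A factor above 13x between a 2080 and a 3090 is far more than the hardware difference would suggest, which is the point being made above.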