This issue was moved to a discussion.
You can continue the conversation there.
Improve execution policy heuristics #380
Related: openscad/openscad#391
This makes sense, but do we have a sense of how much this can gain us? I wonder if effort wouldn't be better spent parallelizing more of the single-threaded code. What fraction of total time are we spending on triangulation, decimation, and such?
Probably a lot, at least for the CUDA case. On my laptop with a 3050 Ti mobile GPU and a 12900HK CPU, small models are 10 times slower with CUDA enabled, and large models are less than 10% faster.
Oh wow, fair enough! My benchmarking has tended to focus on problems with large numbers of triangles (spheres, sponge). What do you think would be a good benchmark for small models?
Not sure; I am testing those Python examples. I think we can port some more simple OpenSCAD benchmarks, which are usually not too large.
@ochafik One possible reason is that
If we are concerned about that performance, maybe we can also make collider update lazy (actually seems to be a good idea if users are going to do many transforms) |
Isn't it already lazy since it's part of the lazy application of transforms in general? |
Well, we can be more lazy: don't compute the collider if we don't use the mesh for further boolean operations. |
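The "more lazy" collider idea above could be sketched as a dirty flag: transforms only invalidate a cached collider, and the rebuild cost is paid at most once, on the first boolean operation that actually needs it. This is a minimal illustration, not manifold's real code; `Collider`, `MeshState`, and `applyTransform` are placeholder names.

```cpp
#include <cassert>
#include <optional>

// Placeholder for manifold's real collider type.
struct Collider { int builtAtVersion; };

class MeshState {
 public:
  // A transform never rebuilds the collider; it only marks it stale.
  void applyTransform() {
    colliderDirty_ = true;
    ++version_;
  }

  // Only boolean ops call this; the collider is rebuilt at most once
  // per run of transforms, no matter how many there were.
  const Collider& getCollider() {
    if (colliderDirty_ || !collider_) {
      collider_ = Collider{version_};
      colliderDirty_ = false;
      ++rebuilds_;
    }
    return *collider_;
  }

  int rebuilds() const { return rebuilds_; }

 private:
  std::optional<Collider> collider_;
  bool colliderDirty_ = true;
  int version_ = 0;
  int rebuilds_ = 0;
};
```

With this shape, a user chaining many transforms before one boolean pays for a single rebuild instead of one per transform.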
The current heuristics in https://github.com/elalish/manifold/blob/master/src/utilities/include/par.h are essentially arbitrary thresholds that give OK-ish performance but are definitely not optimal. There are two problems here:
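The kind of heuristic being criticized can be sketched as a single size cutoff: below some element count, kernel-launch and transfer overhead dominates, so work runs sequentially on the host. This is an illustrative sketch, not the actual contents of par.h; `kSeqThreshold` and its value are made up.

```cpp
#include <cstddef>

enum class ExecutionPolicy { Seq, Par };

// A single fixed cutoff, typically tuned by ad-hoc benchmarking.
// The problem described above: one number cannot fit every
// algorithm, GPU, and input shape at once.
constexpr std::size_t kSeqThreshold = std::size_t{1} << 14;

inline ExecutionPolicy autoPolicy(std::size_t n) {
  return n < kSeqThreshold ? ExecutionPolicy::Seq : ExecutionPolicy::Par;
}
```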
I think we need a more complete wrapper around thrust to do this, and VecDH should have a boolean indicating whether it was passed to the GPU or used on the host. Ideally we could also try something like a Vulkan compute shader as an alternative backend for this API, selectively implementing it for functions that can get a good speedup.
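The suggested residency flag on VecDH could look something like the following: a wrapper that records whether the host or the device last touched the data, so a transfer is counted only on an actual residency switch. `TrackedVec`, `Residency`, and the transfer counter are all hypothetical names for illustration; real device transfers are elided.

```cpp
#include <cstddef>
#include <vector>

enum class Residency { Host, Device };

template <typename T>
class TrackedVec {
 public:
  explicit TrackedVec(std::size_t n) : host_(n) {}

  // Host access: a copy back would be needed only if the device
  // currently owns the latest data.
  std::vector<T>& onHost() {
    if (where_ == Residency::Device) {
      ++transfers_;  // stand-in for a device-to-host copy
      where_ = Residency::Host;
    }
    return host_;
  }

  // Device access: an upload would be needed only if the host
  // currently owns the latest data.
  void onDevice() {
    if (where_ == Residency::Host) {
      ++transfers_;  // stand-in for a host-to-device copy
      where_ = Residency::Device;
    }
  }

  int transfers() const { return transfers_; }

 private:
  std::vector<T> host_;
  Residency where_ = Residency::Host;
  int transfers_ = 0;
};
```

Repeated accesses on the same side then cost nothing extra, which is exactly the overhead the small-model CUDA numbers above suggest is being paid today.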