This issue was moved to a discussion.

Improve execution policy heuristics #380

Closed
pca006132 opened this issue Mar 18, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@pca006132
Collaborator

The current heuristic in https://github.com/elalish/manifold/blob/master/src/utilities/include/par.h is basically an arbitrary number that gives OK-ish performance but is definitely not optimal. There are two problems here:

  1. The policy is not stored in the vector. If a vector was passed to the GPU, it would make sense to prefer the GPU over the CPU for future operations.
  2. The optimal number of elements for parallelization will depend on the algorithm and element size.

I think we need a more complete wrapper around thrust to do this, and VecDH should have a boolean indicating whether it was passed to the GPU or was used on the host. Ideally we can also try things like a Vulkan compute shader as an alternative backend for this API, selectively implementing it for some functions that can get a good speedup.
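To make the two points above concrete, here is a minimal sketch of what such a heuristic could look like. Everything in it is hypothetical: `ExecPolicy`, `Residency`, `AlgoCost`, `ChoosePolicy` and `TrackedVec` are illustrative names, not Manifold's or thrust's API, and the threshold values are placeholders that would have to be measured per algorithm. It assumes that data already on the device should stay there, and that the cutoff is expressed in bytes so element size is taken into account.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not Manifold's actual API.
enum class ExecPolicy { Seq, Par, Device };

// Where the most recent copy of a buffer lives.
enum class Residency { Host, Device };

// Per-algorithm cutoffs in bytes (placeholder values, not measurements).
struct AlgoCost {
  std::size_t parThresholdBytes;     // below this, run sequentially
  std::size_t deviceThresholdBytes;  // below this, do not move data to the GPU
};

constexpr AlgoCost kCheapMap{std::size_t{1} << 20, std::size_t{1} << 26};
constexpr AlgoCost kSort{std::size_t{1} << 16, std::size_t{1} << 22};

// Pick a policy from element count, element size, algorithm and residency.
template <typename T>
ExecPolicy ChoosePolicy(std::size_t count, Residency where,
                        const AlgoCost& cost, bool hasDevice) {
  const std::size_t bytes = count * sizeof(T);
  // Assumption: if the data is already on the GPU, avoid copying it back.
  if (hasDevice && where == Residency::Device) return ExecPolicy::Device;
  if (hasDevice && bytes >= cost.deviceThresholdBytes) return ExecPolicy::Device;
  if (bytes >= cost.parThresholdBytes) return ExecPolicy::Par;
  return ExecPolicy::Seq;
}

// A vector wrapper that remembers where it was last used.
template <typename T>
class TrackedVec {
 public:
  explicit TrackedVec(std::size_t n) : data_(n) { assert(n > 0); }

  std::size_t size() const { return data_.size(); }
  Residency residency() const { return residency_; }

  // Record the policy of the operation that just touched this buffer.
  void NoteUsedWith(ExecPolicy p) {
    residency_ = (p == ExecPolicy::Device) ? Residency::Device : Residency::Host;
  }

 private:
  std::vector<T> data_;
  Residency residency_ = Residency::Host;
};
```

A caller would ask `ChoosePolicy<T>(v.size(), v.residency(), kSort, hasDevice)` before each operation and then call `v.NoteUsedWith(policy)`, so later operations on the same buffer can see where it ended up.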

pca006132 added the enhancement label Mar 18, 2023
@pca006132
Collaborator Author

Related: openscad/openscad#391

@elalish
Owner

elalish commented Mar 20, 2023

This makes sense, but do we have a sense of how much this can gain us? I wonder if effort wouldn't be better spent parallelizing some more of the single-threaded code. What fraction of total time are we spending on triangulation and decimation and such?

@pca006132
Collaborator Author

Probably a lot, at least for the CUDA case. On my laptop with a 3050Ti mobile GPU and a 12900HK CPU, small models are 10 times slower with CUDA enabled, and large models are less than 10% faster with CUDA.

@elalish
Owner

elalish commented Mar 20, 2023

Oh wow, fair enough! My benchmarking has tended to focus on problems with large numbers of triangles (spheres, sponge). What do you think would be a good benchmark for small models?

@pca006132
Collaborator Author

Not sure, I am testing with the Python examples. I think we can port some more simple OpenSCAD benchmarks, which are usually not too large.

@pca006132
Collaborator Author

> @pca006132 good to know, thanks! Also, noticed Manifold::Transform seems ~8x slower than OpenSCAD's PolySet::transform (which uses Eigen transforms); haven't fully investigated but wondering if TBB has too much overhead maybe? (even without Eigen's SIMD optimizations, I'd expect to match its speed when throwing a dozen cores at it). I tried batching the thrust::transform calls in Impl::Transform, to no avail.

@ochafik One possible reason is that Manifold::Transform will perform a collider update if the transform is not axis aligned, and that is pretty expensive. Can you give an example model for me to check?

@pca006132
Collaborator Author

If we are concerned about that performance, maybe we can also make the collider update lazy (that actually seems to be a good idea if users are going to do many transforms).

@elalish
Owner

elalish commented Mar 21, 2023

Isn't it already lazy since it's part of the lazy application of transforms in general?

@pca006132
Collaborator Author

Well, we can be more lazy: don't compute the collider if we don't use the mesh for further boolean operations.
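As a rough illustration of that "more lazy" idea, the sketch below marks the collider stale on every transform and rebuilds it only when something asks for it (for example, right before a boolean operation). `LazyMesh`, `Collider`, `Transform` and `GetCollider` here are hypothetical stand-ins rather than Manifold's real classes, and the "collider" is just a bounding box so the example stays small.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 {
  double x, y, z;
};

// Stand-in for an expensive acceleration structure (here only a bounding box).
struct Collider {
  Vec3 min, max;
};

// Hypothetical mesh wrapper; not Manifold's actual implementation.
class LazyMesh {
 public:
  explicit LazyMesh(std::vector<Vec3> verts) : verts_(std::move(verts)) {
    assert(!verts_.empty());
  }

  // Apply a 3x4 row-major affine matrix. Cheap: only marks the collider stale.
  void Transform(const std::array<double, 12>& m) {
    for (Vec3& v : verts_) {
      const Vec3 p = v;
      v.x = m[0] * p.x + m[1] * p.y + m[2] * p.z + m[3];
      v.y = m[4] * p.x + m[5] * p.y + m[6] * p.z + m[7];
      v.z = m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11];
    }
    colliderValid_ = false;
  }

  // Rebuild only on demand, e.g. right before a boolean operation.
  const Collider& GetCollider() {
    if (!colliderValid_) Rebuild();
    return collider_;
  }

  int rebuildCount() const { return rebuildCount_; }

 private:
  void Rebuild() {
    collider_.min = collider_.max = verts_[0];
    for (const Vec3& v : verts_) {
      if (v.x < collider_.min.x) collider_.min.x = v.x;
      if (v.y < collider_.min.y) collider_.min.y = v.y;
      if (v.z < collider_.min.z) collider_.min.z = v.z;
      if (v.x > collider_.max.x) collider_.max.x = v.x;
      if (v.y > collider_.max.y) collider_.max.y = v.y;
      if (v.z > collider_.max.z) collider_.max.z = v.z;
    }
    colliderValid_ = true;
    ++rebuildCount_;
  }

  std::vector<Vec3> verts_;
  Collider collider_{};
  bool colliderValid_ = false;
  int rebuildCount_ = 0;
};
```

With this shape, a chain of transforms on a mesh that is only exported and never used in a boolean never pays for a collider rebuild, and a mesh that is used pays for exactly one.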

Repository owner locked and limited conversation to collaborators Apr 3, 2023
pca006132 converted this issue into discussion #397 Apr 3, 2023