Exploit embarrassingly parallel options in C++ code #20
Check out loop unrolling, from this blog post: https://privefl.github.io/blog/Tip-Optimize-your-Rcpp-loops/
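As a rough illustration of the unrolling idea, here is a generic sketch (not the code from the blog post) of a squared-distance loop unrolled by a factor of four, with four independent accumulators and a tail loop for the leftover elements. `sq_dist_unrolled` and `sq_dist_simple` are hypothetical names used only for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared Euclidean distance between two equal-length vectors, manually
// unrolled by a factor of 4. Four independent accumulators break the
// dependency on a single running sum, which can let the CPU overlap
// the floating-point additions.
double sq_dist_unrolled(const std::vector<double>& a,
                        const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    // Main loop: four elements per iteration.
    for (; i + 4 <= n; i += 4) {
        const double d0 = a[i] - b[i];
        const double d1 = a[i + 1] - b[i + 1];
        const double d2 = a[i + 2] - b[i + 2];
        const double d3 = a[i + 3] - b[i + 3];
        s0 += d0 * d0;
        s1 += d1 * d1;
        s2 += d2 * d2;
        s3 += d3 * d3;
    }
    // Tail loop: the 0-3 elements left over when n is not a multiple of 4.
    for (; i < n; ++i) {
        const double d = a[i] - b[i];
        s0 += d * d;
    }
    return (s0 + s1) + (s2 + s3);
}

// Straightforward loop, kept for comparison and benchmarking.
double sq_dist_simple(const std::vector<double>& a,
                      const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}
```

Because the additions are reordered, results on arbitrary doubles can differ from the plain loop in the last few bits. Compilers may also unroll simple loops on their own at higher optimization levels, so it is worth benchmarking both versions before committing to the manual one.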
Good guide here for parallel computing: http://gallery.rcpp.org/articles/parallel-distance-matrix/
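The Rcpp Gallery article splits the rows of the output matrix across threads using RcppParallel. The same row-splitting idea can be sketched in plain C++11 with `std::thread`; this is not RcppParallel's API, and `Point`, `haversine_miles`, `fill_rows`, and `distance_matrix_parallel` are hypothetical names. The sketch assumes a spherical Earth with a mean radius of 3958.8 miles:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Point {
    double lat;  // degrees
    double lon;  // degrees
};

// Haversine great-circle distance in miles (spherical Earth assumed).
double haversine_miles(const Point& p, const Point& q) {
    const double kPi = 3.14159265358979323846;
    const double kEarthRadiusMiles = 3958.8;
    const double to_rad = kPi / 180.0;
    const double s_lat = std::sin((q.lat - p.lat) * to_rad / 2.0);
    const double s_lon = std::sin((q.lon - p.lon) * to_rad / 2.0);
    const double h = s_lat * s_lat + std::cos(p.lat * to_rad) *
                                         std::cos(q.lat * to_rad) * s_lon * s_lon;
    // Clamp so rounding can never push asin's argument above 1.
    return 2.0 * kEarthRadiusMiles * std::asin(std::sqrt(std::min(1.0, h)));
}

// Worker: fills rows [begin, end) of the n x m row-major output.
// Each call writes only its own rows, so threads never share an element.
void fill_rows(const std::vector<Point>& users, const std::vector<Point>& sites,
               std::vector<double>& out, std::size_t begin, std::size_t end) {
    assert(end <= users.size());
    const std::size_t m = sites.size();
    for (std::size_t i = begin; i < end; ++i)
        for (std::size_t j = 0; j < m; ++j)
            out[i * m + j] = haversine_miles(users[i], sites[j]);
}

// Splits the rows into contiguous chunks, one std::thread per chunk.
std::vector<double> distance_matrix_parallel(const std::vector<Point>& users,
                                             const std::vector<Point>& sites,
                                             unsigned n_threads) {
    const std::size_t n = users.size();
    std::vector<double> out(n * sites.size());
    if (n_threads == 0) n_threads = 1;
    const std::size_t chunk = (n + n_threads - 1) / n_threads;
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back(fill_rows, std::cref(users), std::cref(sites),
                          std::ref(out), begin, end);
    }
    for (std::thread& t : pool) t.join();
    return out;
}
```

Each thread writes only its own block of rows, so no locking is needed, which is what makes this problem embarrassingly parallel. In practice `n_threads` could be taken from `std::thread::hardware_concurrency()`.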
Heya @mpadge, does
Nup, coz it's all in C and dependency free, but what you could do is pilfer code from
Ah fair enough! Thanks for the links, I'll dig more into this when I get to the 0.4.0 release.
You could close this, no? Non-parallel
Yeah, let's close this, but perhaps it would be useful to have a "wishlist" of things, and speed could be one of them? Although, to be honest, I'm pretty sure the biggest bottleneck is actually building the design matrix.
I see you closed this issue, but I have written RcppParallel implementations that are very similar to your distance_matrix_cpp and nearest_facility_dist. My nearest "facility" function only returns the identifier and not the distance, but I could most likely modify it to do that. I also gave it an extra parameter that restricts the search to sites within X miles, so any sites farther away than that are ignored. The parallel version is about 5.8 times faster on my machine, but I'm not well versed in high performance computing, so who knows if it could be improved beyond that.
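A rough sketch of the "nearest site within X miles" idea described above, again using plain `std::thread` rather than RcppParallel. It is not the code from the pargeodist repository; `Point`, `Nearest`, `haversine_miles`, `nearest_rows`, and `nearest_within_parallel` are hypothetical names, and a spherical Earth of radius 3958.8 miles is assumed. It returns both the index and the distance, with index `-1` when no site lies within `max_miles`:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <thread>
#include <vector>

struct Point { double lat; double lon; };  // degrees

// Closest in-range site for one user, or index -1 (infinite distance)
// when no site is within the radius.
struct Nearest {
    long index;
    double miles;
};

// Haversine great-circle distance in miles (spherical Earth assumed).
double haversine_miles(const Point& p, const Point& q) {
    const double kPi = 3.14159265358979323846;
    const double to_rad = kPi / 180.0;
    const double s_lat = std::sin((q.lat - p.lat) * to_rad / 2.0);
    const double s_lon = std::sin((q.lon - p.lon) * to_rad / 2.0);
    const double h = s_lat * s_lat + std::cos(p.lat * to_rad) *
                                         std::cos(q.lat * to_rad) * s_lon * s_lon;
    return 2.0 * 3958.8 * std::asin(std::sqrt(std::min(1.0, h)));
}

// Worker: nearest in-range site for users [begin, end).
void nearest_rows(const std::vector<Point>& users, const std::vector<Point>& sites,
                  double max_miles, std::vector<Nearest>& out,
                  std::size_t begin, std::size_t end) {
    assert(end <= users.size());
    for (std::size_t i = begin; i < end; ++i) {
        Nearest best{-1, std::numeric_limits<double>::infinity()};
        for (std::size_t j = 0; j < sites.size(); ++j) {
            const double d = haversine_miles(users[i], sites[j]);
            // Skip sites beyond the cutoff; otherwise keep the closest so far.
            if (d <= max_miles && d < best.miles) {
                best.index = static_cast<long>(j);
                best.miles = d;
            }
        }
        out[i] = best;  // each user's slot is written by exactly one thread
    }
}

// Splits the users into contiguous chunks, one std::thread per chunk.
std::vector<Nearest> nearest_within_parallel(const std::vector<Point>& users,
                                             const std::vector<Point>& sites,
                                             double max_miles, unsigned n_threads) {
    const std::size_t n = users.size();
    std::vector<Nearest> out(n);
    if (n_threads == 0) n_threads = 1;
    const std::size_t chunk = (n + n_threads - 1) / n_threads;
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back(nearest_rows, std::cref(users), std::cref(sites),
                          max_miles, std::ref(out), begin, end);
    }
    for (std::thread& t : pool) t.join();
    return out;
}
```

Unlike a full distance matrix, this keeps only one result per user, so memory stays proportional to the number of users rather than users times sites.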
Here is a proof of concept with the C++ code included: https://github.com/mkuehn10/pargeodist
Wow @mkuehn10, that looks amazing, thank you so much for writing and sharing it! It would be great to include in maxcovr, but in the future I'm looking to incorporate geodist to calculate distances, which would make that part of the package easier to maintain as well as faster, since geodist offers options such as cheap ruler, haversine, and so on. That said, I feel like a parallel version of geodist would be a very valuable contribution, but I'm not sure whether it should be its own package or be rolled into geodist; that would be up to @mpadge to decide. Looking at the benchmarks you have provided, there is about a 4x speedup with the parallel version, which seems pretty amazing. Cross ref: hypertidy/geodist#16
Thank you. I'm surprised I hadn't found these packages until yesterday (in retrospect, the whole reason I went down the parallel path is that I couldn't find anything fast enough for my use case). I have access to a fairly beefy computer (something like 72 cores and a ton of memory), so it can melt through some fairly large distance calculations. I like having options, and it wasn't really a premature optimization in this case: with a solution implemented entirely in R, I was sitting there waiting 20-30 minutes for some calculations to complete, and after I implemented the parallel version they took something like 45 seconds. I'm happy to (try to) make some pull requests, or just to inspire someone else to write better parallel code.
Also, regarding the 4x speedup: that would/should scale with the number of cores on whatever system you are using. I see about an 8-9x difference running it on a much beefier machine with much larger matrices. This was when n = 15000.
The following functions have embarrassingly parallel row-wise and column-wise operations
Perhaps RcppParallel can do something here.
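As a sketch of what row-wise and column-wise parallelism could look like, assuming a row-major `double` matrix, here is a hypothetical `parallel_for` helper loosely modeled on the idea behind RcppParallel's `parallelFor` (it does not reproduce that function's signature), used to compute row minima and column minima. `parallel_for`, `row_min`, and `col_min` are illustrative names only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <thread>
#include <vector>

// Runs body(i) for every i in [0, n), splitting the range into contiguous
// chunks with one std::thread per chunk. body(i) must only write to output
// owned by index i, so that no two threads touch the same element.
void parallel_for(std::size_t n, unsigned n_threads,
                  const std::function<void(std::size_t)>& body) {
    if (n_threads == 0) n_threads = 1;
    const std::size_t chunk = (n + n_threads - 1) / n_threads;
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back([&body, begin, end] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& t : pool) t.join();
}

// Row-wise minimum of an n_rows x n_cols row-major matrix:
// each task owns one row and one output slot.
std::vector<double> row_min(const std::vector<double>& m, std::size_t n_rows,
                            std::size_t n_cols, unsigned n_threads) {
    assert(m.size() == n_rows * n_cols);
    std::vector<double> out(n_rows, std::numeric_limits<double>::infinity());
    parallel_for(n_rows, n_threads, [&](std::size_t i) {
        for (std::size_t j = 0; j < n_cols; ++j)
            out[i] = std::min(out[i], m[i * n_cols + j]);
    });
    return out;
}

// Column-wise minimum: each task owns one column and one output slot.
std::vector<double> col_min(const std::vector<double>& m, std::size_t n_rows,
                            std::size_t n_cols, unsigned n_threads) {
    assert(m.size() == n_rows * n_cols);
    std::vector<double> out(n_cols, std::numeric_limits<double>::infinity());
    parallel_for(n_cols, n_threads, [&](std::size_t j) {
        for (std::size_t i = 0; i < n_rows; ++i)
            out[j] = std::min(out[j], m[i * n_cols + j]);
    });
    return out;
}
```

R stores matrices column-major, so an Rcpp version of this sketch would index element (i, j) as `m[i + j * n_rows]` instead of `m[i * n_cols + j]`.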