-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start Rcpp branch #16
Comments
If this helps, here is the current time comparison for haversines distances for geodist and the distance functions in maxcovr, library(geodist)
library(maxcovr)
n <- 500
x <- cbind (-10 + 20 * runif (n), -10 + 20 * runif (n))
y <- cbind (-10 + 20 * runif (2 * n), -10 + 20 * runif (2 * n))
colnames (x) <- colnames (y) <- c ("x", "y")
mb1 <- microbenchmark::microbenchmark(
"geodist" = geodist(x,y, measure = "haversine"),
"maxcovr" = distance_matrix_cpp(y, x)
)
mb1
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> geodist 25.62657 28.69873 30.82551 30.6244 32.65784 46.65979 100
#> maxcovr 31.89214 34.29572 36.62601 36.2090 38.45201 44.87097 100
microbenchmark::autoplot.microbenchmark(mb1)
#> Loading required namespace: ggplot2
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one. Created on 2018-07-20 by the reprex package (v0.2.0). As a side note, it looks like maxcovr has a slightly different implementation of haversines formula to geodist, so their results are not identical. |
I just recently discovered this package as I sometimes have the need to calculate largish distance matrices for spatial data (to examine the closeness of "customers" to "sites"). I've written some hacky code using Rcpp and RcppParallel which seems to work fairly well and quickly. I did some comparisons to the geodist implementation. I don't know if there's any interest in trying to clean that up and add to this package or to some other package. I couldn't really find anything when I was trying to solve this problem several months ago, so I just hacked something together. I use it now in several Shiny apps to show interactive maps that calculate the distance matrices behind the scenes. And now I am looking at your maxcovr package (@njtierney) you made and that seems to be something I would have liked to have seen earlier. :) This isn't really reproducible since you don't have access to my rcpp_parallel_distm_C function (I'm pretty good at naming things...). I'd be happy to eventually share that code, but I guarantee it can probably be cleaned up and improved, so I'm slightly embarrassed to provide it at the moment!
|
Here is a proof of concept with the C++ code included: https://github.com/mkuehn10/pargeodist |
Thanks for sharing @mkuehn10 ! I'll point to another thread I wrote about here: njtierney/maxcovr#20 |
Thanks @mkuehn10 for the input here and there. My initial thought in designing this package was that distance calculation would generally be a small part of larger workflows which would easily be parallelized on the R side of things, so there was no need to parallelize the actual distance calculations within this package. An additional design aim of That said, the recent benchmarks of @symbolixAU suggest |
@mkuehn10 I've just put a comment in @symbolixAU's benchmarks that shows how Note also that the official WSG84 definitions bundled in |
This is now officially a non-trivial issue, because the benchmark clearly shows that the C code in indeed quite a bit faster than C++, and so ... The QuestionShould C code be converted to (in this case) demonstrably less efficient C++ form in order to allow regains in efficiency through parallelization? I'm tempted to answer "No", simply because the extra layer of complexity introduced in quite significant, but then again it would allow quite a boost for large calculations, so I'm open to other possibilities ... |
I think that the extra gains are worthwhile since they scale well - but perhaps it should be tested in either 1. another branch, or 2. a separate package. The upshot of a separate package is that you can then keep |
I guess the only other thing I can think of re performance is if there might be any gain in writing it in fortran - but from what I understand the interface to FORTRAN isn't exactly amazing? |
Now that would be an interesting idea ... but nah, I do not want to dig up my fortran skills (had to do a tiny bit of that back in my maths degree time). I doubt that anything would be faster than C. If i were to do it in anything else, it'd be rust, but even then ... nah, easier just to leave it. Speed ups via #22 are already in some use cases pretty significant. I'll still nevertheless do this Rcpp branch to investigate the original purpose, which was compilation times. |
Here is a basic package that I created that implements only the distance matrix functionality for now and only does the haversine and vincenty distances. I aligned the calculations to the way geodist does them and I get identical matrices back from the geodist function and the parallel version now. |
I came across this repo through a StackOverflow question, and I figured I'd give my 2 cents regarding parallelization. I've had some experience with this in my @mkuehn10 if I'm not mistaken, the way you're calling How you parallelize calculations is closely related to the type of output you want. In my package I implemented pairwise, symmetric and asymmetric calculations. For the symmetric case, I didn't parallelize based on rows or columns, I considered the whole lower triangular as a long vector and divided it into chunks for each thread. I have different distance-matrix fillers for different operations. You can see an example of this logic in this answer I gave on SO. For my use case, parallelization was very useful, and scaled even linearly in the number of cores for large matrices and some distances. My package includes some benchmarks, which you could see here if you like. AFAIK, you could also parallelize with something like OpenMP to avoid C++ dependencies, but with that I have no experience. I believe the code can stay the same, and will be parallelized when the user has OpenMP support and stay sequential otherwise. |
I have tried adjusting the grain size, and it only slows down the process. I thought it would actually help, but it did not seem to. I also think 1 is the default, and that's only left over from when I was playing around with that setting. Otherwise, I wouldn't even be putting that in the call to parallelFor. https://rcppcore.github.io/RcppParallel/#tuning For the distance matrix, you don't need larger chunks unless I'm mistaken. I'm no expert, so everything you said could be theoretically true, but I have only seen the speedups by doing what I've currently done. I currently have this method in use in several Shiny apps that I've deployed at work and it definitely makes a difference. I started working on a project over a year ago that required some large distance calculations, and I couldn't find anything that helped, so I tinkered around and wrote my own. I finally decided to just share what I had since it seems to work very quickly. I am trying to convert this in to a more formal package, so if you have specific feedback, please feel free to leave it here: https://github.com/mkuehn10/geodistpar |
I might have time to send you a quick PR. For now, here's a quick comparison. File // [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppParallel,RcppThread)]]
#include <cstddef> // size_t
#include <math.h> // sin, cos, sqrt, atan2, pow
#include <vector>
#include <RcppThread.h>
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace std;
using namespace Rcpp;
using namespace RcppParallel;
// single to double indices for lower triangular of matrices without diagonal
void s2d(const size_t id, const size_t nrow, size_t& i, size_t& j) {
// tmp to avoid ambiguity
double tmp = static_cast<double>(-8 * id + 4 * nrow * (nrow - 1) - 7);
j = nrow - 2 - static_cast<size_t>(sqrt(tmp) / 2 - 0.5);
i = id + j + 1 - nrow * (nrow - 1) / 2 + (nrow - j) * ((nrow - j) - 1) / 2;
}
class HaversineCalculator : public Worker
{
public:
HaversineCalculator(const NumericVector& lon, const NumericVector& lat, NumericMatrix& out)
: lon_(lon)
, lat_(lat)
, cos_lat_(lon.length())
, out_(out)
{
// terms for distance calculation
for (size_t i = 0; i < cos_lat_.size(); i++) {
cos_lat_[i] = cos(lat_[i] * 3.1415926535897 / 180);
}
}
void operator()(size_t begin, size_t end) {
double to_rad = 3.1415926535897 / 180;
size_t i, j;
for (size_t ind = begin; ind < end; ind++) {
if (RcppThread::isInterrupted(ind % static_cast<int>(1e5) == 0)) return;
s2d(ind, lon_.length(), i, j);
// haversine distance
double d_lon = (lon_[j] - lon_[i]) * to_rad;
double d_lat = (lat_[j] - lat_[i]) * to_rad;
double d_hav = pow(sin(d_lat / 2), 2) + cos_lat_[i] * cos_lat_[j] * pow(sin(d_lon / 2), 2);
if (d_hav > 1) d_hav = 1;
d_hav = 2 * atan2(sqrt(d_hav), sqrt(1 - d_hav)) * 6378137;
out_(i,j) = d_hav;
out_(j,i) = d_hav;
}
}
private:
const RVector<double> lon_;
const RVector<double> lat_;
vector<double> cos_lat_;
RMatrix<double> out_;
};
// [[Rcpp::export]]
SEXP haversine(const DataFrame& input, NumericMatrix output, const int nthreads) {
NumericVector lon = input["lon"];
NumericVector lat = input["lat"];
HaversineCalculator hc(lon, lat, output);
int size = lon.length() * (lon.length() - 1) / 2;
int grain = size / nthreads;
RcppParallel::parallelFor(0, size, hc, grain);
RcppThread::checkUserInterrupt();
return R_NilValue;
} Comparison with 4 cores: library(Rcpp)
library(RcppParallel)
library(RcppThread)
library(geodistpar)
Sys.setenv(PKG_CXXFLAGS = "-DRCPP_PARALLEL_USE_TBB=1")
sourceCpp("haversine.cpp")
df <- data.frame(lon=runif(1e3, -90, 90), lat=runif(1e3, -90, 90))
mat <- as.matrix(df)
library(microbenchmark)
foo <- function(df) {
out <- matrix(0.0, nrow(df), nrow(df))
haversine(df, out, RcppParallel::defaultNumThreads())
out
}
microbenchmark(times = 30L,
ans1 <- geodistpar(mat, mat, "haversine"),
ans2 <- foo(df))
Unit: milliseconds
expr min lq mean median uq max neval
ans1 <- geodistpar(mat, mat, "haversine") 54.48880 54.87569 59.30028 55.91926 60.62092 93.68114 30
ans2 <- foo(df) 19.29622 19.37289 21.23298 19.63739 23.43039 26.90884 30
all.equal(ans1, ans2)
TRUE |
Wow. Thanks. Definitely a bit beyond my capability or understanding at this time, but I will definitely take a look. |
Some additional considerations to keep in mind when deciding whether to use
|
Thanks @asardaes for the useful insights. I concur with the My current inkling is that parallelisation can and should be handled outside the core routines of In this light, I think the real resolution of this issue will come through comparing (1) a package-external parallelisation wrapper with (2) equivalent internal At the very least, I suspect that this issue ought to prompt a vignette on how to effectively parallelise |
Just to compare compilation times for Dirk
The text was updated successfully, but these errors were encountered: