-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRAN submission! #1
Comments
The package is in a reasonable state, although I've not put it through its paces as much as I should. Also, there's some RcppParallel stuff in there that might not work right and might need to just be removed. The Python wrappers allow for searching by label, which I haven't implemented. Not sure how much demand there is for that. Apart from that, this could be submitted to CRAN, if you think it adds some value over Annoy. I haven't done much benchmarking to see if the difference is large once you no longer have access to platform-specific code. And thank you for the micro code-review! |
I'm not too worried about the AVX stuff. In fact, for Annoy, I explicitly turned off the vectorization to avoid debugging headaches due to platform-to-platform differences in compilation. (As you now, vectorization performs additions in a different order, so the distances would be slightly different due to round-off error at each addition - a fact that is probably more pronounced when using Annoy's plenty fast without vectorization, and I imagine that I would reach the same conclusion for HNSW. I don't know what the head-to-head is like, though - if these graphs are to be believed, HNSW dominates Annoy in all tested data sets, though this probably depends on the data. |
Putting my money where my mouth is: https://github.com/LTLA/BiocNeighbors/tree/hnsw. |
@LTLA, once again thank you for the pull requests. Before proceeding with CRAN submission, I plan to spend some quality time with at least some of the HDF5 datasets that ann-benchmarks offer. This has already resulted in some extra refactoring. If there are no unpleasant surprises we are probably good to go. |
@jlmelville, I was wondering whether it would be a good idea to push your portability modifications upstream to the original hnswlib repo. I can't imagine we're the only people running into this. (Also, it would reduce divergence of your code from hnswlib, which would make it easier to update RcppHNSW when upstream changes occur.) Would @yurymalkov be willing to entertain a PR? For full portability, I would guess that we would also need |
Yeah, I have toyed with the idea a few times, but didn't want to shift the maintenance burden upstream if no-one else was going to care. I should probably put it out there anyway. If you have the time, would you mind pointing to line numbers that you think still need looking at? |
Went one better and made #3. I haven't run this through a build system yet, but I think I caught all of the non-standard C++ lying around. I don't think it would be particularly selfish of us to pass the changes upstream, a problem shared is a problem halved and all that. |
I noticed that the default |
No reason other than when I realised I had failed to differentiate between I have yet to come to any conclusions about parameter settings. I was able to run through four of the ann-benchmarks datasets without any problems, but it's a lot slower than Annoy (which seems counter to the benchmarks results). I assume this is because I am running this package via As the package has been through a lot of churn over the past couple of days I haven't got round to installing a non-dev version via |
On the subject of pushing the preprocessor additions up to HNSW itself:
|
|
Ok, I will deal with the PR. Thanks for the info. I will at least test a few combinations of the compiler flags before submitting upstream. |
@LTLA, how are the
to:
and I get the warning:
|
Both of these However, I'm not sure how safe it is to set the |
Thanks for the clarification. I tried combining |
What compiler error did you get? I go through fine when I set |
My mistake. I got the error with |
Might be a good idea to remove line 174 in |
PR has been submitted at: nmslib/hnswlib#85 -- Travis CI failed however in the python bindings test. Any ideas on that one? |
Looking at it now. I don't have much python binding experience, but let's see. |
Well, I would look at it, if I could get Python 3 to install... argh. |
Huh. |
I think I know where this comes from. The statements in self.assertAlmostEqual(diff_with_gt_labels,0,1e-4) ... are probably incorrect, as the third argument should be an integer according to this. This is consistent with the error message observed in the latest Travis build. Now, the question is why this causes problems here and not in the I'm not sure why the situation changed with the addition of our macros, but it seems that the arguments being tested are floats that weren't meant to be exactly equal anyway... |
PR has been checked into nmslib/hnswlib master and I have synced back up. Onwards. I should probably document how to use the package with the data provided by ann-benchmarks, but I don't know how useful it would be as a vignette as none of the code should be automatically run. |
Is it time to finish the fight? What else do we need - a vignette, perhaps? I can whip one up from your example in the |
If you can think of a vignette that will be useful, go for it. I'm not a huge fan of vignettes which don't add much beyond simple examples, or where the code isn't being run because it takes too long (which would have happened with anything to do with ann-benchmarks) and my inspiration has run dry. I am waiting on:
|
Okay. Not having a vignette is fine as well; I'm just used to it for BioC submissions. Reading nmslib/hnswlib#87 reminds me how lucky I am to not have to mess around with pointer casts. |
nmslib/hnswlib#87 is fixed. I will aim to submit sometime after the 14th January. |
Nudge. (Let me know if there's anything I can help with.) |
I uploaded the submission to CRAN yesterday. Will update in the face of glorious victory/crushing defeat. |
We're in: https://cran.r-project.org/package=RcppHNSW. Thank you for your help @LTLA. |
Great work. Immediate rev-dep from BiocNeighbors as soon as it builds! |
Well, it's built, so you can see the spanking new dependency on the BiocNeighbors page. It's a shame that CRAN doesn't track BioC reverse dependencies on the RcppHNSW landing page, but whatever. What's the story behind |
It looks like Where are the |
Sorry, I was referring to http://bioconductor.org/checkResults/devel/bioc-LATEST/BiocNeighbors/tokay2-checksrc.html Bit of a head-scratcher, as it's the same C++ code, yet RcppHNSW checks fine but BiocNeighbors falls apart. Maybe it's something to do with the lack of RcppHNSW Windows binaries from CRAN. |
RcppHNSW binaries just got released on CRAN - let's see how the next BioC build goes for BIocNeighbors. I was hoping you'd say something like "oh yeah, RcppHNSW didn't compile on Windows until I added Sigh - another round of head-banging-on-wall action to debug the Windows build... it's not the first time, and it probably won't be the last. |
Gah! Turned out to be a memory overrun on my end. Whoops. The other test failure seems to be due to some differences between R's But this is all fixed now, and the panel is all green: http://bioconductor.org/checkResults/devel/bioc-LATEST/BiocNeighbors ... so let the party begin! |
very minor changes to make it run on Windows.
I'm also looking forward to a CRAN submission for this package; it'll allow me to give Annoy some company in the "approximate NN methods" section of BiocNeighbors. It looks pretty ready to go - is there anything else that needs to be done?
Also, insofar as you're using C++11, you could replace
&dv[0]
withdv.data()
.The text was updated successfully, but these errors were encountered: