Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations to nearest neighbor #35

Merged
merged 3 commits into from
May 25, 2020
Merged

Conversation

aschampion
Copy link
Contributor

Thanks for the great library! A few optimizations for our use case of querying hundreds of thousands of nodes in large (10-100K node) trees:

  • Inline annotation on point indexing: surprised this is not happening automatically, but I receive consistent ~5% speedup with these.
  • Optimizing min_max_dist_2 to use O(P::DIMENSIONS) multiplications rather than O(P::DIMENSIONS^2) multiplications. This doesn't do much for 2-D, but yields 15-20% speedup for our 3-D cases. Note that due to order of floating point operations, the result of the function is not identical to the previous version. If this is important, I have a slightly less elegant but same O() version I can substitute that gives identical results to the old version.
  • Removing priority queue allocations from single nearest neighbor queries. When making many nearest neighbor queries, this has a large effect. For us, combined with the other optimizations this has a 1.5-1.8x speedup. It's also important for the performance of a follow-up PR I have go doing tree-to-tree all nearest neighbor queries.

Change this from O(P::DIMENSIONS^2) multiplications to
O(P::DIMENISIONS). Note this is not identical to previous results due to
floating precision.
Use a stack based binary heap by default that only allocates on the heap
after its stack size has been exhausted. This yields up to 1.3x speedup
for repeated lookups in large trees.
@codecov-io
Copy link

Codecov Report

Merging #35 into master will decrease coverage by 0.64%.
The diff coverage is 74.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #35      +/-   ##
==========================================
- Coverage   98.22%   97.57%   -0.65%     
==========================================
  Files          18       17       -1     
  Lines        1237     1239       +2     
==========================================
- Hits         1215     1209       -6     
- Misses         22       30       +8     
Impacted Files Coverage Δ
rstar/src/point.rs 100.00% <ø> (ø)
rstar/src/algorithm/nearest_neighbor.rs 91.11% <60.86%> (-5.35%) ⬇️
rstar/src/aabb.rs 100.00% <100.00%> (ø)
rstar/src/params.rs 84.61% <0.00%> (-1.10%) ⬇️
...ar/src/algorithm/bulk_load/bulk_load_sequential.rs 96.77% <0.00%> (-0.11%) ⬇️
rstar/src/rtree.rs 98.57% <0.00%> (-0.03%) ⬇️
rstar/src/algorithm/rstar.rs 96.75% <0.00%> (-0.02%) ⬇️
rstar/src/algorithm/removal.rs 98.78% <0.00%> (-0.02%) ⬇️
rstar/src/algorithm/selection_functions.rs 100.00% <0.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34dbac5...4e69af8. Read the comment docs.

@Stoeoef
Copy link
Collaborator

Stoeoef commented May 25, 2020

Thanks a lot for your work on this!

I like all the changes, the performance benefits speak for themselves. I wish there was a "SmallHeap" package that would not require to implement the heap "on the fly", but we'll do what we have to do!.

Thanks a lot for upstreaming this!
Version 0.8.0 just got released, containing these changes.

aschampion added a commit to aschampion/rstar that referenced this pull request Jun 18, 2020
PR georust#35 changed the structure of floating point operations in this, which
leads to precision issues and inconsistency with distance_2. This fixes
the structure so that it is identical with results of the previous
implementation of min_max_dist_2.
aschampion added a commit to aschampion/rstar that referenced this pull request Jun 22, 2020
PR georust#35 changed the structure of floating point operations in this, which
leads to precision issues and inconsistency with distance_2. This fixes
the structure so that it is identical with results of the previous
implementation of min_max_dist_2.
aschampion added a commit to aschampion/rstar that referenced this pull request Apr 29, 2021
PR georust#35 changed the structure of floating point operations in this, which
leads to precision issues and inconsistency with distance_2. This makes
the order of floating point operations in min_max_dist_2 identical to
length_2, particularly adding squares in the same dimension order. This
resolves issues with discrepencies in distances due to non-associativity
of floating point operations.
bors bot added a commit that referenced this pull request Apr 29, 2021
40: Aabb: fix min_max_dist_2 consistency with distance_2 r=urschrei a=aschampion

PR #35 changed the structure of floating point operations in this, which
leads to precision issues and inconsistency with distance_2. This fixes
the structure so that it is identical with results of the previous
implementation of min_max_dist_2.

Performance difference is negligible.

These floating point differences compounded to significant changes in our data, so this is worth making consistent. This is also necessary for correctness of optimizations in a following PR.

Co-authored-by: Andrew Champion <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants