-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid allocating before we have checked if it is in the cache + don't rehash subtree #59
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is clever, especially since the builder API already has the limit of 3 children for what it deduplicates. I should probably port this optimization over to sorbus as well.
@simonvandel tangentially related: would you mind testing #58 to see what impact it has? I'm working on an experimental(ish) rewrite of rowan's green trees and don't have the experience to actually measure the impact properly here on Windows. |
@CAD97 for benchmarking I simply ran hyperfine with a release build with and without the changes applied. It’s probably easier if you just run that locally yourself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to fix the cache to deal with hash collisions properly, but otherwise this looks like a clear positive on all other factors.
Hi @CAD97 I propose landing this in the current state. As a follow-up PR, I would like to experiment with eliminating the need for the SmallVec collection. Essentially I would like to compute the hash of kind + children pointers by just iterating through the input children iterator. That probably requires using the raw_entry api to avoid the hashmap key needing to be owned. Is it possible to iterate the children iterator twice? One time for building the hash, and then potentially a second time in case the node did not exist in the cache, and therefore needs to be allocated. |
As I mentioned on zulip, I'm concerned now that this as currently implemented will noticeably lower the number of cache hits, as two nodes are only considered deduplicatable by the current test if all children were deduplicated (and thus are the same identity). Obviously, skipping the heap allocation in the common case is beneficial, as this patch is still positive. But before we do anything other than that one small step, I really would like to see some measurement on heuristics for what nodes are cachable as discussed on zulip. |
Not while we take [ (Apologies: I'm going to use sorbus's names for things below because I'm looking at sorbus right now and don't remember exactly how sorbus's builder names map back into rowan's.) The tricky part is that a key part of how The thing is, all of these types offer a Off I go to try to make a crate to define an iterator trait for "iterables that can be iterated non-destructively by-ref and then destructively by-val". It won't be perfect because I think the optimal definition wants GAT, but I think I might be able to make something work. EDIT: wasn't able to figure anything out, and |
Hmm, I don't think we can land this as an improvement before we can reuse the allocation of the children iterator passed into |
45: Remove allocation for builder cache hits r=CAD97 a=CAD97 Credit to @simonvandel [over at the main rowan repository](rust-analyzer/rowan#59) for pointing out how this is possible. Previously: #40 Now that the necessary APIs for this are riding the trains to stable, here's an actual decent implementation of alloc-free cache hits! Also adds a bit more explicit typing on pointers, as that led to a bug in the construction of this patch. Co-authored-by: CAD97 <[email protected]>
We now check if the node/token is in the cache. If it is not in the cache, then we allocate.
In addition, avoid hashing the complete sub-tree.
This speeds up analysis-stats on rust-analyser by ~5% on my local machine.
Benchmark
➜ rust-analyzer-base git:(master) hyperfine --warmup 1 --min-runs=3 '/home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats .' '/home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats .'
Benchmark #1: /home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats .
Time (mean ± σ): 46.027 s ± 0.419 s [User: 45.120 s, System: 0.785 s]
Range (min … max): 45.551 s … 46.343 s 3 runs
Benchmark #2: /home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats .
Time (mean ± σ): 48.445 s ± 0.457 s [User: 47.657 s, System: 0.715 s]
Range (min … max): 47.978 s … 48.892 s 3 runs
Summary
'/home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats .' ran
1.05 ± 0.01 times faster than '/home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats .'