Optimize EndianReader to use ptr/len #302
Conversation
Looks great! A couple nitpicks inline below. Thanks @philipc !
src/endian_reader.rs (outdated):

 impl<Endian, T> EndianReader<Endian, T>
 where
     Endian: Endianity,
-    T: Deref<Target = [u8]> + Clone + Debug,
+    T: CloneStableDeref<Target = [u8]> + Clone + Debug,
We don't need the `Clone` trait bound anymore for any of these, because `CloneStableDeref` has `Clone` as a supertrait.
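A minimal sketch of why the extra bound is redundant. The trait definitions here are hypothetical stand-ins for the `stable_deref_trait` crate's traits (the real ones are `unsafe` and carry safety contracts), but the supertrait relationship is the same: requiring `CloneStableDeref` already gives you `Clone`.

```rust
use std::fmt::Debug;
use std::ops::Deref;
use std::rc::Rc;

// Stand-ins for `stable_deref_trait::{StableDeref, CloneStableDeref}`.
trait StableDeref: Deref {}
trait CloneStableDeref: StableDeref + Clone {}

impl StableDeref for Rc<[u8]> {}
impl CloneStableDeref for Rc<[u8]> {}

// No `+ Clone` needed: the `CloneStableDeref` bound already implies it.
fn duplicate<T>(value: &T) -> T
where
    T: CloneStableDeref<Target = [u8]> + Debug,
{
    value.clone()
}

fn main() {
    let bytes: Rc<[u8]> = Rc::from(&[1u8, 2, 3][..]);
    let copy = duplicate(&bytes);
    assert_eq!(&*copy, &[1, 2, 3]);
}
```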
@@ -143,59 +149,77 @@
 // `self.endian`. Splitting the sub-range out from the endian lets us work
 // around this, making it so that only the `self.range` borrow is held active,
 // not all of `self`.
+//
+// This also serves to encapsulate the unsafe code concerning `CloneStableDeref`.
Maybe also comment about how we must keep a handle to `bytes` around, since the `ptr` is only valid while we still hold the `bytes` (e.g. because holding the `bytes` holds a refcount for us).
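The invariant being asked for can be sketched like this. The struct below is illustrative, not gimli's actual `SubRange`: the raw `ptr` points into the allocation that `bytes` owns, so the reader must hold its `bytes` handle (a refcount) for as long as `ptr` is used.

```rust
use std::rc::Rc;

// `bytes` is held purely to keep the allocation alive; it is never read
// directly, which is exactly the ownership pattern under discussion.
#[allow(dead_code)]
struct SubRange {
    bytes: Rc<[u8]>, // holding this keeps a refcount, so the allocation stays live
    ptr: *const u8,  // points into `bytes`; dangles if the last handle is dropped
    len: usize,
}

impl SubRange {
    fn new(bytes: Rc<[u8]>) -> SubRange {
        let ptr = bytes.as_ptr();
        let len = bytes.len();
        SubRange { bytes, ptr, len }
    }

    fn slice(&self) -> &[u8] {
        // Sound only because `self.bytes` still owns a refcount on the allocation.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

fn main() {
    let range = SubRange::new(Rc::from(&b"DWARF"[..]));
    assert_eq!(range.slice(), b"DWARF");
}
```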
 r.range.end = r.range.start + idx.end;
 r.range.start += idx.start;
 assert!(r.range.start <= r.range.end);
 assert!(r.range.end <= self.range.end);
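A small model of the narrowing arithmetic above (the function name is illustrative): `idx` is interpreted relative to the current range, and the asserts reject any sub-range that would escape the parent's bounds.

```rust
use std::ops::Range;

// Mirror of the diff's arithmetic: translate a relative `idx` into an
// absolute range, then bounds-check it against the parent.
fn narrow(parent: &Range<usize>, idx: Range<usize>) -> Range<usize> {
    let end = parent.start + idx.end;
    let start = parent.start + idx.start;
    assert!(start <= end);
    assert!(end <= parent.end);
    start..end
}

fn main() {
    // Relative 5..15 inside absolute 10..30 yields absolute 15..25.
    assert_eq!(narrow(&(10..30), 5..15), 15..25);
}
```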
It is nice to have all this encapsulated.
Interesting that our benches didn't show any speed up but the larger program did. I wonder if our benches are small enough that the caches are hiding some of the indirection costs for us here.
This requires `stable_deref_trait::CloneStableDeref` for the bytes. If needed in future, we could relax that to `stable_deref_trait::StableDeref` by implementing a `Clone` for `SubRange` that recalculates the `ptr`. This doesn't show any significant change in the gimli benchmarks, but I have modified `addr2line` to use `Rc<Cow<[u8]>>`, and it shows about a 15% improvement for all operations there (both context creation and lookup).
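The relaxation mentioned above can be sketched as follows. This is an illustrative `SubRange`, not gimli's actual one: if we only had `StableDeref`, cloning the bytes would not be guaranteed to preserve the deref address, so a hand-written `Clone` re-derives `ptr` from the clone's own buffer instead of copying the raw pointer.

```rust
use std::rc::Rc;

struct SubRange {
    bytes: Rc<[u8]>,
    ptr: *const u8,
    len: usize,
}

impl SubRange {
    fn new(bytes: Rc<[u8]>) -> SubRange {
        let ptr = bytes.as_ptr();
        let len = bytes.len();
        SubRange { bytes, ptr, len }
    }

    fn slice(&self) -> &[u8] {
        // Sound because `self.bytes` keeps the allocation alive.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

// Without `CloneStableDeref`, recompute `ptr` from the cloned handle rather
// than assuming the old pointer is valid for the clone.
impl Clone for SubRange {
    fn clone(&self) -> SubRange {
        let offset = self.ptr as usize - self.bytes.as_ptr() as usize;
        let bytes = Rc::clone(&self.bytes);
        let ptr = unsafe { bytes.as_ptr().add(offset) };
        SubRange { bytes, ptr, len: self.len }
    }
}

fn main() {
    let r = SubRange::new(Rc::from(&b"hello"[..]));
    assert_eq!(r.clone().slice(), b"hello");
}
```

(With `Rc` specifically the deref address is stable across clones anyway, which is why the PR can take the simpler `CloneStableDeref` route.)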
Force-pushed from 8fc936e to f4a43c2.
`perf` reports that a large chunk of the time in the benches is spent in a move and a `drop_in_place`, and this didn't change due to this PR. This PR did make the code smaller and remove some slice bounds checks, but they weren't being reported as hot. That move and drop aren't present in the …