-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of extend_from_slice
where T: Copy
#235
Comments
@zslayton Side question: What did you use to produce the visualization of assembly branch structure above? It's really nice! Please note, we actually saw a very slight performance degredation from #229 in OXC oxc-project/oxc#2417. I believe this is due to OXC only dealing with really short strings, where copying byte-by-byte (presumably inlined) is actually faster than a call to |
I used Cutter, an open source reverse engineering tool. It's very helpful!
That's good food for thought, thank you for raising it--I'll defer to @fitzgen as to whether it needs to be addressed. My use case is a mix of short strings and byte arrays that are anywhere from empty to several kilobytes in typical scenarios. The change in #236 was a very large performance increase in all of my existing benchmarks, but I didn't closely examine the performance of small slices. |
@overlookmotel I ended up extending the benchmark to test a variety of input sizes. You can see the results here. |
This was fixed by #236. |
When using a
Vec<'_, u8>
, I was surprised to see thatmy_vec.extend_from_slice("SUCCESS".as_bytes());
produced assembly that looks like this onx86_64
:(mac OS 14.3 Sonoma, Intel processor,
rustc
v1.76.0)Notice that it's performing a reserve, error check, and push for each byte in the input slice. I had expected LLVM to reduce this to something like a single reserve, error check, and
memcpy
for the entire slice.I'd like to contribute a change analogous to the one recently merged in #229 but for
&[u8]
instead of&str
. Before I get started, I wanted to check whether 1) you'd be interested in merging such an optimization and 2) what the API should look like (since we don't have specialization to customizeextend_from_slice
forVec<'_, u8>
).The text was updated successfully, but these errors were encountered: