-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ResourceArc::make_binary #487
Conversation
@josevalim FYI |
@filmor looks great! 👍 |
Btw, for completeness, I don't think this would work for my use case, so we would still need the low-level APIs. I am allocating multiple binaries from the same resource: Well, maybe you could allow a Vec of slices to be returned? But also note in the example above that I proceed to create sub binaries from the resource binary, and even if you changed this function to create multiple binaries, it would require me doing an additional pass on the data. In any case, I think this is still solid for a convenient, safer API. :) Even though I don't think my opinion is worth much here. |
Couldn't you just call |
@hansihe that's a chunked array, so |
This API is a really good idea by the way @filmor! |
@josevalim Ahh, I see what you mean |
How about an |
I think adding a We need to make the documentation for that function very scary. |
@filmor that would work but I may send a PR for |
Having the same signature helps to clarify the difference, IMO. It also makes it clear that the binary's lifetime should be connected to the referenced object.
What overhead are you referring to? Creating a |
Let’s merge this with the unsafe version and I can give it a try and benchmark. But last I checked it was measurable. :) |
I was actually able to reproduce the benchmark with the current code. Here is the current code: Here is the patched version: iter.rfold(acc, |acc, uncast_offset| {
let offset = *uncast_offset as NIF_TERM;
let term = if validity_iter.next_back().unwrap_or(true) {
unsafe {
rustler_sys::enif_make_sub_binary(
env_as_c_arg,
binary_as_c_arg,
offset,
last_offset - offset,
)
}
} else {
nil_atom.as_c_arg()
};
last_offset = offset;
unsafe { list::make_list_cell(env_as_c_arg, term, acc) }
}) Here are the benchmark numbers:
I have also removed the bound checks, to see if they are the root cause, and no meaningful difference in the numbers:
And for completeness, here is the benchmark: alias Explorer.Series, as: S
strings =
for _ <- 1..1000 do
:crypto.strong_rand_bytes(8) |> Base.encode64()
end
series = S.from_list(strings)
Benchee.run(
%{"new" => fn -> S.to_list(series) end},
time: 5
) In a nutshell, each call to |
Well, we have this pending PR to introduce inlining, maybe that does it already? |
@filmor I have just tried the PR, it actually made things slower?
My understanding from other languages is that sometimes inlining can cause negative impact, by making the code larger so it doesn't fit the cache or by making code reuse harder (and then purged more frequently). I am not sure how much this applies to Rust. |
I looked into this a little bit, and I think the main difference in performance between using You can see that As far as I can see, we only really use the |
@hansihe nice find! We could apply the same idea to this PR as well. There is no reason to inspect it if we have the term, pointer and size upfront. |
Made a PR for the change |
@hansihe awesome! It brings the benchmark up to:
|
True, will do that later today. We just need to add an additional unsafe private constructor to |
I think the remaining overhead might be code bloat from panic handling for when Introducing and using a |
e1c9add
to
fc9a709
Compare
fc9a709
to
4c92d8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me in general!
Would want some additional info in the docs for the unsafe variant
@josevalim Can you check with your benchmark whether this is now acceptable without using the direct |
I think we might still need to have a |
@hansihe correct. I am on 10k with this branch, the code looks super clean though. :) |
@josevalim I added the unchecked variant and extended the comments a bit, will that do now? :) |
Looking at your code from the branch again, you can make it even more safe by using the direct term methods, so instead of |
With the new check:
I get this:
It is an improvement but more than 20% slower than the previous 15k. I will convert this to Using Term.list_prepend on top of the above, I got this:
However, I just realized I have been benchmarking in |
Yeah, I can actually measure those in MIX_ENV=prod too:
|
Yes, after more benchmarks the wrapper/rustler_sys version is still ~30% faster. So I would say having something like I am not sure if it makes sense to keep the unchecked version though if we add the wrapper. :) Thanks btw for all the back and forth in this, I am learning a lot! |
I am definitely for exporting It would be nice however if we could try to iron out the remaining potential performance issues and see if we can get the higher level APIs on par with the The version of rustler you are using doesn't have the In Rust, functions not marked with |
This is the branch/PR: elixir-explorer/explorer#346 - notice I am pointing to a local Rustler for now. Here is the benchmark:
Run it with |
I have added inline to |
OK, I think we should do the following:
Does this make sense to you @josevalim @filmor? |
Beautiful, thank you! I will send a PR for the missing bit! |
No description provided.