Changes BytesMut not to rely on Vec + Arc #435
base: master
Conversation
This changes the internal `BytesMut` representation to allocate memory on its own, and not to use `Arc` + `Vec` under the hood. Doing this allows a single representation which is shareable (by being able to modify an embedded refcount) with zero cost right from the start. No representation changes have to be performed over the lifetime of `BytesMut`. Therefore `BytesMut::freeze()` is now also free.

The change also opens the door for adding support for buffer pools or custom allocators, since an allocator is called directly. Adding this support would mostly mean passing an allocator to `BytesMut`, which would be called instead of a fixed function. This would however be a different change.

The change is WIP, and misses the following parts:
- The buffer growth strategy is not exactly the same as in the old version. It needs to be reviewed what the best strategy is.
- Some situations where an empty buffer (which has no backing storage) is manipulated might have bugs. This needs another round of review, and some additional tests.
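To make the embedded-refcount idea concrete, here is a minimal standalone sketch (all names and the layout are my own illustration, not the PR's actual code): one heap allocation carries a refcount header followed by the payload, so converting a mutable handle into a shared one is just a type change, with no copy or reallocation.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Hypothetical header stored inline at the start of the allocation.
#[repr(C)]
struct Shared {
    refcount: AtomicUsize,
    cap: usize,
}

// Layout of the header plus `cap` payload bytes in one allocation.
fn layout_for(cap: usize) -> Layout {
    Layout::new::<Shared>()
        .extend(Layout::array::<u8>(cap).unwrap())
        .unwrap()
        .0
        .pad_to_align()
}

struct MutBuf {
    shared: *mut Shared,
}

impl MutBuf {
    fn with_capacity(cap: usize) -> Self {
        unsafe {
            // Error handling (null check) omitted in this sketch.
            let shared = alloc(layout_for(cap)) as *mut Shared;
            shared.write(Shared { refcount: AtomicUsize::new(1), cap });
            MutBuf { shared }
        }
    }

    // Zero-cost freeze: the refcount header existed from the first
    // allocation, so no copy or representation change is needed.
    fn freeze(self) -> SharedBuf {
        let b = SharedBuf { shared: self.shared };
        std::mem::forget(self); // skip MutBuf's Drop; ownership moved
        b
    }
}

impl Drop for MutBuf {
    fn drop(&mut self) {
        unsafe {
            let cap = (*self.shared).cap;
            dealloc(self.shared as *mut u8, layout_for(cap));
        }
    }
}

struct SharedBuf {
    shared: *mut Shared,
}

impl Clone for SharedBuf {
    fn clone(&self) -> Self {
        unsafe { (*self.shared).refcount.fetch_add(1, Ordering::Relaxed) };
        SharedBuf { shared: self.shared }
    }
}

impl Drop for SharedBuf {
    fn drop(&mut self) {
        unsafe {
            if (*self.shared).refcount.fetch_sub(1, Ordering::Release) == 1 {
                fence(Ordering::Acquire);
                let cap = (*self.shared).cap;
                dealloc(self.shared as *mut u8, layout_for(cap));
            }
        }
    }
}

fn main() {
    let buf = MutBuf::with_capacity(64);
    let shared = buf.freeze(); // no reallocation happens here
    assert_eq!(
        unsafe { (*shared.shared).refcount.load(Ordering::Relaxed) },
        1
    );
    let second = shared.clone();
    assert_eq!(
        unsafe { (*second.shared).refcount.load(Ordering::Relaxed) },
        2
    );
}
```

This also shows why a custom allocator slots in naturally: only `layout_for` plus the `alloc`/`dealloc` call sites would need to go through a pluggable allocator.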
```rust
// There are some situations, using `reserve_exact`, that the
// buffer capacity could be below `original_capacity`, so do a
// check.
let double = self.cap.checked_shl(1).unwrap_or(new_cap);
```
This is one part that leads to the failing tests.
The old version doubled the original capacity of the storage. However if that was split, the capacity of the split part could be much lower.
I'm not really sure whether going back to the original capacity, or even doubling it, makes sense. If someone split off a 100-byte chunk of a 1 MB `BytesMut` and they want to write a few more bytes into it, it shouldn't grow towards 1 or even 2 MB, right?
I'm somewhat leaning towards ignoring the original capacity entirely and just making decisions based on the capacity of the current part. But maybe I'm just missing why it was added that way? Maybe to make sure that a receive buffer somewhere which uses `BytesMut`, and from which deserialized data is split off, always stays at the same size? If people want that, they could still `.reserve()` that size themselves after splitting.
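The trade-off being debated here can be stated as two candidate growth functions. This is my own illustration of the two strategies, not the crate's actual code; the function names and the exact doubling rule are assumptions:

```rust
// Old-style strategy (as described above): grow to at least double the
// ORIGINAL allocation's capacity, even for a small split-off part.
fn grow_by_original(original_cap: usize, needed: usize) -> usize {
    needed.max(original_cap.checked_shl(1).unwrap_or(needed))
}

// Proposed strategy: only look at the CURRENT part's capacity.
fn grow_by_current(current_cap: usize, needed: usize) -> usize {
    needed.max(current_cap.checked_shl(1).unwrap_or(needed))
}

fn main() {
    // A 100-byte chunk split off a 1 MiB buffer wants 50 more bytes.
    let original = 1usize << 20;
    let current = 100;
    let needed = 150;
    // The old strategy balloons the small chunk toward 2 MiB...
    assert_eq!(grow_by_original(original, needed), 2 << 20);
    // ...while sizing by the current part keeps the allocation modest.
    assert_eq!(grow_by_current(current, needed), 200);
}
```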
I agree that it seems like surprising behavior. I would vote to ignore the original capacity.
cc/ @seanmonstar who did most of this work most recently.
… versions did. Also add a lot more comments around it.
I'm now rather happy with this. I added some more documentation around the memory allocation strategy (based on my understanding of previous versions) and questions on what we should do there.
Not sure what is going on with Loom. I think it doesn't like my use of …
Co-authored-by: Cameron Bytheway <[email protected]>
It looks like you are inlining the capacity with the data? Given that buffers are often allocated to be a page length, would this not result in odd allocation patterns? The size would often be …
Looking through this, I believe this should be able to land w/o a breaking change. I don't think we need to block 0.6 on this.
edit: fixed the version number.
Are you aware of any applications which do that? I don't think you can currently rely on that, since your system memory allocator might also not do the same; e.g. it might also need to store its metadata inside the allocation header and therefore take size away from it. But it's true that buffer sizes which are multiples of 4 KiB are often programmers' favorites. What we could do is maybe round up allocations to page size, so that the capacity isn't lost? But that might require adding a …
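The round-up-to-page-size idea could look roughly like the sketch below. `PAGE_SIZE` and `HEADER_SIZE` are assumed values for illustration only; the real header size would come from the actual `Shared` struct:

```rust
const PAGE_SIZE: usize = 4096; // assumed; real code would query the OS
const HEADER_SIZE: usize = 32; // assumed size of the inline refcount header

// Round the total allocation (header + payload) up to a page multiple
// and hand the slack back to the caller as extra usable capacity.
fn usable_capacity(requested: usize) -> usize {
    let total = HEADER_SIZE + requested;
    let rounded = (total + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
    rounded - HEADER_SIZE
}

fn main() {
    // A small request still gets the remainder of its single page.
    assert_eq!(usable_capacity(100), PAGE_SIZE - HEADER_SIZE);
    // Asking for exactly 4096 bytes spills into a second page because of
    // the inline header, but none of that page's capacity is wasted.
    assert_eq!(usable_capacity(4096), 2 * PAGE_SIZE - HEADER_SIZE);
}
```

This keeps the "no capacity lost to the header" property, at the cost of sometimes allocating one more page than the caller asked for.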
If users want that behavior, they can still reserve the desired capacity manually.
Regarding …:
I confirmed that e.g. mimalloc will optimize fixed-size chunks. Based on this I see 2 ways forward for this:
I guess for the use case of "have a small serialization buffer and continuously add stuff", approach 2) will end up slower than 1) (and this CR). However it's more deterministic for managing large chunks of data.
Do we know why we can't entirely hide the …
Would this also allow for free …
```rust
let shared_addr = self as *mut Self;
let data_addr = shared_addr.offset(1) as *mut u8;
```
FYI, under Stacked Borrows this pointer only has provenance over `Self`, so using it to access the data section would be UB. Under PNVI (the proposed C aliasing model) I believe this is allowed. It's not entirely clear what the finalized rules for Rust will be, but you should probably do one of:
- Fix this under SB, probably by adding a `[u8]` member to `Shared`
- Fix this when it becomes UB for sure
- Campaign for this to not become UB, perhaps on rust-lang/unsafe-code-guidelines#256 ("Storing an object as &Header, but reading the data past the end of the header")
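The first option could be sketched like this (hypothetical names, not the PR's code; a zero-sized `[u8; 0]` tail stands in for the suggested `[u8]` member): deriving the data pointer through a trailing struct field, rather than offsetting past `Self`, keeps the pointer's provenance tied to the whole allocation.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr::addr_of_mut;
use std::sync::atomic::AtomicUsize;

#[repr(C)]
struct Shared {
    refcount: AtomicUsize,
    cap: usize,
    // Zero-sized tail: a pointer obtained through this field (arguably)
    // carries provenance over the bytes that follow the header.
    data: [u8; 0],
}

// Derive the payload pointer from the field instead of `shared.offset(1)`.
fn data_ptr(shared: *mut Shared) -> *mut u8 {
    unsafe { addr_of_mut!((*shared).data) as *mut u8 }
}

fn main() {
    // Header plus 8 payload bytes in a single allocation.
    let layout = Layout::new::<Shared>()
        .extend(Layout::array::<u8>(8).unwrap())
        .unwrap()
        .0
        .pad_to_align();
    unsafe {
        let shared = alloc(layout) as *mut Shared;
        shared.write(Shared {
            refcount: AtomicUsize::new(1),
            cap: 8,
            data: [],
        });
        let p = data_ptr(shared);
        p.write(42);
        assert_eq!(p.read(), 42);
        dealloc(shared as *mut u8, layout);
    }
}
```

Whether this is fully sufficient under the final aliasing rules is exactly the open question in rust-lang/unsafe-code-guidelines#256; running the sketch under Miri would be one way to check the current Stacked Borrows verdict.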
Fixes #401