Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Bytes::from_owner #742

Merged
merged 28 commits into from
Oct 29, 2024
Merged

Conversation

amunra
Copy link
Contributor

@amunra amunra commented Oct 23, 2024

Introduces Bytes::from_owner, allowing the creation of a bytes object from another AsRef<[u8]> object which is responsible for the ownership of the exposed byte slice.

The core use case is to safely construct from mapped memory.

use memmap2::Map;

let file = File::open("upload_bundle.tar.gz")?;
let mmap = unsafe { Mmap::map(&file) }?;
let b = Bytes::from_owner(mmap);

It should practically address the vast majority of use cases for wanting to expose the VTable as has often been requested in #437, without actually exposing the VTable.

@Darksonn Darksonn requested a review from seanmonstar October 23, 2024 14:31
@Darksonn
Copy link
Contributor

What do you think, Sean? It is interesting that it avoids exposing the vtable like previous suggestions.

@amunra
Copy link
Contributor Author

amunra commented Oct 23, 2024

What do you think, Sean? It is interesting that it avoids exposing the vtable like previous suggestions.

Yes, that was my goal: #437 (comment)

src/bytes.rs Show resolved Hide resolved
@carllerche
Copy link
Member

At a quick glance, this looks like it would be forwards compatible with exposing the vtable?

src/bytes.rs Outdated
///
/// Note that converting [Bytes] constructed from an owner into a [BytesMut]
/// will always create a deep copy of the buffer into newly allocated memory.
pub fn from_owner<T: AsRef<[u8]>>(owner: T) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need some kind of StableDeref bound to protect against weird edge cases where the referenced region changes over time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string crate had a similar issue: carllerche/string#9

I do wonder if Sync would be sufficient here... In general, we probably do need a Send bound as well since Bytes is Send

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync wouldn't cover it since you can have a thread-safe implementation that still mutates internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for StableDeref isn't calling as_ref just once, never moving the memory afterwards because it's Boxed and never calling any other method already guaranteeing the Bytes invariants and soundness in general?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah only calling as_ref once may actually make this fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a safe way for it to still be UB, even with only calling it once. If I pass in a form of Arc<Mutex<_>>, and then afterwards change the internal buffer completely (maybe a reallocation), the captured pointer could be referring to invalid memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference we get back from the as_ref needs to be valid for the remaining lifetime of the object though - it can't be swapped out without a call to an &mut method that would ensure the initial borrow is gone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's sound as-is. You cannot write a safe thing where this breaks even with Arc/Mutex because if the buffer is inside the mutex, the implementer of AsRef can't return the buffer from as_ref since it borrows from the mutex guard.

As written, the as_ref call essentially borrows the owner field until we run the destructor. As long as we never make any mutable function calls on owner during that time, it's okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::sync::{Arc, Mutex};

struct Evil {
    within: Arc<Mutex<Vec<u8>>>,
}

impl AsRef<[u8]> for Evil {
    fn as_ref(&self) -> &[u8] {
        // error[E0515]: cannot return value referencing temporary value
        self.within.try_lock().unwrap().as_ref()
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken the current API should guard against Owner exposing a thread local as well.

src/bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated
///
/// Note that converting [Bytes] constructed from an owner into a [BytesMut]
/// will always create a deep copy of the buffer into newly allocated memory.
pub fn from_owner<T: AsRef<[u8]>>(owner: T) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we may want our own trait instead of AsRef<[u8]> so we can add more methods in the future.

Copy link
Contributor Author

@amunra amunra Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<opinionated>

A custom trait is how I had originally started implementing this, but gave up early on for a few reasons:

  • Worse ergonomics for custom trait:
    • Many things already implement AsRef<[u8]>, so Bytes::from_owner(mmap); is easy and straight-forward.
    • Conversely, a custom trait would require Bytes::from_custom(MyMmapWrapper::new(mmap)).
  • Safety:
    • AsRef<[u8]> can be called during construction and only once. It's a trick that makes the from_owner API safe.
    • Any additional APIs would - in practice - need to be called during the lifetime of bytes. As an example trait CustomOwner { fn try_into_mut(self) -> Result<impl CustomOwnerMut, Self>; } (unless I'm wrong) would make it into an unsafe trait.
    • What's missing from the current approach is zero-copy conversion to BytesMut.
      The implementor of a trait that that exposed this would need to be very careful here to respect the Vec<u8> semantics. Context: If Bytes is not unique, the existing code creates a newly allocated Vec<u8> copy of the referenced memory. Unless due care is taken - in the specific case of an Mmap - this would likely lead an implementor to erroneously assume that they should create an MmapMut. This would be wrong, especially if the mmapped section is backed by a file as the resulting behaviour would sometimes update the contents of a file and sometimes not, depending on Bytes's internal refcount.
    • In general, also allowing any other calls to the object during its lifetime also allows for scope for the object to invalidate the slice returned from .as_ref() rendering that initial call unsafe too, since there's no borrow checker here to police usage.
  • Feature creep:
    • By all means, I think a custom trait approach might work in the future for a fn from_custom <T: CustomOwner> and as a good way of exposing more of the vtable functionality, but it is not where the current pain point in the API is and there are many hurdles to get past that as has been shown in previous efforts to tackle meta: Expose Bytes vtable #437.

The full-fat custom use case would mostly cover - as far as I can tell - mixing Bytes created and managed via multiple custom memory allocators (QuestDB would certainly care!), but I suspect there's a more elegant way to handle those anyway.

</opinionated>

.. that said, if you guys prefer that route I'm happy to amend my PR in that direction.

src/bytes.rs Outdated Show resolved Hide resolved
@amunra amunra force-pushed the external_ownership branch from b5015b9 to 6271eff Compare October 23, 2024 22:20
@carllerche
Copy link
Member

Also, I want to thank @amunra for putting this PR together and moving the issue forward. I think this is a great/clever way to move bytes forward incrementally without figuring out all the complexities of exposing the vtable.

I will try to do a closer review of the code tomorrow and we may want to consider other names than from_owned (though I think it might be the best we come up with).

tests/test_bytes.rs Outdated Show resolved Hide resolved
@amunra amunra force-pushed the external_ownership branch from ddcffb3 to d1aed1d Compare October 24, 2024 09:40
src/bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated Show resolved Hide resolved
tests/test_bytes.rs Outdated Show resolved Hide resolved
src/bytes.rs Outdated Show resolved Hide resolved
@amunra
Copy link
Contributor Author

amunra commented Oct 27, 2024

I shall address the latest feedback tomorrow. I would appreciate clarity around what the behaviour of is_unique should be for from_owner.
If it is always to return false (the most pragmatic approach I think) I can update the API docs.

Copy link
Contributor

@Darksonn Darksonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@Darksonn Darksonn merged commit 30ee8e9 into tokio-rs:master Oct 29, 2024
15 checks passed
@amunra
Copy link
Contributor Author

amunra commented Oct 29, 2024

Thank you!

@bluestreak01 bluestreak01 deleted the external_ownership branch October 29, 2024 13:43
@Rexagon
Copy link

Rexagon commented Nov 21, 2024

Is there a plan to release a new version with this feature soon?

This is exactly what I needed to avoid allocating twice when streaming rocksdb::DBPinnableSlice using some common models. Big thanks for adding this!

@paolobarbolini
Copy link
Contributor

Is there a plan to release a new version with this feature soon?

It makes sense to make a new release at this point. If you'd like feel free to send a release PR.

@kylebarron kylebarron mentioned this pull request Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants