[Tracking issue] Querying number of bytes needed to encode a struct #539

Closed
bitwiseshiftleft opened this issue Apr 9, 2022 · 15 comments

Comments

@bitwiseshiftleft

I've noticed that v2.0 doesn't have a serialized size function. I'm a Rust newbie, so I'm not going to attempt a pull request, but would an implementation like this suffice?

use bincode::{
    config::Config,
    enc::{write::Writer, EncoderImpl},
    error::EncodeError,
    Encode,
};

/** Writer which only counts the bytes "written" to it */
struct SizeOnlyWriter<'a> {
    bytes_written: &'a mut usize
}

impl<'a> Writer for SizeOnlyWriter<'a> {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        *self.bytes_written += bytes.len();
        Ok(())
    }
}

/** Return the serialized size of an `Encode` object. */
pub fn serialized_size<T: Encode, C: Config>(obj: &T, config: C) -> Result<usize, EncodeError> {
    let mut size = 0usize;
    let writer = SizeOnlyWriter { bytes_written: &mut size };
    let mut ei = EncoderImpl::new(writer, config);
    obj.encode(&mut ei)?;
    Ok(size)
}
@VictorKoenders
Contributor

We haven't implemented serialized_size because we're not sure it has a valid use case. Most of the time when people use serialized_size it is because they want to use it like this:

// `value` is the object to encode.
let size = bincode::serialized_size(&value, config).unwrap();
let mut vec = vec![0u8; size];
bincode::encode_into_slice(&value, vec.as_mut_slice(), config).unwrap();

In our experience, the above is a lot slower than simply encoding to a Vec, because the entire structure has to be processed twice.

let vec = bincode::encode_to_vec(&value, config).unwrap();

But maybe there's a use case we missed?

@bitwiseshiftleft
Author

I was thinking of using it in an FFI call, so that I could serialize into memory owned by C.

@VictorKoenders
Contributor

Would it be possible to:

  • let bincode allocate a Vec<u8>
  • call into_raw_parts
  • return that to C
  • ffi back into Rust to deallocate this vec when you're done

Otherwise, you can create your own LenWriter that implements Writer and simply counts the number of bytes written, then pass it to encode_into_writer.
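
The Vec hand-off described in the list above can be sketched roughly as follows. This is a minimal sketch under assumptions, not bincode API: `ByteBuf`, `encode_for_c`, `free_encoded`, and `encode_placeholder` are hypothetical names, and `encode_placeholder` stands in for a call such as `bincode::encode_to_vec` so that the example has no dependencies. It uses a boxed slice with `Box::into_raw` rather than `Vec::into_raw_parts`, so only a pointer and a length cross the boundary and no capacity has to be tracked.

```rust
use std::ptr;

/// Hypothetical C-visible buffer: a pointer plus a length.
#[repr(C)]
pub struct ByteBuf {
    pub ptr: *mut u8,
    pub len: usize,
}

/// Stand-in for an encoder call such as `bincode::encode_to_vec`.
/// It emits the little-endian bytes of a u32 (an assumption for this sketch).
fn encode_placeholder(value: u32) -> Vec<u8> {
    value.to_le_bytes().to_vec()
}

/// Steps 1-3: Rust allocates, then hands ownership of the bytes to C.
/// A real export would also need the appropriate no-mangle attribute.
pub extern "C" fn encode_for_c(value: u32) -> ByteBuf {
    let boxed: Box<[u8]> = encode_placeholder(value).into_boxed_slice();
    let len = boxed.len();
    // Leak the box; C now owns the allocation until it calls `free_encoded`.
    let ptr = Box::into_raw(boxed) as *mut u8;
    ByteBuf { ptr, len }
}

/// Step 4: C calls back into Rust so that Rust frees what Rust allocated.
///
/// # Safety
/// `buf` must have been returned by `encode_for_c` and not freed before.
pub unsafe extern "C" fn free_encoded(buf: ByteBuf) {
    if !buf.ptr.is_null() {
        let slice = ptr::slice_from_raw_parts_mut(buf.ptr, buf.len);
        // Rebuild the box from the same pointer and length, then drop it.
        drop(unsafe { Box::from_raw(slice) });
    }
}
```

The C side would treat `ptr` and `len` as a read-only view and must never pass the pointer to its own `free`.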

@VictorKoenders
Contributor

I'll leave this open as a tracking issue to see if other people are interested in having a method to get the encoded len in bincode 2.

@VictorKoenders changed the title from "Add a serialized size function to v2.0" to "[Tracking issue] Querying number of bytes needed to encode a struct" on Apr 18, 2022
@bitwiseshiftleft
Author

Yeah, I made my own SizeOnlyWriter as shown above, to get the serialized size and pass it to C. I preferred that in case C has somewhere very particular that it wants the data written, since either option was about equal effort.

Thanks for considering, and I'll be interested to see if anyone else wants this.

@pseyfert

@VictorKoenders I was also looking for serialized_size in bincode. My use case is that I handle deserialization in Rust but open the file in Python (the interface is done with pyo3), and I don't want to read the entire file content into a byte array.

The usage I have in mind would be something along the lines of:

with open("binary.file", "rb") as f:
    my_rustlib.deserialize(f.read(my_rustlib.serialized_size()))
    # logic that decides if more should be read, …
    ...

and on the backend I have, in bincode 1, something like

#[pyfunction]
fn serialized_size() -> anyhow::Result<u64> {
    let x = MyStruct::new();
    Ok(bincode::serialized_size(&x)?)
}

Sure, there are a number of things I could change about the approach (hard-coding the size and adding a test to catch when I update the definition of MyStruct, determining the size in a build.rs to generate a constant, …), but one way or another I end up in the situation where I need to know how many bytes I need to pass to the deserializer.

(As an aside, bincode 1 isn't perfect for me here either: my object is not dynamically sized, so I would want to call something like bincode::serialized_size::<MyStruct>() that doesn't require instantiating the struct.)

@VictorKoenders added this to the v2.0 milestone on Jun 15, 2022
@purpleposeidon

My use case: serializing the length of a section so that it can be skipped over in decoding.
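
A minimal sketch of that use case, using only the standard library. The layout (a u32 little-endian length prefix followed by the payload) and the function names `write_section`, `skip_section`, and `read_section` are assumptions made for illustration; with bincode, the payload would be the encoded section and its length would come from a counting writer or from the encoded buffer.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Write one section: a u32 little-endian length prefix, then the payload.
/// (The prefix format is an assumption made for this sketch.)
fn write_section<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "section too large"));
    }
    out.write_all(&(payload.len() as u32).to_le_bytes())?;
    out.write_all(payload)
}

/// Skip one section without decoding it, using only the length prefix.
fn skip_section<R: Read + Seek>(input: &mut R) -> io::Result<()> {
    let mut prefix = [0u8; 4];
    input.read_exact(&mut prefix)?;
    let len = u32::from_le_bytes(prefix);
    input.seek(SeekFrom::Current(i64::from(len)))?;
    Ok(())
}

/// Read one section's payload in full.
fn read_section<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    input.read_exact(&mut prefix)?;
    let mut payload = vec![0u8; u32::from_le_bytes(prefix) as usize];
    input.read_exact(&mut payload)?;
    Ok(payload)
}
```

A decoder that does not care about a section calls `skip_section` and never touches the payload bytes, which is only possible because the writer knew the section's encoded length up front.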

@WilliamVenner

I think it should be documented that serialized_size isn't meant for optimisation (and is typically slower) when used in the pattern described in #539 (comment).

@marcbone

marcbone commented Dec 2, 2022

This problem has been causing me headaches for years. Here is my use case:

I am sending data over the network. I am sending a Header+Request. The header has to contain the serialized size of the Request+Header.

My current solution is to make all structs "#[repr(packed)]" and use std::mem::size_of to get the serialized size.
However, there are some restrictions on what you can do with packed structs (rust-lang/rust#82523).

So I would like to remove "#[repr(packed)]" and use bincode::serialized_size; however, it requires a concrete object, and I would like to avoid that. Basically, what I would like to have is a method that calculates the serialized size of my struct at compile time, or at least without creating the object.
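
Short of a compile-time answer, one way to fill a header's size field without buffering the request is a counting pass followed by a real pass. The sketch below uses `std::io::Write` as a stand-in for bincode's `Writer` trait; `CountingWriter`, `Request`, `HEADER_LEN`, and `write_message` are hypothetical, and the header layout (a single u32 holding the size of header plus request) is an assumption. It still needs a concrete object, so it does not address the compile-time part of the request.

```rust
use std::io::{self, Write};

/// Writer that discards data and only counts bytes (a std analogue of the
/// counting writers shown earlier in this thread).
#[derive(Default)]
struct CountingWriter {
    count: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.count += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical request type with a variable-length field.
struct Request {
    id: u32,
    name: String,
}

impl Request {
    /// Hand-written encoding standing in for a derived bincode `Encode`.
    fn encode<W: Write>(&self, out: &mut W) -> io::Result<()> {
        out.write_all(&self.id.to_le_bytes())?;
        out.write_all(&(self.name.len() as u32).to_le_bytes())?;
        out.write_all(self.name.as_bytes())
    }
}

/// Assumed header layout: one u32 holding the size of header + request.
const HEADER_LEN: usize = 4;

fn write_message<W: Write>(req: &Request, out: &mut W) -> io::Result<()> {
    // First pass: count the request bytes without allocating a buffer.
    let mut counter = CountingWriter::default();
    req.encode(&mut counter)?;
    let total = (HEADER_LEN + counter.count) as u32;
    // Second pass: write the header first, then the request itself.
    out.write_all(&total.to_le_bytes())?;
    req.encode(out)
}
```

The trade-off is the one raised earlier in the thread: the request is traversed twice, in exchange for not holding a second serialized copy in memory.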

@bitwiseshiftleft
Author

bitwiseshiftleft commented Dec 2, 2022 via email

@LLFourn

LLFourn commented Dec 30, 2022

My use case is that I am using a file as a database and some parts of the buffer are not being used. I want to know whether the thing I need to insert into the file should go at the end or whether I can write it over one of the unused chunks in the file.

For my particular case I'd actually prefer an API that just gave me an upper bound given the type (this sounds like what @marcbone is asking for too):

match bincode::max_serialized_size::<T>(bincode_config) {
    Some(max_size) => { /* serializing T will always take <= max_size */ }
    None => { /* there is no upper bound */ }
}

This would be practically zero performance cost and good enough for my particular use but I think both max_serialized_size and serialized_size would be good additions with the appropriate documentation.
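
As a rough illustration of how such a type-level bound could be computed without an instance, here is a sketch in user code. Nothing in it is bincode API: the trait `MaxEncodedSize`, the helper `add_bounds`, and the free function `max_serialized_size` are hypothetical, and the byte counts assume a fixed-width integer encoding (a variable-length integer configuration would need different constants).

```rust
/// Hypothetical trait (not part of bincode): an upper bound on the encoded
/// size of a type, or `None` when no bound exists.
/// The constants below assume a fixed-width integer encoding.
pub trait MaxEncodedSize {
    const MAX: Option<usize>;
}

impl MaxEncodedSize for u8 {
    const MAX: Option<usize> = Some(1);
}
impl MaxEncodedSize for u32 {
    const MAX: Option<usize> = Some(4);
}
impl MaxEncodedSize for u64 {
    const MAX: Option<usize> = Some(8);
}

/// Unbounded: a Vec can hold any number of elements.
impl<T> MaxEncodedSize for Vec<T> {
    const MAX: Option<usize> = None;
}

/// Add two optional bounds in a const context; `None` is contagious.
pub const fn add_bounds(a: Option<usize>, b: Option<usize>) -> Option<usize> {
    match a {
        Some(x) => match b {
            Some(y) => Some(x + y),
            None => None,
        },
        None => None,
    }
}

/// Pairs (and, by the same pattern, structs) sum the bounds of their fields.
impl<A: MaxEncodedSize, B: MaxEncodedSize> MaxEncodedSize for (A, B) {
    const MAX: Option<usize> = add_bounds(A::MAX, B::MAX);
}

/// Arrays multiply the element bound by the length.
impl<T: MaxEncodedSize, const N: usize> MaxEncodedSize for [T; N] {
    const MAX: Option<usize> = match T::MAX {
        Some(x) => Some(x * N),
        None => None,
    };
}

/// Bound for a type, evaluated at compile time and needing no instance.
pub fn max_serialized_size<T: MaxEncodedSize>() -> Option<usize> {
    T::MAX
}
```

Under these assumptions, `max_serialized_size::<(u32, u64)>()` yields `Some(12)`, while any type containing a `Vec` yields `None`.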

@Monadic-Cat

Monadic-Cat commented Oct 25, 2023

I'll go ahead and register my interest here, as I was looking through the bincode-2 tagged issues. I have this

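use bincode::{enc::write::Writer, error::EncodeError, Encode};

/// Writer that discards its input and only counts the bytes.
struct ByteCounter {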
    count: usize,
}
impl Writer for ByteCounter {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        self.count += bytes.len();
        Ok(())
    }
}

/// Count the bytes a value will occupy when encoded.
pub fn count_bytes<T: Encode>(x: &T) -> usize {
    let mut counter = ByteCounter { count: 0 };
    bincode::encode_into_writer(x, &mut counter, bincode_config()).unwrap();
    counter.count
}

in a codebase of mine because I'm talking to SQLite and don't want to create a serialized copy of an object I'm working with outside of what I write into a SQLite BLOB, and SQLite requires you to set the length of a BLOB before you start writing to it. In this case, it's about memory usage, not serialization performance.

That said, it's not a big deal to me whether this specific API ends up in bincode, as I've already implemented it in user code.

(Edit: That bincode_config() function just returns the bincode configuration I use everywhere in this codebase. Nothing special there.)

@VictorKoenders
Contributor

I think this is already resolved with SizeWriter and encode_into_writer:

let mut size_writer = SizeWriter::default();
bincode::encode_into_writer(&t, &mut size_writer, config).unwrap();
println!("{:?}", size_writer.bytes_written);

Does this work for your use case? Then I think this issue can be closed

@Monadic-Cat

> I think this is already resolved with SizeWriter and encode_into_writer:
>
> let mut size_writer = SizeWriter::default();
> bincode::encode_into_writer(&t, &mut size_writer, config).unwrap();
> println!("{:?}", size_writer.bytes_written);
>
> Does this work for your use case? Then I think this issue can be closed

Yup. I'm not actually sure how I missed that that exists 👍

@VictorKoenders
Contributor

Thanks for testing 👍 closing this
