[Tracking issue] Querying number of bytes needed to encode a struct #539

bitwiseshiftleft · 2022-04-09T12:49:23Z

I've noticed that v2.0 doesn't have a serialized size function. I'm a Rust newbie, so I'm not going to attempt a pull request, but would an implementation like this suffice?

use bincode::{Encode,config::Config,enc::EncoderImpl,error::EncodeError,enc::write::Writer};

/** Writer which only counts the bytes "written" to it */
struct SizeOnlyWriter<'a> {
    bytes_written: &'a mut usize
}

impl<'a> Writer for SizeOnlyWriter<'a> {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        *self.bytes_written += bytes.len();
        Ok(())
    }
}

/** Return the serialized size of an `Encode` object. */
pub fn serialized_size<T: Encode, C: Config>(obj: &T, config: C) -> Result<usize, EncodeError> {
    let mut size = 0usize;
    let writer = SizeOnlyWriter { bytes_written: &mut size };
    let mut ei = EncoderImpl::new(writer, config);
    obj.encode(&mut ei)?;
    Ok(size)
}
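The counting-writer idea above does not depend on bincode itself. Here is a minimal std-only sketch of it that uses `std::io::Write` in place of bincode's `Writer` trait. `CountingWriter`, `Point` and its hand-written little-endian layout are hypothetical stand-ins for illustration, not bincode's encoding:

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a size-only writer: counts bytes, stores nothing.
#[derive(Default)]
struct CountingWriter {
    bytes_written: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.bytes_written += buf.len();
        Ok(buf.len()) // claim every byte was accepted
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical type with a hand-written encoding (not bincode's format).
struct Point {
    x: u32,
    y: u32,
    label: String,
}

impl Point {
    /// Encodes x, y, then a u64 length prefix followed by the label's UTF-8 bytes.
    fn encode<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&self.x.to_le_bytes())?;
        w.write_all(&self.y.to_le_bytes())?;
        w.write_all(&(self.label.len() as u64).to_le_bytes())?;
        w.write_all(self.label.as_bytes())
    }
}

/// Analogue of `serialized_size`: run the real encoder against the counter.
fn serialized_size(p: &Point) -> io::Result<usize> {
    let mut counter = CountingWriter::default();
    p.encode(&mut counter)?;
    Ok(counter.bytes_written)
}
```

Because `encode` is generic over the writer, the same code runs against the counter and against a `Vec<u8>`, which is what keeps the counted size and the real output length in agreement.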


VictorKoenders · 2022-04-09T13:44:14Z

We haven't implemented serialized_size because we're not sure it has a valid use case. Most of the time when people use serialized_size it is because they want to use it like this:

let size = bincode::serialized_size(&t, config).unwrap();
let mut vec = vec![0u8; size];
bincode::encode_into_slice(&t, vec.as_mut_slice(), config).unwrap();

We personally have found that the above is a lot slower than simply encoding to a vec, as you have to process the entire structure twice.

let vec = bincode::encode_to_vec(&t, config).unwrap();

But maybe there's a use case we missed?
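To make the trade-off concrete, here is a std-only sketch of both patterns over a hypothetical encoder (a `u32` count followed by `u32` values, not bincode's format). The two-pass version calls `encode` twice over the same data, once to measure and once to write into an exactly sized slice. The one-pass version lets the `Vec` grow:

```rust
use std::io::{self, Write};

/// Counts bytes without storing them.
struct Counter(usize);

impl Write for Counter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0 += buf.len();
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical encoder: a u32 count, then each value as little-endian u32.
fn encode<W: Write>(values: &[u32], w: &mut W) -> io::Result<()> {
    w.write_all(&(values.len() as u32).to_le_bytes())?;
    for v in values {
        w.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

/// Two passes: measure, allocate exactly, then encode into the slice.
fn encode_two_pass(values: &[u32]) -> io::Result<Vec<u8>> {
    let mut counter = Counter(0);
    encode(values, &mut counter)?; // pass 1: walk the data to size it
    let mut buf = vec![0u8; counter.0];
    let mut slice: &mut [u8] = &mut buf;
    encode(values, &mut slice)?; // pass 2: walk the data again to write it
    Ok(buf)
}

/// One pass: encode straight into a growable Vec.
fn encode_one_pass(values: &[u32]) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    encode(values, &mut buf)?;
    Ok(buf)
}
```

Both produce identical bytes. The difference is that the two-pass version traverses the data twice, while the one-pass version traverses it once and may reallocate as the `Vec` grows.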

bitwiseshiftleft · 2022-04-09T14:53:07Z

I was thinking of using it in an FFI call, so that I could serialize into memory owned by C.

VictorKoenders · 2022-04-18T08:11:19Z

Would it be possible to:

1. let bincode allocate a Vec<u8>
2. call into_raw_parts
3. return that to C
4. FFI back into Rust to deallocate this Vec when you're done
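The steps in this list can be done on stable Rust even though `Vec::into_raw_parts` is an unstable API. Converting to `Box<[u8]>` makes the allocation exactly `len` bytes, so a pointer and a length are enough to free it later. This is a minimal sketch. `ByteBuf`, `encode_for_c` and `free_byte_buf` are hypothetical names, the payload is a placeholder for real encoded output, and a real export would also need `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` on the 2024 edition):

```rust
/// Pointer/length pair handed across the FFI boundary (hypothetical layout).
#[repr(C)]
pub struct ByteBuf {
    pub ptr: *mut u8,
    pub len: usize,
}

/// Hypothetical export: encode on the Rust side, give ownership of the bytes to C.
pub extern "C" fn encode_for_c() -> ByteBuf {
    // Placeholder payload; in practice this would be the Vec<u8> from encoding.
    let bytes: Vec<u8> = vec![1, 2, 3, 4];
    // Shrink to an exact-size allocation so only (ptr, len) must be remembered.
    let boxed: Box<[u8]> = bytes.into_boxed_slice();
    let len = boxed.len();
    let ptr = Box::into_raw(boxed) as *mut u8;
    ByteBuf { ptr, len }
}

/// Hypothetical export: C calls this once it is done with the bytes.
///
/// # Safety
/// `buf` must have come from `encode_for_c` and must not be freed twice.
pub unsafe extern "C" fn free_byte_buf(buf: ByteBuf) {
    if buf.ptr.is_null() {
        return;
    }
    // Rebuild the Box<[u8]> from (ptr, len) and drop it, freeing the memory.
    let slice: *mut [u8] = std::ptr::slice_from_raw_parts_mut(buf.ptr, buf.len);
    unsafe { drop(Box::from_raw(slice)) };
}
```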

Otherwise, you can create your own LenWriter that implements Writer and simply counts the number of bytes written, then pass it to encode_into_writer.

VictorKoenders · 2022-04-18T08:11:53Z

I'll leave this open as a tracking issue to see if other people are interested in having a method to get the encoded len in bincode 2.

bitwiseshiftleft · 2022-04-18T19:45:07Z

Yeah, I made my own SizeOnlyWriter as shown above, to get the serialized size and tell it to C. I preferred that just in case C has somewhere very particular that it wants the data written, since either option was about equal effort.

Thanks for considering, and I'll be interested to see if anyone else wants this.

pseyfert · 2022-06-15T16:41:10Z

@VictorKoenders I was also looking for serialized_size in bincode. My use case: I handle deserialization in Rust but open the file in Python (the interface is done with pyo3), and I don't want to read the entire file content into a byte array.

The usage I have in mind would be something along the lines of:

with open("binary.file", "rb") as f:
    my_rustlib.deserialize(f.read(my_rustlib.serialized_size()))
    # logic that decides if more should be read, …
    ...

and on the backend I have, in bincode 1, something like

#[pyfunction]
fn serialized_size() -> anyhow::Result<u64> {
    let x = MyStruct::new();
    Ok(bincode::serialized_size(&x)?)
}

Sure, there are a number of things I could change about the approach (hard-coding the size and adding a test to catch when I update the definition of MyStruct, determining the size in a build.rs to generate a constant, …), but one way or another I end up needing to know how many bytes to pass to the deserializer.

(As an aside, something else bincode 1 doesn't do perfectly for me: my object isn't dynamically sized, so I would like to call something like bincode::serialized_size::<MyStruct>() that doesn't require instantiating the struct.)
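When every field has a fixed-width encoding, the size can be attached to the type as an associated constant, so no instance is needed. This is a minimal std-only sketch that assumes a hypothetical `MyStruct` encoded as a little-endian `u32` followed by a little-endian `f64`. `read_exact` keeps reading until the buffer is full and reports an error if the input is truncated:

```rust
use std::io::{self, Read};

/// Hypothetical record: little-endian u32 followed by little-endian f64.
#[derive(Debug, PartialEq)]
struct MyStruct {
    id: u32,
    value: f64,
}

impl MyStruct {
    /// Encoded size known from the type alone; no instance is created.
    const ENCODED_SIZE: usize = 4 + 8;
}

/// Reads exactly one record's bytes, then decodes them.
/// Returns an UnexpectedEof error if the input ends early.
fn read_record<R: Read>(r: &mut R) -> io::Result<MyStruct> {
    let mut buf = [0u8; MyStruct::ENCODED_SIZE];
    r.read_exact(&mut buf)?;
    // The slice lengths match the target array sizes, so these unwraps cannot fail.
    let id = u32::from_le_bytes(buf[0..4].try_into().unwrap());
    let value = f64::from_le_bytes(buf[4..12].try_into().unwrap());
    Ok(MyStruct { id, value })
}
```

Any bytes after the record stay unread in the source, so the caller can decide whether to read more.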

purpleposeidon · 2022-07-30T16:19:55Z

My use case: serializing the length of a section so that it can be skipped over during decoding.
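Here is a minimal std-only sketch of such skippable sections. It assumes a hypothetical format where each section is a little-endian `u32` length followed by its payload. The payload is already a byte slice here. Knowing the encoded size up front, for example via a counting writer, is what would let the prefix be written before streaming the payload without an intermediate buffer:

```rust
/// Appends `payload` as a section: u32 little-endian length prefix, then the bytes.
fn write_section(out: &mut Vec<u8>, payload: &[u8]) {
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Splits input into section payloads without decoding them; any section can be
/// skipped by jumping `len` bytes past its prefix. Returns None on truncated input.
fn split_sections(mut input: &[u8]) -> Option<Vec<&[u8]>> {
    let mut sections = Vec::new();
    while !input.is_empty() {
        if input.len() < 4 {
            return None;
        }
        let (prefix, rest) = input.split_at(4);
        let len = u32::from_le_bytes(prefix.try_into().ok()?) as usize;
        if rest.len() < len {
            return None;
        }
        let (payload, tail) = rest.split_at(len);
        sections.push(payload);
        input = tail;
    }
    Some(sections)
}
```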

WilliamVenner · 2022-08-12T11:03:53Z

I think it should be documented that serialized_size isn't meant for optimisation purposes (and is typically slower) in the pattern described here: #539 (comment)

marcbone · 2022-12-02T16:24:30Z

This problem has been causing me headaches for years. So this is my use case:

I am sending data over the network. I am sending a Header+Request. The header has to contain the serialized size of the Request+Header.

My current solution is to make all structs "#[repr(packed)]" and use std::mem::size_of to get the serialized size.
However, there are some restrictions on what you can do with packed structs (rust-lang/rust#82523).

So I would like to remove "#[repr(packed)]" and use bincode::serialized_size, but it requires a concrete object, which I would like to avoid. Basically, what I would like to have is a method that calculates the serialized size of my struct at compile time, or at least without creating the object.
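Without `#[repr(packed)]` or a size query, one option is to encode the variable-size request first. The total can then be computed from the header's fixed encoded size, which depends only on the field widths on the wire and not on the in-memory layout. This is a minimal std-only sketch; `Header`, `Request` and their little-endian layouts are hypothetical:

```rust
use std::io::{self, Write};

/// Hypothetical wire header: total frame length (u32) and message type (u16).
struct Header {
    total_len: u32,
    msg_type: u16,
}

impl Header {
    /// Fixed by the encoded field widths, independent of struct layout or padding.
    const ENCODED_SIZE: usize = 4 + 2;

    fn encode<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&self.total_len.to_le_bytes())?;
        w.write_all(&self.msg_type.to_le_bytes())
    }
}

/// Hypothetical request whose encoded size depends on its contents.
struct Request {
    key: String,
}

impl Request {
    fn encode<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&(self.key.len() as u32).to_le_bytes())?;
        w.write_all(self.key.as_bytes())
    }
}

/// Builds header + request; the header's length covers both parts.
fn frame(msg_type: u16, req: &Request) -> io::Result<Vec<u8>> {
    let mut body = Vec::new();
    req.encode(&mut body)?;
    let header = Header {
        total_len: (Header::ENCODED_SIZE + body.len()) as u32,
        msg_type,
    };
    let mut out = Vec::with_capacity(header.total_len as usize);
    header.encode(&mut out)?;
    out.extend_from_slice(&body);
    Ok(out)
}
```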

bitwiseshiftleft · 2022-12-02T20:54:12Z

This sounds somewhat useful to me as well. However, I think it should be a separate trait, because for many types the serialized size is dynamic. You could have a (preferably derivable) StaticSerializedSize trait, which might depend on the encoding but not on any object, only on the type.
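Here is a minimal sketch of what such a trait could look like. It uses an associated constant so the size is available in `const` contexts. `StaticSerializedSize` is only the name proposed in this comment, not an existing bincode API. The sizes assume fixed-width integer encoding. Under bincode 2's default `config::standard()`, integers are variable-length, so a type-level constant would not hold. The array impl also assumes fixed-size arrays carry no length prefix:

```rust
/// Hypothetical trait: encoded size determined by the type alone.
trait StaticSerializedSize {
    const SIZE: usize;
}

// Fixed-width integer encoding assumed.
impl StaticSerializedSize for u8 { const SIZE: usize = 1; }
impl StaticSerializedSize for u16 { const SIZE: usize = 2; }
impl StaticSerializedSize for u32 { const SIZE: usize = 4; }
impl StaticSerializedSize for u64 { const SIZE: usize = 8; }

impl<T: StaticSerializedSize, const N: usize> StaticSerializedSize for [T; N] {
    // Assumes fixed-size arrays are encoded without a length prefix.
    const SIZE: usize = T::SIZE * N;
}

/// Hypothetical struct; a derive macro could generate this sum of field sizes.
#[allow(dead_code)]
struct Header {
    magic: [u8; 4],
    version: u16,
    length: u32,
}

impl StaticSerializedSize for Header {
    const SIZE: usize = <[u8; 4]>::SIZE + u16::SIZE + u32::SIZE;
}

/// Available at compile time, without constructing a Header.
const HEADER_SIZE: usize = Header::SIZE;
```

Types with a dynamic size, such as `Vec` or `String`, would simply not implement the trait, so asking for their static size becomes a compile error rather than a wrong answer.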


LLFourn · 2022-12-30T03:13:41Z

My use case is that I am using a file as a database and some parts of the buffer are not being used. I want to know whether the thing I need to insert into the file should go at the end or whether I can write it over one of the unused chunks in the file.

For my particular case I'd actually prefer an API that just gave me an upper bound given the type (this sounds like what @marcbone is asking for too):

match bincode::max_serialized_size::<T>(bincode_config) {
     Some(max_size) => { /* serializing T will always take <= max_size */ },
     None => { /* there is no upper bound */ }
}

This would have practically zero performance cost and would be good enough for my particular use, but I think both max_serialized_size and serialized_size would be good additions with the appropriate documentation.
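The upper-bound variant can also live at the type level, with `None` marking unbounded types. This is a minimal sketch. `MaxEncodedSize`, the one-byte `Option` tag, fixed-width integers and the `(offset, len)` free-chunk list are all assumptions for illustration, not taken from bincode:

```rust
/// Hypothetical trait: upper bound on encoded size; None means unbounded.
trait MaxEncodedSize {
    const MAX_SIZE: Option<usize>;
}

impl MaxEncodedSize for u32 { const MAX_SIZE: Option<usize> = Some(4); }
impl MaxEncodedSize for u64 { const MAX_SIZE: Option<usize> = Some(8); }

impl<T: MaxEncodedSize> MaxEncodedSize for Option<T> {
    // One assumed tag byte plus the payload's bound, if it has one.
    const MAX_SIZE: Option<usize> = match T::MAX_SIZE {
        Some(n) => Some(1 + n),
        None => None,
    };
}

impl<T> MaxEncodedSize for Vec<T> {
    // A growable collection has no upper bound.
    const MAX_SIZE: Option<usize> = None;
}

/// Picks the first free chunk (offset, len) that can always hold a `T`,
/// falling back to appending at `file_end`.
fn placement<T: MaxEncodedSize>(free_chunks: &[(u64, usize)], file_end: u64) -> u64 {
    match T::MAX_SIZE {
        Some(max) => free_chunks
            .iter()
            .find(|&&(_, len)| len >= max)
            .map(|&(offset, _)| offset)
            .unwrap_or(file_end),
        // No bound known: this sketch always appends.
        None => file_end,
    }
}
```

For unbounded types this sketch always appends. Measuring the actual value with a counting writer would be one way to reuse a chunk in that case.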

Monadic-Cat · 2023-10-25T16:49:20Z

I'll go ahead and register my interest here, as I was looking through the bincode-2 tagged issues. I have this

use bincode::{enc::write::Writer, error::EncodeError, Encode};

struct ByteCounter {
    count: usize,
}
impl Writer for ByteCounter {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        self.count += bytes.len();
        Ok(())
    }
}

/// Count the bytes a value will occupy when encoded.
pub fn count_bytes<T: Encode>(x: &T) -> usize {
    let mut counter = ByteCounter { count: 0 };
    bincode::encode_into_writer(x, &mut counter, bincode_config()).unwrap();
    counter.count
}

in a codebase of mine because I'm talking to SQLite and don't want to create a serialized copy of an object I'm working with outside of what I write into a SQLite BLOB, and SQLite requires you to set the length of a BLOB before you start writing to it. In this case, it's about memory usage, not serialization performance.

That said, it's not a big deal to me whether this specific API ends up in bincode, as I've already implemented it in user code.

(Edit: That bincode_config() function just returns the bincode configuration I use everywhere in this codebase. Nothing special there.)

VictorKoenders · 2023-10-25T17:58:35Z

I think this is already resolved with SizeWriter and encode_into_writer:

use bincode::enc::write::SizeWriter;

let mut size_writer = SizeWriter::default();
bincode::encode_into_writer(&t, &mut size_writer, config).unwrap();
println!("{:?}", size_writer.bytes_written);

Does this work for your use case? Then I think this issue can be closed

Monadic-Cat · 2023-10-25T18:35:20Z


Yup. I'm not actually sure how I missed that it exists 👍

VictorKoenders · 2023-10-27T09:20:30Z

Thanks for testing 👍 closing this

VictorKoenders added not-stale bincode-2 labels Apr 18, 2022

VictorKoenders changed the title ~~Add a serialized size function to v2.0~~ [Tracking issue] Querying number of bytes needed to encode a struct Apr 18, 2022

VictorKoenders added this to the v2.0 milestone Jun 15, 2022

markbt mentioned this issue Jan 24, 2023

Add EncodedSize trait to calculate encoded sizes #609

Closed

VictorKoenders closed this as completed Oct 27, 2023
