Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2 release candidate initiative to include stateful decode impls akin to serde's DeserializeSeed? #578

Open
FinFighter opened this issue Aug 30, 2022 · 7 comments

Comments

@FinFighter
Copy link

With the transition to the bincode specific encode and decode traits in the v2 release candidate, it appears there is currently no mechanism available for stateful deserialization comparable to Serde's DeserializeSeed trait.

See: https://github.com/serde-rs/serde/blob/f52d134c14f6904010d0a59107987b050b36b811/serde/src/de/mod.rs#L770

The following trait definitions could be included to provide this functionality.

pub trait DecodeSeed: Sized {
    type Value;

    fn decode_seed<D: Decoder>(self, decoder: &mut D) -> Result<Self::Value, DecodeError>;
}
pub trait BorrowDecodeSeed<'de>: Sized {
    type Value;

    fn borrow_decode_seed<D: BorrowDecoder<'de>>(self, decoder: &mut D) -> Result<Self::Value, DecodeError>;
}

For example, utilizing the DecodeSeed trait defined above, the SequenceSeed struct could be defined below:

struct SequenceSeed<'a>(&'a mut Vec<f64>);

impl<'a> DecodeSeed for SequenceSeed<'a> {
    type Value = u64;

    #[inline]
    fn decode_seed<D: bincode::de::Decoder>(self, decoder: &mut D) -> Result<Self::Value, bincode::error::DecodeError> {
        let size: u64 = bincode::Decode::decode(decoder)?;

        for _ in 0..size {
            let element = bincode::Decode::decode(decoder)?;
            self.0.push(element)
        }

        Ok(size)
    }
}

Utilizing SequenceSeed, we can append each decoded element to a pre-allocated vector.

// Encode
let values = vec![34.5, 345.3, 40.3, 100.55, 60.5];
let buffer = bincode::encode_to_vec(&values, config::standard()).expect("Failed to encode");

// Construct decoder
let slice_reader = bincode::de::read::SliceReader::new(&buffer[..]);
let mut decoder = bincode::de::DecoderImpl::new(slice_reader, config::standard());

// Construct SequenceSeed utilizing pre-allocated vector
let mut seq = Vec::with_capacity(5);
let seed = SequenceSeed(&mut seq);

// Decode using SequenceSeed's DecodeSeed implementation
let len = seed.decode_seed(&mut decoder).expect("Failed to decode");

println!("{len}");  // 5
println!("{seq:?}") // [34.5, 345.3, 40.3, 100.55, 60.5]

I feel this functionality provides many opportunities for optimizations and would be a useful addition.

@VictorKoenders
Copy link
Contributor

VictorKoenders commented Aug 31, 2022

Is there a reason we cannot do this?

pub trait DecodeInPlace: Sized { // name is bikesheddable
    // note: &mut self, and does not return a value
    fn decode_seed<D: Decoder>(&mut self, decoder: &mut D) -> Result<(), DecodeError>;
}

@VictorKoenders
Copy link
Contributor

One thing that might become a problem is that we'll start scaling the types rapidly. Say we implement an async version as well, we'll get:

  • Decode and BorrowDecode
  • DecodeSeed and BorrowDecodeSeed
  • DecodeAsync and BorrowDecodeAsync
  • DecodeSeedAsync and BorrowDecodeSeedAsync

If we add another variant, we'll end up with 16 different types, although I'm currently not sure if we'll need another type.

@FinFighter
Copy link
Author

Is there a reason we cannot do this?

pub trait DecodeInPlace: Sized { // name is bikesheddable
    // note: &mut self, and does not return a value
    fn decode_seed<D: Decoder>(&mut self, decoder: &mut D) -> Result<(), DecodeError>;
}

I feel that this design could be slightly less ergonomic in cases where we might want to utilize a reference rather than some mutable data structure and get some kind of meaning full return.

For example, if we wanted to filter a component of a larger message in order to avoid further decoding, we could use a reference to a HashSet containing the variants we are interested in.

We can encode the following struct:

#[derive(Debug, Encode, Decode, Hash, Eq, PartialEq)]
enum Color {
    Red,
    Blue,
    Green,
}

#[derive(Debug, Encode)]
pub struct Car {
    color: Color,
    top_speed: u64,
    cylinders: u8,
}

And define the following struct implementing DecodeSeed where we are filtering for desired colors:

struct CarSeed<'a> {
    filter: &'a HashSet<Color>,
}

impl<'a> DecodeSeed for CarSeed<'a> {
    type Value = Option<Car>;

    #[inline]
    fn decode_seed<D: bincode::de::Decoder>(self, decoder: &mut D) -> Result<Self::Value, bincode::error::DecodeError> {
        let color = bincode::Decode::decode(decoder)?;

        if !self.filter.contains(&color) {
            return Ok(None);
        }

        let top_speed = bincode::Decode::decode(decoder)?;
        let cylinders = bincode::Decode::decode(decoder)?;

        Ok(Some(Car { color, top_speed, cylinders }))
    }
}
// Encode
let car = Car { color: Color::Blue, top_speed: 80, cylinders: 4};
let buffer = bincode::encode_to_vec(&car, config::standard()).unwrap();

// Create HashSet filter containing the Green Color variant
let mut filter = HashSet::new();
filter.insert(Color::Green);
let seed = CarSeed { filter: &filter };

// Decode
let slice_reader = SliceReader::new(&buffer[..]);
let mut decoder = bincode::de::DecoderImpl::new(slice_reader, config::standard());
let result = seed.decode_seed(&mut decoder);

println!("{result:?}") // Ok(None)

In this case, I feel having a return type is more clear.

One thing that might become a problem is that we'll start scaling the types rapidly. Say we implement an async version as well, we'll get:

* `Decode` and `BorrowDecode`

* `DecodeSeed` and `BorrowDecodeSeed`

* `DecodeAsync` and `BorrowDecodeAsync`

* `DecodeSeedAsync` and `BorrowDecodeSeedAsync`

If we add another variant, we'll end up with 16 different types, although I'm currently not sure if we'll need another type.

This is a valid concern. I agree that if there appears to be a way forward for adding several new types or traits akin to what you suggested, then adding a new Seed variant could become cumbersome for maintainability.

After working with the library release candidate for the past few days, I found it is actually quite straight forward to implement custom decoding sequences using the exposed internals. Thus, all of what I would want to do in terms of stateful decoding appears to be fully possible. I see the real benefit of adding a stateful decoding trait to the library itself would be for standardization.

@VictorKoenders
Copy link
Contributor

VictorKoenders commented Oct 4, 2022

After working with the library release candidate for the past few days, I found it is actually quite straight forward to implement custom decoding sequences using the exposed internals.

I've been considering making a bincode-utils crate for a while now, something tha builds upon bincode but has more utility functions, like rocket has done with rocket_contrib. Maybe this would be a thing to go into that crate.

@stale
Copy link

stale bot commented Dec 3, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 3, 2022
@dtolnay
Copy link
Contributor

dtolnay commented Dec 3, 2022

Until a repo for bincode-utils has been made for this issue to get moved to, let's continue to track it here.

@stale
Copy link

stale bot commented Feb 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants