You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I had been intending at one point to get y'all a PR for this as a way to dip my toes into rkv, which is a cool project. That said, recent events have made this unlikely, so I at least wanted to let you know of the issue.
There's a performance issue here: In general, in serde serializing Vec<u8>, Box<[u8]> and &[u8] is surprisingly inefficient. This is because it can't specialize these for u8, and has to serialize the whole sequence generically. Serde actually has support for doing the efficient thing here, but you have to opt-in by accepting various visit_bytes methods on deserializers.
For example, by default serde will individually deserialize every byte like this:
fnvisit_seq<V:SeqAccess<'de>>(self,mutvisitor:V) -> Result<Buffer,V::Error>{letmut v = Vec::with_capacity(cautious::size_hint(visitor.size_hint());whileletSome(b) = visitor.next_element()? {
v.push(b);}Ok(Buffer(v.into()))}
But they bytes visitors accept the whole chunk in one go, like this:
Which deserializes all bytes at once. Note that there's a helpful crate that provides wrappers that do this in part for you: https://github.com/serde-rs/bytes . It uses Vec<u8> for its owned type, and so if the distinction matters to you, you'll need to copy its impl and use Box<[u8]> instead. Its not much code..
It should be fully compatible with what you're doing now, only faster, but obviously you should test. My experience is that changing bincode code to use serde_bytes does not seem to impact the serialized format at all.
The text was updated successfully, but these errors were encountered:
I had been intending at one point to get y'all a PR for this as a way to dip my toes into rkv, which is a cool project. That said, recent events have made this unlikely, so I at least wanted to let you know of the issue.
Loading or persisting an instance of the Safe backend ultimately involves reading/writing instances of https://github.com/mozilla/rkv/blob/master/src/backend/impl_safe/snapshot.rs#L26-L36 and some other stuff to disk using
bincode
.There's a performance issue here: In general, in serde serializing
Vec<u8>
,Box<[u8]>
and&[u8]
is surprisingly inefficient. This is because it can't specialize these foru8
, and has to serialize the whole sequence generically. Serde actually has support for doing the efficient thing here, but you have to opt-in by accepting variousvisit_bytes
methods on deserializers.For example, by default serde will individually deserialize every byte like this:
But they bytes visitors accept the whole chunk in one go, like this:
Which deserializes all bytes at once. Note that there's a helpful crate that provides wrappers that do this in part for you: https://github.com/serde-rs/bytes . It uses
Vec<u8>
for its owned type, and so if the distinction matters to you, you'll need to copy its impl and use Box<[u8]> instead. Its not much code..It should be fully compatible with what you're doing now, only faster, but obviously you should test. My experience is that changing bincode code to use
serde_bytes
does not seem to impact the serialized format at all.The text was updated successfully, but these errors were encountered: