-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create sections encoding for multi-value regions #760
Conversation
119c104
to
328390b
Compare
328390b
to
6bdb305
Compare
Ah, so the length after value is so we can parse from left-to-right. Will look at the code now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Added some suggestions to polish
#[allow(dead_code)] // used in Wasm and tests only | ||
pub fn decode_sections2(data: Vec<u8>) -> (Vec<u8>, Vec<u8>) { | ||
let section2_len: usize = if data.len() >= 4 { | ||
u32::from_be_bytes([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not encode with 2 bytes as now?
We expect/support > 64KB passed as a value for each section?
Although I guess it is a trivial cost to support this and allows us to never worry about size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not make unnecessary assumptions on data length.
packages/std/src/sections.rs
Outdated
data[data.len() - 1], | ||
]) as usize | ||
} else { | ||
panic!("Cannot read section2 length"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pull this into a helper function (it is used twice)?
The less code that needs to change to go from decode_sections2 -> decode_sections3 -> decode_sectionsN the better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why not return error instead of panic?
Not that anyone would handle them, but panics don't provide any useful message to the caller, errors help debugging if something did go wrong in the vm side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Rust's iterator interface does not support errors in next()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 Good point
packages/std/src/sections.rs
Outdated
} | ||
|
||
let mut first = data; | ||
let mut second = first.split_off(section1_len_end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of an lower-level api that would let one easily create decodeN wrappers.
split_tail(data: Vec<u8>) -> StdResult<(Vec<u8>, Vec<u8>)>
which would parse out the last section, allocate memory and truncate the input return (head, tail).
one_element(data: Vec<u8>) -> StdResult<Vec<u8>>
after splitting off all the N args, this is called for the last (first) item. It parses out the length bytes, ensures they are correct, and then returns the truncated input without a realloc.
We would end up with something like:
pub fn decode_sections2(mut data: Vec<u8>) -> StdResult<(Vec<u8>, Vec<u8>)> {
let (head, second) = split_tail(data)?;
let first = one_element(head)?;
Ok((first, second))
}
pub fn decode_sections3(mut data: Vec<u8>) -> StdResult<(Vec<u8>, Vec<u8>)> {
let (head, third) = split_tail(data)?;
let (head, second) = split_tail(head)?;
let first = one_element(head)?;
Ok((first, second, third))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would make sense if we intend 3, 4, 5 variants and avoids any bugs in writing those. Not sure if we will ever need those however
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty cool. Not because we need it not but because it explains the algorithm. Implemented. The caller code got even nicer:
pub fn decode_sections2(data: Vec<u8>) -> (Vec<u8>, Vec<u8>) {
let (rest, second) = split_tail(data);
let (_, first) = split_tail(rest);
(first, second)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice updates. Looking good
]) as usize | ||
let (rest, mut tail) = if rest_len_end == 0 { | ||
// i.e. all data is the tail | ||
(Vec::new(), data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this is 0 alloc, right? so you don't even need to handle the head case differently and still just as efficient. nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jupp. From Vec::new
:
The vector will not allocate until elements are pushed onto it.
More general, a vector only allocates heap when capacity > 0
.
Closes #602
TL:DR:
value || key || keylen
section1 || section1_len || section2 || section2_len || section3 || section3_len || …
It turned out that splitting the memory of a
Vec
into multiple vectors is heavility unsafe and it is better to keep the decoding in two steps:This means we need to do copies. If we split off from right to left, we can keep the original vector as the first element without reallocation, saving 50% of copies for the two element case.