Create sections encoding for multi-value regions #760

webmaster128 · 2021-02-02T09:34:52Z

Closes #602

TL;DR:

Before: value || key || keylen
After: section1 || section1_len || section2 || section2_len || section3 || section3_len || …
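A minimal sketch of the new layout, assuming each length is written as a 4-byte big-endian integer after its section (the decoder excerpt in this thread reads the length with `u32::from_be_bytes`). The function name `encode_sections` and its signature are chosen for illustration and are not taken from the PR:

```rust
/// Encodes sections as: section1 || section1_len || section2 || section2_len || ...
/// Each length is a 4-byte big-endian integer placed after its section.
/// Name and signature are assumptions made for this sketch.
pub fn encode_sections(sections: &[&[u8]]) -> Vec<u8> {
    // Reserve the exact output size up front: payload plus 4 length bytes per section.
    let total: usize = sections.iter().map(|s| s.len() + 4).sum();
    let mut out = Vec::with_capacity(total);
    for section in sections {
        // A section must fit into a u32 length; this sketch panics otherwise.
        assert!(section.len() <= u32::MAX as usize, "section too long");
        out.extend_from_slice(section);
        out.extend_from_slice(&(section.len() as u32).to_be_bytes());
    }
    out
}
```

Because every length sits at the end of its section, a decoder can always find the last section by reading the final 4 bytes of the buffer.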

It turned out that splitting the memory of a Vec into multiple vectors is heavily unsafe, and it is better to keep the decoding in two steps:

1. consume the Region into one vector using the exact raw components it was created with,
2. split the resulting vector into multiple values.

This means we need to do copies. If we split off from right to left, we can keep the original vector as the first element without reallocation, saving 50% of copies for the two element case.
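The copy-saving idea can be sketched as follows. The helper `split_keep_first` is hypothetical and only illustrates the memory behaviour: the right-hand part is copied into a new vector, while the left-hand part stays in the original buffer because `Vec::truncate` never reallocates.

```rust
/// Splits `data` at `mid` and returns (first, second).
/// Only `second` is copied; `first` keeps the original heap buffer.
/// Hypothetical helper for illustration. Panics if `mid > data.len()`.
pub fn split_keep_first(mut data: Vec<u8>, mid: usize) -> (Vec<u8>, Vec<u8>) {
    // The single copy: the right-hand part goes into a freshly allocated vector.
    let second = data[mid..].to_vec();
    // Shrinks the length only; capacity and buffer address stay the same.
    data.truncate(mid);
    (data, second)
}
```

With two elements, one of the two values is copied, which is the 50% saving mentioned above.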

ethanfrey · 2021-02-02T14:16:57Z

Ah, so the length after value is so we can parse from left-to-right.
Makes sense but definitely needs to be documented (also with the method names).

Will look at the code now

ethanfrey

Looks good.
Added some suggestions to polish

packages/std/src/imports.rs

ethanfrey · 2021-02-02T14:21:58Z

packages/std/src/sections.rs

+#[allow(dead_code)] // used in Wasm and tests only
+pub fn decode_sections2(data: Vec<u8>) -> (Vec<u8>, Vec<u8>) {
+    let section2_len: usize = if data.len() >= 4 {
+        u32::from_be_bytes([


Why not encode with 2 bytes as now?
We expect/support > 64KB passed as a value for each section?

Although I guess it is a trivial cost to support this and allows us to never worry about size

I'd rather not make unnecessary assumptions on data length.

ethanfrey · 2021-02-02T14:23:25Z

packages/std/src/sections.rs

+            data[data.len() - 1],
+        ]) as usize
+    } else {
+        panic!("Cannot read section2 length");


Can you pull this into a helper function (it is used twice)?
The less code that needs to change to go from decode_sections2 -> decode_sections3 -> decode_sectionsN the better

Also, why not return error instead of panic?
Not that anyone would handle them, but panics don't provide any useful message to the caller, errors help debugging if something did go wrong in the vm side.

Because Rust's iterator interface does not support errors in next()

🤦 Good point
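To illustrate the constraint: `Iterator::next` must return `Option<Self::Item>`, so there is no separate error channel. The sketch below uses a hypothetical `Lengths` iterator (not from the PR) that reads the trailing 4-byte length of each chunk and can only report malformed input by panicking. The alternative would be to change the item type to a `Result`, which changes the interface for every caller.

```rust
/// Hypothetical iterator that yields the trailing big-endian length of each chunk.
pub struct Lengths {
    pub chunks: std::vec::IntoIter<Vec<u8>>,
}

impl Iterator for Lengths {
    type Item = u32;

    // The trait fixes this signature: there is no way to return an error here.
    fn next(&mut self) -> Option<u32> {
        let chunk = self.chunks.next()?; // None ends the iteration
        if chunk.len() < 4 {
            // Malformed input cannot be surfaced as an error, so panic.
            panic!("Cannot read section length");
        }
        let tail = &chunk[chunk.len() - 4..];
        Some(u32::from_be_bytes([tail[0], tail[1], tail[2], tail[3]]))
    }
}
```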

ethanfrey · 2021-02-02T14:30:22Z

packages/std/src/sections.rs

+    }
+
+    let mut first = data;
+    let mut second = first.split_off(section1_len_end);


I was thinking of a lower-level API that would let one easily create decodeN wrappers.

split_tail(data: Vec<u8>) -> StdResult<(Vec<u8>, Vec<u8>)> which would parse out the last section, allocate memory for it, truncate the input, and return (head, tail).

one_element(data: Vec<u8>) -> StdResult<Vec<u8>> after splitting off all the N args, this is called for the last (first) item. It parses out the length bytes, ensures they are correct, and then returns the truncated input without a realloc.

We would end up with something like:

pub fn decode_sections2(data: Vec<u8>) -> StdResult<(Vec<u8>, Vec<u8>)> {
    let (head, second) = split_tail(data)?;
    let first = one_element(head)?;
    Ok((first, second))
}

pub fn decode_sections3(data: Vec<u8>) -> StdResult<(Vec<u8>, Vec<u8>, Vec<u8>)> {
    let (head, third) = split_tail(data)?;
    let (head, second) = split_tail(head)?;
    let first = one_element(head)?;
    Ok((first, second, third))
}

This would make sense if we intend 3, 4, 5 variants and avoids any bugs in writing those. Not sure if we will ever need those however

This is pretty cool. Not because we need it now but because it explains the algorithm. Implemented. The caller code got even nicer:

pub fn decode_sections2(data: Vec<u8>) -> (Vec<u8>, Vec<u8>) {
    let (rest, second) = split_tail(data);
    let (_, first) = split_tail(rest);
    (first, second)
}
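For readers without the diff at hand, here is a self-contained sketch of how `split_tail` could work, reconstructed from the excerpts quoted in this thread (4-byte big-endian length suffix, panic on malformed input, the `rest_len_end == 0` special case). The exact body merged in the PR may differ; the panic messages and the bounds check are assumptions.

```rust
/// Splits the last section off `data` and returns (rest, tail).
/// Sketch only: reconstructed from the review excerpts, not the merged code.
pub fn split_tail(data: Vec<u8>) -> (Vec<u8>, Vec<u8>) {
    if data.len() < 4 {
        panic!("Cannot read section length");
    }
    let len_start = data.len() - 4;
    let tail_len = u32::from_be_bytes([
        data[len_start],
        data[len_start + 1],
        data[len_start + 2],
        data[len_start + 3],
    ]) as usize;
    if tail_len > len_start {
        panic!("Tail length exceeds available data");
    }
    let rest_len_end = len_start - tail_len;

    let (rest, mut tail) = if rest_len_end == 0 {
        // i.e. all data is the tail: reuse the input buffer, no copy needed
        (Vec::new(), data)
    } else {
        let mut rest = data;
        // copies the tail bytes into a new vector; `rest` keeps its buffer
        let tail = rest.split_off(rest_len_end);
        (rest, tail)
    };
    tail.truncate(tail_len); // drop the 4 length bytes
    (rest, tail)
}

pub fn decode_sections2(data: Vec<u8>) -> (Vec<u8>, Vec<u8>) {
    let (rest, second) = split_tail(data);
    let (_, first) = split_tail(rest);
    (first, second)
}
```

In this sketch the first section ends up in the input's original buffer, which is the property the `decode_sections2_preserved_first_vector` test name refers to.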

ethanfrey

Nice updates. Looking good

ethanfrey · 2021-02-02T20:44:25Z

packages/std/src/sections.rs

-        ]) as usize
+    let (rest, mut tail) = if rest_len_end == 0 {
+        // i.e. all data is the tail
+        (Vec::new(), data)


and this is 0 alloc, right? so you don't even need to handle the head case differently and still just as efficient. nice.

Jupp. From Vec::new:

The vector will not allocate until elements are pushed onto it.

More generally, a vector only allocates heap memory when its capacity is > 0.
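A quick way to check this, as a small sketch with hypothetical helper names: a fresh `Vec::new()` reports capacity 0, and capacity only becomes non-zero once an element is pushed.

```rust
/// Capacity of a freshly created, empty vector (no heap allocation yet).
pub fn capacity_of_new_vec() -> usize {
    let v: Vec<u8> = Vec::new();
    v.capacity()
}

/// Capacity after the first push, which forces an allocation.
pub fn capacity_after_push() -> usize {
    let mut v: Vec<u8> = Vec::new();
    v.push(1);
    v.capacity()
}
```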

webmaster128 added this to the 0.14.0 milestone Feb 2, 2021

webmaster128 requested review from ethanfrey and maurolacy February 2, 2021 09:34

webmaster128 added 2 commits February 2, 2021 10:56

Add section encoding

23efe49

Add sections decoder

f93200f

webmaster128 force-pushed the multi-return branch from 119c104 to 328390b Compare February 2, 2021 09:56

Use sections encoding for db_next

6bdb305

webmaster128 force-pushed the multi-return branch from 328390b to 6bdb305 Compare February 2, 2021 10:15

ethanfrey approved these changes Feb 2, 2021

webmaster128 added 2 commits February 2, 2021 18:29

Add test decode_sections2_preserved_first_vector

15e4c09

Pull out split_tail

6a4eafd

ethanfrey approved these changes Feb 2, 2021

webmaster128 merged commit 8a60668 into main Feb 2, 2021

webmaster128 deleted the multi-return branch February 2, 2021 20:52
