Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More generic encoding for complex return value #602

Closed
webmaster128 opened this issue Oct 27, 2020 · 9 comments · Fixed by #760
Closed

More generic encoding for complex return value #602

webmaster128 opened this issue Oct 27, 2020 · 9 comments · Fixed by #760

Comments

@webmaster128
Copy link
Member

webmaster128 commented Oct 27, 2020

In the db_next import, we currently use a custom encoding to return two values (key and value). This is okay for now but not the nices thing you can imagine when developing a rich API via imports.

I tried to use Wasm multiple return values, which is more or less this diff:

diff --git a/README.md b/README.md
index 7653e169..cc3d4985 100644
--- a/README.md
+++ b/README.md
@@ -136,7 +136,7 @@ extern "C" {
     #[cfg(feature = "iterator")]
     fn db_scan(start: u32, end: u32, order: i32) -> u32;
     #[cfg(feature = "iterator")]
-    fn db_next(iterator_id: u32) -> u32;
+    fn db_next(iterator_id: u32) -> [u32; 2];
 
     fn canonicalize_address(source: u32, destination: u32) -> u32;
     fn humanize_address(source: u32, destination: u32) -> u32;
diff --git a/packages/std/src/imports.rs b/packages/std/src/imports.rs
index 80280311..69ff75d1 100644
--- a/packages/std/src/imports.rs
+++ b/packages/std/src/imports.rs
@@ -30,7 +30,7 @@ extern "C" {
     #[cfg(feature = "iterator")]
     fn db_scan(start_ptr: u32, end_ptr: u32, order: i32) -> u32;
     #[cfg(feature = "iterator")]
-    fn db_next(iterator_id: u32) -> u32;
+    fn db_next(iterator_id: u32) -> [u32; 2];
 
     fn canonicalize_address(source_ptr: u32, destination_ptr: u32) -> u32;
     fn humanize_address(source_ptr: u32, destination_ptr: u32) -> u32;
@@ -115,23 +115,14 @@ impl Iterator for ExternalIterator {
 
     fn next(&mut self) -> Option<Self::Item> {
         let next_result = unsafe { db_next(self.iterator_id) };
-        let kv_region_ptr = next_result as *mut Region;
-        let mut kv = unsafe { consume_region(kv_region_ptr) };
-
-        // The KV region uses the format value || key || keylen, where keylen is a fixed size big endian u32 value
-        let keylen = u32::from_be_bytes([
-            kv[kv.len() - 4],
-            kv[kv.len() - 3],
-            kv[kv.len() - 2],
-            kv[kv.len() - 1],
-        ]) as usize;
-        if keylen == 0 {
+        let key_ptr = next_result[0] as *mut Region;
+        let value_ptr = next_result[1] as *mut Region;
+        let key = unsafe { consume_region(key_ptr) };
+        let value = unsafe { consume_region(value_ptr) };
+
+        if key.is_empty() {
             return None;
         }
-
-        kv.truncate(kv.len() - 4);
-        let key = kv.split_off(kv.len() - keylen);
-        let value = kv;
         Some((key, value))
     }
 }
diff --git a/packages/vm/src/context.rs b/packages/vm/src/context.rs
index 0866a02a..c7c3ef16 100644
--- a/packages/vm/src/context.rs
+++ b/packages/vm/src/context.rs
@@ -314,7 +314,7 @@ mod test {
                 "db_write" => Func::new(|_a: u32, _b: u32| {}),
                 "db_remove" => Func::new(|_a: u32| {}),
                 "db_scan" => Func::new(|_a: u32, _b: u32, _c: i32| -> u32 { 0 }),
-                "db_next" => Func::new(|_a: u32| -> u32 { 0 }),
+                "db_next" => Func::new(|_a: u32| -> [u32; 2] { [0, 0] }),
                 "query_chain" => Func::new(|_a: u32| -> u32 { 0 }),
                 "canonicalize_address" => Func::new(|_a: u32, _b: u32| -> u32 { 0 }),
                 "humanize_address" => Func::new(|_a: u32, _b: u32| -> u32 { 0 }),
diff --git a/packages/vm/src/imports.rs b/packages/vm/src/imports.rs
index 4e90b997..4ad04f73 100644
--- a/packages/vm/src/imports.rs
+++ b/packages/vm/src/imports.rs
@@ -196,22 +196,16 @@ pub fn do_scan<S: Storage, Q: Querier>(
 }
 
 #[cfg(feature = "iterator")]
-pub fn do_next<S: Storage, Q: Querier>(ctx: &mut Ctx, iterator_id: u32) -> VmResult<u32> {
+pub fn do_next<S: Storage, Q: Querier>(ctx: &mut Ctx, iterator_id: u32) -> VmResult<[u32; 2]> {
     let (result, gas_info) =
         with_storage_from_context::<S, Q, _, _>(ctx, |store| Ok(store.next(iterator_id)))?;
     process_gas_info::<S, Q>(ctx, gas_info)?;
 
     // Empty key will later be treated as _no more element_.
     let (key, value) = result?.unwrap_or_else(|| (Vec::<u8>::new(), Vec::<u8>::new()));
-
-    // Build value || key || keylen
-    let keylen_bytes = to_u32(key.len())?.to_be_bytes();
-    let mut out_data = value;
-    out_data.reserve(key.len() + 4);
-    out_data.extend(key);
-    out_data.extend_from_slice(&keylen_bytes);
-
-    write_to_contract::<S, Q>(ctx, &out_data)
+    let key_ptr = write_to_contract::<S, Q>(ctx, &key)?;
+    let value_ptr = write_to_contract::<S, Q>(ctx, &value)?;
+    Ok([key_ptr, value_ptr])
 }
 
 #[cfg(test)]
@@ -262,7 +256,7 @@ mod test {
                 "db_write" => Func::new(|_a: u32, _b: u32| {}),
                 "db_remove" => Func::new(|_a: u32| {}),
                 "db_scan" => Func::new(|_a: u32, _b: u32, _c: i32| -> u32 { 0 }),
-                "db_next" => Func::new(|_a: u32| -> u32 { 0 }),
+                "db_next" => Func::new(|_a: u32| -> [u32; 2] { [0, 0] }),
                 "query_chain" => Func::new(|_a: u32| -> u32 { 0 }),
                 "canonicalize_address" => Func::new(|_a: i32, _b: i32| -> u32 { 0 }),
                 "humanize_address" => Func::new(|_a: i32, _b: i32| -> u32 { 0 }),
@@ -998,22 +992,18 @@ mod test {
         let id = do_scan::<MS, MQ>(ctx, 0, 0, Order::Ascending.into()).unwrap();
 
         // Entry 1
-        let kv_region_ptr = do_next::<MS, MQ>(ctx, id).unwrap();
-        assert_eq!(
-            force_read(ctx, kv_region_ptr),
-            [VALUE1, KEY1, b"\0\0\0\x03"].concat()
-        );
+        let (key_ptr, value_ptr) = do_next::<MS, MQ>(ctx, id).unwrap();
+        assert_eq!(force_read(ctx, key_ptr), KEY1);
+        assert_eq!(force_read(ctx, value_ptr), VALUE1);
 
         // Entry 2
-        let kv_region_ptr = do_next::<MS, MQ>(ctx, id).unwrap();
-        assert_eq!(
-            force_read(ctx, kv_region_ptr),
-            [VALUE2, KEY2, b"\0\0\0\x04"].concat()
-        );
+        let (key_ptr, value_ptr) = do_next::<MS, MQ>(ctx, id).unwrap();
+        assert_eq!(force_read(ctx, key_ptr), KEY2);
+        assert_eq!(force_read(ctx, value_ptr), VALUE2);
 
         // End
-        let kv_region_ptr = do_next::<MS, MQ>(ctx, id).unwrap();
-        assert_eq!(force_read(ctx, kv_region_ptr), b"\0\0\0\0");
+        let (key_ptr, _value_ptr) = do_next::<MS, MQ>(ctx, id).unwrap();
+        assert_eq!(force_read(ctx, key_ptr), Vec::<u8>::new());
         // API makes no guarantees for value_ptr in this case
     }
 
diff --git a/packages/vm/src/instance.rs b/packages/vm/src/instance.rs
index def786c2..1c1c67dd 100644
--- a/packages/vm/src/instance.rs
+++ b/packages/vm/src/instance.rs
@@ -148,11 +148,10 @@ where
                     do_scan::<S, Q>(ctx, start_ptr, end_ptr, order)
                 }),
                 // Get next element of iterator with ID `iterator_id`.
-                // Creates a region containing both key and value and returns its address.
-                // Ownership of the result region is transferred to the contract.
-                // The KV region uses the format value || key || keylen, where keylen is a fixed size big endian u32 value.
-                // An empty key (i.e. KV region ends with \0\0\0\0) means no more element, no matter what the value is.
-                "db_next" => Func::new(move |ctx: &mut Ctx, iterator_id: u32| -> VmResult<u32> {
+                // Creates a key region and a value region and returns their addresses.
+                // Ownership of the result regions is transferred to the contract.
+                // An empty key means no more element, no matter what the value is.
+                "db_next" => Func::new(move |ctx: &mut Ctx, iterator_id: u32| -> VmResult<[u32; 2]> {
                     do_next::<S, Q>(ctx, iterator_id)
                 }),
             },

Returning two pointers makes the code much nicer. However, this fails to link.

---- push_and_reduce stdout ----
thread 'push_and_reduce' panicked at 'called `Result::unwrap()` on an `Err` value: InstantiationErr { msg: "Error instantiating module: LinkError([IncorrectImportSignature { namespace: \"env\", name: \"db_next\", expected: FuncSig { params: [I32, I32], returns: [] }, found: FuncSig { params: [I32], returns: [I32, I32] } }])" }', /projects/cosmwasm/packages/vm/src/testing/instance.rs:129:77

As you can see in expected: FuncSig { params: [I32, I32], returns: [] }, this compiles to a Wasm function with 2 arguments and no return value. It is unclear if multiple return values in imports are supported by the Rust compiler (see e.g. rust-lang/rust#47599).

If we are stuck with FFI types, we might want to use a more generic encoding schema than value || key || keylen, which at least supports more than two values.

@ethanfrey
Copy link
Member

Interesting investigation, but I don't think this is much of a pain point - it works, and I don't think we add any more complex FFI types soon.

Happy to revisit this if we have actual use cases that need larger return types (and maybe just leave the next iterator as is anyway to avoid breaking compatibility). But it would be cool to have a better struct if we need to return more.

@webmaster128
Copy link
Member Author

I looked into Ethereum's RLP but this is too complicated because of the nesting which we do not need. A flat list of binary values is sufficient.

@webmaster128
Copy link
Member Author

I now realized that the length as a suffix is only needed because we use consume_region which ready the whole region into one Vector. It would be more efficient to make a consume_multi_region which returns a Vec<Vec<u8>>. This avoids all copies of the data as well as the strange length suffix.

So instead of value || key || keylen we do value0_len || value0 || value1_len || value1 || value2_len || value2 || … where length fields are always 4 byte uint32. Then in consume_multi_region we loop over the region, ignore (i.e. leak) the length fields and convert the values into Vectos.

@webmaster128 webmaster128 added this to the 0.13.0 milestone Nov 26, 2020
@webmaster128 webmaster128 added Blocked Blocked on some external dependency and removed maybe labels Dec 4, 2020
@webmaster128
Copy link
Member Author

Do not start this before #504 is merged. Otherwise big merge mess ahead.

@ethanfrey
Copy link
Member

This is API breaking for the Storage iterator, correct?

I think we should start labeling API Breaking changes, and start moving them to CWIPs. (Not saying we need the full process here, but let's start some process/custom for labeling these. I will make a few API breaking proposals soon as well)

@webmaster128
Copy link
Member Author

It breaks the contract-VM interface (binary compatibility), not the standard librarie's API (source compatibility). I completely agree moving the result of this long thread into a CWIP.

@webmaster128 webmaster128 removed the Blocked Blocked on some external dependency label Dec 17, 2020
@webmaster128 webmaster128 modified the milestones: 0.13.0, 0.14.0 Jan 4, 2021
@webmaster128 webmaster128 self-assigned this Jan 11, 2021
@ethanfrey
Copy link
Member

Do you still want to do this? For me it would be good to get it in the next release as there are quite a few breaking changes already. Or decide not to do it at all.

It would be good to get these apis quite stable soon

@webmaster128
Copy link
Member Author

webmaster128 commented Jan 22, 2021

Do you still want to do this?

Yes. It is not hard and clear now. I think it makes sense to work that out when starting with the crypto APIs, which will touch this whole contract-VM interface a lot.

@ethanfrey
Copy link
Member

Fair enough

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants