Skip to content
This repository has been archived by the owner on May 11, 2020. It is now read-only.

Pass the VM pointer to host go functions #59

Closed

Conversation

gballet
Copy link
Contributor

@gballet gballet commented Jun 4, 2018

The previous PR #58 was enabling go functions, but without the VM context being passed, they can only do so much. This PR enforces that host Go functions are passed a *VM parameter as their first argument.

Another idea would be to check for pointers in the list of arguments, and pass a *VM each time that an argument is being encountered. It has some advantages, like streamlining the code and letting the Go type system handle the panics for us. This being said, this leaves too much room for abuse since users could write confusing functions that take that pointer in the middle of their arguments list, thus disturbing the order to the WASM call.


This change is Reviewable

@sbinet sbinet requested review from vibhavp June 4, 2018 17:20
@sbinet
Copy link
Contributor

sbinet commented Jun 4, 2018

I understand the motivation for introducing this additional parameter (ie: retrieve arguments from the VM, right?) but this seems needlessly constraining.

couldn't we, instead, provide a function:

func WrapFunc(f interface{}) exec.function {
    // create a closure with reflect.FuncOf(...)
    // that closure would have *exec.VM as first arg
}

what do you think?

@gballet
Copy link
Contributor Author

gballet commented Jun 6, 2018

@sbinet the motivation is in fact to let host functions that are passed a pointer, to access memory.

As far as I can tell, it is not possible to use closures to work around this problem. Consider the following code snippet:

...
m, err := wasm.ReadModule(bufio.NewReader(fd), ModuleResolver)
if err != nil {
	return errors.New(fmt.Sprintf("Could not read module: %v", err))
}

vm, err := exec.NewVM(m)
if err != nil {
	return errors.New(fmt.Sprintf("Could not instantiate vm: %v", err))
}
...

ModuleResolver does the function resolving, and therefore the "linking" to host functions. It is the one that should create the closure, but won't be able to capture vm because at the time it is called, NewVM hasn't been called yet.

What I can propose is to "inject" vm each time I see a ptr in the list of parameters, this way the client code doesn't have to worry about getting a *VM object when they don't need it. What do you think?

@dennwc
Copy link
Contributor

dennwc commented Jun 6, 2018

Totally agree with @gballet - closures won't work here, plus binding the module host function to a single VM sounds weird.

On the other hand, I don't think exposing a VM is a good idea. This way host function will be able to mess with the stack. I propose to define a separate interface that allows to access/allocate the memory and get it as []byte in the host function. It may additionally expose a method to call WASM functions (can be done later).

@sbinet
Copy link
Contributor

sbinet commented Jun 8, 2018

ok. what @dennwc says SGTM.

but before getting to that, I suppose @gballet could perhaps give us an example of what this PR aims to accomplish or enable. (the tests didn't really exercize the new exec.VM argument...)
this way, we'll get a more concrete example+constraints for the interface.

what do you think?

@gballet
Copy link
Contributor Author

gballet commented Jun 10, 2018

@sbinet an example of why host functions would want to access the memory area? At the highest level, it's to ensure that host and wasm functions can act in the same capacity (i.e. access the memory, call other functions, ...) so that the caller doesn't need to know if they're calling a wasm or host function.

If you're asking for a more specific (and therefore contrived) example, I have a host function that takes as parameter a pointer to an input buffer, and a pointer to an output buffer, as well as the maximum length of both buffers. Writing buffers of undefined length is one use case, another one is to write a BigInt and a string.

So I would be happy with an interface that goes like:

type VMInterface interface {
  WriteBuffer(offset int32, maxLength uint32, data []byte) error
  ReadBuffer(offset int32, numBytes uint32, output []byte) error
  // Helpers that are just wrappers around the previous functions
  WriteInt32(offset int32, value int32) error
  ReadInt32(offset int32) (int32, error)
  WriteZeroTerminatedString(offset int32, maxLength uint32, str string) error
  ReadZeroTerminatedString(offset int32) (string, error)
  // For later, it's really just a wrapper around vm.execCode(...)
  callFunction(index uint32, ...) error
}

Let me know what you think, and I'd be happy to update the PR with a first draft of that interface.

And if you want me to update the unit tests so that they present the use case beforehand, let me know as well.

@dennwc
Copy link
Contributor

dennwc commented Jun 11, 2018

@gballet Maybe it's a dumb question to ask, but why each Read and Write includes a Max argument?

Also, I think that there might be a better solution for writing the memory. Imagine a host function that has a very high deviation in terms of input size vs output size, so the caller will not be able to predict an exact buffer size, and allocation the maximal buffer will have a severe performance impact. I think that allocation interface might be better for this purpose. This will allow host functions to allocate the buffer of any size in the VM memory and provide the pointer to the caller.

Something like this:

type VMInterface interface {
   Make(n int) []byte
   // ...
}

It implies a bit of a magic and should be more verbose, like:

Make(n int) (buf []byte, vmptr uint32)

But I think that VM can easily track the buffer that was returned by Make and convert it to VM pointer when needed. For example:

func sayHello(vm VMInterface) []byte {
   s := "hello!"
   b := vm.Make(len(s))
   copy(b, s)
   return b
}

It might also be smart enough to get that the buffer returned by the function was not allocated by Make and can do alloc+copy automatically. In this case the host code is even shorter (assuming vm argument is optional):

func sayHello() []byte {
   return []byte("hello!")
}

@gballet
Copy link
Contributor Author

gballet commented Jun 12, 2018

@dennwc all the limits come from my time as an Asm/C/C++ dev, and are probably overkill 😄

This being said, I think that having a something like Make is dangerous: it gives the user the impression that the allocated memory will be garbage-collected, which it won't be.

My intent is to provide a low-level interface, similar to what you find at the Asm/C level. I think of host functions like system calls. Garbage-collection is a much higher-level feature that is implemented at runtime-level and differs from one language to the next. This should be left to the runtime, in my opinion.
If the programmer of a host function really wants to allocate memory in the VM's memory space, they should use the 'call a WASM function' call that you suggested in a previous post and call the runtime's memory allocation function... It will feel wrong and cumbersome, and therefore dangerous. In general, we should play it safe and not assume too much how the module will manage its memory.

@gballet
Copy link
Contributor Author

gballet commented Jun 19, 2018

One function that I would like to add is Terminate(msg string) which stops execution of the VM with an optional reason.

@Xe
Copy link

Xe commented Jun 22, 2018

I ended up getting around this by having my Process type wrap an exec.VM.

@gballet
Copy link
Contributor Author

gballet commented Jun 27, 2018

@sbinet @dennwc I have added an early proposal for a VMInterface object that abstracts the VM pointer. I should also have added a unit test that abstracts writing to memory, but in the mean time your feedback would be appreciated.

Copy link
Contributor

@sbinet sbinet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks for tackling this.

I don't like the VMInterface name very much (a type that has Interface in its name that is actually a struct...)

@Xe named her wrapping type Process.
golang.org/x/debug/internal/core also names it Process.

Interestingly, x/debug introduces a type Address uint64 to abstract somewhat (I suspect) between 32b and 64b address space.
we could follow suit for the Write/Read methods (and then the io.ReaderAt and io.WriterAt would go down the drain.)

exec/vm.go Outdated

// WriteBytes takes an array of bytes and copies it at offset into the
// module's memory.
func (vi *VMInterface) WriteBytes(offset int, data []byte) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method really looks like an io.WriterAt:

type WriterAt interface {
	WriteAt(p []byte, off int64) (n int, err error)
}

it feels really tempting to implement it... :)

exec/vm.go Outdated
// ReadBytes returns a slice pointing to a series of byte in memory. The
// data is copied which is inefficient but if client code direct access
// to memory this whole interface is moot.
func (vi *VMInterface) ReadBytes(offset int, length int) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this one looks like an io.ReaderAt...

@sbinet sbinet mentioned this pull request Jun 28, 2018
@gballet
Copy link
Contributor Author

gballet commented Jun 28, 2018

Actually, according to what the spec has to say about host functions they say:

A host function may also modify the store. However, all store modifications must result in an extension of the original store, i.e., they must only modify mutable contents and must not have instances removed. Furthermore, the resulting store must be valid, i.e., all data and code in it is well-typed.

I feel a better name would be Store, although in that case it would not make sense to add Terminate to the list of functions. I guess I'll stick to Process for now.

@gballet gballet force-pushed the call-gofuncs-with-vm-and-module branch from 0ecf35e to 4fe16a4 Compare July 1, 2018 13:37
Copy link
Contributor

@sbinet sbinet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (after the few last remaining nitpicks. thanks!)

exec/vm.go Outdated

var err error
if length < len(p) {
err = fmt.Errorf("Tried to read past the end of memory: wanted to read %d bytes, wrote %d", len(p), length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be tempted to reuse io.ErrShortBuffer here.

exec/vm.go Outdated

var err error
if length < len(p) {
err = fmt.Errorf("Tried to write past the end of memory: wanted to write %d bytes, wrote %d", len(p), length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and io.ErrShortWrite here.

what do you think?

exec/vm.go Outdated
}

// NewVMInterface creates a VM interface object for host functions
func NewVMInterface(vm *VM) *Process {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/NewVMInterface/NewProcess/

exec/func.go Outdated

for i := numIn - 1; i >= 0; i-- {
// Pass vi as an argument. Check that the function indeed
// expects a *VMInterface argument.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/VMInterface/Process/

@@ -99,11 +99,11 @@ var moduleCallHost = []byte{
0x08, 0x01, 0x06, 0x00, 0x41, 0x00, 0x10, 0x00, 0x0B,
}

func add3(x int32) int32 {
func add3(vi *Process, x int32) int32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/vi/proc/ (or p)


// a host function that can be called by WASM code.
testHostFunction := func() {
testHostFunction := func(vi *Process) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/vi/proc/ (or p)

if r := recover(); r == nil {
t.Errorf("This code should have panicked.")
} else {
if r != "exec: the first argument of a host function was int32, expected ptr" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks brittle.
but I must admit we don't have a safer mechanism in place to handle this kind of thing...

t.Errorf("This code should have panicked.")
} else {
if r != "exec: the first argument of a host function was int32, expected ptr" {
t.Errorf("This should have panicked because of %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what err will be here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err would be nil, and indeed it's not what we should print

return x + 3
}

func importer(name string) (*wasm.Module, error) {
func importer(name string, f func(*Process, int32) int32) (*wasm.Module, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could modify importer to take the f func(*Process, int32) int32) argument, close around it and return the usual func(string) (*wasm.Module, error) importer function...

(just a suggestion. it might make things less obvious, though...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of that, and I think this is not the right approach: I can only imagine one real-life use case when a host function needs not access the memory: reading integer value from a system-specific file (like /proc/$$/fd or /dev/urandom) and that is by no means the most general use case, so it feels like an unnecessary complexity to me.
Furthermore, the spec specifies that the store should be passed to host functions so it seems normal to do that in any case, just as the spec expects an interpreter to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

Copy link
Contributor

@sbinet sbinet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

please squash everything into a single commit with a nice commit message and I'll merge this.

thanks for sticking with me :)

@gballet gballet force-pushed the call-gofuncs-with-vm-and-module branch from a9ef4e7 to ed1c363 Compare July 3, 2018 14:50
This way, host functions don't have direct access to the VM,
however they are able to write to memory and stop it if need
be.

The interface implements io.ReaderAt and io.WriterAt.
@gballet gballet force-pushed the call-gofuncs-with-vm-and-module branch from ed1c363 to 39478d5 Compare July 3, 2018 14:58
@sbinet
Copy link
Contributor

sbinet commented Jul 4, 2018

merged via f9c0372

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants