Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom data that is passed to host calls via HostFunctionCallContext #183

Closed
pkedy opened this issue Feb 2, 2022 · 15 comments
Closed

Custom data that is passed to host calls via HostFunctionCallContext #183

pkedy opened this issue Feb 2, 2022 · 15 comments
Assignees

Comments

@pkedy
Copy link
Contributor

pkedy commented Feb 2, 2022

Hello! I am excited about wazero as a means to use Wasm without requiring CGO. Looking forward to ARM64 support!

I was trying to add wazero as an engine to wapc-go and ran into a small snag. I need the ability to attach custom data to the module instance in order to store function invocation state (context, payload, error, etc). This option is available in other Wasm runtimes and should be easy to add to wazero.

Envisioned usage:

Instantiation

	if err := m.store.Instantiate(m.module, moduleName); err != nil {
		return nil, err
	}

	ic := invokeContext{
		ctx: context.Background(),
		// ctx and request payload is set prior to calling `store.CallFunction`
                // response payload or error is set after calling the `hostCallHandler` (below)
	}

	m.store.SetInstanceData(moduleName, &ic)

Invocation

func (i *Instance) Invoke(ctx context.Context, operation string, payload []byte) ([]byte, error) {
	*i.ic = invokeContext{
		ctx:       ctx,
		operation: operation,
		guestReq:  payload,
	}

	results, _, err := i.m.store.CallFunction(i.name, "__guest_call", uint64(len(operation)), uint64(len(payload)))

	// Inspect response payload or error in `invokeContext`
        // results[0] = 1 for success, 0 for failures
}

Host call

func (m *Module) my_host_call(ctx *wasm.HostFunctionCallContext, operationPtr, operationLen, payloadPtr, payloadLen int32) int32 {
	ic := ctx.Data.(*invokeContext)  // <--- Grabs the custom user data
	data := ctx.Memory.Buffer
	operation := string(data[operationPtr : operationPtr+operationLen])
	payload := make([]byte, payloadLen)
	copy(payload, data[payloadPtr:payloadPtr+payloadLen])

	ic.hostResp, ic.hostErr = m.hostCallHandler(ic.ctx, operation, payload)
	if ic.hostErr != nil {
		return 0
	}

	return 1
}

Response payload or error are accessed via other host calls: full example here

@mathetake
Copy link
Member

mathetake commented Feb 2, 2022

I would like to understand that isn't it possible to have custom data in your Module struct? Should the custom data be per-module instance, rather than per-host call closure (in your example my_host_call)? For example:

type myHostCallImpl struct { invokeContext Foo }


func (m *myHostCallImpl) my_host_call(ctx *wasm.HostFunctionCallContext, ....int32) int32 {
        ic := m.invokeContext
}

how about this?

@mathetake
Copy link
Member

never mind ^^, I see you and @codefromthecrypt already have started conversation in the PR. I will assign @codefromthecrypt to this!

@pkedy
Copy link
Contributor Author

pkedy commented Feb 3, 2022

I would like to understand that isn't it possible to have custom data in your Module struct?

@mathetake Good question. Let's start with the intent and work backwards. I'd like support multi-threading by having a pool of module instances for that allow me to Aquire/Release. Host calls are scoped at the module/store. I created this issue thinking I need invocation state to exist at the module instance scope. @codefromthecrypt rightfully pointed out that using context.Context is probably the correct idiom for this. I'm going to type up some thoughts on how this can work and try to apply it to my draft PR. Generally, I think it will involve embedding context in wasm.HostFunctionCallContext and adding a store.CallFunctionContext(ctx context.Context, moduleName string, funcName string, params ...uint64) func where ctx flows through to the host call. If we agree on the design/approach, then I'll invest the time in all the proper unit tests, etc.

@codefromthecrypt
Copy link
Contributor

ps in another project I recommended that the context could be marked as a stable ref (allowing it to be used as a map key) and guaranteed to be called on an end hook. This allowed various integrations to decorate data via a map without a memory leak.

ex.

Ex. we have some hook registration

type WazeroPlugin interface {
   func beginFunc(context *wasm.HostFunctionCallContext)
   func endFunc(context *wasm.HostFunctionCallContext, cause endCause)
}

Then the plugin implements that and some user api

package myplugin

// GetStuff is for users
func GetStuff(ctx *wasm.HostFunctionCallContext) (Stuff, error) {
  if stuff, ok := instance.funcStuff[ctx]; !ok {
     return nil, errors.New("Did someone forget to initialize wazero with myplugin.Setup()?")
  } else {
    return stuff
  }
}

// Setup is for wazero
func Setup() WazeroPlugin {
  return instance
}

var instance &pluginMap{map[context]Stuff{}}

type pluginMap struct {
  funcStuff map[wasm.HostFunctionCallContext]Stuff
}

func (p *pluginMap) beginFunc(ctx *wasm.HostFunctionCallContext) {
  p.funcStuff[context] = newStuff()
}

func (p *pluginMap) endFunc(ctx *wasm.HostFunctionCallContext, cause endCause) {
  delete(p.funcStuff,context)
  switch cause {
...
}

Then user code can use your plugin without thinking

func my_host_call(ctx *wasm.HostFunctionCallContext, ....int32) int32 {
        stuff := myplugin.GetStuff(ctx)
}

This moves the storage management to plugins and guarantees no other host plugin will delete data from another also. The main thing this requires is that the ref is stable and always called twice (begin and end). which means an SPI to run this is invoked outside user code.

Apologies for any compile errors, this is just a sketch

@codefromthecrypt
Copy link
Contributor

ps for above to adapt it for go context probably means the setup function returns one. anyway something like this should work, but so will other ways. main thing is to get the use case together so we know what we need to solve

@pkedy
Copy link
Contributor Author

pkedy commented Feb 3, 2022

@codefromthecrypt What do you mean by the "setup function"?

I experimented locally by adding a CallContext function to Engine and made it flow through both interpreter and jit. It works for interpreter. I'm on an M1 Mac so I need to try jit on another machine. But the changes are very simple and seemingly straight forward. It even made my waPC code simpler. Do you have concerns about doing it this way versus other ways? I agree on getting the use cases together. In addition to my use case, having distributed tracing flow through to host calls comes to mind. Feels like a no-brainer really.

@codefromthecrypt
Copy link
Contributor

First, I'm definitely trying to solve your problem fast. That's why we are discussing. It is also true that a design you mentioned indeed would punch a hole through and get data into a place it can be read:

m.store.SetInstanceData(moduleName, &ic)

Just at best this is a tentative design, I think we'd agree, and it is probably already possible to do this another way. If we are being tentative I would like to help solve your thing without even this api. Even if we put "beta" on it we would need to remove it in a month or two, so might as well think more now.

incidentally, the design above was from a distributed tracing library :)

Here are some reasons why a module-scoped thing can be problematic

  • since it is scoped to the module, you can't easily associate function-scoped data without something else
    • this also means you can't easily cleanup something function scoped (ex span.finish)
  • since this is one special api, it requires one and only one Set.
  • This isn't an issue until there are two plugins, and then there's a fight
  • the Data field is mutable so it could be set to nil and screw up any place it is used
    • that's not to say there aren't other places in the host call context

These sorts of things end up in RPC libraries even if they have a stash area, it is rarely one object for all calls. (ex grpc metadata)

If you can share a small example of what you are doing on GitHub, I can try to help accomplish the same thing without carving in. This is why I suggested adding a feature test into the PR so that it is easier to solve the problem. Often when we discuss things in abstract they are harder to solve and take longer even if they do.

PS "setup function" I meant if there was a function scoped api like below, wazero would need a way to add that before the module is instantiated.

type WazeroPlugin interface {
   func beginFunc(context *wasm.HostFunctionCallContext)
   func endFunc(context *wasm.HostFunctionCallContext, cause endCause)
}

func Plugin(plugin WazeroPlugin) Option {
	return func(s *Store) {
		s.plugins = append(s.plugins, plugin)
	}
}


wasm.NewStore(engine, Plugin(myplugin.Setup()))

@codefromthecrypt
Copy link
Contributor

one key point I guess is that it is easy to take function scope plugin api and use it for a larger scope such as store. just the other way around doesn't tend to work, it ends up requiring also to make the function scoped hooks, just outside the code that could easily reach it.. TL'DR' is the narrowest scope tends to decide the api

@pkedy
Copy link
Contributor Author

pkedy commented Feb 3, 2022

Thanks for your quick responses. I’ll follow up tomorrow with examples as well as try the things you’ve mentioned.

@codefromthecrypt
Copy link
Contributor

Thanks tons.

Also, I won't dismiss a tentative api if you really need it, we appreciate your help. Ex not exporting the field and making a scary api which will tell people to proceed with caution
Ex.

// ExperimentalData returns data set by ExperimentalSetData until tetratelabs/wazero#123 is solved which will add first-class support for extensions
func ExperimentalData(ctx ...) interface{} {
  return ctx.data
}

@pkedy
Copy link
Contributor Author

pkedy commented Feb 3, 2022

Here are some reasons why a module-scoped thing can be problematic

To clarify, I moved away from adding module state in favor of using Go context.

Here is the experiment that passes go content via a new func store.CallFunctionContext.
pkedy@090acd4

Usage from waPC:
https://github.com/wapc/wapc-go/blob/multi_engine_wazero_context/engines/wazero/wazero.go

I'm going to look into the Plugin approach you mentioned above and report back.

This is why I suggested adding a feature test into the PR so that it is easier to solve the problem.

Totally agree. I have limited time so I'm in "exploring options" mode.

@codefromthecrypt
Copy link
Contributor

thanks for the update @pkedy I will look at this in detail over the weekend. Please also keep this issue up to date as you have been 🙇

@pkedy
Copy link
Contributor Author

pkedy commented Feb 4, 2022

@codefromthecrypt I opened Draft PR #203 so you can more easily see the diffs. I'm happy to close it if there is a way to accomplish the same result without changes to wazero (e.g. using a Plugin).

@pkedy pkedy changed the title Custom data in module instance that is passed to host calls via HostFunctionCallContext Custom data that is passed to host calls via HostFunctionCallContext Feb 4, 2022
@codefromthecrypt
Copy link
Contributor

I'm happy with your propagation PR! what I've described above may end up useful if we do transparent tracing and metrics (ex time when a wasm.function begins and ends with dimensions of the module and function name done transparently). We can re-open it when ready to do more, and I don't personally think we need to do more than propagation at the moment!

TL'DR' since you opened this, you can close it :D

@mathetake
Copy link
Member

I believe this is closed by #203 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants