Proposal: Fine grained control of memory #1439
Comments
I really like this proposal and I am glad it is happening now! 👍 One additional use case I can think of is to implement the equivalent of |
AFAIK these operations (if implemented via POSIX) can't be guaranteed to be atomic unless we're willing to do something like pause/interrupt every other thread (which can access the memory) while carrying them out. My understanding is that the POSIX spec just says that races here have undefined behaviour. If stopping the world isn't acceptable, we might be able to get away with something similar to our current |
Thanks @titzer! Interesting use case.
I'm hoping that we will be able to get away with the current memory.grow semantics. AFAIK, we haven't encountered issues in the wild with racy grow calls, though I expect that it is more observable with an
Exciting!
What are you imagining the operands to the
To be clear, the intended semantics for I ask only because accessing pages after Is the expectation that Wasm engines running in environments without virtual memory will simply not implement or disable this proposal? Just double checking: these instructions would all require that the memory regions they operate upon be page aligned and multiple-of-page-size sized, right? I suppose they could take their operands in units of pages, rather than bytes, to enforce this, similar to Overall:
I generally agree with @fitzgen here. I think we should put first-class memories into their own proposal; we'll have to design a way to allow all memory operations that currently have static memory indexes to take a first-class memory, and that mechanism should probably be uniform. I also agree that file mapping is probably best handled at a different layer, so I think it may be out of scope here too.
Thanks @fitzgen for the detailed feedback. I'll start with an example here to clearly scope the problem that I'd like to tackle. Let's say WebGPU maps a GPUBuffer that produces an ArrayBuffer, or an ArrayBuffer is populated as the result of using file handling APIs like While I also agree that file descriptors are out of place here, I don't necessarily agree that a I think the operands to
The intended behavior is to zero the memory pages; I'll look into potential options some more.
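On an engine without OS/MMU support, the zeroing semantics described above could be approximated by overwriting the range in place. The sketch below assumes page-granular operands and the standard 64 KiB Wasm page. The helper name `discardPolyfill` is hypothetical. Unlike madvise(MADV_DONTNEED), this version only makes the pages read as zero. It does not return any physical memory to the OS.

```javascript
// Wasm pages are 64 KiB.
const WASM_PAGE_SIZE = 65536;

// Hypothetical polyfill for memory.discard: zero `pageCount` pages starting
// at page `pageIndex`. The observable result (pages read as zero) matches,
// but unlike madvise(MADV_DONTNEED) no physical memory is released.
function discardPolyfill(memory, pageIndex, pageCount) {
  const totalPages = memory.buffer.byteLength / WASM_PAGE_SIZE;
  if (!Number.isInteger(pageIndex) || !Number.isInteger(pageCount) ||
      pageIndex < 0 || pageCount < 0 || pageIndex + pageCount > totalPages) {
    // An engine would trap here; the sketch throws a RangeError instead.
    throw new RangeError("discard range out of bounds");
  }
  new Uint8Array(memory.buffer, pageIndex * WASM_PAGE_SIZE,
                 pageCount * WASM_PAGE_SIZE).fill(0);
}

// Usage: dirty pages 1 and 2 of a 3-page memory, then discard only page 1.
const memory = new WebAssembly.Memory({ initial: 3 });
const bytes = new Uint8Array(memory.buffer);
bytes[WASM_PAGE_SIZE + 10] = 42; // inside page 1
bytes[2 * WASM_PAGE_SIZE] = 7;   // first byte of page 2
discardPolyfill(memory, 1, 1);
console.log(bytes[WASM_PAGE_SIZE + 10], bytes[2 * WASM_PAGE_SIZE]); // 0 7
```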
Yes, though I expect that it would be possible to polyfill if needed. I'm not sure that that would be particularly useful.
Yes, my expectation is that all operands are in units of pages consistent with the
Agreed that this use case is very motivating.
What is the representation of a pointer to the backing store of an It seems to me like this API/functionality fundamentally involves communicating with, and making assumptions about, the host. Therefore this belongs in WASI/Web APIs, not core Wasm, in my mind.
What I am imagining is that there would be a JS API basically identical to what you've described for the Something like this:
```javascript
// Grab the Wasm memory.
let memory = myWasmMemory();
// Grab the array buffer we want to share with Wasm.
let buffer = myArrayBuffer();
// Length of the buffer, in Wasm pages (64 KiB each).
let page_len = Math.ceil(buffer.byteLength / 65536);
// The memory protections.
let prot = WebAssembly.Memory.PROT_READ | WebAssembly.Memory.PROT_WRITE;
// Map the array buffer into this memory!
memory.map(page_len, prot, buffer);
```
Then, if you wanted to create a new mapping from inside Wasm, you would import a function that allowed you to have your own scheme for identifying which array buffer you wanted to map (maybe coming up with your own "file descriptor" concept, since you can safely make assumptions about the host and include your own JS glue to maintain the fd-to- The linear memory would still be defined inside Wasm, as if it were just another memory. And it would be just another memory, until the JS API was called on it and the array buffer got mapped in. There could be an analogous API for WASI. (Although, at the risk of going into the weeds a little bit here, one of WASI's goals is for all APIs to be virtualizable, and this API wouldn't be. Making it virtualizable would require a
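The "your own file descriptor" glue described above could look roughly like the following. This is a sketch under loud assumptions. `memory.map(pageLen, prot, buffer)` is the hypothetical API proposed in this comment and is not an existing method, so the usage passes a stand-in object. `BufferTable`, `makeImports`, the `host.map_buffer` import name and the `prot` value `3` are all made-up placeholders.

```javascript
// Hypothetical glue: hand out small integer handles ("fds") for ArrayBuffers
// so Wasm code can ask the host to map a buffer by number.
class BufferTable {
  constructor() {
    this.buffers = new Map();
    this.nextHandle = 3; // start after 0/1/2, echoing POSIX stdio fds
  }
  register(buffer) {
    const handle = this.nextHandle++;
    this.buffers.set(handle, buffer);
    return handle;
  }
  get(handle) {
    const buffer = this.buffers.get(handle);
    if (buffer === undefined) throw new RangeError(`bad handle ${handle}`);
    return buffer;
  }
  close(handle) {
    return this.buffers.delete(handle);
  }
}

// Build an import object whose `map_buffer` function a Wasm module could call.
// `memory.map` is the hypothetical API from the comment above.
function makeImports(table, memory, prot) {
  return {
    host: {
      map_buffer(handle) {
        const buffer = table.get(handle);
        const pageLen = Math.ceil(buffer.byteLength / 65536);
        memory.map(pageLen, prot, buffer);
        return pageLen; // number of pages mapped
      },
    },
  };
}

// Usage with a stand-in memory object that just records calls.
const calls = [];
const fakeMemory = { map: (...args) => calls.push(args) };
const table = new BufferTable();
const fd = table.register(new ArrayBuffer(100000));
const imports = makeImports(table, fakeMemory, /* placeholder prot */ 3);
console.log(fd, imports.host.map_buffer(fd)); // 3 2
```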
The 2 links you used there are the same URL. Did you mean for the latter one to be #1397?
For Option 1, I expect this to be an
Ah, I see what you mean. My intent with proposing core Wasm instructions for |
I did! Thanks for catching that; I've updated the OP.
There could be some analogy here to the way we currently think about thread creation. The core Wasm spec could describe how instructions interact with a "mapped memory" (cf. "shared memory"), without specifying core Wasm instructions for creating/altering the mapping (at least as an MVP). Web environments would want a host API to create mappings to |
Yes, exactly. Thank you for stating this so succinctly!
Somewhat related: discussion of address-space-related features in Memory64: WebAssembly/memory64#4
I almost certainly do not understand the limitations that browsers are subject to w.r.t. memories that make it necessary to implement mmap functionality in terms of multi-memory (as opposed to being addressable by a single linear memory pointer), but I do feel this is unfortunate. I foresee lots of use cases in programming languages and other systems that would not work without a single address space, or without languages like C/C++ being able to use regular pointers to address all of it. And if languages like C/C++ can't natively write to it but would need intrinsics/library functions to emulate access to a secondary memory (which would not allow reuse of buffer creation code in those languages), then there would be little point in implementing it with multi-memory underneath. Likely code in those languages would need to copy things anyway, in which case a
Why would that be required?
AFAICT this can already happen with a large |
Although I do see that there are some implementation challenges, I basically agree with this, and I think we should explore the design and implementation spaces for the VM functions in the context of memory zero before assuming that it is absolutely necessary to go multi-memory. Multi-memory has uncertain utility in unified address space languages, and the present proposal seems even more oriented toward the classical languages that are the most tied to unified address spaces than is the multi-memory proposal itself. For the present proposal there is therefore a heavy burden on the champions to demonstrate that tools that people will want to use can be applied effectively in a multi-memory setting.
IIUC the proposal to forbid these operations on the default memory was motivated by a desire to avoid impacting the performance of modules not using these features. Could this instead be accomplished by making a type-level distinction between "mappable" and "non-mappable" memories (again, akin to the current distinction between "shared" and "non-shared")? In this case, there would be no issue with declaring the default memory of newly-generated modules as "mappable" if required, although there might be some compositionality issues with previously-generated modules.
Certainly an attribute could be made to work to control the code emitted. (It would be nice to avoid it if we can, though, and that comes back to my point about exploring the implementation space after pinning down in some detail the use cases and usage patterns.)
Currently the linear space is homogeneous, but if we were to allow mapping/protection changes in linear memory, that would no longer be the case. If we did spec memory operations for the default memory, I would expect them to operate on page boundaries. This means that adjacent pages could now be mapped or read-only pages. There is possibly a design space where we could declare upfront that some section of memory is 'mappable', and then we wouldn't need to work at page granularity, but would that be sufficiently useful?
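If the operations work at page granularity and take operands in pages, as memory.grow does, then a toolchain or host would have to turn byte ranges into page operands. The sketch below shows both options, assuming the standard 64 KiB Wasm page. One option rejects unaligned ranges. The other rounds outward to whole pages, which is exactly where neighbouring bytes end up with changed permissions. `toPageRange` and `enclosingPages` are hypothetical helper names.

```javascript
const WASM_PAGE_SIZE = 65536;

// Convert a byte range into [startPage, pageCount] operands, requiring that
// both the start offset and the length are multiples of the page size.
function toPageRange(byteOffset, byteLength) {
  if (byteOffset % WASM_PAGE_SIZE !== 0 || byteLength % WASM_PAGE_SIZE !== 0) {
    throw new RangeError(
      `range [${byteOffset}, +${byteLength}) is not page aligned`);
  }
  return [byteOffset / WASM_PAGE_SIZE, byteLength / WASM_PAGE_SIZE];
}

// Round an arbitrary byte range outward to the enclosing whole pages.
// Any other data sharing those pages is affected too.
function enclosingPages(byteOffset, byteLength) {
  const start = Math.floor(byteOffset / WASM_PAGE_SIZE);
  const end = Math.ceil((byteOffset + byteLength) / WASM_PAGE_SIZE);
  return [start, end - start];
}

console.log(toPageRange(131072, 65536)); // [ 2, 1 ]
console.log(enclosingPages(70000, 10));  // [ 1, 1 ]
console.log(enclosingPages(65530, 10));  // [ 0, 2 ]
```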
This is true, but there is a clear signal when to expect OOM on memory accesses, i.e. when a grow fails. The Aside from this, some other practical challenges would be
I'm currently working on gathering usage patterns, and I agree that that would influence the implementation space the most.
Adding my take on this problem space after getting to chat with @lars-t-hansen a bit: From my understanding of the shape of the necessary clang/llvm extensions that would allow C/C++/Rust to operate on non-default memories, I can only imagine it working on C/C++/Rust code that was carefully (re)written to use the new extensions -- I'm not aware of any automatic techniques for porting large swaths of code that isn't just the shared-nothing approach of the component model (where you copy at boundaries between separate modules which each individually use distinct single-memories; to wit, wasm-link polyfills module-linking+interface-types using multi-memory in exactly this manner). Thus, I think there's still a certain burden of proof to show that there is real demand for additional multi-memory-based features. Independently, I think we can make great progress in the short-term improving the functionality of default linear memories. In particular, I can see each of the following 3 features allowing optimized implementations on most browsers today with a graceful fallback path when no OS/MMU support is available:
Lastly, outside of Core WebAssembly, but for completeness: to minimize copies of non-immutable- Together, I think these 4 features would address a decent amount of the use cases for |
Why would there need to be any additional bounds checking? If a mapped region is overlaid on the linear memory, the wasm code could just use regular memory ops with the standard linear bounds checks. Regarding the overhead, access protections would be handled by the VMM hardware. Given that the process is almost certainly already going to be operating through VMM translations, there should be little to no performance impact.
Thank you for creating this proposal. This was a major problem in the MVP.
Thanks @lukewagner for sketching this out, this is helpful.
Could you elaborate on how multiple mappings would work? I'm also thinking about what would happen when, after unmapping one external buffer, a different buffer needs to be mapped in. One of the concerns I had was that, depending on the sizes of the buffers that we need, if unmapping makes regions of the existing memory inaccessible, then subsequent
More of an update here: my original concern with this was that not all of the use cases this proposal intends to target use streams. I'm currently still working on the subset of workloads that this proposal should handle well. That is still WIP, and I will report back here when I have more to share.
@mykmartin - Several Wasm engines have optimization strategies for getting rid of the standard linear bounds checks; using guard pages, for example, removes the need for linear bounds checks under the assumption that the memory is owned by Wasm.
Ok, but how does mapping something onto a given region of the linear buffer affect that? From the wasm code's point of view, it's still just a regular lookup in the standard address space.
Sorry, I'm not sure how I missed this last question. To me this is different in a couple of ways:
@dtig Awesome to hear about your stream WIP and I'm interested to hear more.
Yup! You're correct in your final sentence: since ultimately |
Subproposal: Device Drivers with an Architecture-neutral Software Sandbox
Has anyone considered marking a region of memory as volatile so that statically compiled WebAssembly modules could implement memory-mapped I/O for device drivers?
Motivation for Adding This
Future Possibilities
Sure, from the WASM code it's just a memory access. But that's not where the bounds check will be: the WASM implementation now needs to (in the worst case) check each memory access to see which mapping it falls in, compute the correct pointer offset from the WASM memory offset, and handle the case where an unaligned memory access straddles a boundary. Also, from the implementation side, very few memory mapping APIs (I'm thinking of OpenGL's glMapBuffer and Vulkan's vkMapMemory) let the user code (read: the WASM implementation) pick where the mapping happens. This means that when a map is requested by WASM code, the implementation cannot simply tell the OS kernel to map it into the memory of the wasm module, because the API doesn't let it. Moreover, those mapping boundaries are dynamic, so you cannot inspect the module on load, find all the boundaries and create a perfect hash. All this culminates in a pretty significant pessimization for the hottest part of a WASM implementation: the memory access.
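Here is the worst case described above. Assume mappings can sit anywhere in the Wasm address space and the host cannot choose where the OS places them. Then every load or store needs a lookup like the one below, instead of a single bounds check (or none, with guard pages). This is an illustrative sketch of that cost, not how any engine actually works. The region list and the `translate` helper are assumptions: regions are sorted and non-overlapping, and each access traps on unmapped addresses and on boundary-straddling accesses.

```javascript
// Each region maps [wasmStart, wasmStart + length) onto a separate host
// buffer. Regions are sorted by wasmStart and do not overlap.
const regions = [
  { wasmStart: 0, length: 65536, buffer: new ArrayBuffer(65536) },
  { wasmStart: 131072, length: 65536, buffer: new ArrayBuffer(65536) },
];

// Translate a Wasm address + access size into (buffer, offset), trapping on
// unmapped addresses and on accesses that straddle a region boundary.
function translate(addr, size) {
  let lo = 0;
  let hi = regions.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const r = regions[mid];
    if (addr < r.wasmStart) {
      hi = mid - 1;
    } else if (addr >= r.wasmStart + r.length) {
      lo = mid + 1;
    } else {
      if (addr + size > r.wasmStart + r.length) {
        throw new RangeError(`access at ${addr} straddles a mapping boundary`);
      }
      return { buffer: r.buffer, offset: addr - r.wasmStart };
    }
  }
  throw new RangeError(`access at ${addr} hits unmapped memory`);
}

// A 4-byte load at 131080 lands 8 bytes into the second region.
const { buffer, offset } = translate(131080, 4);
console.log(buffer === regions[1].buffer, offset); // true 8
```

The binary search, the straddle check and the extra indirection all sit on the path of every memory access. That is the pessimization the comment refers to.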
A drive-by comment: However when it comes to getting C/C++ programs to emit reads and writes from non-default memory, this is going to be as invasive both to programs and to the toolchain as natively accessing packed GC arrays. So perhaps we should focus effort there. GC will arrive soonish (right? lies we tell ourselves, anyway) and so maybe this feature isn't needed, as such. It sure would be nice to solve this use case without exposing the |
Do we have a separate thread somewhere about accessing GC packed arrays from C++? I think it has potential, though feasibility of the toolchain change is probably the main question.
@penzn Not sure if there is a thread, and though it's important for tying together all parts of the system, it's probably out of scope for wasm standardization. Anyway I just wrote up some thoughts here, for my understanding of the current state of things: https://wingolog.org/archives/2022/08/23/accessing-webassembly-reference-typed-arrays-from-c
Subproposal: Virtual Address Area
Problem space:
Solution idea:
Example use cases of what could possibly be done inside the VMA with appropriate future mapping operations:
Some third-party applications already use negative addresses to flag memory areas as needing reversed byte order, such as for big-endian processors. If there are two different uses of negative addresses, they will conflict. Also, the WebAssembly standard is officially little-endian, so it is unlikely that endian swapping will get official support any other way. This is according to the w2c2 documentation and, I think, the wasm2native compiler (the third-party big-endian supporters).
Their reversed memory addressing ( |
Oh, OK. Thanks for clarifying.
Bump. This would be a massive leap in the ability to port existing libraries and applications to wasm, as well as generally increasing memory efficiency for large wasm programs that make good use of the heap.
From following the discussion, it seems the focus is on option 1 and a static number of memories. Building on the multi-memory proposal, having the option to create and delete memories via instructions at runtime could, in my opinion, solve many of the problems mentioned in #1397.
Possible new instructions
Why these instructions help
As mentioned in issue #1397, applications often allocate a bunch of memory
This would be especially helpful for shared memories,
Side note on read-only memory: With multiple memories, one of the static memories could be marked as read-only. Also changing between read-only and writeable at runtime seems
Possible mini-fragmentation
Because memories at least have a size of one page,
How to handle memidx
Problems and open design questions?
There are definitely problems that explain why it was decided against
I think @lukewagner is on to something there, but I feel like it has never been fully articulated in this conversation. What if an instances linear memory being There already is an existing solution on how to integrate multiple memories with different read/write capabilities in a flexible manner:
Such a solution might look like:
This would give us the best of both worlds. Decoupling of multiple memories on both the host and the wasm side (the wasm instance can not only ignore additional memories offered by the host, but also has a lot of control of when and where things get moved around, e.g. when one of the memories changes in size and the wasm instance decides to either ignore that scenario, potentially re-map other memories, move other allocations around, or potentially even create non-contiguous mappings) This would also align well with existing Semantically the |
I disagree with this proposal: wasm does not need fine-grained permission control. Think about it:
I think an mmap-like function would be great for wasm, but only in the sense of making memory non-linear. The ability to mmap files is severely limited by the address space (32 bits).
I think it does, but only in the sense of making a hole in the address space that holds no data and traps when read or written, e.g. to catch null pointer dereferences in C-style languages.
memory64 to the rescue!
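For scale, here is the 32-bit limit worked out, assuming the standard 64 KiB Wasm page. memory64 removes this limit by using 64-bit indices.

$$2^{32}\ \text{bytes} = 2^{16}\ \text{pages} \times 2^{16}\ \tfrac{\text{bytes}}{\text{page}} = 65536 \times 65536\ \text{bytes} = 4\ \text{GiB}$$

So with a 32-bit memory, the program's own heap and every mapped file have to share at most 4 GiB of index space.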
My proposal is not about naively stuffing native In a sense I am arguing against a very fine grained permission model too. I think that read-only memory is important for security in addition to consistency when working with But I think it is even more important to decouple the semantics of multiple memories from the semantics of individual load/store instructions, and a mechanism that allows us to do so, and that has been tried and tested is The host context provides buffers (memories), the wasm context is given explicit control if and how to mmap those buffers into its linear memory. It would give existing languages explicit control over how they want to deal with multiple memories (including the option to ignore it, with the host potentially performing a single mapping of |
Just a clarifying remark: multiple memories have always existed in Wasm, since version 1.0: by linking two modules together that both define their own memory, you always had multiple unrelated address spaces. The only limitation that is finally lifted by the multi-memory extension is the (weird) restriction that a single module was not able to speak about multiple memories. That caused various problems, for example, the inability to statically link (merge) arbitrary modules, or the inability to efficiently move data between such memories. There are other use cases for multiple memories, too, that don't require exposing them to a source language, for example, instrumentation or polyfilling other features.
FWIW, I'm in favor of adding some degree of support for read-only memory and no-access memory, if only to allow us to more-simply claim that wasm is an entirely more secure way to run code. We already get a huge mitigative security boost from our protected stack and CFI but the fact that As mentioned above and brainstormed further by @eqrion more recently, if we had a very coarse-grained protection model (M no-access pages starting at The only mitigative use case I'm aware of that this doesn't solve is linear-memory-stack guard pages, which would seem to still need fine-grained protection. But maybe stack-canaries implemented for wasm by LLVM are enough? |
Adding a note here that the work on this proposal has now moved to the memory control proposal repository, which reflects the current work. Feedback/issues on the proposal repository are appreciated so we can discuss them in more detail. Looking at the proposal repository, you may notice that there are several possible directions, though given how diverse the ecosystem is and the current restrictions of production VMs, we don't yet have consensus on exactly how we'll be tackling this.
I assume this is the sketch in static-protection.md? I like this idea too; my concern is that if this were fully static, it would be hard for runtimes to motivate a fundamental memory layout change without some runtime control of the read-only section.
Thanks for the links to the more recent proposal repository; it looks like there are already some similar ideas articulated there! Regarding the static protection proposal for MPUs, it feels like feature creep for WebAssembly to also try to become the universal IR for embedded systems. Giving up fine-grained r/w control and memory mapping APIs in return, which have real security, reliability and performance applications, seems like a bad deal for everyone except embedded developers. It is OK to have different technologies that solve their use cases well, and if WASM wants to make a dent in the native application space, it needs to have equal or better capabilities and guarantees, and the lowest common denominator with embedded hardware won't fit that bill. Embedded folks will also probably be happier if they get their own specific thing/spec and don't have to foot the bill for high-level stuff like GC. Edit:
```javascript
// `mode` is a hypothetical Memory descriptor field in this sketch.
const noaccess = new WebAssembly.Memory({ initial: 1, mode: "n" });
const readonly = new WebAssembly.Memory({ initial: 10, mode: "r" });
const readwrite = new WebAssembly.Memory({ initial: 100, mode: "rw" });
// ...
const imports = { js: { nomem: noaccess, rmem: readonly, mem: readwrite } };
```
```wat
(module
  (mmap (import "js" "nomem") 0 1)
  (mmap (import "js" "rmem") 1 10)
  (mmap (import "js" "mem") 11 100)
  (mmap (import "js" "rmem") 111 5) ;; this would panic on embedded
  ;; ...
)
```
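The (mmap ...) layout in the comment above can be modeled as a per-page permission table. The model assumes each entry reads as (imported memory, first page, page count), which the numbers 0/1, 1/10, 11/100 and 111/5 suggest. Each entry's pages inherit the "n"/"r"/"rw" mode of the imported memory. A store to a read-only or no-access page would trap, and so would any access to an unmapped page. Page 0 being no-access gives the null-pointer trap discussed earlier in the thread. This is a hypothetical model of the sketch, not an engine design. `buildPageTable` and `canAccess` are made-up names, and the table only rejects overlapping entries.

```javascript
// Permission of each imported memory, mirroring the "n"/"r"/"rw" modes.
const modes = { nomem: "n", rmem: "r", mem: "rw" };

// [import name, first page, page count], as in the (mmap ...) entries.
const mappings = [
  ["nomem", 0, 1],
  ["rmem", 1, 10],
  ["mem", 11, 100],
  ["rmem", 111, 5],
];

// Build a per-page permission table, rejecting overlapping mappings.
function buildPageTable(mappings, modes) {
  const table = [];
  for (const [name, first, count] of mappings) {
    for (let page = first; page < first + count; page++) {
      if (table[page] !== undefined) {
        throw new Error(`page ${page} mapped twice`);
      }
      table[page] = modes[name];
    }
  }
  return table;
}

// Would an access of the given kind ("read" or "write") to `addr` be allowed?
function canAccess(table, addr, kind) {
  const mode = table[Math.floor(addr / 65536)];
  if (mode === undefined || mode === "n") return false; // unmapped or no-access
  return kind === "read" || mode === "rw";
}

const pages = buildPageTable(mappings, modes);
console.log(canAccess(pages, 0, "read"));           // false (no-access page 0)
console.log(canAccess(pages, 65536, "write"));      // false (read-only page 1)
console.log(canAccess(pages, 11 * 65536, "write")); // true
console.log(canAccess(pages, 116 * 65536, "read")); // false (past the last mapping)
```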
I have some sympathy for this in that there are a lot of powerful features that can offer real value to applications and are already in wide use. There is a lot more diversity in system APIs and capabilities, comparatively speaking, than in hardware ISAs. Constantly falling short of feature parity compared to native platforms or APIs limits Wasm's ability to add value to ecosystems. Limiting everything to the least common denominator will eventually cause the least capable platform to dictate that more capable platforms can't exist. So we'll need to manage ecosystem diversity in some way. That said, WebAssembly has threaded this needle by deftly picking MVP features that get the main value-add of a feature without unduly burdening implementations. What @lukewagner mentions, referring to work by @eqrion to make a simplified model that gets effectively
Very little of this thread has been dedicated to possible How well would having multiple memories solve this problem compared to, say, an |
The linear memory associated with a WebAssembly instance is a contiguous, byte-addressable range of memory. In the MVP, each module or instance can only have one memory associated with it; this memory, at index zero, is the default memory.
The need for finer-grained control of memory has been in the cards since the early days of WebAssembly, and some functionality is also described in the future features document.
Motivation
Proposed changes
At a high level, this proposal aims to introduce the functionality of the instructions below:
memory.map: Provide the functionality of mmap(addr, length, PROT_READ|PROT_WRITE, MAP_FIXED, fd) on POSIX, and MapViewOfFile on Windows with access FILE_MAP_READ/FILE_MAP_WRITE.
memory.unmap: Provide the functionality of POSIX munmap(addr, length), and UnmapViewOfFile(lpBaseAddress) on Windows.
memory.protect: Provide the functionality of mprotect with PROT_READ/PROT_WRITE permissions, and VirtualProtect on Windows with memory protection constants PAGE_READONLY and PAGE_READWRITE.
memory.discard: Provide the functionality of madvise(MADV_DONTNEED), and VirtualFree(MEM_DECOMMIT);VirtualAlloc(MEM_COMMIT) on Windows.
Some options for next steps are outlined below; the instruction semantics will depend on the option. The intent is to pick the option that introduces the least overhead when mapping external memory into the Wasm memory space. Both options below assume that additional memories apart from the default memory will be available. The current proposal will only introduce memory.discard to work on the default memory; the other three instructions will only operate on memories not at index zero.
Option 1: Statically declared memories, with bind/unbind APIs (preferred)
memory.map/memory.unmap underneath. (Note: it may be possible for some browser engines to operate on the same backing store without an explicit map/unmap instruction. If the only use case for these instructions is from JS, it is possible to make these APIs only as needed.)
memtype to store memory protections in addition to limits for size ranges.
Reasons for preferring this approach:
Option 2: First class WebAssembly memories
This is the more elegant approach to dynamically adding memories, but adding support for first-class memories is non-trivial.
ref.mem.
memarg to use memory references.
Other alternatives
Why not just map/unmap to the single linear memory, or memory(0)?
Web API extensions
One way to support WebAssembly owning the memory, while also achieving zero-copy data transfer, is to extend Web APIs to take typed array views as input parameters into which outputs are written. The advantage here is that the set of APIs that need this can be scaled incrementally over time, and it minimizes the changes to the WebAssembly spec.
The disadvantages are that this would require changes to multiple Web APIs across different standards organizations, and it’s not clear that the churn here would result in a better data transfer story, as some APIs will still need to copy out.
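One existing API that already follows this "write into a caller-provided view" style is TextEncoder's encodeInto(). It encodes a string as UTF-8 directly into a Uint8Array, and that view can point into a Wasm memory, so no intermediate JS buffer is created. The sketch below uses it as an illustration of the pattern only. The `writeUtf8` helper and the offset 1024 are arbitrary choices.

```javascript
// A Wasm memory owned by the module (1 page = 64 KiB).
const memory = new WebAssembly.Memory({ initial: 1 });

// Encode `text` as UTF-8 directly into linear memory at `ptr`, writing at
// most `capacity` bytes; returns the number of bytes written.
function writeUtf8(memory, ptr, capacity, text) {
  const view = new Uint8Array(memory.buffer, ptr, capacity);
  const { read, written } = new TextEncoder().encodeInto(text, view);
  if (read < text.length) {
    // `read` counts UTF-16 code units consumed, like `text.length`.
    throw new RangeError("destination too small");
  }
  return written;
}

const written = writeUtf8(memory, 1024, 64, "héllo");
// Read it back through a view of the same memory.
const decoded = new TextDecoder().decode(
  new Uint8Array(memory.buffer, 1024, written));
console.log(written, decoded); // 6 héllo
```

Only APIs that accept an output view can work this way. APIs that return freshly allocated buffers still force a copy into linear memory, which is the limitation noted above.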
This is summarizing a discussion from the previous issue in which this approach was discussed in more detail.
Using GC Arrays
Though the proposal is still in phase 1, it is very probable that ArrayBuffers will be passed back and forth between JS/Wasm. Currently this proposal is not making assumptions about functionality that is not already available, and when available will evaluate what overhead it introduces with benchmarks. If at that time the mapping functionality is provided by the GC proposal without much overhead, and it makes sense to introduce a dependency on the GC proposal, this proposal will be scoped to the remaining functionality outlined above.
JS API
Interaction of this proposal with JS is somewhat tricky because
Open questions
Consistent implementation across platforms
The functions provided above only include Windows 8+ details. Chrome still supports Windows 7 for critical security issues, but only until Jan 2023, so for now this proposal will only focus on Windows system calls available on Windows 8+. Any considerations of older Windows users will depend on usage stats of the interested engines.
How would this work in the tools?
While dynamically adding/removing memories is a key use case, C/C++/Rust programs operate in a single address space, and library code assumes that it has full access to that single address space and can access any memory. With multiple memories, we are introducing separate address spaces, so it’s not clear what overhead we would be introducing.
Similarly, read-only memory is not easy to differentiate in the current model when all the data is in a single read-write memory.
How does this work in the presence of multiple threads?
In applications that use multiple threads, what calls are guaranteed to be atomic? On the JS side, what guarantees can we provide for Typed array views?
Feedback requested
All feedback is welcome, but specific feedback that I would find useful for this issue:
Repository link here if filing issues is more convenient.