
add experimental support for illumos #9535

Merged: 1 commit into bytecodealliance:main from the illumos branch, Nov 1, 2024

Conversation

@sunshowers (Contributor) commented Oct 31, 2024

With this change, the basic wasm tests all pass on illumos. Note the addition of NORESERVE to mmap calls.

However:

While wasmtime appears to be functional on illumos, it is still quite slow, particularly in the wast tests. For example, the following test:

```
cargo +beta test --test wast -- Cranelift/pooling/tests/spec_testsuite/load.wast
```

takes 0.07 seconds on Linux, but over 5 seconds on illumos. Some profiling suggests that this is due to lock contention inside the kernel while freeing memory, so I don't think this is a wasmtime issue. I'd like to pull some illumos experts in to do some debugging here as time permits, but I don't think this PR should necessarily be held up on that.

Thanks to iximeow for all the help with this!

(One note: due to a [rustc segfault on illumos](https://sunshowers.io/posts/rustc-segfault-illumos/), building wasmtime requires Rust 1.83 or higher. I did my building and testing with `+beta`.)

@sunshowers requested a review from a team as a code owner October 31, 2024 20:38
@sunshowers requested review from pchickey and removed the request for a team October 31, 2024 20:38
Comment on lines 15 to 25:

```rust
if #[cfg(target_os = "illumos")] {
    // On illumos, by default, mmap reserves what it calls "swap space" ahead of time, so that
    // memory accesses are guaranteed not to fail once mmap succeeds. NORESERVE is for cases
    // where that memory is never meant to be accessed -- e.g. memory that's used as guard
    // pages.
    pub(super) const MMAP_NORESERVE_FLAG: rustix::mm::MapFlags =
        rustix::mm::MapFlags::NORESERVE;
```
@sunshowers (Contributor Author), Oct 31, 2024:

Having spent some time digging into this, I believe we should pass in NORESERVE on Linux here as well. On Linux this is somewhat hidden by overcommit, but e.g. Chromium passes in NORESERVE: https://chromium.googlesource.com/v8/v8/+/chromium/2840/src/base/platform/platform-linux.cc#234

I'm happy to update this PR with that change if it sounds good.
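
A minimal sketch of what that change could look like, assuming the flag is defined via the cfg_if! block shown in the excerpt above (illustrative only, not necessarily the code as merged):

```rust
cfg_if::cfg_if! {
    if #[cfg(any(target_os = "linux", target_os = "illumos"))] {
        // Both kernels accept MAP_NORESERVE to skip swap-space accounting
        // for mappings that may never be touched (e.g. guard pages).
        pub(super) const MMAP_NORESERVE_FLAG: rustix::mm::MapFlags =
            rustix::mm::MapFlags::NORESERVE;
    } else {
        // No-op elsewhere, so callers can OR the flag in unconditionally.
        pub(super) const MMAP_NORESERVE_FLAG: rustix::mm::MapFlags =
            rustix::mm::MapFlags::empty();
    }
}
```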

Member:

Sounds reasonable to me!

@sunshowers (Contributor Author):

re slowness, for future reference, with 997Hz sampling:

user stacks: https://gist.github.com/sunshowers/b69b7bd2e671d9c23355d5e952636c5e

kernel stacks: https://gist.github.com/sunshowers/fa822f161e54d57a8103f6736656fbe8

@alexcrichton (Member) left a comment:

This is great, thank you for your time in doing this!

One high-level question for you: how easy is it to compile for illumos from Linux? Ideally we'd couple this with a check in our CI that illumos continues to build (e.g. at least via cargo check), but our CI only runs Linux/macOS/Windows right now, so we'd have to cross-compile. If it's easy to cross-compile, I think it'd be quite reasonable to add a check to CI that this passes. For example, this is our FreeBSD check (and reading that again, we should probably apply the same trick you've done here with handling vtune/ittapi by default, but that's ok to leave to a future PR).

> takes 0.07 seconds on Linux, but over 5 seconds on illumos

Whoa! It looks like the pooling allocator is the part that's slow here and that, by default, has a large number of virtual memory mappings associated with it. For example it'll allocate terabytes of virtual memory and then within that giant chunk it'll slice up roughly 10_000 linear memories (each with guard regions between them). These are prepared with a MemoryImageSlot each.

My guess is that the way things are managed is tuned to "this is acceptable due to some fast path in Linux we're hitting", which we didn't really design for and just happened to run across. I think it'd be reasonable to solve this along a number of lines, such as:

  • Rely on the kernel to fix this (as you mentioned)
  • Change default settings in Wasmtime for illumos (e.g. different defaults for the pooling allocator or something like that; a sketch follows this comment)
  • Refactor the pooling allocator to use a different default allocation strategy that behaves better on illumos.

Basically I think it'd be reasonable to handle this in Wasmtime with code changes too. I'm certainly no expert with illumos though so I'm also happy to defer to those with more knowledge of it.
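
For the second option above, a rough sketch of how an embedder can already shrink the pooling allocator's footprint through Wasmtime's public API (method names assume the wasmtime crate's PoolingAllocationConfig; the values are illustrative, and the right illumos defaults would need measurement):

```rust
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn engine_with_smaller_pool() -> wasmtime::Result<Engine> {
    let mut pool = PoolingAllocationConfig::default();
    // Fewer pooled linear memories and a smaller per-memory reservation mean
    // far fewer and smaller virtual memory mappings than the multi-terabyte
    // default layout described above.
    pool.total_memories(100);
    pool.max_memory_size(1 << 28); // 256 MiB per linear memory

    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pool));
    Engine::new(&config)
}
```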


```diff
@@ -24,7 +37,7 @@ impl Mmap {
             ptr::null_mut(),
             size,
             rustix::mm::ProtFlags::READ | rustix::mm::ProtFlags::WRITE,
-            rustix::mm::MapFlags::PRIVATE,
+            rustix::mm::MapFlags::PRIVATE | MMAP_NORESERVE_FLAG,
```
Member:

I think for this one in specific we'll want to leave off NORESERVE, since this is creating a mapping that is entirely readable/writable and isn't doing the same trick with address-space reservation as the reserve call below.

(unless in your testing you found that this was required, in which case I'd be curious to dig in!)
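
To make the distinction concrete, a rough sketch of the two kinds of mappings being contrasted here, using illustrative rustix calls rather than code from this PR:

```rust
use rustix::mm::{mmap_anonymous, MapFlags, ProtFlags};

unsafe fn two_kinds_of_mappings() -> rustix::io::Result<()> {
    // Pure address-space reservation: PROT_NONE, so nothing is touchable
    // until a later mprotect. NORESERVE is the natural fit here (the
    // `reserve` case mentioned above).
    let _reservation = mmap_anonymous(
        core::ptr::null_mut(),
        1usize << 32, // large, mostly-guard region
        ProtFlags::empty(),
        MapFlags::PRIVATE | MapFlags::NORESERVE,
    )?;

    // Immediately accessible mapping: readable/writable from the start,
    // which is the case this comment suggests may not need NORESERVE.
    let _accessible = mmap_anonymous(
        core::ptr::null_mut(),
        64 * 1024,
        ProtFlags::READ | ProtFlags::WRITE,
        MapFlags::PRIVATE,
    )?;
    Ok(())
}
```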

@sunshowers (Contributor Author):

I left this in based on the discussion below -- do you feel like that's okay or should I test without the flag?

Member:

Nah, yeah, let's leave it here and see how it goes; it doesn't seem unreasonable given the parts below.

```diff
@@ -56,7 +56,7 @@ pub unsafe fn decommit_pages(addr: *mut u8, len: usize) -> io::Result<()> {
         addr as _,
         len,
         ProtFlags::READ | ProtFlags::WRITE,
-        MapFlags::PRIVATE | MapFlags::FIXED,
+        MapFlags::PRIVATE | super::mmap::MMAP_NORESERVE_FLAG | MapFlags::FIXED,
```
Member:

Question for you (as I've not had much experience historically with MMAP_NORESERVE):

In the above situations MMAP_NORESERVE is paired with ProtFlags::empty(), which makes sense to me because that's just a VM reservation as opposed to actually needing readable/writable memory at any one point in time. Here, though, there are read/write flags (same as remap_as_zeros_at below). Does that mean we should leave off MMAP_NORESERVE here?

This function is, for example, used for erasing wasm memory that was already accessible to the guest itself, so in some sense we already have pages reserved for it; I think there's a case to be made for leaving off the flag here and below. That being said, I'm new to this flag, so I'm curious what you think too!

@sunshowers (Contributor Author):

I was wondering about this and the other cases with non-PROT_NONE permissions! From reading the code it wasn't quite clear to me whether this is memory that guests would certainly have accessed and therefore have physical reservations for already, or memory that guests could merely theoretically access.

@sunshowers (Contributor Author):

@iximeow actually suggested I do this, so I'm wondering if they have thoughts

@iximeow (Contributor):

i may have misremembered where decommit_pages gets used! my memory was that this is used when cleaning up after an instance exits, where the next module could have a smaller max heap size. so my thinking was that even though the pages had been accessible before, we don't yet know if that will still be the case on the instance's next invocation.

@alexcrichton (Member):

Heh, this function definitely has a storied history and has gone through many iterations. I had to reread the callsites to refresh myself on what exactly is happening here. @iximeow you're right that this is used after an instance is deallocated, but this is separate from resizing memory. Here decommit_pages is only used to purge backing memory for ranges that were possibly in use previously and reset them to their original contents. That's used for both linear memories and tables. Later, other functions like hide_existing_mapping are used to shrink memory and expose_existing_mapping to grow memory.

So, put another way: this function is used on memory which was theoretically accessible to wasm but which wasm may or may not have touched, since it may not have touched every page in its linear memory. Some of it may be reused for the next instance without further syscalls. Some of it may be hidden via mprotect and then re-exposed via mprotect for wasm to use later on too.

I suspect Wasmtime's not currently built to work well with precise overcommit, because we'd have to hook into expose_existing_mapping as well, for example, which currently is just a bland mprotect. Otherwise I don't think it's unreasonable to use NORESERVE here.

Overall though it seems like the main use case for NORESERVE is in the reserve function above where the allocations are often 6G+. Everywhere else should be relatively small allocations/deallocations related to what wasm is about to use or previously using.

In any case happy to leave this in here and we can always frob it later if necessary.
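
Pulling the thread together, a reconstructed sketch of the decommit path the diff above implies (for illustration only; not the exact wasmtime source):

```rust
use rustix::mm::{mmap_anonymous, MapFlags, ProtFlags};

/// Reset a possibly-used range back to zeroed, demand-paged memory by mapping
/// fresh anonymous pages over it. MAP_FIXED replaces the old pages in place;
/// NORESERVE avoids re-reserving swap for pages the next instance may never
/// touch.
pub unsafe fn decommit_pages(addr: *mut u8, len: usize) -> std::io::Result<()> {
    if len == 0 {
        return Ok(());
    }
    let ret = mmap_anonymous(
        addr.cast(),
        len,
        ProtFlags::READ | ProtFlags::WRITE,
        MapFlags::PRIVATE | MapFlags::NORESERVE | MapFlags::FIXED,
    )?;
    debug_assert_eq!(ret, addr.cast());
    Ok(())
}
```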

@sunshowers (Contributor Author) commented Oct 31, 2024:

> Change default settings in Wasmtime for illumos (e.g. different defaults for the pooling allocator or something like that)

Interesting -- how would I go about doing this? I was floundering around the wasmtime code haha :)

@sunshowers (Contributor Author) commented Oct 31, 2024:

> One high-level question for you: how easy is it to compile for illumos from Linux? For example we'd ideally couple this with a check in our CI that illumos continues to build (e.g. at least via cargo check) but our CI only runs Linux/macOS/Windows right now so we'd have to cross-compile. If it's easy to cross-compile I think it'd be quite reasonable to have a check added to CI that this passes.

`cargo check` is definitely feasible. `cargo build` would require the linker etc., and is likely easier with cross.

edit: ahh, spoke too soon. cargo check for the illumos target currently fails with "error occurred: Failed to find tool. Is gar installed?" (full output).

gar is what GNU ar is called on illumos. One could mangle paths and such, but it's easier to just use `cross build --target x86_64-unknown-illumos`, which works great.

@alexcrichton (Member) left a comment:

> but it's easier to just use cross build --target x86_64-unknown-illumos

oh nice! If you'd like to add that here as well feel free, otherwise I'm happy to try to finagle this in a follow-up too


If you want to set up the NORESERVE flag for Linux builds as well, I think this might be good to go after that?


@sunshowers (Contributor Author):

Ugh, I guess I need to clean the target dir for illumos. Silly glibc.

@sunshowers force-pushed the illumos branch 2 times, most recently from 5e82b23 to 470ad80, October 31, 2024 22:48
@sunshowers (Contributor Author):

All right, CI updated and I think this is good to go.

@github-actions bot added the wasmtime:api label (Related to the API of the `wasmtime` crate itself) on Oct 31, 2024
@alexcrichton added this pull request to the merge queue Nov 1, 2024
Merged via the queue into bytecodealliance:main with commit fd384cb Nov 1, 2024
40 checks passed
@sunshowers deleted the illumos branch November 1, 2024 01:40