Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a pure Rust WASI target? #73432

Closed
newpavlov opened this issue Jun 17, 2020 · 16 comments
Closed

Add a pure Rust WASI target? #73432

newpavlov opened this issue Jun 17, 2020 · 16 comments
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. O-wasm Target: WASM (WebAssembly), http://webassembly.org/

Comments

@newpavlov
Copy link
Contributor

Currently wasm32-wasi target is tied to the libc implementation for compatibility with C/C++ code. But for pure-Rust applications C environment is unnecessary and causes a certain bloat of resulting binaries.

For example a very simple code like this:

fn main() {
    let mut buf = [0u8; 4];
    unsafe {
        let _ = wasi::random_get(buf.as_mut_ptr(), buf.len());
        let c = wasi::Ciovec { buf: buf.as_mut_ptr(), buf_len: buf.len() };
        let _ = wasi::fd_write(1, &[c]);
    }
}

Gets compiled into a 64 KB binary (after strip), I guess a significant amount of which has to do with correctly setting up and destroying C environment.

Thus it could be beneficial to add a target like wasm32-wasi-rust intended for pure-Rust WASI applications. Most of the preliminary work has been already done and dependence of wasm32-wasi on libc is minimal (allocator and environment functions, abort, exit, __wasilibc_find_relpath), so IIUC addition of such target should be relatively simple. Depending on the rate of WASI evolution, such addition could be postponed until a certain level of stabilization to reduce maintenance burden.

Previously it was proposed in #63676.

cc @alexcrichton @sunfishcode

@jonas-schievink jonas-schievink added C-feature-request Category: A feature request, i.e: not implemented / a PR. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ labels Jun 17, 2020
@alexcrichton
Copy link
Member

I would personally see the addition of a duplicate target as a very heavyweight operation that needs to be very well motivated. I don't really know what benefit we stand to gain other than shaving off some binary size, which could also arguably just be construed as a bug to fix in upstream wasi-libc. Are there other reasons than binary size to introduce a new target?

@newpavlov
Copy link
Contributor Author

newpavlov commented Jun 19, 2020

I am not sure setting up C environment can be called a libc bug.

I guess in addition to smaller sizes, having a pure-Rust target could:

  • enable additional optimizations (although cross language LTO, should solve this to a certain extent)
  • make it a bit easier to develop alternative compiler backends for those targets

Also note that there is a general interest in having libc-independent targets if platform allows it (let's call it "ideological reasons"?): rust-lang/rfcs#2610

@newpavlov
Copy link
Contributor Author

Also there is this issue: WebAssembly/WASI#24

I guess it could be solved by tweaking libc, but I think the proposed target probably will be a more natural solution.

@jpmartin2
Copy link

wasi-libc actually already has a change to help with WebAssembly/WASI#24 (see WebAssembly/wasi-libc#74), so at this point I think it's just a matter of rust taking advantage of it. And I managed to figure out how to hack the rust compiler to get it to work, involved setting wasi-root in rust's config.toml (to point to a recent build of wasi-libc) and changing the crt object linked against for the wasi target in librustc_target to crt1-reactor.o instead of crt1.o. Then I was able to produce a wasm file using a binary target with #![no_main] and rustflags = ["-C","link-args=--no-entry"] (creating a cdylib didn't work), and got a wasm file with an _initialize export which does the libpreopen setup (this is provided by crt1-reactor.o). I was able to run the wasm file using wasmtime and call other functions that did file io successfully (I was able to get it working in wasmer as well, though wasmer doesn't call _initialize for you like wasmtime does right now, so you have to add that call yourself).

@sunfishcode
Copy link
Member

The first step here is to optimize Rust to avoid pulling in as much code. Right now with a default cargo init --bin project compiled to wasm32-wasi, Rust is still pulling in the environment-variable code and preopen code, even though neither should be needed just to print "hello world". Fixing these will be a code size win, whether we link with libc or a Rust library providing similar functionality, and whether one uses the new reactor ABI or not.

@newpavlov
Copy link
Contributor Author

newpavlov commented Jun 22, 2020

@sunfishcode
Maybe I am missing something, but shouldn't a pure-Rust target solve it automatically? IIUC right now a variable containing environmental data gets populated unconditionally on startup as part of setting-up C environment, while for a pure-Rust target it will be done as part of std::env functions (we even could remove the caching altogether), thus if program does not use environmental data, compiler will be able to trivially remove associated code using dead-code elimination pass.

@sunfishcode
Copy link
Member

WASI libc is structured such that it avoids pulling in the environment variable code if getenv, environ, etc. are not referenced, so it should work automatically for Rust too.

I've now debugged this a little more: the env is called from within std::panicking::default_hook. I briefly tried panic="abort" and using a completely empty fn main() {}, and Rust still linked in its own Rust panic support code, env code, UTF-8 decoding, backtracing, thread management, some fmt things, and other stuff.

@alexcrichton
Copy link
Member

The source of bloat there is panicking. The runtime thinks it can panic in a few places during fn main(){} (and generally in the internals of println! there's at least one path that LLVM can't prove won't panic). Panicking, as discovered, brings in the fmt machinery. Switching to a pure-Rust target would not solve this (and might make it worse if the Rust code is bigger than the equivalent C!)

I don't think the preopen bits are needed by fn main() {}, though, so maybe it's a bug that's pulled in?

@sunfishcode
Copy link
Member

Some preopen bits are being pulled in because:

  • Rustc adds code to the fn main() {} program which calls various functions in libstd
  • libstd is compiled into a single .o file, so the linker reads the whole thing in
  • Elsewhere in libstd is a call to __wasilibc_find_relpath.
  • libstd's call to __wasilibc_find_relpath then causes the linker to read in the .o file from libc.a which defines it
  • The .o file which defines __wasilibc_find_relpath also contains the constructor __wasilibc_populate_libpreopen
  • wasm-ld does GC, and notices that nothing actually calls __wasilibc_find_relpath and removes that function
  • But, it doesn't remove __wasilibc_populate_libpreopen because it doesn't ever DCE constructor functions, once they're pulled in.

@alexcrichton
Copy link
Member

Ah indeed that makes sense.

That seems unlikely to get fixed (it's just the behavior of the linker), but @sunfishcode perhaps we could refactor the APIs in wasi-libc? Could the population of libpreopen have an explicit entry point which we call from Rust's std::fs functions (the one that calls __wasilibc_find_relpath), and that way we'd avoid pulling in a constructor function?

Basically Rust really has no need for constructor functions, and we'd rather lazily initialize them ourselves, and that way the linker GC should prevent everything from getting pulled in.

@sunfishcode
Copy link
Member

sunfishcode commented Aug 1, 2020

Refactoring libc do do the libpreopen initialization lazily is tricky because once user code is running, it may have renumbered the file descriptors, which would break the way libpreopen currently discovers them.

I've now submitted https://reviews.llvm.org/D85062 which is a patch to wasm-ld which allows it to DCE the __wasilibc_populate_libpreopen constructor when it isn't needed. [edit: link to the right patch]

@coolreader18
Copy link
Contributor

I'm looking into adding support for reactor binaries, since it seems pretty simple -- use crt1-reactor instead of crt1 and set the entry to _initialize instead of _start, based off this code from clang. Would it make sense to add a variant LinkOutputKind::WasmReactor, since that's what currently handles which crt to use? Or maybe repurpose an existing LinkOutputKind variant to mean "reactor" on wasi?

@coolreader18
Copy link
Contributor

Or if we do need to change the entry symbol, maybe it should be a separate target entirely? wasm32-wasi_reactor or something? Cause it looks like llvm args are per-target, so if we need to pass -e like x86_64_fortanix_unknown_sgx.rs, it might make sense to be an entirely separate target rather than try to hack around it, and then all the *-dylib LinkOutputKinds can map to crt1-reactor like all the *-exe kinds map to crt1.o on wasi.

@coolreader18
Copy link
Contributor

Or does LinkOutputKind have the chance to change llvm args at some point? That seems like it should happen.

@coolreader18
Copy link
Contributor

Ok, nevermind, the pre_link_args are args to the linker, duh. I think I'll just repurpose cdylib to put an _initialize func in; I don't think there's much sense in making a wasm32-wasi module that can't use wasi functions, so I don't think changing that behavior is really a big deal.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Jan 11, 2021
…ichton

Emit a reactor for cdylib target on wasi

Fixes rust-lang#79199, and relevant to rust-lang#73432

Implements wasi reactors, as described in WebAssembly/WASI#13 and [`design/application-abi.md`](https://github.com/WebAssembly/WASI/blob/master/design/application-abi.md)

Empty `lib.rs`, `lib.crate-type = ["cdylib"]`:

```shell
$ cargo +reactor build --release --target wasm32-wasi
   Compiling wasm-reactor v0.1.0 (/home/coolreader18/wasm-reactor)
    Finished release [optimized] target(s) in 0.08s
$ wasm-dis target/wasm32-wasi/release/wasm_reactor.wasm >reactor.wat
```
`reactor.wat`:
```wat
(module
 (type $none_=>_none (func))
 (type $i32_=>_none (func (param i32)))
 (type $i32_i32_=>_i32 (func (param i32 i32) (result i32)))
 (type $i32_=>_i32 (func (param i32) (result i32)))
 (type $i32_i32_i32_=>_i32 (func (param i32 i32 i32) (result i32)))
 (import "wasi_snapshot_preview1" "fd_prestat_get" (func $__wasi_fd_prestat_get (param i32 i32) (result i32)))
 (import "wasi_snapshot_preview1" "fd_prestat_dir_name" (func $__wasi_fd_prestat_dir_name (param i32 i32 i32) (result i32)))
 (import "wasi_snapshot_preview1" "proc_exit" (func $__wasi_proc_exit (param i32)))
 (import "wasi_snapshot_preview1" "environ_sizes_get" (func $__wasi_environ_sizes_get (param i32 i32) (result i32)))
 (import "wasi_snapshot_preview1" "environ_get" (func $__wasi_environ_get (param i32 i32) (result i32)))
 (memory $0 17)
 (table $0 1 1 funcref)
 (global $global$0 (mut i32) (i32.const 1048576))
 (global $global$1 i32 (i32.const 1049096))
 (global $global$2 i32 (i32.const 1049096))
 (export "memory" (memory $0))
 (export "_initialize" (func $_initialize))
 (export "__data_end" (global $global$1))
 (export "__heap_base" (global $global$2))
 (func $__wasm_call_ctors
  (call $__wasilibc_initialize_environ_eagerly)
  (call $__wasilibc_populate_preopens)
 )
 (func $_initialize
  (call $__wasm_call_ctors)
 )
 (func $malloc (param $0 i32) (result i32)
  (call $dlmalloc
   (local.get $0)
  )
 )
 ;; lots of dlmalloc, memset/memcpy, & libpreopen code
)
```

I went with repurposing cdylib because I figured that it doesn't make much sense to have a wasi shared library that can't be initialized, and even if someone was using it adding an `_initialize` export is a very small change.
@workingjubilee
Copy link
Member

I would personally see the addition of a duplicate target as a very heavyweight operation that needs to be very well motivated.

and now:

  • wasm32-wasi-preview1
  • wasm32-wasi-preview1-threads
  • wasm32-wasi-preview2

and of course:

  • x86_64-apple-darwin
  • x86_64h-apple-darwin

Or indeed any -gnu and -musl and -uclibc targets.

Strictly speaking, these are not duplicates (in one case the duplication IS total, but it is due to a bug), but the variations between them can be very subtle, and it is in many cases still useful to have distinct support for them, especially for more experimental targets. Thus, @newpavlov, if you wish to do the work, and can describe its merits, I believe it will be considered for inclusion (whether it will be accepted is of course another story).

However, the target tier policy regarding adding targets mostly does not require GitHub issues open for them. In a sense, we are not accepting suggestions anymore, we are accepting PRs. As the rest of the conversation seems to be about ancillary bugs which are only secondarily related, and those can be split into new issues, I don't see a convenient "repurpose this issue" angle. Thus I am closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. O-wasm Target: WASM (WebAssembly), http://webassembly.org/
Projects
None yet
Development

No branches or pull requests

7 participants