Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WASI unwinding is broken in release #132416

Open
purplesyringa opened this issue Oct 31, 2024 · 29 comments
Open

WASI unwinding is broken in release #132416

purplesyringa opened this issue Oct 31, 2024 · 29 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-panic Area: Panicking machinery C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code O-wasi Operating system: Wasi, Webassembly System Interface O-wasm Target: WASM (WebAssembly), http://webassembly.org/ P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@purplesyringa
Copy link
Contributor

This is target wasm32-wasip1 with panic = "unwind", running on V8. I tried this code:

struct Dropper;

impl Drop for Dropper {
    fn drop(&mut self) {
        let _ = std::panic::catch_unwind(|| {
            std::panic::resume_unwind(Box::new(String::from("About to do some nifty corruption")))
        });
    }
}

fn main() {
    let _dropper = Dropper;
    panic!("Triggering landing pad");
}

I expected to see this happen: a panic message, followed by the process exit.

Instead, this happened:

thread 'main' panicked at src/main.rs:13:5:
Triggering landing pad
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
wasm://wasm/garbage2-65ce2afe6e737f0d.wasm-0003f686:1


RuntimeError: memory access out of bounds
    at garbage2-65ce2afe6e737f0d.wasm.dlfree (wasm://wasm/garbage2-65ce2afe6e737f0d.wasm-0003f686:wasm-function[140]:0x7464)
    at garbage2-65ce2afe6e737f0d.wasm.free (wasm://wasm/garbage2-65ce2afe6e737f0d.wasm-0003f686:wasm-function[139]:0x70bc)
    at garbage2-65ce2afe6e737f0d.wasm.__rdl_dealloc (wasm://wasm/garbage2-65ce2afe6e737f0d.wasm-0003f686:wasm-function[26]:0xe0e)
    at garbage2-65ce2afe6e737f0d.wasm.__rust_dealloc (wasm://wasm/garbage2-65ce2afe6e737f0d.wasm-0003f686:wasm-function[14]:0x577)
    at garbage2-65ce2afe6e737f0d.wasm._ZN3std2rt19lang_start_internal17hfafca2af8f4e8869E (wasm://wasm/garbage2-65ce2afe6e737f0d.wasm-0003f686:wasm-function[72]:0x2b91)
    at garbage2-65ce2afe6e737f0d.wasm.__main_void (wasm://wasm/garbage2-65ce2afe6e737f0d.wasm-0003f686:wasm-function[12]:0x545)
    at garbage2-65ce2afe6e737f0d.wasm._start (wasm://wasm/garbage2-65ce2afe6e737f0d.wasm-0003f686:wasm-function[5]:0x297)
    at WASI.start (node:wasi:136:7)
    at file:///home/purplesyringa/garbage2/wasi.mjs:12:6

Node.js v20.4.0

Meta

rustc --version --verbose:

rustc 1.84.0-nightly (1e4f10ba6 2024-10-29)
binary: rustc
commit-hash: 1e4f10ba6476e48a42a79b9f846a2d9366525b9e
commit-date: 2024-10-29
host: x86_64-unknown-linux-gnu
release: 1.84.0-nightly
LLVM version: 19.1.1

node --version:

v20.4.0

Compile with --release, run with

import { WASI } from "node:wasi";
import { readFile } from "node:fs/promises";

const wasi = new WASI({
    version: "preview1",
    args: process.argv.slice(2),
    env: process.env,
});

const wasm = await WebAssembly.compile(await readFile(process.argv[2]));
const instance = await WebAssembly.instantiate(wasm, wasi.getImportObject());

wasi.start(instance);

I'm not sure if this is a rustc bug, an LLVM bug, or a V8 bug, but I thought this might be important to track.

@purplesyringa purplesyringa added the C-bug Category: This is a bug. label Oct 31, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 31, 2024
@purplesyringa
Copy link
Contributor Author

@rustbot label +A-panic +I-unsound +O-wasi +O-wasm +T-compiler

@rustbot rustbot added A-panic Area: Panicking machinery I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-wasi Operating system: Wasi, Webassembly System Interface O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 31, 2024
@workingjubilee
Copy link
Member

workingjubilee commented Oct 31, 2024

...I'm not sure this is what we'd usually call unsound, because of the nature of wasm, but it's definitely not supposed to happen either, sooo

@alexcrichton

@purplesyringa
Copy link
Contributor Author

purplesyringa commented Oct 31, 2024

I'm pretty sure the problem here is type confusion: I throw a String as a second exception, but then somehow catch_unwind catches the first exception, which is a static string (at least, that's the closest explanation I have after playing around with this for a bit). Type confusion's probably close enough to I-unsound.

@jieyouxu jieyouxu added the E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status label Oct 31, 2024
@alexcrichton
Copy link
Member

How did you build this code? Locally I get:

$ rustc foo.rs --target wasm32-wasip1 -C panic=unwind
error: the crate `panic_unwind` does not have the panic strategy `unwind`

error: aborting due to 1 previous error

In general wasm32-wasip1 definitely does not support unwinding at all. What exactly is happening I can try to help determine but I'll need to know how to build the code

@jieyouxu jieyouxu added S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. S-needs-info Status: The issue lacks details necessary to triage or act on it. labels Oct 31, 2024
@purplesyringa
Copy link
Contributor Author

purplesyringa commented Oct 31, 2024

RUSTFLAGS="-C panic=unwind" CARGO_TARGET_WASM32_WASIP1_RUNNER="node wasi.mjs" cargo run --target wasm32-wasip1 --release -Z build-std=core,std

Generally speaking, unwinding works just fine, it seems to be nested panics that are broken.

@alexcrichton
Copy link
Member

Ah ok, wasm exception handling is pretty far outside my wheelhouse right now. AFAIK the exception-handling support is not super heavily used in Rust right now (but is more heavily used in Emscripten). In that sense just because a few things work probably doesn't mean that everything works (as this issue seems to indicate).

cc @coolreader18 though and #118168 as this might also be related to #121438

@purplesyringa
Copy link
Contributor Author

purplesyringa commented Oct 31, 2024

I might have partially cracked this.

// Any linkage to LLVM intrinsics for now forcibly marks them all as never
// unwinds since LLVM sometimes can't handle codegen which `invoke`s
// intrinsic functions.
if let Some(name) = &codegen_fn_attrs.link_name {
if name.as_str().starts_with("llvm.") {
codegen_fn_attrs.flags |= CodegenFnAttrFlags::NEVER_UNWIND;
}
}

https://github.com/rust-lang/stdarch/blob/ff9a4445038eae46fd095188740946808581bc0e/crates/core_arch/src/wasm32/mod.rs#L35-L38

I, uh, don't think it's a good idea to mark the throwing intrinsic as nounwind. This might explain why everything seems to work smoothly without LTO, but with -Z build-std something breaks.

@purplesyringa purplesyringa changed the title WASI unwinding is broken WASI unwinding is broken in release Oct 31, 2024
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 31, 2024
@coolreader18
Copy link
Contributor

Ah, yes, that'd probably be it lol. If leaving off the nounwind for llvm.wasm.throwworks, that would probably be the right solution.

@jieyouxu jieyouxu added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. S-needs-info Status: The issue lacks details necessary to triage or act on it. E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status labels Nov 1, 2024
@coolreader18
Copy link
Contributor

Hmm, interestingly this is the llvm-ir for both before and after adding a && name != "llvm.wasm.throw" check to that conditional -- somehow the intrinsic isn't getting marked with nounwind, but the function that contains it is? Maybe I'm not familiar enough with how attributes on llvm intrinsics work.

; unwind::wasm::_Unwind_RaiseException
; Function Attrs: noreturn nounwind uwtable
define dso_local noundef range(i32 0, 10) i32 @_ZN6unwind4wasm22_Unwind_RaiseException17he22b4a6b99a1dce6E(ptr noundef %exception) unnamed_addr #1 {
start:
  tail call void @llvm.wasm.throw(i32 noundef 0, ptr noundef %exception) #4
  unreachable
}

; Function Attrs: noreturn
declare dso_local void @llvm.wasm.throw(i32 immarg, ptr) unnamed_addr #2

@coolreader18
Copy link
Contributor

Though, in it's declaration in the panic_unwind cgu, it's not marked as such.

; unwind::wasm::_Unwind_RaiseException
; Function Attrs: uwtable
declare dso_local noundef range(i32 0, 10) i32 @_ZN6unwind4wasm22_Unwind_RaiseException17he22b4a6b99a1dce6E(ptr noundef) unnamed_addr #0

I'm wondering if this maybe isn't a problem with the throw codegen, but instead something around memory management? It seems like it's erroring specifically when it tries to drop the panic payload. If the code is changed to this:

struct Dropper;

impl Drop for Dropper {
    fn drop(&mut self) {
        let _ = std::panic::catch_unwind(|| std::panic::resume_unwind(Box::new(())));
    }
}

fn main() {
    let _dropper = Dropper;
    panic!("Triggering landing pad");
}

It runs fine:

thread 'main' panicked at src/main.rs:11:5:
Triggering landing pad
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@purplesyringa
Copy link
Contributor Author

I don't have a cloned Rust repo at the moment, but did you try replacing extern "C" with extern "C-unwind" on the intrinsic? I think Rust mostly uses callee ABI to determine whether the caller unwinds, this might affect something.

@coolreader18
Copy link
Contributor

Oh, that does indeed fix the nounwind attribute:

; unwind::wasm::_Unwind_RaiseException
; Function Attrs: noreturn uwtable
define dso_local noundef range(i32 0, 10) i32 @_ZN6unwind4wasm22_Unwind_RaiseException17he22b4a6b99a1dce6E(ptr noundef %exception) unnamed_addr #1 !dbg !68 {
start:
    #dbg_value(ptr %exception, !72, !DIExpression(), !73)
    #dbg_value(ptr %exception, !74, !DIExpression(), !83)
  tail call void @llvm.wasm.throw(i32 noundef 0, ptr noundef %exception) #2, !dbg !85
  unreachable, !dbg !85
}

; Function Attrs: noreturn
declare dso_local void @llvm.wasm.throw(i32 immarg, ptr) unnamed_addr #2

But it doesn't fix the error.

@coolreader18
Copy link
Contributor

Although, removing the nounwind seems to be a combination of both marking it C-unwind and modifying the check. Honestly, that check doesn't really seem necessary, since the vast majority of llvm intrinsics aren't going to be declared as a -unwind abi anyway.

@coolreader18
Copy link
Contributor

Well, now this is just bizarre.

struct Dropper;

impl Drop for Dropper {
    fn drop(&mut self) {
        let _ =
            std::panic::catch_unwind(|| std::panic::resume_unwind(Box::new(String::from("aaaaA"))));
    }
}

#[inline(never)]
fn main() {
    let _dropper = Dropper;
    let _ =
        std::panic::catch_unwind(|| std::panic::resume_unwind(Box::new(String::from("hello2"))));
    let _ =
        std::panic::catch_unwind(|| std::panic::resume_unwind(Box::new(String::from("hello3"))));
    panic!("Triggering landing pad");
}

If there are at least 2 of those catch_unwind(|| resume_unwind()) calls in main, the process exits without the memory error. I guess it's just memory corruption of some sort, but I'm not sure why. I thought maybe dlmalloc wasn't being compiled with unwind support, but it shouldn't need it anyway, since it shouldn't be panicking inside the allocator. It does seem like it's perhaps the C version of dlmalloc that's getting linked in? AFAICT, the rust crate doesn't actually define global symbols for dlmalloc and dlfree, but they're there in the binary anyway.

@coolreader18
Copy link
Contributor

Best I can tell, the __rust_dealloc call that's leading to the error corresponds to the implicit drop in .unwrap_or(101) in lang_start_internal. I'm not sure why that's leading to an issue, even when changing it from panic!() to panic_any(()) with a ZST payload that means it shouldn't even need to enter dlfree, but I'm done with this for tonight.

@jieyouxu jieyouxu added the E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status label Nov 1, 2024
@coolreader18
Copy link
Contributor

Ok, yeah, it's definitely happening after fn main() exits.

struct Dropper;

impl Drop for Dropper {
    fn drop(&mut self) {
        println!("before catch_unwind");
        let _ =
            std::panic::catch_unwind(|| std::panic::resume_unwind(Box::new(String::from("aaaaA"))));
        println!("after catch_unwind");
    }
}

#[inline(never)]
fn main() {
    let _dropper = Dropper;
    std::panic::panic_any(())
}
thread 'main' panicked at src/main.rs:15:5:
Box<dyn Any>
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
before catch_unwind
after catch_unwind
wasm://wasm/wasmtest-c0fab10d8fc8dc84.wasm-00047f6a:1


RuntimeError: memory access out of bounds
    at wasmtest-c0fab10d8fc8dc84.wasm.dlfree (wasm://wasm/wasmtest-c0fab10d8fc8dc84.wasm-00047f6a:wasm-function[137]:0x76f6)
    at wasmtest-c0fab10d8fc8dc84.wasm.free (wasm://wasm/wasmtest-c0fab10d8fc8dc84.wasm-00047f6a:wasm-function[136]:0x734e)
    at wasmtest-c0fab10d8fc8dc84.wasm.__rdl_dealloc (wasm://wasm/wasmtest-c0fab10d8fc8dc84.wasm-00047f6a:wasm-function[104]:0x48f7)
    at wasmtest-c0fab10d8fc8dc84.wasm.__rust_dealloc (wasm://wasm/wasmtest-c0fab10d8fc8dc84.wasm-00047f6a:wasm-function[22]:0x6cd)
    at wasmtest-c0fab10d8fc8dc84.wasm._ZN3std2rt19lang_start_internal17h46ef65fbf2e04218E (wasm://wasm/wasmtest-c0fab10d8fc8dc84.wasm-00047f6a:wasm-function[74]:0x28be)
    at wasmtest-c0fab10d8fc8dc84.wasm.__main_void (wasm://wasm/wasmtest-c0fab10d8fc8dc84.wasm-00047f6a:wasm-function[20]:0x69b)
    at wasmtest-c0fab10d8fc8dc84.wasm._start (wasm://wasm/wasmtest-c0fab10d8fc8dc84.wasm-00047f6a:wasm-function[5]:0x2ba)

@coolreader18
Copy link
Contributor

Definitely something to do with optimizations and inlining.

struct Dropper;

// #[inline(never)] here
fn foo() {
    let _ = std::panic::catch_unwind(|| std::panic::resume_unwind(Box::new(String::from("aaaaA"))));
}
impl Drop for Dropper {
    // OR #[inline(never)] here prevent the issue from happening.
    fn drop(&mut self) {
        println!("before catch_unwind");
        foo();
        println!("after catch_unwind");
    }
}

#[inline(never)]
fn main() {
    let _dropper = Dropper;
    std::panic::panic_any(())
}

@coolreader18
Copy link
Contributor

Ok, 100% some sort of miscompilation.

struct Dropper;

impl Drop for Dropper {
    fn drop(&mut self) {
        let _ = std::panic::catch_unwind(|| std::panic::resume_unwind(Box::new(Foo)));
    }
}

#[derive(Debug)]
struct Foo;
#[derive(Debug)]
struct Bar;

fn main() {
    std::panic::catch_unwind(|| {
        let _dropper = Dropper;
        std::panic::resume_unwind(Box::new(Bar))
    })
    .unwrap_or_else(|e| {
        dbg!(e.downcast_ref::<Foo>());
        assert!(e.is::<Bar>());
    });
}
[src/main.rs:20:9] e.downcast_ref::<Foo>() = Some(
    Foo,
)
thread 'main' panicked at src/main.rs:21:9:
assertion failed: e.is::<Bar>()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@purplesyringa
Copy link
Contributor Author

@rustbot label +I-miscompile

@rustbot rustbot added the I-miscompile Issue: Correct Rust code lowers to incorrect machine code label Nov 1, 2024
@purplesyringa
Copy link
Contributor Author

As far as I can see from the generated wasm code, there's a spurious rethrow statement emitted somewhere in the nested catch_unwind. I.e. the catch_unwind in Dropper catches the panic, handles it, and then rethrows it, so the outer catch_unwind catches it again, ignoring the original panic and overriding it with the repeated second one, thus leading to the seen type confusion and UAF

@purplesyringa
Copy link
Contributor Author

purplesyringa commented Nov 1, 2024

Looks like an LLVM WASM backend bug to me. The post-opt IR of all the failing examples looks like this, in pseudocode:

try:
    raise_panic()
cleanup:
    try:
        raise_panic()
    catch:
        catch_panic()
catch:
    catch_panic()

The key part here is that the try block has two successors, i.e. cleanupret uses unwind label. WASM has no built-in cleanup pad support, so you have to catch exceptions and reraise them manually, i.e. the correct codegen would be (again, pseudocode for ease of comparison):

try:
    try:
        raise_panic()
    catch_all:
        try:
            raise_panic()
        catch:
            catch_panic()
        rethrow
catch:
    catch_panic()

However, the actual codegen is:

try:
    raise_panic()
catch_all:
    try:
        try:
            raise_panic()
        catch:
            catch_panic()
            rethrow
    catch:
        catch_panic()

These are the same instructions, but arranged in a really wrong way, which leads to the same panic being caught twice. This seems to be a bug in successor handling specifically, and that explains why disabling inlining helps. Perhaps confused basic blocks handling?

@rustbot label +A-LLVM

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Nov 1, 2024
@purplesyringa
Copy link
Contributor Author

Actually, this codegen would be valid if rethrow throwed the external panic rather than the nested one. And that's actually allowed by the wasm IR, rethrow can rethrow the exception from a particular try block. But the generated index is wrong! Here's a C++ reproducer of the same problem:

#include <cstdio>

__attribute__((noinline)) void print_current() noexcept {
  try {
    throw;
  } catch (int x) {
    printf("%d\n", x);
  }
}

struct Dropper {
  ~Dropper() {
    try {
      throw 1;
    } catch (...) {
      print_current();
    }
  }
};

int main() {
  try {
    Dropper dropper;
    throw 2;
  } catch (...) {
    print_current();
  }
}

Expected output: 1 2, actual: 1 1.

@purplesyringa
Copy link
Contributor Author

I've filed an upstream bug: llvm/llvm-project#114600

@jieyouxu jieyouxu removed I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue. E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status labels Nov 2, 2024
@purplesyringa
Copy link
Contributor Author

This sounds relevant to T-libs because of the stdarch PR. Is there a practice of removing labels when the topic has been resolved that I'm unaware about?

@jieyouxu
Copy link
Member

jieyouxu commented Nov 4, 2024

No, I didn't mean to remove T-libs but the label UI decided apparently I wanted to.

@jieyouxu jieyouxu added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Nov 4, 2024
@apiraino
Copy link
Contributor

apiraino commented Nov 4, 2024

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 4, 2024
@coolreader18
Copy link
Contributor

The LLVM PR merged, so now it's just waiting for that to be merged into rust-lang/llvm-project.

@purplesyringa
Copy link
Contributor Author

No, we also have to merge the nounwind fix for intrinsics.

@purplesyringa
Copy link
Contributor Author

We might want to keep this issue around for a bit longer. Emscripten seems to still have some problems with WASM unwinding (emscripten-core/emscripten#22861), and I'm not sure if they are applicable to Rust too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-panic Area: Panicking machinery C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code O-wasi Operating system: Wasi, Webassembly System Interface O-wasm Target: WASM (WebAssembly), http://webassembly.org/ P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants