polkavm-linker: Swap origin/target in ADD32/SUB32 #260

Closed
wants to merge 2 commits into from

Conversation

jarkkojs
Contributor

The relocation for the ADD32/SUB32 pair has origin and target in the wrong order. Swap origin and target.

Relocation for ADD32/SUB32 relocation pair has origin/target wrong
order. Swap origin and target.

Signed-off-by: Jarkko Sakkinen <[email protected]>
@jarkkojs jarkkojs requested review from koute and athei January 26, 2025 00:53
@jarkkojs
Contributor Author

Is this due to the fix (candidate) or something else?


failures:

---- tests::compiler_linux_doom_o3_dwarf2 stdout ----
thread 'tests::compiler_linux_doom_o3_dwarf2' panicked at crates/polkavm/src/tests.rs:1380:72:
called `Result::unwrap()` on an `Err` value: ProgramFromElfError(Other("failed to process DWARF: unexpected relocation at <section #18+208>: Offset { origin: <section #1+2588>, target: <section #1+0>, size: Generic(U32) }"))


failures:
    tests::compiler_linux_doom_o3_dwarf2

@jarkkojs
Contributor Author

Closes: #247

Comment on lines 7963 to 7973
[(_, Kind::Mut(MutOp::Add, RelocationSize::U32, target_1)), (_, Kind::Mut(MutOp::Sub, RelocationSize::U32, target_2))] => {
    relocations.insert(
        current_location,
        RelocationKind::Offset {
            origin: *target_1,
            target: *target_2,
            size: SizeRelocationSize::Generic(RelocationSize::U32),
        },
    );
    continue;
}
Collaborator

Logically this isn't correct, so it's most likely not a proper fix (as evidenced by the failing tests).

We have RelocationKind::Offset { origin, target, ... }, so in origin we should have the base from which the offset is calculated, and in target we should have the destination, so logically this should match a relocation that looks like: offset = target - origin. However what you're doing here is the other way around: offset = origin - target.
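
To make the convention concrete, here is a minimal sketch, not the linker's actual code. `Offset` is a simplified stand-in for `RelocationKind::Offset` that uses plain `u64` addresses instead of section-relative locations, and `from_add_sub_pair` is a hypothetical helper. It assumes the usual psABI reading of the pair, where ADD32 adds its symbol and SUB32 subtracts its symbol, so the stored value is `add_symbol - sub_symbol`.

```rust
/// Simplified, hypothetical stand-in for `RelocationKind::Offset`.
/// Plain addresses are used here instead of section-relative locations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Offset {
    /// Base from which the offset is calculated.
    pub origin: u64,
    /// Destination that the offset points to.
    pub target: u64,
}

impl Offset {
    /// Computes `offset = target - origin`, truncated to 32 bits.
    pub fn value(self) -> u32 {
        self.target.wrapping_sub(self.origin) as u32
    }
}

/// Hypothetical helper: builds an `Offset` from an ADD32/SUB32 pair.
/// The pair encodes `add_symbol - sub_symbol`, so under the
/// `target - origin` convention the ADD32 symbol is the target
/// and the SUB32 symbol is the origin.
pub fn from_add_sub_pair(add_symbol: u64, sub_symbol: u64) -> Offset {
    Offset {
        origin: sub_symbol,
        target: add_symbol,
    }
}
```

With made-up addresses, `from_add_sub_pair(0x1010, 0x1000).value()` yields `0x10`, while swapping the two arguments yields the same magnitude with the opposite sign (as a wrapped 32-bit value).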

Contributor Author

@jarkkojs jarkkojs Jan 27, 2025

Logically this isn't correct, so it's most likely not a proper fix (as evidenced by the failing tests).

@koute, a snippet from your comment in the original issue:

                    0: R_RISCV_ADD32        .Lanon.cb76ee1d57f0804fce1a80e99f7c73f1.1
                    0: R_RISCV_SUB32        .Lswitch.table._ZN57_$LT$program_for_bug..Foo$u20$as$u20$core..fmt..Debug$GT$3fmt17h48080cab75666063E.1.rel

This is what the spec says:

[Image: screenshot of the specification, "Screenshot From 2025-01-27 19-05-46"]

It can be seen that the correct computation is (+*target_1) + (-*target_2) = *target_1 - *target_2.
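
The arithmetic can be sketched as follows. This is an illustration only: the function names are hypothetical, the addresses in the usage note are made up, and it assumes the psABI definitions of R_RISCV_ADD32 (`V + S + A`) and R_RISCV_SUB32 (`V - S - A`) applied to a word that starts out as zero.

```rust
/// R_RISCV_ADD32: word = word + (S + A), with 32-bit wraparound.
pub fn apply_add32(word: u32, symbol: u64, addend: i64) -> u32 {
    word.wrapping_add(symbol.wrapping_add(addend as u64) as u32)
}

/// R_RISCV_SUB32: word = word - (S + A), with 32-bit wraparound.
pub fn apply_sub32(word: u32, symbol: u64, addend: i64) -> u32 {
    word.wrapping_sub(symbol.wrapping_add(addend as u64) as u32)
}

/// Applies an ADD32/SUB32 pair (both with a zero addend) to a
/// zero-initialized word, giving `add_symbol - sub_symbol`.
pub fn apply_pair(add_symbol: u64, sub_symbol: u64) -> u32 {
    let word = apply_add32(0, add_symbol, 0);
    apply_sub32(word, sub_symbol, 0)
}
```

For example, `apply_pair(0x2a1c, 0x2000)` gives `0xa1c`; the reverse order gives the two's complement of that value, which is the "wrong sign" symptom.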

Please review and point out if there is a step that went wrong.

The psABI specification that I used as my main reference is available here:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/draft-20240829-13bfa9f54634cb60d86b9b333e109f077805b4b3/riscv-abi.pdf

EDIT: I made this as clean and transparent as possible for a more convenient review experience.

Contributor Author

@jarkkojs jarkkojs Jan 27, 2025

I can comment on my logic, but I cannot comment on the evidence, as you did not explicitly point it out. That said, I assume that we are talking about the doom test.

I'll look up the doom example next for comparison.

Contributor Author

@jarkkojs jarkkojs Jan 28, 2025

"Okay, so the offset is correct-ish, but it has the wrong sign."
A comment by @koute in: #247 (comment)

Contributor Author

I'd also need a comment on whether there is "a test" or "tests" (plural) failing. The CI run shows a single failing test, but your comment suggests otherwise.

Contributor Author

@koute thanks for the feedback :-) I now have new ideas for how to approach this! All good.

Comment on lines +1 to +56
#![no_std]
#![no_main]

extern crate core;

use core::fmt::Write;
use polkavm_derive::polkavm_export;

pub enum Foo {
    Success,
    CalleeTrapped,
    Unknown,
}

impl ::core::fmt::Debug for Foo {
    #[inline]
    fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result {
        ::core::fmt::Formatter::write_str(
            f,
            match self {
                Foo::Success => "Success",
                Foo::CalleeTrapped => "CalleeTrapped",
                Foo::Unknown => "Unknown",
            },
        )
    }
}

struct Writer;
impl core::fmt::Write for Writer {
    fn write_str(&mut self, s: &str) -> core::fmt::Result {
        unsafe {
            crate::debug_message(s.as_ptr(), s.len() as u32);
        }
        Ok(())
    }
}

#[polkavm_derive::polkavm_import]
extern "C" {
    pub fn debug_message(str_ptr: *const u8, str_len: u32);
}

#[polkavm_export(abi = polkavm_derive::default_abi)]
pub fn deploy() {
    let mut m = Writer {};
    let _ = write!(&mut m, "{:?}", Foo::Success);
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe {
        core::arch::asm!("unimp");
        core::hint::unreachable_unchecked();
    }
}
Collaborator

This isn't the proper way to regression test this.

There's no guarantee that the compiler will actually emit the code which triggers the issue (and the issue very much depends on how exactly the compiler will compile this), unless we freeze the compiler version and the flags we use, which we don't want to do.

The best way to do this would probably be something like this:

  1. Compile the program.
  2. Disassemble it.
  3. Strip down the assembly to the bare minimum.
  4. Add the test as assembly source code and reassemble it to produce a binary. (It's fine to commit the binary so that not everyone has to install a RISC-V assembler, but the blob should be reproducible for those who want to rebuild it.)

This would also allow the removal of the parts of the code which are unrelated to the core issue (e.g. the debug_message call is unnecessary and should be removed, etc.)

Contributor Author

What about using global_asm!() for wrapping it up? It's fairly robust.

Collaborator

What about using global_asm!() for wrapping it up? It's fairly robust

That would also be acceptable, if it can be made to work.

(The major point here is that you need very specific relocations to be emitted to reproduce this bug, so it's not just an issue of emitting the right assembly but also getting the relocations right; you need to force the relocations to be emitted and not be mangled by the linker, etc.; no idea if that's doable with global_asm!)

Contributor Author

@jarkkojs jarkkojs Jan 27, 2025

I think it should work even in that case. It supports all of the shenanigans that the assembler provides, i.e. global_asm!(include_str!("test.S")) can be done and it should compile.

So I think this involves:

  1. Emit assembly: RUSTFLAGS="--emit asm" cargo build
  2. Post-edit assembly to something reasonable.
  3. A minimal wrapper and global_asm!(include_str!("test.S")).

Contributor Author

@jarkkojs jarkkojs Jan 27, 2025

The second alternative:

  1. Emit assembly: RUSTFLAGS="--emit asm" cargo build
  2. Post-edit assembly to something reasonable.
  3. Use https://docs.rs/cc/latest/cc/ to compile assembly in build.rs of crates/polkavm.

This is almost what you suggested, but it has the benefit that we could edit the assembly code if we ever want to. I'll try both to see which is the leanest option for us (I honestly don't know before trying them out).

@jarkkojs jarkkojs marked this pull request as draft January 27, 2025 14:55
@jarkkojs
Contributor Author

jarkkojs commented Jan 27, 2025

Converted to draft until the first round of issues has been fixed. Thank you for the reviews. I'll adopt this habit, given the feedback from @athei for polkavm-test-data.

Signed-off-by: Jarkko Sakkinen <[email protected]>
@jarkkojs
Contributor Author

jarkkojs commented Feb 4, 2025

I wrote a Python script that discovers and computes table relocations: https://gist.github.com/jarkkojs/1f64ab5b1c92deec7d75b23504f7d890

For the binary stored in test-data:

$ python riscv64_table_relocs.py program-for-bug_64.elf
0x000008a7       176             
0x000008e2       148             
0x00000908       148             
0x00000950       8               
0x00024a23       354             
0x00024aa3       192             
0x00024ac1       192             
0x00024b0e       134             
0x00024b35       134             
0x00024b53       110             
0x00024b7a       22              
0x00024da8       66              
0x00024e0c       28              
0x00024e35       28              
0x00024e94       40              

I'll check if these match the Rust code next. It's a good reference model (also to update).
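
As a rough idea of what such a reference model could look like in Rust, here is a minimal sketch. It is not a port of the linked script, whose contents are not reproduced here. The `Kind` and `Reloc` types and the `table_entries` function are hypothetical, the relocations are assumed to be already resolved to `S + A`, and the numbers in the usage note are made up.

```rust
use std::collections::BTreeMap;

/// The two relocation types that form a table-entry pair (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Add32,
    Sub32,
}

/// A relocation that is already resolved to `S + A` (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Reloc {
    /// Offset of the 32-bit word that the relocation patches.
    pub offset: u64,
    pub kind: Kind,
    /// Resolved symbol value plus addend.
    pub value: u64,
}

/// Groups relocations by offset and, for every offset that has both an
/// ADD32 and a SUB32, returns `(offset, add - sub)` sorted by offset.
/// Offsets with only one half of the pair are skipped.
pub fn table_entries(relocs: &[Reloc]) -> Vec<(u64, u32)> {
    let mut by_offset: BTreeMap<u64, (Option<u64>, Option<u64>)> = BTreeMap::new();
    for r in relocs {
        let slot = by_offset.entry(r.offset).or_default();
        match r.kind {
            Kind::Add32 => slot.0 = Some(r.value),
            Kind::Sub32 => slot.1 = Some(r.value),
        }
    }
    by_offset
        .into_iter()
        .filter_map(|(offset, pair)| match pair {
            (Some(add), Some(sub)) => Some((offset, add.wrapping_sub(sub) as u32)),
            _ => None,
        })
        .collect()
}
```

Feeding it an ADD32 of `0x10b0` and a SUB32 of `0x1000` at the same offset produces an entry of `0xb0`, regardless of the order in which the two relocations appear in the input.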

@jarkkojs jarkkojs closed this Feb 4, 2025