Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore big-endian targets in UI tests with raw bytes #102379

Closed
wants to merge 3 commits into from

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Sep 27, 2022

Some const UI tests have raw bytes in their output that are sensitive
to the target endianness. We could consider something like the directive
stderr-per-bitwidth for endian, but we only ever test little-endian in
CI, so that would likely bit rot. For now, just ignore big-endian.

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 27, 2022
@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 27, 2022
@rust-log-analyzer

This comment was marked as resolved.

@rust-log-analyzer

This comment has been minimized.

Some `const` UI tests have raw bytes in their output that are sensitive
to the target endianness. We could consider something like the directive
`stderr-per-bitwidth` for endian, but we only ever test little-endian in
CI, so that would likely bit rot. For now, just ignore big-endian.
@Mark-Simulacrum
Copy link
Member

Hm, I wish we could at least test that we're seeing roughly expected behavior -- in particular, I worry that we might actually be wrong on big-endian given the sometimes low-level byte manipulation const does, and if we hide a bunch of tests we'd not notice.

we only ever test little-endian in CI, so that would likely bit rot

These tests seem like they could probably be cross-compiled, I guess, which might make that easier.

r? mir-opt -- do we have tests checking for big endian non-brokenness at all? (even if they only run in downstream distro CIs for the most part)

@cuviper
Copy link
Member Author

cuviper commented Oct 7, 2022

These tests seem like they could probably be cross-compiled, I guess, which might make that easier.

Possibly, but that's only easy if they can be written for #![no_core], like we do for a lot of assembly and codegen tests.

@bors
Copy link
Contributor

bors commented Oct 8, 2022

☔ The latest upstream changes (presumably #102091) made this pull request unmergeable. Please resolve the merge conflicts.

@nagisa
Copy link
Member

nagisa commented Oct 9, 2022

cc @RalfJung @oli-obk would it be possible to adjust the messages to display these bytes in a endian-agnostic way? That is, if we’re outputting 2/4/8/16/etc bytes, we could output them as an integer literal rather than literal bytes?

@RalfJung
Copy link
Member

RalfJung commented Oct 10, 2022

What's the diff that we are currently seeing, which messages are we even talking about? I don't know into which CI logs I have to look...

That is, if we’re outputting 2/4/8/16/etc bytes, we could output them as an integer literal rather than literal bytes?

Not having seen the diffs, I can only guess, but I assume most of the problems are caused by error messages that dump the raw memory. I think it would be a bad idea to lie in that dump and to show anything but the actual byte-by-byte content. I don't think there is a way we can fix these -- we already have separate stderr for 32bit/64bit here, for the exact same reason.

We might also have some tests that transmute arrays of u8 into many-byte values, and hence just show different behavior on big-endian targets. Those could be fixed by using to/from_le_bytes, or with cfg.

@RalfJung
Copy link
Member

RalfJung commented Oct 10, 2022

Hm, I wish we could at least test that we're seeing roughly expected behavior -- in particular, I worry that we might actually be wrong on big-endian given the sometimes low-level byte manipulation const does, and if we hide a bunch of tests we'd not notice.

FWIW, Miri runs its test suite on the mips64-unknown-linux-gnuabi64 target, and I don't recall a single issue. Endianess is only relevant in very few places in the interpreter -- basically, when loading a multi-byte scalar value from memory, and when storing a multi-byte scalar value to memory. Nothing else works on multiple bytes of memory at a time. https://github.com/rust-lang/miri-test-libstd runs the libcore and liballoc tests on a big-endian target, too (specifically mips-unknown-linux-gnu). And a bunch of ecosystem crates run their tests on mips64-unknown-linux-gnuabi64 with Miri to test their own big-endianess-compatibility. So I am reasonably confident that the interpreter works correctly for those targets.

Of course it would be nice to have some in-tree tests, but if we include out-of-tree tests we are well-covered currently. Maybe we should run the mips64-unknown-linux-gnuabi64 Miri test suite on rustc CI?

@cuviper
Copy link
Member Author

cuviper commented Oct 10, 2022

Here's an example log from Fedora's 1.64.0 on s390x:
https://kojipkgs.fedoraproject.org//packages/rust/1.64.0/1.fc35/data/logs/s390x/build.log

The differences I'm trying to address here are mostly under note: the raw bytes of the constant, like:

---- [ui] src/test/ui/consts/const-eval/ub-enum.rs stdout ----
diff of 64bit.stderr:
6	   |
7	   = note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rustc repository if you believe it should not be considered undefined behavior.
8	   = note: the raw bytes of the constant (size: 8, align: 8) {
-	               01 00 00 00 00 00 00 00                         │ ........
+	               00 00 00 00 00 00 00 01                         │ ........
10	           }
11	
12	error: any use of this value will cause an error
117	   |
118	   = note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rustc repository if you believe it should not be considered undefined behavior.
119	   = note: the raw bytes of the constant (size: 8, align: 4) {
-	               78 00 00 00 ff ff ff ff                         │ x.......
+	               00 00 00 78 ff ff ff ff                         │ ...x....
121	           }

where it clearly got the right values, just with different endianness.

@RalfJung
Copy link
Member

RalfJung commented Oct 10, 2022

Yeah that is supposed to print the actual memory contents, byte for byte. It would be bad if that didn't match the actual endianess of the interpreted code. This is also entirely untyped.

The only thing we can do here is adjust the tests to make the contents endianess-independent. But that might not always be possible when enum discriminant values are involved. We can also add stderr normalization filters.

@nagisa
Copy link
Member

nagisa commented Oct 18, 2022

Uhm, so not sure what the best way to proceed here is. Normalization sounds enticing, but from the output it does not look like there is a trivial way to make it apply generically without significant effort, am I right? The best that could have with the current infrastructure are replacement rules such as these

"78 00 00 00 ff ff ff ff" -> "00 00 00 78 ff ff ff ff"

which is

  1. tedious
  2. error prone
  3. requires actual big-endian machine to generate these
  4. will not apply to any future tests...

Just ignoring these tests solves some of the issues (1, 2, 3), but not the others (4).

@oli-obk
Copy link
Contributor

oli-obk commented Oct 18, 2022

Idea: we add a flag to ignore stderr output on certain filters instead of ignoring the entire test. This way we can still check that the annotations match independently of endianess (or even bitsize if we want to)

@cuviper
Copy link
Member Author

cuviper commented Oct 18, 2022

I like the diff-filter idea, and it can still be somewhat strict -- if there's a mismatch, but both the expected and actual lines match a given regex, consider it fine. I'll see if I can add a compiletest extension along these lines.

@nagisa nagisa added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 28, 2022
@cuviper cuviper mentioned this pull request Dec 6, 2022
8 tasks
@uweigand
Copy link
Contributor

I've now come up with an alternative solution to this problem that does not simply ignore the tests:

I'd appreciate any comments or review!

@saethlin
Copy link
Member

Yeah that is supposed to print the actual memory contents, byte for byte. It would be bad if that didn't match the actual endianess of the interpreted code. This is also entirely untyped.

Maybe I'm just naive, but why do we want this? It seems to me like the root problem is that we're trying to test a portable language using a non-portable output. If we want to test that the behavior of a Rust program is correct, can we print/dump the consts using their Debug impl when possible? Of course this can't necessarily work for all consts, so maybe we should have something like a flag that opts into Debug-dumping consts and we always use in tests?

Trying to hack around the non-portability of dumping the bytes just feels like repeating the old mistake of accidentally writing endian-sensitive code, which we often advertise Miri as a solution to.

@RalfJung
Copy link
Member

The behavior of rustc const eval is endianess-dependent -- that is a feature, not a bug. And we want const-eval to sometimes dump the raw low-level bytes of a constant; that's not a bug, either. Of course we also want to test that this output looks the way it should. So I don't quite understand what you mean.

@saethlin
Copy link
Member

saethlin commented Dec 22, 2022

Of course we also want to test that this output looks the way it should.

This is where I'm lost. As far as I can tell, the linked PRs are following your suggestions:

The only thing we can do here is adjust the tests to make the contents endianess-independent. But that might not always be possible when enum discriminant values are involved. We can also add stderr normalization filters.

and thus are not testing that the output looks the way it should. We are exposing endian-ness in the compiler's output, then instead of using that information that got exposed, we erase that information in the test suite.

@RalfJung
Copy link
Member

Yeah I said that would be a possibility, but not one that I like.^^

There are like 3 parallel threads of discussion here, see #104135 (comment) for a more recent proposal.

@uweigand
Copy link
Contributor

FYI, the alternate solution #106047 and #106046 has now been merged, so the Rust test suite is currently clean on big-endian targets (at least on s390x).

@cuviper
Copy link
Member Author

cuviper commented Jan 17, 2023

Thanks!

@cuviper cuviper closed this Jan 17, 2023
@RalfJung
Copy link
Member

Is there an issue tracking adding some big-endian checking for CI? Right now the risk of regressions here is pretty high.

@cuviper
Copy link
Member Author

cuviper commented Jan 17, 2023

#106046 concluded that we'll leave that to external CI for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.